Monday, February 9, 2009

Synthetic feasibility in Chemical informatics

Speaker domain: Dr. Watson started in chemistry, moved to pharmaceutical and photographic systems, and is presently working in machine learning.

Background:
The pharmaceutical industry has recently been investing $43 billion per year, and a large wave of patent expirations is looming in the near future. The industry also lacks advanced IT infrastructure for simulations and experiments; “chemical information processing” is one such area. The speaker's work involves the study of quantitative structure-activity relationships (QSAR), which uses various machine learning and virtual screening approaches. Evolving new feasible structures requires some sampling, and the speaker uses parallel sphere exclusion for oversampling of molecules.
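To make the sampling idea concrete, here is a minimal sketch of plain (sequential) sphere exclusion, not the parallel variant mentioned in the talk, whose details were not given. It assumes molecules are represented as sets of integer feature IDs and compared by Tanimoto distance; the function names, the toy fingerprints and the radius value are all hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of feature IDs."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def sphere_exclusion(fingerprints, radius):
    """Return indices of picked molecules.

    A molecule is picked only if it lies at least `radius` away from
    every molecule picked so far, i.e. outside all exclusion spheres.
    """
    selected = []
    for idx, fp in enumerate(fingerprints):
        if all(tanimoto_distance(fp, fingerprints[s]) >= radius
               for s in selected):
            selected.append(idx)
    return selected


# Toy data (hypothetical): two pairs of near-duplicates.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 9, 10}]
picked = sphere_exclusion(toy_fps, radius=0.6)
print(picked)
```

With this toy data one representative is kept from each near-duplicate pair. A larger radius gives a sparser, more diverse sample; a smaller radius keeps more molecules.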

Synthetic feasibility assessment:
What: It is an a priori estimate of the difficulty of making one small organic molecule or a set of them. The estimate can come from either a knowledge-based approach or a computation-based approach, but each has its own share of problems.
A knowledge-based system might not be feasible because:
1. It cannot be annotated.
2. The knowledge base might be incomplete.
3. Everything in chemistry and reactions has exceptions.
So the obvious choice is to use a computational model for prediction, but computational models have problems of their own:
1. They are hard to build.
2. They are sometimes not feasible.

To overcome this, a new knowledge-based approach is used. It is based on the theory that the existing body of molecules implicitly reflects the difficulty of making those molecules. The model extrapolates to molecular substructures: it decomposes a molecule into substructures and looks up these small subproblems in the database. The database stores the information as concentric fingerprints because they are easy to canonicalize. Concentric fingerprints are computed on the basis of the radius of the various substructures.
A key that is missing from the database indicates that the corresponding molecule or substructure has never been made, and a miss at a small radius is more significant than a miss at a large radius.
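The lookup described above can be illustrated with a small sketch. This is not the speaker's implementation: it assumes a toy molecule format (a list of element symbols plus a list of bonded atom-index pairs), ignores hydrogens and bond orders, and builds atom-centred keys by repeatedly combining each atom's label with the sorted labels of its neighbours. All function names are hypothetical.

```python
from collections import defaultdict


def environment_keys(atoms, bonds, max_radius=2):
    """Return {radius: set of canonical keys}, one key per atom environment."""
    neighbours = defaultdict(list)
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    labels = list(atoms)              # radius 0: just the element symbol
    keys = {0: set(labels)}
    for radius in range(1, max_radius + 1):
        # Sorting the neighbour labels makes the key independent of atom order.
        labels = [
            (labels[i], tuple(sorted(labels[j] for j in neighbours[i])))
            for i in range(len(atoms))
        ]
        keys[radius] = set(labels)
    return keys


def build_database(molecules, max_radius=2):
    """Collect the keys of all already-made molecules, grouped by radius."""
    db = defaultdict(set)
    for atoms, bonds in molecules:
        for radius, ks in environment_keys(atoms, bonds, max_radius).items():
            db[radius] |= ks
    return db


def missing_by_radius(molecule, db, max_radius=2):
    """Count the query's keys that have no precedent, per radius."""
    atoms, bonds = molecule
    keys = environment_keys(atoms, bonds, max_radius)
    return {r: len(ks - db.get(r, set())) for r, ks in keys.items()}


# Toy "known" molecules (heavy atoms only): ethanol C-C-O, dimethyl ether C-O-C.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])
db = build_database([ethanol, dimethyl_ether])

# Query with an O-O bond that no known molecule contains.
query = (["C", "O", "O"], [(0, 1), (1, 2)])
misses = missing_by_radius(query, db)
print(misses)
```

Here the query has no misses at radius 0 (both elements are known) but does have misses at radius 1, which flags the O-O environment as unprecedented. A scoring scheme could weight small-radius misses more heavily than large-radius ones, in line with the note above; the weights themselves would be a further assumption.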
Summary:
The body of already-made molecules inherently embodies all concepts of difficulty, cost, toxicity and other attributes.
For the difficult parts of a molecule, the approach provides examples of precedent molecules.

Future work:
Personalized medicine on the basis of gene analysis.
The biggest challenge is evaluating side effects and defining rules for them, since they increase the risk component.
How work in gene informatics can be used in chemical informatics.
