Chemoinformatics Tools: Publications and Patents (2016)
38. Spirochromone-chalcone conjugates as antitubercular agents: synthesis, bio-evaluation and molecular modeling studies
M Muthukrishnan, Mohammad Mujahid, Perumal Yogeeswari, Sriram Dharmarajan, Murali Basavanag, Erik Díaz-Cervantes, Luis Bahena, Juvencio Robles, Rajesh G. Gonnade, Karthikeyan M and Renu Vyas. RSC Adv., 2015. DOI: 10.1039/C5RA21737G. Received 18 Oct 2015, accepted 30 Nov 2015.
A new series of spirochromone-annulated chalcone conjugates was synthesized and evaluated for antitubercular activity against the Mycobacterium tuberculosis H37Rv strain. These compounds were subjected to molecular modeling studies using docking and chemoinformatics-based approaches. Docking simulations against a range of known receptors for chalcone-derived compounds revealed MTB phosphotyrosine phosphatase B (MtbPtpB) as the most probable target on the basis of high binding affinity scores. Five compounds exhibited significant inhibition, with minimum inhibitory concentration (MIC) values ranging from 3.13 to 12.5 µg/mL. Further analysis of the library of synthesized compounds with known and in-house-developed chemoinformatics tools unequivocally established their potential as antitubercular compounds. QSAR modeling revealed a quantitative relationship between the biological activities and the frontier molecular orbital energies of the synthesized compounds. The predictive model can be further employed for virtual screening of new compounds in this series.
37. Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery. Combinatorial Chemistry & High Throughput Screening 18(6): 528–543 (2015). Muthukumarasamy Karthikeyan and Renu Vyas. DOI: 10.2174/1386207318666150703111911
ABSTRACT: Advances in chemoinformatics research, together with the availability of high-performance computing platforms, have made it easier to handle large-scale, multi-dimensional scientific data for high-throughput drug discovery. In this study, we explored publicly available molecular databases for virtual screening with the help of integrated, open-source-based, in-house molecular informatics tools. The virtual screening literature of the past decade was extensively investigated and analyzed to reveal interesting patterns in the drug, target, scaffold and disease space. The review also focuses on integrated chemoinformatics tools capable of harvesting chemical data from textual literature and transforming them into truly computable chemical structures; identifying unique fragments and scaffolds from a class of compounds; automatically generating focused virtual libraries; computing molecular descriptors for structure-activity relationship studies; applying conventional lead-discovery filters along with the exhaustive, in-house-developed PTC (Pharmacophore, Toxicophore and Chemophore) filters; and using machine learning tools to design potential disease-specific inhibitors. A case study on kinase inhibitors is provided as an example.
36. ChemScreener: A Distributed Computing Tool for Scaffold based Virtual Screening. Combinatorial Chemistry & High Throughput Screening 18(6): 544–561 (2015). Muthukumarasamy Karthikeyan, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703112242
ABSTRACT: In this work we present ChemScreener, a Java-based application that combines virtual library generation with virtual screening in a platform-independent distributed computing environment. ChemScreener comprises a scaffold identifier, a distinct scaffold extractor, an interactive virtual library generator and a virtual screening module for subsequently selecting putative bioactive molecules. The virtual libraries are annotated with chemophore-, pharmacophore- and toxicophore-based information for compound prioritization. The selected hits can then be further processed using QSAR, docking and other in-silico approaches, all of which can be interfaced within the ChemScreener framework. As a sample application, scaffold selectivity, diversity, connectivity and promiscuity toward six important therapeutic classes were studied. To illustrate the computational power of the application, 55 scaffolds extracted from 161 anti-psychotic compounds were enumerated to produce a virtual library of 118 million compounds (17 GB), which was annotated with chemophore-, pharmacophore- and toxicophore-based features in a single step, a task that would be non-trivial for many standard software tools on libraries of this size.
35. Prediction of Bioactive Compounds Using Computed NMR Chemical Shifts. Combinatorial Chemistry & High Throughput Screening 18(6): 562–576 (2015). Muthukumarasamy Karthikeyan, Pattuparambil Ramanpillai Rajamohanan and Renu Vyas. DOI: 10.2174/1386207318666150703113312
ABSTRACT: NMR chemical shifts are an important diagnostic parameter for structure elucidation, as they capture rich information about the conformational, electronic and stereochemical arrangement of the functional groups in a molecule, which is responsible for its activity toward any biological target. The present work discusses the importance of computing NMR chemical shifts from molecular structures. NMR chemical shift data (experimental or computed) were used to generate binary fingerprints that map molecular fragments (as descriptors) and correlate them with bioactivity classes. For this study, binary fingerprints derived from chemical shift data were computed for 4800 bioactive molecules spanning 149 classes. The sensitivity and selectivity of the fingerprints in discriminating molecules belonging to different therapeutic categories were assessed using a LibSVM-based classifier. Accuracies of 82% for proton and 94% for carbon NMR fingerprints were obtained for anti-psoriatic and anti-psychotic molecules, demonstrating the effectiveness of this approach for virtual screening.
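The abstract does not specify how chemical shifts are turned into bits. A minimal sketch, assuming fixed-width binning over a ppm window (the 0 to 220 ppm range and 10 ppm bin width are illustrative choices for 13C shifts, not the authors' parameters, and the shift lists are hypothetical), sets one bit per occupied bin and compares two fingerprints with the Tanimoto coefficient:

```python
def shift_fingerprint(shifts, lo=0.0, hi=220.0, width=10.0):
    """Binary fingerprint: bit i is 1 if any shift falls in bin i.

    Fixed-width binning is an illustrative assumption, not the published scheme.
    """
    n_bins = int((hi - lo) / width)
    bits = [0] * n_bins
    for s in shifts:
        if lo <= s < hi:  # ignore shifts outside the assumed window
            bits[int((s - lo) // width)] = 1
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient between two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical 13C shift lists (ppm), for illustration only.
mol_a = [14.1, 22.7, 31.9, 128.5, 137.9]
mol_b = [14.0, 23.1, 128.9, 141.2]

fp_a = shift_fingerprint(mol_a)
fp_b = shift_fingerprint(mol_b)
print(sum(fp_a), sum(fp_b), round(tanimoto(fp_a, fp_b), 3))
```

A classifier such as the LibSVM-based one described above would then be trained on bit vectors of this kind; narrower bins raise resolution but make the fingerprints sparser.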
34. Protein Ligand Complex Guided Approach for Virtual Screening. Combinatorial Chemistry & High Throughput Screening 18(6): 577–590 (2015). Muthukumarasamy Karthikeyan, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703112620
ABSTRACT: Target-ligand association data are a rich source of information that has not been sufficiently exploited for drug design through virtual screening. A Java-based open-source toolkit for Protein Ligand Network Extraction (J-ProLiNE), focused on protein-ligand complex analysis and integrating several features in a distributed computing network, has been developed. The sequence alignment and similarity search components have been automated to yield local and global alignment scores along with similarity and distance scores. A total of 10,000 proteins with co-crystallized ligands from the PDB and MOAD databases were extracted and analyzed to reveal relationships between targets, ligands and scaffolds. Through this analysis, we generated a protein-ligand network to identify promiscuous and selective scaffolds for multiple classes of protein targets. Using J-ProLiNE, we created a 507 × 507 matrix of protein targets and native ligands belonging to six enzyme classes and analyzed the results to elucidate protein-protein, protein-ligand and ligand-ligand interactions. In another application of J-ProLiNE, we processed kinase-related information stored in US patents to construct disease-gene-ligand-scaffold networks. It is hoped that the studies presented here will enable target-ligand knowledge-based virtual screening for inhibitor design.
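The abstract does not state which alignment algorithm or scoring scheme J-ProLiNE uses. As a generic illustration only, a global alignment score can be computed with the Needleman-Wunsch dynamic programme; the +1/-1/-2 match/mismatch/gap values below are assumptions, and protein work would normally use a substitution matrix such as BLOSUM62 instead:

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not J-ProLiNE settings.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


print(global_alignment_score("HEAGAWGHEE", "HEAGAWGHEE"))
print(global_alignment_score("ACG", "AG"))
```

A local (Smith-Waterman) score differs mainly in clamping every cell at zero and taking the maximum over the whole table rather than the bottom-right cell.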
33. MegaMiner: A Tool for Lead Identification Through Text Mining Using Chemoinformatics Tools and Cloud Computing Environment. Combinatorial Chemistry & High Throughput Screening 18(6): 591–603 (2015). Muthukumarasamy Karthikeyan, Yogesh Pandit, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703113525
ABSTRACT: Virtual screening is an indispensable tool for coping with the massive amount of data generated by high-throughput omics technologies. To enhance the automation of the virtual screening process, a robust portal termed MegaMiner has been built on a cloud computing platform; the user submits a text query and directly accesses the proposed lead molecules along with their drug-like, lead-like and docking scores. Textual representation of chemical structure data is fraught with ambiguity in the absence of a global identifier. We used a combination of statistical models, a chemical dictionary and regular expressions to build a disease-specific dictionary. To demonstrate the effectiveness of this approach, a case study on malaria was carried out. MegaMiner gave superior results compared with other text-mining search engines, as established by F-score analysis. A single query term, 'malaria', in the portlet led to retrieval of related PubMed records, protein classes, drug classes and 8000 scaffolds, which were internally processed and filtered to suggest new molecules as potential antimalarials. The results were validated by docking the virtual molecules into relevant protein targets. It is hoped that MegaMiner will serve as an indispensable tool not only for identifying hidden relationships between various biological and chemical entities but also for building better corpora and ontologies.
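The F-score used to compare search engines combines precision and recall. A minimal sketch of the balanced F1 form (the abstract does not say which weighting was used, and the counts below are hypothetical, not results from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts of correctly and incorrectly retrieved entity mentions.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, a tool cannot score well by maximising recall alone (returning everything) or precision alone (returning very little).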
32. Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening. Combinatorial Chemistry & High Throughput Screening 18(6): 604–619 (2015). Muthukumarasamy Karthikeyan, Deepak Pandit, Arvind Bhavasar and Renu Vyas. DOI: 10.2174/1386207318666150703113656
ABSTRACT: The power of cloud and distributed computing has been harnessed to handle the vast and heterogeneous data that must be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction quickly and efficiently. It also provides several bioinformatics functions, including sequence alignment, active site pose prediction and protein-ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem-solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users through container-based virtualization (OpenVZ).
31. Editorial (Thematic Issue: Role of Data and Methods in Chemoinformatics for Virtual Screening). Combinatorial Chemistry & High Throughput Screening 18(7): 622–623 (2015). Muthukumarasamy Karthikeyan and Renu Vyas. DOI: 10.2174/138620731807150903101821
30. Pharmacophore and Docking Based Virtual Screening of Validated Mycobacterium tuberculosis Targets. Combinatorial Chemistry & High Throughput Screening 18(7): 624–637 (2015). Renu Vyas, Muthukumarasamy Karthikeyan, Ganesh Nainaru and Murugan Muthukrishnan. DOI: 10.2174/1386207318666150703112759
ABSTRACT: Target-based virtual screening has recently surpassed ligand-based virtual screening methods, mainly because it provides more clues about intermolecular interactions and also takes receptor flexibility into account. The present methodology describes a computational strategy for predicting Mycobacterium tuberculosis (M. tuberculosis) binders for five well-studied targets that represent the M. tuberculosis proteome and encompass most of the known mechanisms of action. The diversity of the targets was confirmed by active site analysis and structural studies. The approach employed pharmacophore searching, docking and clustering techniques in tandem and was validated by enrichment studies using the available Schrödinger data set of 1000 decoys. The application of this methodology was demonstrated by predicting potential molecular targets for fifty newly synthesized compounds. Cross-docking studies on the targets were carried out with 4512 known inhibitors on a high-performance computing platform to reveal underlying affinity and promiscuity patterns. The optimum binding energy range for all targets, as determined by high-throughput docking, was found to be -3 to -13 kcal/mol.
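Enrichment studies against a decoy set are commonly summarised by an enrichment factor (EF): how much more often actives appear near the top of the ranked list than random selection would give. The abstract does not state which enrichment metric was used, so the following is only a generic sketch of EF at a chosen fraction of the list, on a hypothetical ranking:

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor at `fraction` of a list sorted best-score-first.

    EF = (actives in top slice / size of top slice) / (all actives / list size)
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("the ranked list contains no actives")
    n_top = max(1, round(fraction * n_total))  # size of the top slice
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (n_actives / n_total)


# Hypothetical ranking of 1000 molecules (10 actives, 990 decoys):
# 5 actives land in the top 20 positions, 5 more just below them.
ranking = [True] * 5 + [False] * 15 + [True] * 5 + [False] * 975
print(round(enrichment_factor(ranking, 0.02), 2))
```

An EF of 1 means no better than random; the maximum possible value at a given fraction is bounded by the ratio of slice size to the number of actives.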
29. Role of Chemical Reactivity and Transition State Modeling for Virtual Screening. Combinatorial Chemistry & High Throughput Screening 18(7): 638–657 (2015). Muthukumarasamy Karthikeyan, Renu Vyas, Sanjeev S. Tambe, Deepthi Radhamohan and Bhaskar D Kulkarni.
ABSTRACT: Every drug discovery research program involves the synthesis of novel potential drug molecules using atom-efficient, economical and environmentally friendly synthetic strategies. The current work focuses on the role of reactivity-based fingerprints of compounds as filters for virtual screening, using a tool called ChemScore. A reactant-like score (RLS) and a product-like score (PLS) can be predicted for a given compound using binary fingerprints derived from numerous known organic reactions, which capture molecule-molecule interactions in the form of addition, substitution, rearrangement, elimination and isomerization reactions. The reaction fingerprints were applied to large biological and chemical databases, namely ChEMBL, KEGG, HMDB, DSSTox and DrugBank. A large network of 1113 synthetic reactions was constructed to visualize and ascertain the reactant-product mappings in chemical reaction space. Cumulative reaction fingerprints were computed for 4000 molecules belonging to 29 therapeutic classes and were found capable of discriminating between cognition-disorder-related and anti-allergy compounds with a reasonable accuracy of 75% and an AUC of 0.8. Transition-state-based fingerprints were also developed in this study and used effectively for virtual screening of drug-related databases. The methodology presented here provides an efficient handle for rapidly scoring molecular libraries for virtual screening.
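The reported AUC is the area under the ROC curve. One standard way to compute it, shown here as a generic sketch on hypothetical scores (not the ChemScore data), is the rank-based Mann-Whitney formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores (O(P*N))."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores; label 1 = class of interest.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))
```

The quadratic pairwise loop keeps the definition visible; for large libraries a sort-based rank computation gives the same value more efficiently.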
28. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules. Combinatorial Chemistry & High Throughput Screening 18(7): 658–672 (2015). Renu Vyas, Sanket Bapat, Esha Jain, Sanjeev S. Tambe, Muthukumarasamy Karthikeyan and Bhaskar D Kulkarni. DOI: 10.2174/1386207318666150703112447
ABSTRACT: Ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods to diversity-oriented virtual screening of biological targets/drug classes is presented here. A number of classification models were built for virtual screening using three types of input, namely structure-based descriptors, molecular fingerprints and therapeutic category. The activity and affinity descriptors of a set of inhibitors of four target classes (DHFR, COX, LOX and NMDA) were used to train six classifiers, viz. Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these, the ANN was the best classifier, with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophores, toxicophores and chemophores (PTC) were used to build ANN models for each dataset. A good accuracy of 87.27% was obtained for the COX-LOX inhibitors using 296 chemophoric binary fingerprints, compared with pharmacophoric (67.82%) and toxicophoric (70.64%) fingerprints. The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, the selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category captured drug-class-specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.
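The k-NN classifier mentioned above assigns a molecule the majority class of its most similar training molecules. A minimal sketch, assuming fingerprints are stored as sets of on-bit indices and similarity is measured with the Tanimoto coefficient (the abstract does not give the distance measure or k, and the bit sets below are hypothetical, not real PTC fingerprints):

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, training, k=3):
    """Majority vote among the k training fingerprints most similar to `query`.

    `training` is a list of (bit_set, label) pairs.
    """
    neighbours = sorted(training, key=lambda item: tanimoto(query, item[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical on-bit sets labelled with two of the drug classes above.
training = [
    ({1, 4, 7, 9}, "anti-convulsant"),
    ({1, 4, 7, 12}, "anti-convulsant"),
    ({2, 5, 8, 11}, "anti-diabetic"),
    ({2, 5, 8, 13}, "anti-diabetic"),
    ({2, 5, 9, 11}, "anti-diabetic"),
]
print(knn_predict({1, 4, 7, 11}, training, k=3))
```

Storing only the on-bits keeps sparse binary fingerprints compact, and set intersection and union map directly onto the Tanimoto definition.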
27. Chemoinformatics Approach for Building Molecular Networks from Marine Organisms. Combinatorial Chemistry & High Throughput Screening 18(7): 673–684 (2015). Muthukumarasamy Karthikeyan, Deepika Nimje, Rakhi Pahujani, Kushal Tyagi, Sanket Bapat, Renu Vyas and Krishna Pillai Padmakumar. DOI: 10.2174/1386207318666150703112950
ABSTRACT: Natural products obtained from marine sources are considered a rich and diverse source of potential drugs. In the present work, we demonstrate the use of a chemoinformatics approach for designing new molecules inspired by molecules from marine organisms. Accordingly, we integrated information from two major scientific domains, chemoinformatics and biodiversity informatics, to develop an interactive marine database named MIMMO (Medicinally Important Molecules from Marine Organisms). The database can be queried for species, molecules, scaffolds, drugs, diseases and the associated cumulative biological activity spectrum, along with links to literature resources. Molecular informatics analysis of the molecules obtained from MIMMO was performed to study their chemical space, and the distinct skeletal features of biologically active compounds isolated from marine species were identified. Scaffold, molecule and species networks were created to identify scaffolds common to marine sources and drug space. An analysis of the entire molecular dataset revealed a unique list of around 2000 molecules, from which the ten most frequently occurring distinct scaffolds were obtained.
Patent: PCT/IB2014/062585 (31.12.2014), G06F 19/00. Council of Scientific & Industrial Research; Karthikeyan, Muthukumarasamy.
The invention discloses a method to generate and analyze NMR chemical-shift-based binary fingerprints for virtual high-throughput screening in drug discovery. Further, the invention provides a method to analyze these fingerprints, which can encode several properties of a molecule besides its basic framework or scaffold, in order to determine the molecule's propensity toward a particular bioactivity class.
Textbook: Practical Chemoinformatics, Springer, 2014 (ISBN: 978-81-322-1779-4). http://www.springer.com/chemistry/book/978-81-322-1779-4
23] Ch 1. Open-Source Tools, Techniques, and Data in Chemoinformatics. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 1-92. DOI 10.1007/978-81-322-1780-0_1
Chemicals are everywhere; they are essentially composed of atoms and bonds, and they support life and provide comfort. The numerous combinations of these entities give rise to the complexity and diversity of the universe. Chemistry is the subject that analyzes and tries to explain this complexity at the atomic level. Advances in this subject led to more data generation and an information explosion. Over time, the observations were recorded in chemical documents, including journals, patents and research reports. The vast chemical literature, covering more than two centuries, demands the extensive use of information technology to manage it. Today, chemoinformatics tools and methods have grown powerful enough to handle this huge resource of chemical information and discover unexplored knowledge in it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of this domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety and environmental benefit. In this chapter, we begin with a brief definition and the role of open-source tools in chemoinformatics and extend the discussion to the basic computer knowledge required to understand this specialized and interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to performing any chemoinformatics task. Practical step-by-step guidance on the use of open-source, free, academic and commercial structure representation tools is also provided. For a better understanding, the reader is strongly encouraged to attempt the practice tutorials, Do-it-yourself exercises and questions given in each chapter. This chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists and others understand the subject in a practical way through relevant, easy-to-understand examples, and to encourage readers to proceed to the advanced topics in the subsequent chapters.
22] Ch 2. Chemoinformatics Approach for the Design and Screening of Focused Virtual Libraries. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 93-131. DOI 10.1007/978-81-322-1780-0_2
It is challenging to handle a large volume of molecular data without appropriate tools. Here, we describe the need for, and approaches to, the development of focused virtual libraries to design efficient molecules and optimize them for lead generation. Experimental chemists and biologists are more interested in the properties of chemicals and their beneficial and adverse effects on biological systems than in their structures alone. In this chapter, the focus is on relating newly designed chemical structures to their predicted activity, properties or toxicity. Property prediction tools save time, money and the lives of experimental animals. They help in making informed decisions, especially in cases involving pharmacodynamic studies of drug molecules in humans, where ethical and safety concerns are inevitable. Property prediction is an important component of virtual screening, which is at the heart of drug design and is the step where chemoinformatics plays a major role. Other fields where principles based on structure–activity relationships hold good for virtual screening are agrochemicals and environmental science, specifically the prediction of the toxicity and biodegradability of pollutant molecules. In this chapter, we show how to design software tools that generate focused virtual libraries from a given set of molecules with common features, fragments or bioactivity spectra.
21] Ch 3. Machine Learning Methods in Chemoinformatics for Drug Discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 133-194. DOI 10.1007/978-81-322-1780-0_3
It is well known that the structure of a molecule is responsible for its biological activity or physicochemical properties. Here, we describe the role of machine learning (ML) and statistical methods in building reliable, predictive models in chemoinformatics. ML methods are broadly divided into clustering, classification and regression techniques. Among the statistical/mathematical techniques that form part of the ML toolkit, artificial neural networks, hidden Markov models, support vector machines, decision tree learning, random forests, naive Bayes and belief networks are best suited for drug discovery and play an important role in the lead identification and lead optimization steps. This chapter provides stepwise procedures for building ML-based classification and regression models using state-of-the-art open-source and proprietary tools. A few case studies using benchmark data sets demonstrate the efficacy of ML-based classification for drug design.
20] Ch 4. Docking and Pharmacophore Modeling for Virtual Screening. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 195-269. DOI 10.1007/978-81-322-1780-0_4
Protein and ligand molecules appear and behave differently as separate entities, but what happens when they come together and interact with each other is one of the most interesting questions in modern molecular biology and molecular recognition. This interaction can be explained with the concept of docking, which can be described simply as the study of how one molecule binds to another to form a stable entity. The two binding molecules can be either a protein and a ligand or two proteins. Irrespective of which two molecules are interacting, a docking process invariably includes two steps: conformational search through various algorithms, and scoring or ranking. Although extensive research has been carried out in this field, it remains a topic of current interest, as there is scope for improvement in rationalizing binding interactions with biological function using docking programs. This chapter focuses on how to set up and perform docking runs using freeware and commercial software. Most of the known docking protocols, such as induced-fit docking, protein–protein docking and pharmacophore-based docking, are discussed. The use of pharmacophore queries as filters in virtual screening is also demonstrated with suitable examples.
19] Ch 5. Active Site Directed Pose Prediction Programs for Efficient Filtering of Molecules. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 271-316. DOI 10.1007/978-81-322-1780-0_5
It is well known that the three-dimensional structure of a protein is a prerequisite in structure-based drug discovery. Proteins are usually crystallized along with substrates (small molecules), and the binding site is used for further computational study and virtual screening. Homology modelling helps when a protein structure lacks co-crystallized ligands and knowledge of the binding site is required, or when sequences that have not yet been crystallized need some structural understanding to be correlated with biological function. Homology modelling and active site prediction steps are discussed in detail using standard state-of-the-art software. Knowing the exact sites on a protein structure where other molecules can bind and interact is of paramount importance for any drug design effort. Having learnt the basic elements of docking, in this chapter we probe further into binding sites and the specific properties that enable them to bind ligands. Active site features such as topology, shape, volume and amino acid composition all contribute to a site's preference for binding a particular ligand molecule. Deducing this knowledge is the crux of efficient active site-based screening of molecules. Active site information also helps in building a receptor-based pharmacophore query, which can be applied as a constraint while screening molecular libraries. The later section therefore highlights some efforts toward active site-based virtual screening of molecules using an internally developed program that computes phi–psi-based fingerprints of proteins and binary fingerprints of ligands as a pre-filtering step for docking.
18] Ch 6. Representation, Fingerprinting and Modeling of Chemical Reactions. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 317-374. DOI 10.1007/978-81-322-1780-0_6
Designing a better molecule is just one aspect of computational research; getting it synthesized for biological evaluation is the most significant component of a drug discovery program. A molecule can be formed by a number of synthetic routes, and manually keeping track of all the available options for forming a product under various reaction conditions is a herculean task. Chemoinformatics comes to the rescue by providing computational tools for reaction modelling, albeit fewer than are available for structure–property prediction. Current computational tools help in modelling various aspects of a given organic reaction: synthetic feasibility, synthesis planning, transition state prediction, kinetic and thermodynamic parameters, and mechanistic features. Several kinds of methods, including empirical, semiempirical, quantum mechanical, quantum chemical and machine learning methods, have been developed to model a reaction. The computational approaches are based on the concepts of rational synthesis planning, retrosynthetic approaches and logic in organic synthesis. In this chapter, we begin with reaction representation in computers, reaction databases and free and commercial reaction prediction programs, followed by reaction searching methods based on ontologies and reaction fingerprints. The commonly employed quantum mechanics (QM) and quantum chemistry (QC)-based methods for intrinsic reaction coordinate (IRC) and transition state (TS) determination using the B3LYP/6-31G* scheme are described using simple name reactions. Most computational reaction prediction programs, such as CHAOS/CAOS, are based on identifying the strategic bonds that are likely to be cleaved or formed during a given chemical transformation. Accordingly, an algorithm has been developed to identify more than 300 types of unique bonds occurring in chemical reactions.
The effect of implicit hydrogens on chemical reactivity modelling is discussed in the context of the bioactivity spectrum for structure–activity relationship studies. Other parameters affecting reactivity, such as solvent polarity and thermodynamics, are also briefly highlighted for frequently used name reactions, hazardous high-energy reactions and industrially important reactions involving bulk chemicals.
17] Ch 7. Predictive Methods for Organic Spectral Data Simulation. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 375-414. DOI 10.1007/978-81-322-1780-0_7
New chemical entities (NCEs) with potential bioactivity are synthesized, isolated and thoroughly characterized for structure elucidation and purity before being subjected to further research. Spectroscopy is one of the most powerful means of deducing the correct structure and configuration of a compound or a fragment. In organic synthesis, compounds are usually characterized by spectral techniques such as ultraviolet–visible (UV–Vis), nuclear magnetic resonance (NMR), infrared (IR), mass spectrometry (MS) and X-ray methods. NMR and MS methods are employed in fragment-based drug discovery approaches to identify compounds from a high-throughput screen or a proteomics experiment. However, complex spectral data cannot be interpreted manually and require sophisticated computational tools for characterization. These tools aid in spectral analysis, peak assignment, intensity prediction, etc., and thereby annotate the compound with the appropriate functional groups and fragments. The prediction algorithms are based on principles of quantum chemistry, machine learning or simple database/pattern matching. Some quantum chemistry methods are accurate but require more computational time; machine learning methods such as neural networks are faster but require more experimental data to improve their predictive capability. There is therefore a trade-off between speed and accuracy, and users must decide according to their own priorities. A number of spectrum prediction tools, commercial as well as open source, are discussed in this chapter, accompanied by detailed tutorials on the use of some of them. Many online servers and spectral databases are available today for managing the data, and a brief introduction to them is also provided. We also describe in-house-developed carbon and proton NMR chemical-shift-based binary fingerprints and their use in virtual screening.
16] Ch 8. Chemical Text Mining for Lead Discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 415-449. DOI 10.1007/978-81-322-1780-0_8
With the growth of the Internet, the information disseminated and available in public resources has expanded enormously. New tools are needed to navigate each document automatically, word by word, to extract useful patterns, concepts, and knowledge, or to discover something not explicitly mentioned in a document and derive useful conclusions. Computational linguists and scientists have recently devised several text-mining tools and techniques for processing natural-language text and converting its information content into facts and data for interpretation, analysis, and prediction. Text mining combines data mining, information retrieval, natural language processing (NLP), and machine learning (ML) methods. It provides researchers with metadata to ascertain meaningful associations between terms prevalent in their respective domains. It thus aids in finding meaning, context, and semantics, identifying hidden concepts and trends, and discovering hitherto unknown relationships and correlations in heaps of largely fragmented, unstructured, and scattered information in the public domain. In this chapter, we introduce the general concept of text mining, followed by its features and tools, especially those for handling the biomedical and chemical literature for drug/lead discovery, such as the more than 22.9 million abstracts in PubMed. The emphasis is on building and using simple text-mining tools in a practical way by harnessing open-source and commercially available tools, and on understanding the overall strategic challenges in this field. We have developed MegaMiner, an open-source-based tool for mining literature of chemical significance that can be used effectively to solve chemoinformatics problems related to lead discovery. MegaMiner can directly predict lead molecules for a target disease of interest from a text-based query submitted on a distributed computing platform.
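MegaMiner's internals are not described in this abstract. As a minimal illustration of how even simple text mining can surface term associations, the Python sketch below counts how often chemical names and disease names from small hand-made dictionaries co-occur within the same abstract. Dictionary lookup on lowercase single-word tokens is an assumption standing in for real chemical named-entity recognition; the function name is hypothetical and the example sentences are invented.

```python
"""Minimal dictionary-based co-occurrence miner (illustrative only).

Assumption: chemical and disease terms are single lowercase words; real
systems use named-entity recognition and synonym lists instead.
"""
import re
from collections import Counter
from itertools import product
from typing import Iterable, Set


def cooccurrences(abstracts: Iterable[str], chemicals: Set[str], diseases: Set[str]) -> Counter:
    """Count (chemical, disease) pairs that appear together in one abstract."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
        found_chem = sorted(chemicals & tokens)
        found_dis = sorted(diseases & tokens)
        counts.update(product(found_chem, found_dis))  # each pair counted once per abstract
    return counts


abstracts = [
    "Isoniazid remains a first-line drug for tuberculosis.",
    "Chalcone derivatives were tested against tuberculosis and malaria.",
    "Chalcone analogues showed activity in a malaria model.",
]
pairs = cooccurrences(abstracts, {"isoniazid", "chalcone"}, {"tuberculosis", "malaria"})
for (chem, dis), n in pairs.most_common():
    print(chem, dis, n)
```

Ranking pairs by raw count is the crudest association measure; normalized scores such as pointwise mutual information are a common refinement.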
15] Ch 9. Integration of Automated Workflow in Chemoinformatics for Drug Discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics, © Springer India 2014, pp. 451-499. DOI: 10.1007/978-81-322-1780-0_9
The ever-increasing volume of data and limited execution time call for automated computational workflow systems, and several tools are emerging to support this activity. Automated workflow systems use scripting to define repetitive tasks that are applied to new data to generate the desired output. They let researchers focus on what a particular virtual experiment will achieve rather than on how the process is executed. The theme of this chapter is identifying repetitive tasks that can be automated, so that workflows can streamline a series of computational tasks efficiently. A brief introduction to workflows and their components is followed by in-depth tutorials on today's state-of-the-art workflow-based applications for chemoinformatics in drug discovery research. J-ProLINE, an in-house-developed stand-alone chemo-bioinformatics workflow application for building protein–ligand networks, is also presented.
14] Ch 10. Cloud Computing Infrastructure Development for Chemoinformatics. M Karthikeyan and Renu Vyas. Practical Chemoinformatics, © Springer India 2014, pp. 501-528. DOI: 10.1007/978-81-322-1780-0_10
Chemical research is progressing exponentially, fuelling the need to integrate data and applications and to develop workflows. Proper execution of workflows by multiple teams working on collaborative projects requires robust portals powered by cloud computing infrastructure. A cloud computing portal offers users customizable configuration on a secure, unified, and integrated platform with extensive computational power. The sheer magnitude and diversity of chemical data call for customized system-based solutions that use available mass storage, CPUs, GPUs, and hybrid processors. Existing applications can be ported to a common portal that provides a single framework, deployable on a high-performance distributed computing platform, for automated programmatic access to workflows. A portal enables efficient scanning, searching, and annotation of data for users, and resource monitoring for the enterprise. Portals also provide additional features such as security, scalability, quality, data consistency, and error checking. Portal development has a bright future, as portals can perform large-scale quantum chemical studies of molecules and serve as decision support tools for mining functional relationships in chemical biology. In this chapter, we first focus on the essentials of portal development, with stepwise tutorials using relevant examples. Mobile computing has transformed the information technology landscape in recent times; consequently, a section is devoted to Android, the open-source mobile operating system, and a few chemoinformatics apps are also discussed.
13] WO/2013/030850 - Chemical Structure Recognition Tool. 07.03.2013. G06F 19/00. PCT/IN2012/000567. Council of Scientific & Industrial Research; Karthikeyan, Muthukumarasamy
A method of extracting, and then reusing or remodeling, chemical data from a handwritten or digital input image without manual input, using the Chemical Structure Recognition Tool (CSRT), is disclosed herein. The method comprises loading the input image; converting it into a grayscale image (i.e., stretching the loaded input image); converting the grayscale image into a binary image (binarisation); smoothing to reduce noise within the binary image; recognizing circle bonds to identify the presence of a circle inside a ring; predicting OCR regions to find zones containing text; thinning the image to identify specific shapes within the binary image; edge detection to detect image contrast; detecting double and triple bonds; and obtaining the output files.
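The patent abstract lists the processing steps but not how each one is implemented. The Python sketch below illustrates only the first few steps (grayscale conversion, contrast stretching, binarisation, and noise removal) on a tiny in-memory image, using common textbook choices that are assumptions rather than the CSRT implementation: ITU-R BT.601 luminance weights, linear stretching to 0-255, a fixed threshold of 128, and deletion of isolated foreground pixels as the smoothing step. All function names are hypothetical.

```python
"""Sketch of CSRT-style image preprocessing on a tiny RGB image (illustrative).

Assumed choices, not the patented implementation: BT.601 luminance weights,
linear contrast stretching to 0-255, a fixed threshold of 128, and removal
of isolated foreground pixels as the noise-smoothing step.
"""
from typing import List, Tuple

RGB = Tuple[int, int, int]
Grid = List[List[float]]


def to_grayscale(img: List[List[RGB]]) -> Grid:
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def stretch(gray: Grid) -> Grid:
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[0.0 for _ in row] for row in gray]
    return [[(v - lo) * 255.0 / (hi - lo) for v in row] for row in gray]


def binarise(gray: Grid, threshold: float = 128.0) -> List[List[int]]:
    """Dark pixels (ink) become 1, light background becomes 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def remove_isolated(bw: List[List[int]]) -> List[List[int]]:
    """Clear foreground pixels that have no foreground pixel among their 8 neighbours."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(h):
        for x in range(w):
            if bw[y][x]:
                neighbours = sum(
                    bw[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                ) - 1  # subtract the pixel itself
                if neighbours == 0:
                    out[y][x] = 0
    return out


def preprocess(img: List[List[RGB]]) -> List[List[int]]:
    return remove_isolated(binarise(stretch(to_grayscale(img))))


# A 5x5 white image with a one-pixel-wide horizontal "bond" and one noise speck.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE] * 5 for _ in range(5)]
image[2] = [BLACK] * 5
image[0][4] = BLACK
for row in preprocess(image):
    print("".join("#" if p else "." for p in row))  # the speck is cleared, the bond row stays
```

Removing isolated pixels, rather than applying a majority filter, is chosen here because it cleans specks without eroding one-pixel-wide bond lines. Thinning, circle-bond and OCR-region detection, and double/triple-bond recognition would follow on the cleaned binary image and are not sketched.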
12] Muthukumarasamy Karthikeyan and Renu Vyas. Chemical Structure Representations and Applications in Computational Toxicity. Computational Toxicology: Volume I, Methods in Molecular Biology (2012), Vol. 929, pp. 167-192. DOI: 10.1007/978-1-62703-050-2_8
Efficient storage and retrieval of chemical structures is one of the most important prerequisites for solving any computational problem in the life sciences. Several resources, including research publications, textbooks, and articles, are available on chemical structure representation. Chemical substances can share the same molecular formula yet differ in structural formula, conformation, skeletal framework, scaffold, and functional groups, and these features convey different characteristics of the molecule. Today, with the aid of sophisticated mathematical models and informatics tools, it is possible to design a molecule with specified characteristics for applications in pharmaceuticals, agrochemicals, biotechnology, nanomaterials, petrochemicals, and polymers. This chapter discusses both traditional and current state-of-the-art representations of chemical structures and their applications in chemical information management and in bioactivity- and toxicity-based predictive studies.
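To illustrate why a molecular formula alone is not a sufficient structure representation, the Python sketch below stores two molecules as heavy-atom graphs (an element list plus a bond list with bond orders), derives implicit hydrogens from assumed default valences (C 4, N 3, O 2; neutral atoms only), and shows that ethanol and dimethyl ether share the formula C2H6O while their connectivity differs. This toy representation and its function names are assumptions for illustration, not a format described in the chapter.

```python
"""Toy graph representation: same molecular formula, different structures.

Assumptions: neutral atoms, default valences only, hydrogens implicit,
Hill order written for carbon-containing molecules.
"""
from collections import Counter
from typing import List, Tuple

VALENCE = {"C": 4, "N": 3, "O": 2}
Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def molecular_formula(atoms: List[str], bonds: List[Bond]) -> str:
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    counts = Counter(atoms)
    # Each atom's unused valence is filled with implicit hydrogens.
    counts["H"] += sum(VALENCE[el] - u for el, u in zip(atoms, used))
    # Hill order: C first, then H, then the remaining elements alphabetically.
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order if counts[e])


def connectivity(atoms: List[str], bonds: List[Bond]) -> List[Tuple[str, int]]:
    """Sorted (element, number of heavy-atom neighbours) pairs: a crude structure signature."""
    degree = [0] * len(atoms)
    for a, b, _ in bonds:
        degree[a] += 1
        degree[b] += 1
    return sorted(zip(atoms, degree))


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])         # CH3-CH2-OH
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])  # CH3-O-CH3

print(molecular_formula(*ethanol), molecular_formula(*dimethyl_ether))  # C2H6O C2H6O
print(connectivity(*ethanol) == connectivity(*dimethyl_ether))          # False
```

In SMILES line notation, these two graphs are written as CCO and COC.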
11] Distributed Chemical Computing Using ChemStar: Open Source Java RMI Architecture applied to Large Scale Molecular Data from PubChem. (2008) J. Chem. Inf. Model., 48 (4), 691-703.
(International Conference on Chemoinformatics ICCI-2007)
9] Harvesting Chemical Information from the Internet Using a Distributed Approach: ChemXtreme (2006) J. Chem. Inf. Model., 46 (2), 452-461.
2005 (MOST CITED)
8] General Melting Point Prediction Based on a Diverse Compound Data Set and Artificial Neural Networks. (2005) J. Chem. Inf. Model., 45 (3), 581-590.
7] Encoding and Decoding Graphical Chemical Structures as Two-Dimensional (PDF417) Barcodes. (2005) J. Chem. Inf. Model., 45 (3), 572-580.
2003-04 (BOYSCAST Fellow: DST Report)
6] PharmTree 2.1. M. Karthikeyan. J. Chem. Inf. Comput. Sci., 2003, 43 (6), pp. 2194-2195. Publication Date (Web): October 07, 2003. DOI: 10.1021/ci034179j
PharmTree is a Web-based software package that can fingerprint and classify large numbers of chemical compounds according to their biological response data to identify leads. (A Software Review; Impact Factor 4.07)
Chemoinformatics: A Tool for Modern Drug Discovery. (2002) Intl. J. Inf. Tech Mgmt., 1 (1), 69-82. DOI: 10.1504/IJITM.2002.001188
2] New Intramolecular α-Arylation Strategy of Ketones by the Reaction of Silyl Enol Ethers to Photosensitized Electron Transfer Generated Arene Radical Cations: Construction of Benzannulated and Benzospiroannulated Compounds
1] Intramolecular nucleophilic addition of silylenol ether to photosensitized electron transfer (PET) generated arene radical cations: a novel non-reagent based carboannulation reaction. Pandey, Ganesh; Krishna, A.; Girija, K.; Karthikeyan, M. (1993) Tetrahedron Letters, 34 (41), pp. 6631-6634. ISSN 0040-4039