Publications





Application number: WO2016IN50367 20161028 

Priority number(s): IN2015DEL3527 20151030


An automated method for remote computing of molecular docking & dynamics from one or more jobs in a network of plurality of users is disclosed herein. The invention additionally employs a system to execute the said method comprising at least one user device, a remote computing server and a remote database. The job defining action tags are received and scanned by the remote server. A semantic analysis is performed on the jobs to distinguish between customized and non-customized tasks. A data analysis of the said jobs is packaged in a compressed format. The user is continually updated of the job status. A public link is generated and sent to the user to download the results. The link is disabled after the downloading of the results to ensure the security of the data. The method avoids any duplication of jobs and can be performed even when the user is offline.





 42) Method for encoding and decoding large scale molecular virtual libraries into a barcode 

WO 2016181412 A3
Ligand-based drug discovery is often characterized by extraction of scaffolds, linkers and building blocks from large small molecule datasets. Variable sites on scaffolds with attachment sites on building blocks participate in a combinatorial virtual reaction to generate a set of new virtual molecules. This process is time consuming, demands more storage space, and makes digital data exchange tedious. There is practically no quick way to sample molecules without enumerating the virtual library. Therefore, the present invention discloses a method of encoding a virtual library of large scale molecular data into a single barcode. The present invention further discloses a method of decoding the barcode containing large scale data molecules.
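The sampling problem described above can be illustrated with a minimal Python sketch. This is not the patented encoding: the toy building blocks, the function names `library_size`, `decode_index` and `sample`, and the mixed-radix indexing scheme are all assumptions chosen for illustration. The idea is only that, once each variable site has a fixed list of building blocks, a single integer identifies one virtual molecule, so members can be drawn without enumerating the whole library.

```python
import math
import random

# Hypothetical toy library: one scaffold with three variable sites,
# each site with its own list of candidate building blocks.
SITE_CHOICES = [
    ["H", "CH3", "Cl"],          # site R1
    ["OH", "NH2"],               # site R2
    ["F", "Br", "OCH3", "CN"],   # site R3
]


def library_size(site_choices):
    """Number of virtual molecules: product of the choices per site."""
    return math.prod(len(choices) for choices in site_choices)


def decode_index(index, site_choices):
    """Map an integer in [0, library_size) to one combination (mixed radix)."""
    if not 0 <= index < library_size(site_choices):
        raise ValueError("index out of range")
    picked = []
    # The last site is the fastest-varying digit.
    for choices in reversed(site_choices):
        index, digit = divmod(index, len(choices))
        picked.append(choices[digit])
    return tuple(reversed(picked))


def sample(site_choices, k, seed=0):
    """Draw k distinct library members without enumerating the library."""
    rng = random.Random(seed)
    indices = rng.sample(range(library_size(site_choices)), k)
    return [decode_index(i, site_choices) for i in indices]
```

With this layout the library is described by the three short lists alone, and `sample(SITE_CHOICES, 5)` touches only five of the possible combinations.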

41) Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data

Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: PP, Issue: 99)

Protein-protein interactions (PPIs) play a vital role in the biological processes involved in cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous effort, and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of one hundred and thirty-five PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893, and low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of diabetes complexes, where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) a predictor of the binding energy of cancer related PPI complexes, and (b) a classifier for discriminating PPI complexes related to cancer from those of other diseases.


Page(s): 1 - 1
Date of Publication: 26 October 2016      
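For readers unfamiliar with the three figures of merit quoted in the abstract above, the following Python sketch gives their textbook definitions. It is only an illustration: the function names are hypothetical, and the abstract does not say whether MAPE was reported as a fraction or a percentage, so the sketch returns a fraction by assumption.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (observed values must be nonzero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Note that RMSE carries the units and scale of the predicted quantity, whereas CC and MAPE are dimensionless, so the three values are not directly comparable with one another.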


40) Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis

  • Renu Vyas, Sanket Bapat, Esha Jain, Muthukumarasamy Karthikeyan, Sanjeev Tambe, Bhaskar D. Kulkarni
  • Abstract

    In order to understand the molecular mechanism underlying any disease, knowledge about the interacting proteins in the disease pathway is essential. The number of revealed protein-protein interactions (PPI) is still very limited compared to the available protein sequences of different organisms. Although experiment based high-throughput technologies provide some data about these interactions, the data are often fairly noisy. Computational techniques for predicting protein–protein interactions therefore assume significance. 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the pdb server. In a case study, these fingerprints were created for proteins implicated in Type 2 diabetes mellitus. The fingerprints were input into an SVM based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC value of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease related proteins, functional annotation and other proteomics studies.

39) ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files

Journal of Cheminformatics 2016, 8:73

DOI: 10.1186/s13321-016-0175-x

© The Author(s) 2016. Received: 2 March 2016; Accepted: 18 October 2016; Published: 29 December 2016



Digital access to chemical journals has resulted in a vast array of molecular information that is now available in the supplementary material files in PDF format. However, extracting this molecular information from a PDF document is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles that are normally available from publisher’s resources. In order to demonstrate the feasibility of extracting truly computable molecules from PDF file formats in a fast and efficient manner, we have developed a Java based application, namely ChemEngine. This program recognizes textual patterns from the supplementary data and generates standard molecular structure data (bond matrix, atomic coordinates) that can be subjected to a multitude of computational processes automatically. The methodology has been demonstrated via several case studies on different formats of coordinates data stored in supplementary information files, wherein ChemEngine selectively harvested the atomic coordinates and interpreted them as molecules with high accuracy. The reusability of extracted molecular coordinate data was demonstrated by computing Single Point Energies that were in close agreement with the original computed data provided with the articles. It is envisaged that the methodology will enable large scale conversion of molecular information from supplementary files available in the PDF format into a collection of ready-to-compute molecular data to create an automated workflow for advanced computational processes. Software along with source codes and instructions available at




38) Spirochromone-chalcone conjugates as antitubercular agents: synthesis, bio evaluation and molecular modeling studies
M Muthukrishnan, Mohammad Mujahid, Perumal Yogeeswari, Sriram Dharmarajan, Murali Basavanag, Erik Díaz-Cervantes, Luis Bahena, Juvencio Robles, Rajesh G. Gonnade, Karthikeyan M and Renu Vyas, RSC Adv., 2015,
DOI: 10.1039/C5RA21737G Received 18 Oct 2015, Accepted 30 Nov 2015

A new series of spirochromone annulated chalcone conjugates were synthesized and evaluated for their antitubercular activity against Mycobacterium tuberculosis H37Rv strain. These compounds were subjected to molecular modeling studies using docking and chemoinformatics based approaches. The docking simulations were performed against a range of known receptors for chalcone derived compounds to reveal MTB phosphotyrosine phosphatase B [MtbPtpB] protein as the most probable target based on the high binding affinity scores. Five compounds exhibit significant inhibition, with minimum inhibitory concentration (MIC) values ranging from 3.13 to 12.5 µg/ml. Further analysis of the synthesized compounds library with known and in-house developed chemoinformatics tools unequivocally established their potential as anti-tubercular compounds. QSAR modeling revealed a quantitative relationship between biological activities and frontier molecular orbital energies of synthesized compounds. The predictive model can be employed further for virtual screening of new compounds in this series.

37. Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery, Combinatorial chemistry & high throughput screening 18(6): 528 – 543 (2015) Muthukumarasamy Karthikeyan and Renu Vyas. DOI: 10.2174/1386207318666150703111911

ABSTRACT: Advancement in chemoinformatics research, in parallel with the availability of high performance computing platforms, has made handling of large scale multi-dimensional scientific data for high throughput drug discovery easier. In this study we have explored publicly available molecular databases with the help of open-source based integrated in-house molecular informatics tools for virtual screening. The virtual screening literature for the past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold and disease space. The review also focuses on the integrated chemoinformatics tools that are capable of harvesting chemical data from textual literature information and transforming them into truly computable chemical structures, identification of unique fragments and scaffolds from a class of compounds, automatic generation of focused virtual libraries, computation of molecular descriptors for structure-activity relationship studies, application of conventional filters used in lead discovery along with in-house developed exhaustive PTC (Pharmacophore, Toxicophores and Chemophores) filters and machine learning tools for the design of potential disease specific inhibitors. A case study on kinase inhibitors is provided as an example.

36. ChemScreener: A Distributed Computing Tool for Scaffold based Virtual Screening, Combinatorial chemistry & high throughput screening 18(6): 544 – 561 (2015) Muthukumarasamy Karthikeyan, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703112242



ABSTRACT: In this work we present ChemScreener, a Java-based application to perform virtual library generation combined with virtual screening in a platform-independent distributed computing environment. ChemScreener comprises a scaffold identifier, a distinct scaffold extractor, an interactive virtual library generator as well as a virtual screening module for subsequently selecting putative bioactive molecules. The virtual libraries are annotated with chemophore-, pharmacophore- and toxicophore-based information for compound prioritization. The hits selected can then be further processed using QSAR, docking and other in-silico approaches which can all be interfaced within the ChemScreener framework. As a sample application, in this work scaffold selectivity, diversity, connectivity and promiscuity towards six important therapeutic classes have been studied. In order to illustrate the computational power of the application, 55 scaffolds extracted from 161 anti-psychotic compounds were enumerated to produce a virtual library comprising 118 million compounds (17 GB) and annotated with chemophore, pharmacophore and toxicophore based features in a single step which would be non-trivial to perform with many standard software tools today on libraries of this size.

35. Prediction of Bioactive Compounds Using Computed NMR Chemical Shifts, Combinatorial chemistry & high throughput screening 18(6): 562 – 576 (2015) Muthukumarasamy Karthikeyan, Pattuparambil Ramanpillai Rajamohanan and Renu Vyas. DOI: 10.2174/1386207318666150703113312



ABSTRACT: NMR based chemical shifts are an important diagnostic parameter for structure elucidation as they capture rich information related to conformational, electronic and stereochemical arrangement of functional groups in a molecule which is responsible for its activity towards any biological target. The present work discusses the importance of computing NMR chemical shifts from molecular structures. The NMR chemical shift data (experimental or computed) was used to generate fingerprints in binary formats for mapping molecular fragments (as descriptors) and correlating with the bioactivity classes. For this study, chemical shift data derived binary fingerprints were computed for 149 classes and 4800 bioactive molecules. The sensitivity and selectivity of fingerprints in discriminating molecules belonging to different therapeutic categories was assessed using a LibSVM based classifier. An accuracy of 82% for proton and 94% for carbon NMR fingerprints were obtained for anti-psoriatic and anti-psychotic molecules demonstrating the effectiveness of this approach for virtual screening.
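As a rough illustration of how a list of chemical shifts could be turned into a binary fingerprint, the Python sketch below bins shifts into fixed-width intervals and sets one bit per occupied bin. The binning scheme, the 0 to 220 ppm range, the 2 ppm bin width and the function names are assumptions made for this example, not the scheme used in the paper; the Tanimoto coefficient is included only as a common way to compare two binary fingerprints.

```python
def shift_fingerprint(shifts_ppm, lo=0.0, hi=220.0, bin_width=2.0):
    """Set one bit for every bin that contains at least one chemical shift.

    The range and bin width are illustrative defaults (roughly a 13C window).
    Shifts outside [lo, hi) are ignored.
    """
    n_bits = int((hi - lo) / bin_width)
    bits = [0] * n_bits
    for shift in shifts_ppm:
        if lo <= shift < hi:
            bits[int((shift - lo) / bin_width)] = 1
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length binary fingerprints."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0
```

Fingerprints built this way have a fixed length regardless of molecule size, which is what allows them to be fed to a classifier such as the LibSVM model mentioned in the abstract.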

34. Protein Ligand Complex Guided Approach for Virtual Screening, Combinatorial chemistry & high throughput screening 18(6): 577 – 590 (2015) Muthukumarasamy Karthikeyan, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703112620



ABSTRACT: The target ligand association data is a rich source of information which is not exploited enough for drug design efforts in virtual screening. A Java-based open-source toolkit for Protein Ligand Network Extraction (J-ProLiNE) focused on protein-ligand complex analysis with several features integrated in a distributed computing network has been developed. Sequence alignment and similarity search components have been automated to yield local, global alignment scores along with similarity and distance scores. 10000 proteins with co-crystallized ligands from the pdb and MOAD databases were extracted and analyzed for revealing relationships between targets, ligands and scaffolds. Through this analysis, we could generate a protein ligand network to identify the promiscuous and selective scaffolds for multiple classes of protein targets. Using J-ProLiNE we created a 507 x 507 matrix of protein targets and native ligands belonging to six enzyme classes and analyzed the results to elucidate the protein-protein, protein-ligand and ligand-ligand interactions. In yet another application of the J-ProLiNE software, we were able to process kinase related information stored in US patents to construct disease-gene-ligand-scaffold networks. It is hoped that the studies presented here will enable target ligand knowledge based virtual screening for inhibitor design.

33. MegaMiner: A Tool for Lead Identification Through Text Mining Using Chemoinformatics Tools and Cloud Computing Environment, Combinatorial chemistry & high throughput screening 18(6): 591 – 603 (2015) Muthukumarasamy Karthikeyan, Yogesh Pandit, Deepak Pandit and Renu Vyas. DOI: 10.2174/1386207318666150703113525



ABSTRACT: Virtual screening is an indispensable tool to cope with the massive amount of data being tossed by the high throughput omics technologies. With the objective of enhancing the automation capability of virtual screening process a robust portal termed MegaMiner has been built using the cloud computing platform wherein the user submits a text query and directly accesses the proposed lead molecules along with their drug-like, lead-like and docking scores. Textual chemical structural data representation is fraught with ambiguity in the absence of a global identifier. We have used a combination of statistical models, chemical dictionary and regular expression for building a disease specific dictionary. To demonstrate the effectiveness of this approach, a case study on malaria has been carried out in the present work. MegaMiner offered superior results compared to other text mining search engines, as established by F score analysis. A single query term 'malaria' in the portlet led to retrieval of related PubMed records, protein classes, drug classes and 8000 scaffolds which were internally processed and filtered to suggest new molecules as potential anti-malarials. The results obtained were validated by docking the virtual molecules into relevant protein targets. It is hoped that MegaMiner will serve as an indispensable tool for not only identifying hidden relationships between various biological and chemical entities but also for building better corpus and ontologies.
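The F score used above to compare text mining engines combines precision and recall of the retrieved records. A minimal Python sketch of the standard definition follows; the function name and the set-based interface are assumptions for illustration and say nothing about how the comparison was actually implemented in MegaMiner.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F-beta) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score
```

With `beta=1.0` this is the harmonic mean of precision and recall, so an engine scores well only if it returns most of the relevant records without flooding the result list with irrelevant ones.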

32. Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening, Combinatorial chemistry & high throughput screening 18(6): 604 – 619 (2015) Muthukumarasamy Karthikeyan, Deepak Pandit, Arvind Bhavasar and Renu Vyas. DOI: 10.2174/1386207318666150703113656



ABSTRACT: The power of cloud computing and distributed computing has been harnessed to handle vast and heterogeneous data required to be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some of the bioinformatics functionalities including sequence alignment, active site pose prediction and protein ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container based virtualization, OpenVZ.

31. Editorial (Thematic Issue: Role of Data and Methods in Chemoinformatics for Virtual Screening), Combinatorial chemistry & high throughput screening 18(7): 622 - 623 (2015) Muthukumarasamy Karthikeyan and Renu Vyas. DOI: 10.2174/138620731807150903101821

30. Pharmacophore and Docking Based Virtual Screening of Validated Mycobacterium tuberculosis Targets, Combinatorial chemistry & high throughput screening 18(7): 624 – 637 (2015) Renu Vyas, Muthukumarasamy Karthikeyan, Ganesh Nainaru and Murugan Muthukrishnan. DOI: 10.2174/1386207318666150703112759



ABSTRACT: Target based virtual screening has surpassed ligand based virtual screening methods in the recent past mainly as it provides more clues regarding intermolecular interactions and takes into consideration the flexible receptor as well. The current methodology describes a computational strategy of predicting Mycobacterium tuberculosis (M. tuberculosis) binders for five well studied targets representing M. tuberculosis proteome encompassing most of the known mechanisms of action. The diversity of the targets was affirmed by their active site analysis and structural studies. The current approach employed pharmacophore searching, docking and clustering techniques in tandem and was validated by enrichment studies using the available Schrödinger data set consisting of 1000 decoys. The application of this methodology was demonstrated by predicting potential molecular targets for fifty newly synthesized compounds. Cross docking studies on the targets were carried out with 4512 known inhibitors utilizing a high performance computing platform to reveal underlying affinity and promiscuity patterns. Optimum binding energy range for all targets as determined by high throughput docking was found to be -3 to -13 kcal/mol.

29. Role of Chemical Reactivity and Transition State Modeling for Virtual Screening, Combinatorial chemistry & high throughput screening 18(7): 638 – 657 (2015) Muthukumarasamy Karthikeyan, Renu Vyas, Sanjeev S. Tambe, Deepthi Radhamohan and Bhaskar D Kulkarni.



ABSTRACT: Every drug discovery research program involves synthesis of a novel and potential drug molecule utilizing atom efficient, economical and environment friendly synthetic strategies. The current work focuses on the role of the reactivity based fingerprints of compounds as filters for virtual screening using a tool, ChemScore. A reactant-like (RLS) and a product-like (PLS) score can be predicted for a given compound using the binary fingerprints derived from the numerous known organic reactions which capture the molecule-molecule interactions in the form of addition, substitution, rearrangement, elimination and isomerization reactions. The reaction fingerprints were applied to large databases in biology and chemistry, namely ChEMBL, KEGG, HMDB, DSSTox, and the Drug Bank database. A large network of 1113 synthetic reactions was constructed to visualize and ascertain the reactant product mappings in the chemical reaction space. The cumulative reaction fingerprints were computed for 4000 molecules belonging to 29 therapeutic classes of compounds, and these were found capable of discriminating between the cognition disorder related and anti-allergy compounds with a reasonable accuracy of 75% and an AUC of 0.8. In this study, the transition state based fingerprints were also developed and used effectively for virtual screening in drug related databases. The methodology presented here provides an efficient handle for the rapid scoring of molecular libraries for virtual screening.

28. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules, Combinatorial chemistry & high throughput screening 18(7): 658 – 672 (2015) Renu Vyas, Sanket Bapat, Esha Jain, Sanjeev S. Tambe, Muthukumarasamy Karthikeyan and Bhaskar D Kulkarni. DOI: 10.2174/1386207318666150703112447



ABSTRACT: The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs, namely structure based descriptors, molecular fingerprints and therapeutic category, for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes (DHFR, COX, LOX, NMDA) have been utilized to train a total of six classifiers, viz. Artificial Neural Network (ANN), k nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these classifiers, the ANN was found to be the best classifier with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC) were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors, compared to pharmacophoric (67.82%) and toxicophoric (70.64%) fingerprints. The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, the selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category were able to capture the drug-class specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.
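Several abstracts in this list compare classifiers by AUC. As a reminder of what that number measures, the Python sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counted as one half. The function name and the 0/1 label convention are assumptions for illustration; the papers do not state how AUC was computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0 (inactive) or 1 (active); scores: classifier outputs.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a convenient target-independent summary when comparing classifiers.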

27. Chemoinformatics Approach for Building Molecular Networks from Marine Organisms, Combinatorial chemistry & high throughput screening 18(7): 673 – 684 (2015) Muthukumarasamy Karthikeyan, Deepika Nimje, Rakhi Pahujani, Kushal Tyagi, Sanket Bapat, Renu Vyas and Krishna Pillai Padmakumar. DOI: 10.2174/1386207318666150703112950


ABSTRACT: Natural products obtained from marine sources are considered to be a rich and diverse source of potential drugs. In the present work we demonstrate the use of a chemoinformatics approach for the design of new molecules inspired by molecules from marine organisms. Accordingly, we have assimilated information from two major scientific domains, namely chemoinformatics and biodiversity informatics, to develop an interactive marine database named MIMMO (Medicinally Important Molecules from Marine Organisms). The database can be queried for species, molecules, scaffolds, drugs, diseases and associated cumulative biological activity spectrum along with links to the literature resources. Molecular informatics analysis of the molecules obtained from MIMMO was performed to study their chemical space. The distinct skeletal features of the biologically active compounds isolated from marine species were identified. Scaffold molecules and species networks were created to identify common scaffolds from marine source and drug space. An analysis of the entire molecular data revealed a unique list of around 2000 molecules from which the ten most frequently occurring distinct scaffolds were obtained.




26] USPATENT : 2014/0301608


25] Pharmacokinetic Modeling of Caco-2 Cell Permeability Using Genetic Programming (GP) Method



Publication date: 31.12.2014
International class: G06F 19/00
Application number: PCT/IB2014/062585
Applicant: COUNCIL OF SCIENTIFIC & INDUSTRIAL RESEARCH
Inventor: KARTHIKEYAN, Muthukumarasamy

The invention discloses a method to generate and analyze NMR chemical shift based binary fingerprints for virtual high throughput screening in drug discovery. Further, the invention provides a method to analyze NMR chemical shifts based binary fingerprints that has implications for encoding several properties of a molecule besides the basic framework or scaffold and determine its propensity towards a particular bioactivity class.

Textbook: Practical Chemoinformatics, Springer 2014 (ISBN: 978-81-322-1779-4)


23] Ch 1.    Open-Source Tools, Techniques, and Data in Chemoinformatics. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 1-92. DOI 10.1007/978-81-322-1780-0_1

Chemicals are everywhere and they are essentially composed of atoms and bonds that support life and provide comfort. The numerous combinations of these entities lead to the complexity and diversity in the universe. Chemistry is a subject which analyzes and tries to explain this complexity at the atomic level. Advancement in this subject led to more data generation and information explosion. Over a period of time, the observations were recorded in chemical documents that include journals, patents, and research reports. The vast amount of chemical literature covering more than two centuries demands the extensive use of information technology to manage it. Today, the chemoinformatics tools and methods have grown powerful enough to handle and discover unexplored knowledge from this huge resource of chemical information. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of this domain is how to develop efficient chemicals with predicted physico-chemical and biological properties for economic, social, health, safety, and environmental purposes. In this chapter, we begin with a brief definition and role of open-source tools in chemoinformatics and extend the discussion on the need for basic computer knowledge required to understand this specialized and interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, which is an elementary but essential precursor for performing any chemoinformatics task. Practical guidance on step-by-step use of open-source, free, academic, and commercial structure representation tools is also provided. To gain a better understanding, it is highly recommended that the reader attempts the practice tutorials, Do it yourself exercises, and questions given in each chapter. The scope of this chapter is designed for experimental chemists, biologists, mathematicians, physicists, computer scientists, etc. to understand the subject in a practical way with relevant and easy-to-understand examples and also to encourage the readers to proceed further with advanced topics in the subsequent chapters.

22] Ch 2.    Chemoinformatics Approach for the Design and Screening of focused virtual libraries. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 93-131. DOI 10.1007/978-81-322-1780-0_2

It is challenging to handle a large volume of molecular data without appropriate tools. Here, we describe the need and the approaches for the development of focussed virtual libraries to design efficient molecules and optimize them for lead generation. The experimental chemists and biologists are more interested in properties of chemicals and their response to biological system in both beneficial and adverse effects context rather than just their structures. In this chapter, the focus is to relate newly designed chemical structures to their predicted activity, property or toxicity. Property prediction tools save time, money and lives of experimental animals. They come in handy while taking informed decisions especially in certain cases involving pharmacodynamic studies of drug molecules in humans where there are inevitable ethical and safety concerns. Property prediction is an important component in virtual screening which is at the heart of drug design and the most important step where chemoinformatics plays a major role. The other fields where structure–activity relation-based principles hold good for virtual screening are agrochemicals and environmental science, specifically the toxicity and biodegradability prediction of pollutant molecules. In this chapter, we will show how to design software tools to handle generation of focussed virtual libraries from a given set of molecules with common features, fragments or bioactivity spectrum.

21] Ch 3.    Machine Learning Methods in Chemoinformatics for Drug Discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 133-194. DOI 10.1007/978-81-322-1780-0_3

It is well known that the structure of a molecule is responsible for its biological activity or physicochemical property. Here, we describe the role of machine learning (ML)/statistical methods for building reliable, predictive models in chemoinformatics. The ML methods are broadly divided into clustering, classification and regression techniques. However, the statistical/mathematical techniques which are part of the ML tools, such as artificial neural networks, hidden Markov models, support vector machine, decision tree learning, Random Forest, Naive Bayes and belief networks, are best suited for drug discovery and play an important role in lead identification and lead optimization steps. This chapter provides stepwise procedures for building ML-based classification and regression models using state-of-the-art open-source and proprietary tools. A few case studies using benchmark data sets have been carried out to demonstrate the efficacy of the ML-based classification for drug designing.

20] Ch 4.    Docking and pharmacophore modeling for virtual screening. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 195-269. DOI 10.1007/978-81-322-1780-0_4


Protein and ligand molecules as two separate entities appear and behave differently, but what happens when they come together and interact with each other is one of the interesting facts in modern molecular biology and molecular recognition. This interaction can be well explained with the concept of docking, which in a simple way can be described as the study of how a molecule can bind to another molecule to result in a stable entity. The two binding molecules can be either a protein and a ligand or a protein and a protein. Irrespective of which two molecules are interacting, a docking process invariably includes two steps: conformational search through various algorithms, and scoring or ranking. Even though prolific research has been carried out in this field, it is still a topic of current interest, as there is scope for improvement in rationalizing binding interactions with biological function using docking programs. This chapter focuses on how to set up and perform docking runs using freeware and commercial software. Most of the known docking protocols like induced fit docking, protein–protein docking, and pharmacophore-based docking have been discussed. The use of pharmacophore queries as filters in virtual screening is also demonstrated using suitable examples.

19] Ch 5. Active site directed pose prediction programs for efficient filtering of molecules. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 271-316. DOI 10.1007/978-81-322-1780-0_5


It is well known that the three-dimensional structure of a protein is a prerequisite in the field of structure-based drug discovery. Proteins are usually crystallized along with substrates (small molecules) and the site of binding is used for further computational study and virtual screening. Homology modelling helps when a protein structure lacks co-crystallized ligands and knowledge of the binding site is required, or when sequences that are yet to be crystallized need some structural understanding to correlate with biological function. Homology modelling and active site prediction steps are discussed in detail using standard state-of-the-art software. Knowing the exact sites on a particular protein structure where other molecules can bind and interact is of paramount importance for any drug design effort. Having learnt the basic elements of docking, in this chapter we probe further into the binding sites and the specific properties that give them the capability to bind ligands. Active site features such as topology, shape, volume and amino acid composition all contribute to the site's preference for binding a particular ligand molecule. Deducing this knowledge is the crux of efficient active site-based screening of molecules. Active site information also helps in building a receptor-based pharmacophore query which can be applied as a constraint while screening molecular libraries. The later section therefore highlights some efforts towards active site-based virtual screening of molecules using an internally developed program which computes phi–psi-based fingerprints of proteins and binary fingerprints of ligands as a pre-filtering step for docking.
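
As a rough illustration of how binary fingerprints can serve as a pre-filter before docking, the sketch below keeps only those ligands whose fingerprint is sufficiently similar to a query fingerprint. The Tanimoto coefficient, the 0.5 threshold, the integer bitset encoding and all names are assumptions chosen for illustration; the abstract does not say how the in-house program compares fingerprints.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0                      # two empty fingerprints: define as 0
    return bin(a & b).count("1") / union


def prefilter(query: int, library: dict, threshold: float = 0.5) -> list:
    """Return names of ligands similar enough to the query to be worth docking."""
    return [name for name, fp in library.items()
            if tanimoto(query, fp) >= threshold]


# Hypothetical 8-bit fingerprints; real ones are typically much longer.
query_fp = 0b00001111
library = {
    "lig_a": 0b00001110,   # shares 3 of 4 set bits with the query
    "lig_b": 0b00000001,   # shares 1 of 4
    "lig_c": 0b11110000,   # shares none
}
to_dock = prefilter(query_fp, library)
```

Only the ligands that pass the filter would then be submitted to the more expensive docking step.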

18] Ch 6. Representation, fingerprinting and modeling of chemical reactions. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 317-374. DOI 10.1007/978-81-322-1780-0_6


Designing a better molecule is just one aspect of computational research, but getting it synthesized for biological evaluation is the most significant component in a drug discovery program. A molecule can be formed by a number of synthetic routes. Manually keeping track of all the available options for a product formation in various reaction conditions is a herculean task. Chemoinformatics comes to the rescue by providing a number of computational tools for reaction modelling, albeit less in number than structure property prediction software. The current computational tools help us in modelling various aspects of a given organic reaction—synthetic feasibility, synthesis planning, transition state prediction, the kinetic and thermodynamic parameters, and finally mechanistic features. Several methods like empirical, semiempirical, quantum mechanical, quantum chemical, machine learning, etc. have been developed to model a reaction. The computational approaches are based on the concept of rational synthesis planning, retro-synthetic approaches, and logic in organic synthesis. In this chapter, we begin with reaction representation in computers, reaction databases, free and commercial reaction prediction programs, followed by reaction searching methods based on ontologies and reaction fingerprints. The commonly employed quantum mechanics (QM) and quantum chemistry (QC)-based methods for intrinsic reaction coordinate (IRC) and transition state (TS) determination using the B3LYP/6–31G* scheme are described using simple name reactions. Most of the computational reaction prediction programs such as CHAOS/CAOS are based on the identification of the strategic bonds which are likely to be cleaved or formed during a certain chemical transformation. Accordingly, an algorithm has been developed to identify more than 300 types of unique bonds occurring in chemical reactions. 
The effect of implicit hydrogens on chemical reactivity modelling is discussed in the context of bioactivity spectrum for structure–activity relationship studies. Other parameters affecting reactivity such as solvent polarity, thermodynamics etc. are also briefly highlighted for frequently used name reactions, hazardous high-energy reactions, as well as industrially important reactions involving bulk chemicals.

17] Ch 7. Predictive methods for Organic Spectral data Simulation. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 375-414. DOI 10.1007/978-81-322-1780-0_7


New chemical entities (NCE) with potential bioactivity are synthesized, isolated, and thoroughly characterized for structure elucidation and purity before being subjected to further research. Spectroscopy is one of the most powerful means to deduce the correct structure and configuration of a compound or a fragment. In organic synthesis, the compounds are usually characterized by the spectral techniques such as ultraviolet–visible (UV–Vis), nuclear magnetic resonance (NMR), infrared (IR), mass spectrometry (MS), X-ray, etc. NMR and MS methods are employed in fragment-based drug discovery approaches to identify compounds from a high-throughput screen or a proteomics experiment. However, it is not possible to manually interpret the complex spectral data that require sophisticated computational tools for characterization. These tools aid in spectra analysis, peaks assignment, intensity, etc. and thereby annotate the compound with the appropriate functional group and fragments. The prediction algorithms are developed based on principles of quantum chemistry, machine learning, or simple database/pattern match-based methods. Some of the methods using quantum chemistry are accurate; however, they require more computational time; on the other hand, the machine learning methods such as neural network are faster but require more experimental data for improving their prediction capability. So, there is a trade-off between speed and accuracy, and the user has to decide his/her preference. A number of spectra prediction tools, commercial as well as open source, are discussed in this chapter accompanied with detailed tutorials on the use of some of them. To manage the data, many online servers and spectral databases are available today and a brief introduction to them is also provided. Here, we also describe an in-house-developed carbon and proton NMR chemical shift-based binary fingerprints and their use in virtual screening.

16] Ch 8. Chemical Text mining for Lead Discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 415-449. DOI 10.1007/978-81-322-1780-0_8


With the growth of the Internet, the information disseminated and available in public resources has expanded enormously. There is a need for the development of new tools to navigate through each and every document automatically, word by word to extract useful patterns, concepts, knowledge, or discover something which is not explicitly mentioned in a document to derive useful conclusions. Recently, computational linguistics developers and scientists have devised several text-mining tools and techniques for converting the natural language and processing the information content into facts and data for interpretation, analysis, and predictions. Text mining comprises data mining, information retrieval, natural language processing (NLP), and machine learning (ML) methods. Text mining provides researchers with metadata to ascertain meaningful associations of terms prevalent in their respective domains. Thus, it aids in finding meaning, context, semantics, identifying hidden concepts, trends, and discovering hitherto unknown relationships and correlations from heaps of largely fragmented, unstructured, and scattered information lying in public realm. In this chapter, we highlight the general concept of text mining followed by its features and tools especially for handling biomedical and chemical literature data for drug/lead discovery available in over 22.9 million abstracts in PubMed. The emphasis is on building and using simple text-mining tools in a practical way by harnessing the power of open source and commercially available tools and comprehending the overall strategic challenges in this field. An open-source-based tool for text mining literature with chemical significance that can be effectively used for solving chemoinformatics problems related to lead discovery has been developed. MegaMiner can directly predict lead molecules for a target disease of interest by submitting a text-based query in a distributed computing platform.

15] Ch 9. Integration of Automated Workflow in Chemoinformatics for drug discovery. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 451-499. DOI 10.1007/978-81-322-1780-0_9


The ever-increasing volume of data and restricted execution time call for automated computational workflow systems. Several tools are emerging to support this activity. Automated workflow systems require scripting to define the repetitive tasks to be run on new data to generate the desired output. They help in focussing on what a particular virtual experiment will achieve rather than how the process is executed. The theme of this chapter is the identification of repetitive tasks which can be automated, so that workflows can streamline a series of computational tasks efficiently. A brief introduction to workflows and their components is followed by in-depth tutorials using today's state-of-the-art workflow-based applications in the field of chemoinformatics for drug discovery research. An in-house-developed stand-alone chemo-bioinformatics workflow application for protein–ligand networks, J-ProLINE, is also presented.

14] Ch 10. Cloud computing Infrastructure development for Chemoinformatics. M Karthikeyan and Renu Vyas. Practical Chemoinformatics © Springer India 2014. Pages 501-528. DOI 10.1007/978-81-322-1780-0_10


Chemical research is progressing exponentially, thus fuelling the need to integrate data and applications and develop workflows. To support proper execution of workflows with multiple teams working on collaborative projects, we need robust portals powered by cloud computing infrastructure. A cloud computing portal provides customization and configurability to users on a secured, unified and integrated platform with extensive computational power. The sheer magnitude and diversity of chemical data require customized system-based solutions utilizing available mass storage, CPUs, GPUs and hybrid processors. Porting existing applications to a common portal provides a single framework which can be deployed on a high-performance distributed computing platform for automated programmatic access to workflows. A portal enables efficient scanning, searching and annotating of the data for the users and resource monitoring for the enterprise. Portals also provide additional features like security, scalability, quality, data consistency and error checks. Portal development has a bright future, as portals can perform large-scale quantum chemical studies of molecules and become decision support tools to mine functional relationships in chemical biology. In this chapter, we first focus on the essentials of portal development with stepwise tutorials using relevant examples. Mobile computing has transformed the information technology scenario in recent times; consequently, a section is devoted to Android, an open-source mobile operating system. A few chemoinformatics-based apps are also discussed.


WO/2013/030850 - CHEMICAL STRUCTURE RECOGNITION TOOL. Publication date: 07.03.2013. Int. class: G06F 19/00. Application number: PCT/IN2012/000567. Applicant: COUNCIL OF SCIENTIFIC & INDUSTRIAL RESEARCH. Inventor: KARTHIKEYAN, Muthukumarasamy

A method of extracting and then reusing or remodeling chemical data from a handwritten or digital input image, without manual input, using the Chemical Structure Recognition Tool (CSRT) is disclosed herein. It comprises loading said input image; converting said input image into a grayscale image, i.e. stretching of the loaded input image; converting said grayscale image into a binary image, i.e. binarisation; smoothing to reduce noise within said binary image; recognizing circle bonds to identify the presence of a circle inside a ring; predicting the OCR region to find zones containing text; image thinning to identify specific shapes within said binary image; edge detection to detect image contrast; detecting double and triple bonds; and obtaining output files.
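
The first preprocessing stages named above (grayscale conversion, binarisation, smoothing) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the CSRT implementation: the luminance weights, the fixed threshold of 128, and the 3x3 majority filter used for smoothing are all choices made here, and the image is a hypothetical nested list of (r, g, b) tuples.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray, threshold=128):
    """Binarisation: 1 marks ink (dark pixels), 0 marks background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def smooth(binary):
    """3x3 majority filter: removes isolated noise pixels from a binary image."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            window = [binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if sum(window) * 2 > len(window) else 0
    return out


# Hypothetical 5x5 white image with a single dark speck in the centre.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE] * 5 for _ in range(5)]
image[2][2] = BLACK

gray = to_grayscale(image)
binary = binarize(gray)
cleaned = smooth(binary)   # the isolated speck is treated as noise and removed
```

A majority filter this aggressive would also erode one-pixel-wide bond lines, so a real pipeline would need a gentler smoothing step before thinning and bond detection.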


12] Muthukumarasamy Karthikeyan, Renu Vyas. Chemical Structure Representations and Applications in Computational Toxicity. Computational Toxicology: Volume I, Methods in Molecular Biology (2012), Volume 929, 167-192. DOI: 10.1007/978-1-62703-050-2_8


Efficient storage and retrieval of chemical structures is one of the most important prerequisites for solving any computational problem in the life sciences. Several resources, including research publications, textbooks, and articles, are available on chemical structure representation. Chemical substances can share the same molecular formula yet differ in structural formula, conformation, and skeleton framework, scaffold, or functional groups, each of which conveys different characteristics of the molecule. Today, with the aid of sophisticated mathematical models and informatics tools, it is possible to design a molecule of interest with specified characteristics based on its applications in pharmaceuticals, agrochemicals, biotechnology, nanomaterials, petrochemicals, and polymers. This chapter discusses both traditional and current state-of-the-art representations of chemical structures and their applications in chemical information management and in bioactivity- and toxicity-based predictive studies.



11] Distributed Chemical Computing Using ChemStar: Open Source Java RMI Architecture applied to Large Scale Molecular Data from PubChem. (2008) J. Chem. Inf. Model., 48 (4), 691-703.

Journal of Chemical Information and Modeling 05/2008; 48(4):691-703. · 4.07 Impact Factor

(International Conference on Chemoinformatics ICCI-2007)


10] A tryptophan residue is identified in the substrate binding of penicillin G acylase from Kluyvera citrophila


9] Harvesting Chemical Information from the Internet Using a Distributed Approach: ChemXtreme (2006) J. Chem. Inf. Model., 46 (2), 452-461.

Harvesting chemical information from the Internet using a distributed approach: ChemXtreme.
Journal of Chemical Information and Modeling 02/2006; 46(2):452-61. ·  4.07 Impact Factor

2005 (MOST CITED)

8] General Melting Point Prediction Based on a Diverse Compound Data Set and Artificial Neural Networks. (2005) J. Chem. Inf. Model.; 45(3) pp 581 - 590

General melting point prediction based on a diverse compound data set and artificial neural networks.

Journal of Chemical Information and Modeling 08/2005; 45(3):581-90. ·  4.07 Impact Factor

7] Encoding and Decoding Graphical Chemical Structures as Two-Dimensional (PDF417) Barcodes M. (2005) J. Chem. Inf. Model.; 45(3) pp 572 - 580

Encoding and decoding graphical chemical structures as two-dimensional (PDF417) barcodes.

M Karthikeyan, Andreas Bender
Journal of Chemical Information and Modeling 05/2005; 45(3):572-80. ·  4.07 Impact Factor

2003-04 (BOYSCAST Fellow: DST Report)
6] PharmTree 2.1
M. Karthikeyan, J. Chem. Inf. Comput. Sci., 2003, 43 (6), pp 2194–2195. Publication Date (Web): October 07, 2003 (Article)
DOI: 10.1021/ci034179j
PharmTree is a Web-based software package that can fingerprint and classify a large number of chemical compounds according to their biological response data for identifying leads. (A Software Review) · 4.07 Impact Factor


5] EPZ-10 catalyzed regioselective transformation of alkenes into β-iodo ethers, iodohydrins and 2-iodomethyl-2,3-dihydrobenzofurans
Green Chemistry 01/2002; 4(4):325-327. · 6.85 Impact Factor 

Chemoinformatics A tool for modern drug discovery, (2002) Intl. J. Inf. Tech Mgmt. 1, (1), 69-82. [DOI: 10.1504/IJITM.2002.001188]

4] Chemoinformatics: a tool for modern drug discovery.
Muthukumarasamy Karthikeyan, Subramanian Krishnan
IJITM. 01/2002; 1:69-82.

1999-2001 * (Strategic sector)

3] (1992-1997) PhD Thesis (Dr M Karthikeyan)


2] New Intramolecular α-Arylation Strategy of Ketones by the Reaction of Silyl Enol Ethers to Photosensitized Electron Transfer Generated Arene Radical Cations:  Construction of Benzannulated and Benzospiroannulated Compounds†

Ganesh Pandey, M. Karthikeyan, A. Murugan
Journal of Organic Chemistry 04/1998; 63(9). · 4.638 Impact Factor
1994-07 (PhD Program)

1] Intramolecular nucleophilic addition of silylenol ether to photosensitized electron transfer (PET) generated arene radical cations: a novel non-reagent based carboannulation reaction. Pandey, Ganesh; Krishna, A.; Girija, K.; Karthikeyan, M. (1993) Tetrahedron Letters, 34 (41), pp. 6631-6634. ISSN 0040-4039


