
PROJECT REPORT ON Drug Design: to find a drug that changes protein activity of Influenza Virus

Submitted in the partial fulfillment for the award of Degree Of BACHELOR OF TECHNOLOGY IN BIOTECHNOLOGY (Session 2009-2013) Submitted by Bosky Mangal (1109152) Supervisor Mr. Oisik Das

DEPARTMENT OF BIOTECHNOLOGY (Maharishi Markandeshwar Engineering College, Mullana) MAHARISHI MARKANDESHWAR UNIVERSITY MULLANA (AMBALA)-133207

CERTIFICATE

This is to certify that the project entitled "Drug Design: to find a drug that changes protein activity of Influenza Virus", submitted by Ms. Bosky Mangal to the Department of Biotechnology, Maharishi Markandeshwar University, Mullana, Ambala, in partial fulfillment of the requirements for the degree of Bachelor of Technology in Biotechnology, has been carried out under my supervision. The assistance and help received during the course of the investigation and the sources of literature have been fully acknowledged.

Dr. Anil Kumar Sharma (Head of the Department) Department of Biotechnology MMU, Mullana, Ambala

CERTIFICATE

This is to certify that the project entitled "Drug Design: to find a drug that changes protein activity of Influenza Virus", submitted by Bosky Mangal, student of B.Tech (Biotechnology), Eighth Semester, is a bona fide work carried out by her under my guidance, in partial fulfillment of the B.Tech (Biotechnology) degree course awarded by Maharishi Markandeshwar University, Mullana (Ambala).

I wish her luck and success in the future.

Mr. Oisik Das (Biotechnology Department, MMU Mullana, Ambala)

DECLARATION

I hereby declare that the work contained in this project entitled "Drug Design: to find a drug that changes protein activity of Influenza Virus" has not been previously submitted for a degree at any other higher education institute. To the best of my knowledge and belief, this project contains no material previously published or written by another person except where due references are made.

Bosky Mangal 110904655 B.Tech(Bio-technology) 8Th sem

ACKNOWLEDGEMENT

First of all, I would like to thank almighty God, who has given us this wonderful gift of life and who guides us in the right direction to follow the noble path of humanity. Working on the project "Drug Design: to find a drug that changes protein activity of Influenza Virus" during my six-month minor project has been a wonderful experience. I owe my deep regards to the supportive and kind staff who helped me through the lean patches of this project. I am grateful to all the staff and fellow students of IBI Biosolutions Pvt. Ltd for sharing their experience with me. I would like to express my heartfelt gratitude to Mr. Sachin Sharma for his able guidance, inspiring attitude and honest support. I also acknowledge the painstaking efforts of our college, and last but not least, I express my utmost regards to the Biotechnology department of our Institute.

ABSTRACT

Influenza is a serious problem in the medical community. Each year in the United States, roughly 200,000 individuals are hospitalized due to influenza. Additionally, on average 36,000 deaths are attributed to influenza yearly in the US. Children and the elderly are more susceptible to serious complications from influenza. There are two main types of influenza, A and B, with hundreds of strains of each. Influenza A is generally considered to be the more prevalent and dangerous type, as it is usually associated with epidemics. Influenza is an evolving virus, constantly producing new mutant strains resistant to treatment. Clearly, there is an enormous need for a practical approach to the treatment of influenza. The goal of this research is to design a new antiviral drug which is effective against both influenza A and B. The ideal drug should have minimal side effects and fewer restraints than the current drugs on the market. The purpose of this thesis is to present my research procedure, difficulties which were overcome, and resulting information.

Review of Literature

Influenza: Information, Biological Activity, and Current Options

Influenza is a serious problem in the medical community. Each year in the United States, roughly 200,000 individuals are hospitalized due to influenza. Additionally, on average 36,000 deaths are attributed to influenza yearly in the US. [Centers for Disease Control and Prevention. Influenza. http://www.cdc.gov/flu/ (accessed June 19, 2009).]

Children and the elderly are more susceptible to serious complications from influenza. There are two main types of influenza, A and B, with hundreds of strains of each. Influenza A is generally considered to be the more prevalent and dangerous type, as it is usually associated with epidemics. Influenza is an evolving virus, constantly producing new mutant strains resistant to treatment. [Couch, Robert B. The New England Journal of Medicine 1997, 337: 927-929.]

The influenza virus is a segmented, membrane-enclosed, negative-strand RNA virus. The viral membrane carries three main protein components: hemagglutinin (HA), the M2 proton channel, and neuraminidase (NA). There are sixteen subtypes of hemagglutinin, H1-H16. Hemagglutinin mediates attachment to sialic acid, which acts as a receptor on the target cell surface. This binding allows the virus to attach to, and subsequently penetrate, the target cell.

[Luo, M., Air, G. M., Brouillette, W. J. The Journal of Infectious Diseases 1997, 176: 62-65; Malaisree, M., Rungrotmongkol, T., Decha, P., Intharathep, P., Aruksakunwon, O., Hannongbuw, S. Proteins 2008, 71: 1908-1918.]

There are nine subtypes of neuraminidase, N1-N9. After the virus has replicated within the target cell, neuraminidase cleaves the terminal sialic acid from the receptor, allowing the newly formed virus to be released and infect other cells. Each of the three components is important in the replication and spread of influenza throughout the body, but if just one step of the cycle can be stopped, influenza could be more easily controlled. Currently, there are few options for the prevention or treatment of influenza. Vaccines are typically readily available for prevention.

[Couch, Robert B. The New England Journal of Medicine 2000, 343: 1778-1788.]

Many people who are at risk do not take advantage of this form of prevention. There are four pharmaceutical products currently approved by the FDA for the treatment or prevention of influenza, which fall into two types: ion channel blockers and neuraminidase inhibitors. These drugs are approved either for treatment or, when it is almost certain that the patient will contract the virus, for prevention. Amantadine and Rimantadine are two ion channel blocking drugs. They function by blocking an ion channel formed by the M2 protein of the viral membrane. The drugs block the passage of hydrogen ions through the membrane, which in turn prevents replication.

[Balfour Jr, Henry H. The New England Journal of Medicine. 1999, 340: 1255-1269.]

Amantadine and Rimantadine reduce and shorten the symptoms of influenza A if given to patients within 48 hours of the emergence of symptoms. Oseltamivir and Zanamivir are two neuraminidase inhibitors. They are effective because they inhibit the activity of neuraminidase, which prevents newly formed virus particles from being released from the infected cell and thus limits the spread of infection. Oseltamivir and Zanamivir are effective in reducing symptoms for both influenza A and B when given to patients who have been symptomatic for less than two days.

[Couch, Robert B. The New England Journal of Medicine 2000, 343: 1778-1788.]

Clearly, there is an enormous need for a practical approach to the treatment of influenza. The goal of this research is to design a new antiviral drug which is effective against both influenza A and B. The ideal drug should have minimal side effects and fewer restraints than the current drugs on the market. The purpose of this thesis is to present my research procedure, difficulties which were overcome, and resulting information.

Structural Approaches to Drug Discovery: Ligand-Protein Interactions; Stroud, Robert M.; Finer-Moore, Janet; Royal Society of Chemistry: Cambridge, UK, 2012

Chapter 1

BIOINFORMATICS

Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge.

Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in the textual mining of biological literature and the development of biological and gene ontologies to organize and query biological data. It plays a role in the analysis of gene and protein expression and regulation. Bioinformatics tools aid in the comparison of genetic and genomic data and more generally in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalogue the biological pathways and networks that are an important part of systems biology. In structural biology, it aids in the simulation and modeling of DNA, RNA, and protein structures as well as molecular interactions.

Bioinformatics uses many areas of computer science, mathematics and engineering to process biological data. Complex machines are used to read in biological data at a much faster rate than before. Databases and information systems are used to store and organize biological data. Analyzing biological data may involve algorithms in artificial intelligence, soft computing, data mining, image processing, and simulation. The algorithms in turn depend on theoretical foundations such as discrete mathematics, control theory, system theory, information theory, and statistics. Commonly used software tools and technologies in the field include Java, C#, XML, Perl, C, C++, Python, R, SQL, CUDA, MATLAB, and spreadsheet applications.

History

Building on the recognition of the importance of information transmission, accumulation and processing in biological systems, in 1978 Paulien Hogeweg coined the term "Bioinformatics" to refer to the study of information processes in biotic systems. This definition placed bioinformatics as a field parallel to biophysics or biochemistry (biochemistry is the study of chemical processes in biological systems). Examples of relevant biological information processes studied in the early days of bioinformatics are the formation of complex social interaction structures by simple behavioral rules, and the information accumulation and maintenance in models of prebiotic evolution.

One early contributor to bioinformatics was Elvin A. Kabat, who pioneered biological sequence analysis with his comprehensive volumes of antibody sequences released with Tai Te Wu between 1980 and 1991. Another significant pioneer in the field was Margaret Oakley Dayhoff, who has been hailed by David Lipman, director of the National Center for Biotechnology Information, as the "mother and father of bioinformatics."

At the beginning of the "genomic revolution", the term bioinformatics was rediscovered to refer to the creation and maintenance of a database to store biological information such as nucleotide sequences and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could access existing data as well as submit new or revised data.

Goals

In order to study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains, and protein structures.[9] The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:

the development and implementation of tools that enable efficient access to, use and management of, various types of information;
the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, for example methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.

The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies, and the modeling of evolution.

Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to these mathematical and computing approaches used to glean understanding of biological processes.

Approaches

Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. There are two fundamental ways of modelling a biological system (e.g., a living cell), both coming under bioinformatic approaches.
Static
Sequences: proteins, nucleic acids and peptides
Interaction data among the above entities, including microarray data and networks of proteins and metabolites

Dynamic
Structures: proteins, nucleic acids, ligands (including metabolites and drugs) and peptides (structures studied with bioinformatics tools are no longer considered static, and their dynamics is often the core of the structural studies)
Systems biology, which comes under this category, including reaction fluxes and variable concentrations of metabolites
Multi-agent based modelling approaches capturing cellular events such as signalling, transcription and reaction dynamics

A broad sub-category under bioinformatics is structural bioinformatics.

Major research areas

Sequence analysis

Since the phage Φ-X174 was sequenced in 1977,[10] the DNA sequences of thousands of organisms have been decoded and stored in databases. This sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs such as BLAST are used daily to search sequences from more than 260,000 organisms, containing over 190 billion nucleotides.[11] These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, to identify sequences that are related, but not identical.

A variant of this sequence alignment is used in the sequencing process itself. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae)[12] does not produce entire chromosomes. Instead it generates the sequences of many thousands of small DNA fragments (ranging from 35 to 900 nucleotides long, depending on the sequencing technology). The ends of these fragments overlap and, when aligned properly by a genome assembly program, can be used to reconstruct the complete genome.
Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly will usually contain numerous gaps that have to be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is annotation. This involves computational gene finding to search for protein-coding genes, RNA genes, and other functional sequences within a genome. Not all of the nucleotides within a genome are part of genes. Within the genomes of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequences for protein identification.

Genome annotation

In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team at The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Dr. White built a software system to find the genes (fragments of genomic sequence that encode proteins) and the transfer RNAs, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA, such as the GeneMark program trained and used to find protein-coding genes in Haemophilus influenzae, are constantly changing and improving.

Computational evolutionary biology

Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:

trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone;
more recently, compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, horizontal gene transfer, and the prediction of factors important in bacterial speciation;
build complex computational models of populations to predict the outcome of the system over time;
track and share information on an increasingly large number of species and organisms.

Future work endeavours to reconstruct the now more complex tree of life. The area of research within computer science that uses genetic algorithms is sometimes confused with computational evolutionary biology, but the two areas are not necessarily related.
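The idea described under Sequence analysis, that alignment programs compensate for exchanged, deleted or inserted bases in order to recognise related but non-identical sequences, can be illustrated with a small dynamic-programming sketch. The example below computes the edit distance between two DNA strings, that is, the minimum number of single-base substitutions, insertions and deletions needed to turn one into the other. It is an illustrative simplification and not the heuristic used by BLAST; the function name edit_distance and the example sequences are assumptions made for this illustration.

```python
def edit_distance(seq_a: str, seq_b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to convert seq_a into seq_b (Levenshtein distance)."""
    # previous[j] is the distance between the bases of seq_a processed
    # so far (excluding the current one) and the first j bases of seq_b.
    previous = list(range(len(seq_b) + 1))
    for i, base_a in enumerate(seq_a, start=1):
        # Turning i bases into an empty sequence takes i deletions.
        current = [i]
        for j, base_b in enumerate(seq_b, start=1):
            substitution = previous[j - 1] + (0 if base_a == base_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    reference = "ACGTTGCA"  # hypothetical reference fragment
    variants = {
        "identical": "ACGTTGCA",
        "exchanged base": "ACGATGCA",
        "deleted base": "ACGTGCA",
        "inserted base": "ACGTTTGCA",
    }
    for label, sequence in variants.items():
        print(label, edit_distance(reference, sequence))
```

A small distance relative to the sequence length suggests that two sequences are related. The same count of differences is one simple way of measuring changes in DNA between organisms, although practical tools use scoring schemes that weight substitutions and gaps differently.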

Literature analysis

The growth in the volume of published literature makes it virtually impossible to read every paper, resulting in disjointed sub-fields of research. Literature analysis aims to employ computational and statistical linguistics to mine this growing library of text resources. For example:

abbreviation recognition: identifying the long form and abbreviation of biological terms;
named entity recognition: recognizing biological terms such as gene names;
protein-protein interaction: identifying from text which proteins interact with which proteins.

The area of research draws from statistics and computational linguistics.

Analysis of gene expression

The expression of many genes can be determined by measuring mRNA levels with multiple techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), RNA-Seq, also known as "Whole Transcriptome Shotgun Sequencing" (WTSS), or various applications of multiplexed in-situ hybridization. All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.

Analysis of regulation

Regulation is the complex orchestration of events starting with an extracellular signal such as a hormone and leading to an increase or decrease in the activity of one or more proteins. Bioinformatics techniques have been applied to explore various steps in this process.
For example, promoter analysis involves the identification and study of sequence motifs in the DNA surrounding the coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Expression data can be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements. Examples of clustering algorithms applied in gene clustering are k-means clustering, self-organizing maps (SOMs), hierarchical clustering, and consensus clustering methods such as the Bi-CoPaM. The latter, Bi-CoPaM, was proposed to address various issues specific to gene discovery problems, such as consistent co-expression of genes over multiple microarray datasets.

Analysis of protein expression

Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is very much involved in making sense of protein microarray and HT MS data; the former approach faces similar problems as microarrays targeted at mRNA, while the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete, peptides from each protein are detected.

Analysis of mutations in cancer

In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways. Massive sequencing efforts are used to identify previously unknown point mutations in a variety of genes in cancer. Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of human genome sequences and germline polymorphisms. New physical detection technologies are employed, such as oligonucleotide microarrays to identify chromosomal gains and losses (called comparative genomic hybridization), and single-nucleotide polymorphism arrays to detect known point mutations.
These detection methods simultaneously measure several hundred thousand sites throughout the genome, and when used in high-throughput mode to measure thousands of samples, generate terabytes of data per experiment. Again, the massive amounts and new types of data generate new opportunities for bioinformaticians. The data is often found to contain considerable variability, or noise, and thus hidden Markov model and change-point analysis methods are being developed to infer real copy number changes. Another type of data that requires novel informatics development is the analysis of lesions found to be recurrent among many tumors.
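The clustering of co-expressed genes mentioned under Analysis of regulation can be sketched with k-means, one of the algorithms named there. The following is a minimal pure-Python illustration under stated assumptions: the gene names and expression values are invented for the example, the function name k_means is hypothetical, and the first k profiles are used as starting centroids only to keep the result deterministic (practical implementations usually choose starting points at random and repeat the run).

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(profiles, k, iterations=20):
    """Cluster expression profiles (gene -> list of values) into k groups.

    Assumes k is not larger than the number of genes. The first k
    profiles serve as initial centroids, an arbitrary choice that
    keeps this sketch deterministic.
    """
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: attach each gene to its nearest centroid.
        new_labels = {
            g: min(range(k), key=lambda c: dist(profiles[g], centroids[c]))
            for g in genes
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if labels[g] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Invented expression values across three conditions.
    expression = {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [9.0, 8.0, 7.0],
        "geneC": [1.1, 2.1, 2.9],
        "geneD": [9.2, 7.9, 7.1],
    }
    print(k_means(expression, k=2))
```

Genes that receive the same label have similar profiles across the measured conditions, and their promoter regions would then be the candidates to search for shared regulatory elements.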

Comparative genomics

The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms. It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes. A multitude of evolutionary events acting at various organizational levels shape genome evolution. At the lowest level, point mutations affect individual nucleotides. At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion. Ultimately, whole genomes are involved in processes of hybridization, polyploidization and endosymbiosis, often leading to rapid speciation. The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectrum of algorithmic, statistical and mathematical techniques, ranging from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Markov Chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models. Many of these studies are based on homology detection and the computation of protein families.

Network and systems biology

Network analysis seeks to understand the relationships within biological networks such as metabolic or protein-protein interaction networks. Although biological networks can be constructed from a single type of molecule or entity (such as genes), network biology often attempts to integrate many different data types, such as proteins, small molecules, gene expression data, and others, which are all connected physically and/or functionally. Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
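As a small illustration of what analysing the relationships within such a network can mean in practice, the sketch below stores a list of pairwise interactions as an undirected graph, counts the interaction partners of each node, and collects the nodes connected to a given starting point. The node labels P1 to P6 and the interaction list are placeholders rather than real proteins, and the function names are assumptions made for this example.

```python
from collections import defaultdict


def build_network(interactions):
    """Return an undirected adjacency map from (node, node) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degrees(network):
    """Number of distinct interaction partners per node."""
    return {node: len(partners) for node, partners in network.items()}


def connected_component(network, start):
    """All nodes reachable from start (depth-first traversal)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(network[node] - seen)
    return seen


if __name__ == "__main__":
    # Placeholder interaction pairs, not experimental data.
    pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
             ("P2", "P3"), ("P5", "P6")]
    net = build_network(pairs)
    print(degrees(net))
    print(sorted(connected_component(net, "P4")))
```

Nodes with many partners (here P1) are often called hubs, and separate connected components correspond to groups of molecules with no known link between them in the data set.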

High-throughput image analysis

Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving accuracy, objectivity, or speed. A fully developed analysis system may completely replace the observer. Although these systems are not unique to biomedical imagery, biomedical imaging is becoming more important for both diagnostics and research. Some examples are:

high-throughput and high-fidelity quantification and sub-cellular localization (high-content screening, cytohistopathology, bioimage informatics)
morphometrics
clinical image analysis and visualization
determining the real-time air-flow patterns in breathing lungs of living animals
quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury
making behavioral observations from extended video recordings of laboratory animals
infrared measurements for metabolic activity determination
inferring clone overlaps in DNA mapping, e.g. the Sulston score

Structural bioinformatic approaches

Prediction of protein structure

Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence of the gene that codes for it. In the vast majority of cases, this primary structure uniquely determines a structure in its native environment. (Of course, there are exceptions, such as the bovine spongiform encephalopathy, a.k.a. Mad Cow Disease, prion.) Knowledge of this structure is vital in understanding the function of the protein. For lack of better terms, structural information is usually classified as one of secondary, tertiary and quaternary structure. A viable general solution to such predictions remains an open problem. Most efforts have so far been directed towards heuristics that work most of the time.

One of the key ideas in bioinformatics is the notion of homology. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. This currently remains the only way to predict protein structures reliably.

One example of this is the homology between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting oxygen in the organism. Though these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes. Other techniques for predicting protein structure include protein threading and de novo (from scratch) physics-based modeling.

Molecular interaction

Efficient software is available today for studying interactions among proteins, ligands and peptides. The types of interaction most often encountered in the field include protein-ligand (including drug), protein-protein and protein-peptide. Molecular dynamic simulation of the movement of atoms about rotatable bonds is the fundamental principle behind the computational algorithms, termed docking algorithms, for studying molecular interactions.

In the last two decades, tens of thousands of protein three-dimensional structures have been determined by X-ray crystallography and protein nuclear magnetic resonance spectroscopy (protein NMR). One central question for the biological scientist is whether it is practical to predict possible protein-protein interactions based only on these 3D shapes, without doing protein-protein interaction experiments.
A variety of methods have been developed to tackle the protein-protein docking problem, though it seems that there is still much work to be done in this field.

Software and tools

Software tools for bioinformatics range from simple command-line tools to more complex graphical programs and standalone web services available from various bioinformatics companies or public institutions.

Open-source bioinformatics software

Many free and open-source software tools have existed and continued to grow since the 1980s.[16] The combination of a continued need for new algorithms for the analysis of emerging types of biological readouts, the potential for innovative in silico experiments, and freely available open code bases has helped to create opportunities for all research groups to contribute to both bioinformatics and the range of open-source software available, regardless of their funding arrangements. The open-source tools often act as incubators of ideas, or as community-supported plug-ins in commercial applications. They may also provide de facto standards and shared object models for assisting with the challenge of bioinformation integration. The range of open-source software packages includes titles such as Bioconductor, BioPerl, Biopython, BioJava, BioRuby, Bioclipse, EMBOSS, Taverna workbench, and UGENE. In order to maintain this tradition and create further opportunities, the non-profit Open Bioinformatics Foundation[16] has supported the annual Bioinformatics Open Source Conference (BOSC) since 2000.

Chapter 2

Drug designing Drug design, sometimes referred to as rational drug design or more simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of small molecules that are complementary in shape and charge to the biomolecular target with which they interact and therefore will bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is often referred to as computer-aided drug design. Finally, drug design that relies on the knowledge of the three-dimensional structure of the biomolecular target is known as structure-based drug design. The phrase "drug design" is to some extent a misnomer. What is really meant by drug design is ligand design (i.e., design of a small molecule that will bind tightly to its target). Although modeling techniques for prediction of binding affinity are reasonably successful, there are many other properties, such as bioavailability, metabolic half-life, lack of side effects, etc., that first must be optimized before a ligand can become a safe and efficacious drug. These other characteristics are often difficult to optimize using rational drug design techniques.

Background
Typically a drug target is a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen. Some approaches attempt to inhibit the functioning of the pathway in the diseased state by causing a key molecule to stop functioning. Drugs may be designed that bind to the active region and inhibit this key molecule. Another approach may be to enhance the normal pathway by promoting specific molecules in the normal pathways that may have been affected in the diseased state. In addition, these drugs should be designed so as not to affect any other important "off-target" molecules or antitargets that may be similar in appearance to the target molecule, since drug interactions with off-target molecules may lead to undesirable side effects. Sequence homology is often used to identify such risks. Most commonly, drugs are organic small molecules produced through chemical synthesis, but biopolymer-based drugs (also known as biologics) produced through biological processes are becoming increasingly common. In addition, mRNA-based gene silencing technologies may have therapeutic applications.

Types

Figure : Flow charts of two strategies of structure-based drug design

There are two major types of drug design. The first is referred to as ligand-based drug design and the second as structure-based drug design.

Ligand-based
Ligand-based drug design (or indirect drug design) relies on knowledge of other molecules that bind to the biological target of interest. These other molecules may be used to derive a pharmacophore model that defines the minimum necessary structural characteristics a molecule must possess in order to bind to the target.[4] In other words, a model of the biological target may be built based on the knowledge of what binds to it, and this model in turn may be used to design new molecular entities that interact with the target. Alternatively, a quantitative structure-activity relationship (QSAR), that is, a correlation between calculated properties of molecules and their experimentally determined biological activity, may be derived. These QSAR relationships in turn may be used to predict the activity of new analogs.
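The QSAR idea described above can be illustrated with a minimal sketch: a one-descriptor linear model fitted by ordinary least squares. This is only an illustration, not the method used in this project. The descriptor (a calculated logP) and the activity values (pIC50) are hypothetical numbers invented for the example, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the activity of a new analog from its descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training set (NOT measured data): one calculated descriptor
# per molecule and the corresponding experimentally determined activity.
descriptor = [1.0, 2.0, 3.0, 4.0]   # e.g. calculated logP (assumed)
activity = [5.1, 5.9, 7.1, 7.9]     # e.g. pIC50 (assumed)

model = fit_line(descriptor, activity)
predicted = predict(model, 2.5)     # activity estimate for a new analog
```

Real QSAR models normally use many descriptors and are validated on molecules held out of the fit; the single-descriptor line is kept only to show the correlation-then-prediction workflow.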

Structure-based
Structure-based drug design (or direct drug design) relies on knowledge of the three-dimensional structure of the biological target obtained through methods such as X-ray crystallography or NMR spectroscopy.[5] If an experimental structure of a target is not available, it may be possible to create a homology model of the target based on the experimental structure of a related protein. Using the structure of the biological target, candidate drugs that are predicted to bind with high affinity and selectivity to the target may be designed using interactive graphics and the intuition of a medicinal chemist. Alternatively, various automated computational procedures may be used to suggest new drug candidates. As experimental methods such as X-ray crystallography and NMR develop, the amount of information concerning 3D structures of biomolecular targets has increased dramatically. In parallel, information about the structural dynamics and electronic properties of ligands has also increased. This has encouraged the rapid development of structure-based drug design. Current methods for structure-based drug design can be divided roughly into two categories. The first category is about finding ligands for a given receptor, which is usually referred to as database searching. In this case, a large number of potential ligand molecules are screened to find those fitting the binding pocket of the receptor. This method is usually referred to as ligand-based drug design. The key advantage of database searching is that it saves the synthetic effort needed to obtain new lead compounds. The other category of structure-based drug design methods is about building ligands, which is usually referred to as receptor-based drug design. In this case, ligand molecules are built up within the constraints of the binding pocket by assembling small pieces in a stepwise manner. These pieces can be either individual atoms or molecular fragments. The key advantage of such a method is that novel structures, not contained in any database, can be suggested. These techniques are raising much excitement in the drug design community.

Active site identification
Active site identification is the first step in this program. It analyzes the protein to find the binding pocket, derives key interaction sites within the binding pocket, and then prepares the necessary data for Ligand fragment link. The basic inputs for this step are the 3D structure of the protein and a pre-docked ligand in PDB format, as well as their atomic properties. Both ligand and protein atoms need to be classified and their atomic properties defined, basically into four atomic types:

Hydrophobic atom: All carbons in hydrocarbon chains or in aromatic groups.
H-bond donor: Oxygen and nitrogen atoms bonded to hydrogen atom(s).
H-bond acceptor: Oxygen and sp2 or sp hybridized nitrogen atoms with lone electron pair(s).
Polar atom: Oxygen and nitrogen atoms that are neither H-bond donor nor H-bond acceptor, sulfur, phosphorus, halogen, metal, and carbon atoms bonded to hetero-atom(s).

The space inside the ligand binding region is then studied with virtual probe atoms of the four types above, so that the chemical environment of every spot in the ligand binding region is known. This makes clear what kind of chemical fragments can be placed at the corresponding spots in the ligand binding region of the receptor.
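The four-way atom typing described above can be written as a short rule-based sketch. The `Atom` record and its field names are assumptions made for this illustration and are not taken from the active-site program itself. The sketch also simplifies one rule: it treats any sp or sp2 nitrogen without hydrogens as an acceptor, without checking that a lone pair is actually available.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str                  # element symbol, e.g. "C", "N", "O"
    attached_h: int = 0           # number of bonded hydrogen atoms
    hybridization: str = "sp3"    # "sp", "sp2" or "sp3"
    heavy_neighbours: tuple = ()  # element symbols of bonded non-hydrogen atoms


def classify(atom):
    """Assign one of the four atomic types used for active site analysis."""
    el = atom.element.upper()
    if el == "C":
        # Carbon bonded to any hetero-atom is polar, otherwise hydrophobic.
        hetero = any(n.upper() != "C" for n in atom.heavy_neighbours)
        return "polar" if hetero else "hydrophobic"
    if el in ("O", "N"):
        if atom.attached_h > 0:
            return "donor"
        if el == "O" or atom.hybridization in ("sp", "sp2"):
            return "acceptor"
        return "polar"
    # Sulfur, phosphorus, halogens and metals all fall into the polar type.
    return "polar"


# A few hand-made atoms (hypothetical, for illustration only).
methyl_carbon = Atom("C", attached_h=3, heavy_neighbours=("C",))
carbonyl_carbon = Atom("C", hybridization="sp2", heavy_neighbours=("C", "O", "N"))
amide_nitrogen = Atom("N", attached_h=1, hybridization="sp2", heavy_neighbours=("C", "C"))
carbonyl_oxygen = Atom("O", hybridization="sp2", heavy_neighbours=("C",))
```

Donor status is tested before acceptor status, so a hydroxyl oxygen is reported as a donor even though it could also accept a hydrogen bond; a fuller implementation might allow an atom to carry both labels.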

Amino acid symbols
Amino acids are classified in different ways based on polarity, structure, nutritional requirement, metabolic fate, etc. The most generally used classification is based on polarity, which divides amino acids into four groups.

Non-polar amino acids
They have an equal number of amino and carboxyl groups and are neutral. These amino acids are hydrophobic and have no charge on the 'R' group. The amino acids in this group are alanine, valine, leucine, isoleucine, phenylalanine, glycine, tryptophan, methionine and proline.

Polar amino acids with no charge
These amino acids do not have any charge on the 'R' group. They participate in hydrogen bonding of protein structure. The amino acids in this group are serine, threonine, tyrosine, cysteine, glutamine and asparagine.

Polar amino acids with positive charge
Polar amino acids with positive charge have more amino groups than carboxyl groups, making them basic. The amino acids that have a positive charge on the 'R' group are placed in this category. They are lysine, arginine and histidine.

Polar amino acids with negative charge
Polar amino acids with negative charge have more carboxyl groups than amino groups, making them acidic. The amino acids that have a negative charge on the 'R' group are placed in this category. They are called dicarboxylic mono-amino acids. They are aspartic acid and glutamic acid.

Proline is an imino acid.

Amino acid       Three letter symbol   One letter symbol
alanine          Ala                   A
arginine         Arg                   R
asparagine       Asn                   N
aspartic acid    Asp                   D
cysteine         Cys                   C
glutamic acid    Glu                   E
glutamine        Gln                   Q
glycine          Gly                   G
histidine        His                   H
isoleucine       Ile                   I
leucine          Leu                   L
lysine           Lys                   K
methionine       Met                   M
phenylalanine    Phe                   F
proline          Pro                   P
serine           Ser                   S
threonine        Thr                   T
tryptophan       Trp                   W
tyrosine         Tyr                   Y
valine           Val                   V

Single letter and triple letter codes for amino acids.
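The symbol table and the polarity groups above translate directly into lookup tables. The sketch below is only an illustration: the names `THREE_LETTER`, `POLARITY`, `to_three_letter` and `polarity_class` are assumptions, and the grouping follows the four polarity classes listed in this chapter.

```python
# One-letter to three-letter symbols, as listed in the table above.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Polarity groups, following the classification given in this chapter.
POLARITY = {
    "non-polar": "AVLIFGWMP",
    "polar, no charge": "STYCQN",
    "polar, positive": "KRH",
    "polar, negative": "DE",
}


def to_three_letter(sequence):
    """Convert a one-letter sequence into hyphen-separated three-letter symbols."""
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


def polarity_class(residue):
    """Return the polarity group of a single one-letter residue symbol."""
    for group, members in POLARITY.items():
        if residue.upper() in members:
            return group
    raise KeyError(f"unknown residue symbol: {residue!r}")


# First five residues of the sequence used later in this report.
example = to_three_letter("PEFLN")
```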

CHAPTER 3
INTRODUCTION

Swine influenza
Figure : Electron microscope image of the reassorted H1N1 influenza virus photographed at the CDC Influenza Laboratory. The viruses are 80–120 nanometres in diameter.

Swine influenza, also called pig influenza, swine flu, hog flu and pig flu, is an infection caused by any one of several types of swine influenza viruses. Swine influenza virus (SIV) or swine-origin influenza virus (S-OIV) is any strain of the influenza family of viruses that is endemic in pigs.[2] As of 2009, the known SIV strains include influenza C and the subtypes of influenza A known as H1N1, H1N2, H2N1, H3N1, H3N2, and H2N3. Swine influenza virus is common throughout pig populations worldwide. Transmission of the virus from pigs to humans is not common and does not always lead to human flu, often resulting only in the production of antibodies in the blood. If transmission does cause human flu, it is called zoonotic swine flu. People with regular exposure to pigs are at increased risk of swine flu infection. During the mid-20th century, identification of influenza subtypes became possible, allowing accurate diagnosis of transmission to humans. Since then, only 50 such transmissions have been confirmed. These strains of swine flu rarely pass from human to human. Symptoms of zoonotic swine flu in humans are similar to those of influenza and of influenza-like illness in general, namely chills, fever, sore throat, muscle pains, severe headache, coughing, weakness and general discomfort. In August 2010, the World Health Organization declared the swine flu pandemic officially over.

Classification
Of the three genera of influenza viruses that cause human flu, two also cause influenza in pigs, with influenza A being common in pigs and influenza C being rare.[3] Influenza B has not been reported in pigs. Within influenza A and influenza C, the strains found in pigs and humans are largely distinct, although because of reassortment there have been transfers of genes among strains crossing swine, avian, and human species boundaries.

Influenza C
Influenza C viruses infect both humans and pigs, but do not infect birds.[4] Transmission between pigs and humans has occurred in the past.[5] For example, influenza C caused small outbreaks of a mild form of influenza amongst children in Japan[6] and California.[6] Because of its limited host range and the lack of genetic diversity in influenza C, this form of influenza does not cause pandemics in humans.

Influenza A
Swine influenza is known to be caused by influenza A subtypes H1N1,[8] H1N2, H2N3, H3N1,[10] and H3N2. In pigs, three influenza A virus subtypes (H1N1, H1N2 and H3N2) are the most common strains worldwide.[11] In the United States, the H1N1 subtype was exclusively prevalent among swine populations before 1998; however, since late August 1998, H3N2 subtypes have been isolated from pigs. As of 2004, H3N2 virus isolates in US swine and turkey stocks were triple reassortants, containing genes from human (HA, NA, and PB1), swine (NS, NP, and M), and avian (PB2 and PA) lineages.[12] In August 2012, the Centers for Disease Control and Prevention confirmed 145 human cases (113 in Indiana, 30 in Ohio, one in Hawaii and one in Illinois) of H3N2v since July 2012. The death of a 61-year-old Madison County, Ohio woman is the first in the nation associated with a new swine flu strain. She contracted the illness after having contact with hogs at the Ross County Fair.

Surveillance
Although there is no formal national surveillance system in the United States to determine what viruses are circulating in pigs, an informal surveillance network in the United States is part of a world surveillance network.

History
Swine influenza was first proposed to be a disease related to human flu during the 1918 flu pandemic, when pigs became ill at the same time as humans.[17] The first identification of an influenza virus as a cause of disease in pigs occurred about ten years later, in 1930. For the following 60 years, swine influenza strains were almost exclusively H1N1. Then, between 1997 and 2002, new strains of three different subtypes and five different genotypes emerged as causes of influenza among pigs in North America. In 1997–1998, H3N2 strains emerged. These strains, which include genes derived by reassortment from human, swine and avian viruses, have become a major cause of swine influenza in North America. Reassortment between H1N1 and H3N2 produced H1N2. In 1999 in Canada, a strain of H4N6 crossed the species barrier from birds to pigs, but was contained on a single farm.

The H1N1 form of swine flu is one of the descendants of the strain that caused the 1918 flu pandemic. As well as persisting in pigs, the descendants of the 1918 virus have also circulated in humans through the 20th century, contributing to the normal seasonal epidemics of influenza. However, direct transmission from pigs to humans is rare, with only 12 recorded cases in the U.S. since 2005. Nevertheless, the retention of influenza strains in pigs after these strains have disappeared from the human population might make pigs a reservoir where influenza viruses could persist, later emerging to reinfect humans once human immunity to these strains has waned. Swine flu has been reported numerous times as a zoonosis in humans, usually with limited distribution, rarely with a widespread distribution. Outbreaks in swine are common and cause significant economic losses in industry, primarily by causing stunting and extended time to market. For example, this disease costs the British meat industry about £65 million every year.

Transmission
Transmission between pigs
Influenza is quite common in pigs, with about half of breeding pigs in the US having been exposed to the virus. Antibodies to the virus are also common in pigs in other countries.[57] The main route of transmission is through direct contact between infected and uninfected animals. These close contacts are particularly common during animal transport. Intensive farming may also increase the risk of transmission, as the pigs are raised in very close proximity to each other. The direct transfer of the virus probably occurs either by pigs touching noses, or through dried mucus. Airborne transmission through the aerosols produced by pigs coughing or sneezing is also an important means of infection. The virus usually spreads quickly through a herd, infecting all the pigs within just a few days.
Transmission may also occur through wild animals, such as wild boar, which can spread the disease between farms.[60]

Transmission to humans
People who work with poultry and swine, especially those with intense exposures, are at increased risk of zoonotic infection with influenza virus endemic in these animals, and constitute a population of human hosts in which zoonosis and reassortment can co-occur. Vaccination of these workers against influenza and surveillance for new influenza strains among this population may therefore be an important public health measure. Transmission of influenza from swine to humans who work with swine was documented in a small surveillance study performed in 2004 at the University of Iowa. This study, among others, forms the basis of a recommendation that people whose jobs involve handling poultry and swine be the focus of increased public health surveillance. Other professions at particular risk of infection are veterinarians and meat processing workers, although the risk of infection for both of these groups is lower than that of farm workers.

Interaction with avian H5N1 in pigs
Pigs are unusual as they can be infected with influenza strains that usually infect three different species: pigs, birds and humans. This makes pigs a host where influenza viruses might exchange genes, producing new and dangerous strains. Avian influenza virus H3N2 is endemic in pigs in China, and has been detected in pigs in Vietnam, increasing fears of the emergence of new variant strains.[66] H3N2 evolved from H2N2 by antigenic shift. In August 2004, researchers in China found H5N1 in pigs. These H5N1 infections may be quite common; in a survey of 10 apparently healthy pigs housed near poultry farms in West Java, where avian flu had broken out, five of the pig samples contained the H5N1 virus. The Indonesian government has since found similar results in the same region. Additional tests of 150 pigs outside the area were negative.

Signs and symptoms
In swine
In pigs, influenza infection produces fever, lethargy, sneezing, coughing, difficulty breathing and decreased appetite. In some cases the infection can cause abortion. Although mortality is usually low (around 1–4%), the virus can produce weight loss and poor growth, causing economic loss to farmers. Infected pigs can lose up to 12 pounds of body weight over a three- to four-week period.

In humans

Direct transmission of a swine flu virus from pigs to humans is occasionally possible (called zoonotic swine flu). In all, 50 cases are known to have occurred since the first report in medical literature in 1958, which have resulted in a total of six deaths.[72] Of these six people, one was pregnant, one had leukemia, one had Hodgkin's lymphoma and two were known to be previously healthy. Despite these apparently low numbers of infections, the true rate of infection may be higher, since most cases only cause a very mild disease, and will probably never be reported or diagnosed. According to the Centers for Disease Control and Prevention (CDC), in humans the symptoms of the 2009 "swine flu" H1N1 virus are similar to those of influenza and of influenza-like illness in general. Symptoms include fever, cough, sore throat, body aches, headache, chills and fatigue. The 2009 outbreak has shown an increased percentage of patients reporting diarrhea and vomiting. The 2009 H1N1 virus is not zoonotic swine flu, as it is not transmitted from pigs to humans, but from person to person. Because these symptoms are not specific to swine flu, a differential diagnosis of probable swine flu requires not only symptoms, but also a high likelihood of swine flu due to the person's recent history. For example, during the 2009 swine flu outbreak in the United States, the CDC advised physicians to "consider swine influenza infection in the differential diagnosis of patients with acute febrile respiratory illness who have either been in contact with persons with confirmed swine flu, or who were in one of the five U.S. states that have reported swine flu cases or in Mexico during the seven days preceding their illness onset."[75] A diagnosis of confirmed swine flu requires laboratory testing of a respiratory sample (a simple nose and throat swab).[75] The most common cause of death is respiratory failure. Other causes of death are pneumonia (leading to sepsis),[76] high fever (leading to neurological problems), dehydration (from excessive vomiting and diarrhea), electrolyte imbalance and kidney failure.[77] Fatalities are more likely in young children and the elderly.

Diagnosis
The CDC recommends real-time RT-PCR as the method of choice for diagnosing H1N1. This method allows a specific diagnosis of novel influenza (H1N1) as opposed to seasonal influenza. Near-patient point-of-care tests are in development.[79]

Prevention
Prevention of swine influenza has three components: prevention in swine, prevention of transmission to humans, and prevention of its spread among humans.

The proteins present on the influenza A virus are hemagglutinin and neuraminidase.

Hemagglutinin
Influenza hemagglutinin (HA) or haemagglutinin (British English) is a type of hemagglutinin found on the surface of the influenza viruses. It is an antigenic glycoprotein. It is responsible for binding the virus to the cell that is being infected. HA proteins bind to cells with sialic acid on the membranes, such as cells in the upper respiratory tract or erythrocytes. The name "hemagglutinin" comes from the protein's ability to cause red blood cells (erythrocytes) to clump together ("agglutinate") in vitro.

Subtypes

Structure of influenza, showing neuraminidase marked as NA and hemagglutinin as HA.

There are at least 17 different HA antigens. These subtypes are named H1 through H17. H16 was discovered only in 2004 on influenza A viruses isolated from black-headed gulls from Sweden and Norway. The most recent, H17, was discovered in 2012 in fruit bats. The first three hemagglutinins, H1, H2, and H3, are found in human influenza viruses. Viral neuraminidase (NA) is another protein found on the surface of influenza. Influenza viruses are characterised by the type of HA and NA that they carry; hence H1N1, H5N2, etc. A highly pathogenic avian flu virus of H5N1 type has been found to infect humans at a low rate. It has been reported that single amino acid changes in this avian virus strain's type H5 hemagglutinin have been found in human patients that "can significantly alter receptor specificity of avian H5N1 viruses, providing them with an ability to bind to receptors optimal for human influenza viruses".[5][6] This finding seems to explain how an H5N1 virus that normally does not infect humans can mutate and become able to efficiently infect human cells. The hemagglutinin of the H5N1 virus has been associated with the high pathogenicity of this flu virus strain, apparently due to its ease of conversion to an active form by proteolysis.

Function and Mechanism
HA has two functions. Firstly, it allows the recognition of target vertebrate cells, accomplished through binding to these cells' sialic acid-containing receptors. Secondly, once bound, it facilitates the entry of the viral genome into the target cells by causing the fusion of the host endosomal membrane with the viral membrane.[9] HA binds to the monosaccharide sialic acid, which is present on the surface of its target cells, and this causes the viral particles to stick to the cell's surface. The cell membrane then engulfs the virus, and the portion of the membrane that encloses it pinches off to form a new membrane-bound compartment within the cell called an endosome, which contains the engulfed virus.
The cell then attempts to begin digesting the contents of the endosome by acidifying its interior and transforming it into a lysosome. However, as soon as the pH within the endosome drops to about 6.0, the original folded structure of the HA molecule becomes unstable, causing it to partially unfold and release a very hydrophobic portion of its peptide chain that was previously hidden within the protein. This so-called "fusion peptide" acts like a molecular grappling hook by inserting itself into the endosomal membrane and locking on. Then, when the rest of the HA molecule refolds into a new structure (which is more stable at the lower pH), it "retracts the grappling hook" and pulls the endosomal membrane right up next to the virus particle's own membrane, causing the two to fuse together. Once this has happened, the contents of the virus, including its RNA genome, are free to pour out into the cell's cytoplasm.

Structure
HA is a homotrimeric integral membrane glycoprotein. It is shaped like a cylinder, and is approximately 13.5 nanometres long. The three identical monomers that constitute HA are constructed into a central helix coil; three spherical heads contain the sialic acid binding sites. HA monomers are synthesized as precursors that are then glycosylated and cleaved into two smaller polypeptides: the HA1 and HA2 subunits. Each HA monomer consists of a long, helical chain anchored in the membrane by HA2 and topped by a large HA1 globule.

Neuraminidase
Neuraminidase enzymes are glycoside hydrolase enzymes (EC 3.2.1.18) that cleave the glycosidic linkages of neuraminic acids. Neuraminidase enzymes are a large family, found in a range of organisms. The best-known neuraminidase is the viral neuraminidase, a drug target for the prevention of the spread of influenza infection. The viral neuraminidases are frequently used as antigenic determinants found on the surface of the influenza virus. Some variants of the influenza neuraminidase confer more virulence to the virus than others. Other homologs, which have a range of functions, are found in mammalian cells. At least four mammalian sialidase homologs have been described in the human genome (see NEU1, NEU2, NEU3, NEU4). Neuraminidases, also called sialidases, catalyze the hydrolysis of terminal sialic acid residues from the newly formed virions and from the host cell receptors.[1] Sialidase activities include assistance in the mobility of virus particles through the respiratory tract mucus and in the elution of virion progeny from the infected cell.

Structure
Influenza neuraminidase exists as a mushroom-shaped projection on the surface of the influenza virus.
It has a head consisting of four co-planar and roughly spherical subunits, and a hydrophobic region that is embedded within the interior of the virus' membrane. It comprises a single polypeptide chain that is oriented in the opposite direction to the hemagglutinin antigen. The composition of the polypeptide is a single chain of six conserved polar amino acids, followed by hydrophilic, variable amino acids. β-Sheets predominate as the secondary level of protein conformation. The recent emergence of oseltamivir- and zanamivir-resistant human influenza A(H1N1) H274Y has emphasized the need for suitable expression systems to obtain large quantities of highly pure and stable recombinant neuraminidase through two separate artificial tetramerization domains that facilitate the formation of catalytically active neuraminidase homotetramers from yeast and Staphylothermus marinus, which allow for secretion of FLAG-tagged proteins and further purification.

Mechanism
Figure 1: Proposed mechanism of catalysis of influenza virus sialidase.

The enzymatic mechanism of influenza virus sialidase has been studied by Taylor et al. and is shown in Figure 1. The enzyme catalysis process has four steps. The first step involves the distortion of the α-sialoside from a ²C₅ chair conformation (the lowest-energy form in solution) to a pseudoboat conformation when the sialoside binds to the sialidase. The second step leads to an oxocarbocation intermediate, the sialosyl cation. The third step is the formation of Neu5Ac, initially as the α-anomer, and then mutarotation and release as the more thermodynamically stable β-Neu5Ac.

How does swine flu virus work (A PICTORIAL REPRESENTATION)

Chapter 4
Methodology

The structure of H1N1 was modeled and energy-refined with Modeller 9v10[4] using PDB entry 1LV1 as a template. The predicted models were evaluated for geometry, stereochemistry and energy distribution using PROCHECK[5]. The models were systematically analyzed using ProSA for various structural properties, and the best modelled structure, with 94.6% of residues in the core region of the Ramachandran plot, was selected as the docking target enzyme. Three potential binding sites of the modelled H1N1 were revealed by the Ligsite[6] program, of which pkt-48 was found to be the most favourable and conserved region, containing critical aspartic acid and glycine residues (D198, D227 and G27) and showing a better binding affinity. In this study, methyl-formamide was used as the seed molecule for de novo generation, with a final output of twenty structurally complementary potential lead molecules from Ligbuilder V1.2[7]. All the twenty de novo designed and selected ligand molecules were docked into the target enzyme using Autodock 4.2.3[8]. Binding energies for all the 10 designed ligand molecules, as examined by Autodock 4.2.3, range between -3.53 and -0.59 kJ/mol.

1) TARGET IDENTIFICATION
2) TARGET VALIDATION
3) STRUCTURAL RETRIEVAL OR DETERMINATION
4) STRUCTURE VALIDATION
5) ACTIVE SITE IDENTIFICATION
6) LEAD IDENTIFICATION
7) DEVELOPMENT OF LEAD INTO ACTIVE SITE
8) DOCKING ANALYSIS
9) ADME TOXICO ANALYSIS
10) PROPOSAL OF NEW DRUG CANDIDATE OR MOLECULE

Structure-based drug designing in 1996
In silico = primarily computer-minded, or data on silicon chips.
PDB holds structures from real-time visualization techniques (NMR, X-ray, etc.). If a structure is available, proceed to the further steps; if not, use structure modeling techniques: de novo, threading, homology modeling.
Alignment = to bring together two similar or identical entities: Global (FASTA), Local (BLAST).
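The difference between global and local alignment noted above can be shown with the two classic dynamic-programming scores. This is a minimal sketch using Needleman-Wunsch (global) and Smith-Waterman (local) scoring; it is not what the FASTA or BLAST programs implement, since those tools rely on faster heuristics. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for the illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: only the best-matching sub-region is scored."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


# Toy sequences (hypothetical): they share only the short motif "ACGT".
seq_a = "TTTACGTTTT"
seq_b = "GGACGTGG"
score_global = global_score(seq_a, seq_b)
score_local = local_score(seq_a, seq_b)
```

For sequences that share only a short region, the local score stays high while the global score is pulled down by the unmatched flanks, which is why local alignment is the usual choice for database homology searches.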

TARGET IDENTIFICATION AND VALIDATION

Protein Selection

Prior to ligand development, the protein target was first selected. For this research, the neuraminidase subtype N4 (PDB ID 2HTV) was chosen for study. (See Figure 2.3) It is structurally similar to N1 neuraminidase, but has had fewer investigations involving antiviral activity. Its structure was initially released on September 9, 2005, but last modified on February 24, 2009. It is a strain of influenza A virus. It consists of two polypeptide chains and is classified as a hydrolase.8

Figure : Visualization of 2HTV, N4 neuraminidase

Structural comparison of N1, N4 and N8 Group-1 neuraminidases shows their active sites to be virtually identical. Group-1 NAs consist of N1, N4, N5, and N8. Group-2 contains N2, N3, N6, N7 and N9. There are conformational differences between Group-1 and Group-2 NAs. These differences come in the form of various amino acid configurations. The differences result in a large cavity being present in Group-1 NAs which is not available in Group-2 NAs.9

TARGET VALIDATION

1) Search for the query protein sequence (in FASTA format)
2) Search for query homologs (BLAST from NCBI)
3) Search for a homologous structure
4) Preparation of modeling (Modeller version 9v10)

.ali file: alignment file
.atm file: atomic file
.py file: Modeller Python program file

FASTA FILE
>3SAL:A|PDBID|CHAIN|SEQUENCE
PEFLNNTEPLCNVSGFAIVSKDNGIRIGSRGHVFVIREPFVACGPTECRTFFLTQGALLNDKHSNNTVKDRSPYRALMSV
PLGSSPNAYQAKFESVAWSATACHDGKKWLAVGISGADDDAYAVIHYGGMPTDVVRSWRKQILRTQESSCVCMNGNCYWV
MTDGPANSQASYKIFKSHEGMVTNEREVSFQGGHIEECSCYPNLGKVECVCRDNWNGMNRPILIFDEDLDYEVGYLCAGI
PTDTPRVQDSSFTGSCTNAVGGSGTNNYGVKGFGFRQGNSVWAGRTVSISSRSGFEILLIEDGWIRTSKTIVKKVEVLNN
KNWSGYSGAFTIPITMTSKQCLVPCFWLEMIRGKPEERTSIWTSSSSTVFCGVSSEVPGWSWDDGAILPFDIDKM
>3SAL:B|PDBID|CHAIN|SEQUENCE
PEFLNNTEPLCNVSGFAIVSKDNGIRIGSRGHVFVIREPFVACGPTECRTFFLTQGALLNDKHSNNTVKDRSPYRALMSV
PLGSSPNAYQAKFESVAWSATACHDGKKWLAVGISGADDDAYAVIHYGGMPTDVVRSWRKQILRTQESSCVCMNGNCYWV
MTDGPANSQASYKIFKSHEGMVTNEREVSFQGGHIEECSCYPNLGKVECVCRDNWNGMNRPILIFDEDLDYEVGYLCAGI
PTDTPRVQDSSFTGSCTNAVGGSGTNNYGVKGFGFRQGNSVWAGRTVSISSRSGFEILLIEDGWIRTSKTIVKKVEVLNN
KNWSGYSGAFTIPITMTSKQCLVPCFWLEMIRGKPEERTSIWTSSSSTVFCGVSSEVPGWSWDDGAILPFDIDKM
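A FASTA file like the one above can be read with a few lines of code before it is converted into an alignment file. The sketch below is an illustration only: `parse_fasta` is a hypothetical helper written for this example, and the sample text holds just the first 20 residues of each chain to keep the example short.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Drop any spaces left inside a sequence line by copy-and-paste.
            records[header].append(line.replace(" ", ""))
    return {name: "".join(parts) for name, parts in records.items()}


# Shortened sample: only the first 20 residues of each chain are shown.
sample = """>3SAL:A|PDBID|CHAIN|SEQUENCE
PEFLNNTEPL CNVSG
FAIVS
>3SAL:B|PDBID|CHAIN|SEQUENCE
PEFLNNTEPLCNVSGFAIVS
"""

chains = parse_fasta(sample)
chain_a = chains["3SAL:A|PDBID|CHAIN|SEQUENCE"]
chain_b = chains["3SAL:B|PDBID|CHAIN|SEQUENCE"]
chains_identical = chain_a == chain_b
```

Checking that the chains are identical is a quick way to confirm that only one of them needs to be carried forward as the modeling sequence.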

ALI FILE:

>P1;AAAA
structure:X::::::::
PEFLNNTEPLCNVSGFAIVSKDNGIRIGSRGHVFVIREPFVACGPTECRTFFLTQGALLN
DKHSNNTVKDRSPYRALMSVPLGSSPNAYQAKFESVAWSATACHDGKKWLAVGISGADDD
AYAVIHYGGMPTDVVRSWRKQILRTQESSCVCMNGNCYWVMTDGPANSQASYKIFKSHEG
MVTNEREVSFQGGHIEECSCYPNLGKVECVCRDNWNGMNRPILIFDEDLDYEVGYLCAGI
PTDTPRVQDSSFTGSCTNAVGGSGTNNYGVKGFGFRQGNSVWAGRTVSISSRSGFEILLI
EDGWIRTSKTIVKKVEVLNNKNWSGYSGAFTIPITMTSKQCLVPCFWLEMIRGKPEERTS
IWTSSSSTVFCGVSSEVPGWSWDDGAILPFDIDKM*

>P1;BBBB
sequence:y::::::::
PEFLNNTEPLCNVSGFAIVSKDNGIRIGSRGHVFVIREPFVACGPTECRTFFLTQGALLN
DKHSNNTVKDRSPYRALMSVPLGSSPNAYQAKFESVAWSATACHDGKKWLAVGVSGADDD
AYAVIHYGGMPTDVVRSWRKQILRTQESSCVCMNGNCYWVMTDGPANSQASYKIFKSHEG
MVTNEREVSFQGGHIEECSCYPNLGKVECVCRDNWNGMNRPILIFDEDLDYEVGYLCAGI
PTDTPRVQDSSFTGSCTNAVGGSGTNNYGVKGFGFRQGNSVWAGRTVSISSRSGFEILLI
EDGWIRTSKTIVKKVEVLNNKNWSGYSGAFTIPITMTGKQCLVPCFWLEMIRGKPEERTS
IWTSSSSTVFCGVSSEVPGWSWDDGAILPFDIDKM*
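Because the template (AAAA) and target (BBBB) entries are already aligned residue by residue, their differences can be listed directly. The sketch below is illustrative only: `compare_aligned` is a hypothetical helper, and the input is a ten-residue excerpt from each entry (the stretch where the template reads AVGISGADDD and the target reads AVGVSGADDD) rather than the full sequences.

```python
def compare_aligned(template, target):
    """Compare two aligned sequences of equal length.

    Returns (percent identity over non-gap columns,
             list of (position, template residue, target residue)).
    """
    if len(template) != len(target):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    differences = []
    for position, (a, b) in enumerate(zip(template, target), start=1):
        if a == "-" or b == "-":
            continue                     # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
        else:
            differences.append((position, a, b))
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, differences


# Excerpts taken from the second sequence line of each entry in the ALI file.
identity, differences = compare_aligned("AVGISGADDD", "AVGVSGADDD")
print(identity, differences)
```

Run on the full entries, the same function would report every substituted position, which is a convenient check before handing the alignment to MODELLER.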

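PYTHON FILE:

# Homology modelling of the target sequence BBBB from the template AAAA
from modeller import *               # provides log and environ
from modeller.automodel import *     # provides the automodel class

log.verbose()                        # request verbose log output
env = environ()                      # create a new MODELLER environment
# locations searched for the template atom file, as given in the original script
env.io.atom_files_directory = './:../AAAA.atm'

a = automodel(env,
              alnfile='AAAA.ali',    # alignment file in PIR format
              knowns='AAAA',         # code of the template structure
              sequence='BBBB')       # code of the target sequence
a.starting_model = 1                 # index of the first model
a.ending_model = 5                   # index of the last model (five models in total)
a.make()                             # build the models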

ATM FILE:

ATOM    155  CG2 THR    30       5.649  80.387  76.658  1.00  8.90           C
ATOM    156  N   ASP    31       5.327  81.123  81.387  1.00 21.51           N
ATOM    157  CA  ASP    31       5.105  81.984  82.553  1.00 22.00           C
ATOM    158  C   ASP    31       5.353  81.336  83.886  1.00 24.06           C
ATOM    159  O   ASP    31       4.845  80.257  84.163  1.00 24.63           O
ATOM    160  CB  ASP    31       3.643  82.432  82.603  1.00 27.02           C
ATOM    161  CG  ASP    31       3.351  83.592  81.695  1.00 28.60           C
ATOM    162  OD1 ASP    31       3.585  84.735  82.134  1.00 29.87           O
ATOM    163  OD2 ASP    31       2.874  83.357  80.553  1.00 32.70           O
ATOM    164  N   ASP    32       6.041  82.056  84.756  1.00 27.54           N
ATOM    165  CA  ASP    32       6.305  81.561  86.098  1.00 32.45           C
ATOM    166  C   ASP    32       5.035  81.781  86.953  1.00 31.11           C
ATOM    167  O   ASP    32       4.687  80.959  87.797  1.00 32.59           O
ATOM    168  CB  ASP    32       7.514  82.285  86.703  1.00 38.29           C
ATOM    169  CG  ASP    32       8.150  81.493  87.832  1.00 45.66           C
ATOM    170  OD1 ASP    32       8.697  80.391  87.550  1.00 51.06           O
ATOM    171  OD2 ASP    32       8.074  81.942  89.003  1.00 49.70           O
ATOM    172  N   GLN    33       4.325  82.876  86.692  1.00 29.33           N
ATOM    173  CA  GLN    33       3.104  83.206  87.406  1.00 29.10           C
ATOM    174  C   GLN    33       2.092  83.840  86.483  1.00 25.47           C
ATOM    175  O   GLN    33       2.336  84.888  85.895  1.00 29.43           O
ATOM    176  CB  GLN    33       3.378  84.203  88.514  1.00 36.87           C
ATOM    177  CG  GLN    33       4.176  83.666  89.666  1.00 49.28           C
ATOM    178  CD  GLN    33       5.098  84.735  90.231  1.00 56.34           C
ATOM    179  OE1 GLN    33       5.927  85.307  89.503  1.00 61.39           O
ATOM    180  NE2 GLN    33       4.933  85.052  91.516  1.00 60.19           N
ATOM    181  N   ILE    34       0.944  83.215  86.357  1.00 20.02           N
ATOM    182  CA  ILE    34      -0.081  83.771  85.525  1.00 17.22           C
ATOM    183  C   ILE    34      -1.383  83.502  86.267  1.00 14.70           C
ATOM    184  O   ILE    34      -1.539  82.459  86.896  1.00 15.39           O
ATOM    185  CB  ILE    34      -0.028  83.175  84.105  1.00 17.97           C
ATOM    186  CG1 ILE    34      -0.870  84.028  83.171  1.00 20.61           C
ATOM    187  CG2 ILE    34      -0.531  81.772  84.085  1.00 18.49           C
ATOM    188  CD1 ILE    34      -0.960  83.489  81.775  1.00 22.77           C
ATOM    189  N   GLU    35      -2.276  84.478  86.271  1.00 12.14           N
ATOM    190  CA  GLU    35      -3.523  84.356  86.998  1.00  9.31           C
ATOM    191  C   GLU    35      -4.622  83.670  86.233  1.00  7.29           C
ATOM    192  O   GLU    35      -4.927  84.036  85.103  1.00  9.41           O
ATOM    193  CB  GLU    35      -3.984  85.735  87.437  1.00 11.57           C
ATOM    194  CG  GLU    35      -4.962  85.747  88.589  1.00 18.28           C
ATOM    195  CD  GLU    35      -5.559  87.124  88.838  1.00 22.13           C
ATOM    196  OE1 GLU    35      -5.075  88.110  88.247  1.00 29.08           O
ATOM    197  OE2 GLU    35      -6.539  87.229  89.597  1.00 25.32           O
ATOM    198  N   VAL    36      -5.212  82.674  86.880  1.00  9.63           N
ATOM    199  CA  VAL    36      -6.317  81.882  86.353  1.00  9.15           C
ATOM    200  C   VAL    36      -7.501  82.005  87.308  1.00 10.04           C
ATOM    201  O   VAL    36      -7.364  82.467  88.445  1.00  9.25           O
ATOM    202  CB  VAL    36      -5.934  80.379  86.175  1.00  7.62           C
ATOM    203  CG1 VAL    36      -4.928  80.217  85.029  1.00  6.60           C
ATOM    204  CG2 VAL    36      -5.389  79.806  87.457  1.00  6.96           C
ATOM    205  N   THR    37      -8.675  81.614  86.842  1.00 11.38           N
ATOM    206  CA  THR    37      -9.866  81.704  87.677  1.00 10.31           C
ATOM    207  C   THR    37      -9.822  80.707  88.818  1.00 11.93           C
ATOM    208  O   THR    37     -10.275  81.012  89.914  1.00 12.93           O
ATOM    209  CB  THR    37     -11.171  81.473  86.869  1.00  7.72           C
ATOM    210  OG1 THR    37     -11.133  80.170  86.263  1.00  8.07           O
ATOM    211  CG2 THR    37     -11.345  82.546  85.792  1.00  5.67           C
ATOM    212  N   ASN    38      -9.325  79.499  88.552  1.00 13.63           N
ATOM    213  CA  ASN    38      -9.262  78.488  89.588  1.00 11.33           C
ATOM    214  C   ASN    38      -8.104  77.534  89.361  1.00 10.27           C
ATOM    215  O   ASN    38      -7.661  77.377  88.244  1.00 12.09           O
ATOM    216  CB  ASN    38     -10.579  77.723  89.690  1.00 14.55           C
ATOM    217  CG  ASN    38     -10.705  77.036  91.011  1.00 18.78           C
ATOM    218  OD1 ASN    38     -11.333  77.514  91.920  1.00 25.55           O
ATOM    219  ND2 ASN    38     -10.121  75.876  91.116  1.00 27.49           N
ATOM    220  N   ALA    39      -7.627  76.903  90.428  1.00  8.87           N
ATOM    221  CA  ALA    39      -6.505  75.977  90.377  1.00 10.10           C
ATOM    222  C   ALA    39      -6.689  74.953  91.483  1.00 12.30           C
ATOM    223  O   ALA    39      -7.548  75.109  92.340  1.00 14.57           O
ATOM    224  CB  ALA    39      -5.189  76.726  90.590  1.00  6.40           C
ATOM    225  N   THR    40      -5.904  73.887  91.466  1.00 13.05           N
ATOM    226  CA  THR    40      -6.013  72.879  92.505  1.00 10.71           C
ATOM    227  C   THR    40      -4.617  72.394  92.846  1.00 11.57           C
ATOM    228  O   THR    40      -3.736  72.351  91.987  1.00 13.18           O
ATOM    229  CB  THR    40      -6.972  71.731  92.108  1.00 12.00           C
ATOM    230  OG1 THR    40      -7.134  70.844  93.213  1.00 18.49           O
ATOM    231  CG2 THR    40      -6.454  70.940  90.943  1.00 17.16           C
ATOM    232  N   GLU    41      -4.394  72.183  94.134  1.00  9.95           N
ATOM    233  CA  GLU    41      -3.112  71.747  94.658  1.00 11.51           C
ATOM    234  C   GLU    41      -2.846  70.257  94.405  1.00 12.88           C
ATOM    235  O   GLU    41      -3.712  69.404  94.674  1.00 17.08           O
ATOM    236  CB  GLU    41      -3.060  72.060  96.154  1.00  9.18           C
ATOM    237  CG  GLU    41      -1.765  71.698  96.834  1.00 10.08           C
ATOM    238  CD  GLU    41      -0.588  72.445  96.270  1.00 13.05           C
ATOM    239  OE1 GLU    41      -0.595  73.690  96.392  1.00 18.54           O
ATOM    240  OE2 GLU    41       0.318  71.812  95.677  1.00 13.57           O
ATOM    241  N   LEU    42      -1.652  69.950  93.898  1.00  9.69           N
ATOM    242  CA  LEU    42      -1.294  68.584  93.604  1.00  7.62           C
ATOM    243  C   LEU    42      -0.325  67.968  94.594  1.00  8.70           C
ATOM    244  O   LEU    42      -0.151  66.749  94.609  1.00 11.30           O
ATOM    245  CB  LEU    42      -0.714  68.471  92.194  1.00  8.30           C
ATOM    246  CG  LEU    42      -1.594  68.793  90.984  1.00  8.46           C
ATOM    247  CD1 LEU    42      -0.855  68.410  89.713  1.00  7.53           C
ATOM    248  CD2 LEU    42      -2.921  68.057  91.059  1.00  7.50           C
ATOM    249  N   VAL    43       0.310  68.800  95.412  1.00 10.16           N
ATOM    250  CA  VAL    43       1.277  68.328  96.396  1.00  8.47           C
ATOM    251  C   VAL    43       0.707  68.287  97.801  1.00 12.12           C
ATOM    252  O   VAL    43       0.314  69.316  98.331  1.00 13.85           O
ATOM    253  CB  VAL    43       2.530  69.227  96.454  1.00  6.04           C
ATOM    254  CG1 VAL    43       3.444  68.773  97.561  1.00  4.59           C
ATOM    255  CG2 VAL    43       3.266  69.219  95.149  1.00  3.81           C
ATOM    256  N   GLN    44       0.654  67.095  98.395  1.00 12.11           N
ATOM    257  CA  GLN    44       0.184  66.921  99.767  1.00 11.39           C
ATOM    258  C   GLN    44       1.345  67.338 100.674  1.00 14.23           C
ATOM    259  O   GLN    44       2.414  66.720 100.640  1.00 18.47           O
ATOM    260  CB  GLN    44      -0.159  65.452 100.021  1.00 12.84           C
ATOM    261  CG  GLN    44      -0.658  65.171 101.415  1.00 12.53           C
ATOM    262  CD  GLN    44      -1.973  65.878 101.710  1.00 14.94           C
ATOM    263  OE1 GLN    44      -2.809  66.068 100.830  1.00 18.22           O
ATOM    264  NE2 GLN    44      -2.159  66.273 102.950  1.00 17.85           N
ATOM    265  N   SER    45       1.166  68.369 101.488  1.00 15.68           N
ATOM    266  CA  SER    45       2.263  68.808 102.342  1.00 17.49           C
ATOM    267  C   SER    45       2.022  68.834 103.853  1.00 18.77           C
ATOM    268  O   SER    45       2.841  69.355 104.605  1.00 20.60           O
ATOM    269  CB  SER    45       2.801  70.157 101.858  1.00 17.87           C
ATOM    270  OG  SER    45       1.771  71.129 101.846  1.00 19.97           O
ATOM    271  N   SER    46       0.909  68.272 104.300  1.00 20.88           N
ATOM    272  CA  SER    46       0.624  68.216 105.727  1.00 24.90           C
ATOM    273  C   SER    46       0.278  66.792 106.158  1.00 25.89           C
ATOM    274  O   SER    46      -0.204  65.979 105.362  1.00 22.45           O
ATOM    275  CB  SER    46      -0.531  69.152 106.093  1.00 27.20           C
ATOM    276  OG  SER    46      -1.743  68.769 105.442  1.00 31.95           O
ATOM    277  N   SER    47       0.501  66.511 107.433  1.00 28.61           N
ATOM    278  CA  SER    47       0.198  65.204 107.976  1.00 30.19           C
ATOM    279  C   SER    47      -0.703  65.396 109.172  1.00 33.21           C
ATOM    280  O   SER    47      -0.797  66.498 109.726  1.00 32.75           O
ATOM    281  CB  SER    47       1.475  64.503 108.415  1.00 28.46           C
ATOM    282  OG  SER    47       1.163  63.308 109.103  1.00 29.14           O
ATOM    283  N   THR    48      -1.393  64.329 109.550  1.00 36.64           N
ATOM    284  CA  THR    48      -2.269  64.371 110.714  1.00 37.39           C
ATOM    285  C   THR    48      -1.428  64.328 111.987  1.00 33.31           C
ATOM    286  O   THR    48      -1.828  64.865 113.017  1.00 37.43           O
ATOM    287  CB  THR    48      -3.277  63.190 110.690  1.00 40.61           C
ATOM    288  OG1 THR    48      -2.579  61.934 110.629  1.00 42.64           O
ATOM    289  CG2 THR    48      -4.162  63.301 109.452  1.00 44.70           C
ATOM    290  N   GLY    49      -0.250  63.714 111.897  1.00 28.62           N
ATOM    291  CA  GLY    49       0.640  63.606 113.041  1.00 22.75           C
ATOM    292  C   GLY    49       0.413  62.322 113.803  1.00 19.67           C
ATOM    293  O   GLY    49       1.095  62.046 114.784  1.00 20.57           O
ATOM    294  N   LYS    50      -0.542  61.536 113.318  1.00 18.46           N
ATOM    295  CA  LYS    50      -0.919  60.272 113.913  1.00 19.62           C
ATOM    296  C   LYS    50      -1.001  59.278 112.811  1.00 16.92           C
ATOM    297  O   LYS    50      -1.297  59.639 111.685  1.00 15.19           O
ATOM    298  CB  LYS    50      -2.286  60.365 114.569  1.00 24.36           C
ATOM    299  CG  LYS    50      -2.280  61.221 115.795  1.00 32.92           C
ATOM    300  CD  LYS    50      -3.659  61.783 116.069  1.00 39.66           C
ATOM    301  CE  LYS    50      -3.576  62.881 117.131  1.00 44.67           C
ATOM    302  NZ  LYS    50      -2.714  64.056 116.686  1.00 48.88           N
ATOM    303  N   ILE    51      -0.779  58.021 113.173  1.00 15.91           N
ATOM    304  CA  ILE    51      -0.791  56.887 112.278  1.00 13.89           C
ATOM    305  C   ILE    51      -2.116  56.157 112.483  1.00 17.72           C
ATOM    306  O   ILE    51      -2.432  55.727 113.589  1.00 18.02           O
ATOM    307  CB  ILE    51       0.386  55.939 112.624  1.00 12.07           C
ATOM    308  CG1 ILE    51       1.722  56.611 112.321  1.00 12.41           C
ATOM    309  CG2 ILE    51       0.263  54.629 111.878  1.00 10.33           C
ATOM    310  CD1 ILE    51       2.926  55.756 112.678  1.00 13.03           C
ATOM    311  N   CYS    52      -2.893  56.017 111.419  1.00 20.39           N
ATOM    312  CA  CYS    52      -4.170  55.332 111.511  1.00 22.64           C
ATOM    313  C   CYS    52      -4.013  53.824 111.542  1.00 24.98           C
ATOM    314  O   CYS    52      -3.357  53.237 110.668  1.00 27.03           O
ATOM    315  CB  CYS    52      -5.070  55.728 110.352  1.00 22.38           C
ATOM    316  SG  CYS    52      -5.493  57.483 110.406  1.00 22.84           S
ATOM    317  N   ASN    53      -4.660  53.198 112.521  1.00 25.67           N
ATOM    318  CA  ASN    53      -4.595  51.758 112.679  1.00 28.06           C
ATOM    319  C   ASN    53      -5.341  50.984 111.612  1.00 28.57           C
ATOM    320  O   ASN    53      -5.343  49.747 111.624  1.00 29.65           O
ATOM    321  CB  ASN    53      -5.064  51.340 114.069  1.00 34.04           C
ATOM    322  CG  ASN    53      -6.446  51.883 114.428  1.00 39.14           C
ATOM    323  OD1 ASN    53      -7.166  52.462 113.587  1.00 42.04           O
ATOM    324  ND2 ASN    53      -6.816  51.715 115.695  1.00 40.57           N
ATOM    325  N   ASN    54      -5.991  51.721 110.712  1.00 29.20           N
ATOM    326  CA  ASN    54      -6.707  51.129 109.583  1.00 29.29           C
ATOM    327  C   ASN    54      -6.241  51.779 108.302  1.00 26.96           C
ATOM    328  O   ASN    54      -6.024  52.988 108.265  1.00 28.44           O
ATOM    329  CB  ASN    54      -8.212  51.320 109.723  1.00 35.76           C
ATOM    330  CG  ASN    54      -8.885  50.120 110.323  1.00 40.37           C
ATOM    331  OD1 ASN    54      -9.130  50.064 111.529  1.00 43.04           O
ATOM    332  ND2 ASN    54      -9.168  49.128 109.483  1.00 43.25           N
ATOM    333  N   PRO    55      -6.150  51.005 107.207  1.00 22.58           N
ATOM    334  CA  PRO    55      -6.467  49.582 107.092  1.00 21.79           C
ATOM    335  C   PRO    55      -5.324  48.582 107.330  1.00 22.14           C
ATOM    336  O   PRO    55      -5.552  47.369 107.300  1.00 24.75           O
ATOM    337  CB  PRO    55      -6.984  49.472 105.659  1.00 21.19           C
ATOM    338  CG  PRO    55      -6.383  50.688 104.923  1.00 22.15           C
ATOM    339  CD  PRO    55      -5.664  51.537 105.923  1.00 21.87           C
ATOM    340  N   HIS    56      -4.103  49.055 107.543  1.00 21.55           N
ATOM    341  CA  HIS    56      -2.979  48.145 107.713  1.00 18.52           C
ATOM    342  C   HIS    56      -2.859  47.708 109.160  1.00 17.50           C
ATOM    343  O   HIS    56      -3.150  48.486 110.068  1.00 17.92           O
ATOM    344  CB  HIS    56      -1.675  48.835 107.242  1.00 17.60           C
ATOM    345  CG  HIS    56      -1.783  49.486 105.889  1.00 16.13           C
ATOM    346  ND1 HIS    56      -1.768  48.745 104.727  1.00 16.95           N
ATOM    347  CD2 HIS    56      -1.955  50.795 105.581  1.00 15.33           C
ATOM    348  CE1 HIS    56      -1.928  49.612 103.746  1.00 12.46           C
ATOM    349  NE2 HIS    56      -2.044  50.858 104.209  1.00 15.78           N
ATOM    350  N   ARG    57      -2.482  46.453 109.378  1.00 16.91           N
ATOM    351  CA  ARG    57      -2.272  45.941 110.727  1.00 17.66           C
ATOM    352  C   ARG    57      -0.976  46.563 111.253  1.00 16.88           C
ATOM    353  O   ARG    57       0.128  46.165 110.870  1.00 17.98           O
ATOM    354  CB  ARG    57      -2.146  44.410 110.748  1.00 21.61           C
ATOM    355  CG  ARG    57      -2.057  43.861 112.183  1.00 31.80           C
ATOM    356  CD  ARG    57      -1.841  42.340 112.291  1.00 39.13           C
ATOM    357  NE  ARG    57      -2.045  41.805 113.652  1.00 47.25           N
ATOM    358  CZ  ARG    57      -1.342  42.126 114.756  1.00 50.56           C
ATOM    359  NH1 ARG    57      -0.350  43.014 114.721  1.00 51.19           N
ATOM    360  NH2 ARG    57      -1.597  41.504 115.910  1.00 52.25           N
ATOM    361  N   ILE    58      -1.125  47.581 112.084  1.00 17.36           N
ATOM    362  CA  ILE    58       0.002  48.283 112.677  1.00 16.14           C
ATOM    363  C   ILE    58       0.421  47.602 113.976  1.00 18.69           C
ATOM    364  O   ILE    58      -0.426  47.210 114.775  1.00 25.79           O
ATOM    365  CB  ILE    58      -0.375  49.749 113.016  1.00 16.08           C
ATOM    366  CG1 ILE    58      -0.966  50.453 111.792  1.00 13.26           C
ATOM    367  CG2 ILE    58       0.832  50.503 113.522  1.00 17.44           C
ATOM    368  CD1 ILE    58      -0.098  50.403 110.560  1.00 13.84           C
ATOM    369  N   LEU    59       1.715  47.387 114.160  1.00 18.86           N
ATOM    370  CA  LEU    59       2.197  46.792 115.392  1.00 15.59           C
ATOM    371  C   LEU    59       3.202  47.750 116.005  1.00 15.46           C
ATOM    372  O   LEU    59       4.241  48.036 115.418  1.00 18.17           O
ATOM    373  CB  LEU    59       2.843  45.419 115.159  1.00 14.37           C
ATOM    374  CG  LEU    59       3.369  44.731 116.433  1.00 12.72           C
ATOM    375  CD1 LEU    59       2.249  44.553 117.433  1.00 12.09           C
ATOM    376  CD2 LEU    59       3.974  43.401 116.108  1.00 12.74           C
ATOM    377  N   ASP    60       2.857  48.295 117.158  1.00 15.32           N
ATOM    378  CA  ASP    60       3.735  49.208 117.868  1.00 15.10           C
ATOM    379  C   ASP    60       4.775  48.374 118.640  1.00 15.30           C
ATOM    380  O   ASP    60       4.419  47.482 119.417  1.00 16.20           O
ATOM    381  CB  ASP    60       2.895  50.058 118.824  1.00 14.36           C
ATOM    382  CG  ASP    60       3.671  51.207 119.446  1.00 16.13           C
ATOM    383  OD1 ASP    60       4.923  51.223 119.408  1.00 15.32           O
ATOM    384  OD2 ASP    60       3.001  52.124 119.965  1.00 17.51           O
ATOM    385  N   GLY    61       6.052  48.628 118.370  1.00 15.53           N
ATOM    386  CA  GLY    61       7.126  47.908 119.030  1.00 16.96           C
ATOM    387  C   GLY    61       7.364  48.399 120.443  1.00 18.81           C
ATOM    388  O   GLY    61       8.013  47.721 121.248  1.00 20.76           O
ATOM    389  N   ILE    62       6.803  49.566 120.746  1.00 19.44           N
ATOM    390  CA  ILE    62       6.909  50.215 122.056  1.00 20.00           C
ATOM    391  C   ILE    62       8.365  50.465 122.425  1.00 18.96           C
ATOM    392  O   ILE    62       8.964  51.395 121.900  1.00 21.69           O
ATOM    393  CB  ILE    62       6.158  49.432 123.185  1.00 23.28           C
ATOM    394  CG1 ILE    62       4.718  49.127 122.759  1.00 23.75           C
ATOM    395  CG2 ILE    62       6.064  50.292 124.443  1.00 22.96           C
ATOM    396  CD1 ILE    62       4.078  48.034 123.573  1.00 26.74           C
ATOM    397  N   ASP    63       8.933  49.682 123.338  1.00 18.05           N
ATOM    398  CA  ASP    63      10.331  49.873 123.713  1.00 19.19           C
ATOM    399  C   ASP    63      11.173  48.691 123.299  1.00 16.89           C
ATOM    400  O   ASP    63      12.242  48.435 123.861  1.00 15.21           O
ATOM    401  CB  ASP    63      10.519  50.202 125.209  1.00 24.94           C
ATOM    402  CG  ASP    63       9.668  49.337 126.137  1.00 28.19           C
ATOM    403  OD1 ASP    63       9.309  48.188 125.777  1.00 31.85           O
ATOM    404  OD2 ASP    63       9.353  49.830 127.241  1.00 30.45           O
ATOM    405  N   CYS    64      10.686  48.007 122.270  1.00 12.66           N
ATOM    406  CA  CYS    64      11.353  46.870 121.688  1.00 10.90           C
ATOM    407  C   CYS    64      11.605  47.105 120.222  1.00 12.08           C
ATOM    408  O   CYS    64      10.739  47.639 119.517  1.00 16.14           O
ATOM    409  CB  CYS    64      10.466  45.656 121.776  1.00  9.91           C
ATOM    410  SG  CYS    64      10.335  45.076 123.466  1.00 11.20           S
ATOM    411  N   THR    65      12.807  46.755 119.776  1.00 11.48           N
ATOM    412  CA  THR    65      13.155  46.826 118.359  1.00  9.71           C
ATOM    413  C   THR    65      12.622  45.494 117.789  1.00 10.00           C
ATOM    414  O   THR    65      12.308  44.583 118.556  1.00 15.28           O
ATOM    415  CB  THR    65      14.697  46.868 118.143  1.00  6.85           C
ATOM    416  OG1 THR    65      15.322  45.800 118.863  1.00  9.93           O
ATOM    417  CG2 THR    65      15.277  48.166 118.591  1.00  3.77           C
ATOM    418  N   LEU    66      12.450  45.375 116.480  1.00  8.25           N
ATOM    419  CA  LEU    66      12.008  44.104 115.928  1.00  7.52           C
ATOM    420  C   LEU    66      13.003  42.973 116.289  1.00  7.54           C
ATOM    421  O   LEU    66      12.586  41.858 116.575  1.00 11.33           O
ATOM    422  CB  LEU    66      11.839  44.211 114.411  1.00  9.78           C
ATOM    423  CG  LEU    66      11.594  42.944 113.577  1.00  9.03           C
ATOM    424  CD1 LEU    66      10.375  42.191 114.068  1.00  8.35           C
ATOM    425  CD2 LEU    66      11.401  43.335 112.135  1.00  8.66           C
ATOM    426  N   ILE    67      14.305  43.256 116.305  1.00  7.74           N
ATOM    427  CA  ILE    67      15.307  42.241 116.642  1.00  7.06           C
ATOM    428  C   ILE    67      15.211  41.738 118.098  1.00  9.71           C
ATOM    429  O   ILE    67      15.331  40.539 118.349  1.00 12.03           O
ATOM    430  CB  ILE    67      16.761  42.716 116.304  1.00  7.61           C
ATOM    431  CG1 ILE    67      16.907  42.962 114.794  1.00  6.08           C
ATOM    432  CG2 ILE    67      17.798  41.683 116.751  1.00  5.65           C
ATOM    433  CD1 ILE    67      16.421  41.848 113.908  1.00  5.09           C
ATOM    434  N   ASP    68      14.994  42.635 119.057  1.00 10.41           N
ATOM    435  CA  ASP    68      14.866  42.224 120.452  1.00  9.51           C
ATOM    436  C   ASP    68      13.607  41.392 120.678  1.00 10.31           C
ATOM    437  O   ASP    68      13.610  40.475 121.502  1.00 14.74           O
ATOM    438  CB  ASP    68      14.875  43.420 121.389  1.00 11.56           C
ATOM    439  CG  ASP    68      16.264  43.976 121.606  1.00 13.91           C
ATOM    440  OD1 ASP    68      17.235  43.198 121.459  1.00 14.87           O
ATOM    441  OD2 ASP    68      16.387  45.195 121.899  1.00 17.28           O
ATOM    442  N   ALA    69      12.541  41.704 119.942  1.00  8.26           N
ATOM    443  CA  ALA    69      11.282  40.965 120.016  1.00  7.07           C
ATOM    444  C   ALA    69      11.471  39.588 119.372  1.00  8.01           C
ATOM    445  O   ALA    69      10.823  38.621 119.753  1.00 14.19           O
ATOM    446  CB  ALA    69      10.182  41.733 119.300  1.00  6.36           C
ATOM    447  N   LEU    70      12.339  39.522 118.367  1.00  8.60           N
ATOM    448  CA  LEU    70      12.655  38.276 117.681  1.00  6.94           C
ATOM    449  C   LEU    70      13.523  37.377 118.571  1.00  8.95           C
ATOM    450  O   LEU    70      13.278  36.184 118.666  1.00 13.99           O
ATOM    451  CB  LEU    70      13.400  38.566 116.385  1.00  6.58           C
ATOM    452  CG  LEU    70      14.032  37.401 115.612  1.00  6.99           C
ATOM    453  CD1 LEU    70      13.031  36.863 114.644  1.00  2.00           C
ATOM    454  CD2 LEU    70      15.272  37.862 114.890  1.00  3.16           C
ATOM    455  N   LEU    71      14.560  37.932 119.193  1.00  8.62           N
ATOM    456  CA  LEU    71      15.434  37.137 120.055  1.00  7.99           C
ATOM    457  C   LEU    71      14.742  36.703 121.350  1.00  9.34           C
ATOM    458  O   LEU    71      15.022  35.625 121.889  1.00 11.06           O
ATOM    459  CB  LEU    71      16.713  37.902 120.385  1.00  7.84           C
ATOM    460  CG  LEU    71      17.576  38.259 119.179  1.00  6.85           C
ATOM    461  CD1 LEU    71      18.757  39.072 119.623  1.00  6.69           C
ATOM    462  CD2 LEU    71      18.028  37.006 118.475  1.00  4.91           C
ATOM    463  N   GLY    72      13.856  37.554 121.852  1.00  8.24           N
ATOM    464  CA  GLY    72      13.134  37.236 123.056  1.00  4.20           C
ATOM    465  C   GLY    72      13.703  37.913 124.267  1.00  9.35           C
ATOM    466  O   GLY    72      13.886  37.271 125.300  1.00 13.34           O
ATOM    467  N   ASP    73      14.094  39.172 124.113  1.00 11.32           N
ATOM    468  CA  ASP    73      14.612  39.971 125.220  1.00 10.18           C
ATOM    469  C   ASP    73      13.427  39.944 126.196  1.00 13.04           C
ATOM    470  O   ASP    73      12.316  40.287 125.800  1.00 15.77           O
ATOM    471  CB  ASP    73      14.847  41.388 124.703  1.00  8.82           C
ATOM    472  CG  ASP    73      15.320  42.340 125.768  1.00  9.52           C
ATOM    473  OD1 ASP    73      14.804  42.350 126.902  1.00 15.85           O
ATOM    474  OD2 ASP    73      16.202  43.142 125.460  1.00 15.99           O
ATOM    475  N   PRO    74      13.635  39.526 127.466  1.00 14.37           N
ATOM    476  CA  PRO    74      12.522  39.473 128.426  1.00 12.63           C
ATOM    477  C   PRO    74      11.437  40.571 128.357  1.00 13.86           C
ATOM    478  O   PRO    74      10.247  40.255 128.422  1.00 15.34           O
ATOM    479  CB  PRO    74      13.241  39.422 129.774  1.00 10.96           C
ATOM    480  CG  PRO    74      14.440  38.613 129.468  1.00 10.36           C
ATOM    481  CD  PRO    74      14.906  39.181 128.137  1.00 12.18           C
ATOM    482  N   HIS    75      11.815  41.839 128.184  1.00 13.77           N
ATOM    483  CA  HIS    75      10.798  42.901 128.108  1.00 14.05           C
ATOM    484  C   HIS    75       9.998  42.964 126.797  1.00 15.14           C
ATOM    485  O   HIS    75       9.022  43.716 126.683  1.00 16.06           O
ATOM    486  CB  HIS    75      11.363  44.287 128.508  1.00 14.49           C
ATOM    487  CG  HIS    75      12.088  45.028 127.420  1.00 15.72           C
ATOM    488  ND1 HIS    75      13.366  44.699 127.008  1.00 16.17           N
ATOM    489  CD2 HIS    75      11.755  46.142 126.731  1.00 14.34           C
ATOM    490  CE1 HIS    75      13.785  45.577 126.119  1.00 16.35           C
ATOM    491  NE2 HIS    75      12.824  46.468 125.931  1.00 15.88           N
ATOM    492  N   CYS    76      10.370  42.116 125.843  1.00 14.34           N
ATOM    493  CA  CYS    76       9.708  42.052 124.550  1.00 12.58           C
ATOM    494  C   CYS    76       8.935  40.764 124.410  1.00 13.66           C
ATOM    495  O   CYS    76       8.458  40.442 123.327  1.00 15.62           O
ATOM    496  CB  CYS    76      10.736  42.143 123.437  1.00 10.76           C
ATOM    497  SG  CYS    76      11.736  43.647 123.608  1.00  8.33           S
ATOM    498  N   ASP    77       8.786  40.044 125.515  1.00 15.60           N
ATOM    499  CA  ASP    77       8.071  38.772 125.527  1.00 18.52           C
ATOM    500  C   ASP    77       6.641  38.895 125.043  1.00 15.92           C
ATOM    501  O   ASP    77       6.056  37.934 124.548  1.00 16.25           O
ATOM    502  CB  ASP    77       8.095  38.147 126.932  1.00 22.74           C
ATOM    503  CG  ASP    77       9.331  37.275 127.178  1.00 25.55           C
ATOM    504  OD1 ASP    77       9.981  36.806 126.199  1.00 25.96           O
ATOM    505  OD2 ASP    77       9.639  37.041 128.367  1.00 29.00           O
ATOM    506  N   VAL    78       6.091  40.092 125.167  1.00 16.28           N
ATOM    507  CA  VAL    78       4.721  40.364 124.743  1.00 17.71           C
ATOM    508  C   VAL    78       4.501  40.245 123.232  1.00 17.13           C
ATOM    509  O   VAL    78       3.381  40.014 122.799  1.00 22.17           O
ATOM    510  CB  VAL    78       4.234  41.770 125.252  1.00 18.39           C
ATOM    511  CG1 VAL    78       5.089  42.900 124.668  1.00 18.08           C
ATOM    512  CG2 VAL    78       2.762  41.984 124.918  1.00 19.95           C
ATOM    513  N   PHE    79       5.573  40.332 122.446  1.00 17.04           N
ATOM    514  CA  PHE    79       5.472  40.242 120.998  1.00 13.60           C
ATOM    515  C   PHE    79       5.606  38.845 120.459  1.00 14.61           C
ATOM    516  O   PHE    79       5.604  38.636 119.248  1.00 18.88           O
ATOM    517  CB  PHE    79       6.510  41.140 120.344  1.00 12.68           C
ATOM    518  CG  PHE    79       6.337  42.577 120.681  1.00 14.04           C
ATOM    519  CD1 PHE    79       5.311  43.313 120.107  1.00 15.21           C
ATOM    520  CD2 PHE    79       7.162  43.188 121.617  1.00 13.90           C
ATOM    521  CE1 PHE    79       5.108  44.637 120.466  1.00 16.22           C
ATOM    522  CE2 PHE    79       6.971  44.510 121.983  1.00 13.35           C
ATOM    523  CZ  PHE    79       5.944  45.237 121.410  1.00 17.06           C
ATOM    524  N   GLN    80       5.713  37.872 121.342  1.00 16.26           N
ATOM    525  CA  GLN    80       5.863  36.509 120.892  1.00 18.65           C
ATOM    526  C   GLN    80       4.748  36.133 119.968  1.00 20.29           C
ATOM    527  O   GLN    80       3.582  36.333 120.293  1.00 23.21           O
ATOM    528  CB  GLN    80       5.860  35.546 122.059  1.00 21.01           C
ATOM    529  CG  GLN    80       7.150  35.544 122.807  1.00 24.76           C
ATOM    530  CD  GLN    80       7.210  34.443 123.829  1.00 27.90           C
ATOM    531  OE1 GLN    80       6.371  33.524 123.829  1.00 29.59           O
ATOM    532  NE2 GLN    80       8.221  34.496 124.690  1.00 31.72           N
ATOM    533  N   ASN    81       5.129  35.650 118.789  1.00 21.13           N

STRUCTURE VALIDATION

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map, a Ramachandran diagram or a [φ,ψ] plot), developed by Gopalasamudram Narayana Ramachandran and Viswanathan Sasisekharan, is a way to visualize the backbone dihedral angles ψ against φ of amino acid residues in a protein structure. It shows the possible conformations of the ψ and φ angles for a polypeptide. Mathematically, the Ramachandran plot is the visualization of a function f: [-π, π) × [-π, π) → R+. The domain of this function is the torus. Hence, the conventional Ramachandran plot is a projection of the torus onto the plane, resulting in a distorted view and the presence of discontinuities.
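Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute such an angle in plain Python. It is an illustration, not the routine used by the validation software, and the function name `dihedral` is hypothetical. The sample coordinates are those printed for residues ASP 31, ASP 32 and GLN 33 in the ATM file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)          # unit vector along the central bond
    # components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Backbone atoms as printed in the ATM file (serials 158, 164, 165, 166, 172).
c_prev = (5.353, 81.336, 83.886)   # C  of ASP 31
n = (6.041, 82.056, 84.756)        # N  of ASP 32
ca = (6.305, 81.561, 86.098)       # CA of ASP 32
c = (5.035, 81.781, 86.953)        # C  of ASP 32
n_next = (4.325, 82.876, 86.692)   # N  of GLN 33

phi = dihedral(c_prev, n, ca, c)
psi = dihedral(n, ca, c, n_next)
print(round(phi, 1), round(psi, 1))
```

Repeating this for every residue and plotting ψ against φ gives the scatter of points that the validation software classifies into core, allowed and disallowed regions.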


Chapter 5 HOMOLOGY MODELLING STEPS

1) homology target

2) target template alignment

3) backbone matching

4) side chain replacement

5) grouping

6) energy satisfaction

7) final model generation

RAMACHANDRAN PLOT FOR OTHER MODELS

Table:

Model   Core (%)   Allowed (%)   DOPE value
1       86.1       1.2           -39286.523438
2       87.9       0.9           -39286.523438
3       86.7       0.9           -39011.328125
4       86.4       0.9           -39214.257812
5       87.9       0.6           39214.257812

The best model is therefore model no. 2, because the DOPE score should be as low as possible and the percentage of residues in the favoured (core) region should be as high as possible.
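The selection rule stated above (lowest DOPE score, then highest core percentage) can be written as a short sort key. This is a sketch under assumptions: the pairing of core percentages with DOPE values follows the order in which they are printed in the table, only the first four models are included because their values appear together, and the dictionary keys are hypothetical names.

```python
# Values as printed in the table; the row pairing is assumed from print order.
models = [
    {"model": 1, "core": 86.1, "dope": -39286.523438},
    {"model": 2, "core": 87.9, "dope": -39286.523438},
    {"model": 3, "core": 86.7, "dope": -39011.328125},
    {"model": 4, "core": 86.4, "dope": -39214.257812},
]

# Lowest DOPE first; ties are broken by the highest core-region percentage.
best = min(models, key=lambda m: (m["dope"], -m["core"]))
print(best["model"])
```

Under this pairing, models 1 and 2 share the lowest DOPE value, and the higher core percentage of model 2 decides the choice.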

The best Ramachandran plot taken for Modelled H1N1 complex

CHAPTER 6 LEAD IDENTIFICATION


An important step in this work was to identify a molecule of interest for docking, the ligand. Each protein active site has a unique size and shape, which determines the complexity, size and shape of a binding ligand. Because the size and shape of an active site are biologically determined, the substrate dictates the size, shape and chemical make-up of a target site. This relationship determines the complementary size, shape and complexity of possible ligands.5 For this research twenty-five ligands were created, based upon analysis of two FDA-approved influenza medications, Relenza (Zanamivir) and Tamiflu (Oseltamivir).

Figure : Zanamivir

Figure : Oseltamivir

Similarities in these two molecules were identified. They include: a six-membered ring, possibly containing one double bond or a heteroatom; a carbonyl or carboxyl group adjacent to the double bond or heteroatom; an amino group in the 3-position on the ring relative to the carbonyl or carboxyl group position; and an amide group in the 4-position on the ring and in an anti-configuration relative to the amino group. The similarities were combined into two new molecules, which will be referred to as the basic structures. (See Figure 2.2) After finding the basic structure with the six-membered ring, a second basic structure was created using a five-membered ring, even though neither Zanamivir nor Oseltamivir contains a five-membered ring. The five-membered ring structure was created to investigate the dependence of ligand-protein complexation on ring size.

DEVELOPMENT OF LEAD INTO ACTIVE SITE

HEX (protein-ligand docking)

1) File-open-receptor-(best model).pdb

2) File-open-ligand-lead.pdb

3) Attach the ligand in the pocket, i.e. the active site of the protein.

4) Controls-matching-correlation type-shape and skin-activate-dismiss.

5) File-save as-both.pdb

6) View the both.pdb file in SPDBV (Swiss-PdbViewer) to check that the ligand lies in the pocket.

7) Repeat the steps of Hex until the ligand lies in the pocket.

8) Save this both.pdb as complex.pdb.

9) Make ligand.pdb from complex.pdb.

10) Convert the pdb format to mol2 format. This gives Ligandmol2.
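Making ligand.pdb from complex.pdb amounts to separating the ligand's coordinate records from those of the protein. The sketch below shows one way this could be done in plain Python. It is an assumption-laden illustration: the function name `split_complex` and the residue name `LIG` are hypothetical, the ligand coordinates are invented for the example, and the code assumes fixed-column PDB records in which the residue name occupies columns 18 to 20.

```python
def split_complex(pdb_lines, ligand_resname):
    """Split coordinate records into (receptor_lines, ligand_lines)."""
    receptor, ligand = [], []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue                              # ignore headers, TER, END, ...
        if line[17:20].strip() == ligand_resname:
            ligand.append(line)
        else:
            receptor.append(line)
    return receptor, ligand


# One protein atom and one hypothetical ligand atom (residue name LIG is assumed).
complex_lines = [
    "ATOM    155  CG2 THR    30       5.649  80.387  76.658  1.00  8.90           C",
    "HETATM    1  C1  LIG     1      10.000  45.000 120.000  1.00  0.00           C",
    "END",
]

receptor_lines, ligand_lines = split_complex(complex_lines, "LIG")
print(len(receptor_lines), len(ligand_lines))
```

The ligand records returned here would be written to ligand.pdb; the residue name actually written by Hex should be checked in the saved complex before relying on it.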

LIGBUILDER

The lead molecule is then grown in the pocket with the help of LigBuilder.

To pocket.index, 2 input files are given:
1) Complex.pdb
2) Ligandmol2 (Sybyl mol2)

5 output files are thus obtained:
1) Pocket_Atom.txt
2) Pocket_gridfile.txt
3) Pocket_keysite.txt
4) Pharmacophore.pdb
5) Pharmacophore.txt

To Grow.index, 3 input files are given:
1) Ligandmol2
2) Pocket_gridfile.txt
3) Pocket_Atom.txt

2 output files are obtained:
1) Population record file.lig
2) Ligand collection file.lig

The ligand generation cycle consists of the following steps:
1) Parent selection
2) Elite selection
3) Growing by mutation
4) Filter application
5) Final ligand
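The cycle listed above follows the general pattern of a genetic algorithm. The toy sketch below illustrates that pattern only. It is not LigBuilder's implementation: the fragment list, the scoring function, the size filter and every function name are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

FRAGMENTS = ["CH3", "OH", "NH2", "COOH"]     # hypothetical building blocks


def score(ligand):
    """Hypothetical fitness: reward polar fragments, penalise size."""
    polar = sum(1 for fragment in ligand if fragment != "CH3")
    return polar - 0.1 * len(ligand)


def mutate(ligand, rng):
    """Growing by mutation: append one randomly chosen fragment."""
    return ligand + (rng.choice(FRAGMENTS),)


def passes_filter(ligand, max_fragments=8):
    """Filter application: reject ligands that have grown too large."""
    return len(ligand) <= max_fragments


def grow(seed, generations=15, pop_size=10, n_elite=2, rng=None):
    rng = rng or random.Random(0)            # fixed seed for reproducibility
    population = [seed]
    best_scores = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(1, len(ranked) // 2)]    # parent selection
        elite = ranked[:n_elite]                        # elite selection
        children = []
        for _attempt in range(pop_size * 20):           # bounded number of tries
            if len(elite) + len(children) >= pop_size:
                break
            child = mutate(rng.choice(parents), rng)
            if passes_filter(child):
                children.append(child)
        population = elite + children
        best_scores.append(max(score(ligand) for ligand in population))
    final_ligand = max(population, key=score)
    return final_ligand, best_scores


best_ligand, best_scores = grow(seed=("CH3",))
print(best_ligand, best_scores[-1])
```

Because the elite are copied unchanged into each new population, the best score can never decrease from one generation to the next, which is the purpose of the elite selection step.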

To process.index, Ligand collection file.lig is given as the input file. 10 files, i.e. results.mdb, are obtained as output. These mdb files are converted to pdb format.

CHAPTER: 7

DOCKING ANALYSIS

AutoDock Procedure and Theory

The second section of the research used AutoDock 4.1,10 which was run on a Linux platform, Ubuntu. AutoDock 4.1 performs automated docking by simulation. It employs a physically detailed docking technique that can incorporate flexible docking. AutoDock 4.1 gives good results when predicting rankings for a series of similar molecules.10,11 It contains a suite of automated docking tools. Its purpose is to predict how small molecules, such as drug candidates, bind to a receptor of known three-dimensional structure. It consists of two main programs: AutoDock and AutoGrid. AutoGrid pre-calculates a set of grids.

AutoDock is responsible for the docking of the ligand to the protein. AutoDock 4.1 employs a graphical user interface, AutoDockTools. This allows the user to modify the ligand before a docking and to visually analyze dockings after completion.

Ligand files in the .mol2 format were first opened in AutoDock for preparation. Once opened, charges were added and all non-polar hydrogen atoms were merged. Next, bonds within the ligand were set as rotatable or fixed. After the root atom of the ligand was detected and all torsions were selected and set, the file was saved as a .pdbqt file.

The protein file was also prepared for docking. Protein files can be found online at the Protein Data Bank website.12 The PDB website contains an archive housing information for experimentally determined structures of proteins, nucleic acids and other complex assemblies. Structures can be searched based upon sequence, structure or function. Each molecule can be viewed and downloaded for further analysis. Each structure has a unique four-character ID, which can be used to import the structure directly into AutoDock, or the structure file may be downloaded from the website and opened from the saved PDB file. In this project, the 2HTV protein was imported directly into AutoDock. Once the protein file was opened in AutoDock, the excess water molecules were isolated and deleted from the structure. All hydrogen atoms were added to the protein structure. These changes were then saved. The rigid and flexible residues of the protein were selected, and two additional files were created: file_rigid.pdbqt and file_flex.pdbqt.

A set of grid maps was constructed using the AutoGrid function. Both the protein and the appropriate ligand files were chosen for the mapping. A grid box was then used to select which area of the protein structure was to be mapped. Ideally this grid box is located at the active site. In situations where the active site of a protein is unknown, it is possible for the grid box to encompass the entire protein, enabling blind docking. Because the active site of the 2HTV protein was unknown, a grid box covering the entire protein structure was implemented.

The final step in submitting the docking is to run the AutoDock function. To prepare for this, the rigid protein and ligand files were selected. The Lamarckian genetic algorithm was set up, which controlled the number of scans, the number of mutations, and the number of conformations returned. The docking parameters were set, and a docking parameter file (.dpf) was created. Finally, AutoDock was launched, and the resulting docking conformations were written to the docking log (.dlg) file.

After the docking completed, the docking log file was opened for viewing and the returned conformations were analyzed. The conformations are sorted by the software from best to worst, based upon their docked energy. Selecting analyze clusterings allows the user to view the ligand at its docked location, which ideally would be within the active site of the protein. Each resulting set of dockings can be viewed as spheres, to visualize where each of the dockings occurred. Isocontour maps can be created to display the interactions between oxygen atoms in the protein and the ligand. Hydrogen bonding interactions can also be modeled.
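For blind docking, the grid box has to enclose every atom of the protein. The sketch below shows how a box centre and the number of grid points per axis could be derived from atomic coordinates. It is illustrative only: the function name `grid_box` and the `padding` parameter are hypothetical, the 0.375 Å spacing is AutoGrid's usual default, and rounding the point counts up to even numbers is an assumption about what the grid parameter file expects.

```python
import math


def grid_box(coords, spacing=0.375, padding=0.0):
    """Return (center, npts) of a box enclosing all coordinates.

    coords  : iterable of (x, y, z) tuples in angstroms
    spacing : distance between grid points in angstroms
    padding : extra margin added on every side, in angstroms
    """
    coords = list(coords)
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    npts = []
    for lo, hi in zip(mins, maxs):
        n = math.ceil((hi - lo + 2.0 * padding) / spacing)
        if n % 2:                 # assumed requirement: even number of points
            n += 1
        npts.append(n)
    return center, tuple(npts)


# Two corner points standing in for the extremes of a protein's coordinates.
center, npts = grid_box([(0.0, 0.0, 0.0), (3.0, 6.0, 9.0)])
print(center, npts)   # (1.5, 3.0, 4.5) (8, 16, 24)
```

For a whole protein the resulting point counts can become large, so in practice the spacing is often increased for blind docking; that trade-off is not evaluated here.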

Preparing a Ligand for AutoDock

Ligand files are opened in AutoDock for preparation. The previously saved .mol2 files are compatible with AutoDock. Charges are added, and all non-polar hydrogens are merged. Next, bonds within the ligand are set as rotatable or fixed. After the root is detected and all torsions are selected and set, the file must be saved as a .pdbqt file type.

-Choose .mol2 in the dropdown box. Select the ligand. Charges will be added and non-polar hydrogens merged if needed. Click OK.
-A sphere will appear. (See Figure G.1)
-Bonds can then be selected as rotatable or fixed. Click on purple bonds to activate. Rotatable bonds are green, fixed bonds are red. Click Done.
-Choose rotations based upon moving the fewest atoms or most atoms, as well as the number of torsions allowed. Click Dismiss.
-Save .pdbqt: enter filename.pdbqt to save the ligand. Click Save.

Preparing a PDB file for AutoDock

Once a protein file is opened in AutoDock, the excess water molecules should be isolated and deleted from the structure. All hydrogen atoms should be added to the protein structure. These changes must be saved. The rigid and flexible residues of the protein are selected, and two additional files are created: file_rigid.pdbqt and file_flex.pdbqt.
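The water-removal step can also be carried out on the PDB text itself before the file is opened in AutoDock. The following is a minimal sketch rather than the AutoDockTools procedure: `remove_waters` is a hypothetical helper, the water record is an invented example, and the code assumes fixed-column PDB records in which water molecules carry the residue name HOH.

```python
def remove_waters(pdb_lines):
    """Return the PDB lines without water (HOH) coordinate records."""
    kept = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if is_coordinate and line[17:20] == "HOH":
            continue                      # drop crystallographic water
        kept.append(line)
    return kept


# One protein atom from the ATM file plus one hypothetical water record.
pdb_lines = [
    "ATOM    156  N   ASP    31       5.327  81.123  81.387  1.00 21.51           N",
    "HETATM 2001  O   HOH   301      10.000  10.000  10.000  1.00 20.00           O",
    "END",
]

dry_lines = remove_waters(pdb_lines)
print(len(dry_lines))
```

Hydrogen addition and the split into rigid and flexible residues are left to AutoDockTools, since they depend on chemistry that a text filter cannot supply.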

-Open protein: If the file is saved on the computer, right click PMV Molecules. Choose the protein file (.pdb). Click Open. If the file will be downloaded from the Protein Data Bank, enter its four-character ID and click OK.
-Remove water: select HOH*. In the Atom box, type *. Click Add. Click Dismiss. (See Figure H.1)
-Add hydrogens: choose the method of noBondOrder, and yes to renumber. Click OK.
-Save PDB: enter filename.pdb. Write ATOM and HETATM records. Choose Sort Nodes. Click OK. (See Figure H.2)
-Choose macromolecule: Flexible Residue Macromolecule. Select the protein. Click Select Molecule. Click Yes to merge non-polar hydrogens. Click OK.
-In the selection form, enter the desired flexible residue in the Residue field and click Add. Click Dismiss.
-Selected Residues: click on a desired bond to inactivate it. Click Close. Clear the selection using the Pencil Eraser icon.
-Save PDBQT: enter filename_flex.pdbqt. Enter filename_rigid.pdbqt.
-Remove protein: select the protein and click Delete Molecule. Click Continue. Click Dismiss.

Figure H.1: Removing water from macromolecule

Figure H.2: Saving PDB file

Running AutoGrid

A grid map must be set up using AutoGrid. Both the protein and the appropriate ligand files are chosen. A grid box is used to select which area of the protein structure is to be mapped. This grid box ideally is positioned so as to include the active site within its boundaries. When the active site of a protein is unknown, the grid box can encompass the entire protein. A grid parameter file (.gpf) needs to be created. Then, AutoGrid can be run.

Choose filename_rigid.pdbqt. Click Open. Click Yes to preserve the input charges. Click OK to warning boxes. Choose/open ligand file. Position the grid box over the active site of the protein. If the active site is unknown, the grid box can encompass the entire protein. Record the grid settings to Saving Current. (See Figure I.1) filename.gpf. Click Launch.

Figure I.1: Grid box
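When the active site is unknown and the grid box has to cover the whole protein, its centre and size can be estimated from the atom coordinates. The Python sketch below is only an illustration and is not part of the AutoDockTools procedure described above. It reads the fixed-column x, y, z fields of ATOM/HETATM records and reports the box centre and the number of grid points per axis. The names `grid_box` and `pdb_line`, the 0.375 Å spacing (assumed here as a typical AutoGrid spacing), the rounding to even point counts, and the two sample atoms are all assumptions made for the example.

```python
import math


def pdb_line(serial, x, y, z):
    """Build a minimal ATOM record with coordinates in PDB columns 31-54 (hypothetical sample data)."""
    return f"ATOM  {serial:>5}  CA  ALA A{serial:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


def grid_box(pdb_text, spacing=0.375, padding=0.0):
    """Return (center, npts) of a box enclosing every ATOM/HETATM record in the text."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # x, y, z occupy columns 31-38, 39-46 and 47-54 of the record
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    lows = [min(c[i] for c in coords) - padding for i in range(3)]
    highs = [max(c[i] for c in coords) + padding for i in range(3)]
    center = tuple(round((lo + hi) / 2, 3) for lo, hi in zip(lows, highs))
    npts = tuple(math.ceil((hi - lo) / spacing) for lo, hi in zip(lows, highs))
    # Assumption: grid point counts are rounded up to even numbers
    npts = tuple(n + (n % 2) for n in npts)
    return center, npts


sample = "\n".join([pdb_line(1, 0.0, 0.0, 0.0), pdb_line(2, 3.0, 1.5, 6.0)])
center, npts = grid_box(sample)
print("center:", center, "npts:", npts)
```

The resulting centre and point counts are the kind of values that would be entered in the grid box dialog before the .gpf file is saved; a non-zero `padding` leaves a margin around the protein.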

Running AutoDock

The final step in submitting the docking is to run AutoDock. To set up this procedure, the rigid protein and ligand files are selected. The genetic algorithm is set up, which controls the number of scans, the number of mutations, and the number of conformations returned. The docking parameters are set, and a docking parameter file (.dpf) is created. AutoDock is launched, and the resulting docking conformations are returned in the docking log file (.dlg).

Filename Select filename_rigid.pdbqt. Click Open. Click Select Ligand. Residue filename Select filename_flexible.pdbqt. Click Open. Accept defaults. Click Close. filename.dpf. Click Save. Click Launch. ian GA Enter

Analyzing Docking Results

After the docking is completed, the docking log file (.dlg) is opened for viewing and the returned conformations are analyzed. By opening clusterings, the ligand is viewed at its docked location. Each resulting set of dockings can be viewed as spheres, to visualize where each of the dockings occurred. Isocontour maps can be created to display the interactions between the oxygen atoms in the protein and the ligand. Hydrogen bonding interactions can also be modeled.

-Open results: Analy dlg

Conformation Chooser, displaying energies and clusters of the results, ranked from best to worst based upon lowest energy in cluster and best individual per cluster. (See Figure K.1)

ligand and receptor can be viewed (See Figure K.2). docked conformation is shown as a sphere (See Figure K.3). Alters the display to show interactions between ligand and receptor.

FINAL STEP OF DRUG DESIGNING: ADME-TOX ANALYSIS

The best docked file, that is, the one with the most favourable binding energy, is taken. The docked file selected was file no. 10, with a binding energy of -3.66 kJ/mol. This docked file is viewed in Swiss-PdbViewer 4.0.4 (spdbv 4.0.4). The structure is then drawn in Molsoft to check whether it satisfies the rule of five for drugs.

The docked file viewed in Swiss-PdbViewer 4.0.4

The structure is drawn in Molsoft and its molecular properties are calculated

Fig: Molecular properties of the chosen docked file

Fig: Drug-likeness model score

Chapter 8 Results and Discussion

The five modelled structures of the H1N1 virus protein generated by Modeller 9.10 contain 96% to 98% of residues in the core region of the Ramachandran plot, and the overall G-factor ranges from -0.09 to -0.04. Z-scores were within the acceptable range, and the energy functions of the residues were at a minimum, as analyzed by ProSA. Binding pocket determination of the modelled H1N1 virus protein with the Ligsite program revealed three potential binding sites, pkt-139, pkt-48 and pkt-26, of which pkt-48 was found to be the major cleft, with critical residues D13, T26 and G27. Ligand docking predicted binding of the generated derivatives at the substrate-binding cleft with negative interaction energy. Pharmacokinetic property analysis of the optimized lead molecules, performed with the Molsoft LLC program, predicted minimal numbers of hydrogen bond acceptors and hydrogen bond donors and a minimal molecular weight. The partition coefficient (CLogP) and solubility (CLogS) were also found to be minimal for the designed ligand. These properties suggest good absorption and easy transport of the molecule across the membrane, so that, according to the rule of five, the compound could possibly behave as a drug. The pharmacodynamic properties were calculated using the PASS program at Pa > Pi.

Docking     Model no.     ΔG bind (kJ/mol)
Lig-1                     -2.01
Lig-2                     -2.19
Lig-3                     -2.19
Lig-4                     -3.28
Lig-5                     -3.17
Lig-6                     -3.07
Lig-7                     -3.21
Lig-8                     -0.96
Lig-9       10            -3.69
Lig-10      10            -3.66

Molecular formula: C29 H58
Molecular weight: 406.45
Number of HBA: 0
Number of HBD: 0
MolLogP: 5.66
MolLogS: -8.14 (in Log(moles/L)), 21.96 (in mg/L)
MolPSA: 0.00 Å²
MolVol: 530.13 Å³
Number of stereo centers: 14
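As a quick consistency check on the values above, the molecular weight can be recomputed from the molecular formula. The sketch below is an illustration only: the helper `formula_mass` and the two small mass tables are assumptions introduced for the example, and it handles only simple formulas without brackets. It suggests that the reported 406.45 corresponds to the monoisotopic mass of C29H58, while the average molar mass is slightly higher.

```python
import re

# Masses in unified atomic mass units: most abundant isotope vs. standard average
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula, table):
    """Sum element masses for a simple formula such as 'C29 H58' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += table[symbol] * (int(count) if count else 1)
    return total


print(round(formula_mass("C29 H58", MONOISOTOPIC), 2))  # monoisotopic mass
print(round(formula_mass("C29 H58", AVERAGE), 2))       # average molar mass
```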

Project Title:

REVISITING LIPINSKI'S RULE: A DEVIATION STUDY FOCUSING ON HEPATIC ESTABLISHED LIGANDS

INTRODUCTION

Lipinski's rule of five


Lipinski's rule of five, also known as Pfizer's rule of five or simply the rule of five (RO5), is a rule of thumb to evaluate drug-likeness, that is, to determine whether a chemical compound with a certain pharmacological or biological activity has properties that would make it a likely orally active drug in humans. The rule was formulated by Christopher A. Lipinski in 1997, based on the observation that most orally administered drugs are relatively small and lipophilic molecules.

The rule describes molecular properties important for a drug's pharmacokinetics in the human body, including their absorption, distribution, metabolism, and excretion ("ADME"). However, the rule does not predict if a compound is pharmacologically active.

The rule is important to keep in mind during drug discovery, when a pharmacologically active lead structure is optimized step-wise to increase the activity and selectivity of the compound while ensuring that drug-like physicochemical properties, as described by Lipinski's rule, are maintained. Candidate drugs that conform to the RO5 tend to have lower attrition rates during clinical trials and hence have an increased chance of reaching the market.

Components of the rule

Lipinski's rule states that, in general, an orally active drug has no more than one violation of the following criteria:

- Not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms)
- Not more than 10 hydrogen bond acceptors (nitrogen or oxygen atoms)
- A molecular mass less than 500 daltons
- An octanol-water partition coefficient log P not greater than 5

Note that all numbers are multiples of five, which is the origin of the rule's name. As with many other rules of thumb (such as Baldwin's rules for ring closure or Murphy's law), there are many exceptions to Lipinski's rule.

Variants

In an attempt to improve the predictions of drug-likeness, the rules have spawned many extensions, for example the following:

- Partition coefficient log P in the -0.4 to +5.6 range
- Molar refractivity from 40 to 130
- Molecular weight from 180 to 500
- Number of atoms from 20 to 70 (includes H-bond donors [e.g., OH's and NH's] and H-bond acceptors [e.g., N's and O's])
- Polar surface area no greater than 140 Å²

The 500 molecular weight cutoff has also been questioned. Polar surface area and the number of rotatable bonds have been found to discriminate better between compounds that are orally active and those that are not, for a large data set of compounds in the rat. In particular, compounds which meet only the two criteria of 10 or fewer rotatable bonds and polar surface area equal to or less than 140 Å² are predicted to have good oral bioavailability.

Lead-like

During drug discovery, lipophilicity and molecular weight are often increased in order to improve the affinity and selectivity of the drug candidate. Hence it is often difficult to maintain drug-likeness (i.e., RO5 compliance) during hit and lead optimization. It has therefore been proposed that members of screening libraries from which hits are discovered should be biased toward lower molecular weight and lipophilicity, so that medicinal chemists will have an easier time delivering optimized drug development candidates that are also drug-like. The rule of five has accordingly been extended to the rule of three (RO3) for defining lead-like compounds. A rule of three compliant compound is defined as one that has:

- octanol-water partition coefficient log P not greater than 3
- molecular mass less than 300 daltons
- not more than 3 hydrogen bond donors
- not more than 3 hydrogen bond acceptors
- not more than 3 rotatable bonds
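The rule of three and the two-criterion rotatable-bond and polar-surface-area variant described above can be written as simple threshold checks. The following Python sketch is a minimal illustration: the function names `rule_of_three` and `rotb_psa_ok` are invented for the example, and the two sets of property values are hypothetical, not taken from this study.

```python
def rule_of_three(mw, logp, hbd, hba, rotatable_bonds):
    """True if a compound meets every lead-likeness (RO3) criterion listed above."""
    return (
        logp <= 3
        and mw < 300
        and hbd <= 3
        and hba <= 3
        and rotatable_bonds <= 3
    )


def rotb_psa_ok(rotatable_bonds, psa):
    """True if a compound has 10 or fewer rotatable bonds and PSA of 140 square angstroms or less."""
    return rotatable_bonds <= 10 and psa <= 140


# Hypothetical property values, for illustration only
fragment = {"mw": 180.2, "logp": 1.2, "hbd": 1, "hba": 3, "rotatable_bonds": 2}
lead = {"mw": 420.0, "logp": 3.8, "hbd": 2, "hba": 6, "rotatable_bonds": 7}

print(rule_of_three(**fragment), rotb_psa_ok(fragment["rotatable_bonds"], psa=60.0))
print(rule_of_three(**lead), rotb_psa_ok(lead["rotatable_bonds"], psa=95.0))
```

A compound can fail the stricter rule of three, which is meant for screening-library members, and still satisfy the oral bioavailability variant, as the second hypothetical entry shows.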

STEPS FOR DEVIATION STUDY

- Identify the diseases related to a particular organ.
- Study the ligands used against them and their structures.
- Find the most common part among these structures.
- Check the ligands against the rule of five parameters.
- Find out which, and how many, of them deviate from the rule of five parameters.

From the most common part of these ligands, we can conclude that this part is largely required in any ligand that works as a drug for diseases of that organ. From the deviation of these ligands from the rule of five, we can conclude whether the rule of five parameters can vary while still yielding a good and highly effective drug.

Liver: The organ taken for study

The liver is a vital organ present in vertebrates and some other animals. It has a wide range of functions, including detoxification, protein synthesis, and production of biochemicals necessary for digestion. The liver is necessary for survival; there is currently no way to compensate for the absence of liver function in the long term, although new liver dialysis techniques can be used in the short term. This organ plays a major role in metabolism and has a number of functions in the body, including glycogen storage, decomposition of red blood cells, plasma protein synthesis, hormone production, and detoxification. It lies below the diaphragm in the abdominal-pelvic region of the abdomen. It produces bile, an alkaline compound which aids in digestion via the emulsification of lipids. The liver's highly specialized tissues regulate a wide variety of high-volume biochemical reactions, including the synthesis and breakdown of small and complex molecules, many of which are necessary for normal vital functions. Terminology related to the liver often starts in hepar- or hepat-, from the Greek word for liver, hēpar (ἧπαρ, root hepat-, ἡπατ-).

Functions

The liver stores a multitude of substances, including glucose (in the form of glycogen), vitamin A (1-2 years' supply), vitamin D (1-4 months' supply), vitamin B12 (1-3 years' supply), vitamin K, iron, and copper. The liver is responsible for immunological effects: the reticuloendothelial system of the liver contains many immunologically active cells, acting as a 'sieve' for antigens carried to it via the portal system. The liver produces albumin, the major osmolar component of blood serum. The liver synthesizes angiotensinogen, a hormone that is responsible for raising the blood pressure when activated by renin, an enzyme that is released when the kidney senses low blood pressure.

Diseases of the liver

The liver supports almost every organ in the body and is vital for survival. Because of its strategic location and multidimensional functions, the liver is also prone to many diseases. The most common include infections such as hepatitis A, B, C, D and E, alcohol damage, fatty liver, cirrhosis, cancer, and drug damage (particularly by acetaminophen (paracetamol) and cancer drugs). Many diseases of the liver are accompanied by jaundice caused by increased levels of bilirubin in the system. The bilirubin results from the breakup of the hemoglobin of dead red blood cells; normally, the liver removes bilirubin from the blood and excretes it through bile. There are also many pediatric liver diseases, including biliary atresia, alpha-1 antitrypsin deficiency, Alagille syndrome, progressive familial intrahepatic cholestasis, and Langerhans cell histiocytosis, to name but a few.

Diseases that interfere with liver function will lead to derangement of these processes. However, the liver has a great capacity to regenerate and has a large reserve capacity. In most cases, the liver only produces symptoms after extensive damage. Liver diseases may be diagnosed by liver function tests, for example, by production of acute phase proteins.

Disease symptoms

The classic symptoms of liver damage include the following:

- Pale stools occur when stercobilin, a brown pigment, is absent from the stool. Stercobilin is derived from bilirubin metabolites produced in the liver.
- Dark urine occurs when bilirubin mixes with urine.
- Jaundice (yellow skin and/or whites of the eyes). This is where bilirubin deposits in skin, causing an intense itch. Itching is the most common complaint by people who have liver failure. Often this itch cannot be relieved by drugs.
- Swelling of the abdomen, ankles and feet occurs because the liver fails to make albumin.
- Excessive fatigue occurs from a generalized loss of nutrients, minerals and vitamins.
- Bruising and easy bleeding are other features of liver disease. The liver makes substances which help prevent bleeding. When liver damage occurs, these substances are no longer present and severe bleeding can occur.

Diagnosis

The diagnosis of liver function is made by blood tests. Liver function tests can readily pinpoint the extent of liver damage. If infection is suspected, then other serological tests are done. Sometimes, an ultrasound or a CT scan may be required to produce an image of the liver. Physical examination of the liver is not accurate in determining the extent of liver damage. It can only reveal the presence of tenderness or the size of the liver, but in all cases, some type of radiological study is required to examine it.

Biopsy / scan

Damage to the liver is sometimes determined with a biopsy, particularly when the cause of liver damage is unknown. In the 21st century, biopsies were largely replaced by high-resolution radiographic scans. The latter do not require ultrasound guidance, lab involvement, microscopic analysis, organ damage, pain, or patient sedation, and the results are available immediately on a computer screen.

In a biopsy, a needle is inserted into the skin just below the rib cage and a tissue sample is obtained. The tissue is sent to the laboratory, where it is analyzed under a microscope. Sometimes, a radiologist may assist the physician performing a liver biopsy by providing ultrasound guidance.

Regeneration

The liver is the only human internal organ capable of natural regeneration of lost tissue; as little as 25% of a liver can regenerate into a whole liver. This is, however, not true regeneration but rather compensatory growth. The lobes that are removed do not regrow, and the growth of the liver is a restoration of function, not of original form. This contrasts with true regeneration, where both original function and form are restored. In the liver, large areas of tissue are formed, but for the formation of new cells there must be a sufficient amount of material, so the circulation of the blood becomes more active. This is predominantly due to the hepatocytes re-entering the cell cycle. That is, the hepatocytes go from the quiescent G0 phase to the G1 phase and undergo mitosis. This process is activated by the p75 receptors. There is also some evidence of bipotential stem cells, called hepatic oval cells or ovalocytes (not to be confused with the oval red blood cells of ovalocytosis), which are thought to reside in the canals of Hering. These cells can differentiate into either hepatocytes or cholangiocytes, the latter being the cells that line the bile ducts. Scientific and medical works about liver regeneration often refer to the Greek Titan Prometheus, who was chained to a rock in the Caucasus where, each day, his liver was devoured by an eagle, only to grow back each night. The myth suggests the ancient Greeks knew about the liver's remarkable capacity for self-repair; however, this claim is without evidence.
Liver transplantation

Human liver transplants were first performed by Thomas Starzl in the United States and Roy Calne in Cambridge, England in 1963 and 1965, respectively. Liver transplantation is the only option for those with irreversible liver failure. Most transplants are done for chronic liver diseases leading to cirrhosis, such as chronic hepatitis C, alcoholism, autoimmune hepatitis, and many others. Less commonly, liver transplantation is done for fulminant hepatic failure, in which liver failure occurs over days to weeks. Liver allografts for transplant usually come from donors who have died from fatal brain injury. Living donor liver transplantation is a technique in which a portion of a living person's liver is removed and used to replace the entire liver of the recipient.

This was first performed in 1989 for pediatric liver transplantation. Only 20 percent of an adult's liver (Couinaud segments 2 and 3) is needed to serve as a liver allograft for an infant or small child. More recently, adult-to-adult liver transplantation has been done using the donor's right hepatic lobe, which amounts to 60 percent of the liver. Due to the ability of the liver to regenerate, both the donor and recipient end up with normal liver function if all goes well. This procedure is more controversial, as it entails performing a much larger operation on the donor, and indeed there have been at least two donor deaths out of the first several hundred cases. A recent publication has addressed the problem of donor mortality, and at least 14 cases have been found. The risk of postoperative complications (and death) is far greater in right-sided operations than in left-sided operations. With the recent advances in noninvasive imaging, living liver donors usually have to undergo imaging examinations of liver anatomy to decide whether the anatomy is feasible for donation. The evaluation is usually performed by multidetector row computed tomography (MDCT) and magnetic resonance imaging (MRI). MDCT is good for vascular anatomy and volumetry. MRI is used for biliary tree anatomy. Donors with very unusual vascular anatomy, which makes them unsuitable for donation, can be screened out to avoid unnecessary operations.

THE DISEASES AND THEIR RELATED DRUGS WITH STRUCTURE

NAME OF THE DISEASE: LIVER CANCER
DRUGS USED IN CHEMOTHERAPY FOR THE TREATMENT OF CANCER:

Doxorubicin

The structure of doxorubicin

Fluorouracil (5-FU)

Gemcitabine

NAME OF THE DISEASES: CIRRHOSIS, ALCOHOLIC LIVER DISEASE
NAME OF THE DRUG USED FOR TREATMENT:

Ursodiol

NAME OF THE DISEASE: LIVER CYST / ABSCESS
NAME OF THE DRUGS USED:

Metronidazole

Cephalosporin

NAME OF THE DISEASE: ALCOHOLIC LIVER DISEASE
NAME OF THE DRUGS USED IN THE TREATMENT:

Disulfiram

Acamprosate

NAME OF THE DISEASE: HEPATITIS A, B, C
DRUGS USED IN THE TREATMENT:

FOR HEPATITIS A: NO DRUGS ARE USED.

FOR HEPATITIS B:

Lamivudine (Epivir)

Tenofovir

Entecavir

FOR HEPATITIS C: AS SUCH NO DRUGS ARE PRESCRIBED, BUT IT LEADS TO LIVER CIRRHOSIS AND LATER LIVER TRANSPLANT.

Checking of the structures against the rule of five parameters with the help of software, namely OSIRIS or Molsoft

OSIRIS

The OSIRIS Property Explorer is an integral part of Actelion's in-house substance registration system. It lets the user draw chemical structures and calculates various drug-relevant properties on the fly whenever a structure is valid. Prediction results are valued and color coded. Properties with high risks of undesired effects, such as mutagenicity or poor intestinal absorption, are shown in red, whereas green indicates drug-conform behaviour.

MOLSOFT

Molsoft LLC is a provider of tools, databases and consulting services in the areas of structure prediction, structural proteomics, bioinformatics, cheminformatics, molecular visualization and animation, and rational drug design. Molsoft offers software tools and services in lead discovery, modeling, cheminformatics, bioinformatics, and corporate data management, and forms partnerships with biotechnology and pharmaceutical companies.

The details of the terminology used to satisfy the rule of five:

- Molecular weight: there is a known relationship between poor permeability and high molecular weight.
- Lipophilicity (ratio of octanol solubility to water solubility): measured through LogP.
- Number of hydrogen bond donors and acceptors: high numbers may impair permeability across the membrane bilayer.

The rule of five formulation

Poor absorption or permeation is more likely when:

- There are more than 5 H-bond donors.
- The molecular weight is over 500.
- The LogP is over 5.
- There are more than 10 H-bond acceptors.
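The four conditions above translate directly into a small check. The Python sketch below is a minimal illustration rather than the procedure used by Molsoft or OSIRIS; the function name `lipinski_violations` is an assumption. It is applied to the Molsoft values reported earlier for the chosen docked ligand (molecular weight 406.45, MolLogP 5.66, 0 H-bond donors, 0 H-bond acceptors).

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Return the list of rule-of-five conditions that a compound violates."""
    violations = []
    if hbd > 5:
        violations.append("more than 5 H-bond donors")
    if mw > 500:
        violations.append("molecular weight over 500")
    if logp > 5:
        violations.append("LogP over 5")
    if hba > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Values reported by Molsoft for the chosen docked ligand
found = lipinski_violations(mw=406.45, logp=5.66, hbd=0, hba=0)
print(found)
print("within the one-violation allowance:", len(found) <= 1)
```

With these inputs only the LogP condition is flagged, which stays within the allowance of at most one violation stated in the components of the rule.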

THE STRUCTURES VIEWED UNDER THE SOFTWARE

DISULFIRAM (ALCOHOLIC LIVER DISEASE)

clogP: 4.13; Solubility: 2.39; Molecular weight: 234; Druglikeness: 3.8; Drug score: 0.4

ACAMPROSATE

clogP: 1.95; Solubility: 2.25; Molecular weight: 131; Druglikeness: 4.75; Drug score: 0.48

CEPHALOSPORIN

clogP: 2.16; Solubility: 1.94; Molecular weight: 195; Druglikeness: 1.24; Drug score: 0.83

METRONIDAZOLE

clogP: 1.04; Solubility: 0.62; Molecular weight: 157; Druglikeness: 0.34; Drug score: 0.78

URSODIOL

clogP: 3.54; Solubility: 4.36; Molecular weight: 366; Druglikeness: 5.34; Drug score: 0.38

ENTECAVIR

clogP: 1.88; Solubility: 2.57; Molecular weight: 185; Druglikeness: 1.0; Drug score: 0.82

TENOFOVIR

clogP: 1.88; Solubility: 1.12; Molecular weight: 136; Druglikeness: 0.71; Drug score: 0.82

LAMIVUDINE

clogP: 0.52; Solubility: 2.36; Molecular weight: 245; Druglikeness: 2.41; Drug score: 0.9

ALBENDAZOLE

clogP: 2.05; Solubility: 3.35; Molecular weight: 295; Druglikeness: 7.66; Drug score: 0.43

DOXORUBICIN

clogP: 2.5; Solubility: 3.41; Molecular weight: 376; Druglikeness: 2.65; Drug score: 0.43

GEMCITABINE

clogP: 0.44; Solubility: 1.5; Molecular weight: 125; Druglikeness: 1.74; Drug score: 0.56
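The deviation check over the listed drugs can be tabulated in a few lines. The Python sketch below is an illustration only. It uses the clogP and molecular weight values exactly as listed above, reading the clogP figures as printed (treating the mark after "clogp" as a separator is an assumption), and it covers only the two rule-of-five parameters reported here, since hydrogen bond donor and acceptor counts are not listed. The names `REPORTED` and `deviating` are invented for the example.

```python
# (clogP, molecular weight) as listed above for each drug
REPORTED = {
    "disulfiram": (4.13, 234),
    "acamprosate": (1.95, 131),
    "cephalosporin": (2.16, 195),
    "metronidazole": (1.04, 157),
    "ursodiol": (3.54, 366),
    "entecavir": (1.88, 185),
    "tenofovir": (1.88, 136),
    "lamivudine": (0.52, 245),
    "albendazole": (2.05, 295),
    "doxorubicin": (2.5, 376),
    "gemcitabine": (0.44, 125),
}


def deviating(drugs, max_logp=5, max_mw=500):
    """Return the names of drugs whose listed clogP or molecular weight exceeds the limits."""
    return [name for name, (logp, mw) in drugs.items() if logp > max_logp or mw > max_mw]


print("deviating drugs:", deviating(REPORTED))
print("highest clogP:", max(logp for logp, _ in REPORTED.values()))
print("highest molecular weight:", max(mw for _, mw in REPORTED.values()))
```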

CONCLUSION

The common structural features in the above ligands are the benzene ring and the carboxylic acid group. We can conclude that this common part is largely required in the above drugs for treating liver disease. No drug deviated from the rule of five parameters, so it is concluded that all of these drugs are drug-like and highly effective. The goal, which was to identify calculable parameters of the selected compound library related to absorption and permeability, was achieved. Calculations, however imprecise (they give only probabilities), may help when choices must be made as to the design or purchase of compounds. Accurate prediction of the solubility of complex compounds is still an elusive target.

References

[1] Centers for Disease Control and Prevention. Influenza. http://www.cdc.gov/flu/ (accessed June 19, 2009).
[2] Couch, Robert B. The New England Journal of Medicine 1997, 337: 927-929.
[3] Luo, M.; Air, G. M.; Brouillette, W. J. The Journal of Infectious Diseases 1997, 176: 62-65.
[4] Malaisree, M.; Rungrotmongkol, T.; Decha, P.; Intharathep, P.; Aruksakunwong, O.; Hannongbua, S. Proteins 2008, 71: 1908-1918.
[5] Kass, Itamar; Arkin, Isaiah T. Structure 2005, 13: 1789-1798.
[6] Couch, Robert B. The New England Journal of Medicine 2000, 343: 1778-1788.
[7] Balfour Jr, Henry H. The New England Journal of Medicine 1999, 340: 1255-1269.
[8] Lewars, Errol. Computational Chemistry: Introduction to the Theory and Applications of Molecular and Quantum Mechanics; Kluwer Academic Publishers: Boston, 2003.
[9] Finer-Moore, Janet S.; Blaney, Jeff; Stroud, Robert M. Facing the Wall in Computationally Based Approaches to Drug Discovery. In Computational and Structural Approaches to Drug Discovery: Ligand-Protein Interactions; Stroud, Robert M.; Finer
[10] Rosenfeld, Robin J.; Goodsell, David S.; Musah, Rabi A.; Morris, Garrett M.; Goodin, David B.; Olson, Arthur J. Journal of Computer-Aided Molecular Design 2003, 17: 525-536.
[11] Morris, Garrett M.; Goodsell, David S.; Halliday, Robert S.; Huey, Ruth; Hart, William E.; Belew, Richard K.; Olson, Arthur J. Journal of Computational Chemistry 1998, 19: 1639-1662.
[12] Protein Data Bank. http://www.rcsb.org/pdb/home/home.do (accessed April 2013).
[13] Morris, Garrett; Huey, Ruth. 2013. http://autodock.scripps.edu/faqshelp/tutorial/using-autodock-4-with-autodocktools (accessed September 8, 2008).
