Gene Finding

Uploaded by Vineetha Mary Ipe

Different features to discriminate exons and introns

Copyright: Attribution Non-Commercial (BY-NC)

ABSTRACT

Rapid advances in DNA sequencing techniques have driven a tremendous increase in the size of biological databases. Once the whole DNA of an organism is sequenced, the next major task is to predict the protein-coding regions (exons) present in that sequence. This task, known as gene finding, is one of the major challenges in the analysis of newly sequenced genomes, and its usefulness depends on how accurately the protein-coding DNA is predicted. Although state-of-the-art tools report high accuracy, their performance varies with the length of the sequence being analysed. This work began with an investigation of the relationship between prediction accuracy and the length of the DNA sequence being analysed. The preliminary results indicated that length and accuracy are correlated. We therefore developed separate models, each specialized for a range of sequence lengths, from fewer than 500 nucleotides to more than 10000 nucleotides. From a set of features that can discriminate between exons and introns, we identified the features that are most powerful for each length range. The models were trained on these features, and a tool was developed that assigns each input sequence to the model tuned for its length range and returns that model's prediction. The proposed approach, which employs AdaBoost.M1 with random forests as the base classifier, shows a considerable improvement in prediction accuracy.
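The routing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: only the first (fewer than 500) and last (more than 10000) ranges come from the abstract, while the intermediate cut-offs, the names `BOUNDARIES`, `RANGE_NAMES`, `length_range`, `placeholder_model`, and `predict`, and the GC-content rule used as a stand-in classifier are all hypothetical. In the actual work each range would be served by a trained AdaBoost.M1 ensemble with random forests as the base classifier.

```python
from bisect import bisect_right
from typing import Callable, Dict, List

# Exclusive upper bounds of each length range, in nucleotides.
# ASSUMPTION: only "<500" and ">10000" are stated in the abstract;
# the intermediate cut-offs (2000, 5000) are made up for illustration.
BOUNDARIES: List[int] = [500, 2000, 5000, 10001]
RANGE_NAMES: List[str] = ["<500", "500-1999", "2000-4999", "5000-10000", ">10000"]

Model = Callable[[str], str]


def length_range(sequence: str) -> str:
    """Return the name of the length range the sequence falls into."""
    return RANGE_NAMES[bisect_right(BOUNDARIES, len(sequence))]


def placeholder_model(threshold: float) -> Model:
    """Build a stand-in classifier (hypothetical GC-content rule).

    A real model would be a classifier trained on the features
    selected for one particular length range.
    """
    def classify(sequence: str) -> str:
        gc = sum(base in "GC" for base in sequence.upper())
        fraction = gc / max(len(sequence), 1)
        return "exon" if fraction >= threshold else "intron"
    return classify


# One model per length range; here they are all the same placeholder.
MODELS: Dict[str, Model] = {name: placeholder_model(0.5) for name in RANGE_NAMES}


def predict(sequence: str) -> str:
    """Route the sequence to the model for its length range and classify it."""
    return MODELS[length_range(sequence)](sequence)


if __name__ == "__main__":
    example = "ATGGCC" * 100  # 600 nucleotides
    print(length_range(example), predict(example))
```

Keeping the range boundaries in one sorted list means that adding or moving a cut-off only requires changing `BOUNDARIES` and `RANGE_NAMES`; the dispatch logic stays the same.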

ACKNOWLEDGEMENT

This thesis would not have been possible without the assistance and support of many people. I sincerely thank my supervisor, Dr. Achuthsankar S. Nair, HOD, Dept. of Computational Biology & Bioinformatics, University of Kerala, for offering me this thesis topic and for supporting and guiding me throughout my research. His teaching will have a lasting impact on my future academic and professional career. I also thank my internal guide, Ms. Muneera C.R., Associate Professor, Dept. of Electronics & Communication, GEC Thrissur, and my external guide, Ms. Baharak Goli, Research Scholar, Dept. of Computational Biology & Bioinformatics, University of Kerala, for their time and support during the completion of my thesis. I take this opportunity to thank Dr. Sheeba V.S., HOD, Dept. of Electronics & Communication, GEC Thrissur, and the project coordinators, Mr. Mohammed Salih K.K., Assistant Professor, Dept. of Electronics & Communication, GEC Thrissur, and Mr. Roy Francis, Assistant Professor, Dept. of Electronics & Communication, GEC Thrissur. Last but not least, I gratefully acknowledge the invaluable support and encouragement of my family and friends.

LIST OF TABLES

Table No.   Title
3.1         FrameD Result
3.2         GeneMark Result
3.3         Distribution
3.4         Mapping Schemes
3.5         Physicochemical Properties of Nucleotides
3.6         Summary of Filters
3.7         Feature Vector
3.8         Attribute Selection
3.9         Comparison of Various Classifier Methods
4.1         Self Consistency Test Results
4.2         Independent Dataset Test Results
5.1         Comparison of Prediction Accuracy

LIST OF FIGURES

Figure No.  Title
1.1         Accuracy of FrameD
1.2         Accuracy of GeneMark
2.1         DNA Structure
2.2         DNA Replication
2.3         Central Dogma
2.4         Codons for Amino Acids
2.5         Eukaryotic DNA
2.6         Classical Approaches to Gene Finding
3.1         General Schematic Representation of Work
3.2         Spectral Content with Nucleotide Position
3.3         Feature Extraction using Mapping Techniques
3.4         Tool Developed
3.5         Working of Length Specific Gene Finding Tool

LIST OF ABBREVIATIONS AND ACRONYMS

DNA         Deoxyribonucleic Acid
f           Feature
SC          Spectral Content
PSC         Paired Spectral Content
SR          Spectral Rotation
PFDN        Positional Frequency Distribution of Nucleotides
AMDF        Average Magnitude Difference Function

Cross Validation
