
UNIVERSITI TEKNOLOGI MARA

FAKULTI KEJURUTERAAN KIMIA


BIOINFORMATICS
(CBE 647)

NAME              : NORAFIQAH BINTI AZMAN
STUDENT NO.       : 2010872226
GROUP             : EH 222 8A
EXPERIMENT        : LAB 1
DATE PERFORMED    : 19TH MARCH 2014
SEMESTER          : 8
PROGRAMME / CODE  : CBE 647 (BIOINFORMATICS)
SUBMIT TO         : DR. TAN HUEY LING

No   Title                 Marks
1    Abstract / Summary
2    Introduction
3    Aims
4    Theory
5    Methodology
6    Results
7    Discussions
8    Conclusion

Remarks:

Checked by : ..........................
Date :

Recheck by : ..........................
Date :

ABSTRACT
In this exercise, students were introduced to the basics of bioinformatics. Several bioinformatics websites and
tools were introduced, such as GenBank, KEGG, UniProtKB, OMIM, GO, ORF Finder, and NCBI. These
databases provide information that is vital for completing the tasks given. The exercise has four parts:
finding public biological databases, NCBI Entrez and searching biological databases,
determining the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 gene, and extracting sequences.
INTRODUCTION
To familiarize students with bioinformatics databases on the web, four laboratory exercises were performed.
The first is finding public biological databases such as GenBank, KEGG, UniProtKB, OMIM and GO.
These databases can be accessed through NAR (Nucleic Acids Research). NAR Online
contains hotlinks to all of the databases in the compilation as well as brief summaries of their content.
The second part is NCBI Entrez and searching biological databases. In this part, students were assigned to
investigate the human triose phosphate isomerase 1 gene, which encodes the enzyme that converts
dihydroxyacetone phosphate to glyceraldehyde-3-phosphate in glycolysis. To perform this task, students
must visit the NCBI website and open the All Databases page.
The third part is determination of the Open Reading Frame (ORF) and of the gene product encoded in the ORF.
Students were assigned to determine the start and stop codons of the Hemoglobin Alpha 2 (HBA2) gene. For this task,
students can access the GenBank database.
The last part is sequence extraction. In this section, students were given step-by-step guidelines for finding out
which of the nucleotide records ara h2 and opsin relates to peanut allergy and which to an eye gene associated with
long-wave sensitivity and colour blindness.
OBJECTIVE
The main goal of this laboratory exercise is to give students practical experience using the NCBI interface. Apart from
that, it also gives students an early idea of how to navigate the NCBI website and perform basic and advanced
searches on it.
THEORY
Biological databases are libraries of life sciences information, collected from scientific experiments, published
literature, high-throughput experiment technology, and computational analyses. They contain information from
research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.
Triose phosphate isomerase (TPI) plays an important role in glycolysis and is essential for efficient energy production.
TPI has been found in nearly every organism searched for the enzyme, including animals such as mammals and insects,
as well as fungi, plants, and bacteria.
In molecular genetics, an open reading frame (ORF) is the part of a reading frame that contains no stop codons. The
transcription termination pause site is located after the ORF, beyond the translation stop codon, because if
transcription were to cease before the stop codon, an incomplete protein would be made during translation.
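
The ORF definition above can be illustrated with a minimal Python sketch. It is not the method used by any NCBI tool; the function name find_orf and the toy sequence are hypothetical, and the sketch assumes the ORF starts at the first aug and ends at the first in-frame stop codon (uaa, uag or uga).

```python
# Minimal sketch (assumption: the ORF begins at the first "aug" in the sequence).
STOP_CODONS = {"uaa", "uag", "uga"}

def find_orf(mrna, start="aug"):
    """Return the ORF from the first start codon through the first in-frame stop codon.

    Returns None when there is no start codon or no in-frame stop codon.
    """
    seq = mrna.lower().replace("t", "u")  # accept DNA-style input as well
    begin = seq.find(start)
    if begin == -1:
        return None
    # Step through the sequence three bases at a time, staying in frame.
    for i in range(begin, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[begin:i + 3]
    return None

# Hypothetical toy sequence: aug aaa ccc uaa, flanked by extra bases.
toy_orf = find_orf("ggaugaaacccuaagg")
```

Stepping in threes matters: a stop triplet that is not in frame with the start codon (such as the uga overlapping the start codon in the toy sequence) does not end the ORF.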

METHODOLOGY

RESULT & DISCUSSION


For part A), Finding Public Biological Databases, we have to open the website stated below
(http://www.oxfordjournals.org/nar/database/a/). A screenshot of the website is shown below.

Nucleic Acids Research (NAR) helps researchers find the results of research into
physical, chemical, biochemical and biological aspects of nucleic acids and proteins involved in nucleic acid
metabolism and/or interactions. This site focuses on the database compilation and gives a summary of each selected
NAR database.
First is GenBank (Nucleotide Sequence Databases)

GenBank contains publicly available nucleotide sequences for more than 300 000 organisms. NCBI, which maintains it,
helps researchers understand the fundamental molecular and genetic processes that control health and disease. For
example, to learn more about DNA and RNA, click on DNA & RNA on the left side of the site, which links to all the
related databases. In this way, researchers can save time and find information easily.

Second is the KEGG (Metabolic Pathways).

This site provides a database resource for understanding high-level functions and utilities of the biological system.
The main page has organism-specific entry points. A researcher who knows the specific organism code can enter it
and be taken directly to the corresponding page. For example, the code hsa stands for Homo
sapiens (human). The resulting page provides genome information such as pathway maps, BRITE hierarchies,
modules, BLAST and taxonomy.
Third is UniProtKB (Proteins).

This site provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein
sequence and functional information. For example, suppose we want to learn about the protein amylase. When amylase is
keyed in, the site returns a page listing all of the amylase protein entries. There are 80 657 results for amylase
available in UniProtKB.

Fourth is OMIM (Online Mendelian Inheritance in Man).

OMIM provides a comprehensive, authoritative compendium of human genes and genetic phenotypes that
is freely available and updated daily. For example, searching for amylase returns 33 available
amylase entries.
Lastly is GO (Gene Ontology).

It is a major bioinformatics initiative to standardize the representation of gene and gene product attributes across
species and databases.

Part B (NCBI Entrez and Searching Biological Databases).


For question B1), we have to use the NCBI databases to determine which of these queries (gene,
protein or nucleotide) is a good search query for finding triosephosphate isomerase.
For query gene:

For query protein:

For query nucleotide:

Based on these queries, the gene query proved to be the most efficient at specifying both the name of the gene and
the organism. From the data obtained, the name of the gene is TPI1 (triosephosphate isomerase 1) and the
organism is Homo sapiens (human).
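
The searches in this exercise were done through the NCBI web pages. As a hedged illustration only, the same gene query could be expressed as a URL for the NCBI E-utilities ESearch service. The helper name build_esearch_url is hypothetical, and the field tags [Gene Name] and [Organism] are assumed to be the appropriate ones for the gene database. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db, gene, organism):
    """Build an ESearch URL that restricts a search to one gene name and one organism."""
    # Field tags in square brackets narrow the search to specific record fields.
    term = f"{gene}[Gene Name] AND {organism}[Organism]"
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})

# Query corresponding to question B1: the TPI1 gene in Homo sapiens.
url = build_esearch_url("gene", "TPI1", "Homo sapiens")
```

Changing the db argument to "protein" or "nucleotide" gives the other two queries compared in this question, although the field tags that apply may differ between databases.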

For question B2), we have to search for the RefSeq accession numbers of this gene in its mRNA form and protein form.

From this page, we can see clearly that the RefSeq accession number for this gene is NM_000365.5 in the mRNA form
and NP_000356.1 in the protein form.
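
The two accession numbers follow the RefSeq pattern of a prefix (NM_ for mRNA, NP_ for protein), a number, and a version after the dot. The following is a minimal sketch of splitting an accession into these parts. The function name parse_refseq is hypothetical and only the two prefixes used in this report are mapped.

```python
# Only the two prefixes that appear in this report are listed here.
REFSEQ_PREFIXES = {"NM": "mRNA", "NP": "protein"}

def parse_refseq(accession):
    """Split a RefSeq accession such as 'NM_000365.5' into its parts."""
    prefix, rest = accession.split("_", 1)   # 'NM', '000365.5'
    number, version = rest.split(".")        # '000365', '5'
    return {
        "molecule": REFSEQ_PREFIXES.get(prefix, "other"),
        "number": number,
        "version": int(version),
    }

mrna_record = parse_refseq("NM_000365.5")
protein_record = parse_refseq("NP_000356.1")
```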
For question B3), we have to determine on which chromosome this gene lies.

For this question, the gene lies on chromosome 12.


For question B4), we have to determine how many amino acids are present in the protein encoded by this gene and identify
the first five amino acids of the sequence.

The first five amino acids obtained are mapsr (Met-Ala-Pro-Ser-Arg).


For question B5), we have to determine the authors of this paper and its unique PubMed ID.

For this question, the authors are Watanabe H, Seino T and Sato Y, and the PubMed ID is 15358119.

Part C. Determination of the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 (HBA2) Gene.
For part C1), we have to retrieve the mRNA sequence from the GenBank database.

The mRNA sequence is

Start and stop codons are:


auggugcug ucuccugcc gacaagacc aacgucaag gccgccugg gguaagguc
ggcgcgcac gcuggcgag uauggugcg gaggcccug gagaggaug uuccugucc
uuccccacc accaagacc uacuucccg cacuucgac cugagccac ggcucugcc
cagguuaag ggccacggc aagaaggug gccgacgcg cugaccaac gccguggcg
cacguggac gacaugccc aacgcgcug uccgcccug agcgaccug cacgcgcac
aagcuucgg guggacccg gucaacuuc aagcuccua agccacugc cugcuggug
acccuggcc gcccaccuc cccgccgag uucaccccu gcggugcac gccucccug
gacaaguuc cuggcuucu gugagcacc gugcugacc uccaaauac cguuaagcu
ggagccucg guagccguu ccuccugcc cgcugggcc ucccaacgg gcccuccuc
cccuccuug caccggccc uuccugguc uuugaauaa

From this sequence, we can see clearly that aug acts as the start codon while uaa acts as the stop codon.
For the ORF, the mRNA sequence is:
auggugcug ucuccugcc gacaagacc aacgucaag gccgccugg gguaagguc
ggcgcgcac gcuggcgag uauggugcg gaggcccug gagaggaug uuccugucc
uuccccacc accaagacc uacuucccg cacuucgac cugagccac ggcucugcc
cagguuaag ggccacggc aagaaggug gccgacgcg cugaccaac gccguggcg
cacguggac gacaugccc aacgcgcug uccgcccug agcgaccug cacgcgcac
aagcuucgg guggacccg gucaacuuc aagcuccua agccacugc cugcuggug
acccuggcc gcccaccuc cccgccgag uucaccccu gcggugcac gccucccug
gacaaguuc cuggcuucu gugagcacc gugcugacc uccaaauac cgu

For question C2), we have to translate the first 10 codons into an amino acid sequence using the genetic code table.
The start codon aug codes for methionine (M). The ten codons that follow it are translated as:
i. gug = valine (V)
ii. cug = leucine (L)
iii. ucu = serine (S)
iv. ccu = proline (P)
v. gcc = alanine (A)
vi. gac = aspartic acid (D)
vii. aag = lysine (K)
viii. acc = threonine (T)
ix. aac = asparagine (N)
x. guc = valine (V)
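
The table lookup used for this translation can also be sketched in Python. This is an illustration rather than part of the laboratory procedure. The function name translate is hypothetical, and the sketch assumes the standard genetic code, written as a 64-letter string in the conventional u, c, a, g base order with * marking stop codons.

```python
from itertools import product

# Standard genetic code: amino acids listed for codons uuu, uuc, uua, uug, ucu, ...
# (third base changes fastest, base order u, c, a, g); '*' marks a stop codon.
BASES = "ucag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna, n_codons=None):
    """Translate an mRNA string codon by codon into one-letter amino acid codes."""
    seq = mrna.lower().replace("t", "u")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if n_codons is not None:
        codons = codons[:n_codons]
    return "".join(CODON_TABLE[c] for c in codons)

# First ten codons of the HBA2 ORF, counted from the start codon aug.
first_ten = translate("auggugcugucuccugccgacaagaccaac")
```

Counted from the start codon, the first ten codons give M, V, L, S, P, A, D, K, T, N. The list above starts one codon later, so it ends with the eleventh codon, guc (V).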

For question C3), the ORF search returns results for all six reading frames. The longest ORF is the one most probably
translated into the protein.

For this question, copy and paste the mRNA sequence into ORF Finder. The resulting page shows the six frames available.
Clicking the longest frame displays the diagram above together with its corresponding
translation. Thus the statement is true.
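
The idea of six reading frames can be sketched as follows: three offsets on the given strand and three on its reverse complement. This is a simplified illustration and not the algorithm of the ORF Finder tool. The names six_frames and longest_orf, the frame labels (+1 to +3, -1 to -3) and the toy sequence are assumptions made for the example.

```python
# Simplified sketch: enumerate six reading frames and pick the longest ORF.
COMPLEMENT = str.maketrans("acgu", "ugca")
STOPS = {"uaa", "uag", "uga"}

def six_frames(mrna):
    """Return the six reading frames keyed by illustrative labels +1..+3 and -1..-3."""
    seq = mrna.lower().replace("t", "u")
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for strand, s in (("+", seq), ("-", rev)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = s[offset:]
    return frames

def longest_orf(frame_seq):
    """Return the longest aug...stop stretch read in frame, or '' if none exists."""
    codons = [frame_seq[i:i + 3] for i in range(0, len(frame_seq) - 2, 3)]
    best, start = "", None
    for idx, codon in enumerate(codons):
        if start is None and codon == "aug":
            start = idx
        elif start is not None and codon in STOPS:
            orf = "".join(codons[start:idx + 1])
            if len(orf) > len(best):
                best = orf
            start = None
    return best

# Hypothetical toy sequence whose only complete ORF lies in the third forward frame.
frames = six_frames("ccaugaaacccuaagg")
best_frame = max(frames, key=lambda name: len(longest_orf(frames[name])))
```

For a real mRNA such as HBA2, the same comparison would be made on the pasted sequence, and the frame holding the longest ORF is the candidate coding frame.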

Part D (Sequence Extraction)


Step 1-2

Step 3

Step 4

Step 5

Step 6

Step 7

From this, we can state that ara h2 is related to the peanut allergen and that opsin refers to the eye gene related to long-wave sensitivity and colour blindness.

CONCLUSIONS
After this experiment, students should be familiar with using the NCBI interface to find the information needed for
solving bioinformatics problems. The NCBI website contains a wide range of resources that should help
students find solutions related to their tasks.