DNA Replication
• Question: Consider Pattern to be a most frequent k-mer in Text if it maximizes COUNT(Text, Pattern)
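The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the function names `count` and `most_frequent_kmers` are hypothetical, and overlapping occurrences are assumed to count separately.

```python
from collections import Counter


def count(text, pattern):
    """Count occurrences of pattern in text, overlapping ones included."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def most_frequent_kmers(text, k):
    """Return the set of k-mers that maximize count(text, kmer)."""
    # Tally every window of length k in a single pass over the text.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return set()
    best = max(counts.values())
    return {kmer for kmer, c in counts.items() if c == best}
```

Tallying all windows once avoids calling `count` separately for every candidate k-mer, which would rescan the text each time.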
Hidden messages in the genome
Now we have rigorously defined computational problems
Most frequent 9-mers - cholera
INTERESTING: Among the four most frequent 9-mers in the oriC region of Vibrio cholerae, ATGATCAAG and CTTGATCAT are reverse complements of each other, which gives six occurrences in total.
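To make the reverse-complement relation concrete, here is a small sketch. The helper names `reverse_complement` and `count_both_strands` are hypothetical, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def count_both_strands(text, pattern):
    """Count overlapping occurrences of pattern and of its reverse complement."""
    def count(p):
        k = len(p)
        return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == p)

    rc = reverse_complement(pattern)
    # A pattern equal to its own reverse complement must not be counted twice.
    return count(pattern) if rc == pattern else count(pattern) + count(rc)
```

With this helper, `reverse_complement("ATGATCAAG")` evaluates to `"CTTGATCAT"`, matching the pair of 9-mers named above.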
Finding a 9-mer that appears six or more times (either as itself or as its reverse complement) in a short DNA string of length 500 is far more surprising than finding a 9-mer that appears three or more times (as itself). This statistical evidence leads us to the working hypothesis that ATGATCAAG and its reverse complement CTTGATCAT indeed represent DnaA boxes in Vibrio cholerae. This computational conclusion makes sense biologically because the DnaA protein that binds to DnaA boxes and initiates replication does not care which of the two strands it binds to. For our purposes, both ATGATCAAG and CTTGATCAT represent DnaA boxes.

How frequent is it in the genome?
• ATGATCAAG appears 17 times in the whole genome
• In the oriC region (500 nucleotides), it appears 6 times, counting ATGATCAAG and CTTGATCAT together
• Much higher frequency in the oriC region, an indication that this is the DnaA box we are looking for.

However, before concluding that we have found the DnaA box of Vibrio cholerae, the careful bioinformatician should check if there are other short regions in the Vibrio cholerae genome exhibiting multiple occurrences of ATGATCAAG (or CTTGATCAT). After all, maybe these strings occur as repeats throughout the entire Vibrio cholerae genome, rather than just in the oriC region. To this end, we need to solve the following problem.
Pattern Matching Problem: Find all occurrences of a pattern in a string.
    Input: Strings Pattern and Genome.
    Output: All starting positions in Genome where Pattern appears as a substring.
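One straightforward way to solve the Pattern Matching Problem is sketched below. The function name `pattern_matching` is hypothetical, and positions are assumed to be 0-based offsets into Genome.

```python
def pattern_matching(pattern, genome):
    """Return every 0-based start position where pattern occurs in genome."""
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are also found.
        start = genome.find(pattern, start + 1)
    return positions
```

For example, `pattern_matching("ATAT", "GATATATGCATATACTT")` returns `[1, 3, 9]`; the first two matches overlap.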
After solving the Pattern Matching Problem, we discover that ATGATCAAG appears 17 times in the following positions of the Vibrio cholerae genome:

116556, 149355, 151913, 152013, 152394, 186189, 194276, 200076, 224527, 307692, 479770, 610980, 653338, 679985, 768828, 878903, 985368

With the exception of the three occurrences of ATGATCAAG in oriC, no other instances of ATGATCAAG form clumps, i.e., appear close to each other in a small region of the genome.
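The clump observation can be checked directly on the 17 listed positions. The sketch below is illustrative only: the function name `find_clumps` is hypothetical, and the window of 500 nucleotides, the minimum of three occurrences and the k-mer length of 9 are assumptions taken from the figures quoted in these slides.

```python
POSITIONS = [
    116556, 149355, 151913, 152013, 152394, 186189, 194276, 200076, 224527,
    307692, 479770, 610980, 653338, 679985, 768828, 878903, 985368,
]


def find_clumps(positions, window=500, k=9, min_count=3):
    """Group start positions whose k-mers all fit inside one window."""
    pos = sorted(positions)
    clumps = []
    j = 0
    for i in range(len(pos)):
        j = max(j, i)
        # Extend the group while the next k-mer still ends inside the window.
        while j + 1 < len(pos) and pos[j + 1] + k - pos[i] <= window:
            j += 1
        group = pos[i:j + 1]
        # Skip groups that are just a suffix of the previously reported one.
        if len(group) >= min_count and (not clumps or clumps[-1][-1] != group[-1]):
            clumps.append(group)
    return clumps
```

On the list above, `find_clumps(POSITIONS)` reports a single group, `[151913, 152013, 152394]`; every other occurrence is isolated at this window size.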