Submitted by
SONIKA KATTA
Master Of Technology
IN
Software Engineering
AT
ABSTRACT
ACKNOWLEDGMENT
1. INTRODUCTION
In 2001, scientists at the Weizmann Institute of Science in Israel announced that they had
manufactured a computer so small that a single drop of water would hold a trillion of the
machines. The devices used DNA and enzymes as their software and hardware and could
collectively perform a billion operations a second. Now the same team, led by Ehud Shapiro,
has announced a novel model of its biomolecular machine that no longer requires an external
energy source and performs 50 times faster than its predecessor did. The Guinness Book of
World Records has crowned it the world's smallest biological computing device.
Many designs for minuscule computers aimed at harnessing the massive storage capacity of
DNA have been proposed over the years. Earlier schemes relied on a molecule known as ATP,
a common source of energy for cellular reactions, as their fuel. In the new setup, a single
DNA molecule provides both the initial data and sufficient energy to complete the
computation.
We propose a new class of algorithms to be implemented on a DNA computer. The
algorithms we introduce are largely insensitive to changes in the initial conditions, which
gives DNA computers great extensibility. Knapsack problems are classical problems
solvable by this method. Because these problems are NP-complete, solving large instances
on conventional electronic computers is unrealistic. DNA computers using our method can
solve substantially larger instances because of their massive parallelism.
DNA
DNA (deoxyribonucleic acid) is the primary genetic material in all living organisms - a
molecule composed of two complementary strands that are wound around each other in a
double helix formation. The strands are connected by base pairs that look like rungs in a
ladder. Each base will pair with only one other: adenine (A) pairs with thymine (T), guanine
(G) pairs with cytosine (C). The sequence of each single strand can therefore be deduced
from the sequence of its partner.
Genes are sections of DNA that code for a defined biochemical function, usually the
production of a protein. The DNA of an organism may contain anywhere from a dozen genes,
as in a virus, to tens of thousands of genes in higher organisms like humans. The structure of
a protein determines its function. The sequence of bases in a given gene determines the
structure of a protein. Thus the genetic code determines what proteins an organism can make
and what those proteins can do. It is estimated that only 1-3% of the DNA in our cells codes
for genes; the rest may be used as a decoy to absorb mutations that could otherwise damage
vital genes.
mRNA (Messenger RNA) is used to relay information from a gene to the protein synthesis
machinery in cells. mRNA is made by copying the sequence of a gene, with one subtle
difference: thymine (T) in DNA is substituted by uracil (U) in mRNA. This allows cells to
differentiate mRNA from DNA so that mRNA can be selectively degraded without
destroying DNA. The DNA-o-gram generator simplifies this step by taking mRNA out of the
equation.
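The copying step described above can be sketched in a few lines of Python. This is only an illustration of the T-to-U substitution as the text states it; the function name transcribe and the sample sequence are hypothetical and do not come from the report.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA sequence by replacing T with U."""
    dna = coding_strand.upper()
    # Reject anything that is not one of the four DNA bases.
    if set(dna) - set("ACGT"):
        raise ValueError("DNA sequence may contain only A, C, G and T")
    return dna.replace("T", "U")


mrna = transcribe("ATGGCT")  # hypothetical six-base example
print(mrna)
```

Every other base is copied unchanged, which is why the mRNA still carries the information of the gene.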
The genetic code is the language used by living cells to convert information found in DNA
into information needed to make proteins. A protein's structure, and therefore function, is
determined by the sequence of amino acid subunits. The amino acid sequence of a protein is
determined by the sequence of the gene encoding that protein. The "words" of the genetic
code are called codons. Each codon consists of three adjacent bases in an mRNA molecule.
Using combinations of A, U, C and G, there can be sixty four different three-base codons.
There are only twenty amino acids that need to be coded for by these sixty four codons. This
excess of codons is known as the redundancy of the genetic code. By allowing more than one
codon to specify each amino acid, mutations can occur in the sequence of a gene without
affecting the resulting protein.
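The count of sixty-four codons follows from choosing one of four bases at each of three positions (4 x 4 x 4). A minimal sketch that enumerates them is given here; the variable names are illustrative only, and the average at the end is a plain ratio that says nothing about how codons are actually distributed among amino acids.

```python
from itertools import product

BASES = "AUCG"  # the four mRNA bases named in the text

# Every ordered triple of bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
num_codons = len(codons)  # 4 ** 3

NUM_AMINO_ACIDS = 20
# Plain ratio of codons to amino acids, as a rough measure of redundancy.
average_codons_per_amino_acid = num_codons / NUM_AMINO_ACIDS

print(num_codons, average_codons_per_amino_acid)
```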
The DNA-o-gram generator uses the genetic code to specify letters of the alphabet instead of
coding for proteins.
1.2 Structure of DNA
Think of DNA as software, and enzymes as hardware. Put them together in a test tube. The
way in which these molecules undergo chemical reactions with each other allows simple
operations to be performed as a by-product of the reactions. The scientists tell the devices
what to do by controlling the composition of the DNA software molecules. It is a completely
different approach from pushing electrons around a dry circuit in a conventional computer.
To the naked eye, the DNA computer looks like a clear water solution in a test tube. There is
no mechanical device. A trillion bio-molecular devices could fit into a single drop of water.
Instead of showing up on a computer screen, results are analyzed using a technique that
allows scientists to see the length of the DNA output molecule.
"Once the input, software, and hardware molecules are mixed in a solution it operates to
completion without intervention," said David Hawksett, the science judge at Guinness World
Records. "If you want to present the output to the naked eye, human manipulation is needed."
DNA computing, also called ‘Nano Computing’, is an emerging interdisciplinary field that uses
the four DNA bases (A, T, G, C) to perform computation. DNA, the genetic material, is a
polymer of deoxyribonucleotides. The components of each monomer (a deoxyribonucleotide,
or simply nucleotide) are:
(1) deoxyribose (pentose sugar),
(2) phosphate group,
(3) nitrogen base.
Each phosphate group links the 5’ carbon of one deoxyribose to the 3’ carbon of the next.
The alternation deoxyribose-phosphate-deoxyribose and so on is referred to as the
phosphate-sugar backbone of DNA. The chemical bond that holds the backbone together is
the phosphodiester bond, a strong covalent bond. The nitrogen base is attached to
deoxyribose at the first carbon position. Four types of bases are found in DNA:
adenine (A), cytosine (C), guanine (G) and thymine (T). The bases are classified into two
structural families:
• purine - adenine and guanine
• pyrimidine - thymine and cytosine
The DNA molecule is a polymer of four kinds of deoxyribonucleotides joined by
phosphodiester bonds. The two long chains of the DNA molecule are held together by
complementary base pairs. Three hydrogen bonds are present between the complementary
bases G and C. Two hydrogen bonds are present between the complementary bases A
and T.
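As a small worked example of the pairing rule, the number of hydrogen bonds holding a fully paired double strand together can be computed from one strand alone. The helper below is a hypothetical sketch, not part of any library: each G or C contributes three bonds and each A or T contributes two.

```python
# Hydrogen bonds contributed by each base when paired with its complement.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between a strand and its full complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(count_hydrogen_bonds("ACTG"))  # 2 + 3 + 2 + 3 bonds
```

A strand richer in G and C therefore binds its complement with more hydrogen bonds than an A/T-rich strand of the same length.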
DNA's ability to store and process information inspired the idea of DNA computing.
DNA computing can be carried out in vivo or in vitro, independently of traditional
silicon-based technology. The advantages of DNA computing are as follows:
• massive parallelism: in an in vitro assay, 10^18 processors working in parallel can easily
be handled.
• potential for information storage: existing storage media store data at a density of
approximately 1 bit per 10^12 cubic nanometers, while DNA requires only 1 cubic nanometer
to store 1 bit of data, that is, a trillion times less space.
• speed: although the elementary operations of a DNA computer are slow compared to those
of an electronic computer, their parallelism strongly prevails, so that in certain models 330
trillion operations per second can be performed, which is more than 100,000 times the speed
of the fastest supercomputer existing today.
• energy efficiency: DNA can perform 10^19 operations using 1 joule of energy, while
a supercomputer manages only 10^10 operations, making it 10^9 times less energy efficient.
In DNA computers the energy consumed by DNA strand synthesis and PCR should also be
small compared to that used up by a supercomputer.
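The ratios quoted in this list can be checked with simple arithmetic. The sketch below assumes the figures are powers of ten (10^12 cubic nanometers per bit, 10^19 and 10^10 operations per joule), which is the reading under which the "trillion times" claim holds. The variable names are illustrative.

```python
# Storage density: volume needed per bit, in cubic nanometers.
conventional_nm3_per_bit = 10**12  # assumed reading of the quoted figure
dna_nm3_per_bit = 1
density_gain = conventional_nm3_per_bit // dna_nm3_per_bit

# Energy efficiency: operations per joule.
dna_ops_per_joule = 10**19            # assumed reading of the quoted figure
supercomputer_ops_per_joule = 10**10  # assumed reading of the quoted figure
efficiency_gain = dna_ops_per_joule // supercomputer_ops_per_joule

print(density_gain, efficiency_gain)
```

Under these assumptions DNA uses a trillion (10^12) times less space per bit and performs a billion (10^9) times more operations per joule.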
Deoxyribonucleic acid, or DNA, serves as the chemical vocabulary in which the genetic
makeup of every living organism on Earth is expressed. It is composed of long strings of
nucleotides, which themselves are simple molecules, each made up of a phosphate group, a
sugar, and a base. While each nucleotide’s sugar and phosphate play an important but
supporting role—namely, bonding, respectively, with the phosphate group or the sugar of
another nucleotide, forming part of a DNA strand’s “backbone”—it is by its base that each
nucleotide is characterized, and it is in the bases of the DNA nucleotides that genetic
information is stored. Unfortunately, a detailed discussion of DNA’s role in genetics is far
beyond the scope of this report.
For DNA computation, we are mainly interested in DNA’s structure and its binding
properties.
For simplicity, let us identify DNA nucleotides simply by their bases. There are four of these:
adenine (A), thymine (T), cytosine (C), and guanine (G). As nucleotides bind together to
form DNA strands, sequences of these bases are formed. A short DNA strand is called an
oligonucleotide or an n-mer (where n is the number of nucleotides in the strand). Here, we
represent an oligonucleotide as the string of the letters corresponding to its base sequence
(e.g. ACTG, as in Figure 1). The end of an oligonucleotide with a free phosphate group is
referred to as the 5’ end, while the end with a free sugar is called the 3’ end. One of DNA’s
interesting and important properties is its propensity to form double strands via bonding
between the nucleotide bases of two single DNA strands. However, this bonding occurs only
in a very specific manner. In particular, A can bond only to T and vice versa,
while, likewise, C can bond only to G and vice versa. Thus, any given single-stranded n-mer
has a unique complementary single-stranded n-mer—called its Watson-Crick complement—
to which it can bond. This concept is illustrated in Figure 1. For a given oligonucleotide O,
we represent its Watson-Crick complement as Ō (O with an overbar).
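The pairing rule makes the Watson-Crick complement easy to compute. The following sketch uses hypothetical function names and returns the complement aligned base by base with the input, as the two strands are drawn in Figure 1. A second helper gives the same strand read in the opposite direction, which is often convenient because the two strands of a duplex run antiparallel.

```python
# Each base bonds only with its partner: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick_complement(oligo: str) -> str:
    """Base-by-base complement, aligned position for position with the input."""
    return "".join(PAIR[base] for base in oligo.upper())


def reverse_complement(oligo: str) -> str:
    """The same complementary strand, read in the opposite direction."""
    return watson_crick_complement(oligo)[::-1]


print(watson_crick_complement("ACTG"))  # the 4-mer used in Figure 1
print(reverse_complement("ACTG"))
```

Applying watson_crick_complement twice returns the original strand, which reflects the uniqueness of the complement stated above.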
In any model of computation, there must exist a set of operations, and in the case of DNA
computation, the fundamental operations are used to perform various manipulations of DNA
strands. Without being too specific, the operations most commonly used in the DNA
computing literature are:
• Synthesize: synthesize a desired oligonucleotide
• Mix: mix together two test tubes of DNA to perform a union
Figure 1: A 4-mer ACTG bonded to its Watson-Crick complement TGAC to form a double
strand of DNA. Here, the pentagons represent each nucleotide’s sugar, and the small circles
represent their phosphate groups. The puzzle-piece parts are the bases. Note how only A and
T fit together and likewise with C and G.
Before specifying the operations of the sticker model, we define a test tube to be a multiset
of memory complexes. The general operations on the memory complexes in a
test tube are merge, separate, set and clear.
merge: Two test tubes are combined into one. This is just mixing the solutions of two test
tubes.
separate: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces two test tubes
+(T, i) and −(T, i), where +(T, i) (−(T, i)) contains all memory complexes whose ith substrand
is on (resp. off).
set: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube
set(T, i) in which each memory complex has its ith substrand on.
clear: Given a test tube T and an integer i, 1 ≤ i ≤ k, this produces another test tube
clear(T, i) in which each memory complex has its ith substrand off.
The input or initial test tube will be a library of memory complexes. In particular, a (k, l)
library, 1 ≤ l ≤ k, consists of memory complexes with k substrands in which the last k − l
substrands are off, whereas the first l substrands are on and off in all possible ways. Thus,
a (k, l) library contains 2^l different memory complexes.
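The four operations and the library can be simulated in software to make their semantics concrete. The sketch below is only a model under stated assumptions: a memory complex is a tuple of k booleans (True meaning the substrand is on), a test tube is a collections.Counter used as a multiset, and all function names are hypothetical. The state of the last k − l substrands of the library is left as a parameter, tail, which defaults to off.

```python
from collections import Counter
from itertools import product


def merge(tube_a, tube_b):
    """Combine two test tubes into one (multiset union with multiplicities)."""
    return tube_a + tube_b


def separate(tube, i):
    """Split a tube into (+(T, i), -(T, i)) by the state of substrand i (1-based)."""
    on = Counter({m: n for m, n in tube.items() if m[i - 1]})
    off = Counter({m: n for m, n in tube.items() if not m[i - 1]})
    return on, off


def _write(tube, i, value):
    """Return a new tube in which substrand i of every complex equals value."""
    result = Counter()
    for complex_, count in tube.items():
        result[complex_[:i - 1] + (value,) + complex_[i:]] += count
    return result


def set_bit(tube, i):
    """set(T, i): turn substrand i on in every memory complex."""
    return _write(tube, i, True)


def clear_bit(tube, i):
    """clear(T, i): turn substrand i off in every memory complex."""
    return _write(tube, i, False)


def library(k, l, tail=False):
    """(k, l) library: first l substrands in all 2**l states, last k - l fixed to tail."""
    return Counter(bits + (tail,) * (k - l)
                   for bits in product((False, True), repeat=l))


lib = library(3, 2)
plus, minus = separate(lib, 1)
print(len(lib), sum(plus.values()), sum(minus.values()))
```

In the laboratory each operation acts on every strand in the tube at once, which is the source of the parallelism. The loops here only mimic that effect sequentially.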
Splicing systems were proposed by Tom Head [8]. A splicing system captures mathematically
the two molecular operations, cut and ligate, introduced in Chapter 2. The mathematical
model was introduced and studied before Adleman’s experiment. Many results, including the
universality of splicing systems, were obtained in [7, 19, 20]. In several organisms the DNA
present is circular. If both circular and linear DNA strands are present, the mathematical
analysis becomes much more complicated because several different possibilities must be
handled. Despite that, various results concerning circular splicing have been obtained in
[25, 21]. In Chapter 5 we describe a special type of circular splicing introduced in
[kn:head], which has been used to solve various combinatorial problems. From a practical
point of view this splicing seems feasible because the particular type of circular DNA strand
is easily available and the operations on it are simple. This splicing has been used in [24]
to break a public key cryptosystem.