http://www.cosmicfingerprints.com/blog/ee/

1.1 7 Biology Myths an Electrical Engineer Would Never Tolerate

As an electrical engineer, I am utterly appalled at the intellectual slop that passes for science in biology. Engineers would lose their jobs in droves if they tolerated the mushy thinking and lack of rigor that is routine in the life sciences.

Before I elaborate on this, some background. 10 years ago I couldn't have imagined I would become interested in DNA, biology, evolution or any such thing. Biology in high school was b-o-r-i-n-g. Chemistry in college was a hard slog. I got my degree in electrical engineering, specializing in communications and control systems. I graduated and developed analog circuits, then worked as an acoustical engineer and designed the speakers in the 1994 Ford Probe, the 1995 Acura Vigor, the 1995 Jeep Cherokee and the 1996 Honda Civic. I left acoustics and pursued digital communications: sold embedded networking hardware, software and ICs in the automation and robotics industry, fought digital networking standards battles in manufacturing, and wrote an Ethernet book published by the world's #1 technical society for process control engineers.

And now here I am discussing DNA and evolution, telling you about scientific discoveries so new you can't buy books about them in the bookstore. I'm loving it.

As an outsider to the biology industry I bring a very particular perspective: that of an engineer who has done digital network design (very exact), analog circuit design (a quasi-art form), and acoustics (extremely complex and messy).

All industries become incestuous as they age. They resist change. All professions are run by good ol' boys' clubs. In every industry, innovations almost NEVER come from the inside. Novel approaches usually come from outsiders, and external innovations are opposed by the old guard because they threaten the status quo.

Bill Gates was a complete outsider to the computer business.
Larry and Sergey, the founders of Google, were complete foreigners to the search engine game. (Early on, they tried to sell their search technology to Yahoo for $1 million, but Yahoo turned them down.) Fred Smith, founder of Federal Express, was a complete virgin in the shipping industry. Ray Kroc of McDonald's wasn't a restaurant veteran; he was a milkshake-machine salesman. All these people had an outsider's point of view that enabled them to see what insiders were blind to.
Like these men, I am a total outsider in biology. Yet despite the fact that I wouldn't pass a test on retroviruses or organic chemistry, as an EE I see certain things with crystal clarity that biologists are blind to.

One reason is that in electrical engineering, theory matches reality better than it does in almost any other engineering discipline. Examples: In metallurgy, when you predict the failure load of a steel beam, you're lucky if your guess is within 10%. In chemical engineering, a 5-10% error factor is considered good for many reactions. Civil engineers over-design bridges by 50% to 100% just to be safe. But a model of an electrical circuit or computer chip is often accurate to within 1%, and sometimes 0.01%.

Because you can't see electricity and shouldn't touch it, EE is abstract and very mathematical. It's also rigorous. I can't tell you how many times in my engineering classes the professor would be explaining something like, say, the behavior of a semiconductor, and would derive the calculus equation from scratch.

Of the appliances in your house, which ones work exactly the way they're supposed to? Your car doesn't. Your dishwasher doesn't. Your refrigerator needs new parts every few years. The mechanical stuff is prone to problems. But your TV does exactly what it's supposed to, for years. So does your iPod and your microwave oven and your clock radio and your cell phone. You can thank an EE for that.

For this reason, EEs have very high expectations of theoretical models, because the model has to be built and it has to work. Engineers don't have much tolerance for B.S.

Today: 7 urban legends biologists believe, but an engineer would never tolerate:
1. Random mutations are usually neutral or harmful, but occasionally they confer a benefit to an organism. Natural selection filters out the harmful mutations, causing species to evolve.

This is THE central dogma of neo-Darwinism, and it is allegedly accepted by virtually all scientists. You will find it in literally 1,000 textbooks and 10,000 websites. To the average biologist, and to the average man on the street, it sounds perfectly plausible. And I fully understand why people believe it.

But I'm an EE. I know that the information in DNA is a signal. By definition, random mutations are noise.
Telling a communications engineer that adding noise to a signal sometimes creates new, useful data structures is like telling a nurse you can occasionally cure a common cold by swallowing rat poison. This is absurd! You'll be hard pressed to find any communications engineer who, upon examining this claim, would agree with it.

Have you ever had a data glitch on your computer that improved your files? Ever? There is not one single principle or practice in engineering that would ever suggest this is actually true. All the natural selection in the world is powerless without a beneficial mutation, and you'll never get a major benefit from accidental copying errors. The mutations that drive evolution are systematic and directed, not accidental.

2. 97% of your DNA is junk - an accumulation of evolutionary leftovers from random mutations over millions of years.

The only reason anyone believes lie #2 is that they believe lie #1. Here's how any rational person can quickly figure out that #2 is B.S.: Human DNA holds 750 megabytes of data, the same as a compact disc. If 97% of your DNA is junk, then the 3% that isn't junk is about 22 megabytes. In other words, they're implying that the entire plan for a human body takes up only 22 megabytes of storage space. Heck, the Windows folder on my PC - the directory that contains most of the operating system - is 27 gigabytes. Does anyone actually think Microsoft Windows Vista is more sophisticated than the human body? Bill Gates sure doesn't. The fact that a plan for an entire human body can even be contained on one CD is nothing short of a miracle of data compression.

Actual fact: DNA is not 3% efficient. It's more like 1,000% efficient. The same gene can be used in completely different ways by a dozen different processes. The result is a level of data density that software engineers only dream of. Engineers see profound elegance where biologists see junk. Which perspective is more in keeping with the aims of science?

3. You only need 3 things for evolution to occur: heredity, variation and selection.

Tufts University philosopher and prominent atheist Daniel Dennett famously said this. He would never have said it if he had an engineering degree. If this were true, computer viruses (which have heredity, variation and selection) would mutate all by themselves and develop resistance to anti-virus software. They don't.
If this were true, the pirated copy of a copy of a copy of a copy of Windows XP or the Eagles' Hotel California that you can buy on a street corner for $2 in China would occasionally be superior to the original. It never is.

If this were true, Bill Gates wouldn't have to employ 10,000 programmers in Redmond, Washington. He would just buy truckloads of computers, add random errors to a billion copies of Windows and filter them through natural selection. Nobody writes software that way. Nobody. Have you ever wondered why?

Most biologists think evolution just happens automatically. They say all you need is time and a lot of raw materials and it will just happen. So why don't computer programs ever evolve by themselves? They don't, and they never will - not unless they're programmed to do so. Evolution is not a given; at some level it's always a design feature. Software programmers will tell you that self-adaptive code is profoundly difficult to write. It never happens by accident.
4. Biology is nothing more than sophisticated physics and chemistry.

That's like saying the Internet is nothing more than sophisticated copper wire and silicon chips. I'm an e-commerce consultant. I practically live on the Internet, and I have conversations with people about it all the time. Nobody I talk to ever describes the Internet that way. Do you? You talk about things like email and Google and Facebook. You tell your friend about the YouTube video where the guy goes to every country in the world and does his little dance jig, and about the latest gaffe by Sarah Palin. All those things are INFORMATION.

90% of electrical engineering is concerned with controlling and processing information. Only a small part of EE is concerned with things like motors and generators and watts and horsepower - and even power equipment is controlled by information. All the interesting things you do with electricity involve signals or digital codes: a temperature measurement, a text message, a radio transmission. The SOFTWARE is more interesting than the hardware.

So it is with DNA. Chemicals are just the hardware. Until the biology profession accepts that the real power in biology is in the information - the software, not the chemicals - it will continue to slam into brick walls, put forth evolutionary theories that make wrong predictions, and get nowhere in origin-of-life research.
Information never improves by accident. Information evolves only through highly structured processes.

5. Genetic algorithms prove Darwinian evolution.

A genetic algorithm (GA) is a computer program that modifies code and then evaluates the code against some pre-programmed goal, keeping the winners and discarding the losers. GAs refine software programs through an evolution-like process. GAs are not a be-all-end-all by any means, and they have limited application. But they are useful.

Some years ago Richard Dawkins wrote a software program that started with the following garbage text:

WDLTMNLT DTJBKWIRZREZLMQCO P

After only 43 generations of mutating the text and keeping the copies that best matched its target, the program reached its pre-programmed goal:

METHINKS IT IS LIKE A WEASEL

Traditional Darwinian evolution by definition has no goals, just blind natural selection. Dawkins' program has a definite goal and is programmed to reach it. This program has nothing to do with formal Darwinian evolution. It's intelligent evolution.

Every single genetic algorithm I've ever seen, no matter how simple or complicated, works only if it has pre-programmed goals - which requires both a program and objectives. I've never seen a GA that actually mirrored Darwinian evolution. They always sneak in some element of design, which only adds to the reasons why the neo-Darwinian theory of purposeless random events is wrong. Real-world evolution, like every GA, is pre-programmed and has goals of some sort loaded in advance. I've never seen an exception.
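The weasel program is simple enough to reconstruct. The sketch below is a minimal version of my own; the mutation rate and batch size are illustrative assumptions, not Dawkins' original parameters. Note that it converges only because the target string is built into the fitness test, which is exactly the point at issue above.

```python
import random

# A minimal reconstruction of Dawkins' "weasel" program. The mutation
# rate and batch size here are illustrative assumptions, not his originals.
TARGET = "METHINKS IT IS LIKE A WEASEL"
CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def score(text):
    """Number of positions matching the pre-programmed target."""
    return sum(a == b for a, b in zip(text, TARGET))

def mutate(parent, rate=0.05):
    """Copy the parent, randomly altering each character with some probability."""
    return "".join(random.choice(CHARS) if random.random() < rate else c
                   for c in parent)

random.seed(0)
parent = "".join(random.choice(CHARS) for _ in TARGET)  # random start
generations = 0
while parent != TARGET:
    # Selection step: breed mutated copies, keep the best match to the goal.
    parent = max((mutate(parent) for _ in range(100)), key=score)
    generations += 1

print(f"Reached the target in {generations} generations")
```

Removing the reference to TARGET inside score() is precisely what the program cannot survive: with no pre-programmed goal, there is nothing to select toward.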
6. The human eye is a pathetic design. It's got a big blind spot and the wires are installed backwards.

There are many, many variations on this argument. It's just another version of "junk DNA."

When I was a manufacturing production manager, I had to put an indicator lamp assembly for a piece of equipment into production. The design had a light bulb and 2 identical resistors, which I thought was stupid. I suggested that we replace the 2 resistors with one resistor of twice the value. This would save money and space. I told the customer the design was obviously lousy.
The engineer got angry and almost took his business elsewhere. Then my boss spent 30 minutes lecturing me. He reminded me that my job was to put the customer's product into production, not to insult him with my warped critique of his design skills. What I didn't know was that 600 volts would arc across one resistor, but not across two. A second, redundant resistor was an elegant way to solve that problem, and it only cost 2 cents.

I learned the hard way that when you criticize a design, you may have a very incomplete picture of the many constraints the designer has to work within. Designs always have delicate tradeoffs. Some have amazing performance but are extremely difficult to manufacture. Sometimes a minor change in material would make a huge improvement, but the material is unavailable. Sometimes you have to make a compromise between 15 competing priorities, and sometimes people have no appreciation for how difficult that maze is to navigate.

I am not saying that there are no sub-optimal designs in biology - I'm sure there are lots of them. Furthermore, I do believe that life followed an evolutionary process, and many designs are best guesses engineered by the organism's ancestors. But human beings must be very careful not to proudly assert that we could obviously do better. We don't know that. We do not understand what's involved in designing an eye, because we've never built one.

My friend, if you lose your eye, there's not a single arrogant scientist in the world who can build you a new one - especially not the scientists who try to tell you why the design of the eye is pathetic. If I were selecting an eye surgeon, I'd look for one who has deep respect for the eye, not disdain for it. How about you?

Every engineer knows that you never truly know how something works until you can build it. Merely taking it apart is not enough. Until we can DESIGN eyes ourselves, we must be very cautious about what we say.
The scientist must ALWAYS be humble in the face of nature, and you should be wary of anyone who is not.
7. There is no such thing as purpose in nature. There is only the appearance of purpose.

"Teleology" is the scientific term for purpose in nature. Atheism denies teleology in the universe, and for this reason some biologists have forbidden their students to use purposeful language. In 1974 Ernst Mayr illustrated it like this:

1. The Wood Thrush migrates in the fall in order to escape the inclemency of the weather and the food shortages of the northern climates.

2. The Wood Thrush migrates in the fall and thereby escapes the inclemency of the weather and the food shortages of the northern climates.

Statement #1 is purposeful; statement #2 is not. Mayr does fancy footwork in order to avoid any reference to design in biology. (It also converts all of his writing into colorless passive sentences, and any good writer will tell you passive language is a sign of mushy thinking.) The famous biologist J B S Haldane joked, "Teleology is like a mistress to a biologist: he cannot live without her, but he's unwilling to be seen with her in public."

Everything in biology is purposeful, which is precisely why biology is fundamentally different from chemistry. Chemicals have no purpose. Organisms do. You cannot formulate a coherent description of life if you deny purpose. For proof of this, look no further than the genetic code. Every codon in DNA maps to an amino acid that it is SUPPOSED TO make - but an error is possible. It is not possible even to talk about a code at all without acknowledging purpose. Purpose is absolutely implicit in every strand of DNA in every organism in the world.

In his book Perceptual Control Theory, William Powers explains that the study of any goal-directed (control feedback) system is fundamentally different from the study of rocks or chemicals or magnetic fields or anything purely physical. The failure to acknowledge this has wreaked all kinds of havoc in science for 150 years. Even something as simple as a thermostat cannot be understood if you see it as only an assembly of molecules. A thermostat is programmed to hold your room at a certain temperature. The thermostat's purpose can only be understood from a top-down point of view. It has a goal.

In electrical engineering, the top-down nature of information is described by something we call the OSI 7-layer model. Simplified explanation: the 7-layer model says that in your computer, there's an Ethernet cable that connects you to the Internet.
The copper wire and the voltage on that wire are Layer 1, the physical layer. Layer 2 is the 1s and 0s that the voltage represents. Layers 3, 4, 5 and 6 are the operating system, and Layer 7 is your spreadsheet or email program or web browser - the application layer.

When you send me an email, information is encoded from the top down and sent through your Ethernet cable. When I receive your email, information is decoded from the bottom up, starting with the signal on the cable, and I read your message on my screen. ALL information is organized this way - in a top-down hierarchy. The wire has its purpose. The 1s and 0s have their purpose. The operating system has a purpose, my email program has a purpose, and your message has a purpose.
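The top-down encoding and bottom-up decoding just described can be sketched as a toy layered stack. This is an illustration only: the three layer functions and the two-byte length header are my own inventions, not the real OSI protocols.

```python
# Toy illustration of layered encoding/decoding, loosely modeled on the
# stack described above. The layers here are illustrative stand-ins.

def app_encode(message: str) -> bytes:          # "layer 7": application
    return message.encode("utf-8")

def transport_encode(payload: bytes) -> bytes:  # middle layers: length header
    return len(payload).to_bytes(2, "big") + payload

def physical_encode(frame: bytes) -> str:       # "layer 1": bits on the wire
    return "".join(f"{byte:08b}" for byte in frame)

def physical_decode(bits: str) -> bytes:
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def transport_decode(frame: bytes) -> bytes:
    length = int.from_bytes(frame[:2], "big")
    return frame[2:2 + length]

def app_decode(payload: bytes) -> str:
    return payload.decode("utf-8")

# Sender: information flows down the stack to the wire...
wire = physical_encode(transport_encode(app_encode("hello")))
# ...receiver: and back up it, layer by layer.
received = app_decode(transport_decode(physical_decode(wire)))
print(received)  # hello
```

Each layer has its own job (its own "purpose," in the article's terms), and neither endpoint needs to know how the other layers do theirs.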
You cannot deny purpose in computers or biology without contradicting yourself two minutes later. Even a person who denies purpose is purposefully denying it.

Everything I just told you, I absolutely know to be true as a result of my education and experience as an engineer. Which is why I'm willing to make ballsy proclamations like: Darwinism as we currently know it is going to crumble in the next 2-5 years.

Yes, I know that might sound ridiculous. Some people scoff at it. But people scoffed at the idea that communism would fall. They were quick to remind you that every time someone tried to cross the Berlin Wall, they got shot by the guards in the tower. But then one day someone made it across and nobody opened fire. Then another. Then another. It didn't take long before that wall became rubble. The fall of communism was surprisingly swift and thorough.

Just a few years ago, people mocked the idea that real estate prices would stop rising. But those who had a deep understanding of the inside story of both of those industries saw the cracks forming. (My friend Nathan, who was a mortgage broker at the time, tells me about stacks of paper being sold that no sane investor would touch with a 10-foot pole.)

Darwinism as we know it CANNOT stand under the weight of 21st-century DNA research. It's impossible - I've read the literature. Amazon is absolutely littered with books, written from every imaginable point of view, both religious and non-religious, pointing to the creaking, groaning edifice of neo-Darwinism. It is inevitable that it will fall, and it's not going to be long. It will be replaced by an algorithmic model of evolution.

BOLD HYPOTHESIS: When biologists accept what electrical engineers know about information, a whole bunch of problems in biology will be solved:

1. The random mutation theory will be discarded. It will be replaced with transposition, natural genetic engineering, horizontal gene transfer and genome doubling.
Suddenly evolution will make sense, because it will be understood as an engineered process, not random accident.

2. We'll discover that what was originally thought to be junk DNA is actually the heart of the most sophisticated database format ever devised.

3a. Evolution will not be taken for granted but deeply appreciated as an utterly ingenious mechanism, pre-programmed into living things. As software engineers replicate the evolutionary algorithm in computer programs, we'll achieve huge breakthroughs in artificial intelligence.
3b. Evolution is orchestrated at a very high level within the organism. It is controlled by a mechanism that is currently poorly understood - a mechanism that is beautifully efficient, elegant, fractal, and follows a very exact mathematical protocol. Bioinformatics will become the most rigorous discipline in engineering. The code of this protocol will be cracked because of the Human Genome Project and the public availability of DNA sequences. This discovery will lay the foundation of an entire new branch of computer science in the 21st century.

4. The physics-and-chemistry paradigm of biology will be replaced with a bioinformatics paradigm. Theories of evolution and the origin of life will make much more successful predictions.

5. Neo-Darwinism will be discarded because biologists will recognize that biological evolution is just like genetic algorithms: it employs pre-programmed goals and educated guesses, not random chance.

6. Rather than assuming designs in biology are pathetic or stupid, we'll discover deeper reasons why organisms are the way they are, and gain greater insights into the subtlety of living things.

7. Everything in biology makes sense once you understand that every single one of the 5 million trillion trillion cells on earth is purposeful and intentional, and that the original cells were designed to evolve and adapt.

Finally, I would like to suggest that there is nothing in the world that can teach us more about digital communications and software programming than DNA. DNA is an absolute gold mine, a treasure trove of insights into data storage, error correction, software architecture, robust design and fractal data compression. Every electrical engineering and computer science major should study it intensively. And there is much we engineers can learn from the biologists - because even the simplest living thing is more elegant than the greatest man-made supercomputer. As engineers and biologists begin to talk to each other, the 21st century will be amazing indeed.
Perry Marshall

P.S.: Innovations almost always come from outsiders. This means that those who read widely and embrace multiple disciplines - pockets of humanity that don't normally talk to each other - can enjoy long and prosperous careers as innovators. The watchword of 21st-century biology will be "interdisciplinary": the great mysteries will be solved by people who bring the expertise of other fields to bear on the biggest questions in science. My challenge to you: make a deliberate decision to step outside of your normal and familiar environment, and innovate. The world will reward you for it.
2.0 The Information Challenge
By Richard Dawkins
http://www.skeptics.com.au/publications/articles/the-information-challenge/

In September 1997, I allowed an Australian film crew into my house in Oxford without realising that their purpose was creationist propaganda. In the course of a suspiciously amateurish interview, they issued a truculent challenge to me to "give an example of a genetic mutation or an evolutionary process which can be seen to increase the information in the genome." It is the kind of question only a creationist would ask in that way, and it was at this point I tumbled to the fact that I had been duped into granting an interview to creationists - a thing I normally don't do, for good reasons. In my anger I refused to discuss the question further, and told them to stop the camera. However, I eventually withdrew my peremptory termination of the interview as a whole. This was solely because they pleaded with me that they had come all the way from Australia specifically in order to interview me. Even if this was a considerable exaggeration, it seemed, on reflection, ungenerous to tear up the legal release form and throw them out. I therefore relented.

My generosity was rewarded in a fashion that anyone familiar with fundamentalist tactics might have predicted. When I eventually saw the film a year later[1], I found that it had been edited to give the false impression that I was incapable of answering the question about information content[2]. In fairness, this may not have been quite as intentionally deceitful as it sounds. You have to understand that these people really believe that their question cannot be answered! Pathetic as it sounds, their entire journey from Australia seems to have been a quest to film an evolutionist failing to answer it.

With hindsight - given that I had been suckered into admitting them into my house in the first place - it might have been wiser simply to answer the question.
But I like to be understood whenever I open my mouth - I have a horror of blinding people with science - and this was not a question that could be answered in a soundbite. First you have to explain the technical meaning of "information." Then the relevance to evolution is complicated too - not really difficult, but it takes time. Rather than engage now in further recriminations and disputes about exactly what happened at the time of the interview (for, to be fair, I should say that the Australian producer's memory of events seems to differ from mine), I shall try to redress the matter now in constructive fashion by answering the original question, the "Information Challenge," at adequate length - the sort of length you can achieve in a proper article.

2.1 Information

The technical definition of "information" was introduced by the American engineer Claude Shannon in 1948. An employee of the Bell Telephone Company, Shannon was concerned to measure information as an economic commodity. It is costly to send messages along a telephone line. Much of what passes in a message is not information: it is redundant. You could save money by recoding the message to remove the redundancy. "Redundancy" was a second technical term introduced by Shannon, as the inverse of information. Both definitions were mathematical, but we can convey Shannon's intuitive meaning in words.
Redundancy is any part of a message that is not informative, either because the recipient already knows it (is not surprised by it) or because it duplicates other parts of the message. In the sentence "Rover is a poodle dog," the word "dog" is redundant because "poodle" already tells us that Rover is a dog. An economical telegram would omit it, thereby increasing the informative proportion of the message. "Arr JFK Fri pm pls mt BA Cncrd flt" carries the same information as the much longer, but more redundant, "I'll be arriving at John F Kennedy airport on Friday evening; please meet the British Airways Concorde flight." Obviously the brief, telegraphic message is cheaper to send (although the recipient may have to work harder to decipher it - redundancy has its virtues if we forget economics).

Shannon wanted to find a mathematical way to capture the idea that any message could be broken into the information (which is worth paying for), the redundancy (which can, with economic advantage, be deleted from the message because, in effect, it can be reconstructed by the recipient) and the noise (which is just random rubbish). "It rained in Oxford every day this week" carries relatively little information, because the receiver is not surprised by it. On the other hand, "It rained in the Sahara desert every day this week" would be a message with high information content, well worth paying extra to send. Shannon wanted to capture this sense of information content as "surprise value." It is related to the other sense - that which is not duplicated in other parts of the message - because repetitions lose their power to surprise. Note that Shannon's definition of the quantity of information is independent of whether it is true.

The measure he came up with was ingenious and intuitively satisfying. Let's estimate, he suggested, the receiver's ignorance or uncertainty before receiving the message, and then compare it with the receiver's remaining ignorance after receiving the message.
The quantity of ignorance-reduction is the information content. Shannon's unit of information is the bit, short for "binary digit." One bit is defined as the amount of information needed to halve the receiver's prior uncertainty, however great that prior uncertainty was (mathematical readers will notice that the bit is, therefore, a logarithmic measure).

In practice, you first have to find a way of measuring the prior uncertainty - that which is reduced by the information when it comes. For particular kinds of simple message, this is easily done in terms of probabilities. An expectant father watches the Caesarian birth of his child through a window into the operating theatre. He can't see any details, so a nurse has agreed to hold up a pink card if it is a girl, blue for a boy. How much information is conveyed when, say, the nurse flourishes the pink card to the delighted father? The answer is one bit - the prior uncertainty is halved. The father knows that a baby of some kind has been born, so his uncertainty amounts to just two possibilities - boy and girl - and they are (for purposes of this discussion) equal. The pink card halves the father's prior uncertainty from two possibilities to one (girl). If there'd been no pink card but a doctor had walked out of the operating theatre, shaken the father's hand and said "Congratulations old chap, I'm delighted to be the first to tell you that you have a daughter," the information conveyed by the 17-word message would still be only one bit.

2.2 Computer information

Computer information is held in a sequence of noughts and ones. There are only two possibilities, so each 0 or 1 can hold one bit. The memory capacity of a computer, or the storage capacity of a disc or tape, is often measured in bits, and this is the total number of 0s
or 1s that it can hold. For some purposes, more convenient units of measurement are the byte (8 bits), the kilobyte (1000 bytes or 8000 bits), the megabyte (a million bytes or 8 million bits) or the gigabyte (1000 million bytes or 8000 million bits). Notice that these figures refer to the total available capacity. This is the maximum quantity of information that the device is capable of storing. The actual amount of information stored is something else. The capacity of my hard disc happens to be 4.2 gigabytes. Of this, about 1.4 gigabytes are actually being used to store data at present. But even this is not the true information content of the disc in Shannon's sense. The true information content is smaller, because the information could be more economically stored. You can get some idea of the true information content by using one of those ingenious compression programs like Stuffit. Stuffit looks for redundancy in the sequence of 0s and 1s, and removes a hefty proportion of it by recoding - stripping out internal predictability. Maximum information content would be achieved (probably never in practice) only if every 1 or 0 surprised us equally.

Before data is transmitted in bulk around the Internet, it is routinely compressed to reduce redundancy. That's good economics. But on the other hand it is also a good idea to keep some redundancy in messages, to help correct errors. In a message that is totally free of redundancy, after there's been an error there is no means of reconstructing what was intended. Computer codes often incorporate deliberately redundant "parity bits" to aid in error detection. DNA, too, has various error-correcting procedures which depend upon redundancy. When I come on to talk of genomes, I'll return to the three-way distinction between total information capacity, information capacity actually used, and true information content.
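Both halves of this point - stripping redundancy out with a compressor, and deliberately putting redundancy back in as a parity bit - can be demonstrated in a few lines. The sketch uses Python's zlib in place of Stuffit; exact compressed sizes vary by library version, so only the direction of the comparison matters.

```python
import random
import zlib

# 1. Compression removes redundancy. A repetitive message shrinks
#    dramatically; the same amount of random "noise" barely shrinks at all.
redundant = b"It rained in Oxford every day this week. " * 40
random.seed(1)
noise = bytes(random.randrange(256) for _ in range(len(redundant)))

print(len(redundant), "->", len(zlib.compress(redundant)))  # big reduction
print(len(noise), "->", len(zlib.compress(noise)))          # almost none

# 2. Deliberate redundancy aids error detection: one even-parity bit per
#    word lets the receiver detect (though not locate) a single flipped bit.
def add_parity(bits):
    return bits + [sum(bits) % 2]   # make the total number of 1s even

def parity_ok(word):
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1, 0, 0, 1])
print(parity_ok(word))   # True: word arrived intact
word[3] ^= 1             # a one-bit error in transit
print(parity_ok(word))   # False: the error is detected
```

The incompressible noise is already near its maximum information content in Shannon's sense; the repetitive message's true information content is far below its stored size.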
It was Shannon's insight that information of any kind, no matter what it means, no matter whether it is true or false, and no matter by what physical medium it is carried, can be measured in bits, and is translatable into any other medium of information. The great biologist J B S Haldane used Shannon's theory to compute the number of bits of information conveyed by a worker bee to her hivemates when she "dances" the location of a food source (about 3 bits to tell about the direction of the food and another 3 bits for the distance of the food). In the same units, I recently calculated that I'd need to set aside 120 megabits of laptop computer memory to store the triumphal opening chords of Richard Strauss's Also Sprach Zarathustra (the 2001 theme), which I wanted to play in the middle of a lecture about evolution. Shannon's economics enable you to calculate how much modem time it'll cost you to e-mail the complete text of a book to a publisher in another land. Fifty years after Shannon, the idea of information as a commodity, as measurable and interconvertible as money or energy, has come into its own.

2.3 DNA information

DNA carries information in a very computer-like way, and we can measure the genome's capacity in bits too, if we wish. DNA doesn't use a binary code, but a quaternary one. Whereas the unit of information in the computer is a 1 or a 0, the unit in DNA can be T, A, C or G. If I tell you that a particular location in a DNA sequence is a T, how much information is conveyed from me to you? Begin by measuring the prior uncertainty. How many possibilities are open before the message "T" arrives? Four. How many possibilities remain after it has arrived? One. So you might think the information transferred is four bits, but actually it is two. Here's why (assuming that the four letters are equally probable, like the four suits in a pack of cards). Remember that Shannon's metric is concerned with the most
economical way of conveying the message. Think of it as the number of yes/no questions that you'd have to ask in order to narrow down to certainty, from an initial uncertainty of four possibilities, assuming that you planned your questions in the most economical way. "Is the mystery letter before D in the alphabet?" No. That narrows it down to T or G, and now we need only one more question to clinch it. So, by this method of measuring, each letter of the DNA has an information capacity of 2 bits. Whenever the prior uncertainty of the recipient can be expressed as a number of equiprobable alternatives N, the information content of a message which narrows those alternatives down to one is log2 N (the power to which 2 must be raised in order to yield the number of alternatives N). If you pick a card, any card, from a normal pack, a statement of the identity of the card carries log2 52, or 5.7 bits of information. In other words, given a large number of guessing games, it would take 5.7 yes/no questions on average to guess the card, provided the questions are asked in the most economical way. The first two questions might establish the suit. ("Is it red?" "Is it a diamond?") The remaining three or four questions would successively divide and conquer the suit ("Is it a 7 or higher?" etc.), finally homing in on the chosen card. When the prior uncertainty is some mixture of alternatives that are not equiprobable, Shannon's formula becomes a slightly more elaborate weighted average, but it is essentially similar. By the way, Shannon's weighted average is the same formula as physicists have used, since the nineteenth century, for entropy. The point has interesting implications but I shall not pursue them here.

2.4 Information and evolution

That's enough background on information theory. It is a theory which has long held a fascination for me, and I have used it in several of my research papers over the years.
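The log2 N arithmetic above is easy to verify; here is a minimal sketch in Python (not from the essay, just checking the numbers):

```python
import math

# Bits conveyed when N equiprobable alternatives narrow down to one.
def bits(n):
    return math.log2(n)

print(bits(4))             # one DNA letter (T, A, C or G): 2.0 bits
print(round(bits(52), 1))  # naming one playing card: 5.7 bits

# Shannon's weighted average for alternatives that are NOT
# equiprobable: H = -sum(p * log2 p). It reduces to log2 N when
# every probability equals 1/N.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25] * 4))  # equiprobable DNA letters: 2.0 bits
```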
Let's now think how we might use it to ask whether the information content of genomes increases in evolution. First, recall the three-way distinction between total information capacity, the capacity that is actually used, and the true information content when stored in the most economical way possible. The total information capacity of the human genome is measured in gigabits. That of the common gut bacterium Escherichia coli is measured in megabits. We, like all other animals, are descended from an ancestor which, were it available for our study today, we'd classify as a bacterium. So perhaps, during the billions of years of evolution since that ancestor lived, the information capacity of our genome has gone up by about three orders of magnitude (powers of ten), about a thousandfold. This is satisfyingly plausible and comforting to human dignity. Should human dignity feel wounded, then, by the fact that the crested newt, Triturus cristatus, has a genome capacity estimated at 40 gigabits, an order of magnitude larger than the human genome? No, because, in any case, most of the capacity of the genome of any animal is not used to store useful information. There are many nonfunctional pseudogenes (see below) and lots of repetitive nonsense, useful for forensic detectives but not translated into protein in the living cells. The crested newt has a bigger hard disc than we have, but since the great bulk of both our hard discs is unused, we needn't feel insulted. Related species of newt have much smaller genomes. Why the Creator should have played fast and loose with the genome sizes of newts in such a capricious way is a problem that creationists might like to ponder. From an evolutionary point of view the explanation is simple (see The Selfish Gene pp 44-45 and p 275 in the Second Edition).
2.5 Gene duplication

Evidently the total information capacity of genomes is very variable across the living kingdoms, and it must have changed greatly in evolution, presumably in both directions. Losses of genetic material are called deletions. New genes arise through various kinds of duplication. This is well illustrated by haemoglobin, the complex protein molecule that transports oxygen in the blood. Human adult haemoglobin is actually a composite of four protein chains called globins, knotted around each other. Their detailed sequences show that the four globin chains are closely related to each other, but they are not identical. Two of them are called alpha globins (each a chain of 141 amino acids), and two are beta globins (each a chain of 146 amino acids). The genes coding for the alpha globins are on chromosome 11; those coding for the beta globins are on chromosome 16. On each of these chromosomes, there is a cluster of globin genes in a row, interspersed with some junk DNA. The alpha cluster, on chromosome 11, contains seven globin genes. Four of these are pseudogenes, versions of alpha disabled by faults in their sequence and not translated into proteins. Two are true alpha globins, used in the adult. The final one is called zeta and is used only in embryos. Similarly the beta cluster, on chromosome 16, has six genes, some of which are disabled, and one of which is used only in the embryo. Adult haemoglobin, as we've seen, contains two alpha and two beta chains. Never mind all this complexity. Here's the fascinating point. Careful letter-by-letter analysis shows that these different kinds of globin genes are literally cousins of each other, literally members of a family. But these distant cousins still coexist inside our own genome, and that of all vertebrates. On the scale of the whole organism, the vertebrates are our cousins too.
The tree of vertebrate evolution is the family tree we are all familiar with, its branch-points representing speciation events: the splitting of species into pairs of daughter species. But there is another family tree occupying the same timescale, whose branches represent not speciation events but gene duplication events within genomes. The dozen or so different globins inside you are descended from an ancient globin gene which, in a remote ancestor who lived about half a billion years ago, duplicated, after which both copies stayed in the genome. There were then two copies of it, in different parts of the genome of all descendant animals. One copy was destined to give rise to the alpha cluster (on what would eventually become chromosome 11 in our genome), the other to the beta cluster (on chromosome 16). As the aeons passed, there were further duplications (and doubtless some deletions as well). Around 400 million years ago the ancestral alpha gene duplicated again, but this time the two copies remained near neighbours of each other, in a cluster on the same chromosome. One of them was destined to become the zeta of our embryos, the other became the alpha globin genes of adult humans (other branches gave rise to the nonfunctional pseudogenes I mentioned). It was a similar story along the beta branch of the family, but with duplications at other moments in geological history. Now here's an equally fascinating point. Given that the split between the alpha cluster and the beta cluster took place 500 million years ago, it will of course not be just our human genomes that show the split, possessing alpha genes in a different part of the genome from beta genes. We should see the same within-genome split if we look at any other mammals, at birds, reptiles, amphibians and bony fish, for our common ancestor with all of them lived less than
500 million years ago. Wherever it has been investigated, this expectation has proved correct. Our greatest hope of finding a vertebrate that does not share with us the ancient alpha/beta split would be a jawless fish like a lamprey, for they are our most remote cousins among surviving vertebrates; they are the only surviving vertebrates whose common ancestor with the rest of the vertebrates is sufficiently ancient that it could have predated the alpha/beta split. Sure enough, these jawless fishes are the only known vertebrates that lack the alpha/beta divide. Gene duplication, within the genome, has a similar historic impact to species duplication (speciation) in phylogeny. It is responsible for gene diversity, in the same way as speciation is responsible for phyletic diversity. Beginning with a single universal ancestor, the magnificent diversity of life has come about through a series of branchings of new species, which eventually gave rise to the major branches of the living kingdoms and the hundreds of millions of separate species that have graced the earth. A similar series of branchings, but this time within genomes (gene duplications), has spawned the large and diverse population of clusters of genes that constitutes the modern genome. The story of the globins is just one among many. Gene duplications and deletions have occurred from time to time throughout genomes. It is by these, and similar means, that genome sizes can increase in evolution. But remember the distinction between the total capacity of the whole genome, and the capacity of the portion that is actually used. Recall that not all the globin genes are actually used. Some of them, like theta in the alpha cluster of globin genes, are pseudogenes, recognizably kin to functional genes in the same genomes, but never actually translated into the action language of protein. What is true of globins is true of most other genes.
Genomes are littered with nonfunctional pseudogenes, faulty duplicates of functional genes that do nothing, while their functional cousins (the word doesn't even need scare quotes) get on with their business in a different part of the same genome. And there's lots more DNA that doesn't even deserve the name pseudogene. It, too, is derived by duplication, but not duplication of functional genes. It consists of multiple copies of junk, tandem repeats, and other nonsense which may be useful for forensic detectives but which doesn't seem to be used in the body itself. Once again, creationists might spend some earnest time speculating on why the Creator should bother to litter genomes with untranslated pseudogenes and junk tandem repeat DNA.

2.6 Information in the genome

Can we measure the information capacity of that portion of the genome which is actually used? We can at least estimate it. In the case of the human genome it is about 2%, considerably less than the proportion of my hard disc that I have ever used since I bought it. Presumably the equivalent figure for the crested newt is even smaller, but I don't know if it has been measured. In any case, we mustn't run away with a chauvinistic idea that the human genome somehow ought to have the largest DNA database because we are so wonderful. The great evolutionary biologist George C Williams has pointed out that animals with complicated life cycles need to code for the development of all stages in the life cycle, but they only have one genome with which to do so. A butterfly's genome has to hold the complete information needed for building a caterpillar as well as a butterfly. A sheep liver fluke has six distinct stages in its life cycle, each specialised for a different way of life. We
shouldn't feel too insulted if liver flukes turned out to have bigger genomes than we have (actually they don't). Remember, too, that even the total capacity of genome that is actually used is still not the same thing as the true information content in Shannon's sense. The true information content is what's left when the redundancy has been compressed out of the message, by the theoretical equivalent of Stuffit. There are even some viruses which seem to use a kind of Stuffit-like compression. They make use of the fact that the RNA (not DNA in these viruses, as it happens, but the principle is the same) code is read in triplets. There is a frame which moves along the RNA sequence, reading off three letters at a time. Obviously, under normal conditions, if the frame starts reading in the wrong place (as in a so-called frame-shift mutation), it makes total nonsense: the triplets that it reads are out of step with the meaningful ones. But these splendid viruses actually exploit frame-shifted reading. They get two messages for the price of one, by having a completely different message embedded in the very same series of letters when read frame-shifted. In principle you could even get three messages for the price of one, but I don't know whether there are any examples.

2.7 Information in the body

It is one thing to estimate the total information capacity of a genome, and the amount of the genome that is actually used, but it's harder to estimate its true information content in the Shannon sense. The best we can do is probably to forget about the genome itself and look at its product, the phenotype, the working body of the animal or plant itself. In 1951, J W S Pringle, who later became my Professor at Oxford, suggested using a Shannon-type information measure to estimate complexity. Pringle wanted to express complexity mathematically in bits, but I have long found the following verbal form helpful in explaining his idea to students.
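As an aside before the lobster and the millipede: the two-messages-for-one trick of those frame-shifted viruses is easy to sketch. The short sequence below is invented purely for illustration, not taken from any real viral genome.

```python
# Read a sequence in non-overlapping triplets (codons), starting at
# a given frame offset (0, 1 or 2), keeping only complete codons.
def codons(seq, frame=0):
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

seq = "AUGGCUUAA"  # hypothetical 9-letter RNA-like string
print(codons(seq, 0))  # ['AUG', 'GCU', 'UAA']
print(codons(seq, 1))  # ['UGG', 'CUU'] -- same letters, a different message
```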
We have an intuitive sense that a lobster, say, is more complex (more "advanced", some might even say more "highly evolved") than another animal, perhaps a millipede. Can we measure something in order to confirm or deny our intuition? Without literally turning it into bits, we can make an approximate estimation of the information contents of the two bodies as follows. Imagine writing a book describing the lobster. Now write another book describing the millipede down to the same level of detail. Divide the word-count in one book by the word-count in the other, and you have an approximate estimate of the relative information content of lobster and millipede. It is important to specify that both books describe their respective animals down to the same level of detail. Obviously if we describe the millipede down to cellular detail, but stick to gross anatomical features in the case of the lobster, the millipede would come out ahead. But if we do the test fairly, I'll bet the lobster book would come out longer than the millipede book. It's a simple plausibility argument, as follows. Both animals are made up of segments: modules of bodily architecture that are fundamentally similar to each other, arranged fore-and-aft like the trucks of a train. The millipede's segments are mostly identical to each other. The lobster's segments, though following the same basic plan (each with a nervous ganglion, a pair of appendages, and so on) are mostly different from each other. The millipede book would consist of one chapter describing a typical segment, followed by the phrase "Repeat N times", where N is the number of segments. The lobster book would need a
different chapter for each segment. This isn't quite fair on the millipede, whose front and rear end segments are a bit different from the rest. But I'd still bet that, if anyone bothered to do the experiment, the estimate of lobster information content would come out substantially greater than the estimate of millipede information content. It's not of direct evolutionary interest to compare a lobster with a millipede in this way, because nobody thinks lobsters evolved from millipedes. Obviously no modern animal evolved from any other modern animal. Instead, any pair of modern animals had a last common ancestor which lived at some (in principle) discoverable moment in geological history. Almost all of evolution happened way back in the past, which makes it hard to study details. But we can use the length-of-book thought-experiment to agree upon what it would mean to ask the question whether information content increases over evolution, if only we had ancestral animals to look at. The answer in practice is complicated and controversial, all bound up with a vigorous debate over whether evolution is, in general, progressive. I am one of those associated with a limited form of "yes" answer. My colleague Stephen Jay Gould tends towards a "no" answer. I don't think anybody would deny that, by any method of measuring (whether bodily information content, total information capacity of the genome, capacity of the genome actually used, or true, Stuffit-compressed, information content of the genome) there has been a broad overall trend towards increased information content during the course of human evolution from our remote bacterial ancestors.
People might disagree, however, over two important questions: first, whether such a trend is to be found in all, or a majority of, evolutionary lineages (for example parasite evolution often shows a trend towards decreasing bodily complexity, because parasites are better off being simple); second, whether, even in lineages where there is a clear overall trend over the very long term, it is bucked by so many reversals and re-reversals in the shorter term as to undermine the very idea of progress. This is not the place to resolve this interesting controversy. There are distinguished biologists with good arguments on both sides. Supporters of "intelligent design" guiding evolution, by the way, should be deeply committed to the view that information content increases during evolution. Even if the information comes from God, perhaps especially if it does, it should surely increase, and the increase should presumably show itself in the genome. Unless, of course (for anything goes in such addle-brained theorising), God works his evolutionary miracles by nongenetic means. Perhaps the main lesson we should learn from Pringle is that the information content of a biological system is another name for its complexity. Therefore the creationist challenge with which we began is tantamount to the standard challenge to explain how biological complexity can evolve from simpler antecedents, one that I have devoted three books to answering (The Blind Watchmaker, River Out of Eden, Climbing Mount Improbable) and I do not propose to repeat their contents here. The "information challenge" turns out to be none other than our old friend: "How could something as complex as an eye evolve?" It is just dressed up in fancy mathematical language, perhaps in an attempt to bamboozle. Or perhaps those who ask it have already bamboozled themselves, and don't realise that it is the same old and thoroughly answered question.
2.8 The Genetic Book of the Dead

Let me turn, finally, to another way of looking at whether the information content of genomes increases in evolution. We now switch from the broad sweep of evolutionary history to the minutiae of natural selection. Natural selection itself, when you think about it, is a narrowing down from a wide initial field of possible alternatives, to the narrower field of the alternatives actually chosen. Random genetic error (mutation), sexual recombination and migratory mixing all provide a wide field of genetic variation: the available alternatives. Mutation is not an increase in true information content, rather the reverse, for mutation, in the Shannon analogy, contributes to increasing the prior uncertainty. But now we come to natural selection, which reduces the "prior uncertainty" and therefore, in Shannon's sense, contributes information to the gene pool. In every generation, natural selection removes the less successful genes from the gene pool, so the remaining gene pool is a narrower subset. The narrowing is nonrandom, in the direction of improvement, where improvement is defined, in the Darwinian way, as improvement in fitness to survive and reproduce. Of course the total range of variation is topped up again in every generation by new mutation and other kinds of variation. But it still remains true that natural selection is a narrowing down from an initially wider field of possibilities, including mostly unsuccessful ones, to a narrower field of successful ones. This is analogous to the definition of information with which we began: information is what enables the narrowing down from prior uncertainty (the initial range of possibilities) to later certainty (the "successful" choice among the prior probabilities). According to this analogy, natural selection is by definition a process whereby information is fed into the gene pool of the next generation.
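This narrowing-down argument can be caricatured in a few lines. In the toy model below (the allele frequencies and fitness values are invented, not from the essay), one round of selection skews an initially uniform gene pool, and the pool's Shannon uncertainty drops below its initial 2 bits:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four alleles, equally common: 2 bits of prior uncertainty.
freqs = [0.25, 0.25, 0.25, 0.25]
fitness = [1.0, 0.8, 0.5, 0.2]  # hypothetical relative fitnesses

# One generation of selection: weight each allele by its fitness,
# then renormalise so the frequencies sum to 1 again.
weighted = [f * w for f, w in zip(freqs, fitness)]
total = sum(weighted)
after = [w / total for w in weighted]

print(entropy(freqs))        # 2.0 bits before selection
print(entropy(after) < 2.0)  # True: the pool is a narrower subset
```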
If natural selection feeds information into gene pools, what is the information about? It is about how to survive. Strictly it is about how to survive and reproduce, in the conditions that prevailed when previous generations were alive. To the extent that present-day conditions are different from ancestral conditions, the ancestral genetic advice will be wrong. In extreme cases, the species may then go extinct. To the extent that conditions for the present generation are not too different from conditions for past generations, the information fed into present-day genomes from past generations is helpful information. Information from the ancestral past can be seen as a manual for surviving in the present: a family bible of ancestral advice on how to survive today. We need only a little poetic licence to say that the information fed into modern genomes by natural selection is actually information about ancient environments in which ancestors survived. This idea of information fed from ancestral generations into descendant gene pools is one of the themes of my new book, Unweaving the Rainbow. It takes a whole chapter, The Genetic Book of the Dead, to develop the notion, so I won't repeat it here except to say two things. First, it is the whole gene pool of the species as a whole, not the genome of any particular individual, which is best seen as the recipient of the ancestral information about how to survive. The genomes of particular individuals are random samples of the current gene pool, randomised by sexual recombination. Second, we are privileged to intercept the information if we wish, and read an animal's body, or even its genes, as a coded description of ancestral worlds. To quote from Unweaving the Rainbow: "And isn't it an arresting thought? We are digital archives of the African Pliocene, even of Devonian seas; walking repositories of wisdom out of the old days. You could spend a lifetime reading in this ancient library and die unsated by the wonder of it."
1 The producers never deigned to send me a copy: I completely forgot about it until an American colleague called it to my attention. 2 See Barry Williams (1998): "Creationist Deception Exposed", The Skeptic 18, 3, pp 7-10, for an account of how my long pause (trying to decide whether to throw them out) was made to look like hesitant inability to answer the question, followed by an apparently evasive answer to a completely different question.
A Response to Dr. Dawkins' "The Information Challenge" By: Casey Luskin, Evolution News & Views, October 4, 2007
In September 2007, I posted a link to a YouTube video where Richard Dawkins was asked to explain the origin of genetic information, according to Darwinism. I also posted a link to Dawkins' rebuttal to the video, where he purports to explain the origin of genetic information according to Darwinian evolution. The question posed to Dawkins was, "Can you give an example of a genetic mutation or evolutionary process that can be seen to increase the information in the genome?" Dawkins famously commented that the question was "the kind of question only a creationist would ask . . ." Dawkins writes, "In my anger I refused to discuss the question further, and told them to stop the camera." Dawkins' highly emotional response calls into question whether he is capable of addressing this issue objectively. This will be a response assessing Dawkins' answer to "The Information Challenge."
Part 1: Specified Complexity Is the Measure of Biological Complexity. Dawkins writes, "First you first have to explain the technical meaning of 'information'." While that sounds reasonable, Dawkins pulls a bait-and-switch and defines information as "Shannon information", a formulation of "information" that applies to signal transmission and does not account for the type of specified complexity found in biology.
It is common for Darwinists to define information as "Shannon information," which is related to calculating the mere unlikelihood of a sequence of events. Under their definition, a functionless stretch of genetic junk might have the same amount of "information" as a fully functional gene of the same sequence-length. ID-proponents don't see this as a useful way of measuring biological information. ID-proponents define information as complex and specified information: DNA which is finely-tuned to do something. Stephen C. Meyer writes that ID-theorists use "(CSI) as a synonym for 'specified complexity' to help distinguish functional biological information from mere Shannon information, that is, specified complexity from mere complexity." As the ISCID encyclopedia explains, "Unlike specified complexity, Shannon information is solely concerned with the improbability or complexity of a string of characters rather than its patterning or significance." The Inconvenient Truth for Dawkins: The difference between the Darwinist and ID definitions of information is equivalent to the difference between getting 10 consecutive losing hands in a poker game versus getting 10 consecutive royal flushes. One implicates design, while the other does not.
It is important to note that ID proponents did not invent the notion of "specified complexity," nor were they the first to observe that "specified complexity" is the best way to describe biological information. My first knowledge of the term being used comes from leading origin of life theorist Leslie Orgel, who used it in 1973 in a fashion that closely resembles the modern usage by ID proponents:
[L]iving organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple, well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures which are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity.
(Leslie E. Orgel, The Origins of Life: Molecules and Natural Selection, pg. 189 (Chapman & Hall: London, 1973).) Orgel thus captures the fact that specified complexity requires both order and a specific arrangement of parts or symbols. This matches the definition given by Dembski, where he defines specified complexity as an unlikely event that conforms to an independent pattern. This establishes that specified complexity is the appropriate measure of biological complexity.
Additionally, Richard Dawkins' article admits that "DNA carries information in a very computer-like way, and we can measure the genome's capacity in bits too, if we wish." That's an interesting analogy, reminiscent of the design overtones of Dawkins' concession elsewhere that "[t]he machine code of the genes is uncannily computer-like. Apart from differences in jargon, the pages of a molecular biology journal might be interchanged with those of a computer engineering journal." (Richard Dawkins, River Out of Eden: A Darwinian View of Life, pg. 17 (New York: Basic Books, 1995).) Of course, Dawkins believes that the processes of random mutation and unguided selection ultimately built "[t]he machine code of the genes" and made it "uncannily computer-like." But I do not think a scientist is unjustified in reasoning that in our experience, machine codes and computers only derive from intelligence.
Part 2: Does Gene Duplication Increase Information Content? In this section, I will show why merely citing gene duplication does not help one understand how Darwinian evolution can produce new genetic information. Dawkins' main point in his "The Information Challenge" article is that "[n]ew genes arise through various kinds of duplication." So his answer to the creationist question that so upset him is gene duplication. Yet during the actual gene-duplication process, a pre-existing gene is merely copied, and nothing truly new is generated. As Michael Egnor said in response to PZ Myers: "[G]ene duplication is, presumably, not to be taken too seriously. If you count copies as new information, you must have a hard time with plagiarism in your classes. All that the miscreant students would have to say is 'It's just like gene duplication. Plagiarism is new information; you said so on your blog!'"
Duplicating Genes Doesn't Increase Biological Information in Any Important Sense I now have 2 questions to ask of Darwinists who claim that the mechanism of gene duplication explains how Darwinian evolutionary processes can increase the information content in the genome: (1) Does gene duplication increase the information content?
(2) Does gene duplication increase the information content?
Asking the question twice obviously does not double the meaningful information conveyed by the question. How many times would the question have to be duplicated before the meaningful information conveyed by the list of duplicated questions is twice that of the original question? The answer is that the mere duplication of a sentence does NOT increase the complex and specified information content in any meaningful way. Imagine that a builder of houses has a blueprint to build a new house, but the blueprint does not contain enough information to build the house to the specifications that the builder desires. Could the builder obtain the needed additional information merely by photocopying the original blueprint? Of course not.
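This point can actually be checked against Dawkins's own Stuffit criterion from earlier in the document: appending a verbatim copy of a sequence barely increases its compressed size. A minimal sketch, with Python's standard zlib module standing in for Stuffit and an invented sequence:

```python
import zlib

gene = b"ATGCCGTTAGCAGGCTAGCATCGATTACGGCAT" * 30  # hypothetical sequence
duplicated = gene + gene  # gene duplication: a verbatim second copy

before = len(zlib.compress(gene, 9))
after = len(zlib.compress(duplicated, 9))

# Doubling the raw length does not come close to doubling the
# compressed size: the copy adds almost nothing in Shannon's sense.
print(before, after)
print(after < 2 * before)  # True
```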
Darwinists Must Give Detailed Accounts of how a Duplicated Gene Acquires its New Function The Darwinist would probably reply to my objection by saying, "Well, it isn't just gene duplication that increases the genetic information; such information is increased when gene duplication is coupled with the subsequent evolution of one of the new copies of the gene." Aye, there's the rub.
Darwinists laud the mechanism of gene duplication because they claim it shows how one copy of a gene can perform the original function, freeing up the other copy to mutate, evolve, and acquire a new function. But the new genetic information must somehow be generated during that subsequent evolution of the gene. To explain how Darwinian processes can generate new and meaningful genetic information, Darwinists must provide a detailed account of how a duplicate copy of a gene can evolve into an entirely new gene. But ask Darwinists for details as to how the duplicate copy then starts to perform some new function, and you probably won't get any. At least, Dawkins didn't give us any details (as I explain below) about this in his "The Information Challenge" article, which I am rebutting here.
A recent study in Nature admitted, "Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown." (Ilan Wapinski, Avi Pfeffer, Nir Friedman & Aviv Regev, "Natural history and evolutionary principles of gene duplication in fungi," Nature, Vol. 449:54-61 (September 6, 2007).) Yet the crucial question that must be answered by the gene duplication mechanism is, exactly how does the duplicate copy acquire an entirely new function? Stephen Meyer explains in Proceedings of the Biological Society of Washington that it is difficult to imagine how duplicated genes acquire new functions since they must successfully undergo "neutral evolution" and traverse a random walk in order to acquire a new function: [N]eo-Darwinists envision new genetic information arising from those sections of the genetic text that can presumably vary freely without consequence to the organism. According to this scenario, non-coding sections of the genome, or duplicated sections of coding regions, can experience a protracted period of "neutral evolution" (Kimura 1983) during which alterations in nucleotide sequences have no discernible effect on the function of the organism. Eventually, however, a new gene sequence will arise that can code for a novel protein. At that point, natural selection can favor the new gene and its functional protein product, thus securing the preservation and heritability of both.
This scenario has the advantage of allowing the genome to vary through many generations, as mutations "search" the space of possible base sequences. The scenario has an overriding problem, however: the size of the combinatorial space (i.e., the number of possible amino acid sequences) and the extreme rarity and isolation of the functional sequences within that space of possibilities. Since natural selection can do nothing to help generate new functional sequences, but rather can only preserve such sequences once they have arisen, chance alone (random variation) must do the work of information generation, that is, of finding the exceedingly rare functional sequences within the set of combinatorial possibilities. Yet the probability of randomly assembling (or "finding," in the previous sense) a functional sequence is extremely small.
(Stephen C. Meyer, "The origin of biological information and the higher taxonomic categories," Proceedings of the Biological Society of Washington, Vol. 117(2):213-239 (2004).)

The Inconvenient Truth for Dawkins: At best, the mechanism of gene duplication shows how a hiker can get to the foot of a hiking trail, but it never explains how the hiker, on a random, blindfolded walk, finds the peak of the mountain. We don't need to know that genes can make copies of themselves; we need to know how the duplicate gene evolves, step-by-step, into an entirely new gene.
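Meyer's combinatorial-space point can be checked with simple arithmetic. The sketch below just counts the possible amino acid sequences for a chain of n residues; the 100-residue length is my own illustrative assumption, not a figure from Meyer.

```python
# Illustrative arithmetic only: the number of possible amino acid
# sequences of length n, with 20 residue choices at each position.
def sequence_space_size(n):
    return 20 ** n

# For an assumed, modest 100-residue protein:
space = sequence_space_size(100)
print(len(str(space)))  # the count is a 131-digit number
```

Even for a short protein the space is astronomically large, which is the sense in which Meyer calls functional sequences "exceedingly rare" within it.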
Mistaking Similarity as Evidence for Common Descent, and then Mistaking Common Descent as Evidence for Darwinian Evolution

Rather than giving a step-by-step mutational account of how a duplicated gene acquires a new function, Dawkins' article substitutes bland evidence of sequence identity between different genes as evidence for Darwinian evolution by gene duplication. Dawkins gives the example of the evolution of various globin genes that he claims arose via gene duplication. His evidence is that "[c]areful letter-by-letter analysis shows that these different kinds of globin genes are literally cousins of each other, literally members of a family." Of course, the "[c]areful letter-by-letter analysis" simply means finding amino acid sequences that are similar or identical between two different proteins. David Swift explains that such claims of relationship "are inferred solely on the basis of assuming a common ancestry and then deriving a route of polypeptide evolution, typically the most parsimonious one, to fit the known present day amino acid sequences and consistent with the observed pattern of conserved amino acids." (David Swift, Evolution Under the Microscope, pg. 165 (Leighton Academic Press, 2002), emphasis in original.) At best, such sequence identity demonstrates common ancestry (if one ignores the possibility of common design), but it does not demonstrate Darwinian evolution. Michael Behe easily rebutted the over-extrapolation from sequence similarity to Darwinian evolution in both Darwin's Black Box and The Edge of Evolution: "Although useful for determining lines of descent ... comparing sequences cannot show how a complex biochemical system achieved its function - the question that most concerns us in this book.
By way of analogy, the instruction manuals for two different models of computer put out by the same company might have many identical words, sentences, and even paragraphs, suggesting a common ancestry (perhaps the same author wrote both manuals), but comparing the sequences of letters in the instruction manuals will never tell us if a computer can be produced step-by-step starting from a typewriter.... Like the sequence analysts, I believe the evidence strongly supports common descent. But the root question remains unanswered: What has caused complex systems to form?" (Michael Behe, Darwin's Black Box, pgs. 175-176.)
"[M]odern Darwinists point to evidence of common descent and erroneously assume it to be evidence of the power of random mutation." (Michael Behe, The Edge of Evolution, pg. 95.) Darwinists like Dawkins continue to make the mistake cited by Behe and Swift. (In fact, if you read the aforementioned "Natural history and evolutionary principles of gene duplication in fungi" article, you'll find it gives only anecdotal or circumstantial evidence of evolution by gene duplication, not directly observed evidence, and there certainly aren't any detailed step-by-step models for how the genes evolved.)
The Dangerous Road Faced by Duplicated Genes

If a duplicated gene cannot successfully traverse its random walk, it may die. As Lynch and Conery found, "the vast majority of gene duplicates are silenced within a few million years." (Lynch & Conery, "The Evolutionary Fate and Consequence of Duplicate Genes," Science, Vol. 290:1151-1155 (Nov. 10, 2000).) Does Richard Dawkins give a step-by-step mutational account of how globin genes evolved from one another while remaining functional at all times, such that the duplicate copies were never "silenced," terminating their evolution? Of course not. Dawkins has not demonstrated how Darwinian evolution can take a duplicated gene and evolve it into a new gene. The problem for Dawkins is that duplicating a gene may increase your amount of Shannon information, but it does not increase the amount of specified complexity in any non-trivial sense. To explain how one gene can turn into another, Dawkins must explain how new specified and complex information can enter the genome, and give a step-by-step mutational account of the origin of some gene via gene duplication. Dawkins has provided none of this.
To understand this point, consider the following sentence (with spaces removed):

METHINKSDAWKINSDOTHPROTESTTOOMUCH

If we merely consider the Shannon information of the 33 letters (not counting spaces) in the sentence, then it has about 155 bits of Shannon information. Now we duplicate it, like what happens in a gene duplication event:

METHINKSDAWKINSDOTHPROTESTTOOMUCHMETHINKSDAWKINSDOTHPROTESTTOOMUCH

The amount of Shannon information has now doubled (~310 bits), but we have seen no non-trivial increase in the amount of specified complexity. Still, Dawkins thinks gene duplication is the answer, and that "[i]t is by these, and similar means, that genome sizes can increase in evolution."
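The 155-bit figure can be reproduced with a back-of-envelope calculation. The sketch below assumes each of the 26 letters is equally likely and independent, the simplest possible model, so each letter carries log2(26), or about 4.7 bits:

```python
import math

def shannon_bits(message):
    # Bits under a uniform, independent model over 26 letters:
    # each letter contributes log2(26) ~ 4.7 bits.
    return len(message) * math.log2(26)

s = "METHINKSDAWKINSDOTHPROTESTTOOMUCH"
print(round(shannon_bits(s)))      # 155 bits for the 33-letter sentence
print(round(shannon_bits(s + s)))  # 310 bits after duplication
```

Under this naive letter-counting model, doubling the string does indeed double the bit count, which is exactly the trivial sense of "more information" discussed above.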
The Shannon information in the doubled string is twice the Shannon information in the shorter string only if the shorter string does nothing to predict the sequence of the doubled string. By granting this assumption, we are able to increase the Shannon information in the genome, even though this is a trivial informational increase that does not provide a meaningful increase in the specified complexity. The key questions are (a) what process is generating the new sequence, and (b) to what extent does that process predict the new sequence? In this sense, duplicating a gene would predict that the duplicate gene would be an identical copy of the original gene. From this standpoint, gene duplication actually does NOTHING to increase the Shannon information in the genome, because you can predict the sequence of the new stretch of DNA with a probability of 1 (and log2(1) = 0), leading to an increase in the Shannon information of 0 bits. In this sense, the Shannon information in the doubled string is not increased at all from the original, shorter string; it remains 155 bits. Keep in mind that it is Dawkins who raised the issue of increasing Shannon information in the genome via gene duplication. Viewed in this fashion, Dawkins' claim that gene duplication can increase the Shannon information is even more dubious: if gene duplication predicts that you will have an identical copy of the original gene, then gene duplication not only fails to increase the specified and complex information, it also fails to increase the Shannon information in the genome.
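The two ways of counting can be made concrete. Under the naive model, each duplicated letter is a fresh 1-in-26 surprise; under the process-aware model, the copy is predictable with probability 1 and so contributes no surprise at all. A minimal sketch:

```python
import math

def surprisal(p):
    """Shannon surprisal, in bits, of an event with probability p."""
    return -math.log2(p)

# Naive model: 33 duplicated letters, each an independent 1-in-26 event,
# so the copy appears to "add" about 155 bits.
naive_bits = 33 * surprisal(1 / 26)

# Process-aware model: duplication yields the original sequence exactly,
# an event of probability 1, so the copy adds 0 bits (log2(1) = 0).
duplication_bits = surprisal(1.0)

print(round(naive_bits))      # 155
print(duplication_bits == 0)  # True
```

The difference between the two answers is entirely a matter of which generating process one conditions on, which is the point being made above.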
But we aren't trying to simply change the "genome siz[e]," and thereby change the Shannon information. We're trying to construct something functionally new. Thus, imagine that one duplicate copy of the original sentence evolves into a new sentence of the same length:

BUTIMSUREDAWKINSBELIEVESHEISRIGHT

A Darwinian theorist would find that both sentences contain the word "DAWKINS," and thus share about 21% sequence identity. They would then infer that both sentences evolved from a common ancestral sentence via Darwinian evolution. They would conclude that a duplicated version of the sentence "METHINKSDAWKINSDOTHPROTESTTOOMUCH" has evolved into "BUTIMSUREDAWKINSBELIEVESHEISRIGHT."
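The 21% figure comes from the shared word: "DAWKINS" is 7 of 33 letters. A small script can recover it mechanically; this is a sketch using a standard longest-common-substring dynamic program, not anything from Dawkins' or Swift's text:

```python
def longest_common_substring(a, b):
    """Longest run of characters shared contiguously by both strings."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row of the DP table
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

s1 = "METHINKSDAWKINSDOTHPROTESTTOOMUCH"
s2 = "BUTIMSUREDAWKINSBELIEVESHEISRIGHT"
shared = longest_common_substring(s1, s2)
print(shared)                              # DAWKINS
print(round(100 * len(shared) / len(s1)))  # 21 (percent identity)
```

Note what the script does and does not establish: it finds similarity, but it says nothing about the process that produced it, which is exactly the gap in the inference being criticized here.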
David Swift explains that finding such similarities is not enough to justify the claim that Darwinian evolution has produced the observed pattern: "[F]or family trees to be credible, most if not all of the putative ancestral sequences must be functional; but this presents a major stumbling block in the production by divergence of proteins with different functions. To get from one set of conserved amino acids to another is either an unlikely big jump, or the intermediates must have biological activity; but the latter seems unlikely because it contradicts what we know about conserved amino acids." (pg. 166.)

Thus, in order for Darwinists to convince me that Darwinian evolution can produce new information, at minimum I need to see a step-by-step mutational account of how they can take the sentence "METHINKSDAWKINSDOTHPROTESTTOOMUCH" and evolve it into "BUTIMSUREDAWKINSBELIEVESHEISRIGHT" by changing the first sentence one letter at a time, and having it always retain some comprehensible English meaning along each small step of its evolution. Telling me that you can duplicate the sentence does NOT answer the question posed in the video: "Can you give an example of a genetic mutation or evolutionary process that can be seen to increase the information in the genome?" As Michael Behe requested over ten years ago in Darwin's Black Box, what is required is a "detailed, scientific [explanation of] how mutation and natural selection could build" the sentence. (Behe, Darwin's Black Box, pg. 176.)
Don't Blame Natural Selection: It's Just Acting upon What Mutations Provide

It's worth noting that Dawkins finally claims that it is natural selection that "feeds information into gene pools" by selecting for mutations that help organisms survive. Thus, Dawkins would argue that the information in the environment is transferred into the genome of the organism. Fair enough. But Dawkins isn't telling the most important part of this story. We all know that mutations must provide the raw fuel upon which natural selection can act. As Gilbert, Opitz, and Raff write: The Modern Synthesis is a remarkable achievement. However, starting in the 1970s, many biologists began questioning its adequacy in explaining evolution. Genetics might be adequate for explaining microevolution, but microevolutionary changes in gene frequency were not seen as able to turn a reptile into a mammal or to convert a fish into an amphibian. Microevolution looks at adaptations that concern only the survival of the fittest, not the arrival of the fittest. As Goodwin (1995) points out, "the origin of species - Darwin's problem - remains unsolved."
(Scott Gilbert, John Opitz, and Rudolf Raff, "Resynthesizing Evolutionary and Developmental Biology," Developmental Biology, Vol. 173, pg. 361 (1996).) Natural selection can (given the right population circumstances, etc.) preserve traits that confer a survival advantage, and it is very effective at weeding out traits that are disadvantageous. But natural selection can only act upon what mutations provide. Thus, we can't account for the survival of particular mutations until we account for the arrival of particular mutations. We cannot account for the increase in information content of genomes until we consider how random mutations produce the raw fuel that natural selection can preserve.
My Information Challenge Reiterated

So here is my "Information Challenge": For the sake of the argument, I will grant that every stage of the evolutionary pathway I requested above will survive, and thus I'll give natural selection every possible benefit of the doubt. What I need is a step-by-step mutational account of how one sentence evolved into the other wherein the sentence remains functional (i.e., it has comprehensible English meaning) at all stages of its evolution. In short, I request to see how "METHINKSDAWKINSDOTHPROTESTTOOMUCH" can evolve into "BUTIMSUREDAWKINSBELIEVESHEISRIGHT" by changing the first sentence one letter at a time, and having it always retain some comprehensible English meaning along each small step of its evolution. This seems like a reasonable request, as it is not highly different from what Darwinists are telling me can happen in nature.
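If anyone does attempt the challenge, the mechanical half of it is easy to check. This sketch verifies that each step in a proposed pathway changes exactly one letter; judging whether each intermediate has "comprehensible English meaning" is left to a human reader. The three-word example pathway is purely hypothetical and is not a solution to the challenge:

```python
def one_letter_steps(pathway):
    """True if every adjacent pair in the pathway differs in exactly one position."""
    for before, after in zip(pathway, pathway[1:]):
        if len(before) != len(after):
            return False  # the challenge keeps sentence length fixed
        if sum(1 for x, y in zip(before, after) if x != y) != 1:
            return False
    return True

# Hypothetical toy pathway (CAT -> COT -> COG), illustrative only:
print(one_letter_steps(["CAT", "COT", "COG"]))  # True
print(one_letter_steps(["CAT", "DOG"]))         # False: three letters change at once
```

A submitted pathway would have to pass this mechanical check and keep every intermediate meaningful, which is the hard part of the challenge.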
How would Dawkins reply? Would he get angry and complain that this is "the kind of question only a creationist would ask"? Or would he dodge the question like he did in his "The Information Challenge" article? Personally, I'd like to see an answer to the question.
Part 3: Dawkins' "Junk"-DNA Blunder

Dawkins' article has other problems. He writes that "most of the capacity of the genome of any animal is not used to store useful information." This is another good example demonstrating how Neo-Darwinism led many scientists to wrongly believe that non-coding DNA was largely junk. Dawkins' statement is directly refuted by the findings of recent studies; as the Washington Post reported, scientists have now found that "the vast majority of the 3 billion 'letters' of the human genetic code are busily toiling at an array of previously invisible tasks." That strikes a fatal blow to Dawkins' argument:

Position regarding "Junk"-DNA:
Dawkins then (1998): "most of the capacity of the genome of any animal is not used to store useful information"
Scientists now (2007): "the vast majority of the 3 billion 'letters' of the human genetic code are busily toiling at an array of previously invisible tasks"
Dawkins claims that there is "lots of repetitive nonsense" in the genome. But is it really "nonsense"? Recent studies are finding increasing function for allegedly non-functional repetitive DNA. Richard Sternberg surveyed the literature and found extensive evidence for function in repetitive DNA (also called repetitive elements, or "REs"). A listing of functions for REs, reprinted from Sternberg's paper, is shown below:

- satellite repeats forming higher-order nuclear structures;
- satellite repeats forming centromeres;
- satellite repeats and other REs involved in chromatin condensation;
- telomeric tandem repeats and LINE elements;
- subtelomeric nuclear positioning/chromatin boundary elements;
- non-TE interspersed chromatin boundary elements;
- short, interspersed nuclear elements or SINEs as nucleation centers for methylation;
- SINEs as chromatin boundary/insulator elements;
- SINEs involved in cell proliferation;
- SINEs involved in cellular stress responses;
- SINEs involved in translation (may be connected to stress response);
- SINEs involved in binding cohesin to chromosomes; and
- LINEs involved in DNA repair.
(Richard Sternberg, "On the Roles of Repetitive DNA Elements in the Context of a Unified Genomic-Epigenetic System," Annals of the New York Academy of Sciences, Vol. 981:154-188 (2002).) Dawkins not only got repetitive junk-DNA wrong; he provides a shimmering example of the fact that neo-Darwinism has led many scientists to wrongly presume that junk-DNA has no function. Some Darwinists have tried to counter that claim by arguing that Neo-Darwinism also led other biologists to presume function for junk-DNA, since its mere presence in the genome implies that natural selection has preserved it for some purpose. Even if that were a good argument, the fact remains that the false junk-DNA mindset was born and bred out of the Neo-Darwinian paradigm. That paradigm misled many scientists on this point, and in fact continues to mislead them.
But it isn't even clear that Darwinists have a good scientific justification to believe that junk-DNA, if it exists, would be naturally selected out of the genome. According to the 2006 edition of Voet and Voet's Biochemistry, there is insufficient selection pressure on functionless repetitive "junk"-DNA to remove it from the genome: No function has been unequivocally assigned to moderately repetitive DNA, which has therefore been termed selfish or junk DNA. This DNA apparently is a molecular parasite that, over many generations, has disseminated itself throughout the genome through transposition. The theory of natural selection predicts that the increased metabolic burden imposed by the replication of an otherwise harmless selfish DNA would eventually lead to its elimination. Yet for slowly growing eukaryotes, the relative disadvantage of replicating an additional 100 bp of selfish DNA in a 1-billion-bp genome would be so slight that its rate of elimination would be balanced by its rate of propagation. Because unexpressed sequences are subject to little selective pressure, they accumulate mutations at a greater rate than do expressed sequences.
(Donald Voet and Judith G. Voet, Biochemistry, pg. 1020 (John Wiley & Sons, 2006), emphasis added.) In other words, Darwinists like Dawkins had every reason to presume that non-coding repetitive DNA was, in Dawkins' words, functionless "nonsense" that was, in Voet and Voet's words, a "molecular parasite," even though it persisted in the genome. But Voet and Voet are wrong to presume that such repetitive DNA is mere parasitic junk, given that examples of functions for it abound. Sternberg's article concluded that "the selfish DNA narrative and allied frameworks must join the other 'icons' of neo-Darwinian evolutionary theory that, despite their variance with empirical evidence, nevertheless persist in the literature." Sternberg, along with geneticist James A. Shapiro, concludes elsewhere that "one day, we will think of what used to be called 'junk DNA' as a critical component of truly 'expert' cellular control regimes." (Richard Sternberg and James A. Shapiro, "How Repeated Retroelements format genome function," Cytogenetic and Genome Research, Vol. 110:108-116 (2005).)
It looks like Dawkins has some work to do if he is to update all of his arguments against ID and answer "The Information Challenge."
3.0 Does a computer networking expert have something new and important to say about the Evolution vs. Intelligent Design Debate?

by Perry Marshall

I'm the author of the book Industrial Ethernet, published by ISA and now in its 2nd edition, and I have written many dozens of magazine articles and white papers on computer networks. Now you may ask: what do computers have to do with DNA and all those endless arguments about intelligent design? Actually, a lot. Just like all those 1s and 0s that make our modern world go round, DNA is also a digital communication system. All the same formulas and communication theory that created our modern digital age apply to DNA too. In fact, many methods that are commonplace in the information technology field have been adapted and applied to genetics research and the Human Genome Project. Now, discover what our knowledge of modern communication systems tells us about the Origins Debate.
Executive Summary

Part 1: Language, Information, and the Origin of DNA

Most arguments about evolution and intelligent design offer only anecdotal evidence and are inherently incapable of actually proving anything. We must get better evidence in order to get to the bottom of this! Fortunately, the science of modern communications easily provides us with the tools we need to get answers. Although the details are complex, the concepts are easily grasped by anyone with a high school education.

Patterns occur naturally - no help required from a 'designer'. Many patterns occur in nature without the help of a designer: snowflakes, tornadoes, hurricanes, sand dunes, stalactites, rivers and ocean waves. These patterns are the natural result of what scientists categorize as chaos and fractals. These things are well-understood and we experience them every day. Codes, however, do not occur without a designer. Examples of symbolic codes include music, blueprints, languages like English and Chinese, computer programs, and yes, DNA. The essential distinction
is the difference between a pattern and a code. Chaos can produce patterns, but it has never been shown to produce codes or symbols. Codes and symbols store information, which is not a property of matter and energy alone. Information itself is a separate entity on par with matter and energy.

Proof that DNA was designed by a mind: (1) DNA is not merely a molecule with a pattern; it is a code, a language, and an information storage mechanism. (2) All codes we know the origin of are created by a conscious mind. (3) Therefore DNA was designed by a mind, and language and information are proof of the action of a Superintelligence.

We can explore five possible conclusions:
1) Humans designed DNA
2) Aliens designed DNA
3) DNA occurred randomly and spontaneously
4) There must be some undiscovered law of physics that creates information
5) DNA was designed by a Superintelligence, i.e. God
(1) requires time travel or infinite generations of humans. (2) could well be true, but it only pushes the question back in time. (3) may be a remote possibility, but it's not a scientific explanation, in that it doesn't refer to a systematic, repeatable process; it's nothing more than an appeal to luck. (4) could be true, but no one can form a testable hypothesis until someone observes a naturally occurring code. So the only systematic explanation that remains is (5), a theological one. To the extent that scientific reasoning can prove anything, DNA is proof of a designer.

Part 2: A Christian and an Atheist Go to the Zoo

Did the antelope evolve into the giraffe? According to Darwinian evolution, the necessities of the environment, random mutation and natural selection working together caused the antelope to grow a longer neck and become a giraffe. OK, then what does communication theory say about that hypothesis? Natural selection is perfectly valid and has been proven time and time again. But most people will be very surprised to discover that no one has ever actually demonstrated that random mutation can create new information. Information theory shows us why this is so: in communication systems, random mutation is exactly the same as noise, and noise always destroys the signal, never enhances it.
In communication systems this is called information entropy, and the formula for information entropy is exactly the same as the formula for thermodynamic entropy. Once lost, the information can never be recovered, much less enhanced. Thus we can be 100% certain that random mutation is not the source of biodiversity. A tool is provided, www.RandomMutation.com, that allows you to experiment and see for yourself that random mutation always destroys information, never enhances it. This observation is also confirmed biologically by Theodosius Dobzhansky's fruit fly radiation experiments, Goldschmidt's gypsy moth experiments, and others. Decades of research were conducted in the early 20th century, bombarding fruit flies and moths with radiation in the hope of mutating their DNA and producing improved creatures. These experiments were a total failure: there were no observed improvements, only weak, sickly, deformed fruit flies. Giraffes may have evolved from antelopes - I never said that couldn't happen, and I remain open to the possibility that it did. But it certainly wasn't because of random mutation! We have proof that life on planet earth was designed by a mind - and that if life did evolve, the capacity to evolve had to be designed in. The word "evolution" in the English language always refers to an intelligent process (in business, society, technology, etc.), and the only usage in which it allegedly doesn't is naturalistic Darwinian evolution. But communication theory shows us that evolution by random process is a hypothesis without proof. Finally, this presentation concludes with a brief observation: there is an interesting correspondence between Judeo-Christian theology and modern information theory, the statement that words and language are the essence of creation: "And God said..."; "In the beginning was the WORD"; that the worlds were spoken into existence.
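For readers who want to try the random-mutation demonstration without visiting the site, here is a minimal sketch in the same spirit (this is my own illustration, not the actual RandomMutation.com code): it applies random letter substitutions to a message so you can watch legibility degrade as the substitution count grows.

```python
import random
import string

def mutate(message, n_substitutions, rng):
    """Randomly overwrite n_substitutions positions with random capital letters."""
    letters = list(message)
    for _ in range(n_substitutions):
        pos = rng.randrange(len(letters))
        letters[pos] = rng.choice(string.ascii_uppercase)
    return "".join(letters)

text = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
rng = random.Random(0)  # seeded so runs are repeatable
for n in (1, 5, 20):
    print(n, mutate(text, n, rng))
```

Note that a substitution can occasionally re-draw the same letter, so n substitutions change at most n positions; the qualitative effect, increasing garbling with more substitutions, is what the demonstration is meant to show.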