[color=#0033cc][size=27]Intelligent Design: The Origin of Biological Information and the Higher Taxonomic Categories[/color]
By: Stephen C. Meyer
Proceedings of the Biological Society of Washington
May 18, 2007
On August 4th, 2004, an extensive review essay by Dr. Stephen C. Meyer, Director of Discovery Institute's Center for Science & Culture, appeared in the Proceedings of the Biological Society of Washington (volume 117, no. 2, pp. 213-239). The Proceedings is a peer-reviewed biology journal published at the National Museum of Natural History at the Smithsonian Institution in Washington D.C.
In the article, entitled “The Origin of Biological Information and the Higher Taxonomic Categories”, Dr. Meyer argues that no current materialistic theory of evolution can account for the origin of the information necessary to build novel animal forms. He proposes intelligent design as an alternative explanation for the origin of biological information and the higher taxa.
Source & Copyright: http://www.discovery.org/scripts/viewDB/index.php?command=view&id=2177
[color=#003399]PROCEEDINGS OF THE BIOLOGICAL SOCIETY OF WASHINGTON[/color]
The origin of biological information and the higher taxonomic categories
Stephen C. Meyer
In a recent volume of the Vienna Series in Theoretical Biology (2003), Gerd B. Muller and Stuart Newman argue that what they call the “origination of organismal form” remains an unsolved problem. In making this claim, Muller and Newman (2003:3-10) distinguish two distinct issues, namely, (1) the causes of form generation in the individual organism during embryological development and (2) the causes responsible for the production of novel organismal forms in the first place during the history of life. To distinguish the latter case (phylogeny) from the former (ontogeny), Muller and Newman use the term “origination” to designate the causal processes by which biological form first arose during the evolution of life. They insist that “the molecular mechanisms that bring about biological form in modern day embryos should not be confused” with the causes responsible for the origin (or “origination”) of novel biological forms during the history of life (p.3). They further argue that we know more about the causes of ontogenesis, due to advances in molecular biology, molecular genetics and developmental biology, than we do about the causes of phylogenesis– the ultimate origination of new biological forms during the remote past.
In making this claim, Muller and Newman are careful to affirm that evolutionary biology has succeeded in explaining how preexisting forms diversify under the twin influences of natural selection and variation of genetic traits. Sophisticated mathematically-based models of population genetics have proven adequate for mapping and understanding quantitative variability and populational changes in organisms. Yet Muller and Newman insist that population genetics, and thus evolutionary biology, has not identified a specifically causal explanation for the origin of true morphological novelty during the history of life. Central to their concern is what they see as the inadequacy of the variation of genetic traits as a source of new form and structure. They note, following Darwin himself, that the sources of new form and structure must precede the action of natural selection (2003:3)– that selection must act on what already exists. Yet, in their view, the “genocentricity” and “incrementalism” of the neo-Darwinian mechanism has meant that an adequate source of new form and structure has yet to be identified by theoretical biologists. Instead, Muller and Newman see the need to identify epigenetic sources of morphological innovation during the evolution of life. In the meantime, however, they insist neo-Darwinism lacks any “theory of the generative” (p. 7).
As it happens, Muller and Newman are not alone in this judgment. In the last decade or so a host of scientific essays and books have questioned the efficacy of selection and mutation as a mechanism for generating morphological novelty, as even a brief literature survey will establish. Thomson (1992:107) expressed doubt that large-scale morphological changes could accumulate via minor phenotypic changes at the population genetic level. Miklos (1993:29) argued that neo-Darwinism fails to provide a mechanism that can produce large-scale innovations in form and complexity. Gilbert et al. (1996) attempted to develop a new theory of evolutionary mechanisms to supplement classical neo-Darwinism, which, they argued, could not adequately explain macroevolution. As they put it in a memorable summary of the situation: “starting in the 1970s, many biologists began questioning its (neo-Darwinism's) adequacy in explaining evolution. Genetics might be adequate for explaining microevolution, but microevolutionary changes in gene frequency were not seen as able to turn a reptile into a mammal or to convert a fish into an amphibian. Microevolution looks at adaptations that concern the survival of the fittest, not the arrival of the fittest. As Goodwin (1995) points out, 'the origin of species– Darwin's problem– remains unsolved'“ (p. 361). Though Gilbert et al. (1996) attempted to solve the problem of the origin of form by proposing a greater role for developmental genetics within an otherwise neo-Darwinian framework,1 numerous recent authors have continued to raise questions about the adequacy of that framework itself or about the problem of the origination of form generally (Webster & Goodwin 1996; Shubin & Marshall 2000; Erwin 2000; Conway Morris 2000, 2003b; Carroll 2000; Wagner 2001; Becker & Lonnig 2001; Stadler et al. 2001; Lonnig & Saedler 2002; Wagner & Stadler 2003; Valentine 2004:189-194).
What lies behind this skepticism? Is it warranted? Is a new and specifically causal theory needed to explain the origination of biological form?
This review will address these questions. It will do so by analyzing the problem of the origination of organismal form (and the corresponding emergence of higher taxa) from a particular theoretical standpoint. Specifically, it will treat the problem of the origination of the higher taxonomic groups as a manifestation of a deeper problem, namely, the problem of the origin of the information (whether genetic or epigenetic) that, as it will be argued, is necessary to generate morphological novelty.
In order to perform this analysis, and to make it relevant and tractable to systematists and paleontologists, this paper will examine a paradigmatic example of the origin of biological form and information during the history of life: the Cambrian explosion. During the Cambrian, many novel animal forms and body plans (representing new phyla, subphyla and classes) arose in a geologically brief period of time. The following information-based analysis of the Cambrian explosion will support the claim of recent authors such as Muller and Newman that the mechanism of selection and genetic mutation does not constitute an adequate causal explanation of the origination of biological form in the higher taxonomic groups. It will also suggest the need to explore other possible causal factors for the origin of form and information during the evolution of life and will examine some other possibilities that have been proposed.
The Cambrian Explosion
The “Cambrian explosion” refers to the geologically sudden appearance of many new animal body plans about 530 million years ago. At this time, at least nineteen, and perhaps as many as thirty-five phyla of forty total (Meyer et al. 2003), made their first appearance on earth within a narrow five- to ten-million-year window of geologic time (Bowring et al. 1993, 1998a:1, 1998b:40; Kerr 1993; Monastersky 1993; Aris-Brosou & Yang 2003). Many new subphyla, between 32 and 48 of 56 total (Meyer et al. 2003), and classes of animals also arose at this time with representatives of these new higher taxa manifesting significant morphological innovations. The Cambrian explosion thus marked a major episode of morphogenesis in which many new and disparate organismal forms arose in a geologically brief period of time.
To say that the fauna of the Cambrian period appeared in a geologically sudden manner also implies the absence of clear transitional intermediate forms connecting Cambrian animals with simpler pre-Cambrian forms. And, indeed, in almost all cases, the Cambrian animals have no clear morphological antecedents in earlier Vendian or Precambrian fauna (Miklos 1993, Erwin et al. 1997:132, Steiner & Reitner 2001, Conway Morris 2003b:510, Valentine et al. 2003:519-520). Further, several recent discoveries and analyses suggest that these morphological gaps may not be merely an artifact of incomplete sampling of the fossil record (Foote 1997, Foote et al. 1999, Benton & Ayala 2003, Meyer et al. 2003), suggesting that the fossil record is at least approximately reliable (Conway Morris 2003b:505).
As a result, debate now exists about the extent to which this pattern of evidence comports with a strictly monophyletic view of evolution (Conway Morris 1998a, 2003a, 2003b:510; Willmer 1990, 2003). Further, among those who accept a monophyletic view of the history of life, debate exists about whether to privilege fossil or molecular data and analyses. Those who think the fossil data provide a more reliable picture of the origin of the Metazoa tend to think these animals arose relatively quickly– that the Cambrian explosion had a “short fuse” (Conway Morris 2003b:505-506, Valentine & Jablonski 2003). Some (Wray et al. 1996), but not all (Ayala et al. 1998), who think that molecular phylogenies establish reliable divergence times from pre-Cambrian ancestors think that the Cambrian animals evolved over a very long period of time– that the Cambrian explosion had a “long fuse.” This review will not address these questions of historical pattern. Instead, it will analyze whether the neo-Darwinian process of mutation and selection, or other processes of evolutionary change, can generate the form and information necessary to produce the animals that arise in the Cambrian. This analysis will, for the most part,2 therefore, not depend upon assumptions of either a long or short fuse for the Cambrian explosion, or upon a monophyletic or polyphyletic view of the early history of life.
Defining Biological Form and Information
Form, like life itself, is easy to recognize but often hard to define precisely. Yet, a reasonable working definition of form will suffice for our present purposes. Form can be defined as the four-dimensional topological relations of anatomical parts. This means that one can understand form as a unified arrangement of body parts or material components in a distinct shape or pattern (topology)– one that exists in three spatial dimensions and which arises in time during ontogeny.
Insofar as any particular biological form constitutes something like a distinct arrangement of constituent body parts, form can be seen as arising from constraints that limit the possible arrangements of matter. Specifically, organismal form arises (both in phylogeny and ontogeny) as possible arrangements of material parts are constrained to establish a specific or particular arrangement with an identifiable three dimensional topography– one that we would recognize as a particular protein, cell type, organ, body plan or organism. A particular “form,” therefore, represents a highly specific and constrained arrangement of material components (among a much larger set of possible arrangements).
Understanding form in this way suggests a connection to the notion of information in its most theoretically general sense. When Shannon (1948) first developed a mathematical theory of information he equated the amount of information transmitted with the amount of uncertainty reduced or eliminated in a series of symbols or characters. Information, in Shannon's theory, is thus imparted as some options are excluded and others are actualized. The greater the number of options excluded, the greater the amount of information conveyed. Further, constraining a set of possible material arrangements by whatever process or means involves excluding some options and actualizing others. Thus, to constrain a set of possible material states is to generate information in Shannon's sense. It follows that the constraints that produce biological form also impart information. Or conversely, one might say that producing organismal form by definition requires the generation of information.
In classical Shannon information theory, the amount of information in a system is also inversely related to the probability of the arrangement of constituents in a system or the characters along a communication channel (Shannon 1948). The more improbable (or complex) the arrangement, the more Shannon information, or information-carrying capacity, a string or system possesses.
Since the 1960s, mathematical biologists have realized that Shannon's theory could be applied to the analysis of DNA and proteins to measure the information-carrying capacity of these macromolecules. Since DNA contains the assembly instructions for building proteins, the information-processing system in the cell represents a kind of communication channel (Yockey 1992:110). Further, DNA conveys information via specifically arranged sequences of nucleotide bases. Since each of the four bases has a roughly equal chance of occurring at each site along the spine of the DNA molecule, biologists can calculate the probability, and thus the information-carrying capacity, of any particular sequence n bases long.
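As an illustration of the calculation described above, the following Python sketch computes the information-carrying capacity of a sequence under the text's simplifying assumption that every symbol is equally likely at each site. The function name `capacity_bits` and the example lengths (1,000 bases, 100 amino acids) are chosen here for illustration and do not come from the original article.

```python
import math

def capacity_bits(length, alphabet_size=4):
    """Shannon information-carrying capacity, in bits, of a sequence of
    `length` symbols, assuming each of `alphabet_size` symbols is equally
    likely at every site (the simplification stated in the text)."""
    # The probability of one particular sequence is (1/alphabet_size)**length,
    # so its information content is -log2(p) = length * log2(alphabet_size).
    return length * math.log2(alphabet_size)

# Illustrative lengths (assumptions, not figures from the article).
dna_bits = capacity_bits(1000)          # 4 bases: 2 bits per site
protein_bits = capacity_bits(100, 20)   # 20 amino acids: about 4.32 bits per site
print(dna_bits, round(protein_bits, 1))
```

This measures carrying capacity only. It says nothing about whether a given sequence is functional, which is the distinction between complexity and specificity that the article goes on to draw.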
The ease with which information theory applies to molecular biology has created confusion about the type of information that DNA and proteins possess. Sequences of nucleotide bases in DNA, or amino acids in a protein, are highly improbable and thus have large information-carrying capacities. But, like meaningful sentences or lines of computer code, genes and proteins are also specified with respect to function. Just as the meaning of a sentence depends upon the specific arrangement of the letters in a sentence, so too does the function of a gene sequence depend upon the specific arrangement of the nucleotide bases in a gene. Thus, molecular biologists beginning with Crick equated information not only with complexity but also with “specificity,” where “specificity” or “specified” has meant “necessary to function” (Crick 1958:144, 153; Sarkar, 1996:191).3 Molecular biologists such as Monod and Crick understood biological information– the information stored in DNA and proteins– as something more than mere complexity (or improbability). Their notion of information associated both biochemical contingency and combinatorial complexity with DNA sequences (allowing DNA's carrying capacity to be calculated), but it also affirmed that sequences of nucleotides and amino acids in functioning macromolecules possessed a high degree of specificity relative to the maintenance of cellular function.
The ease with which information theory applies to molecular biology has also created confusion about the location of information in organisms. Perhaps because the information carrying capacity of the gene could be so easily measured, it has been easy to treat DNA, RNA and proteins as the sole repositories of biological information. Neo-Darwinists in particular have assumed that the origination of biological form could be explained by recourse to processes of genetic variation and mutation alone (Levinton 1988:485). Yet if one understands organismal form as resulting from constraints on the possible arrangements of matter at many levels in the biological hierarchy– from genes and proteins to cell types and tissues to organs and body plans– then clearly biological organisms exhibit many levels of information-rich structure.
Thus, we can pose a question, not only about the origin of genetic information, but also about the origin of the information necessary to generate form and structure at levels higher than that present in individual proteins. We must also ask about the origin of the “specified complexity,” as opposed to mere complexity, that characterizes the new genes, proteins, cell types and body plans that arose in the Cambrian explosion. Dembski (2002) has used the term “complex specified information” (CSI) as a synonym for “specified complexity” to help distinguish functional biological information from mere Shannon information– that is, specified complexity from mere complexity. This review will use this term as well.
The Cambrian Information Explosion
The Cambrian explosion represents a remarkable jump in the specified complexity or “complex specified information” (CSI) of the biological world. For over three billion years, the biological realm included little more than bacteria and algae (Brocks et al. 1999). Then, beginning about 570-565 million years ago (mya), the first complex multicellular organisms appeared in the rock strata, including sponges, cnidarians, and the peculiar Ediacaran biota (Grotzinger et al. 1995). Forty million years later, the Cambrian explosion occurred (Bowring et al. 1993). The emergence of the Ediacaran biota (570 mya), and then to a much greater extent the Cambrian explosion (530 mya), represented steep climbs up the biological complexity gradient.
One way to estimate the amount of new CSI that appeared with the Cambrian animals is to count the number of new cell types that emerged with them (Valentine 1995:91-93). Studies of modern animals suggest that the sponges that appeared in the late Precambrian, for example, would have required five cell types, whereas the more complex animals that appeared in the Cambrian (e.g., arthropods) would have required fifty or more cell types. Functionally more complex animals require more cell types to perform their more diverse functions. New cell types require many new and specialized proteins. New proteins, in turn, require new genetic information. Thus an increase in the number of cell types implies (at a minimum) a considerable increase in the amount of specified genetic information. Molecular biologists have recently estimated that a minimally complex single-celled organism would require between 318 and 562 kilobase pairs of DNA to produce the proteins necessary to maintain life (Koonin 2000). More complex single cells might require upward of a million base pairs. Yet to build the proteins necessary to sustain a complex arthropod such as a trilobite would require orders of magnitude more coding instructions. The genome size of a modern arthropod, the fruitfly Drosophila melanogaster, is approximately 180 million base pairs (Gerhart & Kirschner 1997:121, Adams et al. 2000). Transitions from a single cell to colonies of cells to complex animals represent significant (and, in principle, measurable) increases in CSI.
Building a new animal from a single-celled organism requires a vast amount of new genetic information. It also requires a way of arranging gene products– proteins– into higher levels of organization. New proteins are required to service new cell types. But new proteins must be organized into new systems within the cell; new cell types must be organized into new tissues, organs, and body parts. These, in turn, must be organized to form body plans. New animals, therefore, embody hierarchically organized systems of lower-level parts within a functional whole. Such hierarchical organization itself represents a type of information, since body plans comprise both highly improbable and functionally specified arrangements of lower-level parts. The specified complexity of new body plans requires explanation in any account of the Cambrian explosion.
Can neo-Darwinism explain the discontinuous increase in CSI that appears in the Cambrian explosion– either in the form of new genetic information or in the form of hierarchically organized systems of parts? We will now examine the two parts of this question.
Novel Genes and Proteins
Many scientists and mathematicians have questioned the ability of mutation and selection to generate information in the form of novel genes and proteins. Such skepticism often derives from consideration of the extreme improbability (and specificity) of functional genes and proteins.
A typical gene contains over one thousand precisely arranged bases. For any specific arrangement of four nucleotide bases of length n, there is a corresponding number of possible arrangements of bases, 4^n. For any protein, there are 20^n possible arrangements of protein-forming amino acids. A gene 999 bases in length represents one of 4^999 possible nucleotide sequences; a protein of 333 amino acids is one of 20^333 possibilities.
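To give a sense of scale for the counts in the preceding paragraph, the sketch below evaluates them exactly using Python's arbitrary-precision integers and reports each as a power of ten. The helper names `sequence_space_size` and `order_of_magnitude` are hypothetical and introduced only for this example.

```python
def sequence_space_size(length, alphabet_size):
    """Number of distinct sequences of `length` symbols over an alphabet
    of `alphabet_size` symbols."""
    return alphabet_size ** length  # exact: Python integers are unbounded

def order_of_magnitude(n):
    """Exponent k such that 10**k <= n < 10**(k + 1), for a positive integer n."""
    return len(str(n)) - 1

genes = sequence_space_size(999, 4)      # nucleotide sequences 999 bases long
proteins = sequence_space_size(333, 20)  # amino acid sequences 333 residues long
print(order_of_magnitude(genes), order_of_magnitude(proteins))
```

The gene space is larger than the protein space because the genetic code is redundant: 4^3 = 64 codons map onto 20 amino acids, so many different 999-base genes encode the same 333-residue protein.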
Since the 1960s, some biologists have thought functional proteins to be rare among the set of possible amino acid sequences. Some have used an analogy with human language to illustrate why this should be the case. Denton (1986:309-311), for example, has shown that meaningful words and sentences are extremely rare among the set of possible combinations of English letters, especially as sequence length grows. (The ratio of meaningful 12-letter words to 12-letter sequences is 1/10^14, the ratio of 100-letter sentences to possible 100-letter strings is 1/10^100.) Further, Denton shows that most meaningful sentences are highly isolated from one another in the space of possible combinations, so that random substitutions of letters will, after a very few changes, inevitably degrade meaning. Apart from a few closely clustered sentences accessible by random substitution, the overwhelming majority of meaningful sentences lie, probabilistically speaking, beyond the reach of random search.
Denton (1986:301-324) and others have argued that similar constraints apply to genes and proteins. They have questioned whether an undirected search via mutation and selection would have a reasonable chance of locating new islands of function– representing fundamentally new genes or proteins– within the time available (Eden 1967, Schutzenberger 1967, Lovtrup 1979). Some have also argued that alterations in sequencing would likely result in loss of protein function before fundamentally new function could arise (Eden 1967, Denton 1986). Nevertheless, neither the extent to which genes and proteins are sensitive to functional loss as a result of sequence change, nor the extent to which functional proteins are isolated within sequence space, has been fully known.
Recently, experiments in molecular biology have shed light on these questions. A variety of mutagenesis techniques have shown that proteins (and thus the genes that produce them) are indeed highly specified relative to biological function (Bowie & Sauer 1989, Reidhaar-Olson & Sauer 1990, Taylor et al. 2001). Mutagenesis research tests the sensitivity of proteins (and, by implication, DNA) to functional loss as a result of alterations in sequencing. Studies of proteins have long shown that amino acid residues at many active positions cannot vary without functional loss (Perutz & Lehmann 1968). More recent protein studies (often using mutagenesis experiments) have shown that functional requirements place significant constraints on sequencing even at non-active site positions (Bowie & Sauer 1989, Reidhaar-Olson & Sauer 1990, Chothia et al. 1998, Axe 2000, Taylor et al. 2001). In particular, Axe (2000) has shown that multiple as opposed to single position amino acid substitutions inevitably result in loss of protein function, even when these changes occur at sites that allow variation when altered in isolation. Cumulatively, these constraints imply that proteins are highly sensitive to functional loss as a result of alterations in sequencing, and that functional proteins represent highly isolated and improbable arrangements of amino acids– arrangements that are far more improbable, in fact, than would be likely to arise by chance alone in the time available (Reidhaar-Olson & Sauer 1990; Behe 1992; Kauffman 1995:44; Dembski 1998:175-223; Axe 2000, 2004). (See the discussion of the neutral theory of evolution below for a precise quantitative assessment.)
Of course, neo-Darwinists do not envision a completely random search through the set of all possible nucleotide sequences– so-called “sequence space.” They envision natural selection acting to preserve small advantageous variations in genetic sequences and their corresponding protein products. Dawkins (1996), for example, likens an organism to a high mountain peak. He compares climbing the sheer precipice up the front side of the mountain to building a new organism by chance. He acknowledges that his approach up “Mount Improbable” will not succeed. Nevertheless, he suggests that there is a gradual slope up the backside of the mountain that could be climbed in small incremental steps. In his analogy, the backside climb up “Mount Improbable” corresponds to the process of natural selection acting on random changes in the genetic text. What chance alone cannot accomplish blindly or in one leap, selection (acting on mutations) can accomplish through the cumulative effect of many slight successive steps.
Yet the extreme specificity and complexity of proteins presents a difficulty, not only for the chance origin of specified biological information (i.e., for random mutations acting alone), but also for selection and mutation acting in concert. Indeed, mutagenesis experiments cast doubt on each of the two scenarios by which neo-Darwinists envisioned new information arising from the mutation/selection mechanism (for review, see Lonnig 2001). For neo-Darwinism, new functional genes either arise from non-coding sections in the genome or from preexisting genes. Both scenarios are problematic.
In the first scenario, neo-Darwinists envision new genetic information arising from those sections of the genetic text that can presumably vary freely without consequence to the organism. According to this scenario, non-coding sections of the genome, or duplicated sections of coding regions, can experience a protracted period of “neutral evolution” (Kimura 1983) during which alterations in nucleotide sequences have no discernible effect on the function of the organism. Eventually, however, a new gene sequence will arise that can code for a novel protein. At that point, natural selection can favor the new gene and its functional protein product, thus securing the preservation and heritability of both.
This scenario has the advantage of allowing the genome to vary through many generations, as mutations “search” the space of possible base sequences. The scenario has an overriding problem, however: the size of the combinatorial space (i.e., the number of possible amino acid sequences) and the extreme rarity and isolation of the functional sequences within that space of possibilities. Since natural selection can do nothing to help generate new functional sequences, but rather can only preserve such sequences once they have arisen, chance alone– random variation– must do the work of information generation– that is, of finding the exceedingly rare functional sequences within the set of combinatorial possibilities. Yet the probability of randomly assembling (or “finding,” in the previous sense) a functional sequence is extremely small.
Cassette mutagenesis experiments performed during the early 1990s suggest that the probability of attaining (at random) the correct sequencing for a short protein 100 amino acids long is about 1 in 10^65 (Reidhaar-Olson & Sauer 1990, Behe 1992:65-69). This result agreed closely with earlier calculations that Yockey (1978) had performed based upon the known sequence variability of cytochrome c in different species and other theoretical considerations. More recent mutagenesis research has provided additional support for the conclusion that functional proteins are exceedingly rare among possible amino acid sequences (Axe 2000, 2004). Axe (2004) has performed site-directed mutagenesis experiments on a 150-residue protein-folding domain within a β-lactamase enzyme. His experimental method improves upon earlier mutagenesis techniques and corrects for several sources of possible estimation error inherent in them. On the basis of these experiments, Axe has estimated the ratio of (a) proteins of typical size (150 residues) that perform a specified function via any folded structure to (b) the whole set of possible amino acid sequences of that size. Based on his experiments, Axe has estimated this ratio to be 1 to 10^77. Thus, the probability of finding a functional protein among the possible amino acid sequences corresponding to a 150-residue protein is similarly 1 in 10^77.
Other considerations imply additional improbabilities. First, new Cambrian animals would require proteins much longer than 100 residues to perform many necessary specialized functions. Ohno (1996) has noted that Cambrian animals would have required complex proteins such as lysyl oxidase in order to support their stout body structures. Lysyl oxidase molecules in extant organisms comprise over 400 amino acids. These molecules are both highly complex (non-repetitive) and functionally specified. Reasonable extrapolation from mutagenesis experiments done on shorter protein molecules suggests that the probability of producing functionally sequenced proteins of this length at random is so small as to make appeals to chance absurd, even granting the duration of the entire universe. (See Dembski 1998:175-223 for a rigorous calculation of this “Universal Probability Bound”; see also Axe 2004.) Yet, second, fossil data (Bowring et al. 1993, 1998a:1, 1998b:40; Kerr 1993; Monastersky 1993), and even molecular analyses supporting deep divergence (Wray et al. 1996), suggest that the duration of the Cambrian explosion (between 5-10 x 10^6 and, at most, 7 x 10^7 years) is far smaller than that of the entire universe (1.3-2 x 10^10 years). Third, DNA mutation rates are far too low to generate the novel genes and proteins necessary to building the Cambrian animals, given the most probable duration of the explosion as determined by fossil studies (Conway Morris 1998b). As Ohno (1996:8475) notes, even a mutation rate of 10^-9 per base pair per year results in only a 1% change in the sequence of a given section of DNA in 10 million years. Thus, he argues that mutational divergence of preexisting genes cannot explain the origin of the Cambrian forms in that time.4
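The arithmetic behind the figure attributed to Ohno can be checked with a short sketch. It uses a simple linear estimate (rate multiplied by time) and ignores repeated changes at the same site, which is an assumption of this example rather than a statement of Ohno's method. The function name `expected_fraction_changed` is hypothetical.

```python
from fractions import Fraction

def expected_fraction_changed(rate_per_site_per_year, years):
    """Expected fraction of sites changed, as a linear estimate
    (rate x time) that ignores repeated hits at the same site."""
    return rate_per_site_per_year * years

rate = Fraction(1, 10**9)  # 10^-9 changes per base pair per year (from the text)
window = 10**7             # 10 million years (from the text)
changed = expected_fraction_changed(rate, window)
print(f"{float(changed):.0%}")  # expressed as a percentage of sites
```

Using `Fraction` keeps the result exact, so the product of the quoted rate and the quoted time span comes out as precisely one hundredth, matching the 1% stated in the paragraph.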
The selection/mutation mechanism faces another probabilistic obstacle. The animals that arise in the Cambrian exhibit structures that would have required many new types of cells, each of which would have required many novel proteins to perform their specialized functions. Further, new cell types require systems of proteins that must, as a condition of functioning, act in close coordination with one another. The unit of selection in such systems ascends to the system as a whole. Natural selection selects for functional advantage. But new cell types require whole systems of proteins to perform their distinctive functions. In such cases, natural selection cannot contribute to the process of information generation until after the information necessary to build the requisite system of proteins has arisen. Thus random variations must, again, do the work of information generation– and now not simply for one protein, but for many proteins arising at nearly the same time. Yet the odds of this occurring by chance alone are, of course, far smaller than the odds of the chance origin of a single gene or protein– so small in fact as to render the chance origin of the genetic information necessary to build a new cell type (a necessary but not sufficient condition of building a new body plan) problematic given even the most optimistic estimates for the duration of the Cambrian explosion.
Dawkins (1986:139) has noted that scientific theories can rely on only so much “luck” before they cease to be credible. The neutral theory of evolution, which, by its own logic, prevents natural selection from playing a role in generating genetic information until after the fact, relies on entirely too much luck. The sensitivity of proteins to functional loss, the need for long proteins to build new cell types and animals, the need for whole new systems of proteins to service new cell types, the probable brevity of the Cambrian explosion relative to mutation rates– all suggest the immense improbability (and implausibility) of any scenario for the origination of Cambrian genetic information that relies upon random variation alone unassisted by natural selection.
Yet the neutral theory requires novel genes and proteins to arise– essentially– by random mutation alone. Adaptive advantage accrues after the generation of new functional genes and proteins. Thus, natural selection cannot play a role until new information-bearing molecules have independently arisen. Thus neutral theorists envisioned the need to scale the steep face of a Dawkins-style precipice of which there is no gradually sloping backside– a situation that, by Dawkins' own logic, is probabilistically untenable.
In the second scenario, neo-Darwinists envisioned novel genes and proteins arising by numerous successive mutations in the preexisting genetic text that codes for proteins. To adapt Dawkins's metaphor, this scenario envisions gradually climbing down one functional peak and then ascending another. Yet mutagenesis experiments again suggest a difficulty. Recent experiments show that, even when exploring a region of sequence space populated by proteins of a single fold and function, most multiple-position changes quickly lead to loss of function (Axe 2000). Yet to turn one protein into another with a completely novel structure and function requires specified changes at many sites. Indeed, the number of changes necessary to produce a new protein greatly exceeds the number of changes that will typically produce functional losses. Given this, the probability of escaping total functional loss during a random search for the changes needed to produce a new function is extremely small– and this probability diminishes exponentially with each additional requisite change (Axe 2000). Thus, Axe's results imply that, in all probability, random searches for novel proteins (through sequence space) will result in functional loss long before any novel functional protein will emerge.
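The exponential decline described above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each requisite change independently leaves function intact with the same fixed probability. Both the independence assumption and the value 0.5 are hypothetical; neither is a figure reported by Axe (2000).

```python
def chance_of_retaining_function(p_per_change, num_changes):
    """Probability that function survives all num_changes changes.

    Assumes (hypothetically) that changes are independent and that each
    one is tolerated with the same probability p_per_change.
    """
    return p_per_change ** num_changes


if __name__ == "__main__":
    # 0.5 is an illustrative value only, not an experimental estimate.
    for n in (1, 5, 10, 20, 40):
        print(n, chance_of_retaining_function(0.5, n))
```

Under these assumptions every additional requisite change multiplies the chance of retaining function by the same factor, which is what "diminishes exponentially" means here.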
Blanco et al. have come to a similar conclusion. Using directed mutagenesis, they have determined that residues both in the hydrophobic core and on the surface of the protein play essential roles in determining protein structure. By sampling intermediate sequences between two naturally occurring sequences that adopt different folds, they found that the intermediate sequences “lack a well defined three-dimensional structure.” Thus, they conclude that it is unlikely that a new protein fold could evolve via a series of folded intermediate sequences (Blanco et al. 1999:741).
Thus, although this second neo-Darwinian scenario has the advantage of starting with functional genes and proteins, it also has a lethal disadvantage: any process of random mutation or rearrangement in the genome would in all probability generate nonfunctional intermediate sequences before fundamentally new functional genes or proteins would arise. Clearly, nonfunctional intermediate sequences confer no survival advantage on their host organisms. Natural selection favors only functional advantage. It cannot select or favor nucleotide sequences or polypeptide chains that do not yet perform biological functions, and still less will it favor sequences that efface or destroy preexisting function.
Evolving genes and proteins will range through a series of nonfunctional intermediate sequences that natural selection will not favor or preserve but will, in all probability, eliminate (Blanco et al. 1999, Axe 2000). When this happens, selection-driven evolution will cease. At this point, neutral evolution of the genome (unhinged from selective pressure) may ensue, but, as we have seen, such a process must overcome immense probabilistic hurdles, even granting cosmic time.
Thus, whether one envisions the evolutionary process beginning with a noncoding region of the genome or a preexisting functional gene, the functional specificity and complexity of proteins impose very stringent limitations on the efficacy of mutation and selection. In the first case, function must arise first, before natural selection can act to favor a novel variation. In the second case, function must be continuously maintained in order to prevent deleterious (or lethal) consequences to the organism and to allow further evolution. Yet the complexity and functional specificity of proteins implies that both these conditions will be extremely difficult to meet. Therefore, the neo-Darwinian mechanism appears to be inadequate to generate the new information present in the novel genes and proteins that arise with the Cambrian animals.
Novel Body Plans
The problems with the neo-Darwinian mechanism run deeper still. In order to explain the origin of the Cambrian animals, one must account not only for new proteins and cell types, but also for the origin of new body plans. Within the past decade, developmental biology has dramatically advanced our understanding of how body plans are built during ontogeny. In the process, it has also uncovered a profound difficulty for neo-Darwinism.
Significant morphological change in organisms requires attention to timing. Mutations in genes that are expressed late in the development of an organism will not affect the body plan. Mutations expressed early in development, however, could conceivably produce significant morphological change (Arthur 1997:21). Thus, events expressed early in the development of organisms have the only realistic chance of producing large-scale macroevolutionary change (Thomson 1992). As John and Miklos (1988:309) explain, macroevolutionary change requires alterations in the very early stages of ontogenesis.
Yet recent studies in developmental biology make clear that mutations expressed early in development typically have deleterious effects (Arthur 1997:21). For example, when early-acting body plan molecules, or morphogens such as bicoid (which helps to set up the anterior-posterior head-to-tail axis in Drosophila), are perturbed, development shuts down (Nusslein-Volhard & Wieschaus 1980, Lawrence & Struhl 1996, Muller & Newman 2003).5 The resulting embryos die. Moreover, there is a good reason for this. If an engineer modifies the length of the piston rods in an internal combustion engine without modifying the crankshaft accordingly, the engine won't start. Similarly, processes of development are tightly integrated spatially and temporally such that changes early in development will require a host of other coordinated changes in separate but functionally interrelated developmental processes downstream. For this reason, mutations will be much more likely to be deadly if they disrupt a functionally deeply-embedded structure such as a spinal column than if they affect more isolated anatomical features such as fingers (Kauffman 1995:200).
This problem has led to what McDonald (1983) has called “a great Darwinian paradox” (p. 93). McDonald notes that genes that are observed to vary within natural populations do not lead to major adaptive changes, while genes that could cause major changes– the very stuff of macroevolution– apparently do not vary. In other words, mutations of the kind that macroevolution doesn't need (namely, viable genetic mutations in DNA expressed late in development) do occur, but those that it does need (namely, beneficial body plan mutations expressed early in development) apparently don't occur.6 According to Darwin (1859:108) natural selection cannot act until favorable variations arise in a population. Yet there is no evidence from developmental genetics that the kind of variations required by neo-Darwinism– namely, favorable body plan mutations– ever occur.
Developmental biology has raised another formidable problem for the mutation/selection mechanism. Embryological evidence has long shown that DNA does not wholly determine morphological form (Goodwin 1985, Nijhout 1990, Sapp 1987, Muller & Newman 2003), suggesting that mutations in DNA alone cannot account for the morphological changes required to build a new body plan.
DNA helps direct protein synthesis.7 It also helps to regulate the timing and expression of the synthesis of various proteins within cells. Yet, DNA alone does not determine how individual proteins assemble themselves into larger systems of proteins; still less does it solely determine how cell types, tissue types, and organs arrange themselves into body plans (Harold 1995:2774, Moss 2004). Instead, other factors– such as the three-dimensional structure and organization of the cell membrane and cytoskeleton and the spatial architecture of the fertilized egg– play important roles in determining body plan formation during embryogenesis.
For example, the structure and location of the cytoskeleton influence the patterning of embryos. Arrays of microtubules help to distribute the essential proteins used during development to their correct locations in the cell. Of course, microtubules themselves are made of many protein subunits. Nevertheless, like bricks that can be used to assemble many different structures, the tubulin subunits in the cell's microtubules are identical to one another. Thus, neither the tubulin subunits nor the genes that produce them account for the different shape of microtubule arrays that distinguish different kinds of embryos and developmental pathways. Instead, the structure of the microtubule array itself is determined by the location and arrangement of its subunits, not the properties of the subunits themselves. For this reason, it is not possible to predict the structure of the cytoskeleton of the cell from the characteristics of the protein constituents that form that structure (Harold 2001:125).
Two analogies may help further clarify the point. At a building site, builders will make use of many materials: lumber, wires, nails, drywall, piping, and windows. Yet building materials do not determine the floor plan of the house, or the arrangement of houses in a neighborhood. Similarly, electronic circuits are composed of many components, such as resistors, capacitors, and transistors. But such lower-level components do not determine their own arrangement in an integrated circuit. Biological systems also depend on hierarchical arrangements of parts. Genes and proteins are made from simple building blocks– nucleotide bases and amino acids– arranged in specific ways. Cell types are made of, among other things, systems of specialized proteins. Organs are made of specialized arrangements of cell types and tissues. And body plans comprise specific arrangements of specialized organs. Yet, clearly, the properties of individual proteins (or, indeed, the lower-level parts in the hierarchy generally) do not fully determine the organization of the higher-level structures and organizational patterns (Harold 2001:125). It follows that the genetic information that codes for proteins does not determine these higher-level structures either.
These considerations pose another challenge to the sufficiency of the neo-Darwinian mechanism. Neo-Darwinism seeks to explain the origin of new information, form, and structure as a result of selection acting on randomly arising variation at a very low level within the biological hierarchy, namely, within the genetic text. Yet major morphological innovations depend on a specificity of arrangement at a much higher level of the organizational hierarchy, a level that DNA alone does not determine. And if DNA is not wholly responsible for body plan morphogenesis, then DNA sequences can mutate indefinitely, without regard to realistic probabilistic limits, and still not produce a new body plan. Thus, the mechanism of natural selection acting on random mutations in DNA cannot in principle generate novel body plans, including those that first arose in the Cambrian explosion.
Of course, it could be argued that, while many single proteins do not by themselves determine cellular structures and/or body plans, proteins acting in concert with other proteins or suites of proteins could determine such higher-level form. For example, it might be pointed out that the tubulin subunits (cited above) are assembled by other helper proteins– gene products– called Microtubule Associated Proteins (MAPS). This might seem to suggest that genes and gene products alone do suffice to determine the development of the three-dimensional structure of the cytoskeleton.
Yet MAPS, and indeed many other necessary proteins, are only part of the story. The location of specified target sites on the interior of the cell membrane also helps to determine the shape of the cytoskeleton. So, too, do the position and structure of the centrosome, which nucleates the microtubules that form the cytoskeleton. While both the membrane targets and the centrosomes are made of proteins, the location and form of these structures are not wholly determined by the proteins that form them. Indeed, centrosome structure and membrane patterns as a whole convey three-dimensional structural information that helps determine the structure of the cytoskeleton and the location of its subunits (McNiven & Porter 1992:313-329). Moreover, the centrioles that compose the centrosomes replicate independently of DNA replication (Lange et al. 2000:235-249, Marshall & Rosenbaum 2000:187-205). The daughter centriole receives its form from the overall structure of the mother centriole, not from the individual gene products that constitute it (Lange et al. 2000). In ciliates, microsurgery on cell membranes can produce heritable changes in membrane patterns, even though the DNA of the ciliates has not been altered (Sonneborn 1970:1-13, Frankel 1980:607-623, Nanney 1983:163-170). This suggests that membrane patterns (as opposed to membrane constituents) are impressed directly on daughter cells. In both cases, form is transmitted from parent three-dimensional structures to daughter three-dimensional structures directly and is not wholly contained in constituent proteins or genetic information (Moss 2004).
Thus, in each new generation, the form and structure of the cell arise as the result of both gene products and preexisting three-dimensional structure and organization. Cellular structures are built from proteins, but proteins find their way to correct locations in part because of preexisting three-dimensional patterns and organization inherent in cellular structures. Preexisting three-dimensional form present in the preceding generation (whether inherent in the cell membrane, the centrosomes, the cytoskeleton or other features of the fertilized egg) contributes to the production of form in the next generation. Neither structural proteins alone, nor the genes that code for them, are sufficient to determine the three-dimensional shape and structure of the entities they form. Gene products provide necessary, but not sufficient, conditions for the development of three-dimensional structure within cells, organs and body plans (Harold 1995:2767). But if this is so, then natural selection acting on genetic variation alone cannot produce the new forms that arise in the history of life.
Of course, neo-Darwinism is not the only evolutionary theory for explaining the origin of novel biological form. Kauffman (1995) doubts the efficacy of the mutation/selection mechanism. Nevertheless, he has advanced a self-organizational theory to account for the emergence of new form, and presumably the information necessary to generate it. Whereas neo-Darwinism attempts to explain new form as the consequence of selection acting on random mutation, Kauffman suggests that selection acts, not mainly on random variations, but on emergent patterns of order that self-organize via the laws of nature.
Kauffman (1995:47-92) illustrates how this might work with various model systems in a computer environment. In one, he conceives a system of buttons connected by strings. Buttons represent novel genes or gene products; strings represent the law-like forces of interaction that obtain between gene products (i.e., proteins). Kauffman suggests that when the complexity of the system (as represented by the number of buttons and strings) reaches a critical threshold, new modes of organization can arise in the system “for free”– that is, naturally and spontaneously– after the manner of a phase transition in chemistry.
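A buttons-and-strings system of this general kind can be sketched as a random graph. The code below is a minimal illustration, not Kauffman's own program: it assumes each string ties together two buttons chosen uniformly at random, and the function name and parameters are hypothetical. It reports what fraction of all buttons ends up in the largest connected cluster.

```python
import random


def largest_cluster_fraction(num_buttons, num_strings, seed=0):
    """Tie random pairs of buttons together with strings and return the
    fraction of buttons that belong to the largest connected cluster."""
    rng = random.Random(seed)
    parent = list(range(num_buttons))  # union-find forest
    size = [1] * num_buttons           # cluster size stored at each root

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(num_strings):
        a, b = rng.sample(range(num_buttons), 2)  # two distinct buttons
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the smaller cluster into the larger one.
            if size[root_a] < size[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
            size[root_a] += size[root_b]

    largest = max(size[find(i)] for i in range(num_buttons))
    return largest / num_buttons


if __name__ == "__main__":
    n = 2000
    for ratio in (0.1, 0.3, 0.5, 0.7, 1.0, 1.5):
        frac = largest_cluster_fraction(n, int(ratio * n))
        print(f"strings/buttons = {ratio:.1f}: largest cluster = {frac:.3f}")
```

In random graphs of this type, a single large cluster appears fairly abruptly once the ratio of strings to buttons passes roughly 0.5, which is the sort of threshold behavior the phase-transition comparison refers to.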
Another model that Kauffman develops is a system of interconnected lights. Each light can flash in a variety of states– on, off, twinkling, etc. Since there is more than one possible state for each light, and many lights, there are a vast number of possible states that the system can adopt. Further, in his system, rules determine how past states will influence future states. Kauffman asserts that, as a result of these rules, the system will, if properly tuned, eventually produce a kind of order in which a few basic patterns of light activity recur with greater-than-random frequency. Since these actual patterns of light activity represent a small portion of the total number of possible states in which the system can reside, Kauffman seems to imply that self-organizational laws might similarly result in highly improbable biological outcomes– perhaps even sequences (of bases or amino acids) within a much larger sequence space of possibilities.
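A system of interconnected lights can be sketched as a random Boolean network. The code below is an illustrative simplification rather than Kauffman's implementation: it assumes each light has only two states (on or off), reads a fixed number of randomly chosen lights, and follows a randomly generated rule table. All names and parameter values are hypothetical.

```python
import random


def make_network(num_lights, inputs_per_light, rng):
    """Return (wiring, rules). Each light reads inputs_per_light randomly
    chosen lights and has a random Boolean lookup table over those inputs."""
    wiring = [rng.sample(range(num_lights), inputs_per_light)
              for _ in range(num_lights)]
    rules = [[rng.randint(0, 1) for _ in range(2 ** inputs_per_light)]
             for _ in range(num_lights)]
    return wiring, rules


def step(state, wiring, rules):
    """Compute the next state of every light from the current state."""
    new_state = []
    for inputs, table in zip(wiring, rules):
        index = 0
        for src in inputs:
            index = (index << 1) | state[src]  # pack input bits into an index
        new_state.append(table[index])
    return tuple(new_state)


def find_cycle(state, wiring, rules):
    """Iterate until a state repeats; return (transient_length, cycle_length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, wiring, rules)
        t += 1
    return seen[state], t - seen[state]


if __name__ == "__main__":
    rng = random.Random(1)
    wiring, rules = make_network(16, 2, rng)
    start = tuple(rng.randint(0, 1) for _ in range(16))
    transient, cycle = find_cycle(start, wiring, rules)
    print(f"transient = {transient} steps, recurring cycle = {cycle} states")
    print(f"possible states = {2 ** 16}")
```

Because the rules are deterministic and the number of states is finite, any run must eventually fall into a repeating cycle. The point at issue in the text is how small that cycle is relative to the total number of possible states.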
Do these simulations of self-organizational processes accurately model the origin of novel genetic information? It is hard to think so.
First, in both examples, Kauffman presupposes but does not explain significant sources of preexisting information. In his buttons-and-strings system, the buttons represent proteins, themselves packets of CSI, and the result of preexisting genetic information. Where does this information come from? Kauffman (1995) doesn't say, but the origin of such information is an essential part of what needs to be explained in the history of life. Similarly, in his light system, the order that allegedly arises “for free” actually arises only if the programmer of the model system “tunes” it in such a way as to keep it from either (a) generating an excessively rigid order or (b) developing into chaos (pp. 86-88). Yet this necessary tuning involves an intelligent programmer selecting certain parameters and excluding others– that is, inputting information.
Second, Kauffman's model systems are not constrained by functional considerations and thus are not analogous to biological systems. A system of interconnected lights governed by pre-programmed rules may well settle into a small number of patterns within a much larger space of possibilities. But because these patterns have no function, and need not meet any functional requirements, they have no specificity analogous to that present in actual organisms. Instead, examination of Kauffman's (1995) model systems shows that they do not produce sequences or systems characterized by specified complexity, but instead by large amounts of symmetrical order or internal redundancy interspersed with aperiodicity or (mere) complexity (pp. 53, 89, 102). Getting a law-governed system to generate repetitive patterns of flashing lights, even with a certain amount of variation, is clearly interesting, but not biologically relevant. On the other hand, a system of lights flashing the title of a Broadway play would model a biologically relevant self-organizational process, at least if such a meaningful or functionally specified sequence arose without intelligent agents previously programming the system with equivalent amounts of CSI. In any case, Kauffman's systems do not produce specified complexity, and thus do not offer promising models for explaining the new genes and proteins that arose in the Cambrian.
Even so, Kauffman suggests that his self-organizational models can specifically elucidate aspects of the Cambrian explosion. According to Kauffman (1995:199-201), new Cambrian animals emerged as the result of “long jump” mutations that established new body plans in a discrete rather than gradual fashion. He also recognizes that mutations affecting early development are almost inevitably harmful. Thus, he concludes that body plans, once established, will not change, and that any subsequent evolution must occur within an established body plan (Kauffman 1995:201). And indeed, the fossil record does show a curious (from a neo-Darwinian point of view) top-down pattern of appearance, in which higher taxa (and the body plans they represent) appear first, only later to be followed by the multiplication of lower taxa representing variations within those original body designs (Erwin et al. 1987, Lewin 1988, Valentine & Jablonski 2003:518). Further, as Kauffman expects, body plans appear suddenly and persist without significant modification over time.
But here, again, Kauffman begs the most important question, which is: what produces the new Cambrian body plans in the first place? Granted, he invokes “long jump mutations” to explain this, but he identifies no specific self-organizational process that can produce such mutations. Moreover, he concedes a principle that undermines the plausibility of his own proposal. Kauffman acknowledges that mutations that occur early in development are almost inevitably deleterious. Yet developmental biologists know that these are the only kind of mutations that have a realistic chance of producing large-scale evolutionary change– i.e., the big jumps that Kauffman invokes. Though Kauffman repudiates the neo-Darwinian reliance upon random mutations in favor of self-organizing order, in the end, he must invoke the most implausible kind of random mutation in order to provide a self-organizational account of the new Cambrian body plans. Clearly, his model is not sufficient.
Of course, still other causal explanations have been proposed. During the 1970s, the paleontologists Eldredge and Gould (1972) proposed the theory of evolution by punctuated equilibrium in order to account for a pervasive pattern of “sudden appearance” and “stasis” in the fossil record. Though advocates of punctuated equilibrium were mainly seeking to describe the fossil record more accurately than earlier gradualist neo-Darwinian models had done, they did also propose a mechanism– known as species selection– by which the large morphological jumps evident in the fossil record might have been produced. According to punctuationalists, natural selection functions more as a mechanism for selecting the fittest species than for selecting the most-fit individual within a species. Accordingly, on this model, morphological change should occur in larger, more discrete intervals than it would given a traditional neo-Darwinian understanding.
Despite its virtues as a descriptive model of the history of life, punctuated equilibrium has been widely criticized for failing to provide a mechanism sufficient to produce the novel form characteristic of higher taxonomic groups. For one thing, critics have noted that the proposed mechanism of punctuated evolutionary change simply lacked the raw material upon which to work. As Valentine and Erwin (1987) note, the fossil record fails to document a large pool of species prior to the Cambrian. Yet the proposed mechanism of species selection requires just such a pool of species upon which to act. Thus, they conclude that the mechanism of species selection probably does not resolve the problem of the origin of the higher taxonomic groups (p. 96).8 Further, punctuated equilibrium has not addressed the more specific and fundamental problem of explaining the origin of the new biological information (whether genetic or epigenetic) necessary to produce novel biological form. Advocates of punctuated equilibrium might assume that the new species (upon which natural selection acts) arise by known microevolutionary processes of speciation (such as founder effect, genetic drift or bottleneck effect) that do not necessarily depend upon mutations to produce adaptive changes. But, in that case, the theory lacks an account of how the specifically higher taxa arise. Species selection will only produce more fit species. On the other hand, if punctuationalists assume that processes of genetic mutation can produce more fundamental morphological changes and variations, then their model becomes subject to the same problems as neo-Darwinism (see above). This dilemma is evident in Gould (2002:710) insofar as his attempts to explain adaptive complexity inevitably employ classical neo-Darwinian modes of explanation.9