Fresh Brainz: Identification Of Design Motifs

Tuesday, December 11, 2007

Identification Of Design Motifs

A few months ago, prominent science blogger Prof. Larry Moran published an interesting post on his Sandwalk blog.

The article was about intelligent design creationism, and how proponents of this idea intend to prove that our genetic code was designed. They used the movie Contact (1997) as an example of how an alien signal can be identified as intelligent, based on the mathematical content of the signal.

An interesting question was then raised by intelligent design proponent Michael Egnor.

He asked:

If the scientific discovery of a ‘blueprint’ would justify the design inference, then why is it unreasonable to infer that the genetic code was designed?

Well, to be picky, genetic codes are more like recipes than blueprints, because most of the information does not map topographically to the phenotype (there are exceptions, such as the Hox gene clusters).

Still, the question is broadly valid. Since people are already routinely modifying genetic codes, and soon will have the capability to design de novo genetic codes, this question sounds rather reasonable. I'll rephrase it as:

"Is there any method to demonstrate that a genome contains elements that can be identified to be designed by an intelligent being?"

My answer?

Yes.

Let me show you how it can be done, in a falsifiable and scientifically sound manner.

I call it "Identification of Design Motifs" or IDM.

But first, we need to make some basic assumptions.

The only currently known designers in the Universe are human beings and some animals. To discover designed motifs, we need to know what constitutes design.

For this purpose, we assume that the putative genome designer must exhibit some human-like design principles, such as consistency and systematic organization.

Genome designers that are in principle unknowable, undefinable or show no comprehensible design habits cannot be discovered using IDM (or for that matter, any other method of rational inquiry).

So we are limited to genome designers we can understand - perhaps a Super Alien or a Lifegiver Deity.

To proceed further, let me use the illustration of a computer programmer, working on a piece of code.

Human software programmers use a number of systematic principles to make their job of writing and debugging the code easier.

In some languages, such as BASIC, lines of code are numbered in sequence. Command terms have exactly the same function throughout the code. Programmers often leave comments within the lines of code to explain what they're trying to do, so that it's easier to fix things if the program doesn't work properly.

These are crucial elements of design that can be detected using IDM.

Salient features such as mathematical progressions, invariant function of key sequences and *gasp* the designer's own original comments in any sort of comprehensible language - if these are discovered, they will present a strong case that our genetic code was designed.

Now let's go into the specifics: how to apply the IDM strategy!

1. Coding sequence

Among the best understood parts of a genome are the protein-coding sequences. Sure, they are important for making proteins, but are there also hidden messages contained within?

I'll give you a concrete example.

Here, look at the sequence of the human Oct-1 protein. Each of the letters represents one of the 20 amino acids most commonly found in living organisms:

MNNPSETSKPSMESGDGNTGTQTNGLDFQKQPVPVGGAISTAQAQAFLGHLHQVQLAGTSLQAAAQSLNVQSKSNEESGDSQQPSQPSQQPSVQAAIPQTQLMLAGGQITGLTLTPAQQQLLLQQAQAQAQLLAAAVQQHSASQQHSAAGATISASAATPMTQIPLSQPIQIAQDLQQLQQLQQQNLNLQQFVLVHPTTNLQPAQFIISQTPQGQQGLLQAQNLLTQLPQQSQANLLQSQPSITLTSQPATPTRTIAATPIQTLPQSQSTPKRIDTPSLEEPSDLEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLYGNDFSQTTISRFEALNLSFKNMCKLKPLLEKWLNDAENLSSDSSLSSPSALNSPGIEGLSRRRKKRTSIETNIRVALEKSFLENQKPTSEEITMIADQLNMEKEVIRVWFCNRRQKEKRINPPSSGGTSSSPIKAIFPSPTSLVATTPSLVTSSAATTLTVSPVLPLTSAAVTNLSVTGTSDTTSNNTATVISTAPPASSAVTSPSLSPSPSASASTSEASSASETSTTQTTSTPLSSPLGTSQVMVTASGLQTAAAAALQGAAQLPANASLAAMAAAAGLNPSLMAPSQFAAGGALLSLNPGTLSGALSPALMSNSTLATIQALASGGSLPITSLDATGNLVFANAGGAPNIVTAPLFLNPQNLSLLTSNPVSLVSAAAASAGNSAPVASLHATSTSAESIQNSLFTVASASGAASTTTTASKAQ

Scanning quickly through the sequence (not exhaustive), I can already find a series of English words contained within. Of course, the genome designer may not use English, but it's possible to write a computer program that can screen coding sequences for a number of human languages.

A word can be coincidental, but what about meaningful sentences?

If a comprehensible sentence in any human language can be found, in the sequence that these genes appear in the chromosome, then that would be very, very, very difficult to explain using conventional evolutionary biology.
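The word screen described in this section might be sketched roughly as follows. This is only a minimal illustration, not a real analysis: the `WORDS` list and the `find_words` helper are hypothetical names made up for this example, and a serious screen would load a full dictionary for each language of interest. Note that the standard one-letter amino acid codes do not use B, J, O, U, X or Z, so only words spelled without those letters can ever turn up.

```python
# Hypothetical word list; a real screen would load a full dictionary per language.
WORDS = ["SET", "SETS", "LAG", "GENE"]

def find_words(sequence, words):
    """Return sorted (position, word) pairs for every occurrence of each word."""
    hits = []
    for word in words:
        start = sequence.find(word)
        while start != -1:
            hits.append((start, word))
            start = sequence.find(word, start + 1)  # step by one to allow overlapping hits
    return sorted(hits)

# First 20 residues of the Oct-1 sequence quoted above.
fragment = "MNNPSETSKPSMESGDGNTG"
print(find_words(fragment, WORDS))  # [(4, 'SET'), (4, 'SETS')]
```

Positions are zero-based. Finding sentences rather than isolated words would mean looking for runs of adjacent hits, which this sketch does not attempt.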

2. Non-coding, non-regulatory sequence

There are huge sections in any genome (especially that of the onion) that do not code for proteins, and do not serve any regulatory function. Why are they there?

Pan-selectionists loathe the existence of these, preferring to believe that they serve some hitherto unknown function. Neutralists disregard these as useless products of mutations that don't impact reproductive fitness.

For IDM proponents though, it's a unique opportunity to find design motifs, such as mathematical progressions, large systematic repeats, and sentences of words in any language that may uncover the true function behind these massive sections of DNA.

To give a detailed example, the mathematical content of DNA can be decoded using the Base-4 (or quaternary) method of counting.

If someone were to discover a series of numbers rising systematically like line numbers in a computer program, that would be very, very, very difficult to explain using conventional evolutionary biology.
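As a rough sketch of the base-4 idea, the snippet below reads DNA in fixed-width windows and checks whether the decoded numbers rise by a constant step. Everything specific here is an assumption made for illustration: the digit assignment A=0, C=1, G=2, T=3 is just one of the 24 possible mappings, the window width of 3 is arbitrary, and `decode_base4` and `is_arithmetic` are hypothetical helper names. The input string is a made-up example, not real genomic sequence.

```python
# Assumed digit assignment; any of the 24 possible base-to-digit mappings could be tried.
DIGITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def decode_base4(dna, width):
    """Split dna into non-overlapping windows of `width` bases and read each as a base-4 number."""
    values = []
    for i in range(0, len(dna) - width + 1, width):
        value = 0
        for base in dna[i:i + width]:
            value = value * 4 + DIGITS[base]
        values.append(value)
    return values

def is_arithmetic(values):
    """True if there are at least three values rising by a constant positive step."""
    if len(values) < 3:
        return False
    step = values[1] - values[0]
    return step > 0 and all(b - a == step for a, b in zip(values, values[1:]))

numbers = decode_base4("AGGCCACTG", 3)  # made-up sequence encoding 10, 20, 30
print(numbers, is_arithmetic(numbers))  # [10, 20, 30] True
```

A real search would also have to try different window widths, reading frames and digit mappings, which multiplies the number of chances for a coincidental match.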

----------

The genome sequences of a number of organisms are already publicly available. Although it is tedious work, anyone with good computer programming skills can analyze these genomes using the IDM approach explained above.

Start with a few genes... a short section of non-coding DNA... who knows what you might uncover.

So if you're a proponent of intelligent design - what are you waiting for?

A Nobel Prize awaits you!

Or me!?!!*

*If I can find this, I'll win the Nobel Prize. Not only that, I'll uncover the mystery of non-coding sequences, directly revealing the original purpose of those sections of DNA and potentially saving millions of people who suffer from genetic diseases of any kind. Ha ha ha!

Please leave a comment, or send me an email if you think that IDM is a good/stupid idea.

2 Comments:

Elia Diodati said...: I call bullshit on both.

1. The assignment of AAs to a one-letter alphabet is essentially arbitrary. There is no *physical* reason why Glycine ought to be H or X or G. Thus to truly validate your claim, you would need to test all possible mappings to all possible alphabets. (In the case of 20 AAs mapping onto 26 letters you would need to evaluate 26!/(26-20)! = 21543347282404147200000 possible permutations) And heaven forbid that English isn't the Chosen Language and you had to map AAs to (say) Russian, Chinese or Japanese. Even so, the presence of sequences of sensible letter streams can still occur purely by chance, given the large combinatorial space you are working with. Books like the Bible Code have been thoroughly debunked, as they ought to have been.

2. Your idea of series of numbers will be very hard to prove one way or another. All sorts of mathematically rigorous sequences have been shown to exist in biology, e.g. the famous Fibonacci sequence. Understanding why such sequences occur is interesting in itself, but does not obviate the need for a creationist explanation.; Wednesday, December 12, 2007 6:03:00 AM

The Key Question said...: "The assignment of AAs to a one-letter alphabet is essentially arbitrary. There is no *physical* reason why Glycine ought to be H or X or G. Thus to truly validate your claim, you would need to test all possible mappings to all possible alphabets. (In the case of 20 AAs mapping onto 26 letters you would need to evaluate 26!/(26-20)! = 21543347282404147200000 possible permutations)"

21.5 sextillion permutations? Why, that would just take up a year of supercomputer time!:P
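As a quick check on the arithmetic, the count of ordered assignments can be computed directly. The sketch below uses only Python's standard `math.factorial`; the variable names are made up for the example. Evaluating 26!/(26-20)! gives roughly 5.6 x 10^23, while the figure quoted in the comment matches 25!/6!, so the space to search appears to be about 26 times larger than quoted.

```python
import math

# Ordered ways to give 20 amino acids distinct letters from a 26-letter alphabet.
mappings = math.factorial(26) // math.factorial(26 - 20)
print(mappings)  # 560127029342507827200000, about 5.6e23

# The figure quoted in the comment corresponds to 25!/6! instead.
quoted = 21543347282404147200000
print(quoted == math.factorial(25) // math.factorial(6))  # True
print(mappings // quoted)  # 26
```

Either way, the number is large enough that the point about brute-force searching stands.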

The main assumption of IDM is that the putative genome designer exhibits systematic habits. If there are hidden English sentences within the protein sequence, once any comprehensible word of sufficient length (say 10 letters) appears, we immediately have a cipher that can be used to examine the rest of the sequence. No need to reset the permutations for each letter that we move forward.

This is broadly similar to what codebreakers did at Bletchley Park, though unlike in this case, they knew that (1) the target language was German and (2) the message was usually not gibberish.

"And heaven forbid that English isn't the Chosen Language and you had to map AAs to (say) Russian, Chinese or Japanese."

That's true. However, proponents of intelligent design may feel that the genome designer is more likely to use certain languages (for example the 22-character Aramaic), which would support their version of a Designer God.

As for the 40,000+ character Chinese...yup, let's examine that language later.

"Even so, the presence of sequences of sensible letter streams can still occur purely by chance, given the large combinatorial space you are working with. Books like the Bible Code have been thoroughly debunked, as they ought to have been."

Indeed, which is why it will be more persuasive if topographical continuity is demonstrated. It's one thing to fish out a few words here and there; totally something else if a comprehensible sentence appears in sequence.
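To put a rough number on "purely by chance", here is a back-of-the-envelope sketch. It assumes, unrealistically, that every position in a protein is an independent, uniformly random pick from 20 amino acids; real proteins are not like that, so treat the output as an order-of-magnitude guide only. The function name `expected_hits` and the sequence length of 1000 are illustrative assumptions.

```python
def expected_hits(seq_len, word_len, alphabet=20):
    """Expected occurrences of one specific word in a uniformly random sequence."""
    positions = max(seq_len - word_len + 1, 0)  # number of possible start positions
    return positions / alphabet ** word_len

# Illustrative sequence length of 1000 residues.
for k in (3, 5, 10):
    print(k, expected_hits(1000, k))
```

Under these assumptions a given three-letter word is expected about 0.12 times per 1000 residues, so with thousands of dictionary words and thousands of proteins, short hits are unsurprising. A specific ten-letter string is expected on the order of 1e-10 times, which is why long, contiguous text would be so much harder to attribute to chance.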

"Your idea of series of numbers will be very hard to prove one way or another. All sorts of mathematically rigorous sequences have been shown to exist in biology, e.g. the famous Fibonacci sequence. Understanding why such sequences occur is interesting in itself, but does not obviate the need for a creationist explanation."

I originally had a simple arithmetic sequence in mind (10,20,30...), like those used in BASIC programming, but while riding on the MRT last night I also remembered the Fibonacci sequence. I don't know if this sequence has been found in the genome, but it would be quite bizarre if running numbers were found in non-coding, non-regulatory regions.

The first candidate I would suggest to examine using the IDM approach would be the mysteriously massive dystrophin gene. It'll be cool to find out what secrets lie inside its numerous introns.

Personally, I find that the chances of finding anything interesting this way are very, very slim, since there is significant genome variation even between healthy individuals (SNPs, CNPs and other mutations). This demonstrates that genomes don't have to be in exactly one configuration in order to work properly, unlike most computer programmes, which can be garbled by a single bad line of code.

But I was trying to demonstrate that a design hypothesis is not ontologically impossible, even if my methodology is shaky. I'm happy to hear of any other possible strategies.; Wednesday, December 12, 2007 2:24:00 PM
