[FP Coffee] DNA Computing Problem

Jesse kebertyx at gmail.com
Fri Sep 9 15:46:48 MDT 2016


Hi Everyone!

I was talking to Daniel and Hugo yesterday about an interesting problem with applications in DNA computing. I thought I'd share a description and maybe give you guys some ideas for your next just-for-fun toy program! So here's how it goes:

-----

A strand of DNA is a string over the alphabet {A,T,G,C}, where (A,T) and (G,C) are matching base pairs. An important operation that certain enzymes can perform on DNA is called inversion (denoted x*), which reverses the string and replaces every symbol by it's base pair match. For example:

GATTACA* =TGTAATC
AATT* = AATT

Note that (x*)* == x, for all x
And that (x y)* == y* x*

There are certain strings which have semantic meaning for special [DNA processing enzymes][1], called *recognition sites*. A recognition site, x, is always matched by it's inverse, x*. One tells a recombinase enzyme to start, and the other tells it to stop. Since we're computer scientists, we can think of these restriction sites as pairs of matching parentheses. For example:

If
    [ == AGTATC
Then
    ] == [* == GATACT

It doesn't matter what the actual strings are, they're just short arbitrary markers. When the enzyme finds a substring enclosed in two opposing parentheses, it replaces the string with its inverse then deletes the parentheses. On the other hand, if it sees something between two of the *same* parentheses, then the whole thing will be deleted. For example:

x [ y ] z  ->  x y* z
x [ y [ z  ->  x z

Importantly, the recombinase enzyme doesn't care whether the recognition site is "facing in" or "facing out" – it can only detect whether they match or mismatch. So the previous example would be equivalent to this:

x ] y [ z  ->  x y* z
x ] y ] z  ->  x z

A completely unmatched paren in the string has no effect whatsoever, the enzyme just leaves it:

x [ y  ->  x [ y

Also, parens don't need to be properly nested:

x [ y [ z ] w  ->  x z ] w
x [ y ( z ] w  ->  x z* ) y* w

Notice that the []-enzyme ignored the "(", because every recombinase only searches for it's own specific marker. The process is also un-lazy, it starts as soon as it can, and stops as soon as it can. One last thing to notice about this system: *order matters!* Say we have this string:

x { y ] z { w [ v

If we give it to the {}-enzyme then the []-enzyme, we get this:

{}:  x w [ v
[]:  x w [ v

But if we do this the other way around, we get this:

[]:  x { y w* } z* v
{}:  x w y* z* v

And those are all the rules. It turns out your cells process DNA using this crazy programming language that looks like the hybrid baby of lisp and brainfuck! Who knew?

-----

So now that you're an expert in this string rewriting system, I have a problem for you:

Write a program that takes as input a  "DNA" sequence, encoded as an alternating series of letters and parentheses, and outputs the number of different results you could get back by running the recombinase enzymes in different orders.

You can assume that no two symbols are repeated twice in the sequence (except for mismatched brackets!), and that it's impossible to accidentally create a new recognition site by gluing two unrelated subsequences together, like this:

If
    [ = ATCA
    ] = TGAT
Then
    A(GA)A GCA (CAA)AT  ->
    ATCA GCAT TGAT  =
    [ GCAT ]  ->  ATGC

I have a strong hunch this happens in real life, which would make mother nature seem *waaay* more hacker than I ever gave her credit for. Anyways, this problem doesn't require you to think at all about the underlying base-pair encoding!

This originally came to me via [Scott Aaronson's blog][2], which you can read of you're interested in learning more. In that post he actually proves an upper bound of 2^n for the number of distinct outputs, where `n` is the number of different types of parentheses in the input sequence. This is surprising, because you'd expect to find a case where *every* permutation of enzymes gives a different result, which would raise that upper bound to n!.

Anyways, have fun!
- Jesse

[1]: https://en.m.wikipedia.org/wiki/Recombinase
[2]: http://www.scottaaronson.com/blog/?p=2862
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.socorrofp.org/pipermail/fpcoffee/attachments/20160909/e38dd353/attachment.html>


More information about the FPCoffee mailing list