ICFP 07 - Solution

go next top of page

The Perfect Prefix

This page explains the parts of my ‘perfect prefix’, which is a solution for the ICFP Contest 2007. The contest task was to repair the genetic material of an alien called Endo. He has a special genetic structure consisting of the bases I,C,F, and P, which encodes a programming language based on regular expressions. The output of this process, the RNA, consisted of drawing primitives, that allowed the generation of an image displaying Endo's shape. The goal was to change Endo's genes by adding a prefix in front of it, that should modify the process to produce a different image where Endo lives happily in the shape of a cow.

In fact the DNA of Endo is a large program, written partly in an imperative, partly in a functional style and providing most functionality that is needed to build the target picture. However, some parts are slightly broken and need to be repaired, other parts are encrypted, others infected with a virus that replaces RNA fragments. The following description assumes that you are familiar with the contest task. You can download the task description and the final technical report from the ICFP contest web-site.

There is no official solution, but every prefix is associated with a ‘risk’ value, that is the length of the prefix plus 10 times the sum of the wrong pixels. The smaller the risk value, the better the solution. My solution reproduces the image perfectly, so the risk equals the length of the prefix. I think even the organizers did not have a perfect prefix. There were some contestants reproducing the image perfectly with a huge prefix, so their overall risk value was very poor.

My solution is the best known solution to the problem. A previous version of it appears in the technical report of the ICFP contest. When preparing this page I found a way to improve the prefix slightly. This page describes my current record of a 3567 bases prefix. If you can find an even shorter prefix, please tell me.

It roughly consists of three parts: (1) A generic patching routine with several patches that overwrite the original DNA, (2) a routine that decompresses 11 bit numbers and writes them over the speech bubble polygon, and (3) a routine to restore the cloud-duolc palindrome by copying the latter in reverse.

go next top of page go back

Compression tricks

The main idea to keep the prefix short, is to use small loops that execute the same code over and over with different parameters. The original DNA uses the same trick to build compressed RNA, which encodes characters and images. A loop consists of a bunch of bases given two times. These bases encode the commands that are repeatedly executed. The last command reads in the second copy of the bases and outputs it twice. The different parameters directly follow the loop and are removed when they are processed. There is an end marker after the last parameter which will cause the loop to terminate.

We need a lot of numbers which encode patch position, lengths, and polygon coordinates. The original encoding uses only the bases I, C, and P. We can compress these numbers slightly by ‘unquoting’ them: P is replaced with F, IC with P, I with C and C with F. Quoting the number yields the original number again, except that some I's are replaced with F's. However, the DNA processor does not distinguish between I and F in binary numbers.

The other trick is to change as few bases as possible. If some command should not be executed it is often enough to change one or two bases to prevent the pattern to be applied. Also we try to reuse parts of existing instructions by aligning the new instructions carefully.

go next top of page go back

Patching routine

The prefix starts with a command to prepare the first loop:
IIPIFICCFPIICIICIPPPIPPPIIC

This encodes the pattern and template ‘(?IICF)→0000 ’, which copies the next bases upto IICF two times. One of these copies is executed while the second is used to restore the loop and thus repeat the commands. The 186 bases that form the loop encode two commands. The first is

IIPIFICCFPIICIPCCICIICCPFCIICIPPPIIC

which encodes ‘(?IICF)!203CI→00’. This is the aborting instruction. It checks for a end marker ‘CI’ and if it is present it will execute. It copies the second instruction which is used to patch the length in the polygon in the second part of the prefix. The copy of the patch loop is skipped and thus removed from the DNA string. If the end marker is not found the main command of the first loop is executed:

PIIPIFICCFPIICIIPIFIICIICIIPIFIPIICIIPIFIPIICIIC
CCICCICFIFCPCCPCCFCCICCICIFPCPCICFIFCPICPCCFCICFIFCPCCPCCF
IFCPPIFCPPCCICFICCPICFICCPICICCCFIIC F

The command reads in the dummy F (that is only used to search for the end of the loop), the copy of the loop, the binary number ending with P that encodes the last patch position, the ‘unquoted’ binary number ending with F that encodes the distance between the last and the next patch and the ‘unquoted’ number that encodes the length of the next patch. It creates the second command (given in blue).

F(?IICF)(?P)(?F)(?F) → IIPIPC 31 IICIIPIP 10 IPC 21 IICIPC 31 IIC 0101 IIPCPIFPCPIFPPIIC

For space reasons the LSB of the distance and length parameter is omitted in the patch table. It is always assumed to be one. This way we can only encode an odd distance and length. Because the distance is relative to the current position, which also depends on the length of the encoding, we can have arbitrary distances by adding leading zero bits to the distance. Also, it does not hurt to patch one bit more to get an odd length. The second command in blue encodes the following, where the parameters coming from the first command are marked in black:

(! »C31« ) (! »10« ! »C21«) ! »C31« → »0101« |1| 1000

A one bit (C) is prepended to the length and distance parameters. The first parenthesized part matches the patch, which is found immediately after this command. The next match skips to the patch position: we add the last patch offset and the distance parameter, where we set the LSB to one (by prepending a C). The fourth skip matches the bases that should be overwritten. The command then rebuilds the loop by outputting it two times, followed by the address of the patch position, which is the length of match 1, followed by the original bases leading to the patch position (argument 1) and finally followed by the new patch, which was matched by argument 0.

P

This is the initial address (0). This is computed by the program that builds the patch table. It is either one or zero, because the offset is implicitly odd.

go next top of page go back

Patch Table

Each entry of the patch table has three parts: offset, length, and patch. The offset and length are unquoted integers. The offset is relative to the last patch position. Since the offset also depends on the length of the current patch it is difficult to compute. Therefore I additionally give the offset relative to the start of the green zone and length in hex.

00005e:021 IPIIICPCCCPF CCCPF FICCCFCPCFCFCCFCPIICFCCFCPICCFCCF

This writes the EBCDIC encoded string “9546” (the number hidden in the E.T. image) into the variable giveMeAPresent. Normally this should contain the encryption key for vmu-code; we later change the code to decrypt the cow-tail gene instead, which is encrypted with E.T.'s password.

00050f:001 IICPPCPF F F
Switch on daylight by setting variable night-or-day to false.
03346a:0a1 IICPPPIIPCPIF CCCPPF PCCCCCIIIIIIIIIIIIIIIIIIP
    ICCICIICPIICIICCIPCCIIICCIPCICCICIIPICCICICIPICCIIICIPCICCICIIPICIIIIICP
    CIIIIICIPCICICICIPIICIIICIPCICCICIIPCIICIIICPCIICIIICPCCCCCCCCPI
Set hillsEnabled to true, vmuMode to 31, and vmuRegCode to “Out_of_Band_II”. The latter enables drawing of the caravan.
0c7dc0:00d IIPICPCPCPPCPCF PIF IIIIIICCCCCCC
Set seed to 8128 (a perfect number), so that the weed grows correctly.
0c91e0:003 CCPCCCCPPF IF CIC
Set polarAngleIncr to 5, which rotates wind mill by five degrees.
0da070:001 IIIIIICPIICCCPF F P
0da0af:003 PPICF IF CCI
0da0b8:001 PF F C
0da0d5:011 PCF CCPF FFCCFFFFCFFCFFCFC
This fixes the errors in cow-spot-middle. Since this method is similar to many others, we change a call to return to a different correct method.
0daf30:009 PCCPCPIICF CPF ICCCCICCP
0daf3c:003 PF IF IIC
Skip the first part of the bmu gene to jump directly to the drawing of Endo. This gene is capable of drawing Endo in several forms; of course, none of that is the right one. However, there is the code that draws Endo in the right cow-shaped form but only as a shadow. This patch skips the shadow drawing part.
0df477:011 IPICCPPCCPF CCPF PCCIIIIIICCPIICII
Skip the last part of bmu which would draw Endo only as a shadow.
206d4a:001 IIIIIPCCPIIICPCPCF F C
206d4e:001 F F C
207125:001 PICPIIIF F I
20715d:009 IICPF CPF FFFCCFCCF
20716b:005 PF PF FFFCF
Fix the sun gene (values come from sunflower xor flower, where I=00, C=01, F=10, P=11). Similarly to cow-spot-middle we only fix the call and change return address to jump to a correct method.
21f149:003 IPCPIIIIIIIIPF IF CFI
21f1ca:001 CPIIIF F P
Patch color of cargo box. It seems, that somehow the color entries in the lookup table for the compressed RNA got swapped.
29bdf4:005 ICPCCCCPICPIIIIF PF FFFFF
29be34:005 PPIF PF CCCCC
2aae9c:00b PICPCCCCPIIIF IPF CICCICICCCI
2abdf9:001 CCPPPIIICF F I
The printGeneTable contains a slightly broken bitmap of the whale blow. Fix rotation in compressed whale blow RNA, by swapping the code for rotate left and rotate right. After drawing the RNA skip to end of method.
2c932e:001 IICPCPPPPIIF F C
Fix mother duck polygon. The numbers didn't add up correctly which lead to coordinates getting out of sync and breaking the remainder of the picture. It turned out that a single number was wrong.
3c9eb5:00b CPPIPIPCCCCCCCPF IPF FFCCCCCCFCF
3c9ec6:005 IIF PF CFFCF
Patch return address in appletree to switch to peartree.
3ea544:001 PIPICPICCCCCPCF F P
Patch integrity check to always return true; Normally, the checks prevent activating the cow-tail or cow-spot-middle gene unless they are fixed. However, since we modify these genes, the check would fail.
45ad2e:001 IIPPIIIICCCCPIICF F F
45ad32:001 F F F
45b1ad:005 IPPICCPCF PF CCCCC
45b634:007 PPIICCPF IIF PCFCFFF
45b683:003 PCCPF IF FFC
45babe:007 CPPCCCPF IIF FCCCFCC
45bb0c:003 ICCCPF IF CFF
Swap coordinates in flowerbed. This swaps the yellow-red with the blue-white flowers.
4aa7c2:017 PICCPCPIPIICPF IIPF FPFCCCFIIFCPCPPPFPIIIFF
4aa7e0:00b PIF IPF PPIICFFICFP
4aa7f4:001 PICCF F C
4aa7fe:001 PCF F F
4aa808:001 PCF F F

Patch color of cow-tail to draw it transparently. It adds seven opaque and three transparent items to the color bucket. This patches the encrypted function. The original DNA consists of twelve RNA commands to add white to the color bucket. The sequence is replaced by:

RNA:opaque (!70)→ 000000 RNA:opaque opaque transp white white white white

The DNA command will triple the next seven RNA command, so that it gives a total of seven opaque and three transparent additionally to the twelve white RNA commands. The patch is carefully aligned to minimize the number of bases to be replaced.

4cd7af:001 IIPCPIIIIPCCPF F C
4cd7e7:005 PIPF PF FCCFC
4cd861:003 IIIPIF IF CFC
4cf41c:001 IIIPPIIPIF F I
4cf423:003 F IF III
4cfe86:009 PICPCPPCCF CPF FCFCFCFCC
4cfed2:007 PIIIF IIF FCFFCCF
4d01f0:007 IPCCCPICF IIF FFCCCCF
4d01fc:003 PF IF FFC
4d0209:007 CF IIF CCCFFFC
4d0215:003 PF IF CCC
4d08d3:005 IIIPPPIF PF FFCFC
4d08da:001 IF F C
4d0921:005 IPIIF PF CCFCF
4d0eff:00b IICCPIIPF IPF CCCFCCCFFFF
4d0f4c:00b CPIICF IPF CFCFFFCCFFC
4d1041:017 PPPIF IIPF IICCCICICICCCICCIICICII
4d1060:00f CPF IIIF CCIICCCCIIIIIIC
4d107d:017 F IIPF ICCCIIICCCICICCCCCIIICI
4d535a:005 IPCPIPCCCPF PF CFCFF
4d53a9:005 CCCCPCF PF FCCFF
4d54e1:003 PIPCPF IF IPP
4d570e:007 IPICCCPF IIF FFFCFFC
4d575b:003 CCCCPCF IF CFF
4d584e:00d PIIPIF PIF IIICCCICCICIC
4d585f:005 IIF PF ICIIC
4d586b:005 IF PF PICIC
4d5877:003 PF IF IIC
4d588b:009 PCF CPF CCCIICICC
4d5894:005 F PF ICCCI
4d589c:003 CF IF CCI
4d5d39:007 IPCCPCPF IIF CCFFFCC
4d5d85:003 CCCCPF IF CFC
4d6316:001 PCCCPIPCCF F C
4d632a:005 CPCF PF CCFCF
4d6334:005 CF PF CFFCF
4d633e:005 CF PF FCCFF
4d6348:001 PCF F F
4d6352:005 CF PF CCCFC
4d635c:005 CF PF FCFCC
4d6368:003 PF IF FCC
4d6372:005 CF PF FFCCC
4d687b:017 PCPIICPCF IIPF IIICIICIICCCCCCCIICCIIC
4d6899:009 PICCF CPF ICCCICICI
4d68a5:003 PF IF CCC
4d68dd:005 PIPF PF CPIIC
This is the patch to scenario function, which changes the coordinates and the drawing order of some objects, replaces the string ‘Endo has morphed’, and fixes position of grass (by emulating the effect of the biomorph unit). It also replaces some functions with different: lambda-id with whale blow and crater with ufo.
52dd16:005 PCPCCCPPIIPPF PF FFFFC
52dd61:007 IPIICF IIF CFCFFCC
52de43:005 PPPICF PF PFFFF
52de4d:003 IF IF III
52dedf:007 CCCCCPF IIF IPFFFFF
Patch the water gene to include the RNA for two turns. This prepares the rotation for the ufo-cup. Since there is only space for one RNA command we need to hack the return statement to include the second RNA command. Getting the water into the ufo-cup is easy, since the original DNA already does this when the weather condition changes. Rotating the cup is the difficult part.
5c9164:007 PIPICPCPIPICPF IIF FCCFFCC
5c94fa:005 PCCCPIICCF PF FFFCF
5c963d:003 IIPICPF IF CFC
5c989e:003 PPPCPCF IF FCC
5c99dd:005 CCPICPF PF CCFCF
Patch scale factor of clouds. The original DNA draws them all in the same size and at the wrong position.
6130e2:001 PPIIIPIPCPCPCF F I
Patch duolc gene to not set the cloudy variable. The latter has some very strange effects :)
63a565:01d PPPCCPPIICPCF PIIF ICFCFCFCFCICCFFCFCFCICFCCCFFC
63a586:01b F IPIF CFFFFCCICFCCCCCCFICFFFCCFFC
63a5a5:005 PPF PF CCFFC
63a5af:003 IF IF FFF
This replaces the password in the locals of main with the one for the mu character “No1@Ax3” (found in the sticky note bitmap).
63a7a5:017 PPPIIF IIPF CFFFFFCFFFCCFCFCFCFCCFI
63aaa0:009 IICPIIPF CPF CCCIIICPI
63b699:015 IIPPIIIPCF PPF FFCFFFFCFCCFCFCFFFFCC
63b992:003 PIPIIPF IF IIC
63b99b:005 F PF CPIIC
Patch main to decrypt cow tail and mu. The original DNA already contains some code to decrypt vmu-code and help-beautiful-numbers, which we reuse.
642b2b:015 PIPIPCCPIIF PPF FCFFFFFCFFFCFCCCFFFFC
642b45:007 PIF IIF FFFCFCF
642b4f:005 CF PF CCCCC
The original DNA contains a page about spirographs. One of them is the one that should be drawn into the sun, just scaled up. We add a jump operation after this code to the resetOrigin gene. The spirograph part of main is called from ufo, see below.
653631:001 PPPIPPCCCPF F C
65367d:007 PIIIF IIF FFFFCCC
654ca3:003 IIPCCCPIPF IF FFF
654cab:003 CF IF FFF
654d20:00b CCCPIF IPF FCFCFCCFFFF
654d6c:003 CCCCPF IF FFF
654d71:001 CF F F
654e0c:029 CPPIF CPPF FFCCCCCFCCFCFCCCCCFCCFFICFCFCFFCFCCCCCCFF
The ufo function is rewritten to draw the upside-down cup filled with water and patch return address to jump to the spirograph function.
6d4031:001 ICPCCCPCPIIIIIIF F P
Disable the virus code in surfaceTransform which damages RNA. Note that we just need to change a single base to sabotage the virus.
6d4576:003 IIPICPPCF IF FFC
6d46bb:001 IPIICPCF F F
6d4706:003 IIIIICF IF CFF
6d470d:003 F IF FFC
6d48c9:001 ICPIPIF F F
6d4913:005 PIIICF PF CCCFC
6d5440:005 PIICCPIPF PF FCFCF
6d5584:005 PPICPCCF PF CCFCC
6d55d2:00b IPIIF IPF FCFCCCFCFFC
6d6439:007 PPPCPIIF IIF CCFFFFC
Restore the parabolas that draw the hills. It was one of most difficult tasks to find the right values for them.
6f33d4:003 IICCCPIIIICPIIF IF FCF
6f33d8:001 F F F
6f341f:005 IPIIF PF CFCCF
6f3821:007 IIIPIIIIF IIF FCCFFCC
6f386b:009 CPIIF CPF FCCFCFCFC
6f4a70:003 PPIIIICCPCF IF CCF
Reposition the background of the balloon polygon, and change lambda to mu. The actual polygon is overwritten here.
6f4f6b:001 IIIPIICPCF F I
6f51ff:001 IPCCPPF F I
Disable the scaler in spirograph that would draw the spirographs four times bigger.
6f9e45:001 IIPICCCPICPCF F C
6fa552:037 CPCPIPIF IIPIF CCCIIIIICICIICCCICIPIIPIPIICICICICIIPIICIICIICIPPPIPPCP
Patch sky-day-bodies to draw clouds and sun. The original DNA contained code that draws them depending on the weather value. However, there is no setting for drawing both sun and clouds. We also change here the sun color by copying the contents of the colorSoftYellow gene.
71cc40:005 CPIPIPICPCCPCF PF CIICI
71cf0f:00f PPIPPCF IIIF IICICIICICIIICC
Patch goldenFish_adaptation to draw two extra fishes and move the bottom fishes.
72035f:007 IPIICCCPPIF IIF CCCCCCI
72038f:007 CCCPF IIF ICCCIIC
Patch pictureDescrRenderer_adaptation to swap drawGoldFishR with drawGoldfishL.
CI
The end marker. This will tell the patching loop to terminate. go next top of page go back

Compressed speech bubble polygon

The speech bubble in the target image looks quite different from the one in the source image. Endo's DNA uses a polygon routine to draw it. Unfortunately there is no polygon in the DNA matching the bubble in the target image. I therefore had to find the right coordinates by hand and write them over the old polygon. Doing it straight-forward would take a few thousand bases. Therefore, I compressed the numbers forming the polygon (using variable length numbers and the unquoting trick). The following loop will decode the compressed numbers.

IPCCICCICIICPFCIIPIFIFPCCFIICIICIPPPIPPPIIC
This encodes ‘!603CI(?CFIIC)→0000’, which checks for end marker CI and aborts polygon loop. It also prepares the palindrome loop by doubling it.
PIIPIFICCFPIICIIPIFIICIICIIPIFIPIICIICCCICCPCICCFCCFFCFCCFFFCFFCFCCICI
CFFPFICFICFFPCPFICICFCFCFCFFFPFCFFCFFCFFFFFFFCCFCCICCICIFCPICPCICFFFFF
FFFFFFICCCFCCFFFCFFCFPPFPCFFFPFICFFPFFCFFCFIPCPCPFICFPICFFPFCFPPFPCFFF
PIPICPPIPICPPFFCFPCFFICCFPCFFICPCFCFFFPCCICICCPICICCCFIIC F
The decompressor. This is a bit tricky. There is some self-modifying code: the gray and green instructions change the low bits in the red number to zero.
F(?IICF)(?P)(?F) → IIPIFIPICIICCICIIC CCICCICIIPPCCFCPCPCCFIFCPPCICICICCCFCICCICCICCCCCCCIIC FFCFFCFPPFPCFFFPFICFFPFFCFFCFIPCPCPFICFPICFFPFCFPPFPCFFF PIPICPPIPICPPFFCFPCFFICCFPCFFICPCFCFFFPCCICICCPICICCCFIIC F IIPIP 21 IPCCCCCCCCCCCPIICIIC CCICCICFFCFICCCFCPCCFCCICCIC 11 CPFFPCCFCICFFCFICCCF 0202 CCICFICCPICFICCPFICICCCFIIPPIFPPIIC
The gray instruction that is output by the above command is a conditional command. If a negative number is encoded (trailing FP), it can match on the blue command. It outputs a new green command, which replaces the low bits of the red number with zeros. The number of zeros equals the length of the number minus one.
(?FP)IP → IIPIP |0| IICIFCFIIC 01 IPPP IIC IPIPIPIIIIII
(! |0| )?C → 01 00
Now the blue instruction is executed, which computes the value of the sum. For positive numbers it will compute 0x800+number, so that there are enough leading zeros in the encoding. The eleven low bits of the results can then be copied directly into the right position by the magenta command.
(! 21 !0x7ff) → IIPIPCCICPIICIFIICIIPIP 11 IFCCFIICIPCCICPIIC 0202 IIPCPIFPCPIFCPPIIC |0| 00
(!11)?P(!11?IC)!11 → 0202 |1| 10 01
CP PPICPICPIIPIIIPIF F C
The patch for the polygon size. This is processed by the patch routine, which is copied by the abort condition of the first loop. It reads the offset CP and unquoted offset PPICPICPIIPIIIPIF and patches one byte. It also writes the offset back, which can then be used by the polygon patching routine.
CPIPF IIIIPPF F ICF IF IIICF CCPF PCF CPCPF IPIICF
IIF ICF ICF PCF ICF CCF CCF PCF IF ICF ICF ICF F ICF IF PICF IIIF ICCF
CPF CF IIF CF IPF ICF CF IICF CF PCF F CCCF IIF IICCF CPF IICF IIF ICF
IIIF PCF PIF CF IIIF CF IIIF F PF PF IIPF CPF PPF IIIF IPF CPF IIIF
PPF PF PIF CF IPIF F PF CF CPF ICF PIF CF IIF ICCF IPF CCF CPF IPCF
IIF PCCF IF ICCF F PCF IIF CCF PIF CF IIF IF IPIF IIF CPF CPF PIF IIF
IIIF PIF IIIF ICF PIF CF PF CCF CCPF F IF ICF CPF CPCF PIIF CPICF PCPF
IF CPF PF PIF CPF PIF IPF IPF PIF PF PF PF PIF IF IF PF F IF CF IPF
F F CCCF ICFPPF PPF

The compressed numbers of the polygon. Numbers ending with CF encode negative, F encodes -1, and those ending with PF or IF encode positive numbers. These are decompressed by quoting, adding 0x7ff for positive numbers. For numbers ending with CF, we set the last bits of 0x7ff to zero according to the length of the number. For example, “CPICF” gets unquoted to “FICCFP”, which marks a negative number. We add to it “IIIIICCCCCCCCCCCCCCP” since it is 5 bits long, which results into “IICCICCCCCCCCCCCCCCP”, i.e., the number -20.

CI
The end marker.  top of page go back

Palindrome

The cloud gene has got seriously scrambled by Endo's sudden crash. Luckily the DNA contains a copy of it right afterwards in the gene duolc. Though, as suggested by its name, the bases are in reversed order and thus form a palindrome (a word that equals its reverse, just like cloudduolc). The following loop will copy the bases from the duolc part back over the cloud gene.

IIPIPCCCCICICPIICIIPIPCICCPFIICICIICCCICCICCCFCCCFCFCFFCCCC
FCCCCFFICCCICPCICIPPCPICCCFCCFCICFICCCICCICCIPCPCPICCCICCIC
FICCCFCCFCCFIPCPPIPCPPCCICICCPICFICCPICCFICCPICFFICCCFIIC
This loop has no explicit aborting condition, instead the loop instruction just does not apply when we are finished and thus it is no longer copied. It prints some dummy RNA after terminating, but this does not hurt.
(!175)(!13C)P → IIPIPIICIIICICICCIIIICIIIICCPIIPFIP 10 PIICIIC IPCP IIPIP I 11 PIIPIPCPIICIICIIC 0101 IIPP IFPCP IFPICP IFPCCP IIC
(!0x610d44(C!«10P»)) !1 (!«I11P»(!1)) → 0101 |0| 10 20 30
ICIICIICICCIICPIIII

This is the negative length of the cloud function. This number is incremented in every step until there is an overflow. Then the loop terminates, because the CP is not at the expected position anymore. Together with the trailing ‘IIII’ this number is then exactly 20 bases long and consists mostly of I. These encode two dummy RNA commands.