Lorem Ipsum to Hebrew

Questions for Evolutionists asks:

How can mutations (recombining of the genetic code) create any new, improved varieties?
(Recombining English letters will never produce Chinese books.)

Lorem Ipsum to "ןגכה9'kpx%y'ךbעX]kגkVגגהiEג9*Qעfעt9fכ9עגכf' qFxה%עגה9תגכג9X' k-?%עגכהןגכה9'" in only 118 generations. Sure it's garbled gobbledygook but if you have any familiarity with Hebrew you should be able to spot kaf, ayin and tav among others. In this string there is a total of 33 Hebrew characters ranging from aleph to tav.

evolution.py is the program which produced this particular piece of gibberish. It models evolution by natural selection by successively evolving a population of phrases for two hundred generations. In the case of this particular string, fitness was calculated as a function of the number of Hebrew characters contained by an individual.

Method

Each population starts as a single copy of Lorem Ipsum. It is allowed to reproduce without the influence of any kind of selection until the population reaches 1024 individuals. At which point only the 1024 fittest individuals will be allowed to be part of the next generation.

Generation n + 1 is composed of the daughters of generation. Each individual produces two offspring and dies (binary fusion) each daughter is an inexact copy of their mother: 1 in 10 bases are replaced by a point mutation. Daughters are then decoded from utf-8 to unicode and then encoded back to utf-8, errors are ignored. The genome of the mother is recycled until a full 128 bytes are copied to the daugther's genome. Once 2 daughter cells are generated for each mother, the 1024 fittest cells are selected to be the mothers for the next generation.

Three different fitness criteria were used in three different evolutions: # Hebrew characters, # Latin Uppercase characters, # Latin Lowercase characters.

Results

In each of the following figures the absolute fitness of an individual (in terms of the # of relevant characters) is shown on the Y axis while the generation is shown on the X axis. The green dots are the most fit individuals of their generation while the blue dots are the least fit.

Scatter plot of population fitness.

Figure 1: Evolution of Lorem Ipsum when selecting for # Hebrew characters

Scatter plot of population fitness.

Figure 2: Evolution of Lorem Ipsum when selecting for Uppercase Latin

Scatter plot of population fitness.

Figure 3: Evolution of Lorem Ipsum when selecting for Lowercase Latin

Discussion

In the three figures there are a couple of important similarities:

  1. Maximum fitness reaches a fitness plateau before it reaches the theoretical maximum in any of the experiments (128 in the case of either Latin experiment or 64 in the case of Hebrew).
  2. Minimum fitness drops off for the first 10 generations in the two Latin experiments.
  3. In two of the three experiments minimum fitness after 100 generations greatly exceeds initial fitness of the population.

Plateau

This is an example of the mutation-selection balance. Essentially, after a certain point the mutation rate is introducing deleterious mutations (replacing characters relevant to fitness with characters that are not) as fast as selection removes them from the population. Thus the population's fitness reaches a plateau.

The Hebrew experiment was far more susceptible to this effect than either of the Latin experiments because each Hebrew character is represented by two bytes rather than one, the chance of a beneficial mutation is lower and the chance of a deleterious mutation is higher.

Minimum Fitness Dropoff

Lorem Ipsum is a text that is written using Latin characters as such the fitness of the founding individual is greater than zero. Since the population starts with a single individual which divides until it reaches the maximum population of 1024, individuals are not under selection for the first ~10 generations and thus deleterious mutations may accumulate without facing negative selective pressure.

Improved Minimum Fitness Over Time

Lorem Ipsum is a text mostly composed of lowercase Latin characters, as such it is not overly surprising that it shows almost no improvement over time.

On the other hand, the other two experiments show vast improvements in both minimum and maximum fitness over time. In fact the Hebrew experiment starts out with an individual of fitness 0 and results in a population of median fitness ~20 a mere 100 generations later. The Latin Uppercase experiment starts with an individual of fitness ~5 and results in a population with a median fitness of ~85 after 50 generations.

Conclusions

This experiment offers example of several interesting evolutionary effect. We've seen the mutation selection balance in play. Mutations introducing novel variants into the population. The steady increase of fitness over multiple generations. Increase in complexity of the organisms under study (Hebrew characters are represented using two bytes while Latin characters only use one).

Maybe rearranging the letters of the poems of Shakespear can never produce the I Ching, but rearranging the bytes most certainly could.