Model Purpose and Overview

What is Aevol?

Purpose of the model

Aevol is a simulation platform specifically designed to study the evolution of genome architecture – defined as the non-random arrangement of the functional and non-functional elements in the genome – and the role of chromosomal rearrangements – segmental duplications and deletions, inversions, translocations… – on evolution. Both objectives, and the relative scientific questions, are obviously tightly linked since chromosomal rearrangements are evidently the main driver of the evolution of genome architecture.

Aevol simulates evolution as a multi-scale process, including evolution of the sequence, of the gene repertoire and of the genome archirecture, and includes a large variety of mutational operators, including point mutations, indels and large scale chromosomal rearrangements (inversions, sequence duplications, sequence deletions, translocations). Aevol can therefore be used to decipher the effect of evolutionary conditions on the evolution of the organization of functional and non-functional elements on the genome, including the effect of chromosomal rearrangements and their interactions with other types of mutational events.

Principle of the model

In order to simulate evolution, any model needs to compute the probability of replication of an individual, meaning it needs to attribute, more or less directly, a fitness to the individuals. This simple requirement leads to two main strategies when it comes to simulate evolution: either the variation of fitness is computed at the sequence level (meaning that sequence variations – i.e. mutations – are directly attributed a fitness effect drawn from a predefined “Distribution of Fitness Effect” (DFE), or the variation of fitness is computed relatively of the difference in the phenotypes of the organisms. The former typically correspond to population genetics simulators such as SLiM developped by Messer Lab at Cornell University. It makes it possible to simulate the evolution of sequences but has limited power when it comes to simulate the evolution of genome structure or the effect of chromosomal rearrangements. The latter typically correspond to derivations of the classical fitness landscape metaphor, making it possible to attribute fitness variations to phenotypic differences but at the cost of the absence of explicit nucleotidic sequence. This typically corresponds to Fisher’s Geometric Model or to the classical NK-Fitness Landscape model.

Aevol adopts a third, intermediate, strategy that consist in mimicking the biological Genotype-to-Phenotype mapping that transforms the genome content into a phenotype. This makes it possible to model the effect of any kind of mutational event on the information encoded on the genome and to compute its phenotypic effect from the variation induced on the sequence. However, it is obviously not possible, based on current knowledge, to model a complete Genotype-to-Phenotype map and to compute the phenotype associated to any genome sequence. That is why Aevol models a multistep Genotype-to-Phenotype decoding process divided in two main parts: the first steps identify functional sequences (promoters, terminators, mRNAs, Ribozome Binding Sites, Open Reading Frames, Proteins sequences) on the genome thanks to initiating subsequences on the genome that initiate transcription or translation of the sequences. Then, at the functional levels (proteins and phenotype), aevol switches to an abstract mathematical world in which biological traits are represented by a couple of values \(x\); \(y\) with \(x\) a value in \([0,1]\) identifying the trait and \(y\) a value in \([-1,1]\) representing its activation (if positive) or inhibition (if negative) level. At this level, each protein primary sequence is folded to compute the mathematical function of the protein, that is, the set of \(x\) values of this protein and the associated \(y\) values corresponding to their inhibition/activation. For sake of simplicity, in the model all protein traits are represented by a simple triangle functions, hence enabling fast computation of the protein functions from the amino-acid sequence. Finally, all triangle-proteins are linearly combined to compute the phenotype, i.e. a \([0,1]\) to \([0,1]\) function representing all the biological traits resulting from the decoding of the genome.

Multistep organization of the Genotype-to-Phenotype map in Aevol — Overview of the Genotype-to-Phenotype map in Aevol
In Aevol the Genotype-to-Phenotype map is divided in 5 steps, with the first ones being based on sequence decoding: (1) Transcrition, (2) ORF identification, (3) translation; and the last ones based on a mathematical formulation of the biological functions: (3’) Folding, (4) Metabolism and (5) Fitness computation.

*Figure from (Khalor et al., 2024).*

Aevol’s modeling strategy makes it possible to simulate the evolution of realistic genomes structure (since the genome is decoded in a way that respects the central dogma of molecular biology) while computing a phenotype and attributing a fitness to the sequence. This allows to simulate any modification of the sequence without attributing a priori a fitness to the event. On the opposite, the fitness is computed a posteriori based on the changes induced by the mutation(s) on the phenotype.

Typical usage

Typical Aevol usage is based on in silico comparative genomics. Starting from a known individual, the experimenter simulates evolutionary trajectories under a series of experimental conditions. Then, by comparing trajectories according to conditions, it is possible to identify the evolutionary forces that gave rise to the observed divergences. Furthermore, by repeating the experiment a large number of times, it is possible to obtain a statistical quantification of the observed effects.

After simulating a series of evolutionary experiments, Aevol makes it possible to post-analyse the evolutionary histories by e.g. visualising the evolution of the main genomic compartments (size of coding and non-coding genome, number of genes, size of the polycistronic sequences…), analysing the sequence of fixed mutations, or analysing the final genomes properties (structural properties, robustness, evolvability, Distribution of Fitness Effects…).

When to use Aevol

There is an extensive litterature about modeling and simulating evolution and many simulators are available, some well known and used worldwide (as SLiM (Haller et al., 2026) or Avida (Ofria & Wilke, 2004) for instance), other more confidential because more dedicated to specific usecases. This of course raise the question ``Why using Aevol rather than another model?".

Aevol has been specifically designed to study the evolution of genome archirecture and the effect of chromosomal rearrangements. So its use is recommended for any question involving evolutionary phenomena on the scale of genome archirecture or the evolution of gene repertoire. Similarly, Aevol usage is recommended for any question involving direct of indirect effect of chromosomal rearrangements. On the opposite, Aevol is not designed for adressing complex complex demographic scenarii or to study scenarii of mutational accumulation on real sequences. For such question, we recommend other tools such as SLiM (Haller et al., 2026).

Structure of the model

Aevol is a forward-in-time evolutionary simulator that models the evolution of a population of organisms through a process of variation and selection. The design of the model focuses on the realism of the genome structure and of the mutational process. Aevol exists in three different flavors:

Standard Aevol : haploid circular genomes with binary nucleotides,
Eukaryote Aevol : diploid linear genomes with sexual reproduction,
4-Bases Aevol Aevol : haploid circular genomes with ACTG nucleotides and a translation process based on the standard genetic code.

Standard Aevol is the original version of the model and has been used in numerous studies (see Publications for a list of publications using Aevol). Eukaryote Aevol and 4-Bases Aevol have been added to the model in 2024 and shall be considered to be beta-versions at that stage. Don’t hesitate to contact us if you want to use these flavors.

In short, Aevol is made of three components:

a realistic Genotype-to-Phenotype-to-Fitness map: Each individual’s genomic sequence is decoded into a phenotype, and the corresponding fitness value is computed. Depending on the flavor, the genome can be either haploid or diploid and use a 4-bases alphabet sequence with the standard genetic code or a simplified 2-bases alphabet.
a Population of organisms: Each organsims has its own genome, phenotype and fitness. At each generation, the organisms compete to populate the next generation.
a process of Genome replication with variation: During reproduction, genomes can undergo several kinds of mutational events, including chromosomal rearrangements and local mutations. The seven modelled types of mutation entail 3 local mutations: substitutions, small insertion, small deletion, 2 balanced rearrangements (which conserve the genome size): inversions and translocations, and 2 unbalanced rearrangements: duplications and deletions. This allows the user to study the effect of chromosomal rearrangements and their interaction with other kinds of events such as substitutions and InDels.

Genome model and Genotype-to-Phenotype map

Each individual owns an explicit DNA sequence, composed of \(0\)s and \(1\)s for Standard and Eukaryote Aevol, and ATGCs in 4-Bases Aevol. Genomes are double-stranded, with complementary bases on each strand. Leading and Lagging strands are read in opposite directions.

On the sequence, promoters are recognized using a consensus patterns and mark the mRNA start sites. Their activity is inversely proportional to its distance from the consensus and determines the level of expression of all the protein-coding genes located on the corresponding mRNA. An mRNA ends at the first encountered terminator sequence, which are sequences that would form a stem-loop structure, alike to ρ-independent bacterial terminators.

On each mRNA, another layer of pattern recognition defines potential genes. The translation is initiated by a Shine-Dalgarno-like sequence followed by a Start codon, and continues until a Stop codon is reached. Each codon lying between the initiation and termination signals is translated into an abstract “amino-acid’’ using an artificial genetic code, thus giving rise to the protein’s primary sequence.

Each protein’s primary sequence is then “folded” into a mathematical function defining the protein’s contribution to the phenotype. This phenotypic contribution is defined as a triangular function. The x-axis represents functional traits, and the y-ayis the activation level of said-trait. The phenotype of an individual is the sum of all these protein functions.

Population and selection

The population follows a generational model: the whole population is replaced at each generation, and individuals compete to populate the next generation. This competition can either be global, or local on a pre-defined neighborhood, except in the Eukaryote model that only allows for global selection. Reproduction is asexual in Standard and 4-Bases Aevol while it is sexual in Eukaryote Aevol. In the latter case, there can be selfing at a predefined rate, and there is always one meiotic recombination upon gametes production.

Replication with mutations

Upon replication of an individual, mutations may occur in the sequence. They do not have a predefined fitness effect, as they are applied to positions drawn at random in the genome and the new phenotype is computed afterward.

Mutations can be chromosomal rearrangements (inversions, translocations, duplications, or deletions), or local events (point mutations, InDels). Their rates are per-base and are defined for each mutation type separately.

References

Haller, B. C., Ralph, P. L., & Messer, P. W. (2026). SLiM 5: Eco-evolutionary simulations across multiple chromosomes and full genomes. Molecular Biology and Evolution, 43(1), msaf313.

Ofria, C., & Wilke, C. O. (2004). Avida: A software platform for research in computational evolutionary biology. Artificial life, 10(2), 191-229.