Aevol

User Documentation

Introduction

Aevol is based on running and analyzing forward-in-time simulations. More specifically, any experiment with Aevol is divided into four main steps:

  1. Prepare a simulation
  2. Run a simulation
  3. Reconstruct a lineage
  4. Compute statistics on that lineage

Users might be tempted to stop the experiment after the aevol_run step. However, the statistics of the best individuals along generations, although representative of the global trend of the simulation, must not be confused with the statistics of the ancestral lineage, because mutational events carried by the best individual may not get fixed in the long term.

We end this user documentation with a fifth part explaining our Wild-Type methodology.

1. Prepare a simulation

First, create a directory for your future simulation and enter it. We recommend that it be empty, except for the parameter file param.in. You can find example parameter files in the examples folder of your local aevol clone, or on our GitLab (https://gitlab.inria.fr/aevol/aevol/-/tree/aevol-9/examples). You can edit any of the parameters in the file. Once your parameter file is ready, you can prepare the simulation with the aevol_create command:

aevol_create

This reads the parameter file and creates a population of organisms at generation zero according to the specified values. The most common parameters are described at the end of the page. An exhaustive list will be provided soon.

2. Run a simulation

The second step consists of actually running the simulation. This is done with the aevol_run command that you must run from within the directory where aevol_create was run. The most common options are -b or --begin to specify the timestep at which to start/resume the simulation (this would be 0 if you’ve just run aevol_create), -e or --end to specify the timestep at which to stop the simulation, and -p or --parallel to parallelize computations and hence speed up the simulation. Note that if the chosen end generation is not a multiple of the BACKUP_STEP specified in the parameter file, all generations between the last backup step and the end will not be recorded.

aevol_run --begin 0 --end 10000 --parallel 8
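
The note about BACKUP_STEP can be checked with a little arithmetic before launching a run. The sketch below assumes that backups are written every BACKUP_STEP generations counted from generation 0; last_backup_generation is a hypothetical helper name, not an Aevol command.

```shell
# Hypothetical helper (not part of Aevol): last generation covered by a
# backup, assuming backups happen every BACKUP_STEP generations from 0.
last_backup_generation() {
    end_gen=$1
    backup_step=$2
    # Integer division rounds down to the previous multiple of backup_step.
    echo $(( end_gen / backup_step * backup_step ))
}

last_backup_generation 10000 1000   # prints 10000
last_backup_generation 10500 1000   # prints 10000
```

In the second call, generations 10001 to 10500 would fall after the last backup, so choosing an end generation that is a multiple of BACKUP_STEP avoids losing them.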

aevol_run outputs several data files: summary statistics regarding the best individual at each generation (fitness, genome size, gene number…), backup files (to resume a simulation) and phylogenetic tree files. Tree files store the “replication reports” that log all replications and mutational events. Hence, by analyzing trees, one can precisely reconstruct the events that went to fixation along the line of descent of the final population.

3. Reconstruct a lineage

aevol_post_lineage starts from a given population (e.g. the final population), reads the tree files backward-in-time to reconstruct the line of descent, and outputs the corresponding replication reports to a lineage file:

aevol_post_lineage --begin 0 --end 10000

The output filename will be something like lineage-b000000000-e000010000-i3-r-1.ae
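
When scripting the next step, it can be convenient to locate the lineage file from the begin and end timesteps. The sketch below assumes, from the example name only, that both timesteps are zero-padded to 9 digits; lineage_prefix is a hypothetical helper, and the fields after the prefix are matched with a wildcard rather than guessed.

```shell
# Hypothetical helper: build the expected prefix of a lineage file name.
# Assumption (inferred from the example name only): begin and end
# timesteps are zero-padded to 9 digits.
lineage_prefix() {
    printf 'lineage-b%09d-e%09d' "$1" "$2"
}

prefix=$(lineage_prefix 0 10000)
echo "$prefix"   # prints lineage-b000000000-e000010000

# Match whatever follows the prefix instead of guessing the -i and -r fields.
ls "$prefix"-*.ae 2>/dev/null || echo "no lineage file found for $prefix"
```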

4. Compute statistics on that lineage

Finally, the fourth step is done with aevol_post_ancestor_stats that computes the statistics of the ancestral lineage from a given lineage file and the corresponding initial population:

aevol_post_ancestor_stats lineage-b000000000-e000010000-i3-r-1.ae

In addition, one can compile all mutations that occurred in the lineage by adding the -M option:

aevol_post_ancestor_stats -M lineage-b000000000-e000010000-i3-r-1.ae

The statistics will be recorded in the file stats/ancestor_stats/stats_ancestor_best.csv, and the fixed mutations in stats/ancestor_stats/fixedmut-b000000000-e000010000-i3-r-1.out.
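
The four steps can be chained in a small script. The sketch below only uses the commands and options quoted in this documentation; run_step, aevol_pipeline and the DRY_RUN variable are illustrative names, not part of Aevol. The lineage file name is passed in by the caller rather than guessed.

```shell
# Sketch of the four steps chained together.
# run_step, aevol_pipeline and DRY_RUN are illustrative, not Aevol features.
run_step() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"          # dry run: only show the command
    else
        "$@" || exit 1     # real run: stop at the first failing step
    fi
}

aevol_pipeline() {
    end_gen=$1
    lineage_file=$2        # name of the file written by aevol_post_lineage
    run_step aevol_create
    run_step aevol_run --begin 0 --end "$end_gen"
    run_step aevol_post_lineage --begin 0 --end "$end_gen"
    run_step aevol_post_ancestor_stats -M "$lineage_file"
}

# Show the commands without executing them.
DRY_RUN=1
aevol_pipeline 10000 lineage-b000000000-e000010000-i3-r-1.ae
```

Leaving DRY_RUN unset runs the commands for real, from within the simulation directory.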

List of parameters

An exhaustive list of the parameters is still in preparation, but here you can find the most common ones.

Parameter                  Typical value               Description
STRAIN_NAME                basic_example               name of the simulation
SEED                       7250909                     number used for the pseudorandom generator. It enables reproducibility
INIT_POP_SIZE              1024                        size of the population
WORLD_SIZE                 32 32                       length and width of the grid. Their product must equal the population size
INIT_METHOD                ONE_GOOD_GENE CLONE         how the initial genome is created
CHROMOSOME_INITIAL_LENGTH  5000
SELECTION_SCHEME           fitness_proportionate 1000
POINT_MUTATION_RATE        1e-6
SMALL_INSERTION_RATE       1e-6
SMALL_DELETION_RATE        1e-6
MAX_INDEL_SIZE             6
DUPLICATION_RATE           1e-6
DELETION_RATE              1e-6
TRANSLOCATION_RATE         1e-6
INVERSION_RATE             1e-6
ENV_SAMPLING               300
ENV_ADD_GAUSSIAN           1.2 0.52 0.12
MAX_TRIANGLE_WIDTH         0.033333333
BACKUP_STEP                1000
TREE_STEP                  1000
RECORD_TREE                true
ENV_AXIS_FEATURES          METABOLISM
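
The constraint linking WORLD_SIZE and INIT_POP_SIZE can be checked before running aevol_create. The sketch below assumes that param.in holds one parameter per line, written as the parameter name followed by whitespace-separated values, as in the table above; check_world_size is a hypothetical helper, not an Aevol command.

```shell
# Hypothetical helper: check that WORLD_SIZE width * height equals
# INIT_POP_SIZE. Assumes one "NAME value [value ...]" entry per line.
check_world_size() {
    awk '
        $1 == "INIT_POP_SIZE" { pop = $2 }
        $1 == "WORLD_SIZE"    { w = $2; h = $3 }
        END {
            if (pop == "" || w == "" || h == "") {
                print "INIT_POP_SIZE or WORLD_SIZE not found"
                exit 2
            }
            if (w * h == pop) {
                print "OK: " w " x " h " = " pop
                exit 0
            }
            print "mismatch: " w " x " h " != " pop
            exit 1
        }
    ' "$1"
}

# Example on a throwaway file with the typical values from the table.
demo_params=$(mktemp)
printf 'INIT_POP_SIZE 1024\nWORLD_SIZE 32 32\n' > "$demo_params"
check_world_size "$demo_params"   # prints OK: 32 x 32 = 1024
rm -f "$demo_params"
```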

Advanced usage: Wild-Typing

The basic usage of Aevol consists in testing the effect of different parameters (mutation rate, population size, pleiotropy level, …) on evolution, starting from “naive” individuals.

In this case, aevol_create generates random sequences of a predefined length (typically 5,000 bp) until it finds a genome that has a better fitness than that of a gene-less genome. This approach makes it possible to study evolution when starting far from the fitness optimum. However, in that case the evolutionary dynamics are strongly dominated by gene recruitment, with massive genome size variation, which puts the emphasis on one very specific kind of evolutionary dynamics.

If one wishes to study more subtle effects, this basic usage is not appropriate and one can turn to a more advanced experimental design based on “Wild-Typing”.

Once populations have evolved for a sufficiently long time (from a few hundred thousand generations up to millions of generations depending on the parameters, see below for details) under stable evolutionary conditions, individuals own a stable set of genes and are well adapted to their environments. “Wild-Typing” then consists in extracting one or more individuals from the coalescent lineage of the final population, and using these individuals as “Wild-Types” to initiate new evolution experiments, where one can change one or more of the parameters.

Wild-Typing allows studying the response of a well-adapted organism to different types of perturbations, and thus analyzing evolutionary trajectories in more biologically realistic scenarios.

Determining the number of generations to run

Unfortunately it is impossible to answer this question with a magic number that would fit any setup. This is mainly due to the speed of evolution varying considerably depending on the parameters of a simulation (e.g. mutation rates, selection scheme and population size).

To determine what a “sufficiently long time” might be for a particular setup, we recommend running one or more test simulations and regularly checking the data to look for convergence. This makes it possible to calibrate the number of generations needed for an experimental campaign.
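
One simple way to look for convergence is to compare the most recent values of a statistic with the values just before them. The sketch below reads one value per line on standard input (for example a statistic sampled at regular intervals) and compares the sums over two consecutive windows. The function name has_converged, the window size and the tolerance are illustrative choices, not recommendations from the Aevol authors.

```shell
# Sketch of a convergence check on a series of values read from stdin,
# one per line. has_converged, window and tol are illustrative choices.
has_converged() {
    window=$1
    tol=$2
    awk -v w="$window" -v tol="$tol" '
        { v[NR] = $1 }
        END {
            if (NR < 2 * w) { print "not enough data"; exit 2 }
            # Sum over the last w values and over the w values before them.
            for (i = NR - w + 1; i <= NR; i++)         recent += v[i]
            for (i = NR - 2 * w + 1; i <= NR - w; i++) previous += v[i]
            diff = recent - previous
            if (diff < 0) diff = -diff
            scale = (previous < 0) ? -previous : previous
            # Converged when the relative change between windows is small.
            if (diff <= tol * scale) { print "converged"; exit 0 }
            print "still changing"
            exit 1
        }
    '
}

printf '%s\n' 10 10 10 10 10 10 | has_converged 3 0.01   # prints converged
```

A series that still drifts, such as 1 2 3 4 5 6 with the same window and tolerance, is reported as still changing and the function returns a non-zero status.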

Post-evolution analyses

Once the simulations are complete, the general characteristics of the ancestors are available (genome size, gene number, coding proportion, etc.), as well as the list of all fixed mutations with their types, loci, and effects on fitness. Now, the ultimate objective is to decipher the relative role of the different evolutionary forces (direct and indirect selection, drift, and the different mutational events – local events, balanced and unbalanced chromosomal rearrangements) on the observed evolutionary dynamics.

Aevol provides several tools to help the user analyze the individuals along the line of descent by estimating their robustness, evolvability and distribution of fitness effects (DFE) for all types of mutation. To this end, it generates large numbers of independent offspring and, by analyzing the fitness of these offspring, computes the robustness and the evolvability of the ancestors. Similarly, Aevol can generate and analyze single-mutant offspring to estimate the DFE and the mutational robustness for any type of mutation.
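
To give an intuition of what analyzing the fitness of offspring can mean, the sketch below summarizes a list of offspring fitness values relative to the fitness of their parent. It is only an illustration: the three fractions and the name offspring_summary are assumptions made for this example, not necessarily the definitions of robustness and evolvability that Aevol implements.

```shell
# Illustrative sketch (not Aevol's implementation): fractions of offspring
# that are neutral, beneficial or deleterious relative to the parent.
# Offspring fitness values are read from stdin, one per line.
offspring_summary() {
    awk -v parent="$1" '
        $1 >  parent { better++ }
        $1 == parent { neutral++ }
        $1 <  parent { worse++ }
        END {
            if (NR == 0) { print "no offspring"; exit 1 }
            printf "neutral=%.2f beneficial=%.2f deleterious=%.2f\n",
                   neutral / NR, better / NR, worse / NR
        }
    '
}

printf '%s\n' 0.5 0.5 0.4 0.6 | offspring_summary 0.5
# prints neutral=0.50 beneficial=0.25 deleterious=0.25
```

Exact equality is used here to detect neutral offspring; with real floating-point fitness values, a small tolerance may be more appropriate.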

A list of the most commonly used post-evolution analysis tools is provided below:

  • aevol_post_ancestor_extract
  • aevol_post_ancestor_mutagenesis
  • aevol_post_ancestor_robustness
  • aevol_post_mutagenesis
  • aevol_post_neutral_mut_acc