User Documentation
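Aevol is based on running and analyzing forward-in-time simulations. More specifically, any experiment with Aevol is divided into four main steps:
1. creating the initial population with aevol_create;
2. running the simulation with aevol_run;
3. reconstructing the line of descent of the final population with aevol_post_lineage;
4. computing the statistics of the ancestral lineage with aevol_post_ancestor_stats.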
Users might be tempted to stop the experiments after the aevol_run step. However, the statistics of the best individuals along generations, although representative of the global trend of the simulation, must not be confused with the statistics of the ancestral lineage: mutational events carried by the best individual may not get fixed in the long term.
We end this user documentation with a fifth part explaining our Wild-Type methodology.
First, create a directory for your future simulation and enter it. We recommend that it be empty, except for the parameter file param.in. You can find example parameter files in the examples folder of your local aevol clone, or on our GitLab (https://gitlab.inria.fr/aevol/aevol/-/tree/aevol-9/examples).
You can edit any of the parameters in the file.
Once your parameter file is ready, you can prepare the simulation with the aevol_create command.
This reads the parameter file and creates a population of organisms at generation zero according to the specified values. The most common parameters are described at the end of the page. An exhaustive list will be provided soon.
The second step consists of actually running the simulation. This is done with the aevol_run command, which must be run from within the directory where aevol_create was run.
The most common options are -b or --begin to specify the timestep at which to start or resume the simulation (0 if you have just run aevol_create), -e or --end to specify the timestep at which to stop the simulation, and -p or --parallel to parallelize computations and hence speed up the simulation. Note that if the chosen end generation is not a multiple of the BACKUP_STEP specified in the parameter file, the generations between the last backup and the end generation will not be recorded.
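To see which generations are affected, here is a minimal Python sketch, assuming that backups are written exactly at multiples of BACKUP_STEP. The helper names unrecorded_generations and next_safe_end are hypothetical and not part of aevol.

```python
def unrecorded_generations(end: int, backup_step: int) -> range:
    """Generations after the last backup, up to and including `end`.

    Assumes backups are written exactly at multiples of backup_step.
    """
    if backup_step <= 0:
        raise ValueError("backup_step must be positive")
    last_backup = (end // backup_step) * backup_step
    return range(last_backup + 1, end + 1)


def next_safe_end(end: int, backup_step: int) -> int:
    """Smallest multiple of backup_step that is greater than or equal to `end`."""
    if backup_step <= 0:
        raise ValueError("backup_step must be positive")
    return -(-end // backup_step) * backup_step  # ceiling division


# With BACKUP_STEP = 1000 and an end generation of 10500:
lost = unrecorded_generations(10_500, 1000)
print(lost.start, lost[-1])         # 10001 10500
print(next_safe_end(10_500, 1000))  # 11000
```

Choosing an end generation such as the one returned by next_safe_end avoids losing the tail of the run.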
aevol_run outputs several data files: summary statistics regarding the best individual at each generation (fitness, genome size, gene number…), backup files (to resume a simulation) and phylogenetic tree files. Tree files store the “replication reports” that log all replications and mutational events. Hence, by analyzing trees, one can precisely reconstruct the events that went to fixation along the line of descent of the final population.
aevol_post_lineage starts from a given population (e.g. the final population), reads the tree files backward in time to reconstruct the line of descent, and outputs the corresponding replication reports to a lineage file.
The output filename will be something like lineage-b000000000-e000010000-i3-r-1.ae.
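If you script over several lineage files, a small parser for the file name can help. In this Python sketch, b and e are read as the begin and end generations, which matches the example name above; interpreting i and r as the index and rank of the chosen final individual is an assumption, and parse_lineage_tag is a hypothetical helper.

```python
import re
from typing import NamedTuple


class LineageTag(NamedTuple):
    begin: int  # first generation of the lineage (b)
    end: int    # last generation of the lineage (e)
    index: int  # assumed: index of the chosen final individual (i)
    rank: int   # assumed: rank of the chosen final individual (r)


# Matches the "-b...-e...-i...-r..." suffix of lineage (.ae) file names;
# the same suffix is assumed for other outputs ending in .out.
_TAG = re.compile(r"-b(\d+)-e(\d+)-i(-?\d+)-r(-?\d+)\.(?:ae|out)$")


def parse_lineage_tag(filename: str) -> LineageTag:
    match = _TAG.search(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return LineageTag(*(int(group) for group in match.groups()))


print(parse_lineage_tag("lineage-b000000000-e000010000-i3-r-1.ae"))
# LineageTag(begin=0, end=10000, index=3, rank=-1)
```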
Finally, the fourth step is done with aevol_post_ancestor_stats, which computes the statistics of the ancestral lineage from a given lineage file and the corresponding initial population.
In addition, one can compile all the mutations that occurred in the lineage by adding the -M option.
The statistics will be recorded in the file stats/ancestor_stats/stats_ancestor_best.csv, and the fixed mutations in stats/ancestor_stats/fixedmut-b000000000-e000010000-i3-r-1.out.
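Computing ancestor statistics from a lineage file and an initial population suggests that the ancestors are rebuilt by replaying the recorded mutations forward from the initial genome. The following Python sketch illustrates that idea under simplifying assumptions: a toy binary genome (aevol genomes are binary sequences) and only three local mutation types (point mutations as bit flips, small insertions and small deletions). The classes and functions are hypothetical, and chromosomal rearrangements are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class PointMutation:
    pos: int  # flips the bit at this position


@dataclass
class SmallInsertion:
    pos: int
    seq: str  # bits inserted before position pos


@dataclass
class SmallDeletion:
    pos: int
    length: int  # number of bits removed starting at position pos


Mutation = Union[PointMutation, SmallInsertion, SmallDeletion]


def apply(genome: str, mut: Mutation) -> str:
    """Return a new genome with one local mutation applied."""
    if isinstance(mut, PointMutation):
        flipped = "1" if genome[mut.pos] == "0" else "0"
        return genome[:mut.pos] + flipped + genome[mut.pos + 1:]
    if isinstance(mut, SmallInsertion):
        return genome[:mut.pos] + mut.seq + genome[mut.pos:]
    if isinstance(mut, SmallDeletion):
        return genome[:mut.pos] + genome[mut.pos + mut.length:]
    raise TypeError(f"unsupported mutation: {mut!r}")


def replay(initial: str, reports: List[List[Mutation]]) -> Tuple[List[int], str]:
    """Replay one list of mutations per generation along the lineage.

    Returns the genome size of every ancestor (initial genome first)
    and the genome of the last ancestor.
    """
    genome = initial
    sizes = [len(genome)]
    for mutations in reports:
        for mut in mutations:
            genome = apply(genome, mut)
        sizes.append(len(genome))
    return sizes, genome


reports = [[PointMutation(0)], [SmallInsertion(4, "11")], [], [SmallDeletion(1, 3)]]
print(replay("01101001", reports))  # ([8, 8, 10, 10, 7], '1111001')
```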
An exhaustive list of the parameters is still in preparation, but here you can find the most common ones.
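Parameter | Typical value | Description |
---|---|---|
STRAIN_NAME | basic_example | name of the simulation |
SEED | 7250909 | seed of the pseudorandom number generator; it enables reproducibility |
INIT_POP_SIZE | 1024 | size of the population |
WORLD_SIZE | 32 32 | length and width of the grid; their product must equal the population size |
INIT_METHOD | ONE_GOOD_GENE CLONE | how the initial genome is created |
CHROMOSOME_INITIAL_LENGTH | 5000 | length (in bp) of the initial chromosome |
SELECTION_SCHEME | fitness_proportionate 1000 | selection scheme and selection pressure |
POINT_MUTATION_RATE | 1e-6 | rate of point mutations, per bp and per replication |
SMALL_INSERTION_RATE | 1e-6 | rate of small insertions, per bp and per replication |
SMALL_DELETION_RATE | 1e-6 | rate of small deletions, per bp and per replication |
MAX_INDEL_SIZE | 6 | maximum size (in bp) of small insertions and deletions |
DUPLICATION_RATE | 1e-6 | rate of duplications, per bp and per replication |
DELETION_RATE | 1e-6 | rate of large deletions, per bp and per replication |
TRANSLOCATION_RATE | 1e-6 | rate of translocations, per bp and per replication |
INVERSION_RATE | 1e-6 | rate of inversions, per bp and per replication |
ENV_SAMPLING | 300 | number of points used to sample the environment |
ENV_ADD_GAUSSIAN | 1.2 0.52 0.12 | adds a Gaussian (height, mean, standard deviation) to the environment |
MAX_TRIANGLE_WIDTH | 0.033333333 | maximum width of the triangles representing proteins |
BACKUP_STEP | 1000 | number of generations between two backups |
TREE_STEP | 1000 | number of generations between two tree files |
RECORD_TREE | true | whether phylogenetic tree files are recorded |
ENV_AXIS_FEATURES | METABOLISM | |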
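Because WORLD_SIZE must be consistent with INIT_POP_SIZE, it can be worth checking a parameter file before running aevol_create. This Python sketch assumes a simple format of one KEY followed by whitespace-separated values per line, with # starting a comment; parse_params and check_world_size are hypothetical helpers, and aevol's own parser may accept more.

```python
from typing import Dict, List

# Each key maps to the list of its occurrences, each a list of values.
Params = Dict[str, List[List[str]]]


def parse_params(text: str) -> Params:
    params: Params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        key, *values = line.split()
        params.setdefault(key, []).append(values)
    return params


def check_world_size(params: Params) -> None:
    """Raise if the grid size does not match the population size."""
    length, width = (int(v) for v in params["WORLD_SIZE"][-1])
    pop_size = int(params["INIT_POP_SIZE"][-1][0])
    if length * width != pop_size:
        raise ValueError(
            f"WORLD_SIZE {length} x {width} != INIT_POP_SIZE {pop_size}")


example = """
STRAIN_NAME basic_example
SEED 7250909
INIT_POP_SIZE 1024
WORLD_SIZE 32 32   # 32 * 32 = 1024
ENV_ADD_GAUSSIAN 1.2 0.52 0.12
"""
params = parse_params(example)
check_world_size(params)  # passes silently
print(params["ENV_ADD_GAUSSIAN"])  # [['1.2', '0.52', '0.12']]
```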
The basic usage of Aevol consists in testing the effect of different parameters (mutation rate, population size, pleiotropy level, …) on evolution, starting from “naive” individuals.
In this case, aevol_create generates random sequences of a predefined length (typically 5,000 bp) until it finds a genome whose fitness is better than that of a gene-less genome.
This approach makes it possible to study evolution when starting far from the fitness optimum.
However, in that case the evolutionary dynamics is strongly dominated by gene recruitment, with massive genome size variation, hence putting the emphasis on a very specific kind of evolutionary dynamics.
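The initialization described above is a form of rejection sampling. This Python sketch shows the loop with a hypothetical stand-in fitness (toy_fitness counts a made-up motif, so a gene-less genome scores 0); aevol's real fitness computation is far more involved, and draw_initial_genome is not an aevol function.

```python
import random
from typing import Callable


def draw_initial_genome(length: int,
                        fitness: Callable[[str], float],
                        geneless_fitness: float,
                        rng: random.Random,
                        max_tries: int = 100_000) -> str:
    """Draw random binary genomes until one beats the gene-less fitness."""
    for _ in range(max_tries):
        genome = "".join(rng.choices("01", k=length))
        if fitness(genome) > geneless_fitness:
            return genome
    raise RuntimeError("no random genome beat the gene-less baseline")


def toy_fitness(genome: str) -> float:
    """Hypothetical stand-in: counts a made-up 'gene' motif."""
    return float(genome.count("0110110101"))


rng = random.Random(7250909)  # fixed seed, in the spirit of the SEED parameter
genome = draw_initial_genome(200, toy_fitness, geneless_fitness=0.0, rng=rng)
print(len(genome), toy_fitness(genome) > 0.0)  # 200 True
```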
If one wishes to study more subtle effects, this basic usage is not appropriate and one can turn to a more advanced experimental design based on “Wild-Typing”.
Once populations have evolved for a sufficiently long time (from a few hundred thousand up to millions of generations depending on the parameters, see below for details) under stable evolutionary conditions, individuals have a stable set of genes and are well adapted to their environment.
“Wild-Typing” then consists in extracting one or more individuals from the coalescent lineage of the final population and using these individuals as “Wild-Types” to initiate new evolution experiments, in which one or more of the parameters can be changed.
Wild-Typing makes it possible to study the response of a well-adapted organism to different types of perturbations, and thus to analyze the evolutionary trajectories of more biologically realistic scenarios.
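Finding the coalescent lineage amounts to tracing every individual of the final population backward until all their lineages merge into a single most recent common ancestor (MRCA); that ancestor and those before it on the line of descent are shared by the whole final population. The Python sketch below assumes a hypothetical layout (parents[t][i] is the parent, in generation t, of individual i of generation t + 1), not aevol's file format, and most_recent_common_ancestor is an illustrative name.

```python
from typing import Sequence, Tuple


def most_recent_common_ancestor(parents: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return (generation, index) of the MRCA of the final population."""
    if not parents:
        raise ValueError("need at least one generation of parent indices")
    # Start with every individual of the final generation.
    lineages = set(range(len(parents[-1])))
    for gen in range(len(parents), 0, -1):
        if len(lineages) == 1:  # all lineages have merged
            return gen, next(iter(lineages))
        # Replace each individual of generation gen by its parent.
        lineages = {parents[gen - 1][i] for i in lineages}
    if len(lineages) == 1:
        return 0, next(iter(lineages))
    raise ValueError("the lineages have not coalesced since generation 0")


# Made-up example: all of generation 2 descends from individual 1 of generation 1.
parents = [[0, 1, 2], [1, 1, 1], [0, 1, 2]]
print(most_recent_common_ancestor(parents))  # (1, 1)
```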
Unfortunately, there is no magic number that defines a “sufficiently long time” for every setup, mainly because the speed of evolution varies considerably with the parameters of a simulation (e.g. mutation rates, selection scheme and population size).
To determine what a sufficiently long time might be for a particular setup, we recommend running one or more test simulations and regularly checking the data for convergence. This makes it possible to calibrate the number of generations needed for an experimental campaign.
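One simple way to automate part of the convergence check is to compare the average of a recorded statistic (for instance the genome size or gene number of the best individual) over the last two windows of values. The criterion in this Python sketch, a relative change below a tolerance, and the function name has_converged are hypothetical choices, not an aevol feature.

```python
from statistics import mean
from typing import Sequence


def has_converged(series: Sequence[float], window: int, rel_tol: float) -> bool:
    """True if the mean of the last `window` values differs from the mean
    of the preceding `window` values by less than rel_tol (relative)."""
    if window <= 0 or len(series) < 2 * window:
        return False  # not enough data yet
    previous = mean(series[-2 * window:-window])
    latest = mean(series[-window:])
    if previous == 0:
        return latest == 0
    return abs(latest - previous) / abs(previous) < rel_tol


# Made-up genome sizes of the best individual at successive checkpoints.
genome_sizes = [5000, 8000, 12000, 15000, 15100, 15050, 15080, 15060]
print(has_converged(genome_sizes[:4], window=2, rel_tol=0.01))  # False
print(has_converged(genome_sizes, window=2, rel_tol=0.01))      # True
```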
Once the simulations are complete, the general characteristics of the ancestors are available (genome size, gene number, coding proportion, etc.), as well as the list of all fixed mutations with their types, loci, and effects on fitness. Now, the ultimate objective is to decipher the relative role of the different evolutionary forces (direct and indirect selection, drift, and the different mutational events – local events, balanced and unbalanced chromosomal rearrangements) on the observed evolutionary dynamics.
Aevol provides several tools to help the user analyze the individuals along the line of descent by estimating their robustness, their evolvability and the distribution of fitness effects (DFE) for all types of mutations. To this end, it generates large numbers of independent offspring and, by analyzing the fitness of these offspring, computes the robustness and the evolvability of the ancestors. Similarly, Aevol can generate and analyze single-mutant offspring to estimate the DFE and the mutational robustness for any type of mutation.
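To make these quantities concrete, here is a Python sketch that classifies offspring by their fitness relative to the parent. Reading robustness as the fraction of neutral offspring and evolvability as the fraction of beneficial ones is an illustrative assumption, not necessarily the exact definitions used by aevol's tools; relative_effects and offspring_summary are hypothetical names.

```python
from typing import Dict, List, Sequence


def relative_effects(parent_fitness: float,
                     offspring_fitness: Sequence[float]) -> List[float]:
    """Relative fitness effect of each offspring: a sample of the DFE."""
    if parent_fitness <= 0:
        raise ValueError("parent fitness must be positive")
    return [f / parent_fitness - 1.0 for f in offspring_fitness]


def offspring_summary(effects: Sequence[float], tol: float = 0.0) -> Dict[str, float]:
    """Fractions of neutral, beneficial and deleterious offspring.

    Effects within [-tol, tol] count as neutral.
    """
    n = len(effects)
    if n == 0:
        raise ValueError("need at least one offspring")
    beneficial = sum(e > tol for e in effects)
    deleterious = sum(e < -tol for e in effects)
    neutral = n - beneficial - deleterious
    return {
        "robustness": neutral / n,       # assumed: share of neutral offspring
        "evolvability": beneficial / n,  # assumed: share of beneficial offspring
        "deleterious": deleterious / n,
    }


effects = relative_effects(0.5, [0.5, 0.5, 0.6, 0.25])  # made-up fitness values
print(offspring_summary(effects))
# {'robustness': 0.5, 'evolvability': 0.25, 'deleterious': 0.25}
```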
A list of the most used post-evolution analysis tools is provided below:
aevol_post_ancestor_extract
aevol_post_ancestor_mutagenesis
aevol_post_ancestor_robustness
aevol_post_mutagenesis
aevol_post_neutral_mut_acc