User Documentation

Introduction

Aevol is a forward-in-time evolutionary simulator that models the evolution of a population of organisms through variation and selection. The model focuses on the realism of the genome structure and of the mutational process: mutations act directly on the sequence, with no a priori fitness effect. Aevol can therefore be used to decipher the effect of different operators or processes on genome evolution.

To run Aevol, you must first initiate a population of organisms with the desired parameters (Sections Parameter File and Initiate a Simulation), and then run the simulation for the desired number of steps (Section Run a simulation). The simulations can be further analyzed afterward with dedicated tools (Section Post-Treatments).

Aevol exists in three flavors: Standard, 4-Bases and Eukaryote. Most of the model is identical across flavors, and the differences are stated explicitly throughout the model description. In short:

In Standard Aevol, each organism is asexual and haploid, and owns a single circular chromosome. The genome is encoded as a double-stranded binary string. This flavor is inspired by prokaryotic organisms.
In 4-Bases Aevol, the genome is encoded with four letters instead of two. This changes the genotype-to-phenotype map, but not the rest of the model.
In Eukaryote Aevol, each organism owns two linear chromosomes, and reproduction is sexual and includes a meiotic recombination event. The genome is binary.

Note: combining the 4-Bases and Eukaryote flavors should work, but this has not yet been sufficiently tested.

This documentation assumes that Aevol is fully installed on your computer and that its executables can be run directly from the command line. If this is not the case, use the full path to the Aevol executables instead of their names. For example, if you have built Aevol in /home/login/aevol/build, you can use:

/home/login/aevol/build/bin/aevol_2b_create --help # beware of the additional bin level

instead of

aevol_2b_create --help
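
To use the short names shown in this documentation, you can instead prepend the build's bin directory to your PATH. A minimal sketch for a POSIX shell, assuming the build location used in the example above (AEVOL_BUILD is an illustrative variable name; adapt its value to your own build):

```shell
# Assumption: Aevol was built in /home/login/aevol/build (adapt to your setup).
AEVOL_BUILD=/home/login/aevol/build

# Prepend the bin subdirectory (note the extra bin level) to PATH,
# for the current shell session only.
export PATH="$AEVOL_BUILD/bin:$PATH"

# Report whether the executable is now found (prints its path if it is).
command -v aevol_2b_create || echo "aevol_2b_create not found in $AEVOL_BUILD/bin"
```

The export only affects the current shell session; adding it to your shell startup file (for instance ~/.bashrc for Bash) makes it permanent.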

Also note that each independent Aevol simulation must be run in its own directory.
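
To avoid accidentally reusing a directory, you can create each simulation directory through a small guard. A minimal sketch; new_run_dir is a hypothetical shell function, not an Aevol command:

```shell
# Hypothetical helper (not part of Aevol): create the directory for a new
# simulation, refusing to reuse a directory that already contains files.
new_run_dir() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "error: $dir is not empty; use a new directory for each simulation" >&2
    return 1
  fi
  mkdir -p "$dir"
}
```

For example, new_run_dir run_seed_5000 && cd run_seed_5000 creates the directory and moves into it only if it is new or empty.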

Parameter File

To run a simulation, you will generally want to provide a parameter file specifying the experimental conditions of the run (default parameters can also be used, but this is not recommended). A parameter file contains one parameter per line: a keyword followed by its arguments, separated by spaces. Comments must be preceded by a # sign. Examples are provided in the example directory of the Aevol GitLab repository.

Parameter name	Default value(s)	Description
SEED	$5,000$	Seed used to initialize the random number generators
WORLD_SIZE	$32 \times 32$	Width and height of the world toroidal grid. It gives the total number of individuals and their geographic structure
CHROMOSOME_INITIAL_LENGTH	$5,000$	Used only when no chromosome file is provided: initial length of the chromosome to generate
SELECTION_SCOPE	local $3 \times 3$	Type of selection scope (local or global), and in case of a local selection, width and height of the patch on which the local competition is done
SELECTION_SCHEME	fitness_proportionate 1000	Selection method and associated parameter
POINT_MUTATION_RATE	$5 \times 10^{-5}$	Per base substitution rate
SMALL_INSERTION_RATE	$5 \times 10^{-5}$	Per base small insertion rate
SMALL_DELETION_RATE	$5 \times 10^{-5}$	Per base small deletion rate
DUPLICATION_RATE	$5 \times 10^{-5}$	Per base duplication rate
DELETION_RATE	$5 \times 10^{-5}$	Per base deletion rate
TRANSLOCATION_RATE	$5 \times 10^{-5}$	Per base translocation rate
INVERSION_RATE	$5 \times 10^{-5}$	Per base inversion rate
MAX_INDEL_SIZE	6	Maximal size of the small deletions and small insertions
ENV_ADD_GAUSSIAN		Add a Gaussian component to the phenotypic target
MAX_TRIANGLE_WIDTH $^{1}$	$0.033333333$	Maximum width of the metabolic contribution of a gene to the phenotype (~level of pleiotropy)
CHECKPOINT_STEP	$1,000$	Interval between 2 checkpoints
RECORD_TREE	ON $1,000$	Whether to record the genealogical trees (containing all the mutational events) and at which interval
STATS_BEST	ON $1$	Whether to record statistics about the best individual and at which interval
STATS_POP	ON $1$	Whether to record statistics about the whole population and at which interval

(1): MAX_TRIANGLE_WIDTH is a scaling factor for the $w$ parameter of a protein (see the corresponding section of the model description).
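
As an illustration, the following writes a parameter file named param.in that spells out some of the default values listed above. It is a sketch under assumptions: the two values of WORLD_SIZE are given as separate space-separated arguments, and the rates are written in plain decimal notation. ENV_ADD_GAUSSIAN is left out because it has no default and its arguments are not described here; copy the corresponding lines from one of the example parameter files.

```shell
# Write a minimal parameter file using default values from the table.
# Assumption: WORLD_SIZE takes width and height as two separate arguments.
cat > param.in <<'EOF'
# Random seed
SEED 5000

# Population: 32 x 32 toroidal grid
WORLD_SIZE 32 32

# Initial chromosome length (used only when no sequence file is given)
CHROMOSOME_INITIAL_LENGTH 5000

# Per-base mutation and rearrangement rates
POINT_MUTATION_RATE 0.00005
SMALL_INSERTION_RATE 0.00005
SMALL_DELETION_RATE 0.00005
MAX_INDEL_SIZE 6
DUPLICATION_RATE 0.00005
DELETION_RATE 0.00005
TRANSLOCATION_RATE 0.00005
INVERSION_RATE 0.00005

# Selection
SELECTION_SCHEME fitness_proportionate 1000

# Output
CHECKPOINT_STEP 1000
EOF
```

The resulting file can then be passed to the create command, e.g. aevol_2b_create param.in.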

Some parameters are specific to Eukaryote Aevol:

Parameter name	Default value(s)	Description
SELFING_RATE	$0$	Probability of self-fertilization (selfing) at each reproduction event
ALIGN_SCORE		Minimal alignment score required to perform a meiotic recombination
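
A Eukaryote parameter file can be derived from an existing one by appending these parameters. A minimal sketch; make_eukaryote_params is a hypothetical helper, the selfing probability passed to it is purely illustrative, and ALIGN_SCORE is omitted because its expected argument format is not given in this documentation:

```shell
# Hypothetical helper: copy a base parameter file and append the
# Eukaryote-specific SELFING_RATE parameter to the copy.
make_eukaryote_params() {
  local base="$1" out="$2" selfing="${3:-0}"   # 0 is the documented default
  cp "$base" "$out" || return 1
  printf '\n# Eukaryote-specific parameters\nSELFING_RATE %s\n' "$selfing" >> "$out"
}
```

For example, make_eukaryote_params param.in param_eukaryote.in 0.1 leaves param.in untouched and writes the extended copy to param_eukaryote.in.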

Initiate a simulation (aevol_create)

Warning

Any new simulation must be run in a new directory.

There are two main ways to initiate a simulation: from scratch, using a randomly generated initial genome, or from a provided sequence (usually a wild type).

From scratch

When creating a new simulation from scratch, a simple bootstrapping method is used to generate the initial genome: randomly generated genomes whose fitness is lower than that of a genome with no genes are discarded. This ensures that the initial genome codes for at least one beneficial gene.

For Standard Aevol

aevol_2b_create parameter_file.in

For 4-Bases Aevol

aevol_4b_create parameter_file.in

For Eukaryote Aevol

aevol_eukaryote_2b_create parameter_file.in

For Eukaryote Aevol, the current recommendation is to start from the provided wild types (see From a WildType).

Starting from scratch generally results in one functional chromosome and one empty chromosome, because of dosage imbalance when the first genes are duplicated and of a strong founder effect. The recommended way to bootstrap a eukaryotic run is to generate a haploid organism (with the Standard version of the model) under a halved phenotypic target, and then perform a whole genome duplication by creating a second copy of the obtained chromosome. Note, however, that going from a circular to a linear chromosome may break essential genes and thereby reduce fitness.
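
The whole genome duplication step amounts to writing the chromosome twice into the sequence file passed to --fasta. This is a sketch under two assumptions that this documentation does not confirm: that the haploid chromosome obtained with the Standard version is available as a single-record FASTA file, and that aevol_eukaryote_2b_create reads one chromosome per FASTA record. The helper name duplicate_chromosome and the record headers are hypothetical.

```shell
# Hypothetical helper: build a two-chromosome FASTA file from a single-record
# FASTA file by writing the same sequence twice (whole genome duplication).
# Assumption: one FASTA record per chromosome; headers are illustrative.
duplicate_chromosome() {
  local in="$1" out="$2" seq
  # Drop header lines and join the sequence lines into a single string.
  seq=$(grep -v '^>' "$in" | tr -d '\n\r')
  if [ -z "$seq" ]; then
    echo "error: no sequence found in $in" >&2
    return 1
  fi
  printf '>chromosome_1\n%s\n>chromosome_2\n%s\n' "$seq" "$seq" > "$out"
}
```

For example, duplicate_chromosome haploid.fa diploid.fa, followed by aevol_eukaryote_2b_create parameter_file.in --fasta diploid.fa in a new directory (both file names are placeholders).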

From a WildType

Note that example sequence files with pre-evolved organisms are provided in the example directory.

For Standard Aevol

aevol_2b_create parameter_file.in --fasta sequence_file.fa

For 4-Bases Aevol

aevol_4b_create parameter_file.in --fasta sequence_file.fa

For Eukaryote Aevol

aevol_eukaryote_2b_create parameter_file.in --fasta sequence_file.fa
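
The steps for starting from a wild type (new directory, parameter file, sequence file, create command) can be wrapped in a small function. A minimal sketch for Standard Aevol; create_from_wildtype is a hypothetical helper, and its check only verifies that the file starts with a FASTA header line ('>'), not that the sequence is valid for Aevol:

```shell
# Hypothetical wrapper: set up a new simulation directory from a wild-type
# sequence, then run the Standard create command inside it.
create_from_wildtype() {
  local dir="$1" param="$2" fasta="$3"
  # Minimal sanity check: a FASTA file starts with a '>' header line.
  if [ "$(head -c 1 "$fasta" 2>/dev/null)" != ">" ]; then
    echo "error: $fasta does not look like a FASTA file" >&2
    return 1
  fi
  mkdir "$dir" || return 1            # fails if the directory already exists
  cp "$param" "$fasta" "$dir"/ || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$dir" && aevol_2b_create "$(basename "$param")" --fasta "$(basename "$fasta")" )
}
```

For example, create_from_wildtype run_wt parameter_file.in sequence_file.fa, using one of the sequence files from the example directory.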

Usage of aevol_create (output of `aevol_create --help`)

aevol_create: create an experiment with setup as specified in PARAM_FILE.

Usage : aevol_create -h or --help
   or : aevol_create -V or --version
   or : aevol_create [PARAM_FILE] [--fasta SEQ_FILE]

Options
  -h, --help
	print this help, then exit
  -V, --version
	print version number, then exit
  --fasta SEQUENCE_FILE
	load sequences from given file (in fasta format) instead of generating it

Run a simulation (aevol_run)

For Standard Aevol

aevol_2b_run

For 4-Bases Aevol

aevol_4b_run

For Eukaryote Aevol

aevol_eukaryote_2b_run
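
The executable names shown so far follow a common pattern: a flavor-specific prefix (aevol_2b, aevol_4b or aevol_eukaryote_2b) followed by the tool name. A small sketch that builds the name from a flavor and a tool; aevol_exe and its flavor keywords are hypothetical, and combinations not shown in this documentation are extrapolated from the pattern and may not correspond to an installed executable:

```shell
# Hypothetical helper: print the Aevol executable name for a flavor
# (standard, 4bases, eukaryote) and a tool (create, run, post_lineage, ...).
aevol_exe() {
  local flavor="$1" tool="$2" prefix
  case "$flavor" in
    standard)  prefix=aevol_2b ;;
    4bases)    prefix=aevol_4b ;;
    eukaryote) prefix=aevol_eukaryote_2b ;;
    *) echo "unknown flavor: $flavor" >&2; return 1 ;;
  esac
  echo "${prefix}_${tool}"
}
```

For example, "$(aevol_exe standard run)" -e 5000 runs Standard Aevol up to time step 5000.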

Usage of aevol_run (output of `aevol_run --help`)

aevol_run: run an aevol simulation.

Usage : aevol_run -h or --help
   or : aevol_run -V or --version
   or : aevol_run [-b TIMESTEP] [-e TIMESTEP] [-p NB_THREADS] [-v]

Options
  -h, --help
	print this help, then exit
  -V, --version
	print version number, then exit
  -b, --begin TIMESTEP
	specify time t0 to resume simulation at (default read in last_gener.txt)
  -e, --end TIMESTEP
	specify time of the end of the simulation
	(if omitted, run for 1000 timesteps)
  -p, --parallel NB_THREADS
	run on NB_THREADS threads (use -1 for system default)
  -v, --verbose
	be verbose
  --ui-output-dir UI_OUTDIR
	directory in which to output data for the UI
  --ui-output-frequency NB_GENER
	frequency at which to output data for the UI
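
A stopped simulation can be resumed from the last recorded time step (the default for -b) up to a later end time given with -e. A sketch that extends a Standard run by a given number of time steps, assuming that -e takes an absolute time step and that last_gener.txt contains the last computed time step as a single integer; extended_end is a hypothetical helper:

```shell
# Hypothetical helper: print the end time step for extending a simulation
# by N additional time steps, based on last_gener.txt in the current directory.
extended_end() {
  local extra="$1" last
  last=$(tr -d '[:space:]' < last_gener.txt) || return 1
  echo $(( last + extra ))
}

# Resume from the last recorded time step and run 5000 more steps on the
# system-default number of threads (-p -1), only if Aevol is installed.
if [ -f last_gener.txt ] && command -v aevol_2b_run >/dev/null; then
  aevol_2b_run -e "$(extended_end 5000)" -p -1
fi
```

Run it from the simulation's own directory, where last_gener.txt is located.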

Post-Treatments

Reconstruct a lineage

The lineage of a given individual can be reconstructed from the tree files, provided these were recorded at runtime (see RECORD_TREE in Section Parameter File).

Usage of aevol_post_lineage (output of `aevol_post_lineage --help`)

aevol_post_lineage:
	Reconstruct the lineage of a given individual from the tree files

Usage : aevol_post_lineage -h or --help
   or : aevol_post_lineage -V or --version
   or : aevol_post_lineage [-b TIMESTEP] [-e TIMESTEP] [-I INDEX] [-F] [-v]

Options
  -h, --help
	print this help, then exit
  -V, --version
	print version number, then exit
  -b, --begin TIMESTEP
	specify time t0 up to which to reconstruct the lineage
  -e, --end TIMESTEP
	specify time t_end of the indiv whose lineage is to be reconstructed
  -I, --index INDEX
	specify the index of the indiv whose lineage is to be reconstructed
	(default: treat only the best)
  -F, --full-check
	perform genome checks whenever possible
  -v, --verbose
	be verbose

Note that running aevol_2b_post_lineage with no options reconstructs the lineage of the best individual of the last computed generation over the whole simulation (the beginning defaults to $0$ and the end to the last computed generation).

Examples

# Reconstruct the lineage of the best individual at generation 1000 (Standard Aevol)
aevol_2b_post_lineage -b 0 -e 1000

# Reconstruct the lineage of the individual with index 42 at generation 1000,
# starting at generation 500 (4-Bases Aevol)
aevol_4b_post_lineage -b 500 -e 1000 -I 42

Compute stats on a lineage

This requires a previously reconstructed lineage (see Section Reconstruct a lineage).

Usage of aevol_post_ancestor_stats (output of `aevol_post_ancestor_stats --help`)

aevol_post_ancestor_stats:
	Compute statistics on ancestry described in provided lineage file.

Usage : aevol_post_ancestor_stats -h or --help
   or : aevol_post_ancestor_stats -V or --version
   or : aevol_post_ancestor_stats [-FMv] [-p NB_THREADS] LINEAGE_FILE PARAM_FILE

Options
  -h, --help
	print this help, then exit
  -V, --version
	print version number, then exit
  -F, --full-check
	perform genome checks whenever possible
  -M, --trace-mutations
	outputs the fixed mutations (in a separate file)
  -v, --verbose
	be verbose
  -p, --parallel NB_THREADS
	run on NB_THREADS threads (use -1 for system default)

Examples

# Compute stats along the provided lineage (Standard Aevol)
aevol_2b_post_ancestor_stats LINEAGE_FILE

# Compute stats and trace mutations along the provided lineage (4-Bases Aevol)
aevol_4b_post_ancestor_stats -M LINEAGE_FILE
