3.1.  Initialization functions

(integer$)initializeAncestralNucleotides(is sequence)

This function, which may be called only in nucleotide-based models, supplies an ancestral nucleotide sequence for the model.  The sequence parameter may be an integer vector providing nucleotide values (A=0, C=1, G=2, T=3), or a string vector providing single-character nucleotides ("A", "C", "G", "T"), or a singleton string providing the sequence as one string ("ACGT..."), or a singleton string providing the filesystem path of a FASTA file which will be read in to provide the sequence (if the file contains more than one sequence, the first sequence will be used).  Only A/C/G/T nucleotide values may be provided; other symbols, such as those for amino acids, gaps, or nucleotides of uncertain identity, are not allowed.  The two semantic meanings of sequence that involve a singleton string value are distinguished heuristically; a singleton string that contains only the letters ACGT will be assumed to be a nucleotide sequence rather than a filename.  The length of the ancestral sequence is returned.
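The way the accepted forms of sequence are told apart can be sketched as follows.  This is a Python illustration of the rules stated above, not SLiM's implementation; the function name classify_sequence is hypothetical, the vector is modeled as a Python list, and reading the FASTA file itself is left out.

```python
NUCLEOTIDES = "ACGT"  # A=0, C=1, G=2, T=3, as stated in the text


def classify_sequence(sequence):
    """Classify a hypothetical `sequence` argument, given as a list.

    Returns ("nucleotides", [ints 0..3]) or ("fasta_path", path).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    # Integer vector: values are nucleotide codes 0..3.
    if all(isinstance(x, int) for x in sequence):
        if any(x < 0 or x > 3 for x in sequence):
            raise ValueError("integer nucleotides must be in 0..3")
        return ("nucleotides", list(sequence))
    if not all(isinstance(x, str) for x in sequence):
        raise TypeError("sequence must be all integers or all strings")
    # Singleton string: only the letters ACGT means a sequence,
    # anything else is taken to be a file path (the heuristic above).
    if len(sequence) == 1:
        text = sequence[0]
        if text and all(ch in NUCLEOTIDES for ch in text):
            return ("nucleotides", [NUCLEOTIDES.index(ch) for ch in text])
        return ("fasta_path", text)
    # String vector: one single-character nucleotide per element.
    if any(len(x) != 1 or x not in NUCLEOTIDES for x in sequence):
        raise ValueError("only single-character A/C/G/T values are allowed")
    return ("nucleotides", [NUCLEOTIDES.index(x) for x in sequence])
```

One consequence of the heuristic is that a FASTA file whose path consists solely of the letters A, C, G, and T would be read as a sequence, not as a path.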

A utility function, randomNucleotides(), is provided by SLiM to assist in generating simple random nucleotide sequences.

(object<Chromosome>$)initializeChromosome(integer$ id, [Ni$ length = NULL], [string$ type = "A"], [Ns$ symbol = NULL], [Ns$ name = NULL], [integer$ mutationRuns = 0])

Calling this function, added in SLiM 5, initiates the configuration of a chromosome in the species being initialized.  The new Chromosome object is returned, but it is still under construction and will error if used; see below for details.  That chromosome is then the “focal chromosome” for subsequent genetic initialization functions – specifically, for initializeAncestralNucleotides(), initializeGeneConversion(), initializeGenomicElement(), initializeHotspotMap(), initializeMutationRate(), and initializeRecombinationRate().  If you wish to call initializeChromosome() at all (which is not required), you must call it before calling any of those genetic initialization functions, so that the focal chromosome is created before being configured further; otherwise, SLiM will assume that you want a default single-chromosome model, and when initializeChromosome() is called later (contradicting that assumption), an error will result.

Furthermore, there are some other initialization functions that must be called before initializeChromosome() if they are called at all – specifically, initializeSex(), initializeTreeSeq(), initializeSpecies(), and initializeSLiMOptions().  This is so that initializeChromosome() knows the context within which the new chromosome is to be created; if these functions have not been called when initializeChromosome() is called, the default context is assumed (non-sexual, no tree-sequence recording, single-species, non-nucleotide-based), and an error will result downstream if one of those functions is later called (indicating that those assumptions might be incorrect).
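The ordering rules described so far can be summarized in a small checker.  This is a Python sketch of the stated constraints only, assuming the calls made in initialize() callbacks are given as a list of function names; check_call_order is a hypothetical helper, and SLiM's actual error messages differ.

```python
# Functions that must precede initializeChromosome(), if called at all.
BEFORE_CHROMOSOME = {
    "initializeSex", "initializeTreeSeq",
    "initializeSpecies", "initializeSLiMOptions",
}
# Genetic initialization functions that configure the focal chromosome.
GENETIC = {
    "initializeAncestralNucleotides", "initializeGeneConversion",
    "initializeGenomicElement", "initializeHotspotMap",
    "initializeMutationRate", "initializeRecombinationRate",
}


def check_call_order(calls):
    """Return None if the order is acceptable, else an error message."""
    seen_chromosome = False
    seen_genetic = False
    for name in calls:
        if name == "initializeChromosome":
            # A genetic call without a prior initializeChromosome() means an
            # implicit single chromosome has already been assumed.
            if seen_genetic and not seen_chromosome:
                return "initializeChromosome() called after genetic setup"
            seen_chromosome = True
        elif name in GENETIC:
            seen_genetic = True
        elif name in BEFORE_CHROMOSOME and seen_chromosome:
            return name + "() must be called before initializeChromosome()"
    return None
```

Note that a second initializeChromosome() call after genetic functions is accepted here, since the first chromosome was defined explicitly and each new call simply changes the focal chromosome.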

The parameters to initializeChromosome() configure the chromosome created.  They will be discussed out of order here, because that order of presentation will, I hope, be clearer.

There are three parameters that in some way identify the chromosome.  First, the required id parameter provides an integer identifier for the chromosome, which can be used to look up the chromosome later in the simulation; it can be any non-negative integer value, but must be unique within the species (two chromosomes in the same species cannot have the same id).  Often it is an empirical chromosome number, for convenience and clarity; if modeling human chromosome 7, for example, you might provide 7.  Second, the symbol parameter provides a string identifier for the chromosome, which can also be used to look up the chromosome later in the simulation.  If NULL (the default) is passed for symbol, the chromosome’s default symbol value will be the string version of its id ("7" for an id of 7, for example).  The chromosome’s symbol value will be used to identify the chromosome in output – in VCF output, for example, and in SLiMgui.  It must be non-empty (not ""), no more than five characters long, and unique within the species.  Third, the name parameter can be any string value; if NULL (the default) is passed, the name value will be "".  The name is not used by SLiM, and can be used in any way you wish.
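The defaulting and validation rules for id, symbol, and name can be sketched as follows.  This is a Python illustration under the rules stated above; resolve_identifiers and its existing argument (the id/symbol pairs already defined in the species) are hypothetical names, not part of SLiM.

```python
def resolve_identifiers(chrom_id, symbol=None, name=None, existing=()):
    """Return the (id, symbol, name) a new chromosome would end up with.

    `existing` is an iterable of (id, symbol) pairs already in the species.
    """
    if chrom_id < 0:
        raise ValueError("id must be a non-negative integer")
    if symbol is None:
        symbol = str(chrom_id)      # default symbol is the string form of id
    if name is None:
        name = ""                   # default name is the empty string
    if not 1 <= len(symbol) <= 5:
        raise ValueError("symbol must be 1 to 5 characters long")
    for other_id, other_symbol in existing:
        if other_id == chrom_id:
            raise ValueError("id must be unique within the species")
        if other_symbol == symbol:
            raise ValueError("symbol must be unique within the species")
    return chrom_id, symbol, name
```

For human chromosome 7 with no symbol given, this yields an id of 7, a symbol of "7", and an empty name.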

The length parameter sets the length, in base positions, of the chromosome, and must either be NULL, or an integer greater than or equal to 1.  If length is NULL, the length of the chromosome will be calculated after all initialize() callbacks have been called, as the maximum position referenced by the chromosome’s genomic elements, recombination map, mutation rate map, and (in nucleotide-based models) hotspot map; in other words, the chromosome will be sized to encompass all of the things it contains (which is also the behavior of the implicitly defined chromosome if initializeChromosome() is not called).  Otherwise – if length is specified with an integer value – the chromosome’s length will be fixed at that value, and the last valid base position in the chromosome will be length-1.  Attempting to add a genomic element or a mutation after the last position will raise an error.  Similarly, the last position of the chromosome must match the last position specified for recombination, mutation, and hotspot maps for that chromosome, but not all positions on a chromosome have to actually be used in the model (i.e., not all positions must be covered by a genomic element).
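The relationship between length and the last valid base position can be sketched as follows.  This is a Python illustration, not SLiM code; last_position is a hypothetical helper, and it assumes that when length is NULL the maximum referenced position becomes the last position of the chromosome.

```python
def last_position(length=None, referenced_positions=()):
    """Return the last valid base position of a chromosome.

    length: explicit chromosome length, or None to infer it.
    referenced_positions: end positions used by genomic elements and maps.
    """
    if length is not None:
        if length < 1:
            raise ValueError("length must be NULL or an integer >= 1")
        return length - 1           # positions are 0-based
    if not referenced_positions:
        raise ValueError("nothing to infer the chromosome extent from")
    # Assumption: the chromosome is sized to encompass everything it contains.
    return max(referenced_positions)
```

With an explicit length of 10000, for example, the last valid position is 9999, and the ends of the recombination, mutation rate, and hotspot maps would all need to be 9999.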

The type parameter specifies the type of chromosome to be created.  There are numerous options, and they are somewhat complex.  They are discussed in more detail in the documentation for class Chromosome, particularly their specific patterns of inheritance; but they are briefly summarized here for quick reference.  Note that “–” below indicates a null haplosome.  First of all, in hermaphroditic models type will generally be one of:

"A" (autosome), the default, specifying a diploid autosomal chromosome.

"H" (haploid), specifying a haploid autosomal chromosome that recombines in biparental crosses.

Some sex-chromosome types are supported only in sexual models:

"X" (X), specifying an X chromosome that is diploid (XX) in females, haploid (X–) in males.

"Y" (Y), specifying a Y chromosome that is haploid (Y) in males, absent (–) in females.

"Z" (Z), specifying a Z chromosome that is diploid (ZZ) in males, haploid (–Z) in females.

"W" (W), specifying a W chromosome that is haploid (W) in females, absent (–) in males.

And there are some haploid chromosome types that are also supported only in sexual models:

"HF" (haploid female-inherited), specifying a haploid autosomal chromosome that is inherited by both sexes from the first (female) parent in biparental crosses.

"FL" (female line), specifying a haploid autosomal chromosome that is inherited only by females, from the female parent, and is represented by a null haplosome in males.

"HM" (haploid male-inherited), specifying a haploid autosomal chromosome that is inherited by both sexes from the second (male) parent in biparental crosses.

"ML" (male line), specifying a haploid autosomal chromosome that is inherited only by males, from the male parent, and is represented by a null haplosome in females.

Finally, two additional values of type, "H-" and "-Y", are supported for backward compatibility (not intended for use in new models).  They are discussed in the Chromosome documentation.

The mutationRuns parameter specifies how many mutation runs the chromosome should use.  Internally, SLiM divides haplosomes into a sequence of consecutive mutation runs, allowing more efficient internal computations.  The optimal mutation run length is short enough that each mutation run is relatively unlikely to be modified by mutation/recombination events when inherited, but long enough that each mutation run is likely to contain a relatively large number of mutations; these priorities are in tension, so an intermediate balance between them is generally optimal.  The optimal number of mutation runs will depend on the model’s details, and may also depend upon the machine and even the compiler used to build SLiM.  If the mutationRuns parameter is not 0, SLiM will use the value given as the number of mutation runs inside Haplosome objects for the chromosome.  If mutationRuns is 0 (the default), then the behavior depends upon a parameter to the initializeSLiMOptions() function, doMutationRunExperiments.  If that flag is F, the behavior here is as if mutationRuns=1 had been passed: one mutation run will be used, and mutation run experiments will not be conducted.  If that flag is T (the default), then for mutationRuns=0 SLiM will conduct experiments at runtime, using different mutation run counts, to try to determine the number of mutation runs that produces the best performance.  The value that SLiM’s experiments determine may not be optimal, however, and in any case there is some overhead associated with conducting these experiments; for maximal performance it can thus be beneficial to determine the true optimal value for the simulation yourself, and set it explicitly using this parameter. Specifying the number of mutation runs is an advanced technique, but in some cases it can improve performance significantly.
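The interaction between mutationRuns and doMutationRunExperiments reduces to a three-way decision, sketched here in Python.  The function mutation_run_plan is hypothetical, and it models only the decision described above, not the experiments themselves.

```python
def mutation_run_plan(mutation_runs=0, do_mutation_run_experiments=True):
    """Return (run_count, experiments_enabled).

    run_count is None when SLiM would choose the count itself at runtime.
    """
    if mutation_runs < 0:
        raise ValueError("mutationRuns must be non-negative")
    if mutation_runs != 0:
        # An explicit count is used as given; no experiments are needed.
        return (mutation_runs, False)
    if not do_mutation_run_experiments:
        # Equivalent to passing mutationRuns=1.
        return (1, False)
    # Default: SLiM experiments with different counts at runtime.
    return (None, True)
```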

The order in which initializeChromosome() calls are made is generally unimportant, since the chromosomes assort independently of each other anyway, but SLiM will preserve the order in which they were defined for you (for the chromosomes property of Species, for display in SLiMgui, for writing out to VCF, and so forth).  All of the above types of chromosomes can be defined any number of times; you can have any number of autosomal chromosomes, for example.  In a sexual model you could even have multiple defined sex chromosomes – not in the sense of a female being XX, but in the sense of a female being X1X1X2X2, where X1 and X2 are two different kinds of X chromosome.  Similarly, you could define both an X and a Z for a species, if you wish; each would segregate correctly according to the sex of the offspring.  In sexual models in SLiM the sex of an offspring is determined randomly or given by the user in script; it is not a function of the sex chromosomes present in the individual, although the sex chromosomes present in the individual will correlate with sex.  In other words, SLiM does not know and does not care what sex-determination system the species is using; the chromosomes follow the sex, rather than the sex following the chromosomes.  This should allow any sex-determination system to be modeled, even if it is unusual, non-genetic, etc.

As stated above, the new Chromosome object is returned by this call, but it is still under construction so most of its methods and properties will error.  It will remain in this state until initialize() callbacks have completed, and will then become active and usable.  Until that point, there are only a handful of uses that are guaranteed to be allowed: storing it in a variable; remembering it with defineConstant(); using the methods and properties of its superclasses, notably Dictionary; setting SLiMgui display-related properties such as colorSubstitution; getting and setting its tag property; and accessing those of its properties that were passed to the initializeChromosome() call, specifically id, symbol, name, type, length, and lastPosition.  Its other properties, and all Chromosome methods, will raise an error while in this state.  This safeguard protects the new Chromosome object from being used while still in an inconsistent state.

(void)initializeGeneConversion(numeric$ nonCrossoverFraction, numeric$ meanLength, numeric$ simpleConversionFraction, [numeric$ bias = 0], [logical$ redrawLengthsOnFailure = F])

Calling this function switches the recombination model from a “simple crossover” model to a “double-stranded break (DSB)” model, and configures the details of the gene conversion tracts that will therefore be modeled.  The fraction of DSBs that will be modeled as non-crossover events is given by nonCrossoverFraction.  The mean length of gene conversion tracts (whether associated with crossover or non-crossover events) is given by meanLength; the actual extent of a gene conversion tract will be the sum of two independent draws from a geometric distribution with mean meanLength/2.  The fraction of gene conversion tracts that are modeled as “simple” is given by simpleConversionFraction; the remainder will be modeled as “complex”, involving repair of heteroduplex mismatches.  Finally, the GC bias during heteroduplex mismatch repair is given by bias, with the default of 0.0 indicating no bias, 1.0 indicating an absolute preference for G/C mutations over A/T mutations, and -1.0 indicating an absolute preference for A/T mutations over G/C mutations.  A non-zero bias may only be set in nucleotide-based models.  This function, and the way that gene conversion is modeled, fundamentally changed in SLiM 3.3.
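The tract-length rule (the sum of two independent geometric draws, each with mean meanLength/2) can be sketched in Python as follows.  This is an illustration only: the helper names are hypothetical, and the parameterization of the geometric distribution (support starting at 1, so that the mean is 1/p) is an assumption, since the text above does not specify which convention SLiM uses.

```python
import math
import random


def draw_geometric(mean, rng):
    """Draw from a geometric distribution on {1, 2, ...} with the given mean.

    Assumption: support starts at 1, so the success probability is 1/mean.
    """
    if mean <= 1.0:
        return 1
    p = 1.0 / mean
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(1.0 - p))) + 1


def draw_tract_length(mean_length, rng):
    """Sum of two independent geometric draws, each with mean mean_length/2."""
    half = mean_length / 2.0
    return draw_geometric(half, rng) + draw_geometric(half, rng)
```

Averaged over many draws, draw_tract_length(100, rng) should come out close to 100, although individual tracts vary widely.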

Beginning in SLiM 4.1, the redrawLengthsOnFailure parameter can be used to modify the internal mechanics of layout of gene conversion tracts.  If it is F (the default, and the only behavior supported before SLiM 4.1), then if an attempt to lay out gene conversion tracts fails (because the tracts overlap each other, or overlap the start or end of the chromosome), SLiM will try again by drawing new positions for the tracts – essentially shuffling the tracts around to try to find positions for them that don’t overlap.  If redrawLengthsOnFailure is T, then if an attempt to lay out gene conversion tracts fails, SLiM will try again by drawing new lengths for the tracts, as well as new positions.  This makes it more likely that layout will succeed, but risks biasing the realized mean tract length downward from the requested mean length (since layout of long tracts is more likely to fail due to overlap).  In either case, if SLiM attempts to lay out gene conversion tracts 100 times without success, an error will result.  That error indicates that the specified constraints for gene conversion are difficult to satisfy – tracts may commonly be so long that it is difficult or impossible to find an acceptable layout for them within the specified chromosome length.  Setting redrawLengthsOnFailure to T may mitigate this problem, at the price of biasing the mean tract length downward as discussed.

(object<GenomicElement>)initializeGenomicElement(io<GenomicElementType> genomicElementType, [Ni start = NULL], [Ni end = NULL])

Add a genomic element to the chromosome at initialization time.  The start and end parameters give the first and last base positions to be spanned by the new genomic element.  The new element will be based upon the genomic element type identified by genomicElementType, which can be either an integer, representing the ID of the desired element type, or an object of type GenomicElementType specified directly.

Beginning in SLiM 3.3, this function is vectorized: the genomicElementType, start, and end parameters do not have to be singletons.  In particular, start and end may be of any length, but must be equal in length; each start/end element pair will generate one new genomic element spanning the given base positions.  In this case, genomicElementType may still be a singleton, providing the genomic element type to be used for all of the new genomic elements, or it may be equal in length to start and end, providing an independent genomic element type for each new element.  When adding a large number of genomic elements, it will be much faster to add them in order of ascending position with a vectorized call.

Beginning in SLiM 5, passing NULL for start and end is allowed by initializeGenomicElement(), but only in one specific case: if the focal chromosome being configured was explicitly defined with initializeChromosome(), and that focal chromosome was given an explicit length (rather than a length of NULL).  In that case, start and end may be NULL (both of them, not just one of them), indicating that the genomic element created should span the entire length of the focal chromosome.  Since NULL is now the default value for start and end, this makes this common configuration very simple to set up.

The return value provides the genomic element(s) created by the call, in the order in which they were specified in the parameters to initializeGenomicElement().
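The pairing rules for a vectorized call, including the NULL case, can be sketched as follows.  This is a Python illustration with hypothetical names (expand_genomic_elements, chromosome_length); vectors are modeled as lists, and element types are represented by arbitrary labels.

```python
def expand_genomic_elements(element_types, starts=None, ends=None,
                            chromosome_length=None):
    """Return a list of (type, start, end) triples, one per new element.

    chromosome_length is the explicit length of the focal chromosome,
    or None if no explicit length was given.
    """
    if starts is None and ends is None:
        # Allowed only when the focal chromosome has an explicit length.
        if chromosome_length is None:
            raise ValueError("NULL start/end requires an explicit length")
        starts, ends = [0], [chromosome_length - 1]
    elif starts is None or ends is None:
        raise ValueError("start and end must both be NULL or both be given")
    if len(starts) != len(ends):
        raise ValueError("start and end must be equal in length")
    if len(element_types) == 1:
        # A singleton type is used for every new element.
        element_types = list(element_types) * len(starts)
    elif len(element_types) != len(starts):
        raise ValueError("genomicElementType must be singleton or match start")
    return list(zip(element_types, starts, ends))
```

The triples come back in the order given, which mirrors the order of the returned genomic elements.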

(object<GenomicElementType>$)initializeGenomicElementType(is$ id, io<MutationType> mutationTypes, numeric proportions, [Nf mutationMatrix = NULL])

Add a genomic element type at initialization time.  The id must not already be used for any genomic element type in the simulation.  The mutationTypes vector identifies the mutation types used by the genomic element, and the proportions vector should be of equal length, specifying the relative proportion of mutations that will be drawn from the corresponding mutation type (proportions do not need to add up to one; they are interpreted relatively).  The id parameter may be either an integer giving the ID of the new genomic element type, or a string giving the name of the new genomic element type (such as "g5" to specify an ID of 5).  The mutationTypes parameter may be either an integer vector representing the IDs of the desired mutation types, or an object vector of MutationType elements specified directly.  The global symbol for the new genomic element type is immediately available; the return value also provides the new object.
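Since proportions are interpreted relatively, the probability of drawing each mutation type is its proportion divided by the sum.  A minimal Python sketch, assuming non-negative proportions with a positive sum (a restriction not stated above) and using a hypothetical function name:

```python
def mutation_type_probabilities(proportions):
    """Convert relative proportions into probabilities that sum to one."""
    if any(p < 0 for p in proportions):
        raise ValueError("proportions are assumed to be non-negative")
    total = sum(proportions)
    if total <= 0:
        raise ValueError("at least one proportion must be positive")
    return [p / total for p in proportions]
```

Proportions of c(1, 3) and c(0.25, 0.75) therefore describe the same genomic element type.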

The mutationMatrix parameter is NULL by default, and in non-nucleotide-based models it must be NULL.  In nucleotide-based models, on the other hand, it must be non-NULL, and therefore must be supplied.  In that case, mutationMatrix should take one of two standard forms.  For sequence-based mutation rates that depend upon only the single nucleotide at a mutation site, mutationMatrix should be a 4×4 float matrix, specifying mutation rates for an existing nucleotide state (rows from 0–3 representing A/C/G/T) to each of the four possible derived nucleotide states (columns, with the same meaning).  The mutation rates in this matrix are absolute rates, per nucleotide per gamete; they will be used by SLiM directly unless they are multiplied by a factor from the hotspot map (see initializeHotspotMap()).  Rates in mutationMatrix that involve the mutation of a nucleotide to itself (A to A, C to C, etc.) are not used by SLiM and must be 0.0 by convention.

It is important to note that the order of the rows and columns used in SLiM, A/C/G/T, is not a universal convention; other sources will present substitution-rate/transition-rate matrices using different conventions, and so care must be taken when importing such matrices into SLiM.
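Importing a 4×4 matrix published in another nucleotide order amounts to permuting its rows and columns.  The following Python sketch shows one way to do that; reorder_to_acgt is a hypothetical helper, and the matrix is modeled as a list of rows.

```python
def reorder_to_acgt(matrix, source_order):
    """Permute a 4x4 rate matrix into A/C/G/T row and column order.

    matrix[i][j] is the rate from source_order[i] to source_order[j];
    source_order is a string such as "TCAG".
    """
    if sorted(source_order) != ["A", "C", "G", "T"]:
        raise ValueError("source_order must be a permutation of ACGT")
    index = [source_order.index(n) for n in "ACGT"]
    result = [[matrix[i][j] for j in index] for i in index]
    # Self-mutation rates (the diagonal) must be 0.0 by convention.
    if any(result[k][k] != 0.0 for k in range(4)):
        raise ValueError("diagonal entries must be 0.0")
    return result
```

Whether rows denote the existing state and columns the derived state should also be checked against the source of the matrix, since some references use the transpose.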

For sequence-based mutation rates that depend upon the trinucleotide sequence centered upon a mutation site (the adjacent bases to the left and right, in other words, as well as the mutating nucleotide itself), mutationMatrix should be a 64×4 float matrix, specifying mutation rates for the central nucleotide of an existing trinucleotide sequence (rows from 0–63, representing codons as described in the documentation for the ancestralNucleotides() method of Chromosome) to each of the four possible derived nucleotide states (columns from 0–3 for A/C/G/T as before).  Note that in every case it is the central nucleotide of the trinucleotide sequence that is mutating, but rates can be specified independently based upon the nucleotides in the first and third positions as well, with this type of mutation matrix.
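The row index for a given trinucleotide can be computed as a base-4 number.  The Python sketch below assumes the codon encoding in which the first nucleotide is the most significant digit (AAA is 0, AAC is 1, and TTT is 63); that encoding is defined in the ancestralNucleotides() documentation, not here, so it should be checked there.  The function name is hypothetical.

```python
def trinucleotide_row(trinucleotide):
    """Return the assumed row index (0..63) for a trinucleotide like "ACG"."""
    if len(trinucleotide) != 3:
        raise ValueError("expected exactly three nucleotides")
    row = 0
    for ch in trinucleotide:
        # Treat each nucleotide as a base-4 digit: A=0, C=1, G=2, T=3.
        row = row * 4 + "ACGT".index(ch)
    return row
```

Under that assumption, the rate for the central C of ACG mutating to T would be found at row 6, column 3.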

Several helper functions are defined to construct common types of mutation matrices, such as mmJukesCantor() to create a mutation matrix for a Jukes–Cantor model.

(void)initializeHotspotMap(numeric multipliers, [Ni ends = NULL], [string$ sex = "*"])

In nucleotide-based models, set the mutation rate multiplier along the chromosome.  Nucleotide-based models define sequence-based mutation rates that are set up with the mutationMatrix parameter to initializeGenomicElementType().  If no hotspot map is specified by calling initializeHotspotMap(), a hotspot map with a multiplier of 1.0 across the whole chromosome is assumed (and so the sequence-based rates are the absolute mutation rates used by SLiM).  A hotspot map modifies the sequence-based rates by scaling them up in some regions, with multipliers greater than 1.0 (representing mutational hot spots), and/or scaling them down in some regions, with multipliers less than 1.0 (representing mutational cold spots).

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then multipliers must be a singleton value that specifies a single multiplier to be used along the entire chromosome (typically 1.0, but not required to be).  If, on the other hand, ends is supplied, then multipliers and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, multipliers and ends taken together specify the multipliers to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).

For example, if the following call is made:

initializeHotspotMap(c(1.0, 1.2), c(5000, 9999));

then the result is that the mutation rate multiplier for bases 0...5000 (inclusive) will be 1.0 (and so the specified sequence-based mutation rates will be used verbatim), and the multiplier for bases 5001...9999 (inclusive) will be 1.2 (and so the sequence-based mutation rates will be multiplied by 1.2 within the region).
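The lookup implied by such a map can be sketched in Python with a binary search over the end positions.  This is an illustration of the semantics only, with a hypothetical function name; it is not how SLiM stores or queries its maps.

```python
from bisect import bisect_left


def multiplier_at(position, multipliers, ends):
    """Return the multiplier in effect at a 0-based base position.

    ends holds the inclusive last position of each region, ascending.
    """
    if position < 0 or position > ends[-1]:
        raise ValueError("position lies outside the map")
    # The first region whose end is >= position contains the position.
    return multipliers[bisect_left(ends, position)]
```

For the call above, multiplier_at(5000, [1.0, 1.2], [5000, 9999]) gives 1.0 and multiplier_at(5001, [1.0, 1.2], [5000, 9999]) gives 1.2.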

Note that mutations are generated by SLiM only within genomic elements, regardless of the hotspot map.  In effect, the hotspot map given is intersected with the coverage area of the genomic elements defined; areas outside of any genomic element are given a multiplier of zero.  There is no harm in supplying a hotspot map that specifies multipliers for areas outside of the genomic elements defined; the excess information is simply not used.

If the optional sex parameter is "*" (the default), then the supplied hotspot map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied hotspot map is used only for that sex (i.e., when generating a gamete from a parent of that sex).  In this case, two calls must be made to initializeHotspotMap(), one for each sex, even if a multiplier of 1.0 is desired for the other sex; no default hotspot map is supplied.

(object<InteractionType>$)initializeInteractionType(is$ id, string$ spatiality, [logical$ reciprocal = F], [numeric$ maxDistance = INF], [string$ sexSegregation = "**"])

Add an interaction type at initialization time.  The id must not already be used for any interaction type in the simulation.  The id parameter may be either an integer giving the ID of the new interaction type, or a string giving the name of the new interaction type (such as "i5" to specify an ID of 5).

The spatiality may be "", for non-spatial interactions (i.e., interactions that do not depend upon the distance between individuals); "x", "y", or "z" for one-dimensional interactions; "xy", "xz", or "yz" for two-dimensional interactions; or "xyz" for three-dimensional interactions.  The dimensions referenced by spatiality must be defined as spatial dimensions with initializeSLiMOptions(); if the simulation has dimensionality "xy", for example, then interactions in the simulation may have spatiality "", "x", "y", or "xy", but may not reference spatial dimension z and thus may not have spatiality "xz", "yz", or "xyz".  If no spatial dimensions have been configured, only non-spatial interactions may be defined.
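The rule relating spatiality to the simulation's dimensionality is a subset test, sketched here in Python with a hypothetical function name.

```python
VALID_SPATIALITIES = {"", "x", "y", "z", "xy", "xz", "yz", "xyz"}


def spatiality_allowed(spatiality, dimensionality):
    """Return True if every axis in spatiality is a configured dimension.

    dimensionality is the simulation's setting, e.g. "", "x", "xy", "xyz".
    """
    if spatiality not in VALID_SPATIALITIES:
        raise ValueError("unrecognized spatiality: " + repr(spatiality))
    return all(axis in dimensionality for axis in spatiality)
```

With a dimensionality of "xy", this accepts "", "x", "y", and "xy", and rejects "xz", "yz", and "xyz".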

The reciprocal flag may be T, in which case the interaction is guaranteed by the user to be reciprocal: whatever the interaction strength is for exerter B upon receiver A, it will be equal (in magnitude and sign) for exerter A upon receiver B.  In principle, this allows the InteractionType to reduce the amount of computation necessary by up to a factor of two (although it may or may not be used).  If reciprocal is F, the interaction is not guaranteed to be reciprocal and each interaction will be computed independently.  The built-in interaction formulas are all reciprocal, but if you implement an interaction() callback, you must consider whether the callback you have implemented preserves reciprocality or not.  For this reason, the default is reciprocal=F, so that bugs are not inadvertently introduced by an invalid assumption of reciprocality.  See below for a note regarding reciprocality in sexual simulations when using the sexSegregation flag.

The maxDistance parameter supplies the maximum distance over which interactions of this type will be evaluated; at greater distances, the interaction strength is considered to be zero (for efficiency).  The default value of maxDistance, INF (positive infinity), indicates that there is no maximum interaction distance; note that this can make some interaction queries much less efficient, and is therefore not recommended.  In SLiM 3.1 and later, a warning will be issued if a spatial interaction type is defined with no maximum distance, to encourage a maximum distance to be defined.

The sexSegregation parameter governs the applicability of the interaction to each sex, in sexual simulations.  It does not affect distance calculations in any way; it only modifies the way in which interaction strengths are calculated.  The default, "**", implies that the interaction is felt by both sexes (the first character of the string value) and is exerted by both sexes (the second character of the string value).  Either or both characters may be M or F instead; for example, "MM" would indicate a male-male interaction, such as male-male competition, whereas "FM" would indicate an interaction influencing only female receivers that is influenced only by male exerters, such as male mating displays that influence female attraction.  This parameter may be set only to "**" unless sex has been enabled with initializeSex().  Note that a value of sexSegregation other than "**" may imply some degree of non-reciprocality, but it is not necessary to specify reciprocal to be F for this reason; SLiM will take the sex-segregation of the interaction into account for you.  The value of reciprocal may therefore be interpreted as meaning: in those cases, if any, in which A interacts with B and B interacts with A, is the interaction strength guaranteed to be the same in both directions?  The sexSegregation parameter is shorthand for setting sex constraints on the interaction type using the setConstraints() method; see that method for a more extensive set of constraints that may be used.
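The two-character sexSegregation code can be read as a pair of filters, one for the receiver and one for the exerter.  A Python sketch of that reading, with a hypothetical function name:

```python
def interaction_applies(sex_segregation, receiver_sex, exerter_sex):
    """Return True if the interaction is felt by receiver from exerter.

    sex_segregation: two characters, each "*", "M", or "F";
    the first constrains the receiver, the second the exerter.
    """
    if len(sex_segregation) != 2 or any(c not in "*MF" for c in sex_segregation):
        raise ValueError("sexSegregation must be two characters from *, M, F")
    receiver_rule, exerter_rule = sex_segregation
    return (receiver_rule in ("*", receiver_sex)
            and exerter_rule in ("*", exerter_sex))
```

With "FM", a female receiver feels a male exerter, but a male receiver does not feel a female exerter; this is the kind of built-in non-reciprocality that SLiM accounts for regardless of the reciprocal flag.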

By default, the interaction strength is 1.0 for all interactions within maxDistance.  Often it is desirable to change the interaction function using setInteractionFunction(); modifying interaction strengths can also be achieved with interaction() callbacks if necessary.  In any case, interactions beyond maxDistance always have a strength of 0.0, and the interaction strength of an individual with itself is always 0.0, regardless of the interaction function or callbacks.

The global symbol for the new interaction type is immediately available; the return value also provides the new object.  Note that in multispecies models, initializeInteractionType() must be called from a non-species-specific initialize() callback (declared as species all initialize()), since interactions are managed at the community level.

(void)initializeMutationRate(numeric rates, [Ni ends = NULL], [string$ sex = "*"])

Set the mutation rate per base position per gamete.  To be precise, this mutation rate is the expected mean number of mutations that will occur per base position per gamete; note that this is different from how the recombination rate is defined (see initializeRecombinationRate()).  The number of mutations that actually occurs at a given base position when generating an offspring haplosome is, in effect, drawn from a Poisson distribution with that expected mean (but under the hood SLiM uses a mathematically equivalent but much more efficient strategy).  It is possible for this Poisson draw to indicate that two or more new mutations have arisen at the same base position, particularly when the mutation rate is very high; in this case, the new mutations will be added to the site one at a time, and as always the mutation stacking policy will be followed.

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then rates must be a singleton value that specifies a single mutation rate to be used along the entire chromosome.  If, on the other hand, ends is supplied, then rates and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, rates and ends taken together specify the mutation rates to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).

For example, if the following call is made:

initializeMutationRate(c(1e-7, 2.5e-8), c(5000, 9999));

then the result is that the mutation rate for bases 0...5000 (inclusive) will be 1e-7, and the rate for bases 5001...9999 (inclusive) will be 2.5e-8.

Note that mutations are generated by SLiM only within genomic elements, regardless of the mutation rate map.  In effect, the mutation rate map given is intersected with the coverage area of the genomic elements defined; areas outside of any genomic element are given a mutation rate of zero.  There is no harm in supplying a mutation rate map that specifies rates for areas outside of the genomic elements defined; that rate information is simply not used.  The overallMutationRate family of properties on Chromosome provide the overall mutation rate after genomic element coverage has been taken into account, so it will reflect the rate at which new mutations will actually be generated in the simulation as configured.
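The intersection of a rate map with genomic element coverage can be sketched as follows.  This Python illustration assumes that the overall rate is the sum of the per-base rates over covered positions, which is one reading of the description above; the function name is hypothetical, and elements are assumed not to overlap.

```python
def overall_mutation_rate(rates, ends, elements):
    """Sum per-base rates over positions covered by genomic elements.

    rates, ends: the rate map, with inclusive region end positions.
    elements: list of (start, end) inclusive, non-overlapping spans.
    """
    total = 0.0
    region_start = 0
    for rate, region_end in zip(rates, ends):
        for element_start, element_end in elements:
            # Overlap between this rate region and this genomic element.
            low = max(region_start, element_start)
            high = min(region_end, element_end)
            if low <= high:
                total += rate * (high - low + 1)
        region_start = region_end + 1
    return total
```

For the example map above with a single element spanning 0 to 9999, this gives 5001 × 1e-7 + 4999 × 2.5e-8; with no genomic elements it gives zero, whatever the map says.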

If the optional sex parameter is "*" (the default), then the supplied mutation rate map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied mutation rate map is used only for that sex (i.e., when generating a gamete from a parent of that sex).  In this case, two calls must be made to initializeMutationRate(), one for each sex, even if a rate of zero is desired for the other sex; no default mutation rate map is supplied.

In nucleotide-based models, initializeMutationRate() may not be called.  Instead, the desired sequence-based mutation rate(s) should be expressed in the mutationMatrix parameter to initializeGenomicElementType().  If variation in the mutation rate along the chromosome is desired, initializeHotspotMap() should be used.

The initializeMutationRateFromFile() function is a useful convenience function if you wish to read the mutation rate map from a file.

(void)initializeMutationRateFromFile(string$ path, integer$ lastPosition, [float$ scale = 1.0e-08], [string$ sep = "\t"], [string$ dec = "."])

Set a mutation rate map from data read from the file at path.  This function is essentially a wrapper for initializeMutationRate() that uses readCSV() and passes the data through.  The file is expected to contain two columns of data.  The first column must be integer start positions for rate map regions; the first region should start at position 0 if the map’s positions are 0-based, or at position 1 if the map’s positions are 1-based; in the latter case, 1 will be subtracted from every position since SLiM uses 0-based positions.  The second column must be float rates, relative to the scaling factor specified in scale; for example, if a given rate is 1.2 and scale is 1e-8 (the default), the rate used will be 1.2e-8.  No column header line should be present; the file should start immediately with numerical data.  The expected separator between columns is a tab character by default, but may be passed in sep; the expected decimal separator is a period by default, but may be passed in dec.  Once read, the map is converted into a rate map specified with end positions, rather than start positions, and the position given by lastPosition is used as the end of the last rate region; it should be the last position of the chromosome.
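The conversion from start positions to end positions can be sketched as follows.  This is a Python illustration of the steps described above, not the Eidos source (which can be inspected with functionSource()); the function name is hypothetical, the file contents are passed in as text, and skipping blank lines is an assumption.

```python
def rate_map_from_text(text, last_position, scale=1e-8, sep="\t", dec="."):
    """Convert a start-based rate map into (rates, ends) for a rate map call."""
    starts, rates = [], []
    for line in text.splitlines():
        if not line.strip():
            continue                      # assumption: blank lines ignored
        start_field, rate_field = line.split(sep)
        starts.append(int(start_field))
        rates.append(float(rate_field.replace(dec, ".")) * scale)
    if not starts:
        raise ValueError("no data found")
    if starts[0] == 1:
        starts = [s - 1 for s in starts]  # 1-based file: shift to 0-based
    elif starts[0] != 0:
        raise ValueError("first region must start at position 0 or 1")
    # Each region ends just before the next one starts; the last region
    # ends at last_position.
    ends = [s - 1 for s in starts[1:]] + [last_position]
    return rates, ends
```

A 1-based file with regions starting at 1 and 5002, and a lastPosition of 9999, would thus produce end positions of 5000 and 9999.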

See readCSV() for further details on sep and dec, which are passed through to it; and see initializeMutationRate() for details on how the rate map is validated and used.

This function is written in Eidos, and its source code can be viewed with functionSource(), so you can copy and modify its code if you need to modify its functionality.

(object<MutationType>$)initializeMutationType(is$ id, numeric$ dominanceCoeff, string$ distributionType, ...)

Add a mutation type at initialization time.  The id must not already be used for any mutation type in the simulation.  The id parameter may be either an integer giving the ID of the new mutation type, or a string giving the name of the new mutation type (such as "m5" to specify an ID of 5).  The dominanceCoeff parameter supplies the dominance coefficient for the mutation type; 0.0 produces complete recessivity, 0.5 additivity (no dominance), 1.0 complete dominance, and values greater than 1.0 overdominance.  The distributionType may be "f", in which case the ellipsis ... should supply a numeric$ fixed selection coefficient; "e", in which case the ellipsis should supply a numeric$ mean selection coefficient for an exponential distribution; "g", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ alpha shape parameter for a gamma distribution; "n", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ sigma (standard deviation) parameter for a normal distribution; "p", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ scale parameter for a Laplace distribution; "w", in which case the ellipsis should supply a numeric$ λ scale parameter and a numeric$ k shape parameter for a Weibull distribution; or "s", in which case the ellipsis should supply a string$ Eidos script parameter.  The global symbol for the new mutation type is immediately available; the return value also provides the new object.
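To make the parameterization of each distributionType concrete, the following Python sketch draws a selection coefficient for each of the non-script type codes.  It is an illustration only, not SLiM's implementation: the function name draw_selection_coeff is hypothetical, the "s" (script) type is deliberately left out, and the handling of negative means for "e" and "g" (scaling a unit-mean draw by the mean) is an assumption.

```python
import random


def draw_selection_coeff(distribution_type, *params, rng=random):
    """Draw one selection coefficient for a distribution type code."""
    if distribution_type == "f":      # fixed selection coefficient
        (s,) = params
        return s
    if distribution_type == "e":      # exponential with the given mean
        (mean,) = params
        return mean * rng.expovariate(1.0)
    if distribution_type == "g":      # gamma with the given mean and shape alpha
        mean, alpha = params
        # gammavariate(alpha, 1/alpha) has mean 1, so the product has mean `mean`.
        return mean * rng.gammavariate(alpha, 1.0 / alpha)
    if distribution_type == "n":      # normal with mean and standard deviation
        mean, sigma = params
        return rng.gauss(mean, sigma)
    if distribution_type == "p":      # Laplace with mean and scale
        mean, scale = params
        # The difference of two unit exponentials is a standard Laplace variate.
        return mean + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    if distribution_type == "w":      # Weibull with scale lambda and shape k
        lam, k = params
        return rng.weibullvariate(lam, k)
    raise ValueError(f"unsupported distribution type: {distribution_type!r}")


rng = random.Random(1)
print(draw_selection_coeff("f", 0.1))
print(draw_selection_coeff("g", -0.03, 0.2, rng=rng))
```

The parameter order in each branch follows the order in which the ellipsis arguments are listed above.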

Note that by default in WF models, all mutations of a given mutation type will be converted into Substitution objects when they reach fixation, for efficiency reasons.  If you need to disable this conversion, to keep mutations of a given type active in the simulation even after they have fixed, you can do so by setting the convertToSubstitution property of MutationType to F.  In contrast, by default in nonWF models mutations will not be converted into Substitution objects when they reach fixation; convertToSubstitution is F by default in nonWF models.  To enable conversion in nonWF models for neutral mutation types with no indirect fitness effects, you should therefore set convertToSubstitution to T.

(object<MutationType>$)initializeMutationTypeNuc(is$ id, numeric$ dominanceCoeff, string$ distributionType, ...)

Add a nucleotide-based mutation type at initialization time.  This function is identical to initializeMutationType() except that the new mutation type will be nucleotide-based – in other words, mutations belonging to the new mutation type will have an associated nucleotide.  This function may be called only in nucleotide-based models (as enabled by the nucleotideBased parameter to initializeSLiMOptions()).

Nucleotide-based mutations always use a mutationStackGroup of -1 and a mutationStackPolicy of "l".  This ensures that a new nucleotide mutation always replaces any previously existing nucleotide mutation at a given position, regardless of the mutation types of the nucleotide mutations.  These values are set automatically by initializeMutationTypeNuc(), and may not be changed.

See the documentation for initializeMutationType() for all other discussion.

(void)initializeRecombinationRate(numeric rates, [Ni ends = NULL], [string$ sex = "*"])

Set the recombination rate per base position per gamete.  To be precise, this recombination rate is the probability that a breakpoint will occur between one base and the next base; note that this is different from how the mutation rate is defined (see initializeMutationRate()).  A recombination rate of 1 centimorgan/Mbp corresponds to a recombination rate of 1e-8 in the units used by SLiM.  All rates must be in the interval [0.0, 0.5].  A rate of 0.5 implies complete independence between the adjacent bases, which might be used to implement unlinked loci.  Whether a breakpoint occurs between two bases is then, in effect, determined by a binomial draw with a single trial and the given rate as probability (but under the hood SLiM uses a mathematically equivalent but much more efficient strategy).  The recombinational process in SLiM will never generate more than one crossover between one base and the next (in one generation/haplosome), and a supplied rate of 0.5 will therefore result in an actual probability of 0.5 for a crossover at the relevant position.  (Note that this was not true in SLiM 2.x and earlier, however; their implementation of recombination resulted in a crossover probability of about 39.3% for a rate of 0.5, due to the use of an inaccurate approximation method.  Recombination rates lower than about 0.01 would have been essentially exact, since the approximation error became large only as the rate approached 0.5.)
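The two numerical claims above can be checked with a short Python calculation.  The unit conversion is plain arithmetic; the second function is an assumption, since the text does not name the approximation used in SLiM 2.x, but treating the number of crossovers as Poisson-distributed with the rate as its mean reproduces the 39.3% figure.  Both function names are hypothetical.

```python
import math


def cm_per_mbp_to_rate(cm_per_mbp):
    """Convert centimorgans per megabase to a per-base crossover probability."""
    # 1 cM corresponds to an expected 0.01 crossovers, and 1 Mbp is 1e6 bases.
    return cm_per_mbp * 0.01 / 1.0e6


def poisson_crossover_probability(rate):
    """P(at least one event) when the event count is Poisson with mean `rate`.

    Assumption: this is one approximation consistent with the 39.3% figure;
    the text does not say which method older versions actually used.
    """
    return 1.0 - math.exp(-rate)


print(cm_per_mbp_to_rate(1.0))
print(round(poisson_crossover_probability(0.5), 3))   # 0.393
print(poisson_crossover_probability(0.01))
```

Under this assumption the discrepancy between the supplied rate and the realized probability is small at a rate of 0.01 and grows as the rate approaches 0.5, which matches the behavior described.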

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then rates must be a singleton value that specifies a single recombination rate to be used along the entire chromosome.  If, on the other hand, ends is supplied, then rates and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, rates and ends taken together specify the recombination rates to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).
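The constraints on rates and ends, and the way the two vectors jointly define a map, can be sketched in Python as follows.  This is a hedged illustration rather than SLiM's validation code: validate_rate_map and rate_at are hypothetical names, the requirement of strictly ascending ends is an assumption, and ends are treated as inclusive last positions of each stretch.

```python
from bisect import bisect_left


def validate_rate_map(rates, ends=None, max_rate=0.5):
    """Check the calling conventions described for rates and ends."""
    if ends is None:
        if len(rates) != 1:
            raise ValueError("rates must be a singleton when ends is NULL")
    else:
        if len(rates) != len(ends):
            raise ValueError("rates and ends must be the same length")
        if any(b <= a for a, b in zip(ends, ends[1:])):
            raise ValueError("ends must be in ascending order")
    if any(not 0.0 <= r <= max_rate for r in rates):
        raise ValueError(f"all rates must be in [0.0, {max_rate}]")


def rate_at(position, rates, ends):
    """Look up the rate that applies at a base position."""
    # The first stretch whose end position is >= position contains it.
    index = bisect_left(ends, position)
    if position < 0 or index == len(ends):
        raise ValueError("position lies outside the rate map")
    return rates[index]


rates, ends = [1e-8, 2e-8, 0.5], [999, 1999, 2000]
validate_rate_map(rates, ends)
print(rate_at(1000, rates, ends))
```

In this example the third stretch covers the single position 2000 with a rate of 0.5, the kind of arrangement the text suggests for separating unlinked loci.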

If the optional sex parameter is "*" (the default), then the supplied recombination rate map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied recombination map is used only for that sex.  In this case, two calls must be made to initializeRecombinationRate(), one for each sex, even if a rate of zero is desired for the other sex; no default recombination map is supplied.

The initializeRecombinationRateFromFile() function is a useful convenience function if you wish to read the recombination rate map from a file.

(void)initializeRecombinationRateFromFile(string$ path, integer$ lastPosition, [float$ scale = 1.0e-08], [string$ sep = "\t"], [string$ dec = "."])

Set a recombination rate map from data read from the file at path.  This function is essentially a wrapper for initializeRecombinationRate() that uses readCSV() and passes the data through.  The file is expected to contain two columns of data.  The first column must be integer start positions for rate map regions; the first region should start at position 0 if the map’s positions are 0-based, or at position 1 if the map’s positions are 1-based; in the latter case, 1 will be subtracted from every position since SLiM uses 0-based positions.  The second column must be float rates, relative to the scaling factor specified in scale; for example, if a given rate is 1.2 and scale is 1e-8 (the default), the rate used will be 1.2e-8.  No column header line should be present; the file should start immediately with numerical data.  The expected separator between columns is a tab character by default, but may be passed in sep; the expected decimal separator is a period by default, but may be passed in dec.  Once read, the map is converted into a rate map specified with end positions, rather than start positions, and the position given by lastPosition is used as the end of the last rate region; it should be the last position of the chromosome.

See readCSV() for further details on sep and dec, which are passed through to it; and see initializeRecombinationRate() for details on how the rate map is validated and used.

This function is written in Eidos, and its source code can be viewed with functionSource(), so you can copy and modify its code if you need to modify its functionality.

(void)initializeSex([Ns$ chromosomeType = NULL])

Enable sex in the simulation.  Beginning in SLiM 5, this function should generally be passed NULL, simply indicating that sex should be enabled: individuals will then be male and female (rather than hermaphroditic), biparental crosses will be required to be between a female first parent and a male second parent, and selfing will not be allowed.  In this new configuration style, if a sexual simulation involving sex chromosomes is desired, the new initializeChromosome() call should be used to configure the chromosome setup for the simulation.

For backward compatibility, the old style of configuring a sexual simulation is still supported, however.  This implicitly defines a single chromosome, without a call to initializeChromosome().  With this old configuration approach, the chromosomeType parameter to initializeSex() gives the type of chromosome that should be simulated; this should be "A", "X", or "Y", and this chromosomeType value will be used as the symbol ("A", "X", or "Y") for the implicit chromosome.  These legacy chromosome types correspond to the new chromosome types "A", "X", and "-Y" respectively (note that it is not "Y"), when using initializeChromosome().  The implicit chromosome’s id property is always 1.  This old style of chromosome configuration is much less flexible, however, allowing only these three chromosome types, and only allowing a single chromosome to be set up.  This backward compatibility mode may be removed from SLiM in the future, and should be considered deprecated; new models should call initializeChromosome() explicitly instead.

There is no way to disable sex once it has been enabled; if you don’t want to have sex, don’t call this function.  If you require more flexibility with mating types and reproductive strategies than SLiM’s built-in support for sex provides, do not call initializeSex(); instead, track the sex or mating type of individuals yourself in script (with the tag property of Individual, for example), and manage the consequences of that in your script yourself, in terms of which individuals can mate with which, and exactly how the offspring is produced.

The xDominanceCoeff parameter has been deprecated and removed.  In SLiM 5 and later, use the hemizygousDominanceCoeff property of MutationType instead.  In earlier versions, when chromosomeType was "X", the optional xDominanceCoeff parameter supplied the dominance coefficient used when a mutation is present in an XY male, and is thus “heterozygous” (but in a different sense than the heterozygosity of an XX female with one copy of the mutation).

(void)initializeSLiMModelType(string$ modelType)

Configure the type of SLiM model used for the simulation.  At present, one of two model types may be selected.  If modelType is "WF", SLiM will use a Wright-Fisher (WF) model; this is the model type that has always been supported by SLiM, and is the model type used if initializeSLiMModelType() is not called.  If modelType is "nonWF", SLiM will use a non-Wright-Fisher (nonWF) model instead; this is a new model type supported by SLiM 3.0 and above.

If initializeSLiMModelType() is called at all then it must be called before any other initialization function, so that SLiM knows from the outset which features are enabled and which are not.

(void)initializeSLiMOptions([logical$ keepPedigrees = F], [string$ dimensionality = ""], [string$ periodicity = ""], [logical$ doMutationRunExperiments = T], [logical$ preventIncidentalSelfing = F], [logical$ nucleotideBased = F], [logical$ randomizeCallbacks = T])

Configure options for the simulation.  If initializeSLiMOptions() is called at all then it must be called before any other initialization function (except initializeSLiMModelType()), so that SLiM knows from the outset which optional features are enabled and which are not.

If keepPedigrees is T, SLiM will keep pedigree information for every individual in the simulation, tracking the identity of its parents and grandparents.  This allows individuals to assess their degree of pedigree-based relatedness to other individuals (see Individual’s relatedness() and sharedParentCount() methods), as well as allowing a model to find “trios” (two parents and an offspring they generated) using the pedigree properties of Individual.  As a side effect of keepPedigrees being T, the pedigreeID, pedigreeParentIDs, and pedigreeGrandparentIDs properties of Individual will have defined values, as will the haplosomePedigreeID property of Haplosome.  Note that pedigree-based relatedness doesn’t necessarily correspond to genetic relatedness, due to effects such as assortment and recombination.  Beginning in SLiM 3.5, keepPedigrees=T also enables tracking of individual reproductive output, available through the reproductiveOutput property of Individual and the lifetimeReproductiveOutput property of Subpopulation.

If dimensionality is not "", SLiM will enable its optional “continuous space” facility.  Three values for dimensionality are presently supported: "x", "xy", and "xyz", specifying that continuous space should be enabled for one, two, or three dimensions, respectively, using (x), (x, y), and (x, y, z) coordinates respectively.  This has a number of side effects.  First of all, it means that the specified properties of Individual (x, y, and/or z) will be interpreted by SLiM as spatial positions; in particular, SLiMgui will use those properties to display subpopulations spatially.  Second, it allows spatial interactions to be defined, evaluated, and queried using initializeInteractionType() and interaction() callbacks.  And third, it enables the use of any other properties and methods related to continuous space, such as setting the spatial boundaries of subpopulations, which would otherwise raise an error.

If periodicity is not "", SLiM will designate the specified spatial dimensions as being periodic – wrapping around at the edges of the spatial boundaries of that dimension.  This option may only be used if the dimensionality parameter to initializeSLiMOptions() has been used to enable spatiality in the model, and only spatial dimensions that were specified in the dimensionality of the model may be declared to be periodic (but if desired, it is permissible to make just a subset of those dimensions periodic; it is not an all-or-none proposition).  For example, if the specified dimensionality is "xy", the model’s periodicity may be "x", "y", or "xy" (or "", the default, to specify that there are no periodic dimensions).  A one-dimensional periodic model would model a space like the perimeter of a circle.  A two-dimensional model periodic in one of those dimensions would model a space like a cylinder without its end caps; if periodic in both dimensions, the modeled space is a torus.  The shapes of three-dimensional periodic models are harder to visualize, but are essentially higher-dimensional analogues of these concepts.  Periodic boundary conditions are commonly used to model spatial scenarios without “edge effects”, since there are no edges in the periodic spatial dimensions.  The pointPeriodic() method of Subpopulation is typically used in conjunction with this option, to actually implement the periodic boundary condition for the specified dimensions.
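The relationship between dimensionality and periodicity, and the wrapping that a periodic boundary implies, can be sketched in Python as follows.  This is an illustration of the concept only, not the implementation of pointPeriodic(); the function name wrap_periodic and the representation of bounds as (minimum, maximum) pairs are assumptions made for the example.

```python
def wrap_periodic(point, bounds, dimensionality="xy", periodicity="xy"):
    """Wrap the periodic coordinates of `point` back inside `bounds`.

    point:  one coordinate per dimension, in the order of `dimensionality`.
    bounds: one (minimum, maximum) pair per dimension, in the same order.
    """
    if dimensionality not in ("x", "xy", "xyz"):
        raise ValueError("dimensionality must be 'x', 'xy', or 'xyz'")
    if any(axis not in dimensionality for axis in periodicity):
        raise ValueError("periodic dimensions must be part of the dimensionality")
    wrapped = []
    for axis, value, (low, high) in zip(dimensionality, point, bounds):
        if axis in periodicity:
            # Leaving through one edge re-enters through the opposite edge.
            value = low + (value - low) % (high - low)
        wrapped.append(value)
    return tuple(wrapped)


unit_square = [(0.0, 1.0), (0.0, 1.0)]
print(wrap_periodic((1.25, -0.5), unit_square, "xy", "x"))   # cylinder
print(wrap_periodic((-0.25, 1.5), unit_square, "xy", "xy"))  # torus
```

In the first call only x is periodic, so the out-of-bounds y coordinate is left untouched and would still need to be handled by some other boundary condition.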

The doMutationRunExperiments parameter specifies whether SLiM should attempt to conduct experiments at runtime to determine the optimal number of mutation runs used in the model.  This is a performance optimization.  If doMutationRunExperiments is T (the default), this optimization is enabled for all chromosomes that do not have an explicitly specified mutation run count; this is generally desirable and may significantly improve performance.  If doMutationRunExperiments is F, this optimization is disabled and chromosomes that do not have an explicitly specified mutation run count will simply use a single mutation run.  See the documentation for initializeChromosome() for further discussion.  Note that this parameter used to be [integer$ mutationRuns = 0], specifying the mutation run count directly.  That parameter has been moved to initializeChromosome(), allowing a different mutation run count to be specified for each chromosome in multi-chromosome models.

If preventIncidentalSelfing is T, incidental selfing in hermaphroditic models will be prevented by SLiM.  By default (i.e., if preventIncidentalSelfing is F), SLiM chooses the first and second parents in a biparental mating event independently.  It is therefore possible for the same individual to be chosen as both the first and second parent, resulting in selfing events even when the selfing rate is zero.  In many models this is unimportant, since it happens fairly infrequently and does not have large consequences.  This behavior is SLiM’s default because it is the simplest option, and produces results that most closely align with simple analytical population genetics models.  However, in some models this selfing can be undesirable and problematic.  In particular, models that involve very high variance in fitness or very small effective population sizes may see elevated rates of selfing that substantially influence model results.  If preventIncidentalSelfing is set to T, all such incidental selfing will be prevented (by choosing a new second parent if the first parent was chosen again).  Non-incidental selfing, as requested by the selfing rate, will still be permitted.  Note that if incidental selfing is prevented, SLiM will hang if it is unable to find a different second parent; there must always be at least two individuals in the population with non-zero fitness, and mateChoice() and modifyChild() callbacks must not absolutely prevent those two individuals from producing viable offspring.  Enforcement of the prohibition on incidental selfing will occur after mateChoice() callbacks have been called (and thus the default mating weights provided to mateChoice() callbacks will not exclude the first parent!), but will occur before modifyChild() callbacks are called (so those callbacks may assume that the first and second parents are distinct).
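The redraw strategy described above can be sketched in Python as follows.  This is a simplified illustration, not SLiM's mating code: choose_parents is a hypothetical function, parents are drawn in proportion to a plain list of fitness values, and callbacks are ignored.  The explicit guard is an addition for the example; as noted above, SLiM itself will hang rather than raise an error.

```python
import random


def choose_parents(fitnesses, prevent_incidental_selfing=False, rng=random):
    """Return (first, second) parent indices drawn in proportion to fitness."""
    indices = range(len(fitnesses))
    if prevent_incidental_selfing and sum(1 for w in fitnesses if w > 0) < 2:
        # Without this guard the redraw loop below could never terminate.
        raise ValueError("need at least two individuals with non-zero fitness")
    # Both parents are drawn independently, so they may coincide.
    first = rng.choices(indices, weights=fitnesses)[0]
    second = rng.choices(indices, weights=fitnesses)[0]
    while prevent_incidental_selfing and second == first:
        # Incidental selfing: discard the draw and choose a new second parent.
        second = rng.choices(indices, weights=fitnesses)[0]
    return first, second


rng = random.Random(42)
print(choose_parents([5.0, 0.1, 0.1], prevent_incidental_selfing=True, rng=rng))
```

With one individual far fitter than the rest, as in this example, independent draws pick the same individual twice most of the time, which is the high-variance-in-fitness situation in which the option matters.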

If nucleotideBased is T, the model will be nucleotide-based.  In this case, auto-generated mutations (i.e., mutation types used by genomic element types) must be nucleotide-based, and an ancestral nucleotide sequence must be supplied with initializeAncestralNucleotides().  Non-nucleotide-based mutations may still be used, but may not be referenced by genomic element types.  A mutation rate (or rate map) may not be supplied with initializeMutationRate(); instead, a hotspot map may (optionally) be supplied with initializeHotspotMap().  This choice has many consequences across SLiM. 

If randomizeCallbacks is T (the default), the order in which individuals are processed in callbacks will be randomized to make it easier to avoid order-dependency bugs.  This flag exists because the order of individuals in each subpopulation is non-random; most notably, females always come before males in the individuals vector, but non-random ordering may also occur with respect to things like migrant versus non-migrant status, origin by selfing versus cloning versus biparental mating, and other factors.  When this option is F, individuals in a subpopulation are processed in the order of the individuals vector in each tick cycle stage, which may lead to order-dependency issues if there is an enabled callback whose behavior is not fully independent between calls.  Setting this option to T will cause individuals within each subpopulation to be processed in a randomized order in each tick cycle stage; specifically, this randomizes the order of calls to mutationEffect() callbacks in both WF and nonWF models, and calls to reproduction() and survival() callbacks in nonWF models.  Each subpopulation is still processed separately, in sequential order, so order-dependency issues between subpopulations are still possible if callbacks have effects that are not fully independent.  This feature was added in SLiM 4, breaking backward compatibility; to recover the behavior of previous versions of SLiM, pass F for this option (but then be very careful about order-dependency issues in your script).  The default of T is the safe option, but a small speed penalty is incurred by the randomization of the processing order – for most models the difference will be less than 1%, but in the worst case it may approach 10%.  Models that do not have any order-dependency issue may therefore run somewhat faster if this is set to F.  
Note that anywhere that your script uses the individuals property of Subpopulation, the order of individuals returned will be non-random (regardless of the setting of this option); you should use sample() to shuffle the order of the individuals vector if necessary to avoid order-dependency issues in your script.

This function will likely be extended with further options in the future, added on to the end of the argument list.  Using named arguments with this call is recommended for readability.  Note that turning on optional features may increase the runtime and memory footprint of SLiM.

(void)initializeSpecies([integer$ tickModulo = 1], [integer$ tickPhase = 1], [string$ avatar = ""], [string$ color = ""])

Configure options for the species being initialized.  This initialization function may only be called in multispecies models (i.e., models with explicit species declarations); in single-species models, the default values are assumed and cannot be changed.

The tickModulo and tickPhase parameters determine the activation schedule for the species.  The active property of the species will be set to T (thus activating the species) every tickModulo ticks, beginning in tick tickPhase.  (However, when the species is activated in a given tick, the skipTick() method may still be called in a first() event to deactivate it.)  See the active property of Species for more details.
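The activation schedule implied by tickModulo and tickPhase can be written as a small predicate.  The Python sketch below is a reading of the sentence above, not SLiM code; the function name species_is_activated is hypothetical, and it ignores any later deactivation by skipTick().

```python
def species_is_activated(tick, tick_modulo=1, tick_phase=1):
    """Whether the activation schedule turns the species on in `tick`."""
    if tick_modulo < 1 or tick_phase < 1:
        raise ValueError("tick_modulo and tick_phase must be positive")
    # Active in tick_phase, and then every tick_modulo ticks after that.
    return tick >= tick_phase and (tick - tick_phase) % tick_modulo == 0


# A species with tickModulo=3 and tickPhase=2, over the first eleven ticks.
print([t for t in range(1, 12) if species_is_activated(t, 3, 2)])
```

With the default values of 1 for both parameters the predicate is true in every tick, which corresponds to the behavior assumed in single-species models.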

The avatar parameter, if not "", sets a string value used to represent the species graphically, particularly in SLiMgui but perhaps in other contexts also.  The avatar should generally be a single character – usually an emoji corresponding to the species, such as "🦊" for foxes or "🐭" for mice.  If avatar is the empty string, "", SLiMgui will choose a default avatar.

The color parameter, if not "", sets a string color value used to represent the species in SLiMgui.  Colors may be specified by name, or with hexadecimal RGB values of the form "#RRGGBB" (see the Eidos manual for details).  If color is the empty string, "", SLiMgui will choose a default color.

(void)initializeTreeSeq([logical$ recordMutations = T], [Nif$ simplificationRatio = NULL], [Ni$ simplificationInterval = NULL], [logical$ checkCoalescence = F], [logical$ runCrosschecks = F], [logical$ retainCoalescentOnly = T], [Ns$ timeUnit = NULL])

Configure options for tree sequence recording.  Calling this function turns on tree sequence recording, as a side effect, for later reconstruction of the simulation’s evolutionary dynamics; if you do not want tree sequence recording to be enabled, do not call this function.  Note that tree-sequence recording internally uses SLiM’s “pedigree tracking” feature to uniquely identify individuals and haplosomes; however, if you want to use pedigree tracking in your script you must still enable it yourself with initializeSLiMOptions(keepPedigrees=T).  A separate tree sequence will be recorded for each chromosome in the simulation, as configured with initializeChromosome().

The recordMutations flag controls whether information about individual mutations is recorded or not.  Such recording takes time and memory, and so can be turned off if only the tree sequence itself is needed, but it is turned on by default since mutation recording is generally useful.

The simplificationRatio and simplificationInterval parameters control how often automatic simplification of the recorded tree sequence occurs.  This is a speed–memory tradeoff: more frequent simplification (lower simplificationRatio or smaller simplificationInterval) means the stored tree sequences will use less memory, but at a cost of somewhat longer run times.  Conversely, a larger simplificationRatio or simplificationInterval means that SLiM will wait longer between simplifications.  There are three ways these parameters can be used.  With the first option, with a non-NULL simplificationRatio and a NULL value for simplificationInterval, SLiM will try to find an optimal tick interval for simplification such that the ratio of the memory used by the tree sequence tables, (before:after) simplification, is close to the requested ratio. The default of 10 (used if both simplificationRatio and simplificationInterval are NULL) thus requests that SLiM try to find a tick interval such that the maximum size of the stored tree sequences is ten times the size after simplification. INF may be supplied to indicate that automatic simplification should never occur; 0 may be supplied to indicate that automatic simplification should be performed at the end of every tick.  Alternatively – the second option – simplificationRatio may be NULL and simplificationInterval may be set to the interval, in ticks, between simplifications.  This may provide more reliable performance, but the interval must be chosen carefully to avoid exceeding the available memory.  The simplificationInterval value may be a very large number to specify that simplification should never occur (not INF, though, since it is an integer value), or 1 to simplify every tick.  
Finally – the third option – both parameters may be non-NULL, in which case simplificationRatio is used as described above, while simplificationInterval provides the initial interval first used by SLiM (and then subsequently increased or decreased to try to match the requested simplification ratio).  The default initial interval, used when simplificationInterval is NULL, is usually 20; this is chosen to be relatively frequent, and thus unlikely to lead to a memory overflow, but it can result in rather slow spool-up for models where the equilibrium simplification interval, as determined by the simplification ratio, is much longer.  It can therefore be helpful to set a larger initial interval so that the early part of the model run is not excessively bogged down in simplification.

The checkCoalescence parameter controls whether a check for full coalescence is conducted after each simplification.  If a model will call treeSeqCoalesced() to check for coalescence during its execution, checkCoalescence should be set to T.  Since the coalescence checks entail a performance penalty, the default of F is preferable otherwise.  See the documentation for treeSeqCoalesced() for further discussion.

The runCrosschecks parameter controls whether cross-checks between SLiM’s internal data structures and the tree-sequence recording data structures will be conducted.  These two sets of data structures record much the same thing (mutations in haplosomes), but using completely different representations, so such cross-checks can be useful to confirm that the two data structures do indeed represent the same conceptual state.  This slows down the model considerably, however, and would normally be turned on only for debugging purposes, so it is turned off by default.

The retainCoalescentOnly parameter controls how, exactly, simplification of the tree-sequence data is performed in SLiM (both for auto-simplification and for calls to treeSeqSimplify()).  More specifically, this parameter controls the behavior of simplification for individuals and haplosomes that have been “retained” by calling treeSeqRememberIndividuals() with the parameter permanent=F.  The default of retainCoalescentOnly=T helps to keep the number of retained individuals relatively small, which is helpful if your simulation regularly flags many individuals for retaining.  In this case, changing retainCoalescentOnly to F may dramatically increase memory usage and runtime, in a similar way to permanently remembering all the individuals.  See the documentation of treeSeqRememberIndividuals() for further discussion.

The timeUnit parameter controls the time unit stated in the tree sequence when it is saved (which can be accessed through tskit APIs); it has no effect on the running simulation whatsoever.  The default value, NULL, means that a time unit of "ticks" will be used for all model types.  (In SLiM 3.7 / 3.7.1, NULL implied a time unit of "generations" for WF models, but "ticks" for nonWF models; given the new multispecies timescale parameters in SLiM 4, a default of "ticks" makes sense in all cases since now even in WF models one tick might not equal one biological generation.)  It may be helpful to set timeUnit to "generations" explicitly when modeling non-overlapping generations in which one tick equals one generation, to tell tskit that the time unit does in fact represent biological generations; doing so may avoid warnings from tskit or msprime regarding the time unit, in cases such as recapitation where the simulation timescale is important.

3.2.  Nucleotide utilities

(is)codonsToAminoAcids(integer codons, [li$ long = F], [logical$ paste = T])

Returns the amino acid sequence corresponding to the codon sequence in codons.  Codons should be represented with values in [0, 63] where AAA is 0, AAC is 1, AAG is 2, and TTT is 63; see ancestralNucleotides() for discussion of this encoding.  If long is F (the default), the standard single-letter codes for amino acids will be used (where Serine is "S", etc.); if long is T, the standard three-letter codes will be used instead (where Serine is "Ser", etc.).  Beginning in SLiM 3.5, if long is 0, integer codes will be used as follows (and paste will be ignored):

stop (TAA, TAG, TGA) 0
Alanine 1
Arginine 2
Asparagine 3
Aspartic acid (Aspartate) 4
Cysteine 5
Glutamine 6
Glutamic acid (Glutamate) 7
Glycine 8
Histidine 9
Isoleucine 10
Leucine 11
Lysine 12
Methionine 13
Phenylalanine 14
Proline 15
Serine 16
Threonine 17
Tryptophan 18
Tyrosine 19
Valine 20

There does not seem to be a widely used standard for integer coding of amino acids, so SLiM just numbers them alphabetically, making stop codons 0.  If you want a different coding, you can make your own 64-element vector and use it to convert codons to whatever integer codes you need.  Other integer values of long are reserved for future use (to support other codings), and will currently produce an error.

When long is T or F and paste is T (the default), the amino acid sequence returned will be a singleton string, such as "LYATI" (when long is F) or "Leu-Tyr-Ala-Thr-Ile" (when long is T).  When long is T or F and paste is F, the amino acid sequence will instead be returned as a string vector, with one element per amino acid, such as "L" "Y" "A" "T" "I" (when long is F) or "Leu" "Tyr" "Ala" "Thr" "Ile" (when long is T).  Using the paste=T option is considerably faster than using paste() in script.

This function interprets the supplied codon sequence as the sense strand (i.e., the strand that is not transcribed, and which mirrors the mRNA’s sequence).  This uses the standard DNA codon table directly.  For example, if the nucleotide sequence is CAA TTC, that will correspond to a codon vector of 16 61, and will result in the amino acid sequence Gln-Phe ("QF").
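The worked example above can be reproduced with a short Python sketch of the codon encoding and the standard DNA codon table.  This is an illustration, not SLiM's implementation; the function names are hypothetical, and the use of "*" for stop codons is a choice made for this example rather than a statement about SLiM's output.

```python
NUCLEOTIDE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
# Standard DNA codon table, indexed by codon value 0 (AAA) through 63 (TTT);
# "*" marks the stop codons TAA, TAG, and TGA (an illustrative choice).
AMINO_ACIDS = "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"
# Integer codes from the table above: stop is 0, then alphabetical by name.
INTEGER_CODES = {aa: i for i, aa in enumerate("*ARNDCQEGHILKMFPSTWYV")}


def nucleotides_to_codons(sequence):
    """Encode a nucleotide string as codon values in [0, 63]."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = []
    for i in range(0, len(sequence), 3):
        n1, n2, n3 = (NUCLEOTIDE_CODES[base] for base in sequence[i:i + 3])
        codons.append(16 * n1 + 4 * n2 + n3)
    return codons


def codons_to_amino_acids(codons, as_integers=False):
    """Translate codon values to single-letter or integer amino acid codes."""
    letters = [AMINO_ACIDS[codon] for codon in codons]
    if as_integers:
        return [INTEGER_CODES[aa] for aa in letters]
    return "".join(letters)


codons = nucleotides_to_codons("CAATTC")
print(codons)                                            # [16, 61]
print(codons_to_amino_acids(codons))                     # QF
print(codons_to_amino_acids(codons, as_integers=True))   # [6, 14]
```

The integer result matches the table above, in which Glutamine is 6 and Phenylalanine is 14.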

(is)codonsToNucleotides(integer codons, [string$ format = "string"])

Returns the nucleotide sequence corresponding to the codon sequence supplied in codons.  Codons should be represented with values in [0, 63] where AAA is 0, AAC is 1, AAG is 2, and TTT is 63; see ancestralNucleotides() for discussion of this encoding.

The format parameter controls the format of the returned sequence.  It may be "string" to obtain the sequence as a singleton string (e.g., "TATACG"), "char" to obtain it as a string vector of single characters (e.g., "T", "A", "T", "A", "C", "G"), or "integer" to obtain it as an integer vector (e.g., 3, 0, 3, 0, 1, 2), using SLiM’s standard code of A=0, C=1, G=2, T=3.
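The decoding of codon values into the three output formats can be sketched in Python as follows.  The function name codons_to_nucleotides_sketch and the parameter name fmt are hypothetical; the sketch relies only on the encoding stated above, in which AAA is 0 and TTT is 63.

```python
def codons_to_nucleotides_sketch(codons, fmt="string"):
    """Decode codon values in [0, 63] into a nucleotide sequence."""
    values = []
    for codon in codons:
        if not 0 <= codon <= 63:
            raise ValueError("codon values must be in [0, 63]")
        # Each codon is a three-digit base-4 number: first, second, third base.
        values.extend([codon // 16, (codon // 4) % 4, codon % 4])
    if fmt == "integer":
        return values
    chars = ["ACGT"[v] for v in values]
    if fmt == "char":
        return chars
    if fmt == "string":
        return "".join(chars)
    raise ValueError("fmt must be 'string', 'char', or 'integer'")


print(codons_to_nucleotides_sketch([51, 6]))              # TATACG
print(codons_to_nucleotides_sketch([51, 6], "char"))
print(codons_to_nucleotides_sketch([51, 6], "integer"))   # [3, 0, 3, 0, 1, 2]
```

The codon values 51 and 6 correspond to TAT and ACG, so the three calls reproduce the example sequence given above in each format.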

(float)mm16To256(float mutationMatrix16)

Returns a 64×4 mutation matrix that is functionally identical to the supplied 4×4 mutation matrix in mutationMatrix16.  The mutation rate for each of the 64 trinucleotides will depend only upon the central nucleotide of the trinucleotide, and will be taken from the corresponding entry for the same nucleotide in mutationMatrix16.  This function can be used to easily construct a simple trinucleotide-based mutation matrix which can then be modified so that specific trinucleotides sustain a mutation rate that does not depend only upon their central nucleotide.

See the documentation for initializeGenomicElementType() for further discussion of how these 64×4 mutation matrices are interpreted and used.
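The expansion can be sketched in Python as follows.  This sketch assumes that the 64 rows are ordered like codon values (AAA, AAC, ..., TTT) and that the matrix is represented as a list of rows; both are assumptions of the illustration, and the name mm16_to_256 is hypothetical.

```python
def mm16_to_256(mm16):
    """Expand a 4x4 matrix (rows = original nucleotide, columns = derived
    nucleotide) into a 64x4 matrix with one row per trinucleotide."""
    # For trinucleotide index 16*X + 4*Y + Z, the central nucleotide is Y.
    return [list(mm16[(tri // 4) % 4]) for tri in range(64)]


# Example 4x4 matrix with a distinct rate in each row, for easy checking.
example = [[0.0, 1e-7, 1e-7, 1e-7],
           [2e-7, 0.0, 2e-7, 2e-7],
           [3e-7, 3e-7, 0.0, 3e-7],
           [4e-7, 4e-7, 4e-7, 0.0]]
expanded = mm16_to_256(example)

# Row 1 is AAC (central A); row 4 is ACA (central C).
print(expanded[1] == example[0], expanded[4] == example[1])
```

Because each row is copied, an individual trinucleotide row can afterwards be modified without affecting the others.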

(float)mmJukesCantor(float$ alpha)

Returns a mutation matrix representing a Jukes–Cantor (1969) model with mutation rate alpha to each possible alternative nucleotide at a site.  This 4×4 matrix is suitable for use with initializeGenomicElementType().  Note that the actual mutation rate produced by this matrix is 3*alpha, since each nucleotide can mutate to any of three alternatives.

(float)mmKimura(float$ alpha, float$ beta)

Returns a mutation matrix representing a Kimura (1980) model with transition rate alpha and transversion rate beta.  This 4×4 matrix is suitable for use with initializeGenomicElementType().  Note that the actual mutation rate produced by this model is alpha+2*beta, since each nucleotide has one possible transition and two possible transversions.
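The structure of these two matrices can be sketched in Python.  The sketch assumes rows index the original nucleotide and columns the derived nucleotide, in A, C, G, T order, with zeros on the diagonal; the function names are hypothetical.  Transitions are the purine and pyrimidine swaps A<->G and C<->T.

```python
def mm_jukes_cantor(alpha):
    """4x4 matrix: rate alpha to each alternative nucleotide, 0 on the diagonal."""
    return [[0.0 if i == j else alpha for j in range(4)] for i in range(4)]


def mm_kimura(alpha, beta):
    """4x4 matrix: rate alpha for transitions, beta for transversions."""
    transition_partner = {0: 2, 2: 0, 1: 3, 3: 1}  # A<->G, C<->T
    return [[0.0 if i == j
             else (alpha if transition_partner[i] == j else beta)
             for j in range(4)]
            for i in range(4)]


jc = mm_jukes_cantor(1e-7)
k80 = mm_kimura(2e-7, 5e-8)

# Each row sums to the total per-site rate: 3*alpha, or alpha + 2*beta.
print([sum(row) for row in jc])
print([sum(row) for row in k80])
```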

(integer)nucleotideCounts(is sequence)

A convenience function that returns an integer vector of length four, providing the number of occurrences of A / C / G / T nucleotides, respectively, in the supplied nucleotide sequence.  The parameter sequence may be a singleton string (e.g., "TATA"), a string vector of single characters (e.g., "T", "A", "T", "A"), or an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.

(float)nucleotideFrequencies(is sequence)

A convenience function that returns a float vector of length four, providing the frequencies of occurrences of A / C / G / T nucleotides, respectively, in the supplied nucleotide sequence.  The parameter sequence may be a singleton string (e.g., "TATA"), a string vector of single characters (e.g., "T", "A", "T", "A"), or an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.
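The behavior of nucleotideCounts() and nucleotideFrequencies() across the three input formats can be illustrated with a small Python sketch; the function names below are hypothetical, and the sketch does not attempt to handle an empty sequence.

```python
NUC = {"A": 0, "C": 1, "G": 2, "T": 3}


def nucleotide_counts(sequence):
    """Count A/C/G/T in a string, a list of characters, or a list of 0..3."""
    counts = [0, 0, 0, 0]
    for x in sequence:  # iterating a str yields single characters
        counts[NUC[x] if isinstance(x, str) else int(x)] += 1
    return counts


def nucleotide_frequencies(sequence):
    """Counts divided by the sequence length (non-empty sequences only)."""
    counts = nucleotide_counts(sequence)
    total = sum(counts)
    return [c / total for c in counts]


print(nucleotide_counts("TATA"))              # [2, 0, 0, 2]
print(nucleotide_frequencies([3, 0, 3, 0]))   # [0.5, 0.0, 0.0, 0.5]
```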

(integer)nucleotidesToCodons(is sequence)

Returns the codon sequence corresponding to the nucleotide sequence in sequence.  The codon sequence is an integer vector with values from 0 to 63, based upon successive nucleotide triplets in the nucleotide sequence.  The codon value for a given nucleotide triplet XYZ is 16X + 4Y + Z, where X, Y, and Z have the usual values A=0, C=1, G=2, T=3.  For example, the triplet AAA has a codon value of 0, AAC is 1, AAG is 2, AAT is 3, ACA is 4, and on upward to TTT which is 63.  If the nucleotide sequence AACACATTT is passed in, the codon vector 1 4 63 will therefore be returned.  These codon values can be useful in themselves; they can also be passed to codonsToAminoAcids() to translate them into the corresponding amino acid sequence if desired.

The nucleotide sequence in sequence may be supplied in any of three formats: a string vector with single-letter nucleotides (e.g., "T", "A", "T", "A"), a singleton string of nucleotide letters (e.g., "TATA"), or an integer vector of nucleotide values (e.g., 3, 0, 3, 0) using SLiM’s standard code of A=0, C=1, G=2, T=3.  If the choice of format is not driven by other considerations, such as ease of manipulation, then the singleton string format will certainly be the most memory-efficient for long sequences, and will probably also be the fastest.  The nucleotide sequence provided must be a multiple of three in length, so that it translates to an integral number of codons.
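The encoding 16X + 4Y + Z and the AACACATTT example can be checked with a minimal Python sketch; the function name nucleotides_to_codons is a hypothetical stand-in for the Eidos function.

```python
NUC = {"A": 0, "C": 1, "G": 2, "T": 3}


def nucleotides_to_codons(sequence):
    """Convert a nucleotide sequence (string, list of characters, or list
    of integers 0..3) into codon integers in [0, 63]."""
    codes = [NUC[x] if isinstance(x, str) else int(x) for x in sequence]
    if len(codes) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [16 * codes[i] + 4 * codes[i + 1] + codes[i + 2]
            for i in range(0, len(codes), 3)]


print(nucleotides_to_codons("AACACATTT"))  # [1, 4, 63]
```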

(is)randomNucleotides(integer$ length, [Nif basis = NULL], [string$ format = "string"])

Generates a new random nucleotide sequence with length bases.  The four nucleotides ACGT are equally probable if basis is NULL (the default); otherwise, basis may be a 4-element integer or float vector providing relative fractions for A, C, G, and T respectively (these need not sum to 1.0, as they will be normalized).  More complex generative models such as Markov processes are not supported intrinsically in SLiM at this time, but arbitrary generated sequences may always be loaded from files on disk.

The format parameter controls the format of the returned sequence.  It may be "string" to obtain the generated sequence as a singleton string (e.g., "TATA"), "char" to obtain it as a string vector of single characters (e.g., "T", "A", "T", "A"), or "integer" to obtain it as an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.  For passing directly to initializeAncestralNucleotides(), format "string" (a singleton string) will certainly be the most memory-efficient, and probably also the fastest.  Memory efficiency can be a significant consideration; the nucleotide sequence for a chromosome of length 10^9 will occupy approximately 1 GB of memory when stored as a singleton string (with one byte per nucleotide), and much more if stored in the other formats.  However, the other formats can be easier to work with in Eidos, and so may be preferable for relatively short chromosomes if you are manipulating the generated sequence.

3.3.  Population genetics utilities

(float$)calcFST(object<Haplosome> haplosomes1, object<Haplosome> haplosomes2, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates the FST between two Haplosome vectors – typically, but not necessarily, the haplosomes that constitute two different subpopulations (which we will assume for the purposes of this discussion).  In general, higher FST indicates greater genetic divergence between subpopulations.

The calculation is done using only the mutations in muts; if muts is NULL, all mutations are used.  The muts parameter can therefore be used to calculate the FST only for a particular mutation type (by passing only mutations of that type).

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide FST, which is often used to assess the overall level of genetic divergence between sister species or allopatric subpopulations.

The code for calcFST() is, roughly, an Eidos implementation of Wright’s definition of FST (but see below for further discussion and clarification):

FST = 1 - HS / HT

where HS is the average heterozygosity in the two subpopulations, and HT is the total heterozygosity when both subpopulations are combined.  In this implementation, the two haplosome vectors are weighted equally, not weighted by their size.  In SLiM 3, the implementation followed Wright’s definition closely, and returned the average of ratios: mean(1.0 - H_s/H_t), in the Eidos code.  In SLiM 4, it returns the ratio of averages instead: 1.0 - mean(H_s)/mean(H_t).  In other words, the FST value reported by SLiM 4 is an average across the specified mutations in the two sets of haplosomes, where H_s and H_t are first averaged across all specified mutations prior to taking the ratio of the two.  This ratio of averages is less biased than the average of ratios, and is generally considered to be best practice (see, e.g., Bhatia et al., 2013).  This means that the behavior of calcFST() differs between SLiM 3 and SLiM 4.

As can be seen from its equation, the FST is undefined if HT is zero, which occurs if no mutations are present in the haplosomes provided (given the optionally specified window and set of mutations).  In that case, calcFST() will return NAN.  It is up to the caller to detect this with isNAN() and handle it as necessary.

The implementation of calcFST(), viewable with functionSource(), treats every mutation in muts as independent in the heterozygosity calculations; in other words, if mutations are stacked, the heterozygosity calculated is by mutation, not by site.  Similarly, if multiple Mutation objects exist in different haplosomes at the same site (whether representing different genetic states, or multiple mutational lineages for the same genetic state), each Mutation object is treated separately for purposes of the heterozygosity calculation, just as if they were at different sites.  One could regard these choices as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of these choices will be negligible; however, in some models these distinctions may be important.
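The difference between the ratio of averages and the average of ratios can be seen in a small Python sketch.  It assumes that per-mutation frequencies in each haplosome set are already known, that the two sets are weighted equally, and that per-mutation heterozygosity is computed as 2p(1-p); the function names are hypothetical, and the built-in source shown by functionSource() remains the authoritative definition.

```python
import math


def heterozygosities(p1, p2):
    """Per-mutation H_s and H_t for two equally weighted haplosome sets."""
    h_s, h_t = [], []
    for a, b in zip(p1, p2):
        mean_p = (a + b) / 2.0
        h_t.append(2.0 * mean_p * (1.0 - mean_p))
        h_s.append(a * (1.0 - a) + b * (1.0 - b))
    return h_s, h_t


def fst_ratio_of_averages(p1, p2):
    h_s, h_t = heterozygosities(p1, p2)
    if sum(h_t) == 0.0:
        return math.nan  # FST is undefined when H_T is zero
    return 1.0 - sum(h_s) / sum(h_t)  # same as 1 - mean(H_s)/mean(H_t)


def fst_average_of_ratios(p1, p2):
    h_s, h_t = heterozygosities(p1, p2)
    ratios = [1.0 - s / t for s, t in zip(h_s, h_t) if t > 0.0]
    return sum(ratios) / len(ratios) if ratios else math.nan


# One mutation fixed in set 1 only, one at frequency 0.1 in both sets.
p1, p2 = [1.0, 0.1], [0.0, 0.1]
print(fst_ratio_of_averages(p1, p2))  # close to 1 - 0.18/0.68
print(fst_average_of_ratios(p1, p2))  # close to 0.5
```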

(float$)calcHeterozygosity(object<Haplosome> haplosomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates the heterozygosity for a vector of haplosomes (containing at least one element), based upon the frequencies of mutations in the haplosomes.  The result is the expected heterozygosity, for the individuals to which the haplosomes belong, assuming that they are under Hardy-Weinberg equilibrium; this can be compared to the observed heterozygosity of an individual, as calculated by calcPairHeterozygosity().  Often haplosomes will be all of the haplosomes in a subpopulation, or in the entire population, but any haplosome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

In multi-chromosome models, all of the haplosomes and mutations passed in haplosomes and muts must all be associated with the same single chromosome.  If you wish to calculate heterozygosity across multiple chromosomes, you can simply write a for loop that calculates it for each chromosome and combines the results; but it is not entirely clear how to weight the chromosomes to produce a single number, especially when sex chromosomes and other chromosomes of variable ploidy might be represented in haplosomes, so it is not done automatically by this function.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide heterozygosity.

The implementation of calcHeterozygosity(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this choice will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.
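A minimal Python sketch of this kind of calculation follows.  It assumes that the expected heterozygosity is the sum of 2p(1-p) over mutations, divided by the length of the window or chromosome, with every mutation treated independently; the function name and arguments are hypothetical, and functionSource() shows the actual implementation.

```python
def expected_heterozygosity(frequencies, length):
    """Per-site expected heterozygosity under Hardy-Weinberg assumptions.

    frequencies: frequency of each mutation among the haplosomes
    length: number of sites in the window (or whole chromosome)
    """
    return sum(2.0 * p * (1.0 - p) for p in frequencies) / length


# Two mutations, at frequencies 0.5 and 0.1, on a 100-site chromosome:
# (0.5 + 0.18) / 100
print(expected_heterozygosity([0.5, 0.1], 100))
```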

(float$)calcInbreedingLoad(object<Haplosome> haplosomes, [Nio<MutationType>$ mutType = NULL])

Calculates inbreeding load (the haploid number of lethal equivalents, or B) for a vector of haplosomes (containing at least one element) passed in haplosomes.  The calculation can be limited to a focal mutation type passed in mutType (which may be either an integer representing the ID of the desired mutation type, or a MutationType object specified directly); if mutType is NULL (the default), all of the mutations for the focal species will be considered.  In any case, only deleterious mutations (those with a negative selection coefficient) will be included in the final calculation.

The inbreeding load is a measure of the quantity of recessive deleterious variation that is heterozygous in a population and can contribute to fitness declines under inbreeding.  This function implements the following equation from Morton et al. (1956), which assumes no epistasis and random mating:

B = sum(q*s) − sum(q^2*s) − 2*sum(q*(1−q)*s*h)

where q is the frequency of a given deleterious allele, s is the absolute value of the selection coefficient, and h is its dominance coefficient.  Note that the implementation, viewable with functionSource(), sets a maximum |s| of 1.0 (i.e., a lethal allele); |s| can sometimes be greater than 1.0 when s is drawn from a distribution, but in practice an allele with s < -1.0 has the same lethal effect as when s = -1.0.  Also note that this implementation will not work when the model changes the dominance coefficients of mutations using mutationEffect() callbacks, since it relies on the dominanceCoeff property of MutationType. Finally, note that, to estimate the diploid number of lethal equivalents (2B), the result from this function can simply be multiplied by two.
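The equation, together with the cap on |s| and the restriction to deleterious mutations, can be sketched in Python.  The representation of each mutation as a (q, s, h) tuple and the function name are assumptions of this sketch.

```python
def inbreeding_load(mutations):
    """Haploid lethal equivalents B from (q, s, h) tuples, where q is the
    allele frequency, s the signed selection coefficient, and h the
    dominance coefficient."""
    b = 0.0
    for q, s, h in mutations:
        if s >= 0.0:
            continue                 # only deleterious mutations count
        s_abs = min(abs(s), 1.0)     # an allele cannot be worse than lethal
        b += q * s_abs - q * q * s_abs - 2.0 * q * (1.0 - q) * s_abs * h
    return b


muts = [(0.1, -0.5, 0.0),    # contributes 0.05 - 0.005 - 0 = 0.045
        (0.2, -2.0, 0.25),   # |s| capped at 1: 0.2 - 0.04 - 0.08 = 0.08
        (0.3, 0.1, 0.5)]     # beneficial, ignored
b = inbreeding_load(muts)
print(b, 2.0 * b)  # haploid B, and the diploid equivalent 2B
```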

This function was contributed by Chris Kyriazis; thanks, Chris!

(float$)calcMeanFroh(object<Individual> individuals, [integer$ minimumLength = 1000000], [Niso<Chromosome>$ chromosome = NULL])

Calculates the mean value of the Froh statistic across the individuals passed in individuals.  This statistic is a measure of individual autozygosity, likely resulting from inbreeding, and is calculated based upon “runs of homozygosity”, or ROH, in the genome of an individual.  Broadly speaking, Froh is the proportion of an individual’s genome that is spanned by ROH longer than a given threshold length.  However, it should be noted that there are many different ways of calculating Froh, producing different results.  For example, the threshold length might be a given constant, or might be determined statistically from the characteristics of the population.  Furthermore, some heterozygous sites might be discarded (to compensate for genotyping errors), a minimum SNP density might be required within a sliding window for an ROH to be diagnosed, and so forth – it can get quite complex, as seen in the software PLINK (Purcell et al., 2007) and GARLIC (Szpiech, Blant and Pemberton, 2017).  The method used by calcMeanFroh() is the simplest possible method, assessing ROH for each individual directly from the simulated mutations without filtering or modification, and applying a given constant threshold length.  If a more sophisticated Froh algorithm is desired, one could modify the implementation of calcMeanFroh(), which is viewable with functionSource(), or one could output VCF data from SLiM and analyze it with other tools, perhaps calling out from the running SLiM script with system().

The threshold ROH length used by calcMeanFroh() is supplied by the parameter minimumLength.  It defaults to 1e6, or 1 Mbp, since that is a length commonly used in the literature, but can be adjusted as desired.

The chromosome parameter can be supplied to focus the Froh calculation on a specific chromosome; otherwise, the calculation spans all chromosomes for which the individual is actually diploid (without a null haplosome).  If Froh cannot be calculated for an individual (due to the presence of null haplosomes for every intrinsically diploid chromosome being analyzed), that individual is omitted from the mean Froh calculation; for example, if an X chromosome is the focal chromosome being analyzed, all males will be omitted from the mean Froh calculation.  If all individuals are omitted from the mean Froh calculation for this reason, NAN is returned.
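The constant-threshold approach can be illustrated with a simplified Python sketch for one individual and one chromosome.  It assumes that an ROH is a maximal stretch of positions between heterozygous sites (or chromosome ends), and that its length is the number of positions strictly between them; these boundary conventions, like the function name, are assumptions of the sketch and may differ from the built-in implementation.

```python
def froh_sketch(het_positions, chrom_length, minimum_length=1_000_000):
    """Fraction of a chromosome covered by runs of homozygosity at least
    minimum_length positions long.

    het_positions: positions (0-based) where the two haplosomes differ
    chrom_length: total number of positions on the chromosome
    """
    # Sentinels at -1 and chrom_length let the chromosome ends bound a run.
    boundaries = [-1] + sorted(het_positions) + [chrom_length]
    covered = 0
    for left, right in zip(boundaries, boundaries[1:]):
        run = right - left - 1       # homozygous positions strictly between
        if run >= minimum_length:
            covered += run
    return covered / chrom_length


# Runs of 2,000,000 and 6,499,999 positions pass the 1 Mbp threshold;
# runs of 499,999 and 999,999 positions do not.
value = froh_sketch([2_000_000, 2_500_000, 9_000_000], 10_000_000)
print(value)
```

A mean across individuals would then average such values, skipping individuals for which the statistic cannot be computed.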

This function was developed with advice from Ryan Chaffee.  Thanks, Ryan!

(float$)calcPairHeterozygosity(object<Haplosome>$ haplosome1, object<Haplosome>$ haplosome2, [Ni$ start = NULL], [Ni$ end = NULL], [logical$ infiniteSites = T])

Calculates the heterozygosity for a pair of haplosomes; these will typically be two homologous haplosomes of the same diploid individual, but any two haplosomes associated with the same chromosome may be supplied.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide heterozygosity.

The implementation of calcPairHeterozygosity(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations by default (i.e., with infiniteSites=T).  If mutations are stacked, the heterozygosity calculated therefore depends upon the number of unshared mutations, not the number of differing sites.  Similarly, if multiple Mutation objects exist in different haplosomes at the same site (whether representing different genetic states, or multiple mutational lineages for the same genetic state), each Mutation object is treated separately for purposes of the heterozygosity calculation, just as if they were at different sites.  One could regard these choices as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this choice will be negligible; however, in some models this distinction may be important.  The behavior of calcPairHeterozygosity() can be switched to calculate based upon the number of differing sites, rather than the number of unshared mutations, by passing infiniteSites=F.
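The distinction between counting unshared mutations and counting differing sites can be made concrete with a Python sketch.  Each haplosome is represented here as a set of (position, mutation id) tuples, and the result is divided by the window length; the representation and the function name are assumptions of this sketch.

```python
def pair_heterozygosity(hap1, hap2, length, infinite_sites=True):
    """Per-site heterozygosity between two haplosomes.

    hap1, hap2: sets of (position, mutation_id) tuples
    length: number of sites in the window (or whole chromosome)
    """
    unshared = hap1 ^ hap2  # mutations carried by exactly one haplosome
    if infinite_sites:
        return len(unshared) / length           # count unshared mutations
    return len({pos for pos, _ in unshared}) / length  # count differing sites


# Mutation 'a' is shared; 'b' and 'c' are stacked at position 20 in the
# first haplosome, and 'd' sits at the same position in the second.
h1 = {(10, "a"), (20, "b"), (20, "c")}
h2 = {(10, "a"), (20, "d")}
print(pair_heterozygosity(h1, h2, 100))                        # 3 mutations / 100
print(pair_heterozygosity(h1, h2, 100, infinite_sites=False))  # 1 site / 100
```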

(float$)calcPi(object<Haplosome> haplosomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates π (nucleotide diversity, a metric of genetic diversity) for a vector of haplosomes (containing at least two elements), based upon the mutations in the haplosomes.  π is computed by calculating the mean number of pairwise differences at each site, summing across all sites, and dividing by the number of sites.  Therefore, it is interpretable as the number of differences per site expected between two randomly chosen sequences.  The mathematical formulation (as an estimator of the population parameter θ) is based on work in Nei and Li (1979), Nei and Tajima (1981), and Tajima (1983; equation A3).  The exact formula used here is common in textbooks (e.g., equations 9.1–9.5 in Li 1997, equation 3.3 in Hahn 2018, or equation 2.2 in Coop 2020).

Often haplosomes will be all of the haplosomes in a subpopulation, or in the entire population, but any haplosome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide value of π.

The implementation of calcPi(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations, as with calcHeterozygosity().  Indeed, finite-sites models of π have been derived (Tajima 1996), though they are not used here.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this assumption will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.  This function was written by Nick Bailey (currently affiliated with CNRS and the Laboratory of Biometry and Evolutionary Biology at University Lyon 1), with helpful input from Peter Ralph and Chase Nelson.
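The textbook estimator can be sketched in Python from the number of haplosomes carrying each mutation.  A mutation carried by k of n haplosomes differs in k(n-k) of the n(n-1)/2 possible pairs; summing over mutations and dividing by the number of pairs and the number of sites gives π.  The function name and arguments are hypothetical, and every mutation is treated as its own site, as in the infinite-sites interpretation described above.

```python
def calc_pi(counts, n, length):
    """Nucleotide diversity from per-mutation haplosome counts.

    counts: number of haplosomes (out of n) carrying each mutation
    n: number of haplosomes sampled (at least two)
    length: number of sites in the window (or whole chromosome)
    """
    if n < 2:
        raise ValueError("at least two haplosomes are required")
    pairs = n * (n - 1) / 2
    differences = sum(k * (n - k) for k in counts)
    return differences / pairs / length


# Four haplosomes, ten sites, one singleton and one doubleton:
# (1*3 + 2*2) / 6 pairs / 10 sites = 7/60
print(calc_pi([1, 2], 4, 10))
```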

(numeric)calcSFS([Ni$ binCount = NULL], [No<Haplosome> haplosomes = NULL], [No<Mutation> muts = NULL], [string$ metric = "density"], [logical$ fold = F])

Calculates the site frequency spectrum, or SFS, for the mutations specified by muts, within the haplosomes specified by haplosomes.  The site frequency spectrum or SFS (sometimes called the allele frequency spectrum, although some authors distinguish between the two) is essentially a histogram of the frequencies of the mutations within the haplosomes; the first bin spans the lowest range of frequencies (down to a frequency of 0.0, or a count of 1), whereas the last bin spans the highest range of frequencies (up to a frequency of 1.0, or a count equal to the number of haplosomes minus one).  The idea was introduced by Watterson (1975), and will be discussed in any population genetics textbook (e.g., A. Cutter, 2019, pp. 50–52).  This histogram can be returned as a float vector of density values for each bin by specifying "density" for metric (the default), or as an integer vector of count values for each bin by specifying "count".

There are two modes of operation for calcSFS().  If a specific number of bins is passed for binCount, then the frequency range [0.0, 1.0] is subdivided into binCount intervals of equal width, and the mutations are tallied into those bins according to their frequencies within the haplosomes to produce the histogram.  In this mode, there will be exactly binCount elements in the returned vector.  Note that either "density" or "count" can be chosen in this mode; you can return the frequency bin tallies as either densities or counts.

In the other mode of operation, chosen with a binCount value of NULL, the bins instead represent the count of the number of occurrences for each mutation, and range from a count of 1 (the bin for mutations that occur only once in the haplosomes, sometimes called “singletons”) up to a count of N-1 where N is the number of haplosomes.  (Note that mutations occurring in all N haplosomes are not included in the tally, since they would not be empirically observable.)  In this mode, there will be exactly N-1 elements in the returned vector.  Again, either "density" or "count" can be chosen in this mode; you can return the count bin tallies as either densities or counts (it’s a bit confusing, but we’re talking about two different kinds of “counts”, the count of the number of times a mutation occurs in the haplosomes versus the count of the number of mutations that were tallied into a particular count bin).
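The count-binned mode can be illustrated with a Python sketch.  It assumes densities are the bin tallies normalized to sum to 1.0, and it returns all-zero densities when no mutation is tallied; both are choices made for this sketch, and the function name is hypothetical.

```python
def sfs_by_count(mutation_counts, n, metric="density"):
    """Site frequency spectrum with one bin per occurrence count 1..n-1.

    mutation_counts: number of haplosomes (out of n) carrying each mutation
    n: number of haplosomes
    """
    bins = [0] * (n - 1)
    for k in mutation_counts:
        if 1 <= k <= n - 1:      # absent (0) and fixed (n) are not tallied
            bins[k - 1] += 1
    if metric == "count":
        return bins
    total = sum(bins)
    return [b / total for b in bins] if total else [0.0] * (n - 1)


# Five haplosomes; the mutations with counts 0 and 5 are excluded.
counts = [1, 1, 2, 4, 5, 0]
print(sfs_by_count(counts, 5, "count"))  # [2, 1, 0, 1]
print(sfs_by_count(counts, 5))           # [0.5, 0.25, 0.0, 0.25]
```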

The haplosomes parameter can be either a vector of Haplosome objects or NULL.  If NULL is passed, calcSFS() will calculate the SFS across the whole species, using all non-null haplosomes present (and thus there must be only a single species in the model, since an SFS cannot be calculated across multiple species).  Otherwise, haplosomes can contain any set of haplosomes desired, such as from the individuals of one subpopulation, several subpopulations, or an entire species.  However, they must all belong to the same species, and null haplosomes will be automatically and silently excluded from the set.

The muts parameter can be either a vector of Mutation objects or NULL.  If NULL is passed, calcSFS() will calculate the SFS across all mutations belonging to the focal species (as determined from the species of the haplosomes).  Otherwise, muts can contain any set of mutations desired, such as mutations belonging to a specific mutation type, mutations within a specific range of positions along the chromosome, or all of the mutations in the focal species.

The binCount and metric parameters have already been discussed.  Finally, the fold parameter, if T, “folds” the calculated SFS, adding the first and last bins, the second and next-to-last bins, etc., until the center is reached.  Folding is common when working with empirical data, where one often doesn’t know the “polarity” – which allele at a site is ancestral and which is derived.  Folding solves this problem, because the polarity then doesn’t matter; the tally for a given mutation ends up in the same bin regardless.  If the number of bins is even, folding can be performed without ambiguity; the final number of bins is exactly half the original number of bins, and each final bin is the sum of two original bins.  If the number of bins is odd, the correct treatment of the central bin is somewhat ambiguous.  In calcSFS(), the central bin is added to itself – doubled – and the number of bins is equal to half the original number of bins rounded up.  If you would prefer to exclude the central bin altogether – another common treatment – then when the original number of bins is odd, you can simply discard the final value in the returned vector (and, if you wish to work with densities rather than counts, re-normalize the result to sum to 1.0).
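The folding rule described above, including the doubling of the central bin when the number of bins is odd, can be sketched in Python; the function name fold_sfs is hypothetical.

```python
def fold_sfs(bins):
    """Fold an SFS: add first and last bins, second and next-to-last, etc.
    With an odd number of bins, the central bin is added to itself."""
    m = len(bins)
    folded = [bins[i] + bins[m - 1 - i] for i in range(m // 2)]
    if m % 2 == 1:
        folded.append(2 * bins[m // 2])
    return folded


print(fold_sfs([2, 1, 0, 1]))   # even: [3, 1]
print(fold_sfs([5, 3, 2]))      # odd: [7, 6], central bin doubled

# To exclude the central bin when the original bin count is odd,
# drop the final value of the folded result.
print(fold_sfs([5, 3, 2])[:-1])  # [7]
```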

The implementation of calcSFS(), viewable with functionSource(), tallies each mutation separately, even if more than one mutation occurs at the same position (or is even stacked with another mutation).  One could regard this choice as embodying an infinite-sites interpretation of the SFS, perhaps; in any case, it follows SLiM’s behavior in other population-genetics utility functions.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this assumption will be negligible; however, in some models this distinction may be important.

This function is compatible with multi-chromosome models, in the following sense.  When binCount is specified with an integer value, mutations are binned according to their frequencies, as described above.  In a multi-chromosome model, the haplosomes and mutations used by calcSFS() may be associated with more than one chromosome, and the frequency assessed for each mutation is its frequency specifically within the haplosomes associated with its chromosome (as you would expect).  Mutations occurring in different chromosomes can therefore be tallied together into the same frequency bins, and combined into a single SFS; this produces a meaningful SFS.  (If you want an SFS for just a single chromosome, then of course you can pass just those haplosomes and mutations to calcSFS().)  When binCount is NULL, on the other hand, mutations are binned according to their counts, as described above.  In a multi-chromosome model, it would not make sense to bin counts together from different chromosomes, since those counts might not be on the same scale – the number of haplosomes associated with the various chromosomes might not be equal.  In this case, calcSFS() will raise an error if haplosomes from more than one chromosome are supplied, or if haplosomes is NULL (since it doesn’t know which chromosome to choose).  If you wish to tally according to counts, with binCount=NULL, you must pass in a vector of haplosomes associated with a single chromosome.  (If you know what you are doing and wish to combine counts across multiple chromosomes, you can simply call calcSFS() once per chromosome, and combine the resulting vectors by adding them together.)

Thanks to Ryan Chaffee and Chase Nelson for helpful input.

(float$)calcTajimasD(object<Haplosome> haplosomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates Tajima’s D (a test of neutrality based on the allele frequency spectrum) for a vector of haplosomes (containing at least four elements), based upon the mutations in the haplosomes.  The mathematical formulation is given in Tajima 1989 (equation 38) and remains unchanged (e.g., equations 2.30 in Durrett 2008, 8.4 in Hahn 2018, and 4.44 in Coop 2020).  Often haplosomes will be all of the haplosomes in a subpopulation, or in the entire population, but any haplosome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide Tajima’s D.

If the genetic diversity contained within the haplosomes is insufficient for the calculation, calcTajimasD() may return NAN.  It is up to the caller to detect this with isNAN() and handle it as necessary.

The implementation of calcTajimasD(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations, as with calcHeterozygosity().  Indeed, Tajima’s D can be modified with finite-sites models of π and θ (Misawa and Tajima 1997) though these are not used here.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this assumption will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.  This function was written by Nick Bailey (currently affiliated with CNRS and the Laboratory of Biometry and Evolutionary Biology at University Lyon 1), with helpful input from Peter Ralph.
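The standard formula from Tajima (1989) can be sketched in Python from per-mutation haplosome counts.  The sketch assumes that only segregating mutations (carried by some but not all haplosomes) are counted, and that NaN is returned when there are no segregating mutations or the variance term is not positive; these choices, and the function name, are assumptions rather than a description of the built-in implementation.

```python
import math


def tajimas_d(counts, n):
    """Tajima's D from per-mutation haplosome counts.

    counts: number of haplosomes (out of n) carrying each mutation
    n: number of haplosomes sampled (at least four)
    """
    seg = [k for k in counts if 0 < k < n]   # segregating mutations only
    s = len(seg)
    if n < 4 or s == 0:
        return math.nan
    # Mean number of pairwise differences (not divided by sequence length).
    k_hat = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0.0:
        return math.nan
    return (k_hat - s / a1) / math.sqrt(variance)


print(tajimas_d([1, 1, 1, 1], 4))  # excess of rare variants: negative D
print(tajimas_d([2, 2, 2, 2], 4))  # excess of intermediate variants: positive D
```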

(float$)calcVA(object<Individual> individuals, io<MutationType>$ mutType)

Calculates VA, the additive genetic variance, among a vector of individuals (containing at least two elements) passed in individuals, in a particular mutation type mutType that represents quantitative trait loci (QTLs) influencing a quantitative phenotypic trait.  The mutType parameter may be either an integer representing the ID of the desired mutation type, or a MutationType object specified directly.

This function assumes that mutations of type mutType encode their effect size upon the quantitative trait in their selectionCoeff property, as is fairly standard in SLiM.  The implementation of calcVA(), which is viewable with functionSource(), is quite simple; if effect sizes are stored elsewhere (such as with setValue()), a new user-defined function following the pattern of calcVA() can easily be written.
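One simple way to compute such a quantity, sketched here in Python, is to sum the effect sizes of the QTL mutations carried by each individual and take the variance of those sums across individuals.  The use of the sample variance (dividing by n-1), the input representation, and the function name are assumptions of this sketch; functionSource() shows what calcVA() actually does.

```python
from statistics import variance


def calc_va(individual_effects):
    """Variance across individuals of their summed QTL effect sizes.

    individual_effects: one list per individual, holding the effect size
    of each QTL mutation copy that the individual carries
    """
    totals = [sum(effects) for effects in individual_effects]
    return variance(totals)   # sample variance; needs at least two individuals


# Three individuals with summed effects 0.3, 0.0, and 0.5.
va = calc_va([[0.1, 0.2], [], [0.5]])
print(va)
```

If effect sizes were stored elsewhere, only the construction of the per-individual lists would need to change.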

(float$)calcWattersonsTheta(object<Haplosome> haplosomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates Watterson’s theta (a metric of genetic diversity comparable to heterozygosity) for a vector of haplosomes (containing at least one element), based upon the mutations in the haplosomes.  Often haplosomes will be all of the haplosomes in a subpopulation, or in the entire population, but any haplosome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the haplosome-wide Watterson’s theta.

The implementation of calcWattersonsTheta(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations, as with calcHeterozygosity().  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this assumption will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.
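The standard estimator divides the number of segregating mutations S by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1), and then by the number of sites to obtain a per-site value.  A minimal Python sketch follows; it assumes at least two haplosomes so that a_n is nonzero, counts every segregating mutation as its own site, and uses a hypothetical function name.

```python
def wattersons_theta(num_segregating, n, length):
    """Per-site Watterson's theta.

    num_segregating: number of segregating mutations (S)
    n: number of haplosomes sampled (at least two in this sketch)
    length: number of sites in the window (or whole chromosome)
    """
    if n < 2:
        raise ValueError("this sketch requires at least two haplosomes")
    a_n = sum(1.0 / i for i in range(1, n))
    return num_segregating / a_n / length


# n = 4 gives a_n = 11/6, so S = 11 over 100 sites gives theta = 0.06.
print(wattersons_theta(11, 4, 100))
```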

3.4.  Other utilities

(float)summarizeIndividuals(object<Individual> individuals, integer dim, numeric spatialBounds, string$ operation, [Nlif$ empty = 0.0], [logical$ perUnitArea = F], [Ns$ spatiality = NULL])

Returns a vector, matrix, or array that summarizes spatial patterns of information related to the individuals in individuals.  In essence, those individuals are assigned into bins according to their spatial position, and then a summary value for each bin is calculated based upon the individuals each bin contains.  The individuals might be binned in one dimension (resulting in a vector of summary values), in two dimensions (resulting in a matrix), or in three dimensions (resulting in an array).  Typically the spatiality of the result (the dimensions into which the individuals are binned) will match the dimensionality of the model, as indicated by the default value of NULL for the optional spatiality parameter; for example, a two-dimensional ("xy") model would by default produce a two-dimensional matrix as a summary.  However, a spatiality that is more restrictive than the model dimensionality may be passed; for example, in a two-dimensional ("xy") model a spatiality of "y" could be passed to summarize individuals into a vector, rather than a matrix, assigning them to bins based only upon their y position (i.e., the value of their y property).  Whatever spatiality is chosen, the parameter dim provides the dimensions of the desired result, in the same form that the dim() function does: first the number of rows, then the number of columns, and then the number of planes, as needed (see the Eidos manual for discussion of matrices, arrays, and dim()).  The length of dim must match the requested spatiality; for spatiality "xy", for example, dim might be c(50,100) to request that the returned matrix have 50 rows and 100 columns.  The result vector/matrix/array is in the correct orientation to be directly usable as a spatial map, by passing it to the defineSpatialMap() method of Subpopulation.  For further discussion of dimensionality and spatiality, see initializeInteractionType() and InteractionType.

The spatialBounds parameter defines the spatial boundaries within which the individuals are binned.  Typically this is the spatial bounds of a particular subpopulation, within which the individuals reside; for individuals in p1, for example, you would likely pass p1.spatialBounds for this.  However, this is not required; individuals may come from any or all subpopulations in the model, and spatialBounds may be any bounds of non-zero area (if an individual falls outside of the given spatial bounds, it is excluded, as if it were not in individuals at all).  If you have multiple subpopulations that conceptually reside within the same overall coordinate space, for example, that can be accommodated here.  The bounds are supplied in the dimensionality of the model, in the same form as for Subpopulation; for an "xy" model, for example, they are supplied as a four-element vector of the form c(x0, y0, x1, y1) even if the summary is being produced with spatiality "y".  To produce the result, a grid with dimensions defined by dim is conceptually stretched out across the given spatial bounds, such that the centers of the edge and corner grid squares are aligned with the limits of the spatial bounds.  This matches the way that defineSpatialMap() defines its maps.
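The grid geometry described above, with the centers of the edge cells aligned to the limits of the bounds, can be sketched along a single axis in Python as follows.  This is only an illustration of the geometry, not SLiM's code; the function name bin_index(), the handling of a one-cell grid, and the treatment of positions that fall exactly halfway between two cell centers (assigned to the higher cell here) are assumptions.

```python
import math


def bin_index(x, lo, hi, n):
    """Index (0-based) of the grid cell containing position x on one axis.

    The n cell centers are spread evenly from lo to hi, so the first and
    last cells are centered exactly on the bounds.  Returns None if x lies
    outside [lo, hi], mirroring the exclusion of out-of-bounds individuals.
    Assumes hi > lo.
    """
    if not (lo <= x <= hi):
        return None
    if n == 1:
        return 0                      # a single cell covers the whole extent
    t = (x - lo) / (hi - lo)          # relative position in [0, 1]
    idx = math.floor(t * (n - 1) + 0.5)   # nearest cell center
    return min(idx, n - 1)


# Three cells on [0, 1] have centers at 0.0, 0.5, and 1.0, so the
# boundaries between cells fall at 0.25 and 0.75.
indices = [bin_index(x, 0.0, 1.0, 3) for x in (0.0, 0.2, 0.3, 0.74, 0.76, 1.0)]
```

With this layout the two edge cells each cover only half as much of the bounded interval as an interior cell does, which matters for the perUnitArea option.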

The particular summary produced depends upon the parameters operation and empty.  Consider a single grid square represented by a single element in the result.  That grid square contains zero or more of the individuals in individuals.  If it contains zero individuals and empty is not NULL, the empty value is used for the result, regardless of operation, providing specific, separate control over the treatment of empty grid squares.  If empty is NULL, this separate control over the treatment of empty grid squares is declined; empty grid squares will be handled through the standard mechanism described next.  In all other cases for the given grid square – when it contains more than zero individuals, or when empty is NULL – operation is executed as an Eidos lambda, a small snippet of code, supplied as a singleton string, that is executed in a manner similar to a function call.  Within the execution of the operation lambda, a constant named individuals is defined to be the focal individuals being evaluated – all of the individuals within that grid square.  This lambda should evaluate to a singleton logical, integer, or float value, comprising the result value for the grid square; these types will all be coerced to float (T being 1 and F being 0).

Two examples may illustrate the use of empty and operation.  To produce a summary indicating presence/absence, simply use the default of 0.0 for empty, and "1.0;" (or "1;", or "T;") for operation.  This will produce 0.0 for empty grid squares, and 1.0 for those that contain at least one individual.  Note that the use of empty is essential here, because operation doesn’t even check whether individuals are present or not.  To produce a summary with a count of the number of individuals in each grid square, again use the default of 0.0 for empty, but now use an operation of "individuals.size();", counting the number of individuals in each grid square.  In this case, empty could be NULL instead and operation would still produce the correct result; but using empty makes summarizeIndividuals() more efficient since it allows the execution of operation to be skipped for those squares.
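The interplay of empty and operation in these two examples can be mimicked with a short Python sketch.  It is a conceptual model only: a Python callable stands in for the Eidos lambda string, each bin is a plain list of hypothetical individuals, and the name summarize_bins() is invented for the illustration.

```python
def summarize_bins(bins, operation, empty=0.0):
    """Apply the empty/operation rule to a list of bins (illustrative sketch).

    bins:      list of lists, each holding the individuals in one grid square
    operation: callable taking the individuals in one square (stands in for
               the Eidos lambda); its result is coerced to float
    empty:     value used for squares with no individuals, or None to run
               operation on empty squares as well
    """
    result = []
    for individuals in bins:
        if len(individuals) == 0 and empty is not None:
            result.append(float(empty))            # operation is skipped
        else:
            result.append(float(operation(individuals)))
    return result


bins = [[], ["a"], ["b", "c"]]

# Presence/absence: the operation ignores its input, so empty does the work.
presence = summarize_bins(bins, lambda inds: True, empty=0.0)

# Counts: correct with or without empty, but empty skips needless calls.
counts = summarize_bins(bins, lambda inds: len(inds), empty=0.0)
counts_no_empty = summarize_bins(bins, lambda inds: len(inds), empty=None)

# Presence/absence without empty marks every square as occupied.
presence_no_empty = summarize_bins(bins, lambda inds: True, empty=None)
```

The last call shows why empty is essential for the presence/absence summary: without it, the constant-valued operation runs for the empty square too.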

Lambdas are not limited in their complexity; they can use if, for, etc., and can call methods and functions.  A typical operation to compute the mean phenotype in a quantitative genetic model that stores phenotype values in tagF, for example, would be "mean(individuals.tagF);", and this is still quite simple compared to what is possible.  However, keep in mind that the lambda will be evaluated for every grid cell (or at least those that are non-empty), so efficiency can be a concern, and you may wish to pre-calculate values shared by all of the lambda calls, making them available to your lambda code using defineGlobal() or defineConstant().

There is one last twist, if perUnitArea is T: values are divided by the area (or length, in 1D, or volume, in 3D) that their corresponding grid cell comprises, so that each value is in units of “per unit area” (or “per unit length”, or “per unit volume”).  The total area of the grid is defined by the spatial bounds, and the area of a given grid cell is defined by the portion of the spatial bounds that is within that cell.  This is not the same for all grid cells; grid cells that fall partially outside spatialBounds (because, remember, the centers of the edge/corner grid cells are aligned with the limits of spatialBounds) will have a smaller area inside the bounds.  For an "xy" spatiality summary, for example, corner cells have only a quarter of their area inside spatialBounds, while edge elements have half of their area inside spatialBounds; for purposes of perUnitArea, then, their respective areas are ¼ and ½ the area of an interior grid cell.  By default, perUnitArea is F, and no scaling is performed.  Whether you want perUnitArea to be F or T depends upon whether the summary you are producing is, conceptually, “per unit area”, such as density (individuals per unit area) or local competition strength (total interaction strength per unit area), or is not, such as “mean individual age”, or “maximum tag value”.  For the previous example of counting individuals with an operation of "individuals.size();", a value of F for perUnitArea (the default) will produce a simple count of individuals in each grid square, whereas with T it would produce the density of individuals in each grid square.
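The cell areas used for perUnitArea scaling follow directly from the grid geometry, as this Python sketch for an "xy" summary shows.  It is an illustration under stated assumptions, not SLiM's implementation: the function names are invented, and the result is indexed as areas[ix][iy] by cell position along x and y, which is not meant to match the row/column orientation of the matrix that summarizeIndividuals() returns.

```python
def cell_lengths(lo, hi, n):
    """Length of the in-bounds portion of each of n cells along one axis.

    Cell centers are spread evenly from lo to hi, so the two edge cells
    have only half of their extent inside the bounds.
    """
    if n == 1:
        return [hi - lo]
    spacing = (hi - lo) / (n - 1)
    return [spacing / 2 if i in (0, n - 1) else spacing for i in range(n)]


def cell_areas(bounds, nx, ny):
    """In-bounds area of every cell for bounds (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bounds
    x_lengths = cell_lengths(x0, x1, nx)
    y_lengths = cell_lengths(y0, y1, ny)
    return [[lx * ly for ly in y_lengths] for lx in x_lengths]


# A 5 x 3 grid over bounds of 4 x 2 units: interior cells have area 1.0,
# edge cells 0.5, and corner cells 0.25.
areas = cell_areas((0.0, 0.0, 4.0, 2.0), nx=5, ny=3)
total_area = sum(sum(column) for column in areas)

# Dividing a count by the cell area gives a density for that cell.
corner_density = 3 / areas[0][0]
```

The in-bounds areas sum to the full area of the bounds, and a corner cell holding 3 individuals reports four times the density of an interior cell holding the same number.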

(object<Dictionary>$)treeSeqMetadata(string$ filePath, [logical$ userData = T])

Returns a Dictionary containing top-level metadata from the .trees (tree-sequence) file at filePath.  If userData is T (the default), the top-level metadata under the SLiM/user_metadata key is returned; this is the same metadata that can optionally be supplied to treeSeqOutput() in its metadata parameter, so it makes it easy to recover metadata that you attached to the tree sequence when it was saved.  If userData is F, the entire top-level metadata Dictionary object is returned; this can be useful for examining the values of other keys under the SLiM key, or values inside the top-level dictionary itself that might have been placed there by msprime or other software.

This function can be used to read in parameter values or other saved state (tag property values, for example), in order to resuscitate the complete state of a simulation that was written to a .trees file.  It could be used for more esoteric purposes too, such as to search through .trees files in a directory (with the help of the Eidos function filesAtPath()) to find those files that satisfy some metadata criterion.