# Why is the variance of the Wright-Fisher model not equal p(1-p)/(2N)?

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

I was looking at the properties of the Binomial probability distribution and it says that the variance is`np(1-p)`. In population genetics, n = 2N. So I would expect to see that the variance is`2Np(1-p)`.

But when looking at the Wright-Fisher model, I often see that the variance is $$p(1-p)/(2N)$$ (see this presentation page 6).

How to derive the variance of the Wright-Fisher model to get this variance from $$Prob{X=i}=frac{n!}{i!(n-i)!}p^i(1-p)^{n-i}$$?

The binomial variance $$2N p (1-p)$$ is for the number of individuals $$n'$$ carrying the allele in the next generation. The frequency of the allele in the next generation is $$p'=n'/(2N)$$, so its variance is $$ext{Var}[p'] = ext{Var}[n'/(2N)] = ext{Var}[n']/(2N)^2 = p(1-p)/(2N).$$

## Variance of number of isolated vertices in random graph \$G(n,p)\$

Suppose we have random graph \$G(n,p)\$ from a uniform distribution with \$n\$ vertices and independently, each edge present with probability \$p\$ . Calculating it's expected number of isolated vertices proves quite easy, chance of single vertex to be isolated is equal to \$(1-p)^\$ , then using linearity of probability, expected number of isolated vertices is equal to \$n imes(1-p)^\$ . However, I am tasked to calculate the variance of this number, or at least decent approximation of it, without any idea how to proceed.

## Variance among demes in migration rate and its effect on FST

Variation in the rate of migration among demes affects FST, measured as the ratio of the probability of identity by descent (i.b.d.) of two alleles randomly chosen from within one deme to the probability of identity by descent of two alleles chosen at random from across the entire metapopulation (T). Following Crow and Kimura (1970, p. 347 see also Hartl and Clark 1989, p. 76), for the i-th deme in the absence of migration, the relationship between FST at one generation, t, and the previous generation, (t𢄡), is FST(t|i) = <(1/2N) + (1 − [1/2N])FST(t𢄡|i)>.

Migrants are assumed to arrive in the i-th deme at random from across the metapopulation at the beginning of a generation as per the protocol of Wade and Goodnight (1991). The proportion of migrants a deme receives at generation t is drawn from a distribution with mean, m, and variance, Vm. When a fraction of the i-th deme consists of migrants, mi, then the expression for the probability of i.b.d. within the i-th demes becomes

assuming there is no i.b.d. among migrants entering deme i or between migrants and residents.

Averaging (1 – mi) 2 over demes, I make use of the fact that the mean migration rate is m and the mean of (mi) 2 is (m 2 + Vm) (e.g., Sved and Latter 1977 or Whitlock 1992). At equilibrium, all demes have the same FST independent of t. We set FST(t|i) = FST(t𢄡|i) = FST, and average over the distribution of m. Setting the terms, (m/N), m 2 , and (Vm/2N), equal to zero, gives the equilibrium FST as

From eq. [2], we can see that, when there is no variance among demes in migration rate (Vm = 0), FST reduces to Wright’s classic formula for island model migration (Wright 1969, eq. [12.3], p. 291), namely, FST

The number of migrants from the i-th deme equals Nmi or Mi. Noting that M. equals Nm and that (VM/N 2 ) equals Vm, the mean crowding of migrants around exporting demes, M*, from Lloyd (1967), equals

Note that M* equals Nm whenever the distribution of migrants is Poisson (i.e., [VM/M.] = 1). It is greater than Nm whenever VM > M., that is, whenever there is interdemic selection.

I define m*, the ‘perceived rate of migration’ owing to the mean crowding of migrants, by dividing the number of migrants, M*, by the local deme size, N. That is,

Substituting eq. [4] into eq. [2], gives

When there is no variance among demes in migrations rate, Vm equals 0 and m* is approximately equal to m. Thus, without interdemic selection, to order m 2 , eq. [5] reduces to Wright’s expression as above.

## Mixture distribution

I bought some balls, all blank. I take a binomial random number generator, configure it with some \$n\$ and \$p\$ , and for each ball I paint the number that I get from the display of the generator. Then I put the balls in a bag and start the process that I described.

In the case that the numbers on the balls are considered random variables (that follow a binomial distribution). Then the frequency distribution for the difference \$X-Y\$ is a mixture distribution where the number of balls in the bag, \$m\$ , plays a role.

The first and second ball that you take from the bag are the same. This situation occurs with probability \$frac<1>\$ . In this case the difference \$vert x-y vert\$ is equal to zero.

The first and second ball are not the same. This situation occurs with probability \$1-frac<1>\$ . In this case the difference \$vert x-y vert\$ is distributed according to the difference of two independent and similar binomial distributed variables.

The above situation could also be considered a compound distribution where you have a parameterized distribution for the difference of two draws from a bag with balls numbered \$x_1, . ,x_m\$ and these parameters \$x_i\$ are themselves distributed according to a binomial distribution.

computational example

Below is an example from a result when 5 balls \$x_1,x_2,x_3,x_4,x_5\$ are placed in a bag and the balls have random numbers on them \$x_i sim N(30,0.6)\$ . The probability for the difference of two balls taken out of that bag is computed by simulating 100 000 of those bags. (note this is not the probability distribution of the outcome for a particular bag which has only at most 11 different outcomes)

This example is an instance of a standard Wright–Fisher model interpretation of drift. I do not mean to imply that this is the only possible type of drift process, and I will consider others in Sect. 4.

In a pure drift model, each individual in the population has an identical expected offspring distribution. In mixed models, drift can be combined with selection, mutation, and other factors.

X denotes the number of individuals of the X trait. X/N gives the frequency of the trait.

The terms in the right-hand side of the equation below denote, from left to right, the number of individuals with trait X, the number of individuals of the alternative trait, and a term accounting for variance in the offspring distribution of an individual with the X trait. The right-most term is used to calculate the variance expected population size, which I will discuss in more detail Section 4. See Der et al. (2011) for a more thorough explanation.

For example, Millstein (2002) and Hodge (1987) argue that drift is indiscriminate sampling (that is, sampling without respect to intrinsic physical differences among individuals). According to Gildenhuys (2009), drift refers to “causal influences over a population” that are “non-interactive, non-pervasive, and indiscriminate causes” (p. 522). Other prominent defenses of the causal theory, such as Shapiro and Sober (2007), Stephens (2004), and Reisman and Forber (2005) do not take an explicit stand on this issue.

Okasha (2006), Abrams (2007), and Clatterbuck et al. (2013) are representative of the single-process view, while Millstein (2002) and Hodge (1987) are representative of the separate-process view.

This is stronger than the nomological necessity constitutive of causal explanations.

For example, Sober (1984, p. 117) argues that the sampling size of a stochastic process is a cause of that process’s outcomes. The probability distribution over outcomes of a series of coin-flips is affected by both the bias of the coin and the number of times you flipped it. See also Sober (2011) for a defense of the claim that there are a priori causal claims in evolutionary biology.

According to some views, drift just is the actual deviation of trait frequencies from expectation (Brandon 2005). On most causal views, (N ) is an indication of drift’s power to cause such deviations.

Notable exceptions are Gildenhuys (2009), Millstein et al. (2009), and Plutynski (2007).

Here, we can model this by supposing that the ball drawing is no longer indiscriminate, and a token yellow ball is twice as likely to be drawn than a token green ball. See Brandon (2005, pp. 157–158) for a similar example.

From Wright, the probability of fixation ( (pi )) of a beneficial allele introduced at frequency 1/2 (N =) 2s/(1 (-) e (^<-4Ns>)) . Notice that as Ns increases, the value of the denominator goes to 1, so 2 (s) will be a good approximation of the probability of fixation for most values of (s) . For a more detailed derivation and discussion, see Kimura (1962, pp. 715–716). I am grateful to a reviewer for helpful comments on this issue.

I am not claiming that philosophers have wholly ignored the distinction between population size and effective population size for instance, Stephens (2004) uses the effective population size as an indicator of the strength of drift. However, few in this debate have paid close attention to the significant causal factors which determine effective population size. Gildenhuys (2009) is a welcome exception.

As Der et al. (2011) show, the Wright–Fisher model, as well as the other models I consider, also obey the formal characterizations of drift described by (Mean) and (Variance). The question of whether there are drift models that exhibit the informal properties of drift while not obeying these formal properties is interesting but beyond the scope of this paper. For my purposes, it suffices to show that there are models which meet the formal requirements which population geneticists use to define drift processes but differ interestingly in their resulting population dynamics.

The probability that green will go to fixation is the probability the population will go through a bottleneck event in that generation (1/N (=) 0.01) times the probability that it will be a green ball that is chosen to populate the next generation (its starting frequency, 0.6).

This is not a universal assumption of causal theorists. For instance, see Stephens (2004) and Filler (2009) for discussions of drift and changes in heterozygosity.

The Eldon–Wakeley model is an extension of the Moran framework. For a more detailed discussion, see Der et al. (2012, p. 1332).

One of their primary arguments is that the statisticalist view cannot adequately capture actual scientific practice since the development of alternative models of drift was motivated by the desire for models that made more realistic biological assumptions than the Wright–Fisher model, which Millstein et al. claim “makes no sense if drift were only a statistical outcome” (p. 6). I agree with their contention that biologists have often conceived of drift as a causal process, yet I do not think this argument is a particularly compelling refutation for statistical theorists who want to substantially revise the way that biologists (as well as philosophers) think about drift.

I will ignore meiotic drive or other segregation distorters.

Filler (2009) argues that drift’s tendency to reduce heterozygosity shows that drift is an evolutionary force.

This does not mean that every gamete will contribute a single descendent to the daughter population. Sampling with replacement allows for the possibility that a gamete could be sampled more than once. However, this condition does ensure that the expected offspring distribution of a gamete is binomial.

The quote from Der et al. (2012) suggests ways of manipulating the probability distribution over offspring number so as to increase skew when mortality rates are high, individuals may increase the number of gametes they produce in a given breeding episode, so we could intervene on offspring distributions by changing the environment in a way that increases mortality rates.

The proof of this is simple. In ((1- 1/2N)) of the generations, the probability of a jump from 0.5 (A) to (>) 0.8 (A) or 0.5 (a) to (>) 0.8 (a) is given by the binomial equation, but in 1/2 (N) of the generations, the probability of such a jump is 1, which by necessity makes the probability of a jump greater than it would be in a binomial process. The exact probability of such a jump will depend on the value of (lambda ) , (N) , and the frequency of (A ) and (a) when a replacement event occurs. The probability of fixation of a selectively favored allele will also increase.

This is another way of saying that drift is “indiscriminate”. See Millstein (2002) for a more thorough explication of the concept.

The question of whether we should be realists or instrumentalists about models is somewhat orthogonal to this debate. Antirealists will argue that it is a mistake to go beyond the models to make any judgments whatsoever about ontology. I will not respond to this argument since none of the parties to the debate in question are antirealists. Instead, here I defend the more modest claim that if we are engaged in the practice of making ontological judgments in science, then we should not read them directly off of mathematical models.

Though see Lange and Rosenberg (2011) for a defense of the latter claim’s plausibility.

Another case of a methodological device in population genetics being given an erroneous ontological interpretation is discussed in Clatterbuck et al. (2013).

While an exploration of the empirical ramifications of alternative drift models is beyond the scope of this paper, I suspect that the work of Der et al. will be fruitful in making more accurate predictions about drift. For instance, debates over the plausibility of Wright’s Shifting Balance Theory have appealed to features of Wright–Fisherian drift—such as the probability of fixation of a novel mutant in a small population, time until fixation of neutral alleles, and the interaction of selection and drift—which may be different if the populations are undergoing other types of drift (for an overview of the debate, see Coyne et al. (1997)).

Let us recall the unfortunate polysemy of the word “drift”. In the biological literature “genetic drift” corresponds to the noise-induced variations. When using a stochastic model, this is at odds with the “drift” of a diffusion, i.e. the first order term that models a deterministic force.

Barbour AD, Holst L, Janson S (1992) Poisson approximation. Oxford studies in probability, vol 2. The Clarendon Press/Oxford University Press/Oxford Science Publications, New York

Champagnat N, Ferriere R, Meleard S (2006) Unifying evolutionary dynamics: from individual stochastic processes to macroscopic models. Theor Popul Biol 69(3):297–321

Champagnat N, Ferrière R, Méléard S (2008) From individual stochastic processes to macroscopic models in adaptive evolution. Stoch Models 24(suppl. 1):2–44

Chalub FACC, Souza MO (2009) A non-standard evolution problem arising in population genetics. Commun Math Sci 7(2):489–502

Chalub FACC, Souza MO (2014) The frequency-dependent Wright–Fisher model: diffusive and non-diffusive approximations. J Math Biol 68(5):1089–1133

Damiens D, Boivin G (2006) Why do sperm-depleted parasitoid males continue to mate? Behav Ecol 17(1):138–143

Dionisio F (2007) Selfish and spiteful behaviour through parasites and pathogens. Evol Ecol Res 9(7):1199–1210

Durrett R (1996) Stochastic calculus: a practical introduction. Probability and stochastics series. CRC Press INC, Boca Raton

Durrett R (2008) Probability models for DNA sequence evolution, 2nd edn. Probability and its applications (New York). Springer, New York

Ethier SN, Kurtz TG (1986) Markov processes: characterization and convergence. Wiley series in probability and mathematical atatistics: probability and mathematical statistics. Wiley, New York

Etheridge A (2011) Some mathematical models from population genetics. Lecture notes in mathematics, vol 2012. Springer, Heidelberg. (Lectures from the 39th Probability Summer School held in Saint-Flour, 2009)

Ewens WJ (2004) Mathematical population genetics. I: theoretical introduction, 2nd edn. Interdisciplinary applied mathematics, vol 27. Springer, New York

Foster KR, Wenseleers T, Ratnieks FLW (2001) Spite: Hamilton’s unproven theory. Ann Zool Fenn 38(3–4):229–238

Gillespie JH (1974) Nautural selection for within-generation variance in offspring number. Genetics 76(3):601–606

Gillespie JH (1975) Natural selection for within-generation variance in offspring number II. Discrite haploid models. Genetics 81(2):403–413

Grimmett GR, Stirzaker DR (2001) Probability and random processes, 3rd edn. Oxford University Press, New York

Hamilton WD (1970) Selfish and spiteful behaviour in an evolutionary model. Nature 228(5277):1218–1220

Janson S (1994) Large deviation inequalities for sums of indicator variables. Technical Report No. 34, Department of Mathematics, Uppsala University

Joag-Dev K, Proschan F (1983) Negative association of random variables, with applications. Ann Stat 11(1):286–295

Lessard S (2005) Long-term stability from fixation probabilities in finite populations: new perspectives for ESS theory. Theor Popul Biol 68(1):19–27

McKane AJ, Waxman D (2007) Singular solutions of the diffusion equation of population genetics. J Theor Biol 247(4):849–858

Rice WR (1996) Sexually antagonistic male adaptation triggered by experimental arrest of female evolution. Nature 381(6579):232–234

Ross N (2011) Fundamentals of Stein’s method. Probab Surv 8:210–293

Radhakrishnan P, Pérez-Staples D, Weldon CW, Taylor PW (2009) Multiple mating and sperm depletion in male Queensland fruit flies: effects on female remating behaviour. Anim Behav 78(4):839–846

Shpak M (2007) Selection against demographic stochasticity in age-structured populations. Genetics 177(4):2181–2194

Steiner S, Henrich N, Ruther J (2008) Mating with sperm-depleted males does not increase female mating frequency in the parasitoid Lariophagus distinguendus. Entomol Exp Appl 126(2):131–137

Taylor JE (2009) The genealogical consequences of fecundity variance polymorphism. Genetics 182(3):813–837

Waxman D (2011) Comparison and content of the Wright–Fisher model of random genetic drift, the diffusion approximation, and an intermediate model. J Theor Biol 269:79–87

## Methods

We start by assuming we have a data set of |\$N\$| aligned molecular sequences |\$ extbf=( extbf_1,dots, extbf_N)\$| along with |\$N\$| associated values |\$ extbf=( extbf_1,dots, extbf_N)\$| of an |\$M\$| -dimensional, continuously varying trait. The different coordinates of the “trait” may in fact represent several different phenotypes, but for simplicity, and without loss of generality, we consider it a single multidimensional trait. The sequence and trait data correspond to the |\$N\$| tips of an unknown yet estimable phylogenetic tree |\$ au\$|⁠ . Later we will discuss accounting for phylogenetic uncertainty, modeling the molecular evolution process giving rise to |\$ extbf\$| and integrating it with a model for trait evolution. But first, we explore trait evolution on a fixed phylogeny via a diffusion process acting conditionally independently along its branches.

The |\$N\$| -tipped bifurcating phylogenetic tree |\$ au\$| is a graph with a set of vertices |\$mathcal = (mathcal_1,dots,mathcal_<2N-1>)\$| and edge weights |\$mathcal=(t_1,dots,t_<2N-2>)\$|⁠ . The vertices correspond to nodes of the tree and, as the length of the tree |\$ au\$| is measured in units of time, |\$mathcal\$| consists of times corresponding to branch lengths. Each external node (tree tip) |\$mathcal_i\$| for |\$i = 1, dots , N\$| is of degree 1, with one parent node |\$mathcal_\$| from within the internal or root nodes. Each internal node |\$mathcal_i\$| for |\$i = N+1, dots , 2N-2\$| is of degree 3 and the root node |\$mathcal_<2N-1>\$| is of degree 2. An edge with weight |\$t_i\$| connects |\$mathcal_i\$| to |\$mathcal_\$|⁠ , and we refer to this edge as branch |\$i\$|⁠ . In addition to the observed trait values |\$ extbf_1,dots, extbf_N\$| at the external nodes, we posit for mathematical convenience unobserved trait values |\$ extbf_,dots, extbf_<2N-1>\$| at the internal nodes and root.

Brownian diffusion (also known as a Wiener process or unbiased random walk) is a continuous-time stochastic process originally developed to model the random motion of a physical particle ( Brown 1828 Wiener 1958). For a multivariate Brownian diffusion process |\$ extbf(t)\$|⁠ , the increment |\$ extbf(t_2) - extbf(t_1)\$| of the process starting at time |\$t_1\$| and ending at time |\$t_2 ge t_1\$| is multivariate normally distributed with mean |\$ extbf<0>\$| and variance |\$(t_2-t_1) extbf\$|⁠ , where |\$ extbf\$| is an |\$M imes M\$| identity matrix. The process is time-homogeneous in that the variance depends only on time differences and not on actual time points. Brownian diffusion is also characterized by independent increments: if |\$t_1 < t_2 leq t_3 < t_4\$|⁠ , then the displacements |\$ extbf(t_2) - extbf(t_1)\$| and |\$ extbf(t_4) - extbf(t_3)\$| are independent.

Recent phylogenetic comparative methods ( Felsenstein 1988 Revell and Harmon 2008 Vrancken et al. 2015) aim to model the correlated evolution between multiple traits and, to this end, employ a correlated multivariate Brownian diffusion with displacement variance |\$(t_2-t_1) extbf

^<-1>\$| and displacement mean |\$ extbf<0>\$|⁠ . Here, |\$ extbf

\$| is an |\$M imes M\$| infinitesimal precision matrix that determines the intensity and correlation of the trait diffusion after controlling for shared evolutionary history. Recall that in our development the different coordinates of an |\$M\$| -dimensional “trait” may in fact represent different traits, in which case the correlation between traits can be recovered from the appropriate coordinates of |\$(t_2-t_1) extbf

^<-1>\$|⁠ . The displacement mean of |\$ extbf<0>\$| posits that the traits do not evolve according to any systematic directional trend.

The Brownian diffusion process along a phylogeny produces the observed trait values by starting at the root node and proceeding down the branches of |\$ au\$|⁠ . The displacement |\$ extbf_i - extbf_\$| along a branch is multivariate normally distributed, centered at |\$ extbf<0>\$| with variance |\$t_i extbf

^<-1>\$| proportional to the length of the branch. Therefore, conditioning on the trait value |\$ extbf_\$| at the parent node, we have

Throughout this article, we refer to this correlated standard Brownian diffusion model for phylogenetic trait evolution as simply the random walk (RW) model. An extension that introduces branch-specific mixing parameters |\$ u_i\$| into the process that rescale |\$t_i mapsto u_i t_i\$| yields a mixture of Brownian processes and remains popular in phylogeography ( Lemey et al. 2010).

### Trends

Incorporating a nontrivial displacement mean into the diffusion process is beneficial in several ways. First, we can estimate and quantify directional trends. More importantly, it enables inference of aspects of the evolutionary process that may be poorly approximated or completely unaccounted for by standard Brownian diffusion. More accurate modeling of trait evolution dynamics opens the door to better ancestral trait reconstructions which can, for example, have important implications for elucidating the origin and spread of viral epidemics and ultimately improving disease surveillance and outbreak management ( Woolhouse et al. 2015).

## ASSESSING THE MAXIMUM-LIKELIHOOD ESTIMATOR

Simulation methods: We compared the maximum-likelihood estimator to the F-statistic estimator with simulation tests. As noted earlier, these two estimators do not always estimate the same quantity. The F-statistic method estimates the variance effective size and the maximum-likelihood approach described above estimates the parameter N of a Wright-Fisher population. In these simulation tests the two estimators estimated the same quantity because our simulated populations evolved according to the Wright-Fisher model. In each simulated replicate we selected starting frequencies for each locus from a uniform distribution. The simulated populations consisted of 50 alleles (25 diploid individuals). We sampled 100 alleles (50 diploid individuals) from the population, with replacement, according to three different sampling regimes (see Table 1). For each replicate we computed both maximum-likelihood and F-statistic estimates of the population size. The F-statistic was computed according to F ^ k = 2 ( ( p 0 , A − p t , A ) 2 ( p 0 , A + p t , A ) + ( p 0 , a − p t , a ) 2 ( p 0 , a + p t , a ) ) (7) from W aples (1989a, Equation 9), where 0,A, t,A, 0,a, and t,a are the allele frequencies of allele types A and a in the samples taken at generations 0 and t, respectively. The variance effective size was then computed according to N ^ = t 2 [ F ^ k − ( 1 ∕ n 0 ) − ( 1 ∕ n t ) ] (8) also from W aples (1989a, Equation 11), where 0 and t are the total number of alleles sampled at generations 0 and t.

When there were three sampling events we used the harmonic mean of the estimates for the two intervals. This estimator was derived by pooling the two F-statistic estimates and using this pooled value to compute N. P ollak (1983) developed similar estimators for population sampling without replacement (hypergeometric). In the examples we considered, the three estimators derived by P ollak (1983) are the same as the harmonic mean estimator except larger by multiplicative constants. Because in our simulation tests the F-statistic estimator was biased upward, the Pollak estimators of N were less accurate than our harmonic mean estimator and are not shown.

Results from simulation tests in haploid numbers

In all replicates, the number of sampled individuals in each sampling event was twice as large as the size of the simulated population. When sample sizes are small, sampling error rather than genetic drift may be the primary reason for observed changes in allele frequencies (W aples 1989b). We avoided this problem by assuming large samples could be taken from the population. Fortunately, it is frequently possible to sample large numbers of individuals even when the number of breeding individuals is low. Often, juveniles are sampled from populations with high juvenile mortality, and the number of breeding adults may be orders of magnitude lower than the number of juveniles. Thus, our sampling scheme is representative of those found in the literature (W aples 1990 W aples and T eel 1990 H edgecock et al. 1992 H usband and B arrett 1992a,b J ordan et al. 1992 L essios et al. 1994 H edrick et al. 1995 B urczyk 1996 J orde and R yman 1996 R ichards and L eberg 1996 M iller and K apuscinski 1997 S cribner et al. 1997 D allas et al. 1998 L aikre et al. 1998 L ehmann et al. 1998).

In an application to real data rather than a simulation test, the F-statistic or the maximum-likelihood estimate can always be computed, but the estimate may be infinity or very large. In our simulation tests, to reduce running time, the program halted all searches for the maximum of the likelihood curve if the maximum-likelihood estimate of population size was determined to be >500. We chose 500 because it is an order of magnitude larger than the true population size, 50. To be fair to both estimators, we discarded all simulation replicates in which either estimator was >500. For each estimator, we computed the mean and standard deviation for the estimates of N using the replicates that were not discarded. As a separate statistic we recorded the number of times each estimator was >500. The number of discarded replicates was the union of these events. These simulation tests were computationally intensive, so we ran many more replicates for the cases in which there were 5, 10, or 15 loci as these are the most realistic values for applications of this method to real data.

Simulation results: Both estimators tended to overestimate population size and had a high variance for samples with few loci. This upward bias and large variance of the estimators would have been much greater if we had not discarded replicates. However, in all of our simulation tests, the mean maximum-likelihood estimate of population size among the replicates that were not discarded was closer to the simulated population size than was the mean F-statistic estimate (Table 1). Similarly, the variance in the likelihood estimates among these replicates was smaller than the variance in F-statistic estimates (Table 1). Also, the F-statistic estimate was more than an order of magnitude larger than the true value about twice as often as the maximum-likelihood estimate (Table 1). The estimates were positively correlated and the F-statistic estimate was generally >500 whenever the maximum-likelihood estimator was >500. We had similar results when we ran simulation tests with initial allele frequencies drawn from beta distributions with varying parameters (not shown). Thus the maximum-likelihood estimate seems robust to violations of our assumption that allele frequencies are drawn initially from a uniform distribution.

As expected, increasing the number of loci or the number of sampled time-points reduced both the variance and the bias in both estimators (Table 1). When the number of markers was very large, on the order of 100 to 200 loci, then both estimators performed very well. The difference between the two estimators was more obvious with samples of fewer markers. Increasing the number of sampling times improved the estimates for both methods (Table 1). The dramatic improvement in both estimators with multiple sampling (Table 1) may be explained by the initial rapid increase in accuracy in both estimators, particularly the F-statistic estimator, as the number of sampled loci increases. Having samples covering two time intervals of the same length is roughly equivalent to sampling twice as many loci over one time interval.

As \$m = k imes n\$, we can look at this in terms of \$k\$ and \$n\$ instead of \$n\$ and \$m\$. Let's say \$T_i\$ is the time it takes the \$i\$-th processor to finish its work.

As \$n\$ grows, the probability that \$T_i\$ = \$5k\$ (the processor was assigned only \$T=5\$ tasks) for some \$i\$ approaches \$1\$, so makespan being defined as \$mathrm(T_i)\$, \$E[M]\$ approaches \$5k\$.

For the second scenario this is \$4k\$ so increasing the number of processors makes the 4–2 split better.

What about \$k\$ — increasing the number of tasks per processor? Increasing \$k\$ has the opposite effect, it makes it less likely to have a processor with an unlucky set of tasks. I'm going home now but I'll come back to this later. My "hunch" is that as \$k\$ grows, the difference in \$E[M]\$ between the 4–2 split and the 5­­­–1 split disappears, \$E[M]\$ becomes the same for both. So I would assume that 4–2 is always better except maybe for some special cases (very small specific values of \$k\$ and \$n\$), if even that.

• Lower variance is better, all else being equal.
• As the number of processors grows, lower variance becomes more important.
• As the number of tasks per processor grows, lower variance becomes less important.

I find that heuristic arguments are often quite misleading when considering task scheduling (and closely related problems like bin packing). Things can happen that are counter-intuitive. For such a simple case, it is worthwhile actually doing the probability theory.

Let \$n = km\$ with \$k\$ a positive integer. Suppose \$T_\$ is the time taken to complete the \$j\$-th task given to processor \$i\$. This is a random variable with mean \$mu\$ and variance \$sigma^2\$. The expected makespan in the first case is \$ E[M] = E[max left^k T_ mid i=1,2,dots,m ight>]. \$ The sums are all iid with mean \$kmu\$ and variance \$ksigma^2\$, assuming that \$T_\$ are all iid (this is stronger than pairwise independence).

Now to obtain the expectation of a maximum, one either needs more information about the distribution, or one has to settle for distribution-free bounds, such as:

• Peter J. Downey, Distribution-free bounds on the expectation of the maximum with scheduling applications, Operations Research Letters 9, 189–201, 1990. doi:10.1016/0167-6377(90)90018-Z

which can be applied if the processor-wise sums are iid. This would not necessarily be the case if the underlying times were just pairwise independent. In particular, by Theorem 1 the expected makespan is bounded above by \$ E[M] le kmu + sigmasqrtfrac>. \$ Downey also gives a particular distribution achieving this bound, although the distribution changes as \$n\$ does, and is not exactly natural.

Note that the bound says that the expected makespan can increase as any of the parameters increase: the variance \$sigma^2\$, the number of processors \$n\$, or the number of tasks per processor \$k\$.

For your second question, the low-variance scenario resulting in a larger makespan seems to be an unlikely outcome of a thought experiment. Let \$X = max_^m X_i\$ denote the makespan for the first distribution, and \$Y = max_^m Y_i\$ for the second (with all other parameters the same). Here \$X_i\$ and \$Y_i\$ denote the sums of \$k\$ task durations corresponding to processor \$i\$ under the two distributions. For all \$x ge kmu\$, independence yields \$ Pr[X le x] = prod_^m Pr[X_i le x] le prod_^m Pr[Y_i le x] = Pr[Y le x]. \$ Since most of the mass of the probability distribution of the maximum will be above its mean, \$E[X]\$ will therefore tend to be larger than \$E[Y]\$. This is not a completely rigorous answer, but in short, the second case seems preferable.

## Acknowledgments

Thanks to Valeriano Iranzo, Silvia Martínez, Jesús Alcolea, Andrés Moya, Manuel Serra, Jun Otsuka, Andy Gardner and an anonymous referee for useful comments on an earlier version of this paper. Special thanks to Andy Gardner and Samir Okasha for clarify me their position on the Price equation via personal communication. I also wish to thank Sean H. Rice for his insightful work and for clarify me his view on the Price equation and axiomatic theories via personal communication. Thanks to Bruce Walsh for sending me his great draft on the Price equation. I am grateful to Vicent Picó for providing me with insightful feedback on Newtonian mechanics and key concepts on physics, and also on previous drafts of this paper. I am also grateful to Jesús Alcolea for enlightening discussions on logic and mathematics.