Estimating TMRCA and mutation rates for the phase 3 Y chromosome STR clusters via ASD estimates
John McEwan
11th March 2005
Disclaimer
These analyses are being undertaken to investigate the history of the R1b clusters; no claims are made about these clusters or the methods used. They are provided for information purposes only. In particular, many of the R1b clusters are not well supported, and genealogical and anthro-genealogical STR mutation estimates are a hotly disputed area.
Background
The phase 3 clustering was done using the distance measure Da, which is optimized for identifying branches but is not linear with branch age (Takezaki & Nei 1996). A distance measure that is approximately linear with age is based on the ASD measure (Goldstein et al 1995), which is commonly used in many forms of “model free” estimates of haplotype age and TMRCA (Stumpf & Goldstein 2001; Zhivotovsky et al 2004). However, there has been considerable controversy about whether individual loci mutate at different rates, about the actual mutation rate estimates, and about the impact of population size and expansion on these estimates. In addition, it appears that “effective” mutation rates (estimated from known divergence dates) are lower than estimates derived from observed mutation rates. The exact reasons are still being investigated, but this appears to be a consequence of a deviation from the neutral model.
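To illustrate why ASD is approximately linear with age, here is a minimal Python simulation sketch. It assumes a symmetric single-step mutation model at a constant per-locus rate and, for simplicity, independent lineages descending from a single founder (a star genealogy, roughly what a rapid expansion produces). The rate, times and sample sizes are arbitrary illustrative values, not estimates from this analysis.

```python
import random


def simulate_star_asd(mu, generations, n_lineages, n_loci, seed=1):
    """Mean squared change in repeat count from the founder after `generations`.

    ASSUMPTIONS: star genealogy (every lineage evolves independently from one
    founder) and symmetric single-step mutations at rate `mu` per locus per
    generation, modelled as a Poisson process in continuous time.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_lineages * n_loci):
        t = rng.expovariate(mu)  # waiting time to the first mutation
        displacement = 0
        while t < generations:
            displacement += rng.choice((-1, 1))  # gain or lose one repeat
            t += rng.expovariate(mu)             # waiting time to the next one
        total += displacement ** 2
    return total / (n_lineages * n_loci)


MU = 0.002  # illustrative per-locus rate per generation, not an estimate
for generations in (100, 250, 500):
    asd = simulate_star_asd(MU, generations, n_lineages=500, n_loci=20)
    print(f"t={generations:3d}  ASD={asd:.3f}  mu*t={MU * generations:.3f}")
```

Under these assumptions the simulated ASD tracks μt. With a real (non-star) genealogy the expected value is the same, but the variance is larger because related lineages share mutations.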
In this analysis I concentrate on the R1b clusters. Individuals in these clusters are known to have expanded dramatically since the Last Glacial Maximum (LGM), and populations have typically grown at 3.5% per annum over that period. In addition, I use the “effective” mutation rate derived by Zhivotovsky et al. (2004), corrected for the different markers used.
For the ASD calculation itself, the equation is best described by Stumpf and Goldstein (2001).
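As a hedged sketch of the founder-based form of that calculation (the notation is introduced here and is not taken from the source): for n chromosomes typed at L loci, let x_il be the repeat count of chromosome i at locus l and a_l the founder (ancestral) allele, which in practice is approximated by the modal allele. Under a symmetric single-step mutation model with per-locus rate μ per generation, the expected ASD grows linearly with the number of generations t since the founder. Dividing by μ and multiplying by the generation length g converts it to years:

```latex
\mathrm{ASD} = \frac{1}{L}\sum_{l=1}^{L}\frac{1}{n}\sum_{i=1}^{n}\left(x_{il} - a_l\right)^2,
\qquad
\mathbb{E}[\mathrm{ASD}] = \mu t,
\qquad
\hat{T}_{\text{years}} = g\,\frac{\mathrm{ASD}}{\mu}
```

With μ = 2.09 × 0.0007 and g = 25, the last step appears consistent with the years reported in Table 1.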
In the current context, the ASD was calculated for each of the clusters defined in the phase 3 analysis, and the ASD averaged over markers was calculated for every cluster, thereby providing an estimate of each cluster's relative TMRCA. Ages in years were then estimated using a corrected mutation rate, based on the value of 0.0007 per generation and the generation length of 25 years from Zhivotovsky et al. (2004). In addition, the average ASD over all clusters was calculated for each marker, thereby providing an estimate of individual marker mutation rates. These are also tabulated.
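The per-cluster and per-marker averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It assumes each cluster is a list of haplotypes (repeat counts in a fixed marker order). As an assumption the source does not state, it measures squared differences from the cluster's modal allele as a stand-in for the founder. The cluster names and haplotypes are hypothetical.

```python
from collections import Counter
from statistics import mean


def marker_asd(repeats):
    """ASD at one marker: mean squared difference from the modal allele.

    ASSUMPTION: the modal allele stands in for the unknown founder allele.
    """
    mode = Counter(repeats).most_common(1)[0][0]
    return mean((r - mode) ** 2 for r in repeats)


def cluster_asd(haplotypes):
    """Return (per-marker ASDs, ASD averaged over markers) for one cluster.

    `haplotypes` holds one list of repeat counts per individual, with the
    markers in the same order for everyone.
    """
    per_marker = [marker_asd(column) for column in zip(*haplotypes)]
    return per_marker, mean(per_marker)


def marker_averages(clusters):
    """Unweighted average of each marker's ASD over all clusters."""
    per_cluster = [cluster_asd(h)[0] for h in clusters.values()]
    return [mean(column) for column in zip(*per_cluster)]


# Hypothetical three-marker haplotypes for two made-up clusters.
clusters = {
    "A": [[13, 24, 14], [13, 24, 14], [13, 25, 14], [12, 24, 14]],
    "B": [[14, 23, 14], [14, 23, 15], [14, 23, 14]],
}
for name, haps in clusters.items():
    print(name, cluster_asd(haps))
print(marker_averages(clusters))
```

The cluster average plays the role of the aveASD column of Table 1, and the marker averages play the role of the mean ASD column of Table 2.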
Results
Estimates of the TMRCA for the various STR-derived clusters
The results are presented in Table 1 below, which also links to the complete table.
In the current context it must be remembered that these are estimates of the TMRCA for each cluster; they do not estimate the time since the various clusters diverged, especially where the clusters correspond to haplogroups. This is because narrow bottlenecks, especially during the last ice age, may mean that the last common ancestor of a group is considerably more recent than the SNP mutation that defines its haplogroup. In addition, the estimates are subject to considerable error. Perhaps surprising is the TMRCA estimate of 7390 (SEM = 1178) years for R1b, when the R1b1c SNP is known to have occurred before the start of the LGM, approximately 20 kyr BP. In part this reflects sampling, because most individuals will descend from the Iberian LGM refugium population, which expanded rapidly from a small group after the end of the last ice age about 9000 years BP. However, a further factor to bear in mind is the “effective” mutation rate used: it may still be too high.
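Because the age estimate is the ASD divided by the mutation rate, any error in the rate passes straight through to the age. A minimal Python sketch, assuming the corrected rate from the Table 1 footnote (2.09 × 0.0007 per generation) and 25-year generations, applied to the R1b row of Table 1:

```python
GENERATION_YEARS = 25   # generation length used in the text
BASE_RATE = 0.0007      # Zhivotovsky et al. (2004) effective rate per generation
SCALE = 2.09            # correction for the 37-marker panel (Table 1 footnote)


def tmrca_years(asd, sem_asd, rate=SCALE * BASE_RATE, gen=GENERATION_YEARS):
    """Convert an average ASD and its SEM to years: t = gen * ASD / rate."""
    return gen * asd / rate, gen * sem_asd / rate


# R1b row of Table 1: aveASD 0.432, SEM 0.069.
years, sem = tmrca_years(0.432, 0.069)
print(round(years), round(sem))

# Sensitivity: if the true effective rate were half as large, the age doubles.
slower, _ = tmrca_years(0.432, 0.069, rate=SCALE * BASE_RATE / 2)
print(round(slower))
```

This gives about 7382 years (SEM about 1179), which matches Table 1's 7390 (1178) to within the rounding of the published ASD. Halving the rate raises the estimate to about 14764 years.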
R1bSTR19 aka North West Irish, or IMH
Perhaps more surprising are the estimates for the “well established” R1b STR clusters, R1bSTR19, R1bSTR22 and R1bSTR47. The Irish cluster has a particularly recent TMRCA of 3362 (SEM = 609) years BP, and its best estimate is rather more recent than those of both the Frisian and Scots clusters. The cluster itself is quite distinct from the rest of R1b, which suggests that this group expanded rapidly from a small group around that time. Based on its current geographical distribution, this group is thought to represent the original hunter/gatherer population that inhabited Ireland after the LGM (~9000 years BP). The expansion may have been due to rapid population growth after the introduction of agriculture (~6500 years BP), or after metalworking began ~4500 years BP, when Beaker ware also made its appearance. This cultural marker was most probably also associated with the introduction of horses and of the Celtic language. More recent cultural events may also have contributed.
R1bSTR22 aka Frisian or Germanic
This cluster is exclusively associated with the S21+ SNP, although the SNP mutation itself is much older. It is thought to have originated in north-eastern Germany, although its centre of origin is disputed. Its estimated age of 5175 (SEM = 813) years BP coincides best with the cultural changes that occurred around the time the Kurgan expansion reached northern Europe.
R1bSTR47 aka Scots
At 4686 (SEM = 903) years BP this group is intermediate between the Irish and Frisian clusters, but again its expansion may be related to a cultural shift, with the introduction of the horse and altered agriculture. Like the Irish cluster, this group is thought to represent the indigenous hunter/gatherer inhabitants of Britain who settled after the LGM.
Another noticeable feature is that most of the other R1b clusters are more diverse and have older TMRCAs.
In summary, these estimates are presented as an independent chronological assessment of the clusters' ages. They do not estimate when the clusters split from other groups (typically much earlier), but rather the date of their initial rapid expansion. The most important factor affecting the estimates appears to be the “effective” mutation rate used.
Relative mutation rates of the various markers in the 37-marker FTDNA panel
This is a fraught exercise, but Table 2 is provided for consideration. The ASD approach above suggests that relative rates can be estimated via the ASD, and for the sake of completeness these have been converted to mutation rates using the Zhivotovsky et al. (2004) estimate. They are provided simply for comparison, and no claims are made for their accuracy. For instance, they are not weighted for the number of observations in each cluster (each cluster is independent, so the square root of its size would be an appropriate weighting factor). What is apparent is the rather dramatic difference in relative rate between individual markers: mean ASD ranges from 0.044 for DYS426 to 2.204 for CDYa.
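The square-root weighting mentioned above is simple to apply. In this minimal Python sketch, each cluster's ASD for a given marker is weighted by the square root of the cluster size. The function name, ASD values and cluster sizes are hypothetical.

```python
from math import sqrt


def sqrt_weighted_mean(asds, sizes):
    """Mean of one marker's ASD over clusters, weighting cluster i by sqrt(n_i)."""
    if len(asds) != len(sizes):
        raise ValueError("need one cluster size per ASD value")
    weights = [sqrt(n) for n in sizes]
    return sum(w * a for w, a in zip(weights, asds)) / sum(weights)


# Hypothetical marker: ASD 0.2 in a cluster of 100, ASD 0.6 in a cluster of 4.
print(sqrt_weighted_mean([0.2, 0.6], [100, 4]))  # weights are 10 and 2
print(sum([0.2, 0.6]) / 2)                        # unweighted, as in Table 2
```

Here the weighted mean is about 0.267, compared with the unweighted 0.4. Square-root weights damp the influence of small clusters without letting the largest clusters dominate as completely as weighting by n would.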
Table 1.
Phase 3 analysis clusters: number of individuals (n), ASD averaged over markers with its SEM, and TMRCA estimates in years with their SEM. (Full table, including ASD estimates for each marker, available at this link.)
Cluster (37 STR) | n | aveASD | SEM(ASD) | TMRCA (years)¹ | SEM (years)
AB | 9 | 1.727 | 0.322 | 29512 | 5499
E | 143 | 1.075 | 0.182 | 18361 | 3111
F | 12 | 4.322 | 1.524 | 73856 | 26040
G | 85 | 0.666 | 0.135 | 11385 | 2311
HO3 | 3 | 0.925 | 0.265 | 15805 | 4522
I | 800 | 0.769 | 0.109 | 13147 | 1868
I1a | 550 | 0.375 | 0.074 | 6401 | 1259
I1b | 65 | 1.231 | 0.344 | 21042 | 5882
I1c | 168 | 0.656 | 0.157 | 11214 | 2681
I? | 19 | 0.936 | 0.154 | 15992 | 2635
Ix | 17 | 0.451 | 0.091 | 7703 | 1557
K2 | 17 | 1.076 | 0.196 | 18394 | 3353
N | 8 | 0.410 | 0.086 | 7007 | 1472
Q | 24 | 1.021 | 0.168 | 17440 | 2866
R1a | 196 | 0.517 | 0.078 | 8830 | 1341
R1b | 2553 | 0.432 | 0.069 | 7390 | 1178

E | 143 | 1.075 | 0.182 | 18361 | 3111
E3b | 105 | 1.037 | 0.209 | 17721 | 3575
E3bSTR1 | 51 | 1.270 | 0.297 | 21708 | 5070
E3bSTR2 | 46 | 0.434 | 0.091 | 7419 | 1563
E3bSTR3 | 8 | 0.525 | 0.115 | 8977 | 1959
E3a | 38 | 0.594 | 0.100 | 10156 | 1703

G | 85 | 0.666 | 0.135 | 11385 | 2311
GG2 | 56 | 0.465 | 0.111 | 7947 | 1900
Gx | 7 | 0.231 | 0.056 | 3940 | 951
GG2STR2 | 22 | 0.795 | 0.160 | 13587 | 2727
GG2(Fx) | 8 | 0.248 | 0.139 | 4236 | 2381

I | 800 | 0.769 | 0.109 | 13147 | 1868
I1c | 168 | 0.656 | 0.157 | 11214 | 2681
I1cSTR1 | 103 | 0.467 | 0.118 | 7979 | 2023
IslesI1c | 33 | 0.277 | 0.061 | 4736 | 1043
I1cSTR2 | 7 | 0.371 | 0.108 | 6334 | 1845
RootsI1c | 18 | 0.430 | 0.081 | 7344 | 1376
I1cSTR3 | 7 | 0.428 | 0.138 | 7314 | 2360
I1b | 65 | 1.231 | 0.344 | 21042 | 5882
I1bSTR1 | 39 | 0.552 | 0.095 | 9433 | 1631
WesternI1b | 16 | 1.324 | 0.919 | 22618 | 15699
I1b2 | 10 | 0.995 | 0.602 | 16996 | 10289
I1a | 550 | 0.375 | 0.074 | 6401 | 1259
I1aSTR1 | 42 | 0.378 | 0.087 | 6453 | 1492
I1aSTR2 | 56 | 0.344 | 0.071 | 5879 | 1210
I1aSTR3 | 37 | 0.278 | 0.076 | 4755 | 1304
I1aSTR4 | 69 | 0.269 | 0.062 | 4591 | 1052
I1aSTR5 | 71 | 0.285 | 0.081 | 4874 | 1376
I1aSTR6 | 59 | 0.317 | 0.050 | 5409 | 857
I1aSTR7 | 94 | 0.293 | 0.068 | 5008 | 1167
I1aSTR8 | 47 | 0.352 | 0.098 | 6007 | 1668
I1aSTR9 | 20 | 0.382 | 0.095 | 6529 | 1618
I1aSTR10 | 55 | 0.193 | 0.045 | 3292 | 761
Ix | 17 | 0.451 | 0.091 | 7703 | 1557
I? | 19 | 0.936 | 0.154 | 15992 | 2635

J | 119 | 1.114 | 0.192 | 19030 | 3275
J2x | 18 | 0.360 | 0.078 | 6151 | 1329
J1 | 21 | 1.161 | 0.339 | 19848 | 5796
J2e | 19 | 0.362 | 0.069 | 6192 | 1173
J2STR4 | 8 | 0.812 | 0.165 | 13877 | 2816
J2 | 43 | 0.766 | 0.135 | 13081 | 2312

R1a | 196 | 0.517 | 0.078 | 8830 | 1341
R1b | 2553 | 0.432 | 0.069 | 7390 | 1178
R1bSTR1 | 57 | 0.340 | 0.071 | 5805 | 1210
R1bSTR2 | 56 | 0.380 | 0.090 | 6487 | 1544
R1bSTR3 | 53 | 0.236 | 0.038 | 4035 | 647
R1bSTR4 | 15 | 0.474 | 0.151 | 8108 | 2585
R1bSTR5 | 17 | 0.318 | 0.088 | 5437 | 1506
R1bSTR6 | 36 | 0.465 | 0.105 | 7939 | 1787
R1bSTR7 | 69 | 0.306 | 0.057 | 5222 | 978
R1bSTR8 | 29 | 0.481 | 0.133 | 8223 | 2273
R1bSTR9 | 57 | 0.380 | 0.118 | 6500 | 2023
R1bSTR10 | 60 | 0.309 | 0.095 | 5281 | 1622
R1bSTR11 | 53 | 0.288 | 0.079 | 4917 | 1343
R1bSTR12 | 41 | 0.378 | 0.098 | 6453 | 1667
R1bSTR13 | 14 | 0.302 | 0.093 | 5160 | 1593
R1bSTR14 | 43 | 0.333 | 0.071 | 5698 | 1220
R1bSTR15 | 60 | 0.308 | 0.058 | 5264 | 987
R1bSTR16 | 39 | 0.356 | 0.086 | 6089 | 1466
R1bSTR17 | 57 | 0.406 | 0.101 | 6943 | 1734
R1bSTR18 | 37 | 0.328 | 0.044 | 5613 | 750
R1bSTR19Irish | 184 | 0.197 | 0.036 | 3362 | 609
R1bSTR20 | 37 | 0.331 | 0.061 | 5654 | 1049
R1bSTR21 | 26 | 0.319 | 0.060 | 5459 | 1019
R1bSTR22Frisian | 117 | 0.303 | 0.048 | 5175 | 813
R1bSTR23 | 27 | 0.516 | 0.202 | 8811 | 3451
R1bSTR24 | 64 | 0.485 | 0.085 | 8296 | 1448
R1bSTR25 | 82 | 0.375 | 0.078 | 6410 | 1341
R1bSTR25a | 35 | 0.355 | 0.079 | 6060 | 1354
R1bSTR26 | 26 | 0.494 | 0.160 | 8439 | 2741
R1bSTR27 | 55 | 0.345 | 0.066 | 5899 | 1124
R1bSTR28 | 56 | 0.471 | 0.068 | 8044 | 1154
R1bSTR29 | 19 | 0.308 | 0.083 | 5256 | 1416
R1bSTR30 | 33 | 0.341 | 0.068 | 5828 | 1163
R1bSTR31 | 9 | 0.394 | 0.096 | 6739 | 1646
R1bSTR32 | 43 | 0.379 | 0.063 | 6482 | 1084
R1bSTR33 | 21 | 0.286 | 0.061 | 4887 | 1049
R1bSTR34 | 45 | 0.385 | 0.064 | 6572 | 1092
R1bSTR35 | 27 | 0.385 | 0.092 | 6575 | 1578
R1bSTR36 | 39 | 0.369 | 0.072 | 6304 | 1228
R1bSTR37 | 64 | 0.389 | 0.070 | 6649 | 1201
R1bSTR38 | 33 | 0.300 | 0.075 | 5132 | 1277
R1bSTR39 | 69 | 0.429 | 0.078 | 7331 | 1329
R1bSTR40 | 65 | 0.309 | 0.078 | 5273 | 1338
R1bSTR41 | 31 | 0.374 | 0.076 | 6399 | 1302
R1bSTR42 | 79 | 0.422 | 0.079 | 7213 | 1348
R1bSTR43 | 97 | 0.391 | 0.062 | 6675 | 1059
R1bSTR44 | 91 | 0.289 | 0.046 | 4941 | 779
R1bSTR45 | 38 | 0.344 | 0.064 | 5870 | 1093
R1bSTR46 | 25 | 0.326 | 0.064 | 5579 | 1098
R1bSTR47Scots | 133 | 0.274 | 0.053 | 4686 | 903
R1bSTR48 | 38 | 0.505 | 0.116 | 8623 | 1974
R1bSTR49 | 52 | 0.378 | 0.071 | 6454 | 1207
¹ Years were estimated by dividing the ASD by the “corrected” mutation rate and multiplying by the average generation length (25 years).
The corrected mutation rate was derived from the average over the Zhivotovsky et al. (2004) markers and their estimated mutation rate of 0.0007 per 25-year generation, scaled for the additional markers used in this analysis (DYS461 was assumed equivalent to the average): 2.09 × 0.0007 ≈ 0.00146 per generation.
Table 2. Relative ASD values and estimated mutation rates (per generation) for the 37-marker FTDNA panel. The mutation estimates are derived from the Zhivotovsky et al. (2004) value, based on the shaded markers.
Marker | mean ASD | SEM ASD | mutation est. | SEM mutation
393 | 0.158 | 0.019 | 0.000452 | 0.000055
390 | 0.354 | 0.045 | 0.001013 | 0.000129
19 | 0.203 | 0.024 | 0.000582 | 0.000069
391 | 0.190 | 0.018 | 0.000544 | 0.000052
385a | 0.503 | 0.078 | 0.001442 | 0.000222
385b | 0.917 | 0.164 | 0.002627 | 0.000469
426 | 0.044 | 0.014 | 0.000126 | 0.000040
388 | 0.186 | 0.048 | 0.000534 | 0.000137
439 | 0.404 | 0.036 | 0.001158 | 0.000103
389i | 0.242 | 0.029 | 0.000693 | 0.000082
392 | 0.135 | 0.033 | 0.000388 | 0.000094
389ii* | 0.326 | 0.042 | 0.000935 | 0.000121
458 | 0.819 | 0.060 | 0.002347 | 0.000171
459a | 0.111 | 0.020 | 0.000317 | 0.000057
459b | 0.127 | 0.018 | 0.000364 | 0.000050
455 | 0.052 | 0.018 | 0.000148 | 0.000052
454 | 0.057 | 0.017 | 0.000163 | 0.000047
447 | 0.733 | 0.136 | 0.002099 | 0.000391
437 | 0.193 | 0.069 | 0.000552 | 0.000198
448 | 0.248 | 0.041 | 0.000710 | 0.000118
449 | 1.318 | 0.129 | 0.003777 | 0.000369
464a | 0.447 | 0.080 | 0.001280 | 0.000229
464b | 0.468 | 0.058 | 0.001340 | 0.000166
464c | 0.424 | 0.036 | 0.001215 | 0.000104
464d | 0.370 | 0.040 | 0.001061 | 0.000115
460 | 0.248 | 0.020 | 0.000710 | 0.000057
H4 | 0.221 | 0.026 | 0.000633 | 0.000075
YCAIIa | 0.557 | 0.251 | 0.001595 | 0.000718
YCAIIb | 0.497 | 0.077 | 0.001423 | 0.000221
456 | 0.737 | 0.146 | 0.002111 | 0.000419
607 | 0.446 | 0.083 | 0.001279 | 0.000237
576 | 1.114 | 0.091 | 0.003190 | 0.000260
570 | 1.025 | 0.100 | 0.002936 | 0.000286
CDYa | 2.204 | 0.582 | 0.006313 | 0.001667
CDYb | 2.003 | 0.252 | 0.005737 | 0.000723
442 | 0.727 | 0.375 | 0.002084 | 0.001075
438 | 0.113 | 0.017 | 0.000322 | 0.000048
References
Stumpf MP, Goldstein DB. 2001. Genealogical and evolutionary inference with the human Y chromosome. Science 291:1738-42.
Takezaki N, Nei M. 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-99.
Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G, Chambers GK, Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, Kalaydjieva L. 2004. The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 74:50-61.