Estimating TMRCA and mutation rates for the phase 3 Y chromosome STR clusters via ASD estimates

 

John McEwan

 

11th March 2005

 

Disclaimer

These analyses are being undertaken to investigate the history of the R1b clusters; no claims are made about these clusters or the methods used. They are provided for information purposes only. Specifically, many of the R1b clusters are not well supported; in addition, genealogical and anthro-genealogical STR mutation estimates are a hotly disputed area.

 

Background

The phase 3 clustering was done using the distance measure DA, which optimizes the identification of branches but is not linear with branch age (Takezaki & Nei 1996). A distance measure that is approximately linear with age is based on the ASD (average squared difference in repeat number) measure (Goldstein et al. 1995), and it is commonly used in many forms of “model free” estimates of haplotype age and time to the most recent common ancestor (TMRCA) (Stumpf & Goldstein 2001; Zhivotovsky et al. 2004). However, there has been considerable controversy about whether individual loci mutate at different rates, about the actual mutation rate estimates, and about the impact of population size and expansion on these estimates. In addition, it appears that “effective” mutation rates (estimated on the basis of known divergence dates) are lower than estimates derived from observed mutation rates. The exact reasons are still being investigated, but this appears to be a consequence of a deviation from the neutral model.

 

In this analysis I am concentrating on the R1b clusters. These clusters are known to have expanded dramatically since the Last Glacial Maximum (LGM), and populations have typically grown at 3.5% per annum over that period. In addition, I am using the “effective” mutation rate derived by Zhivotovsky et al. (2004), corrected for the different markers used.

 

Regarding the ASD calculations themselves, the equation is best described by Stumpf and Goldstein (2001).
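For reference, a common textbook statement of the ASD estimator under the symmetric single-step mutation model is sketched below. It is not a transcription of Stumpf and Goldstein's (2001) equation: the notation is introduced here, with x_{iℓ} the repeat count of individual i at marker ℓ, x_{0ℓ} an assumed reference allele (ancestral or modal), n the number of individuals in a cluster, L the number of markers, t the number of generations since the common ancestor and μ̄ the average per-generation mutation rate. The final conversion simply restates the one given in the Table 1 footnote.

```latex
\begin{aligned}
\mathrm{ASD}_{\ell} &= \frac{1}{n}\sum_{i=1}^{n}\left(x_{i\ell}-x_{0\ell}\right)^{2},
\qquad
\overline{\mathrm{ASD}} = \frac{1}{L}\sum_{\ell=1}^{L}\mathrm{ASD}_{\ell},\\
\mathbb{E}\!\left[\overline{\mathrm{ASD}}\right] &\approx \bar{\mu}\,t,
\qquad
T_{\mathrm{years}} = \frac{\overline{\mathrm{ASD}}}{\mu_{\mathrm{eff}}}\times g,
\quad \mu_{\mathrm{eff}} = 2.09\times 0.0007,\; g = 25 .
\end{aligned}
```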

 

In the current context, the ASD was calculated for each of the clusters defined in the phase 3 analysis. The ASD averaged over all markers was then calculated for each cluster, providing an estimate of the cluster’s relative TMRCA, and ages in years were estimated using the corrected mutation rate, based on the value of 0.0007 per generation and the generation length of 25 years from Zhivotovsky et al. (2004). In addition, the ASD averaged over all clusters was calculated for each marker, providing an estimate of the relative mutation rate of each marker. These are also tabulated.
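The calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the phase 3 analysis: squared deviations are measured from the cluster mean at each marker (an ancestral or modal haplotype is an equally common reference, and which one was used here is not stated), the function names are hypothetical, and the constants are the 0.0007 per generation, 2.09 scaling and 25-year generation quoted in the Table 1 footnote.

```python
from statistics import mean

# Constants quoted in this document (Table 1 footnote).
BASE_RATE = 0.0007       # Zhivotovsky et al. (2004) effective rate per marker per generation
PANEL_SCALE = 2.09       # scaling of that rate for the 37-marker panel
GENERATION_YEARS = 25    # generation length in years
CORRECTED_RATE = PANEL_SCALE * BASE_RATE  # about 0.001463 per generation


def marker_asd(repeats):
    """Average squared deviation of repeat counts at one marker.

    Assumption: deviations are measured from the cluster mean; an
    ancestral or modal allele could be used as the reference instead.
    """
    centre = mean(repeats)
    return mean((r - centre) ** 2 for r in repeats)


def cluster_ave_asd(haplotypes):
    """Average of the per-marker ASDs for one cluster.

    haplotypes: list of equal-length lists of repeat counts, one per person.
    zip(*haplotypes) yields one tuple of repeat counts per marker.
    """
    return mean(marker_asd(column) for column in zip(*haplotypes))


def tmrca_years(ave_asd, sem_asd=None):
    """Convert an average ASD (and optionally its SEM) into years."""
    years = ave_asd / CORRECTED_RATE * GENERATION_YEARS
    if sem_asd is None:
        return years
    return years, sem_asd / CORRECTED_RATE * GENERATION_YEARS


# Toy cluster of four 3-marker haplotypes (illustrative values only).
toy = [[13, 24, 14], [13, 24, 14], [14, 24, 14], [13, 25, 15]]
print(cluster_ave_asd(toy))

# Table 1 summary values for R1b (aveASD 0.432, SEM 0.069).
print(tmrca_years(0.432, 0.069))
```

For the R1b row this gives roughly 7382 and 1179 years against the tabulated 7390 (SEM 1178); the small gap is presumably due to rounding of the tabulated aveASD and of the 2.09 factor.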

 

Results

 

Estimates of the TMRCA for the various STR derived clusters

The results are presented in Table 1 below, which also links to the complete table.

 

In the current context it must be remembered that these are estimates of the TMRCA of each cluster; they do not estimate the time since the divergence of the various clusters, especially where these correspond to haplogroups. This is because narrow bottlenecks, especially during the last ice age, mean that the last common ancestor of a group may be much more recent than the SNP mutation that defines its haplogroup. The estimates are also subject to considerable error. Perhaps surprising is the TMRCA estimate of 7390 (SEM=1178) yrs bp for R1b, when it is known that the R1b1c SNP must have occurred before the start of the LGM approximately 20 kyr bp. In part this is a consequence of sampling, because most individuals will descend from the Iberian LGM refugium, which expanded rapidly from a small group after the end of the last ice age about 9000 yrs bp. However, a further factor to be borne in mind is the “effective” mutation rate used: it may still be too high.
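To put the last point in numbers: if the R1b TMRCA were as old as the ~20 kyr bp LGM date, the same conversion would require the effective rate below. This is illustrative arithmetic only, using the Table 1 aveASD for R1b and the 25-year generation; as noted above, the TMRCA of the sampled R1b lineages may legitimately be younger than the R1b1c SNP, so this is an upper-bound comparison rather than a corrected rate.

```latex
\mu_{\mathrm{implied}}
= \frac{\overline{\mathrm{ASD}}\times g}{T}
= \frac{0.432 \times 25}{20\,000}
= 5.4\times10^{-4}\ \text{per generation}
\approx 0.37 \times \left(2.09 \times 0.0007\right)
```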

 

R1bSTR19 aka North West Irish, or IMH

Perhaps more surprising are the estimates for the “well established” R1b STR clusters, R1bSTR19, R1bSTR22 and R1bSTR47. The Irish cluster has a particularly recent TMRCA of 3362 (SEM=609) yrs bp, and its best estimate is rather more recent than those of both the Frisian and Scots clusters. The cluster itself is quite distinct from the rest of R1b, which could imply that this group rapidly expanded from a small group around that time. Based on its current geographical distribution, it is thought that this group represents the original hunter/gatherer population that inhabited Ireland after the LGM (~9000 yrs bp). The expansion may have been due to the rapid growth in population after the introduction of agriculture (~6500 yrs bp), or after metalworking began ~4500 yrs bp, when Beaker-ware also made its appearance. This cultural marker was most probably also associated with the introduction of horses and the Celtic language. More recent cultural events may also have contributed.

 

R1bSTR22 aka Frisian or Germanic

This cluster is exclusively associated with the S21+ SNP, although the SNP mutation itself is much older. It is thought to have originated in north-eastern Germany, although its centre of origin is disputed. Its estimated age of 5175 (SEM=813) yrs bp coincides best with the cultural changes that occurred around the time the Kurgan expansion reached northern Europe.

 

R1bSTR47 aka Scots

At 4686 (SEM=903) yrs bp this group is intermediate between the Irish and Frisian clusters, but again may be related to a cultural shift with the introduction of the horse and altered agriculture. Like the Irish cluster, this group is thought to represent the indigenous hunter/gatherer inhabitants of Britain who settled after the LGM.

 

Another noticeable feature is that most of the other R1b clusters are more diverse and have older TMRCAs.

 

In summary, these estimates are presented as an independent chronological assessment of the clusters’ ages. They do not estimate when the clusters split from other groups (typically much earlier), but rather the date of their initial rapid expansion. Observation suggests that the most important factor affecting the estimates is the “effective” mutation rate used.

 

Relative mutation rates of the various markers in the FTDNA 37-marker panel.

This is a fraught exercise, but Table 2 is provided for consideration. The ASD relationship described above suggests that the relative rates can be estimated via ASD, and for the sake of completeness these have been converted to mutation rates using the Zhivotovsky et al. (2004) estimate. They are provided simply for comparison and no claims are made for their accuracy. For instance, they are not weighted by the number of observations in each cluster (as each cluster is independent, the square root of the number of observations would be an appropriate weighting factor). What is apparent is the dramatic difference in relative rate among individual markers.
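The weighting mentioned above can be sketched as follows. This is a minimal illustration, not how Table 2 was produced (Table 2 is unweighted): it contrasts the plain mean of one marker's ASD across clusters with a mean weighted by the square root of each cluster's size, as the paragraph suggests. The function name and the three-cluster input are hypothetical.

```python
from math import sqrt


def marker_mean_asd(per_cluster, weighted=False):
    """Mean ASD of one marker across clusters.

    per_cluster: list of (n, asd) pairs, one per cluster, where n is the
    number of individuals in the cluster. With weighted=True each cluster
    is weighted by sqrt(n); otherwise every cluster counts equally.
    """
    if weighted:
        weights = [sqrt(n) for n, _ in per_cluster]
    else:
        weights = [1.0] * len(per_cluster)
    total = sum(w * asd for w, (_, asd) in zip(weights, per_cluster))
    return total / sum(weights)


# Hypothetical ASDs for one marker in three clusters of different sizes.
clusters = [(100, 0.20), (25, 0.40), (4, 0.60)]
print(round(marker_mean_asd(clusters), 4))                 # unweighted
print(round(marker_mean_asd(clusters, weighted=True), 4))  # sqrt(n)-weighted
```

With these inputs the unweighted mean is 0.4 and the weighted mean about 0.3059, because the largest cluster pulls the weighted value towards its own ASD.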

 

 

Table 1. Average ASD and TMRCA estimates for the phase 3 analysis clusters. (The full table, including ASD estimates for each marker, is available at this link.)

37 STR               n  aveASD  SEM(ASD) TMRCA (years)¹  SEM (years)
AB                   9   1.727     0.322          29512         5499
E                  143   1.075     0.182          18361         3111
F                   12   4.322     1.524          73856        26040
G                   85   0.666     0.135          11385         2311
HO3                  3   0.925     0.265          15805         4522
I                  800   0.769     0.109          13147         1868
I1a                550   0.375     0.074           6401         1259
I1b                 65   1.231     0.344          21042         5882
I1c                168   0.656     0.157          11214         2681
I?                  19   0.936     0.154          15992         2635
Ix                  17   0.451     0.091           7703         1557
K2                  17   1.076     0.196          18394         3353
N                    8   0.410     0.086           7007         1472
Q                   24   1.021     0.168          17440         2866
R1a                196   0.517     0.078           8830         1341
R1b               2553   0.432     0.069           7390         1178

E                  143   1.075     0.182          18361         3111
E3b                105   1.037     0.209          17721         3575
E3bSTR1             51   1.270     0.297          21708         5070
E3bSTR2             46   0.434     0.091           7419         1563
E3bSTR3              8   0.525     0.115           8977         1959
E3a                 38   0.594     0.100          10156         1703

G                   85   0.666     0.135          11385         2311
GG2                 56   0.465     0.111           7947         1900
Gx                   7   0.231     0.056           3940          951
GG2STR2             22   0.795     0.160          13587         2727
GG2(Fx)              8   0.248     0.139           4236         2381

I                  800   0.769     0.109          13147         1868
I1c                168   0.656     0.157          11214         2681
I1cSTR1            103   0.467     0.118           7979         2023
IslesI1c            33   0.277     0.061           4736         1043
I1cSTR2              7   0.371     0.108           6334         1845
RootsI1c            18   0.430     0.081           7344         1376
I1cSTR3              7   0.428     0.138           7314         2360
I1b                 65   1.231     0.344          21042         5882
I1bSTR1             39   0.552     0.095           9433         1631
WesternI1b          16   1.324     0.919          22618        15699
I1b2                10   0.995     0.602          16996        10289
I1a                550   0.375     0.074           6401         1259
I1aSTR1             42   0.378     0.087           6453         1492
I1aSTR2             56   0.344     0.071           5879         1210
I1aSTR3             37   0.278     0.076           4755         1304
I1aSTR4             69   0.269     0.062           4591         1052
I1aSTR5             71   0.285     0.081           4874         1376
I1aSTR6             59   0.317     0.050           5409          857
I1aSTR7             94   0.293     0.068           5008         1167
I1aSTR8             47   0.352     0.098           6007         1668
I1aSTR9             20   0.382     0.095           6529         1618
I1aSTR10            55   0.193     0.045           3292          761
Ix                  17   0.451     0.091           7703         1557
I?                  19   0.936     0.154          15992         2635

J                  119   1.114     0.192          19030         3275
J2x                 18   0.360     0.078           6151         1329
J1                  21   1.161     0.339          19848         5796
J2e                 19   0.362     0.069           6192         1173
J2STR4               8   0.812     0.165          13877         2816
J2                  43   0.766     0.135          13081         2312

R1a                196   0.517     0.078           8830         1341
R1b               2553   0.432     0.069           7390         1178
R1bSTR1             57   0.340     0.071           5805         1210
R1bSTR2             56   0.380     0.090           6487         1544
R1bSTR3             53   0.236     0.038           4035          647
R1bSTR4             15   0.474     0.151           8108         2585
R1bSTR5             17   0.318     0.088           5437         1506
R1bSTR6             36   0.465     0.105           7939         1787
R1bSTR7             69   0.306     0.057           5222          978
R1bSTR8             29   0.481     0.133           8223         2273
R1bSTR9             57   0.380     0.118           6500         2023
R1bSTR10            60   0.309     0.095           5281         1622
R1bSTR11            53   0.288     0.079           4917         1343
R1bSTR12            41   0.378     0.098           6453         1667
R1bSTR13            14   0.302     0.093           5160         1593
R1bSTR14            43   0.333     0.071           5698         1220
R1bSTR15            60   0.308     0.058           5264          987
R1bSTR16            39   0.356     0.086           6089         1466
R1bSTR17            57   0.406     0.101           6943         1734
R1bSTR18            37   0.328     0.044           5613          750
R1bSTR19Irish      184   0.197     0.036           3362          609
R1bSTR20            37   0.331     0.061           5654         1049
R1bSTR21            26   0.319     0.060           5459         1019
R1bSTR22Frisian    117   0.303     0.048           5175          813
R1bSTR23            27   0.516     0.202           8811         3451
R1bSTR24            64   0.485     0.085           8296         1448
R1bSTR25            82   0.375     0.078           6410         1341
R1bSTR25a           35   0.355     0.079           6060         1354
R1bSTR26            26   0.494     0.160           8439         2741
R1bSTR27            55   0.345     0.066           5899         1124
R1bSTR28            56   0.471     0.068           8044         1154
R1bSTR29            19   0.308     0.083           5256         1416
R1bSTR30            33   0.341     0.068           5828         1163
R1bSTR31             9   0.394     0.096           6739         1646
R1bSTR32            43   0.379     0.063           6482         1084
R1bSTR33            21   0.286     0.061           4887         1049
R1bSTR34            45   0.385     0.064           6572         1092
R1bSTR35            27   0.385     0.092           6575         1578
R1bSTR36            39   0.369     0.072           6304         1228
R1bSTR37            64   0.389     0.070           6649         1201
R1bSTR38            33   0.300     0.075           5132         1277
R1bSTR39            69   0.429     0.078           7331         1329
R1bSTR40            65   0.309     0.078           5273         1338
R1bSTR41            31   0.374     0.076           6399         1302
R1bSTR42            79   0.422     0.079           7213         1348
R1bSTR43            97   0.391     0.062           6675         1059
R1bSTR44            91   0.289     0.046           4941          779
R1bSTR45            38   0.344     0.064           5870         1093
R1bSTR46            25   0.326     0.064           5579         1098
R1bSTR47Scots      133   0.274     0.053           4686          903
R1bSTR48            38   0.505     0.116           8623         1974
R1bSTR49            52   0.378     0.071           6454         1207

 

 

 

 

 

 

 

 

 

 

 

 

¹ Years were estimated by dividing the average ASD by the "corrected" estimated mutation rate and multiplying by the average generation length (25 years).

The corrected mutation rate was derived by taking the Zhivotovsky et al. (2004) estimated mutation rate of 0.0007 per 25-year generation, which applies to the average of his markers, and scaling it for the additional markers used in this analysis (DYS461 was assumed equivalent to the average): 2.09 × 0.0007 ≈ 0.00146 per generation.

 

 

Table 2. Relative ASD values and estimated mutation rates (per generation) for the FTDNA 37-marker panel. The mutation estimates are scaled from the Zhivotovsky et al. (2004) value, which was derived from the shaded markers.

Marker    mean ASD  SEM ASD  mutation est  SEM mutation
393          0.158    0.019      0.000452      0.000055
390          0.354    0.045      0.001013      0.000129
19           0.203    0.024      0.000582      0.000069
391          0.190    0.018      0.000544      0.000052
385a         0.503    0.078      0.001442      0.000222
385b         0.917    0.164      0.002627      0.000469
426          0.044    0.014      0.000126      0.000040
388          0.186    0.048      0.000534      0.000137
439          0.404    0.036      0.001158      0.000103
389i         0.242    0.029      0.000693      0.000082
392          0.135    0.033      0.000388      0.000094
389ii*       0.326    0.042      0.000935      0.000121
458          0.819    0.060      0.002347      0.000171
459a         0.111    0.020      0.000317      0.000057
459b         0.127    0.018      0.000364      0.000050
455          0.052    0.018      0.000148      0.000052
454          0.057    0.017      0.000163      0.000047
447          0.733    0.136      0.002099      0.000391
437          0.193    0.069      0.000552      0.000198
448          0.248    0.041      0.000710      0.000118
449          1.318    0.129      0.003777      0.000369
464a         0.447    0.080      0.001280      0.000229
464b         0.468    0.058      0.001340      0.000166
464c         0.424    0.036      0.001215      0.000104
464d         0.370    0.040      0.001061      0.000115
460          0.248    0.020      0.000710      0.000057
H4           0.221    0.026      0.000633      0.000075
YCAIIa       0.557    0.251      0.001595      0.000718
YCAIIb       0.497    0.077      0.001423      0.000221
456          0.737    0.146      0.002111      0.000419
607          0.446    0.083      0.001279      0.000237
576          1.114    0.091      0.003190      0.000260
570          1.025    0.100      0.002936      0.000286
CDYa         2.204    0.582      0.006313      0.001667
CDYb         2.003    0.252      0.005737      0.000723
442          0.727    0.375      0.002084      0.001075
438          0.113    0.017      0.000322      0.000048
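The 2.09 scaling in the Table 1 footnote and the Table 2 mutation estimates can be reproduced approximately from the Table 2 mean ASD column, as sketched below. This is a reconstruction, not the author's code, and the names are hypothetical. The shading that identified the reference markers is lost in this copy, so the set used here (DYS19, DYS388, DYS389i, DYS389ii, DYS390, DYS391, DYS392, DYS393 and DYS439) is an inference, chosen because it reproduces both the 2.09 factor and the tabulated rates to within rounding. DYS461, which the footnote treats as equal to the average, is not in the panel; including it at the average would not change the reference mean.

```python
from statistics import mean

# Mean ASD per marker, copied from Table 2.
MEAN_ASD = {
    "393": 0.158, "390": 0.354, "19": 0.203, "391": 0.190, "385a": 0.503,
    "385b": 0.917, "426": 0.044, "388": 0.186, "439": 0.404, "389i": 0.242,
    "392": 0.135, "389ii": 0.326, "458": 0.819, "459a": 0.111, "459b": 0.127,
    "455": 0.052, "454": 0.057, "447": 0.733, "437": 0.193, "448": 0.248,
    "449": 1.318, "464a": 0.447, "464b": 0.468, "464c": 0.424, "464d": 0.370,
    "460": 0.248, "H4": 0.221, "YCAIIa": 0.557, "YCAIIb": 0.497, "456": 0.737,
    "607": 0.446, "576": 1.114, "570": 1.025, "CDYa": 2.204, "CDYb": 2.003,
    "442": 0.727, "438": 0.113,
}

# Assumption: the (lost) shaded reference markers, inferred from the numbers.
REFERENCE = ["19", "388", "389i", "389ii", "390", "391", "392", "393", "439"]
BASE_RATE = 0.0007  # Zhivotovsky et al. (2004), per generation

reference_mean = mean(MEAN_ASD[m] for m in REFERENCE)
# Ratio of the whole-panel mean ASD to the reference-marker mean ASD.
panel_scale = mean(MEAN_ASD.values()) / reference_mean


def marker_rate(marker):
    """Per-generation rate, scaled so the reference markers average BASE_RATE."""
    return BASE_RATE * MEAN_ASD[marker] / reference_mean


print(round(panel_scale, 3))
print(round(marker_rate("449"), 6))
```

The factor comes out at about 2.094 and the DYS449 rate at about 0.00378 per generation, matching the footnote and Table 2 to within rounding; that agreement is the only evidence for the inferred reference set.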

 

References

Stumpf MP, Goldstein DB. 2001. Genealogical and evolutionary inference with the human Y chromosome. Science 291:1738-42.

Takezaki N, Nei M. 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-99.

Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G, Chambers GK, Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, Kalaydjieva L. 2004. The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 74:50-61.