“How to” guide for DNA genealogists who have a Y chromosome haplotype

 

John McEwan

 

29th October 2005

 

Overview

This page takes a typical person who has a 37-marker FTDNA STR haplotype and describes what can be done using the resources available from this and selected other sites. It is not a definitive guide, and other approaches and sources should also be explored. The field is constantly changing, and this is part of the excitement of DNA genealogy. Before using this site, it is recommended that you read the material about Y haplogroups at http://worldfamilies.net/y-haplogroups.htm

 

Who should use this process?

Anybody who has received an extended DNA haplotype and wishes to find out more about just where their results fit into the “larger picture”.

 

First steps: data into Ysearch and Ybase

The first useful step is to load your results into Ysearch http://www.ysearch.org/ and Ybase http://www.ybase.org/default.asp . Each provides you with a unique ID and stores your haplotype in a publicly accessible format. Before doing this, please ensure you have the marker alleles available in the correct order and format by using the method described here. If known, you should also list the location of your earliest known ancestor on the site. Similarly, if you are lucky enough to have SNP results, please list them (actual SNP name and status, e.g. S21+) in the other comments field. Keep these sites updated. Where you have a haplogroup assignment, please identify whether it has been predicted or is based on SNP results.

 

The reason for doing this is that if you ask for help on a list server like Genealogy-DNA, you simply need to give your Ysearch ID rather than an error-prone listing in an email, where people are often unsure about the order of markers and the exact reporting convention used.

 

Close matches

The second step is simply to use these databases to explore what close DNA matches you have available. Each site has its own intricacies regarding search criteria, and I suggest you try many options. In the case of Ysearch, probably the best search is to pick as many markers as you have scored and then relax the criteria step by step from a perfect match until you get several hits. For instance, if you have results for 37 markers, the options shown in the screen below often provide a reasonable number of hits:
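
The idea of starting from a perfect match and loosening the criteria can be sketched in a few lines of Python. This is only an illustration of the logic, not how Ysearch itself works; the function names, the `min_hits` parameter and the five-marker haplotypes are all hypothetical.

```python
def genetic_distance(a, b):
    """Count the markers at which two equal-length haplotypes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def relaxed_search(query, database, min_hits=3):
    """Allow one more mismatch at a time until at least min_hits entries match."""
    for max_distance in range(len(query) + 1):
        hits = [name for name, haplotype in database.items()
                if genetic_distance(query, haplotype) <= max_distance]
        if len(hits) >= min_hits:
            return max_distance, hits
    # Fewer than min_hits entries exist, so return everything.
    return len(query), list(database)


# Hypothetical five-marker haplotypes, for illustration only.
query = (13, 25, 14, 11, 11)
database = {
    "A": (13, 25, 14, 11, 11),
    "B": (13, 25, 14, 11, 12),
    "C": (13, 24, 14, 11, 12),
    "D": (12, 24, 15, 10, 12),
}

max_distance, hits = relaxed_search(query, database, min_hits=3)
print(max_distance, hits)
```

With real data the same loop would simply run over all 37 markers; the point is that the threshold is widened only as far as needed to obtain a handful of hits.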

 

 

 

Other sites you should explore are YHRD http://www.yhrd.org/index.html and SMGF http://smgf.org/ .

 

In the case of YHRD you have to enter your values to undertake the search, and it accepts only a few STR values. Its benefit is that it has the best geographical spread of any database and gives an excellent geographical plot, which allows you to infer what region you may have come from. You cannot submit your results to be part of the database, and the hits provide only location, with no information about individuals.

 

SMGF is different again: it is a database with information about individual ancestors, but you cannot identify the participants. Its great value is that it has a large number of people genotyped at many markers and also a good geographic sampling of Europe. You can search the database by entering your haplotype (remember to save the search page to your favorites after you have done the search, because you can reuse it later without needing to re-enter the values). It is difficult to extract the exact haplotype of people who have mismatches at SMGF for the mismatched markers. It can be done, but it means you have to alter the allele value for the marker in the query.

 

Conversions between marker scoring schemes

Note that the haplotype scoring and marker ordering schemes differ between FTDNA, SMGF, YHRD, Ybase and others. Some of these issues are detailed at https://home.comcast.net/~whitathey/nomenclature.htm . It is important that the correct data is entered, but I find these differences tedious and mind-numbingly difficult to remember when entering data, so I use Dean McGee’s calculator http://www.mymcgee.com/tools/yutility.html to reorder the values and undertake the conversions for me. Simply enter the values of your haplotype in FTDNA order (where most people genotype) and all the conversions are done for you.

 

Predicting haplogroup

The next step, if you have not been SNP tested and your haplogroup has not already been estimated for you, is to predict your haplogroup. A haplogroup is a SNP-defined clade that typically has been reported in the scientific literature, and often something is known about its age, geographic origin and in some cases prehistory. This usually provides information about “deep prehistory”, typically from 5,000-50,000 years ago. This is in contrast to recent or “genealogical history”, 100-500 years ago, that close DNA matches (preferably 34 out of 37 or greater) with the same surname provide. The best site for this is Whit Athey’s haplotype predictor https://home.comcast.net/~whitathey/predictorinstr.htm . This site has the great feature that the prediction is updated as each allele value is entered, so you can identify which STR markers from your haplotype are essential to your classification. Once you have done this, revisit http://worldfamilies.net/y-haplogroups.htm and reread all the information about your haplogroup.

 

Next steps

Done all of the above? Congratulations, you are now ready to move on to the more difficult section. This is to attempt to derive some information about the “middle history” of your haplotype, typically the period between 500-5000 years ago. Note the word attempt: you must be aware that the results can only provide an indication, not certainty, and in many cases sufficient information is still not available. Why is this useful? Well, if you are attempting to answer questions like these, you are often asking about “middle history”:

·          I have no close matches in the database and none with the same surname, what are my likely origins?

·          My ancestors’ paper trail can only be traced back to after they emigrated to USA, Canada, Mexico, Australia, New Zealand …. What region of Europe did they originally come from?

·          The family name is reputedly Irish, Scottish, Norse, Norman, Anglo-Saxon or German. Is my haplotype consistent with this origin?

·          In our family project we have 3 distinct clusters of haplotypes within the one surname, where do they originate from?

 

Why is it difficult?

This is a tough question to answer. It has to do with the markers used, the size of the databases available, historical movement and growth of humans, and our current state of knowledge. SNP markers, typically used in anthropological studies, are excellent because in almost all cases those with a given mutation trace to a common ancestor. The problem is that for many groups the only SNPs available are those that happened in the distant past (more than 10,000 years ago). For STRs, their properties mean that a given mutation may occur independently many times and in some cases “revert” back to the original state. To circumvent this and unambiguously (i.e. more than 99% of the time) define a group of males descended from a common ancestor born perhaps 2000 years in the past probably needs between 100-200 STR markers, many more than are currently commercially available and many more than the databases currently hold. In practice some combination of perhaps 50 STR markers and perhaps 5 selected SNPs may in future provide sufficient resolution. Many of these SNPs have yet to be discovered.

 

Another difference between SNPs and STRs also needs to be described. The rapid mutation rate of STRs means they also contain more information about the past population size and genetic bottlenecks that have occurred in the recent past (5-20,000 years). As an extreme example, suppose there were many descendants of one unusual ancestral haplotype from a single individual, say the first person (and his close relatives) to colonize Ireland after the last ice age. This variant would stand out from the background, and the variability of his descendants at the various STR markers would enable us to estimate when it happened. The same thing can be done using SNPs, but for much older times. In the current case, though, a SNP is unlikely to happen in that individual or in his ancestors close to or within the bottleneck. It may only happen in one of his distant descendants or in an ancestor. In either case you know they all had a common ancestor, BUT often the genetic bottleneck and the SNP are only loosely related. A much more direct relationship exists with STRs.

 

So in some cases some information can at least be guessed, even with current knowledge, because of fortuitous circumstances. A group may be distinct because they developed in relative isolation and their genetic STR signature stands out from the background. In other cases luck has provided us with diagnostic SNPs for the correct time period.

 

In summary, you have completed a “bottom up” search for close matches in the database and also defined your haplogroup via a “top down” search. Now comes the difficult step: finding out about your “middle history”. The methods described here are really based around extending the “top down” methods.

 

What you need

The first step is to open a spreadsheet program like Excel, go to Ysearch, obtain your haplotype and near matches, and paste them into Excel. For the sake of the example I am going to use the individual from Ysearch called Doherty, with Ysearch ID YK99U. The result of this search is shown below:

 

 

Note that all those with known origins originate from Ireland, and that he has several close matches, all of which are R1b. The haplotypes for him and his matches are then extracted from Ysearch using the compare function and copied and pasted as shown below.

 

 

Then paste them in Excel and edit them so that you have one identifier. My preference is to append the Ysearch ID to the end. The result should look as shown below. Note here that you have to take care with the identifiers and have them in FTDNA 37 marker format. You will note I have also gone to http://www.oocities.org/mcewanjc/p3modal.htm and copied the R1b modal table and appended it. However, you will note this R1b table is in Genographic allele scoring convention for 389ii (column M) and this needs to be altered by adding the 389i value to it. What is a modal? It is the most common value of an allele at a marker for the group it defines. If the group is distinctive it tends to have values that differ from other groups and these values often tend to be at high frequency within the group.
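
To make the definition of a modal concrete, here is a minimal Python sketch that takes the most common allele at each marker. The function name is hypothetical, and only the first five markers (DYS393, DYS390, DYS19, DYS391, DYS385a) of the four Ysearch entries used in this example are included to keep it short.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common allele at each marker position."""
    modal = []
    for alleles in zip(*haplotypes):
        # most_common(1) gives the top allele; ties go to the first one seen.
        allele, _count = Counter(alleles).most_common(1)[0]
        modal.append(allele)
    return modal


# First five markers of the four Ysearch entries in the worked example.
group = [
    [13, 25, 14, 11, 11],  # Doherty_YK99U
    [13, 25, 14, 11, 11],  # Templeton_2ADY8
    [13, 25, 14, 11, 12],  # Slavens_RWCR2
    [13, 25, 14, 11, 11],  # Doherty_SZ8DF
]
print(modal_haplotype(group))
```

A modal built from only four people is of course not a cluster modal in the sense used on the modal page; the sketch only shows the arithmetic.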

 

 

I do this by inserting a column, entering the formula as shown below, and dragging it down. I then copy the new column, use the paste special command to paste “values” over the original values, and delete the extra column.
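
For readers who prefer a script to a spreadsheet formula, the same adjustment can be sketched in Python. The function and constant names are hypothetical, and the Genographic-style value 16 is back-calculated here from the FTDNA-format R1b modal (29 - 13) rather than copied from the modal page.

```python
# 0-based positions within the FTDNA 37-marker order used in this guide.
DYS389I = 9    # reported as DYS389-1
DYS389II = 11  # reported as DYS389-2


def genographic_to_ftdna(alleles):
    """Return a copy with DYS389ii changed from the Genographic to the FTDNA convention."""
    converted = list(alleles)
    # FTDNA reports 389ii as the sum of both segments, so add the 389i value.
    converted[DYS389II] = alleles[DYS389I] + alleles[DYS389II]
    return converted


# First twelve markers of the R1b modal, with 389ii as a Genographic-style value.
r1b_genographic = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 16]
print(genographic_to_ftdna(r1b_genographic))
```

The function returns a new list and leaves the input untouched, which mirrors the spreadsheet approach of computing into a spare column before pasting the values back.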

 

The result should look like this.

 

The rows are then selected and pasted into Dean McGee’s calculator at http://www.mymcgee.com/tools/yutility.html

Note I normally deselect the options to print the output in different formats and to create the modal haplotype. I also normally turn off the option to calculate the TMRCA initially. The screenshot below shows options selected before pushing the execute button.

 

 

The abbreviated results file is shown below (I have deleted unnecessary rows).

 

Ysearch Database Configuration - DNA Results Comparison

ID DYS393 DYS390 DYS19/394 DYS391 DYS385a DYS385b DYS426 DYS388 DYS439 DYS389-1 DYS392 DYS389-2 DYS458 DYS459a DYS459b DYS455 DYS454 DYS447 DYS437 DYS448 DYS449 DYS464a DYS464b DYS464c DYS464d DYS460 GATA-H4 YCA-IIa YCA-IIb DYS456 DYS607 DYS576 DYS570 CDYa CDYb DYS442 DYS438
Doherty_YK99U 13 25 14 11 11 13 12 12 12 13 14 29 18 9 10 11 11 25 15 18 31 15 15 16 17 11 11 19 22 17 16 18 17 38 39 12 12
Templeton_2ADY8 13 25 14 11 11 13 12 12 12 13 14 29 17 9 10 11 11 25 15 18 31 15 15 16 17 11 11 19 23 17 16 18 17 37 39 12 12
Slavens_RWCR2 13 25 14 11 12 13 12 12 12 13 14 29 18 9 10 11 11 25 15 18 31 15 16 16 17 11 11 19 23 17 16 18 17 38 39 12 12
Doherty_SZ8DF 13 25 14 11 11 13 12 12 12 13 14 29 18 9 10 11 11 25 15 18 30 15 16 16 17 11 11 19 22 17 16 18 17 38 39 12 12
R1b 13 24 14 11 11 14 12 12 12 13 13 29 17 9 10 11 11 25 15 19 29 15 15 17 17 11 11 19 23 15 15 18 17 37 38 12 12
R1a 13 25 15 10 11 14 12 12 10 13 11 30 15 9 10 11 11 23 14 20 32 12 15 15 16 11 11 19 23 16 16 18 18 34 39 12 11
R1bSTR1 13 24 14 11 11 14 12 12 12 13 13 29 16 9 10 11 11 25 15 19 30 15 15 16 16 11 11 19 23 16 15 17 17 37 39 12 12
R1bSTR18 13 24 14 11 11 14 12 12 11 13 13 29 18 9 10 11 11 25 14 19 30 15 15 16 17 11 11 19 23 15 15 17 17 36 37 12 12
R1bSTR19Irish 13 25 14 11 11 13 12 12 12 13 14 29 17 9 10 11 11 25 15 18 30 15 16 16 17 11 11 19 23 17 16 18 17 38 39 12 12
R1bSTR20 13 23 14 11 11 14 12 12 12 13 13 29 18 9 10 11 11 26 15 19 29 15 15 16 17 11 11 19 23 16 15 17 18 37 38 12 12
R1bSTR21 13 24 14 11 11 15 12 12 12 13 13 30 17 9 10 11 11 25 15 19 29 15 15 16 17 11 11 19 23 16 15 18 17 35 38 12 12
R1bSTR22Frisian 13 23 14 11 11 14 12 12 12 13 13 29 17 9 10 11 11 24 15 19 29 15 16 17 18 11 10 19 23 17 15 17 17 37 39 13 12
R1bSTR43 13 24 14 10 11 14 12 12 12 13 13 29 17 9 10 11 11 25 15 19 30 15 15 16 17 11 11 19 23 16 15 18 17 36 38 12 12
R1bSTR44 13 24 14 10 11 14 12 12 12 13 13 29 17 9 10 11 11 24 15 19 29 15 15 17 17 11 11 19 23 15 15 18 17 36 37 12 12
R1bSTR45 13 24 14 10 11 14 12 12 12 13 13 29 17 9 10 11 11 25 15 19 29 15 15 17 17 11 11 19 23 15 15 17 17 37 39 12 12
R1bSTR46 13 24 14 11 11 14 12 12 12 13 13 29 16 9 10 11 11 25 15 19 29 15 15 17 18 11 11 19 22 16 15 18 17 36 37 11 12
R1bSTR47Scots 13 24 14 10 11 14 12 12 12 13 13 30 18 9 10 11 11 25 15 19 30 15 15 17 17 11 12 19 24 16 15 18 17 37 38 12 12
R1bSTR48 13 24 14 11 11 14 12 12 12 14 13 30 18 9 10 11 11 26 15 19 30 15 15 17 17 11 11 19 23 15 15 19 17 35 38 12 12
R1bSTR49 13 24 14 11 11 14 12 12 12 13 13 29 18 9 9 11 11 25 15 19 29 15 15 15 17 11 11 19 23 15 15 18 17 36 38 12 12

Distance from reference: Zero, One, Two, Three+

 

Genetic Distance

ID Doherty_YK99U Templeton_2ADY8 Slavens_RWCR2 Doherty_SZ8DF R1b R1a R1bSTR1 R1bSTR2 R1bSTR3 R1bSTR4 R1bSTR5 R1bSTR6 R1bSTR7 R1bSTR8 R1bSTR9 R1bSTR10 R1bSTR11 R1bSTR12 R1bSTR13 R1bSTR14 R1bSTR15 R1bSTR16 R1bSTR17 R1bSTR18 R1bSTR19Irish R1bSTR20 R1bSTR21 R1bSTR22Frisian R1bSTR23 R1bSTR24 R1bSTR25 R1bSTR25a R1bSTR26 R1bSTR27 R1bSTR28 R1bSTR29 R1bSTR30 R1bSTR31 R1bSTR32 R1bSTR33 R1bSTR34 R1bSTR35 R1bSTR36 R1bSTR37 R1bSTR38 R1bSTR39 R1bSTR40 R1bSTR41 R1bSTR42 R1bSTR43 R1bSTR44 R1bSTR45 R1bSTR46 R1bSTR47Scots R1bSTR48 R1bSTR49
Doherty_YK99U 37 3 3 2 12 17 12 13 11 11 15 14 12 10 12 10 12 13 12 13 13 12 14 13 4 13 12 14 17 16 12 13 10 11 13 16 12 16 15 17 15 12 13 13 15 10 13 13 14 12 14 13 12 14 14 12
Templeton_2ADY8 3 37 4 5 9 16 10 11 10 9 15 11 9 11 9 8 11 11 12 11 11 10 12 13 3 12 10 11 15 13 10 10 11 8 12 14 10 14 12 14 13 13 11 10 14 9 11 12 11 10 12 10 13 14 14 12
Slavens_RWCR2 3 4 37 3 13 18 12 14 14 12 15 15 13 11 13 11 13 13 13 14 14 13 15 14 3 14 13 14 16 17 13 14 12 12 13 17 13 15 16 18 16 13 14 13 15 11 14 14 13 13 15 14 15 16 15 13
Doherty_SZ8DF 2 5 3 37 13 18 11 13 12 11 15 15 13 11 13 11 13 12 13 14 14 13 15 13 2 14 13 14 17 17 13 13 12 12 12 17 13 16 16 18 16 13 14 12 15 11 14 14 13 12 15 14 13 14 14 13
R1b 12 9 13 13 37 19 7 6 6 4 7 3 2 6 3 4 2 7 4 4 6 7 5 8 11 7 5 9 10 7 4 6 8 2 5 6 5 9 5 8 5 6 5 5 7 3 6 7 4 5 4 3 7 7 6 4
R1a 17 16 18 18 19 37 17 18 20 17 20 19 20 18 18 16 19 20 18 18 19 15 20 18 17 17 17 21 21 19 19 17 21 19 19 21 17 22 20 19 20 19 19 20 23 18 17 19 19 16 18 18 20 18 21 20
R1bSTR1 12 10 12 11 7 17 37 6 9 7 13 7 9 8 6 7 9 6 9 8 9 9 10 7 10 7 8 9 13 10 6 7 8 8 7 11 8 11 8 9 11 8 8 5 9 7 6 10 6 6 10 6 8 9 9 9
R1bSTR18 13 13 14 13 8 18 7 6 11 8 11 10 10 10 8 10 9 7 6 6 9 9 8 37 13 9 10 12 10 8 7 11 10 10 5 10 6 9 9 10 9 8 8 5 8 7 7 10 8 7 8 8 9 11 8 7
R1bSTR19Irish 4 3 3 2 11 17 10 11 12 9 14 13 11 11 11 9 11 10 13 12 12 11 13 13 37 14 11 12 15 15 12 11 12 10 11 15 11 14 14 16 14 13 12 10 14 11 12 12 11 10 13 12 14 15 14 13
R1bSTR20 13 12 14 14 7 17 7 10 9 11 12 5 9 8 7 9 8 9 7 9 10 9 10 9 14 37 8 8 13 7 7 7 10 6 10 11 10 14 8 8 12 9 9 8 11 6 5 8 7 8 10 8 10 10 8 8
R1bSTR21 12 10 13 13 5 17 8 8 10 8 11 6 7 6 5 6 6 5 8 6 6 8 8 10 11 8 37 11 12 10 9 9 9 6 8 9 7 11 8 7 7 10 6 8 9 6 6 8 7 5 8 8 8 8 9 7
R1bSTR22Frisian 14 11 14 14 9 21 9 11 8 11 11 9 10 12 8 10 11 10 12 11 13 11 10 12 12 8 11 37 14 9 9 10 12 7 12 13 10 14 10 8 12 11 11 8 10 10 9 9 8 11 10 8 11 14 13 12
R1bSTR43 12 10 13 12 5 16 6 5 10 6 11 6 7 8 5 6 6 6 7 5 6 7 9 7 10 8 5 11 10 11 9 7 8 6 5 6 6 8 6 9 8 8 5 6 10 6 5 8 5 37 5 6 7 6 8 6
R1bSTR44 14 12 15 15 4 18 10 7 9 6 8 6 6 8 5 6 5 8 4 4 6 8 8 8 13 10 8 10 10 9 6 9 11 6 7 6 6 10 7 10 5 6 5 7 7 7 7 10 6 5 37 4 7 9 8 6
R1bSTR45 13 10 14 14 3 18 6 6 6 5 10 5 5 7 5 5 5 7 6 6 8 8 6 8 12 8 8 8 12 7 4 7 10 5 8 6 6 11 4 9 7 3 7 5 7 6 6 9 3 6 4 37 9 8 8 7
R1bSTR46 12 13 15 13 7 20 8 8 8 9 11 8 7 8 4 7 8 9 6 5 7 7 11 9 14 10 8 11 13 10 7 10 9 8 8 10 8 12 11 11 8 10 5 9 8 8 7 11 10 7 7 9 37 10 11 7
R1bSTR47Scots 14 14 16 14 7 18 9 9 12 9 13 8 9 9 8 9 9 10 9 9 10 12 11 11 15 10 8 14 14 13 10 9 11 8 9 11 12 12 10 11 8 9 10 10 12 8 11 12 9 6 9 8 10 37 9 9
R1bSTR48 14 14 15 14 6 21 9 7 11 7 10 5 8 9 9 9 7 8 7 9 10 12 8 8 14 8 9 13 12 12 8 10 9 8 7 7 10 12 10 12 10 8 10 8 10 7 10 10 9 8 8 8 11 9 37 7
R1bSTR49 12 12 13 13 4 20 9 8 9 7 9 7 6 7 6 7 5 10 4 6 8 8 8 7 13 8 7 12 10 10 7 10 7 6 6 6 6 10 6 11 7 7 6 8 8 4 7 9 7 6 6 7 7 9 7 37

 

Relatedness key: Related, Probably Related, Possibly Related

FTDNA's Interpreting Genetic Distance for 12 Markers

FTDNA's Interpreting Genetic Distance for 25 Markers

FTDNA's Interpreting Genetic Distance for 37 Markers

- Infinite allele mutation model is used
- Values on the diagonal indicate number of markers tested

 

The first thing to note is that YK99U most closely matches the modal of a cluster called R1bSTR19Irish, with only 4 mismatches. The number of mismatches is even lower for his close neighbors, and none of the other R1b clusters approach this. Typically they differ by 10 or more mismatches from YK99U and his close neighbors. The conclusion is that YK99U resides within the Irish cluster.
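
The mismatch counts quoted here can be checked by hand or with a short script. The sketch below is not Dean McGee’s code; it is a plain per-marker mismatch count (every differing marker counts once, as in the infinite allele model), with hypothetical function and variable names, applied to three of the modal rows copied from the results table above.

```python
def genetic_distance(a, b):
    """Infinite allele model: every differing marker counts as one mismatch."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Haplotypes in FTDNA 37-marker order, copied from the results table.
yk99u = [13, 25, 14, 11, 11, 13, 12, 12, 12, 13, 14, 29, 18, 9, 10, 11, 11, 25, 15,
         18, 31, 15, 15, 16, 17, 11, 11, 19, 22, 17, 16, 18, 17, 38, 39, 12, 12]
modals = {
    "R1b": [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29, 17, 9, 10, 11, 11, 25, 15,
            19, 29, 15, 15, 17, 17, 11, 11, 19, 23, 15, 15, 18, 17, 37, 38, 12, 12],
    "R1bSTR19Irish": [13, 25, 14, 11, 11, 13, 12, 12, 12, 13, 14, 29, 17, 9, 10, 11, 11, 25, 15,
                      18, 30, 15, 16, 16, 17, 11, 11, 19, 23, 17, 16, 18, 17, 38, 39, 12, 12],
    "R1bSTR47Scots": [13, 24, 14, 10, 11, 14, 12, 12, 12, 13, 13, 30, 18, 9, 10, 11, 11, 25, 15,
                      19, 30, 15, 15, 17, 17, 11, 12, 19, 24, 16, 15, 18, 17, 37, 38, 12, 12],
}

distances = {name: genetic_distance(yk99u, modal) for name, modal in modals.items()}
closest = min(distances, key=distances.get)
print(distances)  # {'R1b': 12, 'R1bSTR19Irish': 4, 'R1bSTR47Scots': 14}
print(closest)    # R1bSTR19Irish
```

These three values agree with the corresponding entries in the genetic distance matrix, which is a useful sanity check that the haplotypes were pasted in the right marker order.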

 

You can also check if the individual and his close matches are present in the cluster analysis that derived the modal values used by visiting http://www.oocities.org/mcewanjc/p3analysis.htm and examining the phylograms themselves by searching the pdf files. In this case you will find YK99U and RWCR2 but the other two were not present in the analysis. The results are shown below.

 

 

At this point you can also search the web for information about that cluster. Going to Google and typing in “R1b Irish cluster” produces about 246 hits and reading these provides a whole lot of background information about the origins of this cluster. The third hit I had was to David Wilson’s page which defines the cluster

 

http://home.earthlink.net/~wilsondna/DYS392=14%20Summary.htm

 

and I suggest for those interested they search the Genealogy-DNA-L list archives with the key words Wilson AND Irish and 25.

 

However, many times you finish up with matches to a small indistinct cluster. Often little is known about it, and even its existence as a separate entity may be doubtful. In this situation the hard work begins. I cannot fully explore all the options, but the key ones revolve around trying to see whether the members of the cluster have some geographically defining feature that makes them distinct. If they do then it suggests, but does not prove, that you may have similar origins. It is sort of like not only using your DNA haplotype, but your “cluster members” as well to see if a common geographic pattern emerges. Hopefully in time, more diagnostic SNPs will be identified and in many cases the majority of the cluster members will be tied to a SNP subclade.
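
One simple way to look for a common geographic pattern is to tally the recorded origins of the cluster members. The sketch below assumes you have collected the earliest-known-ancestor locations yourself; the function name and the sample list are hypothetical, and members with no recorded origin are ignored.

```python
from collections import Counter


def origin_summary(origins):
    """Tally known ancestral origins and return (origin, share) pairs, most common first."""
    known = [origin for origin in origins if origin]  # skip members with no recorded origin
    counts = Counter(known)
    return [(origin, count / len(known)) for origin, count in counts.most_common()]


# Hypothetical earliest-known-ancestor locations for members of one cluster.
cluster_origins = ["Ireland", "Ireland", None, "Scotland", "Ireland", ""]
print(origin_summary(cluster_origins))
```

A strong majority for one region is suggestive only; it should be weighed against how heavily that region is sampled in the database as a whole.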

Are there better methods?

Yes, there are better methods, and some that are more statistically rigorous, but they are not as simple to implement, nor do they give you as direct a feel for the “robustness” of the matches obtained in an easy-to-understand way.

Conclusion

You have been taken through the process of examining your haplotype. I have not provided examples for everything that can be done, but have concentrated on trying to identify a group of individuals that may have descended from a common ancestor 2000-10,000 years ago. The task involves selecting your haplotype and its “near matches” and examining how closely they match the previously defined information about various clusters. Remember that the results are only qualitative and only provide hypotheses for you to examine further.