I've been trying to figure out how all the Kill*leas (Killalea, Killelea, or Killilea) of the world are related, and so far I've gotten 7 Kill*lea men of unknown relation to take the standard Y-chromosome test. The results are here:
http://patrick.net/killeleas/baile.php#dna
Six of them look to be pretty closely related, and are all probably in haplogroup R1b. But how many generations removed are they from their common ancestor? Most Time-to-most-recent-common-ancestor (TMRCA) calculators look only at the number of mismatches, but don't take into account how likely or unlikely a mismatch is for that "locus". The importance of any particular match or mismatch varies enormously. Fortunately, I have some good statistics on the frequency distribution of locus values, here:
http://patrick.net/killeleas/yfreq.html
So you can see that the fact that the first six men all have a value of 12 for locus 438 means very little, because 94% of men in haplogroup R1b have a value of 12 for locus 438.
Conversely, the fact that 4 of the first six men have a value of 13 for locus GATA-H4 is very important and seems to show descent from a recent common ancestor, because very few men have a value of 13.
So how do you combine probability distributions with the results to get a meaningful measurement of difference between the men, and ultimately a time to most recent common ancestor?
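One simple way to fold the frequency table into a comparison is to score each shared value by its surprisal, -log2(frequency). A shared common value then counts for almost nothing and a shared rare value counts for a lot. This is only a minimal sketch: the 0.94 for locus 438 is the figure quoted above, while the 0.05 for GATA-H4, the three sample men, and the function name `shared_allele_score` are placeholders made up for illustration.

```python
import math

# Assumed population frequencies: {locus: {allele value: frequency in R1b}}.
# Only the 0.94 for locus 438 comes from the post; 0.05 is a placeholder.
FREQ = {
    "438": {12: 0.94},
    "GATA-H4": {13: 0.05},
}

def shared_allele_score(a, b, freq):
    """Sum of -log2(frequency) over loci where both men carry the same value."""
    score = 0.0
    for locus, value in a.items():
        if b.get(locus) == value:
            # Rare shared values (small frequency) add a large amount.
            score += -math.log2(freq[locus][value])
    return score

# Invented haplotypes, not real test results.
man1 = {"438": 12, "GATA-H4": 13}
man2 = {"438": 12, "GATA-H4": 13}
man3 = {"438": 12, "GATA-H4": 11}

print(shared_allele_score(man1, man2, FREQ))  # shares the rare GATA-H4 value
print(shared_allele_score(man1, man3, FREQ))  # shares only the common 438 value
```

The score ranks pairs by how surprising their shared values are, but by itself it says nothing about time; turning it into generations still needs a mutation-rate model.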
Maybe it's time to get a day job....
It was time to get a day job two years ago, but I'm still having too much fun. The only thing I lack is money.
I have no idea, but it looks like an awesome project.
For distance estimation with known population-level frequencies and individual alleles, you need to look at Bayesian priors:
http://en.wikipedia.org/wiki/Posterior_probability_distribution
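As a rough illustration of the Bayesian idea, here is a grid posterior over the number of generations back to a common ancestor for two men, given how many loci mismatch. It is a sketch under loud assumptions: every locus mutates at the same rate `mu` per generation, back-mutations are ignored, the prior is flat over 1..`max_gen`, and the 37 loci and 0.002 rate in the usage lines are placeholders rather than published values. A fuller model would let the rate vary per locus and bring in the allele frequencies.

```python
import math

def tmrca_posterior(n_loci, n_mismatch, mu, max_gen=200):
    """Posterior over generations to the common ancestor (flat prior on 1..max_gen)."""
    weights = []
    for g in range(1, max_gen + 1):
        # Two men are separated by 2*g father-to-son transmissions.
        p_diff = 1.0 - (1.0 - mu) ** (2 * g)
        # Binomial likelihood of seeing n_mismatch differing loci out of n_loci.
        like = (math.comb(n_loci, n_mismatch)
                * p_diff ** n_mismatch
                * (1.0 - p_diff) ** (n_loci - n_mismatch))
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]

# Placeholder inputs: 37 loci compared, 2 mismatches, assumed rate 0.002.
post = tmrca_posterior(37, 2, 0.002)
most_likely_gen = post.index(max(post)) + 1  # list index 0 is generation 1
print(most_likely_gen)
```

Summing the posterior from generation 1 upward until it passes, say, 0.95 gives a one-sided credible bound on the TMRCA under the same assumptions.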
Calibrating the "time" component of the TMRCA will still require converting "step events" (an STR changing from length X to length X+1 or X-1) into elapsed time, based on how often those events take place: the so-called molecular clock.
You should be able to estimate the events to MRCA by calculating the most parsimonious ancestor (the one with the fewest total steps to all extant Kill*lea samples). Then, comparing the MRCA with each Kill*lea sample, you can get mean and variance estimates for "steps to MRCA".
There should be published estimators for the mean and variance of STR events per unit time.
The TMRCA estimate is then biased low, since:
1) with a small number of samples, we might not have found all the Kill*lea variation
2) the most parsimonious ancestor is, by construction, the shortest path
You can combine the variance of the steps to MRCA with the variance of STR events per unit time to get upper and lower confidence bounds on the TMRCA estimate.
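The recipe above can be sketched roughly as follows. Assumptions, all of them illustrative: every sample descends independently from the ancestor (a star-shaped tree), so the per-locus median is the value with the fewest total steps; loci mutate one step at a time at a common rate; the haplotypes, the locus names A/B/C, the three rates, and the function names are invented; and the bounds are a crude mean ± 2 standard errors combined with a low and a high rate.

```python
from statistics import mean, median_low, variance

def min_step_ancestor(samples):
    """Per-locus median: minimizes total |steps| to all samples (star tree assumed)."""
    return {loc: median_low([s[loc] for s in samples]) for loc in samples[0]}

def steps_to(ancestor, sample):
    """Total repeat-count steps between one sample and the inferred ancestor."""
    return sum(abs(sample[loc] - ancestor[loc]) for loc in ancestor)

def tmrca_estimate(samples, mu, mu_low, mu_high):
    """Point estimate and crude (low, high) bounds, in generations."""
    anc = min_step_ancestor(samples)
    steps = [steps_to(anc, s) for s in samples]
    m = mean(steps)
    se = (variance(steps) / len(steps)) ** 0.5   # standard error of the mean
    n_loci = len(anc)
    est = m / (mu * n_loci)                      # expected steps per generation = mu * n_loci
    low = max(m - 2 * se, 0.0) / (mu_high * n_loci)  # few steps, fast clock
    high = (m + 2 * se) / (mu_low * n_loci)          # many steps, slow clock
    return est, (low, high)

# Invented haplotypes over three hypothetical loci.
samples = [
    {"A": 12, "B": 13, "C": 24},
    {"A": 12, "B": 13, "C": 25},
    {"A": 12, "B": 14, "C": 24},
    {"A": 12, "B": 13, "C": 24},
]
ancestor = min_step_ancestor(samples)
est, (low, high) = tmrca_estimate(samples, mu=0.002, mu_low=0.001, mu_high=0.004)
print(ancestor, est, low, high)
```

With so few samples and loci the interval comes out very wide, and both sources of low bias listed above still apply to the point estimate.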
I thought that many of these calculations were being done automagically at places supporting the surname project:
http://www.dnaancestryproject.com/ydna_intro_surname.php
no?
NorCalBear says
Thanks for the lead. I'll poke around dnaancestryproject.com, since my own understanding of Bayesian statistics is poor.
I know that familytreedna.com does have a calculator that takes into account the probability of each locus value, but they do not let you use it unless you pay for their DNA test first. Not very friendly of them.
Darn, looks like dnaancestry.com is just as money-centered, and does not provide any resources to anyone unless you get a test from them. Let me know if I'm wrong.
P,
I'm the guy who writes the code himself in C++ (or Python, Perl, PHP, whatever; it's been a while for a few of the others, but I'm not opposed to pushing stack in assembler, given hours of concentration). Perhaps we can take the population genetics discussion offline; you know my contacts.
dnaancestry is likely very $-centered. The web business model is all about trying to figure out a way to make $; we all know that problem.
-A(Norcalbear)