Friday, 27 April 2018

Forensic Anthropology 2011, 10: Population Genetics pt. 1

[ Music ] >>My training was in
human population genetics. And I started working
with DNA polymorphisms in the early 1980's to study
humans for a variety of reasons. And I became involved as an
expert witness in the late 80's and early 90's, when DNA started
being introduced into court. And I testified in many
different places around the US and in Canada, and found
it quite interesting.

Expert witnesses soon
became unnecessary. So I basically dropped
out of forensics. It was never my main
area of research.

And then at the time of the
World Trade Center attacks, an expert committee was set up,
and I was asked to be on it. It became obvious that the
standard forensic markers for individual identification
did not work well in a sizable number of cases because of the extreme
degradation of the DNA. And because they could
say nothing about ancestry of the sample, which
would in itself be a help in identifying the person. At that point I realized
that we had all the expertise and the samples necessary
to do a lot of research.

So I got back involved. And let me then start the
lecture and talk about some of what we're doing and why
DNA can be very important both in ancestry and phenotype. And I could say that
DNA is going to make everything you've
learned so far unnecessary, but I won't go that far. [Laughter] >>DNA is not going
to solve all problems but it already can be
extremely helpful when available and it's going to be much
more helpful in a few years because of projects
that are ongoing now.

Caveat. What we know now, as
I hope you will learn, is often overinterpreted --
there is the implicit assumption that if a conclusion is
based on DNA it's immutable, it's true, it's precise. Ain't so. So let's talk about
the DNA in the human genome.

We've got mitochondrial DNA,
which is a very small part of the genome -- less than
17,000 base pairs as a circle -- compared to the nuclear
DNA, which is over 3 billion
base pairs of DNA. The autosomes are 22 chromosomal
pairs of varying sizes. The sex chromosomes, one pair
unmatched, females have two X's. Males have an X and a Y.

The
Y chromosome can be subdivided into a small part that
recombines with the X chromosome at the tip of the chromosome. That is important in segregation
during meiosis forming gametes, and then the non-recombining
part which is inherited without any recombination. So two parts -- they're
both interesting and have different implications
for how they're studied. All of these segments of
DNA have polymorphisms.

So what's a polymorphism? It literally translates
to a part of the DNA that occurs in many forms. Depending upon how big
the segment you look at, one base pair will generally
occur only in two forms, sometimes three or four. But in general the least
common form must have a frequency of at least one percent or,
put the other way, the most common form must be at
most 99 percent.
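
The frequency criterion just described can be written as a one-line check. This is only a minimal sketch: the function name `is_polymorphic` and the default 1 percent threshold are illustrative assumptions, not part of the lecture.

```python
def is_polymorphic(allele_freqs, threshold=0.01):
    """Return True if a site meets the conventional frequency criterion.

    allele_freqs: frequencies of all alleles observed at one site (sum to 1).
    threshold: assumed cut-off (1 percent by convention).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # The most common allele may be at most 1 - threshold (99 percent),
    # i.e. all other alleles together make up at least the threshold.
    return max(allele_freqs) <= 1.0 - threshold


print(is_polymorphic([0.7, 0.3]))      # common variant at the site
print(is_polymorphic([0.995, 0.005]))  # rare variant only
```

With two alleles, "least common at least 1 percent" and "most common at most 99 percent" are the same statement, which is why the check only needs the largest frequency.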

So we try to make a distinction
between a polymorphism -- which in general
must mean normal even if it's got functional
differences, because millions of people around the world
will have that form of DNA -- and the rare variants
that cause disease. So the other part of this is the idea that the site is the
polymorphism that occurs in different forms, each
of which is an allele. There's a great tendency to
call an allele a polymorphism. It's the site that's the
polymorphism, and that also goes for SNP, single nucleotide polymorphism,
that you'll hear about more.

So the types of polymorphisms
are a combination of how one detects it and
the nature of the variation. So the restriction fragment
length polymorphisms were one of the first technologies. Those can be almost any of the
other types in terms of the DNA. It's basically a way
of detecting variation.

Pretty much obsolete. The short tandem
repeat polymorphisms, I always put the P on. In forensics you generally
hear of it as STRs. But it's the polymorphism
part that's important.

They're short tandem repeats. The VNTRs are generally
longer segments that occur in tandem repeats, but
similar in concept. Insertions, deletions -- a
bit of DNA from megabases, huge deletions in some areas
that seem to be compatible with normality down to one
base pair more or less. Then the single nucleotide
polymorphisms, where instead of an adenine or an
adenosine, you've got guanosine, et cetera, just in
the string of DNA.

And then the copy
number variation where it's not a tandem repeat,
but a segment of DNA may occur in two copies or three
copies -- sometimes tandem, sometimes a whole
segment is missing -- analogous to a deletion. So let's -- mitochondrial DNA. It's this small loop of DNA. It occurs in the mitochondrion.

It's the remnant of
an early parasite that invaded an early cell. And it's now dependent
upon the nuclear genome and we're dependent
upon the genes here as the major energy producing
genes and apparatus of the cell. It has its own slightly
different DNA code. And it's got its own transfer
RNA for making proteins and its own machinery for
making its own proteins.

But most of the proteins are now
made in the nucleus of the cell and imported into
the mitochondrion. The relevant point
here, of course, is that there are many
polymorphisms. It's almost all coding. So there are great restrictions
on what variation can occur because it has to be
compatible with function.

So some variants are recurring. They have arisen
independently many times because both alleles are
compatible with functioning. The other thing is that the
control region is highly variable because it only
has limited function. As long as most of it's
the same, the replication of the mitochondrion
occurs normally.

It doesn't code for a protein. So you'll see a lot of studies
of hypervariable regions which are highly polymorphic and
then of the single nucleotide or other variants around the
rest of the mitochondrion. An advantage is that for every
cell, there are one, two, maybe more thousands of
copies of mitochondrial DNA. So it is much more prevalent in
a sample than is nuclear DNA.

And hence it has been studied because it doesn't need
quite the sophistication and characterization
as nuclear DNA. So a lot of human
ancestry information and forensic identification has
been based on mitochondrial DNA. Some of it is very
powerful, but it's not as powerful as nuclear DNA. So basically this summarizes
what I've just been saying about the variation
in both segments.

But at any one site there's not
such a huge number of variants. Now the relevant
thing for ancestry and even individual
identification is that the mitochondrial
DNA is inherited only through the mother. Because the little sperm is
only a bundle of nuclear DNA. That gets into the egg.

But the egg comes already with
all of the mother's mitochondria and hence her mitochondrial DNA. So even males have
mitochondrial DNA. They just don't transmit
it to their children. It's entirely based on what they
inherited from their mother.

So going back five generations, how many of your ancestors
have your mitochondrial DNA? [ Pause ] >>One out of 32, assuming
there's no inbreeding along the line. [Pause] >>My father was inbred because
his mother and father were at least five generations
removed from the two brothers who originally settled
in the colonies. And they met just because
the young soldier coming through town was asked if
he'd met the Widow Kidd and her three beautiful
young daughters. So young soldier, of
course, wanted to meet them and married one of them.

So within five generations
it is not that uncommon. So you're not learning
much about your ancestry from this type of DNA. What about Y chromosome, the non-recombining part
of your Y chromosome. It shows exactly the same
pattern except it's the paternal lineage.
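
To make the "one out of 32" count concrete, here is a small sketch that enumerates every ancestral line five generations back. It assumes no inbreeding (all ancestors distinct), as in the lecture; the function names are illustrative only.

```python
from itertools import product


def ancestral_lines(generations):
    """Every line of descent as a string of 'M' (mother) / 'F' (father) steps."""
    return ["".join(path) for path in product("MF", repeat=generations)]


def uniparental_counts(generations):
    """Return (total ancestors, mtDNA-line ancestors, Y-line ancestors)."""
    lines = ancestral_lines(generations)
    # mtDNA follows the all-maternal line: mother's mother's mother ...
    mtdna = [p for p in lines if set(p) == {"M"}]
    # The non-recombining Y follows the all-paternal line (males only).
    y_line = [p for p in lines if set(p) == {"F"}]
    return len(lines), len(mtdna), len(y_line)


print(uniparental_counts(5))
```

Each extra generation doubles the number of ancestors but still leaves exactly one maternal-line and one paternal-line ancestor, which is why these two markers say so little about overall ancestry.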

So again, you're not learning
much about your ancestry -- your overall ancestry
-- from just this. And I'll add that
some of the companies that do ancestry testing
will use only Y-chromosome or only mitochondrial DNA. And they're not telling you
a lot about your ancestry. So I mentioned the mitochondrial
variation and the Y chromosome.

There are relatively few genes. So there's a fair amount of
single nucleotide polymorphism. And there are also many
STRPs in the Y chromosome that have a higher
mutation rate. [Pause] >>Now let's look at autosomal.

Here is a sample pedigree where I've colored the
alleles coming down. So we have ampersand and
at sign alleles in a sister and we have number sign and
star allele in a brother. They got the opposite alleles. Now on this monitor,
the green and the -- the green and the white
don't show up very well.

But what can be said
about the ancestors who five generations
ago contributed each of these alleles? Well, here we have
the tracing back. And notice that this
blue ampersand allele, as we go back we
have a homozygote and another homozygote. And so we cannot really identify where this particular
allele came from. But it has to be one of
those five ancestors.

Similarly the at sign -- here
going back through the white because here is white -- it could only have come from here even though
both have the light green. That means the light green
must have come from here. So from here, we can go back. Again a homozygote, it
could have been from either of those or this ancestor.

So we know roughly
where that came from. We can go back with the
hash mark, the number sign, and the star is the
only one where we know that the father's father's
mother's father's father was the origin of that particular
allele. But clearly overall
we're beginning to get a profile
of the ancestry. And if we look at
many individual loci, each locus is going
to tell us something about some of those ancestors.

And with many loci,
because they're independent, we get a picture of all of them. So the next point is
measuring variation. We know that these polymorphisms
exist in frequencies -- the individual alleles -- in
frequencies in a population. And a standard measure we
use is called FST -- F is the letter we use for
the inbreeding coefficient, subpopulation to total, S and T.
So in theory it's related to random genetic drift.

Has anybody heard of
random genetic drift? A couple. You know that
if you have two children, there's a 50 percent
chance that you give to both of them the same
allele that you have. And that chance over many people
means that the gene frequency in a population of children
will not be exactly the same as the gene frequency
in the parents. And the smaller the number
of parents and children, the greater the possible
fluctuation would be.
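
The two claims above (a 50 percent chance of passing the same allele to both children, and larger fluctuations in smaller populations) can be sketched numerically. This is an illustrative binomial-sampling model of one generation, not the speaker's own calculation; the function names and the population sizes are assumptions.

```python
import random


def prob_same_allele_to_both_children():
    """Heterozygous parent, two children: both get allele 1, or both get allele 2."""
    return 0.5 * 0.5 + 0.5 * 0.5


def drift_variance(p, n_individuals):
    """Binomial sampling variance of the allele frequency after one generation
    when 2N gene copies are drawn at random from a pool with frequency p."""
    return p * (1 - p) / (2 * n_individuals)


def next_generation_freq(p, n_individuals, rng):
    """Simulate one generation of random sampling of 2N gene copies."""
    copies = 2 * n_individuals
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies


rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (10, 1000):
    print(n, drift_variance(0.5, n), next_generation_freq(0.5, n, rng))
```

The variance shrinks in proportion to 1/(2N), so a population of 10 fluctuates far more from one generation to the next than a population of 1000.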

So among different
populations, over time, random genetic drift
causes some changes. And FST is a way, theoretically,
of explaining that. So here's an example. I'm going to show
many slides like this.

And I always arrange African
populations on the left, Middle East, European
populations. Here's Western Siberia,
East Asia, a couple of Pacific
Islands, Eastern Siberia, North America, South America. So here you see in black
one of the two alleles -- the two have a frequency
summing to one. So the other allele is
one minus this or coming from the top instead
of the bottom.

And you see two different
polymorphic loci show different patterns of variation
around the world. The expectation for any
polymorphism you know nothing about in advance is
that there's a lot of gene frequency
variation around the world. We are all alike
in ethical ways. But we are all different
genetically.
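
As a rough illustration of how FST summarizes allele frequency differences like those on the slides, here is one common textbook form for a two-allele site: the proportional drop in expected heterozygosity within subpopulations relative to the total. The equal weighting of populations and the function name `fst` are assumptions of this sketch; the lecture does not say which estimator was used.

```python
def fst(freqs):
    """FST for one two-allele site from per-population allele frequencies.

    freqs: frequency of one allele in each population (equal weights assumed).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # site is fixed everywhere, no variation to apportion
    h_sub = sum(2 * p * (1 - p) for p in freqs) / n  # mean within populations
    return (h_total - h_sub) / h_total


print(fst([0.1, 0.9]))  # large frequency difference, high FST
print(fst([0.4, 0.6]))  # small frequency difference, low FST
```

A site with similar frequencies everywhere gives a value near zero, and a site fixed for different alleles in different populations gives a value of one.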

Even if we're from the same
ethnic group, we're different. And so those differences
become important. [Pause] >>Here's another example, but these have low
variation around the world. Same populations, same order.

But they're not identical. One of the ways, just as an
example, to look at genotypes, here are a bunch of individuals. Each dot is an individual. And this is using a TaqMan assay
which measures fluorescence as a function of the genotype.

And so across the bottom
you've got the intensity of fluorescence for one fluor and on the Y axis you've
got the intensity of fluorescence for
the other fluor. We typed 384 individuals
at a time. That's what each dot is. And so you can see the blue
here represent individuals who have only the allele
fluorescing in blue.

They are homozygous. Then there are those
only fluorescing in red, hence homozygous
for the other allele, and a bunch of heterozygotes
who fluoresce in both colors. And the controls are down here as
black squares, along with those samples that did not give an
interpretable result. And here are some that
were not interpreted. Here's one with, for whatever reason,
low fluorescence, low intensity, whatever -- not being
interpreted.

Here's one of the controls that's
not where we'd like it to be. This is clearly real data. But it's certainly not up here. So it's not really
affecting the interpretation.
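
The logic of reading a two-dye scatter plot like this can be sketched as follows. This is a deliberately simplified illustration: real genotype-calling software clusters the points, and the intensity cut-off and angle threshold used here are hypothetical values, not the settings behind the slide.

```python
import math


def call_genotype(x, y, min_intensity=1.0, homo_angle=30.0):
    """Call a genotype from two non-negative fluorescence intensities.

    x: intensity of the dye reporting allele 1 (horizontal axis).
    y: intensity of the dye reporting allele 2 (vertical axis).
    min_intensity, homo_angle: hypothetical thresholds for this sketch.
    """
    if math.hypot(x, y) < min_intensity:
        return "no call"  # too dim to interpret, like the points near the origin
    angle = math.degrees(math.atan2(y, x))  # 0 = only dye 1, 90 = only dye 2
    if angle < homo_angle:
        return "homozygous allele 1"
    if angle > 90.0 - homo_angle:
        return "homozygous allele 2"
    return "heterozygous"  # both dyes fluoresce


for point in [(5.0, 0.5), (0.5, 5.0), (4.0, 4.0), (0.2, 0.3)]:
    print(point, call_genotype(*point))
```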

So that's sort of a
little bit of background. But now exactly how are we
using some of this DNA variation and the polymorphism
in forensics? So it can be used to
identify a criminal. That's the way it's
classically being used now. There's DNA from a crime scene.

You've got the suspect's DNA. They match. And that's evidence
for identity. Most of what we're interested in here is identifying
human remains or maybe from the crime scene trying
to make some inference of the ethnicity or ancestry or
the phenotype of the individual who left that DNA
at the crime scene on a supposition
that's the criminal.

But DNA is used all the
time in parentage testing. And in the court system,
the best use of DNA is to exonerate innocent people. I was once asked in
cross examination if I were falsely accused of
a crime and there was DNA, would I allow the DNA lab and the Royal Canadian
Mounted Police Forensics Labs to test my DNA? And my response was, 'Of course. It's the surest way I know
to prove I'm innocent.' At which point the judge said to the cross examining
defense attorney, 'Don't you think you should
stop working for the prosecution and excuse this witness?' [Pause] >>So identifying human remains.

You may have, based on
what we've experienced from the World Trade Center
attacks, may have known DNA. Almost all of the firemen
-- the first responders -- had given samples for
bone marrow donation. And so there was a known DNA
sample available to test. Clearly a lot of relatives
brought in toothbrushes, brought in dirty underwear,
brought in all sorts of personal objects from
which DNA could be obtained.

And to date -- I forgot to
bring the number with me, but over 1600 of the
individuals have been identified with at least one
little piece of bone. [Pause] >>Determine the phenotypic
characteristics. What hair color -- natural hair
color -- did the person have? What skin color? Can we say anything about whether it was
thick or thin hair? Could we say anything
about height? Or determine the ancestry in
terms of more indigenous -- geographically indigenous
-- origins. So the forensic question
in matching to a known person is first
what are the DNA patterns? So this is a molecular
and a laboratory issue.

Has the DNA been
analyzed correctly? Have the patterns been
interpreted correctly? Then, do the two patterns match? Is the method used
specific enough that if the results are
the same, you could say they match at that locus? Then the statistics,
what are the chances that two unrelated people
have the same pattern? Obviously that becomes very
critical, assuming the molecular work is done well. And that's where
databases are needed because it all depends
upon the allele frequency. If the frequency of an
allele is 90 percent and you've got two homozygotes
for that allele, well, 81 percent of the population has
both alleles the same. That really doesn't
exclude a lot of people as not being the same.
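
The arithmetic behind this example is just the allele frequency squared, assuming random mating (the Hardy-Weinberg relationship covered later in the lecture). A figure of 81 percent homozygotes corresponds to an allele frequency of 0.90; an allele at 0.99 would give about 98 percent. The function names below are illustrative.

```python
def homozygote_frequency(p):
    """Expected fraction of people homozygous for an allele of frequency p,
    assuming random mating."""
    return p * p


def fraction_excluded(p):
    """Fraction of the population that a homozygous match would rule out."""
    return 1.0 - homozygote_frequency(p)


for p in (0.90, 0.99, 0.50):
    print(p, homozygote_frequency(p), fraction_excluded(p))
```

The more common the allele, the fewer people a match excludes, which is why allele frequency databases matter for the statistics.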

Not very informative. And we'll get into that later. So the CODIS markers -- the standard short tandem
repeat polymorphisms -- used in cases nationwide
now are a panel of individual identification
markers that are clearly appropriate
for this kind of question. The lab methodology
is pretty good.

And there are fair databases. But individual identification
is not the only type. And remember I mentioned earlier
the CODIS markers are not good for ancestry. They were picked because
they are highly variable, almost every place.

And so there's not
a lot of difference in allele frequencies among
different populations. So I came up with
a classification of four types a few years ago. There are individual
identification SNPs. They have very low probabilities of two individuals having
the same multisite genotype.

So each SNP is optimized
and the panel is very good. Ancestry-informative SNPs would
be sort of the opposite -- the high probability that an
individual's ancestry comes from one part of the
world or maybe admixed from two parts of the world. Lineage SNPs are where
we're trying to get down to individual clans within
a group -- extended families, organized crime where
it is a family. And the phenotype-informative
SNP -- SNPs that will, based on allelic
differences that control parts of the phenotype, will
tell you something about how a person looks.

[Pause] >>So there are different
requirements for these different
purposes of using SNPs. And I'm concentrating on SNPs because that's really the
best type of DNA for any of these applications in
terms of laboratory methods, numbers of markers available,
and the detailed annotation. So the importance here is
that for the individual, the ancestry, and the
clannish or lineage markers, they represent a small fraction
of all available polymorphisms. So one wants to search
for and optimize a set that is particularly appropriate
for one of those purposes.

The phenotype informative SNPs
are also uncommon, but they deal with specific phenotypes. And as yet, though there
are good candidates, they're poorly documented
for exactly how they function in development of the phenotype. So there are now
five or six loci that we know are clearly
involved in the amount of melanin in the skin, but we
don't know how they interact. So while we can type them
and make predictions, the predictions are
based on associations without a clear understanding
of the interactions when you look at all of them.

[Pause] >>So general criteria. I'm reiterating myself
to an extent. Readily typable, has a unique
marker, highly informative for the stated purpose,
and well documented for such relevant
characteristics as allele frequencies, association with
phenotype, biology. So which ones are
going to be best? So we want the maximum amount
of information per SNP, but what do we mean
by information? And we want SNPs that are not
subject to typing difficulties and what kind of typing
difficulties exist.

So additional slides
will amplify the first. Let me verbally amplify
the second. Almost all of the typing methods
involve using bits of DNA that are complementary, to
either conduct amplification of a fragment of DNA

or specifically probe
the small region around where the
known variant is. But if there are other
variants nearby that interfere with either of those, then one
may not get an accurate reading. The test fails. And so if you've
got a heterozygote, you only detect one
of the alleles.

It's not that the
polymorphism is not valid. It's that that method does
not detect it accurately. [Pause] >>There are other problems
that anybody working in a laboratory knows about:
the phase of the moon, the -- what you ate for
dinner the night before. All of these are
probably real variables, in a humorous sense at least.

But no method is perfect. No dataset is error free. We have to try to minimize them. And that's where the
prior work will be best.

So in terms of amount of
information, we're talking about alleles, we're talking
about allele frequencies. But what we see in the
population is individuals who have two copies -- one from
the mother, one from the father. So fortunately back in 1908, a geneticist asked
a mathematician about this question. And Hardy and Weinberg came up with this very
simple relationship based on elementary probability.

And as a function of
the gene frequencies, you can see here the
genotype frequencies. And basically P squared, 2PQ, Q squared are the
Hardy Weinberg ratios. It's the square of the quantity
P plus Q, the quantity squared where P and Q are frequencies
of the two different alleles. So it's very elementary
probability.
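
The relationship just stated, (P + Q) squared = P squared + 2PQ + Q squared, can be checked with a few lines. This is a minimal sketch for a two-allele site; the genotype labels "AA", "Aa" and "aa" are placeholders chosen for the example.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at a two-allele site with allele
    frequencies p and q = 1 - p, assuming random mating."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


for p in (0.1, 0.3, 0.5):
    freqs = hardy_weinberg(p)
    # The three genotype frequencies always add up to (p + q)^2 = 1.
    print(p, freqs, sum(freqs.values()))
```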

If we want the most
diversity within a population, we clearly want the
allele frequencies to be at point five -- point five. So for individual
identification, the lowest probability of somebody unrelated
being the same is if the allele frequencies
are equal: P equals Q equals point five. But remember I said they always
differ among populations. So here's where the low
FST comes in. And here we're talking about heterozygosity, the
frequency, in this green line, of an individual having
two different alleles, being a heterozygote.
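
Why equal allele frequencies are best for individual identification can be shown numerically. The sketch below assumes Hardy-Weinberg proportions, unrelated individuals and statistically independent SNPs; the 40-SNP panel is an arbitrary illustration, not a panel described in the lecture.

```python
def heterozygosity(p):
    """Expected frequency of heterozygotes at a two-allele site."""
    return 2 * p * (1 - p)


def match_probability(p):
    """Chance that two unrelated people share a genotype at one SNP:
    the sum of the squared genotype frequencies."""
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def panel_match_probability(freqs):
    """Multiply per-SNP match probabilities, assuming independent SNPs."""
    result = 1.0
    for p in freqs:
        result *= match_probability(p)
    return result


for p in (0.1, 0.3, 0.5):
    print(p, heterozygosity(p), match_probability(p))
print(panel_match_probability([0.5] * 40))
```

Heterozygosity peaks and the per-SNP match probability bottoms out at a frequency of 0.5, and multiplying across many such SNPs drives the chance of a coincidental match down very quickly.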

In the zygote they had
two different alleles. So that's one aspect. For ancestry identification,
we want the opposite. We want one population
to be like this and the other population
to be like that.
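
One simple way to turn such frequency differences into an ancestry statement is a likelihood ratio: how much more probable is the observed genotype under one population's allele frequencies than under another's? The sketch assumes Hardy-Weinberg proportions, independent SNPs and frequencies strictly between 0 and 1; the frequencies in the example are invented for illustration.

```python
def genotype_likelihood(n_copies, p):
    """Probability of carrying n_copies (0, 1 or 2) of an allele with
    frequency p, under Hardy-Weinberg proportions."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)[n_copies]


def likelihood_ratio(genotypes, freqs_a, freqs_b):
    """Likelihood of the multi-SNP genotype in population A relative to B.

    genotypes: copies of the reference allele at each SNP (0, 1 or 2).
    freqs_a, freqs_b: reference allele frequency per SNP in each population
    (must be strictly between 0 and 1 to avoid division by zero).
    """
    ratio = 1.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        ratio *= genotype_likelihood(g, pa) / genotype_likelihood(g, pb)
    return ratio


# Hypothetical frequencies for three SNPs with large differences
genotypes = [2, 2, 0]
pop_a = [0.9, 0.8, 0.1]
pop_b = [0.1, 0.2, 0.9]
print(likelihood_ratio(genotypes, pop_a, pop_b))
```

A SNP with the same frequency in both populations contributes a factor of one, which is why markers chosen for individual identification carry little ancestry information.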

So when we test it,
we've got a distinction between the populations. [Pause] >>So, let's review now in terms of ancestry information a little
bit about what we really do know about modern human evolution. This is, at its basics, no longer controversial,
absolutely accepted. There's of course
infinite argument about the nitpicky
fine details.

That will always go on. This is science and these are
humans who are looking at it. But it's clear. Modern humans evolved in Africa
roughly 200,000 years ago.

And it's also very clear that considerable genetic
variation accumulated in Africa and it's still there. Where are the shortest
people in the world? [Pause] >>African pygmies. Where are, on average, the
tallest people in the world? The Nilotics in Africa. [Pause] >>And tremendous variation.

In the US, among non
scientists, there tends to be -- and even among some
scientists who don't know much about human variation -- there tends to be the
assumption that Africa is genetically homogeneous. Well, no. [Pause] >>About 100,000 years ago --
and here's where the argument is: some say as recently as 80,000, some have even said
60,000 years ago -- some individuals left
Africa into southwest Asia. And the single population
had only a small fraction of the genetic variation
present in Africa.

And that population
then expanded to occupy the rest of the world. [Pause] >>And here is how I put
it in a pointillist way. And this has been reproduced
in National Geographic. And if you see the race
exhibit that's going in museums around the country
it's currently at the Smithsonian
in Washington.

>>This is part of the triple A
online also -- The Race Project. >>Is it online? Well it's animated
in the museum. I don't know about online. But it's clear where just the
different colors represent generalized genetic variation that Africa had accumulated
a lot by 100,000 years ago.

But notice it's not
uniformly distributed. There's a little more red here. There's a little more
blue and yellow here. Typical of any widespread
mammalian or any other species, the fringes of the distribution
don't have all of the variation.

There's a little bit. That's gene flow and
random genetic drift. Well the last time
I left Africa, it was in a 747 from
Johannesburg. A hundred thousand years ago,
the only way out of Africa was out of Northeast Africa
into Southwest Asia.

And we know that by 40,000 years
ago that population had spread. And if you look carefully,
there is less variation out here than there is here. And yet it's dramatically less
all over than it is in Africa. So basically if you
wiped everybody of non-African origin out, the human species
would still have almost all of the genetic variation
that it has today.

Non-Africans represent a
subset of genetic variation and it's characterized
by a loss of variation as humans have spread out of
Africa with a few exceptions, but they're the exceptions. So random genetic drift
can explain most of that, but selection also occurs. We all believe in evolution. That's selection.

These things evolve to get
food into mouth in part. I have no trouble
using them to eat. [Pause] >>How do we detect selection? We can argue that higher FSTs
indicate selection in one part of the world and not in another. But that's hard.

You have to be very specific. And it can occur by chance. So one of the methods of
detecting selection is the idea that a particular
variant in one part of the world has
become common quickly where random genetic drift to become common would
take many generations. The result is less recombination
in the DNA flanking it.
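
The idea of little recombination around a recently selected variant can be illustrated by measuring how far the DNA stays identical on either side of it among the chromosomes that carry it. This is a toy sketch only: haplotypes are represented as equal-length strings, the function name is hypothetical, and real tests of selection use more elaborate statistics.

```python
def shared_flank_length(haplotypes, core):
    """Count contiguous sites around position `core` at which all the
    given haplotypes (equal-length strings) are identical."""
    if not haplotypes:
        return 0
    length = len(haplotypes[0])

    def identical(i):
        return len({h[i] for h in haplotypes}) == 1

    right = core + 1
    while right < length and identical(right):
        right += 1
    left = core - 1
    while left >= 0 and identical(left):
        left -= 1
    # Sites matched to the right plus sites matched to the left of the core.
    return (right - core - 1) + (core - left - 1)


# Toy data: carriers of the variant at index 2 share most of their flanks.
carriers = ["ACGTA", "ACGTA", "TCGTA"]
print(shared_flank_length(carriers, 2))
```

A long shared stretch among carriers, compared with a short one among non-carriers, is the kind of pattern that suggests the variant became common quickly.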

So you tend to get around a
variant that's been selected for an extended part of the
DNA that is all identical. [Pause] >>For example, what about
lactose tolerance as an adult? Do you all know about that? Well I've got the genes for it. Sorry I just hit the mike. That is essentially fixed for one particular
variant in Northern Europe.

And it shows a cline, a
gradient, from low frequency in Southern Europe to higher
frequency in Northern Europe, and very strong evidence
of selection. The plausible hypothesis is that
as the Neolithic moved north, your cow was in your hut
with you during the winter when there was little
to eat outside. And if you could use
the cow's milk fresh, you would survive
the winter better. And if you as the hunter
gatherer during the winter died, your children were going to die.

So there's very strong
survival value in being able to
drink fresh milk. In Southern Europe what
happens to fresh milk? Converted to yogurt. So yogurt is a varied
part of that diet, but what makes yogurt? Lactobacillus that
digests the lactose. So here we have culture and selection operating
on a genetic trait.

East Africa has adult
lactose tolerance as well, but from a different
independent mutation. There are many ways one can
think of looking at selection, but they're mostly at
the moment statistical until there's a solid
biological explanation. And the one I just gave
you is a good story, probably makes sense. But it's not proof.

So there are others. We know there are
variants in hemoglobin. Everybody knows about
sickle cell hemoglobin. There, there is proof.

The different susceptibility
of the different genotypes to infection by the
malaria parasite is clear. The survival of infants is
clear in a malarial environment. [ Music ].
