The Science of Forensic Genetics

The ability to produce highly discriminating profiles depends on individuals being different at the genetic level and, with the exception of identical twins, no two individuals have the same DNA. However, individuals, even ones that appear very different, are actually very similar at the genetic level. Indeed, if we compare the human genome to that of our closest animal cousin, the chimpanzee, with whom we shared a common ancestor around six million years ago, we find that our genomes have diverged by only around five percent: the DNA sequence has diverged by only 1.2 percent, and insertions and deletions in both human and chimpanzee genomes account for another 3.5 percent divergence. This means that we share roughly ninety-five percent of our DNA with chimps! Modern humans have a much more recent common history, which has been dated using genetic and fossil data to around 150,000 years ago. In this limited time, single nucleotide mutations have led to one difference every 1,000–2,000 bases between any two copies of a human chromosome, averaging one difference every 1,250 base pairs, which means that we share around 99.9 percent of our genetic code with each other. There have been attempts to define populations genetically based on their racial identity or geographical location, and while it has been possible to classify individuals genetically into broad racial/geographic groupings, it has nevertheless been shown that the great majority of genetic variation (around eighty-five percent) can be attributed to differences between individuals within a population. It is thus widely accepted that race is a purely sociocultural phenomenon with no basis in genetics.
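The percentages quoted above for human-chimpanzee and human-human similarity follow from simple arithmetic. The short sketch below checks them, using only the rounded figures given in the text; the variable names are illustrative.

```python
# Rounded figures quoted in the text (not independent measurements).
substitution_divergence = 1.2  # percent: human vs chimpanzee sequence divergence
indel_divergence = 3.5         # percent: insertions and deletions

# Total divergence and the fraction of DNA shared with chimpanzees.
total_divergence = substitution_divergence + indel_divergence
shared_with_chimp = 100 - total_divergence

# On average one difference every 1,250 base pairs between two human chromosomes.
bases_per_difference = 1250
human_identity = 100 * (1 - 1 / bases_per_difference)

print(f"Human-chimpanzee divergence: {total_divergence:.1f}%")
print(f"Shared with chimpanzee:      {shared_with_chimp:.1f}%")
print(f"Human-human identity:        {human_identity:.2f}%")
```

The total divergence of 4.7 percent is what the text rounds to "around five percent", and one difference per 1,250 base pairs corresponds to 99.92 percent identity.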

Tandem repeats

Two important categories of tandem repeat have been used widely in forensic genetics: minisatellites, also referred to as variable number tandem repeats (VNTRs); and microsatellites, also referred to as short tandem repeats (STRs). The general structure of VNTRs and STRs is the same. Variation between alleles is caused by differences in the number of repeat units, which results in alleles of different lengths; hence tandem repeat polymorphisms are known as length polymorphisms.
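The idea of a length polymorphism can be illustrated with a toy model. The sketch below assumes an allele is simply a flanking sequence, a run of identical core repeats, and a second flanking sequence; the core unit, flanks, and the function name build_allele are hypothetical and chosen only for illustration.

```python
def build_allele(core: str, repeats: int,
                 left_flank: str = "ACGT", right_flank: str = "TGCA") -> str:
    """Return a toy allele: left flank, `repeats` copies of `core`, right flank."""
    return left_flank + core * repeats + right_flank


core_unit = "GATA"  # hypothetical 4 bp core repeat
allele_a = build_allele(core_unit, 8)   # 8 repeat units
allele_b = build_allele(core_unit, 11)  # 11 repeat units

# The alleles differ only in repeat number, so their lengths differ
# by a whole multiple of the core unit length.
length_difference = len(allele_b) - len(allele_a)
print(len(allele_a), len(allele_b), length_difference)
```

In this model the two alleles have the same flanking sequence and differ by three repeat units, so their lengths differ by twelve base pairs. Real loci can be more complex, with variation within the repeats themselves.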

Variable number tandem repeats (VNTRs)

VNTRs are located predominantly in the subtelomeric regions of chromosomes and have a core repeat sequence that ranges in size from six to one hundred base pairs. In some alleles the core repeat is present thousands of times; the variation in repeat number creates alleles that range in size from 500 base pairs to over 30,000 base pairs. The number of potential alleles can be very large: the MS1 locus, for example, has a relatively short and simple core repeat unit of nine base pairs with alleles that range from approximately 1,000 to over 20,000 base pairs, which means that there are potentially over 2,000 different alleles at this locus.
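The figure of over 2,000 potential alleles at MS1 can be checked with a rough calculation. The sketch below assumes that the whole size range is due to repeat number and that every repeat count within the range can occur; the variable names are illustrative.

```python
core_length = 9      # base pairs per MS1 core repeat (from the text)
min_allele = 1_000   # approximate smallest allele, in base pairs
max_allele = 20_000  # approximate largest allele, in base pairs

# Each additional repeat lengthens the allele by one core unit, so the number
# of distinct allele lengths is the number of whole repeat steps in the range,
# plus one for the smallest allele itself.
potential_alleles = (max_allele - min_allele) // core_length + 1
print(potential_alleles)
```

Under these assumptions the count comes to a little over 2,100, consistent with the estimate in the text.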

VNTRs were the first polymorphisms used in DNA profiling and they were successfully used in forensic casework for several years.  The use of VNTRs was, however, limited by the type of sample that could be successfully analyzed because a large amount of DNA was required.  Interpreting VNTR profiles could also be problematic.  Their use in forensic genetics has now been replaced by short tandem repeats (STRs). 

Short tandem repeats (STRs)

STRs are currently the most commonly analyzed genetic polymorphism in forensic genetics. Since their introduction into casework they have become the main tool for just about every forensic laboratory in the world, and the vast majority of forensic genetic casework involves the analysis of STR polymorphisms. There are thousands of STRs that can potentially be used for forensic analysis. STR loci are spread throughout the genome, including the twenty-two autosomal chromosomes and the X and Y sex chromosomes. They have a core unit of between one and six base pairs, and the repeat regions typically range from 50 to 300 base pairs in length.

Single nucleotide polymorphisms (SNPs)

The simplest type of polymorphism is the SNP: a single base difference in the sequence of the DNA. SNPs are formed when errors (mutations) occur as the cell undergoes DNA replication. Some regions of the genome are richer in SNPs than others. For example, chromosome one contains an SNP on average every 1,450 base pairs, compared with chromosome nineteen, where SNPs occur on average every 2,180 base pairs. SNPs normally have just two alleles (for example, one allele with a guanine and one with an adenine) and are therefore not highly polymorphic. However, SNPs are so abundant throughout the genome that it is theoretically possible to type hundreds of them, which would make the combined power of discrimination very high. It is estimated that to achieve the same discriminatory power that is achieved using ten STRs, fifty to eighty SNPs would have to be analyzed. With current technology, this is much more difficult than analyzing ten STR loci.
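A simple model shows why many SNPs are needed. The sketch below is not taken from the text: it assumes each SNP has two alleles in Hardy-Weinberg proportions, that loci are independent so their match probabilities can be multiplied, and that the allele frequency at every locus is 0.5. The function names are hypothetical.

```python
def locus_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic locus.

    Assumes Hardy-Weinberg proportions with allele frequencies p and 1 - p.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Two people match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(allele_freqs) -> float:
    """Multiply per-locus match probabilities, assuming independent loci."""
    result = 1.0
    for p in allele_freqs:
        result *= locus_match_probability(p)
    return result


# Illustrative assumption: every SNP has an allele frequency of 0.5,
# which is the most informative case for a two-allele locus.
for n_snps in (10, 50, 80):
    pm = combined_match_probability([0.5] * n_snps)
    print(f"{n_snps} SNPs: combined match probability = {pm:.2e}")
```

Even in this best case a single SNP leaves a match probability of 0.375, so each locus contributes little on its own, and only a large panel drives the combined value down. SNPs with less balanced allele frequencies are less informative still.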