Population structure of human races and dog breeds
A key assumption of the race-breed analogy is that both human “races” (i.e. U.S. census groupings) and dog breeds are formed and structured in similar ways, with each representing distinct groups within each species. If this assumption holds, then one expects to observe both high levels of among-group diversity and low levels of within-group diversity. Put another way, this predicts that groups (whether races or breeds) are clearly distinguishable from each other while at the same time also being internally very similar. Physical anthropologists have a long history of trying to classify people into groups based on biological traits (for example, skin color, cranial measurements, blood group antigens, and more recently DNA). Notably, such groups often varied depending on the trait studied, the populations explored, and the political motivations of the scientist doing the classifying (reviewed in Marks 2012a, b). Today, anthropologists remain interested in the patterns of variation in these and myriad other biological and biocultural traits. However, their motivations for doing so are to use such information to reconstruct human evolutionary history, and to investigate the biological and sociocultural processes that shape phenotypes, rather than to identify biologically discrete human groups.
Patterns of among- versus within-group genetic diversity can be assessed using various tools and methods from the field of population genetics. For example, both the FST statistic and analysis of molecular variance (AMOVA) allow one to investigate patterns of among- versus within-group genetic diversity. Higher FST values indicate a more structured population (i.e. possessing distinct clusters) while lower values (closer to 0) imply less structure (i.e. possessing few or no distinct clusters, most likely due to higher rates of random mating among individuals). AMOVA allows a researcher to partition the total amount of genetic variation in a sample into different levels. When a large percentage of the total variation in a sample can be explained by among-group differences within the sample, this suggests that the sample is highly structured, and composed of distinct genetic subpopulations. Alternatively, when variation among individuals within groups explains a large portion of variation in the total sample of all the groups, it implies a less structured population.
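As a rough illustration of how FST relates among-group to within-group diversity, the following minimal sketch computes Wright's FST = (H_T − H_S) / H_T at a single biallelic locus. It assumes equally sized subpopulations and known allele frequencies; the function names and the example frequencies are hypothetical and are not taken from any of the studies discussed here.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    subpop_freqs: allele frequency of the same allele in each subpopulation.
    Assumption (for illustration only): all subpopulations are equally sized.
    """
    n = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / n
    # H_T: expected heterozygosity if all subpopulations were pooled
    p_bar = sum(subpop_freqs) / n
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, so there is no variation to partition
    return (h_t - h_s) / h_t


# Hypothetical frequencies: identical groups, slightly different groups, fixed differences
for freqs in ([0.5, 0.5], [0.4, 0.6], [0.0, 1.0]):
    print(freqs, round(fst(freqs), 3))
```

Identical allele frequencies give an FST of 0, and fixed differences between groups give an FST of 1. Published estimates average over many loci and correct for sample size, which this sketch does not attempt.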
In addition, statistical programs such as structure (Pritchard et al. 2000) use model-based clustering algorithms to place individuals into a predetermined number of groups based on multilocus genotype data, and to estimate the fraction of genetic ancestry that individuals have from each of these groups. Results are displayed graphically, with population groups denoted by different colors. Individuals with ancestry from multiple groups are displayed using multiple colors (see Figs. 1 and 2 for examples and further explanation). It is important to note that structure will always identify the number of groups specified by the user—the program tries to find the best way to allocate sampled individuals into k user-defined groups in a way that will maximize Hardy–Weinberg equilibrium for each group (Bolnick 2008). As such, it is important for users to run structure for multiple values of k, and evaluate the statistical likelihood of each of these models.
Fig. 1 Clustering assignment of 85 dog breeds by Parker et al. (2004): “seventy-four breeds are represented by five unrelated dogs each, and the remaining 11 breeds are represented by four unrelated dogs each. Each individual dog is represented on the graph by a vertical line divided into colored segments corresponding to different genetic clusters. The length of each colored segment is equal to the estimated proportion of the individual’s membership in the cluster of corresponding color (designated on the y axis as a percentage). Breeds are labeled below the figure”
Fig. 2 Estimated population structure for the 52 sampled populations of the HGDP-CEPH panel for pre-chosen values of K = 2 through K = 6, from Rosenberg et al. (2002). Each cluster is represented by a different color. Each individual is a vertical line, which depicts an estimate of that individual’s membership in each cluster (multiple colors indicate membership in more than one cluster). Thin black lines denote individual populations. Population labels are shown at the bottom of the figure, while broad regional labels are listed at the top of the figure. While broad geographic clustering occurs, note that many individuals share genetic similarities with more than one cluster. This is particularly true within continents and for individuals from populations at the borders of continents
Structure’s results are sensitive to a number of factors, including linkage between loci, Hardy–Weinberg equilibrium, sample sizes of populations, genetic drift, and geographic distribution of populations (discussed in Lawson et al. 2018; Novembre 2016; Bolnick 2008). Further, interpretation of the groups identified by structure as real, “pure” groups instead of statistical constructs runs counter to how evolution works and also runs the risk of reifying old and false biological conceptions of race (Weiss 2018; Weiss and Lambert 2010, 2011, 2014; Weiss and Long 2009). Specifically, such misinterpretations imply that at some point in our evolutionary past there existed a set number of distinct homogeneous groups, and that modern populations or individuals with ancestry from multiple groups are somehow less “pure”. It must be emphasized that no current or past population is homogeneous or pure and no living population is any group’s ancestral population. As Weiss and Lambert (2010) point out, these approaches can “describe data as if they reflect true evolutionary history. As-if fictions can be useful as analytic tools if everyone understands they are simply convenient statistical digests. But the phrasing of papers often suggests [without ‘imputing to them any social racism’ (p. 97)] that the typological conclusions are being taken as if they represent actual history.” (p. 95).
While the data and results from structure can be misinterpreted in the ways described above, they can be helpful in illustrating if and how genetic variation is shared across groups. Now, let’s consider levels and patterns of dog and human genetic variation to see how they compare. In 2004, Parker and colleagues analyzed data from single nucleotide polymorphisms (SNPs) for 120 dogs representing 60 breeds as well as 96 microsatellite (short tandem repeat, STR) loci genotyped in 414 dogs representing 85 breeds. Both STR and SNP data demonstrated low levels of within-breed heterozygosity, indicating that within-breed genetic variation was low (H = 0.313–0.610), while FST estimates indicated high levels of differentiation among breeds (FST = 0.33). These results were consistent with earlier studies looking at smaller numbers of breeds (Koskinen 2003; Irion et al. 2003), and have been supported by subsequent studies of dog population structure and domestication (vonHoldt et al. 2010). In their AMOVA analysis of the 96 genotyped microsatellites, Parker et al. (2004) report that ~27% of variation among dogs in their sample could be attributed to variation across breeds, with the remainder of the genetic variation explained by within-breed variation, implying that the breeds in their sample are highly genetically isolated from each other.
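To make the heterozygosity measure concrete, here is a minimal sketch of expected heterozygosity at a multi-allelic locus such as a microsatellite, computed as H = 1 − Σ p_i², where p_i is the frequency of allele i. This is the plain estimator without a small-sample correction, and it is not necessarily the exact estimator used in the studies cited above; the function names and the allele lists are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity H = 1 - sum(p_i^2) at one locus.

    alleles: every allele copy observed at the locus (two entries per diploid individual).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this locus")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def mean_heterozygosity(loci):
    """Average expected heterozygosity across a list of loci."""
    return sum(expected_heterozygosity(alleles) for alleles in loci) / len(loci)


# Hypothetical microsatellite allele sizes (in base pairs) for two small groups
low_diversity_group = [[120, 120, 120, 124], [98, 98, 98, 98]]
high_diversity_group = [[120, 124, 128, 132], [98, 100, 102, 98]]
print(mean_heterozygosity(low_diversity_group))
print(mean_heterozygosity(high_diversity_group))
```

A group in which most individuals carry the same alleles scores close to 0, while a group with many alleles at similar frequencies scores closer to 1.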
Parker et al. (2004) then used the program structure to place individual dogs into a predefined number of population clusters. Running structure on overlapping subsets of 20–22 breeds at a time, they observed that the majority of individual dogs could be placed into distinct clusters that corresponded with their reported breed identity (Fig. 1). Using genotype data alone, they correctly identified the breed of 99% of the dogs included in their sample. Taken together, the low within-breed heterozygosity, high among-breed FST, AMOVA, and structure results all present a picture of a highly structured population.
Parker et al.’s analysis of dog population structure can be compared to an earlier study of human population structure using similar methods (Rosenberg et al. 2002). In this paper, Rosenberg and colleagues utilized allele frequency data from 377 microsatellites genotyped in the 52 populations of the HGDP-CEPH Human Genome Diversity Panel. Rosenberg et al. conducted AMOVA that examined genetic variance components within and among the individual populations of the HGDP-CEPH as well as within and among five and seven broad geographical groupings of these populations. These regional groups can be viewed as generally analogous to continental regions and U.S. census groupings (the seven-region scheme divides Europe/Middle East/Central Asia into three separate categories). The authors observed that genetic differences among regions accounted for only 3.3–4.7% of global human genetic variation (much smaller than the 27% of genetic differences among dog breeds reported by Parker et al. 2004), and that variation within populations accounted for ~92.9–94.3%. Differences among populations within regions accounted for the remaining 2.4–2.6% of genetic variation. In addition, within-region levels of heterozygosity (0.664–0.792; Rosenberg et al. 2002) were notably higher than those observed for dog breeds (0.313–0.610; Parker et al. 2004). This reflects the much greater total genetic variation within human groups compared to dog breeds. These results are comparable to those from other human datasets/populations, including HGDP-CEPH multilocus SNP data (Li et al. 2008). Furthermore, data from The 1000 Genomes Project demonstrate that FST values between continental groups are far lower (0.052–0.083) than FST values for dog breeds (The 1000 Genomes Project Consortium 2015).
In sum, these data suggest that a far greater share of global genetic variation in humans is attributable to variation within local populations than to variation between regional (racial) groups, and that substantial heterogeneity can be found within these groups. This stands in marked contrast to the lower levels of heterozygosity observed within dog breeds and the large amount of genetic variation that can be explained by breed differences.
Rosenberg et al. (2002) also used the program structure to explore patterns of human genetic variation in the HGDP-CEPH dataset (Fig. 2). They found support for a model of six genetic clusters, five of which roughly correspond to the broad continental regions of Africa, Europe/Middle East/Central Asia, East Asia, Oceania, and the Americas (the sixth cluster corresponded to the isolated Kalash population of northwest Pakistan). While some spuriously interpreted the identification of these clusters as support for a genetic basis for human racial groups (Wade 2014), others identify aspects of these results that are inconsistent with such an interpretation. First, Bolnick (2008) notes that in addition to finding support for the six-cluster model, Rosenberg and colleagues found support for models specifying a larger number of clusters, although the groupings of the 52 populations within those clusters were often inconsistent, suggesting a low confidence in any given clustering of the populations. The existence of multiple clustering models of human genetic variation contrasts with the rigid breed-aligned clusters identified for dogs by Parker et al. Second, Rosenberg et al. found that most individuals had membership in more than one cluster, implying that genetic clusters did not represent discrete genetic units. This pattern was particularly noted for humans living near the borders of these geographically linked clusters. This supports a distribution of genetic variation that is driven by constant mating among neighboring populations and relatively low levels of genetic differentiation driven almost entirely by geographic factors.
In 2003 Bamshad and colleagues genotyped over 500 people from sub-Saharan Africa, Europe, Asia, and Southern India for 100 Alu insertion polymorphisms and 60 microsatellites. Much like Rosenberg et al., these authors used the program structure to identify genetically determined clusters of populations within this sample. They also attempted to place individuals from Africa, Europe, and East Asia into the correct continent of origin using only genotype data. The authors report being able to do so for 99–100% of the samples. In this case, a correct assignment meant that an individual was identified as having the greatest proportion of ancestry in the genetic cluster corresponding to their continent of origin. While this might sound like good support for the idea that humans can be assigned to unique and distinct genetic clusters that correspond to continental groups, the interpretation of these results is complex, as outlined by Bolnick (2008). First, structure analysis of the sub-Saharan African, European, and East Asian samples identified four clusters: Europeans, East Asians, sub-Saharan Africans (excluding Mbuti pygmy populations and three other African individuals), and a cluster consisting of Mbuti and the remaining three African individuals. However, most subsequent analyses were conducted assuming only three clusters (ignoring potential structure within Africa). Second, as noted by Bamshad et al., the populations chosen represent relatively small samples from a limited number of populations that are widely geographically dispersed—the inclusion of people from geographically intermediate regions may have lowered the accuracy of cluster assignment. As an illustration of this, when South Asian samples were included in the analyses, accuracy of cluster assignment for these samples was notably lower (87%; Bamshad et al. 2003).
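The assignment rule described above, in which an individual counts as correctly placed when their largest ancestry fraction falls in the cluster matching their continent of origin, can be sketched as follows. This is an illustration of the rule only, not the authors' code; the function names, cluster labels, and membership fractions are all hypothetical.

```python
def assign_cluster(membership):
    """Return the cluster label with the largest estimated ancestry fraction."""
    return max(membership, key=membership.get)


def assignment_accuracy(individuals):
    """Fraction of individuals whose top cluster matches their reported origin.

    individuals: list of (reported_origin, membership) pairs, where membership
    maps each cluster label to an estimated ancestry fraction.
    """
    correct = sum(
        1 for origin, membership in individuals
        if assign_cluster(membership) == origin
    )
    return correct / len(individuals)


# Hypothetical membership estimates; the last individual is split between two clusters
sample = [
    ("A", {"A": 0.95, "B": 0.03, "C": 0.02}),
    ("B", {"A": 0.10, "B": 0.85, "C": 0.05}),
    ("C", {"A": 0.05, "B": 0.15, "C": 0.80}),
    ("B", {"A": 0.52, "B": 0.45, "C": 0.03}),
]
print(assignment_accuracy(sample))
```

Note that the rule reports only the largest fraction: an individual with 52% membership in one cluster and 45% in another is scored the same way as one with 95% membership, so a high accuracy figure does not by itself show that clusters are discrete.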
Taken together, these comparisons suggest that these continental-based human racial categories differ from dog breeds in two ways. First, levels of within-group (within-“race” or U.S. census groupings) diversity in humans are generally considerably higher than diversity observed within dog breeds, while levels of differentiation among such human groups are lower than those observed among breeds. Second, while it is possible to use algorithms such as the one implemented in structure to identify groups of humans that tend to cluster on the basis of genetic similarity (in other words, there is some structure to human genetic diversity as is expected given how humans, like all species, do not mate at random, but over the generations have tended to reproduce with those who live relatively close to them), those clusters tend to be highly porous (individuals may have membership in multiple clusters) and determining the “correct” number of clusters is subjective, even for geneticists. That is, U.S. census groupings are not the only way to impose order on patterns of human genetic variation.
In the comparisons above, we have used the continental population clusters identified by Rosenberg et al. as a proxy for U.S. racial categories. However, it is important to note that racial categories in the U.S. are not simple reflections of geography, nor are these categories applied using definitions of “race” proposed by some geneticists. For example, Dobzhansky [who argued against racial essentialism, as described in Jackson and Depew (2017)], defines races as “Mendelian populations that differ in the frequencies of some gene or genes” (Dobzhansky 1955). While this definition reflects the strong mathematical roots of population genetics (and could be applied in many different ways to different groups, nesting it within that larger, arbitrary social framework for “race”), the use of the term “race” in the U.S. encompasses far more than simple differences in allele frequencies.
Racial categories in the U.S. are drawn, in part, on the western concept of race first described by Linnaeus, which emphasized differences among humans based on geographic, physical, cultural, and behavioral factors. As described in Marks (2016), these categories were heavily influenced by the social, cultural, and political factors of that time. These included extended sea travel by Europeans (traveling great distances by sea tended to emphasize differences in appearance and culture, while land travel highlighted more gradual changes), as well as the strong motivating sociopolitical and economic influences of both colonialism and slavery. From the beginning, racism was embedded in race science. Within the U.S., racial categories (as recognized by the U.S. census) have shifted over time, reflecting concerns about slavery, immigration, hypo- and hyper-descent, and access to resources (Snipp 2003). Alongside that ongoing history, there has been disagreement among geneticists about how human genetic variation is patterned, most famously between Lewontin (1972) and Edwards (2003). For perspective, Marks (2010) wrote,
Other genetics researchers have concluded that human evolutionary history has produced a “nested pattern of genetic structure that is inconsistent with the existence of independently evolving biological races” (Hunley et al. 2009). This perspective complements Livingstone’s famous “there are no races, only clines” (1962) statement, which refers to the spectrum of continuous human phenotypic variation we see globally. Two examples of continuous geographic variation in the human species include human cranial shape and size (Relethford 2009) and skin color, a trait strongly influenced by natural selection, showing clinal variation in epidermal melanin moving north and south away from the most intensive UV radiation around the Equator (Gibbons 2014; Fig. 3). Clinal variation across human-defined boundaries like continents refutes clear-cut distinctions between human groups.
From “Shedding Light on Skin Color” by Gibbons (2014)
If patterns of genetic or biological variation were found to be identical between dogs and humans, or between any other species and humans, that would still not support a biologically-based concept of “race,” with or without its foundation for racism. Further consideration of these issues and a demonstration of the link between the biological concept of race and racism can be found in part 4, below. The point here is to show that the scientific-sounding basis of the breed-race analogy does not hold up to scientific scrutiny. We continue refuting the analogy next, with a comparison of the evolutionary histories of dogs and humans, which created the variation observed today.
Visually, a Chihuahua is the chalk to a Great Dane’s cheese, yet they are still the same species, Canis lupus familiaris, and are direct descendants of the grey wolf. All domestic dog breeds are able to interbreed to give birth to reproductively viable offspring.
In a word, ‘no’. Domestic dogs evolved between 17,000-33,000 years ago. Most ‘breeds’, which have been artificially selected by humans, have arisen very recently within the last 200 years.
This is because their genomes remain relatively unchanged, despite their physical characteristics appearing so different. This key evidence tells us that various dog breeds are not in the running to become a new species any time soon. It takes a long time for mutations, which cause inheritable changes to characteristics, to arise within populations.
“Based on what we know about them as scientists and pet owners, [dogs] have definitely become something different from just wolves,” Tseng said.
Despite these minor differences, genetic data — especially mitochondrial DNA, which gets passed down through the maternal line — suggest that all dogs are the same species, and that wolves likely are, too. But from a societal standpoint, wolves and dogs are extremely different.
“They have the same number of teeth as wolves, but there’s less space to put the teeth in,” Tseng said. “The teeth sometimes reduce in size, but also sometimes get rotated a little bit so they can fit more of them in the mouth.”
“If you were a biologist who comes from a society that never had any dogs associated with humans and you looked at these dogs, you would immediately think that these were different species,” Tseng told Live Science.
Forget aliens, said Jack Tseng, a paleontologist at the American Museum of Natural History in New York. If we hadn’t actually bred dogs ourselves, even humans would have a hard time determining that a Cavalier King Charles spaniel and a wolfhound are related, he said.