Main
Analysing the full range of human genetic diversity advances our understanding of disease risk and biological mechanisms, and informs the development of safer and more effective therapies 1. Pakistan, like much of South Asia, has been historically under-represented in large-scale endeavours to catalogue genetic diversity. To address this gap, we have established the Pakistan Genome Resource (PGR), a biobank with the primary goal of expanding the breadth of human genetic variation through the sequencing of hundreds of thousands of participants across Pakistan. In doing so, the PGR provides Pakistani communities with comprehensive information about their genetic heritage and its potential relationships to health, and establishes a genomic reference with the potential to improve healthcare outcomes for patients by discovering novel gene–disease associations 1, 2, 3, 4.
PGR is a nested case–control cohort comprising whole exomes or genomes of 173,303 individuals recruited from 23 different cities across Pakistan (Fig. 1a). PGR expands on the Pakistan Risk of Myocardial Infarction Study (PROMIS) from 2017 by 15-fold 4. With its high levels of familial relatedness, PGR is well-suited to discover instances of genes in which both alleles are identical loss-of-function (LoF) variants. Individuals who are homozygous for LoF alleles have historically been called human gene ‘knockouts’. As this term denotes the purposeful deletion of a gene, we prefer to describe these individuals as carriers of homozygous LoF (homLoF) variants, which is a more accurate description of such naturally occurring genotypes.
Fig. 1: PGR encompasses diverse ethnicities recruited across Pakistan.
a, Map showing the locations of 23 cities where PGR recruitment centres are located. Copyright OpenStreetMap contributors. b, c, The first three principal components (b) and principal component 1 (PC1) versus PC2 (c) from a PCA of PGR participants and individuals of specific populations from the 1KG dataset: South Asian (SAS; n = 492), East Asian (EAS; n = 515), African (AFR; n = 671) and European (EUR; n = 521). For visualization, 1,000 individuals from each of the 5 largest ethnic groups within the PGR and 1,000 individuals from the remainder of the PGR were randomly selected from the broader PCA. d, e, PC1 versus PC2 (d) and PC1 versus PC3 (e) from a separate PCA that is restricted to 1KG SAS and PGR participants with less than 2.5% AFR admixture. A maximum of 200 randomly selected individuals from each ethnicity are shown.
Supplementing participant genotype data, PGR includes a compendium of corresponding clinical data (Supplementary Table 1), including medical history, clinical measurements and disease status for most individuals. PGR facilitates bespoke recall-by-genotype (RBG) studies, enabling familial genotype–phenotype analyses. Such studies can expand the discovery of homLoF variant carriers, as demonstrated in PGR for APOC3 (ref. 4), CORIN (ref. 5), GDF15 (ref. 6), SLC30A8 (ref. 7) and CASP1 (ref. 8).
With its spectrum of South Asian-enriched genetic variants, high familial relatedness, RBG capabilities and clinical phenotypes, PGR complements UK Biobank 9, FinnGen 10, All of Us Research Program 11 and the Mexico City Prospective Study 12. Here we summarize the regional genetic diversity of PGR, describe the discovery of homLoF variants across one-third of all protein-coding genes, and present examples of genetic associations and RBG analyses that provide insights into gene function, disease biology and therapeutic translation.
Cohort demographics
PGR comprises nested case–control recruitments across multiple diseases from public and private hospital systems in Pakistan (cohort demographics shown in Supplementary Table 1). We sequenced 166,625 exomes and 6,678 genomes and identified 6,625,995 variants in the coding regions of 19,021 genes (Table 1), including 1,976,965 synonymous (42.1% singletons), 4,186,236 missense (47.0% singletons) and 370,128 putative LoF (pLoF; 59.3% singletons) variants. Of the 6,625,995 coding variants, 3,113,192 (47%) were unique to PGR relative to gnomAD v4.1 non-South Asian individuals 13 and 1,974,277 (30%) were unique to PGR relative to all of gnomAD, including South Asian individuals.
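The proportions quoted above follow directly from the stated counts. As a minimal arithmetic check, the Python sketch below recomputes them; every number is copied from the text, whereas the variable and function names are illustrative assumptions and are not part of the PGR analysis pipeline.

```python
# Counts copied from the text; all names here are illustrative only.
EXOMES = 166_625
GENOMES = 6_678
TOTAL_CODING_VARIANTS = 6_625_995
VARIANTS_BY_CLASS = {
    "synonymous": 1_976_965,
    "missense": 4_186_236,
    "pLoF": 370_128,
}
UNIQUE_VS_GNOMAD_NON_SAS = 3_113_192  # absent from gnomAD v4.1 non-South Asian individuals
UNIQUE_VS_GNOMAD_ALL = 1_974_277      # absent from all of gnomAD v4.1


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Total number of sequenced individuals (exomes plus genomes).
cohort_size = EXOMES + GENOMES

# Share of coding variants unique to PGR under each comparison.
pct_unique_non_sas = percent(UNIQUE_VS_GNOMAD_NON_SAS, TOTAL_CODING_VARIANTS)
pct_unique_all = percent(UNIQUE_VS_GNOMAD_ALL, TOTAL_CODING_VARIANTS)

# Coding variants outside the three classes named in the text;
# the passage itself does not say which classes these belong to.
other_variants = TOTAL_CODING_VARIANTS - sum(VARIANTS_BY_CLASS.values())

print(f"cohort size: {cohort_size:,}")
print(f"unique vs gnomAD non-SAS: {pct_unique_non_sas:.1f}%")
print(f"unique vs all gnomAD: {pct_unique_all:.1f}%")
print(f"variants in other classes: {other_variants:,}")
```

The three named classes sum to fewer variants than the stated total, so the remainder presumably belongs to variant classes that this passage does not list.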
Table 1 Number of coding variants discovered in PGR
To measure genetic differences between PGR and other populations, we calculated the fixation index (F_ST) (ref. 14) for pairs of self-reported PGR ethnicities, reference populations from the 1000 Genomes Project (1KG) 15, and the Human Genome Diversity Project (HGDP) 16, 17 (Extended Data Figs. 1 and 2 and Supplementary Table 2). PGR and other South Asian populations were genetically closer to European and Middle Eastern groups than to East Asian or African groups (Supplementary Table 2, Extended Data Fig. 1). Within…