1000 Genomes Phase 1 and Phase 3 Ethnic Breakdown

I once had the answer committed to memory, but now that I’ve moved out of the world of genetic epidemiology, I had to spend some time today researching this question:

What are the ethnic breakdowns of the 1000 Genomes Phase 1 and Phase 3 populations?

From this rather hard-to-find file on the 1000GP FTP site, we have the following sample counts for phase 1:

ASW	AFR	61
LWK	AFR	97
YRI	AFR	88
CLM	AMR	60
MXL	AMR	66
PUR	AMR	55
CHB	EAS	97
CHS	EAS	100
JPT	EAS	89
CEU	EUR	85
FIN	EUR	93
GBR	EUR	89
IBS	EUR	14
TSI	EUR	98

yielding the following phase 1 “super population” totals:

AFR	246
AMR	181
EAS	286
EUR	379
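
For anyone who wants to double-check the arithmetic, here is a minimal sketch that sums the phase 1 rows above by super population. The variable and function names (PHASE1, super_population_totals) are my own for illustration and are not part of any 1000 Genomes tooling.

```python
from collections import defaultdict

# Phase 1 rows copied from the table above: population, super population, count.
PHASE1 = """\
ASW AFR 61
LWK AFR 97
YRI AFR 88
CLM AMR 60
MXL AMR 66
PUR AMR 55
CHB EAS 97
CHS EAS 100
JPT EAS 89
CEU EUR 85
FIN EUR 93
GBR EUR 89
IBS EUR 14
TSI EUR 98
"""


def super_population_totals(table: str) -> dict[str, int]:
    """Sum the per-population counts within each super population."""
    totals: dict[str, int] = defaultdict(int)
    for line in table.strip().splitlines():
        # split() with no argument handles tabs as well as spaces.
        _population, super_population, count = line.split()
        totals[super_population] += int(count)
    return dict(totals)


phase1_totals = super_population_totals(PHASE1)
print(phase1_totals)
# {'AFR': 246, 'AMR': 181, 'EAS': 286, 'EUR': 379}
```

The same function works unchanged on the phase 3 table further down.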

For phase 3, it’s a bit trickier. There were originally 2,535 samples sequenced, but 31 were removed due to unexpected relatedness, leaving us with the 2,504 samples found in the Phase 3 release VCF files. I had to download a phase 3 VCF file, look at the header to get the list of 2,504 individuals, then compare that list against the 1000 Genomes “sample info” spreadsheet to derive the required information.
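
The header-versus-spreadsheet comparison can be sketched roughly as follows. This is an illustration under assumptions, not the exact steps I ran: it assumes the VCF is gzip-compressed, that the sample info spreadsheet has been reduced to (sample ID, population, super population) rows, and the function names and the toy sample IDs (S1, S2, S3) are made up for the example.

```python
import gzip
from collections import Counter


def vcf_sample_ids(lines):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed VCF fields
            # (CHROM through FORMAT); every column after that is a sample ID.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data records without seeing a header line
    raise ValueError("no #CHROM header line found")


def read_vcf_sample_ids(path):
    """Read sample IDs from a gzip-compressed VCF at a caller-supplied path."""
    with gzip.open(path, "rt") as handle:
        return vcf_sample_ids(handle)


def population_counts(sample_ids, sample_info_rows):
    """Count samples per (population, super population) pair.

    sample_info_rows is an iterable of (sample_id, population,
    super_population) tuples taken from the sample info spreadsheet.
    """
    lookup = {sid: (pop, sup) for sid, pop, sup in sample_info_rows}
    return Counter(lookup[sid] for sid in sample_ids)


# Toy data only: three made-up samples, one of which is absent from the VCF.
demo_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
]
demo_info = [
    ("S1", "GBR", "EUR"),
    ("S2", "GBR", "EUR"),
    ("S3", "YRI", "AFR"),
    ("S4", "YRI", "AFR"),  # in the spreadsheet but not in the VCF header
]

demo_ids = vcf_sample_ids(demo_vcf)
demo_counts = population_counts(demo_ids, demo_info)
print(demo_counts)
```

Driving the count from the VCF header rather than from the spreadsheet is the point of the exercise: samples that were sequenced but dropped from the release never appear in the header, so they are left out of the tally automatically.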

Here’s the breakdown of the 2504 samples included in phase 3:

ACB	AFR	96
ASW	AFR	61
ESN	AFR	99
GWD	AFR	113
LWK	AFR	99
MSL	AFR	85
YRI	AFR	108
CLM	AMR	94
MXL	AMR	64
PEL	AMR	85
PUR	AMR	104
CDX	EAS	93
CHB	EAS	103
CHS	EAS	105
JPT	EAS	104
KHV	EAS	99
CEU	EUR	99
FIN	EUR	99
GBR	EUR	91
IBS	EUR	107
TSI	EUR	107
BEB	SAS	86
GIH	SAS	103
ITU	SAS	102
PJL	SAS	96
STU	SAS	102

with the phase 3 super populations as follows:

AFR	661
AMR	347
EAS	504
EUR	503
SAS	489
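
As a quick sanity check, the super population totals above should add up to the 2,504 released samples, that is, the 2,535 sequenced minus the 31 removed. A small sketch (the names are mine) that confirms this and also prints each super population's share of the release:

```python
# Phase 3 super population totals copied from the table above.
PHASE3_SUPER = {"AFR": 661, "AMR": 347, "EAS": 504, "EUR": 503, "SAS": 489}

phase3_total = sum(PHASE3_SUPER.values())

# 2,535 samples sequenced, 31 removed for unexpected relatedness.
assert phase3_total == 2535 - 31

# Percentage of the release contributed by each super population.
phase3_shares = {
    code: round(100 * count / phase3_total, 1)
    for code, count in PHASE3_SUPER.items()
}

print(phase3_total)
for code, share in phase3_shares.items():
    print(f"{code}\t{share}%")
```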

By the way, here’s a legend for those rather cryptic population codes:

CDX	Chinese Dai in Xishuangbanna, China
CHB	Han Chinese in Beijing, China
JPT	Japanese in Tokyo, Japan
KHV	Kinh in Ho Chi Minh City, Vietnam
CHS	Southern Han Chinese, China
BEB	Bengali in Bangladesh
GIH	Gujarati Indian in Houston, TX
ITU	Indian Telugu in the UK
PJL	Punjabi in Lahore, Pakistan
STU	Sri Lankan Tamil in the UK
ASW	African Ancestry in Southwest US
ACB	African Caribbean in Barbados
ESN	Esan in Nigeria
GWD	Gambian in Western Division, The Gambia
LWK	Luhya in Webuye, Kenya
MSL	Mende in Sierra Leone
YRI	Yoruba in Ibadan, Nigeria
GBR	British in England and Scotland
FIN	Finnish in Finland
IBS	Iberian populations in Spain
TSI	Toscani in Italy
CEU	Utah residents with Northern and Western European ancestry
CLM	Colombian in Medellin, Colombia
MXL	Mexican Ancestry in Los Angeles, California
PEL	Peruvian in Lima, Peru
PUR	Puerto Rican in Puerto Rico