The NCBI ALlele Frequency Aggregator (ALFA) pipeline was developed to compute allele frequencies for variants in dbGaP across approved unrestricted studies and to make the data openly accessible to the public through dbSNP. The goal of the ALFA project is to make frequency data from over 1M dbGaP subjects open-access in future releases, facilitating the discovery and interpretation of common and rare variants that have biological impact or cause disease.
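
As a rough illustration of what aggregating allele frequency across studies means, the sketch below pools per-study allele counts for a single variant and divides by the pooled total. It is not the ALFA pipeline itself: the function name aggregate_allele_frequencies, the study dictionaries, and all counts are hypothetical and chosen only for the example.

```python
from collections import Counter


def aggregate_allele_frequencies(study_counts):
    """Pool per-study allele counts for one variant.

    study_counts: iterable of dicts mapping allele -> observed count.
    Returns (pooled_counts, pooled_frequencies).
    """
    pooled = Counter()
    for counts in study_counts:
        pooled.update(counts)  # adds counts allele by allele

    total = sum(pooled.values())
    if total == 0:
        # No observations: frequencies are undefined, so return none.
        return dict(pooled), {}

    freqs = {allele: n / total for allele, n in pooled.items()}
    return dict(pooled), freqs


# Hypothetical allele counts for one variant in two studies.
study_a = {"A": 150, "G": 50}
study_b = {"A": 620, "G": 180}

counts, freqs = aggregate_allele_frequencies([study_a, study_b])
print(counts)  # {'A': 770, 'G': 230}
print(freqs)   # {'A': 0.77, 'G': 0.23}
```

Pooling counts before dividing weights each study by its sample size, whereas averaging per-study frequencies would give small studies the same influence as large ones.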

dbGaP contains the results of over 1,200 studies that have investigated the interaction of genotype and phenotype. The database has over two million subjects and hundreds of millions of variants, along with thousands of phenotypes and molecular assay data. The harmonized ALFA data will allow the wider scientific community to access allele frequencies for millions of variants in dbGaP. Only dbGaP studies that have been approved by the submitting institutions for sharing of summary statistics are included in the open-access ALFA dataset. Genotypes and associated individual-level data are accessible through dbGaP authorized access.

The initial release of ~100 thousand subjects included allele counts and frequencies for 447 million rs sites, including 4 million novel ones, aggregated from 551 billion genotypes. More information about ALFA, data access, webinars, and tutorials can be found at and any questions about ALFA track data should be forwarded to