Methods: As of October 2017, BioVu had 50,275 individuals with paired clinical and genetic data. Natural language processing tools were applied to the allergy sections of their EHRs to identify individuals allergic to penicillin, codeine, or “sulfas” (the three most common drug allergies) or to statins (for which adverse effects have been associated with genetic variation). A genome-wide association study was performed for each drug. Cases were individuals with the drug documented in the allergy section, and controls were individuals without it.
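The case/control definition and the per-drug association test can be illustrated with a minimal sketch. It rests on several assumptions. The keyword patterns in `DRUG_PATTERNS` are hypothetical stand-ins for the study's NLP tools, which this abstract does not describe. Every individual whose allergy section does not match is treated as a control. In place of the study's GWAS model, which is not stated here, the sketch uses a simple allelic chi-square test for a single SNP, with no covariate adjustment. `label_cases` and `allelic_chi_square` are illustrative names, not part of any published pipeline.

```python
import re
from typing import Dict

# Hypothetical keyword patterns standing in for the study's NLP tools (assumption).
DRUG_PATTERNS: Dict[str, re.Pattern] = {
    "penicillin": re.compile(r"\bpenicillins?\b", re.IGNORECASE),
    "sulfa": re.compile(r"\bsulfa\w*", re.IGNORECASE),
    "codeine": re.compile(r"\bcodeine\b", re.IGNORECASE),
    "statin": re.compile(
        r"\b(?:statins?|atorvastatin|simvastatin|rosuvastatin|pravastatin|lovastatin)\b",
        re.IGNORECASE,
    ),
}


def label_cases(allergy_sections: Dict[str, str], drug: str) -> Dict[str, int]:
    """Label each individual 1 (case) if the drug appears in the allergy text, else 0 (control)."""
    pattern = DRUG_PATTERNS[drug]
    return {pid: int(bool(pattern.search(text))) for pid, text in allergy_sections.items()}


def allelic_chi_square(genotypes: Dict[str, int], labels: Dict[str, int]) -> float:
    """Allelic 2x2 chi-square for one SNP (simplified; not the study's model).

    genotypes maps individual -> count of the alternate allele (0, 1 or 2).
    """
    a = b = c = d = 0  # a/b: alt/ref alleles in cases; c/d: alt/ref alleles in controls
    for pid, g in genotypes.items():
        if pid not in labels:
            continue  # no phenotype label: skip this individual
        if labels[pid] == 1:
            a += g
            b += 2 - g
        else:
            c += g
            d += 2 - g
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table, e.g. no cases or a monomorphic SNP
    return n * (a * d - b * c) ** 2 / denom
```

For example, with allergy sections `{"p1": "Penicillin - hives", "p2": "NKDA", "p3": "Codeine (nausea); sulfa drugs", "p4": "Atorvastatin - myalgia"}`, `label_cases(sections, "codeine")` marks only `p3` as a case. Plain keyword matching ignores negation and misspellings, which is one reason a real pipeline would use fuller NLP.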
Results: The three drugs most often listed in the drug allergy section of the EHR were penicillin (8,474 cases; 12,710 controls), “sulfas” (6,259 cases; 11,871 controls), and codeine (5,240 cases; 13,858 controls). We also identified 2,750 statin allergy cases and 19,724 controls. Significant associations included SNPs within HSD17B13 for statins and SNPs within WBP2NL and SLC25A5P1 for cutaneous and gastrointestinal reactions to codeine. Querying expression quantitative trait loci (eQTL) databases suggested a strong connection between WBP2NL and CYP2D6, the enzyme that metabolizes codeine to its active metabolite, morphine.
Conclusions: Extracting common “drug allergy” labels from EHRs for pharmacogenetic studies is a promising approach to understanding the genetic factors and mechanisms that underlie such labels.