METHODS: We used a convenience sample (n=397) from the 1997-2007 Olmsted County Birth Cohort. The training cohort (n=177) was used to train the NLP systems to extract, from both children’s and their parents’ EMRs, key terms and sentences related to 1) patient (child)-level risk factors (ie, history of breastfeeding and other atopic conditions [allergic rhinitis, eczema, and food allergy]) and 2) family-level risk factors (ie, family history of asthma and other atopic conditions). We assessed the performance of the NLP algorithms against manual chart review as the gold standard (criterion validity) in an independent test cohort (n=220).
RESULTS: The median age of the test cohort was 13 years (50% female, 81% White, 63% with asthma). Of the children, 90% had a history of breastfeeding and 6% had a history of food allergy; the prevalence of the other histories ranged from 15% to 52%. Positive predictive values of the NLP algorithms for each asthma-related variable were 87-100%, and negative predictive values were 86-99%. The average time required to collect the asthma risk factors in this study was 7 hours for manual chart review and 50 minutes for the NLP algorithms.
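For readers less familiar with the metrics reported above, the following is a minimal sketch of how positive and negative predictive values can be computed for one binary risk factor when NLP output is compared with chart review. The function name `predictive_values` and the toy labels are hypothetical illustrations, not the study's code or data.

```python
def predictive_values(nlp_labels, chart_labels):
    """Return (PPV, NPV) of NLP labels against chart-review labels.

    Both inputs are equal-length sequences of 0/1 flags for a single
    risk factor, with chart review treated as the gold standard.
    """
    if len(nlp_labels) != len(chart_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(nlp_labels, chart_labels))
    tp = sum(1 for n, c in pairs if n and c)          # NLP positive, chart positive
    fp = sum(1 for n, c in pairs if n and not c)      # NLP positive, chart negative
    tn = sum(1 for n, c in pairs if not n and not c)  # NLP negative, chart negative
    fn = sum(1 for n, c in pairs if not n and c)      # NLP negative, chart positive

    # PPV: share of NLP positives confirmed by chart review.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: share of NLP negatives confirmed by chart review.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical toy labels for 10 children (not study data).
nlp = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
chart = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
ppv, npv = predictive_values(nlp, chart)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")
```

Both values depend on how common the risk factor is in the cohort, so they should be read alongside the prevalence figures.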
CONCLUSIONS: Because the NLP algorithms for identifying individual risk factors for asthma from EMRs are cost-effective and suitable, they will be a useful tool for large-scale clinical studies.