Importance Large-scale data on type-specific human papillomavirus (HPV) prevalence and disease burden worldwide are needed to guide cervical cancer prevention efforts. Research on and application of health care big data have become key to modern medical research.
Objective To examine the prevaccination prevalence of high-risk HPV (hrHPV) and type distribution by cervical cytology grade in Estonia.
Design, Setting, and Participants This cross-sectional study used text mining and the linking of data from electronic health records and health care claims to examine type-specific hrHPV positivity in Estonia from 2012 to 2019. Participants were women aged at least 18 years. Statistical analysis was performed from September 2021 to August 2022.
Main Outcomes and Measures Type-specific hrHPV positivity rate by cervical cytological grade.
Results A total of 11 017 cases of cervical cytology complemented with data on hrHPV testing results between 2012 and 2019 from 66 451 women aged at least 18 years (mean [SD] age, 48.1 [21.0] years) were included. The most common hrHPV types were HPV16, 18, 31, 33, 51 and 52, which accounted for 73.8% of all hrHPV types detected. There was a marked decline in the positivity rate of hrHPV infection with increasing age, but the proportion did not vary significantly based on HPV type. Implementation of nonavalent prophylactic vaccination was estimated to reduce the number of women with high-grade cytology by 50.5% (95% CI, 47.4%-53.6%) and the number with low-grade cytology by 27.8% (95% CI, 26.3%-29.3%), giving an overall estimated reduction of 33.1% (95% CI, 31.7%-34.5%) in the number of women with precancerous cervical cytology findings.
Conclusions and Relevance In this cross-sectional study, text mining and natural language processing techniques allowed the detection of precursors to cervical cancer based on data stored by the nationwide health system. These findings contribute to the literature on type-specific HPV distribution by cervical cytology grade and document that the α-9 phylogenetic group HPV types 16, 31, 33, and 52 and the α-7 phylogenetic group HPV type 18 are the most frequently detected types in normal-to-high-grade precancerous lesions in Estonia.
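Illustrative sketch The main outcome above, a positivity rate by cytological grade with a 95% CI, can be illustrated with a short Python sketch. This is not the authors' analysis code: the abstract does not state which interval method was used, so the Wilson score interval is an assumption here, and the function names and counts are hypothetical placeholders rather than study data.

```python
import math


def wilson_interval(positives, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def positivity_by_grade(counts):
    """Map {grade: (positives, tested)} to {grade: (rate, ci_low, ci_high)}."""
    result = {}
    for grade, (positives, tested) in counts.items():
        low, high = wilson_interval(positives, tested)
        result[grade] = (positives / tested, low, high)
    return result


# Hypothetical counts for illustration only; NOT taken from the study.
example_counts = {
    "normal": (50, 500),
    "low-grade": (120, 300),
    "high-grade": (90, 100),
}

for grade, (rate, low, high) in positivity_by_grade(example_counts).items():
    print(f"{grade}: {rate:.1%} (95% CI, {low:.1%}-{high:.1%})")
```

The same function could be applied per HPV type within each grade to obtain type-specific rates, provided the counts are tabulated that way.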