Combining Crowd-Sourcing and Automated Content Methods to Improve Estimates of Overall Media Coverage: Theme Mentions in E-cigarette and Other Tobacco Coverage

Abstract [from journal]

Exposure to media content can shape public opinion about tobacco, and accurately describing that content is a first step toward demonstrating such effects. Historically, content analyses have hand-coded tobacco-focused texts from a few media sources; these analyses ignored passing-mention coverage and social media and could not reliably capture variation over time. Using a combination of crowd-sourced and automated coding, we labeled the entire population of e-cigarette and other tobacco-related (including cigarettes, hookah, cigars, etc.) 'long-form texts' (focused and passing coverage in mass media and website articles) and social media items (tweets and YouTube videos) collected from May 2014 to June 2017 for four tobacco control themes. Automated coding of theme coverage met thresholds for item-level precision and recall, event validation, and weekly-level reliability for most sources, the exception being YouTube. Health, Policy, Addiction, and Youth themes were frequent in focused long-form e-cigarette coverage (44%-68%) but not in passing long-form coverage (5%-22%). These themes were less frequent in other tobacco coverage (focused long-form, 13%-32%; passing long-form, 4%-11%). Themes were infrequent in both e-cigarette tweets (1%-3%) and other tobacco tweets (2%-4%). These findings show that, for both e-cigarettes and other tobacco, passing long-form coverage and social media paint a different picture of theme coverage than focused long-form coverage does. Automated coding also made it feasible to code the volume of data needed to estimate reliable weekly theme coverage over three years. E-cigarette theme coverage showed much more week-to-week variation than other tobacco coverage did. Automated coding thus allows accurate description of theme coverage in passing mentions and social media, as well as of weekly trends in theme coverage.
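The abstract reports two kinds of checks: item-level precision and recall of the automated theme labels, and theme coverage aggregated by week. A minimal sketch of both calculations follows. It rests on several assumptions that the abstract does not state: each item carries a single True/False label per theme, a human-coded (for example, crowd-coded) set serves as the reference, and weeks are ISO calendar weeks. The function names `precision_recall` and `weekly_theme_share` are hypothetical. The sketch does not reproduce the study's thresholds, its event validation, or its reliability procedure.

```python
from collections import defaultdict
from datetime import date


def precision_recall(auto_labels, reference_labels):
    """Item-level precision and recall of automated theme labels.

    Both arguments are equal-length sequences of booleans (True = theme
    present). The reference labels are assumed to come from human
    (e.g., crowd-sourced) coding of the same items.
    """
    if len(auto_labels) != len(reference_labels):
        raise ValueError("label sequences must be the same length")
    pairs = list(zip(auto_labels, reference_labels))
    tp = sum(1 for a, r in pairs if a and r)          # theme found by both
    fp = sum(1 for a, r in pairs if a and not r)      # automated-only hits
    fn = sum(1 for a, r in pairs if r and not a)      # missed by automation
    # Report 0.0 when a denominator is empty rather than raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weekly_theme_share(items):
    """Share of items mentioning a theme, grouped by ISO (year, week).

    `items` is an iterable of (date, has_theme) pairs; returns a dict
    ordered by week mapping (iso_year, iso_week) -> share of items.
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [theme hits, total items]
    for day, has_theme in items:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week)][0] += int(bool(has_theme))
        counts[(iso_year, iso_week)][1] += 1
    return {week: hits / total for week, (hits, total) in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical labels for five items: 2 true positives, 1 false positive,
    # 1 false negative, so precision and recall are both 2/3.
    print(precision_recall([True, True, False, True, False],
                           [True, False, False, True, True]))
    # Two items in the week of 5 May 2014 and one item the following week.
    print(weekly_theme_share([(date(2014, 5, 5), True),
                              (date(2014, 5, 6), False),
                              (date(2014, 5, 12), True)]))
```

Grouping by ISO week keeps week boundaries consistent across calendar years. A study could just as reasonably define weeks some other way, so treat that choice as illustrative.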