Using Social Media to Track Geographic Variability in Language About Diabetes: Analysis of Diabetes-Related Tweets Across the United States

Abstract [from journal]

Background: Social media posts about diabetes could reveal patients' knowledge, attitudes, and beliefs as well as approaches for better targeting of public health messages and care management.

Objective: This study aimed to characterize the language of Twitter users' posts regarding diabetes and describe the correlation of themes with the county-level prevalence of diabetes.

Methods: A retrospective study of diabetes-related tweets identified from a random sample of approximately 37 billion tweets from the United States from 2009 to 2015 was conducted. We extracted diabetes-specific tweets and used machine learning to identify statistically significant topics of related terms. Topics were combined into themes and compared with the prevalence of diabetes by US counties and further compared with geography (US Census Divisions). Pearson correlation coefficients are reported for each topic and relationship with prevalence.

Results: A total of 239,989 tweets from 121,494 unique users included the term diabetes. The themes emerging from the topics included unhealthy food and drink, treatment, symptoms/diagnoses, risk factors, research, recipes, news, health care, management, fundraising, diet, communication, and supplements/remedies. The theme of unhealthy foods most positively correlated with geographic areas with high prevalence of diabetes (r=0.088), whereas tweets related to research most negatively correlated (r=-0.162) with disease prevalence. Themes and topics about diabetes differed in overall frequency across the US geographical divisions, with the East South Central and South Atlantic states having a higher frequency of topics referencing unhealthy food (r range=0.073-0.146; P<.001).

Conclusions: Diabetes-related tweets originating from counties with high prevalence of diabetes have different themes than tweets originating from counties with low prevalence of diabetes. Interventions could be informed from this variation to promote healthy behaviors.