How can machine learning algorithms find drunk Twitter users?

University of Rochester researchers developed a system for identifying and tracking Twitter users' drinking habits based on their tweets.

(Photo: Kacper Pempel/Reuters/File) People holding mobile phones are silhouetted against a backdrop projected with the Twitter logo. Twitter was used as a starting point for a project tracking alcohol use through social media.

Using Twitter to follow trends is nothing new; the social media platform is known for actively tracking popular topics and highlighting them on its website. But a new algorithm may be able to detect a different type of pattern among its users: drinking habits.

Twitter records what its users post, when they post, and, for geotagged tweets, where they post from. Using that data, a team of University of Rochester researchers developed a method for estimating how and where Twitter users drink alcohol.

“Analysis of Twitter has become a widespread approach for geo-spatial studies of human behavior, such as alcohol consumption and exercise, and human latent states, such as sickness and depression,” the researchers wrote in a summary of their study.

“However, nearly all prior work … does not attempt to distinguish mere mentions of activities or states from self-reports of activity. Moreover, no attempt has been made to distinguish reports about future or past activities and in-the-moment reports that provide finer details when geo-tagged tweets are used to map specific locations of activities,” they added, highlighting what they hoped to address through their investigation.

To track regional drinking habits through Twitter, the team needed a way to identify relevant tweets. The Rochester researchers devised three questions to determine whether a tweet came from a drinking user: Does the tweet mention alcoholic beverages, using words such as “drunk,” “beer,” or “alcohol”? Is the tweet about the tweeter consuming such beverages? And is it likely that the tweet was sent while the tweeter was drinking?
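Because each question only matters if the previous answer is yes, the three questions form a cascade. The sketch below is a minimal illustration that assumes crude keyword rules for each question; the study relied on human judgments and a trained classifier, not these rules. Apart from “drunk,” “beer,” and “alcohol,” the word lists and the function name `classify_tweet` are hypothetical.

```python
import re

# Hypothetical keyword lists; only "drunk", "beer", and "alcohol" come from the article.
ALCOHOL_WORDS = {"drunk", "beer", "alcohol", "wine", "vodka"}
FIRST_PERSON = {"i", "i'm", "im", "me", "my", "we", "we're"}
PRESENT_CUES = {"now", "tonight", "currently"}


def tokenize(text: str) -> set:
    """Lowercase a tweet and split it into word tokens (apostrophes kept)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify_tweet(text: str) -> str:
    """Walk the three questions in order and stop at the first 'no'.

    Q1: Does the tweet mention alcohol?
    Q2: Is it about the tweeter drinking? (crudely: first-person words)
    Q3: Was it sent while drinking? (crudely: present-time words)
    """
    tokens = tokenize(text)
    if not tokens & ALCOHOL_WORDS:
        return "not_alcohol"
    if not tokens & FIRST_PERSON:
        return "alcohol_mention_only"
    if not tokens & PRESENT_CUES:
        return "user_drinking_not_now"
    return "drinking_now"


print(classify_tweet("I'm having a beer right now"))  # drinking_now
print(classify_tweet("Beer prices went up again"))    # alcohol_mention_only
```

Keyword rules like these break easily on sarcasm, negation, or slang, which is one reason the researchers turned to human labeling and machine learning instead.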

The study used workers on Amazon's Mechanical Turk, an online marketplace where “requesters” post tasks to be completed by human “turkers,” to answer these questions for a sample of tweets. Using the human-labeled data, the team trained a support vector machine to follow the same line of inquiry as the human workers and find relevant tweets accurately.
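A linear support vector machine learns one weight per word from human-labeled examples and scores a new tweet by summing those weights. As a rough sketch, not the study's actual features or training setup, the code trains such a model with Pegasos-style subgradient steps on bag-of-words counts. The labeled tweets, the function names, and the parameter values are illustrative assumptions, and the model omits a bias term for simplicity; in practice a library implementation such as scikit-learn's `LinearSVC` would typically be used.

```python
import re
from collections import Counter


def featurize(text):
    """Bag-of-words counts (an assumed feature set, not the study's)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def score(w, x):
    """Linear score: dot product of the weight dict and a feature Counter."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_linear_svm(examples, lam=0.1, epochs=20):
    """Train a bias-free linear SVM with Pegasos-style subgradient steps.

    examples: list of (text, label) pairs, label +1 (drinking) or -1 (not).
    Visits examples in a fixed order; Pegasos proper samples them at random.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for text, y in examples:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            x = featurize(text)
            margin = y * score(w, x)       # computed with the current weights
            for tok in w:                  # L2 regularization: shrink all weights
                w[tok] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge loss is active: step toward y
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def predict(w, text):
    return 1 if score(w, featurize(text)) > 0 else -1


# Hypothetical crowd labels: +1 = tweeter is drinking, -1 = not.
labeled = [
    ("drinking beer with friends", 1),
    ("coffee before the meeting", -1),
    ("so drunk at this bar", 1),
    ("heading to the gym", -1),
    ("second glass of wine tonight", 1),
    ("traffic is terrible today", -1),
]
weights = train_linear_svm(labeled)
print(predict(weights, "beer tonight"))        # 1
print(predict(weights, "coffee and traffic"))  # -1
```

In this toy data every word occurs in only one class, so each word's weight keeps its class's sign; real tweets share vocabulary across classes, which is where margin-based training earns its keep.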

Combining that initial process with additional machine learning algorithms that predict tweeters’ locations, the team compiled an analysis of Twitter users’ alcohol consumption habits. All tweets in the study came from the New York City metropolitan area, and the results compare drinking in the city with drinking in the suburbs, and drinking at home with drinking away from home.
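The article does not say how the location models worked. As a much simpler stand-in than a trained model, the sketch below guesses a user's home as the place they most often tweet from at night, then uses the haversine formula to check whether a given tweet was sent near that point. Everything here is an assumption for illustration: the function names, the 0.01-degree grid (roughly a kilometer), the 8 p.m. to 6 a.m. “night” window, the 1 km “at home” radius, and the sample coordinates.

```python
import math
from collections import Counter
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_home(geotagged, cell_deg=0.01, night_start=20, night_end=6):
    """Hypothetical heuristic: the most common grid cell among night-time tweets.

    geotagged: list of (datetime, lat, lon) tuples for one user.
    Returns the (lat, lon) of that cell, or None if there are no night tweets.
    """
    cells = Counter(
        (round(lat / cell_deg), round(lon / cell_deg))
        for ts, lat, lon in geotagged
        if ts.hour >= night_start or ts.hour < night_end
    )
    if not cells:
        return None
    (i, j), _ = cells.most_common(1)[0]
    return (i * cell_deg, j * cell_deg)


def is_at_home(home, lat, lon, radius_km=1.0):
    """Treat a tweet as sent from home if it falls within radius_km of the estimated home."""
    return haversine_km(home[0], home[1], lat, lon) <= radius_km


# Made-up geotagged history for one user: three late-night tweets from one spot,
# one from elsewhere at night, and three daytime tweets from a third spot.
history = [
    (datetime(2020, 6, 1, 22, 5), 40.7301, -73.9902),
    (datetime(2020, 6, 2, 23, 40), 40.7299, -73.9898),
    (datetime(2020, 6, 3, 2, 15), 40.7303, -73.9901),
    (datetime(2020, 6, 4, 21, 0), 40.6782, -73.9442),
    (datetime(2020, 6, 1, 13, 0), 40.7580, -73.9855),
    (datetime(2020, 6, 2, 14, 0), 40.7580, -73.9855),
    (datetime(2020, 6, 3, 15, 0), 40.7580, -73.9855),
]
home = estimate_home(history)
print(is_at_home(home, 40.7301, -73.9902))  # True
print(is_at_home(home, 40.7580, -73.9855))  # False: about 3 km away
```

Restricting to night-time tweets keeps a daytime workplace, which here has just as many tweets, from being mistaken for home.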

The Rochester team found that most drinkers stay relatively close to home when they drink, whether they live in the city or the suburbs, although suburban drinkers are more likely to travel farther. The researchers also found a positive correlation between the density of “alcohol outlets,” such as liquor stores and bars, and the number of tweets about drinking. While the paper notes that “correlation does not necessarily imply causation,” it cites several previous studies that reached similar conclusions about alcohol availability and drinking.
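One standard way to quantify a relationship like the one between outlet density and drinking tweets is Pearson's correlation coefficient; the article does not say which statistic the researchers used. A minimal sketch, assuming each area (for example, a neighborhood) has an outlet count and a drinking-tweet count; `pearson_r` is a hypothetical helper (Python 3.10+ also provides `statistics.correlation`), and the inputs are synthetic, not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


outlets = [1, 2, 3, 4]   # synthetic per-area outlet counts
tweets = [2, 4, 6, 8]    # synthetic per-area drinking-tweet counts
print(round(pearson_r(outlets, tweets), 6))  # 1.0
```

A value near +1 means areas with more outlets tend to have more drinking tweets; as the paper cautions, it says nothing on its own about which causes which.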

The final results painted an interesting picture of New York’s drinking habits, and they also suggested that similar algorithms and methods could “help to create a tool for improving a community’s health, given social networks can become a resource to spread positive health behaviour,” the researchers wrote. They did, however, note one significant bias in their data: Twitter has a relatively high proportion of young and minority users. But they said that studies in many fields face similar sampling problems, which can be addressed by weighting the data accordingly, and that their approach was fairly successful in analyzing the New York drinking scene, showing high potential for complementary Twitter-based systematic studies in the future.

“Our results demonstrate that tweets can provide powerful and fine-grained cues of activities going on in cities,” the team said.

