Social Media Data as Research Tool

Story by
Nancy Joseph

Every cellphone call and post on social media leaves a trail of digital data. Much has been written about the use of such data for political and commercial gain, but social science researchers like Lee Fiorio and Connor Gilroy mine the data for a loftier reason: to understand our world.

Fiorio and Gilroy, graduate students and fellows at the UW’s Center for Studies in Demography and Ecology (CSDE), have spent the past few years tackling big societal questions. Fiorio, a doctoral student in geography, focuses on migration across the US and the globe; Gilroy, a doctoral student in sociology, looks at people’s willingness to share information about their sexual identity. CSDE has provided them with fellowships, mentorship, computing resources, and other support.

Both Fiorio and Gilroy have tapped traditional information sources such as the US Census for their research, but they find that the massive quantities of data generated through social media or cellphone use, also known as digital trace data, provide a particularly rich snapshot of society. “The data are generated in real time, they are generated very quickly, and that’s very different from a traditional demographic data source like the US Census, which comes out once every ten years,” says Gilroy.

Graduate students Lee Fiorio (left) and Connor Gilroy explore societal questions using digital data from social media posts and cellphone records. Media credit: Colette Cosner

For his sexual identity research, Gilroy’s main data source is information that Facebook users provide themselves, including age, marital status, geographic location, and sexual orientation. He requests data the same way an advertiser would, but with the goal of understanding the influence of various factors on an individual’s decision to disclose sexual orientation. For example, he might request a count for married individuals in a specific age range who identify as heterosexual, first from one region of the country and then from another region for geographic comparison.

“I’m using counts, or aggregations, of the number of people that fit different characteristics according to their Facebook social media profiles,” Gilroy explains. “And I can use those data because Facebook’s business model relies on offering advertisers the ability to target ads to highly specific groups of people. It’s a rounded estimate — I’m not seeing individuals — which makes me feel more comfortable using the data. I’m not going to accidentally uncover anything about any one individual.”
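To make the idea of comparing aggregated counts concrete, here is a minimal Python sketch. It is not Gilroy's actual method or any real advertising API: the group labels, the field names, and every number are hypothetical placeholders chosen only to illustrate how a share could be computed from rounded audience estimates rather than from individual profiles.

```python
# Illustrative only: all counts below are made-up placeholders, not real
# Facebook audience estimates or findings from the research described here.

def disclosure_share(stating_orientation, all_users):
    """Share of a group's estimated audience that lists a sexual orientation."""
    if all_users <= 0:
        raise ValueError("all_users must be positive")
    return stating_orientation / all_users

# Hypothetical rounded audience estimates keyed by (region, status, age range).
counts = {
    ("region_1", "married", "45-54"): {"stating_orientation": 12_000, "all_users": 100_000},
    ("region_2", "married", "45-54"): {"stating_orientation": 9_000, "all_users": 60_000},
}

# One share per group; only aggregates are ever touched, never individuals.
shares = {
    group: disclosure_share(c["stating_orientation"], c["all_users"])
    for group, c in counts.items()
}

for group, share in shares.items():
    print(group, f"{share:.1%}")
```

Because the inputs are rounded group totals, a comparison like this can show differences between regions or age ranges without revealing anything about a single person.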

Gilroy has found that two factors are particularly significant in the choice to identify sexual orientation on Facebook: age and relationship status. He interprets this finding as a difference in what sexual orientation means to different people. “In older people, marital status is substituting for sexuality,” he says. “But younger people see sexual orientation as a relevant thing to disclose regardless of marital status. It’s something they disclose on a daily basis and seems like less of an imposition.”

Although Facebook has been a rich data source for Gilroy, there are complications. Last year, Facebook removed advertisers’ access to sexual orientation information, limiting Gilroy to pre-2018 data. Other platforms have also changed the information they collect and provide. “Just as we’re figuring out what we can know from a data source, the ground will shift beneath our feet and we’ll have to start again,” Gilroy says.

Cellphones provide date and location data that may be useful in studies of migration. Media credit: Rodion Kutsae/Wikimedia

Fiorio has faced similar changes with Twitter, a valuable source of location data for his migration research. In the past, Twitter automatically generated time and location information with every tweet, but in 2015 users were given the option to turn off the location feature and voluntarily add place tags to their posts instead. While that was welcome for individual privacy, it threw a wrench into Twitter-dependent research projects. “This naturally changed the kinds of behavior that are generating locational information,” says Fiorio. “If you are consciously putting in a tag, you’re more likely to be traveling. This makes it harder to compare data over time.”

Location and date information are key to Fiorio’s research, which aims to address a major challenge in the study of human migration: inconsistent data. Various countries, organizations, and institutions use different time spans to define migration. Six months in a new place may define an immigrant for one country, while another may require twelve months. These inconsistencies make it difficult to accurately estimate migration, with implications for everything from housing to health care planning.
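The effect of differing definitions can be illustrated with a short Python sketch. It is a simplified illustration, not Fiorio's methodology: the function names, the place labels, the observation dates, and the day counts used to stand in for "six months" and "twelve months" are all assumptions made for the example.

```python
from datetime import date

def stays(observations):
    """Collapse (date, place) observations into (place, first_seen, last_seen) runs."""
    runs = []
    for day, place in sorted(observations):
        if runs and runs[-1][0] == place:
            # Same place as the previous observation: extend the current run.
            runs[-1] = (place, runs[-1][1], day)
        else:
            runs.append((place, day, day))
    return runs

def is_migrant(observations, home, min_days):
    """True if any stay away from `home` spans at least `min_days` days."""
    return any(
        place != home and (end - start).days >= min_days
        for place, start, end in stays(observations)
    )

# Hypothetical location trace for one person: place "A" is home, "B" is elsewhere.
observations = [
    (date(2018, 1, 10), "A"),
    (date(2018, 3, 1), "B"),
    (date(2018, 6, 20), "B"),
    (date(2018, 11, 15), "B"),
    (date(2018, 12, 5), "A"),
]

# The same trace under two assumed thresholds (about six and twelve months).
print(is_migrant(observations, home="A", min_days=183))
print(is_migrant(observations, home="A", min_days=365))
```

In this made-up trace the stay in place "B" spans about eight and a half months, so the person counts as a migrant under the six-month rule but not under the twelve-month rule. Detailed, dated location records make it possible to apply either definition to the same data.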

“Scholars of migration have long been aware of these complexities, but their ability to investigate them has been constrained by the lack of high-quality, longitudinal migration data,” Fiorio says. He believes that digital trace data, collected when people place calls, send text messages, post to social media, or use other web applications, can supply detailed location and date information and so offer much-needed flexibility in measuring migration.

“The research is about methodology — seeing what digital trace data can tell us that might be missing when we go about estimating migration in the standard, traditional way,” Fiorio says. By integrating big data with more traditional migration data sources, he hopes to identify common patterns in calculating migration so that inconsistencies can be addressed. His disparate digital trace data sources include Twitter data, cellphone data, and data from a location-based app.

As researchers like Fiorio and Gilroy look to social media data for answers to societal questions, they are well aware that the availability of that data, and the public’s concerns about it being shared, will continue to evolve. That can make their work more challenging, but they believe the benefits outweigh the frustrations.

“We as a society are developing a stronger sense of what we want social media platforms to be in our lives,” says Gilroy. “Demands for privacy and transparency are getting louder and more explicit. In the short term that may be disruptive to some of these data sources. But as social media platforms are integrated into our day-to-day lives, if we as researchers can be in a position to use that data to capture a little bit of understanding of what’s going on at this moment, I think that can be a pretty valuable contribution. In the long term, when social media platforms have data collection practices that the public is more comfortable with, researchers will still find ways to make meaning out of the data.”