Technology

When the humanities meet big data

Jacob Turcotte/Staff

By Eoin O'Carroll Staff writer
@eoinocarroll

Updated May 16, 2018, 6:26 p.m. ET

How did four researchers read through 40,000 transcripts of speeches made during the early years of the French Revolution? They let a computer do it. A study published in the Proceedings of the National Academy of Sciences last month describes how specialists with backgrounds in informatics, European history, and astrophysics joined forces to develop a machine-learning program that quantifies the novelty and persistence of speech patterns. Their findings illustrate how new ways of thinking about governance emerged and spread at the dawn of our modern political era. The project is just one of many in an emerging field known as digital humanities, which brings scientists and humanists together to tackle questions in history, art, and literature from an entirely new perspective. “There’s no way that a single academic could have read all 10,000 bad pulpy novels published in the 19th century,” says Indiana University historian Rebecca Spang. “So you could ask different kinds of questions because you get different kinds of information.”

Why We Wrote This

New readings of history often reveal previously hidden insights. By enlisting computers to analyze historical texts, historians are spotting patterns in language that were once invisible.

Being a voracious reader is a prerequisite for academics in the humanities, but even the most dedicated bookworm needs to eat, sleep, and socialize.

Not so for computers, which are known for being tireless, thorough, and very fast. And, when asked the right kinds of questions, these electronic speed-readers can grasp patterns that would otherwise lie beyond the reach of human scholars.

That’s exactly what happened when a team of researchers used machine-learning techniques to plow through transcripts of 40,000 speeches in a parliamentary assembly during the first two years of the French Revolution, according to a paper published in the Proceedings of the National Academy of Sciences last month. By quantifying the novelty of speech patterns and the extent to which those patterns were copied by subsequent speakers, the researchers illustrated how much of the important intellectual work of the revolution was initially carried out in committees, rather than in the whole assembly.
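To make the idea of scoring novelty and persistence concrete, here is a minimal sketch in Python. It is not the researchers' program. It assumes each speech can be reduced to a smoothed word-frequency distribution, that "novelty" is the average Kullback-Leibler divergence from the speeches just before it, and that "persistence" is novelty minus the same divergence measured against the speeches just after it. The function names, the window size, and the smoothing constant are all hypothetical.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, smoothing=1.0):
    """Turn a speech into a smoothed probability distribution over the vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return [(counts[w] + smoothing) / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; smoothing keeps every entry nonzero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def novelty_and_persistence(speeches, window=2):
    """Score each speech that has `window` speeches on both sides of it.

    Returns a list of (index, novelty, persistence) tuples, where
    persistence = novelty - divergence from the speeches that follow
    (an assumption made for this sketch).
    """
    vocabulary = sorted({w for s in speeches for w in s.lower().split()})
    dists = [word_distribution(s, vocabulary) for s in speeches]
    scores = []
    for i in range(window, len(dists) - window):
        past = dists[i - window:i]
        future = dists[i + 1:i + 1 + window]
        novelty = sum(kl_divergence(dists[i], d) for d in past) / window
        transience = sum(kl_divergence(dists[i], d) for d in future) / window
        scores.append((i, novelty, novelty - transience))
    return scores


# Toy example: the third speech introduces a pattern that later speakers copy.
toy_speeches = ["a a a", "a a a", "b b b", "b b b", "b b b"]
toy_scores = novelty_and_persistence(toy_speeches, window=2)
```

In the toy example, the third speech differs from what came before and matches what comes after, so it scores as both novel and persistent. A speech that broke with the past but was never echoed would score high on novelty and near zero on persistence.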

“We’re really getting a quantitative sense of large-scale patterns,” says co-author Simon DeDeo, a professor at Carnegie Mellon University and the Santa Fe Institute, a research center in New Mexico that specializes in complexity science. “There’s a lot of data here. You couldn’t have run this on a machine from 2000 or 2005.... Now you can do this on a desktop.”

Professor DeDeo received his doctorate from Princeton University in 2005 – not in European history, but in astrophysics. That was the tail of an inflationary period in DeDeo’s chosen field, and opportunities to tackle cosmology’s big questions were dwindling. “It was the end of the golden age,” he says. “I went off [and] I spent some time at the Santa Fe Institute, and that’s where I kind of converted into whatever I am now.”

The academy still hasn’t quite settled on a name for what DeDeo does, but the leading contender is “digital humanities,” a term that captures the field’s deeply interdisciplinary approach. Other digital humanities projects have brought together historians, librarians, literary critics, mathematicians, and computer scientists to analyze the complete works of Shakespeare, Time magazine covers, the ancient graffiti of Pompeii, and one million pages of Japanese manga.

“One of the exciting things is, can the humanities and the sciences team up?” DeDeo asks. “There’s a huge amount of knowledge and wisdom that the humanists have that the scientists don’t.”

Digital humanities can be traced to beginnings that are as diverse as the disciplines of its practitioners. One influential figure was Roberto Busa, an Italian Jesuit priest who, in the 1940s, began rendering the works of St. Thomas Aquinas into a machine-readable format. Another is Franco Moretti, a Marxist-trained Italian literary critic who argues that understanding literature comes not from a close reading of the literary canon – literature’s equivalent to the one percent – but from a “distant reading” of the entire corpus.

Whether inspired by Thomistic completism, Marxist inclusivity, or something else entirely, digital humanities holds the potential to shift the way we look at history. “There’s no way that a single academic could have read all 10,000 bad pulpy novels published in the 19th century,” says Indiana University historian Rebecca Spang, a co-author on the French Revolution paper. “So you could ask different kinds of questions because you get different kinds of information.”

In the case of the French parliamentary assembly analysis, researchers found that, unlike Democrats and Republicans today, the bourgeoisie and the aristocrats tended to use the same language patterns. “There isn’t a sort of discursive spectrum that we can identify,” Professor Spang says, “where you’ve got speakers on the right who use one vocabulary and the speakers on the left using another.”

Distant reading also results in a different understanding of the subject matter, one that is more holistic but also stands at a greater remove.

From the point of view of the computer, says Professor Spang, “it doesn’t matter what ‘ghijk’ means or says, just that it’s not ‘abcdef.’ ”

“This kind of work is not going to give us a kind of emotionally or narratively satisfying historical explanation,” says David Andress, a historian at the University of Portsmouth in Britain and an expert on the French Revolution, “but it’s certainly going to show us things that we then have to explain, that we then have to explore why we’ve got that result.”

This explanatory gap is why Dr. Andress doesn’t see digital humanities as a threat to traditional scholarship. “The readers of history and the general public are always going to want to have the story told to them in terms of people,” he says.

[Editor's note: An earlier version misstated the year DeDeo was awarded his doctorate.]
