The registry archives US films deemed to be of cultural, historic, or aesthetic importance. In other words, these are not just movies. These are "fi-i-lms."
And the single best predictor of whether a film will make it to this registry isn't box-office gross, critics' reviews, or public sentiment. Instead, it's the number of times other films adopt a registry candidate's general plot or offer a nod to one or more of its iconic scenes, according to a new study.
"We find that ultimately it is the creators, the filmmakers themselves, who will determine which movies are important, not the expert critics," said Luís Amaral, co-director of the Institute on Complex Systems at Northwestern University in Evanston, Ill., and the study's architect, in a statement.
The study is part of a larger effort to look for ways to use data mining to identify the most important scientific studies in various fields. One goal is to streamline the scientific process by shortening the time it takes a researcher to sort through an exploding number of studies and see which have been the most influential or pertinent to his or her new project.
Predictors of influence include the number of times a paper is cited by others in subsequent research or its appearance in influential journals. Researchers publishing their own results are required to include such references to others' work.
Artistic works have no such citation requirement. But informally, there might be similarities. After all, in music, Brahms's First Symphony carries brief echoes of Beethoven's Ninth Symphony in its final movement. Wagner and Mahler have clearly influenced film-score composers.
In movies, the film "When Harry Met Sally ..." carries a conversation between characters about "Casablanca," while "The Magnificent Seven," "A Fistful of Dollars," and "Last Man Standing" borrowed general plot lines whole cloth from films by Japanese director Akira Kurosawa.
"The main scientific question we started with was: To what extent can the evaluation of a single expert be trusted? This question is extremely relevant in science, art, literature, and music," Dr. Amaral writes in an e-mail. "Another question that we thought we would be able to investigate was: Can one define objective methods that accurately estimate the significance of a given work?"
Fortunately, with films the team had a powerful confluence of data sources for gauging factors that determine how important a film has been culturally, historically, or aesthetically.
The Library of Congress's National Film Registry had 625 films in its collection against which to test factors that best predict which movies are likely to join the collection. (Since the paper was written, the registry's class of 2014 added 25 films.) The website IMDb.com is a mother lode of information on more than 1.7 million films – including the 15,425 US movies that the team used in the study, which were linked by one or more of the cumulative 42,794 "citations" to one another.
At the heart of the problem is finding an objective measure to represent quality or importance – traits that don't lend themselves to direct measurement but can be inferred from indirect measures.
For films, the researchers tested several potential indirect measures of a movie's significance. The subjective measures included the opinions of experts, from a single critic to many critics or the registry's own preservation board; and ratings offered by moviegoers.
Measures available as more-objective "hard numbers" included the amount of money a movie grossed, the number of times that scenes in a movie – or the movie itself – harked back to an earlier film, or the number of hits that a film's home page garnered through Web searches.
The team applied two statistical tests to these traits. One basically looks for correlations between trait and inclusion in the registry. The other is a more complex test that helps pick out factors that separate the important or influential from the rest of the pack.
One result – not much of a surprise, says Max Wasserman, an applied mathematics researcher in Amaral's lab and the lead author of the paper describing the study, published online Monday by the Proceedings of the National Academy of Sciences – is that the collective view of many film critics was a more powerful predictor of acceptance in the registry than the view of any single critic.
Of more interest was the cross-referencing among films. Films only one to two years old have the most references from new releases, the team found.
But the registry accepts only films that are at least 10 years old. When that is taken into account, registry films that garnered the most citations in new releases were those 20 to 25 years old at the time of the releases. The next most frequently cited were films 15 to 19 years old.
In essence, the citation count for such "long gap" films as a predictor of inclusion in the registry is "superior" to predictions made using other metrics, the team concludes, and shows the power of using automated methods to help identify the cultural, historic, or aesthetic heavyweights.
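For readers who want to see what counting "long gap" citations might look like in practice, here is a minimal sketch in Python. It is an illustration only, not the team's actual method: the film names and years are invented, the function name `long_gap_counts` is hypothetical, and the 20-year default threshold is an assumption drawn loosely from the age ranges mentioned above.

```python
from collections import Counter

# Hypothetical toy data, not from the study:
# (citing film, its release year, cited film, its release year)
citations = [
    ("Film D", 2005, "Film A", 1980),
    ("Film E", 2006, "Film A", 1980),
    ("Film E", 2006, "Film C", 2004),
    ("Film F", 2010, "Film B", 1988),
    ("Film F", 2010, "Film C", 2004),
]

def long_gap_counts(citations, min_gap=20):
    """Count, per cited film, the references made at least `min_gap` years
    after its release. The default threshold is an assumption."""
    counts = Counter()
    for _citing, citing_year, cited, cited_year in citations:
        if citing_year - cited_year >= min_gap:
            counts[cited] += 1
    return counts

# Rank films by their long-gap citation count, highest first.
for film, n in long_gap_counts(citations).most_common():
    print(film, n)
```

In this toy data set, "Film C" is referenced twice, but only by films released within a few years of it, so it drops out of the long-gap ranking. That is the sort of short-lived attention the age cutoff is meant to filter out.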
But Mr. Wasserman points out that heavyweights can be underappreciated, even over time.
By the team's criteria, the film with the most references not currently in the registry is 1974's "The Texas Chain Saw Massacre," he notes.
Such films "are not considered to be the greatest thematic quality," he says. "But in the end, it is a very influential horror film, defining the genre."
It certainly influenced the insurance company GEICO, which ran ads last fall based on the horror classic.