If big data is a haystack, then algorithms may be better than humans when it comes to finding a needle.
Computer algorithms root out patterns in immense data sets, produce correlations, and ultimately come up with predictions. But deciding which data an algorithm should analyze has traditionally been a job for humans.
A pair of researchers at the Massachusetts Institute of Technology (MIT) – Max Kanter, a master’s student in computer science, and his advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory – have created the Data Science Machine to find patterns and select which data points are relevant, without the problem-solving help of humans.
The Data Science Machine is capable of making predictions from raw data, without the humans who are typically needed to choose the appropriate data points for a machine to analyze. The algorithm the machine uses to achieve this, called Deep Feature Synthesis, went up against human teams in three data science competitions. Out of 906 teams, the Machine beat 615. According to the research, "In 2 of the 3 competitions we beat a majority of competitors, and in the third, we achieved 94 percent of the best competitor’s score. In the best case, with an ongoing competition, we beat 85.6 percent of the teams and achieved 95.7 percent of the top submissions score."
But the win-loss record was not the most impressive takeaway from the competitions. While teams of humans sweat over their predictive algorithms for months leading up to a competition, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries, MIT News reports.
In one competition, teams were tasked with predicting whether a student would drop out of an online course over the next ten days, based on the student's interactions with MIT’s online-learning platform, MITx. There were many variables in play: teams could consider whether and when students turned in their problem sets, or whether they spent any time online looking at lecture notes.
Instead, MIT News reports, the two most crucial indicators for predicting dropouts were how long before the deadline a student began working on a problem set and "how much time the student spends on the course website relative to his or her classmates." The online course platform did not collect either statistic directly, but both could be inferred from the data that was available.
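To give a sense of how such indicators might be inferred from raw records, here is a minimal sketch in Python. It is not the researchers' Deep Feature Synthesis algorithm: the event log, the deadline, the field names, and the `derive_features` helper are all hypothetical and exist only for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw event log: (student_id, timestamp, minutes_on_site).
# A real platform would record far more detail; this is illustrative only.
EVENTS = [
    ("ana", datetime(2015, 10, 1, 9, 0), 40),
    ("ana", datetime(2015, 10, 3, 18, 0), 50),
    ("ben", datetime(2015, 10, 4, 22, 0), 15),
    ("cat", datetime(2015, 10, 2, 12, 0), 30),
    ("cat", datetime(2015, 10, 4, 8, 0), 45),
]

# Hypothetical problem-set deadline.
DEADLINE = datetime(2015, 10, 5, 0, 0)


def derive_features(events, deadline):
    """Infer two per-student statistics that the log does not store directly."""
    first_seen = {}      # earliest activity per student
    total_minutes = {}   # total time on the site per student
    for student, timestamp, minutes in events:
        first_seen[student] = min(timestamp, first_seen.get(student, timestamp))
        total_minutes[student] = total_minutes.get(student, 0) + minutes

    # Average time on site across the whole class, used as the baseline.
    class_mean = mean(total_minutes.values())

    return {
        student: {
            # How long before the deadline the student started working.
            "hours_before_deadline":
                (deadline - first_seen[student]).total_seconds() / 3600,
            # Time on site relative to classmates (1.0 means average).
            "relative_time_on_site": total_minutes[student] / class_mean,
        }
        for student in first_seen
    }


features = derive_features(EVENTS, DEADLINE)
for student, values in sorted(features.items()):
    print(student, values)
```

In this toy data, a student who starts only two hours before the deadline and spends a quarter of the class-average time on the site stands out immediately, even though neither number appears anywhere in the raw log.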
"The competitive success of the Data Science Machine suggests it has a role alongside data scientists," the researchers concluded. "Currently, data scientists are very involved in the feature generation and selection processes," but Kanter and Veeramachaneni see a future where their machine may take over that selection process.
“We view the Data Science Machine as a natural complement to human intelligence,” says Mr. Kanter in an interview with MIT News. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
The research team is presenting its paper in Paris this week at the IEEE Data Science and Advanced Analytics Conference, which brings together big-data scientists and companies in science, finance, and technology, among other fields. Perhaps one will get Kanter and his machine moving.