[Image caption] Calculating the mutual information between two nodes in a graph is like injecting blue dye into one of them and measuring the concentration of blue at the other. Crucial to the new algorithm are the elimination of loops in the graph (orange) and a technique that prevents intermediary nodes (black) from distorting the long-range calculation of mutual information (blue).
Much artificial-intelligence research addresses the problem of making predictions based on large data sets. An obvious example is the recommendation engines at retail sites like Amazon and Netflix. But some types of data are harder to collect than online click histories (information about geological formations thousands of feet underground, for instance). And in other applications, such as trying to predict the path of a storm, there may just not be enough time to crunch all the available data.

Dan Levine, an MIT graduate student in aeronautics and astronautics, and his advisor, Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics, have developed a new technique that could help with both problems. For a range of common applications in which data is either difficult to collect or too time-consuming to process, the technique can identify the subset of data items that will yield the most reliable predictions. So geologists trying to assess the extent of underground petroleum deposits, or meteorologists trying to forecast the weather, can make do with just a few, targeted measurements, saving time and money.
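To make the idea of "identifying the subset of data items that will yield the most reliable predictions" concrete, here is a minimal sketch (not Levine and How's actual algorithm) of greedy measurement selection for jointly Gaussian variables. In the Gaussian case, the mutual information between a candidate measurement and the quantity of interest corresponds to the reduction in the target's conditional variance, so the greedy rule simply picks whichever measurement shrinks that variance the most. All variable names and the example covariance matrix are illustrative assumptions.

```python
import numpy as np

def conditional_variance(cov, target, selected):
    """Variance of the target variable after observing the selected variables."""
    if not selected:
        return cov[target, target]
    s = list(selected)
    c_ts = cov[np.ix_([target], s)]          # cross-covariance, shape (1, k)
    c_ss = cov[np.ix_(s, s)]                 # covariance of observed set
    # Schur complement: var(target | selected)
    return cov[target, target] - (c_ts @ np.linalg.solve(c_ss, c_ts.T))[0, 0]

def greedy_select(cov, target, candidates, k):
    """Greedily pick the k measurements most informative about the target.

    For Gaussians, I(target; c | selected) = 0.5 * log(var_before / var_after),
    so maximizing mutual information is equivalent to maximizing the drop
    in conditional variance at each step.
    """
    selected = []
    for _ in range(k):
        base = conditional_variance(cov, target, selected)
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: base - conditional_variance(cov, target, selected + [c]),
        )
        selected.append(best)
    return selected

# Illustrative example: variable 0 is the quantity to predict; variables
# 1-3 are possible measurements with decreasing correlation to it.
cov = np.array([
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.2, 0.0],
    [0.3, 0.2, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
])
picks = greedy_select(cov, target=0, candidates=[1, 2, 3], k=2)
```

With this covariance, the greedy rule first takes the strongly correlated variable 1, then variable 2, leaving the nearly uninformative variable 3 unmeasured; that is the "few, targeted measurements" behavior the article describes.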