IEDA: Revolutionizing Big Data
Most research databases are narrowly focused. They might contain only seismic data from earthquakes, for example, or chemical data from volcanic rocks. The Interdisciplinary Earth Data Alliance (IEDA) set out to create a different kind of research experience, and the result is fueling groundbreaking multidisciplinary discoveries worldwide.
Created and managed by scientists at Lamont-Doherty Earth Observatory, IEDA brings together diverse datasets from across geochemistry and marine geoscience into one system. Importantly, it provides the tools that allow scientists from a wide range of fields to easily search for and explore relationships among many different kinds of data.
“This is a new era of data mining,” said IEDA Director Kerstin Lehnert, a geochemist and Doherty Senior Research Scientist at Lamont. Lehnert will be leading several sessions exploring advances in data preservation and storage this week at the American Geophysical Union’s Fall Meeting in San Francisco. Two of those sessions will be streamed live online.
“Through IEDA, scientists can find the natural samples, the composition, the geochemistry of the samples. If you need to know the structure of the crust underneath those samples, you can get to the seismic data. You can check if there are experimental results for chemical composition from close to these rocks that can tell you where they come from. Are there any dated rocks? Where is geochronology in this particular area? The data starts to be networked, and it comes together in IEDA,” Lehnert said.
Before IEDA, these kinds of data were largely inaccessible, often stored on scientists’ local computers, in their lab notebooks, or fragmented throughout the scientific journals. By bringing the data together in an easily searchable format, IEDA has created a way for researchers to quickly access thousands of values for analysis and comparison. Two scientists were recently able to document a link between deep Earth geochemistry and a rise in oxygen in Earth’s atmosphere by downloading geochemical data for 70,000 samples of continental igneous rock from IEDA. Without IEDA, finding all of those data would have taken years.
“Integrating different kinds of observations and observations made from many different regions in order to gain a global perspective is a powerful way to gain new insights into science problems,” said IEDA Associate Director Suzanne Carbotte, a marine geophysicist and Bruce C. Heezen Lamont Research Professor.
Transforming 21st Century Science
IEDA combines EarthChem, the world’s largest geochemistry database, with the Marine Geoscience Data System, which serves data for studies of seafloor and deeper crust and mantle processes.
EarthChem started as the petrology database PetDB at Lamont in 1996, when relational databases were just beginning to be developed. Today it includes several partner databases and some 400,000 samples and 20 million analytical values from across geochemistry, along with the tools to mine the collections. The EarthChem Portal also connects with other large databases, including Germany’s GeoRock, a database in Japan, and the U.S. Geological Survey’s national geochemical database, allowing IEDA users to search across all these major databases at once.
The Marine Geoscience Data System traces its origins to the early 1990s, when Lamont oceanographer Bill Ryan launched a first-of-its-kind web-accessible database of seafloor bathymetry data. Building upon this early resource, the Marine Geoscience Data System serves a wide range of marine geoscience data collected by research ships and other platforms, including data dating back to 1954. It includes global bathymetry data, seafloor imagery, and seismic data that provide cross-sectional views beneath the seafloor, as well as other multidisciplinary data from a series of national research programs.
The IEDA system is also uniquely equipped to incorporate smaller, niche data sets, which it then makes open and accessible through interactive, map-based interfaces and other tools.
One focus is compiling what are known as “long-tail” data: what scientists do with observational data in their labs and how they analyze them.
“A lot of what people do with different data sets in their labs can be very innovative and unique and new,” said Vicki Ferrini, an oceanographer who works with IEDA’s marine geophysics data. “It all gets out through scientific publications, interpretations of it get out, but actually making the data that supports those publications accessible and reusable and into something that can be built upon is what we’re really aspiring to do.”
Understanding the Data Needs of Scientists
Part of IEDA’s success stems from its team’s close connections to science. Lehnert, Carbotte, and many of the IEDA team members are scientists who are intimately familiar with scientific workflows and how scientists search for and analyze data. They know what scientists need and how to customize solutions and incorporate different types of results.
In addition to the repositories and analysis tools, IEDA has identification systems that link published papers to their original data and samples. Openness of data is critical to scientists’ ability to test theories and reproduce results, and data management plans are now required by the National Science Foundation, which supports IEDA. IEDA also makes data and samples available for reuse so scientists don’t have to collect the same kinds of data from the same location again, saving time and money.
“IEDA builds upon Lamont’s rich legacy of acquiring diverse multidisciplinary data to address science questions that dates back to the earliest days of Doc Ewing and the globally ranging expeditions of Lamont ships,” Carbotte said. Maurice “Doc” Ewing, Lamont’s founding director, ordered all ocean expeditions to routinely collect diverse sets of geoscience and oceanographic data. When new scientific questions arose, data and samples were often there for analysis.
“We’re making the data available, and people can take the data for whatever they need,” Lehnert said.