Tools of the Trade: Data Sets are Tools, Too
Tools of the Trade is a series that brings you inside the labs of Earth Institute scientists. Learn about the equipment scientists use to make big discoveries about the world around us. This installment takes you into the world of population modeling at CIESIN, where data reigns supreme.
Greg Yetman apologizes for not having as many tools to photograph as other labs have. He gestures at the rows of computers outside his office and says, “This is pretty much everything.”
Yetman works as associate director for the Geospatial Applications Division of the Center for International Earth Science Information Network (CIESIN) within Columbia’s Earth Institute. CIESIN uses data to learn about humans and our interactions with the world that surrounds us. The computer lab and accompanying servers make up the bulk of the equipment used by the center.
Physical tools may not abound, but the researchers at CIESIN use the data they collect and maintain to study the world, much as other scientists use microscopes and test tubes. They analyze the results of tests they run, and compare them with previous results. They take samples from all over the world. They aggregate thousands of data points to learn about larger trends. You can’t hold data in your hand, but it’s incredibly useful as a tool.
From constructing a Gridded Population of the World to working with Facebook’s non-profit arm to adapt facial recognition technology to recognize buildings, CIESIN is on the cutting edge of data science. They are bringing together teams from all over the world and from different specialties to study people, natural resources, the environment, and climate change. Read on to learn more about some of the projects this center is working on.
Gridded Population of the World
The largest-scale project that CIESIN is working on is the Gridded Population of the World (GPW). The GPW project looks at both population count and density, and maps this information on a grid that represents a map of the world. This information alone is interesting, of course, but it gets really fascinating when it’s paired with other information. Because it’s an independent data set that isn’t linked to anything besides census results, the GPW can be paired with almost anything, like climate, socioeconomic, or health data. Jeffrey Sachs, director of the Earth Institute’s Center for Sustainable Development, used the GPW to create a map of GDP per capita during the Millennium Villages Project, an initiative that helped rural African communities lift themselves out of extreme poverty; the map “looked a lot like a population map but there was a lot of variance between the rich areas and poor areas,” says Yetman.
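To make the idea of pairing concrete, here is a minimal sketch of how a population grid might be combined with a second grid of the same shape to get a per-person figure for each cell. The function name, the grids, and every number are invented for illustration; none of it comes from the GPW itself or from the map Sachs's team produced.

```python
def per_capita_grid(total_grid, population_grid):
    """Divide a gridded total by gridded population, cell by cell.

    Both grids are lists of rows covering the same cells.
    Cells with no population get None instead of a number.
    """
    if len(total_grid) != len(population_grid):
        raise ValueError("grids must have the same number of rows")
    result = []
    for total_row, pop_row in zip(total_grid, population_grid):
        if len(total_row) != len(pop_row):
            raise ValueError("grid rows must have the same length")
        result.append([
            total / people if people > 0 else None
            for total, people in zip(total_row, pop_row)
        ])
    return result


# Hypothetical 2x2 grids: people per cell and economic output per cell.
population = [[100, 0], [50, 200]]
output = [[5000.0, 0.0], [10000.0, 2000.0]]

per_capita = per_capita_grid(output, population)
print(per_capita)
```

Because the population grid carries nothing but counts, the same division works with any other quantity that has been put on the same grid.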
CIESIN began as an independent non-profit in 1989, based in Michigan. The center helped launch the GPW in 1994 by funding the population gridding work done by famed UC Santa Barbara geographer Waldo Tobler (who passed away just over a year ago) and his grad student. CIESIN helped distribute it as an open data set, meaning that anyone can access and use it.
When CIESIN moved to Columbia in 1998, NASA’s Socioeconomic Data and Applications Center took over ownership and funding of the project, but NASA has always contracted the project to CIESIN. The map is updated every four or five years; the fifth iteration will be available after 2020.
Facebook Population Data
CIESIN is also working closely with Facebook’s non-profit Internet.org and its Connectivity Lab, which are aimed at bringing internet to rural areas and countries that don’t currently have access. Yetman says that when Facebook started the project, “they realized they need to know the settlement pattern in rural areas in much more detailed scale than has been done before.” This information would help the company figure out how to bring internet to these places, whether by satellite, mesh network, or other means.
To this end, Facebook has engaged the help of CIESIN, pairing the center’s population and mapping expertise with Facebook’s own engineering talent. The company bought a license for DigitalGlobe data: extremely detailed satellite imagery with a spatial resolution of half a meter. Facebook then adapted facial recognition technology to recognize buildings in the imagery. In many places, the results were more than 90 percent accurate, according to Yetman.
“It’s a joint workflow where they have the expertise in finding the buildings in the imagery, and we’ve been working with them on developing a model to distribute the population with the census,” he says.
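The article does not describe the model itself, so the following is only a rough sketch of one simple way a census total could be spread across detected buildings: give each grid cell a share of the population proportional to its share of the buildings. The function name and all numbers are hypothetical, and this is not presented as the model CIESIN and Facebook actually built.

```python
def distribute_population(census_total, building_counts):
    """Spread one census unit's population over its grid cells.

    Each cell receives a share proportional to the number of
    buildings detected in it. Cells with no buildings get zero.
    """
    total_buildings = sum(sum(row) for row in building_counts)
    if total_buildings == 0:
        raise ValueError("no buildings detected in this census unit")
    return [
        [census_total * count / total_buildings for count in row]
        for row in building_counts
    ]


# Hypothetical census unit with 800 people and a 2x2 grid of building counts.
buildings = [[0, 3], [1, 4]]
cells = distribute_population(800, buildings)
print(cells)
```

The per-cell figures always add back up to the census total, so the census stays the source of the headcount and the imagery only decides where people are placed.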
Because Facebook has immense data processing capabilities, as well as lots of staff, they were able to process 300 terabytes of data in only 14 days on their first batch of locations. Yetman heard that the process took so much electricity that the power company called Facebook to check in.
The maps coming out of this project are amazingly detailed, says Yetman. When CIESIN mapped Haiti using the new data, for instance, they found that the area containing built structures (called “settlement extent”) was about three times larger than they had estimated using two other data products relying on the same satellite imagery.
Yetman says Facebook has decided to map most of the world, which will be useful not only because their results are detailed, but because they are comparable globally. When scientists rely on census data, they are always aware that census methods and quality vary widely from country to country, making it impossible to reliably compare results. This new method, on the other hand, is consistent and relies on satellite imagery rather than human reporting. The only inconsistencies may come from persistently cloudy areas like the Amazon, where they’re looking into radar data to fill in the gaps.
Facebook hopes to finish mapping the rest of the world by the end of 2019. The data will be published under an open license jointly by CIESIN and Facebook, and distributed by CIESIN.
“We’ve had this explosion in population modeling” at CIESIN, Yetman says, “to the point where we now coordinate a group of people doing population modeling and infrastructure modeling.” This group is called POPGRID Data Collaborative and includes CIESIN and Facebook as well as Esri, Oak Ridge National Laboratory, the US Census Bureau, and more.
Whereas many companies don’t want to share methods or data, CIESIN is taking the lead on sharing, collaboration, and coordination. To help with this, they’ve developed a software tool that lets users draw a shape on a map, and then see what the population inside that shape is according to different population estimates. The results include documentation of the methodology for each estimate and links to the data sources.
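As a rough sketch of the idea behind such a tool, the code below adds up the grid cells whose centers fall inside a drawn polygon, once for each population estimate. It assumes unit-sized cells and a standard ray-casting point-in-polygon test; the estimate names, grids, and shape are all made up, and the actual CIESIN tool may work quite differently.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    The polygon is a list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def population_in_shape(grid, polygon):
    """Sum the cells whose centers lie inside the polygon (unit-sized cells)."""
    total = 0
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            center_x, center_y = col_index + 0.5, row_index + 0.5
            if point_in_polygon(center_x, center_y, polygon):
                total += value
    return total


# Two hypothetical population estimates on the same 3x3 grid.
estimates = {
    "estimate_a": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    "estimate_b": [[12, 18, 33], [38, 55, 58], [65, 85, 95]],
}
# A drawn square covering the four cells in the top-left corner.
shape = [(0, 0), (2, 0), (2, 2), (0, 2)]

totals = {name: population_in_shape(grid, shape) for name, grid in estimates.items()}
print(totals)
```

Putting the totals side by side is the point: the same shape can give different answers depending on which estimate is used.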
Data for the real world
Sometimes CIESIN sees trends in what types of data people are looking for. After a natural disaster, for instance, they typically hear from people looking for population and infrastructure data for the location of the event.
People are also interested in urbanization trends. As the population shifts from rural to urban in more places, says Yetman, CIESIN is getting more requests for related data. Similarly, he says, “people are moving more to the coastal areas, a very long trend that’s ongoing even in the US.”
As sea levels continue to rise, there is more interest in detailed population data along the coasts, especially in lower elevation areas. CIESIN has a specific data product for low elevation coastal zones, to get a clear picture of how many people live in areas that would be affected by 1, 5, and 10 meters of sea level rise.
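In the simplest terms, a question like this can be answered by overlaying an elevation grid on a population grid and counting the people in cells at or below each threshold. The sketch below does only that, with invented grids and a hypothetical function name. It ignores things a real product would presumably need to consider, such as whether a low-lying cell is actually connected to the coast.

```python
def population_below(elevation_grid, population_grid, threshold_m):
    """Count people living in cells at or below a given elevation in meters."""
    total = 0
    for elevation_row, population_row in zip(elevation_grid, population_grid):
        for elevation, people in zip(elevation_row, population_row):
            if elevation <= threshold_m:
                total += people
    return total


# Hypothetical 2x3 grids: elevation in meters and people per cell.
elevation = [[0.5, 3.0, 12.0], [0.8, 7.0, 25.0]]
population = [[100, 200, 300], [50, 400, 600]]

exposed = {
    threshold: population_below(elevation, population, threshold)
    for threshold in (1, 5, 10)
}
print(exposed)
```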
Another CIESIN project funded by New York State will build a model to assess flood impacts across the state. Municipal and county planners and decision makers will be able to look up a property to see if it’s in a flood zone and how damaged it would be under different climate change prediction scenarios. It will also include strategy recommendations for climate adaptation.
Overall, CIESIN’s work shows that data is a necessary tool that we can use to improve life for people all over the world. Gathering and understanding data can help us strategize for a better, safer future for everyone.
All photos by Phebe Pierson unless otherwise noted.