The summer of 2023 is headed for the record books, with increasing heat waves, wildfires, tropical storms, and flooding. July was declared the hottest month on earth since records began in 1880. To better understand how the earth is changing, the impact of extreme climate events, and how humans might adapt, researchers use satellite images to extract data.
Earth and data scientists face a monumental challenge, however: By 2024, new satellite missions will produce 250,000 terabytes of data, NASA estimates. How much data is that? If you were snapping 100 photos a day with an iPhone, it would take you 1.7 million years to accumulate that many terabytes of data.
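The arithmetic behind that comparison can be checked with a short sketch. The article does not say how large a photo it assumes; the 4-megabyte photo size below is an assumption chosen for illustration, as are the decimal definition of a terabyte and the function name.

```python
TERABYTE = 10**12  # bytes, using the decimal definition (assumption)
MEGABYTE = 10**6   # bytes


def years_to_accumulate(total_terabytes, photos_per_day, photo_megabytes):
    """Years of daily photo-taking needed to reach a given data volume."""
    total_bytes = total_terabytes * TERABYTE
    bytes_per_day = photos_per_day * photo_megabytes * MEGABYTE
    days = total_bytes / bytes_per_day
    return days / 365.25  # average year length, including leap years


# 250,000 terabytes at 100 photos a day; 4 MB per photo is an assumed size.
years = years_to_accumulate(250_000, 100, 4.0)
print(f"{years:,.0f} years")
```

With a 4-megabyte photo the result comes out at roughly 1.7 million years; a larger or smaller assumed photo size shifts the figure proportionally.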
How can researchers mine all this satellite data, along with the information contained in millions of published scientific papers? And how can they effectively share data with policymakers and the public?
Hamed Alemohammad, director of Clark’s new Center for Geospatial Analytics, and six graduate students — working with NASA and IBM — are hoping artificial intelligence (AI) can answer these questions. Together, they have produced the world’s first geospatial AI foundation model, a milestone that will allow climate and earth scientists to access and study data more quickly and efficiently.
“In construction, you put the foundation on the ground, and then you build a customized structure on top. The foundation model is practically the same thing. But in this case, you are building a deep learning model,” says Alemohammad, associate professor in Clark’s Graduate School of Geography.
Using a foundation model to build generative AI models that can be customized for various applications, rather than building those custom models from scratch, saves researchers time and money, he explains. “That’s really the bottom line here. Using fewer samples of data, you can get similar or better accuracy with the foundation model compared to building a supervised model which requires large number of samples.”
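The mechanics of building on a foundation can be illustrated with a toy sketch. This is not the Clark/NASA/IBM model: a fixed random projection stands in for a pretrained encoder, the "pixels" and labels are synthetic, and every name below is hypothetical. It shows only the pattern of keeping the foundation frozen and training a small task-specific layer on a modest number of labeled samples.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BANDS, N_FEATURES = 6, 16  # hypothetical sizes, not from the article

# Stand-in for a pretrained encoder: fixed weights that are never updated.
W_frozen = rng.normal(scale=0.3, size=(N_BANDS, N_FEATURES))


def encode(x):
    """Frozen 'foundation' features; training never modifies W_frozen."""
    return np.tanh(x @ W_frozen)


def make_synthetic_pixels(n):
    """Synthetic band values with a made-up binary label rule."""
    x = rng.normal(size=(n, N_BANDS))
    y = (x[:, 0] - x[:, 1] > 0).astype(float)
    return x, y


def train_head(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression head on top of the frozen features."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


def accuracy(w, b, features, labels):
    predictions = (features @ w + b) > 0
    return float((predictions == labels.astype(bool)).mean())


# Only 200 labeled samples are used to train the head.
x_train, y_train = make_synthetic_pixels(200)
x_test, y_test = make_synthetic_pixels(1000)
w, b = train_head(encode(x_train), y_train)
acc = accuracy(w, b, encode(x_test), y_test)
print(f"held-out accuracy: {acc:.2f}")
```

In a real workflow the frozen encoder would be the released foundation model and the head would be trained on human-labeled imagery; the sketch does not attempt to reproduce the accuracy comparison Alemohammad describes.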
ChatGPT and Google’s Bard are examples of generative models built on top of large language foundation models. The Clark/NASA/IBM project “is the first foundation model in geospatial earth science,” Alemohammad says. “We want to assess the usability of foundation models in this field.”
This year, IBM and NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT) deployed the geospatial AI foundation model to comb through and extract information from a year of raw, unlabeled imagery from across the continental United States, drawn from the space agency’s Harmonized Landsat Sentinel-2 (HLS) data product, which combines observations from the Landsat and Sentinel-2 satellites.
In July, the team released the foundation model on Hugging Face, a repository for open-source machine learning models. The foundation model — fine-tuned on human-labeled data for mapping of floods and burn scars from wildfires — so far has demonstrated a 15 percent improvement over other state-of-the-art techniques, according to IBM. The effort is tied to NASA’s goal to make data, code, and AI models available to everyone through its Open-Source Science Initiative.
“We believe that foundation models have the potential to change the way observational data are analyzed and help us to better understand our planet,” says Kevin Murphy, chief science data officer at NASA. “And by open-sourcing such models and making them available to the world, we hope to multiply their impact.”
The Clark team is refining and evaluating the geospatial AI foundation model for so-called downstream applications. For instance, they are examining whether the fine-tuned foundation model can predict the U.S. Department of Agriculture’s data on the types of crops grown in the U.S. The crop classification layer, which helps identify crops in vast amounts of satellite data and has the potential to reduce the need for expensive ground data collection, was released with the foundation model in July.
“Having a near-real-time map of what is growing, and where, can be very important for policymaking,” Alemohammad says. “If a severe event happens, like a heat wave or a flooding, you can immediately assess the damage in terms of impact on the harvest, for example.”
Such maps also could show the impacts of specific government policies, he adds. “If the government, for example, imposes taxes on specific crops, or allows export of those crops, one can use these maps to characterize their implications on production.”
The research team hopes that greater global investment in collecting high-quality ground reference data will allow scientists to study global food-production trends using advances in these foundation models.
“The U.S. is a very resilient country in terms of production, but if you consider a smaller country, which is very dependent on certain crop types for internal consumption or for export, it is important to be able to monitor that, particularly during the growing season,” Alemohammad says.
Sustainable agriculture practices also could be tracked across the world, he suggests, including crop rotation, application of fertilizer, maintaining the distance between trees, and tillage of fields. “Or do farmers rely on crop burning? That is a very damaging practice in many parts of the world, including India and Mexico.”
Alemohammad, the principal investigator, and his graduate assistants (Mike Cecil, Sam Khallaghi, and Fatemeh Kordi, all doctoral students in geography; Denys Godwin and Hanxi (Steve) Li, master’s students in geographic information science; and Maryam Ahmadi, a master’s student in business analytics) are continuing work on the project via a grant from NASA IMPACT, a team based at the Marshall Space Flight Center at the University of Alabama in Huntsville.
The Clark team is exploring three types of downstream applications:
The Clark team and their NASA and IBM colleagues are already receiving feedback from researchers accessing the foundation model. “We want people to play with the model,” Alemohammad says. “In a way, they’re helping us test it.”