Deep Learning Optimizes Data-Driven Representation of Soil Organic Carbon in Earth System Model Over the Conterminous United States

Feng Tao, Zhenghu Zhou, Yuanyuan Huang, Qianyu Li, Xingjie Lu, Shuang Ma, Xiaomeng Huang, Yishuang Liang, Gustaf Hugelius, Lifen Jiang, Russell Doughty, Zhehao Ren, Yiqi Luo

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Soil organic carbon (SOC) is a key component of the global carbon cycle, yet Earth system models do not represent it well enough to accurately predict global carbon dynamics in response to climate change. This study integrated deep learning, data assimilation, 25,444 vertical soil profiles, and the Community Land Model version 5 (CLM5) to optimize the model representation of SOC over the conterminous United States. We first constrained parameters in CLM5 using observed vertical profiles of SOC, both in a batch mode (using all individual soil layers in one batch) and at individual sites (site-by-site). The parameter values estimated from the site-by-site data assimilation were then either randomly sampled (random-sampling) to generate continentally homogeneous (constant) parameter values, or maximally preserved in their spatially heterogeneous distributions (parameter values varying to match the spatial patterns from the site-by-site data assimilation) so as to optimize the spatial representation of SOC in CLM5 with a deep learning technique (a neural network) over the conterminous United States. Comparing the spatial distributions of SOC modeled by CLM5 with observations yielded increasing predictive accuracy from default CLM5 settings (R² = 0.32) to randomly sampled (R² = 0.36), one-batch estimated (R² = 0.43), and deep learning optimized (R² = 0.62) parameter values. While CLM5 with parameter values derived from the random-sampling and one-batch methods substantially corrected the overestimation of SOC storage produced with default model parameters, considerable geographical biases remained. CLM5 with the spatially heterogeneous parameter values optimized by the neural network method had the lowest estimation error and smaller geographical biases across the conterminous United States. Our study indicates that deep learning in combination with data assimilation can significantly improve the representation of SOC by complex land biogeochemical models.
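To make the accuracy comparison above concrete, here is a minimal sketch of how an R² score between observed and modeled SOC could be computed. It assumes R² means the coefficient of determination, 1 − SS_res/SS_tot; the abstract does not state which definition the authors used. The function name `r_squared` and all numbers below are hypothetical and are not data or code from the study.

```python
def r_squared(observed, modeled):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(modeled) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: mismatch between observations and model output.
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up SOC stocks for five sites, purely illustrative.
observed = [4.0, 6.0, 8.0, 10.0, 12.0]
modeled_a = [5.0, 5.0, 9.0, 9.0, 12.0]   # hypothetical coarser parameterization
modeled_b = [4.0, 6.5, 7.5, 10.0, 12.0]  # hypothetical spatially tuned parameters

print(r_squared(observed, modeled_a))
print(r_squared(observed, modeled_b))
```

Under this definition, a parameter set whose modeled values track the observations more closely scores higher, which is the sense in which the four CLM5 configurations are ranked above.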

Original language: English (US)
Article number: 17
Journal: Frontiers in Big Data
State: Published - Jun 3 2020


  • Community Land Model version 5 (CLM5)
  • Earth system model
  • data assimilation
  • deep learning
  • soil carbon dynamics
  • soil organic carbon representation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science (miscellaneous)
  • Information Systems


