TY - JOUR
T1 - PaleoRec
T2 - A sequential recommender system for the annotation of paleoclimate datasets
AU - Manety, Shravya
AU - Khider, Deborah
AU - Heiser, Christopher
AU - McKay, Nicholas
AU - Emile-Geay, Julien
AU - Routson, Cody
N1 - Publisher Copyright:
© The Author(s), 2022. Published by Cambridge University Press.
PY - 2022/4/13
Y1 - 2022/4/13
AB - Studying past climate variability is fundamental to our understanding of current changes. In the era of Big Data, the value of paleoclimate information critically depends on our ability to analyze large volumes of data, which itself hinges on standardization. Standardization also ensures that these datasets are more Findable, Accessible, Interoperable, and Reusable. Building upon efforts from the paleoclimate community to standardize the format, terminology, and reporting of paleoclimate data, this article describes PaleoRec, a recommender system for the annotation of such datasets. The goal is to assist scientists in the annotation task by reducing and ranking relevant entries in a drop-down menu. Scientists can either choose the best option for their metadata or enter the appropriate information manually. PaleoRec aims to reduce the time to science while ensuring adherence to community standards. PaleoRec is a type of sequential recommender system based on a recurrent neural network that takes into consideration the short-term interest of a user in a particular dataset. The model was developed using 1996 expert-annotated datasets, resulting in 6,512 sequences. The performance of the algorithm, as measured by the Hit Ratio, varies between 0.7 and 1.0. PaleoRec is currently deployed on a web interface used for the annotation of paleoclimate datasets using emerging community standards.
KW - Long short-term memory
KW - paleoclimatology
KW - sequential recommender system
UR - http://www.scopus.com/inward/record.url?scp=86000641311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000641311&partnerID=8YFLogxK
U2 - 10.1017/eds.2022.3
DO - 10.1017/eds.2022.3
M3 - Article
AN - SCOPUS:86000641311
SN - 2634-4602
VL - 1
JO - Environmental Data Science
JF - Environmental Data Science
M1 - e4
ER -