TY - GEN
T1 - Can ChatGPT emulate humans in software engineering surveys?
AU - Steinmacher, Igor
AU - Penney, Jacob Mc Auley
AU - Felizardo, Katia Romero
AU - Garcia, Alessandro F.
AU - Gerosa, Marco A.
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/10/24
Y1 - 2024/10/24
N2 - Context: There is a growing belief in the literature that large language models (LLMs), such as ChatGPT, can mimic human behavior in surveys. Gap: While the literature has shown promising results in social sciences and market research, there is scant evidence of their effectiveness in technical fields like software engineering. Objective: Inspired by previous work, this paper explores ChatGPT's ability to replicate findings from prior software engineering research. Given the frequent use of surveys in this field, if LLMs can accurately emulate human responses, this technique could address common methodological challenges like recruitment difficulties, representational shortcomings, and respondent fatigue. Method: We prompted ChatGPT to reflect the behavior of a 'mega-persona' representing the demographic distribution of interest. We replicated surveys published between 2019 and 2023 at leading SE conferences, examining ChatGPT's proficiency in mimicking responses from diverse demographics. Results: Our findings reveal that ChatGPT can successfully replicate the outcomes of some studies, but in others, the results were not significantly better than a random baseline. Conclusions: This paper reports our results so far and discusses the challenges and potential research opportunities in leveraging LLMs to represent humans in software engineering surveys.
AB - Context: There is a growing belief in the literature that large language models (LLMs), such as ChatGPT, can mimic human behavior in surveys. Gap: While the literature has shown promising results in social sciences and market research, there is scant evidence of their effectiveness in technical fields like software engineering. Objective: Inspired by previous work, this paper explores ChatGPT's ability to replicate findings from prior software engineering research. Given the frequent use of surveys in this field, if LLMs can accurately emulate human responses, this technique could address common methodological challenges like recruitment difficulties, representational shortcomings, and respondent fatigue. Method: We prompted ChatGPT to reflect the behavior of a 'mega-persona' representing the demographic distribution of interest. We replicated surveys published between 2019 and 2023 at leading SE conferences, examining ChatGPT's proficiency in mimicking responses from diverse demographics. Results: Our findings reveal that ChatGPT can successfully replicate the outcomes of some studies, but in others, the results were not significantly better than a random baseline. Conclusions: This paper reports our results so far and discusses the challenges and potential research opportunities in leveraging LLMs to represent humans in software engineering surveys.
KW - Generative AI
KW - Mega-Personas
KW - Replication Study
KW - Survey
UR - http://www.scopus.com/inward/record.url?scp=85210568779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85210568779&partnerID=8YFLogxK
U2 - 10.1145/3674805.3690744
DO - 10.1145/3674805.3690744
M3 - Conference contribution
AN - SCOPUS:85210568779
T3 - International Symposium on Empirical Software Engineering and Measurement
SP - 414
EP - 419
BT - Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
PB - IEEE Computer Society
T2 - 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
Y2 - 24 October 2024 through 25 October 2024
ER -