TY - GEN
T1 - SkillScope
T2 - 2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering, NLBSE 2025
AU - Carter, Benjamin C.
AU - Contreras, Jonathan Rivas
AU - Llanes Villegas, Carlos A.
AU - Acharya, Pawan
AU - Utzerath, Jack
AU - Farner, Adonijah O.
AU - Jenkins, Hunter
AU - Johnson, Dylan
AU - Penney, Jacob
AU - Steinmacher, Igor
AU - Gerosa, Marco A.
AU - Santos, Fabio
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty levels, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects. A demo video is available at https://youtu.be/gqU/vDcT_0o
AB - New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty levels, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects. A demo video is available at https://youtu.be/gqU/vDcT_0o
KW - large language models
KW - machine learning
KW - open source software (OSS)
KW - skill categorization
KW - software engineering
UR - https://www.scopus.com/pages/publications/105009458965
UR - https://www.scopus.com/inward/citedby.url?scp=105009458965&partnerID=8YFLogxK
U2 - 10.1109/NLBSE66842.2025.00007
DO - 10.1109/NLBSE66842.2025.00007
M3 - Conference contribution
AN - SCOPUS:105009458965
T3 - Proceedings - 2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering, NLBSE 2025
SP - 9
EP - 12
BT - Proceedings - 2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering, NLBSE 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 April 2025
ER -