TY - GEN
T1 - Understanding Development Process of Machine Learning Systems
T2 - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
AU - De Souza Nascimento, Elizamary
AU - Ahmed, Iftekhar
AU - Oliveira, Edson
AU - Palheta, Márcio Piedade
AU - Steinmacher, Igor
AU - Conte, Tayana
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
AB - Background: The number of Machine Learning (ML) systems developed in industry is increasing rapidly. ML systems differ from traditional systems, and these differences are visible across the activities of the ML systems software development process. They make Software Engineering (SE) activities more challenging for ML systems because not only the behavior of the system but also its requirements are data dependent. In such a scenario, how can Software Engineering better support the development of ML systems? Aim: Our objective is twofold: first, to better understand the process that developers use to build ML systems; second, to identify the main challenges that developers face and propose ways to overcome them. Method: We conducted interviews with seven developers from three small software companies that develop ML systems. Based on the challenges uncovered, we proposed a set of checklists to support the developers. We assessed the checklists using a focus group. Results: We found that ML systems development follows a four-stage process in these companies. The stages are: understanding the problem, data handling, model building, and model monitoring. The main challenges faced by the developers are: identifying the clients' business metrics, lack of a defined development process, and designing the database structure. The focus group indicated that our proposed checklists supported the identification of the clients' business metrics and increased the visibility of progress on project tasks. Conclusions: Our research is an initial step towards supporting the development of ML systems. It suggests checklists that support developers in essential development tasks and also serves as a basis for future research in the area.
KW - Machine Learning Systems
KW - Software Engineering
KW - challenges
KW - data handling
KW - software development
UR - http://www.scopus.com/inward/record.url?scp=85074293382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074293382&partnerID=8YFLogxK
U2 - 10.1109/ESEM.2019.8870157
DO - 10.1109/ESEM.2019.8870157
M3 - Conference contribution
AN - SCOPUS:85074293382
T3 - International Symposium on Empirical Software Engineering and Measurement
BT - Proceedings - 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2019
PB - IEEE Computer Society
Y2 - 19 September 2019 through 20 September 2019
ER -