Where are my intelligent assistant's mistakes? A systematic testing approach

Todd Kulesza, Margaret Burnett, Simone Stumpf, Weng Keen Wong, Shubhomoy Das, Alex Groce, Amber Shinsel, Forrest Bice, Kevin McIntosh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

Intelligent assistants are handling increasingly critical tasks, but until now, end users have had no way to systematically assess where their assistants make mistakes. For some intelligent assistants, this is a serious problem: if the assistant is doing work that is important, such as assisting with qualitative research or monitoring an elderly parent's safety, the user may pay a high cost for unnoticed mistakes. This paper addresses the problem with WYSIWYT/ML (What You See Is What You Test for Machine Learning), a human/computer partnership that enables end users to systematically test intelligent assistants. Our empirical evaluation shows that WYSIWYT/ML helped end users find assistants' mistakes significantly more effectively than ad hoc testing. Not only did it allow users to assess an assistant's work on an average of 117 predictions in only 10 minutes, it also scaled to a much larger data set, assessing an assistant's work on 623 out of 1,448 predictions using only the users' original 10 minutes' testing effort.

Original language: English (US)
Title of host publication: End-User Development - Third International Symposium, IS-EUD 2011, Proceedings
Pages: 171-186
Number of pages: 16
State: Published - 2011
Externally published: Yes
Event: 3rd International Symposium on End-User Development, IS-EUD 2011 - Torre Canne, Italy
Duration: Jun 7 2011 to Jun 10 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6654 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Symposium on End-User Development, IS-EUD 2011
Country/Territory: Italy
City: Torre Canne
Period: 6/7/11 to 6/10/11

Keywords

  • Intelligent assistants
  • end-user development
  • end-user programming
  • end-user software engineering
  • machine learning
  • testing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
