Maximizing Classification Accuracy in Native Language Identification

Scott Jarvis, Yves Bestgen, Steve Pepper

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

62 Scopus citations

Abstract

This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic English proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnative speakers from 11 native-language backgrounds. Our final system used an SVM classifier with over 400,000 unique features consisting of lexical and POS n-grams occurring in at least two texts in the training set. Our system identified the correct native-language affiliations of 83.6% of the texts in the test set. This was the highest classification accuracy achieved in the 2013 NLI Shared Task.
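The feature-selection step mentioned in the abstract (keeping only n-grams that occur in at least two training texts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sentences, and the `n_max=2` setting are all assumptions made for illustration, and the SVM training step is left out. The same functions would apply to POS n-grams if each text were given as a list of POS tags instead of words.

```python
from __future__ import annotations

from collections import Counter


def ngrams(tokens: list[str], n_max: int) -> set[tuple[str, ...]]:
    """Return the distinct 1..n_max-grams found in one tokenized text."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def build_vocabulary(
    texts: list[list[str]], n_max: int = 2, min_df: int = 2
) -> list[tuple[str, ...]]:
    """Keep the n-grams that occur in at least `min_df` different texts."""
    doc_freq: Counter = Counter()
    for tokens in texts:
        # ngrams() returns a set, so each text counts once per n-gram.
        doc_freq.update(ngrams(tokens, n_max))
    return sorted(g for g, df in doc_freq.items() if df >= min_df)


def vectorize(
    tokens: list[str], vocabulary: list[tuple[str, ...]], n_max: int = 2
) -> list[int]:
    """Count how often each vocabulary n-gram occurs in one text."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )
    return [counts[g] for g in vocabulary]


# Hypothetical toy "training set" (not drawn from TOEFL11).
texts = [
    "i am agree with this".split(),
    "i am agree that".split(),
    "this is true".split(),
]
vocab = build_vocabulary(texts, n_max=2, min_df=2)
vectors = [vectorize(t, vocab) for t in texts]
```

In this toy example, the bigram `("agree", "with")` appears in only one text and is dropped, while `("am", "agree")` appears in two and is kept. The resulting count vectors are the kind of input a linear classifier such as an SVM could be trained on.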

Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2013
Editors: Joel Tetreault, Jill Burstein, Claudia Leacock
Publisher: Association for Computational Linguistics (ACL)
Pages: 111-118
Number of pages: 8
ISBN (Electronic): 9781937284473
State: Published - 2013
Externally published: Yes
Event: 8th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2013 - Atlanta, United States
Duration: Jun 13 2013 → …

Publication series

Name: Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2013

Conference

Conference: 8th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2013
Country/Territory: United States
City: Atlanta
Period: 6/13/13 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics
