Data mining with learner corpora: Choosing classifiers for L1 detection

Research output: Chapter in Book/Report/Conference proceeding › Chapter

12 Scopus citations

Abstract

This paper discusses the usefulness of machine-learning techniques for investigating cross-linguistic influence in learner corpora, focusing on an approach known as supervised classification. Within this approach, one challenge researchers face is deciding which particular method (or classifier) to use for a given task. The task addressed here is the detection of native language-related patterns in samples of second language writing. The empirical portion of the paper compares 20 classifiers on their ability to perform this task with second language texts written by learners from 12 different native language backgrounds, on the basis of the learners' use of words and word sequences (n-grams).
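As an illustration of the general idea only, the sketch below trains a single supervised classifier to guess a writer's native language from word n-grams. It is not taken from the chapter: the abstract does not name the 20 classifiers compared, so multinomial Naive Bayes is used here purely as a simple stand-in, and the labels `L1_A` and `L1_B`, the class `NaiveBayesL1`, and the four training sentences are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return word n-grams (as tuples) of length 1..n_max from a text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NaiveBayesL1:
    """Multinomial Naive Bayes with add-one smoothing over n-gram counts.

    Hypothetical helper for illustration; not the chapter's method.
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of texts
        for text, label in zip(texts, labels):
            self.counts[label].update(word_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        scores = {}
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # class prior
            for gram in word_ngrams(text):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data: two made-up L1 groups, two texts each.
train_texts = [
    "i am agree with this opinion",
    "she is agree with the teacher",
    "i have seen him yesterday in the town",
    "we have visited the museum last week",
]
train_labels = ["L1_A", "L1_A", "L1_B", "L1_B"]

clf = NaiveBayesL1().fit(train_texts, train_labels)
print(clf.predict("they are agree with the plan"))
print(clf.predict("i have met her yesterday"))
```

Comparing classifiers, as the chapter does, would amount to swapping in other learning algorithms behind the same `fit`/`predict` interface and evaluating each on held-out texts; a real study would of course need far more data than this toy set.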

Original language: English (US)
Title of host publication: A Taste for Corpora. In honour of Sylviane Granger
Editors: Fanny Meunier, Sylvie De Cock, Gaetanelle Gilquin, Magali Paquot
Publisher: John Benjamins Publishing Company
Pages: 127-154
Number of pages: 28
ISBN (Electronic): 9789027287083
State: Published - 2011
Externally published: Yes

Publication series

Name: Studies in Corpus Linguistics
Volume: 45
ISSN (Print): 1388-0373

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Education
  • Management of Technology and Innovation
