Abstract
Chapters 3 through 5 of this book have given an indication of the levels of L1 detection accuracy that can be attained through classification analyses whose predictor variables are individual words and multiword sequences (or n-grams, see Chapter 3), measures of coherence, lexical semantics and lexical diversity (or Coh-Metrix (CM) indices, see Chapter 4 and McNamara & Graesser, in press), and the types and numbers of errors that learners make in their L2 English writing (see Chapter 5). The results of these analyses show L1 classification accuracies ranging from roughly 54% for n-grams to roughly 65% for both errors and CM indices. All three analyses were performed with data extracted from the International Corpus of Learner English (ICLE; see Granger et al., 2009) using similar selection criteria (e.g. argumentative essays between 500 and 1000 words in length), but they differ in the number of texts analyzed (2033 in the n-gram analysis, 903 in the CM analysis, and 223 in the error analysis) and in the number of L1s under investigation (12, 4 and 3, respectively). The purpose of the present chapter is to perform a series of L1 detection analyses on essays from three language groups (French, German and Spanish), applying the features (or variables) from all three studies to a single dataset in order to examine both the comparative and combined usefulness of n-grams, CM indices and error measures for this type of research.
Original language | English (US) |
---|---|
Title of host publication | Approaching Language Transfer through Text Classification |
Subtitle of host publication | Explorations in the Detection-Based Approach |
Publisher | Channel View Publications |
Pages | 154-177 |
Number of pages | 24 |
ISBN (Electronic) | 9781847696991 |
ISBN (Print) | 9781847696977 |
State | Published - Mar 14 2012 |
Externally published | Yes |
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences