Abstract
Chapter 2 showed that relatively high levels of L1 classification accuracy can be achieved under the following conditions: (1) there are five L1 groups, some of which are closely related to each other; (2) the learners within each group vary widely in L2 proficiency; (3) the texts are all written narrative descriptions of a silent film; and (4) the features (i.e. variables) used by the classifier include a few dozen highly frequent words. Conditions (1) and (2) were intended to make the detection task challenging, in order to investigate how sensitive the classifier is to even subtle between-group differences in learners’ language-use patterns, and at the same time to gather possible evidence of L1 effects that may evade conscious awareness but are nevertheless reliable enough, even across proficiency levels, to be detected by a computer-based classifier. Condition (3) served as a control variable whose purpose was to limit the range of variation in the data to that which could be attributed to proficiency differences (within groups) and L1 differences (between groups). This was done to make the results easier to interpret, that is, to show whether certain L1-related tendencies are reliable enough to be detected even when proficiency differences within L1 groups are greater than the differences between groups. Finally, condition (4) concerned the resources made available to the classifier. In order to test the reliability of L1 lexical effects as well as the strength, sensitivity, and practicality of the classifier, the pool of features was intentionally restricted to just 53 of the most frequent words in the data. The stepwise feature-selection parameters were further set so that the classifier could build its L1 prediction model using no more than 40 of those 53 features, in keeping with the convention of restricting the number of variables to no more than 10% of the number of cases.
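
The feature restrictions described above can be sketched in code. This is a minimal illustration, not the chapter's actual procedure: `feature_cap` applies the 10% variables-to-cases convention, and `forward_stepwise` is a generic greedy forward selection that stops at that cap. The function names, the case count of 400, the placeholder word list, and the toy scoring function are all assumptions made for illustration; the abstract does not specify the classifier, the selection criterion, or the number of cases.

```python
from typing import Callable, List, Sequence


def feature_cap(n_cases: int, pool_size: int, cases_per_feature: int = 10) -> int:
    """Most features allowed when variables may not exceed 10% of cases.

    Integer division avoids floating-point rounding; the result is also
    bounded by the size of the feature pool.
    """
    return min(pool_size, n_cases // cases_per_feature)


def forward_stepwise(
    pool: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Greedy forward selection: repeatedly add the feature that most
    improves `score`, stopping at `max_features` or when nothing helps."""
    selected: List[str] = []
    remaining = list(pool)
    best = score(selected)
    while remaining and len(selected) < max_features:
        # Score every candidate extension of the current feature set.
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the model
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Hypothetical setup: 53 placeholder "words" and an assumed 400 cases.
words = [f"w{i}" for i in range(53)]
weights = {w: i + 1 for i, w in enumerate(words)}  # toy usefulness values


def toy_score(features: List[str]) -> float:
    # Stand-in for a real criterion such as cross-validated accuracy.
    return float(sum(weights[f] for f in features))


cap = feature_cap(n_cases=400, pool_size=len(words))
chosen = forward_stepwise(words, toy_score, cap)
```

In a real analysis the scoring function would be a model-based criterion such as cross-validated classification accuracy; the toy weights here only make the sketch runnable.
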
Field | Value |
---|---|
Original language | English (US) |
Title of host publication | Approaching Language Transfer through Text Classification |
Subtitle of host publication | Explorations in the Detection-Based Approach |
Publisher | Channel View Publications |
Pages | 71-105 |
Number of pages | 35 |
ISBN (Electronic) | 9781847696991 |
ISBN (Print) | 9781847696977 |
State | Published - Mar 14 2012 |
Externally published | Yes |
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences