Abstract
Native language identification (NLI) is the task of automatically identifying the first language (L1) of a language user on the basis of the person's production of the target language. This research pursuit is guided by the assumption that a person's L1 background can be inferred from how frequently he or she makes use of certain features of the target language (e.g. words, word sequences, sequences of characters). The task is typically modelled as a text categorisation problem where the set of L1s is predefined and each text is assigned an L1 on account of its specific language features. NLI offers potential practical applications in a wide variety of domains that rely on language corpora. Among other benefits, NLI appears to enhance the performance of a number of natural language processing (NLP) tasks, such as speech recognition, parsing and information extraction (Mayfield Tomokiyo and Jones 2001). NLP tools and techniques are typically trained on native-speaker data and are consequently often less robust when applied to non-native language (L2) (Díaz-Negrillo et al. 2010; Chapter 24, this volume). A second benefit of NLI is that its results may contribute to the success of machine-learning approaches to author identification and profiling. These techniques are today of crucial interest for a number of web-related fields such as internet security and cybercrime investigation (Argamon et al. 2009). The results of an NLI task may also contribute to second language acquisition (SLA) theory building. The ability to detect the L1 of individuals on the basis of their use of certain specific features of the target language indeed offers unprecedented opportunities for the study of transfer, i.e. ‘the influence resulting from similarities and differences between the target language and any other language that has been previously (and perhaps imperfectly) acquired’ (Odlin 1989: 27; see also Chapter 15, this volume). 
The rapprochement between NLI techniques and transfer research was first made by Tsur and Rappoport (2007) and has recently been fully articulated in the detection-based approach to transfer (Jarvis 2010, 2012). In this exploratory approach, the results of an NLI task are used as primary data to investigate the nature and extent of L1 influence in non-native language use.
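The text-categorisation framing described above (a predefined set of L1s, with each text assigned an L1 according to how often it uses certain features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method of the chapter or of the studies it cites: the function names, the choice of character trigrams as features, the nearest-profile cosine classifier, the placeholder labels `L1-A` and `L1-B`, and the invented toy sentences, which are not learner data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return the relative frequency of each character n-gram in a text."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_texts, n=3):
    """Average the n-gram frequencies of the texts in each predefined L1 class."""
    sums, sizes = {}, Counter()
    for label, text in labelled_texts:
        profile = sums.setdefault(label, Counter())
        for gram, freq in char_ngrams(text, n).items():
            profile[gram] += freq
        sizes[label] += 1
    return {
        label: {gram: freq / sizes[label] for gram, freq in profile.items()}
        for label, profile in sums.items()
    }


def predict(profiles, text, n=3):
    """Assign the L1 label whose profile is most similar to the text."""
    feats = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(feats, profiles[label]))


# Invented toy texts with placeholder labels (hypothetical, not corpus data).
samples = [
    ("L1-A", "the weather was very nice during the whole holiday"),
    ("L1-B", "students often discuss economic questions in seminars"),
]
profiles = train(samples)
print(predict(profiles, "the holiday weather was nice"))
```

A realistic NLI experiment would use a large learner corpus, richer feature sets (words, word sequences, character sequences) and cross-validated evaluation; the sketch only shows the shape of the classification step.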
Field | Value |
---|---|
Original language | English (US) |
Title of host publication | The Cambridge Handbook of Learner Corpus Research |
Publisher | Cambridge University Press |
Pages | 605-628 |
Number of pages | 24 |
ISBN (Electronic) | 9781139649414 |
ISBN (Print) | 9781107041196 |
State | Published - Jan 1 2015 |
Externally published | Yes |
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences