Abstract
The American National Corpus (ANC) project is developing a corpus of American English comparable to the British National Corpus (BNC). Recent interest in the web as a source of corpus materials has led some in the language processing community to suggest that developing a corpus of American English is unnecessary. We argue that, far from being rendered superfluous by the availability of web materials, the ANC is likely to provide a resource for developing web acquisition techniques that support tasks such as genre and language detection and automatic annotation. This paper compares the ANC, in terms of both content and format, with a test corpus compiled from web data, and discusses points of intersection and divergence.
| Original language | English (US) |
|---|---|
| Pages | 839-844 |
| Number of pages | 6 |
| State | Published - 2002 |
| Event | 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain. Duration: May 29, 2002 → May 31, 2002 |
ASJC Scopus subject areas
- Linguistics and Language
- Language and Linguistics
- Education
- Library and Information Sciences