Abstract
Some of the first considerations in constructing a corpus concern the overall design: for example, the kinds of texts included, the number of texts, the selection of particular texts, the selection of text samples from within texts, and the length of text samples. Each of these involves a sampling decision, either conscious or not. The use of computer-based corpora provides a solid empirical foundation for general purpose language tools and descriptions, and enables analyses of a scope not otherwise possible. However, a corpus must be 'representative' in order to be appropriately used as the basis for generalizations concerning a language as a whole; for example, corpus-based dictionaries, grammars, and general part-of-speech taggers are applications requiring a representative basis (cf. Biber, 1993b). Typically researchers focus on sample size as the most important consideration in achieving representativeness: how many texts must be included in the corpus, and how many words per text sample. Books on sampling theory, however, emphasize that sample size is not the most important consideration in selecting a representative sample; rather, a thorough definition of the target population and decisions concerning the method of sampling are prior considerations. Representativeness refers to the extent to which a sample includes the full range of variability in a population.
Original language | English (US) |
---|---|
Title of host publication | Practical Lexicography |
Subtitle of host publication | A Reader |
Publisher | Oxford University Press |
Pages | 63-87 |
Number of pages | 25 |
ISBN (Electronic) | 9781383043891 |
ISBN (Print) | 9780199292332 |
State | Published - Jan 1 2023 |
Keywords
- dictionaries
- emphasize
- enables
- generalizations
- population
ASJC Scopus subject areas
- General Computer Science
- General Arts and Humanities
- General Social Sciences