TY - JOUR
T1 - Evaluating reliability in quantitative vocabulary studies
T2 - The influence of corpus design and composition
AU - Miller, Don
AU - Biber, Douglas
N1 - Publisher Copyright: © John Benjamins Publishing Company.
PY - 2015
Y1 - 2015
AB - Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora - and the associated lists - are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.
KW - Corpus representativeness
KW - Lexical diversity and variability
KW - Reliability and validity
KW - Word lists
UR - http://www.scopus.com/inward/record.url?scp=84926471509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926471509&partnerID=8YFLogxK
U2 - 10.1075/ijcl.20.1.02mil
DO - 10.1075/ijcl.20.1.02mil
M3 - Article
AN - SCOPUS:84926471509
SN - 1384-6655
VL - 20
SP - 30
EP - 53
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
IS - 1
ER -