TY - JOUR
T1 - Methodological issues in contrastive lexical bundle research
T2 - The influence of corpus design on bundle identification
AU - Pan, Fan
AU - Reppen, Randi
AU - Biber, Douglas
N1 - Publisher Copyright:
© John Benjamins Publishing Company
PY - 2020/8/28
Y1 - 2020/8/28
AB - This study explores the influence of corpus design when comparing lexical bundle use across groups, examining how the number of texts and the average length of texts can impact conclusions about group differences. The study compares the use of lexical bundles by L1-English versus L2-English writers, based on analysis of two sub-corpora of academic articles that are matched for discipline, writer expertise, time of publication, and audience. However, the two sub-corpora differ with respect to the number of texts and the average length of texts. Three experiments examined the influence of these differences in corpus composition. The results show that differences in the number of words and the number of texts across sub-corpora can have a strong effect on claimed differences in bundle use across groups. This effect is found even when the texts in the corpora are closely matched for register and topic.
KW - Corpus design
KW - Lexical bundle type distribution vs. token distribution
KW - Topic variation
UR - http://www.scopus.com/inward/record.url?scp=85092270382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092270382&partnerID=8YFLogxK
U2 - 10.1075/ijcl.19063.pan
DO - 10.1075/ijcl.19063.pan
M3 - Article
AN - SCOPUS:85092270382
SN - 1384-6655
VL - 25
SP - 215
EP - 229
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
IS - 2
ER -