Bootstrapping Techniques

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

13 Scopus citations

Abstract

Bootstrapping is a statistical technique that relies on randomly sampling with replacement from a set of observed values. Bootstrapping makes it possible to measure the accuracy and reliability of sample estimates and is often recommended for small samples and samples with unknown or non-normal distributions. In corpus linguistics, bootstrapping has also been proposed as a method for quantifying the degree of homogeneity in a corpus sample, for validation of statistical results, and as a methodological step in random decision forests, an advanced classification method. However, to date bootstrapping techniques have seldom been used with corpus data. We argue in this chapter that bootstrapping is underused in corpus linguistics, and that quantitative corpus linguists would do well to add this tool to their repertoire. This chapter includes an introduction to the fundamentals, both conceptual and practical, of bootstrapping methods. We address several applications of bootstrapping, including the measurement of sample estimate accuracy, the validation of statistical models, the estimation of corpus homogeneity, and random forests. We include an overview of two representative studies that have successfully used bootstrapping techniques with corpus data. Finally, we demonstrate how to perform bootstrapping on corpus data using R, and how to visualize and interpret the results.
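
The core idea in the abstract, resampling with replacement to gauge how reliable a sample estimate is, can be illustrated with a minimal sketch. This is not the chapter's own code (the chapter works in R); it is a Python illustration of a percentile bootstrap confidence interval for a mean. The function name `bootstrap_ci`, its parameters, and the per-text frequency values are all hypothetical and invented for the example.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Draw n observations WITH replacement, recompute the statistic, repeat.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled statistics.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up normalized frequencies of some feature in ten texts (illustrative only).
freqs = [12.1, 8.4, 15.0, 9.7, 30.2, 11.3, 7.9, 10.5, 13.8, 9.1]
low, high = bootstrap_ci(freqs)
print(f"sample mean = {statistics.mean(freqs):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the interval is read directly off the resampled means, no normality assumption is needed, which is why the approach is often suggested for small or skewed samples such as the one above.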

Original language: English (US)
Title of host publication: A Practical Handbook of Corpus Linguistics
Publisher: Springer International Publishing
Pages: 593-610
Number of pages: 18
ISBN (Electronic): 9783030462161
ISBN (Print): 9783030462154
State: Published - Jan 1 2021

ASJC Scopus subject areas

  • General Arts and Humanities
  • General Social Sciences
