Building a corpus: What are the key considerations?

Research output: Chapter in Book/Report/Conference proceeding > Chapter

52 Scopus citations


This chapter attempts to elucidate the issues involved in building a written corpus. The discussions on size and sampling have necessarily touched on questions of representativeness and balance. In the Published Materials Corpus, a corpus designed to represent published Business English materials, balance and representativeness were achieved by surveying how widely books were used in the general market, in order to provide an overview of the books actually in use at the time. Publicly available data can be gathered from a variety of sources: newspapers, journals, magazines and a number of sites on the Internet. Although written corpora are purportedly easier to create than spoken ones, largely because of the problems of transcribing spoken language, there is still a wide range of issues that need to be addressed at all stages of the process, from planning to data gathering and organisation.

Original language: English (US)
Title of host publication: The Routledge Handbook of Corpus Linguistics
Publisher: Taylor and Francis
Number of pages: 7
ISBN (Electronic): 9781135153632
ISBN (Print): 9780203856949
State: Published - Jan 1 2010

ASJC Scopus subject areas

  • General Arts and Humanities

