The corpus of United States state statutes—design, construction and use

Jesse Egbert, Margaret Wood

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


There is a need for more publicly available corpora of legal language. To help fill this gap, we have developed the Corpus of U.S. State Statutes, or CorUSSS, a new corpus comprising the statutory code from all 50 U.S. states. In total, the corpus contains 1,785,742 texts, each of which represents the statutory text associated with a unique Universal Citation in one of the 50 U.S. states’ codes. This corpus makes it possible to explore language use in statutes within a single state or across all 50 states. After motivating the need for this corpus, we describe its design and the methods we used to collect, clean and store the texts. We then report on a case study that illustrates the utility of this corpus for addressing important questions in statutory interpretation by investigating whether the word "information" can be used to refer to statements that are non-factual. We conclude with a call for researchers in law and corpus linguistics to rely on both legal and ordinary language when investigating questions of interpretation.

Original language: English (US)
Article number: 100047
Journal: Applied Corpus Linguistics
Issue number: 2
State: Published - Aug 2023


Keywords

  • Legal corpora
  • Legal language
  • Statutes
  • Statutory interpretation
  • Textualism

ASJC Scopus subject areas

  • Linguistics and Language
  • Social Sciences (miscellaneous)

