Word Use Equivalence and Hierarchical Word Tiers

Research output: Contribution to journal › Article › peer-review

Abstract

A ranked word list provides information about the position of each word in the list. However, retaining and employing the measure used to generate the ranked list can yield additional information about the words. If (Formula presented.) denotes the prevalence of a word in a corpus, then not only can the values of (Formula presented.) be ordered, but they can also be compared to one another, and words having similar values can be grouped together into equivalence classes. Measures of word prevalence include mean text frequency, the dispersion of words across texts in a corpus, or a measure that combines frequency and dispersion. In this paper, we examine the concepts of word equivalence classes and hierarchical word tiers and apply these concepts to the words in the British National Corpus (BNC). Hierarchical word tiers can be constructed without knowledge of all pairwise comparisons of the words under study. By grouping words that have similar values of prevalence, the rank-ordered list reduces to an informative set of hierarchical word tiers in which each tier contains words that are similar to one another in terms of their use in the corpus.
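The grouping idea in the abstract can be illustrated with a small sketch. This is not the procedure used in the paper, which the abstract does not specify; it is one possible reading, assuming that "similar" means "within a fixed tolerance of the highest value in the tier". The function name `build_tiers`, the `tolerance` parameter, and the prevalence values are all hypothetical. Each word is compared only with the leading word of the current tier, so not all pairwise comparisons are needed.

```python
from typing import Dict, List


def build_tiers(prevalence: Dict[str, float], tolerance: float) -> List[List[str]]:
    """Group words into ordered tiers of similar prevalence.

    Assumption (not from the paper): a word belongs to the current tier
    if its value is within `tolerance` of that tier's highest value.
    """
    # Rank words from most to least prevalent; break ties alphabetically.
    ranked = sorted(prevalence.items(), key=lambda kv: (-kv[1], kv[0]))

    tiers: List[List[str]] = []
    leader_value = None  # prevalence of the first word in the current tier
    for word, value in ranked:
        if leader_value is None or leader_value - value > tolerance:
            # Too far below the tier leader: open a new, lower tier.
            tiers.append([word])
            leader_value = value
        else:
            tiers[-1].append(word)
    return tiers


# Made-up prevalence values, for illustration only (not BNC data).
example = {"the": 0.99, "of": 0.97, "house": 0.55, "garden": 0.52, "zygote": 0.03}
tiers = build_tiers(example, tolerance=0.05)
for level, words in enumerate(tiers, start=1):
    print(level, words)
```

With these invented values, the five ranked words collapse into three tiers. A statistical criterion for similarity, rather than a fixed tolerance, could be substituted at the comparison step without changing the overall structure.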

Original language: English (US)
Journal: Journal of Quantitative Linguistics
State: Accepted/In press - 2022

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
