Abstract
While lexical bundles research identifies continuous sequences (e.g. the end of the, I don't know if), researchers have also been interested in discontinuous sequences in which words form a 'frame' surrounding a variable slot (e.g. I don't * to, it is * to). To date, most research has focused on a few intuitively selected frames, or has begun with frequent continuous sequences and then analyzed those to identify associated frames. Few previous studies have attempted to directly identify the full set of discontinuous sequences in a corpus. In the present study, we work towards that goal, using a corpus-driven approach to identify the set of recurrent four-word continuous and discontinuous patterns in corpora of conversation and academic writing. This direct computational analysis of the corpora reveals a more complete set of frames than alternative approaches, resulting in the documentation of highly frequent frames that have not been identified in previous research.
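The general idea of identifying continuous and discontinuous four-word patterns directly from a corpus can be illustrated with a minimal sketch. This is not the procedure used in the study; it assumes a pre-tokenized text, restricts the variable slot to the two internal positions of a four-word window, and uses an arbitrary recurrence threshold. The names `four_word_patterns`, `recurrent`, and `min_count` are hypothetical.

```python
from collections import Counter


def four_word_patterns(tokens):
    """Count continuous 4-grams and frames with one internal variable slot.

    Assumption for illustration: the slot may only fall in the second or
    third position of the four-word window.
    """
    bundles = Counter()
    frames = Counter()
    for i in range(len(tokens) - 3):
        gram = tuple(tokens[i:i + 4])
        bundles[gram] += 1
        # Replace each internal word in turn with a wildcard to form a frame.
        for slot in (1, 2):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame] += 1
    return bundles, frames


def recurrent(counter, min_count=2):
    """Keep patterns that occur at least min_count times (arbitrary cutoff)."""
    return {pattern: n for pattern, n in counter.items() if n >= min_count}


# Toy input; a real analysis would run over a full corpus.
tokens = "i don't want to go but i don't have to stay".split()
bundles, frames = four_word_patterns(tokens)
recurrent_frames = recurrent(frames)
recurrent_bundles = recurrent(bundles)
```

In this toy input no continuous four-word sequence repeats, but the frame `i don't * to` recurs with two different fillers, which is the kind of pattern a bundle-first approach would miss.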
Original language | English (US) |
---|---|
Pages (from-to) | 109-136 |
Number of pages | 28 |
Journal | International Journal of Corpus Linguistics |
Volume | 18 |
Issue number | 1 |
State | Published - 2013 |
Keywords
- Collocational framework
- Corpus-driven
- Database tools
- Formulaic language
- Lexical bundles
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language