Abstract
This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories and to apply that method to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification: end users of the web used a decision-tree survey to code the relevant situational characteristics of web documents, and register and subregister categories were identified from those codings. We present details regarding the development and testing of this method through a series of 10 pilot studies. Second, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
Original language | English (US) |
---|---|
Pages (from-to) | 1817-1831 |
Number of pages | 15 |
Journal | Journal of the Association for Information Science and Technology |
Volume | 66 |
Issue number | 9 |
State | Published - Sep 1 2015 |
Keywords
- classification
- discourse analysis
- linguistic analysis
ASJC Scopus subject areas
- Information Systems
- Computer Networks and Communications
- Information Systems and Management
- Library and Information Sciences