Abstract
Regular expressions are powerful tools for extracting tables from non-tabular text data. Capturing regular expressions that describe the information to extract from column names can be especially useful when reshaping a data table from wide (few rows with many regularly named columns) to tall (fewer columns with more rows). We present the R package nc (short for named capture), which provides functions for wide-to-tall data reshaping using regular expressions. We describe the main new ideas of nc, and provide detailed comparisons with related R packages (stats, utils, data.table, tidyr, tidyfast, tidyfst, reshape2, cdata).
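To make the wide-to-tall idea concrete, here is a minimal sketch in Python using only the standard library. It is not the nc API (nc is an R package); the helper name `capture_melt`, the `value_name` parameter, and the toy table are assumptions introduced for illustration. Each column name that matches a regular expression with named capture groups contributes one tall row, and each named group becomes a new column.

```python
import re


def capture_melt(rows, pattern, value_name="value"):
    """Reshape wide rows (dicts) into tall rows.

    Hypothetical helper for illustration only; it is not part of nc.
    """
    regex = re.compile(pattern)
    tall = []
    for row in rows:
        # Match every column name against the whole pattern.
        matches = {name: regex.fullmatch(name) for name in row}
        # Columns whose names do not match are kept as identifier columns.
        ids = {name: value for name, value in row.items()
               if matches[name] is None}
        for name, match in matches.items():
            if match is None:
                continue
            # Each named group becomes a column in the tall output.
            tall.append({**ids, **match.groupdict(), value_name: row[name]})
    return tall


# Toy wide table: few rows, many regularly named columns.
wide = [
    {"Species": "setosa", "Sepal.Length": 5.1, "Sepal.Width": 3.5,
     "Petal.Length": 1.4, "Petal.Width": 0.2},
    {"Species": "virginica", "Sepal.Length": 6.3, "Sepal.Width": 3.3,
     "Petal.Length": 6.0, "Petal.Width": 2.5},
]

# Named groups "part" and "dim" describe what to extract from column names.
tall = capture_melt(wide, r"(?P<part>[^.]+)\.(?P<dim>[^.]+)")
for record in tall:
    print(record)
```

In this sketch the two wide rows with four measurement columns become eight tall rows with the columns `Species`, `part`, `dim`, and `value`.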
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 69-82 |
Number of pages | 14 |
Journal | R Journal |
Volume | 13 |
Issue number | 1 |
State | Published - 2021 |
Externally published | Yes |
ASJC Scopus subject areas
- Statistics and Probability
- Numerical Analysis
- Statistics, Probability and Uncertainty