README.org

April 21, 2026

| nc: named capture | [[https://rdatatable-community.github.io/The-Raft/posts/2024-08-01-seal_of_approval-nc/][https://rdatatable-community.github.io/The-Raft/posts/2024-08-01-seal_of_approval-nc/hex_approved.png]] |
| [[file:tests/testthat][tests]] | [[https://github.com/tdhock/nc/actions][https://github.com/tdhock/nc/workflows/R-CMD-check/badge.svg]] |
| [[https://github.com/jimhester/covr][coverage]] | [[https://app.codecov.io/gh/tdhock/nc?branch=master][https://codecov.io/gh/tdhock/nc/branch/master/graph/badge.svg]] |

User-friendly functions for extracting a data table (row for each match, column for each group) from non-tabular text data using regular expressions, and for melting/reshaping columns that match a regular expression. If you use this code, please read and cite my related R Journal papers!

** Quick demo of matching functions

#+BEGIN_SRC R
  fruit.vec <- c("granny smith apple", "blood orange and yellow banana")
  fruit.pattern <- list(type=".*?", " ", fruit="orange|apple|banana")
  nc::capture_first_vec(fruit.vec, fruit.pattern)
  #>            type  fruit
  #> 1: granny smith  apple
  #> 2:        blood orange
  nc::capture_all_str(fruit.vec, fruit.pattern)
  #>            type  fruit
  #> 1: granny smith  apple
  #> 2:        blood orange
  #> 3:   and yellow banana
#+END_SRC

** Quick demo of reshaping functions

#+begin_src R
  (one.iris <- iris[1,])
  #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  #> 1          5.1         3.5          1.4         0.2  setosa
  nc::capture_melt_single(one.iris, part=".*", "[.]", dim=".*")
  #>    Species  part    dim value
  #> 1:  setosa Sepal Length   5.1
  #> 2:  setosa Sepal  Width   3.5
  #> 3:  setosa Petal Length   1.4
  #> 4:  setosa Petal  Width   0.2
  nc::capture_melt_multiple(one.iris, part=".*", "[.]", column=".*")
  #>    Species  part Length Width
  #> 1:  setosa Petal    1.4   0.2
  #> 2:  setosa Sepal    5.1   3.5
  nc::capture_melt_multiple(one.iris, column=".*", "[.]", dim=".*")
  #>    Species    dim Petal Sepal
  #> 1:  setosa Length   1.4   5.1
  #> 2:  setosa  Width   0.2   3.5
#+end_src

** Installation

#+BEGIN_SRC R
  install.packages("nc")
  ## or:
  if(!require(devtools))install.packages("devtools")
  devtools::install_github("tdhock/nc")
#+END_SRC

** Usage overview

Watch the [[https://www.youtube.com/watch?v=4mDJnVtzsbg&list=PLwc48KSH3D1P8R7470s0lgcUObJLEXSSO&index=1][screencast tutorial videos]]!

The main functions provided in nc are:

| Subject              | nc function             | Similar to                            | And                     |
|----------------------+-------------------------+---------------------------------------+-------------------------|
| Single string        | =capture_all_str=       | =stringr::str_match_all=              | =rex::re_matches=       |
| Character vector     | =capture_first_vec=     | =stringr::str_match=                  | =rex::re_matches=       |
| Data frame chr cols  | =capture_first_df=      | =tidyr::extract/separate_wider_regex= | =data.table::tstrsplit= |
| Data frame col names | =capture_melt_single=   | =tidyr::pivot_longer=                 | =data.table::melt=      |
| Data frame col names | =capture_melt_multiple= | =tidyr::pivot_longer=                 | =data.table::melt=      |
| File paths           | =capture_first_glob=    | =arrow::open_dataset=                 |                         |
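
The quick demos above do not show =capture_first_df=. As a rough sketch, assuming a made-up data frame with a hypothetical =JobID= column, each named argument gives the pattern to apply to the column of the same name, and a conversion function such as =as.integer= placed after a group pattern is applied to that group:

#+BEGIN_SRC R
  ## Hypothetical input: one character column holding "job_task" identifiers.
  job.df <- data.frame(
    JobID=c("13937810_25", "14022192_1"),
    stringsAsFactors=FALSE)
  ## The argument name (JobID) selects the subject column;
  ## the list is the pattern, with named groups job and task.
  job.dt <- nc::capture_first_df(
    job.df,
    JobID=list(
      job="[0-9]+", as.integer,
      "_",
      task="[0-9]+", as.integer))
  ## job.dt keeps the JobID column and gains integer columns job and task.
  str(job.dt)
#+END_SRC

The data and column names here are illustrative only; see =?nc::capture_first_df= for the full set of arguments.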

*** Choice of regex engine

By default, nc uses PCRE. Other options include ICU and RE2.

To tell nc that you would like to use a certain engine, set the =nc.engine= option:

#+BEGIN_SRC R
  options(nc.engine="RE2")
#+END_SRC

Every function also has an engine argument, e.g.

#+BEGIN_SRC R
  nc::capture_first_vec(
    "foo a\U0001F60E# bar",
    before=".*?",
    emoji="\\p{EMOJI_Presentation}",
    after=".*",
    engine="ICU")
  #>   before emoji after
  #> 1  foo a    😎 # bar
#+END_SRC

** Related work

For a detailed comparison of regex C libraries in R (ICU, PCRE, TRE, RE2), see my [[https://github.com/tdhock/namedCapture-article][R Journal (2019) paper about namedCapture]].

The nc reshaping functions provide functionality similar to packages tidyr, stats, data.table, reshape, reshape2, cdata, utils, etc. The main difference is that =nc::capture_melt_*= support named capture regular expressions with type conversion, which (1) makes it easier to create/maintain a complex regex, and (2) results in less repetition in user code. For a detailed comparison, see [[https://github.com/tdhock/nc-article][my R Journal (2021) paper about nc]].
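
To illustrate what type conversion means when reshaping, here is a minimal sketch, assuming a made-up wide data frame whose hypothetical column names =day1=, =day2=, =day10= encode an integer. Placing =as.integer= after the group pattern converts the captured text, so the resulting =day= column is an integer rather than a character column:

#+BEGIN_SRC R
  ## Hypothetical wide data: one measurement column per day.
  wide.df <- data.frame(
    id=1:2,
    day1=c(5.1, 4.9),
    day2=c(3.5, 3.0),
    day10=c(1.4, 1.3))
  ## Columns whose names match "day" followed by digits are melted;
  ## the digits are captured in the day group and converted to integer.
  tall.dt <- nc::capture_melt_single(
    wide.df,
    "day",
    day="[0-9]+", as.integer)
  ## One row per (id, day) pair, with the measurement in the value column.
  str(tall.dt)
#+END_SRC

Because =day= is an integer, sorting it gives 1, 2, 10 rather than the character ordering "1", "10", "2".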

Below I list the main differences between the functions in =nc= and other analogous R functions: