(C++) std::regex

January 17, 2018

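std::regex (an alias for std::basic_regex&lt;char&gt;, declared in the header &lt;regex&gt; since C++11) is the C++ standard library class for regular expressions.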

The default regular expression notation is that of ECMAScript [1], but std::regex can also use POSIX (basic and extended), awk, grep and egrep notation [2]. The notation is selected with a flag passed to the std::regex constructor.
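As a sketch of what choosing a grammar changes, the snippet below compiles the same pattern a+ under different grammar flags (std::regex::ECMAScript, std::regex::extended, std::regex::basic). The helper matches_in_grammar is a hypothetical name used only for illustration. In POSIX basic notation the + is an ordinary character, so a+ matches only the literal text "a+".

```cpp
#include <cassert>  // assert, as used in the examples of this article
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library):
// true if the whole of `text` matches `pattern`,
// with `pattern` interpreted in the given grammar
// (std::regex::ECMAScript, basic, extended, awk, grep or egrep).
bool matches_in_grammar(const std::string& text, const std::string& pattern,
                        std::regex::flag_type grammar)
{
  return std::regex_match(text, std::regex(pattern, grammar));
}
```

With ECMAScript or extended notation, matches_in_grammar("aaa", "a+", grammar) is true; with std::regex::basic it is false, while matches_in_grammar("a+", "a+", std::regex::basic) is true.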

ECMAScript character classes

Adapted from [1]:

Class          Description
[[:alnum:]]    alphanumeric character
[[:alpha:]]    alphabetic character
[[:blank:]]    blank character (space or tab)
[[:cntrl:]]    control character
[[:digit:]]    decimal digit character
[[:graph:]]    character with a graphical representation
[[:lower:]]    lowercase letter
[[:print:]]    printable character
[[:punct:]]    punctuation mark character
[[:space:]]    whitespace character
[[:upper:]]    uppercase letter
[[:xdigit:]]   hexadecimal digit character
[[:d:]]        decimal digit character
[[:w:]]        word character (alphanumeric or underscore)
[[:s:]]        whitespace character
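The class names in the table are used inside a bracket expression. A minimal sketch, assuming a hypothetical helper all_in_class that builds a pattern such as [[:xdigit:]]+ from a class name and tests whether a non-empty text consists only of characters from that class:

```cpp
#include <cassert>  // assert, as used in the examples of this article
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library):
// true if `text` is non-empty and every character of it belongs to
// the character class `class_name`, e.g. "digit", "xdigit" or "w".
bool all_in_class(const std::string& text, const std::string& class_name)
{
  // Builds e.g. "[[:xdigit:]]+" from "xdigit"
  const std::regex r("[[:" + class_name + ":]]+");
  return std::regex_match(text, r);
}
```

For example, all_in_class("DeadBeef", "xdigit") is true, all_in_class("0xFF", "xdigit") is false (the 'x' is not a hexadecimal digit), and all_in_class("snake_case_2", "w") is true because [[:w:]] includes the underscore. An unknown class name makes the std::regex constructor throw a std::regex_error.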

Example: multipliers

This example shows the use of:

  • .: any single character (except a line terminator)
  • [[:digit:]]: a decimal digit
  • ?: zero or one repetition of the preceding element
  • +: one or more repetitions of the preceding element
  • *: zero or more repetitions of the preceding element
  • {2}: exactly two repetitions of the preceding element (likewise {0} and {1})
#include <cassert>
#include <regex>
#include <string>

int main()
{
  assert(!std::regex_match("", std::regex("."))); //One anything
  assert(!std::regex_match("", std::regex("[[:digit:]]"))); //One digit
  assert( std::regex_match("", std::regex("[[:digit:]]?"))); //Zero or one digit
  assert(!std::regex_match("", std::regex("[[:digit:]]+"))); //One or more digits
  assert( std::regex_match("", std::regex("[[:digit:]]*"))); //Zero or more digits
  assert( std::regex_match("", std::regex("[[:digit:]]{0}"))); //Zero digits

  assert(std::regex_match("1", std::regex("."))); //One anything
  assert(std::regex_match("1", std::regex("[[:digit:]]"))); //One digit
  assert(std::regex_match("1", std::regex("[[:digit:]]?"))); //Zero or one digit
  assert(std::regex_match("1", std::regex("[[:digit:]]+"))); //One or more digits
  assert(std::regex_match("1", std::regex("[[:digit:]]*"))); //Zero or more digits
  assert(std::regex_match("1", std::regex("[[:digit:]]{1}"))); //One digit

  assert(!std::regex_match("12", std::regex("."))); //One anything
  assert(!std::regex_match("12", std::regex("[[:digit:]]"))); //One digit
  assert(!std::regex_match("12", std::regex("[[:digit:]]?"))); //Zero or one digit
  assert( std::regex_match("12", std::regex("[[:digit:]]+"))); //One or more digits
  assert( std::regex_match("12", std::regex("[[:digit:]]*"))); //Zero or more digits
  assert( std::regex_match("12", std::regex("[[:digit:]]{2}"))); //Two digits

}
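The assertions above rely on std::regex_match, which succeeds only if the pattern matches the entire text; that is why "12" does not match ".". The sketch below contrasts it with std::regex_search, which succeeds if any part of the text matches, and also shows the range forms {n,m} (between n and m repetitions) and {n,} (n or more repetitions). The helper names whole_match and partial_match are hypothetical.

```cpp
#include <cassert>  // assert, as used in the examples of this article
#include <regex>
#include <string>

// Hypothetical helper: the whole of `text` must match `pattern`
bool whole_match(const std::string& text, const std::string& pattern)
{
  return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: some substring of `text` must match `pattern`
bool partial_match(const std::string& text, const std::string& pattern)
{
  return std::regex_search(text, std::regex(pattern));
}
```

For example, whole_match("1234", "[[:digit:]]{1,3}") is false because four digits exceed the upper bound, while whole_match("1234", "[[:digit:]]{2,}") is true. Likewise whole_match("12", ".") is false but partial_match("12", ".") is true.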

Example: is_benelux_web_domain

This example shows the use of:

  • |: or
  • (): group
  • \\.: a literal dot (.). The backslash stops the dot from acting as a wildcard. Because the backslash is also an escape character in C++ string literals, it must itself be escaped with another backslash
#include <cassert>
#include <regex>
#include <string>
//A (simplified) Benelux (Belgian, Dutch, Luxembourgish) web domain:
//  - starts with one or more alphanumeric characters
//  - ends with '.nl', '.be' or '.lu'
int main()
{
  const std::regex benelux_url("[[:alnum:]]+\\.(nl|be|lu)");
  assert( std::regex_match("nu.nl", benelux_url));
  assert( std::regex_match("k3.be", benelux_url));
  assert( std::regex_match("start.lu", benelux_url));
  assert(!std::regex_match("lemonde.fr", benelux_url));
  assert(!std::regex_match("nlbelu", benelux_url));
}
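The parentheses in the pattern do more than group: they also capture the part of the text they match. A minimal sketch, assuming a hypothetical helper benelux_tld, retrieves that captured top-level domain through std::smatch. It also writes the pattern as a C++11 raw string literal R"(...)", in which the backslash needs no extra escaping, so the pattern is the same as "[[:alnum:]]+\\.(nl|be|lu)".

```cpp
#include <cassert>  // assert, as used in the examples of this article
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library):
// returns the top-level domain ("nl", "be" or "lu") of a simplified
// Benelux web domain, or an empty string if `domain` does not match.
std::string benelux_tld(const std::string& domain)
{
  // Raw string literal: a single backslash escapes the dot
  const std::regex benelux_url(R"([[:alnum:]]+\.(nl|be|lu))");
  std::smatch m;
  if (std::regex_match(domain, m, benelux_url))
  {
    return m[1].str(); // m[0] is the whole match, m[1] the first group
  }
  return "";
}
```

For example, benelux_tld("k3.be") returns "be", while benelux_tld("lemonde.fr") returns an empty string.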


References

  • [1] Bjarne Stroustrup. The C++ Programming Language (4th edition). 2013. ISBN: 978-0-321-56384-2. Page 1071, 37.6 'Advice', item 3: 'The default regular expression notation is that of ECMAScript'
  • [2] Bjarne Stroustrup. The C++ Programming Language (4th edition). 2013. ISBN: 978-0-321-56384-2. Page 1071, 37.6 'Advice', item 9: 'regex can use ECMAScript, POSIX, awk, grep and egrep notation'