November 16, 2025

#+title: parcom - Parser Combinators

=parcom= is a concise Parser Combinator library in the style of Haskell's =parsec= and Rust's =nom=.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (*> (string "Tempus") #'space (string "fugit")) "Tempus fugit.")
#+end_src

#+RESULTS:
: fugit

=parcom= operates strictly on strings, not streamed byte data, but is otherwise "zero copy" in that extracted substrings of the original input are not reallocated.

Also provided are the following subsystems, which can be used independently to read their associated formats:

  • =parcom/json=
  • =parcom/toml=
  • =parcom/xml=
  • =parcom/datetime=
  • =parcom/email=

=parcom= and its children have no dependencies.

* Table of Contents :TOC_5_gh:noexport:
  • [[#compatibility][Compatibility]]
  • [[#performance][Performance]]
    • [[#json-benchmarks][JSON Benchmarks]]
    • [[#xml-benchmarks][XML Benchmarks]]
  • [[#api][API]]
    • [[#types-and-running-parsers][Types and Running Parsers]]
    • [[#parsers][Parsers]]
      • [[#characters-and-strings][Characters and Strings]]
        • [[#char][char]]
        • [[#string-string-lenient][string, string-lenient]]
        • [[#any][any]]
        • [[#any-but][any-but]]
        • [[#any-if][any-if]]
        • [[#hex][hex]]
        • [[#sneak][sneak]]
        • [[#eof][eof]]
      • [[#numbers][Numbers]]
        • [[#unsigned][unsigned]]
        • [[#integer][integer]]
        • [[#float][float]]
      • [[#whitespace][Whitespace]]
        • [[#newline][newline]]
        • [[#space-space1][space, space1]]
        • [[#multispace-multispace1][multispace, multispace1]]
      • [[#taking-in-bulk][Taking in Bulk]]
        • [[#take][take]]
        • [[#take-while-take-while1][take-while, take-while1]]
        • [[#consume-consume1][consume, consume1]]
        • [[#sliding-take-sliding-take1][sliding-take, sliding-take1]]
        • [[#rest][rest]]
      • [[#other][Other]]
        • [[#pure][pure]]
    • [[#combinators][Combinators]]
      • [[#-right][*>, right]]
      • [[#-left][<*, left]]
      • [[#ap][ap]]
      • [[#-instead][<$, instead]]
      • [[#alt][alt]]
      • [[#opt][opt]]
      • [[#between][between]]
      • [[#many-many1][many, many1]]
      • [[#sep-sep1][sep, sep1]]
      • [[#sep-end-sep-end1][sep-end, sep-end1]]
      • [[#consume-sep-consume-sep1][consume-sep, consume-sep1]]
      • [[#skip][skip]]
      • [[#peek][peek]]
      • [[#count][count]]
      • [[#take-until][take-until]]
      • [[#recognize][recognize]]
      • [[#maybe][maybe]]
      • [[#not][not]]
    • [[#utilities][Utilities]]
      • [[#empty][empty?]]
      • [[#digit][digit?]]
      • [[#fmap][fmap]]
      • [[#const][const]]
    • [[#json][JSON]]
      • [[#parse][parse]]
      • [[#json-1][json]]
    • [[#toml][TOML]]
      • [[#parse-1][parse]]
      • [[#toml-1][toml]]
    • [[#xml][XML]]
      • [[#parse-2][parse]]
      • [[#xml-1][xml]]
      • [[#content][content]]
    • [[#dates-and-times][Dates and Times]]
      • [[#parse-3][parse]]
      • [[#now][now]]
      • [[#date][date]]
      • [[#time][time]]
      • [[#format][format]]
    • [[#email-addresses][Email Addresses]]
      • [[#parse-4][parse]]
      • [[#pretty][pretty]]
      • [[#valid-email-address][valid-email-address?]]
      • [[#addr-spec-msg-id][addr-spec, msg-id]]
  • [[#writing-your-own-parsers][Writing your own Parsers]]
    • [[#basics][Basics]]
    • [[#parameterized-parsers][Parameterized Parsers]]
    • [[#failure][Failure]]
    • [[#type-signatures][Type Signatures]]
* Compatibility

| Compiler  | Status |
|-----------+--------|
| SBCL      | ✅     |
| ECL       | ✅     |
| Clasp     | ✅     |
| ABCL      | ✅     |
| CCL       | ✅     |
| Clisp     | ✅     |
| Allegro   | ✅     |
| LispWorks | ❓     |

* Performance

=parcom= operates over =(simple-array character (*))= input, accessing its elements with =schar=. The =parcom/json= component has been measured to parse JSON at ~50 MB/s on a 2016 ThinkPad, which should be sufficient for ordinary usage. Its performance is competitive across compilers, while remaining expressive and light in implementation.

See also [[https://www.fosskers.ca/en/blog/optimizing-common-lisp][this article]] about some of the optimization techniques used within =parcom=.

** JSON Benchmarks

Parsing a [[https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json][25 MB JSON file]] from an in-memory string:

SBCL:

| Method      | Time (ms) | Memory (bytes) |
|-------------+-----------+----------------|
| jsown       | 200       | 135m           |
| jzon        | 370       | 132m           |
| parcom/json | 500       | 227m           |
| shasht      | 650       | 410m           |
| yason       | 1800      | 400m           |

Allegro:

| Method      | Time (s) | Memory (cons) | Memory (bytes) |
|-------------+----------+---------------+----------------|
| jsown       | 0.7      | 1.3m          | 91m            |
| parcom/json | 1.2      | 3.9m          | 242m           |
| jzon        | 1.5      | 0.9m          | 258m           |
| shasht      | 1.5      | 4.5m          | 452m           |
| yason       | 8.2      | 6.4m          | 457m           |

ECL:

| Method      | Time (s) | Memory (bytes) |
|-------------+----------+----------------|
| shasht      | 2.6      | 1.8b           |
| parcom/json | 2.8      | 401m           |
| jzon        | 3.2      | 1.6b           |
| yason       | 5.4      | 1.9b           |
| jsown       | 5.8      | 165m           |

ABCL:

| Method      | Time (s) | Memory (cons cells) |
|-------------+----------+---------------------|
| parcom/json | 6.0      | 2m                  |
| jsown       | 7.4      | 25m                 |
| jzon        | 11.3     | 864k                |
| shasht      | 83.3     | 110m                |
| yason       | 328.7    | 366m                |

CMUCL:

| Method      | Time (s) | Memory (bytes) |
|-------------+----------+----------------|
| jsown       | 0.6      | 71m            |
| parcom/json | 1.6      | 175m           |
| shasht      | 4.5      | 348m           |
| yason       | 9.1      | 346m           |

** XML Benchmarks

Parsing a Java =.pom= file 1000 times:

SBCL:

| Method     | Time (ms) | Memory (bytes) |
|------------+-----------+----------------|
| parcom/xml | 341       | 161m           |
| cxml       | 795       | 497m           |
| plump      | 1450      | 656m           |

Allegro:

| Method     | Time (s) | Memory (cons) | Memory (bytes) |
|------------+----------+---------------+----------------|
| parcom/xml | 1.2      | 2m            | 353m           |
| plump      | 4.9      | 10m           | 790m           |
| cxml       | 5.0      | 13m           | 684m           |

ABCL:

| Method     | Time (s) | Memory (cons cells) |
|------------+----------+---------------------|
| parcom/xml | 12.8     | 35m                 |
| plump      | 66.7     | 85m                 |
| cxml       | 139.6    | 150m                |

=cxml= is quite old and one of its dependencies did not compile under ECL.

* API

The examples below use =(in-package :parcom)= for brevity, but it's assumed that you'll use a local nickname like =p= or =pc= in your actual code. Further, most examples run the parsers with =parse=, but occasionally =funcall= is used instead to demonstrate what the remaining input offset would be after the parse succeeded. You will generally be using =parse= in your own code.
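A minimal sketch of the local-nickname approach, assuming your implementation supports package-local nicknames (SBCL does, for example). The package name =my-parsers= is hypothetical and only serves the illustration:

#+begin_src lisp
;; `my-parsers' is a hypothetical package name used only for illustration.
(defpackage :my-parsers
  (:use :cl)
  (:local-nicknames (:p :parcom)))

(in-package :my-parsers)

;; The same parse as the introductory example, written with the `p' nickname.
(p:parse (p:*> (p:string "Tempus") #'p:space (p:string "fugit"))
         "Tempus fugit.")
#+end_src

Since =parcom= defines its own =string=, =char=, =count= and similar names, a nickname keeps them clearly apart from their =cl= counterparts.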

** Types and Running Parsers

All parsers have the function signature =offset -> (value, offset)=, where =offset= is the current parsing offset. The new =value= and =offset= must be yielded via =values= as multiple return values, as this is the most memory-efficient approach.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (string "Hello") (in "Hello there"))
#+end_src

#+RESULTS:
: "Hello", 5

Of course, a parser might fail, in which case the keyword =:fail= and the offset are returned:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (string "Hello") (in "Bye!"))
#+end_src

#+RESULTS:
: :FAIL, 0

In general though, we call =parse= to fully run some combined parsers and yield the final output:

#+begin_src lisp :exports both
(in-package :parcom)
(apply #'+ (parse (sep (char #\.) #'unsigned) "123.456.789!"))
#+end_src

#+RESULTS:
: 1368

=parse= otherwise ignores any final, unconsumed input. It will also raise a Condition if the parsing failed.
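If you would rather receive =nil= than enter the debugger on failure, the raised Condition can be caught with =handler-case=. This is only a sketch: =parse-or-nil= is a hypothetical helper, and because the concrete condition type isn't stated here, it catches the broad =condition= type.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

;; Hypothetical helper, not part of parcom.
(defun parse-or-nil (parser input)
  "Run PARSER over INPUT, yielding NIL if a condition was signalled."
  (handler-case (parse parser input)
    ;; Assumption: the failure's exact type is unknown, so catch any condition.
    (condition () nil)))

(parse-or-nil (string "Hello") "Bye!")
#+end_src

In real code you would likely narrow the handler to the specific condition type that =parse= signals.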

** Parsers

A "parser" is a function that consumes some specific input and yields a single result.

*** Characters and Strings

**** char

Parse a given character.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (char #\a) "apple")
#+end_src

#+RESULTS:
: #\a

**** string, string-lenient

Parse the given string. For efficiency, the argument is limited to the most efficient string type, =(simple-array character (*))=. To further save on memory, the parsed string is the argument itself.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (string "Hello") "Hello there!")
#+end_src

#+RESULTS:
: Hello

The =string-lenient= variant accepts any =cl:string= as an argument, and thus is a bit slower. Likewise it returns its own argument in order to avoid allocation.

**** any

Parse any character.

#+begin_src lisp :exports both
(in-package :parcom)
(parse #'any "Hello there!")
#+end_src

#+RESULTS:
: #\H

**** any-but

Parse any character except the one you don't want.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (any-but #\!) "Hello there!")
#+end_src

#+RESULTS:
: #\H

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (any-but #\H) (in "Hello there!"))
#+end_src

#+RESULTS:
: :FAIL, 0

**** any-if

Any character that passes the predicate.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (any-if #'digit?) "8a")
#+end_src

#+RESULTS:
: #\8

**** hex

Parse a hex character of any case.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (many #'hex) "abcd0efgh")
#+end_src

#+RESULTS:
: (#\a #\b #\c #\d #\0 #\e #\f)

**** sneak

Yield the given char if it's the next one, but don't advance the offset. Like =peek=, but character-based and thus more performant.

#+begin_src lisp :exports both
(in-package :parcom)
(funcall (sneak #\a) (in "aaabcd"))
#+end_src

#+RESULTS:
: #\a, 0

**** eof

Recognize the end of the input.

#+begin_src lisp :exports both
(in-package :parcom)
(parse #'eof "")
#+end_src

#+RESULTS:
: T

#+begin_src lisp :exports both
(in-package :parcom)
(parse (*> (string "Mālum") #'eof) "Mālum")
#+end_src

#+RESULTS:
: T

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (*> (string "Mālum") #'eof) (in "Mālum rubrum"))
#+end_src

#+RESULTS:
: :FAIL, 5

*** Numbers

**** unsigned

Parse a positive integer into a =fixnum=.

#+begin_src lisp :exports both
(in-package :parcom)
(parse #'unsigned "44")
#+end_src

#+RESULTS:
: 44

**** integer

Parse a positive or negative integer into a =fixnum=.

#+begin_src lisp :exports both
(in-package :parcom)
(parse #'integer "-44")
#+end_src

#+RESULTS:
: -44

**** float

Parse a positive or negative floating point number into a =double-float=.

#+begin_src lisp :exports both
(in-package :parcom)
(parse #'float "123.0456")
#+end_src

#+RESULTS:
: 123.0456d0

*** Whitespace

**** newline

Matches a single newline character.

#+begin_src lisp :exports both
(in-package :parcom)
(let ((s (concatenate 'simple-string '(#\newline #\a #\b #\c)))) ; "\nabc"
  (parse #'newline s))
#+end_src

#+RESULTS:
: #\Newline

**** space, space1

Parse 0 or more ASCII whitespace and tab characters.

#+begin_src lisp :exports both
(in-package :parcom)
(length (parse #'space "   Salvē!"))
#+end_src

#+RESULTS:
: 3

Parse 1 or more ASCII whitespace and tab characters.

#+begin_src lisp :exports both
(in-package :parcom)
(length (parse #'space1 "   Salvē!"))
#+end_src

#+RESULTS:
: 3

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall #'space1 (in "Salvē!"))
#+end_src

#+RESULTS:
: :FAIL, 0

**** multispace, multispace1

Parse 0 or more ASCII whitespace, tabs, newlines, and carriage returns.

#+begin_src lisp :exports both
(in-package :parcom)
(length (parse #'multispace (concatenate 'simple-string '(#\tab #\newline #\tab))))
#+end_src

#+RESULTS:
: 3

Parse 1 or more ASCII whitespace, tabs, newlines, and carriage returns.

#+begin_src lisp :exports both
(in-package :parcom)
(length (parse #'multispace1 (concatenate 'simple-string '(#\tab #\newline #\tab))))
#+end_src

#+RESULTS:
: 3

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall #'multispace1 (in "Ārcus"))
#+end_src

#+RESULTS:
: :FAIL, 0

*** Taking in Bulk

These always yield a substring borrowed directly from the original input.

**** take

Take =n= characters from the input.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (take 3) "Arbor")
#+end_src

#+RESULTS:
: Arb

It's okay for =n= to be too large:

#+begin_src lisp :exports both
(in-package :parcom)
(parse (take 100) "Arbor")
#+end_src

#+RESULTS:
: Arbor

**** take-while, take-while1

Take characters while some predicate holds.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (take-while (lambda (c) (equal #\a c))) "aaabbb")
#+end_src

#+RESULTS:
: aaa

=take-while1= is like =take-while=, but must yield at least one character.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (take-while1 (lambda (c) (equal #\a c))) (in "bbb"))
#+end_src

#+RESULTS:
: :FAIL, 0

**** consume, consume1

A faster version of =take-while= and =skip= when you know you're character-based and don't need the parsed output.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (consume (lambda (c) (equal #\a c))) (in "aaabbb"))
#+end_src

#+RESULTS:
: T, 3

=consume1= requires at least one application of the predicate to succeed.

**** sliding-take, sliding-take1

Like =take-while=, but check two characters at a time. Very useful for parsing escaped characters where the backslash and char need to be analyzed at the same time.

The given function must yield two values via =values=:

  • The keyword =:one= and a character to keep if you only wish to advance the parser by one place. For instance, when the character wasn't escaped.
  • The keyword =:two= and a character to keep if you want to advance the parser by two.

Note that in both cases, the character yielded by your lambda need not be one of the inputs given to it. For example, if it detected a backslash and an =#\n=, it could yield the single Lisp =#\newline= character.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (sliding-take (lambda (a b)
                       (cond ((and (char= a #\\) (char= b #\n)) (values :two #\newline))
                             (t (values :one a)))))
       "Hello \\n there!")
#+end_src

#+RESULTS:
: Hello
:  there!

**** rest

Consume the rest of the input. Always succeeds.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (<*> (string "Salvē") (*> #'space #'rest)) "Salvē domine!")
#+end_src

#+RESULTS:
: ("Salvē" "domine!")

*** Other

**** pure

Consume no input and just yield a given value.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (pure :pāx) "Bellum")
#+end_src

#+RESULTS:
: :PĀX

Useful for chaining with other compound parsers to inject values into the results.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (<*> (<*> (pure :pāx) (string "PĀX"))
            #'multispace
            (<*> (pure :bellum) (string "BELLUM")))
       "PĀX BELLUM")
#+end_src

#+RESULTS:
: ((:PĀX "PĀX") " " (:BELLUM "BELLUM"))

** Combinators

"Combinators" combine child parsers together to form compound results. They allow us to express intent like "parse this then that" and "parse this, then maybe that, but only if..." etc.

*** *>, right

Run multiple parsers one after another, but yield the value of the rightmost one. =right= is an alias.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (*> (char #\!) #'unsigned) "!123?")
#+end_src

#+RESULTS:
: 123

*** <*, left

Run multiple parsers one after another, but yield the value of the leftmost one. =left= is an alias.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (<* (char #\!) #'unsigned) "!123?")
#+end_src

#+RESULTS:
: #\!

*** ap

Apply a function to the result of any number of parsers. The main way to parse multiple values into a single struct/class.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (ap (lambda (a b c) (list a (+ b c)))
           #'unsigned
           (*> (char #\.) #'unsigned)
           (*> (char #\.) #'unsigned))
       "1.2.3")
#+end_src

#+RESULTS:
: (1 5)

This is a much more memory-efficient variant of the Haskell-inspired combination of =fmap= and =<*>= that was used previously.
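To illustrate the struct use case, here is a small sketch that parses a dotted version string into a struct. The =demo-version= struct is hypothetical and defined only for this example:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

;; Hypothetical struct, defined only for this example.
(defstruct demo-version major minor patch)

;; Each sub-parser supplies one argument to the constructor lambda, in order.
(parse (ap (lambda (major minor patch)
             (make-demo-version :major major :minor minor :patch patch))
           #'unsigned
           (*> (char #\.) #'unsigned)
           (*> (char #\.) #'unsigned))
       "1.2.3")
#+end_src

The result is a =demo-version= whose slots hold =1=, =2= and =3=. The lambda receives exactly as many arguments as there are parsers.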

*** <$, instead

Run some parser, but substitute its inner value with something else if parsing was successful. =instead= is an alias.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (<$ :roma (string "Roma")) "Roma!")
#+end_src

#+RESULTS:
: :ROMA

*** alt

Accept the results of the first parser from a group to succeed. Can combine as many parsers as you want.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (alt (string "dog") (string "cat")) "cat")
#+end_src

#+RESULTS:
: cat

*** opt

Yield =nil= if the parser failed, but don't fail the whole process nor consume any input.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (opt (string "Ex")) "Exercitus")
#+end_src

#+RESULTS:
: Ex

#+begin_src lisp :exports both
(in-package :parcom)
(parse (opt (string "Ex")) "Facēre")
#+end_src

#+RESULTS:
: NIL

*** between

A main parser flanked by two other ones. Only the value of the main parser is kept. Good for parsing brackets, parentheses, etc.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (between (char #\!) (string "Salvē") (char #\!)) "!Salvē!")
#+end_src

#+RESULTS:
: Salvē

*** many, many1

=many= parses 0 or more occurrences of a parser. =many1= demands that at least one parse succeeds or a Condition will be raised.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (many (alt (string "ovēs") (string "avis"))) "ovēsovēsavis!")
#+end_src

#+RESULTS:
: ("ovēs" "ovēs" "avis")

*** sep, sep1

=sep= parses 0 or more instances of a parser separated by some =sep= parser. =sep1= demands that at least one parse succeeds or a Condition will be raised.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (sep (char #\!) (string "pilum")) "pilum!pilum!pilum.")
#+end_src

#+RESULTS:
: ("pilum" "pilum" "pilum")

Critically, if a separator is detected, the parent parser must also then succeed or the entire combination fails. For example, this will not parse due to the =!= on the end:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (sep (char #\!) (string "pilum")) (in "pilum!pilum!pilum!"))
#+end_src

#+RESULTS:
: :FAIL, 18

For more lenient behaviour regarding the separator, see =sep-end=.

*** sep-end, sep-end1

The same as =sep=, but the separator /may/ appear at the end of the final "parent". Likewise, =sep-end1= demands that at least one parse of the parent succeeds.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(parse (sep-end (char #\!) (string "pilum")) "pilum!pilum!pilum!scūtum")
#+end_src

#+RESULTS:
: ("pilum" "pilum" "pilum")

*** consume-sep, consume-sep1

A combination of =sep= and =consume= for when you just wish to advance the parser and don't actually require the output. This avoids a lot of wasteful, intermediate list allocation.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (consume-sep (char #\!) (string "pilum")) "pilum!pilum!pilum.")
#+end_src

#+RESULTS:
: 17

=consume-sep1= requires at least one application of the main parser to succeed.

*** skip

Parse some parser 0 or more times, but throw away all the results.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (*> (skip (char #\!)) #'unsigned) "!!!123")
#+end_src

#+RESULTS:
: 123

*** peek

Yield the value of a parser, but don't consume the input.

#+begin_src lisp :exports both
(in-package :parcom)
(funcall (peek (string "he")) (in "hello"))
#+end_src

#+RESULTS:
: he

*** count

Apply a parser a given number of times and collect the results as a list.

#+begin_src lisp :exports both
(in-package :parcom)
(funcall (count 3 (char #\a)) (in "aaaaaa"))
#+end_src

#+RESULTS:
: (#\a #\a #\a), 3

*** take-until

Take characters until another parser succeeds. Does not advance the offset by the subparser.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (take-until (char #\')) (in "abcd'"))
#+end_src

#+RESULTS:
: "abcd", 4

If the subparser is just looking for a single char like the above, use =take-while= or =consume= instead. =take-until= is intended for more complex halting conditions that can't easily be detected by a char-by-char predicate function.
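For comparison, the single-character case can be sketched with =take-while= and a negated predicate. Going by the description of =take-while= earlier in this document, this should stop at the quote just as =take-until= does:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

;; Keep taking characters for as long as they are not a single quote.
(funcall (take-while (lambda (c) (char/= c #\'))) (in "abcd'"))
#+end_src

Swap in =consume= with the same predicate if the taken text itself isn't needed.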

*** recognize

If the given parser was successful, return the consumed input as a string instead.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (recognize (<*> (string "hi") #'unsigned)) (in "hi123there"))
#+end_src

#+RESULTS:
: "hi123", 5

*** maybe

If an initial parser succeeds, apply some =f= to the result of the second parser. If the first parser doesn't succeed, the second is attempted as usual but =f= isn't applied.

#+begin_src lisp :exports both
(in-package :parcom)
(parse (maybe #'1+ (char #\a) #'integer) "a123")
#+end_src

#+RESULTS:
: 124

*** not

Pass if the given parser fails, and don't advance the offset. Fail if it succeeds.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(funcall (not (char #\a)) (in "bark"))
#+end_src

#+RESULTS:
: T, 0

** Utilities

*** empty?

Is a given string empty?

#+begin_src lisp :exports both
(in-package :parcom)
(empty? "")
#+end_src

#+RESULTS:
: T

*** digit?

Is a given character a number from 0 to 9?

#+begin_src lisp :exports both
(in-package :parcom)
(digit? #\7)
#+end_src

#+RESULTS:
: T

*** fmap

Apply a pure function to the result of a successful parse.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)
(fmap #'1+ (funcall #'unsigned (in "1")))
#+end_src

#+RESULTS:
: 2, 1

However, in general you should consider =ap= for applying functions to the combined results of multiple parsers.

*** const

Yield a function that ignores its input and returns some original seed.

#+begin_src lisp :exports both
(in-package :parcom)
(funcall (const 1) 5)
#+end_src

#+RESULTS:
: 1

** JSON

By depending on the optional =parcom/json= system, you can parse JSON strings or include parcom-compatible JSON parsers into your own custom parsing code.

=(in-package :parcom/json)= is used below for brevity, but it's assumed that in your own code you will use a nickname, perhaps =pj= or =json=.

If you don't care about the individual parsers per se and just want to simply parse some JSON, use =pj:parse=.

Conversions:

| JSON    | Lisp           |
|---------+----------------|
| =true=  | =T=            |
| =false= | =NIL=          |
| Array   | Vector         |
| Object  | Hash Table     |
| Number  | =double-float= |
| String  | String         |
| =null=  | =:NULL=        |

*** parse

Attempt to parse any JSON value. Analogous to =parse= from the main library.

#+begin_src lisp :exports both
(in-package :parcom/json)
(parse "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]}")
#+end_src

#+RESULTS:
: #<HASH-TABLE :TEST EQUAL :COUNT 3 {1004C0B293}>

#+begin_src lisp :exports both :results verbatim
(in-package :parcom/json)
(parse "[1.9,true,3e+7,\"hi\",[4],null]")
#+end_src

#+RESULTS:
: #(1.9d0 T 3.0d7 "hi" #(4.0d0) :NULL)

Non-ascii and unicode characters are supported:

#+begin_src lisp :exports both
(in-package :parcom/json)
(parse "\"hēllお🐂\\u03B1\"")
#+end_src

#+RESULTS:
: hēllお🐂α
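Since Objects become hash tables and Arrays become vectors, the parsed result can be explored with the usual =gethash= and =aref=. The following is a sketch which assumes that Object keys are stored as Lisp strings (suggested by the =EQUAL= test of the tables shown above, but not stated explicitly):

#+begin_src lisp :exports both :results verbatim
;; Fully qualified call, so no particular current package is assumed.
(let ((obj (parcom/json:parse
            "{\"x\": 1, \"tags\": [\"a\", \"b\"], \"ok\": true, \"none\": null}")))
  ;; Assumption: Object keys are strings.
  (values (gethash "x" obj)             ; a Number, so a double-float
          (aref (gethash "tags" obj) 1) ; second element of the Array
          (gethash "ok" obj)
          (gethash "none" obj)))
#+end_src

Following the conversion table, the four values would be a =double-float=, a string, =T= and =:NULL=.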

*** json

Parse any kind of JSON (the actual parser).

#+begin_src lisp :exports both
(in-package :parcom/json)
(json (parcom:in "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]} "))
#+end_src

#+RESULTS:
: #<HASH-TABLE :TEST EQUAL :COUNT 3 {1004CA4C63}>, 38

There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.

** TOML

The =parcom/toml= system provides types and parsers for [[https://toml.io/][TOML]] files.

=(in-package :parcom/toml)= is used below for brevity, but it's assumed that in your own code you will use a nickname, perhaps =pt= or =toml=.

If you don't care about the individual parsers per se and just want to simply parse some TOML, use =pt:parse=.

This parser is TOML 1.0.0 compliant, with one exception: =inf= and =nan= float values are not accepted.

This system has no dependencies other than =parcom/datetime=.

*** parse

Parse a full TOML document directly from a string.

#+begin_src lisp :exports both
(in-package :parcom/toml)
(parse "# My Config

[project]
name = \"parcom\"")
#+end_src

#+RESULTS:
: #<HASH-TABLE :TEST EQUAL :COUNT 1 {10089AB2F3}>

*** toml

Parse TOML via the actual parser.

#+begin_src lisp :exports both
(in-package :parcom/toml)
(toml (parcom:in "# My Config

[project]
name = \"parcom\""))
#+end_src

#+RESULTS:
: #<HASH-TABLE :TEST EQUAL :COUNT 1 {10066C8433}>, 38

There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.

** XML

The =parcom/xml= system provides types and parsers for [[https://www.w3.org/TR/xml11][XML]] files.

=(in-package :parcom/xml)= is used below for brevity, but it's assumed that in your own code you will use a nickname, perhaps =px= or =xml=.

If you don't care about the individual parsers per se and just want to simply parse some XML, use =px:parse=.

This parser is gradually compliant to XML 1.1. If you discover XML files for which parsing fails, please open an issue.

*** parse

=parse= produces a top-level =document= which may contain metadata.

#+begin_src lisp :exports both
(in-package :parcom/xml)
(parse "<hello>yes</hello>")
#+end_src

#+RESULTS:
: #S(DOCUMENT
:    :METADATA NIL
:    :ELEMENT #S(ELEMENT :NAME "hello" :METADATA NIL :CONTENT "yes"))

*** xml

The actual combinator is =xml=.

#+begin_src lisp :exports both
(in-package :parcom/xml)
(xml (parcom:in "<hello>yes</hello>"))
#+end_src

#+RESULTS:
: #S(DOCUMENT
:    :METADATA NIL
:    :ELEMENT #S(ELEMENT :NAME "hello" :METADATA NIL :CONTENT "yes"))
: 18

There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.

*** content

A generic function to easily extract the meaningful content of anything element-like.

#+begin_src lisp :exports both
(in-package :parcom/xml)
(content (parse "<hello>yes</hello>"))
#+end_src

#+RESULTS:
: yes

** Dates and Times

The =parcom/datetime= system provides types and parsers for [[https://datatracker.ietf.org/doc/html/rfc3339][RFC3339]] timestamps.

=(in-package :parcom/datetime)= is used below for brevity, but it's assumed that in your own code you will use a nickname, perhaps =pd=.

As with the other =parcom= libraries, this has no external dependencies, which is an advantage over the otherwise excellent [[https://github.com/dlowe-net/local-time][local-time]] library, which depends on the heavy =uiop=.

*** parse

=parse= is lenient, and will parse any kind of date or time you give it.

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(parse "1975-04-05")
#+end_src

#+RESULTS:
: #S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5)

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(parse "1975-04-05T04:05:06+03:00")
#+end_src

#+RESULTS:
: #S(OFFSET-DATE-TIME
:    :DATE #S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5)
:    :TIME #S(LOCAL-TIME :HOUR 4 :MINUTE 5 :SECOND 6 :MILLIS 0)
:    :OFFSET #S(OFFSET :HOUR 3 :MINUTE 0))

It's up to you to handle the concrete type that you're returned. See the =date= and =time= generic functions below.
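One way to branch on the concrete type is =typecase=. This sketch uses only the two struct names visible in the results above (=local-date= and =offset-date-time=); =timestamp-kind= is a hypothetical helper, and any other result type falls through to the default clause.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom/datetime)

;; Hypothetical helper, not part of parcom/datetime.
(defun timestamp-kind (string)
  "Name the kind of value that PARSE produced for STRING."
  (let ((parsed (parse string)))
    (typecase parsed
      (offset-date-time :offset-date-time)
      (local-date :local-date)
      (t :something-else))))

(timestamp-kind "1975-04-05")
#+end_src

Given the results shown above, a bare date is classified as =:local-date=, while a full timestamp with an offset is classified as =:offset-date-time=.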

*** now

Right now!

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(now)
#+end_src

#+RESULTS:
: #S(OFFSET-DATE-TIME
:    :DATE #S(LOCAL-DATE :YEAR 2025 :MONTH 5 :DAY 5)
:    :TIME #S(LOCAL-TIME :HOUR 10 :MINUTE 0 :SECOND 28 :MILLIS 0)
:    :OFFSET #S(OFFSET :HOUR 9 :MINUTE 0))

It's a cloudy May morning.

*** date

Regardless of what parsed, you can usually pull a =local-date= out of it.

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(date (parse "1975-04-05T04:05:06+03:00"))
#+end_src

#+RESULTS:
: #S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5)

*** time

Regardless of what parsed, you can usually pull a =local-time= out of it.

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(time (parse "1975-04-05T04:05:06+03:00"))
#+end_src

#+RESULTS:
: #S(LOCAL-TIME :HOUR 4 :MINUTE 5 :SECOND 6 :MILLIS 0)

*** format

To convert your object back into something human-readable. Note that this is different from =cl:format=!

#+begin_src lisp :exports both
(in-package :parcom/datetime)
(format nil (date (parse "1975-04-05T04:05:06+03:00")))
#+end_src

#+RESULTS:
: 1975-04-05

** Email Addresses

The =parcom/email= system provides types and parsers for [[https://datatracker.ietf.org/doc/html/rfc5322][RFC5322]] email addresses. The implementation is RFC-compliant and particularly memory-efficient for well-behaved addresses.

=(in-package :parcom/email)= is used below for brevity, but it's assumed that in your own code you will use a nickname, perhaps =pe=.

*** parse

Simply parse an email address.

#+begin_src lisp :exports both
(in-package :parcom/email)
(parse "alice@bob.com")
#+end_src

#+RESULTS:
: #S(ADDRESS :NAME "alice" :DOMAIN "bob.com")

*** pretty

Rerender a parsed email address.

#+begin_src lisp :exports both
(in-package :parcom/email)
(pretty (parse "alice@bob.com"))
#+end_src

#+RESULTS:
: alice@bob.com

It will even clean up obsolete syntax for you.

#+begin_src lisp :exports both
(in-package :parcom/email)
(pretty (parse "alice(comment)@bob(comment) .com"))
#+end_src

#+RESULTS:
: alice@bob.com

*** valid-email-address?

Simply validate that a given address is legal.

#+begin_src lisp :exports both
(in-package :parcom/email)
(valid-email-address? "alice@bob.com")
#+end_src

#+RESULTS:
: T

#+begin_src lisp :exports both
(in-package :parcom/email)
(valid-email-address? "...alice...@bob.com")
#+end_src

#+RESULTS:
: NIL

*** addr-spec, msg-id

The top-level parsers that can be fitted in anywhere else you use =parcom=.

#+begin_src lisp :exports both
(in-package :parcom/email)
(parcom:parse (*> (parcom:string "address: ") #'addr-spec) "address: alice@bob.com")
#+end_src

#+RESULTS:
: #S(ADDRESS :NAME "alice" :DOMAIN "bob.com")

=msg-id= is a simpler variant also found in the spec that denies certain more complicated syntax.

* Writing your own Parsers

** Basics

The whole point of Parser Combinators is that it becomes simple to write your own parsing functions. Recall that a "fully realized" parser has the signature =offset -> (value, offset)=. In the simplest case, a parser of yours could look like:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

(defun excited-apple (offset)
  (funcall (<* (string "Mālum") (char #\!)) offset))

(funcall #'excited-apple (in "Mālum! Ō!"))
#+end_src

#+RESULTS:
: "Mālum", 6

Wherein you utilize the combinators provided by this library to build up composite parsers that are useful to you.

** Parameterized Parsers

You can also parameterize your parsers, similar to parsers like =take= or combinators like =count=:

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

(defun excited-apple (offset)
  (funcall (<* (string "Mālum") (char #\!)) offset))

(defun excited-apples (n)
  "Parse a certain number of excited apples."
  (lambda (offset)
    (funcall (count n #'excited-apple) offset)))

(funcall (excited-apples 3) (in "Mālum!Mālum!Mālum!Mālum!"))
#+end_src

#+RESULTS:
: ("Mālum" "Mālum" "Mālum"), 18

So, if your parser is parameterized by some initial argument, it has to return a lambda that accepts an =offset=.

** Failure

You can use =ok?= and =failure?= within more complex hand-written parsers to explicitly test for sub-parser failure, and then react accordingly. Yielding =:fail= signals that parsing has failed overall.

#+begin_src lisp :exports both :results verbatim
(in-package :parcom)

(defun three-sad-pears (offset)
  (multiple-value-bind (res next) (funcall (many (string "Pirum trīste")) offset)
    (if (or (failure? res)
            (< (length res) 3)
            (> (length res) 3))
        (fail next)
        (values res next))))

(three-sad-pears (in "Pirum trīste"))
#+end_src

#+RESULTS:
: :FAIL, 12

** Type Signatures

Parser function signatures can get quite complex if you write them yourself, but =parcom= supplies some shorthands:

  • =(maybe foo)= for parsers which may fail.
  • =(always foo)= for parsers which always succeed.
  • =->= for function arguments and return types.
  • =fn= for shortening the =declaim= statement.

All together, this looks like:

#+begin_src lisp
(fn any (maybe character))
(defun any (offset) ...)

(fn any-but (-> character (maybe character)))
(defun any-but (c) ...)

(fn any-if (-> (-> character boolean) (maybe character)))
(defun any-if (pred) ...)
#+end_src

Especially for higher-order function arguments, this form is significantly shorter.