EUR-Lex Parser
April 3, 2026
An EUR-Lex parser for Python.
Usage
You can install this package as follows:
pip install -U eurlex
After installing this package, you can download and parse any document from EUR-Lex. For example, the 32019R0947 regulation:
from eurlex import get_html_by_celex_id, parse_html
# Retrieve and parse the document with CELEX ID "32019R0947" into a Pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)
# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]
# Check the subtitle and corresponding text of Article 1
assert df_article_1_line_1.article_subtitle == "Subject matter"
assert df_article_1_line_1.text == (
"This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations."
)
Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
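As a rough illustration of how a CELEX number such as 32019R0947 is put together, the sketch below splits it into sector, year, document type and sequence number. The helper `split_celex_id` and the `CelexParts` type are hypothetical and not part of this package, and the pattern only covers the common "sector + year + type + number" shape; other CELEX forms exist and are rejected here.

```python
import re
from typing import NamedTuple


class CelexParts(NamedTuple):
    sector: str    # e.g. "3" for legal acts
    year: int      # four-digit year
    doc_type: str  # e.g. "R" for a regulation, "L" for a directive
    number: str    # sequence number, kept as a string to preserve leading zeros


# Hypothetical pattern for illustration only; it does not cover every CELEX form.
_CELEX_PATTERN = re.compile(
    r"^(?P<sector>[0-9A-Z])(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d{4})$"
)


def split_celex_id(celex_id: str) -> CelexParts:
    """Split a simple CELEX ID into its parts, or raise ValueError."""
    match = _CELEX_PATTERN.match(celex_id)
    if match is None:
        raise ValueError(f"Unsupported CELEX ID shape: {celex_id!r}")
    return CelexParts(
        sector=match["sector"],
        year=int(match["year"]),
        doc_type=match["doc_type"],
        number=match["number"],
    )


parts = split_celex_id("32019R0947")
assert parts == CelexParts(sector="3", year=2019, doc_type="R", number="0947")
```

Keeping the sequence number as a string avoids losing the leading zero when the ID is reassembled.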
For more information about the methods in this package, see the unit tests and doctests.
Data Structure
The following columns are available in the parsed dataframe:
- text: The text
- type: The type of the data
- document: The document in which the text is found
- article: The article in which the text is found
- article_subtitle: The subtitle of the article (when available)
- ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))
In some cases, additional fields are available. For example, the group field contains the bold text under which a text is found.
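To make the column layout more concrete, here is a minimal sketch that filters on article and ref. It uses a hand-built DataFrame as a stand-in for the output of parse_html, so it runs without network access. Only the Article 1 row reuses text quoted earlier; the Article 2 rows are placeholders invented for illustration, the empty ref for top-level text is an assumption, and the type and document columns are left out because their values are not shown here.

```python
import pandas as pd

# Stand-in for parse_html output. The Article 2 rows are made-up placeholders.
rows = [
    {
        "article": "1",
        "article_subtitle": "Subject matter",
        "ref": [],  # assumption: top-level text has an empty ref path
        "text": (
            "This Regulation lays down detailed provisions for the operation of "
            "unmanned aircraft systems as well as for personnel, including remote "
            "pilots and organisations involved in those operations."
        ),
    },
    {"article": "2", "article_subtitle": "Placeholder", "ref": ["(1)"], "text": "placeholder A"},
    {"article": "2", "article_subtitle": "Placeholder", "ref": ["(1)", "(a)"], "text": "placeholder B"},
    {"article": "2", "article_subtitle": "Placeholder", "ref": ["(2)"], "text": "placeholder C"},
]
df = pd.DataFrame(rows)

# Nesting depth of each row, derived from the length of its ref path
depth = df["ref"].apply(len)

# Rows of Article 2 that sit under paragraph (1), including its subparagraphs
is_under_paragraph_1 = df["ref"].apply(lambda ref: ref[:1] == ["(1)"])
under_paragraph_1 = df[(df["article"] == "2") & is_under_paragraph_1]

assert list(under_paragraph_1["text"]) == ["placeholder A", "placeholder B"]
```

Because ref holds a list per row, element-wise comparison goes through `apply` rather than a plain `==` on the column.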
Architecture
The dependency graph below is generated by import-cruiser and refreshed by the pre-commit hook. It focuses on src/eurlex and its non-dev external dependencies, while keeping the public import surface available through eurlex.
Module map
- fetch.py: download EUR-Lex HTML and resolve multiple-choice responses
- parser.py: turn HTML into tabular records
- sparql.py: build and run SPARQL queries
- language.py: language-code normalization
- uri.py: query-parameter and IRI helpers
- markup.py: XML and tag/class helpers
- constants.py: prefix and language-code tables
Contributing
Feel free to send any issues, ideas or pull requests.
Branching and pull requests
Please do your work on a feature branch that follows the feature/* naming pattern, for example feature/my-new-improvement.
When your work is ready, open a pull request from that feature branch to the target branch (typically main) for review.
Local checks
For development, install the project and its hooks, then let pre-commit run the same checks that CI expects:
python -m pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files
The final hook runs the doctests and enforces 100% coverage for eurlex, so you should see the same failures locally before a commit lands.
The README examples are also exercised automatically through pytest-readme, so they stay in sync with the code instead of becoming decorative fiction.
The runnable examples in examples/ are executed by the test suite as well, so they are part of the coverage target rather than a separate side quest.
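For contributors who have not written doctests before, the sketch below shows the general pattern using only the standard library: examples live in the docstring and are collected and executed by doctest. The function `ref_depth` is a hypothetical helper invented for this illustration, not part of eurlex, and the project's hooks may invoke doctests differently.

```python
import doctest


def ref_depth(ref: list[str]) -> int:
    """Return how deeply a text is nested, given its ``ref`` path.

    Hypothetical helper for illustration; not part of eurlex.

    >>> ref_depth([])
    0
    >>> ref_depth(["(1)", "(a)"])
    2
    """
    return len(ref)


# Collect the examples embedded in the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(ref_depth, name="ref_depth", globs={"ref_depth": ref_depth}):
    runner.run(test)

results = runner.summarize(verbose=False)
assert results.failed == 0
```

A new function that ships with docstring examples like these is executed by the doctest run, which helps it count towards the coverage target.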
CI tests the package on Python 3.11, 3.12, and 3.13, while the pre-commit hooks keep the code quality checks on a single pinned environment.
Version tags that start with v (for example v0.1.8) create a GitHub Release, attach the built distributions, and publish the package to PyPI after the checks pass.