================================================================================
pyexcel-xlsxr - Let you focus on data, instead of xlsx format
================================================================================

November 17, 2025
.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
   :target: https://www.patreon.com/chfw

.. image:: https://codecov.io/gh/pyexcel/pyexcel-xlsxr/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/pyexcel/pyexcel-xlsxr

.. image:: https://badge.fury.io/py/pyexcel-xlsxr.svg
   :target: https://pypi.org/project/pyexcel-xlsxr

.. image:: https://pepy.tech/badge/pyexcel-xlsxr/month
   :target: https://pepy.tech/project/pyexcel-xlsxr

.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square
   :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects

.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square
   :target: https://github.com/psf/black

pyexcel-xlsxr is a specialized xlsx reader using lxml. It does partial reading, meaning it won't load all content into memory.
lxml installation
This library depends on lxml. Because of lxml's limited availability, the use of this library is restricted.
For PyPy, lxml==3.4.4 is tested to work well, but versions above 3.4.4 are difficult to install.
For Python 3.7, please use lxml==4.1.1.
Otherwise, this library works OK with lxml 3.4.4 or above.
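
As a quick sanity check, the installed lxml version can be compared against the 3.4.4 minimum mentioned above. The following is a minimal sketch that uses only the standard library (``importlib.metadata`` requires Python 3.8 or later); the helper names ``parse_version`` and ``lxml_is_supported`` are illustrative and are not part of pyexcel-xlsxr:

.. code-block:: python

    import re
    from importlib import metadata

    MINIMUM_LXML = (3, 4, 4)  # minimum version stated in this README


    def parse_version(text):
        """Turn a version string such as '4.1.1' into a tuple like (4, 1, 1)."""
        return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


    def lxml_is_supported(installed=None):
        """Return True if lxml is installed and meets the minimum version.

        `installed` may be given as a version string; when omitted, the
        version of the lxml package found in the environment is used.
        """
        if installed is None:
            try:
                installed = metadata.version("lxml")
            except metadata.PackageNotFoundError:
                return False
        return parse_version(installed) >= MINIMUM_LXML

Calling ``lxml_is_supported()`` with no argument inspects the current environment; passing a string such as ``"4.1.1"`` checks a version you plan to pin.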
Support the project
If your company uses pyexcel and its components in a revenue-generating product,
please consider supporting the project on GitHub or
`Patreon <https://www.patreon.com/bePatron?u=5537627>`_. Your financial
support will enable me to dedicate more time to coding, improving documentation,
and creating engaging content.
Known constraints
Fonts, colors and charts are not supported.
Reading password-protected xls, xlsx and ods files is not supported either.
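
One possible way to fail early on password-protected xlsx files is sketched below. It rests on an assumption that is not a feature of this library: an unencrypted xlsx file is a ZIP archive, whereas Excel typically stores a password-protected workbook in a different (OLE compound) container, so a ZIP check can act as a rough guard before calling the reader. The helper name ``looks_like_plain_xlsx`` is hypothetical:

.. code-block:: python

    import zipfile


    def looks_like_plain_xlsx(file_or_path):
        """Return True if the input is a ZIP archive, as an unencrypted xlsx is.

        Accepts a file path or a seekable binary stream. For a stream, the
        read position is restored afterwards so it can still be passed on.
        """
        if hasattr(file_or_path, "seek"):
            position = file_or_path.tell()
            try:
                return zipfile.is_zipfile(file_or_path)
            finally:
                file_or_path.seek(position)
        return zipfile.is_zipfile(file_or_path)

This only tells ZIP containers apart from everything else; it does not validate that the archive is a well-formed workbook.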
Installation
You can install pyexcel-xlsxr via pip:
.. code-block:: bash
$ pip install pyexcel-xlsxr
or clone it and install it:
.. code-block:: bash
$ git clone https://github.com/pyexcel/pyexcel-xlsxr.git
$ cd pyexcel-xlsxr
$ python setup.py install
Usage
As a standalone library
.. testcode:: :hide:
>>> import os
>>> import sys
>>> from io import BytesIO
>>> from collections import OrderedDict
.. testcode:: :hide:
>>> from pyexcel_xlsxw import save_data
>>> data = OrderedDict() # from collections import OrderedDict
>>> data.update({"Sheet 1": [[1, 2, 3], [4, 5, 6]]})
>>> data.update({"Sheet 2": [["row 1", "row 2", "row 3"]]})
>>> save_data("your_file.xlsx", data)
Read from an xlsx file
Here's the sample code:
.. code-block:: python
>>> from pyexcel_xlsxr import get_data
>>> data = get_data("your_file.xlsx")
>>> import json
>>> print(json.dumps(data))
{"Sheet 1": [[1, 2, 3], [4, 5, 6]], "Sheet 2": [["row 1", "row 2", "row 3"]]}
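
As the output above suggests, ``get_data`` returns a mapping from sheet name to a list of rows. Here is a small sketch of walking that structure; it uses a literal dictionary equal to the output above so that it runs without any file, and ``sheet_shapes`` is an illustrative helper name:

.. code-block:: python

    def sheet_shapes(data):
        """Map each sheet name to (row count, length of the widest row)."""
        shapes = {}
        for name, rows in data.items():
            widest = max((len(row) for row in rows), default=0)
            shapes[name] = (len(rows), widest)
        return shapes


    # Same content as the printed result above
    sample = {
        "Sheet 1": [[1, 2, 3], [4, 5, 6]],
        "Sheet 2": [["row 1", "row 2", "row 3"]],
    }
    print(sheet_shapes(sample))
    # {'Sheet 1': (2, 3), 'Sheet 2': (1, 3)}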
.. testcode:: :hide:
>>> data = OrderedDict()
>>> data.update({"Sheet 1": [[1, 2, 3], [4, 5, 6]]})
>>> data.update({"Sheet 2": [[7, 8, 9], [10, 11, 12]]})
>>> io = BytesIO()
>>> save_data(io, data)
>>> unused = io.seek(0)
>>> # do something with the io
>>> # In reality, you might give it to your http response
>>> # object for downloading
Read an xlsx file from memory
Continuing from the previous example:
.. code-block:: python
>>> # This is just an illustration
>>> # In reality, you might deal with xlsx file upload
>>> # where you will read from requests.FILES['YOUR_XLSX_FILE']
>>> data = get_data(io)
>>> print(json.dumps(data))
{"Sheet 1": [[1, 2, 3], [4, 5, 6]], "Sheet 2": [[7, 8, 9], [10, 11, 12]]}
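
When the xlsx content arrives as raw bytes (for example from an upload), it can be wrapped in a ``BytesIO`` stream before being handed to ``get_data``, as in the example above. The sketch below shows one way to do that; ``read_xlsx_bytes`` and its ``reader`` parameter are illustrative names, added so the helper can be tried with a stand-in reader:

.. code-block:: python

    from io import BytesIO


    def read_xlsx_bytes(content, reader=None):
        """Wrap raw xlsx bytes in a stream and pass the stream to a reader."""
        if reader is None:
            # Imported lazily so that a stand-in reader can be used in tests
            from pyexcel_xlsxr import get_data as reader
        return reader(BytesIO(content))

With the library installed, ``read_xlsx_bytes(uploaded_bytes)`` should return the same kind of dictionary as the file-based call.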
Pagination feature
Let's assume the following file is a huge xlsx file:
.. code-block:: python

   >>> huge_data = [
   ...     [1, 21, 31],
   ...     [2, 22, 32],
   ...     [3, 23, 33],
   ...     [4, 24, 34],
   ...     [5, 25, 35],
   ...     [6, 26, 36]
   ... ]
   >>> sheetx = {
   ...     "huge": huge_data
   ... }
   >>> save_data("huge_file.xlsx", sheetx)

And let's pretend to read partial data:
.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx", start_row=2, row_limit=3)
   >>> print(json.dumps(partial_data))
   {"huge": [[3, 23, 33], [4, 24, 34], [5, 25, 35]]}

And you could as well do the same for columns:
.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx", start_column=1, column_limit=2)
   >>> print(json.dumps(partial_data))
   {"huge": [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]}

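
Judging from the outputs above, the row and column parameters behave like zero-based list slicing. The sketch below reproduces the same results on in-memory data. It is only an illustration of the semantics, not how the library is implemented, and using ``None`` to mean "no limit" is a choice made for this sketch:

.. code-block:: python

    def slice_rows_and_columns(rows, start_row=0, row_limit=None,
                               start_column=0, column_limit=None):
        """Return the window of `rows` selected by start/limit pairs."""
        row_end = None if row_limit is None else start_row + row_limit
        column_end = None if column_limit is None else start_column + column_limit
        return [row[start_column:column_end] for row in rows[start_row:row_end]]


    huge_data = [
        [1, 21, 31],
        [2, 22, 32],
        [3, 23, 33],
        [4, 24, 34],
        [5, 25, 35],
        [6, 26, 36],
    ]
    print(slice_rows_and_columns(huge_data, start_row=2, row_limit=3))
    # [[3, 23, 33], [4, 24, 34], [5, 25, 35]]
    print(slice_rows_and_columns(huge_data, start_column=1, column_limit=2))
    # [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]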
Obviously, you could do both at the same time:
.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx",
   ...     start_row=2, row_limit=3,
   ...     start_column=1, column_limit=2)
   >>> print(json.dumps(partial_data))
   {"huge": [[23, 33], [24, 34], [25, 35]]}

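
The ``start_row`` and ``row_limit`` parameters can also be combined into a loop that walks a large sheet one page at a time. The generator below is a minimal sketch under stated assumptions: ``iter_row_pages`` and its ``reader`` parameter are hypothetical names, and the loop stops on an empty or short page rather than relying on any particular behavior of the library past the end of the sheet:

.. code-block:: python

    def iter_row_pages(source, sheet_name, page_size, reader=None):
        """Yield successive lists of at most `page_size` rows from one sheet."""
        if reader is None:
            # Imported lazily so that a stand-in reader can be used in tests
            from pyexcel_xlsxr import get_data as reader
        start = 0
        while True:
            page = reader(source, start_row=start, row_limit=page_size)
            rows = page.get(sheet_name, [])
            if not rows:
                break
            yield rows
            if len(rows) < page_size:
                # A short page means the end of the sheet was reached
                break
            start += page_size

For instance, ``for rows in iter_row_pages("huge_file.xlsx", "huge", 2): ...`` would process the sheet two rows at a time. Note that each page reopens the file, so larger pages mean fewer passes.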
.. testcode::
   :hide:

   >>> os.unlink("huge_file.xlsx")

As a pyexcel plugin
Explicit import is no longer needed since pyexcel version 0.2.2. Instead, this library is auto-loaded, so if you want to read data in xlsx format, installing it is enough.
Reading from an xlsx file
Here is the sample code:
.. code-block:: python
>>> import pyexcel as pe
>>> sheet = pe.get_book(file_name="your_file.xlsx")
>>> sheet
Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 2:
+-------+-------+-------+
| row 1 | row 2 | row 3 |
+-------+-------+-------+
.. testcode:: :hide:
>>> sheet.save_as("another_file.xlsx")
Reading from an IO instance
You have to wrap the binary content with a stream to get xlsx working:
.. code-block:: python
>>> # This is just an illustration
>>> # In reality, you might deal with xlsx file upload
>>> # where you will read from requests.FILES['YOUR_XLSX_FILE']
>>> xlsxfile = "another_file.xlsx"
>>> with open(xlsxfile, "rb") as f:
...     content = f.read()
...     r = pe.get_book(file_type="xlsx", file_content=content)
...     print(r)
...
Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 2:
+-------+-------+-------+
| row 1 | row 2 | row 3 |
+-------+-------+-------+
License
New BSD License
Developer guide
Development steps for code changes
#. git clone https://github.com/pyexcel/pyexcel-xlsxr.git
#. cd pyexcel-xlsxr
Upgrade your setup tools and pip. They are needed for development and testing only:
#. pip install --upgrade setuptools pip
Then install relevant development requirements:
#. pip install -r rnd_requirements.txt # if such a file exists
#. pip install -r requirements.txt
#. pip install -r tests/requirements.txt

Once you have finished your changes, please provide test case(s) and relevant documentation.
.. note::

   rnd_requirements.txt is usually created when a dependency has not yet
   been released. Once that dependency is released, the future version of
   it listed in requirements.txt will become valid.

How to test your contribution
Although nose and doctest are both used in code testing, it is advisable
to put unit tests in the tests directory. doctest is incorporated only to
make sure the code examples in the documentation remain valid across
different development releases.
On Linux/Unix systems, please launch your tests like this::
$ make
On Windows, please issue this command::
> test.bat
Before you commit
Please run::
$ make format
to format your code; otherwise, your build may fail.
Before you raise pull request
Please edit 'changelog.yml' and record your changes.
.. testcode::
   :hide:

   >>> import os
   >>> os.unlink("your_file.xlsx")
   >>> os.unlink("another_file.xlsx")