The DuckDB ReadStat Extension
March 20, 2025 ยท View on GitHub
Use this extension to read data sets from SAS, Stata, and SPSS from DuckDB with ReadStat.
Installation & Loading
Installation is simple through the DuckDB Community Extension repository, just type
INSTALL read_stat FROM community;
LOAD read_stat;
in a DuckDB instance near you.
The read_stat Function
The extension adds a single DuckDB table function, read_stat, which you use as follows:
-- Read a SAS `.sas7bdat` or `.xpt` file
FROM read_stat('sas_data.sas7bdat');
FROM read_stat('sas_data.xpt');
-- Read an SPSS `.sav`, `.zsav`, or `.por` file
FROM read_stat('spss_data.sav');
FROM read_stat('compressed_spss_data.zsav');
FROM read_stat('portable_spss_data.por');
-- Read a Stata .dta file
FROM read_stat('stata_data.dta');
If the file extension is not .sas7bdat, .xpt, .sav, .zsav, .por, or .dta,
use the read_stat function for the right file type with the format parameter:
FROM read_stat('sas_data.other_extension', format = 'sas7bdat');
FROM read_stat('sas_data.other_extension', format = 'xpt');
-- SPSS `.sav` and `.zsav` can both be read through the format `'sav'`
FROM read_stat(
'spss_data_possibly_compressed.other_extension',
format = 'sav'
);
FROM read_stat('portable_spss_data.other_extension', format = 'por');
FROM read_stat('stata_data.other_extension', format = 'dta');
Override the file character encoding inferred from the file with an iconv encoding name, see https://www.gnu.org/software/libiconv/:
FROM read_stat('latin1_encoded.sas7bdat', encoding = 'iso-8859-1');
If your files have the proper file extensions and you do not need to override their character encodings, a replacement scan is also available:
-- Read a SAS `.sas7bdat` or `.xpt` file
FROM 'sas_data.sas7bdat';
FROM 'sas_data.xpt';
-- Read an SPSS `.sav`, `.zsav`, or `.por` file
FROM 'spss_data.sav';
FROM 'compressed_spss_data.zsav';
FROM 'portable_spss_data.por';
-- Read a Stata .dta file
FROM 'stata_data.dta';
Contributing
Cloning
Clone the repo with submodules
git clone --recurse-submodules <repo>
Dependencies
In principle, compiling this template only requires a C/C++ toolchain. However, this template relies on some additional tooling to make life a little easier and to be able to share CI/CD infrastructure with extension templates for other languages:
- Python3
- Python3-venv
- Make
- CMake
- Git
- (Optional) Ninja + ccache
- vcpkg
Installing these dependencies will vary per platform:
- For Linux, these come generally pre-installed or are available through the distro-specific package manager.
- For MacOS, homebrew.
- For Windows, chocolatey.
Building
After installing the dependencies, building is a two-step process. Firstly run:
make configure
This will ensure a Python venv is set up with DuckDB and DuckDB's test runner installed. Additionally, depending on configuration, DuckDB will be used to determine the correct platform for which you are compiling.
Then, to build the extension run:
make debug
This delegates the build process to cargo, which will produce a shared library in target/debug/<shared_lib_name>. After this step,
a script is run to transform the shared library into a loadable extension by appending a binary footer. The resulting extension is written
to the build/debug directory.
To create optimized release binaries, simply run make release instead.
Faster builds
We recommend to install Ninja and Ccache for building as this can have a significant speed boost during development. After installing, ninja can be used by running:
make clean
GEN=ninja make debug
Testing
This extension uses the DuckDB Python client for testing. This should be automatically installed in the make configure step.
The tests themselves are written in the SQLLogicTest format, just like most of DuckDB's tests. A sample test can be found in
test/sql/<extension_name>.test. To run the tests using the debug build:
make test_debug
or for the release build:
make test_release
Version switching
Testing with different DuckDB versions is really simple:
First, run
make clean_all
to ensure the previous make configure step is deleted.
Then, run
DUCKDB_TEST_VERSION=v1.1.2 make configure
to select a different duckdb version to test with
Finally, build and test with
make debug
make test_debug
Using unstable Extension C API functionality
The DuckDB Extension C API has a stable part and an unstable part. By default, this template only allows usage of the stable part of the API. To switch it to allow using the unstable part, take the following steps:
Firstly, set your TARGET_DUCKDB_VERSION to your desired in ./Makefile. Then, run make update_duckdb_headers to ensure
the headers in ./duckdb_capi are set to the correct version. (FIXME: this is not yet working properly).
Finally, set USE_UNSTABLE_C_API to 1 in ./Makefile. That's all!