ChemInformant
April 18, 2026
ChemInformant is a robust data acquisition engine for the PubChem database, engineered for the modern scientific workflow. It intelligently manages network requests, performs rigorous runtime data validation, and delivers analysis-ready results, providing a dependable foundation for any computational chemistry project in Python.
✨ Key Features
- Analysis-Ready Pandas Output with SQL Export: The core API (`get_properties`) returns a clean Pandas DataFrame, and a dedicated `df_to_sql()` helper (plus the `chemfetch --format sql` CLI mode) persists results directly into SQLite / PostgreSQL / any SQLAlchemy backend, so you can move from query to database in two lines without hand-writing wrangling code.
- Automated Network Reliability: Ensures your workflows run flawlessly with built-in persistent caching, smart rate-limiting, and automatic retries. It also transparently handles API pagination (`ListKey`) for large-scale queries, delivering complete result sets without any manual intervention.
- Flexible & Fault-Tolerant Input: Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.
- A Dual API for Simplicity and Power: Offers a clear `get_<property>()` convenience layer for quick lookups, backed by a powerful `get_properties` engine for high-performance batch operations.
- Guaranteed Data Integrity: Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
- Terminal-Ready CLI Tools: Includes `chemfetch` and `chemdraw` for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.
- Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
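To make the "automatic retries" idea concrete, here is a minimal, generic sketch of retrying a flaky call with exponential backoff. It is not ChemInformant's internal implementation; `retry`, `flaky_fetch`, and the delay values are hypothetical names and numbers chosen for illustration only.

```python
import time


def retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on ConnectionError wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def flaky_fetch():
    """Hypothetical stand-in for a network request that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {"cid": 2244}


delays = []
# Record the delays instead of actually sleeping, so the sketch runs instantly.
result = retry(flaky_fetch, sleep=delays.append)
print(result, delays)
```

The caller sees only the final result. The two transient failures are absorbed by the wrapper, which is the behaviour the feature description promises at the library level.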
📦 Installation
Install the library from PyPI:
```bash
pip install ChemInformant
```
To include plotting capabilities for use with the tutorial, install the [plot] extra:
```bash
pip install "ChemInformant[plot]"
```
🚀 Quick Start
Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:
```python
import ChemInformant as ci

# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983]  # 1983 is paracetamol's CID

# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]

# 3. Call the core function
df = ci.get_properties(identifiers, properties)

# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")

# 5. Analyze your results!
print(df)
```
Output:
```
  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
```
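Once results are in SQLite, they can be queried with the standard-library `sqlite3` module. The sketch below rebuilds the `results` table by hand in an in-memory database so it runs without network access. It assumes that the stored column names mirror the DataFrame columns shown above, which is an assumption and not something this README states.

```python
import sqlite3

# In-memory stand-in for chem_data.db (assumed schema: same columns as the DataFrame).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep only successfully resolved compounds heavier than 160, heaviest first.
heavy = conn.execute(
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE status = 'OK' AND molecular_weight > 160 "
    "ORDER BY molecular_weight DESC"
).fetchall()
print(heavy)  # [('caffeine', 194.19), ('aspirin', 180.16)]
conn.close()
```

Against a real database file, replacing `":memory:"` with `"chem_data.db"` and dropping the table-creation lines should be the only change needed, provided the schema assumption holds.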
Convenience API Cheatsheet

| Function | Description |
|---|---|
| `get_weight(id)` | Molecular weight (float) |
| `get_formula(id)` | Molecular formula (str) |
| `get_cas(id)` | CAS Registry Number (str) |
| `get_iupac_name(id)` | IUPAC name (str) |
| `get_canonical_smiles(id)` | Canonical SMILES with Canonical→Connectivity fallback (str) |
| `get_isomeric_smiles(id)` | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| `get_xlogp(id)` | XLogP (calculated hydrophobicity) (float) |
| `get_synonyms(id)` | List of synonyms (List[str]) |
| `get_compound(id)` | Validated Compound object (Pydantic v2 model) |
Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.
All scalar get_<property>() functions accept a CID, name, or SMILES and return None/[] on failure. get_compound() / get_compounds() instead raise NotFoundError or AmbiguousIdentifierError so you can handle resolution failures explicitly.
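The two failure conventions call for two handling styles: a `None` check for scalar getters and a `try`/`except` for object getters. The sketch below shows the pattern with local stand-ins so it runs offline. The `NotFoundError` class, `get_weight`, and `get_compound` defined here are simplified placeholders that only mimic the contract described above, not the library's own objects. `describe` and `resolve` are hypothetical helper names.

```python
class NotFoundError(Exception):
    """Local stand-in for the exception named above."""


def get_weight(identifier):
    """Stand-in scalar getter: returns None when the lookup fails."""
    return {"aspirin": 180.16}.get(identifier)


def get_compound(identifier):
    """Stand-in object getter: raises when the lookup fails."""
    if identifier != "aspirin":
        raise NotFoundError(identifier)
    return {"cid": 2244}  # the real function returns a Compound model, not a dict


def describe(identifier):
    weight = get_weight(identifier)
    if weight is None:  # scalar getters signal failure with None
        return f"{identifier}: no weight available"
    return f"{identifier}: {weight}"


def resolve(identifier):
    try:
        return get_compound(identifier)
    except NotFoundError:  # object getters signal failure with an exception
        return None


print(describe("aspirin"))
print(describe("unobtainium"))
print(resolve("unobtainium"))
```

With the real package installed, the stand-ins would be replaced by the corresponding ChemInformant functions and exception classes. How the exceptions are imported is not shown in this README, so check the API reference for the exact path.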
ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:
- `chemfetch`: Fetches properties for one or more compounds, for example `chemfetch aspirin --props "cas,molecular_weight,iupac_name"`.
- `chemdraw`: Renders the 2D structure of a compound, for example `chemdraw aspirin`.
📚 Documentation & Examples
For a deep dive, please see our detailed guides:
- ➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
- ➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
- ➡️ Performance Benchmarks: Run integrated benchmarks with `pytest tests/test_benchmarks.py --benchmark-only` to see the performance advantages of batching and caching.
📖 Additional Resources & Use Cases
- Basic Usage Guide - Quick start examples for common tasks
- Advanced Usage Guide - Complex workflows and batch processing
- Caching Guide - Optimize performance with intelligent caching
- CLI Tools Documentation - Complete reference for `chemfetch` and `chemdraw`
- API Reference - Full function documentation with examples
🤔 Why ChemInformant?
ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. As a software package that has undergone rigorous peer review by both the Journal of Open Source Software (JOSS) and pyOpenSci, it delivers clean, validated, and analysis-ready Pandas DataFrames. This enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.
A detailed comparison with other existing tools is provided in our JOSS paper. For the story and the "why" behind the code, we've shared our thoughts in a post on the official pyOpenSci website.
🤝 Contributing
Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📑 Citation
```bibtex
@article{He2025,
  doi = {10.21105/joss.08341},
  url = {https://doi.org/10.21105/joss.08341},
  year = {2025},
  publisher = {The Open Journal},
  volume = {10},
  number = {112},
  pages = {8341},
  author = {He, Zhiang},
  title = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
  journal = {Journal of Open Source Software}
}
```