Klimatkollen Garbo AI
June 25, 2026
This is the main repository for the AI pipeline we call Garbo. Garbo runs via our validation frontend and pipeline-api, and is powered by LLMs that fetch and extract companies' self-reported GHG data. It automates data extraction, evaluation, and formatting, providing a streamlined workflow for handling environmental data.
Once invoked through the validation UI or pipeline-api, Garbo starts a pipeline of tasks and jobs that extract, evaluate and format the data autonomously.
Do you have an idea? Jump into the code or head to our Discord server to discuss your thoughts.
We use an open source queue manager called BullMQ, which relies on Redis. The extracted data is then stored in the database and Wikidata.
Current Status
Start the validation frontend and pipeline-api to run reports through the Garbo pipeline.
Data Flow
Some of the following steps are performed in parallel and most are asynchronous. If a process fails, it's important to be able to restart it after a new code release, so we can iterate on the prompts etc. without having to restart the whole pipeline.
flowchart TD
subgraph ingest["PDF ingestion"]
A[parsePdf] -->|not cached| B[doclingParsePDF]
B --> C[indexMarkdown]
C --> D[precheck]
A -->|cached| D
end
subgraph context["Company context"]
F[guessWikidata]
G[followUpFiscalYear]
H[extractEmissions]
D --> F
D --> G
F --> H
G --> H
end
subgraph followups["Follow-up extraction (parallel children of checkDB)"]
J[followUpIndustryGics]
K1[followUpScope1]
K2[followUpScope2]
L[followUpScope3]
M[followUpBiogenic]
N[followUpEconomy]
O[followUpGoals]
P[followUpInitiatives]
Q[followUpBaseYear]
CT[followUpCompanyTags]
LEI[extractLEI]
DESC[extractDescriptions]
end
H --> J & K1 & K2 & L & M & N & O & P & Q & CT & LEI & DESC
J & K1 & K2 & L & M & N & O & P & Q & CT & LEI & DESC --> R[checkDB]
subgraph persist["Diff, save and notify"]
DR[diffReportingPeriods]
DI[diffIndustry]
DG[diffGoals]
DBY[diffBaseYear]
DIN[diffInitiatives]
DLEI[diffLEI]
DD[diffDescriptions]
DT[diffTags]
Y[saveToAPI]
SC[sendCompanyLink]
end
R --> DR & DI & DG & DBY & DIN & DLEI & DD & DT
DR & DI & DG & DBY & DIN & DLEI & DD & DT --> Y
DR & DI & DG & DBY & DIN & DLEI & DD & DT --> SC
In BullMQ flows, child jobs finish before their parent runs. Diff jobs and saveToAPI are only enqueued when the matching follow-up returned data. sendCompanyLink is the flow parent that runs after all diff children complete. By default, an already-indexed PDF skips Docling; pass forceReindex: true to re-parse.
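The enqueue rule above can be sketched as a small, library-free model. The tree shape loosely mirrors the nested parent/children job definitions used in BullMQ flows, but the `FlowNode` type, the `buildPersistFlow` and `executionOrder` helpers, the follow-up-to-diff mapping, and the sample results are all illustrative assumptions rather than Garbo's actual code. saveToAPI is left out to keep the sketch short.

```typescript
// Simplified model of a flow tree: a parent job with optional children.
interface FlowNode {
  name: string
  children?: FlowNode[]
}

// Assumed mapping from a follow-up job to the diff job that consumes its result.
const DIFF_FOR_FOLLOW_UP: Record<string, string> = {
  followUpIndustryGics: 'diffIndustry',
  followUpGoals: 'diffGoals',
  followUpBaseYear: 'diffBaseYear',
  followUpInitiatives: 'diffInitiatives',
  extractLEI: 'diffLEI',
}

// Build the persist flow: one diff child per follow-up that returned data,
// with sendCompanyLink as the flow parent.
function buildPersistFlow(results: Record<string, unknown>): FlowNode {
  const children: FlowNode[] = []
  for (const [followUp, value] of Object.entries(results)) {
    const diff = DIFF_FOR_FOLLOW_UP[followUp]
    if (diff !== undefined && value !== null && value !== undefined) {
      children.push({ name: diff })
    }
  }
  return { name: 'sendCompanyLink', children }
}

// Children finish before their parent runs, which is a post-order traversal.
// Siblings run in parallel, so their relative order here carries no meaning.
function executionOrder(node: FlowNode): string[] {
  const order: string[] = []
  for (const child of node.children ?? []) {
    order.push(...executionOrder(child))
  }
  order.push(node.name)
  return order
}

// Hypothetical follow-up results: followUpBaseYear returned nothing.
const flow = buildPersistFlow({
  followUpGoals: { placeholder: true },
  followUpBaseYear: null,
  extractLEI: { placeholder: true },
})
console.log(executionOrder(flow))
```

In this model, the follow-up that returned no data produces no diff child, and the parent always appears last in the execution order.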
LEI resolution (extractLEI → diffLEI → saveToAPI) is documented in doc/pipeline.md#lei-legal-entity-identifier.
For a more in-depth explanation of the pipeline and its steps, continue here.
Get started 🚀
Ensure you have Node.js version 22.0.0 or higher installed. You will also need Docker (or Podman) to run containers.
Setting up environment variables
Make a copy of the file .env.example and name it .env. Fill it in using the instructions in the file.
Authentication
The API uses GitHub OAuth for authentication. The backend handles the OAuth flow through a single callback endpoint (/api/auth/github/callback) registered with GitHub, then redirects users to the appropriate frontend client based on the state parameter. Multiple frontend clients can use the same backend by passing an optional redirect_uri query parameter when initiating authentication.
Client API keys (X-API-Key): Read routes under /api/... (except /api/auth/... and the OpenAPI docs) require an X-API-Key header unless ALLOW_ANONYMOUS_CLIENT_API=true (intended for cutover only). Keys use the format garb_<lookup>.<secret>. To create keys, run npm run prisma migrate dev, optionally set GARBO_ALL_ACCESS_API_KEY / GARBO_BASE_API_KEY, and then run npm run prisma db seed (or run npm run seed:client-api to seed only the API key roles). Hashing uses API_SECRET as the pepper unless you set CLIENT_API_KEY_PEPPER. For validate / bolt dev, set GARBO_PROXY_CLIENT_API_KEY in .env so the Vite proxy can attach X-API-Key when forwarding to Garbo. For implementation details, the manual test matrix, and troubleshooting, see doc/API_KEYS.md.
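As a rough illustration of the key format and route rule described above, the sketch below parses a garb_<lookup>.<secret> key and decides whether a path needs the header. The helper names `parseClientApiKey` and `requiresClientApiKey` are hypothetical, and the sketch assumes that the lookup part contains no dot and that the OpenAPI docs live outside /api/. It does not model read versus write routes or the hashing step; doc/API_KEYS.md remains the reference for the real behaviour.

```typescript
// Parsed parts of a client API key in the documented format garb_<lookup>.<secret>.
interface ParsedClientApiKey {
  lookup: string
  secret: string
}

const KEY_PREFIX = 'garb_'

// Returns the lookup and secret parts, or null if the key is malformed.
// Assumption: the key is split at the first dot after the prefix.
function parseClientApiKey(raw: string): ParsedClientApiKey | null {
  if (!raw.startsWith(KEY_PREFIX)) return null
  const body = raw.slice(KEY_PREFIX.length)
  const dot = body.indexOf('.')
  if (dot <= 0 || dot === body.length - 1) return null
  return { lookup: body.slice(0, dot), secret: body.slice(dot + 1) }
}

// Decide whether a request path needs an X-API-Key header.
// allowAnonymous stands in for ALLOW_ANONYMOUS_CLIENT_API=true.
function requiresClientApiKey(path: string, allowAnonymous: boolean): boolean {
  if (allowAnonymous) return false
  if (!path.startsWith('/api/')) return false
  if (path.startsWith('/api/auth/')) return false
  return true
}

// The key and path below are made-up examples.
console.log(parseClientApiKey('garb_abc123.s3cr3t'))
console.log(requiresClientApiKey('/api/companies', false))
```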
Installing dependencies
npm i
Note
If you use a Unix-based operating system, you might need to install additional dependencies for the third-party package canvas and PDF2Image. Follow the instructions at canvas and PDF2Image.
Starting the containers
This project expects some containers running in the background to work properly. We use Postgres as our primary database, Redis for BullMQ, and ChromaDB for embeddings. PDF parsing uses Docling via local docling-serve.
For local development, uncomment the docling service in docker-compose.yaml and set DOCLING_URL=http://localhost:5001/v1 and DOCLING_USE_LOCAL=true in .env before starting the containers.
The simplest way to start the containers is to run the following docker command.
docker compose up
You may want a graphical user interface to make it easier to manage your local containers. [Podman desktop](https://podman-desktop.io/) and [Rancher desktop](https://rancherdesktop.io/) are both good alternatives.
Seeding the database for development
This applies migrations and generates the Prisma client. Seed development data (users, tags, GICS, API keys, etc.) in a separate step.
npm run prisma migrate dev
npm run prisma db seed
Optional: Restoring a database backup with test data
Note
This step is very helpful to get a good starting point for developing and testing the frontend and/or the API. However, you may also skip it if you want to start with a clean database.
First, ask one of the Klimatkollen team members and they will send you a database backup.
Not required the first time: Delete the database to make sure it doesn't exist:
docker exec -i garbo_postgres dropdb -f -U postgres --if-exists garbo
Then, replace ~/Downloads/backup_garbo_XYZ.dump with the path to your DB backup file and restore the database backup with the following command:
docker exec -i garbo_postgres pg_restore -C -v -d postgres -U postgres < ~/Downloads/backup_garbo_XYZ.dump
Starting the Garbo project in development mode
The code can be started in three main ways, depending on what you plan to develop, test or run locally.
1) To serve only the API
Note
If you plan to develop the frontend and/or the API, this is the best way to get started:
npm run dev-api
This starts the API, and makes it possible to view the OpenAPI documentation at http://localhost:3000/reference (from OPENAPI_PREFIX; must not be api, which collides with REST /api/*).
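The constraint on OPENAPI_PREFIX can be expressed as a small startup check. This is only a sketch: `resolveOpenApiPrefix` is a hypothetical helper, and the `reference` default is an assumption inferred from the URL above, not a documented default.

```typescript
// Normalize and validate a docs prefix such as the value of OPENAPI_PREFIX.
// Assumption: when the variable is unset, the docs are served under "reference".
function resolveOpenApiPrefix(raw: string | undefined): string {
  // Trim whitespace and strip leading/trailing slashes.
  const prefix = (raw ?? 'reference').trim().replace(/^\/+|\/+$/g, '')
  if (prefix === '') {
    throw new Error('OPENAPI_PREFIX must not be empty')
  }
  const lower = prefix.toLowerCase()
  // "api" would collide with the REST routes under /api/*.
  if (lower === 'api' || lower.startsWith('api/')) {
    throw new Error('OPENAPI_PREFIX must not be "api": it collides with REST /api/*')
  }
  return prefix
}

console.log(resolveOpenApiPrefix(undefined))
```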
2) To start the AI pipeline, BullMQ admin dashboard and the API:
If you plan to develop the AI pipeline, this is the recommended way to start the code.
First, run the following command to start the API and the queue system, including an admin dashboard to view progress, logs and more.
npm run dev-board
In addition to accessing the local API, you can now view the BullMQ dashboard at http://localhost:3000/admin/queues.
The BullMQ dashboard is useful to develop and debug how garbo is extracting data from reports. A common workflow is to run a report through the garbo pipeline and then follow the progress in the BullMQ dashboard to view logs, errors and restart jobs. When updating code or prompts in the workers that make up what we call the garbo pipeline, it's possible to restart a job partway through the pipeline, to make it both easier and faster to iterate on changes.
Then, open another terminal and start the AI pipeline and its workers, which are responsible for processing each report. These can be scaled horizontally.
npm run dev-workers
3) Starting everything concurrently
Get everything up and running with one command (with all output in one terminal).
npm run dev
4) (Optional) Redis Insight
a). Start Redis Insight
Run the following Docker command to start Redis Insight:
docker run -d --name redisinsight -p 5540:5540 redislabs/redisinsight:latest
b). Connect Redis Insight to Redis
- Host: garbo_redis (or the Redis container IP, e.g., 172.17.0.2)
- Port: 6379
- Username: Leave empty or use default.
- Password: Leave empty.
c). Access Redis Insight
Go to http://localhost:5540 in your browser and add the Redis database using the above connection details.
Flushing Redis Cache
If you're experiencing issues with Redis or need to clear all data, you can flush the Redis cache using the following command:
redis-cli -h 127.0.0.1 -p 6379 -a [PASS] KEYS '*' | xargs redis-cli -h 127.0.0.1 -p 6379 -a [PASS] DEL
Make sure to replace [PASS] with your actual Redis password. If your Redis instance doesn't use a password, you can omit the -a [PASS] part.
Setup completed 🎉
Well done! You've now set up the garbo backend and are ready to start development :)
How to run the pipeline
Use the validation frontend with Garbo running locally (npm run dev or npm run dev-board + npm run dev-workers).
- Start the API and workers (see above).
- Start the validation frontend and point it at http://localhost:3000.
- Submit a report URL through the UI.
- Monitor progress in the BullMQ dashboard at http://localhost:3000/admin/queues and restart failed jobs as needed.
Job payload options (autoApprove, forceReindex, tags, etc.) are documented in doc/PIPELINE_RUN_AND_TAGS.md.
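To make the options more concrete, here is a minimal sketch of how such a payload might be typed and how forceReindex could interact with the Docling cache. Only the option names autoApprove, forceReindex and tags come from this README; the `url` field, the field types, and the `shouldParseWithDocling` helper are assumptions for illustration. See doc/PIPELINE_RUN_AND_TAGS.md for the actual payload.

```typescript
// Hypothetical payload shape; only the option names come from the README.
interface PipelineJobOptions {
  url: string // assumed field holding the report URL
  autoApprove?: boolean
  forceReindex?: boolean
  tags?: string[]
}

// An already-indexed PDF skips Docling unless forceReindex is true.
function shouldParseWithDocling(
  alreadyIndexed: boolean,
  options: PipelineJobOptions,
): boolean {
  return !alreadyIndexed || options.forceReindex === true
}

// Made-up example job for a report that has been indexed before.
const job: PipelineJobOptions = {
  url: 'https://example.com/report.pdf',
  forceReindex: true,
  tags: ['example-tag'],
}
console.log(shouldParseWithDocling(true, job))
```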
How to make a backup of the local postgres DB
docker exec -i garbo_postgres pg_dump -U postgres -Fc -d garbo > ~/Downloads/backup_garbo_XYZ.dump
Testing DB migrations
These steps can be useful to test DB migrations with data similar to the production environment.
- Recommended: Create a local test DB. This allows you to keep your regular development DB intact.
docker run -d -p 5432:5432 --name garbo_test_postgres -e POSTGRES_PASSWORD=mysecretpassword postgres
Alternatively, make sure your local postgres container is running.
- Ask one of the Klimatkollen team members and they will send you a database backup.
- Delete the database if it exists:
docker exec -i garbo_test_postgres dropdb -f -U postgres --if-exists garbo
- Restore the backup. This initially connects to the default postgres database without making any modifications, and then creates any databases that do not exist:
docker exec -i garbo_test_postgres pg_restore -C -v -d postgres -U postgres < ~/Downloads/backup_garbo_XYZ.dump
- Test the DB migrations with npm run prisma migrate dev.
- Restart the Garbo API and workers and verify the migration was successful.
Testing
To run the tests, use the following command:
npm test
How to run with Docker
To run the application:
docker run -d -p 3000:3000 ghcr.io/klimatbyran/garbo npm start
# start as many workers as you want:
docker run -d ghcr.io/klimatbyran/garbo npm run workers
docker run -d ghcr.io/klimatbyran/garbo npm run workers
docker run -d ghcr.io/klimatbyran/garbo npm run workers
# the first time, you need to initialize the postgres database:
npm run prisma db push # create tables
npm run prisma db seed # seed the data with initial content
Operations / DevOps
This application is deployed in production with Kubernetes and uses FluxCD as its CD pipeline. The yaml files in the k8s directory are automatically synced to the cluster. If you want to run a fork of the application yourself, note that the production cluster uses Helm charts along the lines of:
postgresql (bitnami)
redis (bitnami)
chromadb
To create a secret in the k8s cluster, use this command to transfer your .env file to the cluster as a secret:
kubectl create secret generic env --from-env-file=.env
Contributing
We welcome contributions! Please see our CONTRIBUTING.md for guidelines on how to contribute to this project.
Contact
For any questions or issues, please contact the maintainers at hej@klimatkollen.se and you will get an invite to our Discord.
License
This project is licensed under the terms of the Apache 2.0 license © Klimatbyrån Ideell Förening.