
May 22, 2026

Harpoon

An ecosystem of crawlers for detecting leaks, sensitive data exposure, and data exfiltration attempts



Summary

⚠️ Warning: Harpoon is currently in development. You've been warned :) Please consider contributing!

Harpoon consists of several crawlers that together form a single ecosystem. They monitor public or otherwise authorized channels, such as code hosting platforms, search engines, and paste sites, to detect leaks, discover exposed sensitive files, and monitor for data exfiltration.


How it works

Image

Current implementation focus

  • REST API for alerts, companies, history and rules
  • Bing crawler for public search discovery
  • Pastebin crawler for public paste monitoring
  • Shared crawler detection modules for matching and persistence
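To make the "matching" part of the shared detection modules more concrete, here is a minimal sketch of rule matching against fetched content. It is an illustration only, not Harpoon's actual implementation: the `match_rules` function name and the rule format (one extended regular expression per line, `#` for comments) are assumptions, since this README does not document how rules are stored.

```shell
# Hypothetical sketch: report which rules match a piece of fetched content.
# Assumed rule format: one extended regex per line; blank lines and
# lines starting with '#' are ignored.
match_rules() {
  rules_file="$1"
  content_file="$2"
  while IFS= read -r rule; do
    # Skip blank lines and comments in the rules file.
    case "$rule" in ''|'#'*) continue ;; esac
    # -E: extended regex, -i: case-insensitive, -q: quiet, -e: pattern follows.
    if grep -Eiq -e "$rule" "$content_file"; then
      printf 'match: %s\n' "$rule"
    fi
  done < "$rules_file"
}
```

A crawler following this approach could save each fetched page or paste to a temporary file, call `match_rules rules.txt page.txt`, and turn each `match:` line into an alert for the persistence layer.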

Roadmap constraints

Future integrations should stay within authorized monitoring boundaries. Safe next steps include API hardening, notification workers, dashboards, and documented integrations with approved data sources. This repository should not rely on breached credential datasets, hidden services, or scraping private groups or accounts.


Download and setup

  # Download
  $ git clone https://github.com/htrgouvea/harpoon && cd harpoon

  # Start the REST API stack from the current api/ directory
  $ cd api
  $ cat > .env <<'EOF'
  APPLICATION_PORT=5000
  DATABASE_HOST=db
  DATABASE_PORT=5432
  DATABASE_NAME=harpoon
  DATABASE_USER=harpoon
  DATABASE_PASSWORD=harpoon
  EOF

  # Build and start Postgres + API
  $ docker compose up --build -d

  # API will be available on http://localhost:5000
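Before starting the stack, it can help to confirm that `.env` defines every variable listed above. The helper below is a sketch and not part of the repository; the `missing_env_keys` name is hypothetical, and it only checks that each key has a non-empty value, not that the value is valid.

```shell
# Hypothetical helper: print every required key that is absent from an env file.
# The key list mirrors the .env example in this README.
missing_env_keys() {
  env_file="$1"
  for key in APPLICATION_PORT DATABASE_HOST DATABASE_PORT \
             DATABASE_NAME DATABASE_USER DATABASE_PASSWORD; do
    # A key counts as present only if a line starts with KEY= and has a value.
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}

# Usage, run from api/ after creating .env:
#   missing_env_keys .env        # prints nothing when all keys are set
#   docker compose ps            # lists the containers started by Compose
#   curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5000/
```

The `curl` line only shows whether something answers on port 5000. This README does not document the API routes, so a `404` on `/` may still mean the API is up.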

Contribution

Your contributions and suggestions are heartily ♥ welcome. See the contribution guidelines. Please report bugs via the issues page, and for security issues, see the security policy. (✿ ◕‿◕)


License

This work is licensed under the MIT License.