socid_extractor
May 9, 2026
Turn any public profile page into a structured account record — usernames, display names, bios, avatars, locations, joined-at dates, follower counts, external links, and the stable internal identifiers that uniquely pin an account across renames, redesigns, and deletions.
socid_extractor parses HTML pages and API responses from 130+ platforms and returns a flat, machine-readable dictionary of account fields. No API keys required, no headless browser — just a single function call on response text.
Why it's useful
- Stable cross-service IDs. Get GAIA ID (Google), Facebook UID, Yandex Public ID, Instagram pk, and dozens more — values that survive username changes and let you correlate accounts across leaks, archives, and search-engine indices.
- One uniform interface. The same extract() call works for Instagram, GitHub, VK, Reddit, Substack, Bluesky, TikTok, and the rest, with no per-platform glue code on your side.
- Field ontology. Normalized field names across platforms (username, fullname, created_at, is_verified, …) so downstream pipelines don't need 130 mappings.
- Battle-tested. Powers Maigret and a number of other OSINT tools.
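Because the field names are normalized, downstream code can read the same keys regardless of platform. A minimal sketch of that idea (the records below are illustrative stand-ins for extractor output, not real data):

```python
# Illustrative records shaped like extractor output for two platforms.
# The field names (username, fullname, created_at) follow the normalized
# ontology; the values are made up for this example.
records = [
    {"platform": "github", "username": "octocat", "fullname": "The Octocat",
     "created_at": "2011-01-25"},
    {"platform": "instagram", "username": "muse.mercier", "fullname": "Muse Mercier",
     "created_at": "2015-03-02"},
]

def summarize(record: dict) -> str:
    # One code path handles every platform thanks to the shared field names.
    return f"{record['platform']}: {record['username']} ({record.get('fullname', '?')})"

for r in records:
    print(summarize(r))
```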
Installation
Python: 3.10+.
pip install socid-extractor
For a clean CLI install on a workstation:
pipx install socid-extractor
The latest development version:
pip install -U git+https://github.com/soxoj/socid-extractor.git
Quick start
As a CLI:
$ socid_extractor --url https://www.deviantart.com/muse1908
country: France
created_at: 2005-06-16 18:17:41
gender: female
username: Muse1908
website: www.patreon.com/musemercier
links: ['https://www.facebook.com/musemercier', 'https://www.instagram.com/muse.mercier/', 'https://www.patreon.com/musemercier']
tagline: Nothing worth having is easy...
As a Python library:
import requests
import socid_extractor
r = requests.get('https://www.patreon.com/annetlovart')
print(socid_extractor.extract(r.text))
# {'patreon_id': '33913189', 'patreon_username': 'annetlovart',
# 'fullname': 'Annet Lovart',
# 'links': "['https://www.facebook.com/322598031832479', ...]"}
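Because extract() returns a flat dictionary, feeding results into a pipeline can be as simple as appending JSON Lines. A sketch using the sample record above (in a real run the record would come from socid_extractor.extract; the accounts.jsonl filename is just an example):

```python
import json
from pathlib import Path

# This record mirrors the sample output above; in a real run it would come
# from socid_extractor.extract(r.text).
record = {"patreon_id": "33913189", "patreon_username": "annetlovart",
          "fullname": "Annet Lovart"}

out = Path("accounts.jsonl")
with out.open("a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reading it back for a downstream step:
rows = [json.loads(line) for line in out.read_text(encoding="utf-8").splitlines()]
print(rows[0]["patreon_id"])  # -> 33913189
```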
Tip — batch runs: pass --skip-fetch-if-no-url-hint to skip the HTTP request when the URL doesn't match any known site hint (faster, but may skip generic engines such as forum templates):
$ socid_extractor --url https://example.com/foo --skip-fetch-if-no-url-hint
Supported sites
130+ schemes — see METHODS.md for the full list.
A non-exhaustive sample:
- Major networks: Facebook (user & group pages), Instagram, VK.com, OK.ru, Reddit, TikTok, Bluesky, Tumblr, Flickr
- Google ecosystem: Google docs/maps contributions (cookies required), Google Play, YouTube
- Mail.ru: my.mail.ru user main page, photo, and video pages
- Dev / writing platforms: GitHub, Stack Overflow (HTML + API), LeetCode, Hashnode, Medium, Substack, Paragraph, WordPress.org, Virgool
- Forums (universal detectors): Discourse, MediaWiki / Fandom wikis, Mastodon
- Niche / vertical: Chess.com, Roblox, MyAnimeList, Scratch, Wikipedia, DailyMotion, SlideShare, Weebly, Calendly, Amazon Author, Boosty, Warpcast (Farcaster), Fragment (TON/Telegram), Rarible, CSSBattle, lnk.bio, Spatial, TwitchTracker, Max (max.ru)
…and many others.
For data examples, see tests/test_e2e.py; for the parsing logic, see socid_extractor/schemes.py; for the field ontology, see FIELDS.md.
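Conceptually, each scheme pairs detection hints with an extraction pattern: a page matches a scheme when its telltale markers are present, and fields are then pulled out of the text. The following self-contained illustration shows that idea only; it is not the library's actual data structures, and the scheme name, markers, and regex are invented:

```python
import re

# Simplified illustration of the detect-then-extract idea. A page "matches"
# a scheme when all flag substrings are present; fields are then captured
# with a regex. Everything here is made up for demonstration.
SCHEMES = {
    "ExampleSite profile": {
        "flags": ['class="profile-card"'],
        "regex": re.compile(r'data-user-id="(?P<uid>\d+)"\s+data-name="(?P<username>[^"]+)"'),
    },
}

def extract_demo(html: str) -> dict:
    for name, scheme in SCHEMES.items():
        if all(flag in html for flag in scheme["flags"]):
            m = scheme["regex"].search(html)
            if m:
                return m.groupdict()
    return {}

html = '<div class="profile-card" data-user-id="42" data-name="alice"></div>'
print(extract_demo(html))  # -> {'uid': '42', 'username': 'alice'}
```

For the real scheme definitions, consult socid_extractor/schemes.py.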
Use cases
- Pivot from a profile to everything it exposes. One call returns the visible fields plus the hidden internal IDs the platform uses behind the scenes. Background reading: Week in OSINT — Getting a grasp on Google IDs.
- Track accounts across renames, redesigns, and deletions. Stable IDs (GAIA, FB UID, Yandex Public ID, Instagram pk, …) let you re-identify the same person even when every visible field has changed. Background: Aware Online — User IDs in social-media investigations.
- Search by cross-service UID. Once you have a stable identifier you can pivot into:
- SQL / leaked databases (forum dumps, breach data) where the UID is the join key,
- Google / Yandex / archive.org indices that captured URLs containing the UID.
- Feed downstream OSINT tooling. A normalized record is much easier to ingest than per-site scrapers — used by Maigret and similar tools for enrichment.
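The join-key pivot above can be sketched with sqlite3 from the standard library. The table layout, column names, and data here are hypothetical; the point is that the stable UID, not the username, is what joins a live extraction to archived or leaked data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE extracted(uid TEXT, platform TEXT, username TEXT);
    CREATE TABLE dump(uid TEXT, email TEXT);  -- e.g. rows from a leaked forum dump
    INSERT INTO extracted VALUES ('33913189', 'patreon', 'annetlovart');
    INSERT INTO dump VALUES ('33913189', 'user@example.com');
""")

# The stable UID is the join key between live extraction and archived data;
# it still matches even if the visible username has since changed.
row = conn.execute("""
    SELECT e.username, d.email
    FROM extracted e JOIN dump d ON d.uid = e.uid
""").fetchone()
print(row)  # -> ('annetlovart', 'user@example.com')
```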
SOWEL classification
Maps to the following SOWEL techniques:
Tools using socid_extractor
- Maigret — powerful namechecker that generates a report with all available info from accounts found across 3000+ sites.
- TheScrapper — scrape emails, phone numbers, and social-media accounts from a website.
- InfoHunter — open-source OSINT tool to search, collect, and analyze information online.
- YaSeeker — gather all available information about a Yandex account by login/email.
- Marple — scrape search-engine results for a given username.
Testing
python3 -m pytest tests/test_e2e.py -n 10 -k 'not cookies' -m 'not github_failed and not rate_limited'
Every new scheme must have an e2e test in tests/test_e2e.py hitting a real URL/API. Unit tests with inline fixtures (tests/test_socid_improvements.py) are also required but do not replace e2e coverage. See docs/testing-and-ci.md for details.
Developer documentation (architecture, modules, CI) lives in docs/.
Contributing
See the contributing guide if you want to add a new scheme or fix anything.