transfermarkt-scraper

March 21, 2026

A web scraper for collecting data from the Transfermarkt website. It recurses into the Transfermarkt hierarchy to find competitions, games, clubs, players, appearances, national teams and their competitions, and extracts them as JSON objects.

The scraper follows two parallel hierarchies:

# Club football
Confederations ====> Competitions ====> Clubs ====> Players ====> Appearances
                                    ====> Games ====> Game Lineups
                                    ====> Tournament Editions ====> Games

# International football
Confederations ====> Countries ====> National Teams ====> Players ====> Appearances
               ====> Competitions (national team competitions: World Cup, Euros, etc.)

Each one of these entities can be discovered and refreshed separately by invoking the corresponding crawler.

Installation

This project is built on Crawlee for Python and is run through its CLI entry point (python -m tfmkt). All dependencies can be installed with Poetry:

cd transfermarkt-scraper
poetry install
poetry shell

Usage

The examples below show some common ways to run the scraper.

# discover confederations and competitions on separate invocations
python -m tfmkt confederations > confederations.json
python -m tfmkt competitions -p confederations.json > competitions.json

# you can use intermediate files or pipe crawlers one after the other to traverse the hierarchy
cat competitions.json | head -2 \
    | python -m tfmkt clubs \
    | python -m tfmkt players \
    | python -m tfmkt appearances

# scrape national team competitions (World Cup, Euros, Nations League, etc.)
# these are emitted alongside domestic competitions when running the competitions crawler
python -m tfmkt confederations \
    | python -m tfmkt competitions \
    | grep -v '"country_name"' > national_team_competitions.json

# scrape national team squads
python -m tfmkt confederations \
    | python -m tfmkt countries \
    | python -m tfmkt national_teams > national_teams.json

# scrape players from a national team squad
cat national_teams.json | head -1 | python -m tfmkt players

# list all historical World Cup editions (year, winner, season id)
echo '{"type":"competition","competition_type":"world_cup","href":"/world-cup/startseite/pokalwettbewerb/FIWC","competition_name":"World Cup"}' \
    | python -m tfmkt tournament_editions > world_cup_editions.json

# scrape games for a specific World Cup edition
# note: Transfermarkt identifies tournaments such as the World Cup and the Euros by season=<year-1> (e.g. 2021 for Qatar 2022)
echo '{"type":"competition","competition_type":"world_cup","href":"/world-cup/startseite/pokalwettbewerb/FIWC","competition_name":"World Cup"}' \
    | python -m tfmkt games --season 2021 > world_cup_2022_games.json

# scrape games for UEFA Euro 2024 (season=2023 on Transfermarkt)
echo '{"type":"competition","competition_type":"uefa_euro","href":"/uefa-euro/startseite/pokalwettbewerb/EURO","competition_name":"UEFA Euro"}' \
    | python -m tfmkt games --season 2023 > euro_2024_games.json
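
The season convention in these examples can be captured in a small helper that builds both the seed competition line and the argument list for the games crawler. This is a hypothetical sketch: tournament_season and games_invocation are illustrative names, not part of tfmkt, and the year - 1 rule is assumed to hold only for tournaments like the World Cup and the Euros shown above.

```python
import json
from typing import List, Tuple


def tournament_season(tournament_year: int) -> int:
    """Map a tournament's year to the Transfermarkt season id.

    Assumed rule, taken from the usage examples:
    World Cup 2022 -> 2021, UEFA Euro 2024 -> 2023.
    """
    return tournament_year - 1


def games_invocation(competition: dict, tournament_year: int) -> Tuple[str, List[str]]:
    """Return (stdin line, argv) for running the games crawler on one edition."""
    stdin_line = json.dumps(competition)
    argv = ["python", "-m", "tfmkt", "games",
            "--season", str(tournament_season(tournament_year))]
    return stdin_line, argv


# Seed item copied from the World Cup example above.
world_cup = {
    "type": "competition",
    "competition_type": "world_cup",
    "href": "/world-cup/startseite/pokalwettbewerb/FIWC",
    "competition_name": "World Cup",
}
line, argv = games_invocation(world_cup, 2022)
```

The pair could then be passed to subprocess.run(argv, input=line + "\n", capture_output=True, text=True) to run the crawler from Python, with the extracted games in the result's stdout.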

Alternatively, you can use the dcaribou/transfermarkt-scraper Docker image:

docker run \
    -ti -v "$(pwd)"/.:/app \
    dcaribou/transfermarkt-scraper:main \
    python -m tfmkt competitions -p samples/confederations.json

Items are extracted as JSON, one object per line, and printed to stdout. Samples of extracted data are provided in the samples folder.
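
Because each item is a standalone JSON object on its own line, crawler output can also be processed in Python with the standard json module, for instance to replace the grep -v '"country_name"' filter from the usage examples with a check on parsed keys. A minimal sketch: read_items, national_team_competitions and count_by_type are illustrative names, and the only field assumptions are the type key seen in the seed items and the absence of country_name on national team competitions, as that grep filter implies.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List


def read_items(lines: Iterable[str]) -> List[dict]:
    """Parse crawler output: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def national_team_competitions(items: Iterable[dict]) -> Iterator[dict]:
    """Keep competitions without a "country_name" key.

    Same intent as grep -v '"country_name"', but it inspects the parsed
    keys instead of the raw line text.
    """
    return (item for item in items if "country_name" not in item)


def count_by_type(items: Iterable[dict]) -> Counter:
    """Count items by their "type" field (None when the field is missing)."""
    return Counter(item.get("type") for item in items)
```

For example, with open("competitions.json", encoding="utf-8") as f, calling list(national_team_competitions(read_items(f))) yields the national team competitions from a saved competitions run.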

Crawlers

| Crawler | Input | Output | Notes |
|---|---|---|---|
| confederations | (none) | Confederation | 5 items: Europa, América, África, Asia, FIFA |
| competitions | Confederation | Competition | Domestic + national team competitions per confederation |
| countries | Confederation | Country | One item per country (league-bearing nations) |
| clubs | Competition (first_tier) | Club | Club squads with market value, coach, stadium |
| national_teams | Country | National Team | Senior national team per country |
| players | Club or National Team | Player | Full player profile including market value history |
| appearances | Player | Appearance | Per-match stats for every game played |
| tournament_editions | Competition | Tournament Edition | Historical editions with year, season, winner, coach |
| games | Competition | Game | Match result, events, managers. Use --season to select the edition (e.g. --season 2021 for Qatar 2022, --season 2023 for Euro 2024) |
| game_lineups | Game | Game Lineups | Starting XI, substitutes, formation |
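
The Input column defines a parent/child chain, so the sequence of crawlers needed to reach any entity can be derived mechanically. The sketch below encodes the table as a hypothetical PARENT mapping; where the table lists two possible inputs (players accepts a club or a national team) it picks the club route, and confederations is treated as the root because it runs without parents in the usage examples.

```python
from typing import Dict, List, Optional

# Parent crawler for each crawler, transcribed from the Input column.
# Assumption: "players" follows the club route; it can also take
# national_teams output as input.
PARENT: Dict[str, Optional[str]] = {
    "confederations": None,
    "competitions": "confederations",
    "countries": "confederations",
    "clubs": "competitions",
    "national_teams": "countries",
    "players": "clubs",
    "appearances": "players",
    "tournament_editions": "competitions",
    "games": "competitions",
    "game_lineups": "games",
}


def crawler_chain(target: str) -> List[str]:
    """Return the crawlers to run, root first, to produce `target` items."""
    chain: List[str] = []
    current: Optional[str] = target
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def pipeline(target: str) -> str:
    """Render the chain as a shell pipeline of tfmkt invocations."""
    return " | ".join(f"python -m tfmkt {name}" for name in crawler_chain(target))
```

The rendered pipeline crawls everything at each level and does not restrict competitions to first-tier ones, so in practice you would trim the parents (for example with head, as in the usage examples) before the expensive steps.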

Check out transfermarkt-datasets to see transfermarkt-scraper in action on a real project.

arguments

  • -p / --parents: The parent entities to crawl from, read either from a file or from piped input. For example, competitions is the parent of clubs, which in turn is the parent of players.
  • -s / --season: The season to run the crawler for. Defaults to the most recent season.
  • --base-url: Override the base Transfermarkt URL.
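
These arguments map naturally onto a standard argparse definition. The sketch below is illustrative only, not the project's actual CLI code: build_parser and open_parents are hypothetical names, and it simply shows how a -p/--parents option can fall back to stdin when omitted, which is what lets both the file form and the piped form in the usage examples work. The None default for --season stands in for "most recent season"; how the real crawler resolves that is not shown here.

```python
import argparse
import sys
from typing import Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(prog="tfmkt-sketch")
    parser.add_argument("crawler", help="crawler name, e.g. clubs")
    parser.add_argument("-p", "--parents",
                        help="file with parent items (default: stdin)")
    parser.add_argument("-s", "--season", type=int, default=None,
                        help="season id (default: most recent season)")
    parser.add_argument("--base-url", default=None,
                        help="override the base Transfermarkt URL")
    return parser


def open_parents(path: Optional[str], stdin: TextIO = sys.stdin) -> TextIO:
    """Read parents from the given file, or from stdin when no file is given.

    When a path is given, the caller is responsible for closing the handle.
    """
    return open(path, encoding="utf-8") if path else stdin
```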

contribute

Extending existing crawlers in this project to scrape additional data, or even creating new crawlers, is quite straightforward. If you want to contribute an enhancement to transfermarkt-scraper, I suggest following a workflow similar to this:

  1. Fork the repository
  2. Modify or add new crawlers in tfmkt/crawlers. Here is an example PR that extends the games crawler to scrape a few additional fields from the Transfermarkt games page.
  3. Create a PR with your changes and a short description for the enhancement and send it over :rocket:

It is usually a good idea to discuss the enhancement briefly beforehand. If you want to propose a change and collect some feedback before you start coding, you can do so by creating an issue with your idea in the Issues section.