Polyglot Code Scanner
April 18, 2026
This is part of my Polyglot Code tools - for the main documentation, see https://polyglot.korny.info
A note about releases
Binary releases are working again - see https://github.com/kornysietsma/polyglot-code-scanner/releases to download them.
However, there are no binaries for M1 Macs - GitHub Actions doesn't yet support M1 Macs for free, so you'll have to build binaries yourself for now.
On macOS you also need to run xattr -d com.apple.quarantine polyglot-code-scanner-x86_64-macos to remove the quarantine attribute that macOS adds to all downloaded binaries.
Intro
This application scans source code directories, identifying a range of code metrics and other data, and storing the results in a JSON file for later visualisation by the Polyglot Code Explorer.
Installation and running
See also https://polyglot.korny.info/tools/scanner/howto for more detailed instructions for building binary releases, and running the scanner.
To compile and run from source, you'll need to install Rust and Cargo. Then, from a copy of this project, you can build a binary package with:
cargo build --release
The binary will be built in the target/release directory.
Running from source
You can also just run it from the source directory with cargo run polyglot_code_scanner -- (other command line arguments). The scanner itself will run slower, as the code is un-optimised and carries more debug information, but the build is a lot quicker, which suits development.
Getting help
See https://polyglot.korny.info for the main documentation for this project.
You can get up-to-date command-line help by running
polyglot_code_scanner -h
Known Limitations
SHA-256 Git Repositories
The scanner does not currently support scanning git repositories that use SHA-256 object hashes (created with git init --object-format=sha256). This is due to libgit2 not yet supporting SHA-256 repositories. When libgit2 adds this support, the scanner can be updated to use it.
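If you want to check a repository before scanning it, the object format can be read from its git configuration. The following is a minimal sketch, not part of the scanner: the function name is hypothetical, and it assumes you pass in the text of the repository's .git/config file. It relies on git recording objectformat = sha256 in the [extensions] section of SHA-256 repositories, and it does not handle quoted values or included config files.

```rust
/// Returns true if the given git config text declares the SHA-256 object format.
///
/// Assumption: `config_text` is the content of a repository's `.git/config`.
/// This is a simplified parser for illustration only.
pub fn uses_sha256_object_format(config_text: &str) -> bool {
    let mut in_extensions = false;
    for raw_line in config_text.lines() {
        let line = raw_line.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        if line.starts_with('[') {
            // Section names are case-insensitive in git config.
            let section = line.trim_start_matches('[').trim_end_matches(']').trim();
            in_extensions = section.eq_ignore_ascii_case("extensions");
            continue;
        }
        if in_extensions {
            if let Some((key, value)) = line.split_once('=') {
                // Key names are case-insensitive; the value is compared exactly.
                if key.trim().eq_ignore_ascii_case("objectformat") && value.trim() == "sha256" {
                    return true;
                }
            }
        }
    }
    false
}
```

Repositories for which this returns true would need to be excluded from the scan until libgit2 support arrives.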
Ignoring files
Git ignored files in .gitignore are not scanned.
You can also manually add .polyglot_code_scanner_ignore files anywhere in the codebase, to list extra files to be ignored - the syntax is the same as for .gitignore files.
Usage
Run polyglot_code_scanner -h for the full list of options - these are just the main ones:
USAGE:
    polyglot_code_scanner [OPTIONS] --name <NAME> [ROOT]

ARGS:
    <ROOT>    Root directory, current dir if not present

OPTIONS:
    -h, --help
            Print help information

    -n, --name <NAME>
            project name - identifies the selected data for display and state storage

        --id <ID>
            data file ID - used to identify unique data files for browser storage, generates a UUID
            if not specified

    -o, --output <OUTPUT>
            Output file, stdout if not present, or not used if sending to web server

        --no-git
            Do not scan for git repositories

        --years <GIT_YEARS>
            how many years of git history to parse - default only scan the last 3 years (from now,
            not git head) [default: 3]

        --prune-inactive-years <PRUNE_YEARS>
            Prune (remove) git repositories that haven't been active in the specified number of
            years. For example, `--prune-inactive-years 1` will remove all git repositories with
            no commits in the last year. Non-git content is always preserved.

    -c, --coupling
            include temporal coupling data

    -V, --version
            Print version information
Pruning Inactive Repositories
The --prune-inactive-years flag lets you filter out dormant repositories from your analysis. This is useful for understanding active vs stale code in large workspaces.
Usage Example
# Fetch 5 years of history, but only show repos active in the last year
polyglot_code_scanner --name myproject --years 5 --prune-inactive-years 1 /path/to/workspace
How It Works
- Git roots are atomic: If a repository has any commits within the window, the entire repository is kept, including old files
- Non-git content preserved: Directories without git history are always kept (even if --prune-inactive-years is specified)
- Date calculation: The cutoff is based on the latest commit date in each repository, compared against the current date
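The rules above can be sketched as a single keep-or-drop decision per directory. This is an illustration under stated assumptions, not the scanner's actual code: WorkspaceEntry, keep_entry and SECONDS_PER_YEAR are hypothetical names, and a year is approximated as 365 days.

```rust
use std::time::{Duration, SystemTime};

/// Approximate length of a year; an assumption of this sketch, not the scanner's rule.
pub const SECONDS_PER_YEAR: u64 = 365 * 24 * 60 * 60;

/// Hypothetical summary of one directory in a scanned workspace.
pub struct WorkspaceEntry {
    pub path: String,
    /// Latest commit time if the directory is a git root, `None` for non-git content.
    pub latest_commit: Option<SystemTime>,
}

/// Decides whether an entry survives pruning with the given number of years.
pub fn keep_entry(entry: &WorkspaceEntry, now: SystemTime, prune_years: u64) -> bool {
    match entry.latest_commit {
        // Non-git content is always preserved.
        None => true,
        Some(latest) => {
            let window = Duration::from_secs(prune_years * SECONDS_PER_YEAR);
            match now.checked_sub(window) {
                // The whole repository is kept if its latest commit is inside the window.
                Some(cutoff) => latest >= cutoff,
                // A window reaching back further than the clock can represent keeps everything.
                None => true,
            }
        }
    }
}
```

Because the decision uses only the latest commit, a kept repository keeps all of its files, however old they are.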
Separate Windows
The --years and --prune-inactive-years parameters are independent:
- --years N: Controls how far back to fetch commit history for metrics and analysis
- --prune-inactive-years N: Controls what to keep in the final output
This separation lets you do things like:
# Analyze 10 years of history, but drop repos with no commits in the last year
polyglot_code_scanner --name myproject --years 10 --prune-inactive-years 1 /path/to/workspace
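To make the independence of the two windows concrete, here is a small sketch that applies them separately to one repository. It is an assumption-laden model rather than the scanner's implementation: RepoHistory and scan_repo are hypothetical names, commit dates are represented as ages in days, and a year is taken as 365 days.

```rust
/// Hypothetical model of a repository: the age in days of each commit, relative to now.
pub struct RepoHistory {
    pub name: String,
    pub commit_ages_days: Vec<u32>,
}

/// Returns the commit ages inside the history window, or `None` if the repo is pruned.
///
/// `history_years` plays the role of --years, `prune_years` of --prune-inactive-years.
pub fn scan_repo(
    repo: &RepoHistory,
    history_years: u32,
    prune_years: Option<u32>,
) -> Option<Vec<u32>> {
    // Pruning looks only at the most recent commit (the smallest age).
    let newest = repo.commit_ages_days.iter().copied().min();
    if let (Some(limit), Some(newest)) = (prune_years, newest) {
        if newest >= limit * 365 {
            return None;
        }
    }
    // The history window then decides which commits feed the metrics.
    Some(
        repo.commit_ages_days
            .iter()
            .copied()
            .filter(|age| *age < history_years * 365)
            .collect(),
    )
}
```

In this model, changing the history window never changes which repositories are kept, and changing the prune window never changes which commits of a kept repository are analysed.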
Running tests
To run a single named test from the command-line:
cargo test -- --nocapture renames_and_deletes_applied_across_history
The --nocapture flag tells Rust not to capture stdout/stderr - so you can add println! and eprintln! statements to help you.
To remove some extra noise and blank lines, pipe the output through grep:
cargo test -- --nocapture renames_and_deletes_applied_across_history | grep -v "running 0 tests" | grep -v "0 passed" | grep -v -e '^\s*$'
Showing logs
Rust tests don't install a logger - normally you explicitly install loggers in your main function, which tests don't run.
To install a logger using the fern crate, add the following to tests:
use test_shared::*;
then
install_test_logger();
This sets up a simple logger which sends logs to stdout - make sure you also use the --nocapture parameter mentioned earlier.
Pretty test output
If you want better assertions, your tests need to explicitly use the pretty_assertions crate:
use pretty_assertions::assert_eq;
Releasing new versions
Releasing uses cargo-release.
The basic process is:
- update the top CHANGELOG.md entry (under 'unreleased')
- commit and push changes
- release
cargo release --dry-run
or, for a minor version bump (for example 0.1.3 to 0.2.0):
cargo release minor --dry-run
License
Copyright © 2019-2022 Kornelis Sietsma
Licensed under the Apache License, Version 2.0 - see LICENSE.txt for details