yarGen-Go
February 7, 2026

A Go rewrite of yarGen (Python) by Florian Roth - an automatic YARA rule generator.
Overview
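yarGen-Go generates YARA rules from strings found in malware files while filtering out strings that also appear in goodware files. By default such strings are scored very low rather than dropped; the --excludegood flag removes them entirely. It includes: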
- yargen - Main rule generator (CLI + web server)
- yargen-util - Database management utility
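The string extraction step behind the overview can be sketched roughly as follows. This is a minimal illustration, not yarGen-Go's actual extractor: the helper names `extractASCII` and `extractWide` are hypothetical, and the wide-string scan only handles UTF-16LE characters in the ASCII range.

```go
package main

import "fmt"

func isPrintable(b byte) bool { return b >= 0x20 && b <= 0x7e }

// extractASCII returns runs of printable ASCII at least minLen bytes long.
// Hypothetical helper for illustration only.
func extractASCII(data []byte, minLen int) []string {
	var out []string
	start := 0
	for i := 0; i <= len(data); i++ {
		if i < len(data) && isPrintable(data[i]) {
			continue
		}
		// A non-printable byte (or the end of input) closes the current run.
		if i-start >= minLen {
			out = append(out, string(data[start:i]))
		}
		start = i + 1
	}
	return out
}

// extractWide returns runs of UTF-16LE characters in the ASCII range
// (a printable byte followed by 0x00) at least minLen characters long.
func extractWide(data []byte, minLen int) []string {
	var out []string
	var run []byte
	flush := func() {
		if len(run) >= minLen {
			out = append(out, string(run))
		}
		run = nil
	}
	for i := 0; i+1 < len(data); {
		if isPrintable(data[i]) && data[i+1] == 0x00 {
			run = append(run, data[i])
			i += 2
		} else {
			flush()
			i++
		}
	}
	flush()
	return out
}

func main() {
	sample := []byte("\x01\x02cmd.exe /c whoami\x00\xffp\x00a\x00y\x00l\x00o\x00a\x00d\x00.\x00d\x00l\x00l\x00")
	fmt.Println(extractASCII(sample, 8))
	fmt.Println(extractWide(sample, 8))
}
```

The minimum length of 8 mirrors the documented default for -y; shorter runs are discarded before any goodware comparison takes place.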
Getting Started
Linux/macOS:
- Prerequisites: Install Go 1.22+
- Build: Clone repository or download the ZIP and extract it
- Build binaries: Run the following commands:
go mod tidy
go build -o yargen ./cmd/yargen
go build -o yargen-util ./cmd/yargen-util
- Databases: Run ./yargen-util update to download goodware databases
- Configure (Optional): Copy config/config.example.yml to config/config.yaml and set your LLM API key
- Use: Run ./yargen serve and open the Web UI at http://127.0.0.1:8080
Windows:
- Prerequisites: Install Go 1.22+
- Build: Clone repository or download the ZIP and extract it
- Build binaries: Run the following commands:
go mod tidy
go build -o yargen.exe .\cmd\yargen
go build -o yargen-util.exe .\cmd\yargen-util
- Databases: Run yargen-util.exe update to download goodware databases
- Configure (Optional): Copy .\config\config.example.yml to .\config\config.yaml and set your LLM API key
- Use: Run .\yargen.exe serve and open the Web UI at http://127.0.0.1:8080
For detailed setup instructions, see the Step-by-Step Setup Guide
Features
- ASCII and UTF-16LE (wide) string extraction
- Opcode extraction from PE/ELF executables
- Encoding detection: Base64, hex-encoded, reversed strings
- Magic header and filesize conditions
- Super rule generation (overlapping string patterns across files)
- Customizable scoring rules (SQLite-backed, editable via Web UI)
- Efficient LLM integration for string selection (OpenAI, Anthropic, Gemini, Ollama)
  - Only submits prefiltered top candidates (no goodware strings, max 500 from automatic evaluation)
  - Requests numbered list instead of full strings to minimize token usage
  - Significantly reduces API costs compared to naive approaches
- Web UI for rule generation and scoring rules management
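The numbered-list idea from the LLM integration feature can be illustrated with a small sketch. The prompt wording, the comma-separated reply format, and the function names `buildPrompt` and `parseSelection` are assumptions made for this example; they are not taken from yarGen-Go's code, and no provider API is called.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildPrompt numbers the prefiltered candidates (capped at maxCandidates)
// so the model can answer with indices instead of repeating the strings.
func buildPrompt(candidates []string, maxCandidates int) string {
	if len(candidates) > maxCandidates {
		candidates = candidates[:maxCandidates]
	}
	var sb strings.Builder
	sb.WriteString("Reply with the numbers of the best strings, comma-separated.\n")
	for i, s := range candidates {
		fmt.Fprintf(&sb, "%d. %s\n", i+1, s)
	}
	return sb.String()
}

// parseSelection maps a reply such as "2, 3" back to the candidate strings,
// ignoring tokens that are not valid 1-based indices.
func parseSelection(reply string, candidates []string) []string {
	var picked []string
	for _, tok := range strings.Split(reply, ",") {
		n, err := strconv.Atoi(strings.TrimSpace(tok))
		if err != nil || n < 1 || n > len(candidates) {
			continue
		}
		picked = append(picked, candidates[n-1])
	}
	return picked
}

func main() {
	candidates := []string{"GetProcAddress", "evil-c2.example/gate.php", "Inject0r v1.3"}
	fmt.Print(buildPrompt(candidates, 500))
	fmt.Println(parseSelection("2, 3", candidates))
}
```

Because the reply contains only short indices, the response size stays small regardless of how long the candidate strings are, which is where the token savings would come from.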
Installation (Alternative Methods)
Using Pre-built Binaries
Download pre-built binaries from the Releases page for your platform.
Using Go Install
go install github.com/Neo23x0/yarGen-Go/cmd/yargen@latest
go install github.com/Neo23x0/yarGen-Go/cmd/yargen-util@latest
Binaries will be installed to $GOPATH/bin or $HOME/go/bin (add to PATH if needed).
Usage
CLI Mode
# Basic usage
yargen -m ./malware-samples
# With options
yargen -m ./malware-samples \
-o rules.yar \
-a "Your Name" \
-r "Internal Research" \
--opcodes \
--score
# Show all options
yargen -h
Web UI Mode
# Start web server on localhost:8080
yargen serve
# Custom port
yargen serve --port 3000
Then open http://127.0.0.1:8080 in your browser (adjust the port if you passed --port).
Database Management
# Download built-in databases from GitHub
yargen-util update
# List all databases
yargen-util list
# Create new goodware database
yargen-util create -g /path/to/goodware -i mydb
# Append to existing database
yargen-util append -g /path/to/more/goodware -i mydb
# Inspect database
yargen-util inspect ./dbs/good-strings-mydb.db
# Merge databases
yargen-util merge -o combined.db db1.db db2.db
Configuration
Default Config Location:
- The default config file is ./config/config.yaml (in the project directory)
- For backward compatibility, the application will automatically check ~/.yargen/config.yaml or ~/.yargen/config.yml if the default location doesn't exist
- Use the --config flag to specify a different config file path
- Example:
./yargen serve --config /path/to/custom/config.yml
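The lookup order described above might be implemented along these lines. This is a sketch under assumptions: `resolveConfigPath` is a hypothetical name, and the behaviour when no file exists (returning the default path with `false`) is a guess rather than documented behaviour.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveConfigPath returns the config file to use. An explicit --config value
// wins; otherwise the project default is tried, then the two home-directory
// fallbacks. exists is injected so the order can be checked without touching disk.
func resolveConfigPath(flagValue, home string, exists func(string) bool) (string, bool) {
	if flagValue != "" {
		return flagValue, exists(flagValue)
	}
	candidates := []string{
		filepath.Join("config", "config.yaml"),
		filepath.Join(home, ".yargen", "config.yaml"),
		filepath.Join(home, ".yargen", "config.yml"),
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return candidates[0], false // nothing found: report the default location
}

func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	path, found := resolveConfigPath("", home, fileExists)
	fmt.Println(path, found)
}
```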
Quick Setup:
- Copy the example config: cp config/config.example.yml config/config.yaml (see Step 5 in the Setup Guide for details)
- Edit the file to match your LLM provider
- Set your API key as an environment variable
Example Configuration (from config/config.example.yml):
llm:
  provider: "openai" # openai, anthropic, gemini, ollama
  model: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}" # Uses environment variable
  endpoint: "" # For ollama: http://localhost:11434
  timeout: 60
  max_candidates: 500
database:
  dbs_dir: "./dbs"
  scoring_db: "~/.yargen/scoring.db"
defaults:
  author: "yarGen"
  min_string_length: 8
  max_string_length: 128
  min_score: 0
  max_strings: 20
  super_rule_overlap: 5
  filesize_multiplier: 3
  include_opcodes: true
  num_opcodes: 3
server:
  host: "127.0.0.1"
  port: 8080
Environment Variables:
The config file supports environment variable expansion using ${VARIABLE_NAME} syntax. Common variables:
- OPENAI_API_KEY - OpenAI API key
- ANTHROPIC_API_KEY - Anthropic API key
- GEMINI_API_KEY - Google Gemini API key
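As a rough illustration of ${VARIABLE_NAME} expansion, Go's standard library offers `os.Expand`. Whether yarGen-Go uses it is an assumption; `expandConfig` is a hypothetical wrapper, and note that `os.Expand` also accepts the bare `$NAME` form, which may differ from the real loader.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${NAME} references in raw config text using lookup.
// Unknown variables expand to whatever lookup returns (empty for os.Getenv).
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// Placeholder value for the demo only; never hard-code real keys.
	os.Setenv("OPENAI_API_KEY", "sk-example")
	raw := `api_key: "${OPENAI_API_KEY}"`
	fmt.Println(expandConfig(raw, os.Getenv))
}
```

Keeping the key in the environment means the config file itself can be shared or committed without exposing the secret.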
Custom Config Location:
If you prefer to use a config file in your home directory (e.g., ~/.yargen/config.yml), use the --config flag:
./yargen serve --config ~/.yargen/config.yml
See Step 5 in the Setup Guide for platform-specific environment variable setup instructions.
CLI Flags
Rule Creation
| Flag | Description | Default |
|---|---|---|
| -m | Path to malware directory | required |
| -y | Minimum string length | 8 |
| -z | Minimum score threshold | 0 |
| -x | High-scoring string threshold | 30 |
| -w | Super rule overlap threshold | 5 |
| -s | Maximum string length | 128 |
| -rc | Max strings per rule | 20 |
| --excludegood | Exclude all goodware strings | false |
| --opcodes | Enable opcode extraction | false |
| -n | Number of opcodes to include | 3 |

Note on --excludegood: By default, goodware strings receive very low scores but are still included, as they can be useful when combined with more specific strings in a malware sample. This flag forces complete removal of all goodware strings from the candidate set.
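The difference between the default goodware handling and --excludegood can be sketched as below. The `candidate` type, the `applyGoodware` function, and the penalty of -20 are all assumptions for illustration; the actual penalty used by yarGen-Go is not stated in this README.

```go
package main

import "fmt"

type candidate struct {
	Value string
	Score int
}

// applyGoodware penalises or removes candidates found in the goodware set.
func applyGoodware(cands []candidate, goodware map[string]struct{}, excludeGood bool) []candidate {
	const goodwarePenalty = -20 // assumed value, chosen only for this sketch
	var out []candidate
	for _, c := range cands {
		if _, known := goodware[c.Value]; known {
			if excludeGood {
				continue // --excludegood: drop the string entirely
			}
			c.Score += goodwarePenalty // default: keep it, but rank it low
		}
		out = append(out, c)
	}
	return out
}

func main() {
	goodware := map[string]struct{}{"LoadLibraryA": {}}
	cands := []candidate{{"LoadLibraryA", 5}, {"Inject0r v1.3", 10}}
	fmt.Println(applyGoodware(cands, goodware, false))
	fmt.Println(applyGoodware(cands, goodware, true))
}
```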
Rule Output
| Flag | Description | Default |
|---|---|---|
| -o | Output rule file | yargen_rules.yar |
| -a | Author name | "yarGen" |
| -r | Reference | "" |
| -l | License | "" |
| -p | Rule description prefix | "" |
| -b | Identifier | (folder name) |
| --score | Show scores as comments | false |
| --nosimple | Skip simple rules in super rules | false |
| --nomagic | No magic header condition | false |
| --nofilesize | No filesize condition | false |
| -fm | Filesize multiplier | 3 |
| --nosuper | Disable super rules | false |
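To make the magic header and filesize options more concrete, here is one plausible way such conditions could be derived. It assumes the filesize limit is the largest sample size times the multiplier, rounded up to whole kilobytes; the rounding and the function names `magicCondition` and `filesizeCondition` are assumptions, not confirmed yarGen-Go behaviour.

```go
package main

import "fmt"

// magicCondition turns the first two bytes of a file into a uint16(0) check.
// YARA reads uint16 as little-endian, so the bytes are swapped in the literal.
func magicCondition(header []byte) string {
	if len(header) < 2 {
		return ""
	}
	return fmt.Sprintf("uint16(0) == 0x%02x%02x", header[1], header[0])
}

// filesizeCondition builds a filesize clause from the largest sample size
// multiplied by the filesize multiplier (rounding up to KB is assumed).
func filesizeCondition(sizes []int64, multiplier int64) string {
	var largest int64
	for _, s := range sizes {
		if s > largest {
			largest = s
		}
	}
	limitKB := (largest*multiplier + 1023) / 1024
	return fmt.Sprintf("filesize < %dKB", limitKB)
}

func main() {
	fmt.Println(magicCondition([]byte("MZ")))
	fmt.Println(filesizeCondition([]int64{40960, 102400}, 3))
}
```

With the default multiplier of 3, a 100 KB sample would yield a limit of 300 KB under these assumptions, leaving headroom for larger variants of the same family.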
General
| Flag | Description | Default |
|---|---|---|
| --config | Config file path | ./config/config.yaml |
| --nr | Non-recursive scan | false |
| --oe | Only executable extensions | false |
| -fs | Max file size (MB) | 10 |
| --no-llm | Disable LLM | false |
| --debug | Debug output | false |
Scoring System
yarGen-Go uses a customizable scoring system to rank extracted strings. Scores accumulate when multiple rules match.
Built-in Rules (~80 rules)
Categories include:
- Reductions (negative scores): .., triple spaces, packer strings
- File paths (+2 to +4): drive letters, extensions
- System keywords (+5): cmd.exe, system32
- Network (+3 to +5): protocols, IP addresses
- Malware keywords (+5): RAT, spy, inject
- Encoding (+5 to +10): Base64, hex-encoded, reversed strings
- PowerShell (+4): bypass, encoded commands
Custom Rules
Manage scoring rules via the Web UI:
- Add/edit/delete rules
- Enable/disable rules
- Import/export as JSON
- Three match types: exact, contains, regex
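How scores might accumulate across the three match types can be sketched as follows. The `scoringRule` struct and `scoreString` function are hypothetical, matching is assumed to be case-sensitive, and the -5 reduction is an assumed magnitude; only the +5 and +2 values come from the category ranges listed above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type scoringRule struct {
	MatchType string // "exact", "contains", or "regex"
	Pattern   string
	Score     int
}

// scoreString sums the scores of every rule that matches s.
func scoreString(s string, rules []scoringRule) int {
	total := 0
	for _, r := range rules {
		matched := false
		switch r.MatchType {
		case "exact":
			matched = s == r.Pattern
		case "contains":
			matched = strings.Contains(s, r.Pattern)
		case "regex":
			// Compiling per call keeps the sketch short; a real engine
			// would compile once and cache. Invalid patterns never match.
			if re, err := regexp.Compile(r.Pattern); err == nil {
				matched = re.MatchString(s)
			}
		}
		if matched {
			total += r.Score
		}
	}
	return total
}

func main() {
	rules := []scoringRule{
		{"contains", "cmd.exe", 5},   // system keyword
		{"regex", `^[A-Za-z]:\\`, 2}, // drive letter
		{"contains", "   ", -5},      // triple space reduction (assumed value)
	}
	fmt.Println(scoreString(`C:\Windows\system32\cmd.exe /c`, rules))
}
```

In this example the system keyword and drive letter rules both match, so their scores add up, while the reduction rule does not fire.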
Web UI
The Web UI provides:
- Generate Page - Upload files, configure options, generate rules
- Scoring Rules Page - Manage built-in and custom scoring rules
- Settings Page - View LLM configuration status
Features:
- Drag-and-drop file upload
- Real-time rule generation progress
- Download generated .yar files
- CRUD operations for scoring rules
- Import/export scoring rules as JSON
AI Agent Skill
For users working with AI assistants (like OpenClaw, Claude Desktop, or other MCP-based agents), a dedicated yarGen Skill is available to streamline rule generation workflows.
What the Skill Provides
The skill embeds yarGen expertise into your AI assistant, enabling:
- One-shot sample submission - Submit a file, get YARA rules back via yargen-util submit
- Batch rule generation - Process entire directories with one command
- Database management - Update, create, and inspect goodware databases
- API integration - Direct REST API access for automation
Quick Start with AI
# Submit single sample (easiest)
yargen-util submit -a "Your Name" malware.exe
# Batch generate from directory
yargen-generate.sh ./malware-samples --opcodes
# Manage databases
yargen-db.sh update
Installation
Clone the skill into your agent's skills folder:
# For OpenClaw
git clone https://github.com/Neo23x0/yargen-go-skill.git ~/.openclaw/skills/yargen
# Or copy from your local workspace
cp -r ~/clawd/skills/yargen ~/.openclaw/skills/
Skill Features
| Capability | Description |
|---|---|
| Submit | Submit samples via yargen-util submit with automatic polling |
| Generate | Batch process directories with string/opcode extraction |
| Database | Download, create, merge, and inspect goodware databases |
| API | REST API client for integration workflows |
The skill includes:
- SKILL.md - Core documentation for agent context
- README.md - Full usage guide
- Helper scripts for common workflows
- API reference and database best practices
Repository: github.com/Neo23x0/yargen-go-skill
Memory Requirements
- Minimum: 4 GB RAM
- With opcodes: 8 GB RAM
The goodware database is loaded entirely into memory for O(1) lookups.
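A hash set is one straightforward way to get constant-time membership checks, at the cost of holding every string in RAM. The sketch below assumes a plain set of strings; the real database may store additional data per entry, and `newGoodwareSet` and `isGoodware` are hypothetical names.

```go
package main

import "fmt"

// newGoodwareSet builds a hash set of known-good strings.
// struct{} values add no per-entry payload beyond the key itself.
func newGoodwareSet(entries []string) map[string]struct{} {
	set := make(map[string]struct{}, len(entries))
	for _, e := range entries {
		set[e] = struct{}{}
	}
	return set
}

// isGoodware reports whether s is in the set (average O(1) per lookup).
func isGoodware(set map[string]struct{}, s string) bool {
	_, ok := set[s]
	return ok
}

func main() {
	set := newGoodwareSet([]string{"kernel32.dll", "GetProcAddress"})
	fmt.Println(isGoodware(set, "kernel32.dll"), isGoodware(set, "Inject0r v1.3"))
}
```

Memory use grows with the number and length of stored strings, which is consistent with the multi-gigabyte requirements listed above.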
Project Structure
yarGen-Go/
├── cmd/
│   ├── yargen/ # Main binary
│   └── yargen-util/ # Database utility
├── docs/
│   └── SETUP.md # Step-by-step setup guide
├── internal/
│   ├── config/ # YAML configuration
│   ├── database/ # Goodware DB loading/saving
│   ├── extractor/ # String/opcode extraction
│   ├── filter/ # String filtering & scoring
│   ├── llm/ # LLM integration
│   ├── rules/ # YARA rule generation
│   ├── scanner/ # File scanning
│   ├── scoring/ # Scoring engine & SQLite store
│   ├── service/ # Core service layer
│   └── web/ # HTTP server & static files
├── config/
│   └── config.example.yml
├── go.mod
└── README.md
License
See LICENSE file for details. Same license as the original yarGen project (GPL-3.0).
Credits
yarGen-Go is a Go rewrite of yarGen (Python), created by Florian Roth.