Technical Overview

June 21, 2025

| Version | Status | Notes |
|---|---|---|
| 1.1.6 | stable | Current public release |

LegitURL is a 100 % on-device link scanner.
It rates trust purely from technical behaviour and finishes in < 2 s—no cloud, no sign-ups, no tracking.

Schema 2025-06-09

Performance benchmarks: iPhone 16 (A18 / iOS 18.4)

| Site | HTML | JS / Cookies / Findings / HTML parse | Total | GET |
|---|---|---|---|---|
| stripe.com/ae | 2.05 MB | 811032 ms | 42.77 ms | 1.49 s |
| store.steampowered.com | 0.78 MB | 2531838 ms | 37.99 ms | 0.81 s |
| societegenerale.com/fr | 0.08 MB | 6831618 ms | 14.50 ms | 0.83 s |
| x.com | 0.26 MB | 4174319 ms | 16.31 ms | 0.69 s |

Times are wall-clock averages of three runs.

Notes

  • Total analysis = network latency + HTML parse + script/cookie/CSP checks + scoring + human-report formatting & JSON export.
  • GET shows network latency only, not internal logic.
  • Inline scripts are fully parsed in memory; external scripts are analysed via their src attribute (no extra fetch yet).

Reproduce it: run `testAnalysisEngine_PerformanceForSimpleURL()` in LegitURLTests.swift (Xcode Debug). Each run performs an actual HTTPS GET and full pipeline analysis—no mocks.


1 · Overview

1.1 . Queue & dissect

  • The entered URL is queued, split into domain / sub-domains / path / query / fragment.
  • Domain is normalised to Punycode.
  • Each part is scanned for brand-spoofs, typos, high entropy, and encoded blobs (Base64, %-encoding, etc.).
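
The split described above can be sketched in a few lines. This is an illustrative Python sketch, not the app's Swift code: `dissect` is a hypothetical helper, and it takes the last two host labels as TLD and domain, whereas the app uses the Public Suffix List, so multi-label suffixes such as `co.uk` would be split incorrectly here.

```python
from urllib.parse import urlsplit, parse_qsl


def dissect(url: str) -> dict:
    """Split a URL into the parts listed above (naive last-two-labels split)."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = parts.hostname or ""
    # Normalise internationalised names to their ASCII (Punycode) form.
    ascii_host = host.encode("idna").decode("ascii")
    labels = ascii_host.split(".")
    return {
        "tld": labels[-1],
        "domain": labels[-2] if len(labels) >= 2 else "",
        "subdomain": ".".join(labels[:-2]),
        "path": parts.path,
        "query": parse_qsl(parts.query, keep_blank_values=True),
        "fragment": parts.fragment,
    }
```

Applied to the URL from Example 1, this yields `userauth-check` as the domain, `info` as the TLD, and `secure-login.trustedbank.com` as the sub-domain.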

1.2 . One clean GET

  • LegitURL fires a single HTTPS GET with a vanilla iOS User-Agent and no query-string.
  • Any redirect target is pushed back into the same queue, so every hop is inspected in turn.

1.3 . Signal sweep

  • TLS chain -> age, SAN match, issuer.
  • Headers -> CSP, HSTS, Referrer-Policy, leaks.
  • Cookies -> flags, entropy, redirect phase.
  • HTML & inline JS -> structure, risky patterns, nonce/hash ties.

2 · How LegitURL Works

LegitURL scores a link solely from its technical behaviour.
It dissects the URL offline, then performs one sandboxed GET to capture TLS, headers, cookies, HTML, and scripts.
Deterministic heuristics award or deduct points; a bit-flag engine escalates dangerous combinations.
Reputation is ignored—only concrete signals matter.
The result: a transparent 🟩/🟧/🟥 verdict and a fully-traceable findings log.

The pipeline ends by rendering two outputs:
• a human-readable HTML/PDF report,
• a compact, machine-readable JSON blob optimized for LLMs.
For format details, see § 3.6 Export & LLM Integration.
For real-world phishing and redirect examples, see Examples, which include PDF, JSON, and LLM responses from multiple models.

flowchart TD
  A[Paste / Scan URL] --> B[Offline parse & decode]
  B -->|critical hit| V[Verdict 🟥]
  B -->|no critical| C[Single HTTPS GET]

  C --> D[TLS]
  C --> E[Headers / CSP]
  C --> F[Cookies]
  C --> G[HTML & inline JS]

  D --> H[Detection Engine]
  E --> H
  F --> H
  G --> H

  H --> I[Score & bit-flags]
  H --> K[If New URL detected]
  K --> B

  I --> V
  V --> R[HTML / PDF report]
  V --> J[LLM-ready JSON]   

3 · Detection Engine

LegitURL inspects every layer, from raw URL to inline JS, to surface deterministic, explainable signals.

Compact reference sets

  • Scam keywords / homoglyphs
  • Brand whitelist
  • Suspicious TLDs / infra
  • Risky JS APIs

All are bundled offline; users can extend the keyword and brand lists.

Impersonation & obfuscation

  • Levenshtein + n-gram catch typos (gooogle.com) and brand hijacks in sub-domains/paths.
  • Entropy + decoding Base64 / %-encoded blobs in path and queries / fragments.

Five-layer rule grid

| Layer | Focus |
|---|---|
| 1 | URL pieces – host, path, query |
| 2 | TLS – CN/SAN, chain, age |
| 3 | Headers – CSP, HSTS, leaks |
| 4 | Cookies – size, entropy, flags |
| 5 | HTML / JS – tag mix, risky calls |

Each rule gives a point delta and, when critical, sets a bit-flag.
Flags accumulate across redirect hops to expose compound risks.

Context-aware weighting

| Example | Heavier penalty |
|---|---|
| Tracking cookie on 3xx vs 200 | 3xx (redirect tracking) |
| Brand term in foreign host | Non-brand domain |

INFO-only notes

Low-risk facts (fresh DV cert, tiny cookie, benign internal redirect) log as INFO—no score hit unless paired with other flags.

3.1 URL & Encoding

LegitURL inspects every link locally—before any network traffic.
If a critical offline signal appears, the online phase is skipped.

Split & scan

The URL is split into domain · sub-domains · path · query · fragment.
Each part is checked for:

| Signal | Example |
|---|---|
| Brand spoof | secure-paypal-login.com |
| Look-alike | Cyrillic + Latin mix, homoglyphs |
| Scam words | account-verify, login-secure |
| Encoded blobs | hidden emails, UUIDs, nested URLs |

Weighting

  • Domain + sub-domains » highest score impact
  • Path + fragment » behavioural context
  • Query values -> recursively decoded by Lamai (custom decoder)

Technical pipeline

| Step | Purpose |
|---|---|
| PSL + URLComponents | Accurate domain vs. sub-domain split |
| Punycode normalisation | ASCII comparison for IDNs |
| Tokenise on _/- | Expose embedded words |
| Mixed-script check | Flags Cyrillic/Latin blends |
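
The tokenise and mixed-script steps can be illustrated as follows. This is a Python sketch rather than the app's Swift code, and it makes one loud assumption: a letter's script is read from the first word of its Unicode character name, which is enough to separate Latin from Cyrillic but is not a full Unicode script lookup. The function names are hypothetical.

```python
import re
import unicodedata


def tokenise(label: str) -> list[str]:
    """Split a host label or path segment on '_' and '-' and drop empty pieces."""
    return [part for part in re.split(r"[_-]", label) if part]


def scripts_in(label: str) -> set[str]:
    """Collect the script of each letter (first word of its Unicode name)."""
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return found


def is_mixed_script(label: str) -> bool:
    """True when letters from more than one script appear in the same label."""
    return len(scripts_in(label)) > 1
```

For instance, `tokenise("secure-paypal_login")` exposes the embedded word `paypal`, and a label that swaps the Latin `a` for the Cyrillic `а` (U+0430) is reported as mixed.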

Spoof & entropy heuristics

  • Levenshtein ≤ 1 → typo-squats
  • 2-gram score → brand smushing (pay‐paI)
  • System dictionary → confirm real words
  • Shannon entropy → flag random strings
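
Two of these heuristics are standard algorithms and can be sketched directly. The Python below is an illustration under assumptions, not the app's Swift implementation: `looks_like_typosquat` is a hypothetical helper that treats a distance of exactly 1 as a typo-squat so that the genuine brand name itself is not flagged.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_typosquat(label: str, brands: list[str]) -> bool:
    """True when the label is exactly one edit away from a known brand."""
    return any(levenshtein(label.lower(), brand) == 1 for brand in brands)
```

With these definitions, `gooogle` sits one edit away from `google`, while a string of four distinct characters such as `abcd` has an entropy of 2 bits per character.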

Path / query / fragment drill-down

  1. Recursive decode – Base64, %, Unicode escapes
  2. Structure scan – emails, IPs, UUIDs, nested URLs
  3. Re-queue any decoded URL (depth ≤ 5)

Pruning: branches that resolve to high-entropy noise are cut early to avoid decoding depth bombs.
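
The drill-down and pruning can be sketched as a small recursive walk. This is a Python illustration, not Lamai itself: the entropy cutoff of 4.5 bits per character, the minimum Base64 length of 8, and all function names are assumptions; only the depth limit of 5 comes from the text above.

```python
import base64
import binascii
import math
import re
from collections import Counter
from urllib.parse import unquote

MAX_DEPTH = 5          # depth limit stated above
ENTROPY_CUTOFF = 4.5   # assumed pruning threshold, bits per character
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def entropy(text: str) -> float:
    if not text:
        return 0.0
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in Counter(text).values())


def candidate_decodings(value: str) -> list[str]:
    """Return each decoding that changes the value and yields printable text."""
    results = []
    percent_decoded = unquote(value)
    if percent_decoded != value:
        results.append(percent_decoded)
    if len(value) >= 8 and re.fullmatch(r"[A-Za-z0-9+/_-]+=*", value):
        padded = value + "=" * (-len(value) % 4)  # auto-pad
        try:
            text = base64.urlsafe_b64decode(padded).decode("utf-8")
            if text.isprintable():
                results.append(text)
        except (binascii.Error, UnicodeDecodeError):
            pass
    return results


def find_nested_urls(value: str, depth: int = 0) -> list[str]:
    """Walk the decode tree and collect URLs that would be re-queued."""
    found = URL_PATTERN.findall(value)
    if depth >= MAX_DEPTH:
        return found
    for decoded in candidate_decodings(value):
        if entropy(decoded) > ENTROPY_CUTOFF:
            continue  # prune branches that decode to high-entropy noise
        for url in find_nested_urls(decoded, depth + 1):
            if url not in found:
                found.append(url)
    return found
```

A query value that is Base64 wrapped around a percent-encoded URL is unwrapped in two steps, and the inner URL is returned for re-queuing.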

3.2 TLS Analysis

LegitURL parses the full X.509 chain after URLSession trust.
It layers deterministic heuristics on top—no extra network calls.

| Check | What we inspect | Risk signal |
|---|---|---|
| Age window | New < 7 days • Stale > 13 months. | Fresh DV + weak headers ⇒ orange warning; stale cert ⇒ maintenance doubts. |
| CA policy | DV / OV / EV via policy OID. | + pts for EV/OV; neutral for DV unless combined with other flags. |
| SAN coverage | Wildcard vs. enumerated FQDNs; unrelated hosts count. | > 25 mixed SANs without wildcard ⇒ common in burner infra. |

Why not custom trust?
Apple rejects apps that bypass URLSession PKI. LegitURL therefore accepts system trust.

System-trust blind spots

URLSession only says “handshake failed,” never why (missing leaf, bad chain, untrusted root).
LegitURL treats any such failure as critical and stops further online checks.

3.3 Header & Content-Security-Policy

LegitURL treats CSP as the front line of client-side defence.
Rules fire in this order:

| Goal | Rule | Score impact |
|---|---|---|
| Presence | No Content-Security-Policy header | Heavy penalty |
| | Only …Report-Only present | Heavy penalty, but still parsed |
| Syntax | Header must end with `;` | Auto-append if missing (INFO) |
| | Split directives on `;` → build map | |
| Mandatory source | script-src missing → fall back to default-src → if both absent, check require-trusted-types-for 'script' | Treat as incomplete CSP (same weight as missing) |
| Bit-flags | Directive/value combos mapped to flags (UNSAFE_INLINE, UNSAFE_EVAL, STRICT_DYNAMIC, …) | Flags feed scorer |
| Inline nonce / ext-origin match | Inline-script nonce or SHA must match script-src; external script host must be in allow-list | Mismatch ⇒ penalty • Correct nonce/SHA ⇒ bonus |
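
The split, fallback, and flag-mapping rules can be sketched as below. This is a Python illustration, not the app's Swift code. The flag names `UNSAFE_INLINE`, `UNSAFE_EVAL`, and `STRICT_DYNAMIC` come from the table; the `INCOMPLETE` flag, the function names, and the reading that `require-trusted-types-for 'script'` rescues a policy with neither `script-src` nor `default-src` are assumptions.

```python
from enum import IntFlag


class CSPFlag(IntFlag):
    UNSAFE_INLINE = 1
    UNSAFE_EVAL = 2
    STRICT_DYNAMIC = 4
    INCOMPLETE = 8  # hypothetical flag: no usable script policy


KEYWORD_FLAGS = {
    "'unsafe-inline'": CSPFlag.UNSAFE_INLINE,
    "'unsafe-eval'": CSPFlag.UNSAFE_EVAL,
    "'strict-dynamic'": CSPFlag.STRICT_DYNAMIC,
}


def parse_csp(header: str) -> dict[str, list[str]]:
    """Split on ';' and map each directive name to its list of source values."""
    directives: dict[str, list[str]] = {}
    for chunk in header.split(";"):
        parts = chunk.split()
        if parts:
            # Per the CSP specification, a repeated directive is ignored
            # after its first occurrence.
            directives.setdefault(parts[0].lower(), parts[1:])
    return directives


def script_policy_flags(directives: dict[str, list[str]]) -> CSPFlag:
    """Apply the script-src -> default-src -> trusted-types fallback."""
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        trusted = directives.get("require-trusted-types-for", [])
        return CSPFlag(0) if "'script'" in trusted else CSPFlag.INCOMPLETE
    flags = CSPFlag(0)
    for source in sources:
        flags |= KEYWORD_FLAGS.get(source.lower(), CSPFlag(0))
    return flags
```

Because empty chunks are skipped, a header with or without a trailing `;` parses to the same map.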

A strict CSP won’t stop every attack, but it shrinks the attack surface.
When a header is absent or weak, LegitURL proposes a sample script-src directive:

  • hashes for ≤ 5 inline scripts, explicit hosts for external scripts, and nonce-*** guidance when inline count > 5.
    This example is advisory; heavy, dynamic sites will need additional tuning.

Other security headers

| Header | Check | Expected / Penalty |
|---|---|---|
| Strict-Transport-Security | Present + max-age ≥ 31536000 + includeSubDomains | Missing or short age ⇒ penalty |
| X-Content-Type-Options | Value | Must be nosniff |
| Referrer-Policy | Value | strict-origin or stricter |
| Server / X-Powered-By | Version leakage (apache/2.4, php/8.2) | INFO if header present but no version • Suspicious if version leaks |
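
As a rough illustration of the HSTS and version-leak rows, the Python sketch below checks a header value against the thresholds in the table. The function names and the finding strings are hypothetical, and the version test (a slash followed by a digit) is an assumed heuristic rather than the app's actual rule.

```python
import re

ONE_YEAR = 31_536_000  # seconds, the minimum max-age from the table


def hsts_findings(value):
    """Return the HSTS problems for a header value (None means header absent)."""
    if value is None:
        return ["missing"]
    findings = []
    match = re.search(r'max-age\s*=\s*"?(\d+)', value, re.IGNORECASE)
    if not match or int(match.group(1)) < ONE_YEAR:
        findings.append("short max-age")
    if "includesubdomains" not in value.lower():
        findings.append("no includeSubDomains")
    return findings


def leaks_version(value: str) -> bool:
    """Assumed heuristic: 'name/digit' in Server or X-Powered-By leaks a version."""
    return bool(re.search(r"/\d", value))
```

A value such as `max-age=63072000; includeSubDomains; preload` produces no findings, while `Apache/2.4.57` counts as a version leak and a bare `cloudflare` does not.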

3.4 Cookies

Every HTTPCookie is reduced to a bit-flag bundle:

| Flag | Trigger |
|---|---|
| httpOnly | HttpOnly present |
| secure | Secure present |
| persistent | Expires > 30 days |
| smallValue | Value < 16 B |
| largeValue | Value > 64 B |
| highEntropyValue | Shannon > 3.5 bits/char |
| setOnRedirect | Arrives on 3xx / 4xx / 5xx |
| sameSiteNone | SameSite=None |
| sameSiteMissing | Attribute absent (not defaulted to Lax) |
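
The reduction of a cookie to a flag bundle could look like the sketch below. It is written in Python for illustration and is not the app's Swift code; the thresholds are taken from the table, while `classify_cookie` and its parameters are hypothetical.

```python
import math
from collections import Counter
from enum import IntFlag


class CookieFlag(IntFlag):
    HTTP_ONLY = 1 << 0
    SECURE = 1 << 1
    PERSISTENT = 1 << 2
    SMALL_VALUE = 1 << 3
    LARGE_VALUE = 1 << 4
    HIGH_ENTROPY_VALUE = 1 << 5
    SET_ON_REDIRECT = 1 << 6
    SAME_SITE_NONE = 1 << 7
    SAME_SITE_MISSING = 1 << 8


def entropy(text: str) -> float:
    if not text:
        return 0.0
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in Counter(text).values())


def classify_cookie(value, *, http_only=False, secure=False,
                    lifetime_days=None, same_site=None, status=200):
    """Map one cookie to the flag bundle, using the thresholds from the table."""
    flags = CookieFlag(0)
    if http_only:
        flags |= CookieFlag.HTTP_ONLY
    if secure:
        flags |= CookieFlag.SECURE
    if lifetime_days is not None and lifetime_days > 30:
        flags |= CookieFlag.PERSISTENT
    size = len(value.encode("utf-8"))
    if size < 16:
        flags |= CookieFlag.SMALL_VALUE
    elif size > 64:
        flags |= CookieFlag.LARGE_VALUE
    if entropy(value) > 3.5:
        flags |= CookieFlag.HIGH_ENTROPY_VALUE
    if status >= 300:  # 3xx / 4xx / 5xx
        flags |= CookieFlag.SET_ON_REDIRECT
    if same_site is None:
        flags |= CookieFlag.SAME_SITE_MISSING
    elif same_site.lower() == "none":
        flags |= CookieFlag.SAME_SITE_NONE
    return flags
```

Storing the result as a single integer keeps later combination checks to one bitwise AND per rule.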

Why treat missing SameSite as risky?

  1. Mobile tracking kits still rely on no-SameSite for cross-site POST redirects.
  2. Silent browser defaults hide developer intent; explicit ≥ implicit.
  3. URLSession flattens duplicate Set-Cookie headers, making detection brittle—LegitURL errs on caution.

Flag → penalty matrix

| Flags raised | Score impact |
|---|---|
| smallValue + low entropy | Ignored: likely benign session ID |
| httpOnly only | Minor (CSRF-related, but low risk for read-only scan) |
| largeValue + highEntropyValue | Tracking: strong penalty |
| Any flag on a 3xx hop | Additional penalty (redirect tracking) |

LegitURL simulates a first-visit GET: no existing cookies, no query params.
Any cookie set pre-consent (e.g., before a GDPR banner is answered) is therefore visible.
Scoring weights may evolve as privacy regulations and browser defaults change.

3.5 HTML & JavaScript

The full body is tokenised on-device; the UI shows only the first 1.2 MB of HTML and 3 KB per inline block, but scoring uses the entire stream.

| Stage | What we do | Risk signal |
|---|---|---|
| 1. Boundary scan | Test first / last ± 500 B for `<html>` & `</html>`. | Missing open ⇒ critical (non-HTML) • Missing close ⇒ moderate |
| 2. Tag discovery | Single-pass “< byte” lexer finds `<head>`, `<body>`, `<script>`, and closers. | |
| 3. Head checks | Look for `<meta http-equiv="Content-Security-Policy">`. | Extra CSP here vs. header ⇒ inconsistency (warning) |
| 4. Script block parse | Extract nonce=, src=, integrity=, type=. Log CORS attrs. | Non-matching nonce/SHA ⇒ penalty • SRI present ⇒ bonus |
| 5. Origin class | Map each src to 'self', 3rd-party, protocol-relative, data:. | 3rd-party script on site with weak CSP ⇒ penalty |
| 6. Byte-wise JS scan | Inline code concatenated, then regex-light scan around ( and . tokens. | |
| 7. Risk patterns | Match against risky API list (eval, atob, document.write, …) and flagged pairings (e.g. getElementById.submit()). | Each match adds a per-API delta; paired patterns add combo flag |
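
Stages 6 and 7 can be approximated with a short scan over call sites. The Python below is a simplified illustration, not the app's byte lexer: it uses a regular expression to find the identifier chain in front of each `(`, only includes the three APIs named in the table, and the function name and result keys are hypothetical.

```python
import re

RISKY_APIS = ("eval", "atob", "document.write")  # APIs named in the table
# Identifier chain (a.b.c) immediately followed by "(".
CALL = re.compile(r"([A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*)\s*\(")
# Flagged pairing from the table: getElementById(...).submit()
COMBO = re.compile(r"getElementById\s*\([^)]*\)\s*\.\s*submit\s*\(")


def scan_inline_js(source: str) -> dict:
    """Report risky call targets and whether the flagged pairing occurs."""
    calls = CALL.findall(source)
    risky = [
        call for call in calls
        if any(call == api or call.endswith("." + api) for api in RISKY_APIS)
    ]
    return {"risky_calls": risky, "combo": bool(COMBO.search(source))}
```

Matching on whole identifier chains means a function named `retrieval` is not mistaken for `eval`, while `window.eval(...)` still counts.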

3.6 Export & LLM Integration

LegitURL turns its findings into two ready-to-share artefacts:

| Output | Format | Purpose |
|---|---|---|
| Human report | HTML / PDF, one page per hop | Shareable for audits or incident tickets |
| Machine bundle | Compact JSON | Feed directly to GPT-4, Claude, etc. |

JSON schema

  • Stored as KeyValuePairs → preserves ordering, zero-padded keys (04a_, 04b_)

  • All prefixes stripped after serialisation → models see clean keys

  • Two size profiles:

    • brief – topline verdict + key signals (token-friendly)
    • full – domain breakdown, cookies, TLS, headers, scripts, inline snippets
  • No penalty values are embedded — the LLM is free to reason from raw signals.

Token hint:
LegitURL estimates cost by byte-length ÷ 4. A banner warns if export > 8 k tokens (~32 KB).
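
The ordering trick and the token estimate can be sketched together. This Python illustration is not the app's Swift code: the prefix pattern is inferred from the `04a_` / `04b_` examples, the function names are hypothetical, and the sketch strips prefixes while building the ordered mapping rather than after serialisation, which yields the same output for top-level keys.

```python
import json
import re

PREFIX = re.compile(r"^\d{2}[a-z]?_")  # assumed prefix shape, e.g. "04a_"


def export(findings: dict) -> str:
    """Order keys by their zero-padded prefix, then emit clean key names."""
    ordered = {PREFIX.sub("", key): findings[key] for key in sorted(findings)}
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)


def estimate_tokens(payload: str) -> int:
    """Byte length divided by 4, as described in the token hint."""
    return len(payload.encode("utf-8")) // 4


def needs_banner(payload: str, limit: int = 8_000) -> bool:
    """True when the estimate exceeds the 8 k token warning threshold."""
    return estimate_tokens(payload) > limit
```

Zero-padding matters because the keys are sorted as strings: `04a_` sorts before `04b_`, and both sort before `10_`.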

| Action | Clipboard content |
|---|---|
| Explain Finding | Minimal JSON for the selected signal |
| Full Report | Full schema block |
  • Exploration note - the ChatGPT iOS app currently exposes no share‑sheet hook for pre‑filled content in JSON, so this feature is on hold.

4 · Scoring & Bit-Flag Model

Analysis starts at 100 points once the first request is queued.
As LegitURL follows the entire redirect chain, every hop can deduct points from this single running total.

Individual penalty examples

| Signal | Typical hit |
|---|---|
| Scam keyword in sub-domain | −20 |
| Watch-list brand misuse in sub-domain | −25 |
| High-entropy / obfuscated path | −10 |
| Dangerous JavaScript pattern | −30 |
| Fresh DV certificate (< 7 d) | −10 |
| Tracking cookie on 3xx | −15 |
| CSP missing or no script-src, default-src, or require-trusted-types-for 'script' | −50 |

Weights are versioned and will evolve.

Context matters

| Same signal, different context | Result |
|---|---|
| applepie.com vs secure-apple.com | Only the second triggers brand-spoof penalty. |
| Cookie on 200 OK | Mild warning |
| Same cookie set on 3xx redirect | Higher penalty (tracking during redirect) |

Bit-flag engine

LegitURL tags some findings with a bit flag (DOMAIN_SCAM_OR_PHISHING, TLS_IS_FRESH, HEADERS_CSP_MALFORMED, …).
Combinations drop the score faster than individual hits—catching situations where separate “yellow” signals combine into a clear red flag.

| Example combo | Flags raised | Severity |
|---|---|---|
| Scam keyword in sub-domain + watch-list brand in domain | SCAM_WORD, BRAND_HIJACK | Critical |
| Fresh DV cert + weak headers + malformed HTML | FRESH_DV, WEAK_HDR, HTML_MALFORM | Dangerous |
| Hop 1: SCAM_WORD → Hop 2: BRAND_HIJACK | Chain flags propagate | Critical |

Why bit-flags? They provide deterministic, explainable downgrades and let new heuristics slot in without rewriting the whole weight table.
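
A running total with accumulated flags might be modelled as in the sketch below. It is a Python illustration, not the app's Swift scorer: the flag names and combos come from the table above, but the severity caps (0 for critical, 30 for dangerous) and the `score_chain` function are assumptions made for the example.

```python
from enum import IntFlag


class Flag(IntFlag):
    SCAM_WORD = 1
    BRAND_HIJACK = 2
    FRESH_DV = 4
    WEAK_HDR = 8
    HTML_MALFORM = 16


# Combos from the table above; each needs all of its flags to be set.
COMBOS = [
    (Flag.SCAM_WORD | Flag.BRAND_HIJACK, "critical"),
    (Flag.FRESH_DV | Flag.WEAK_HDR | Flag.HTML_MALFORM, "dangerous"),
]
SEVERITY_CAP = {"critical": 0, "dangerous": 30}  # assumed score ceilings


def score_chain(hops):
    """hops: one (penalties, flags) pair per redirect hop, in order."""
    score, flags = 100, Flag(0)
    for penalties, hop_flags in hops:
        score -= sum(penalties)  # every hop deducts from one running total
        flags |= hop_flags       # flags accumulate across hops
    for required, severity in COMBOS:
        if (flags & required) == required:
            score = min(score, SEVERITY_CAP[severity])
    return max(score, 0), flags
```

In this model a scam keyword on hop 1 and a brand hijack on hop 2 deduct only 45 points individually, yet the combined flags cap the result at 0, which mirrors the "chain flags propagate" row.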

5 · Examples

Example 1: Brand impersonation with suspicious TLD (Score: 0/100)

If the user has correctly added trustedbank.com to their watchlist, the app will:

  • Treat trustedbank.com as a trusted root domain
  • Skip domain-level penalty checks for it
  • Still flag any use of “trustedbank” in unrelated subdomains or domains

Pasted URL: https://secure-login.trustedbank.com.userauth-check.info/session?token=xyz

URL Breakdown:

  • Domain: userauth-check
  • TLD: .info
  • Subdomain: secure-login.trustedbank.com
  • Path: /session
  • Query: token=xyz

Offline analysis:

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Domain | userauth not in dictionary | Weak signal | No penalty |
| TLD | .info has poor reputation | Moderate signal | -20 penalty |
| Subdomain | Contains scam/phishing terms + brand impersonation (trustedbank.com) | Critical combo | Heavy penalty |
| Path | session resembles API endpoint (expects value) | Contextual weak signal | -10 penalty |
| Query | Passed to Lamai, no relevant signals found | | No penalty |

Conclusion:

  • Subdomain + .info TLD + API-style path forms a high-risk pattern
  • Offline logic recognizes this combo and applies a critical penalty
  • Total score drops to 0/100
  • Online check is skipped — it's already flagged as too risky

Verdict:

This URL impersonates a known brand using a deceptive subdomain, a suspicious TLD, and a query path that mimics login flow.
Final Score: 0/100 — flagged as DANGEROUS

Example 2: Redirect chain with tracking cookies and suspicious scripts (Score: 0/100)

Let’s say a user encounters a shortened link in a promoted X.com post:
Pasted URL: bit.ly/mihoyanagi


Initial URL Breakdown:

  • Domain: bit
  • TLD: .ly
  • Path: /mihoyanagi

Offline analysis:

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Path | Not recognized by dictionary | None | No penalty |
| Redirect | 301 -> domain changes | Weak signal | -10 |

--> Score 90


Online analysis begins

Request sent with real iOS User-Agent and clean headers


Redirect 1: https://jolyvip.com/mihoyanagi

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Path | Not recognized by dictionary | None | No penalty |
| Redirect | 302 → domain changes again | Weak signal | -10 |
| TLS | 4 days old | Moderate signal | -10 |
| Cookie 1 | 10 bytes, no flags, 31-day lifespan, SameSite=Lax | Weak | No penalty |
| Cookie 2 | 213 bytes, no flags, SameSite=Lax | Moderate | -15 |

Final URL: https://coingrok.io

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| HTML Body | 74% inline JavaScript, script density 1.282 | Suspicious | -25 |
| CSP | Missing | | -50 |
| X-Powered-By | Next.js backend leaked | Weak signal | -5 |
| Server Header | cloudflare | Informational | No penalty |

Verdict:

This link leads through a redirect chain with cookie abuse, tracking attempts, and excessive inline scripts.
Final domain leaks stack metadata and hosts CSP violations.
Final Score: 0/100 — flagged as DANGEROUS

Example 3: Cloaked scam infrastructure via shared TLS Certificate (Score: 0/100)

Let’s consider the following link:
https://www.man-entreprise.com/vrp/ayxxxxxxx/yyyy
(Query parameters have been altered to avoid exposing personal data.)


Initial URL Breakdown:

  • Domain: man-entreprise
  • TLD: .com
  • Path: /vrp/ayxxxxxxx/yyyy

Offline Analysis:

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Domain | Clean | | No penalty |
| Path | Not recognized, not suspicious | | No penalty |

→ Score remains 100


Online Analysis Begins

Request sent with real iOS User-Agent and clean headers


Redirect 1: https://ed.manageo.biz/clt-su/SFR/formulaire16_2tps.jsp?...

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Redirect | 302 to .biz domain | Moderate | -10 |
| TLD | .biz: poor reputation | Moderate | -15 |
| Query string | Malformed, some keys empty, odd characters | Suspicious | -15 |
| TLS | DV cert (Let's Encrypt), ~10 days old | Informational | No penalty |

TLS Certificate (from man-entreprise.com)

| Attribute | Value |
|---|---|
| Type | DV (Domain Validation) |
| Issuer | Let's Encrypt |
| Age | 10 days |
| SAN Entries | 76 fully-qualified domains, unrelated, no wildcard |
| Wildcard | None |

→ Strong signal of cloaking infrastructure via shared certificate


Final URL: https://ed.manageo.biz

| Component | Observation | Signal Type | Action |
|---|---|---|---|
| Response | 200 OK | | |
| Cookie | JSESSIONID missing Secure flag | Weak | -10 |
| Script origin | Undetectable or malformed | Suspicious | -15 |
| Script density | 1.325 scripts per 1000 bytes | Abnormally high | -15 |
| TLS | DV cert, Let's Encrypt, 25 unrelated SANs | Infra signal | -30 |

Verdict:

This link leads to a suspicious redirect chain starting from a clean domain that shares a Let's Encrypt DV certificate with 76 unrelated sites.
It lands on a .biz domain serving cloaked or obfuscated JavaScript, with leaked personal data and malformed query patterns.
The final destination shares a similarly structured certificate — indicating shared scam infrastructure at scale.

Final Score: 0/100 — flagged as CRITICAL

Example 4: Popular sites that score poorly when judged blind

These sites are globally recognized — but when analyzed blindly, as if they were unknown, their setups fall short.

| Site | Score | Key issues |
|---|---|---|
| www.google.com | 49/100 | CSP is report-only (unsafe-eval); sets tracking cookies |
| m.youtube.com | 18/100 | 92 % of HTML is JS; one inline script > 100 kB; tracking cookies; missing `</body>`; no object-src; no Referrer-Policy |
| m.facebook.com | 0/100 | 96 % JS; three large cookies; one inline script > 100 kB; nonce value entropy is low; JS edits a cookie that is not JS-accessible; unsafe-eval present |
| amazon.com | 20/100 | Inline document.write; CSP header missing; no X-Content-Type-Options; no Referrer-Policy |

These aren’t scams, but if we didn’t already trust them, nothing in their technical behavior would signal trust.
This shows hygiene gaps, not proven scams. Inline scripts are currently penalized equally; future versions will reduce the penalty if nonce-matching is detected.

Example 5: Popular sites that score well

Some high-profile sites make a visible effort to secure users — and it shows.

| Site | Score | Notes |
|---|---|---|
| stripe.com | 100/100 | one cookie set and JS-accessible, CSP is missing object-src |
| immatriculation.ants.gouv.fr | 100/100 | heavy page (3 MB), 95% is JS; CSP allows 5 script sources, but only 1 is used |
| apple.com | 80/100 | CSP includes unsafe-inline and unsafe-eval; weak Referrer-Policy |

Stripe clearly wants to appear trustworthy and backs it up with real protections: all inline scripts are protected by a hash, and it uses an EV cert.
The French government site is solid and has an EV cert.
Apple's CSP still allows unsafe-inline/unsafe-eval and its referrer policy is lax, but the EV cert helps.

6 · Case Studies

6.1 SAN-pattern analysis – spotting cloaked infrastructure

Why 76 unrelated SANs are a red flag, and a counter-example

LegitURL examines the Subject Alternative Name (SAN) list for signals that a certificate is protecting more than just a normal “set of sub‑sites.”

#### Red‑flag pattern

Many (10 – 100) unrelated FQDNs, no wildcard entries, and a certificate age < 30 days, especially when the issuer is Let’s Encrypt, strongly suggest throw-away scam infra.

Why this pattern is rare on legitimate sites

  • Wildcards are cheaper to maintain: orgs usually issue *.example.com.
  • Let’s Encrypt wildcards require DNS-01: attackers prefer HTTP-01 because they don’t own DNS.
  • Dozens of disparate FQDNs make no operational sense for normal businesses but are perfect for phishing kits, redirect chains, and disposable botnet mailers.

A SAN list packed with unrelated domains is therefore scored as “Cloaked Infra”.
When combined with weak headers, shady TLDs (.biz, .click), or obfuscated JavaScript, the overall risk escalates quickly.
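
The red-flag pattern can be expressed as a small predicate. The Python sketch below is an illustration, not the app's Swift code: `san_risk` and its return labels are hypothetical, the issuer is matched by a plain substring test, and "unrelated" is approximated by counting distinct last-two-label base domains, whereas a real implementation would use the Public Suffix List.

```python
def naive_base_domain(host: str) -> str:
    """Last two labels of a host name (naive stand-in for a PSL lookup)."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def san_risk(sans: list[str], age_days: int, issuer: str) -> str:
    """Classify a SAN list as 'none', 'suspicious', or 'strong'."""
    has_wildcard = any(name.startswith("*.") for name in sans)
    unrelated = {naive_base_domain(name) for name in sans}
    if has_wildcard or age_days >= 30 or not 10 <= len(unrelated) <= 100:
        return "none"
    # The pattern is weighted more heavily when the issuer is Let's Encrypt.
    return "strong" if "let's encrypt" in issuer.lower() else "suspicious"
```

Counting base domains rather than raw entries keeps a certificate with dozens of sub-domains of a single site from tripping the rule.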

#### Counter‑example: Steam

  • steampowered.com → 48 SAN entries, Let’s Encrypt DV, no wildcard.
  • Immediate redirect to store.steampowered.com → EV cert, only 2 SANs.

Because the chain lands on a stronger, scoped EV certificate, LegitURL waives the penalty for the first hop: context overrules the raw SAN count.

6.2 LLM interpretation – “portal-finance.net”

How Claude Sonnet 4 explained the risk from the raw JSON export

One real-world example involved the domain portal-finance.net:

  • The model correctly:
    • Flagged the domain as visually and phonetically similar to “binance”, using the app’s Levenshtein and n-gram results
    • Identified the external file /hidden/payload.js as suspicious due to its naming
      • This is exactly what LegitURL was designed to highlight — the model connected filename patterns to real threats
    • Noted the absence of CSP, HSTS, and SRI headers
    • Connected the TLS issue date (same-day issuance) to likely phishing deployment

All without relying on brand blocklists — purely via behavioral pattern recognition.

This confirmed that the combination of exported JSON + well-primed instructions can enable GPT-4o, Gemini, Claude or DeepSeek to produce a strong, accurate risk explanation from raw technical input. While the priming is still refined manually, the balance between guiding and overconstraining the model remains a key challenge.

The model’s ability to leverage background knowledge is exceptionally powerful. Prior attempts to detect such threats relied on static brand lists, dictionary lookups, or entropy checks — all of which struggled against infinite variations. This new pipeline offers a more flexible and adaptive path forward.

As of now, LegitURL measures JSON export efficiency not only by minimizing token size (still a work in progress), but also by evaluating the cognitive load placed on the model, that is, how easily it can extract structure, identify key signals, and reason across findings.

LegitURL remains a fully offline local tool, but this JSON export adds a new layer: enabling models to provide meaningful insight without deep technical parsing. Suspicious inline script snippets are extracted, structured, and passed in clean blocks — letting models focus on interpretation, not AST traversal.

Claude Sonnet 4:

Looking at the technical behavior of portal-finance.net, I see several concerning signals that suggest this site should be approached with extreme caution.

## Critical Red Flags

**Domain Impersonation**: The domain "portal-finance.net" appears designed to mimic Binance, one of the world's largest cryptocurrency exchanges. The analysis shows "FINANCE" has a Levenshtein distance of only 1 from "binance" and 71% structural similarity. This is a classic impersonation tactic used by scammers to trick users into thinking they're visiting a legitimate financial platform.

**Brand New Certificate**: The TLS certificate was issued just yesterday (May 29, 2025) - literally 0 days old when tested. Legitimate financial services don't typically launch with same-day certificates. This suggests the site was created very recently, which is common with scam operations that get shut down and recreate quickly.

**Suspicious Script Loading**: The site loads an external script from a relative path `./assets/hidden/payload.js` - the name "payload" combined with being in a "hidden" directory is highly suspicious. Legitimate sites don't typically name their JavaScript files "payload" or hide them in directories with such obvious names. This could contain malicious code.

## Security Hygiene Issues

The site exhibits poor security practices across multiple areas:

- **Missing Security Headers**: No Content Security Policy, X-Content-Type-Options, Referrer-Policy, or HSTS headers. This creates vulnerabilities and suggests the operators don't follow security best practices.
- **No Script Integrity**: External scripts lack Subresource Integrity (SRI) hashes, meaning if the jQuery CDN or the local server were compromised, malicious code could be injected.
- **Aggressive Caching Prevention**: The "no-store" cache control header prevents any caching, which while not inherently malicious, is unusual for static content and could indicate frequently changing content.

## Behavioral Context

The inline JavaScript appears benign (clipboard copying functionality), but this could be a facade. The real concern is the external "payload.js" script that wasn't analyzed but has a highly suspicious name and location.

## Trust Assessment

This combination of signals - domain impersonation of a major crypto exchange, same-day certificate issuance, suspicious script naming, and poor security hygiene - creates a pattern consistent with cryptocurrency scam sites. These operations typically create convincing-looking finance portals to steal login credentials or crypto wallet information.

A cautious user should avoid this site entirely. The technical behavior suggests it was hastily constructed to impersonate Binance, likely for fraudulent purposes. The risks far outweigh any potential legitimate use case.

7 · Architecture & Dependencies

LegitURL is built almost entirely on Swift Foundation.
Only two third‑party libraries are used:

| Dependency | Purpose | License |
|---|---|---|
| ASN1Decoder | Decodes X.509 certificates (CN, SAN, issuer, extensions). | MIT |
| PunycodeSwift | Converts IDNs to ASCII (ACE) for uniform comparisons. | MIT |

Internal reference lists

| Dataset | Used for | Editable | Stored as |
|---|---|---|---|
| Mozilla PSL | Domain split, mixed-script check | | SQLite |
| Brand whitelist | Brand-spoof detection | | JSON |
| Scam keywords | Sub-domain & path heuristics | | JSON |
| Risky JS APIs | Inline-script scan | | Swift array |

Lists load at startup and can update in-app without a new binary.

Matching & scanning strategy

  • Single O(n) sweep keeps total run time ≈ 2 s.
  • String checks: .contains on case-folded ASCII.
  • Byte lexer: skips whitespace/LF; extracts tags & JS tokens without a full AST.

Lamai decode tree

  1. Base64 (auto-pad)
  2. URL / Unicode escapes
  3. Branch on each decode (depth ≤ 5)
  4. Node tests: brand/scam hits, UUID, IP, e-mail, nested URL, JSON shape
  5. If branch is high-entropy junk → prune

Decoded URLs re-enter the offline queue, ensuring nested phishing links score like any top-level target.

8 · License & Contribution

LegitURL is distributed under the GNU AGPL v3 license.
You may use, study, and modify the code—even in commercial projects—but any derivative that runs over a network must publish its own source under the same license.
For the full terms see LICENSE or the AGPL FAQ.

How to Contribute

Pull requests are welcome if you find a bug, spot a false positive, or want to add a new heuristic.

Feel free to open an issue or email me if you spot something weird or want to help.