webappanalyzer

February 16, 2025


Note

This project is a continuation of the iconic Wappalyzer that went private in August 2023.

First and foremost, Enthec commits to never making this repository private, since doing so would fall outside the scope of the company's business.

Our interest is to keep it growing, so it can be helpful to the community as it has been until now.

No changes to the library are expected. We will keep updating it using the same JSON structure currently in use, so the user experience will not change.

Specification

A long list of regular expressions is used to identify technologies on web pages. Wappalyzer inspects HTML code, as well as JavaScript variables, response headers and more.

Patterns (regular expressions) are kept in src/technologies/. The following is an example of an application fingerprint.

Example

{
  "Example": {
    "description": "A short description of the technology.",
    "cats": [
      1
    ],
    "cookies": {
      "cookie_name": "Example"
    },
    "dom": {
      "#example-id": {
        "exists": "",
        "attributes": {
          "class": "example-class"
        },
        "properties": {
          "example-property": ""
        },
        "text": "Example text content"
      }
    },
    "dns": {
      "MX": [
        "example\\.com"
      ]
    },
    "icon": "Example.svg",
    "cpe": "cpe:2.3:a:example:example:*:*:*:*:*:*:*:*",
    "js": {
      "Example.method": ""
    },
    "excludes": [
      "Example"
    ],
    "headers": {
      "X-Powered-By": "Example"
    },
    "text": [
      "\\bexample\\b"
    ],
    "css": [
      "\\.example-class"
    ],
    "robots": [
      "Disallow: /unique-path/"
    ],
    "implies": [
      "PHP\\;confidence:50"
    ],
    "requires": [
      "WordPress"
    ],
    "requiresCategory": [
      6
    ],
    "meta": {
      "generator": "(?:Example|Another Example)"
    },
    "probe": {
      "/path": ""
    },
    "scriptSrc": [
      "example-([0-9.]+)\\.js\\;confidence:50\\;version:\\1"
    ],
    "scripts": [
      "function webpackJsonpCallback\\(data\\) {"
    ],
    "url": [
      "example\\.com"
    ],
    "xhr": [
      "example\\.com"
    ],
    "oss": true,
    "saas": true,
    "pricing": [
      "mid",
      "freemium"
    ],
    "website": "https://example.com",
    "certIssuer": "Example"
  }
}
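
As a rough illustration of how a consumer might apply such a fingerprint, the sketch below checks only the headers field against a set of response headers. The function name match_headers and the sample data are hypothetical, and Python's re module stands in for the JavaScript regex engine the patterns are written for.

```python
import re

def match_headers(fingerprint: dict, response_headers: dict) -> bool:
    """Return True if any `headers` pattern in the fingerprint matches."""
    # Header names are compared case-insensitively, as in HTTP.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    for name, pattern in fingerprint.get("headers", {}).items():
        value = lowered.get(name.lower())
        if value is None:
            continue
        # Drop any tags (everything after "\;"); an empty pattern matches any value.
        regex = pattern.split("\\;")[0]
        if re.search(regex, value, re.IGNORECASE):
            return True
    return False

fingerprint = {"headers": {"X-Powered-By": "Example"}}
print(match_headers(fingerprint, {"x-powered-by": "Example/1.2"}))  # True
print(match_headers(fingerprint, {"Server": "nginx"}))              # False
```

A real detector would evaluate every pattern field and accumulate confidence; this sketch stops at the first header match.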

JSON fields

Find the JSON schema at schema.json.

Required properties


| Field | Type | Description | Example |
| --- | --- | --- | --- |
| cats | []int | Category ids | [1, 6] |
| website | string | URL of the application's website | "https://example.com" |
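
A quick pre-submission check for the two required fields could look like the following sketch. The helper name missing_required is hypothetical, and schema.json remains the authoritative definition; this only tests presence and basic types.

```python
def missing_required(entry: dict) -> list:
    """Return the names of required fields that are absent or of the wrong type."""
    problems = []
    cats = entry.get("cats")
    # bool is a subclass of int in Python, so exclude it explicitly.
    cats_ok = isinstance(cats, list) and all(
        isinstance(c, int) and not isinstance(c, bool) for c in cats
    )
    if not cats_ok:
        problems.append("cats")
    if not isinstance(entry.get("website"), str):
        problems.append("website")
    return problems

print(missing_required({"cats": [1, 6], "website": "https://example.com"}))  # []
print(missing_required({"cats": "1"}))  # ['cats', 'website']
```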

Optional properties


Base

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| description | string | A short description of the technology | "short description" |
| icon | string | Application icon filename | "Example.svg" |
| cpe | string | Application v2.3 CPE | "cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*" |
| saas | boolean | Software As A Service | true |
| oss | boolean | Open Source Software | true |
| pricing | Pricing | Cost indicator | ["low", "freemium"] |

Implies, requires and excludes

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| implies | []string | The presence of one application can imply the presence of another | ["PHP"] |
| requires | []string | Similar to implies but detection only runs if the required technology has been identified | ["WordPress"] |
| excludes | []string | The presence of one application can exclude the presence of another | ["Apache"] |
| requiresCategory | []int | Similar to requires, but with category ID | [6] |
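
One way to read implies and excludes is as a post-processing step over the set of detected technologies. The sketch below is an assumption about how a consumer might do this, not the project's own resolver: it expands implies transitively, then removes anything named in excludes. The function resolve and the sample fingerprints are hypothetical, and requires / requiresCategory are not covered.

```python
def resolve(detected: set, fingerprints: dict) -> set:
    """Expand `implies` transitively, then drop anything listed in `excludes`."""
    result = set(detected)
    queue = list(detected)
    while queue:
        name = queue.pop()
        for implied in fingerprints.get(name, {}).get("implies", []):
            # Strip tags such as "\;confidence:50" to get the technology name.
            implied_name = implied.split("\\;")[0]
            if implied_name not in result:
                result.add(implied_name)
                queue.append(implied_name)
    excluded = set()
    for name in result:
        for other in fingerprints.get(name, {}).get("excludes", []):
            excluded.add(other.split("\\;")[0])
    return result - excluded

fingerprints = {
    "Example": {"implies": ["PHP\\;confidence:50"], "excludes": ["Apache"]},
    "PHP": {},
    "Apache": {},
}
print(sorted(resolve({"Example", "Apache"}, fingerprints)))  # ['Example', 'PHP']
```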

Patterns

| Field | Type | Description | Regex | Example |
| --- | --- | --- | --- | --- |
| cookies | {string:string} | Cookies | true | {"cookie_name": "Cookie value"} |
| dom | DOM | Query selectors | false | ["img[src*='example']"] |
| dns | {string:[]string} | DNS records | true | {"MX": ["example\\.com"]} |
| js | {string:string} | JavaScript properties | true | {"jQuery.fn.jquery": ""} |
| headers | {string:string} | HTTP response headers | true | {"X-Powered-By": "^WordPress$"} |
| text | []string | Matches plain text | true | ["\\bexample\\b"] |
| css | []string | CSS rules | true | ["\\.example-class"] |
| probe | {string:string} | Request a URL to test for its existence or match text content | false | {"/path": "Example text"} |
| robots | []string | Robots.txt contents | false | ["Disallow: /unique-path/"] |
| url | []string | Full URL of the page | true | ["^https?://.+\\.wordpress\\.com"] |
| xhr | []string | Hostnames of XHR requests | true | ["cdn\\.netlify\\.com"] |
| meta | {string:string} | HTML meta tags | true | {"generator": "^WordPress$"} |
| scriptSrc | []string | URLs of JavaScript files | true | ["jquery\\.js"] |
| scripts | []string | JavaScript source code | true | ["function webpackJsonpCallback\\(data\\) {"] |
| html (deprecated) | []string | HTML source code | true | ["<a [^>]*href=\"index.html"] |
| certIssuer | string | SSL certificate issuer | false | "Let's Encrypt" |

Patterns


Patterns are essentially JavaScript regular expressions written as strings, but with some additions.

Quirks and pitfalls

  • Because of the string format, the escape character itself must be escaped when using special characters such as the dot (\\.). Double quotes must be escaped only once (\"). Slashes do not need to be escaped (/).
  • Flags are not supported. Regular expressions are treated as case-insensitive.
  • Capture groups (()) are used for version detection. In other cases, use non-capturing groups ((?:)).
  • Use start and end of string anchors (^ and $) where possible for optimal performance.
  • Short or generic patterns can cause applications to be identified incorrectly. Try to find unique strings to match.
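
The escaping rule is easiest to see by decoding a pattern and compiling it. The sketch below uses a made-up url pattern and Python's re module as a stand-in for a JavaScript regex engine; the case-insensitive flag is applied by the consumer because patterns cannot carry flags.

```python
import json
import re

# Raw JSON text as it would appear in a technologies file (note the doubled backslash).
raw = r'{"url": ["^https?://.+\\.example\\.com"]}'
patterns = json.loads(raw)["url"]

# After JSON decoding, each "\\." has become the regex escape "\.".
print(patterns[0])  # ^https?://.+\.example\.com

# Flags are not supported in the pattern itself; matching is case-insensitive.
regex = re.compile(patterns[0], re.IGNORECASE)
print(bool(regex.search("HTTPS://BLOG.EXAMPLE.COM/")))  # True
print(bool(regex.search("https://blogXexampleYcom/")))  # False
```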

Tags

Tags (a non-standard syntax) can be appended to patterns, as well as to implies and excludes entries, to store additional information. Each tag is separated from the pattern by \\;.

| Tag | Description | Example |
| --- | --- | --- |
| confidence | Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified | "js": {"Mage": "\\;confidence:50"} |
| version | Gets the version number from a pattern match using a special syntax | "scriptSrc": "jquery-([0-9.]+)\\.js\\;version:\\1" |
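
A minimal sketch of splitting a decoded pattern string into its regex and tags follows. The names parse_pattern and combined_confidence are hypothetical, and summing confidences with a cap of 100 is an assumption about how "combined confidence" is meant, not something this document specifies.

```python
def parse_pattern(raw: str) -> dict:
    """Split a JSON-decoded pattern string into its regex and its tags."""
    # After JSON decoding, the separator "\\;" is the two characters "\;".
    regex, *tags = raw.split("\\;")
    parsed = {"regex": regex, "confidence": 100, "version": ""}
    for tag in tags:
        key, _, value = tag.partition(":")
        if key == "confidence":
            parsed["confidence"] = int(value)
        elif key == "version":
            parsed["version"] = value
    return parsed

def combined_confidence(values: list) -> int:
    """Assumed rule: add up the confidences of matched patterns, capped at 100."""
    return min(100, sum(values))

parsed = parse_pattern("example-([0-9.]+)\\.js\\;confidence:50\\;version:\\1")
print(parsed["confidence"])            # 50
print(combined_confidence([50, 50]))   # 100
```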

Version syntax

Application version information can be obtained from a pattern using a capture group. A condition can be evaluated using the ternary operator (?:).

| Example | Description |
| --- | --- |
| \\1 | Returns the first match |
| \\1?a: | Returns a if the first match contains a value, nothing otherwise |
| \\1?a:b | Returns a if the first match contains a value, b otherwise |
| \\1?:b | Returns nothing if the first match contains a value, b otherwise |
| foo\\1 | Returns foo with the first match appended |
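
The rows above can be reproduced with a small resolver. This is a sketch written from the table, not the project's implementation: resolve_version and the sample regex are hypothetical, and templates are shown after JSON decoding, so \\1 in a file is \1 here.

```python
import re

TERNARY = re.compile(r"\\(\d+)\?([^:]*):(.*)$")  # matches \N?a:b
BACKREF = re.compile(r"\\(\d+)")                  # matches \N

def resolve_version(template: str, match: re.Match) -> str:
    """Fill a version template from the capture groups of a pattern match."""
    def group(index: str) -> str:
        i = int(index)
        # Groups that are missing or did not participate count as empty.
        return (match.group(i) or "") if i <= match.re.groups else ""

    ternary = TERNARY.search(template)
    if ternary:
        index, if_set, if_empty = ternary.groups()
        chosen = if_set if group(index) else if_empty
        template = template[:ternary.start()] + chosen
    return BACKREF.sub(lambda m: group(m.group(1)), template)

match = re.search(r"example-([0-9.]+)?\.js", "example-1.2.3.js")
print(resolve_version("\\1", match))       # 1.2.3
print(resolve_version("\\1?a:b", match))   # a
print(resolve_version("foo\\1", match))    # foo1.2.3
```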

Types

DOM

The DOM data type can be either:

  • []string: list of query selectors

  • JSON Object: the key is the query selector and the value is an object with the following structure:

    • value requirements:
      1. {"attributes": {string: pattern}}
        • pattern can be a regex
        • pattern is compatible with tags
        • example: {"attributes": {"href": "pattern", "src": "pattern"}}
      2. {"properties": {string: pattern}}
        • pattern can be a regex
        • pattern is compatible with tags
        • example: {"properties": {"example-property": "pattern"}}
      3. {"text": pattern}
        • pattern can be a regex
        • pattern is compatible with tags
      4. {"exists": ""}
        • value is an empty string
        • empty string is compatible with tags
// example []string
{
  "dom": ["img[src*='example']", "form[action*='example.com/forms/']"]
}
// example JSON Object
{
  "dom": {
    "link[href*='fonts.g']": {
      "attributes": {
        "href": "fonts\\.(?:googleapis|google|gstatic)\\.com"
      },
      "properties": {
        "container": ""
      }, 
      "text": "GLPI\\s+version\\s+([\\d\\.]+)\\;version:\\1"
    },
    "style[data-href*='fonts.g']": {
      "attributes": {
        "data-href": "fonts\\.(?:googleapis|google|gstatic)\\.com"
      },
      "exists": "\\;confidence:50"
    }
  }
}
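
Because dom accepts two shapes, a consumer may want to normalize it before matching. The sketch below assumes that a bare selector in the list form means "this element exists", which mirrors the exists check of the object form; normalize_dom is a hypothetical helper.

```python
def normalize_dom(dom) -> dict:
    """Return {selector: checks} for either accepted shape of `dom`."""
    if isinstance(dom, list):
        # Assumption: a bare selector only asserts that the element exists.
        return {selector: {"exists": ""} for selector in dom}
    if isinstance(dom, dict):
        return dom
    raise TypeError("dom must be a list of selectors or an object")

print(normalize_dom(["img[src*='example']"]))
# {"img[src*='example']": {'exists': ''}}
```

After normalization, every entry can be handled by one code path that looks for the attributes, properties, text and exists keys.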

Pricing

Cost indicator (based on a typical plan or average monthly price) and available pricing models. For paid products only.

One of:

  • low: Less than US $100/mo
  • mid: Between US $100-$1,000/mo
  • high: More than US $1,000/mo

Plus any of:

  • freemium: Free plan available
  • onetime: One-time payments accepted
  • recurring: Subscriptions available
  • poa: Price on asking
  • payg: Pay as you go (e.g. commissions or usage-based fees)
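
The "one of, plus any of" rule can be checked mechanically. In this sketch the helper is_valid_pricing is hypothetical, and treating the cost tier as optional (at most one) is an assumption; adjust it if schema.json requires a tier.

```python
TIERS = {"low", "mid", "high"}
MODELS = {"freemium", "onetime", "recurring", "poa", "payg"}

def is_valid_pricing(pricing: list) -> bool:
    """Accept at most one cost tier plus any number of distinct pricing models."""
    values = set(pricing)
    if len(values) != len(pricing):
        return False  # duplicate entries
    if values - TIERS - MODELS:
        return False  # unknown value
    return len(values & TIERS) <= 1

print(is_valid_pricing(["mid", "freemium"]))  # True
print(is_valid_pricing(["low", "high"]))      # False
```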