SiteOne Crawler: JSON Output Documentation

March 16, 2026

This document describes the structure and content of the JSON output file generated by the SiteOne Crawler. This JSON file contains detailed information about the crawled website, including metadata about the crawl process, results for each visited URL, quality scores, summary findings, and various analysis tables.

1. Introduction

The JSON output provides a comprehensive dataset about the crawled website. Key information includes:

  • Crawl Metadata: Details about the crawler execution, such as version, execution time, command used, hostname, and the final user agent.
  • Options: A complete record of all CLI configuration values used for the crawl.
  • Quality Scores: Overall and per-category quality scores (0-10) with deduction details.
  • Visited URL Results: For each URL visited during the crawl:
    • URL address
    • HTTP status code
    • Elapsed time for the request (performance)
    • Size of the response body
    • Content type (HTML, CSS, JS, Image, etc.)
    • Caching information (cache flags, lifetime)
    • Additional analysis results stored in the extras field.
  • Stats: Aggregate statistics about the crawl (total URLs, sizes, timings, status code counts).
  • Summary: A list of findings (OK, Warning, Critical, Info) that feed into quality scoring.
  • Analysis Tables: Aggregated data and specific findings presented in structured tables:
    • Skipped URLs: Reasons why certain URLs were not crawled (e.g., external domain, disallowed by robots.txt, specific rules).
    • Redirects: List of URLs that resulted in redirects (3xx status codes).
    • 404 Errors: List of URLs that resulted in a 404 Not Found status.
    • SSL/TLS Info: Details about the website's SSL certificate (issuer, subject, validity dates, supported protocols).
    • Performance: Tables listing the fastest and slowest URLs encountered during the crawl.
    • SEO & Content:
      • SEO metadata (title, description, keywords, H1, indexing directives) for HTML pages.
      • OpenGraph and Twitter Card metadata.
      • Heading structure analysis (correctness of H1-H6 hierarchy).
      • Analysis of non-unique titles and descriptions across pages.
    • Technical Details:
      • HTTP Headers: Summary of headers found, their occurrences, and unique values.
      • Caching Analysis: Breakdown of caching strategies by content type and domain.
      • DNS Information: DNS resolution details for the target domain.
      • Security Analysis: Evaluation of security-related HTTP headers.
      • External URLs: List of external URLs discovered during the crawl.
    • Crawler Statistics: Performance metrics for the crawler itself, individual analyzers, and content processors.

2. Potential Use Cases

The detailed data within the JSON output enables a wide variety of use cases:

  1. Comprehensive SEO Audits: Analyze titles, descriptions, heading structures, indexing status, and OpenGraph tags across the entire site.
  2. Performance Monitoring & Optimization: Identify the slowest pages and resources, analyze load times, and check caching headers.
  3. Broken Link Checking: Easily extract lists of all 404 errors and the pages where they were found.
  4. Redirect Chain Analysis: Identify and analyze redirect chains.
  5. Security Header Audits: Verify the implementation of crucial security headers (CSP, HSTS, X-Frame-Options, etc.) across the site.
  6. Content Inventory & Analysis: Get a list of all crawled resources, their types, sizes, and status codes. Analyze content type distribution.
  7. Website Archiving/Cloning: While the crawler has a dedicated offline export, the JSON contains the list of all discovered resources, which could inform a custom archiving process.
  8. Competitive Analysis: Run the crawler on competitor sites (respecting their robots.txt) to gather insights into their structure, performance, and technology.
  9. CI/CD Integration: Integrate the crawler into deployment pipelines to automatically check for new errors (404s, performance regressions) after deployments. Use quality scores and thresholds for automated pass/fail decisions.
  10. Technical Debt Assessment: Identify outdated practices, missing security headers, or performance issues that need addressing.

3. Detailed JSON Structure

The JSON output has 8 top-level keys:

3.1. crawler (Object)

Contains metadata about the crawler execution:

  • name (String): Name of the crawler software.
  • version (String): Version of the crawler.
  • executedAt (String): Timestamp when the crawl was executed, in the format "YYYY-MM-DD HH:MM:SS" (space separator, no timezone). Example: "2026-03-16 14:55:13".
  • command (String): The command-line arguments used to run the crawl.
  • hostname (String): The hostname where the crawler was run.
  • finalUserAgent (String): The User-Agent string used for the HTTP requests.
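
Because executedAt carries no timezone, it parses to a naive datetime. A minimal Python sketch, assuming the documented "YYYY-MM-DD HH:MM:SS" format holds; the helper name parse_executed_at is illustrative and not part of the crawler:

```python
from datetime import datetime


def parse_executed_at(value: str) -> datetime:
    """Parse crawler.executedAt into a naive datetime (no timezone attached)."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Example value taken from the field description above.
executed = parse_executed_at("2026-03-16 14:55:13")
print(executed.isoformat())  # 2026-03-16T14:55:13
```

Which timezone the timestamp refers to is not stated in the output, so any conversion to an aware datetime is a decision for the consumer.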

3.2. extraColumnsFromAnalysis (Array)

An array of objects defining extra columns that might be added during specific analyses. These are primarily intended for augmenting report outputs. Each object contains:

  • name (String): The display name of the column.
  • length (Integer): Suggested display length/width.
  • truncate (Boolean): Whether the content should be truncated if it exceeds the length.
  • customMethod, customPattern, customGroup (String | null): Fields used for custom data extraction logic (null when not configured).

3.3. options (Object)

A flat object containing all 132 CLI configuration values used for the crawl. Every option from the command line (or its default value) is recorded here. Keys are the option names in camelCase (e.g., url, workers, maxReqsPerSec, timeout, outputType, userAgent, acceptEncoding, etc.). Values are strings, integers, booleans, or null, depending on the option type.

This is useful for reproducing a crawl or understanding the exact configuration that produced the results.

3.4. qualityScores (Object)

Contains overall and per-category quality scores computed after analysis.

  • overall (Object): The aggregate quality score.

    • score (Float): Overall score from 0.0 to 10.0.
    • label (String): Human-readable label (e.g., "A+", "A", "B", "C", "D", "F").
    • weight (Float): Total weight (1.0 for overall).
    • deductions (Array): Array of objects, each with:
      • points (Float): Number of points deducted.
      • reason (String): Explanation for the deduction.
  • categories (Array): Array of 5 category objects, each with:

    • code (String): Category identifier. One of: "performance", "seo", "security", "accessibility", "bestPractices".
    • name (String): Human-readable category name.
    • score (Float): Category score from 0.0 to 10.0.
    • label (String): Human-readable label.
    • weight (Float): Weight of this category in the overall score (e.g., 0.20 for SEO, 0.25 for Security).
    • deductions (Array): Array of deduction objects (same structure as overall deductions).
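
The score fields lend themselves to a simple pass/fail gate, for example in a deployment pipeline. The sketch below is one possible way to do it in Python; the function names, thresholds, and sample values are all made up for illustration, and the sample is trimmed to two categories:

```python
def failing_categories(quality_scores: dict, min_score: float) -> list[str]:
    """Return the codes of categories whose score is below min_score."""
    return [c["code"] for c in quality_scores["categories"] if c["score"] < min_score]


def passes_gate(quality_scores: dict, min_overall: float, min_category: float) -> bool:
    """True when the overall score and every category score meet their thresholds."""
    overall_ok = quality_scores["overall"]["score"] >= min_overall
    return overall_ok and not failing_categories(quality_scores, min_category)


# Made-up sample; scores, labels, and the deduction are illustrative only.
sample_scores = {
    "overall": {"score": 8.1, "label": "B", "weight": 1.0, "deductions": []},
    "categories": [
        {"code": "seo", "name": "SEO", "score": 9.0, "label": "A",
         "weight": 0.20, "deductions": []},
        {"code": "security", "name": "Security", "score": 5.5, "label": "D",
         "weight": 0.25,
         "deductions": [{"points": 4.5, "reason": "illustrative deduction"}]},
    ],
}

print(failing_categories(sample_scores, 7.0))  # ['security']
print(passes_gate(sample_scores, 7.0, 7.0))    # False
```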

3.5. results (Array)

An array of objects, where each object represents a single visited URL.

  • url (String): The absolute URL that was visited.
  • status (String): The HTTP status code returned (e.g., "200", "404").
  • elapsedTime (Float): Time taken to fetch the URL in seconds (e.g., 0.005).
  • size (Integer): Size of the response body in bytes (e.g., 50961).
  • type (Integer): An enum representing the detected content type:
    • 1: HTML
    • 2: JavaScript
    • 3: CSS
    • 4: Image
    • 7: Document (e.g., robots.txt)
    • 8: JSON
    • Other types may exist (Audio, Font, Video, XML, Redirect, Other).
  • cacheTypeFlags (Integer): Bitmask representing detected caching mechanisms (e.g., Cache-Control, ETag, Last-Modified). For example, 31 typically means Cache-Control + ETag + Last-Modified are all present. 32768 might indicate no caching headers found.
  • cacheLifetime (Integer): Cache lifetime in seconds derived from Cache-Control: max-age or Expires header. 0 if no lifetime could be determined.
  • extras (Array): Contains additional data from specific analyzers run on this URL. Typically an empty array [].
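
As a sketch of how these fields might be consumed, the Python below maps the documented type values to names and lists HTML pages slower than a threshold. Only the six type values listed above are mapped; the helper names, URLs, and numbers in the sample are assumptions for illustration:

```python
# Only the values documented above; other types exist but their numbers are not listed here.
KNOWN_TYPES = {1: "HTML", 2: "JavaScript", 3: "CSS", 4: "Image", 7: "Document", 8: "JSON"}


def type_name(type_id: int) -> str:
    """Map a results[].type value to a readable name, flagging unmapped values."""
    return KNOWN_TYPES.get(type_id, f"Unknown ({type_id})")


def slow_html_pages(results: list[dict], threshold_s: float) -> list[str]:
    """Return URLs of HTML results slower than threshold_s, slowest first."""
    pages = [r for r in results if r["type"] == 1 and r["elapsedTime"] > threshold_s]
    return [r["url"] for r in sorted(pages, key=lambda r: r["elapsedTime"], reverse=True)]


# Made-up sample rows in the documented shape.
sample_results = [
    {"url": "https://example.com/", "status": "200", "elapsedTime": 0.005, "size": 50961,
     "type": 1, "cacheTypeFlags": 31, "cacheLifetime": 3600, "extras": []},
    {"url": "https://example.com/slow", "status": "200", "elapsedTime": 1.2, "size": 1200,
     "type": 1, "cacheTypeFlags": 32768, "cacheLifetime": 0, "extras": []},
    {"url": "https://example.com/app.js", "status": "200", "elapsedTime": 2.0, "size": 900,
     "type": 2, "cacheTypeFlags": 31, "cacheLifetime": 3600, "extras": []},
]

print(slow_html_pages(sample_results, 1.0))  # ['https://example.com/slow']
print(type_name(sample_results[2]["type"]))  # JavaScript
```

Note that status is a string in this array, so comparisons such as status == 200 would never match.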

3.6. stats (Object)

Aggregate statistics about the entire crawl:

  • totalUrls (Integer): Total number of URLs visited.
  • totalSize (Integer): Total size of all responses in bytes.
  • totalSizeFormatted (String): Human-readable formatted total size (e.g., "31.33 MB").
  • totalExecutionTime (Float): Total wall-clock execution time in seconds.
  • totalRequestsTimes (Float): Sum of all individual request times in seconds.
  • totalRequestsTimesAvg (Float): Average request time in seconds.
  • totalRequestsTimesMin (Float): Minimum request time in seconds.
  • totalRequestsTimesMax (Float): Maximum request time in seconds.
  • countByStatus (Object): An object mapping HTTP status codes to counts. Keys are status code strings (e.g., "200", "404", "429"), values are integers. Only status codes that were actually encountered appear as keys.
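
Since countByStatus uses string keys, the keys need converting before numeric comparison. A small Python sketch that computes the share of responses with a status of 400 or higher; the function name and the sample counts are made up, and keys that do not parse as integers are simply skipped:

```python
def error_share(stats: dict) -> float:
    """Fraction of counted responses whose status code is 400 or higher."""
    counts = stats["countByStatus"]
    total = sum(counts.values())
    errors = 0
    for code, n in counts.items():
        try:
            if int(code) >= 400:
                errors += n
        except ValueError:
            # Key is not an integer string; ignore it in this sketch.
            pass
    return errors / total if total else 0.0


# Made-up sample containing only the field used here.
sample_stats = {"countByStatus": {"200": 70, "404": 2, "429": 1}}
print(round(error_share(sample_stats), 4))  # 0.0411
```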

3.7. summary (Object)

Contains a list of summary findings that feed into quality scoring.

  • items (Array): Array of finding objects, each with:
    • aplCode (String): A unique code identifying the finding (e.g., "s201", "s404", "s502").
    • status (String): Severity level. One of: "CRITICAL", "WARNING", "OK", "INFO".
    • text (String): Human-readable description of the finding (e.g., "Brotli is supported for HTML", "1 URL(s) returned a 404 status code").

3.8. tables (Object)

An object where each key is a table identifier (e.g., skipped-summary, 404, seo) and the value is an object describing that table. Each table object contains:

  • aplCode (String): A unique code for the table.
  • title (String): A human-readable title for the table.
  • columns (Object): An object describing the columns of the table. Each key is a column identifier (e.g., reason, url, statusCode). The value is an object detailing the column:
    • aplCode (String): Unique code for the column.
    • name (String): Display name for the column header.
    • width (Integer): Suggested display width (-1 might mean auto).
    • formatter (Object | null): Defines how the data should be formatted (e.g., adding units like 'ms' or 'kB'). Empty object {} indicates default formatting.
    • renderer (Object | null): Defines how the data should be rendered (e.g., adding color or links). Empty object {} indicates default rendering.
    • truncateIfLonger (Boolean): Whether to truncate the value if it exceeds the width.
    • Other fields like formatterWillChangeValueLength, nonBreakingSpaces, escapeOutputHtml, getDataValueCallback, forcedDataType provide more hints for rendering.
  • rows (Array): An array of objects, where each object represents a row in the table. The keys in each row object correspond to the column identifiers defined in columns. Important: All values in all table rows are strings, regardless of whether the data represents a number, count, or other type. For example, a count of 51 appears as "51", a request time of 0.003 appears as "0.003", and an empty value appears as "". Rows may also contain extra keys beyond the declared columns (see individual table descriptions for details).
  • position (String): A hint about where this table should typically be positioned in a report (e.g., before-url-table, after-url-table).

Note: The specific content and structure within tables depend on the analyzers enabled during the crawl. The set of tables may vary depending on what data was encountered (e.g., certificate-info only appears for HTTPS sites).
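
Because every row value is a string, consumers usually want to convert numeric-looking cells before sorting or summing. A minimal Python sketch of such a conversion; the helper names are illustrative, and the rule (try int, then float, otherwise keep the string) is a simplification that may convert values you would rather keep as text, such as status codes:

```python
def coerce(value: str):
    """Return int or float when the cell parses cleanly, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def typed_rows(table: dict) -> list[dict]:
    """Apply coerce() to every cell of every row in a table object."""
    return [{key: coerce(cell) for key, cell in row.items()} for row in table["rows"]]


# Made-up table fragment; only the rows key is used here.
sample_table = {"rows": [{"url": "/docs", "count": "51", "requestTime": "0.003", "note": ""}]}
print(typed_rows(sample_table))
# [{'url': '/docs', 'count': 51, 'requestTime': 0.003, 'note': ''}]
```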

4. JSON Schema (Draft)

This is a draft JSON schema based on the actual output. It may need refinement for edge cases.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "SiteOne Crawler JSON Output",
  "description": "Schema for the JSON output file generated by SiteOne Crawler.",
  "type": "object",
  "properties": {
    "crawler": {
      "description": "Metadata about the crawler execution.",
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "version": { "type": "string" },
        "executedAt": { "type": "string", "description": "Format: YYYY-MM-DD HH:MM:SS" },
        "command": { "type": "string" },
        "hostname": { "type": "string" },
        "finalUserAgent": { "type": "string" }
      },
      "required": ["name", "version", "executedAt", "command", "hostname", "finalUserAgent"]
    },
    "extraColumnsFromAnalysis": {
      "description": "Definitions for extra columns used in analyses.",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "length": { "type": "integer" },
          "truncate": { "type": "boolean" },
          "customMethod": { "type": ["string", "null"] },
          "customPattern": { "type": ["string", "null"] },
          "customGroup": { "type": ["string", "null"] }
        },
        "required": ["name", "length", "truncate"]
      }
    },
    "options": {
      "description": "All CLI configuration values used for the crawl.",
      "type": "object",
      "additionalProperties": true
    },
    "qualityScores": {
      "description": "Overall and per-category quality scores.",
      "type": "object",
      "properties": {
        "overall": {
          "type": "object",
          "properties": {
            "score": { "type": "number" },
            "label": { "type": "string" },
            "weight": { "type": "number" },
            "deductions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "points": { "type": "number" },
                  "reason": { "type": "string" }
                },
                "required": ["points", "reason"]
              }
            }
          },
          "required": ["score", "label", "weight", "deductions"]
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "code": { "type": "string", "enum": ["performance", "seo", "security", "accessibility", "bestPractices"] },
              "name": { "type": "string" },
              "score": { "type": "number" },
              "label": { "type": "string" },
              "weight": { "type": "number" },
              "deductions": {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "points": { "type": "number" },
                    "reason": { "type": "string" }
                  },
                  "required": ["points", "reason"]
                }
              }
            },
            "required": ["code", "name", "score", "label", "weight", "deductions"]
          }
        }
      },
      "required": ["overall", "categories"]
    },
    "results": {
      "description": "Array of results for each visited URL.",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": { "type": "string", "format": "uri" },
          "status": { "type": "string" },
          "elapsedTime": { "type": "number" },
          "size": { "type": "integer" },
          "type": { "type": "integer", "description": "Enum for content type (1:HTML, 2:JS, 3:CSS, 4:Image, 7:Document, 8:JSON, ...)" },
          "cacheTypeFlags": { "type": "integer", "description": "Bitmask for caching mechanisms" },
          "cacheLifetime": { "type": "integer", "description": "Cache lifetime in seconds, 0 if undetermined" },
          "extras": {
            "type": "array",
            "description": "Additional analysis data for this URL (typically empty)"
          }
        },
        "required": ["url", "status", "elapsedTime", "size", "type", "cacheTypeFlags", "cacheLifetime", "extras"]
      }
    },
    "stats": {
      "description": "Aggregate crawl statistics.",
      "type": "object",
      "properties": {
        "totalUrls": { "type": "integer" },
        "totalSize": { "type": "integer" },
        "totalSizeFormatted": { "type": "string" },
        "totalExecutionTime": { "type": "number" },
        "totalRequestsTimes": { "type": "number" },
        "totalRequestsTimesAvg": { "type": "number" },
        "totalRequestsTimesMin": { "type": "number" },
        "totalRequestsTimesMax": { "type": "number" },
        "countByStatus": {
          "type": "object",
          "additionalProperties": { "type": "integer" }
        }
      },
      "required": ["totalUrls", "totalSize", "totalSizeFormatted", "totalExecutionTime", "totalRequestsTimes", "totalRequestsTimesAvg", "totalRequestsTimesMin", "totalRequestsTimesMax", "countByStatus"]
    },
    "summary": {
      "description": "Summary findings that feed into quality scoring.",
      "type": "object",
      "properties": {
        "items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "aplCode": { "type": "string" },
              "status": { "type": "string", "enum": ["CRITICAL", "WARNING", "OK", "INFO"] },
              "text": { "type": "string" }
            },
            "required": ["aplCode", "status", "text"]
          }
        }
      },
      "required": ["items"]
    },
    "tables": {
      "description": "Aggregated analysis results presented as tables.",
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "aplCode": { "type": "string" },
          "title": { "type": "string" },
          "columns": {
            "type": "object",
            "additionalProperties": {
              "type": "object",
              "properties": {
                "aplCode": { "type": "string" },
                "name": { "type": "string" },
                "width": { "type": "integer" },
                "formatter": { "type": ["object", "null"] },
                "renderer": { "type": ["object", "null"] },
                "truncateIfLonger": { "type": "boolean" }
              },
              "required": ["aplCode", "name", "width"]
            }
          },
          "rows": {
            "type": "array",
            "items": {
              "type": "object",
              "description": "All row values are strings. Rows may contain extra keys beyond the declared columns.",
              "additionalProperties": { "type": "string" }
            }
          },
          "position": { "type": "string", "enum": ["before-url-table", "after-url-table"] }
        },
        "required": ["aplCode", "title", "columns", "rows", "position"]
      }
    }
  },
  "required": ["crawler", "extraColumnsFromAnalysis", "options", "qualityScores", "results", "stats", "summary", "tables"]
}
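
For a quick sanity check without a JSON Schema validator, the top-level required list above can be compared against a parsed document. A minimal sketch using only the Python standard library; the function name is illustrative, and it checks key presence only, not the nested structure:

```python
import json

# Mirrors the top-level "required" list of the draft schema.
REQUIRED_KEYS = {
    "crawler", "extraColumnsFromAnalysis", "options", "qualityScores",
    "results", "stats", "summary", "tables",
}


def missing_top_level_keys(document: dict) -> set[str]:
    """Return the required top-level keys that are absent from the document."""
    return REQUIRED_KEYS - set(document)


# Deliberately incomplete input to show what the check reports.
partial = json.loads('{"crawler": {}, "results": [], "tables": {}}')
print(sorted(missing_top_level_keys(partial)))
# ['extraColumnsFromAnalysis', 'options', 'qualityScores', 'stats', 'summary']
```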

5. Analysis Tables Description (tables key)

This section details the structure and columns of each table found under the tables key in the JSON output.

Important note on data types: All values in all table rows are strings. Numeric values such as counts, times, and sizes are serialized as strings (e.g., "51" not 51, "0.003" not 0.003). Empty values appear as "". This applies to every table described below. Where column descriptions say "count" or "time", the value is still a string representation of that number.

Some tables include extra row keys beyond the declared columns. These are noted in the individual table descriptions.

5.1. skipped-summary (Skipped URLs Summary)

Provides a summary of skipped URLs grouped by domain and reason.

| Column | Description |
| --- | --- |
| reason | A human-readable string describing why URLs from this domain were skipped (e.g., "Not allowed host", "Blocked by robots.txt"). |
| domain | The domain name whose URLs were skipped. |
| count | The number of unique URLs skipped for this domain and reason. |

5.2. skipped (Skipped URLs)

Lists individual URLs that were skipped during the crawl.

| Column | Description |
| --- | --- |
| reason | A human-readable string describing why the URL was skipped (e.g., "Not allowed host", "Blocked by robots.txt", "File extension is not allowed"). |
| url | The URL that was skipped. |
| sourceAttr | A string describing the HTML attribute where the skipped URL was found (e.g., "<a href>", "<link href>", "<script src>"). |
| sourceUqId | The URL path of the page where the skipped URL was discovered (e.g., "/", "/docs/getting-started"). This allows linking back to the source page. |

5.3. redirects (Redirected URLs)

Lists URLs that resulted in an HTTP redirect (3xx status code).

| Column | Description |
| --- | --- |
| statusCode | The specific redirect status code (e.g., "301", "302"). |
| url | The original URL that redirected. |
| targetUrl | The target URL to which the original URL redirected. |
| sourceUqId | URL path of the page where the redirected URL was found. |

5.4. 404 (404 URLs)

Lists URLs that resulted in a "404 Not Found" status code.

| Column | Description |
| --- | --- |
| statusCode | The HTTP status code (typically "404"). |
| url | The URL that resulted in the 404 error. |
| sourceUqId | URL path of the page where the broken URL was found. |
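
To see which pages need fixing, the rows of this table can be grouped by sourceUqId. A short Python sketch, assuming the table object has already been read from tables["404"]; the function name and the sample URLs are made up:

```python
from collections import defaultdict


def broken_links_by_source(table_404: dict) -> dict[str, list[str]]:
    """Map each source page path to the broken URLs that were found on it."""
    by_source: dict[str, list[str]] = defaultdict(list)
    for row in table_404["rows"]:
        by_source[row["sourceUqId"]].append(row["url"])
    return dict(by_source)


# Made-up rows in the documented three-column shape.
sample_404 = {
    "rows": [
        {"statusCode": "404", "url": "https://example.com/old", "sourceUqId": "/"},
        {"statusCode": "404", "url": "https://example.com/gone",
         "sourceUqId": "/docs/getting-started"},
        {"statusCode": "404", "url": "https://example.com/missing.png", "sourceUqId": "/"},
    ]
}

for source, urls in broken_links_by_source(sample_404).items():
    print(source, urls)
```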

5.5. certificate-info (SSL/TLS info)

Provides details about the SSL/TLS certificate of the crawled domain.

| Column | Description |
| --- | --- |
| info | The name of the certificate attribute (e.g., "Issuer", "Subject", "Valid from", "Valid to", "Supported protocols", "RAW certificate output", "RAW protocols output"). |
| value | The value of the corresponding certificate attribute. Always a string. For multi-line values like raw certificate or protocol output, the entire content is a single string with embedded newlines. |

5.6. fastest-urls (TOP fastest URLs)

Lists the URLs with the lowest request times encountered during the crawl.

| Column | Description |
| --- | --- |
| requestTime | The time taken to fetch the URL in seconds (e.g., "0.003"). |
| statusCode | The HTTP status code of the URL (e.g., "200"). |
| url | The URL itself. |

5.7. slowest-urls (TOP slowest URLs)

Lists the URLs with the highest request times encountered during the crawl.

| Column | Description |
| --- | --- |
| requestTime | The time taken to fetch the URL in seconds (e.g., "1.234"). |
| statusCode | The HTTP status code of the URL (e.g., "200"). |
| url | The URL itself. |

5.8. seo (SEO metadata)

Provides SEO-related metadata extracted from HTML pages.

| Column | Description |
| --- | --- |
| urlPathAndQuery | The path and query string of the URL. |
| indexing | A string describing the indexing status (e.g., "index, follow", "noindex, follow"). |
| title | The content of the <title> tag, or empty string if not found. |
| h1 | The content of the first <h1> tag found, or empty string. |
| description | The content of the meta name="description" tag, or empty string. |
| keywords | The content of the meta name="keywords" tag, or empty string. |

Extra row keys (present in each row object but not declared as columns):

  • robotsIndex (String): Whether the page allows indexing (e.g., "1" for index, "0" for noindex).
  • deniedByRobotsTxt (String): Whether the page is denied by robots.txt (e.g., "0" for allowed, "1" for denied).
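
The declared columns and the extra keys can be combined into a simple per-page check. The Python sketch below flags noindex pages and empty titles or descriptions; the function name, the choice of checks, and the sample rows are assumptions for illustration:

```python
def seo_issues(seo_table: dict) -> list[tuple[str, str]]:
    """Return (path, issue) pairs for noindex pages and empty titles or descriptions."""
    issues = []
    for row in seo_table["rows"]:
        path = row["urlPathAndQuery"]
        # robotsIndex is an extra row key and, like all cells, a string.
        if row.get("robotsIndex") == "0":
            issues.append((path, "noindex"))
        if not row["title"]:
            issues.append((path, "missing title"))
        if not row["description"]:
            issues.append((path, "missing description"))
    return issues


# Made-up rows in the documented shape.
sample_seo = {
    "rows": [
        {"urlPathAndQuery": "/", "indexing": "index, follow", "title": "Home",
         "h1": "Welcome", "description": "", "keywords": "",
         "robotsIndex": "1", "deniedByRobotsTxt": "0"},
        {"urlPathAndQuery": "/draft", "indexing": "noindex, follow", "title": "",
         "h1": "Draft", "description": "Draft page", "keywords": "",
         "robotsIndex": "0", "deniedByRobotsTxt": "0"},
    ]
}

print(seo_issues(sample_seo))
# [('/', 'missing description'), ('/draft', 'noindex'), ('/draft', 'missing title')]
```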

5.9. open-graph (OpenGraph metadata)

Provides Open Graph and Twitter Card metadata extracted from HTML pages.

| Column | Description |
| --- | --- |
| urlPathAndQuery | The path and query string of the URL. |
| ogTitle | Content of the og:title meta tag, or empty string. |
| ogDescription | Content of the og:description meta tag, or empty string. |
| ogImage | Content of the og:image meta tag, or empty string. |
| twitterTitle | Content of the twitter:title meta tag, or empty string. |
| twitterDescription | Content of the twitter:description meta tag, or empty string. |
| twitterImage | Content of the twitter:image meta tag, or empty string. |

5.10. seo-headings (Heading structure)

Provides analysis of the heading (H1-H6) structure for each HTML page.

| Column | Description |
| --- | --- |
| headings | A formatted string representation of the heading structure showing hierarchy and potential errors (e.g., "OK H1, H2, H2, H3" or "ERR H1, H3 (skipped H2)"). |
| headingsCount | Total number of headings found on the page (e.g., "5"). |
| headingsErrorsCount | Number of structural errors found in the headings (e.g., "0", "2"). |
| urlPathAndQuery | The path and query string of the URL. |

Extra row key:

  • headingsHtml (String): An HTML string containing the full heading tree with markup (e.g., "<b>H1</b> Title<br><b>H2</b> Section..."). Useful for rendering a visual heading tree in reports.

5.11. headers (HTTP headers)

Summarizes the HTTP response headers encountered across all crawled URLs.

| Column | Description |
| --- | --- |
| header | The name of the HTTP header. |
| occurrences | The total number of times this header was found (e.g., "73"). |
| uniqueValues | The count of distinct values found for this header, as a string (e.g., "3"). |
| valuesPreview | A preview string showing some of the values encountered (truncated if many). |
| minValue | The minimum value found (relevant for numerical or date headers), or empty string. |
| maxValue | The maximum value found, or empty string. |

5.12. headers-values (HTTP header values)

Lists unique values for each HTTP header and their occurrence count.

| Column | Description |
| --- | --- |
| header | The name of the HTTP header. |
| occurrences | The number of times this specific value occurred for this header (e.g., "51"). |
| value | The specific unique value of the HTTP header. |

5.13. caching-per-content-type (HTTP Caching by content type)

Analyzes caching effectiveness grouped by general content type (HTML, Image, JS, CSS, etc.).

| Column | Description |
| --- | --- |
| contentType | The general content type category (e.g., "HTML", "Image", "JS"). |
| cacheType | Description of the caching mechanism detected (e.g., "Cache-Control + ETag + Last-Modified", "No cache headers"). |
| count | Number of URLs matching this content type and cache type. |
| avgLifetime | Average cache lifetime in seconds for URLs in this group, or empty string if not determinable. |
| minLifetime | Minimum cache lifetime in seconds, or empty string. |
| maxLifetime | Maximum cache lifetime in seconds, or empty string. |

5.14. caching-per-domain (HTTP Caching by domain)

Analyzes caching effectiveness grouped by domain.

| Column | Description |
| --- | --- |
| domain | The domain name. |
| cacheType | Description of the caching mechanism detected. |
| count | Number of URLs from this domain matching this cache type. |
| avgLifetime | Average cache lifetime in seconds, or empty string. |
| minLifetime | Minimum cache lifetime in seconds, or empty string. |
| maxLifetime | Maximum cache lifetime in seconds, or empty string. |

5.15. caching-per-domain-and-content-type (HTTP Caching by domain and content type)

Analyzes caching effectiveness grouped by both domain and general content type.

| Column | Description |
| --- | --- |
| domain | The domain name. |
| contentType | The general content type category. |
| cacheType | Description of the caching mechanism detected. |
| count | Number of URLs matching this domain, content type, and cache type. |
| avgLifetime | Average cache lifetime in seconds, or empty string. |
| minLifetime | Minimum cache lifetime in seconds, or empty string. |
| maxLifetime | Maximum cache lifetime in seconds, or empty string. |

5.16. non-unique-titles (TOP non-unique titles)

Lists page titles that appear on more than one page.

| Column | Description |
| --- | --- |
| count | The number of pages sharing this title. |
| title | The non-unique page title. |

5.17. non-unique-descriptions (TOP non-unique descriptions)

Lists meta descriptions that appear on more than one page.

| Column | Description |
| --- | --- |
| count | The number of pages sharing this description. |
| description | The non-unique meta description content. |

5.18. best-practices (Best practices)

Summarizes the results of various best practice checks performed by analyzers.

| Column | Description |
| --- | --- |
| analysisName | The name of the specific best practice check (e.g., "Large inline SVGs", "Heading structure", "Brotli support"). |
| ok | Count of URLs passing this check. |
| notice | Count of URLs with a notice-level finding. |
| warning | Count of URLs with a warning-level finding. |
| critical | Count of URLs with a critical-level finding. |

5.19. accessibility (Accessibility)

Summarizes the results of accessibility checks.

| Column | Description |
| --- | --- |
| analysisName | The name of the specific accessibility check (e.g., "Missing image alt attributes", "Missing html lang attribute", "ARIA roles and landmarks"). |
| ok | Count of elements/pages passing this check. |
| notice | Count of notice-level findings. |
| warning | Count of warning-level findings. |
| critical | Count of critical-level findings. |

5.20. source-domains (Source domains)

Provides statistics about the domains from which resources were loaded.

| Column | Description |
| --- | --- |
| domain | The domain name. |
| totals | A summary string showing total count, size, and time for resources from this domain (e.g., "67/30MB/6.2s"). |
| HTML | Summary string (count/size/time) for HTML resources from this domain. |
| Image | Summary string for Image resources. |
| JS | Summary string for JavaScript resources. |
| CSS | Summary string for CSS resources. |
| Document | Summary string for Document resources (e.g., robots.txt). |

Extra row keys (dynamic, present when data exists):

  • Audio, Font, JSON, Other, Redirect, Video, XML (String): Summary strings for additional content types, included only when resources of that type are present.
  • totalCount (String): Total number of resources loaded from this domain.

Note: The set of content type columns is dynamic. The declared columns (HTML, Image, JS, CSS, Document) are always present, but additional content type columns appear in row data based on what resource types were actually encountered during the crawl.
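
The summary strings pack three values into one cell. A hedged Python sketch for splitting them, assuming every non-empty cell follows the "count/size/time" pattern of the "67/30MB/6.2s" example; the class and function names are illustrative, and size and time are left as text because their exact formatting is not specified here:

```python
from typing import NamedTuple, Optional


class DomainSummary(NamedTuple):
    count: int
    size: str  # kept as text, e.g. "30MB"
    time: str  # kept as text, e.g. "6.2s"


def parse_domain_summary(cell: str) -> Optional[DomainSummary]:
    """Split a "count/size/time" cell; return None for an empty cell."""
    if not cell:
        return None
    # Raises ValueError if the cell does not have exactly three parts.
    count, size, time = (part.strip() for part in cell.split("/"))
    return DomainSummary(int(count), size, time)


print(parse_domain_summary("67/30MB/6.2s"))
# DomainSummary(count=67, size='30MB', time='6.2s')
```

If a cell deviates from the assumed pattern, the unpacking fails loudly rather than returning a partial result, which makes format changes easier to notice.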

5.21. content-types (Content types)

Summarizes statistics grouped by general content type.

| Column | Description |
| --- | --- |
| contentType | The general content type category (e.g., "HTML", "Image"). |
| count | Total number of URLs of this content type. |
| totalSize | Total size in bytes for this content type. |
| totalTime | Total time spent fetching resources of this content type. |
| avgTime | Average time spent fetching a resource of this content type. |
| status20x | Count of URLs with a 2xx status code. |
| status40x | Count of URLs with a 4xx status code. |

Note: The status columns are dynamic. Additional columns like status42x (for HTTP 429) or status30x, status50x may appear depending on which status codes were actually encountered during the crawl. These dynamic columns will also be declared in the table's columns object.
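
Because the status columns vary between crawls, it is safer to discover them from the table's columns object than to hard-code them. A Python sketch under that assumption; the function names are illustrative, the sample is made up, and the column definition objects are stubbed as empty dicts:

```python
def status_columns(table: dict) -> list[str]:
    """Return the declared column identifiers that start with "status"."""
    return sorted(key for key in table["columns"] if key.startswith("status"))


def status_totals(table: dict) -> dict[str, int]:
    """Sum each status column over all rows, treating empty or missing cells as 0."""
    totals = {column: 0 for column in status_columns(table)}
    for row in table["rows"]:
        for column in totals:
            totals[column] += int(row.get(column) or 0)
    return totals


# Made-up table; real column objects carry aplCode, name, width, and so on.
sample_content_types = {
    "columns": {"contentType": {}, "count": {}, "status20x": {},
                "status40x": {}, "status42x": {}},
    "rows": [
        {"contentType": "HTML", "count": "10", "status20x": "8",
         "status40x": "1", "status42x": "1"},
        {"contentType": "Image", "count": "5", "status20x": "5",
         "status40x": "0", "status42x": ""},
    ],
}

print(status_totals(sample_content_types))
# {'status20x': 13, 'status40x': 1, 'status42x': 1}
```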

5.22. content-types-raw (Content types (MIME types))

Summarizes statistics grouped by the specific MIME type reported in the Content-Type HTTP header.

| Column | Description |
| --- | --- |
| contentType | The raw MIME type string (e.g., "text/html", "image/svg+xml", "text/html; charset=utf-8"). |
| count | Total number of URLs with this MIME type. |
| totalSize | Total size in bytes. |
| totalTime | Total time spent fetching. |
| avgTime | Average time spent fetching. |
| status20x | Count of URLs with a 2xx status code. |
| status40x | Count of URLs with a 4xx status code. |

Note: Like content-types, the status columns are dynamic. Additional status columns (e.g., status42x) appear when the corresponding status codes are encountered.

5.23. dns (DNS info)

Shows the DNS resolution information for the crawled domain(s).

| Column | Description |
| --- | --- |
| info | A line of text representing part of the DNS resolution (e.g., the domain name, an IP address, the DNS server used). Presented as a simple text tree. |

5.24. security (Security)

Summarizes findings related to security HTTP headers.

| Column | Description |
| --- | --- |
| header | The name of the security header being analyzed (e.g., "Strict-Transport-Security", "X-Frame-Options", "Content-Security-Policy"). |
| ok | Count of URLs where the header was configured correctly. |
| notice | Count of URLs with a notice-level finding. |
| warning | Count of URLs with a warning-level finding. |
| critical | Count of URLs with a critical-level finding. |
| recommendation | A string containing textual recommendations for improving the configuration of this header. |

Extra row key:

  • highestSeverity (String): The highest severity level found for this header across all URLs (e.g., "ok", "warning", "critical").
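
The highestSeverity key makes it easy to shortlist headers that need attention. A Python sketch, with several assumptions: the ranking of severities is chosen here, "notice" is assumed to be a possible value because a notice column exists, and unrecognized values are treated as "ok". The function name and sample rows are made up:

```python
# Ranking chosen for this sketch; "notice" as a highestSeverity value is an assumption.
SEVERITY_RANK = {"ok": 0, "notice": 1, "warning": 2, "critical": 3}


def headers_needing_attention(security_table: dict, min_level: str = "warning") -> list[str]:
    """Return header names whose highestSeverity is at or above min_level."""
    threshold = SEVERITY_RANK[min_level]
    return [
        row["header"]
        for row in security_table["rows"]
        if SEVERITY_RANK.get(row.get("highestSeverity", ""), 0) >= threshold
    ]


# Made-up rows; the counts are strings, as in every table.
sample_security = {
    "rows": [
        {"header": "Strict-Transport-Security", "ok": "73", "notice": "0", "warning": "0",
         "critical": "0", "recommendation": "", "highestSeverity": "ok"},
        {"header": "Content-Security-Policy", "ok": "0", "notice": "0", "warning": "0",
         "critical": "73", "recommendation": "illustrative text",
         "highestSeverity": "critical"},
        {"header": "X-Frame-Options", "ok": "0", "notice": "0", "warning": "73",
         "critical": "0", "recommendation": "illustrative text",
         "highestSeverity": "warning"},
    ]
}

print(headers_needing_attention(sample_security))
# ['Content-Security-Policy', 'X-Frame-Options']
print(headers_needing_attention(sample_security, "critical"))
# ['Content-Security-Policy']
```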

5.25. analysis-stats (Analysis stats)

Provides performance metrics for individual analyzer methods.

| Column | Description |
| --- | --- |
| classAndMethod | The class and method name of the analyzer function. |
| execTime | Total execution time in seconds spent in this method across all relevant URLs/data points. |
| execCount | The number of times this method was executed. |

Extra row key:

  • execTimeFormatted (String): Human-readable formatted execution time (e.g., "0.012 s", "1.234 s").

5.26. content-processors-stats (Content processor stats)

Provides performance metrics for content processor methods (HTML, CSS, JS, XML processors that run during the crawl).

| Column | Description |
| --- | --- |
| classAndMethod | The class and method name of the content processor function. |
| execTime | Total execution time in seconds spent in this method. |
| execCount | The number of times this method was executed. |

Extra row key:

  • execTimeFormatted (String): Human-readable formatted execution time.

5.27. external-urls (External URLs)

Lists external URLs discovered during the crawl along with where they were found.

| Column | Description |
| --- | --- |
| url | The external URL that was discovered. |
| count | The number of times this external URL was found across all crawled pages. |
| foundOn | The URL of the page where this external URL was found (typically the first occurrence). |

6. Note on Text Output

While this document focuses on the JSON output, SiteOne Crawler also offers a simpler Text output format (--output-text-file). The Text output provides a human-readable summary suitable for quick review in a terminal or text editor.

See the Text Output Documentation for more details on the Text format.