Pipeline Architecture

April 16, 2026

Visibility into the automation that builds and publishes the third-party module catalogue helps contributors reason about changes and spot failure points early. This document summarizes the current canonical pipeline and the parts of the broader architecture that are still future-facing.

Current State (April 2026)

The supported production pipeline is run with node scripts/orchestrator/index.ts run full-refresh-parallel (also exposed as node --run all). The orchestrator drives four registered stages across three operational phases: metadata collection, parallel module processing, and catalogue aggregation, which publishes the final artifacts.

Stage Overview

| Order | Stage ID | Key Outputs |
| --- | --- | --- |
| 1 | collect-metadata | in-memory metadata payload, gitHubData.json |
| 2 | parallel-processing | in-memory analysis payload, modules/, modules_temp/, website/images/, skipped_modules.json |
| 3 | aggregate-catalogue | modules.json, modules.min.json, stats.json |
| 4 | generate-result-markdown | result.md |
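
The stage table can also be read as data. The following is a minimal sketch, not the orchestrator's actual registry: the StageDescriptor shape and the stageForOutput helper are hypothetical names introduced here, and only the stage IDs and on-disk outputs are taken from the table (in-memory payloads are left out).

```typescript
// Hypothetical descriptor shape; the real orchestrator's types may differ.
interface StageDescriptor {
  order: number;
  id: string;
  outputs: string[]; // on-disk artifacts only
}

// Stage IDs and outputs copied from the stage overview table.
const STAGES: StageDescriptor[] = [
  { order: 1, id: "collect-metadata", outputs: ["gitHubData.json"] },
  {
    order: 2,
    id: "parallel-processing",
    outputs: ["modules/", "modules_temp/", "website/images/", "skipped_modules.json"],
  },
  {
    order: 3,
    id: "aggregate-catalogue",
    outputs: ["modules.json", "modules.min.json", "stats.json"],
  },
  { order: 4, id: "generate-result-markdown", outputs: ["result.md"] },
];

// Find the stage that produces a given artifact, or undefined if none does.
function stageForOutput(path: string): string | undefined {
  return STAGES.find((stage) => stage.outputs.includes(path))?.id;
}

console.log(stageForOutput("stats.json"));
```

A lookup like this is mainly useful when debugging: if an artifact looks wrong, it points to the stage to inspect first.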

Current Workflow Diagram

flowchart TB
  orchestrator[[Orchestrator<br>4-stage execution]]

  subgraph Phase 1: Metadata Collection
    seed[("Module seed list")] --> collect{{Collect metadata}}
    collect --> cache[("gitHubData.json cache")]
    collect --> metadata["metadata payload (in-memory)"]
  end

  subgraph Phase 2: Parallel Module Processing
    metadata --> parallel{{Parallel processing}}
    parallel --> clones[("modules/<br>modules_temp/")]
    parallel --> images[("website/images/")]
    parallel --> analysisPayload["analysis payload (in-memory)"]
  end

  subgraph Phase 3: Catalogue Aggregation
    analysisPayload --> aggregate{{Aggregate catalogue}}
    aggregate --> outputs[("modules.json<br>modules.min.json<br>stats.json")]
    analysisPayload --> result{{Generate result markdown}}
    outputs --> result
    result --> resultMd[("result.md")]
  end

  orchestrator -.controls.-> collect
  orchestrator -.controls.-> parallel
  orchestrator -.controls.-> aggregate
  orchestrator -.controls.-> result

Key Features

  • Orchestrator CLI: Declarative stage graph with --only/--skip support, retries, and structured logging
  • Worker Pool Stage: parallel-processing encapsulates clone, enrich, image, and analysis work behind a single supported stage
  • Aggregation Stage: aggregate-catalogue builds published JSON artifacts from the in-memory analysis payload
  • Schema Validation: JSON schemas enforce contracts at the published boundaries (modules.json, modules.min.json, stats.json)
  • Shared Utilities: HTTP, Git, filesystem, and rate limiting in scripts/shared/
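
To make the --only/--skip and retry behaviour more concrete, here is a hedged sketch of how stage selection and a retry wrapper could work. It is not the orchestrator's code: selectStages, withRetries, and the SelectionOptions shape are hypothetical, and the real CLI may resolve conflicts between the two flags differently.

```typescript
// Hypothetical option shape; the real CLI parsing is not shown in this document.
interface SelectionOptions {
  only?: string[];
  skip?: string[];
}

// Stage IDs as listed in the stage overview.
const STAGE_IDS = [
  "collect-metadata",
  "parallel-processing",
  "aggregate-catalogue",
  "generate-result-markdown",
] as const;

// Keep the declared order, apply --only first, then remove anything in --skip.
function selectStages(all: readonly string[], options: SelectionOptions): string[] {
  const known = new Set(all);
  for (const id of [...(options.only ?? []), ...(options.skip ?? [])]) {
    if (!known.has(id)) throw new Error(`Unknown stage: ${id}`);
  }
  const only = options.only && options.only.length > 0 ? new Set(options.only) : null;
  const skip = new Set(options.skip ?? []);
  return all.filter((id) => (only === null || only.has(id)) && !skip.has(id));
}

// Run a task up to maxAttempts times and rethrow the last error if all attempts fail.
async function withRetries<T>(task: () => Promise<T>, maxAttempts: number): Promise<T> {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

console.log(selectStages(STAGE_IDS, { skip: ["generate-result-markdown"] }));
```

Rejecting unknown stage IDs up front is a design choice in this sketch: a typo in a flag fails immediately instead of silently running the wrong subset.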

Incremental Pipeline Behavior

The pipeline uses caching and skip logic to avoid redundant work:

| Scope | Optimization | Current Behavior | Why It Helps |
| --- | --- | --- | --- |
| Metadata | API cache TTL | Reuses recent host API responses during collect-metadata | Reduces external API traffic |
| Module processing | Clone reuse | Recycles modules_temp/ when repositories can be refreshed in place | Avoids unnecessary full re-clones |
| Module processing | Worker batching | Processes modules in bounded child-process batches | Keeps memory bounded and throughput predictable |
| Analysis cache | Cache read/write | Worker-compatible moduleCache.json drives skip/read/write/prune in parallel-processing | Restores second-run skip behavior while preserving worker throughput |
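
Two of these optimizations, the cache TTL check and bounded batching, reduce to small pure functions. The sketch below illustrates the ideas only: CachedResponse, isFresh, and toBatches are hypothetical names, and the actual cache layout and batch sizes are not documented here.

```typescript
// Hypothetical cache entry shape; the real gitHubData.json layout may differ.
interface CachedResponse {
  fetchedAt: number; // epoch milliseconds
  body: unknown;
}

// A cached response is reusable while it is younger than the TTL.
function isFresh(
  entry: CachedResponse | undefined,
  ttlMs: number,
  now: number = Date.now(),
): boolean {
  return entry !== undefined && now - entry.fetchedAt < ttlMs;
}

// Split work into fixed-size batches so only one batch is in flight at a time.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let index = 0; index < items.length; index += batchSize) {
    batches.push(items.slice(index, index + batchSize));
  }
  return batches;
}

// Example with placeholder module names.
console.log(toBatches(["MMM-A", "MMM-B", "MMM-C", "MMM-D", "MMM-E"], 2));
```

Passing now as a parameter keeps the freshness check deterministic in tests, and a fixed batch size caps how many child processes would run concurrently.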

No intermediate stage boundary is persisted to disk any longer: stages hand their payloads to one another entirely in memory.
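
As an illustration of in-memory handoffs, the sketch below threads a single context object through a list of stages. It is a simplification under stated assumptions: the PipelineContext, MetadataEntry, and AnalysisEntry shapes, the runInMemory helper, and the MMM-Example module are all hypothetical, and real stages would be asynchronous.

```typescript
// Hypothetical payload shapes, used only for illustration.
interface MetadataEntry {
  name: string;
  url: string;
}

interface AnalysisEntry {
  name: string;
  issueCount: number;
}

interface PipelineContext {
  metadata?: MetadataEntry[];
  analysis?: AnalysisEntry[];
}

interface Stage {
  id: string;
  // Each stage reads what it needs from the context and returns its additions.
  run(context: PipelineContext): Partial<PipelineContext>;
}

// Thread one context object through the stages; nothing is written between them.
function runInMemory(stages: Stage[], initial: PipelineContext = {}): PipelineContext {
  let context = initial;
  for (const stage of stages) {
    context = { ...context, ...stage.run(context) };
  }
  return context;
}

// Stub stages standing in for the real collect and processing work.
const stubStages: Stage[] = [
  {
    id: "collect-metadata",
    run: () => ({
      metadata: [{ name: "MMM-Example", url: "https://example.invalid/MMM-Example" }],
    }),
  },
  {
    id: "parallel-processing",
    run: (context) => ({
      analysis: (context.metadata ?? []).map((entry) => ({ name: entry.name, issueCount: 0 })),
    }),
  },
];

const finalContext = runInMemory(stubStages);
console.log(finalContext.analysis);
```

The trade-off of this style is that a stage cannot be rerun in isolation from a file on disk; it needs the upstream payload to be produced in the same process.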


Distribution Touchpoints

This section describes how module data enters the system and reaches downstream consumers. Unlike the canonical pipeline above, part of this flow is still conceptual.

Current Intake Flow

flowchart LR
  wiki[(module wiki list<br><i>- unreliable -</i>)]
  pipeline{{automation pipeline}}
  api[(API<br>modules.json)]
  remote[MMM-Remote-Control]
  modinstall[MMM-ModInstall]
  config[MMM-Config]
  mmpm[mmpm]
  moduleWebsite[website<br>modules.magicmirror.builders]

  wiki --> pipeline --> api
  api --> mmpm
  api --> remote
  api --> modinstall
  api --> config
  api --> moduleWebsite

Potential Future Intake Flow

flowchart LR
  ui[(Form-based front end<br>for adding, editing, and<br>deleting modules<br><i>- not yet conceptualized -</i>)]
  pipeline{{automation pipeline}}
  api[(API<br>modules.json)]
  remote[MMM-Remote-Control]
  modinstall[MMM-ModInstall]
  config[MMM-Config]
  mmpm[mmpm]
  moduleWebsite[website<br>modules.magicmirror.builders]

  ui --> pipeline --> api
  api --> remote
  api --> modinstall
  api --> config
  api --> mmpm
  api --> moduleWebsite

If this direction is pursued, the wiki would be replaced with a form-based frontend while downstream consumers continue using the unchanged API endpoint.