Privacy-Preserving Data Analysis Workflow

March 17, 2026

sequenceDiagram
    participant DS as Data Scientist
    participant INB as DO Inbox
    participant OUT as DO Outbox
    participant SB as DO Syftbox
    participant DO as Data Owner

    Note over DS,DO: 1. Peer Request
    DS->>INB: add_peer("do@org.com")
    DO->>INB: approve_peer_request("ds@org.com")

    Note over DO,SB: 2. Dataset Publication
    DO->>SB: create_dataset(mock, private)
    DO->>OUT: Mock data + metadata
    DS->>OUT: sync() — pull mock data

    Note over DS: 3. Explore & Develop
    DS->>DS: Test analysis on mock data

    Note over DS,INB: 4. Job Submission
    DS->>INB: submit_python_job(code)

    Note over DO: 5. Review & Execute
    DO->>INB: sync() — receive job
    DO->>DO: Approve (or reject) & run job

    Note over DO,OUT: 6. Publish Results
    DO->>OUT: Write results to outbox

    Note over DS,OUT: 7. Retrieve Results
    DS->>OUT: sync() — pull results
    DS->>DS: Read results

Workflow Steps

1. Peer Request: The Data Scientist requests access to the Data Owner's datasite. The Data Owner reviews and approves the request.
2. Dataset Publication: The Data Owner publishes a dataset with both mock (public) and private components. Only the mock data and its metadata are placed in the outbox for Data Scientists to pull.
3. Explore & Develop: The Data Scientist uses the downloaded mock data to explore the dataset's structure and test their analysis code locally.
4. Job Submission: The Data Scientist submits analysis code to the Data Owner's inbox.
5. Review & Execute: The Data Owner syncs to receive the job and reviews the code, then either rejects it or approves it and runs it on the private data.
6. Publish Results: The Data Owner writes the job outputs to the outbox for the Data Scientist to pull.
7. Retrieve Results: The Data Scientist syncs to pull the results and reads them locally.
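
The seven steps can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not the real client: the classes `ToyDatasite` and `ToyScientist`, the `review_and_run` helper, the message dictionaries, and all method signatures are hypothetical. Only the method names `add_peer`, `approve_peer_request`, `create_dataset`, `submit_python_job`, and `sync` are borrowed from the diagram. The real system presumably syncs files between machines; here the inbox and outbox are plain Python containers.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Set


@dataclass
class ToyDatasite:
    """In-memory stand-in for a Data Owner's datasite (hypothetical, not the real client)."""
    owner: str
    inbox: List[dict] = field(default_factory=list)          # peers write requests and jobs here
    outbox: Dict[str, object] = field(default_factory=dict)  # owner publishes mock data and results here
    private: Dict[str, list] = field(default_factory=dict)   # never copied to a peer
    peers: Set[str] = field(default_factory=set)

    def approve_peer_request(self, email: str) -> None:
        # Step 1: approve only peers that actually asked for access.
        if any(m["type"] == "peer_request" and m["sender"] == email for m in self.inbox):
            self.peers.add(email)

    def create_dataset(self, name: str, mock: list, private: list) -> None:
        # Step 2: the mock copy goes to the outbox, the private copy stays local.
        self.private[name] = private
        self.outbox[f"mock/{name}"] = mock

    def review_and_run(self, approve: Callable[[dict], bool]) -> None:
        # Steps 5 and 6: run approved jobs on private data, publish only their outputs.
        for job in self.inbox:
            if job["type"] != "job" or job["sender"] not in self.peers:
                continue
            key = f"results/{job['job_id']}"
            if key in self.outbox:
                continue  # already handled in an earlier review pass
            if approve(job):
                self.outbox[key] = job["code"](self.private[job["dataset"]])
            else:
                self.outbox[key] = "rejected"


@dataclass
class ToyScientist:
    email: str
    local: Dict[str, object] = field(default_factory=dict)

    def add_peer(self, site: ToyDatasite) -> None:
        # Step 1: the access request lands in the Data Owner's inbox.
        site.inbox.append({"type": "peer_request", "sender": self.email})

    def sync(self, site: ToyDatasite) -> None:
        # Steps 2 and 7: approved peers get a copy of the outbox, others get nothing.
        if self.email in site.peers:
            self.local.update(site.outbox)

    def submit_python_job(self, site: ToyDatasite, job_id: str, dataset: str, code: Callable) -> None:
        # Step 4: the job is only queued here; it runs on the Data Owner's side.
        site.inbox.append({"type": "job", "sender": self.email,
                           "job_id": job_id, "dataset": dataset, "code": code})


do = ToyDatasite(owner="do@org.com")
ds = ToyScientist(email="ds@org.com")

ds.add_peer(do)                                   # 1. peer request
do.approve_peer_request("ds@org.com")
do.create_dataset("ages", mock=[30, 40, 50], private=[21, 35, 64, 48])  # 2. publish
ds.sync(do)                                       # pull mock data
analysis = mean                                   # 3. develop against mock data
mock_result = analysis(ds.local["mock/ages"])
ds.submit_python_job(do, "job-1", "ages", analysis)  # 4. submit
do.review_and_run(approve=lambda job: True)       # 5-6. review, run, publish
ds.sync(do)                                       # 7. retrieve
print(ds.local["results/job-1"])
```

In this sketch the private list never reaches `ds.local`: the Data Scientist only ever sees the mock data and whatever the approved job returns. In practice the review step matters for exactly this reason, since a job that simply returned its input would leak the private data.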