Query Analysis & Cache Security

May 18, 2026 · View on GitHub

How NornicDB safely analyzes and caches Cypher queries.

Overview

NornicDB uses a QueryAnalyzer to extract metadata from Cypher queries for caching optimization. This document explains the security model, what protections exist, and edge cases.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Query Execution Flow                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐   │
│  │   Client    │───▶│  QueryAnalyzer   │───▶│   Cache Check    │   │
│  │   Query     │    │  (Metadata only) │    │  (Read-only?)    │   │
│  └─────────────┘    └──────────────────┘    └──────────────────┘   │
│                              │                        │              │
│                              ▼                        ▼              │
│                     ┌──────────────────┐    ┌──────────────────┐   │
│                     │  Query Executor  │◀───│  Cache Hit/Miss  │   │
│                     │  (Full parsing)  │    │                  │   │
│                     └──────────────────┘    └──────────────────┘   │
│                              │                                       │
│                              ▼                                       │
│                     ┌──────────────────┐                            │
│                     │  Storage Engine  │                            │
│                     │  (Data access)   │                            │
│                     └──────────────────┘                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Security Model

What QueryAnalyzer Does

The QueryAnalyzer performs lightweight keyword detection to determine:

Is this query read-only? (cacheable)
Is this query a write operation? (requires cache invalidation)
What labels are affected? (for smart cache invalidation)

What QueryAnalyzer Does NOT Do

NOT used for access control - permissions are checked separately
NOT used for query validation - the executor validates syntax
NOT a full parser - just keyword detection for performance

Threat Analysis

✅ Protected: Write Operations Hidden as Reads

Threat: An attacker crafts a query that deletes/modifies data but is marked as "read-only" by the analyzer.

Protection: This is NOT POSSIBLE because:

Cypher keywords must be whole words with boundaries
We search ALL occurrences of each keyword
There's no way to "hide" a keyword in valid Cypher syntax

Example attacks that are blocked:

-- Hiding DELETE - NOT POSSIBLE
MATCH (n) WHERE n.name = 'safe' DELETE n
-- ✅ DELETE detected, marked as write

-- Splitting keywords - NOT POSSIBLE (invalid Cypher)
MATCH (n) DE/*comment*/LETE n
-- ✅ Invalid syntax, rejected by executor

-- Unicode tricks - NOT POSSIBLE
MATCH (n) ＤＥＬＥＴＥ n
-- ✅ Not ASCII "DELETE", not valid Cypher

⚡ Accepted: Read Operations Marked as Writes (False Positives)

Scenario: A read-only query is incorrectly marked as a write operation.

Impact: Performance only (not security)

Query is not cached
May trigger unnecessary cache invalidation
Query still executes correctly

When this happens:

Scenario	Example Query	What Happens
Property named like keyword	`RETURN n.delete`	Marked as write
Keyword in string literal	`WHERE name = 'DELETE me'`	Marked as write
Keyword in comment	`RETURN n // TODO: delete`	Marked as write

Why we accept this:

Conservative is safer - Better to not cache than to cache incorrectly
Rare occurrence - Few users name properties "delete" or "create"
No security impact - Query executes correctly, just not cached
Simple implementation - Complex parsing adds attack surface

✅ Protected: False Positives Cannot Leak Data

Concern: Could an attacker use a false positive to gain unauthorized data access?

Answer: No. The QueryAnalyzer is completely separate from access control.

┌─────────────────────────────────────────────────────────────┐
│                    Security Boundary                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   QueryAnalyzer              Access Control (RBAC)           │
│   ┌────────────────┐         ┌────────────────┐             │
│   │ Determines:    │         │ Determines:    │             │
│   │ • Caching      │         │ • Can user     │             │
│   │ • Invalidation │         │   read nodes?  │             │
│   │                │         │ • Can user     │             │
│   │ Does NOT       │         │   write nodes? │             │
│   │ affect:        │         │                │             │
│   │ • Permissions  │         │ ALWAYS         │             │
│   │ • Authorization│         │ ENFORCED       │             │
│   └────────────────┘         └────────────────┘             │
│          │                          │                       │
│          │ Performance              │ Security              │
│          ▼                          ▼                       │
│   ┌────────────────┐         ┌────────────────┐             │
│   │ Cache miss     │         │ Query allowed  │             │
│   │ (slower, but   │         │ or denied      │             │
│   │ still works)   │         │ (enforced)     │             │
│   └────────────────┘         └────────────────┘             │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Attack scenarios that DO NOT work:

Attack	Why It Fails
"Read secret data by triggering write path"	Write path doesn't grant extra permissions
"Learn label exists via cache invalidation"	Query must succeed first (already requires access)
"Bypass RBAC by confusing the analyzer"	Analyzer has no role in authorization
"Force stale data via false positive"	False positive = no caching = always fresh data

Example:

-- Attacker tries to read :Secret nodes they shouldn't access
MATCH (n:Secret) RETURN n.delete

What happens:

QueryAnalyzer sees "delete" → marks as write (false positive)
Access control checks if user can read :Secret → DENIED
Query rejected - attacker gets nothing
Cache invalidation never happens (query failed)

The false positive only affects step 1 (caching decision). The security check in step 2 is completely independent and always enforced.

Cache Security

Result Caching

Only read-only queries are cached:

// Cached (read-only):
MATCH (n:Person) RETURN n.name           // ✅ Cached
CALL db.labels()                          // ✅ Cached (db.* procedures)
SHOW INDEXES                              // ✅ Cached

// NOT cached (write operations):
CREATE (n:Person {name: 'Alice'})         // ❌ Not cached
MATCH (n) SET n.updated = timestamp()     // ❌ Not cached
CALL gds.graph.drop('myGraph')            // ❌ Not cached (non-db.* procedure)

Cache Invalidation

Write operations invalidate relevant caches:

// Smart invalidation by label:
CREATE (n:Person)                         // Invalidates :Person cache
MATCH (n:Company) DELETE n                // Invalidates :Company cache

// Full invalidation (no label detected):
MATCH (n) DELETE n                        // Invalidates ALL caches

Cache Key Security

Cache keys include:

Normalized query string
Parameter values (hashed)

This prevents:

Cache poisoning across different parameters
Cache confusion between similar queries

Procedure Call Security

Read-Only Procedures (Cached)

Only CALL db.* procedures are marked as read-only:

CALL db.labels()                    -- ✅ Read-only, cached
CALL db.propertyKeys()              -- ✅ Read-only, cached
CALL db.relationshipTypes()         -- ✅ Read-only, cached

Write Procedures (Not Cached)

All other procedures are conservatively treated as writes:

CALL gds.graph.project(...)         -- ❌ May be write, not cached
CALL gds.graph.drop(...)            -- ❌ Is a write, not cached
CALL apoc.create.node(...)          -- ❌ Is a write, not cached
CALL custom.myProcedure(...)        -- ❌ Unknown, not cached

Implementation Details

Keyword Detection

// containsKeyword searches ALL occurrences, checking word boundaries
func containsKeyword(upper, keyword string) bool {
    // Iterates through string finding each occurrence
    // Checks character before: must not be alphanumeric or _
    // Checks character after: must not be alphanumeric or _
    // Returns true if ANY occurrence is a valid keyword
}

Word Boundary Rules

A keyword match requires:

Before: Start of string, or non-alphanumeric (except _)
After: End of string, or non-alphanumeric (except _)

"DELETE n"        → DELETE matched (space before, space after)
"n.delete"        → delete matched (. is not alphanumeric) - FALSE POSITIVE
"ToDelete"        → DELETE not matched (o before)
"DELETED"         → DELETE not matched (D after)

Query Types Reference

Write Operations (Never Cached)

Keyword	Detection	Example
`CREATE`	Word boundary	`CREATE (n:Node)`
`MERGE`	Word boundary	`MERGE (n:Node {id: 1})`
`DELETE`	Word boundary	`MATCH (n) DELETE n`
`DETACH DELETE`	Phrase	`MATCH (n) DETACH DELETE n`
`SET`	Word boundary	`MATCH (n) SET n.x = 1`
`REMOVE`	Word boundary	`MATCH (n) REMOVE n.label`

Read Operations (Cacheable)

Pattern	Requirement	Example
`MATCH ... RETURN`	No write keywords	`MATCH (n) RETURN n`
`CALL db.*`	Starts with `CALL db.`	`CALL db.labels()`
`SHOW`	Starts with `SHOW`	`SHOW INDEXES`

Schema Operations (Special Handling)

Operation	Cached	Invalidates
`CREATE INDEX`	No	Schema cache
`DROP INDEX`	No	Schema cache
`CREATE CONSTRAINT`	No	Schema cache
`DROP CONSTRAINT`	No	Schema cache

Testing & Verification

Security Tests

The following attack patterns are verified in tests:

// All write operations detected:
"MATCH (n) DELETE n"                    // HasDelete = true ✅
"MATCH (n) WHERE type = 'safe' DELETE n" // HasDelete = true ✅
"CREATE (n:Node)"                        // HasCreate = true ✅
"MERGE (n:Node)"                         // HasMerge = true ✅
"MATCH (n) SET n.x = 1"                  // HasSet = true ✅
"MATCH (n) REMOVE n.label"               // HasRemove = true ✅

// False positives (safe, not cached):
"RETURN n.delete"                        // HasDelete = true (false positive)
"WHERE name = 'DELETE'"                  // HasDelete = true (false positive)

Running Security Tests

# Run all cypher tests including security checks
cd nornicdb
go test ./pkg/cypher/... -v

# Run specific cache/security tests
go test ./pkg/cypher/... -run "Cache|Security" -v

Best Practices

For Query Authors

Avoid keyword-named properties - Don't name properties delete, create, set, etc.
Use parameters for values - WHERE name = $name instead of WHERE name = 'DELETE'
Explicit transactions for writes - Use BEGIN/COMMIT for critical write operations

For Administrators

Monitor cache hit rates - Low hit rates may indicate false positives
Review slow query logs - Uncached queries appear more frequently
Use production mode - Ensures strict security settings

Summary

Aspect	Protection Level	Notes
Write ops hidden as reads	✅ Fully Protected	Not possible in valid Cypher
Read ops marked as writes	⚡ Accepted	Performance impact only
False positive data leakage	✅ Protected	Analyzer doesn't affect access control
Cache poisoning	✅ Protected	Keys include parameters
Procedure writes	✅ Protected	Only `db.*` cached
Access control bypass	✅ N/A	Analyzer not used for auth
Cache timing side-channel	✅ Protected	Query must succeed before cache action

See Also:

HTTP Security - Network-level protections
Compliance Guide - Regulatory requirements