Custom Analysis Rules
Define organization-specific policies in YAML. Custom rules produce structured signals with rule IDs, source spans, and matching evidence — same shape as built-ins.
Rules are pattern-matching only — no code execution, no external calls. The engine validates each rule's shape and predicate at load time; the generated JSON Schema drives offline IDE autocomplete against /schemas/v1/custom_rules.schema.json.
Basic Example
# company_rules.yaml
# yaml-language-server: $schema=https://lexega.com/schemas/v1/custom_rules.schema.json
rules:
- id: ACME-STG-ENC
risk_level: critical
message: "Stage created without encryption."
triggers:
all_of:
- kind: { in: [create_stage, alter_stage] }
- ddl.stage.encryption_disabled: true
- id: ACME-MASK-DROP
risk_level: high
message: "Masking policy {policy.target.name.canonical} removed."
triggers:
kind: drop_masking_policy
The first rule fires on CREATE STAGE or ALTER STAGE statements that turn encryption off. The second fires on any DROP MASKING POLICY, and its message interpolates a field from the matched statement's facts.
Running Custom Rules
# Analyze with custom rules
lexega-sql analyze query.sql --custom-rules company_rules.yaml
# Combine with catalog
lexega-sql analyze query.sql --custom-rules rules.yaml --catalog prod.json
# Multiple rule files
lexega-sql analyze query.sql --custom-rules rules1.yaml --custom-rules rules2.yaml
Cloud Storage URIs
For larger or shared deployments, rules, policies, exceptions, and output artifacts can be read from or written to cloud storage. This enables separation of concerns: DevOps maintains rules, Security maintains policies, artifacts go to shared storage.
| Scheme | Provider | CLI Required |
|---|---|---|
s3://bucket/key | AWS S3 | aws |
gs://bucket/key | Google Cloud Storage | gsutil |
az://container/blob | Azure Blob Storage | az |
https://... | Any HTTPS endpoint (read-only) | curl |
Reading from cloud storage:
# AWS S3
lexega-sql analyze query.sql --custom-rules s3://mybucket/governance/rules.yaml
# Google Cloud Storage
lexega-sql analyze query.sql --policy gs://mybucket/security/policy.yaml
# Azure Blob Storage
lexega-sql analyze query.sql --exceptions az://governance/exceptions.yaml
# Pre-signed URLs
lexega-sql analyze query.sql --custom-rules "https://mybucket.s3.amazonaws.com/rules.yaml?sig=..."
Writing to cloud storage:
# Write decision artifact to S3
lexega-sql review main..HEAD models/ \
--policy policy.yaml \
--env prod \
--decision-out s3://mybucket/ci/decisions/$GITHUB_RUN_ID/
# Write risk report to GCS
lexega-sql analyze query.sql --report-out gs://mybucket/reports/risk.json
Severity vs Enforcement
- Signal severity (
risk_level): Each signal carries arisk_level(critical,high,medium,low,info). Built-in signals set this in the YAML rule; custom rules set it the same way. - Policy enforcement (
--policy): The policy layer is the only way to block. A policy bundle canallow/warn/blockby referencing rule IDs (e.g., block ruleGRT-TO-PUBLICin prod). Policies are an enforcement layer on top of signals — they reference rules, not define detection logic.
Rule Structure
| Field | Type | Required | Description |
|---|---|---|---|
id | string | yes | Stable identifier (e.g. ACME-STG-ENC). Avoid the prefixes used by built-ins (GRT-, MASK-, SNW-, Q-, DML-, …) so customer rules can be told apart at a glance — unless you're authoring a partial override, in which case id must match a built-in. |
risk_level | enum | yes | One of critical, high, medium, low, info (lowercase). |
message | string | yes | Signal description. Supports {path.to.field} interpolation against the matched statement's facts. |
triggers | object | yes for full rules; omit for partial overrides | The predicate that decides whether the rule fires on a given statement. See below. Omitting it turns the entry into a partial overlay on a built-in (see Overriding Built-in Rules). |
enabled | bool | no | Default true. Set false to keep the rule in source but skip evaluation. |
emission | enum | no | once (one signal per statement, default) or per_witness (one signal per matched element). Auto-set to per_witness whenever the predicate contains an each: quantifier. Cannot be set on a partial override. |
per_statement | bool | no | Default false. When true, the deduplication pass keeps each emitted signal as its own entry instead of merging by (rule_id, risk_level). Use for compliance-review rules where every occurrence is a distinct artifact. Cannot be set on a partial override. |
Trigger Predicates
The triggers: key is a single predicate object that matches against StatementFacts — the typed bag of facts the analyzer derives for one SQL statement.
Combinators
| Form | Meaning |
|---|---|
all_of: [<pred>, ...] | Boolean AND. Implicit all_of is also formed when a predicate object lists multiple sibling path-matches without a combinator. |
any_of: [<pred>, ...] | Boolean OR. |
not: <pred> | Boolean NOT. |
Selecting the statement type
kind: is the most common top-level matcher. Values are snake-cased StatementKind variants — select, insert, grant, create_table, alter_stage, drop_masking_policy, etc. See the Builtin Rule Reference for the full taxonomy.
triggers:
kind: grant # one statement kind
triggers:
kind: # any of several kinds
in: [insert, update, delete, merge, multi_insert, truncate]
Matching fact fields
Every field on StatementFacts is addressable by dotted path. The schema generator walks the fact tree and emits an explicit property for each legal path, so IDE auto-complete works offline:
triggers:
all_of:
- kind: grant
- privilege.with_grant_option: true # eq sugar — bare value means `eq`
- privilege.target.kind: # the *object* the grant is on
in: [database, schema, catalog]
targetis the object, not the grantee.privilege.targetis the object the privilege is being granted on (GRANT … ON <target> TO <grantees>). To predicate on who receives the grant, use theprivilege.granteesrelational path (see the next section).
Scalar operators
For scalar fields (bool, string, number, enum, Option<T>):
| Op | Example |
|---|---|
| (bare value) | kind: grant (sugar for { eq: grant }) |
eq / neq | privilege.all_privileges: { neq: true } |
gt / lt / gte / lte | privilege.role_grant_impact.inherited_privileges: { gt: 0 } |
matches: <glob> | ddl.stage.url_literal: { matches: "*://*:*@*" } |
in: [...] / not_in: [...] | kind: { in: [create_stage, alter_stage] } |
exists: <bool> | privilege.role_grant_impact: { exists: true } (Option-presence test) |
is_null: <bool> | same shape, alternate spelling |
range: { low, high, low_inclusive, high_inclusive } | numeric ranges; *_inclusive default true |
About
matches:.matches:is glob, not regex.*is the only wildcard (matches any sequence of characters, including empty); every other character — including.,?,^,$,[,],\— is matched literally. The pattern"PROD.*"matches strings beginning with the literal four charactersPROD.followed by anything (so it works against dot-separated canonical names likePROD.SCHEMA.TABLE); it is not equivalent to the regexPROD\..*. For fixed enumerations usein:/not_in:; for OR-of-patterns, combinematches:predicates underany_of:. Anchored, character-class, and capture-group matching are not supported.
For Vec<scalar> fields (e.g. privilege.privileges):
| Op | Example |
|---|---|
contains: <value> | privilege.privileges: { contains: ownership } |
contains_any: [...] | privilege.privileges: { contains_any: [select, insert] } |
contains_all: [...] | every listed value must be present |
Relational quantifiers
For Vec<struct> fields (e.g. privilege.grantees, query.scopes), nest a sub-predicate inside a quantifier:
| Quantifier | Meaning |
|---|---|
exists: <pred> | ∃ element. P(element). |
all: <pred> | ∀ element. P(element). |
none: <pred> | ¬∃ element. P(element). |
each: <pred> | Per-witness driver — one signal fires per matching element. Implies emission: per_witness. |
count: <numeric-op> | Match on the size of the collection. Bare integer is sugar for eq. |
triggers:
all_of:
- kind: grant
- privilege.grantees:
exists:
name.normalized: PUBLIC # ∃ grantee named PUBLIC
- privilege.privileges:
count: { gte: 5 } # at least 5 privileges granted
# Per-witness emission: one signal per OR-tautology found in the query.
triggers:
all_of:
- kind: { in: [select, set_select] }
- query.or_tautologies:
each: {} # vacuous body — every element witnesses
Identifier matching
Identifier fields (IdentName) carry both raw (the source spelling) and normalized (dialect-aware case fold — UPPERCASE on Snowflake/BigQuery/MSSQL, lowercase on PostgreSQL/Databricks/Redshift):
- privilege.grantees:
exists:
name.normalized:
in: [ACCOUNTADMIN, SECURITYADMIN] # matches regardless of source casing
TableRef adds a canonical field (database.schema.table joined from normalized components) for cross-dialect-stable identity:
- query.reads_table:
exists:
table.canonical: { matches: "PROD.SENSITIVE.*" }
Message Interpolation
{path.to.field} slots in message resolve against the matched statement's facts at emission time:
- id: ACME-PROD-WRITE
risk_level: high
message: "Write to {query.writes_table.0.table.canonical} requires change ticket."
triggers:
all_of:
- kind: { in: [insert, update, delete, merge] }
- query.writes_table: { count: { gt: 0 } }
Paths that don't resolve render as the empty string — author messages defensively against optional fields. For relational paths (Vec<T>), index in with a numeric segment as above (query.writes_table.0.table.canonical) or use {witness.<sub-path>} from a per_witness rule.
Overriding Built-in Rules
A rule entry that omits triggers: becomes a partial override on the built-in rule with the same id. The built-in's predicate (triggers, emission, per_statement) is inherited verbatim; only the fields you set replace the built-in's.
# Downgrade the built-in SCHEMA-DROP from critical to low and reword the message.
# No `triggers:` — the built-in's `kind: drop_schema` predicate is inherited.
rules:
- id: SCHEMA-DROP
risk_level: low
message: "Schema dropped (downgraded per org policy — swap-and-drop is expected)."
Partial overrides can only set risk_level, message, and enabled — setting emission or per_statement without triggers: raises DisallowedFieldOnPartialOverride at load time. To change the predicate too, write a full override instead: include triggers: and the entire rule (still using the built-in's id) replaces the built-in wholesale.
# Full override — restates the built-in's id with a different predicate.
# The built-in is replaced entirely; nothing is inherited.
rules:
- id: SCHEMA-DROP
risk_level: high
message: "Schema drop in a non-ephemeral namespace."
triggers:
all_of:
- kind: drop_schema
- ddl.target.name.schema.normalized:
not_in: [EPHEMERAL, SANDBOX, SCRATCH]
Loader behaviour:
- Unknown id on a partial.
UnresolvedPartialOverride— fix the id, or convert the entry to a full rule by addingtriggers:. There is no "silent overlay against nothing" fallback. - Partial override under
--no-builtin.PartialOverrideWithoutBuiltins— there's no built-in corpus to inherit from. Either drop--no-builtinor restate the rule as a full entry. - Partial with no override-able fields set.
EmptyPartialOverride— at least one ofrisk_level/message/enabledis required so the entry does something.
Common Mistakes
triggers:as an array. Legacy shape. v1triggers:is one predicate object. Useall_of: [...]/any_of: [...]for multiple conditions.statement_type: GrantStatement. Legacy shape. v1 useskind: grant(snake_case, noStatementsuffix).categorical_signal: { category: SECURITY, ... }. Removed. v1 signals carryrule_iddirectly; the legacy(category, surface, condition)triple is gone. Predicate against typed fact fields instead.risk_level: High. Lowercase only —high. Same for the others.- Top-level
metadata:/name:/schema_version:. Not part of the v1 rule-file shape. Onlyrules:is recognized at the root.
Debugging
--explain-signals lists, per statement, which rules matched; --verbose adds a per-rule breakdown of the paths that matched and (for rules that almost matched) the paths that didn't:
lexega-sql analyze query.sql --custom-rules rules.yaml --explain-signals
Execution Order: Built-in rules and custom rules run through the same engine and produce the same signal shape. Customer entries with a built-in's
ideither fully replace the built-in (triggers:present) or partially overlay it (triggers:absent — see Overriding Built-in Rules); in both cases only one rule fires perid. After all rules fire, signals are deduplicated by(rule_id, risk_level)unless the rule setsper_statement: true. The policy layer is applied last to map signals toallow/warn/blockdecisions.
Need Help?
Can't find what you're looking for? Check out our GitHub or reach out to support.