Batch Mode & Per-File Detail

Name: Lexega
Author: Lexega

Lexega operates in two modes depending on the input target: single-file mode and batch mode. Understanding the differences helps you choose the right invocation for your workflow.

Single-File Mode

When you point analyze at a single file (or pipe via --stdin), Lexega produces a complete report for that file alone.

lexega-sql analyze file.sql
lexega-sql analyze --dialect bigquery query.bq.sql
cat file.sql | lexega-sql analyze --stdin

What you get:

Full risk report for one file
All matching signals displayed inline
Evidence with line numbers and statement previews
Policy decision (allow/warn/block) if --policy is set

This is the mode you're most likely to use during development or in targeted CI checks.

Batch Mode

Pass a directory (with -r for recursive scanning) to analyze an entire codebase at once. Lexega scans all .sql files, aggregates results, and prints a summary.

# Scan current directory recursively
lexega-sql analyze . -r

# Scan a specific folder
lexega-sql analyze models/ -r

# With dialect and severity filter
lexega-sql analyze . -r --dialect bigquery --min-severity medium

Default batch output is a compact summary printed to stderr:

Batch Semantic Analysis Summary
==================
Files processed: 38/42
Files skipped: 4

Analyzed: 112 SQL statements
Unrecognized: 15 (opaque, no analysis)
Files with signals: 11
Total signals: 27
  By level: 2 CRITICAL, 5 HIGH, 12 MEDIUM, 8 LOW
Average signals per file: 2.5
Highest risk level: CRITICAL

Top risky files:
  models/admin/grants.sql - 4 signal(s), max CRITICAL
  models/staging/raw_load.sql - 3 signal(s), max HIGH
  ...

Top Matched Rules:
  MASK-DROP (v1_fact) - 3 occurrence(s)
  GRT-TO-PUBLIC (v1_fact) - 5 occurrence(s)
  ...

This summary tells you the overall health of the codebase without overwhelming detail.
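If a script needs the headline numbers, the text summary can be scraped. The following is a minimal Python sketch that assumes the summary layout matches the sample above exactly; `parse_batch_summary` is a hypothetical helper, not part of Lexega. For anything long-lived, the JSON or YAML output is probably a sturdier source than scraped text.

```python
import re

# Sample text copied from the summary shown above (trimmed to the counted lines).
SAMPLE = """\
Files processed: 38/42
Files skipped: 4

Analyzed: 112 SQL statements
Unrecognized: 15 (opaque, no analysis)
Files with signals: 11
Total signals: 27
  By level: 2 CRITICAL, 5 HIGH, 12 MEDIUM, 8 LOW
Highest risk level: CRITICAL
"""


def _int_after(label: str, text: str):
    """Return the first integer that follows `label`, or None if absent."""
    m = re.search(rf"{re.escape(label)}\s*(\d+)", text)
    return int(m.group(1)) if m else None


def parse_batch_summary(text: str) -> dict:
    """Extract headline counts; the line layout is assumed from the sample."""
    out = {
        "files_processed": _int_after("Files processed:", text),
        "files_with_signals": _int_after("Files with signals:", text),
        "total_signals": _int_after("Total signals:", text),
    }
    # "Files processed: 38/42" -> the number after the slash is the total.
    m = re.search(r"Files processed:\s*\d+/(\d+)", text)
    out["files_total"] = int(m.group(1)) if m else None
    # "By level: 2 CRITICAL, 5 HIGH, ..." -> {"CRITICAL": 2, "HIGH": 5, ...}
    m = re.search(r"By level:\s*(.+)", text)
    pairs = re.findall(r"(\d+)\s+([A-Z]+)", m.group(1)) if m else []
    out["by_level"] = {name: int(count) for count, name in pairs}
    m = re.search(r"Highest risk level:\s*(\w+)", text)
    out["highest"] = m.group(1) if m else None
    return out
```

Because the summary goes to stderr, it could be captured with a redirect such as `lexega-sql analyze . -r 2> summary.txt`. In the sample, 27 signals across 11 files with signals is about 2.45, which is consistent with the displayed average of 2.5; that the average is taken over files with signals rather than all processed files is an inference from the sample, not documented behavior.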

The `--detail` Flag

When the batch summary isn't enough, add --detail to see per-file signal breakdowns after the summary:

lexega-sql analyze . -r --detail
lexega-sql analyze . -r --detail --min-severity high

Text Output (default)

With --detail, the summary is followed by a per-file section:

Per-File Signal Details
══════════════════════════════════════════════════════════════

── models/staging/raw_load.sql (2 signals)
  [CRITICAL] TRUNCATE TABLE detected. All rows will be permanently deleted.
    ↳ Line 22 • `TBL-TRUNCATE` • "TRUNCATE TABLE raw_events_staging" • models/staging/raw_load.sql:22:1
  [HIGH] CREATE OR REPLACE TABLE detected. Existing table definition (and potentially data semantics) is replaced.
    ↳ Line 4 • `TBL-REPLACE` • "CREATE OR REPLACE TABLE raw_events AS SELECT id, payload FROM ext_source" • models/staging/raw_load.sql:4:1

── models/admin/grants.sql (2 signals)
  [CRITICAL] Masking Policy dropped. Column data protection removed. All columns using this policy will be unmasked.
    ↳ Line 12 • `MASK-DROP` • "DROP MASKING POLICY email_mask" • models/admin/grants.sql:12:1
  [CRITICAL] Network Policy dropped. Network access controls removed.
    ↳ Line 18 • `SNW-NETPOL-DROP` • "DROP NETWORK POLICY corp_policy" • models/admin/grants.sql:18:1

Each file shows its signals with severity level, message, and evidence (line numbers, rule IDs, statement previews, and source locations). Files with no signals are omitted.
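The evidence lines follow a regular enough shape that they can be picked apart with a regular expression. This is a rough Python sketch based only on the sample above: the `Evidence` class and `parse_detail` function are hypothetical names, and reading the trailing `path:22:1` as `path:line:column` is an assumption.

```python
import re
from dataclasses import dataclass

# Sample copied from the per-file section shown above.
SAMPLE = """\
── models/staging/raw_load.sql (2 signals)
  [CRITICAL] TRUNCATE TABLE detected. All rows will be permanently deleted.
    ↳ Line 22 • `TBL-TRUNCATE` • "TRUNCATE TABLE raw_events_staging" • models/staging/raw_load.sql:22:1
  [HIGH] CREATE OR REPLACE TABLE detected. Existing table definition (and potentially data semantics) is replaced.
    ↳ Line 4 • `TBL-REPLACE` • "CREATE OR REPLACE TABLE raw_events AS SELECT id, payload FROM ext_source" • models/staging/raw_load.sql:4:1
"""


@dataclass
class Evidence:
    level: str
    message: str
    rule_id: str
    path: str
    line: int
    column: int  # assumption: the last number in path:line:column


# "[LEVEL] message" lines
LEVEL_RE = re.compile(r"^\s*\[([A-Z]+)\]\s+(.*)$")
# "↳ Line N • `RULE` • "preview" • path:line:col" lines
EVIDENCE_RE = re.compile(r'^\s*↳ Line \d+ • `([^`]+)` • ".*" • (.+):(\d+):(\d+)\s*$')


def parse_detail(text: str) -> list:
    """Pair each [LEVEL] line with the evidence line that follows it."""
    records = []
    pending = None  # (level, message) waiting for its evidence line
    for raw in text.splitlines():
        m = LEVEL_RE.match(raw)
        if m:
            pending = (m.group(1), m.group(2))
            continue
        m = EVIDENCE_RE.match(raw)
        if m and pending:
            records.append(Evidence(pending[0], pending[1], m.group(1),
                                    m.group(2), int(m.group(3)), int(m.group(4))))
            pending = None
    return records
```

File header lines (those starting with `──`) match neither pattern and are skipped; the path is taken from the evidence line itself.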

Markdown Output

When combined with --format markdown (for PR comments), --detail adds a Per-File Signal Details section with tables:

lexega-sql analyze . -r --detail --format markdown

This produces per-file tables with level icons:

`grants.sql`

| Level | Signal | Evidence |
| --- | --- | --- |
| 🔴 critical | Masking Policy dropped. Column data protection removed. All columns using this policy will be unmasked. | Line 12: `MASK-DROP` |
| 🔴 critical | Network Policy dropped. Network access controls removed. | Line 18: `SNW-NETPOL-DROP` |

Level icons: 🔴 Critical, 🟠 High, 🟡 Medium, 🟢 Low, ℹ️ Info.

JSON / YAML Output

JSON and YAML formats always include full per-file reports regardless of --detail. The flag only affects text and markdown output.

# Full per-file JSON (--detail not needed)
lexega-sql analyze . -r --format json -q > report.json

# YAML
lexega-sql analyze . -r --format yaml -q > report.yaml

The same applies to artifacts: a batch report written with --report-out always carries the full per-file reports, whatever the stdout format. The dashboard ingests its findings from this artifact:

# Quiet batch run; the artifact still contains per-file reports
lexega-sql analyze . -r -q --report-out .lexega/reports/

SARIF Output

SARIF is inherently per-file: each result carries its physical location (file path, line number, region). The --detail flag has no effect on SARIF output, since the format already includes full detail by design.

# Single file → stdout
lexega-sql analyze file.sql --format sarif

# Single file → write to directory
lexega-sql analyze file.sql --format sarif --report-out .lexega/reports/

# Batch → produces batch_summary.sarif in the output directory
lexega-sql analyze . -r --format sarif --report-out .lexega/reports/

In batch mode, all signals across all files are combined into a single SARIF document. Each result entry includes the originating file path via locations[].physicalLocation, so tools like GitHub Code Scanning, VS Code SARIF Viewer, and other SARIF consumers can map findings back to exact source locations.
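To illustrate how a consumer maps combined results back to files, here is a small Python sketch that groups results by path using the standard SARIF 2.1.0 fields (`runs[].results[].locations[].physicalLocation`). The sample document is hand-written for illustration and is not real Lexega output; in particular, the `level` values and tool name in it are assumptions.

```python
from collections import defaultdict

# Hand-written minimal SARIF document (illustrative only, not Lexega output).
SAMPLE_SARIF = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-tool"}},  # hypothetical tool name
        "results": [
            {"ruleId": "MASK-DROP", "level": "error",
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "models/admin/grants.sql"},
                 "region": {"startLine": 12}}}]},
            {"ruleId": "SNW-NETPOL-DROP", "level": "error",
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "models/admin/grants.sql"},
                 "region": {"startLine": 18}}}]},
            {"ruleId": "TBL-TRUNCATE", "level": "error",
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "models/staging/raw_load.sql"},
                 "region": {"startLine": 22}}}]},
        ],
    }],
}


def results_by_file(sarif: dict) -> dict:
    """Group (ruleId, level, startLine) tuples by artifact URI."""
    grouped = defaultdict(list)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "<unknown>")
                line = phys.get("region", {}).get("startLine")
                grouped[uri].append((result.get("ruleId"), result.get("level"), line))
    return dict(grouped)
```

For a real run, the dictionary would come from `json.load` on the generated file, for example the `batch_summary.sarif` mentioned above.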

Key Differences: Single-File vs Batch

| Aspect | Single-File | Batch |
| --- | --- | --- |
| Input | One file or stdin | Directory with `-r` |
| Default output | Full report with all signals | Aggregate summary only |
| Per-file signals | Always shown | Only with `--detail` |
| JSON/YAML | Full report | Full per-file reports (always) |
| `--min-severity` | Filters signals shown | Filters signals per-file and in summary |
| Policy enforcement | Single allow/warn/block decision | Per-file decisions, exit code 2 if any blocked |
| Embedded SQL | N/A | Use `--scan-embedded` for `.py`/`.ipynb` files |

Common Workflows

Quick codebase health check

lexega-sql analyze . -r

Just the summary: how many files were scanned, how many signals were found, and the highest risk level.

Pre-merge review with full detail

lexega-sql analyze models/ -r --detail --min-severity medium

See every medium-and-above signal, file by file.

CI pipeline with JSON artifact

lexega-sql analyze . -r --format json -q > report.json

Full structured data for dashboards or downstream processing.

PR comment with detail tables

lexega-sql analyze . -r --detail --format markdown --min-severity high

Markdown tables with per-file signal breakdowns, ready for GitHub/GitLab PR comments.

Policy enforcement across a repo

lexega-sql analyze . -r --policy policy.yaml --env prod

Each file gets its own allow/warn/block decision. Exit code 2 if any file is blocked.
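A CI step might wrap this command and branch on the exit code. The sketch below is one possible shape in Python, not an official integration. Only exit code 2 (any file blocked) is stated above; treating 0 as a pass and any other code as a tool error are assumptions, and `classify_exit` and `run_policy_check` are hypothetical names.

```python
import subprocess

BLOCKED_EXIT_CODE = 2  # stated above: returned when any file is blocked


def classify_exit(code: int) -> str:
    """Map a process exit code to a coarse gate outcome."""
    if code == BLOCKED_EXIT_CODE:
        return "blocked"
    if code == 0:
        return "passed"  # assumption: 0 means no file was blocked
    return "error"       # assumption: other codes mean the run itself failed


def run_policy_check(target: str = ".", policy: str = "policy.yaml",
                     env: str = "prod") -> str:
    """Run the batch policy command shown above and classify its exit code."""
    cmd = ["lexega-sql", "analyze", target, "-r", "--policy", policy, "--env", env]
    completed = subprocess.run(cmd)  # output streams pass through to the CI log
    return classify_exit(completed.returncode)
```

A pipeline script could then call `run_policy_check()` and fail the job on anything other than `"passed"`, keeping "blocked" and "error" distinct in its log message.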