faz
Usage

Federated Queries

Run a single query that spans multiple databases. faz resolves dependencies, parallelises execution, and merges results in DuckDB.

A federated query is one logical request that touches multiple databases. faz takes a list of steps, runs each against its own connector, materialises the results into an in-memory DuckDB scratchpad, and runs an optional merge SQL to join the per-step tables. The whole thing goes through the safety pipeline.

The use case is "ask one question that requires data from two systems" — for example, joining order totals from Postgres with customer profiles from MongoDB without writing app-level glue.

When to use a federated query

Use it when:

  • The data lives in two or more databases and you need them combined in one response.
  • You want one safety-pipeline check covering the whole request, not separate checks per system.
  • You want one audit log entry for the operation, not one per database.

If you can answer the question with a single query_simple call, do that. The federated path adds DuckDB merge overhead and is overkill for single-source queries.

The step model

A federated request is a list of steps. Each step looks like a query_simple request, plus three DAG fields:

{
  "step_id": "s1",
  "database": "<database-2>",
  "table": "<collection>",
  "language": "mql",
  "query": "{\"operation\": \"find\", \"filter\": {}}",
  "depends_on": ["s0"],
  "link_from": "<link-column>",
  "link_to": "<foreign-key>"
}
FieldTypeRequiredNotes
step_idstringyesUnique within the request. Convention: "s0", "s1", "s2". Used to refer to results in the merge.
databasestringyesConnector name from faz.yaml.
tablestringyesDeclared target. Drives the per-table RBAC lookup.
languagestring | nullnoInferred from the connector when omitted.
querystringyesNative query for that connector — see MCP tools › query_simple for per-language formats.
depends_onlistnoOther step_ids this step waits for. Strict topological order: a step can only depend on steps that appear earlier in the list.
link_fromstring | nullnoField name to extract from the dependency's results.
link_tostring | nullnoField in this step's query to substitute the extracted values into.

Dependency resolution

depends_on builds a DAG. The IR builder validates:

  • No cycles — caught by the topological-order check (a step can only reference earlier steps).
  • No forward references — if s1 claims depends_on: ["s2"] and s2 is later in the list, the request rejects with 400.
  • No self-references — a step can't depend on itself.
  • No unknown step IDs — if s1 claims depends_on: ["s9"] and s9 doesn't exist, the request rejects with 400.

Steps with no depends_on (or whose dependencies have completed) run in parallel. Steps wait for their dependencies to finish before kicking off.

Linking results between steps

link_from and link_to are the simple way to chain a query: take field values from one step's results and use them as a filter in the next step.

[
  {
    "step_id": "s0",
    "database": "<database-1>",
    "table": "<table-1>",
    "query": "SELECT <link-column>, <other-column> FROM <table-1> WHERE <condition>"
  },
  {
    "step_id": "s1",
    "database": "<database-2>",
    "table": "<collection>",
    "language": "mql",
    "query": "{\"operation\": \"find\", \"filter\": {}}",
    "depends_on": ["s0"],
    "link_from": "<link-column>",
    "link_to": "<foreign-key>"
  }
]

s0 returns rows that include a <link-column> value. s1 then runs against MongoDB with its <foreign-key> filter populated from s0's <link-column> column. The agent doesn't have to construct the $in clause by hand — link_from / link_to does it.

If you don't use link_from/link_to, both steps run independently and the merge SQL is responsible for joining their results.

The merge

After every step has executed, faz loads each step's results into a DuckDB in-memory database under the step's id (s0, s1, ...) and runs the merge SQL against them. The result of the merge SQL is what gets returned to the caller.

"merge": "SELECT s1.<column>, s0.<other-column> FROM s0 JOIN s1 ON s0.<link-column> = s1.<foreign-key>"

DuckDB is a real SQL engine — you can write CTEs, aggregations, window functions, anything DuckDB supports. The merge runs entirely in-process; no network hop.

The merge SQL is required when two or more steps return data. For single-step requests (or federated requests where only one step actually runs), the merge is skipped and the single step's results are returned directly.

The merge SQL is restricted to read-only operations — SELECT, CTEs, set operations. INSERT, UPDATE, DELETE, and DDL are rejected by the merge guard. The merge is a join layer, not a write layer.

A complete example

Joining rows from a relational database with documents from a document store:

POST /v1/query
{
  "steps": [
    {
      "step_id": "s0",
      "database": "<database-1>",
      "table": "<table-1>",
      "language": "sql",
      "query": "SELECT <link-column>, <col-a>, <col-b> FROM <table-1> WHERE <condition>"
    },
    {
      "step_id": "s1",
      "database": "<database-2>",
      "table": "<collection>",
      "language": "mql",
      "query": "{\"operation\": \"find\", \"filter\": {}}",
      "depends_on": ["s0"],
      "link_from": "<link-column>",
      "link_to": "<foreign-key>"
    }
  ],
  "merge": "SELECT s1.<col-x>, s1.<col-y>, s0.<col-a>, s0.<col-b> FROM s0 JOIN s1 ON s0.<link-column> = s1.<foreign-key> ORDER BY s0.<col-a> DESC"
}

Response:

{
  "request_id": "req_a1b2c3d4e5f6",
  "status": "ok",
  "data": {
    "columns": ["<col-x>", "<col-y>", "<col-a>", "<col-b>"],
    "rows": [
      { "<col-x>": "<value>", "<col-y>": "<value>", "<col-a>": "<value>", "<col-b>": "<value>" }
    ],
    "row_count": 42
  },
  "safety": {
    "stages_passed": ["PROMPT_GUARD", "RBAC_GATE", "AST_CHECKER", "INJECTION_ANALYSER", "GUARDRAILS"],
    "warnings": []
  },
  "metadata": {
    "execution_time_ms": 78.3,
    "transport": "rest/local",
    "steps": [
      { "step_id": "s0", "database": "<database-1>", "rows_returned": 42, "latency_ms": 23.0 },
      { "step_id": "s1", "database": "<database-2>", "rows_returned": 42, "latency_ms": 31.5 }
    ],
    "merge_latency_ms": 11.2
  }
}

Permissions and partial denial

Every step is independently checked by the RBAC gate. The default policy is DenyPolicy.PARTIAL: when some steps are allowed and some are denied, the denied steps are silently stripped and the rest of the request continues. The merge runs without the stripped step's results, which usually means an empty join on that side.

This is convenient — an agent that asked to join three databases doesn't get hard-failed because one of them is off-limits — but it's also a footgun for queries that semantically require all sources to succeed. If you need all-or-nothing, the DenyPolicy.BLOCK mode rejects the whole request when any step is denied; it isn't yet exposed via faz.yaml and is on the roadmap.

The audit log captures both requested and stripped lists so you can tell after the fact which sources contributed.

Pagination

If a federated query returns more rows than fit in the inline response (over the row cap from the safety: block), the result is stored in a scratchpad keyed by request_id. The first request_id is returned in the success envelope; pull subsequent pages via:

curl 'http://localhost:8787/v1/results/req_a1b2c3d4e5f6?page=2&size=500'
{
  "request_id": "req_a1b2c3d4e5f6",
  "page": 2,
  "size": 500,
  "columns": [...],
  "rows": [...],
  "has_more": true
}

Scratchpads are short-lived — they're cleared on a TTL by a background reaper. Pull pages soon after the initial query rather than holding the request_id for long.

Validation errors

The IR builder rejects malformed requests before any step runs. Common errors return 400:

  • 'steps' must contain at least one step — empty steps list.
  • step missing 'step_id' — empty or absent step_id.
  • duplicate step_id: 's0' — two steps reusing the same id.
  • step 's1' depends on itself — circular self-dependency.
  • step 's1' depends on unknown or forward-referenced step 's2' — step s2 either doesn't exist or appears later in the list.
  • unknown database: '<typo>'. Configured: ['<database-1>', '<database-2>'] — step's database doesn't match any entry in faz.yaml.

On this page