Federated Queries
Run a single query that spans multiple databases. faz resolves dependencies, parallelises execution, and merges results in DuckDB.
A federated query is one logical request that touches multiple databases. faz takes a list of steps, runs each against its own connector, materialises the results into an in-memory DuckDB scratchpad, and runs an optional merge SQL to join the per-step tables. The whole thing goes through the safety pipeline.
The use case is "ask one question that requires data from two systems" — for example, joining order totals from Postgres with customer profiles from MongoDB without writing app-level glue.
When to use a federated query
Use it when:
- The data lives in two or more databases and you need them combined in one response.
- You want one safety-pipeline check covering the whole request, not separate checks per system.
- You want one audit log entry for the operation, not one per database.
If you can answer the question with a single query_simple call, do that. The federated path adds DuckDB merge overhead and is overkill for single-source queries.
The step model
A federated request is a list of steps. Each step looks like a query_simple request, plus three DAG fields:
{
"step_id": "s1",
"database": "<database-2>",
"table": "<collection>",
"language": "mql",
"query": "{\"operation\": \"find\", \"filter\": {}}",
"depends_on": ["s0"],
"link_from": "<link-column>",
"link_to": "<foreign-key>"
}| Field | Type | Required | Notes |
|---|---|---|---|
step_id | string | yes | Unique within the request. Convention: "s0", "s1", "s2". Used to refer to results in the merge. |
database | string | yes | Connector name from faz.yaml. |
table | string | yes | Declared target. Drives the per-table RBAC lookup. |
language | string | null | no | Inferred from the connector when omitted. |
query | string | yes | Native query for that connector — see MCP tools › query_simple for per-language formats. |
depends_on | list | no | Other step_ids this step waits for. Strict topological order: a step can only depend on steps that appear earlier in the list. |
link_from | string | null | no | Field name to extract from the dependency's results. |
link_to | string | null | no | Field in this step's query to substitute the extracted values into. |
Dependency resolution
depends_on builds a DAG. The IR builder validates:
- No cycles — caught by the topological-order check (a step can only reference earlier steps).
- No forward references — if
s1claimsdepends_on: ["s2"]ands2is later in the list, the request rejects with400. - No self-references — a step can't depend on itself.
- No unknown step IDs — if
s1claimsdepends_on: ["s9"]ands9doesn't exist, the request rejects with400.
Steps with no depends_on (or whose dependencies have completed) run in parallel. Steps wait for their dependencies to finish before kicking off.
Linking results between steps
link_from and link_to are the simple way to chain a query: take field values from one step's results and use them as a filter in the next step.
[
{
"step_id": "s0",
"database": "<database-1>",
"table": "<table-1>",
"query": "SELECT <link-column>, <other-column> FROM <table-1> WHERE <condition>"
},
{
"step_id": "s1",
"database": "<database-2>",
"table": "<collection>",
"language": "mql",
"query": "{\"operation\": \"find\", \"filter\": {}}",
"depends_on": ["s0"],
"link_from": "<link-column>",
"link_to": "<foreign-key>"
}
]s0 returns rows that include a <link-column> value. s1 then runs against MongoDB with its <foreign-key> filter populated from s0's <link-column> column. The agent doesn't have to construct the $in clause by hand — link_from / link_to does it.
If you don't use link_from/link_to, both steps run independently and the merge SQL is responsible for joining their results.
The merge
After every step has executed, faz loads each step's results into a DuckDB in-memory database under the step's id (s0, s1, ...) and runs the merge SQL against them. The result of the merge SQL is what gets returned to the caller.
"merge": "SELECT s1.<column>, s0.<other-column> FROM s0 JOIN s1 ON s0.<link-column> = s1.<foreign-key>"DuckDB is a real SQL engine — you can write CTEs, aggregations, window functions, anything DuckDB supports. The merge runs entirely in-process; no network hop.
The merge SQL is required when two or more steps return data. For single-step requests (or federated requests where only one step actually runs), the merge is skipped and the single step's results are returned directly.
The merge SQL is restricted to read-only operations — SELECT, CTEs, set operations. INSERT, UPDATE, DELETE, and DDL are rejected by the merge guard. The merge is a join layer, not a write layer.
A complete example
Joining rows from a relational database with documents from a document store:
{
"steps": [
{
"step_id": "s0",
"database": "<database-1>",
"table": "<table-1>",
"language": "sql",
"query": "SELECT <link-column>, <col-a>, <col-b> FROM <table-1> WHERE <condition>"
},
{
"step_id": "s1",
"database": "<database-2>",
"table": "<collection>",
"language": "mql",
"query": "{\"operation\": \"find\", \"filter\": {}}",
"depends_on": ["s0"],
"link_from": "<link-column>",
"link_to": "<foreign-key>"
}
],
"merge": "SELECT s1.<col-x>, s1.<col-y>, s0.<col-a>, s0.<col-b> FROM s0 JOIN s1 ON s0.<link-column> = s1.<foreign-key> ORDER BY s0.<col-a> DESC"
}Response:
{
"request_id": "req_a1b2c3d4e5f6",
"status": "ok",
"data": {
"columns": ["<col-x>", "<col-y>", "<col-a>", "<col-b>"],
"rows": [
{ "<col-x>": "<value>", "<col-y>": "<value>", "<col-a>": "<value>", "<col-b>": "<value>" }
],
"row_count": 42
},
"safety": {
"stages_passed": ["PROMPT_GUARD", "RBAC_GATE", "AST_CHECKER", "INJECTION_ANALYSER", "GUARDRAILS"],
"warnings": []
},
"metadata": {
"execution_time_ms": 78.3,
"transport": "rest/local",
"steps": [
{ "step_id": "s0", "database": "<database-1>", "rows_returned": 42, "latency_ms": 23.0 },
{ "step_id": "s1", "database": "<database-2>", "rows_returned": 42, "latency_ms": 31.5 }
],
"merge_latency_ms": 11.2
}
}Permissions and partial denial
Every step is independently checked by the RBAC gate. The default policy is DenyPolicy.PARTIAL: when some steps are allowed and some are denied, the denied steps are silently stripped and the rest of the request continues. The merge runs without the stripped step's results, which usually means an empty join on that side.
This is convenient — an agent that asked to join three databases doesn't get hard-failed because one of them is off-limits — but it's also a footgun for queries that semantically require all sources to succeed. If you need all-or-nothing, the DenyPolicy.BLOCK mode rejects the whole request when any step is denied; it isn't yet exposed via faz.yaml and is on the roadmap.
The audit log captures both requested and stripped lists so you can tell after the fact which sources contributed.
Pagination
If a federated query returns more rows than fit in the inline response (over the row cap from the safety: block), the result is stored in a scratchpad keyed by request_id. The first request_id is returned in the success envelope; pull subsequent pages via:
curl 'http://localhost:8787/v1/results/req_a1b2c3d4e5f6?page=2&size=500'{
"request_id": "req_a1b2c3d4e5f6",
"page": 2,
"size": 500,
"columns": [...],
"rows": [...],
"has_more": true
}Scratchpads are short-lived — they're cleared on a TTL by a background reaper. Pull pages soon after the initial query rather than holding the request_id for long.
Validation errors
The IR builder rejects malformed requests before any step runs. Common errors return 400:
'steps' must contain at least one step— emptystepslist.step missing 'step_id'— empty or absentstep_id.duplicate step_id: 's0'— two steps reusing the same id.step 's1' depends on itself— circular self-dependency.step 's1' depends on unknown or forward-referenced step 's2'— steps2either doesn't exist or appears later in the list.unknown database: '<typo>'. Configured: ['<database-1>', '<database-2>']— step'sdatabasedoesn't match any entry infaz.yaml.
Related
- MCP tools › query — the same shape over MCP.
- REST endpoints › POST /v1/query — the HTTP endpoint.
- Permissions — how
DenyPolicy.PARTIALstrips denied steps. - Audit log — the per-step decisions that get logged.