S SCHEMA.BIZ

Last updated:

SQL Query Visualizer

Paste your database schema and a SELECT query to visualize which tables, columns, and joins are involved. Referenced elements are highlighted in blue; uninvolved tables are grayed out.

What this tool does

Reading a multi-table SELECT and holding the implied data flow in your head is a skill that breaks down somewhere around the fifth join. The Query Visualizer takes the schema you paste in and the query you are trying to understand, parses both with a deterministic SQL grammar, and projects the result onto an interactive entity-relationship diagram. Tables touched by the query light up; columns referenced in the SELECT, JOIN, WHERE, and GROUP BY clauses get individual highlights; everything else fades into the background so the active surface area of the query becomes obvious at a glance.

The whole pipeline runs in your browser. No driver is loaded, no schema is uploaded, no query is executed. That matters because the most useful queries to visualize are usually the most sensitive ones: production reporting queries, half-finished migrations, customer-data extracts. You can paste a schema with personally identifiable column names or a query that joins your billing tables, close the tab, and have left no trace anywhere outside the browser process. The same pipeline that handles a textbook three-table join handles a fifteen-table reporting query without any setup, signup, or credit card.

When to use the Query Visualizer

Reviewing a teammate's pull request that adds a new reporting query. The diff shows the SQL but not which tables are now coupled to which. Pasting the query into the visualizer surfaces hidden joins and confirms whether the query honors the access boundaries your data model is supposed to enforce.

Diagnosing a query that returns the wrong number of rows. When a count comes back too high or too low, the cause is almost always a join condition that creates duplicates or a WHERE filter that silently converts an outer join into an inner one. Seeing the join graph and the conditions side-by-side makes the failure mode jump out.

Onboarding to a database you have never seen before. The fastest way to learn a new schema is to read its real queries. Visualizing a handful of representative reporting queries shows which tables are central, which are leaf nodes, and which fact tables fan out to which dimensions.

Pruning queries before they hit a slow path. A query that joins eight tables but only uses columns from three is a refactor candidate. The visualizer makes it trivial to see which joins contribute to the output and which exist only to satisfy a WHERE filter that could be rewritten as an EXISTS subquery.

Walkthrough with a real example

Paste the following schema into the left pane:

CREATE TABLE customers (
  id           BIGSERIAL PRIMARY KEY,
  email        TEXT NOT NULL UNIQUE,
  signup_date  DATE NOT NULL
);

CREATE TABLE orders (
  id            BIGSERIAL PRIMARY KEY,
  customer_id   BIGINT NOT NULL REFERENCES customers(id),
  status        TEXT NOT NULL,
  placed_at     TIMESTAMPTZ NOT NULL,
  total_cents   INTEGER NOT NULL
);

CREATE TABLE order_items (
  id          BIGSERIAL PRIMARY KEY,
  order_id    BIGINT NOT NULL REFERENCES orders(id),
  product_id  BIGINT NOT NULL,
  quantity    INTEGER NOT NULL
);

CREATE TABLE refunds (
  id          BIGSERIAL PRIMARY KEY,
  order_id    BIGINT NOT NULL REFERENCES orders(id),
  reason      TEXT,
  amount_cents INTEGER NOT NULL
);

Then paste this query, which calculates lifetime net revenue per high-value customer:

SELECT
  c.id,
  c.email,
  COUNT(DISTINCT o.id) AS lifetime_orders,
  SUM(o.total_cents) - COALESCE(SUM(r.amount_cents), 0) AS net_cents
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id
  AND o.status = 'completed'
LEFT JOIN refunds r
  ON r.order_id = o.id
WHERE c.signup_date >= '2025-01-01'
GROUP BY c.id, c.email
HAVING SUM(o.total_cents) > 50000
ORDER BY net_cents DESC
LIMIT 100;

The diagram immediately lights up customers, orders, and refunds as the active set, leaving order_items grayed out — visually confirming that this query does not need the line items table even though it is part of the same business entity. The edge from orders to customers is labeled LEFT JOIN ON o.customer_id = c.id AND o.status = 'completed', and the edge from refunds to orders is labeled with its own ON clause. Inside each node, only the columns the query references are highlighted: id, email, and signup_date on customers; customer_id, status, and total_cents on orders; order_id and amount_cents on refunds.

The analysis panel breaks the query into its parts: three tables (with their aliases), four projected expressions (two raw columns, two aggregates), two join clauses, a WHERE predicate on signup_date, a GROUP BY on the customer identity columns, a HAVING filter on aggregated revenue, and a LIMIT. The performance notes flag two things to think about: the LEFT JOIN to refunds can fan out a customer's row across multiple refunds before aggregation, which is why the SUM uses COALESCE on the refunds side; and the HAVING clause filters on a SUM that can only be computed after the joins resolve, so an index on orders.total_cents alone will not help.

SQL concepts you should know

Join types and what they preserve. An INNER JOIN returns only rows where the ON predicate is satisfied on both sides. A LEFT JOIN preserves every row from the left input and pads the right side with NULLs when no match exists; RIGHT JOIN does the mirror. FULL OUTER JOIN preserves unmatched rows from both sides. CROSS JOIN produces every pair, with no condition — almost always a mistake outside of generating ranges or pivoting. The visualizer labels each edge with its type so the preservation semantics are explicit.

The ON clause vs the WHERE clause. Predicates in the ON clause are part of the join itself and run before the rows are combined. Predicates in the WHERE clause run after. For an INNER JOIN this distinction is invisible — the optimizer can move predicates between the two. For an OUTER JOIN it is decisive: a WHERE predicate that references the nullable side filters out the NULL-padded rows the OUTER JOIN was supposed to preserve, collapsing it into an INNER JOIN. This is the single most common cause of "my LEFT JOIN is acting like an INNER JOIN" bug reports.

Subqueries vs CTEs vs derived tables. A subquery in the SELECT or WHERE returns a value or a set of values per outer row. A derived table is a subquery in the FROM clause that acts as an inline relation. A CTE (the WITH clause) is the same idea, but named and reusable within the query. Modern optimizers usually inline non-recursive CTEs into the surrounding query, so the choice between a CTE and a derived table is mostly about readability — except in PostgreSQL versions before 12, where CTEs were optimization fences.

The logical execution order of a query. SQL reads top-to-bottom but executes in a different order: FROM and JOIN first, then WHERE, then GROUP BY, then HAVING, then SELECT, then DISTINCT, then ORDER BY, then LIMIT. That is why a column alias defined in the SELECT cannot be referenced in the WHERE — the WHERE runs first. It is also why HAVING can filter on aggregates but WHERE cannot. Internalizing this order is the single biggest unlock for writing correct queries on the first try.

Common mistakes

Filtering the nullable side of a LEFT JOIN in the WHERE clause. The classic outer-join collapse:

-- Looks like a LEFT JOIN, behaves like an INNER JOIN
SELECT u.id, p.title
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE p.published_at > '2025-01-01';

Customers with no posts disappear because p.published_at is NULL for them, and NULL fails the comparison. Move the predicate into the ON clause and the LEFT JOIN behaves as intended:

-- Predicate moved into the ON clause: LEFT JOIN preserved
SELECT u.id, p.title
FROM users u
LEFT JOIN posts p
  ON p.user_id = u.id
  AND p.published_at > '2025-01-01';

Forgetting the ON clause and getting a Cartesian product. Writing FROM users, orders without a join condition multiplies row counts together. On a small dev database it returns slowly; on production it can exhaust memory.

Aggregating before joining and doubling counts. Joining to a table that has multiple matching rows per parent row, then doing a SUM or COUNT, multiplies the parent's other columns. Aggregate first in a subquery or CTE, then join the aggregate to the parent.

Using SELECT * in production code. It hides which columns are actually used, breaks downstream consumers when the schema changes, and makes the visualizer's column-highlighting useless because every column is "referenced." Name the columns explicitly.

Assuming join order matches execution order. The optimizer reorders joins based on statistics, not on the order you wrote them. Putting a small filtering table first in the FROM clause is a hint, not a guarantee — and on modern planners, often not even a hint.

FAQ

Does the visualizer execute my query against a real database?

No. The tool parses your CREATE TABLE statements and the SELECT query as text and renders relationships from the parsed AST. Nothing is sent to a server, no driver is loaded, and no database connection is opened. You can paste production schemas without exposing them to anything outside your browser tab.

Which SQL dialects are supported for the query input?

The parser accepts standard ANSI SQL plus dialect-specific touches from PostgreSQL, MySQL, SQLite, and SQL Server: backticked or bracketed identifiers, RETURNING clauses, window function syntax, and common JOIN variants. Vendor extensions like Postgres LATERAL joins or MySQL STRAIGHT_JOIN are recognized but rendered as standard joins.

Can it visualize Common Table Expressions (CTEs) and subqueries?

Yes. Each CTE is treated as a named intermediate relation that participates in the diagram, with its own column projection. Inline subqueries in the SELECT or WHERE clauses are listed in the analysis panel under their parent table reference, so you can see what each correlated subquery touches.

How does it handle UNION, INTERSECT, and set operations?

Set operations are flattened into a list of subqueries that contribute rows to a single output. The diagram highlights every table referenced across all branches of the UNION, and the analysis panel labels the per-branch projections so you can confirm the column lists match in type and order.

What about views and materialized views?

If you paste the CREATE VIEW statement alongside your tables, the parser inlines the view definition and treats the view as a virtual table with its derived columns. Materialized views are rendered the same way; the visualizer cannot tell them apart from regular views without a refresh schedule, which is fine for query analysis.

Related tools and guides

  • SQL to Diagram — when you have CREATE TABLE statements but no query yet, this tool gives you the bare ER diagram. Use it for documentation and onboarding before you start writing queries.
  • Visual Schema Designer — design a new schema from scratch with drag-and-drop tables and relationships, then export DDL for any major dialect when you are ready to run migrations.
  • Schema Diff — compare two versions of a schema and generate the migration to go from one to the other, with safety checks that flag destructive operations.
  • Database Design Fundamentals — the underlying theory behind normalization, key selection, and indexing strategies that make queries fast in the first place.
  • API Versioning Guide — when your queries back an API, the schema changes you make are also API contract changes; this guide covers how to evolve them without breaking clients.