Schema Audit Tool
Paste your full HTML page source to discover and audit all structured data — JSON-LD, Microdata, and RDFa. Get validation results, rich result eligibility, AI search readiness scores, missing opportunities, and quick wins in one report.
Accepts full HTML source (detects JSON-LD, Microdata, and RDFa) or standalone JSON-LD snippets.
What this tool does
Most pages with structured data have it half-implemented. There is an Article schema, but it is missing the publisher logo Google needs for the rich result; there is a Product schema, but no offer block with price and currency; the page renders an FAQ section in HTML, but nobody added the FAQPage schema to make it eligible for the rich result. A line-by-line schema validator catches the syntax errors. What you actually need to know is broader: which schemas are present, which are missing, which are incomplete, which would qualify for which rich results if you fixed one or two fields, and what the biggest leverage moves are. That is what this audit produces.
Paste a complete HTML page source and the audit walks the page in four passes. Format detection finds every JSON-LD block, every Microdata itemscope, and every RDFa vocab declaration. Validation runs each schema against schema.org and Google's rich result rules. Completeness scoring measures how many recommended properties are populated for each detected type. Opportunity detection scans the surrounding HTML for patterns that suggest schema types that should exist but do not — breadcrumb navigation without a BreadcrumbList, an FAQ section without a FAQPage, a recipe without a Recipe schema. Everything happens locally; nothing is uploaded.
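The format-detection pass described above can be approximated in a few lines. This is a minimal sketch using Python's standard html.parser, not the tool's own implementation (which runs in the browser). The class name FormatDetector and its attribute names are hypothetical.

```python
import json
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Collects JSON-LD blocks and counts Microdata / RDFa markers."""

    def __init__(self):
        super().__init__()
        self.json_ld = []          # successfully parsed JSON-LD objects
        self.microdata_scopes = 0  # elements carrying an itemscope attribute
        self.rdfa_vocabs = 0       # elements carrying a vocab attribute
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        script_type = (attr_map.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        if "itemscope" in attr_map:
            self.microdata_scopes += 1
        if "vocab" in attr_map:
            self.rdfa_vocabs += 1

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # a real audit would report this as a syntax error


sample = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article"}</script>
</head><body>
<div itemscope itemtype="https://schema.org/Product"></div>
<div vocab="https://schema.org/" typeof="Person"></div>
</body></html>"""

detector = FormatDetector()
detector.feed(sample)
detector.close()
print(len(detector.json_ld), detector.microdata_scopes, detector.rdfa_vocabs)
```

The later passes (validation, completeness, opportunity detection) would then work from the collected objects and from the surrounding HTML.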
When to use the Schema Audit
Quarterly schema maintenance pass. Schemas drift. A template that produces clean structured data today produces under-populated markup six months later as the content model evolves. A scheduled audit on each template catches drift before it costs eligibility.
After a major template redesign. Template rewrites are the most common cause of schema regressions — the old template emitted FAQPage and BreadcrumbList schemas, the new template kept only the Article. Run the audit on the new template before merging the redesign and the regressions are visible.
Reverse-engineering a competitor's markup. A competitor consistently appears with rich results you do not have. Audit one of their pages and the report will tell you exactly which schemas they implement and how completely. The opportunity detection tells you which patterns on your own pages would produce the same eligibility.
The discovery phase of an SEO engagement. When you take over a site for the first time, the audit gives you a one-page summary of where structured data stands across each template. The quick-wins prioritization gives the first sprint's worth of work without a separate planning phase.
Walkthrough with a real example
Paste this page source — a blog article with an Article schema, a visible breadcrumb nav, and a visible FAQ section, but no BreadcrumbList or FAQPage schemas:
<!doctype html>
<html>
<head>
<title>Why we rebuilt our checkout flow | Example Co Blog</title>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Why we rebuilt our checkout flow",
"datePublished": "2026-03-14",
"author": { "@type": "Person", "name": "Sam Patel" }
}
</script>
</head>
<body>
<nav aria-label="breadcrumb">
<a href="/">Home</a> / <a href="/blog">Blog</a> / Why we rebuilt
</nav>
<article>
<h1>Why we rebuilt our checkout flow</h1>
<p>Posted March 14, 2026 by Sam Patel</p>
<img src="/posts/checkout-rebuild.jpg" />
<h2>Frequently asked questions</h2>
<h3>Why did you rebuild instead of refactoring?</h3>
<p>The legacy code path had three branches that all needed...</p>
<h3>How long did the rebuild take?</h3>
<p>Eleven weeks from kickoff to full rollout...</p>
</article>
</body>
</html>
The audit reports three categories of findings:
Article (existing schema, found in <script type="application/ld+json">)
✗ Missing required: image
✗ Missing recommended: dateModified
✗ Missing recommended: publisher
✗ Missing recommended: mainEntityOfPage
⚠ Author is a Person but missing url for entity linking
Completeness: 45% (4/9 recommended properties present)
AI Search Readiness: 28/100 (no sameAs, thin entity signals, no freshness)
Detected in HTML, missing as structured data:
+ BreadcrumbList — <nav aria-label="breadcrumb"> with 3 items detected
+ FAQPage — 2 H3 questions with answer paragraphs detected
Quick Wins (effort: minimal):
1. Add image as absolute URL: "https://example.com/posts/checkout-rebuild.jpg"
2. Add dateModified: "2026-03-14T09:00:00-04:00"
3. Add publisher Organization with logo
4. Add BreadcrumbList schema for the existing breadcrumb nav
5. Add FAQPage schema for the existing Q/A section
The existing Article schema validates as JSON but is incomplete. It is missing image (which the audit flags as required for the Article rich result), dateModified (recommended, and important for AI search freshness), publisher (recommended; it supplies the brand attribution that surfaces in the rich result), and mainEntityOfPage (recommended for resolving the article-page identity). The completeness score lands at 45% and the AI readiness score at 28 out of 100, because the entity has no sameAs links for disambiguation, no canonical URL on the article, and no freshness signals beyond a single date.
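To make the four findings concrete, here is a sketch of the same Article schema with the flagged properties filled in. The image and dateModified values come from the quick-wins list in the report. The publisher name is inferred from the page title, and the logo URL and the mainEntityOfPage URL are hypothetical placeholders that you would replace with real ones.

```python
import json

# The Article schema exactly as it appears in the example page.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Why we rebuilt our checkout flow",
    "datePublished": "2026-03-14",
    "author": {"@type": "Person", "name": "Sam Patel"},
}

# Properties the audit flagged as missing.
fixes = {
    "image": "https://example.com/posts/checkout-rebuild.jpg",
    "dateModified": "2026-03-14T09:00:00-04:00",
    "publisher": {
        "@type": "Organization",
        "name": "Example Co",  # inferred from the page title
        "logo": {
            "@type": "ImageObject",
            "url": "https://example.com/logo.png",  # hypothetical URL
        },
    },
    "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://example.com/blog/checkout-rebuild",  # hypothetical URL
    },
}

# Merge without touching the properties that were already present.
completed = {**article, **fixes}
print(json.dumps(completed, indent=2))
```

The printed JSON can be pasted back into the page's script element in place of the original block.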
The opportunity detector noticed two patterns the visible HTML expresses but the structured data does not. The breadcrumb nav element is identifiable both by its aria-label="breadcrumb" and by the slash-separated link structure inside it; the audit recommends a BreadcrumbList schema with the three items it parsed out. The H2 followed by H3+paragraph pattern in the FAQ section matches the FAQ template; the audit recommends a FAQPage schema with the two questions it extracted. Both are easy rich-result wins because the content is already there.
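The two recommended schemas can be generated directly from the content the detector parsed out. The sketch below builds both as JSON-LD. The helper names are hypothetical, the https://example.com origin is assumed from the quick-wins output, and the answer texts are left truncated exactly as they appear in the example page (real markup would carry the full answers).

```python
import json

SITE = "https://example.com"  # assumed origin, as in the quick-wins output


def breadcrumb_list(items):
    """Build BreadcrumbList JSON-LD from (name, url) pairs in trail order.

    A url of None (the current page, which has no link) omits "item".
    """
    elements = []
    for position, (name, url) in enumerate(items, start=1):
        element = {"@type": "ListItem", "position": position, "name": name}
        if url is not None:
            element["item"] = url
        elements.append(element)
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": elements,
    }


def faq_page(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


breadcrumbs = breadcrumb_list([
    ("Home", SITE + "/"),
    ("Blog", SITE + "/blog"),
    ("Why we rebuilt", None),
])
faq = faq_page([
    ("Why did you rebuild instead of refactoring?",
     "The legacy code path had three branches that all needed..."),
    ("How long did the rebuild take?",
     "Eleven weeks from kickoff to full rollout..."),
])
print(json.dumps(breadcrumbs, indent=2))
print(json.dumps(faq, indent=2))
```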
The quick-wins panel sorts the recommendations by effort. Property additions to the existing Article schema go first because they are the highest-leverage: adding an image URL is a two-line edit and clears the only missing-required finding. dateModified and publisher follow; both are small edits, though they need information from outside the page (a real modification timestamp, the publisher's name and logo). The new schemas (BreadcrumbList, FAQPage) come last; they require generating new JSON-LD, but the source content is already on the page and stays unchanged.
Schema audit concepts you should know
Validation versus completeness versus opportunity detection. A validator confirms the schema is well-formed. A completeness check measures how many recommended properties are populated. Opportunity detection looks at the page contents and suggests schema types that should exist. The three layers answer different questions: "is the schema valid?" / "is it as good as it could be?" / "are there schemas missing?" — and a serious audit needs all three.
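The difference between "valid" and "complete" is easy to see in code. The sketch below scores completeness as the share of recommended properties that are populated. The property list here is a hypothetical one chosen for illustration; the tool's actual list and weighting are not documented on this page, so the result will not match the 45% shown in the walkthrough.

```python
# Hypothetical recommended-property list for Article (an assumption,
# not the tool's real list).
RECOMMENDED_ARTICLE = (
    "headline", "image", "datePublished", "dateModified",
    "author", "publisher", "mainEntityOfPage",
)


def completeness(schema, recommended=RECOMMENDED_ARTICLE):
    """Return (percentage, missing) for one schema object."""
    empty_values = (None, "", [], {})
    present = [p for p in recommended if schema.get(p) not in empty_values]
    missing = [p for p in recommended if p not in present]
    return round(100 * len(present) / len(recommended)), missing


# The Article schema from the walkthrough: well-formed, but sparse.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Why we rebuilt our checkout flow",
    "datePublished": "2026-03-14",
    "author": {"@type": "Person", "name": "Sam Patel"},
}

score, missing = completeness(article)
print(score, missing)
```

A schema like this one passes a syntax check without complaint, which is why the completeness layer is reported separately.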
Rich result eligibility versus AI search readiness. Rich result eligibility is a binary: does the schema satisfy Google's documented requirements for a given rich result? AI search readiness is a score: how clearly does the schema communicate the entity, its relationships, and its freshness to AI engines like ChatGPT Search, Perplexity, and Google's AI Overviews? A schema can be rich-result eligible but AI-thin (an Article with author and image but no sameAs on the author, no datePublished/dateModified pair, no mainEntityOfPage). The audit reports both.
The four output dimensions. Per-schema validation results (errors, warnings, passed checks). Per-schema completeness percentage. Per-page missing schema opportunities. Cross-cutting quick wins, sorted by effort. Reading them in that order — validate first, complete next, expand third, prioritize fourth — produces a roadmap that matches the underlying difficulty of the work.
How missing-opportunity inference works. The audit pattern-matches HTML structures against templates that imply specific schema types. A nav element with breadcrumb-shaped links suggests BreadcrumbList. An H2 "FAQ" or "Frequently asked questions" followed by H3+paragraph pairs suggests FAQPage. A page with an itemtype-rich product description suggests Product. The detection is conservative — it only fires on patterns that are unambiguous — to avoid noisy false-positive recommendations.
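As an illustration of the FAQ pattern described above, the following sketch looks for an H2 whose text mentions FAQ, then collects each H3 that is followed by a paragraph. It is not the tool's detector. The class name, the regular expression, and the two-pair minimum used to keep the check conservative are all assumptions.

```python
import re
from html.parser import HTMLParser

FAQ_HEADING = re.compile(r"\b(faqs?|frequently asked questions)\b", re.I)


class FaqDetector(HTMLParser):
    """Collects (question, answer) pairs under an FAQ-style H2."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None       # h2/h3/p element whose text is being collected
        self._text = []
        self._in_faq = False   # True once an FAQ-style H2 has been seen
        self._question = None  # pending H3 text awaiting its answer

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3", "p"):
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._text).split())
        self._tag = None
        if tag == "h2":
            # Any new H2 starts a new section; only FAQ ones are scanned.
            self._in_faq = bool(FAQ_HEADING.search(text))
            self._question = None
        elif not self._in_faq:
            return
        elif tag == "h3":
            self._question = text
        elif tag == "p" and self._question:
            self.pairs.append((self._question, text))
            self._question = None


def suggest_faq_page(html, minimum_pairs=2):
    """Return the detected pairs, or [] if the pattern is too weak."""
    detector = FaqDetector()
    detector.feed(html)
    detector.close()
    return detector.pairs if len(detector.pairs) >= minimum_pairs else []


page = """<article>
<h1>Why we rebuilt our checkout flow</h1>
<p>Posted March 14, 2026 by Sam Patel</p>
<h2>Frequently asked questions</h2>
<h3>Why did you rebuild instead of refactoring?</h3>
<p>The legacy code path had three branches that all needed...</p>
<h3>How long did the rebuild take?</h3>
<p>Eleven weeks from kickoff to full rollout...</p>
</article>"""

pairs = suggest_faq_page(page)
print(pairs)
```

The byline paragraph is ignored because it appears before the FAQ heading, which is the kind of conservatism the paragraph above describes.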
Freshness signals. AI search engines weight freshness heavily. A schema with datePublished alone signals the original publication. A schema with both datePublished and dateModified, where dateModified is recent, signals active maintenance. A schema with neither carries no freshness signal at all, which reads as an unmaintained page. The completeness score weights freshness fields explicitly because they have an outsized effect on AI ranking.
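The three cases above map onto a small classifier. This is a sketch only: the function name, the labels, and the 180-day threshold for "recent" are assumptions, not values the tool publishes.

```python
from datetime import date, datetime


def freshness(schema, today, recent_days=180):
    """Classify a schema's freshness signal from its two date fields."""
    published = schema.get("datePublished")
    modified = schema.get("dateModified")
    if not published and not modified:
        return "no freshness signal"
    if not modified:
        return "publication date only"
    # Accepts ISO 8601 dates with or without a time and UTC offset.
    age_days = (today - datetime.fromisoformat(modified).date()).days
    if age_days <= recent_days:
        return "actively maintained"
    return "stale dateModified"


today = date(2026, 4, 1)  # fixed date so the example is reproducible
print(freshness({}, today))
print(freshness({"datePublished": "2026-03-14"}, today))
print(freshness({"datePublished": "2026-03-14",
                 "dateModified": "2026-03-14T09:00:00-04:00"}, today))
```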
Cross-format coexistence. Pages can host JSON-LD, Microdata, and RDFa simultaneously — and many do, because plugins inject one format while the theme injects another. The audit surfaces all three. When the same entity is described in multiple formats with different data, that is a real audit finding (which one does Google trust?), and the audit flags it.
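A conflict check of this kind can be sketched as a comparison of same-typed entities across formats. The example assumes each detected entity has already been normalized into a plain dictionary with an "@type" key (for Microdata and RDFa that normalization step is not shown), and it groups by type only, which is a simplification. The function and variable names are hypothetical.

```python
from collections import defaultdict


def find_conflicts(entities):
    """entities: list of (format_name, properties) pairs.

    Returns (type, property, [(format_name, value), ...]) for every
    property that same-typed entities disagree on.
    """
    by_type = defaultdict(list)
    for format_name, props in entities:
        by_type[props.get("@type")].append((format_name, props))

    conflicts = []
    for schema_type, group in by_type.items():
        if len(group) < 2:
            continue
        keys = set()
        for _, props in group:
            keys.update(props)
        for key in sorted(keys - {"@type", "@context"}):
            values = [(fmt, props[key]) for fmt, props in group if key in props]
            if len({str(value) for _, value in values}) > 1:
                conflicts.append((schema_type, key, values))
    return conflicts


detected = [
    ("JSON-LD", {"@type": "Organization", "name": "Example Co",
                 "url": "https://example.com"}),
    ("Microdata", {"@type": "Organization", "name": "Example Company",
                   "url": "https://example.com"}),
    ("JSON-LD", {"@type": "Article", "headline": "Why we rebuilt our checkout flow"}),
]

conflicts = find_conflicts(detected)
print(conflicts)
```

Here the two Organization descriptions agree on url but disagree on name, so only the name is reported.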
Common mistakes
Auditing one page and assuming the rest of the site is fine. Schemas are template-level. An audit on the homepage tells you nothing about the product template. Run the audit on every distinct template, not on a single sample page.
Validating without measuring completeness. A schema can pass validation while populating only the bare minimum of fields. When the properties a rich result depends on are among the ones left out, the rich result may not appear, even though the schema is "valid." Check the completeness score, not just the validation verdict.
Ignoring AI search signals. Optimization for traditional rich results and optimization for AI engines overlap but are not identical. A schema that is rich-result-eligible but AI-thin will hold its SERP rich result and lose to better-marked-up competitors when users move to ChatGPT Search or Perplexity. The AI score is not optional anymore.
Not refreshing dateModified. A page that was edited yesterday but still shows last year's dateModified looks unmaintained to both Google and AI engines. Bind dateModified to the underlying content's actual modification time, not to the page's deployment time.
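In template code, the binding described above amounts to reading both dates from the content record and never from the build. A minimal sketch, assuming a hypothetical CMS that exposes created and updated timestamps as timezone-aware datetimes:

```python
from datetime import datetime, timezone


def schema_dates(content_created, content_updated):
    """Map content timestamps (not deploy time) to schema.org date fields."""
    return {
        "datePublished": content_created.isoformat(),
        "dateModified": content_updated.isoformat(),
    }


# Hypothetical timestamps for one post.
created = datetime(2026, 3, 14, 13, 0, tzinfo=timezone.utc)
updated = datetime(2026, 5, 2, 8, 30, tzinfo=timezone.utc)
deployed = datetime(2026, 6, 1, 0, 0, tzinfo=timezone.utc)  # deliberately unused

dates = schema_dates(created, updated)
print(dates)
```

The deploy timestamp is shown only to make the point that it plays no part in the output.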
Letting plugin-generated schemas accumulate alongside hand-coded ones. A CMS plugin injects an Organization schema, the developer adds another, and now there are two competing entity descriptions. Audit for duplicates, decide which is canonical, and remove the other.
FAQ
Does this audit my whole site, or only the page I paste in?
One page per audit. The tool runs entirely in the browser and has no crawler component, so site-wide audits mean running multiple pastes — typically one for each template (homepage, blog post, product, category, search results). Most site-wide schema problems are template-level, so a five-template audit covers what a thousand-page crawl would surface.
Can I audit pages behind authentication?
Yes. Sign in to the site yourself, open the page, view the page source (right-click → View Page Source, or Ctrl+U on Windows and Linux; on macOS, Cmd+Option+U in Chrome or Cmd+U in Firefox), select all, copy, and paste into the input. Since nothing is sent to a server, the page contents stay on your machine even though the page is behind auth.
Does it handle JSON-LD that is injected by JavaScript after the page loads?
If you paste the static page source, only schemas present at server-render time are visible. To audit hydration-time schemas, open DevTools, copy the rendered HTML from the Elements panel after the page settles, and paste that. The audit then sees the post-hydration DOM the way Google's rendered HTML pipeline does.
How is this different from running a full SEO crawler?
An SEO crawler walks every URL on a site, captures titles and meta tags, finds broken links, and produces a sitemap-level view. The schema audit goes deep on a single page's structured data: completeness scoring, missing-type inference from HTML patterns, AI readiness, and quick-win prioritization. Use both — a crawler for site coverage, this audit for per-template depth.
How often should the audit be rerun?
After every template change, after every Google rich-result feature launch, and on a quarterly cadence as a maintenance pass. Schemas drift when content evolves — a template that produces clean Article markup today can drift into producing under-populated markup six months later as the surrounding content changes shape. The quarterly run catches drift before it costs visibility.
Related tools and guides
- Schema Markup Validator — when the audit reports a validation error on a specific schema, lift the schema out and validate it in isolation to iterate on the fix.
- AI Search Optimizer — the audit gives you the AI readiness score; the optimizer takes a single schema and walks you through the changes that raise it.
- JSON-LD Generator — the cleanest way to add the schemas the audit's opportunity detector flagged is to generate them from a form rather than hand-author the JSON.
- Schema Markup Beginner's Guide — covers the foundational concepts the audit assumes: types, properties, JSON-LD structure, and how Google uses them.
- Structured Data for AI Search — the optimization layer the audit's AI readiness score is built around: which signals AI engines actually weight and how to populate them.