SCHEMA.BIZ


JSON to Schema Generator

Paste any JSON and get a complete JSON Schema definition automatically. Detects types, string formats, nested objects, and arrays. Then refine with constraints, descriptions, and enum values.


What this tool does

Writing a JSON Schema from a blank editor is tedious; deriving one from a real payload you already have in front of you is fast and produces fewer transcription bugs. This generator inverts the usual workflow. Paste a JSON document — an API response, an event from your analytics pipeline, a sample config file, anything that round-trips through JSON.parse — and the tool walks the value tree and emits a JSON Schema that describes the shape it observed, complete with type narrowing, format detection, and proper nesting.

The generated schema is a starting point, not a finished artifact. Schema inference can only see what is in the sample: types come from the values present, required-ness comes from presence, and constraints come from format heuristics like "this string looks like an email address." The refinement panel below the output is where the schema becomes production-ready — toggling required versus optional, adding enums, tightening number ranges, and writing descriptions. The whole pipeline is browser-local; no payload, however sensitive, is transmitted anywhere.

When to use the JSON to Schema Generator

Drafting validation for an API endpoint you own. You have a handler that already builds the response object. Capture one real response, paste it in, refine the optional fields, and you have the schema to drop into your test suite or your gateway's request validation.

Reverse-engineering a vendor's response shape. The vendor's documentation is sparse, the OpenAPI spec is missing fields you can see in the actual responses, and you need a schema for your own SDK. Paste a few sample responses, merge them, and the generator infers a schema that matches reality rather than the marketing docs.

Building a contract from production log samples. Take a representative slice of recent payloads from your event pipeline, paste them all in, and let the multi-sample merge figure out which fields are universal and which are conditional. This is faster than reading the producer's source code and more accurate than reading the docs.

Bootstrapping a config-file schema. Many tools accept a config file with a sprawling shape and no schema. Paste a working config, refine the optional fields, and you have a schema to attach to "$schema" in the file so editors like VS Code can offer autocomplete and red-squiggle validation to the people writing it.

Walkthrough with a real example

Paste this analytics event payload into the generator:

{
  "event_id": "evt_01HQ8X9YZ4M2NK7PQRT5V3W6BJ",
  "event_type": "page_view",
  "occurred_at": "2026-04-29T18:42:11.000Z",
  "user": {
    "id": "u_938741",
    "email": "casey@example.com",
    "is_signed_in": true
  },
  "page": {
    "url": "https://schema.biz/database/designer/",
    "referrer": "https://www.google.com/",
    "title": "Free Visual Database Schema Designer"
  },
  "device": {
    "platform": "macos",
    "viewport": {
      "width": 1440,
      "height": 900
    }
  },
  "session_seconds": 287
}

The output schema has every nested object turned into its own object schema, formats detected for the obvious string types, and integer narrowing applied to fields whose sample values are whole numbers:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": [
    "event_id", "event_type", "occurred_at",
    "user", "page", "device", "session_seconds"
  ],
  "properties": {
    "event_id":     { "type": "string" },
    "event_type":   { "type": "string" },
    "occurred_at":  { "type": "string", "format": "date-time" },
    "user": {
      "type": "object",
      "required": ["id", "email", "is_signed_in"],
      "properties": {
        "id":           { "type": "string" },
        "email":        { "type": "string", "format": "email" },
        "is_signed_in": { "type": "boolean" }
      }
    },
    "page": {
      "type": "object",
      "required": ["url", "referrer", "title"],
      "properties": {
        "url":      { "type": "string", "format": "uri" },
        "referrer": { "type": "string", "format": "uri" },
        "title":    { "type": "string" }
      }
    },
    "device": {
      "type": "object",
      "required": ["platform", "viewport"],
      "properties": {
        "platform": { "type": "string" },
        "viewport": {
          "type": "object",
          "required": ["width", "height"],
          "properties": {
            "width":  { "type": "integer" },
            "height": { "type": "integer" }
          }
        }
      }
    },
    "session_seconds": { "type": "integer" }
  }
}

Walk through what the generator inferred. occurred_at matches the RFC 3339 date-time pattern (the ISO 8601 profile that JSON Schema's date-time format is defined against), so it gets format: date-time. email matches the email pattern, and both URL fields get format: uri. session_seconds is a whole number in the sample, so it is narrowed from number to integer. The viewport object becomes a fully typed nested schema with its own required array. Every property in the sample is marked required by default. That is a strict starting point, and you almost always want to relax it in the refinement panel.
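The walk can be made concrete with a minimal TypeScript sketch of single-sample inference. It assumes the input has already been through JSON.parse. The `inferSchema` function and `Schema` type are hypothetical names for illustration, not the generator's source. Format detection is left out, and arrays are described by their first element only.

```typescript
// Any value JSON.parse can produce.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical subset of JSON Schema used by this sketch.
interface Schema {
  type?: string;
  required?: string[];
  properties?: Record<string, Schema>;
  items?: Schema;
}

function inferSchema(value: Json): Schema {
  if (value === null) return { type: "null" };
  if (Array.isArray(value)) {
    // Simplification: only the first element describes the items.
    return value.length > 0 ? { type: "array", items: inferSchema(value[0]) } : { type: "array" };
  }
  switch (typeof value) {
    case "boolean":
      return { type: "boolean" };
    case "number":
      // Integer narrowing: a whole number in the sample becomes "integer".
      return { type: Number.isInteger(value) ? "integer" : "number" };
    case "string":
      return { type: "string" };
    default: {
      // Object: recurse into each property; presence in the sample means required.
      const properties: Record<string, Schema> = {};
      for (const [key, child] of Object.entries(value)) {
        properties[key] = inferSchema(child);
      }
      return { type: "object", required: Object.keys(value), properties };
    }
  }
}

const viewport = inferSchema(JSON.parse('{"width": 1440, "height": 900}'));
console.log(JSON.stringify(viewport));
// {"type":"object","required":["width","height"],"properties":{"width":{"type":"integer"},"height":{"type":"integer"}}}
```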

The refinement step is where the schema earns its keep. The generator cannot tell that event_type is one of a known set of values, but you can — open the field in the refinement panel and add an enum. Same with event_id: the format is a ULID, which the generator does not recognize as a built-in format, but you can add a regex pattern. And session_seconds deserves a sane upper bound so a buggy producer that emits a million-second session gets caught at the gateway:

// After refinement: enum on event_type, pattern on event_id, bounds on numbers
{
  "event_id": { "type": "string", "pattern": "^evt_[0-9A-Z]{26}$" },
  "event_type": {
    "type": "string",
    "enum": ["page_view", "click", "form_submit", "purchase"]
  },
  "session_seconds": { "type": "integer", "minimum": 0, "maximum": 86400 }
}
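The event_id pattern deserves a second look. The character class [0-9A-Z] also accepts I, L, O and U, but the ULID specification's Crockford base32 alphabet excludes those letters. The TypeScript sketch below compares the loose pattern with a tighter one. It assumes event IDs are always evt_ followed by a canonical uppercase ULID. That is an assumption about this producer, not something one sample proves.

```typescript
// The loose pattern from the refinement example.
const loose = /^evt_[0-9A-Z]{26}$/;
// Tighter sketch: Crockford base32 (no I, L, O, U). The 48-bit timestamp
// means the first ULID character can only be 0-7.
const strict = /^evt_[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

const sample = "evt_01HQ8X9YZ4M2NK7PQRT5V3W6BJ";
const typo = "evt_01HQ8X9YZ4M2NK7PQRT5V3W6BI"; // ends in I, which ULIDs never use

console.log(loose.test(sample), strict.test(sample)); // true true
console.log(loose.test(typo), strict.test(typo)); // true false
```

In the schema itself, the tighter version would be written as "pattern": "^evt_[0-7][0-9A-HJKMNP-TV-Z]{25}$".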

Run the refined schema against five more sample events using the JSON Schema Validator. Any property that is sometimes missing in real traffic shows up as a validation failure on the "is required" rule. Walk those failures back into the schema by marking the properties optional. After two or three rounds you have a schema that matches the producer's real behavior rather than the behavior of a single sample.
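The validate-and-relax loop can be sketched in TypeScript. This is an illustrative stand-in, not the JSON Schema Validator tool. `missingRequired` and `relax` are hypothetical helpers that only understand type, required and properties, and they report missing properties as slash-separated paths.

```typescript
// Hypothetical subset of JSON Schema: only what this sketch inspects.
interface Schema {
  type?: string;
  required?: string[];
  properties?: Record<string, Schema>;
}

// Collect "/path/to/property" for every required property a sample lacks.
function missingRequired(schema: Schema, value: unknown, path = ""): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return [];
  const obj = value as Record<string, unknown>;
  const misses: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in obj)) misses.push(`${path}/${key}`);
  }
  // Recurse into nested object schemas for properties that are present.
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    if (key in obj) misses.push(...missingRequired(child, obj[key], `${path}/${key}`));
  }
  return misses;
}

// Relax: remove each reported property from its parent's required array.
function relax(schema: Schema, paths: string[]): void {
  for (const p of paths) {
    const parts = p.split("/").slice(1); // "/user/email" -> ["user", "email"]
    const leaf = parts.pop()!;
    let node: Schema | undefined = schema;
    for (const part of parts) node = node?.properties?.[part];
    if (node?.required) node.required = node.required.filter((k) => k !== leaf);
  }
}

const schema: Schema = {
  type: "object",
  required: ["event_id", "user"],
  properties: { user: { type: "object", required: ["id", "email"] } },
};
const samples = [
  { event_id: "a", user: { id: "u1", email: "casey@example.com" } },
  { event_id: "b", user: { id: "u2" } }, // this event has no email
];
const failures = new Set<string>();
for (const s of samples) for (const f of missingRequired(schema, s)) failures.add(f);
relax(schema, Array.from(failures));
console.log(schema.properties!.user.required); // [ 'id' ]
```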

Schema inference concepts you should know

Type narrowing from sample values. JSON has only one numeric type, but JSON Schema distinguishes integer from number. The inference rule is straightforward: if every observed value for a numeric field is a whole number with no fractional part, the field is narrowed to integer. The risk is that a field whose values happen to be whole in your sample will be typed too strictly — a price field always at $1, $2, $3 in dev gets the integer treatment until a $1.99 row breaks it.
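Here is a short TypeScript sketch of the narrowing rule. It assumes all observed values for one field are collected before deciding, and `narrowNumeric` is a hypothetical name. It also shows a parsing detail that makes the rule stricter than it looks.

```typescript
// Sample-driven numeric narrowing across every observed value of one field.
function narrowNumeric(values: number[]): "integer" | "number" {
  return values.every((v) => Number.isInteger(v)) ? "integer" : "number";
}

const devPrices = [1, 2, 3];
const prodPrices = [1, 2, 3, 1.99];

console.log(narrowNumeric(devPrices)); // integer
console.log(narrowNumeric(prodPrices)); // number

// JSON.parse produces the same number for 2.0 and 2, so a price written
// as 2.0 in the payload still looks like a whole number.
console.log(narrowNumeric([JSON.parse("2.0")])); // integer
```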

Format detection. The generator runs detectors for the standard JSON Schema formats (email, uri, date, date-time, time, ipv4, ipv6, uuid, hostname) on every string value. A match adds the format annotation. The detection is conservative: a string that almost-but-not-quite matches a format is left as a plain string rather than guessed at. In the refinement panel you can add a format annotation the detector skipped. For domain-specific shapes that no standard format covers, such as a prefixed ULID, add a regex pattern instead.
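To show what "conservative" means in practice, here is a TypeScript sketch covering six of the formats. The regular expressions are assumptions written for this illustration, not the generator's actual detectors. They check shape only, so a month of 13 would still pass the date check.

```typescript
// Illustrative detectors, tried in order; the first match wins.
const DETECTORS: Array<[string, RegExp]> = [
  ["date-time", /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/],
  ["date", /^\d{4}-\d{2}-\d{2}$/],
  ["email", /^[^\s@]+@[^\s@]+\.[^\s@]+$/],
  ["uri", /^[a-z][a-z0-9+.-]*:\/\/\S+$/i],
  ["ipv4", /^(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}$/],
  ["uuid", /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i],
];

function detectFormat(s: string): string | undefined {
  for (const [format, re] of DETECTORS) {
    if (re.test(s)) return format;
  }
  return undefined; // near-misses stay plain strings
}

console.log(detectFormat("2026-04-29T18:42:11.000Z")); // date-time
console.log(detectFormat("casey@example.com")); // email
console.log(detectFormat("https://www.google.com/")); // uri
console.log(detectFormat("2026-04-29 18:42")); // undefined (space instead of T, no seconds)
```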

Required by presence. Every property in a sample is marked required, on the principle that the safest default is the strictest one. This default is often wrong, because a single payload frequently happens to include properties that are really optional. For that reason the refinement panel includes a per-field optional toggle. Multi-sample inference does this automatically: a property that is missing from any sample is downgraded to optional in the merged schema.
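The presence rule used by multi-sample inference reduces to counting. Below is a TypeScript sketch for top-level keys only, with `mergeRequired` as a hypothetical helper. A full merge would apply the same rule recursively to nested objects.

```typescript
// A key is required only if every sample contains it; otherwise it is optional.
function mergeRequired(samples: Array<Record<string, unknown>>): { required: string[]; optional: string[] } {
  const seen = new Map<string, number>(); // key -> number of samples containing it
  for (const sample of samples) {
    for (const key of Object.keys(sample)) seen.set(key, (seen.get(key) ?? 0) + 1);
  }
  const required: string[] = [];
  const optional: string[] = [];
  seen.forEach((count, key) => {
    (count === samples.length ? required : optional).push(key);
  });
  return { required, optional };
}

const events = [
  { event_id: "a", user: {}, referrer: "https://www.google.com/" },
  { event_id: "b", user: {} }, // direct visit: no referrer
];
console.log(mergeRequired(events)); // required: event_id, user; optional: referrer
```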

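Array inference and union types. If every element of an array has the same type, the array's items schema is that single type. If elements vary, the inference falls back to oneOf covering each observed shape. Keep in mind that oneOf passes only when exactly one branch matches, so overlapping branches are a trap. With one integer branch and one number branch, a value like 3 matches both and is rejected. For arrays of objects, the per-element schemas are merged the same way multi-sample object inference works: properties that appear in every element are required, and the rest are optional.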
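Here is a TypeScript sketch of item inference for a single array. It makes simplifying assumptions: nested values are typed only one level deep, and a merged object property takes its type from the last element that has it. `inferItems` and `kindOf` are hypothetical names. When both integer and number appear, the sketch widens integer to number, which keeps the oneOf branches disjoint.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Schema {
  type?: string;
  required?: string[];
  properties?: Record<string, Schema>;
  oneOf?: Schema[];
}

function kindOf(v: Json): string {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  if (typeof v === "number") return Number.isInteger(v) ? "integer" : "number";
  return typeof v; // "boolean" | "string" | "object"
}

function inferItems(arr: Json[]): Schema | undefined {
  const kinds = new Set(arr.map(kindOf));
  // integer is a subset of number; keeping both would make oneOf branches overlap.
  if (kinds.has("number")) kinds.delete("integer");
  const branches: Schema[] = [];
  kinds.forEach((kind) => {
    if (kind !== "object") {
      branches.push({ type: kind });
      return;
    }
    // Merge every object element: keys present in all of them are required.
    const objs = arr.filter((v): v is { [key: string]: Json } => kindOf(v) === "object");
    const properties: Record<string, Schema> = {};
    for (const o of objs) for (const k of Object.keys(o)) properties[k] = { type: kindOf(o[k]) };
    const required = Object.keys(properties).filter((k) => objs.every((o) => k in o));
    branches.push({ type: "object", required, properties });
  });
  if (branches.length === 0) return undefined; // empty array: nothing observed
  return branches.length === 1 ? branches[0] : { oneOf: branches };
}
```

For example, [1, 2, 3] yields {"type": "integer"}, [1, 2.5] yields {"type": "number"}, and ["a", null] yields a oneOf of string and null. For primitive unions like the last one, a type array such as ["string", "null"] expresses the same thing more compactly.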

The reliability ceiling of one-sample inference. A single sample can tell you the shape of the value it actually contains, but cannot tell you which fields are conditional, what values an enum field accepts, what bounds are reasonable, or which formats apply to ambiguous strings. Treat one-sample output as the rough draft and use multi-sample input plus refinement to converge on a schema that matches the real range of the data.

The relationship between schema and data evolution. A schema generated today describes the data as it exists today. When the producer adds a field, your generated schema does not describe it, and unless the schema sets additionalProperties to false, the new field passes validation silently. When the producer renames a field, your schema rejects the new payload because the old, required name is missing. Treat the generator as a starting point for a schema you maintain in version control, not as a runtime inference layer that auto-adapts to producer changes.
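New producer fields pass silently when additionalProperties is not false, so a drift report can help you spot them early. Below is a TypeScript sketch that assumes the schema describes nesting only through properties. `undeclaredKeys` is a hypothetical helper, not a feature of the generator.

```typescript
interface Schema {
  properties?: Record<string, Schema>;
}

// Report every payload key (as a /path) that the schema does not declare.
function undeclaredKeys(schema: Schema, value: unknown, path = ""): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return [];
  const declared: Record<string, Schema> = schema.properties ?? {};
  const found: string[] = [];
  for (const [key, child] of Object.entries(value)) {
    // hasOwnProperty avoids treating inherited names like "constructor" as declared.
    if (!Object.prototype.hasOwnProperty.call(declared, key)) {
      found.push(`${path}/${key}`);
    } else {
      found.push(...undeclaredKeys(declared[key], child, `${path}/${key}`));
    }
  }
  return found;
}

const eventSchema: Schema = { properties: { user: { properties: { id: {} } } } };
console.log(undeclaredKeys(eventSchema, { user: { id: "u1", locale: "en" }, ab_bucket: 3 }));
// [ '/user/locale', '/ab_bucket' ]
```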

Common mistakes

Treating one sample as the complete picture. A single event from the analytics pipeline cannot tell you which optional fields exist or what the full enum membership is. Sample at least a handful of payloads and use the multi-sample merge.

Shipping the schema with everything still marked required. The default-required behavior is correct as a starting point and dangerous as a final state. The first payload that omits a field marked required but actually optional will fail validation noisily. Walk the refinement panel and downgrade the truly optional fields before publishing the schema.

Trusting format detection blindly. The generator labels a string with format: email if it matches the email regex. If your application's idea of an "email" is actually a username that occasionally contains an @ for legacy reasons, the format annotation is wrong. Audit each format the generator added before accepting it.

Skipping the constraint pass on numbers and strings. The inferred schema gives you types and required-ness but no minimum, maximum, minLength, maxLength, or pattern. A schema without bounds will accept negative ages, billion-character display names, and whitespace-only emails. Add the bounds the domain actually requires.
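The following TypeScript sketch shows how minimum, maximum, minLength, maxLength and pattern judge a value. `boundErrors` is a hypothetical checker, not a full validator. The age and display-name limits are example values, not recommendations.

```typescript
interface Bounds {
  minimum?: number;
  maximum?: number;
  minLength?: number;
  maxLength?: number;
  pattern?: string;
}

function boundErrors(value: number | string, b: Bounds): string[] {
  const errors: string[] = [];
  if (typeof value === "number") {
    if (b.minimum !== undefined && value < b.minimum) errors.push(`below minimum ${b.minimum}`);
    if (b.maximum !== undefined && value > b.maximum) errors.push(`above maximum ${b.maximum}`);
  } else {
    // JSON Schema counts string length in Unicode code points, not UTF-16 units.
    const len = Array.from(value).length;
    if (b.minLength !== undefined && len < b.minLength) errors.push(`shorter than ${b.minLength}`);
    if (b.maxLength !== undefined && len > b.maxLength) errors.push(`longer than ${b.maxLength}`);
    // pattern is unanchored: "\\S" just requires one non-whitespace character somewhere.
    if (b.pattern !== undefined && !new RegExp(b.pattern, "u").test(value)) {
      errors.push(`does not match ${b.pattern}`);
    }
  }
  return errors;
}

const age: Bounds = { minimum: 0, maximum: 150 };
const displayName: Bounds = { minLength: 1, maxLength: 80, pattern: "\\S" };

console.log(boundErrors(-3, age)); // one error: below minimum 0
console.log(boundErrors("   ", displayName)); // one error: whitespace-only fails the pattern
```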

Not validating the generated schema against more samples. The fastest way to find inference mistakes is to run the schema against payloads it was not generated from. Drop the schema and ten more samples into the JSON Schema Validator; every failure is either a producer bug or a place the schema needs to be relaxed.

FAQ

How does the generator decide between 'number' and 'integer'?

If every numeric value in the sample is a whole number, the field is typed as integer; if any value has a fractional component, it is typed as number. This is a sample-driven decision, so a price field whose sample values all happen to be whole dollars will be typed as integer until a $4.99 row is included. In JavaScript, JSON.parse produces the same value for 2.0 and 2, so a price written as 2.0 is likely to count as a whole number too.

What happens with null values in the input?

A property whose only observed value is null is typed as null exclusively, which is rarely what you want. The generator surfaces this as a warning so you can either remove the property from the sample or refine the type to a union like ["string", "null"]. Multi-sample inference handles this better — a property that is sometimes a string and sometimes null is correctly typed as a nullable string.
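The multi-sample union can be sketched in TypeScript: collect the JSON type of every observed value for one property, then merge the types. `mergeTypes` and its warning text are hypothetical. Missing properties are assumed to be filtered out before this step, because absence is handled by the required rule rather than by null.

```typescript
function jsonType(v: unknown): string {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  if (typeof v === "number") return Number.isInteger(v) ? "integer" : "number";
  return typeof v;
}

function mergeTypes(observed: unknown[]): { type: string | string[]; warning?: string } {
  const types = Array.from(new Set(observed.map(jsonType)));
  // integer is a subset of number, so keep only the wider type.
  if (types.includes("number")) {
    const i = types.indexOf("integer");
    if (i >= 0) types.splice(i, 1);
  }
  if (types.length === 1 && types[0] === "null") {
    return { type: "null", warning: 'only null observed; refine to a union such as ["string", "null"]' };
  }
  return { type: types.length === 1 ? types[0] : types };
}

console.log(mergeTypes([null])); // type "null" plus a warning
console.log(mergeTypes(["2026-04-29", null])); // type ["string", "null"]
```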

Can the generator handle JSON with comments (JSONC) or trailing commas?

The parser uses a forgiving mode that strips line and block comments and tolerates trailing commas in arrays and objects. The generated schema is still strict JSON Schema with no JSONC artifacts, so you can paste configuration files in their authored form and get a valid schema out the other side.
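Here is one way such a forgiving pre-pass could work, sketched in TypeScript. It strips comments and trailing commas while copying string literals untouched, then hands the result to the standard JSON.parse. `stripJsonc` is a hypothetical name, and this is not the generator's parser. It assumes the input is well-formed apart from the JSONC extensions.

```typescript
// Index just past the closing quote of the string literal that opens at `start`.
function endOfString(s: string, start: number): number {
  let j = start + 1;
  while (j < s.length && s[j] !== '"') j += s[j] === "\\" ? 2 : 1; // skip escapes
  return Math.min(j + 1, s.length);
}

function stripJsonc(src: string): string {
  // Pass 1: drop comments, copying strings verbatim so "//" inside a URL survives.
  let noComments = "";
  let i = 0;
  while (i < src.length) {
    if (src[i] === '"') {
      const j = endOfString(src, i);
      noComments += src.slice(i, j);
      i = j;
    } else if (src.startsWith("//", i)) {
      while (i < src.length && src[i] !== "\n") i++; // the newline itself is kept
    } else if (src.startsWith("/*", i)) {
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
      noComments += " "; // keep tokens on either side of the comment apart
    } else {
      noComments += src[i];
      i++;
    }
  }
  // Pass 2: drop any comma whose next non-whitespace character is } or ].
  let out = "";
  i = 0;
  while (i < noComments.length) {
    const ch = noComments[i];
    if (ch === '"') {
      const j = endOfString(noComments, i);
      out += noComments.slice(i, j);
      i = j;
      continue;
    }
    if (ch === ",") {
      let k = i + 1;
      while (k < noComments.length && /\s/.test(noComments[k])) k++;
      if (noComments[k] === "}" || noComments[k] === "]") {
        i++; // trailing comma: skip it
        continue;
      }
    }
    out += ch;
    i++;
  }
  return out;
}
```

A usage example is `JSON.parse(stripJsonc(configText))`. Copying strings verbatim in both passes is the important design choice. Without it, a URL's "//" would be read as a comment, and a string ending in ",}" would lose its comma.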

Can I generate one schema from multiple JSON samples?

Yes — paste multiple top-level JSON values, and the generator merges them into a single schema. Properties that appear in every sample are required; properties that appear in some are optional. Type unions form when the same property has different types across samples. This is the recommended workflow when reverse-engineering a real-world API.

Which JSON Schema draft does it emit by default, and can I change it?

Draft 2020-12 by default, the latest stable draft. The dropdown lets you target Draft 4, 6, 7, or 2019-09 instead. The output adjusts the $schema URI and falls back to syntax compatible with the chosen draft — for example, it uses definitions instead of $defs on Draft 7 and earlier.
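The per-draft differences described above are small enough to capture in a lookup table. Below is a TypeScript sketch. The $schema URIs are the ones the drafts publish, and `emit` is a hypothetical helper. References must follow the same keyword (#/$defs/... versus #/definitions/...), and Draft 4 also spells $id as id.

```typescript
type Draft = "4" | "6" | "7" | "2019-09" | "2020-12";

// Published $schema URIs for each draft.
const SCHEMA_URI: Record<Draft, string> = {
  "4": "http://json-schema.org/draft-04/schema#",
  "6": "http://json-schema.org/draft-06/schema#",
  "7": "http://json-schema.org/draft-07/schema#",
  "2019-09": "https://json-schema.org/draft/2019-09/schema",
  "2020-12": "https://json-schema.org/draft/2020-12/schema",
};

// $defs arrived in 2019-09; earlier drafts use "definitions".
function defsKeyword(draft: Draft): "$defs" | "definitions" {
  return draft === "2019-09" || draft === "2020-12" ? "$defs" : "definitions";
}

// Wrap a root schema and any shared definitions for the chosen draft.
function emit(draft: Draft, root: object, defs: Record<string, object>): Record<string, unknown> {
  const out: Record<string, unknown> = { $schema: SCHEMA_URI[draft], ...root };
  if (Object.keys(defs).length > 0) out[defsKeyword(draft)] = defs;
  return out;
}
```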

Related tools and guides

  • JSON Schema Validator — once the generator emits a schema, validate it against more sample payloads to find places the inference was too strict or too loose. The two tools are designed to be used in a tight loop.
  • Schema Format Converter — turn the generated JSON Schema into TypeScript, Zod, Pydantic, GraphQL, or Go for the consumer side of your stack.
  • Mock Data Generator — generate additional sample payloads from the inferred schema so you can pressure-test it without waiting for real traffic.
  • Complete Guide to JSON Schema — the full vocabulary the generator emits, including the constraints and patterns you will add during refinement.
  • API Versioning Guide — when the producer changes the payload shape, the schema is part of your contract; this guide covers the patterns for evolving it without breaking consumers.