It was 11 PM and I was copying fields from a Stripe API response into a Python file for the third time that week.

The response had 47 fields, nested objects three levels deep, optional fields, nullable values, and arrays of objects. I was typing Optional[str], List[Address], and @dataclass decorators by hand. By the time I got to the nested payment_method_details.card.wallet object, I'd already made two typos that mypy caught in CI.

That's when I decided: there has to be a way to generate Python dataclasses directly from the JSON response. After a few late nights, I found a workflow that saved me hours. Here's how it works.

Why Hand-Typing Dataclasses Is a Bad Idea

When you manually type a dataclass from a JSON payload, you're doing error-prone translation work. Every field name, type annotation, and nesting level is a chance to introduce a bug.

# What I typed by hand (wrong on first try)
from dataclasses import dataclass
from typing import Dict

@dataclass
class PaymentIntent:
    id: str
    amount: int
    currency: str
    status: str
    payment_method: str  # ← This is actually a nested object!
    metadata: Dict[str, str]
    created: int

# What the API actually returns
{
  "id": "pi_123",
  "amount": 2000,
  "currency": "usd",
  "status": "succeeded",
  "payment_method": {
    "id": "pm_456",
    "type": "card",
    "card": {
      "brand": "visa",
      "last4": "4242"
    }
  },
  "metadata": {"order_id": "ORD-789"},
  "created": 1711122333
}

I had payment_method as a str, but it's actually a nested object with its own fields. Mypy didn't catch this: the payload went through a raw dict on its way into the dataclass, and json.loads() returns Any, so the static checker never saw the mismatch. Dataclasses don't check types at runtime either. The bug shipped to production and surfaced as an AttributeError in a background job at 3 AM.
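Here is a minimal sketch of why nothing failed at construction time, using a trimmed-down version of the wrong PaymentIntent class above. The __init__ that @dataclass generates stores whatever it is given, so the nested dict slips in silently and the error only appears later, where code treats the field as a string:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PaymentIntent:
    id: str
    payment_method: str  # wrong: the API sends an object here

payload: Dict[str, Any] = {
    "id": "pi_123",
    "payment_method": {"id": "pm_456", "type": "card"},
}

# Dataclasses don't validate types at runtime, so this succeeds...
intent = PaymentIntent(**payload)

# ...and the failure only surfaces where the "str" is actually used.
try:
    intent.payment_method.startswith("pm_")
except AttributeError as exc:
    error = str(exc)  # "'dict' object has no attribute 'startswith'"
```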

The Core Mapping: JSON Types to Python Types

Before you can generate code automatically, you need a solid type mapping. Here's what I use:

| JSON type | Python type | Notes |
|---|---|---|
| string | str | Always |
| number (integer) | int | Check for "type": "integer" in schema |
| number (float) | float | Default if fractional |
| boolean | bool | Use field() for default handling |
| null | None or Optional[T] | Context-dependent |
| object | Nested @dataclass | Recursively generate |
| array | List[T] | Infer element type from first item |
| array of primitives | List[str], List[int], etc. | |
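In Python you rarely need a schema to tell int from float: the standard json module decides by the literal's form, not its value. A quick check (the keys are made up for illustration):

```python
import json

raw = '{"amount": 2000, "fee_rate": 0.029, "whole": 20.0, "exp": 1e3, "note": null, "live": true}'
sample = json.loads(raw)

# Map each key to the Python type json.loads produced for it
types = {key: type(value).__name__ for key, value in sample.items()}
print(types)
# {'amount': 'int', 'fee_rate': 'float', 'whole': 'float', 'exp': 'float', 'note': 'NoneType', 'live': 'bool'}
```

So 20.0 and 1e3 come back as float even though their values are whole numbers; "fractional" in the table effectively means "written with a decimal point or exponent".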

The tricky part is detecting optional vs required fields. My rule: if a field is null or missing in the sample JSON, make it Optional.
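A single sample can show you a null, but it can't show you a missing key: that only appears when you compare responses. Here is a sketch of that comparison, assuming you have saved a few responses from the same endpoint; optional_keys is a hypothetical helper name:

```python
from typing import Any, Dict, List, Set

def optional_keys(samples: List[Dict[str, Any]]) -> Set[str]:
    """Keys that are null in some sample or missing from at least one sample."""
    all_keys: Set[str] = set().union(*samples)  # union of every sample's keys
    optional: Set[str] = set()
    for key in all_keys:
        if any(key not in s or s[key] is None for s in samples):
            optional.add(key)
    return optional

samples = [
    {"id": "pi_1", "description": None, "customer": "cus_1"},
    {"id": "pi_2", "description": "Order 42"},
]
result = optional_keys(samples)  # description is null once, customer is missing once
```

The more samples you feed it, the fewer "required" fields turn out to be optional in production.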

Building the Generator

Here's a generator function I wrote that takes any JSON payload and outputs a Python dataclass:

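from typing import Any, Dict, List, Optional, Set, Tuple

def to_class_name(key: str) -> str:
    """Turn a JSON key such as "payment_method" into a class name such as "PaymentMethod"."""
    parts = key.replace("-", "_").replace(".", "_").split("_")
    return "".join(p[:1].upper() + p[1:] for p in parts if p) or "Inner"

def generate_dataclass(json_data: Dict[str, Any], class_name: str = "Root") -> str:
    """Generate dataclass source for a JSON object; nested classes come first."""
    nested: List[str] = []  # source of nested classes this class depends on
    required: List[str] = []
    optional: List[str] = []
    for key, value in json_data.items():
        python_key = key.replace("-", "_").replace(".", "_")
        python_type, _ = infer_type(value, python_key, nested)
        if value is None:
            optional.append(f"    {python_key}: Optional[{python_type}] = None")
        else:
            required.append(f"    {python_key}: {python_type}")
    # Fields with defaults must follow fields without them, otherwise
    # @dataclass raises TypeError when the class is defined.
    fields = required + optional
    if not fields:
        fields = ["    pass"]
    class_src = "\n".join(["@dataclass", f"class {class_name}:"] + fields)
    # Emit nested classes first so their names exist when this class is defined
    return "\n\n".join(nested + [class_src])

def infer_type(value: Any, name: str = "",
               nested: Optional[List[str]] = None) -> Tuple[str, Set[str]]:
    """Infer a Python type from a JSON value.

    Returns (type_string, typing_names_used). Nested objects are generated
    as their own dataclasses, and their source is appended to `nested`.
    """
    if nested is None:
        nested = []
    if value is None:
        return "Any", {"Any"}
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "bool", set()
    if isinstance(value, int):
        return "int", set()
    if isinstance(value, float):
        return "float", set()
    if isinstance(value, str):
        return "str", set()
    if isinstance(value, list):
        if not value:
            return "List[Any]", {"List", "Any"}
        # Infer from the first item only
        elem_type, elem_imports = infer_type(value[0], name, nested)
        return f"List[{elem_type}]", {"List"} | elem_imports
    if isinstance(value, dict):
        # Generate a nested class named after the key and keep its source
        nested_name = to_class_name(name)
        nested.append(generate_dataclass(value, nested_name))
        return nested_name, set()
    return "Any", {"Any"}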

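Run on the PaymentIntent payload above, it produces output like this (the imports are added by hand, and the generated Metadata class is swapped for Dict[str, str] because metadata keys are free-form). Nested classes need to come first: unless annotations are deferred (from __future__ import annotations, or Python 3.14+), card: Card is evaluated when PaymentMethod is defined.

from dataclasses import dataclass
from typing import Dict

@dataclass
class Card:
    brand: str
    last4: str

@dataclass
class PaymentMethod:
    id: str
    type: str
    card: Card

@dataclass
class PaymentIntent:
    id: str
    amount: int
    currency: str
    status: str
    payment_method: PaymentMethod
    metadata: Dict[str, str]
    created: int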

Handling Real-World Edge Cases

The basic generator works for simple cases, but real APIs throw curveballs. Here's how I handle them:

Snake_case vs camelCase

Many APIs return camelCase (firstName), but Python convention is snake_case (first_name). A good converter handles this automatically:

import re

def to_snake_case(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case."""
    name = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', name)
    name = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()
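Renaming fields means deserialization has to apply the same conversion, or cls(**data) will reject the camelCase keys. A minimal sketch, repeating to_snake_case so the block runs on its own; the User class and load_user function are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import Any, Dict

def to_snake_case(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case."""
    name = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', name)
    name = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()

@dataclass
class User:  # hypothetical generated class
    first_name: str
    last_name: str
    http_status: int

def load_user(data: Dict[str, Any]) -> User:
    # Rename camelCase keys to the snake_case field names before constructing
    return User(**{to_snake_case(k): v for k, v in data.items()})

user = load_user({"firstName": "Ada", "lastName": "Lovelace", "HTTPStatus": 200})
```

The first regex splits runs of capitals (HTTPStatus becomes HTTP_Status), the second splits lower-to-upper boundaries, so acronyms come out as http_status rather than h_t_t_p_status.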

Arrays of different object types

Some APIs return arrays where each object has a different shape. You can't generate a single accurate dataclass for that. I use Union types or Dict[str, Any] as a fallback:

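if isinstance(value, list) and value:
    kinds = {type(item).__name__ for item in value}
    if len(kinds) > 1:  # mixed element types, e.g. [1, "a"]
        return "List[Any]", {"List", "Any"}
    # Dicts all share one Python type, so compare their key sets too
    shapes = {frozenset(item) for item in value if isinstance(item, dict)}
    if len(shapes) > 1:
        return "List[Dict[str, Any]]", {"List", "Dict", "Any"}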

Fields named after Python keywords

APIs return fields named class, import, global, or from. Python won't let you use keywords as attribute names (type is fine: it's a builtin, not a keyword), so I append an underscore. The standard library's keyword module already has the full list:

import keyword

def safe_field_name(name: str) -> str:
    name = to_snake_case(name)
    # keyword.iskeyword covers every reserved word (class, import, global, ...)
    return name + "_" if keyword.iskeyword(name) else name
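Renaming a field (class becomes class_) breaks the cls(**data) pattern, because the JSON key no longer matches the attribute. One way to keep the mapping, sketched here with a hypothetical Node class and hypothetical from_json/to_json helpers, is to store the original key in dataclasses.field(metadata=...) and read it back with dataclasses.fields():

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")

@dataclass
class Node:
    # Hypothetical class: "class" is a keyword, so the attribute is class_
    class_: str = field(metadata={"json_key": "class"})
    import_id: int = field(metadata={"json_key": "importId"})

def from_json(cls: Type[T], data: Dict[str, Any]) -> T:
    # Look each field up under its original JSON key (falling back to its name)
    kwargs = {f.name: data[f.metadata.get("json_key", f.name)] for f in fields(cls)}
    return cls(**kwargs)

def to_json(obj: Any) -> Dict[str, Any]:
    # Reverse mapping: write each attribute back under its JSON key
    return {f.metadata.get("json_key", f.name): getattr(obj, f.name) for f in fields(obj)}

node = from_json(Node, {"class": "premium", "importId": 7})
```

A generator can emit the metadata only for fields whose name actually changed; everything else falls back to the attribute name.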

From Sample to Production-Ready Code

The generated code still needs some manual polish. Here's my checklist:

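  1. Add field() for mutable defaults. Lists and dicts need field(default_factory=list) or field(default_factory=dict) to avoid shared mutable defaults, and any field with a default must come after every field without one.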
  2. Add a from_dict classmethod. This handles the JSON deserialization with nested dataclasses.
  3. Verify with mypy. Run mypy --strict on the generated code to catch any missed types.
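
from dataclasses import dataclass, field, fields
from typing import Any, Dict

# Assumes PaymentMethod (and Card) are defined above, each with its own from_dict
@dataclass
class PaymentIntent:
    id: str
    amount: int
    currency: str
    status: str
    payment_method: PaymentMethod
    created: int  # must precede metadata: non-default fields come first
    metadata: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PaymentIntent":
        data = dict(data)  # shallow copy so the caller's dict isn't mutated
        data["payment_method"] = PaymentMethod.from_dict(data["payment_method"])
        # Drop keys the class doesn't declare, so a new API field
        # doesn't crash cls(**data) with an unexpected-keyword TypeError
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})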

I usually keep this from_dict method auto-generated too. The recursion handles nested objects automatically: each nested dataclass gets its own from_dict.
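One way to get that recursion without emitting per-class code is to drive it from the type annotations. This is a sketch, not a complete deserializer: build and from_dict are hypothetical names, it handles nested dataclasses, List[...], and Optional[...], and passes everything else (including Dict[str, str]) through unchanged. The classes mirror the PaymentIntent example above:

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import (Any, Dict, Optional, Type, TypeVar, Union,
                    get_args, get_origin, get_type_hints)

T = TypeVar("T")

def build(tp: Any, value: Any) -> Any:
    """Convert a decoded JSON value to match the annotation `tp`."""
    if value is None:
        return None
    origin = get_origin(tp)
    if origin is Union:  # Optional[X] is Union[X, None] (PEP 604 X | None not handled)
        args = [a for a in get_args(tp) if a is not type(None)]
        return build(args[0], value) if len(args) == 1 else value
    if origin is list:  # List[X]: convert each element
        return [build(get_args(tp)[0], item) for item in value]
    if isinstance(tp, type) and is_dataclass(tp):
        return from_dict(tp, value)
    return value  # primitives, Any, Dict[...]: pass through

def from_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    hints = get_type_hints(cls)  # also resolves string annotations
    # Unknown keys are ignored; missing keys fall back to the field's default
    kwargs = {f.name: build(hints[f.name], data[f.name])
              for f in fields(cls) if f.name in data}
    return cls(**kwargs)

@dataclass
class Card:
    brand: str
    last4: str

@dataclass
class PaymentMethod:
    id: str
    type: str
    card: Card

@dataclass
class PaymentIntent:
    id: str
    amount: int
    payment_method: PaymentMethod
    description: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)

intent = from_dict(PaymentIntent, {
    "id": "pi_123", "amount": 2000, "unknown_new_field": True,
    "payment_method": {"id": "pm_456", "type": "card",
                       "card": {"brand": "visa", "last4": "4242"}},
    "metadata": {"order_id": "ORD-789"},
})
```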

If you want to skip the manual step entirely, a good JSON Formatter with code generation can output Python dataclasses directly from any JSON payload. Just paste the API response, select "Python" as the target format, and it generates fully typed dataclasses with nested class support, snake_case conversion, and optional field detection.

FAQ

Q: Should I use dataclasses or Pydantic models for API responses?

A: Pydantic gives you runtime validation out of the box, which is great for external APIs. Dataclasses are lighter and faster, better for internal services where you control both ends. If you're code-generating, dataclasses are easier to generate cleanly.

Q: How does the generator handle fields that are sometimes a string and sometimes an object?

A: This is a common pain point. The safest approach is Union[str, SomeObject], but it makes usage awkward. I prefer to normalize the API response first (convert string IDs to objects) before generating the dataclass.
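A minimal sketch of that normalization step, assuming (as in the PaymentIntent payload above) that the expanded object always carries an "id" key; normalize_expandable is a hypothetical helper:

```python
from typing import Any, Dict, Optional, Union

def normalize_expandable(value: Union[str, Dict[str, Any], None]) -> Optional[Dict[str, Any]]:
    """Wrap a bare ID string as {"id": ...} so the field always has the object shape."""
    if isinstance(value, str):
        return {"id": value}
    return value  # already an object (or null): leave it alone

# The same field, once collapsed to an ID and once expanded
collapsed = {"id": "pi_123", "payment_method": "pm_456"}
expanded = {"id": "pi_124", "payment_method": {"id": "pm_789", "type": "card"}}

for payload in (collapsed, expanded):
    payload["payment_method"] = normalize_expandable(payload["payment_method"])
```

After this, the generated PaymentMethod class needs every field other than id to be Optional, since the collapsed form only supplies the ID.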

Q: What about optional or nullable fields (prop?: T or T | null in TypeScript)?

A: Without a schema, you can't detect "intentionally present but null" vs "field doesn't exist." I default to Optional[T] if the sample JSON has null or missing keys, then manually adjust based on the actual API docs.

Q: Can I generate Python dataclasses from large API responses?

A: Yes, as long as the generator handles nesting. I've generated dataclasses for responses with 200+ fields across 15 nested classes. The key is structural deduplication: if two objects have the same structure, they should map to the same class.
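Structural deduplication can be as simple as keying class names on a shape signature. This is a sketch with ClassRegistry and shape_signature as hypothetical names; the signature is shallow (a nested object only counts as "dict"), which is usually enough to merge things like billing and shipping addresses:

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Signature = FrozenSet[Tuple[str, str]]

def shape_signature(obj: Dict[str, Any]) -> Signature:
    """Key/type-name pairs; two objects with the same signature share a class."""
    return frozenset((k, type(v).__name__) for k, v in obj.items())

class ClassRegistry:
    """Hands out one class name per distinct object shape."""

    def __init__(self) -> None:
        self.names: Dict[Signature, str] = {}
        self.used: Set[str] = set()

    def name_for(self, obj: Dict[str, Any], suggested: str) -> str:
        sig = shape_signature(obj)
        if sig in self.names:
            return self.names[sig]  # same shape seen before: reuse its class
        name, n = suggested, 2
        while name in self.used:  # same name, different shape: add a suffix
            name, n = f"{suggested}{n}", n + 1
        self.names[sig] = name
        self.used.add(name)
        return name

registry = ClassRegistry()
billing = registry.name_for({"city": "Berlin", "zip": "10115"}, "BillingAddress")
shipping = registry.name_for({"city": "Paris", "zip": "75001"}, "ShippingAddress")
card = registry.name_for({"brand": "visa"}, "BillingAddress")
```

Here shipping reuses the BillingAddress class because the shapes match, while the unrelated card object gets BillingAddress2 instead of silently overwriting it.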

Q: How do I handle recursive JSON structures (self-referencing objects)?

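A: Recursive structures like a tree node with children: List[TreeNode] require manual intervention. A naive generator only sees the finite sample, so it never notices the self-reference: it emits a new class for every level of nesting (often several with the same name) instead of one TreeNode. I manually detect patterns (a field named the same as the parent class) and set a recursion depth limit.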
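The target is a single self-referencing class. A minimal hand-written sketch (TreeNode and its from_dict are illustrative), using a string annotation so the class can name itself before it is fully defined:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class TreeNode:
    name: str
    # "TreeNode" in quotes is a forward reference to the class being defined
    children: List["TreeNode"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TreeNode":
        # Recurses once per level of the actual data, which is always finite
        return cls(
            name=data["name"],
            children=[cls.from_dict(child) for child in data.get("children", [])],
        )

tree = TreeNode.from_dict({
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1"}]},
        {"name": "b"},
    ],
})
```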

Q: What's wrong with just using json.loads() and accessing dict keys directly?

A: Nothing for quick scripts. But for production code, you lose type safety, autocomplete, and refactoring support. When the API renames or restructures a field and you update the dataclass, mypy points to every code path that still uses the old structure; with raw dicts, nothing does.

Q: Are there JavaScript/TypeScript equivalents of this approach?

A: Yes, you can generate TypeScript interfaces from JSON the same way. The JSON Formatter also supports TypeScript, Java, and Go output, not just Python.

Q: How do I handle date/time fields in generated code?

A: ISO 8601 strings are common in JSON. I add a configuration option to detect date-like string patterns (^\d{4}-\d{2}-\d{2}) and use datetime.date or datetime.datetime types, with a custom deserializer.
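A sketch of that option, with infer_str_type and parse_iso as hypothetical names. The regex is the same prefix heuristic, so a string that merely starts with a date will be misclassified; treat the suggested type as something to review:

```python
import re
from datetime import date, datetime
from typing import Union

# Matches values that start like an ISO 8601 date: YYYY-MM-DD
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}")

def infer_str_type(value: str) -> str:
    """Suggest an annotation for a string sample."""
    if ISO_DATE.match(value):
        return "date" if len(value) == 10 else "datetime"
    return "str"

def parse_iso(value: str) -> Union[date, datetime]:
    """Custom deserializer for the fields typed as date/datetime."""
    if len(value) == 10:
        return date.fromisoformat(value)
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```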


If you're still hand-typing dataclasses from API responses, give yourself a break. Drop your sample JSON into the JSON Formatter, switch to Python output, and get typed, nested dataclass code in seconds. One paste, one copy, done.