Parse YAML in Python — PyYAML vs ruamel.yaml vs oyaml

If you write Python and work with YAML, you have three main library choices, and picking the wrong one costs you time, security, or data integrity.

import yaml                   # PyYAML: the standard
from ruamel.yaml import YAML  # ruamel.yaml: the round-trip king
import oyaml                  # oyaml: the ordered dictator

Each library parses the same YAML but can produce different results, handles edge cases differently, and has a different security profile.

This guide covers exactly what each library does differently, when to use which, and the real-world tradeoffs that documentation glosses over.


PyYAML: The Standard

PyYAML is the de facto standard YAML library for Python. If you pip install pyyaml and import yaml, this is what you get.

Basic Usage

import yaml

config = """
version: "3.8"
services:
  web:
    image: nginx
    ports:
      - "80:80"
"""

data = yaml.safe_load(config)
print(data["version"])       # "3.8"
print(data["services"]["web"]["image"])  # "nginx"
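In practice the YAML usually lives in a file, and Kubernetes-style files often hold several documents separated by ---. A minimal sketch, assuming UTF-8 files; load_config and load_all_documents are hypothetical helper names, not part of PyYAML:

```python
import yaml


def load_config(path):
    """Read a single YAML document from a file with PyYAML's safe loader."""
    with open(path, encoding="utf-8") as f:
        # safe_load accepts an open file as well as a string
        return yaml.safe_load(f)


def load_all_documents(path):
    """Read every document from a multi-document file (separated by ---)."""
    with open(path, encoding="utf-8") as f:
        # safe_load_all is lazy, so build the list before the file closes
        return list(yaml.safe_load_all(f))
```

safe_load_all() returns a generator, which is why the second helper materializes a list while the file is still open.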

safe_load vs load

This is the most important distinction in PyYAML:

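yaml.safe_load(file)                  # Safe: recommended for all normal use
yaml.load(file, Loader=yaml.Loader)   # UNSAFE: can execute arbitrary code
yaml.unsafe_load(file)                # UNSAFE: same as the line above

In PyYAML releases before 5.1, calling yaml.load() without a Loader argument used this unsafe loader, and plenty of example code online still calls it that way. PyYAML 6.0 and later raise a TypeError when the Loader is missing, but passing Loader=yaml.Loader (or calling yaml.unsafe_load()) still lets a document construct arbitrary Python objects. This is a known security vulnerability:

# malicious.yaml
!!python/object/apply:os.system ["rm -rf /"]

import yaml

with open("malicious.yaml") as f:
    yaml.load(f, Loader=yaml.Loader)  # Executes the command!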

Always use yaml.safe_load() unless you have a specific reason to load custom Python objects from trusted sources.
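A quick way to convince yourself that safe_load() is the right default is to feed it a python/* tag. In this sketch (the payload and the is_rejected_by_safe_load helper are illustrative), SafeLoader refuses to construct the object instead of running anything:

```python
import yaml

# Illustrative payload: with the unsafe loader this would call os.system()
PAYLOAD = '!!python/object/apply:os.system ["echo pwned"]'


def is_rejected_by_safe_load(text):
    """Return True if yaml.safe_load() refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except yaml.constructor.ConstructorError:
        # SafeLoader has no constructor for python/* tags, so nothing runs
        return True
    return False


print(is_rejected_by_safe_load(PAYLOAD))      # True
print(is_rejected_by_safe_load("name: api"))  # False
```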

Dumping YAML

data = {"name": "api", "port": 8080}
print(yaml.dump(data))
# name: api
# port: 8080
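Two details of dumping are worth knowing. yaml.dump() will serialize arbitrary Python objects using python/* tags, while yaml.safe_dump() only accepts standard types such as dicts, lists, strings, and numbers. Both sort keys alphabetically unless you pass sort_keys=False (available since PyYAML 5.1). A short sketch:

```python
import yaml

data = {"port": 8080, "name": "api"}

# Default: keys are sorted alphabetically
sorted_out = yaml.safe_dump(data)
# sorted_out == "name: api\nport: 8080\n"

# sort_keys=False keeps the dict's insertion order
ordered_out = yaml.safe_dump(data, sort_keys=False)
# ordered_out == "port: 8080\nname: api\n"

print(sorted_out)
print(ordered_out)
```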

Limitations

PyYAML has several frustrating limitations:

1. YAML 1.1 only. PyYAML only implements YAML 1.1. Boolean traps (yes/no/on/off) are fully active.

2. No round-trip preservation. Comments, formatting, and key order are lost when you dump back to YAML:

config = """
# Server configuration
host: localhost
port: 8080
"""

data = yaml.safe_load(config)
print(yaml.dump(data))
# host: localhost
# port: 8080
# Comment is gone!

3. Keys are sorted on dump. PyYAML loads mappings into plain dicts, which keep document order on Python 3.7+, but yaml.dump() sorts keys alphabetically by default. Pass sort_keys=False (PyYAML 5.1+) to keep insertion order.

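4. Performance degrades with large files. PyYAML's default loaders are written in pure Python and slow down noticeably on large documents. When PyYAML is built with LibYAML bindings (yaml.__with_libyaml__ is True), yaml.CSafeLoader is a considerably faster alternative to SafeLoader.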
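Limitations 1 and 4 are easy to see in a few lines. This sketch prefers the LibYAML-backed CSafeLoader when the installed PyYAML includes it and falls back to the pure-Python SafeLoader otherwise (FastSafeLoader is just a local alias). Either way the YAML 1.1 resolver turns unquoted NO and on into booleans, while the quoted "no" stays a string:

```python
import yaml

# Use the LibYAML-backed loader when this PyYAML build includes it;
# otherwise fall back to the pure-Python SafeLoader (same resolver rules).
try:
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader

DOC = """
country: NO
enabled: on
answer: "no"
"""

data = yaml.load(DOC, Loader=FastSafeLoader)
print(data)  # {'country': False, 'enabled': True, 'answer': 'no'}
print(yaml.__with_libyaml__)  # True when CSafeLoader is available
```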


ruamel.yaml: Round-Trip Champion

ruamel.yaml is a YAML 1.2 parser that preserves comments, formatting, and key order when you modify and re-dump YAML files.

Installation

pip install ruamel.yaml

Basic Usage

import sys

from ruamel.yaml import YAML

yaml = YAML()

config = """
# Production settings
server:
  host: prod.example.com
  port: 443  # HTTPS port
  ssl: true

# Database
database:
  host: db.internal
  port: 5432
"""

data = yaml.load(config)

# Modify
data["server"]["port"] = 8443

# Dump — preserves comments and formatting!
yaml.dump(data, sys.stdout)

Output:

# Production settings
server:
  host: prod.example.com
  port: 8443  # HTTPS port
  ssl: true

# Database
database:
  host: db.internal
  port: 5432

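Comments preserved. Key order preserved. Formatting preserved too, with one caveat: indentation is re-emitted from ruamel.yaml's own settings rather than copied from the source, which matters for indented lists.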

YAML 1.2 by Default

ruamel.yaml defaults to YAML 1.2, which means yes/no/on/off are strings, not booleans:

from ruamel.yaml import YAML

yaml = YAML()
data = yaml.load("flag: yes")
print(type(data["flag"]))  # <class 'str'> — not bool!

This alone is reason enough to use ruamel.yaml if you are starting a new project.

Control Over YAML Version

yaml = YAML(typ='safe')
yaml.version = (1, 2)

Performance

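ruamel.yaml is slightly slower than PyYAML for simple loads. The default round-trip mode adds overhead because it tracks comments and formatting; YAML(typ='safe') skips that bookkeeping and uses a C-based parser when the ruamel.yaml.clib package is installed.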

When to Use ruamel.yaml

  • You need to preserve comments and formatting
  • You are building a tool that modifies YAML files in place
  • You want YAML 1.2 behavior
  • You need ordered mappings

oyaml: Ordered Dicts, Nothing Else

oyaml is a tiny library that replaces PyYAML's dict with collections.OrderedDict:

import oyaml as yaml

# Same API as PyYAML, but dicts are ordered

Why It Exists

Before Python 3.7 guaranteed dict insertion order, YAML files loaded with PyYAML scrambled key order. oyaml fixed this by using OrderedDict.

Current Relevance

Python 3.7+ already preserves dict insertion order, so oyaml is largely obsolete for modern Python. What it still changes is mostly on the output side: dicts are dumped in insertion order instead of with sorted keys, and OrderedDict is written as a plain YAML mapping. PyYAML 5.1+ covers the first case on its own with sort_keys=False.
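If all you need from oyaml is order-preserving output for OrderedDict, the same idea can be sketched with PyYAML alone. This is not oyaml's source; OrderedSafeDumper and _represent_ordered are hypothetical names, and registering the representer on a subclass keeps the global yaml.SafeDumper unchanged:

```python
from collections import OrderedDict

import yaml


class OrderedSafeDumper(yaml.SafeDumper):
    """SafeDumper that writes OrderedDict as a plain mapping, in order."""


def _represent_ordered(dumper, data):
    # Passing items() instead of the mapping itself skips PyYAML's key sorting
    return dumper.represent_dict(data.items())


# add_representer on the subclass copies the registry, so SafeDumper is untouched
OrderedSafeDumper.add_representer(OrderedDict, _represent_ordered)

settings = OrderedDict([("port", 8080), ("name", "api")])
out = yaml.dump(settings, Dumper=OrderedSafeDumper)
print(out)
# port: 8080
# name: api
```

Plain yaml.safe_dump() raises a RepresenterError on an OrderedDict, and yaml.dump() writes it with a !!python/object/apply:collections.OrderedDict tag, which is why a representer like this is needed at all.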

When to Use oyaml

  • Python 3.6 or earlier (versions you should no longer be running)
  • You need explicit OrderedDict behavior for backward compatibility
  • You want a drop-in replacement with no API changes

For modern Python projects, ruamel.yaml is a better choice for ordered YAML because it provides order preservation plus many other features.


Library Comparison Table

Feature               PyYAML                ruamel.yaml         oyaml
YAML version          1.1                   1.2 (default)       1.1
Safe by default       Yes (safe_load)       Yes                 Yes (safe_load)
Round-trip            No                    Yes                 No
Comment preservation  No                    Yes                 No
Dict order            3.7+ (dump sorts)     Always              Always
Security              load() is unsafe      Safe by default     Same as PyYAML
Performance           Fastest               Slightly slower     Same as PyYAML
API                   yaml.load/dump        YAML().load/dump    yaml.load/dump
Maintained            Yes                   Yes                 Minimal
YAML 1.2 booleans     No                    Yes                 No

Real-World Benchmark

Loading a 500-line Kubernetes manifest 100 times:

PyYAML safe_load:    0.89s
ruamel.yaml load:    1.12s
oyaml load:          0.91s

Dumping the same data:

PyYAML dump:         0.45s
ruamel.yaml dump:    0.61s
oyaml dump:          0.46s

For most use cases, the performance difference is negligible. The choice should be based on features, not speed.
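The figures above come from one machine and one manifest. To compare the libraries on your own files, a small timeit harness is enough. A sketch, assuming PyYAML is installed and ruamel.yaml optionally so; the generated SAMPLE document and the time_loaders helper are illustrative, and results will vary with hardware and with whether LibYAML or ruamel.yaml.clib is present:

```python
import timeit

import yaml

# Illustrative sample: a 100-item list, roughly the shape of a config file
SAMPLE = "items:\n" + "".join(
    f"  - name: item{i}\n    enabled: true\n    port: {8000 + i}\n" for i in range(100)
)


def time_loaders(text, loaders, number=100):
    """Return total seconds spent on `number` parses of `text`, per loader."""
    return {
        name: timeit.timeit(lambda fn=fn: fn(text), number=number)
        for name, fn in loaders.items()
    }


loaders = {"PyYAML safe_load": yaml.safe_load}

try:
    from ruamel.yaml import YAML

    loaders["ruamel.yaml (rt)"] = YAML().load
    loaders["ruamel.yaml (safe)"] = YAML(typ="safe").load
except ImportError:
    pass  # ruamel.yaml not installed: time PyYAML only

for name, seconds in time_loaders(SAMPLE, loaders, number=20).items():
    print(f"{name:22} {seconds:.3f}s")
```

To time a real file, read it into a string and pass that instead of SAMPLE.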


Security Deep Dive

PyYAML

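The biggest risk with PyYAML is parsing with the unsafe loader (yaml.load() with Loader=yaml.Loader, yaml.unsafe_load(), or a bare yaml.load() on PyYAML older than 5.1) instead of yaml.safe_load().

# NEVER do this with untrusted input
data = yaml.load(user_provided_yaml, Loader=yaml.Loader)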

# Always do this
data = yaml.safe_load(user_provided_yaml)

The unsafe loader can call arbitrary Python callables while building objects, leading to remote code execution:

!!python/object/apply:subprocess.check_output
  - ["rm", "-rf", "/"]

ruamel.yaml

ruamel.yaml's default YAML() instance is safe. It does not load arbitrary Python objects unless explicitly configured:

from ruamel.yaml import YAML

yaml = YAML(typ='unsafe')  # Only if you really need it

General Advice

  • Never parse untrusted YAML with PyYAML's yaml.load()
  • Always validate parsed YAML against a schema
  • Prefer safe_load or ruamel.yaml's default mode
  • If you must load custom types, use a whitelist approach
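For the last point, PyYAML lets you subclass SafeLoader and register constructors only for the tags you trust, so everything else is still rejected. A minimal sketch; the Endpoint type, the !endpoint tag, and AllowlistLoader are hypothetical names for illustration:

```python
from dataclasses import dataclass

import yaml


@dataclass
class Endpoint:
    """Hypothetical custom type we are willing to load."""

    host: str
    port: int


class AllowlistLoader(yaml.SafeLoader):
    """SafeLoader plus only the tags registered on this subclass."""


def construct_endpoint(loader, node):
    fields = loader.construct_mapping(node, deep=True)
    return Endpoint(host=str(fields["host"]), port=int(fields["port"]))


# Registering on the subclass leaves yaml.SafeLoader unchanged
AllowlistLoader.add_constructor("!endpoint", construct_endpoint)

doc = "api: !endpoint {host: api.example.com, port: 443}"
config = yaml.load(doc, Loader=AllowlistLoader)
print(config["api"])  # Endpoint(host='api.example.com', port=443)
```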

Advanced Pattern: Schema Validation

Parsing YAML is only half the battle. Validating that the parsed data matches your expected schema prevents boolean traps, missing keys, and type errors.

With PyYAML and Pydantic

from pydantic import BaseModel
import yaml

class DatabaseConfig(BaseModel):
    host: str
    port: int = 5432
    ssl: bool = False

class AppConfig(BaseModel):
    app_name: str
    debug: bool = False
    database: DatabaseConfig

with open("config.yml") as f:
    raw = yaml.safe_load(f)
config = AppConfig(**raw)
print(config.database.host)  # validated string

With ruamel.yaml and JSON Schema

from ruamel.yaml import YAML
from jsonschema import validate

yaml = YAML(typ='safe')
with open("config.yml") as f:
    data = yaml.load(f)

schema = {
    "type": "object",
    "properties": {
        "app_name": {"type": "string"},
        "version": {"type": "string"},
    },
    "required": ["app_name"]
}

validate(instance=data, schema=schema)

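Schema validation also catches boolean traps when the parser follows YAML 1.1. With PyYAML, an unquoted country: NO loads as False (a boolean), so a schema that declares country as a string rejects the file with a clear error. With ruamel.yaml's YAML 1.2 default, as in the example above, NO stays a string and the trap never fires in the first place.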
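A minimal sketch of that failure mode, pairing PyYAML's YAML 1.1 parsing with jsonschema; the one-field schema and the check helper are hypothetical:

```python
import yaml
from jsonschema import ValidationError, validate

# Hypothetical schema: country must be a string
SCHEMA = {
    "type": "object",
    "properties": {"country": {"type": "string"}},
    "required": ["country"],
}


def check(text):
    """Return None if the parsed YAML is valid, else the validator's message."""
    try:
        validate(instance=yaml.safe_load(text), schema=SCHEMA)
    except ValidationError as exc:
        return exc.message
    return None


print(check('country: "NO"'))  # None: quoted, so it stays a string
print(check("country: NO"))    # error: the parsed value False is not a string
```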


Which Library Should You Use?

Use ruamel.yaml if:

  • You need to preserve comments and formatting when modifying YAML
  • You want YAML 1.2 boolean handling
  • You are writing a tool that edits YAML files in place
  • You want the most complete YAML implementation

Use PyYAML if:

  • You only need to read YAML files (not write them back)
  • You value the smallest dependency footprint
  • You work with legacy projects already using PyYAML
  • Performance is critical (microbenchmarks show PyYAML slightly faster)

Use oyaml if:

  • You need a drop-in PyYAML replacement with ordered dicts
  • You are stuck on Python 3.6 or earlier (unlikely in 2026)
  • You want absolutely no API changes

For most new projects, ruamel.yaml is the better choice. The YAML 1.2 support alone prevents the most common YAML bugs, and round-trip preservation is invaluable for any tool that modifies configuration files.


FAQ

What is the difference between PyYAML and ruamel.yaml?

PyYAML is the original YAML library for Python, implementing YAML 1.1 with a simple yaml.load() / yaml.dump() API. ruamel.yaml is a fork of PyYAML that implements YAML 1.2, supports round-trip preservation (comments, formatting, key order survive modification and re-dumping), and defaults to safe parsing. PyYAML is faster in benchmarks and has a smaller footprint, but ruamel.yaml prevents more bugs (YAML 1.2 boolean handling) and is better for tools that modify YAML files.

Is PyYAML safe to use?

PyYAML's yaml.safe_load() is safe and should always be used for untrusted input. The unsafe paths are yaml.load() with Loader=yaml.Loader and yaml.unsafe_load(), which can construct arbitrary Python objects and execute code embedded in the YAML file. In PyYAML releases before 5.1, a bare yaml.load() with no Loader argument also used the unsafe loader, and online examples written for those versions still catch developers today. Never load YAML from untrusted sources with an unsafe loader. For new projects, consider ruamel.yaml, which defaults to safe parsing.

How do I preserve comments when editing YAML in Python?

Use ruamel.yaml, which is specifically designed for round-trip preservation. PyYAML discards all comments and formatting, and sorts keys by default, when dumping modified data back to YAML. With ruamel.yaml, load the file with from ruamel.yaml import YAML; yaml = YAML(); data = yaml.load(file), make your modifications, then write it back with yaml.dump(data, file). Comments and key order are preserved; indentation is re-emitted according to the yaml.indent() settings rather than copied from the original, so set those to match your file's style.

Does Python have a built-in YAML parser?

No, Python does not include a built-in YAML parser in its standard library. You must install a third-party library. The most popular options are PyYAML (pip install pyyaml), ruamel.yaml (pip install ruamel.yaml), and oyaml (pip install oyaml). Python does include json, configparser, and xml in the standard library, but not YAML.

Which YAML library should I use for a new Python project?

For a new Python project in 2026, ruamel.yaml is the best default choice. It supports YAML 1.2 (which eliminates the most common boolean traps), preserves comments and formatting when re-dumping, and defaults to safe parsing. The YAML 1.2 support alone prevents bugs where yes/no/on/off are incorrectly parsed as booleans. If you only need to read YAML files and want minimal dependencies, PyYAML with yaml.safe_load() is sufficient.


Final Thoughts

The YAML library you choose in Python determines more than just API syntax — it determines whether your application silently corrupts data, whether you can safely parse untrusted input, and whether modified YAML files retain their original formatting.

For reading configuration files, PyYAML with safe_load() is sufficient and performant. For any tool that modifies YAML, ruamel.yaml is the only serious choice. And oyaml remains a niche solution for projects that need ordered dicts without changing their PyYAML API.

Whichever library you use, always validate parsed YAML against a schema. The combination of a robust parser and schema validation catches the silent bugs — boolean traps, missing keys, unexpected types — that make YAML dangerous at scale.

Before deploying any Python code that parses YAML, run your configuration through a YAML formatter and check how it actually parses. A quick look at the parsed values and their Python types (a country code that became False, a version string that became a float) often reveals bugs that code review misses.
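One low-effort way to do that inspection in code is to walk the parsed structure and print each scalar with its Python type. A sketch; describe_types is a hypothetical helper, and it assumes the data came from a safe loader (dicts, lists, and scalars only):

```python
import yaml


def describe_types(node, path="$"):
    """Yield 'path: type = value' for every scalar in parsed YAML data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from describe_types(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from describe_types(value, f"{path}[{index}]")
    else:
        yield f"{path}: {type(node).__name__} = {node!r}"


doc = "country: NO\nport: 8080\nversion: 3.10\n"
lines = list(describe_types(yaml.safe_load(doc)))
for line in lines:
    print(line)
# $.country: bool = False
# $.port: int = 8080
# $.version: float = 3.1
```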

For more on the boolean traps that PyYAML's YAML 1.1 default introduces, see YAML Booleans Are Traps and Why Your YAML Is Invalid. If you are dealing with indentation-related parse errors in Python, How to Fix YAML Indentation Errors covers the common patterns.