I had a production issue last month. The monitoring team dumped 200MB of application logs in .json format. Each log entry looked like this:

{"timestamp":"2026-06-04T10:23:11Z","level":"ERROR","service":"auth","message":"Token validation failed","userId":"u_38492","metadata":{"ip":"192.168.1.10","attempt":3}}

My task: extract every failed login attempt by userId, count attempts per user, and output a CSV. Manually? No way. Let me walk through the extraction and transformation techniques I used — and how you can do the same without writing complex scripts.

Method 1: jq for Command-Line Extraction

The Swiss Army knife for JSON processing is jq. It's a command-line tool that filters, transforms, and reshapes JSON data. If you work with JSON files regularly, install it now.

# Install
brew install jq        # macOS
apt install jq          # Linux
choco install jq        # Windows

Extract specific fields from an array of objects:

cat logs.json | jq '{timestamp, level, message}'

Filter by condition:

# Get only ERROR level entries
cat logs.json | jq 'select(.level == "ERROR")'

# Get failed auth attempts only
cat logs.json | jq 'select(.service == "auth" and .level == "ERROR")'

Transform into a different structure:

cat logs.json | jq '{time: .timestamp, user: .userId, msg: .message}'

Group and count:

cat logs.json | jq -s 'group_by(.userId) | map({userId: .[0].userId, count: length})'

The -s flag (slurp) reads the entire input into an array, then groups by userId and counts.

Method 2: Extract from Deeply Nested JSON

Not all JSON is flat. Sometimes you're dealing with nested structures like API responses:

{
  "status": "ok",
  "data": {
    "products": [
      {
        "id": "p1",
        "details": {
          "name": "Wireless Mouse",
          "specs": {
            "connectivity": "Bluetooth",
            "battery": "AA"
          }
        },
        "pricing": {
          "current": 29.99,
          "original": 39.99
        }
      }
    ]
  }
}

To extract product names and prices from this:

cat products.json | jq '.data.products[] | {name: .details.name, price: .pricing.current}'

Output:

{"name":"Wireless Mouse","price":29.99}

You can output this as CSV directly:

cat products.json | jq -r '.data.products[] | [.details.name, .pricing.current] | @csv'

"Wireless Mouse",29.99

Method 3: Extract from Log Files (.log, .txt, .json)

Log files often contain mixed content — one JSON object per line, interspersed with plain text log entries.

[INFO] 2026-06-04 Service started
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":145}
{"event":"request","method":"GET","path":"/api/users","status":401,"duration_ms":32}
[WARN] 2026-06-04 Rate limit approaching
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":89}

Extract only JSON lines:

grep '^{' log.txt | jq '{method, path, status}'

Calculate average response time:

grep '^{' log.txt | jq -s '[.[].duration_ms] | add / length'

Find slow requests:

grep '^{' log.txt | jq 'select(.duration_ms > 100) | {path, duration_ms}'

Method 4: Pick-and-Choose with Python

When jq isn't installed and you need something quick in Python:

import json

with open('logs.json') as f:
    logs = [json.loads(line) for line in f if line.startswith('{')]

# Filter and extract
errors = [
    {'user': log['userId'], 'time': log['timestamp']}
    for log in logs
    if log.get('level') == 'ERROR' and log.get('service') == 'auth'
]

# Group by user
from collections import Counter
user_counts = Counter(e['user'] for e in errors)

for user, count in user_counts.most_common():
    print(f"{user},{count}")

One-liner version (for quick scripts):

python3 -c "
import json, sys, collections
logs = [json.loads(l) for l in sys.stdin if l.startswith('{')]
errors = [l for l in logs if l.get('level') == 'ERROR']
print(f'Total errors: {len(errors)}')
" < logs.json

Method 5: Transform JSON to Other Formats

Sometimes you need JSON → CSV, JSON → Markdown table, or JSON → YAML.

JSON to CSV with jq:

cat data.json | jq -r '.[] | [.id, .name, .email] | @csv' > output.csv

JSON to Markdown table (manual approach):

echo "| ID | Name | Email |"
echo "|---|---|---|"
cat data.json | jq -r '.[] | "| \(.id) | \(.name) | \(.email) |"'

JSON to array of arrays for spreadsheet import:

cat data.json | jq '[.[] | [.id, .name, .email]]'

Processing Uploaded Files with an Online Tool

If you're working on a machine without jq or Python (or just want visual output), you can upload your .json, .txt, or .log files to an online formatter. The JSON Formatter supports file upload — drag your file in and instantly see its structure in a tree view, making it easy to find and extract the fields you need.

Real Workflow: From Raw Logs to Actionable Data

Here's a complete workflow I used to triage a production incident:

# Step 1: Extract all ERROR lines from multi-format log
grep '^{' app.log > json_only.json

# Step 2: Filter for auth-related errors
cat json_only.json | jq 'select(.service == "auth")' > auth_errors.json

# Step 3: Extract key fields
cat auth_errors.json | jq '{time: .timestamp, user: .userId, ip: .metadata.ip}' > summary.json

# Step 4: Count attempts per IP
cat auth_errors.json | jq -r '.metadata.ip' | sort | uniq -c | sort -rn

# Step 5: Save as CSV report
cat auth_errors.json | jq -r '[.timestamp, .userId, .metadata.ip] | @csv' > report.csv

Total time: about 2 minutes. Time to manually parse 200MB of logs: hours.

Common Pitfalls When Extracting JSON Data

1. Newline-delimited JSON vs JSON array

Some files have one JSON object per line (NDJSON), others have a single JSON array [{...}, {...}]. jq handles both differently:

# NDJSON (one JSON per line)
cat ndjson.log | jq '.name'

# JSON array
cat array.json | jq '.[].name'

2. Escaped JSON inside JSON

Sometimes log entries have a string field that contains escaped JSON:

{"event": "order_created", "payload": "{\"orderId\":\"ord_123\",\"total\":49.99}"}

Extract the nested JSON:

cat logs.json | jq '.payload | fromjson | .orderId'

3. Very large files (100MB+)

Don't try to read the whole file into memory. Use streaming:

# jq streaming mode
cat hugefile.json | jq -n --stream 'fromstream(1|truncate_stream(inputs))'

# Or split and process in chunks
split -l 10000 hugefile.json chunk_
for f in chunk_*; do cat $f | jq '...' > "processed_$f"; done

FAQ

Q: What's the fastest way to extract one field from a huge JSON file?

A: Use jq '.[].fieldName' with streaming mode (--stream). For really huge files, use grep to pre-filter lines before piping to jq.

Q: Can I extract data from .log files that aren't pure JSON?

A: Yes. Pre-filter with grep '^{' to extract only JSON lines, then process with jq or the JSON Formatter.

Q: How do I convert JSON to Excel format?

A: Convert JSON to CSV first, then import into Excel. jq's @csv format handles this cleanly.

Q: What if my JSON has nested objects I want to flatten?

A: jq can flatten with [.. | objects | to_entries[]] | from_entries, or use a dedicated flattening library if you're in Python.

Q: Is there a way to preview a JSON file before extracting data?

A: Yes. Use head -n 5 file.json | jq '.' to see the first few records, or upload the file to the JSON Formatter for a visual tree view.

Q: How do I handle duplicate keys in JSON?

A: JSON technically doesn't allow duplicate keys at the same level, but some parsers accept them, keeping the last value. Deduplicate by converting to an array of unique keys.

Q: Can I extract data from compressed (.gz) JSON files without decompressing?

A: Yes — use zcat file.json.gz | jq '...' to pipe directly from compressed format.

Q: What's the best way to extract a subset of JSON for testing?

A: Use jq '.[:100]' to take the first 100 records, or filter by a condition to get a representative sample.

Next time you're staring at a massive JSON log file wondering how to pull out the data you need, try the piped jq workflow above. Or if you prefer a visual approach, drag your file into the JSON Formatter and use the tree view to explore before you extract.

Method 1: jq for Command-Line Extraction

Method 2: Extract from Deeply Nested JSON

Method 3: Extract from Log Files (.log, .txt, .json)

Method 4: Pick-and-Choose with Python

Method 5: Transform JSON to Other Formats

Processing Uploaded Files with an Online Tool

Real Workflow: From Raw Logs to Actionable Data

Common Pitfalls When Extracting JSON Data

FAQ

Related Tools

JSON Formatter

JSON Validator

JSON to XML

JSON to CSV

Related Articles

How to Generate JSON Schema from JSON: Step-by-Step Guide

Common JSON Schema Validation Errors and How to Fix Them

JSON Schema Draft Comparison: 04 vs 07 vs 2019-09 vs 2020-12

JSON Schema to TypeScript: Complete Code Generation Guide