I had a production issue last month. The monitoring team dumped 200MB of application logs in .json format. Each log entry looked like this:

{"timestamp":"2026-06-04T10:23:11Z","level":"ERROR","service":"auth","message":"Token validation failed","userId":"u_38492","metadata":{"ip":"192.168.1.10","attempt":3}}

My task: extract every failed login attempt by userId, count attempts per user, and output a CSV. Manually? No way. Let me walk through the extraction and transformation techniques I used — and how you can do the same without writing complex scripts.

Method 1: jq for Command-Line Extraction

The Swiss Army knife for JSON processing is jq. It's a command-line tool that filters, transforms, and reshapes JSON data. If you work with JSON files regularly, install it now.

# Install
brew install jq        # macOS
apt install jq          # Linux
choco install jq        # Windows

Extract specific fields from an array of objects:

cat logs.json | jq '{timestamp, level, message}'

Filter by condition:

# Get only ERROR level entries
cat logs.json | jq 'select(.level == "ERROR")'

# Get failed auth attempts only
cat logs.json | jq 'select(.service == "auth" and .level == "ERROR")'

Transform into a different structure:

cat logs.json | jq '{time: .timestamp, user: .userId, msg: .message}'

Group and count:

cat logs.json | jq -s 'group_by(.userId) | map({userId: .[0].userId, count: length})'

The -s flag (slurp) reads the entire input into an array, then groups by userId and counts.

Method 2: Extract from Deeply Nested JSON

Not all JSON is flat. Sometimes you're dealing with nested structures like API responses:

{
  "status": "ok",
  "data": {
    "products": [
      {
        "id": "p1",
        "details": {
          "name": "Wireless Mouse",
          "specs": {
            "connectivity": "Bluetooth",
            "battery": "AA"
          }
        },
        "pricing": {
          "current": 29.99,
          "original": 39.99
        }
      }
    ]
  }
}

To extract product names and prices from this:

cat products.json | jq '.data.products[] | {name: .details.name, price: .pricing.current}'

Output:

{"name":"Wireless Mouse","price":29.99}

You can output this as CSV directly:

cat products.json | jq -r '.data.products[] | [.details.name, .pricing.current] | @csv'
"Wireless Mouse",29.99

Method 3: Extract from Log Files (.log, .txt, .json)

Log files often contain mixed content — one JSON object per line, interspersed with plain text log entries.

[INFO] 2026-06-04 Service started
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":145}
{"event":"request","method":"GET","path":"/api/users","status":401,"duration_ms":32}
[WARN] 2026-06-04 Rate limit approaching
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":89}

Extract only JSON lines:

grep '^{' log.txt | jq '{method, path, status}'

Calculate average response time:

grep '^{' log.txt | jq -s '[.[].duration_ms] | add / length'

Find slow requests:

grep '^{' log.txt | jq 'select(.duration_ms > 100) | {path, duration_ms}'

Method 4: Pick-and-Choose with Python

When jq isn't installed and you need something quick in Python:

import json

with open('logs.json') as f:
    logs = [json.loads(line) for line in f if line.startswith('{')]

# Filter and extract
errors = [
    {'user': log['userId'], 'time': log['timestamp']}
    for log in logs
    if log.get('level') == 'ERROR' and log.get('service') == 'auth'
]

# Group by user
from collections import Counter
user_counts = Counter(e['user'] for e in errors)

for user, count in user_counts.most_common():
    print(f"{user},{count}")

One-liner version (for quick scripts):

python3 -c "
import json, sys, collections
logs = [json.loads(l) for l in sys.stdin if l.startswith('{')]
errors = [l for l in logs if l.get('level') == 'ERROR']
print(f'Total errors: {len(errors)}')
" < logs.json

Method 5: Transform JSON to Other Formats

Sometimes you need JSON → CSV, JSON → Markdown table, or JSON → YAML.

JSON to CSV with jq:

cat data.json | jq -r '.[] | [.id, .name, .email] | @csv' > output.csv

JSON to Markdown table (manual approach):

echo "| ID | Name | Email |"
echo "|---|---|---|"
cat data.json | jq -r '.[] | "| \(.id) | \(.name) | \(.email) |"'

JSON to array of arrays for spreadsheet import:

cat data.json | jq '[.[] | [.id, .name, .email]]'

Processing Uploaded Files with an Online Tool

If you're working on a machine without jq or Python (or just want visual output), you can upload your .json, .txt, or .log files to an online formatter. The JSON Formatter supports file upload — drag your file in and instantly see its structure in a tree view, making it easy to find and extract the fields you need.

Real Workflow: From Raw Logs to Actionable Data

Here's a complete workflow I used to triage a production incident:

# Step 1: Extract all ERROR lines from multi-format log
grep '^{' app.log > json_only.json

# Step 2: Filter for auth-related errors
cat json_only.json | jq 'select(.service == "auth")' > auth_errors.json

# Step 3: Extract key fields
cat auth_errors.json | jq '{time: .timestamp, user: .userId, ip: .metadata.ip}' > summary.json

# Step 4: Count attempts per IP
cat auth_errors.json | jq -r '.metadata.ip' | sort | uniq -c | sort -rn

# Step 5: Save as CSV report
cat auth_errors.json | jq -r '[.timestamp, .userId, .metadata.ip] | @csv' > report.csv

Total time: about 2 minutes. Time to manually parse 200MB of logs: hours.

Common Pitfalls When Extracting JSON Data

1. Newline-delimited JSON vs JSON array

Some files have one JSON object per line (NDJSON), others have a single JSON array [{...}, {...}]. jq handles both differently:

# NDJSON (one JSON per line)
cat ndjson.log | jq '.name'

# JSON array
cat array.json | jq '.[].name'

2. Escaped JSON inside JSON

Sometimes log entries have a string field that contains escaped JSON:

{"event": "order_created", "payload": "{\"orderId\":\"ord_123\",\"total\":49.99}"}

Extract the nested JSON:

cat logs.json | jq '.payload | fromjson | .orderId'

3. Very large files (100MB+)

Don't try to read the whole file into memory. Use streaming:

# jq streaming mode
cat hugefile.json | jq -n --stream 'fromstream(1|truncate_stream(inputs))'

# Or split and process in chunks
split -l 10000 hugefile.json chunk_
for f in chunk_*; do cat $f | jq '...' > "processed_$f"; done

FAQ

Q: What's the fastest way to extract one field from a huge JSON file?

A: Use jq '.[].fieldName' with streaming mode (--stream). For really huge files, use grep to pre-filter lines before piping to jq.

Q: Can I extract data from .log files that aren't pure JSON?

A: Yes. Pre-filter with grep '^{' to extract only JSON lines, then process with jq or the JSON Formatter.

Q: How do I convert JSON to Excel format?

A: Convert JSON to CSV first, then import into Excel. jq's @csv format handles this cleanly.

Q: What if my JSON has nested objects I want to flatten?

A: jq can flatten with [.. | objects | to_entries[]] | from_entries, or use a dedicated flattening library if you're in Python.

Q: Is there a way to preview a JSON file before extracting data?

A: Yes. Use head -n 5 file.json | jq '.' to see the first few records, or upload the file to the JSON Formatter for a visual tree view.

Q: How do I handle duplicate keys in JSON?

A: JSON technically doesn't allow duplicate keys at the same level, but some parsers accept them, keeping the last value. Deduplicate by converting to an array of unique keys.

Q: Can I extract data from compressed (.gz) JSON files without decompressing?

A: Yes — use zcat file.json.gz | jq '...' to pipe directly from compressed format.

Q: What's the best way to extract a subset of JSON for testing?

A: Use jq '.[:100]' to take the first 100 records, or filter by a condition to get a representative sample.


Next time you're staring at a massive JSON log file wondering how to pull out the data you need, try the piped jq workflow above. Or if you prefer a visual approach, drag your file into the JSON Formatter and use the tree view to explore before you extract.