I had a production issue last month. The monitoring team dumped 200MB of application logs in .json format. Each log entry looked like this:
{"timestamp":"2026-06-04T10:23:11Z","level":"ERROR","service":"auth","message":"Token validation failed","userId":"u_38492","metadata":{"ip":"192.168.1.10","attempt":3}}
My task: extract every failed login attempt by userId, count attempts per user, and output a CSV. Manually? No way. Let me walk through the extraction and transformation techniques I used — and how you can do the same without writing complex scripts.
Method 1: jq for Command-Line Extraction
The Swiss Army knife for JSON processing is jq. It's a command-line tool that filters, transforms, and reshapes JSON data. If you work with JSON files regularly, install it now.
# Install
brew install jq # macOS
apt install jq # Linux
choco install jq # Windows
Extract specific fields from an array of objects:
cat logs.json | jq '{timestamp, level, message}'
Filter by condition:
# Get only ERROR level entries
cat logs.json | jq 'select(.level == "ERROR")'
# Get failed auth attempts only
cat logs.json | jq 'select(.service == "auth" and .level == "ERROR")'
Transform into a different structure:
cat logs.json | jq '{time: .timestamp, user: .userId, msg: .message}'
Group and count:
cat logs.json | jq -s 'group_by(.userId) | map({userId: .[0].userId, count: length})'
The -s flag (slurp) reads the entire input into an array, then groups by userId and counts.
Method 2: Extract from Deeply Nested JSON
Not all JSON is flat. Sometimes you're dealing with nested structures like API responses:
{
"status": "ok",
"data": {
"products": [
{
"id": "p1",
"details": {
"name": "Wireless Mouse",
"specs": {
"connectivity": "Bluetooth",
"battery": "AA"
}
},
"pricing": {
"current": 29.99,
"original": 39.99
}
}
]
}
}
To extract product names and prices from this:
cat products.json | jq '.data.products[] | {name: .details.name, price: .pricing.current}'
Output:
{"name":"Wireless Mouse","price":29.99}
You can output this as CSV directly:
cat products.json | jq -r '.data.products[] | [.details.name, .pricing.current] | @csv'
"Wireless Mouse",29.99
Method 3: Extract from Log Files (.log, .txt, .json)
Log files often contain mixed content — one JSON object per line, interspersed with plain text log entries.
[INFO] 2026-06-04 Service started
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":145}
{"event":"request","method":"GET","path":"/api/users","status":401,"duration_ms":32}
[WARN] 2026-06-04 Rate limit approaching
{"event":"request","method":"POST","path":"/api/login","status":200,"duration_ms":89}
Extract only JSON lines:
grep '^{' log.txt | jq '{method, path, status}'
Calculate average response time:
grep '^{' log.txt | jq -s '[.[].duration_ms] | add / length'
Find slow requests:
grep '^{' log.txt | jq 'select(.duration_ms > 100) | {path, duration_ms}'
Method 4: Pick-and-Choose with Python
When jq isn't installed and you need something quick in Python:
import json
with open('logs.json') as f:
logs = [json.loads(line) for line in f if line.startswith('{')]
# Filter and extract
errors = [
{'user': log['userId'], 'time': log['timestamp']}
for log in logs
if log.get('level') == 'ERROR' and log.get('service') == 'auth'
]
# Group by user
from collections import Counter
user_counts = Counter(e['user'] for e in errors)
for user, count in user_counts.most_common():
print(f"{user},{count}")
One-liner version (for quick scripts):
python3 -c "
import json, sys, collections
logs = [json.loads(l) for l in sys.stdin if l.startswith('{')]
errors = [l for l in logs if l.get('level') == 'ERROR']
print(f'Total errors: {len(errors)}')
" < logs.json
Method 5: Transform JSON to Other Formats
Sometimes you need JSON → CSV, JSON → Markdown table, or JSON → YAML.
JSON to CSV with jq:
cat data.json | jq -r '.[] | [.id, .name, .email] | @csv' > output.csv
JSON to Markdown table (manual approach):
echo "| ID | Name | Email |"
echo "|---|---|---|"
cat data.json | jq -r '.[] | "| \(.id) | \(.name) | \(.email) |"'
JSON to array of arrays for spreadsheet import:
cat data.json | jq '[.[] | [.id, .name, .email]]'
Processing Uploaded Files with an Online Tool
If you're working on a machine without jq or Python (or just want visual output), you can upload your .json, .txt, or .log files to an online formatter. The JSON Formatter supports file upload — drag your file in and instantly see its structure in a tree view, making it easy to find and extract the fields you need.
Real Workflow: From Raw Logs to Actionable Data
Here's a complete workflow I used to triage a production incident:
# Step 1: Extract all ERROR lines from multi-format log
grep '^{' app.log > json_only.json
# Step 2: Filter for auth-related errors
cat json_only.json | jq 'select(.service == "auth")' > auth_errors.json
# Step 3: Extract key fields
cat auth_errors.json | jq '{time: .timestamp, user: .userId, ip: .metadata.ip}' > summary.json
# Step 4: Count attempts per IP
cat auth_errors.json | jq -r '.metadata.ip' | sort | uniq -c | sort -rn
# Step 5: Save as CSV report
cat auth_errors.json | jq -r '[.timestamp, .userId, .metadata.ip] | @csv' > report.csv
Total time: about 2 minutes. Time to manually parse 200MB of logs: hours.
Common Pitfalls When Extracting JSON Data
1. Newline-delimited JSON vs JSON array
Some files have one JSON object per line (NDJSON), others have a single JSON array [{...}, {...}]. jq handles both differently:
# NDJSON (one JSON per line)
cat ndjson.log | jq '.name'
# JSON array
cat array.json | jq '.[].name'
2. Escaped JSON inside JSON
Sometimes log entries have a string field that contains escaped JSON:
{"event": "order_created", "payload": "{\"orderId\":\"ord_123\",\"total\":49.99}"}
Extract the nested JSON:
cat logs.json | jq '.payload | fromjson | .orderId'
3. Very large files (100MB+)
Don't try to read the whole file into memory. Use streaming:
# jq streaming mode
cat hugefile.json | jq -n --stream 'fromstream(1|truncate_stream(inputs))'
# Or split and process in chunks
split -l 10000 hugefile.json chunk_
for f in chunk_*; do cat $f | jq '...' > "processed_$f"; done
FAQ
Q: What's the fastest way to extract one field from a huge JSON file?
A: Use jq '.[].fieldName' with streaming mode (--stream). For really huge files, use grep to pre-filter lines before piping to jq.
Q: Can I extract data from .log files that aren't pure JSON?
A: Yes. Pre-filter with grep '^{' to extract only JSON lines, then process with jq or the JSON Formatter.
Q: How do I convert JSON to Excel format?
A: Convert JSON to CSV first, then import into Excel. jq's @csv format handles this cleanly.
Q: What if my JSON has nested objects I want to flatten?
A: jq can flatten with [.. | objects | to_entries[]] | from_entries, or use a dedicated flattening library if you're in Python.
Q: Is there a way to preview a JSON file before extracting data?
A: Yes. Use head -n 5 file.json | jq '.' to see the first few records, or upload the file to the JSON Formatter for a visual tree view.
Q: How do I handle duplicate keys in JSON?
A: JSON technically doesn't allow duplicate keys at the same level, but some parsers accept them, keeping the last value. Deduplicate by converting to an array of unique keys.
Q: Can I extract data from compressed (.gz) JSON files without decompressing?
A: Yes — use zcat file.json.gz | jq '...' to pipe directly from compressed format.
Q: What's the best way to extract a subset of JSON for testing?
A: Use jq '.[:100]' to take the first 100 records, or filter by a condition to get a representative sample.
Next time you're staring at a massive JSON log file wondering how to pull out the data you need, try the piped jq workflow above. Or if you prefer a visual approach, drag your file into the JSON Formatter and use the tree view to explore before you extract.