If you’re using large language models in real products, “the model gave a sensible answer” is not enough.
What you actually need is:
This article walks through a practical framework for turning messy, natural-language LLM outputs into machine-friendly structured JSON, using only prompt design. We’ll cover:
By default, LLMs talk like people: paragraphs, bullet points, and the occasional emoji.
Example request:
A typical answer might be:
Nice for humans. Awful for code.
If you want to:
…you’re forced to regex your way through free text. Any tiny format change breaks your parsing.
- Arrays use [], objects use {}.
- Parsers ship with every mainstream language (json in Python, JSON.parse in JS, etc.).

If the output is valid JSON, parsing is a solved problem.
- Types can be stated explicitly, e.g. “price_gbp must be a number, not "£1,299"”.

Think: user → order list → line items. JSON handles this naturally:

    {
      "user": {
        "name": "Alice",
        "orders": [
          { "product": "Laptop", "price_gbp": 1299 },
          { "product": "Monitor", "price_gbp": 199 }
        ]
      }
    }
Free-text output:
JSON output:

    {
      "laptop_analysis": {
        "analysis_date": "2025-01-01",
        "total_count": 3,
        "laptops": [
          {
            "brand": "Lenovo",
            "model": "Slim 7",
            "screen": { "size_inch": 16, "resolution": "2.5K", "touch_support": false },
            "processor": "Intel i7",
            "price_gbp": 1299
          },
          {
            "brand": "HP",
            "model": "Envy 14",
            "screen": { "size_inch": 14, "resolution": "2.2K", "touch_support": true },
            "processor": "AMD Ryzen 7",
            "price_gbp": 1049
          },
          {
            "brand": "Apple",
            "model": "MacBook Air M2",
            "screen": { "size_inch": 13.6, "resolution": "Retina-class", "touch_support": false },
            "processor": "Apple M2",
            "price_gbp": 1249
          }
        ]
      }
    }
Now your pipeline can do:
    import json

    # `output` is the raw JSON string returned by the model
    data = json.loads(output)
    for laptop in data["laptop_analysis"]["laptops"]:
        ...
No brittle parsing. No surprises.
Getting an LLM to output proper JSON isn’t magic. A robust prompt usually has four ingredients:
Let’s go through them.
You must explicitly fight the model’s “chatty” instinct.
Bad instruction:
You will absolutely get:
    Here is your analysis:
    { ... }
    Hope this helps!
Your parser will absolutely die.
Use strict wording instead:
    You MUST return ONLY valid JSON.
    - Do NOT include any explanations, comments, or extra text.
    - The output must be a single JSON object.
    - If you include any non-JSON content, the result is invalid.
You can go even stricter by wrapping it:
    【HARD REQUIREMENT】
    Return output wrapped between the markers ---BEGIN JSON--- and ---END JSON---.
    Outside these markers there must be NOTHING (no text, no spaces, no newlines).

    Example:
    ---BEGIN JSON---
    {"key": "value"}
    ---END JSON---
Then your code can safely extract the block between those markers before parsing.
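The extraction step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `extract_json_block` is hypothetical, and it deliberately tolerates stray text outside the markers, since models do not always obey the "nothing outside" rule.

```python
import json

BEGIN_MARKER = "---BEGIN JSON---"
END_MARKER = "---END JSON---"

def extract_json_block(raw: str):
    """Parse the JSON found between the two markers in a model reply."""
    start = raw.find(BEGIN_MARKER)
    end = raw.rfind(END_MARKER)
    if start == -1 or end == -1 or end < start:
        raise ValueError("JSON markers not found in model output")
    # Keep only the text strictly between the markers.
    payload = raw[start + len(BEGIN_MARKER):end].strip()
    return json.loads(payload)

raw_output = 'Sure!\n---BEGIN JSON---\n{"key": "value"}\n---END JSON---\n'
print(extract_json_block(raw_output))  # {'key': 'value'}
```

If the markers are missing, the function raises `ValueError` rather than guessing, so the caller can decide whether to retry.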
Don’t leave structure to the model’s imagination. Tell it exactly what object you want.
Example: extracting news metadata.

    {
      "news_extraction": {
        "article_title": "",  // string, full headline
        "publish_time": "",   // string, "YYYY-MM-DD HH:MM", or null
        "source": "",         // string, e.g. "BBC News"
        "author": "",         // string or null
        "key_points": [],     // array of 3–5 strings, each ≤ 50 chars
        "category": "",       // one of: "Politics", "Business", "Tech", "Entertainment", "Sport"
        "word_count": 0       // integer, total word count
      }
    }
Template design tips:
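- Use descriptive key names such as product_name, price_gbp, word_count.
- For missing values, use null instead of an empty string.
- Annotate arrays with their element type, e.g. tags: [] // array of strings, e.g. ["budget", "lightweight"].

This turns the model’s job into “fill in a form”, not “invent whatever feels right”.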
The template defines shape. Validation rules define what’s legal inside that shape.
Examples you can include in the prompt:
You don’t need a full JSON Schema in the prompt, but a few clear bullets like this reduce errors dramatically.
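Rules stated in the prompt can also be re-checked in code after parsing. The sketch below assumes the news_extraction template from earlier and covers only three of its rules (category values, key_points size, word_count type); the function name `validate_news` and the sample data are hypothetical.

```python
ALLOWED_CATEGORIES = {"Politics", "Business", "Tech", "Entertainment", "Sport"}

def validate_news(data: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the object passed."""
    news = data.get("news_extraction")
    if not isinstance(news, dict):
        return ["top-level 'news_extraction' object is missing"]

    errors = []
    category = news.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        errors.append(f"category must be one of {sorted(ALLOWED_CATEGORIES)}")

    points = news.get("key_points")
    if not isinstance(points, list) or not 3 <= len(points) <= 5:
        errors.append("key_points must be an array of 3-5 strings")
    elif any(not isinstance(p, str) or len(p) > 50 for p in points):
        errors.append("each key point must be a string of at most 50 characters")

    count = news.get("word_count")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        errors.append("word_count must be an integer")
    return errors

sample = {
    "news_extraction": {
        "category": "Tech",
        "key_points": ["Startup launches home battery", "Stores off-peak power", "Aims to cut bills"],
        "word_count": "850 words",  # deliberately wrong type
    }
}
print(validate_news(sample))  # only the word_count rule fails
```

Returning a list of messages instead of raising on the first problem makes it easy to feed every violation back to the model in one go.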
Models learn fast by imitation. Give them a mini “input → JSON” pair that matches your task.
Example: news extraction.
Prompt snippet:
Example input article:

    "[Tech] UK startup launches home battery to cut energy bills
    Source: The Guardian
    Author: Jane Smith
    Published: 2024-12-30 10:00
    A London-based climate tech startup has launched a compact home battery designed to help households store cheap off-peak electricity and reduce their energy bills..."

Example JSON output:

    {
      "news_extraction": {
        "article_title": "UK startup launches home battery to cut energy bills",
        "publish_time": "2024-12-30 10:00",
        "source": "The Guardian",
        "author": "Jane Smith",
        "key_points": [
          "London climate tech startup releases compact home battery",
          "Product lets households store off-peak electricity and lower bills",
          "Targets UK homeowners looking to reduce reliance on the grid"
        ],
        "category": "Tech",
        "word_count": 850
      }
    }
Then you append your real article and say:
This single example often bumps JSON correctness from “coin flip” to “production-ready”.
Even with good prompts, you’ll still see issues. Here’s what usually goes wrong and how to fix it.
Why it happens: chatty default behaviour; format instruction too soft.
How to fix:
- Add wrapper markers (---BEGIN JSON--- / ---END JSON---) as shown earlier.

Examples:
Fixes:
    JSON syntax rules:
    - All keys MUST be in double quotes.
    - Use double quotes for strings, never single quotes.
    - No trailing commas after the last element in an object or array.
    - All { [ must have matching } ].
Try json.loads().
If it fails, send the error message back to the model:
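A minimal sketch of that parse-and-retry loop follows. The `call_model` parameter is a hypothetical stand-in for whatever client function sends a prompt and returns the model's text, and the wording of the repair prompt is an assumption, not a fixed recipe.

```python
import json
from typing import Callable

def parse_with_retry(prompt: str, call_model: Callable[[str], str], max_attempts: int = 3):
    """Ask the model for JSON, feeding parser errors back until it parses."""
    current_prompt = prompt
    for _ in range(max_attempts):
        output = call_model(current_prompt)
        try:
            return json.loads(output)
        except json.JSONDecodeError as exc:
            # Show the model its own output plus the exact parser complaint.
            current_prompt = (
                f"{prompt}\n\n"
                "Your previous output was not valid JSON.\n"
                f"Parser error: {exc}\n"
                f"Previous output:\n{output}\n"
                "Return ONLY the corrected JSON."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

# Stub model: first reply has single quotes and a trailing comma, second is valid.
replies = iter(["{'a': 1,}", '{"a": 1}'])
result = parse_with_retry("Return JSON only.", lambda _prompt: next(replies))
print(result)  # {'a': 1}
```

Capping the number of attempts keeps a persistently malformed response from looping forever.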
Examples:

- "price_gbp": "1299.0" instead of 1299.0
- "in_stock": "yes" instead of true
- "word_count": "850 words"

Fixes:

    "price_gbp": 0.0,    // number ONLY, like 1299.0, no currency symbol
    "word_count": 0,     // integer ONLY, like 850, no text
    "in_stock": false    // boolean, must be true or false

    Wrong:   "word_count": "850 words"
    Correct: "word_count": 850
    Wrong:   "touch_support": "yes"
    Correct: "touch_support": true

- Coerce types in post-processing where it is safe (e.g. "1299" → 1299.0), but still log violations.

Examples:
- author omitted even though it existed
- An unrequested summary field appears

Fixes:
    The JSON MUST include exactly these fields:
    article_title, publish_time, source, author, key_points, category, word_count.
    Do NOT add any new fields such as summary, description, tags, etc.
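Both kinds of fix, type coercion and the exact-fields rule, can be backed up in post-processing. The following is a sketch under stated assumptions: the helper names `check_fields` and `coerce_number` and the logger name are hypothetical, and the field list is the one from the instruction above.

```python
import logging

logger = logging.getLogger("llm_json")

REQUIRED_FIELDS = {
    "article_title", "publish_time", "source", "author",
    "key_points", "category", "word_count",
}

def check_fields(record: dict) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) key sets for one extracted record."""
    keys = set(record)
    return REQUIRED_FIELDS - keys, keys - REQUIRED_FIELDS

def coerce_number(value, field: str):
    """Turn a clean numeric string into a float and log the violation."""
    if isinstance(value, str):
        number = float(value)  # raises ValueError for text like "850 words"
        logger.warning("%s arrived as string %r; coerced to %s", field, value, number)
        return number
    return value

record = {"article_title": "t", "source": "s", "summary": "extra"}
missing, unexpected = check_fields(record)
print(sorted(missing))     # the five required keys that are absent
print(sorted(unexpected))  # ['summary']
print(coerce_number("1299", "price_gbp"))  # 1299.0
```

Values that cannot be coerced cleanly, such as "850 words", raise instead of being silently patched, which is a reasonable point to trigger a retry.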
This is where things like arrays of objects containing arrays go sideways.
Fixes:

    "laptops" is an array. Each element is an object with:
    {
      "brand": "",
      "model": "",
      "screen": {
        "size_inch": 0,
        "resolution": "",
        "touch_support": false
      },
      "processor": "",
      "price_gbp": 0
    }
Here are three complete patterns you can lift straight into your own system.
Goal: From a UK shop’s product description, extract key fields like product ID, category, specs, price, stock, etc.
Prompt core:
    Task: Extract key product data from the following product description and return JSON only.

    ### Output requirements
    1. Output MUST be valid JSON, no extra text.
    2. Use this template exactly (do not rename keys):
    {
      "product_info": {
        "product_id": "",        // string, e.g. "P20250201001"
        "product_name": "",      // full name, not abbreviated
        "category": "",          // one of: "Laptop", "Phone", "Appliance", "Clothing", "Food"
        "specifications": [],    // 2–3 core specs as strings
        "price_gbp": 0.0,        // number, price in GBP, e.g. 999.0
        "stock": 0,              // integer, units in stock
        "free_shipping": false,  // boolean, true if free delivery in mainland UK
        "sales_count": 0         // integer, total units sold (0 if not mentioned)
      }
    }
    3. Rules:
    - No "£" symbol in price_gbp, number only.
    - If no product_id mentioned, use "unknown".
    - If no sales info, use 0 for sales_count.

    ### Product text:
    "..."
Example model output (note that the inch marks inside strings must be escaped as \" for the JSON to stay valid):

    {
      "product_info": {
        "product_id": "P20250201005",
        "product_name": "Dell XPS 13 Plus 13.4\" Laptop",
        "category": "Laptop",
        "specifications": [
          "Colour: Platinum",
          "Memory: 16GB RAM, 512GB SSD",
          "Display: 13.4\" OLED, 120Hz"
        ],
        "price_gbp": 1499.0,
        "stock": 42,
        "free_shipping": true,
        "sales_count": 850
      }
    }
In Python, it’s just:
    import json

    # `model_output` is the raw JSON string returned by the model
    data = json.loads(model_output)
    price = data["product_info"]["price_gbp"]
    stock = data["product_info"]["stock"]
And you’re ready to insert into a DB.
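As an illustration of that last step, here is a sketch that writes the parsed record into an in-memory SQLite database. The table name `products`, its column types, and the shortened sample record are assumptions made for the example; the specifications array is stored as a JSON string because SQLite has no array type.

```python
import json
import sqlite3

# Shortened, hypothetical model output that follows the product_info template.
model_output = json.dumps({
    "product_info": {
        "product_id": "P20250201005",
        "product_name": "Dell XPS 13 Plus",
        "category": "Laptop",
        "specifications": ["Colour: Platinum", "Memory: 16GB RAM, 512GB SSD"],
        "price_gbp": 1499.0,
        "stock": 42,
        "free_shipping": True,
        "sales_count": 850,
    }
})
info = json.loads(model_output)["product_info"]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE products (
           product_id TEXT PRIMARY KEY, product_name TEXT, category TEXT,
           specifications TEXT, price_gbp REAL, stock INTEGER,
           free_shipping INTEGER, sales_count INTEGER)"""
)
# Named placeholders keep the insert safe from SQL injection.
conn.execute(
    """INSERT INTO products VALUES (
           :product_id, :product_name, :category, :specifications,
           :price_gbp, :stock, :free_shipping, :sales_count)""",
    {
        **info,
        "specifications": json.dumps(info["specifications"]),
        "free_shipping": int(info["free_shipping"]),
    },
)
conn.commit()

row = conn.execute("SELECT product_name, price_gbp, stock FROM products").fetchone()
print(row)  # ('Dell XPS 13 Plus', 1499.0, 42)
```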
Goal: Take free-text customer feedback and turn it into structured analysis for your support system.
Template:

    {
      "feedback_analysis": {
        "feedback_id": "",    // string, you can generate like "F20250201093001"
        "sentiment": "",      // "Positive" | "Negative" | "Neutral"
        "core_demand": "",    // 10–30 chars summary of what the customer wants
        "issue_type": "",     // "Delivery" | "Quality" | "After-sales" | "Enquiry"
        "urgency_level": 0,   // 1 = low, 2 = medium, 3 = high
        "keywords": []        // 3–4 noun keywords, e.g. ["laptop", "screen crack"]
      }
    }
Rule of thumb for urgency:
Example output:

    {
      "feedback_analysis": {
        "feedback_id": "F20250201093001",
        "sentiment": "Negative",
        "core_demand": "Request replacement or refund for dead-on-arrival laptop",
        "issue_type": "Quality",
        "urgency_level": 3,
        "keywords": ["laptop", "won't turn on", "replacement", "refund"]
      }
    }
Your ticketing system can now:
- Route tickets with urgency_level = 3 to a priority queue.
- Show core_demand instead of a wall of text.

Goal: Turn a “website redesign” paragraph into a structured task list.
Template:

    {
      "project": "Website Redesign",
      "tasks": [
        {
          "task_id": "T001",     // T + 3 digits
          "task_name": "",       // 10–20 chars, clear action
          "owner": "",           // "Product Manager" | "Designer" | "Frontend" | "Backend" | "QA"
          "due_date": "",        // "YYYY-MM-DD", assume project start 2025-02-01
          "priority": "",        // "High" | "Medium" | "Low"
          "dependencies": []     // e.g. ["T001"], [] if none
        }
      ],
      "total_tasks": 0           // number of items in tasks[]
    }
Rules:
Example output (shortened):

    {
      "project": "Website Redesign",
      "tasks": [
        {
          "task_id": "T001",
          "task_name": "Gather detailed redesign requirements",
          "owner": "Product Manager",
          "due_date": "2025-02-03",
          "priority": "High",
          "dependencies": []
        },
        {
          "task_id": "T002",
          "task_name": "Design new homepage and listing UI",
          "owner": "Designer",
          "due_date": "2025-02-08",
          "priority": "High",
          "dependencies": ["T001"]
        },
        {
          "task_id": "T003",
          "task_name": "Implement login and registration backend",
          "owner": "Backend",
          "due_date": "2025-02-13",
          "priority": "High",
          "dependencies": ["T001"]
        }
      ],
      "total_tasks": 3
    }
You can then POST tasks into Jira/Trello with their APIs and auto-create all tickets.
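Before creating tickets, it may be worth checking that the plan is internally consistent. The sketch below assumes the task template above and checks three things: that total_tasks matches the array length, that task IDs are unique, and that every dependency points at a task that exists. The function name `check_task_plan` and the trimmed sample plan are hypothetical.

```python
def check_task_plan(plan: dict) -> list[str]:
    """Return consistency problems found in a parsed task plan."""
    errors = []
    tasks = plan.get("tasks", [])
    ids = [task.get("task_id") for task in tasks]

    if plan.get("total_tasks") != len(tasks):
        errors.append(f"total_tasks is {plan.get('total_tasks')} but tasks has {len(tasks)} items")
    if len(set(ids)) != len(ids):
        errors.append("task_id values are not unique")

    known = set(ids)
    for task in tasks:
        for dep in task.get("dependencies", []):
            if dep not in known:
                errors.append(f"{task.get('task_id')} depends on unknown task {dep}")
    return errors

plan = {
    "project": "Website Redesign",
    "tasks": [
        {"task_id": "T001", "dependencies": []},
        {"task_id": "T002", "dependencies": ["T001"]},
        {"task_id": "T003", "dependencies": ["T009"]},  # deliberately broken reference
    ],
    "total_tasks": 3,
}
print(check_task_plan(plan))  # ['T003 depends on unknown task T009']
```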
To recap:
- Handle JSONDecodeError with error feedback to the model.
- Coerce types where it is safe ("1299" → 1299.0) with logging.

Once you can reliably get structured JSON out of an LLM, you move from:
to:
That’s the real unlock.


