============================================
Problem Package – Placeholder-Free Drafting
============================================
This document captures everything another engineer / model needs to understand
and complete the "placeholder-free legal drafting" task:
1. Problem statement
2. Current file structure (relevant subset)
3. Key existing code (verbatim)
4. Chain-of-thought / proposed solution
5. Reference code patch to implement the fix
--------------------------------------------------
1. Problem statement
--------------------------------------------------
We have a FastAPI + LangChain agent that performs three steps on each user turn:
1. Conversational chain gathers data and decides actions.
2. Document drafter chain produces a legal draft (JSON with fields
`draft` and `is_drafted`).
3. Placeholder-checker chain scans the draft for unresolved tokens such as
`[DATE]`, `[NAME]`, etc.
Issue
-----
β€’ The drafter still returns drafts that contain placeholders.
β€’ The checker detects them, but there is **no feedback loop** – the agent neither
retries nor asks the user for the missing details.
β€’ Users end up with documents containing `[Your Address]`, etc.
Goal
----
Design a **simple, production-grade mechanism** that guarantees placeholder-free
documents while keeping chains modular.
--------------------------------------------------
2. File structure (excerpt)
--------------------------------------------------
agree_upon/
β”œβ”€β”€ api/
β”œβ”€β”€ agent/
β”‚   β”œβ”€β”€ agent_runner.py                    ← orchestrator (snippet patched below)
β”‚   └── chains/
β”‚       β”œβ”€β”€ document_drafter_chain.py      ← full file below
β”‚       β”œβ”€β”€ conversational_legal_chain.py
β”‚       └── placeholder_checker.py
└── probleam_statement.txt                 ← you are here
--------------------------------------------------
3. Key existing code
--------------------------------------------------
A) agent/chains/document_drafter_chain.py (current full content)
----------------------------------------------------------------
```python
"""
agent/chains/document_drafter_chain.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Generates or edits the legal draft *purely* from AgentState + instruction.
INPUT keys:
β€’ document_type – str
β€’ filled_fields_json – str (JSON object of {field: value})
β€’ current_draft – str (may be empty)
β€’ instruction – str (e.g. "create fresh draft" | "swap Party A …")
OUTPUT (strict JSON):
{
"draft": "<complete draft>",
"is_drafted": true
}
"""
import json
import logging
from typing import Dict
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
from agent.llm import llm_chain
logger = logging.getLogger("agent.drafter")
logger.setLevel(logging.DEBUG)
# ────────────────────────────
# JSON schema & parser
# ────────────────────────────
from agent.utils import safe_parse_json_block, _iterate_json_candidates
class SimpleDraftOutputParser:
    def get_format_instructions(self) -> str:
        return 'Return ONLY a JSON object with keys "draft" and "is_drafted".'

    def parse(self, text: str) -> dict:
        # Try all balanced-brace JSON blocks
        for candidate in _iterate_json_candidates(text):
            parsed = safe_parse_json_block(candidate)
            if parsed and 'draft' in parsed and 'is_drafted' in parsed:
                return parsed
        raise ValueError(f"No valid draft JSON found in output:\n{text}")


parser = SimpleDraftOutputParser()
# ────────────────────────────
# Prompt
# ────────────────────────────
_DRAFTER_PROMPT = PromptTemplate(
input_variables=["document_type", "filled_fields_json",
"current_draft", "instruction"],
template=r"""
You are a veteran legal drafter.
────────────────────────────────────────────
πŸ“ Document type: {document_type}
πŸ“‘ Field values (JSON): {filled_fields_json}
πŸ“„ Existing draft: <<<START>>>
{current_draft}
<<<END>>>
πŸ›ˆ Instruction: {instruction}
────────────────────────────────────────────
TASK:
β€’ Produce a *clean*, professional draft in plain text.
β€’ No placeholders like [DATE] – substitute actual field values.
β€’ NO boilerplate like "Here is your draft".
β€’ Return only compact JSON exactly:
{{
"draft": "<the complete draft here>",
"is_drafted": true
}}
No markdown, no commentary.
{format_instructions}
"""
)
def get_document_drafter_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=_DRAFTER_PROMPT.partial(
            format_instructions=parser.get_format_instructions()
        ),
        output_key="text",   # parsed later by agent_runner
        verbose=True,
    )
```
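The parser above leans on `safe_parse_json_block` and `_iterate_json_candidates` from `agent/utils.py`, which are not reproduced in this package. A minimal sketch of what such helpers might look like (an assumption for illustration, not the actual file):
```python
# Hypothetical sketch of the agent/utils.py helpers used by SimpleDraftOutputParser.
import json
from typing import Iterator, Optional


def _iterate_json_candidates(text: str) -> Iterator[str]:
    """Yield every balanced-brace {...} substring found in the LLM output."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                yield text[start:i + 1]
                start = None


def safe_parse_json_block(candidate: str) -> Optional[dict]:
    """Parse one candidate block; return None instead of raising on bad JSON."""
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```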
B) agent/chains/conversational_legal_chain.py (current full content)
---------------------------------------------------------------------
```python
"""
agent/chains/conversational_legal_chain.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Natural conversation chain that:
β€’ Learns/updates `document_type`
β€’ Collects & validates `needed_fields`
β€’ Replies naturally to the user
β€’ Emits a *strict* JSON command set for the runner:
{
"actions": ["update_document_type", "update_needed_values", "update_document"],
"user_reply": "<string>",
"update_document_type": "<DOC_TYPE|NONE>",
"update_needed_values": { "<field>": "<value>", ... } | {},
"update_document_instruction": "<string|NONE>"
}
"""
from typing import Dict, Any
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from agent.llm import llm_chain # β†’ your ChatOpenAI wrapper
_CLS_PROMPT = PromptTemplate(
input_variables=["history", "user_input", "state"],
template=r"""
Act as an *empathetic legal-AI assistant*.
Goal β†’ build an accurate AgentState (shown below) and help the user.
Never violate the guard-rules afterwards.
────────────────────────────────────────────
πŸ“¦ Current AgentState (read-only summary)
{state}
────────────────────────────────────────────
πŸ’¬ Conversation history
{history}
────────────────────────────────────────────
πŸ§‘ Latest user message
{user_input}
────────────────────────────────────────────
🎯 MUST:
1. Carry on a *natural* dialogue. Explain, clarify, warn politely.
2. Decide which of the following *atomic* actions you must take **this turn**:
β€’ `update_document_type` – we just learned / corrected the doc type
β€’ `update_needed_values` – we have one or more field values to add
β€’ `update_document` – ready to draft / revise the draft
You MAY output multiple actions at once.
3. Reply to the user in `user_reply`.
4. Return **ONLY** a valid compact JSON, no markdown, no commentary.
Schema:
{{
"actions": [...], // ⬆ see list
"user_reply": "<your reply>",
"update_document_type": "<type|NONE>",
"update_needed_values": {{ "<field>": "<value>", ... }},
"update_document_instruction": "<instruction|NONE>"
}}
Guard-rules:
β€’ If AgentState.is_drafted is true β†’ changing `document_type` is **forbidden**.
Instead, offer to refine the existing draft.
β€’ Politely warn & double-check if the user’s input seems wrong or contradictory.
"""
)
def get_conversational_legal_chain(memory) -> LLMChain:
    """Factory β€” returns an LLMChain with SQL-buffer memory attached."""
    return LLMChain(
        llm=llm_chain,
        prompt=_CLS_PROMPT,
        memory=memory,
        output_key="text",   # <β€” runner pulls .text then parses JSON
        verbose=True,
    )
```
C) agent/chains/placeholder_checker.py (current full content)
-------------------------------------------------------------
```python
"""
Detect unresolved placeholders in a draft.
Changes
β€’ Returns structured list via PydanticOutputParser
"""
import logging
from typing import List
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from agent.llm import llm_chain
from agent.prompts import PLACEHOLDER_CHECKER_PROMPT
logger = logging.getLogger("agent.placeholder_checker")
logger.setLevel(logging.DEBUG)
# ────────────────────────────
# JSON schema
# ────────────────────────────
class PlaceholderOutput(BaseModel):
    placeholders: List[str] = Field(
        ..., description="List of unresolved placeholder tokens"
    )


parser = PydanticOutputParser(pydantic_object=PlaceholderOutput)
# ────────────────────────────
# Prompt
# ────────────────────────────
prompt = PromptTemplate(
input_variables=["draft"],
template="""
You are reviewing a legal draft for unresolved placeholders such as
[DATE], [NAME], [ADDRESS], [PLACEHOLDER], etc.
Draft:
{draft}
{placeholder_checker_prompt}
Return ONLY a JSON list, e.g. ["[DATE]", "[NAME]"] or [].
"""
)
# ────────────────────────────
# Chain
# ────────────────────────────
def get_placeholder_checker_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=prompt.partial(placeholder_checker_prompt=PLACEHOLDER_CHECKER_PROMPT),
        output_key="text",   # raw text; we'll parse separately
        verbose=True,
    )
```
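Note that the prompt above asks for a bare JSON list, while the `PydanticOutputParser` schema expects an object with a `placeholders` key, so the runner needs a small adapter when parsing the raw output. A sketch of how that bridging might be done today (an assumption; the actual parsing code is not shown in this package):
```python
# Hypothetical adapter: the prompt returns e.g. ["[DATE]"], but PlaceholderOutput
# expects {"placeholders": [...]}; wrap bare lists before validating.
import json


def parse_checker_output(raw_text: str) -> PlaceholderOutput:
    data = json.loads(raw_text.strip())
    if isinstance(data, list):                 # bare-list form requested by the prompt
        data = {"placeholders": data}
    return PlaceholderOutput(**data)
```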
D) agent/agent_runner.py (patched excerpt – placeholder gate)
-------------------------------------------------------------
```python
import datetime
import json
import re

PLACEHOLDER_RE = re.compile(r"\[[^\[\]\n]{1,40}\]")   # e.g. [DATE]
MAX_RETRIES = 2
AUTO_FILL = {
    # NOTE: "%-d" (day without zero-padding) is a glibc/BSD strftime extension;
    # on Windows use "%#d" or strip the padding manually.
    "[DATE]": datetime.date.today().strftime("%B %-d, %Y"),
}
def _find_placeholders(text: str) -> list[str]:
    return PLACEHOLDER_RE.findall(text)


auto_retry_count = 0
while True:
    # drafter already ran β†’ state.draft populated
    # optional quick auto-fill
    for ph, val in AUTO_FILL.items():
        state.draft = state.draft.replace(ph, val)

    missing = _find_placeholders(state.draft)
    if not missing:
        break   # draft clean βœ…

    if auto_retry_count < MAX_RETRIES:
        auto_retry_count += 1
        instr = (
            "Replace the unresolved placeholders "
            + ", ".join(missing)
            + " with concrete values based on filled fields and context."
        )
        drafter_raw = invoke_with_retry(drafter, {
            "document_type": state.document_type,
            "filled_fields_json": json.dumps(state.needed_fields),
            "current_draft": state.draft,
            "instruction": instr,
        })
        state.draft = _extract_text(drafter_raw)
        continue   # loop again

    # after retries, still missing β†’ ask user
    reply_to_user = (
        "I still need the following details to finalise the draft: "
        + ", ".join(missing)
        + ". Could you please provide them?"
    )
    state.needed_fields.update(_map_placeholders_to_fields(missing))
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return   # end turn – conversational chain will ask next time
```
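The excerpt references `invoke_with_retry`, `_extract_text` and `_map_placeholders_to_fields`, which are assumed to live elsewhere in `agent_runner.py`. For completeness, a rough sketch of what the placeholder-to-field mapping might look like (a hypothetical helper, not the actual code):
```python
# Hypothetical sketch of _map_placeholders_to_fields referenced in the excerpt above.
def _map_placeholders_to_fields(placeholders: list[str]) -> dict[str, str]:
    """Turn tokens like "[PARTY_ADDRESS]" into empty needed-field entries
    such as {"party_address": ""} so the conversational chain asks for them."""
    fields: dict[str, str] = {}
    for token in placeholders:
        name = token.strip("[]").strip().lower().replace(" ", "_")
        fields.setdefault(name, "")
    return fields
```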
--------------------------------------------------
5. Reference patch checklist
--------------------------------------------------
βœ“ Add `PLACEHOLDER_RE`, `MAX_RETRIES`, `_find_placeholders` and loop to
`agent_runner.py`.
βœ“ Optionally extend `AUTO_FILL` map.
βœ“ No other files need modification.
--- End of package ---
--------------------------------------------------
Solution (chain-of-thought / proposed approach)
--------------------------------------------------
# Upgrade Guide β€” β€œAsk-Until-Clean” Drafting
*(logic explanation + concrete change list you can paste into Vibe Coding)*
---
## 🌐 How the new flow works (plain English)
1. **Generate a draft.**
`document_drafter_chain` turns the currently-known field values into a full document.
2. **Check the draft for leftover tokens** like `[DATE]`, `[PARTY_ADDRESS]`.
`placeholder_checker_chain` now returns one JSON object:
```json
{ "is_success": true|false, "missing_desc": "natural language list of what’s still empty" }
```
3. **Branch on the result**
* **Success (`is_success == true`)** β†’ send the polished draft to the user, stop.
* **Failure (`is_success == false`)** β†’ fall back to the conversational chain:
> β€œI am an internal checker. The draft is missing: *effective date, addresses* β€” please ask the user for these details.”
The conversational chain politely asks the user, captures the answers, and emits `update_needed_values`. The runner merges those into state and goes back to step 1.
4. **Guard against endless loops.**
If the agent has already asked twice (`MAX_USER_PROMPTS = 2`) and the user still hasn’t supplied the data, it exits gracefully:
> β€œI’m still missing details. Let’s continue once you have them.”
This guarantees either a placeholder-free draft *or* a controlled, polite stop.
---
## πŸ›  What to change (file-by-file)
| File | Change |
| --- | --- |
| **`agent/chains/placeholder_checker.py`** | *Replace* the Pydantic model with:<br>`class PlaceholderCheckOut(BaseModel): is_success: bool; missing_desc: str`<br>*Replace* the prompt with the logic shown below. |
| **`AgentState` dataclass / pydantic model** | Add `missing_prompt_count: int = 0` (sketch below). |
| **`agent/agent_runner.py`** | Remove the old regex-retry loop. Add the new orchestration block (β‰ˆ25 lines) shown below. |
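Assuming `AgentState` is a Pydantic model (its definition is not included in this package), the counter is a one-line addition; a sketch with field names mirrored from the runner excerpt:

```python
# Sketch only – the real AgentState has more fields than shown here.
from pydantic import BaseModel, Field


class AgentState(BaseModel):
    document_type: str = ""
    needed_fields: dict[str, str] = Field(default_factory=dict)
    draft: str = ""
    is_drafted: bool = False
    missing_prompt_count: int = 0   # NEW: times the agent has already asked for missing details
```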
### πŸ”‘ New checker prompt (copy verbatim)
```
Scan the draft below.
β€’ If NO tokens like [DATE] remain:
return {"is_success": true, "missing_desc": ""}
β€’ Otherwise:
return {"is_success": false, "missing_desc": "<short English list of what’s missing>"}
Return ONLY that JSON. No other text.
Draft:
{draft}
```
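Putting the new model and prompt together, the revised checker module could look roughly like the sketch below (assumptions, not a drop-in file; note that literal JSON braces must be escaped as `{{ }}` inside a `PromptTemplate`):

```python
# Sketch of the revised agent/chains/placeholder_checker.py under the assumptions above.
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel

from agent.llm import llm_chain


class PlaceholderCheckOut(BaseModel):
    is_success: bool
    missing_desc: str


checker_parser = PydanticOutputParser(pydantic_object=PlaceholderCheckOut)

_CHECK_PROMPT = PromptTemplate(
    input_variables=["draft"],
    template="""
Scan the draft below.
β€’ If NO tokens like [DATE] remain:
    return {{"is_success": true, "missing_desc": ""}}
β€’ Otherwise:
    return {{"is_success": false, "missing_desc": "<short English list of what's missing>"}}
Return ONLY that JSON. No other text.
Draft:
{draft}
""",
)


def get_placeholder_checker_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=_CHECK_PROMPT,
        output_key="text",   # raw text; the runner parses it with checker_parser
        verbose=True,
    )
```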
### πŸ”‘ New runner logic (insert where the old placeholder code was)
```python
MAX_USER_PROMPTS = 2

draft_raw = drafter.run({...})                          # drafter inputs elided – same as before
draft = drafter_parser.parse(draft_raw)["draft"]

check_raw = placeholder_checker.run({"draft": draft})
check = checker_parser.parse(check_raw)

if check.is_success:                                    # ➊ clean β†’ finish
    state.draft = draft
    reply_to_user = "βœ… All set! Your document is fully drafted. Let me know if you'd like any edits."
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return

if state.missing_prompt_count >= MAX_USER_PROMPTS:      # βž‹ give up politely
    reply_to_user = (
        "I'm still missing details (" + check.missing_desc +
        "). Let's continue once you have them."
    )
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return

state.missing_prompt_count += 1                         # ➌ ask user
system_addition = (
    "I am an internal checker. The draft is missing: " +
    check.missing_desc +
    ". Please ask the user for these details."
)
# NOTE: assumes _CLS_PROMPT is extended with a {system_addition} input variable.
conv_raw = conversational_chain.run({
    "system_addition": system_addition,
    **standard_inputs,
})
conv_cmds = conv_parser.parse(conv_raw)
_apply_commands(conv_cmds)   # merges user answers into state
return                       # wait for next turn
```
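`conv_parser` and `_apply_commands` are referenced above but not shown in this package. A minimal sketch of the command-merging step, assuming the conversational chain's JSON schema from section 3:

```python
# Hypothetical sketch – merges the conversational chain's JSON commands into state.
def _apply_commands(cmds: dict) -> None:
    actions = cmds.get("actions", [])
    if "update_document_type" in actions and cmds.get("update_document_type", "NONE") != "NONE":
        state.document_type = cmds["update_document_type"]
    if "update_needed_values" in actions:
        # Newly supplied field values; the drafter re-runs with them next turn.
        state.needed_fields.update(cmds.get("update_needed_values") or {})
    # Persist the chain's natural-language reply so the user sees the follow-up question.
    _persist(memory, conv_id, user_input, cmds.get("user_reply", ""), state)
```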
---
## πŸ”’ Safety & validation
* **Type checking:** the conversational chain prompt already instructs the LLM to ensure answers match expected formats.
* **Second line of defence (optional):** after `_apply_commands`, run lightweight regex/date parsers; if a value fails validation, ignore it and prompt again (see the sketch below).
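For that optional second line of defence, a lightweight check could be as simple as the following (a sketch; the field-name heuristics and accepted date formats are assumptions):

```python
# Sketch: cheap post-merge validation – drop obviously malformed values so the
# agent asks again instead of drafting with bad data.
import re
from datetime import datetime


def _looks_valid(field: str, value: str) -> bool:
    value = value.strip()
    if not value:
        return False
    if "date" in field.lower():
        for fmt in ("%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y"):
            try:
                datetime.strptime(value, fmt)
                return True
            except ValueError:
                continue
        return False
    if "email" in field.lower():
        return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None
    return True   # everything else: accept free text

# usage after _apply_commands:
# state.needed_fields = {k: v for k, v in state.needed_fields.items() if _looks_valid(k, v)}
```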
---
### Result
*User sees at most two polite follow-up questions; you never deliver a document with `[PLACEHOLDERS]`; and the codebase changes only in one chain, one dataclass field, and one runner block.*