============================================
Problem Package – Placeholder-Free Drafting
============================================
This document captures everything another engineer / model needs to understand
and complete the "placeholder-free legal drafting" task:
1. Problem statement
2. Current file structure (relevant subset)
3. Key existing code (verbatim)
4. Chain-of-thought / proposed solution
5. Reference code patch to implement the fix
--------------------------------------------------
1. Problem statement
--------------------------------------------------
We have a FastAPI + LangChain agent that performs three steps on each user turn:
1. Conversational chain gathers data and decides actions.
2. Document drafter chain produces a legal draft (JSON with fields
`draft` and `is_drafted`).
3. Placeholder-checker chain scans the draft for unresolved tokens such as
`[DATE]`, `[NAME]`, etc.
Issue
-----
β€’ The drafter still returns drafts that contain placeholders.
β€’ The checker detects them, but there is **no feedback loop** – the agent neither
retries nor asks the user for the missing details.
β€’ Users end up with documents containing `[Your Address]`, etc.
Goal
----
Design a **simple, production-grade mechanism** that guarantees placeholder-free
documents while keeping chains modular.
--------------------------------------------------
2. File structure (excerpt)
--------------------------------------------------
agree_upon/
β”œβ”€β”€ api/
β”œβ”€β”€ agent/
β”‚   β”œβ”€β”€ agent_runner.py                    ← orchestrator (snippet patched below)
β”‚   └── chains/
β”‚       β”œβ”€β”€ document_drafter_chain.py      ← full file below
β”‚       β”œβ”€β”€ conversational_legal_chain.py
β”‚       └── placeholder_checker.py
└── probleam_statement.txt                 ← you are here
--------------------------------------------------
3. Key existing code
--------------------------------------------------
A) agent/chains/document_drafter_chain.py (current full content)
----------------------------------------------------------------
```python
"""
agent/chains/document_drafter_chain.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Generates or edits the legal draft *purely* from AgentState + instruction.
INPUT keys:
β€’ document_type – str
β€’ filled_fields_json – str (JSON object of {field: value})
β€’ current_draft – str (may be empty)
β€’ instruction – str (e.g. "create fresh draft" | "swap Party A …")
OUTPUT (strict JSON):
{
"draft": "<complete draft>",
"is_drafted": true
}
"""
import json
import logging
from typing import Dict
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
from agent.llm import llm_chain
logger = logging.getLogger("agent.drafter")
logger.setLevel(logging.DEBUG)
# ────────────────────────────
# JSON schema & parser
# ────────────────────────────
from agent.utils import safe_parse_json_block, _iterate_json_candidates
class SimpleDraftOutputParser:
    def get_format_instructions(self) -> str:
        return 'Return ONLY a JSON object with keys "draft" and "is_drafted".'

    def parse(self, text: str) -> dict:
        # Try all balanced-brace JSON blocks
        for candidate in _iterate_json_candidates(text):
            parsed = safe_parse_json_block(candidate)
            if parsed and 'draft' in parsed and 'is_drafted' in parsed:
                return parsed
        raise ValueError(f"No valid draft JSON found in output:\n{text}")


parser = SimpleDraftOutputParser()
# ────────────────────────────
# Prompt
# ────────────────────────────
_DRAFTER_PROMPT = PromptTemplate(
input_variables=["document_type", "filled_fields_json",
"current_draft", "instruction"],
template=r"""
You are a veteran legal drafter.
────────────────────────────────────────────
πŸ“ Document type: {document_type}
πŸ“‘ Field values (JSON): {filled_fields_json}
πŸ“„ Existing draft: <<<START>>>
{current_draft}
<<<END>>>
πŸ›ˆ Instruction: {instruction}
────────────────────────────────────────────
TASK:
β€’ Produce a *clean*, professional draft in plain text.
β€’ No placeholders like [DATE] – substitute actual field values.
β€’ NO boilerplate like "Here is your draft".
β€’ Return only compact JSON exactly:
{{
"draft": "<the complete draft here>",
"is_drafted": true
}}
No markdown, no commentary.
{format_instructions}
"""
)
def get_document_drafter_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=_DRAFTER_PROMPT.partial(
            format_instructions=parser.get_format_instructions()
        ),
        output_key="text",   # parsed later by agent_runner
        verbose=True,
    )
```
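The parser above leans on `safe_parse_json_block` and `_iterate_json_candidates` from `agent/utils.py`, which are not reproduced in this package. A minimal sketch of what such helpers might look like (an assumption for illustration, not the actual file):
```python
# Hypothetical sketch of the agent/utils.py helpers used by SimpleDraftOutputParser.
import json
from typing import Iterator, Optional


def _iterate_json_candidates(text: str) -> Iterator[str]:
    """Yield every balanced-brace {...} substring found in the LLM output."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                yield text[start:i + 1]
                start = None


def safe_parse_json_block(candidate: str) -> Optional[dict]:
    """Parse one candidate block; return None instead of raising on bad JSON."""
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```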
B) agent/chains/conversational_legal_chain.py (current full content)
---------------------------------------------------------------------
```python
"""
agent/chains/conversational_legal_chain.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Natural conversation chain that:
β€’ Learns/updates `document_type`
β€’ Collects & validates `needed_fields`
β€’ Replies naturally to the user
β€’ Emits a *strict* JSON command set for the runner:
{
"actions": ["update_document_type", "update_needed_values", "update_document"],
"user_reply": "<string>",
"update_document_type": "<DOC_TYPE|NONE>",
"update_needed_values": { "<field>": "<value>", ... } | {},
"update_document_instruction": "<string|NONE>"
}
"""
from typing import Dict, Any
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from agent.llm import llm_chain # β†’ your ChatOpenAI wrapper
_CLS_PROMPT = PromptTemplate(
input_variables=["history", "user_input", "state"],
template=r"""
Act as an *empathetic legal-AI assistant*.
Goal β†’ build an accurate AgentState (shown below) and help the user.
Never violate the guard-rules afterwards.
────────────────────────────────────────────
πŸ“¦ Current AgentState (read-only summary)
{state}
────────────────────────────────────────────
πŸ’¬ Conversation history
{history}
────────────────────────────────────────────
πŸ§‘ Latest user message
{user_input}
────────────────────────────────────────────
🎯 MUST:
1. Carry on a *natural* dialogue. Explain, clarify, warn politely.
2. Decide which of the following *atomic* actions you must take **this turn**:
β€’ `update_document_type` – we just learned / corrected the doc type
β€’ `update_needed_values` – we have one or more field values to add
β€’ `update_document` – ready to draft / revise the draft
You MAY output multiple actions at once.
3. Reply to the user in `user_reply`.
4. Return **ONLY** a valid compact JSON, no markdown, no commentary.
Schema:
{{
"actions": [...], // ⬆ see list
"user_reply": "<your reply>",
"update_document_type": "<type|NONE>",
"update_needed_values": {{ "<field>": "<value>", ... }},
"update_document_instruction": "<instruction|NONE>"
}}
Guard-rules:
β€’ If AgentState.is_drafted is true β†’ changing `document_type` is **forbidden**.
Instead, offer to refine the existing draft.
β€’ Politely warn & double-check if the user’s input seems wrong or contradictory.
"""
)
def get_conversational_legal_chain(memory) -> LLMChain:
    """Factory β€” returns an LLMChain with SQL-buffer memory attached."""
    return LLMChain(
        llm=llm_chain,
        prompt=_CLS_PROMPT,
        memory=memory,
        output_key="text",   # <β€” runner pulls .text then parses JSON
        verbose=True,
    )
```
C) agent/chains/placeholder_checker.py (current full content)
-------------------------------------------------------------
```python
"""
Detect unresolved placeholders in a draft.
Changes
β€’ Returns structured list via PydanticOutputParser
"""
import logging
from typing import List
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from agent.llm import llm_chain
from agent.prompts import PLACEHOLDER_CHECKER_PROMPT
logger = logging.getLogger("agent.placeholder_checker")
logger.setLevel(logging.DEBUG)
# ────────────────────────────
# JSON schema
# ────────────────────────────
class PlaceholderOutput(BaseModel):
    placeholders: List[str] = Field(
        ..., description="List of unresolved placeholder tokens"
    )


parser = PydanticOutputParser(pydantic_object=PlaceholderOutput)
# ────────────────────────────
# Prompt
# ────────────────────────────
prompt = PromptTemplate(
input_variables=["draft"],
template="""
You are reviewing a legal draft for unresolved placeholders such as
[DATE], [NAME], [ADDRESS], [PLACEHOLDER], etc.
Draft:
{draft}
{placeholder_checker_prompt}
Return ONLY a JSON list, e.g. ["[DATE]", "[NAME]"] or [].
"""
)
# ────────────────────────────
# Chain
# ────────────────────────────
def get_placeholder_checker_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=prompt.partial(placeholder_checker_prompt=PLACEHOLDER_CHECKER_PROMPT),
        output_key="text",   # raw text; we'll parse separately
        verbose=True,
    )
```
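Note that the prompt above asks for a bare JSON list, while the `PydanticOutputParser` schema expects an object with a `placeholders` key, so the runner needs a small adapter when parsing the raw output. A sketch of how that bridging might be done today (an assumption; the actual parsing code is not shown in this package):
```python
# Hypothetical adapter: the prompt returns e.g. ["[DATE]"], but PlaceholderOutput
# expects {"placeholders": [...]}; wrap bare lists before validating.
import json


def parse_checker_output(raw_text: str) -> PlaceholderOutput:
    data = json.loads(raw_text.strip())
    if isinstance(data, list):                 # bare-list form requested by the prompt
        data = {"placeholders": data}
    return PlaceholderOutput(**data)
```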
D) agent/agent_runner.py (patched excerpt – placeholder gate)
-------------------------------------------------------------
```python
import datetime
import json
import re

PLACEHOLDER_RE = re.compile(r"\[[^\[\]\n]{1,40}\]")   # e.g. [DATE]
MAX_RETRIES = 2
AUTO_FILL = {
    # NOTE: "%-d" (day without zero-padding) is a glibc/BSD strftime extension;
    # on Windows use "%#d" or strip the padding manually.
    "[DATE]": datetime.date.today().strftime("%B %-d, %Y"),
}
def _find_placeholders(text: str) -> list[str]:
    return PLACEHOLDER_RE.findall(text)


auto_retry_count = 0
while True:
    # drafter already ran β†’ state.draft populated
    # optional quick auto-fill
    for ph, val in AUTO_FILL.items():
        state.draft = state.draft.replace(ph, val)

    missing = _find_placeholders(state.draft)
    if not missing:
        break   # draft clean βœ…

    if auto_retry_count < MAX_RETRIES:
        auto_retry_count += 1
        instr = (
            "Replace the unresolved placeholders "
            + ", ".join(missing)
            + " with concrete values based on filled fields and context."
        )
        drafter_raw = invoke_with_retry(drafter, {
            "document_type": state.document_type,
            "filled_fields_json": json.dumps(state.needed_fields),
            "current_draft": state.draft,
            "instruction": instr,
        })
        state.draft = _extract_text(drafter_raw)
        continue   # loop again

    # after retries, still missing β†’ ask user
    reply_to_user = (
        "I still need the following details to finalise the draft: "
        + ", ".join(missing)
        + ". Could you please provide them?"
    )
    state.needed_fields.update(_map_placeholders_to_fields(missing))
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return   # end turn – conversational chain will ask next time
```
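The excerpt references `invoke_with_retry`, `_extract_text` and `_map_placeholders_to_fields`, which are assumed to live elsewhere in `agent_runner.py`. For completeness, a rough sketch of what the placeholder-to-field mapping might look like (a hypothetical helper, not the actual code):
```python
# Hypothetical sketch of _map_placeholders_to_fields referenced in the excerpt above.
def _map_placeholders_to_fields(placeholders: list[str]) -> dict[str, str]:
    """Turn tokens like "[PARTY_ADDRESS]" into empty needed-field entries
    such as {"party_address": ""} so the conversational chain asks for them."""
    fields: dict[str, str] = {}
    for token in placeholders:
        name = token.strip("[]").strip().lower().replace(" ", "_")
        fields.setdefault(name, "")
    return fields
```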
--------------------------------------------------
5. Reference patch checklist
--------------------------------------------------
βœ“ Add `PLACEHOLDER_RE`, `MAX_RETRIES`, `_find_placeholders` and loop to
`agent_runner.py`.
βœ“ Optionally extend `AUTO_FILL` map.
βœ“ No other files need modification.
--- End of package ---
--------------------------------------------------
Solution (chain-of-thought / proposed approach)
--------------------------------------------------
# Upgrade Guide β€” β€œAsk-Until-Clean” Drafting
*(logic explanation + concrete change list you can paste into Vibe Coding)*
---
## 🌐 How the new flow works (plain English)
1. **Generate a draft.**
`document_drafter_chain` turns the currently-known field values into a full document.
2. **Check the draft for leftover tokens** like `[DATE]`, `[PARTY_ADDRESS]`.
`placeholder_checker_chain` now returns one JSON object:
```json
{ "is_success": true|false, "missing_desc": "natural language list of what’s still empty" }
```
3. **Branch on the result**
* **Success (`is_success == true`)** β†’ send the polished draft to the user, stop.
* **Failure (`is_success == false`)** β†’ fall back to the conversational chain:
> β€œI am an internal checker. The draft is missing: *effective date, addresses* β€” please ask the user for these details.”
The conversational chain politely asks the user, captures the answers, and emits `update_needed_values`. The runner merges those into state and goes back to step 1.
4. **Guard against endless loops.**
If the agent has already asked twice (`MAX_USER_PROMPTS = 2`) and the user still hasn’t supplied the data, it exits gracefully:
> β€œI’m still missing details. Let’s continue once you have them.”
This guarantees either a placeholder-free draft *or* a controlled, polite stop.
---
## πŸ›  What to change (file-by-file)
| File | Change |
| --- | --- |
| **`agent/chains/placeholder_checker.py`** | *Replace* the Pydantic model with:<br>`class PlaceholderCheckOut(BaseModel): is_success: bool; missing_desc: str`<br>*Replace* the prompt with the logic shown below. |
| **`AgentState` dataclass / pydantic model** | Add `missing_prompt_count: int = 0` (sketch below). |
| **`agent/agent_runner.py`** | Remove the old regex-retry loop. Add the new orchestration block (β‰ˆ25 lines) shown below. |
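Assuming `AgentState` is a Pydantic model (its definition is not included in this package), the counter is a one-line addition; a sketch with field names mirrored from the runner excerpt:

```python
# Sketch only – the real AgentState has more fields than shown here.
from pydantic import BaseModel, Field


class AgentState(BaseModel):
    document_type: str = ""
    needed_fields: dict[str, str] = Field(default_factory=dict)
    draft: str = ""
    is_drafted: bool = False
    missing_prompt_count: int = 0   # NEW: times the agent has already asked for missing details
```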
### πŸ”‘ New checker prompt (copy verbatim)
```
Scan the draft below.
β€’ If NO tokens like [DATE] remain:
return {"is_success": true, "missing_desc": ""}
β€’ Otherwise:
return {"is_success": false, "missing_desc": "<short English list of what’s missing>"}
Return ONLY that JSON. No other text.
Draft:
{draft}
```
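Putting the new model and prompt together, the revised checker module could look roughly like the sketch below (assumptions, not a drop-in file; note that literal JSON braces must be escaped as `{{ }}` inside a `PromptTemplate`):

```python
# Sketch of the revised agent/chains/placeholder_checker.py under the assumptions above.
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel

from agent.llm import llm_chain


class PlaceholderCheckOut(BaseModel):
    is_success: bool
    missing_desc: str


checker_parser = PydanticOutputParser(pydantic_object=PlaceholderCheckOut)

_CHECK_PROMPT = PromptTemplate(
    input_variables=["draft"],
    template="""
Scan the draft below.
β€’ If NO tokens like [DATE] remain:
    return {{"is_success": true, "missing_desc": ""}}
β€’ Otherwise:
    return {{"is_success": false, "missing_desc": "<short English list of what's missing>"}}
Return ONLY that JSON. No other text.
Draft:
{draft}
""",
)


def get_placeholder_checker_chain() -> LLMChain:
    return LLMChain(
        llm=llm_chain,
        prompt=_CHECK_PROMPT,
        output_key="text",   # raw text; the runner parses it with checker_parser
        verbose=True,
    )
```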
### πŸ”‘ New runner logic (insert where the old placeholder code was)
```python
MAX_USER_PROMPTS = 2

draft_raw = drafter.run({...})                          # drafter inputs elided – same as before
draft = drafter_parser.parse(draft_raw)["draft"]

check_raw = placeholder_checker.run({"draft": draft})
check = checker_parser.parse(check_raw)

if check.is_success:                                    # ➊ clean β†’ finish
    state.draft = draft
    reply_to_user = "βœ… All set! Your document is fully drafted. Let me know if you'd like any edits."
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return

if state.missing_prompt_count >= MAX_USER_PROMPTS:      # βž‹ give up politely
    reply_to_user = (
        "I'm still missing details (" + check.missing_desc +
        "). Let's continue once you have them."
    )
    _persist(memory, conv_id, user_input, reply_to_user, state)
    return

state.missing_prompt_count += 1                         # ➌ ask user
system_addition = (
    "I am an internal checker. The draft is missing: " +
    check.missing_desc +
    ". Please ask the user for these details."
)
# NOTE: assumes _CLS_PROMPT is extended with a {system_addition} input variable.
conv_raw = conversational_chain.run({
    "system_addition": system_addition,
    **standard_inputs,
})
conv_cmds = conv_parser.parse(conv_raw)
_apply_commands(conv_cmds)   # merges user answers into state
return                       # wait for next turn
```
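`conv_parser` and `_apply_commands` are referenced above but not shown in this package. A minimal sketch of the command-merging step, assuming the conversational chain's JSON schema from section 3:

```python
# Hypothetical sketch – merges the conversational chain's JSON commands into state.
def _apply_commands(cmds: dict) -> None:
    actions = cmds.get("actions", [])
    if "update_document_type" in actions and cmds.get("update_document_type", "NONE") != "NONE":
        state.document_type = cmds["update_document_type"]
    if "update_needed_values" in actions:
        # Newly supplied field values; the drafter re-runs with them next turn.
        state.needed_fields.update(cmds.get("update_needed_values") or {})
    # Persist the chain's natural-language reply so the user sees the follow-up question.
    _persist(memory, conv_id, user_input, cmds.get("user_reply", ""), state)
```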
---
## πŸ”’ Safety & validation
* **Type checking:** the conversational chain prompt already instructs the LLM to ensure answers match expected formats.
* **Second line of defence (optional):** after `_apply_commands`, run lightweight regex/date parsers; if a value fails validation, ignore it and prompt again (see the sketch below).
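For that optional second line of defence, a lightweight check could be as simple as the following (a sketch; the field-name heuristics and accepted date formats are assumptions):

```python
# Sketch: cheap post-merge validation – drop obviously malformed values so the
# agent asks again instead of drafting with bad data.
import re
from datetime import datetime


def _looks_valid(field: str, value: str) -> bool:
    value = value.strip()
    if not value:
        return False
    if "date" in field.lower():
        for fmt in ("%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y"):
            try:
                datetime.strptime(value, fmt)
                return True
            except ValueError:
                continue
        return False
    if "email" in field.lower():
        return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None
    return True   # everything else: accept free text

# usage after _apply_commands:
# state.needed_fields = {k: v for k, v in state.needed_fields.items() if _looks_valid(k, v)}
```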
---
### Result
*User sees at most two polite follow-up questions; you never deliver a document with `[PLACEHOLDERS]`; and the codebase changes only in one chain, one dataclass field, and one runner block.*