Hi, @Leonardo Vieira
Root Cause Analysis:
The inconsistent behavior (the "Flakiness Test" failing) indicates instruction ambiguity rather than configuration errors. When AVA exhibits ~80% success rates, it's typically due to:
- Locality Violation: Instructions placed too far from the decision point are less reliably followed. Global Guidelines are weaker than Tool-level Pre-instructions.
- Conflicting Instruction Layers: Contradictions between Global (Role), Behavioral (Guidelines), Tool Instructions, and Post-Action instructions.
- Ambiguous Tool Descriptions: Tool name/description overlap causing the APT-1 model to misidentify when to invoke specific tools.
The Locality Hierarchy (Most to Least Reliable):
Post-Action Instructions (on_success_when) [Most Reliable]
↓
Tool-Level Pre-instructions
↓
Behavior Guidelines
↓
Global Instructions (Role/Setting) [Least Reliable]
Debugging Approach (6-Step Protocol):
1. Identify Symptom: Categorize the issue (looping, wrong function call, ignored instruction, repetitive questioning)
2. Check Locality: Verify that the relevant instruction is close enough to the decision point
   - Move tool-specific guidance from Global Guidelines to Tool Pre-instructions
3. Check Ambiguity: Audit all tool descriptions for overlap
   - Example: ❌ "get_user_details" vs ✅ "get_user_checking_details" (more specific)
4. Check Conflicts: Review for contradictory instructions across layers
   - Global vs. local instruction conflicts
5. Add Examples: Implement WRONG/RIGHT examples for the desired behavior
   - Add deterministic constraints for critical flow points
6. Test Variations: Verify whether the issue reproduces consistently or intermittently
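For the last step, a minimal sketch of a repeat-run check may help separate consistent failures from intermittent ones. It assumes you can wrap one test conversation in a function that returns True on success; `run_once` and `make_fake_scenario` are hypothetical names used only for illustration, not part of any AVA API.

```python
from typing import Callable


def classify_reproducibility(run_once: Callable[[], bool], trials: int = 20) -> tuple[float, str]:
    """Run the same scenario `trials` times and label the result."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if run_once())
    rate = passes / trials
    if rate == 1.0:
        label = "stable"
    elif rate == 0.0:
        label = "consistent failure"
    else:
        label = "intermittent"
    return rate, label


def make_fake_scenario(fail_every: int) -> Callable[[], bool]:
    """Stand-in for a real test conversation: fails on every Nth call."""
    calls = 0

    def run_once() -> bool:
        nonlocal calls
        calls += 1
        return calls % fail_every != 0

    return run_once


rate, label = classify_reproducibility(make_fake_scenario(5), trials=20)
print(f"{rate:.0%} {label}")  # prints "80% intermittent"
```

An "intermittent" result is the pattern that points toward instruction ambiguity; a "consistent failure" is more likely a configuration error.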
Remediation Best Practices:
For Repetitive Information Requests:
- Add explicit "do not re-ask" instruction to the specific tool's Pre-instructions (not Global Guidelines)
- Implement context tracking in Start Context variables
For Tool-Firing Issues:
- Use verb-based naming: validate_login_code, not check_user
- Add domain-specific clarity: pay_bill_to_electricity_provider, not tool_get_stuff
- Specify in Description: what it does, parameters needed, what it returns
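As a rough illustration of the three-part description, the sketch below assembles "what it does, parameters needed, what it returns" into one string. `ToolSpec` and its fields are hypothetical and do not reflect the actual AVA tool schema; the parameter and return wording is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical container for a tool's name and description parts."""
    name: str
    does: str
    parameters: dict[str, str] = field(default_factory=dict)
    returns: str = ""

    def description(self) -> str:
        # Join parameters as "name: meaning"; say "none" when there are no parameters.
        params = "; ".join(f"{k}: {v}" for k, v in self.parameters.items()) or "none"
        return f"{self.does} Parameters: {params}. Returns: {self.returns}"


spec = ToolSpec(
    name="validate_login_code",
    does="Validates the login code the customer provides.",
    parameters={"login_code": "the code supplied by the customer"},
    returns="whether the code is valid.",
)
print(spec.name)
print(spec.description())
```

Writing the description from fixed parts makes it harder to leave out the return value or the parameters, which is where generic descriptions usually fall short.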
The Reinforcement Pattern (Critical Behaviors):
For mission-critical behaviors, apply redundancy across multiple layers:
- Global level: "Do not use numbered lists; use comma-separated items instead."
- Tool level: "When presenting results, list items separated by commas, not numbers."
Additional Technical Considerations:
- Context Window Optimization: Implement variable masking for authentication tokens/UUIDs to reduce context size and improve APT-1 reasoning performance
- Start Context Variables: Must match Architect flow outputs using underscore naming conventions (e.g., customer_name, in_person_banking_available)
- Violation Management: Set the violation limit to 3 for production; a lower limit can terminate sessions prematurely
- Knowledge Fallback: Always add "do not fabricate" instruction to Knowledge outcome instructions to prevent hallucination
- Post-Action Instructions: Define explicit on_success_when handlers for each tool outcome path
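To illustrate the variable-masking idea, here is a generic sketch that swaps UUIDs for short placeholders before text reaches the model and restores them afterwards. It is only an assumption about how masking could work outside the platform; `mask_uuids` and `unmask` are hypothetical helpers, and the sketch covers UUIDs only, not authentication tokens.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def mask_uuids(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct UUID with a short placeholder such as <ID_1>."""
    seen: dict[str, str] = {}     # original UUID -> placeholder
    mapping: dict[str, str] = {}  # placeholder -> original UUID

    def _substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            placeholder = f"<ID_{len(seen) + 1}>"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return UUID_RE.sub(_substitute, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values before calling downstream systems."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


raw = "account 123e4567-e89b-12d3-a456-426614174000 linked to 123e4567-e89b-12d3-a456-426614174000"
masked, mapping = mask_uuids(raw)
print(masked)  # prints "account <ID_1> linked to <ID_1>"
```

The same UUID always maps to the same placeholder, so the model can still tell that two references point to the same entity.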
Configuration Review Red Flags:
- Tool descriptions are generic or overlap with other tools
- "Do not re-ask" only in Global Guidelines (not Tool Pre-instructions)
- Missing Post-Action instructions for tool outcomes
- Behavioral guidance mistakenly placed in Guardrails (should be in Guidelines)
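The first three red flags can be checked roughly by script if you export the configuration. The sketch below assumes a hypothetical dictionary layout (`global_guidelines`, `tools`, `pre_instructions`, `post_action_instructions`); these keys are illustrative and not an actual AVA export format. The word-overlap threshold of 0.6 is an arbitrary starting point, and the Guardrails-versus-Guidelines check still needs manual review.

```python
import re
from itertools import combinations


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two descriptions."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def find_red_flags(config: dict, overlap_threshold: float = 0.6) -> list[str]:
    flags: list[str] = []
    tools = config.get("tools", [])

    # Red flag: tool descriptions that overlap with each other.
    for t1, t2 in combinations(tools, 2):
        if jaccard(t1.get("description", ""), t2.get("description", "")) >= overlap_threshold:
            flags.append(f"overlapping descriptions: {t1['name']} / {t2['name']}")

    # Red flag: "do not re-ask" present globally but in no tool's Pre-instructions.
    phrase = "do not re-ask"
    in_global = phrase in config.get("global_guidelines", "").lower()
    in_tools = any(phrase in t.get("pre_instructions", "").lower() for t in tools)
    if in_global and not in_tools:
        flags.append('"do not re-ask" appears only in Global Guidelines')

    # Red flag: tools without Post-Action instructions.
    for t in tools:
        if not t.get("post_action_instructions"):
            flags.append(f"missing Post-Action instructions: {t['name']}")

    return flags


example_config = {
    "global_guidelines": "Do not re-ask for information the customer already gave.",
    "tools": [
        {"name": "get_user_details", "description": "Gets details about the user",
         "pre_instructions": "", "post_action_instructions": []},
        {"name": "get_user_info", "description": "Gets details about the user account",
         "pre_instructions": "", "post_action_instructions": ["on_success_when: details returned"]},
    ],
}

for flag in find_red_flags(example_config):
    print(flag)
```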
This approach addresses the fundamental issue: because APT-1 is probabilistic, instructions need to be clear and placed close to the decision point for behavior to stay consistent in production environments.
------------------------------
Lineu Romão
------------------------------