AI & Agents
AI agent observability vs. traditional software
Why monitoring AI agents is a different game: traces, a single LLM evaluator, human review and prompts treated as code — our day-to-day routine.
AI agents in production: evals, error analysis, observability and what we learned running them with real clinics.
How we doubled our agent's escalation accuracy with error analysis and manual annotation — and why LLM-as-a-Judge wasn't the way.