Fixing Concurrent Agent Slowness in llama-server (and Why I Didn't Switch to vLLM)

Posted on Sat 09 May 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, local models, routing

This is a follow-up to what I learned running local models in my agent pipeline. That post covered context sizing and KV cache memory. This one covers what I got wrong about concurrency.

The Problem: Agents Queuing Up

My pipeline runs up to four agents simultaneously, named step1-2, step3, step4 …


Continue reading

How I Audit Security Patches with an AI Pipeline

Posted on Sat 02 May 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, WebKit, security research, vulnerability research, patch auditing, methodology

Most security patch auditing tools look for known vulnerability patterns. They diff a commit, grep for dangerous functions, maybe flag things that look like what last year's CVEs looked like. That works for the obvious stuff. It doesn't work for the commit that says "no behavior change" and silently fixes …


Continue reading

What I Learned Running Local Models in My Agent Pipeline

Posted on Sat 25 April 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, local models, routing

This is a follow-up to my previous post on routing agents through LiteLLM. That post covered the architecture. This one covers what broke when I actually ran it.

Claude Code Doesn't Pass Through Arbitrary Model Names

The first thing I got wrong: I assumed model: local-sonnet in agent frontmatter would …


Continue reading

How I Route AI Agents Through a Local Model Proxy

Posted on Tue 21 April 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, pipeline, local models, routing

This is a follow-up to my previous post where I covered reducing token costs in a multi-agent pipeline. That post touched on local model fallback at a high level. This one goes deeper on how the routing layer actually works.

The Pipeline

I have five agents, split across two tiers …


Continue reading

How I Cut AI Agent Costs Without Cutting Corners

Posted on Mon 20 April 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, pipeline, cost optimization, LLM

Running a multi-agent pipeline for security research gets expensive fast. I have several agents doing sequential analysis work: reading commit diffs, running adversarial bypass analysis, building proof-of-concept exploits. Token costs compound at every step: system prompts, tool schemas, conversation history, and verbose outputs all stack up before a single useful …


Continue reading