Fixing Concurrent Agent Slowness in llama-server (and Why I Didn't Switch to vLLM)

Posted on Sat 09 May 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, local models, routing

This is a follow-up to my previous post on what I learned running local models in my agent pipeline. That post covered context sizing and KV cache memory. This one covers what I got wrong about concurrency.

The Problem: Agents Queuing Up

My pipeline runs up to four agents simultaneously, called step1-2, step3, step4 …


Continue reading

How I Audit Security Patches with an AI Pipeline

Posted on Sat 02 May 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, WebKit, security research, vulnerability research, patch auditing, methodology

Most security patch auditing tools look for known vulnerability patterns. They diff a commit, grep for dangerous functions, maybe flag things that resemble last year's CVEs. That works for the obvious stuff. It doesn't work for the commit that says "no behavior change" and silently fixes …


Continue reading

Patch Audit Schema - SOUNDNESS_CLAIMS Spec

Posted on Sat 02 May 2026 in Security Research • Tagged with methodology, patch auditing, schema, vulnerability research

This is the schema spec referenced in How I Audit Security Patches with an AI Pipeline. It defines the SOUNDNESS_CLAIMS format, the empirical proof requirement, and the adjacent-unpatched rule.

SOUNDNESS_CLAIMS Format

Each claim is a falsifiable hypothesis about a specific attack vector. Required fields:

F(x):          The trigger - exact JS …

Continue reading

What I Learned Running Local Models in My Agent Pipeline

Posted on Sat 25 April 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, local models, routing

This is a follow-up to my previous post on routing agents through LiteLLM. That post covered the architecture. This one covers what broke when I actually ran it.

Claude Code Doesn't Pass Through Arbitrary Model Names

The first thing I got wrong: I assumed model: local-sonnet in agent frontmatter would …


Continue reading

How I Route AI Agents Through a Local Model Proxy

Posted on Tue 21 April 2026 in Thought, AI, Security Research • Tagged with chronicles, AI agents, claude, LiteLLM, llama-server, pipeline, local models, routing

This is a follow-up to my previous post on reducing token costs in a multi-agent pipeline. That post touched on local model fallback at a high level. This one goes deeper on how the routing layer actually works.

The Pipeline

I have five agents, split across two tiers …


Continue reading