[{"data":1,"prerenderedAt":184},["ShallowReactive",2],{"blog-en-one-symptom-five-causes":3,"header-blog-translations-/en/blog/one-symptom-five-causes":181},{"id":4,"title":5,"author":6,"body":7,"date":163,"description":164,"draft":165,"extension":166,"image":167,"meta":168,"navigation":169,"path":170,"seo":171,"stem":172,"tags":173,"translationKey":179,"__hash__":180},"blog_en/blog/en/one-symptom-five-causes.md","When a Symptom Looks Like One Bug, It's Five","Patrick Hofmann",{"type":8,"value":9,"toc":155},"minimark",[10,23,26,31,38,41,45,52,55,65,75,99,106,109,113,116,119,122,126,134,141,145,148],[11,12,13,14,18,19,22],"p",{},"The first time the whole Nest pipeline ran end-to-end, I saw exactly three things: my Telegram got an approval notification every 15 seconds, the ",[15,16,17],"code",{},"troop"," list on the SP surface was empty, and the agent ",[15,20,21],{},"igor"," I had just spawned was nowhere to be seen. Three observations that felt like a single broken thing. That's how you read it: one symptom, one bug, one cause, one fix.",[11,24,25],{},"It was five causes. Each on its own would have produced a different symptom — or none at all. Only the chaining produced exactly what I saw.",[27,28,30],"h2",{"id":29},"how-it-was-before","How it was before",[11,32,33,34,37],{},"The pipeline had grown over several phases, each piece tested on its own. The IdP that issues agent identities. The ",[15,35,36],{},"apes"," CLI that spawns agents. A YOLO auto-approval layer that lets certain commands through without a prompt. A Nest daemon that starts the agents' bridges and keeps them alive. And underneath, launchd plists that manage processes on macOS.",[11,39,40],{},"Every part had its own tests, every part was green. But nobody had ever run them all together, for real, against the production identity. 
That day was the first time.",[27,42,44],{"id":43},"why-a-symptom-isnt-a-cause","Why a symptom isn't a cause",[11,46,47,48,51],{},"I started with the most conspicuous part: the approval flood. Every 15 seconds a prompt on the phone for a command that should have been YOLO-approved. My first reflex was to read the CLI logs. They said: ",[15,49,50],{},"pending",". So: auto-approval isn't triggering, something is off with the pattern.",[11,53,54],{},"Then I stopped believing the logs and inspected the actual grant request in the IdP:",[56,57,62],"pre",{"className":58,"code":60,"language":61},[59],"language-text","$ apes grant inspect \u003Cid>\nstatus:             approved\nauto_approval_kind: yolo\n","text",[15,63,60],{"__ignoreMap":64},"",[11,66,67,68,71,72,74],{},"The grant was approved. With ",[15,69,70],{},"auto_approval_kind: yolo",". The CLI still said ",[15,73,50],{}," and sent me a prompt. That was the moment the assumption flipped: the symptom \"flood of prompts\" had nothing to do with missing approval at all.",[11,76,77,78,81,82,84,85,88,89,92,93,95,96,98],{},"It was a triple bug. The YOLO pattern matched the ",[15,79,80],{},"apes run"," wrapper instead of the inner command, so the wrapper was approved while the inner command stayed pending. The supervisor called ",[15,83,80],{}," without ",[15,86,87],{},"--wait",", saw no success, and restarted. And the registry path was doubly nested because ",[15,90,91],{},"homedir()"," in the daemon context resolved the daemon's HOME, not the agent's — which is why ",[15,94,21],{}," was invisible in the registry and ",[15,97,17],{}," empty. Each of these three on its own would have produced \"no prompt\", each a different failure mode. 
Only the interplay yielded \"flood of prompts plus empty list\".",[11,100,101,102,105],{},"On top of that came two more that had nothing to do with the visible symptom and only surfaced because I was already standing in the engine room: the owner attribution in the IdP was recursively wrong when an agent enrolled another agent. And in the setuid transition through the ",[15,103,104],{},"escapes"," helper, PATH wasn't inherited.",[11,107,108],{},"Five drifts, all made visible at once, because no one had ever run all layers together before. That's not bad luck. That's the property. The first real dogfooding of a multi-layer pipeline finds every layer's drift at once — because \"tested\" per layer means something different than \"ran together\".",[27,110,112],{"id":111},"the-fix-that-wasnt-a-fix","The fix that wasn't a fix",[11,114,115],{},"That leaves the supervisor. It ran in parallel with the launchd plists, started the same bridges launchd also started, saw them crash, restarted them, and so did launchd: a crash loop. The obvious fix would have been: coordinate supervisor and launchd, one wins.",[11,117,118],{},"The right fix was to delete the supervisor.",[11,120,121],{},"Process lifecycle on macOS is a solved problem. launchd has done that for years, correctly, with restart policy, with system-domain plists, with everything. The Nest supervisor was a second instance of something the OS already is. I had built it because, while building, it hadn't occurred to me that the responsibility already lived elsewhere. The cheapest solution wasn't to reconcile the two supervisors. It was to have only one.",[27,123,125],{"id":124},"how-it-looks-now","How it looks now",[11,127,128,129,133],{},"The Nest daemon today is a pure registry watcher. It decides ",[130,131,132],"em",{},"which"," agents should run, writes that into a registry, and reconciles launchd plists in the system domain. The process lifecycle — start, restart after crash, boot persistence — is owned by launchd alone. 
A single source of truth for \"is the process running\".",[11,135,136,137,140],{},"What fell away: the supervisor. And shortly after, the HTTP intent channel the Nest used to talk to the bridges. That too was a layer whose job UNIX permissions on an ",[15,138,139],{},"intents/"," directory handle more cleanly. Concretely, less code here means fewer layers that can become a chain whose symptom lies about the number of its causes.",[27,142,144],{"id":143},"the-point","The point",[11,146,147],{},"A symptom isn't a bug count. \"Notifications every 15 seconds, troop empty, igor invisible\" reads like one defect and was five, each with its own symptom, which only collapsed into the one I saw through chaining. Whoever counts causes by counting symptoms counts wrong.",[11,149,150,151,154],{},"What stuck: sometimes the bug isn't ",[130,152,153],{},"in"," the layer. Sometimes the layer is the bug — and the most honest fix is to delete it, not patch it. I didn't repair the supervisor. I removed it, because launchd had long been doing the job.",{"title":64,"searchDepth":156,"depth":156,"links":157},2,[158,159,160,161,162],{"id":29,"depth":156,"text":30},{"id":43,"depth":156,"text":44},{"id":111,"depth":156,"text":112},{"id":124,"depth":156,"text":125},{"id":143,"depth":156,"text":144},"2026-05-09","The first real end-to-end dogfooding of the Nest pipeline showed exactly one behavior: notifications every 15 seconds, troop empty, igor invisible. It was five independent causes, each with a different symptom — only the chaining produced what I saw. 
And the deepest one wasn't a bug, but a layer that had to go.",false,"md",null,{},true,"/blog/en/one-symptom-five-causes",{"title":5,"description":164},"blog/en/one-symptom-five-causes",[174,175,176,177,178],"Debugging","AI Agents","Infrastructure","OpenApe","Building in Public","one-symptom-five-causes","1FnYsjra2Qf27xwyGhPRtdDw4nv1BJzMPYn_OnmP-Wk",{"en":182,"de":183},"/en/blog/one-symptom-five-causes","/de/blog/wenn-ein-symptom-nach-einem-bug-aussieht-sind-es-fuenf",1779001887012]