I bumped a Rails app to Ruby 4.0.1 and CI went red on five tests, intermittently, never on my laptop. The obvious villain was the upgrade sitting at the top of the diff. It was a red herring. The tests had always been wrong: they compared a Ruby timestamp held in memory (nanoseconds) against the same value reloaded from Postgres (microseconds), and the rounding disagreed about 1% of the time. They passed for years because I develop on macOS and CI runs on Linux, which hands out sub-microsecond timestamps far more often. The fix was one word: reload before you capture.
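A minimal sketch of that mismatch, in Python rather than the app's Ruby. It assumes the database column keeps only whole microseconds and models the round trip as truncation. The helper names are hypothetical, and timestamps are plain integer nanoseconds.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def store_and_reload(ts_ns: int) -> int:
    """Model a round trip through a microsecond-precision column.

    Assumption: sub-microsecond digits are dropped (truncation).
    """
    return (ts_ns // NS_PER_US) * NS_PER_US


def naive_matches(ts_ns: int) -> bool:
    # Flaky pattern: capture the in-memory value, then compare it
    # with what comes back from the database.
    return ts_ns == store_and_reload(ts_ns)


def reloaded_matches(ts_ns: int) -> bool:
    # Stable pattern: reload first, then capture, so both sides of
    # the comparison carry the same precision.
    captured = store_and_reload(ts_ns)
    return captured == store_and_reload(captured)
```

The naive comparison only passes when the clock happens to return a value with no sub-microsecond digits, which is why the failure rate depends on the platform.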
My server hit 93% disk. The culprit was a 124 GB SQLite write-ahead log next to a 16 GB database, owned by my performance monitoring gem. Every safeguard against unbounded growth was switched on, and that was the problem: if the WAL never checkpoints, retention, row caps, and auto_vacuum all quietly feed it. Here are the WAL checkpoint mechanics, why a long-lived connection pins the WAL forever, and why I removed the monitoring tool instead of fixing it.
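The pinning effect can be reproduced in a few lines with Python's standard sqlite3 module. This is a sketch, not the gem's actual setup: the table, the temp-file path, and the function name are made up for illustration. `PRAGMA wal_checkpoint` returns a row of (busy, WAL frames, frames checkpointed).

```python
import os
import sqlite3
import tempfile


def checkpoint_demo():
    """Return checkpoint results while a reader is open and after it ends."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway file

    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("PRAGMA wal_autocheckpoint=0")  # checkpoint only by hand
    writer.execute("CREATE TABLE t (x)")
    writer.execute("INSERT INTO t VALUES (1)")

    # Stand-in for a long-lived connection: open a read transaction
    # and leave it open, which holds an old snapshot of the WAL.
    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT * FROM t").fetchall()

    writer.execute("INSERT INTO t VALUES (2)")

    # PASSIVE never waits; it copies only the frames that no reader
    # still needs, so it stops at the reader's snapshot.
    pinned = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

    reader.execute("COMMIT")  # release the snapshot
    released = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

    reader.close()
    writer.close()
    return pinned, released
```

While the reader holds its snapshot, the checkpointed count stays below the frame count, and SQLite cannot restart the WAL from the beginning, so every new write appends to it.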
The technical companion to the agentic-engineering story. Kamal, Terraform, 1Password for secrets, Caddy in front of kamal-proxy, the cutover playbook, and the bugs I spent the most time on.
Two and a half months, mostly solo. BSPK off Heroku to AWS, no downtime, costs down more than 60%. I barely wrote code. What 'agentic engineering' looks like at production scale.
After building a wiki from 600 spiritist books, I built a second one from a few dozen YouTube videos, articles, and tweets about a Toy Story fan theory. The end product wasn't a website. It was a children's book for my kid.
A customer clicked a Stripe Connect button in my Rails app and nothing happened. Solid Errors showed RecordNotUnique exceptions from them clicking again and creating duplicate rows. The original click wasn't logged anywhere because nothing went wrong on the server. The redirect to Stripe was leaving my server fine. Turbo Drive was silently eating it, because fetch can't follow cross-origin redirects. Three things had to line up for the bug to exist. Remove any one and it goes away.
Karpathy posted his LLM Wiki gist and I had 600 books already chunked for a RAG system. I pointed Claude Code at them to test the wiki approach instead. Six days later: 679 interlinked pages, 6,000+ cross-references, and concept pages synthesizing 36+ sources each. The first attempt was garbage because Claude skimmed instead of reading. The fix was expensive: read every chunk of every book. But the result compounds in a way RAG never will.
I had a Rails side project and decided to rewrite it in Elixir/Phoenix. Streaming LLM responses, concurrency, personal interest. Built the whole thing, shared the same PostgreSQL database between both apps. Then came back to Rails. Not because Elixir was bad, but because writing code with Claude Code was a noticeably worse experience in Elixir than in Ruby. More errors, more iteration rounds, slower path to working software. When AI writes most of your code, that gap compounds fast.
I needed to extract speakers and topics from 40K+ YouTube videos for a spiritist knowledge base. Started with Groq's free tier, hit every rate limit, discovered my exception handling was silently flooding Solid Queue with 18K duplicate jobs, then moved to local models on Ollama. Along the way I found that Qwen3's default thinking mode turns a sub-second extraction into a 100-second one, and that 4B models need JSON sanitization to be reliable.
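For a sense of what JSON sanitization can mean here, a rough Python sketch follows. It is not the pipeline from the post: the function name is hypothetical, and it assumes the model may wrap its answer in `<think>` blocks or Markdown code fences and may leave trailing commas.

```python
import json
import re


def sanitize_json(raw: str) -> dict:
    """Best-effort recovery of one JSON object from noisy model output."""
    # Drop any reasoning block emitted before the answer.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)

    # Keep only the outermost {...} span, which also discards
    # code fences and chatty prose around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    candidate = text[start:end + 1]

    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this would also alter a string value containing ",}".
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    return json.loads(candidate)
```

Anything that still fails `json.loads` after these steps is better retried or skipped than patched further.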
I spent three weeks tuning Litestream for Backblaze B2's free tier, wrote a blog post about it, then ripped the whole thing out and replaced it with a cron job. Meanwhile, my earlier SQLite auto_vacuum post led to a Rails PR that changes the default for every new SQLite database.