Hung by a Thread
January 24, 2026
It's 2am. My robot is frozen. Not crashed, not erroring, just... vibing. Sitting there. Motors off. Completely checked out.
I've been debugging for 8 hours and I'm about to mass delete my entire codebase and become a farmer.
The Setup
I'm building autonomous sidewalk robots. The control loop runs at 100Hz — every 10ms we read sensors, do math, send motor commands. It's the heartbeat. The one thing that absolutely cannot stop.
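For the non-robot folks, the shape of the thing is roughly this — a minimal sketch, not the actual codebase (the function and the sensor/motor calls are stand-ins):

use std::time::{Duration, Instant};

fn control_loop() {
    // 100Hz -> a 10ms budget per tick
    let period = Duration::from_millis(10);
    loop {
        let tick_start = Instant::now();

        // read sensors, do math, send motor commands
        // (stand-ins for the real thing)

        // sleep away whatever's left of this tick's 10ms budget
        if let Some(remaining) = period.checked_sub(tick_start.elapsed()) {
            std::thread::sleep(remaining);
        }
    }
}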
It had been rock solid for weeks. Then I added LiDAR streaming over WebRTC.
Now, ~16 seconds after a client connects, the loop just stops. Doesn't crash. Doesn't throw. Just ghosts me. The watchdog starts barking, the robot coasts to a stop, and my laptop shows a beautiful 3D point cloud of a robot that has given up on life.
The Wrong Turns
I tried everything.
"It's tokio starving the loop" — switched to std::thread::sleep. Nope.
"It's the async mutex" — swapped for std::sync::Mutex. Nope.
"It's running on the wrong thread" — moved the whole loop to std::thread::spawn. Complete isolation. Nope nope nope.
Same freeze. Same spot. Iteration 1,615. Every single time.
The consistency was almost insulting. Like the bug was laughing at me.
The Breakthrough
Ok new plan. I add a heartbeat thread. Just a lil guy that watches a counter and screams if it stops:
// counter is an Arc<AtomicU64> that the control loop bumps once per
// iteration with counter.fetch_add(1, Ordering::Relaxed)
let counter = Arc::clone(&counter);
std::thread::spawn(move || {
    let mut last = 0;
    loop {
        std::thread::sleep(Duration::from_secs(5));
        let current = counter.load(Ordering::Relaxed);
        if current == last {
            eprintln!("STUCK at iteration {}", current);
        }
        last = current;
    }
});
Five seconds after the freeze: STUCK at iteration 1615
Oh. OH. It's not slow. It's not starved. It's blocked. Something is holding a lock and simply not letting go. Deadlock behavior.
Time to bring out the big guns. GDB.
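Attach to the live process and dump every thread's stack. Roughly this — the binary name is a placeholder, yours is whatever's in your Cargo.toml:

gdb -p $(pidof robot)       # attach to the running process
(gdb) info threads          # list every thread and where it's parked
(gdb) thread apply all bt   # backtrace of all of them at once
(gdb) thread 2              # hop to whichever one looks stuck
(gdb) bt                    # and stare at its stack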
The control loop thread's backtrace: she's waiting on a mutex. But who's holding it??
I scroll through the other threads. Tokio workers, GStreamer stuff, and then... four threads I definitely did not create. Rayon workers. I don't use rayon. Who invited rayon.
The Reveal
Rerun is this beautiful visualization SDK I use for recording telemetry. You call recorder.log() and magic happens.
Turns out rerun uses rayon internally.
And I was calling recorder.log() while holding a mutex.
This is a known rayon footgun: rayon#592. When you call into rayon while holding a mutex, rayon's work-stealing threads can deadlock trying to "help" with work that needs the lock you're already holding.
That's it. That's the fix. 8 hours of debugging. 2 lines changed. Hold the lock for less time. Tale as old as time.
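If you want the shape of it — a sketch of the pattern, not the literal diff, with made-up names (log_to_rerun stands in for the real recorder.log call): take what you need out of the lock, drop the guard, then log.

use std::sync::Mutex;

struct Shared {
    points: Vec<[f32; 3]>,
}

// Stand-in for the real recorder.log(...) call, which can fan work out to
// rayon's thread pool internally.
fn log_to_rerun(_points: Vec<[f32; 3]>) {}

fn publish(shared: &Mutex<Shared>) {
    // Before: the log call ran while the guard was still alive, so a rayon
    // worker that needed this same mutex could never make progress.

    // After: clone what we need, let the guard drop, then log.
    let snapshot = {
        let guard = shared.lock().unwrap();
        guard.points.clone()
    }; // guard dropped here, lock released

    log_to_rerun(snapshot); // rayon can do whatever it wants now
}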
The Takeaways
- GDB is cracked for deadlocks. Logs can't show you thread state. thread apply all bt hits different.
- Random threads showing up? Suspicious. If you see thread pools you didn't spawn, figure out who did.
- Your dependencies have dependencies. Somewhere in that Cargo.lock is a threading model waiting to fight yours.
- Heartbeat threads are free. A few lines to detect "stuck" is worth it for any critical loop. They're just a lil guy. Let them watch.
- The fix is always smaller than the hunt. Always. Without exception. It's almost annoying.
I submitted a PR to rerun adding docs about this. Maybe the next person finds the warning before they find the bug.
The robot runs now. Hasn't frozen since. The LiDAR streams beautifully.
But I will never call into a library I don't fully understand while holding a mutex again. Fool me once.

