Building a Voice Agent That Doesn't Sound Like a Robot
The engineering choices that separate uncanny-valley demos from interviews candidates actually finish.

Latency Is Everything
If your agent takes more than 500 ms to respond, candidates notice. At 800 ms the pause gets awkward. At 1.2 s they hang up. Our pipeline targets sub-400 ms turn-taking in two ways: it streams partial transcripts into the LLM while the candidate is still speaking, and it starts TTS generation before the model has finished its response.
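
One way to start TTS before the model finishes is to cut the token stream at sentence boundaries and hand each finished sentence to the synthesizer right away. The sketch below is a minimal illustration, not our production pipeline: `sentence_chunks` is a hypothetical helper, and the token list stands in for whatever streaming interface your LLM client exposes.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation followed by whitespace marks a safe flush point.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a token stream.

    Hypothetical helper: a real pipeline would pass every yielded sentence
    to the TTS engine while the LLM keeps generating the rest.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Simulated LLM token stream (assumed shape: plain string fragments).
stream = ["Thanks", " for", " joining", ".", " Tell", " me", " about",
          " your", " last", " role", "."]
for sentence in sentence_chunks(stream):
    print(sentence)
```

The first sentence is released as soon as the token after the period arrives, so synthesis can begin while the second sentence is still being generated. Splitting on punctuation is a simplification; abbreviations such as "e.g." would need extra handling.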

Handling Interruptions
Real conversations are messy. Candidates trail off, restart sentences, ask clarifying questions mid-prompt. A good agent listens for voice activity continuously, even while it is speaking, and yields the floor immediately when interrupted, with no robotic "I'm sorry, please let me finish."
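
Yielding the floor can be sketched as a small detector that watches voice-activity results while the agent is talking. This is an illustrative sketch under stated assumptions: `BargeInDetector` is a hypothetical class, the per-frame speech flags would come from whatever VAD you use, and the three-frame threshold (about 60 ms at an assumed 20 ms frame size) is a placeholder, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class BargeInDetector:
    """Decide when the agent should stop speaking (hypothetical sketch)."""

    # Assumed threshold: 3 consecutive speech frames before yielding.
    min_speech_frames: int = 3
    _run: int = 0

    def update(self, agent_speaking: bool, frame_is_speech: bool) -> bool:
        """Return True when the agent should cut its audio on this frame."""
        if not agent_speaking or not frame_is_speech:
            # Silence, or the agent is not talking: nothing to interrupt.
            self._run = 0
            return False
        self._run += 1
        return self._run >= self.min_speech_frames


detector = BargeInDetector()
# Simulated VAD output while the agent is mid-sentence.
frames = [False, True, True, True]
decisions = [detector.update(agent_speaking=True, frame_is_speech=f) for f in frames]
print(decisions)  # [False, False, False, True]
```

The short run of frames is a trade-off: a lower threshold yields faster but reacts to coughs and line noise, while a higher one makes the agent talk over the candidate for longer.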

Scoring Without Bias
Every scored dimension maps to a behavioral indicator written by your team. The model never invents categories. Audit logs show exactly which transcript span contributed to each score, which is what makes the system defensible under NYC Local Law 144 (LL 144).
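
A rough sketch of what such an audit record could look like: each score names a rubric dimension and the character span of transcript that supports it, and anything outside the team-written rubric is rejected. The names here (`ScoreEvidence`, `build_audit_log`, the sample rubric) are hypothetical and chosen for illustration, not taken from a real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreEvidence:
    dimension: str   # must match an indicator written by your team
    score: int
    span_start: int  # character offset into the transcript (inclusive)
    span_end: int    # character offset into the transcript (exclusive)


def build_audit_log(rubric: set[str], transcript: str,
                    scores: list[ScoreEvidence]) -> list[dict]:
    """Return audit entries, rejecting invented dimensions or bad spans."""
    audit = []
    for s in scores:
        if s.dimension not in rubric:
            raise ValueError(f"dimension not in rubric: {s.dimension!r}")
        if not (0 <= s.span_start < s.span_end <= len(transcript)):
            raise ValueError(f"span out of range for {s.dimension!r}")
        audit.append({
            "dimension": s.dimension,
            "score": s.score,
            "evidence": transcript[s.span_start:s.span_end],
        })
    return audit


# Illustrative data only.
rubric = {"ownership", "communication"}
transcript = "I led the migration and cut deploy time in half."
start = transcript.index("I led")
evidence = ScoreEvidence("ownership", 4, start, start + len("I led the migration"))
print(build_audit_log(rubric, transcript, [evidence]))
```

Storing offsets rather than a copied quote means a reviewer can check every score against the original transcript, and a score with no valid span never reaches the log.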

