The 7 Voice Agent Mistakes That Make Candidates Hang Up (And How to Fix Each One)
Build a voice agent that candidates want to talk to. From silence handling to interruption etiquette, here are the seven traps we've seen kill voice screening pilots — and what to do instead.

Building a voice agent for candidate screening sounds straightforward. You pick a voice, write some questions, deploy to Retell or Vapi, connect it to your ATS. Done.
Then you listen to the recordings and find out your candidates are hanging up in the first 90 seconds.
Voice is unforgiving. Unlike a chatbot or a form, a phone call creates a real-time social contract. If the AI breaks that contract — by being too slow, too rigid, too robotic, or too pushy — the candidate is gone. They don't submit a support ticket. They just hang up.
We've run thousands of AI screening calls. Here are the seven mistakes we see repeatedly — in our own early builds and in systems other teams have shown us — and exactly how to fix them.
01. Awkward opening (the 3-second rule)
The first three seconds determine whether the candidate stays on the line. This is not an exaggeration. If the opening feels off — robotic, abrupt, or confusing — a significant portion of candidates will hang up before the first question.
Most default voice agent openings sound like this:
Before:
"Hello. I am an AI assistant calling on behalf of [Company Name]. I will be conducting your screening interview today. Please answer each question clearly and concisely."
That opening has several problems. It front-loads the procedural ("I will be conducting"). It tells the candidate what to do before establishing why they should trust the call. And "please answer each question clearly and concisely" is aggressive — it creates pressure before any rapport.
Here's what works better:
After:
"Hey [First Name], this is Alia calling from CannaZip — thanks for applying for the [Role] position earlier today. I know you probably weren't expecting a call this fast, so I'll keep this short: I just have a few quick questions to learn more about your background. Got about seven minutes?"
Notice what that opening does. It uses the candidate's name. It acknowledges the surprise of a fast callback (which is actually a positive signal). It previews the time commitment. And it ends with an opt-in question — "Got about seven minutes?" — which gets a yes or a "sure, let me just step outside" instead of confusion.
The micro-pause after "Got about seven minutes?" matters too. Instruct your LLM, in the prompt, to wait for a real response here before proceeding. Do not auto-advance.
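As a rough illustration of the opening and the opt-in gate, here is a minimal Python sketch. The template wording is shortened from the "After" script above, and the record fields (first_name, role), the function names, and the keyword lists are assumptions for this example, not part of any ATS or voice platform API.

```python
import re

# Hypothetical template; the placeholders are filled from the ATS record.
OPENING_TEMPLATE = (
    "Hey {first_name}, this is {agent_name} calling from {company}. "
    "Thanks for applying for the {role} position earlier today. "
    "I just have a few quick questions to learn more about your background. "
    "Got about {minutes} minutes?"
)

# Crude keyword checks, good enough to show the gating idea.
YES_PATTERN = re.compile(r"\b(yes|yeah|yep|sure|okay|ok|go ahead)\b", re.IGNORECASE)
NO_PATTERN = re.compile(r"\b(no|not now|bad time|busy|later)\b", re.IGNORECASE)


def build_opening(record: dict, agent_name: str, company: str, minutes: int = 7) -> str:
    """Fill the opening from application data rather than model inference."""
    return OPENING_TEMPLATE.format(
        first_name=record["first_name"],
        role=record["role"],
        agent_name=agent_name,
        company=company,
        minutes=minutes,
    )


def classify_opt_in(reply: str) -> str:
    """Return 'yes', 'no', or 'unclear'. Only 'yes' should advance the call."""
    said_yes = bool(YES_PATTERN.search(reply))
    said_no = bool(NO_PATTERN.search(reply))
    if said_yes and not said_no:
        return "yes"
    if said_no and not said_yes:
        return "no"
    return "unclear"
```

An "unclear" result (including silence) should keep the agent waiting or lead to a gentle repeat of the opt-in question, never to the first screening question.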
02. Robotic pacing (the breath problem)
Human speech has rhythm. It has micro-pauses between sentences. It has the occasional filled pause — "so," "right," "okay" — that signals processing and keeps the listener engaged. Text-to-speech engines, by default, don't have this. They barrel through at a constant rate that sounds technically correct and psychologically wrong.
Before (typical TTS output):
"Great. My first question is: Can you tell me about your most recent role and your primary responsibilities? Please take your time."
It's fine. It's also inhuman. It sounds like a hold menu.
After (with rhythm adjustments):
"Okay, great. So my first question — [0.4s pause] — can you tell me a little about your most recent role? What were you mainly responsible for day-to-day?"
Two changes: a short pause before the question (which mimics natural phrasing), and the question is broken into two parts — a lead-in and the specifics. This structure gives the candidate's brain time to activate the right memory before answering.
In Retell, you can configure sentence-level pause values in the voice config. In Vapi, the silencePadding parameter at the turn level controls this. Set it to 300–500ms for questions. A little more breathing room than you think you need.
Also: use a voice model with natural prosody. Not all TTS voices are equal. Test your candidate experience by listening to five full calls back-to-back without reading the transcript. If it starts to feel grating, your pacing is off.
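If your TTS provider accepts SSML (an assumption worth checking, since support varies by voice and vendor), the pause markers used in the "After" script can be converted into standard SSML break tags. The sketch below is one way to do that; the "[0.4s pause]" marker format and the to_ssml name are conventions invented for this example.

```python
import re
from xml.sax.saxutils import escape

# Matches markers such as "[0.4s pause]" plus any whitespace around them.
PAUSE_MARKER = re.compile(r"\s*\[(\d+(?:\.\d+)?)s pause\]\s*")


def to_ssml(script: str) -> str:
    """Replace '[0.4s pause]' style markers with SSML <break> tags."""

    def _break(match: re.Match) -> str:
        millis = round(float(match.group(1)) * 1000)
        return f' <break time="{millis}ms"/> '

    # Escape first so characters like "&" in the script stay valid XML.
    body = PAUSE_MARKER.sub(_break, escape(script)).strip()
    return f"<speak>{body}</speak>"


print(to_ssml("Okay, great. So my first question [0.4s pause] can you tell me "
              "a little about your most recent role?"))
```

Keeping the pauses as markers in the script means a non-technical reviewer can tune the rhythm without touching markup.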
03. Handling silence wrong
Silence happens. A candidate is searching for words. They've been asked something they need to think about. They stepped away from background noise. How your agent handles that silence determines whether they come back or hang up.
The wrong behavior: Immediately re-prompt after 1.5–2 seconds of silence.
[1.8 seconds pass] "I'm sorry, I didn't quite catch that — could you repeat your answer?"
This is one of the most common failure modes we see. The candidate is still thinking. The agent re-prompts. The candidate gets flustered. They say something shorter and less useful. Or they hang up.
The right behavior: Wait longer than you think you should.
In Retell, the endpointing (Voice Activity Detection) setting determines how long the system waits after candidate speech ends before treating the turn as complete. Set your VAD silence threshold to at least 800 ms, and up to 1,200 ms, for question responses. This is longer than the default, but it dramatically reduces false turn-completions where the agent talks over a candidate who just paused mid-sentence.
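To see why the threshold matters, here is a small simulation of silence-based endpointing. It is a simplified model written for this article, not how any particular platform implements VAD, and the segment timings are made up.

```python
def turn_end_time(speech_segments, silence_threshold_ms):
    """Return the time (ms) at which the agent treats the turn as finished.

    speech_segments: ordered list of (start_ms, end_ms) spans of candidate speech.
    The turn ends at the first gap lasting at least silence_threshold_ms,
    or that long after the final segment.
    """
    for (_, end), (next_start, _) in zip(speech_segments, speech_segments[1:]):
        if next_start - end >= silence_threshold_ms:
            return end + silence_threshold_ms
    return speech_segments[-1][1] + silence_threshold_ms


# A candidate who pauses for 700 ms mid-sentence, then keeps talking until 6000 ms.
answer = [(0, 2400), (3100, 6000)]
print(turn_end_time(answer, 500))   # 2900: the agent cuts in during the pause
print(turn_end_time(answer, 1000))  # 7000: the agent waits for the full answer
```

With a 500 ms threshold the agent starts speaking 200 ms before the candidate resumes, which is exactly the talk-over described above.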
For complete silence (candidate hasn't spoken at all), configure a three-stage fallback:
- At 4–5 seconds: a gentle nudge — "Take your time, no rush."
- At 10–12 seconds: a clarifying re-prompt — "Did I lose you? Just let me know when you're ready."
- At 20 seconds: offer to reschedule — "It sounds like this might not be a great time — I can call back later if that's easier."
That third fallback is the key. A candidate who picked up at a bad moment is far more likely to complete the screen if you offer a callback than if you simply end the call.
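The fallback ladder can be sketched as a small lookup that the call loop polls while the line is silent. The thresholds use the lower bound of each range above; the action labels and the due_stage function are hypothetical names for this example.

```python
from typing import Optional, Tuple

# (seconds of silence, action label, line to speak)
SILENCE_LADDER = [
    (4.0, "nudge", "Take your time, no rush."),
    (10.0, "reprompt", "Did I lose you? Just let me know when you're ready."),
    (20.0, "offer_callback",
     "It sounds like this might not be a great time. "
     "I can call back later if that's easier."),
]


def due_stage(silence_s: float, stages_fired: int) -> Optional[Tuple[str, str]]:
    """Return (action, line) for the next unfired stage once its threshold passes.

    stages_fired counts the stages already played during this silence. The caller
    increments it after speaking and resets it to 0 when the candidate talks.
    """
    if stages_fired < len(SILENCE_LADDER):
        threshold, action, line = SILENCE_LADDER[stages_fired]
        if silence_s >= threshold:
            return action, line
    return None
```

Tracking stages_fired outside the function keeps each stage from firing twice and guarantees the callback offer only comes after the two gentler prompts.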
04. Talking over the candidate
This one frustrates candidates more than any other. They're mid-answer, and the agent starts talking. It feels dismissive. It feels like the system doesn't actually care about their response — which undermines the entire premise of the screening call.
The technical root: interruption sensitivity settings are too aggressive. The agent is configured to begin speaking as soon as it detects a pause in the candidate's speech, but it's detecting micro-pauses mid-thought rather than actual turn completions.
In Retell: The interruptSensitivity parameter (0.0–1.0 scale) controls how aggressively the agent interrupts. Most defaults sit around 0.5. For candidate screening, set it to 0.2–0.3. Lower sensitivity means the agent waits longer before cutting in. The trade-off is slightly slower turn transitions — worth it.
In Vapi: The equivalent is startSpeakingPlan.waitSeconds — the amount of quiet time the system requires before the assistant speaks. Set it to at least 0.8–1.0 seconds. Combine this with a stopSpeakingPlan.numStopSecs of 2.5+ so the agent backs off if the candidate resumes speaking.
There's a subtler prompt-level fix too. Explicitly instruct the LLM not to follow up immediately:
Before:
"After the candidate responds, ask your next question."
After:
"After the candidate finishes speaking, wait a beat, then acknowledge briefly before the next question. Never start a new question while the candidate is still speaking or within 0.5 seconds of their last word."
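The model will respect this framing in most cases, though it cannot measure half a second on its own: treat the prompt as a nudge toward brief acknowledgements, and let the platform settings enforce the actual wait. Combine it with proper VAD settings and interruptions drop significantly.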
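One way to keep these settings from drifting is a small pre-deployment check. The sketch below uses neutral, hypothetical key names rather than either vendor's real field names, and the limits are simply the recommendations from this article; you would map the keys to your platform's actual configuration yourself.

```python
# Limits taken from the recommendations in this article. The keys are neutral,
# hypothetical names, not real Retell or Vapi fields.
LIMITS = {
    "endpointing_silence_ms": ("min", 800),
    "wait_seconds": ("min", 0.8),
    "interrupt_sensitivity": ("max", 0.3),
}


def lint_turn_taking(settings: dict) -> list:
    """Return human-readable warnings for settings likely to cause talk-over."""
    warnings = []
    for key, (kind, limit) in LIMITS.items():
        value = settings.get(key)
        if value is None:
            warnings.append(f"{key}: not set")
        elif kind == "min" and value < limit:
            warnings.append(f"{key}: {value} is below the suggested minimum of {limit}")
        elif kind == "max" and value > limit:
            warnings.append(f"{key}: {value} is above the suggested maximum of {limit}")
    return warnings


for warning in lint_turn_taking(
    {"endpointing_silence_ms": 500, "wait_seconds": 0.9, "interrupt_sensitivity": 0.5}
):
    print(warning)
```

Running a check like this in CI catches the common case where someone resets an agent to platform defaults while debugging and forgets to restore the screening values.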
05. The hold-music gap
There's a moment in every voice agent call that most builders don't think about: the gap. The candidate finishes speaking. The system is processing — running the LLM inference, generating the next TTS response, buffering the audio. That gap is typically 1.5–3 seconds for most cloud-hosted setups.
For a human, that's fine. Humans pause. For a voice agent, silence mid-call signals a dropped call, a technical glitch, or a dead line. Candidates hang up.
The fix is a filler strategy. Configure a set of short, natural filler phrases that trigger during processing gaps:
- "Mm-hmm, got it."
- "Okay."
- "Right."
- "Got it, thanks."
These play while the real response is generating. They're not the response — they're the audio equivalent of nodding. They signal: still here, still listening.
In Retell, this is the backchannel feature — you can enable it with backchannel.enabled: true and configure the frequency and phrase list. In Vapi, you can simulate this with streaming and a fast-response first token, but the implementation is more manual. Either way, this single change meaningfully reduces candidate hang-up rates during processing gaps.
One warning: don't overdo the backchannel. If every sentence the candidate says is met with "Mm-hmm, got it," it starts to feel like the agent is rushing them. Set the backchannel to fire only after candidate responses longer than 5 seconds, not after every response.
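If your platform does not offer a built-in backchannel, the same behavior can be approximated in your own call loop. This is a minimal sketch under that assumption; FillerPicker is a hypothetical helper, and the 5-second cutoff is the rule of thumb from the warning above.

```python
from typing import Optional

FILLERS = ["Mm-hmm, got it.", "Okay.", "Right.", "Got it, thanks."]


class FillerPicker:
    """Chooses a filler for the processing gap, rotating through the phrase list."""

    def __init__(self, min_response_s: float = 5.0):
        self.min_response_s = min_response_s
        self._next = 0

    def pick(self, candidate_response_s: float) -> Optional[str]:
        # Short answers get no filler, so the agent does not sound rushed.
        if candidate_response_s <= self.min_response_s:
            return None
        phrase = FILLERS[self._next % len(FILLERS)]
        self._next += 1
        return phrase


picker = FillerPicker()
print(picker.pick(3.0))   # None: the answer was short, stay quiet
print(picker.pick(8.0))   # "Mm-hmm, got it."
print(picker.pick(12.0))  # "Okay."
```

Rotating through the list rather than picking at random guarantees the candidate never hears the same filler twice in a row.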
06. Confirming numbers and names
This is a smaller issue but it compounds quickly: voice agents are bad at numbers and proper nouns by default, and candidates notice.
Scenarios that break candidate trust:
- The agent repeats back "Fifteen years" when the candidate said "Fifty years"
- The agent mispronounces the candidate's name on the opening
- The agent says "your previous role at Accenture" when the candidate said "Accentra" (a local firm)
- The agent confirms "availability starting January" when the candidate said "availability starting June"
Each of these creates a moment of "wait — did it hear me?" That moment is trust-eroding. If it happens twice, the candidate starts treating the call like a voicemail — they disengage, knowing whatever they say isn't really being processed.
Fixes:
For names: use the application data, not the AI's inference. Pull the candidate's first name from the ATS record and inject it directly into the opening prompt as a template variable. Don't ask the AI to guess the name or infer its pronunciation from spelling.
For numbers: instruct the LLM to repeat figures back explicitly and to ask a clarifying question if it's uncertain: "Just to confirm, you said fifteen years of experience, is that right?" This adds 5 seconds and saves 30 minutes of recruiter confusion later.
For company names and role titles: maintain a glossary of your client's common entities (companies, certifications, locations) and include them in the system prompt. A well-primed model with context makes far fewer recognition errors.
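The number and glossary fixes can be sketched in a few lines. The example below flags the "-teen" versus "-ty" numbers that are easy to mishear and builds a glossary section for the system prompt. The function names and the prompt wording are assumptions for illustration, and the regexes assume an English transcript.

```python
import re
from typing import Optional

# Spoken numbers that are easy to confuse ("fifteen" vs "fifty"), as words or digits.
CONFUSABLE_WORDS = re.compile(
    r"\b(?:thir|four|for|fif|six|seven|eigh|nine)(?:teen|ty)\b", re.IGNORECASE
)
CONFUSABLE_DIGITS = re.compile(r"\b(?:1[3-9]|[3-9]0)\b")


def confirmation_question(heard: str) -> Optional[str]:
    """Return a read-back question if the phrase contains a confusable number."""
    if CONFUSABLE_WORDS.search(heard) or CONFUSABLE_DIGITS.search(heard):
        phrase = heard.strip().rstrip(".")
        return f"Just to confirm, you said {phrase}, is that right?"
    return None


def glossary_prompt(entities: list) -> str:
    """Build a system-prompt section listing names the model should expect."""
    lines = "\n".join(f"- {name}" for name in entities)
    return (
        "Names you may hear on this call. When a word sounds similar, "
        "prefer these exact spellings:\n" + lines
    )


print(confirmation_question("fifteen years of experience"))
print(glossary_prompt(["Accentra", "ServSafe"]))
```

Limiting the read-back to confusable numbers keeps the call moving; confirming every figure would make the agent sound like it is not listening at all.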
07. Closing without an ask
Most voice agents die a quiet death at the end. The questions are done, so the agent says something like "Thanks for your time today, we'll be in touch" and hangs up. The candidate has no idea what happens next. They feel like they just talked to a wall.
A bad close destroys the goodwill you built in the previous seven minutes. A good close extends the relationship.
Before:
"Great, thank you for your responses today. We'll review your application and be in touch shortly. Have a great day!"
After:
"That's everything I need — thanks for taking the time, [First Name]. Our recruiting team will review your responses and reach out within [timeframe] about next steps. Is there anything you'd want them to know that we didn't cover? [pause] And is the best way to reach you still this number?"
Three things are happening there. You're setting a concrete expectation (timeframe for follow-up). You're giving the candidate an open field, and some of the best additional context you'll get comes from this question. And you're confirming contact preference, which is practical data that improves recruiter follow-up rates.
On the backend, make sure the close triggers an immediate SMS to the candidate with the "next steps" summary. Not tomorrow, not when the recruiter logs in. Immediately. That SMS is the digital handshake that makes the call feel real, not a bot interaction in a void.
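A minimal sketch of that hook, assuming your platform can call a function when the call ends: send_sms is a stand-in for whatever SMS provider you use, and the candidate fields and message wording are illustrative.

```python
from typing import Callable


def next_steps_sms(first_name: str, company: str, timeframe: str) -> str:
    """Compose the follow-up text that mirrors what the agent said on the call."""
    return (
        f"Hi {first_name}, thanks for speaking with {company} today. "
        f"Our recruiting team will review your responses and reach out "
        f"within {timeframe} about next steps."
    )


def on_call_ended(
    candidate: dict,
    company: str,
    timeframe: str,
    send_sms: Callable[[str, str], None],
) -> None:
    """Send the summary during call teardown, not in a later batch job."""
    message = next_steps_sms(candidate["first_name"], company, timeframe)
    send_sms(candidate["phone"], message)


# Example with a fake sender that only records what would be sent.
outbox = []
on_call_ended(
    {"first_name": "Sam", "phone": "+15555550100"},
    company="CannaZip",
    timeframe="two business days",
    send_sms=lambda to, body: outbox.append((to, body)),
)
print(outbox[0][1])
```

Passing send_sms in as a parameter keeps the hook testable and makes it easy to swap providers without touching the call logic.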
Voice AI for hiring is still early enough that most of your competitors are making at least three of these seven mistakes. Fix all seven and your candidate experience will be noticeably better than what they're getting from anyone else.
That's the bar. Not perfection — just better than silence.
Want to hear a well-tuned voice agent in action? Call our demo number: +1 716 333 7560. You'll experience Alia live and can debrief with our team on the configuration afterward.

