
Why Erlang C Overstaffs Your Chat Queue by 15-25% (And What to Use Instead)

Erlang C was built for 1917 telephone exchanges and fails in modern chat queues: it ignores concurrency, abandonment, and blended-agent AHT inflation, producing a consistent 15-25% overstaff.

Erlang C was written in 1917 for telephone exchanges. It assumes agents handle one contact at a time, customers wait forever without abandoning, and arrivals follow a tidy Poisson distribution. You're using it to staff a chat queue where your agents are juggling three conversations simultaneously, your abandonment rate is somewhere between 8 and 15%, and arrivals spike in ways that have nothing to do with the time of day.

That's why your model looks right and your service levels still miss. Or worse: your model looks right, you hit a great service level, and the formula was overstating your requirement the whole time.

This isn't a calibration problem. It's a model mismatch. The fix isn't tweaking your inputs; it's correcting the structural assumptions that no longer hold.

---

What Erlang C Actually Assumes (Most People Have Never Read the Original)

A.K. Erlang published "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges" in 1917, building on his 1909 work on telephone traffic. He was solving a real problem: how many telephone operators does a Copenhagen exchange need to ensure callers don't wait too long? The assumptions he built into the model were appropriate for that problem. He stated them explicitly.

The formula assumes:

  • One agent handles one contact at a time (Erlang C models multiple servers, but each server works a single contact)
  • Callers never abandon; they wait indefinitely
  • Arrivals follow a Poisson process
  • Service times are exponentially distributed

These were reasonable assumptions for a telephone exchange in 1917. They are not reasonable for a modern contact center chat queue.

Most WFM practitioners never go back to the original paper. They encounter Erlang C through their WFM platform, where it's the default staffing formula in Verint, NICE, Calabrio, and most others. The platforms ship it as the default because it's computationally cheap and fast. What they often don't document clearly is what the formula cannot model.

The single biggest structural problem for chat is the first assumption. Erlang C models one agent, one contact. Chat agents handle two, three, sometimes four simultaneous sessions. The common workaround is to divide agent count by a flat concurrency ratio: if agents handle three chats each, divide your Erlang C agent output by three. This sounds reasonable. It's not a solution. It's a patch that introduces its own distortions.

---

The Three Ways Chat Breaks the Formula

Concurrency is dynamic, not flat. A "3 chats per agent" ratio is an average. At your Monday morning peak, agents handle two chats because handle times extend, context-switching increases, and response quality requires more deliberate effort. During shoulder periods, the same agents comfortably run four. Applying a flat ratio to your Erlang C output produces the wrong number at nearly every interval. The formula overstates headcount during shoulders (where the ratio should be higher) and understates it at peak (where the ratio compresses). The net effect across the day is consistent overstaff.

AHT inflation in blended environments. ContactBabel's research into omnichannel operations documented this repeatedly: when agents handle chat alongside voice, average chat handle time climbs 20-40% compared to agents who only work chat. Context-switching is the mechanism. An agent who puts a voice call on hold to respond to a chat, then returns to the call, loses cognitive thread in both conversations. Response quality drops, agents write longer messages to compensate, and resolution takes more turns. If your AHT model was built on data from a period when chat was handled by dedicated agents, or from a pilot before blending was introduced, every staffing calculation downstream is inflated from the wrong baseline.

Abandonment erasure. Chat abandonment rates in contact centers typically run between 8 and 15%. Some queues hit 20% during sustained peaks. Erlang C treats all of them as persistent demand. Every customer who abandons after 45 seconds stops consuming agent time, but the formula still counts them as a contact requiring full resolution. In a queue receiving 500 chat initiations per day, you're carrying 40 to 75 phantom contacts as staffing demand. That's real headcount being justified by people who already left.
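The phantom-demand range is simple arithmetic on the figures above (500 initiations per day, 8-15% abandonment):

```python
# Phantom demand carried as staffing load, using the article's own figures.
daily_initiations = 500

for abandon_rate in (0.08, 0.15):
    phantom = daily_initiations * abandon_rate
    print(f"{abandon_rate:.0%} abandonment -> {phantom:.0f} phantom contacts/day")
# 8% abandonment -> 40 phantom contacts/day
# 15% abandonment -> 75 phantom contacts/day
```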

Together, these three distortions compound. A team applying Erlang C with a flat concurrency ratio, inflated AHT, and no abandonment adjustment can be running 15-25% more staff than the queue actually requires during the periods where the overstaff matters most.
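As a back-of-envelope illustration of the compounding (the individual inflation factors below are assumptions for the sketch, not measurements, and Erlang C's non-linearity means real compounding isn't exactly multiplicative), three modest distortions land squarely in that range:

```python
# Illustrative inflation factors, one per distortion (assumed values):
aht_inflation    = 1.08  # stale pre-blending AHT baseline
phantom_demand   = 1.08  # abandons counted as persistent demand
flat_ratio_error = 1.05  # flat concurrency ratio vs interval reality

combined = aht_inflation * phantom_demand * flat_ratio_error
print(f"combined overstaff: {combined - 1:.0%}")  # combined overstaff: 22%
```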

---

How the Overstaff Actually Plays Out in Practice

The operational symptoms aren't always obvious, which is part of why this persists.

The model says you need 18 agents to hit an 80/20 service level. You schedule 18. You hit 95/20. Leadership reads this as the model being conservative. Over time, headcount pressure means cuts get made against the "buffer." But because the model never described the queue correctly, nobody knows where the true floor sits. Cuts overshoot it, the actual queue dynamics haven't changed, and service level drops.

Overstaffing peaks creates a secondary problem: understaffed shoulders. Total FTE budget gets consumed at peak intervals, so coverage at 10 AM and 4 PM becomes thin. Those shoulder periods often carry the highest-complexity contacts. Customers who called during the peak but abandoned, customers with billing disputes or account escalations, contacts that require actual thinking rather than quick resolution. The contacts where resolution quality matters most get handled by the smallest, most fatigued coverage window. Optimizing for peak SLA can quietly degrade overall efficiency and quality across the operating day.

Channel shrinkage applied uniformly compounds the problem further. Most teams apply a single shrinkage percentage across all channels, typically somewhere between 25 and 35%, derived from voice-era benchmarks. But chat agents in concurrency don't have unproductive dead time between contacts. The gap between turns in one chat session is productive time handling another chat. Applying 30% voice shrinkage to concurrent chat adds another layer of phantom headcount requirement that has nothing to do with how the agents actually spend their time.

---

Interval-Weighted Concurrency Modeling: The Practical Fix

The fix doesn't require replacing your WFM platform. Most teams can implement this as an overlay in Python or Excel that feeds into their existing scheduling engine.

The core idea: build a concurrency lookup table by 15-minute interval rather than applying a flat ratio across the day.

Pull your chat data by 15-minute interval for a representative period (eight to twelve weeks of stable operations, excluding anomalous weeks). For each interval, calculate:

  • Actual chat initiations
  • Average handle time for that interval
  • Abandonment rate
  • Observed concurrent sessions per active agent

You'll see the pattern clearly: concurrency compresses at peak (typically dropping to 2.0-2.5) and expands during shoulders (reaching 3.5-4.0 in low-volume periods). Map these as your concurrency multiplier by interval.
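Here's a minimal sketch of the lookup-table build (only the concurrency fields are shown; interval labels, agent counts, and session counts are hypothetical sample data standing in for your platform's interval export):

```python
from collections import defaultdict

# Hypothetical interval records: (interval, active agents, concurrent sessions).
records = [
    ("09:00", 10, 25), ("09:00", 11, 26),   # shoulder interval, two sample days
    ("11:00", 14, 30), ("11:00", 15, 31),   # peak: concurrency compresses
    ("15:30", 8, 31),  ("15:30", 8, 30),    # low volume: concurrency expands
]

agents = defaultdict(int)
sessions = defaultdict(int)
for interval, n_agents, n_sessions in records:
    agents[interval] += n_agents
    sessions[interval] += n_sessions

# Concurrency multiplier per interval: observed sessions per active agent.
concurrency = {iv: sessions[iv] / agents[iv] for iv in agents}
for iv in sorted(concurrency):
    print(f"{iv}: {concurrency[iv]:.2f} concurrent sessions per agent")
```

Even with toy data the shape appears: the peak interval compresses toward 2, the low-volume interval stretches toward 4.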

When you run your staffing calculation, apply the interval-specific multiplier rather than a flat assumption. Erlang C still runs underneath, but now its output is being adjusted by a concurrency factor that reflects actual queue behavior rather than a round number someone chose at implementation.
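A sketch of that adjusted calculation, assuming a standard Erlang C service-level iteration underneath (the traffic figure, AHT, and interval multipliers below are illustrative, not benchmarks):

```python
import math

def erlang_c_agents(arrival_rate, aht_sec, target_sl, target_sec):
    """Smallest agent count meeting the service-level target under Erlang C."""
    traffic = arrival_rate * aht_sec  # offered load in Erlangs
    n = max(1, math.ceil(traffic))
    while True:
        # Erlang B via the standard inverse recursion, then convert to Erlang C.
        inv_b = 1.0
        for k in range(1, n + 1):
            inv_b = 1.0 + inv_b * k / traffic
        erlang_b = 1.0 / inv_b
        p_wait = erlang_b / (1.0 - (traffic / n) * (1.0 - erlang_b))
        # Probability a contact is answered within the target wait time.
        sl = 1.0 - p_wait * math.exp(-(n - traffic) * target_sec / aht_sec)
        if sl >= target_sl:
            return n
        n += 1

# Illustrative interval multipliers from an observed concurrency lookup table.
concurrency_by_interval = {"09:00": 3.6, "11:00": 2.2, "15:30": 3.9}

arrival_rate = 80 / 900.0  # 80 chats per 15-minute interval, per second
single_session = erlang_c_agents(arrival_rate, aht_sec=300,
                                 target_sl=0.80, target_sec=20)

# Divide the single-session requirement by each interval's multiplier.
for interval, c in concurrency_by_interval.items():
    print(f"{interval}: {math.ceil(single_session / c)} agents")
```

The same Erlang C engine runs underneath; only the final division changes per interval.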

For AHT, recalculate separately for blended and chat-only agent populations if you run both. If you've recently introduced blending, treat the pre-blending AHT data as outdated for staffing purposes until you have 6-8 weeks of post-blending data to work from.

For abandonment, pull your actual abandonment rate by interval and subtract those contacts from the demand figure before the staffing calculation runs. This is straightforward: if you receive 80 chat initiations in an interval and 10% abandon within 30 seconds, you're solving for 72 contacts, not 80.
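That adjustment is a one-line transform ahead of the staffing calculation, using the worked example's figures:

```python
def abandonment_adjusted(initiations, abandon_rate):
    """Demand net of contacts that abandon before requiring service."""
    return initiations * (1 - abandon_rate)

print(abandonment_adjusted(80, 0.10))  # 72.0
```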

Teams with enough volume to justify a more complete solution should consider discrete-event simulation. DES handles concurrency, abandonment, priority routing, and non-Poisson arrivals correctly, which is everything Erlang C gets wrong in a chat environment. The trade-off is data requirements and expertise. For most operations under 100 agents, the interval-weighted concurrency model captures 80% of the accuracy benefit without the infrastructure overhead. We've covered the math behind channel-specific concurrency and shrinkage modeling in more detail elsewhere if you want to go deeper on the calculation approach.

---

Channel-Specific Shrinkage: The Quick Win Most Teams Skip

This one requires no new tooling. It's a change to your spreadsheet model.

Published shrinkage benchmarks were built for voice queues. Voice agents have genuinely unproductive time: bathroom breaks, post-call wrap, coaching, system lag. Chat agents in concurrency have a different profile. The gap between turns in one chat session is working time in another session. Turn-wait time doesn't belong in your shrinkage calculation.

Practical channel-specific ranges that align with what practitioner teams actually observe: voice runs 28-35%, concurrent chat runs 18-24%, email and async channels run 12-18%. These aren't universal benchmarks to copy directly. Calculate yours from your own attendance records, training schedules, and system downtime data. But if you're currently running a single 30% shrinkage figure across all channels, you can be confident you're overstating your chat and email requirements.

The FTE impact is meaningful. In a medium-sized operation with 50-80 agents across channels, correcting shrinkage by channel typically removes 10-15 FTE equivalents of phantom requirement from async and chat. That's either budget recovered, or capacity that can be reallocated to genuine coverage gaps.

The practical implementation: build a channel-separated shrinkage worksheet. Voice, chat, and async each get their own row with their own shrinkage rate. The output feeds into your channel-separated staffing calculation. If your platform accepts a shrinkage input, use the channel-appropriate figure rather than a blended one.
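A minimal version of that worksheet (the base FTE requirements and channel shrinkage rates are illustrative values drawn from the ranges above, not benchmarks to copy):

```python
# Channel-separated shrinkage applied to a pre-shrinkage FTE requirement,
# compared against a single blended 30% figure. All inputs are illustrative.
channels = {
    "voice": {"base_fte": 30.0, "shrinkage": 0.30},
    "chat":  {"base_fte": 20.0, "shrinkage": 0.21},
    "async": {"base_fte": 15.0, "shrinkage": 0.15},
}
BLENDED = 0.30  # the single voice-era figure many teams apply everywhere

for name, ch in channels.items():
    corrected = ch["base_fte"] / (1 - ch["shrinkage"])
    blended = ch["base_fte"] / (1 - BLENDED)
    print(f"{name}: {corrected:.1f} FTE corrected vs {blended:.1f} blended "
          f"({blended - corrected:+.1f} phantom)")
```

Even on these toy numbers, the blended figure carries several FTE of phantom requirement on chat and async while voice is unchanged.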

---

The Rebound Wave and Other Non-Linear Staffing Surprises

Even with corrected concurrency modeling and channel-specific shrinkage, multi-channel staffing produces nonlinear dynamics that the math alone won't prepare you for.

The rebound wave is the most consistently misunderstood. When a wave of newly available agents simultaneously clears a backed-up queue, handle time spikes across the board (agents working through complex, deferred contacts) and a secondary queue depth peak follows 15-20 minutes later as those extended interactions complete at the same time. Releasing a block of 5 agents at once to address an emerging peak creates a surge-and-crash pattern rather than stable recovery. Staggering activation in 10-minute intervals produces a smoother queue clearance and avoids the secondary spike.

Counter-intuitive but documented by operations teams that have tested it: hiding chat or callback options during declared peak windows can improve total throughput and customer satisfaction. When agents temporarily become voice-only specialists, handle time drops, queue math simplifies, and the overall contact experience improves. It runs against every instinct about customer choice, but the staffing mathematics of a simplified single-channel queue during a 90-minute peak window are genuinely cleaner than blended queue math under pressure. This connects to a broader question about channel blending strategy; the multi-channel peak scheduling playbook covers the tactical tradeoffs in more depth.

The phantom staff gap deserves specific attention. In most WFM systems, agents who are scheduled but unavailable (unplanned coaching session, system outage, informal break) remain counted as available capacity in the staffing model until real-time adherence flags the deviation. RTA typically catches this 5-10 minutes after the fact. During a high-volume peak, that's a meaningful window where your model shows sufficient coverage but your actual answered rate has already dropped.

Running intraday reforecasts at 15-minute intervals rather than 30-minute or hourly intervals catches emergent understaffing 40-60 minutes earlier. The mechanism is simple: more frequent reforecasts incorporate actual arrival data sooner, so the remaining-day staffing gap report updates before the understaffing has compounded. Most WFM platforms support this configuration but don't default to it because of the additional compute load. Tools designed for intraday planning, including Soon's intraday scheduling module, support this kind of interval-level reforecast cadence alongside dynamic coverage requirements by activity type, which removes the need to maintain a separate manual overlay just for 15-minute reforecasting.

---

The underlying issue with Erlang C in chat environments isn't that the formula is bad. It solved the problem it was designed to solve in 1917 and remained useful for single-channel voice queues for decades afterward. The problem is applying it unchanged to an environment its assumptions explicitly don't describe, then wondering why the outputs are consistently off.

The fixes exist. Interval-weighted concurrency modeling, channel-specific shrinkage, and 15-minute intraday reforecasting don't require new platforms or simulation software. They require being willing to look at what the formula actually assumes, checking which assumptions are false in your specific environment, and correcting for exactly that.