TURN Servers, ICE Candidates, and NAT Traversal: How VoiceMeet Connects You

Most people never think about how a WebRTC call actually connects. Here's a plain-English guide to ICE, STUN, TURN, and why VoiceMeet uses Cloudflare relays for every call.

· 12 min read · The VoiceMeet team

You tap the green button. Two seconds later, a stranger's voice fills your ears from a city you've never visited. What you just experienced as a simple tap required your device to locate itself on the internet, punch through at least two layers of network translation, negotiate encryption keys, and stream compressed audio over a path assembled in real time. Most people never wonder how it works. This post explains it.

WebRTC — the browser standard that powers VoiceMeet — was designed to enable direct peer-to-peer communication without routing audio through a central server. That design choice has enormous privacy and latency benefits. It also creates a genuinely hard networking problem, because the internet wasn't built for peers to find each other directly. It was built for clients to reach servers with known, stable addresses.

Why NAT Makes Peer-to-Peer Hard

Network Address Translation, or NAT, is the reason your home router can give ten devices internet access using a single public IP address. When your laptop makes a request, the router rewrites the source address to its public IP, forwards the packet, and remembers the mapping so it can route the response back to the right device. For outbound connections to web servers, this works transparently. For inbound connections from a stranger, it's a problem.

If someone on the other side of the world tries to send your laptop a packet directly, they have no way to reach it. Your public IP belongs to your router, not your laptop. And your router has no mapping for an unsolicited inbound packet — it doesn't know which device to forward it to. The packet is dropped. This is why a naive peer-to-peer call setup fails for most users: both endpoints are hidden behind NATs that neither knows how to reach.
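The mapping behavior described above can be sketched as a toy model. This is an illustration only: the class name, addresses, and port numbers are made up, and a real router tracks far more state (protocol, timeouts, remote endpoint).

```typescript
// Toy model of a home router's NAT table. All names and addresses are illustrative.
class ToyNat {
  private nextPort = 40000;
  private byExternalPort = new Map<number, string>();
  private byInternalSource = new Map<string, number>();
  private publicIp: string;

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Outbound packet: rewrite the source to the public IP and remember the mapping.
  sendOutbound(internalSource: string): string {
    let port = this.byInternalSource.get(internalSource);
    if (port === undefined) {
      port = this.nextPort++;
      this.byInternalSource.set(internalSource, port);
      this.byExternalPort.set(port, internalSource);
    }
    return `${this.publicIp}:${port}`;
  }

  // Inbound packet: forward only if a mapping exists, otherwise drop (null).
  // This toy ignores who sent the packet, which is the most permissive NAT behavior.
  receiveInbound(externalPort: number): string | null {
    const target = this.byExternalPort.get(externalPort);
    return target === undefined ? null : target;
  }
}

const nat = new ToyNat("203.0.113.7");
const unsolicited = nat.receiveInbound(40000); // null: no mapping yet, packet dropped
const publicSource = nat.sendOutbound("192.168.1.20:5000"); // "203.0.113.7:40000"
const reply = nat.receiveInbound(40000); // "192.168.1.20:5000"
```

The inbound packet is dropped until the laptop has sent something out first, which is exactly the asymmetry that makes unassisted peer-to-peer setup fail.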

The situation gets more complex because there are several NAT types (full cone, address-restricted cone, port-restricted cone, and symmetric), each with different rules about which packets get through. Symmetric NAT, common in corporate networks and mobile carriers, is particularly hostile to direct peer connections: the router assigns a different external port for each destination, so the address a STUN server observes is not the one a peer would need to send to, and the peer has almost no way to predict the right one.
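A small sketch makes the difference concrete. It assumes a simplified model in which the only thing that varies is the key the NAT uses for its mapping table; the class and the addresses are hypothetical.

```typescript
// Toy comparison of two NAT mapping behaviors. Names and addresses are illustrative.
type MappingMode = "cone" | "symmetric";

class ToyMapper {
  private nextPort = 50000;
  private table = new Map<string, number>();
  private mode: MappingMode;

  constructor(mode: MappingMode) {
    this.mode = mode;
  }

  // Returns the external port used for a packet from `src` to `dst`.
  externalPort(src: string, dst: string): number {
    // Cone NATs key the mapping on the internal source only;
    // symmetric NATs key it on the (source, destination) pair.
    const key = this.mode === "cone" ? src : `${src}->${dst}`;
    let port = this.table.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.table.set(key, port);
    }
    return port;
  }
}

const laptop = "192.168.1.20:5000";
const stunServer = "198.51.100.1:3478";
const remotePeer = "203.0.113.99:61000";

const cone = new ToyMapper("cone");
const coneSamePort =
  cone.externalPort(laptop, stunServer) === cone.externalPort(laptop, remotePeer); // true

const symmetric = new ToyMapper("symmetric");
const symmetricSamePort =
  symmetric.externalPort(laptop, stunServer) === symmetric.externalPort(laptop, remotePeer); // false
```

With the cone mapper, the port a STUN server sees is the same port a peer can send to. With the symmetric mapper it is not, which is why the STUN-discovered address is of little use there.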

ICE: The Framework That Solves Everything

ICE, or Interactive Connectivity Establishment, is the protocol framework WebRTC uses to figure out the best path between two peers. Rather than trying a single connection method and giving up, ICE gathers a list of candidate connection addresses for each peer, exchanges those lists, pairs each local candidate with each remote one, and tests the pairs in priority order until it finds one that works. It's a structured approach to a problem that would otherwise require guessing.
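The pairing step can be sketched as follows. This is a simplified illustration, not a full ICE agent: it ignores components, address families, and pruning. The ordering compares (min, max, tie-breaker), which gives the same order as the pair-priority formula in RFC 8445 without needing 64-bit arithmetic. The candidate ids are hypothetical; the priority values are the ones the RFC's recommended preferences produce.

```typescript
interface Cand {
  id: string;
  priority: number;
}

interface Pair {
  local: Cand;
  remote: Cand;
}

// Build every local/remote pair and sort them highest priority first.
function orderedPairs(local: Cand[], remote: Cand[], localIsControlling: boolean): Pair[] {
  const pairs: Pair[] = [];
  for (const l of local) {
    for (const r of remote) {
      pairs.push({ local: l, remote: r });
    }
  }
  const key = (p: Pair): [number, number, number] => {
    const g = localIsControlling ? p.local.priority : p.remote.priority; // controlling side
    const d = localIsControlling ? p.remote.priority : p.local.priority; // controlled side
    return [Math.min(g, d), Math.max(g, d), g > d ? 1 : 0];
  };
  return pairs.sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    return kb[0] - ka[0] || kb[1] - ka[1] || kb[2] - ka[2];
  });
}

const localCands: Cand[] = [
  { id: "L-host", priority: 2130706431 },
  { id: "L-relay", priority: 16777215 },
];
const remoteCands: Cand[] = [
  { id: "R-srflx", priority: 1694498815 },
  { id: "R-relay", priority: 16777215 },
];

const checkOrder = orderedPairs(localCands, remoteCands, true).map(
  (p) => `${p.local.id}/${p.remote.id}`,
);
// ["L-host/R-srflx", "L-host/R-relay", "L-relay/R-srflx", "L-relay/R-relay"]
```

Pairs that involve a relay on either side sort toward the end, which is how a default ICE agent ends up preferring direct paths when they exist.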

ICE agents gather three categories of candidates. Host candidates are the device's local network addresses, typically a private IP like 192.168.x.x. These only work when both peers are on the same local network, which is almost never the case for VoiceMeet callers. Server-reflexive candidates are the public IP and port that a STUN server observes when it receives a request from you. Relay candidates are addresses on a TURN server that will forward traffic on your behalf. Each category represents a different strategy for reaching the other peer. (A fourth type, peer-reflexive, is not gathered up front; it can be discovered during connectivity checks.)
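Each candidate travels as a single text line, and the category appears after the `typ` keyword. The parser below is a minimal sketch that reads only the leading fields of that line. The function name and the example addresses are illustrative, and extension attributes such as `raddr` or `generation` are ignored.

```typescript
type CandidateKind = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  kind: CandidateKind;
}

const KNOWN_KINDS: readonly string[] = ["host", "srflx", "prflx", "relay"];

// Parse "candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...".
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith("candidate:") || parts[6] !== "typ") {
    return null;
  }
  if (KNOWN_KINDS.indexOf(parts[7]) === -1) {
    return null;
  }
  return {
    foundation: parts[0].slice("candidate:".length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    kind: parts[7] as CandidateKind,
  };
}

const hostExample = parseCandidate("candidate:1 1 udp 2130706431 192.168.1.20 5000 typ host");
const srflxExample = parseCandidate(
  "candidate:2 1 udp 1694498815 203.0.113.7 46154 typ srflx raddr 192.168.1.20 rport 5000",
);
const relayExample = parseCandidate(
  "candidate:3 1 udp 16777215 198.51.100.5 50000 typ relay raddr 203.0.113.7 rport 46154",
);
```

Note how the server-reflexive line carries the caller's public address in the address field. That field is what the other peer would see if such a candidate were shared.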

Gathering Candidates: The Three Types

The ICE agent runs these candidate-gathering steps in parallel to save time. While it's contacting STUN servers to learn the server-reflexive candidates, it's simultaneously preparing relay candidates on the TURN server. Candidate gathering happens before the call is established, so by the time you hear the other person's voice, all of this work is already done.

STUN Servers: Discovering Your Public Address

A STUN server — Session Traversal Utilities for NAT — has one simple job: tell you what your public IP address and port look like from the outside. When your ICE agent contacts a STUN server, the server inspects the source address of the incoming packet and sends it back in a response. Your device now knows its server-reflexive candidate address. That's it. STUN servers are cheap to operate, stateless, and handle millions of requests without storing anything.
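On the wire, the exchange is small. The sketch below builds the 20-byte header of a STUN Binding request and decodes the IPv4 XOR-MAPPED-ADDRESS value a server would return, following the layout in the STUN specification. The function names are ours, the example address is made up, and a real client would also parse the attribute headers and verify the transaction ID.

```typescript
const MAGIC_COOKIE = 0x2112a442;

// 20-byte STUN header: message type, length, magic cookie, 96-bit transaction ID.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, 0x0001); // Binding request
  view.setUint16(2, 0); // no attributes follow the header
  view.setUint32(4, MAGIC_COOKIE);
  message.set(transactionId, 8);
  return message;
}

// Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
function decodeXorMappedAddressV4(value: Uint8Array): { address: string; port: number } {
  if (value.length !== 8 || value[1] !== 0x01) {
    throw new Error("expected an 8-byte IPv4 address value");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the top 16 bits of the cookie, the address with all 32.
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const octets = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff];
  return { address: octets.join("."), port };
}

const request = buildBindingRequest(new Uint8Array(12).fill(7));

// Encoded form of 203.0.113.7:40000 as a server would send it back.
const mapped = decodeXorMappedAddressV4(
  new Uint8Array([0x00, 0x01, 0xbd, 0x52, 0xea, 0x12, 0xd5, 0x45]),
);
// mapped.address === "203.0.113.7", mapped.port === 40000
```

The decoded address and port together form the server-reflexive candidate.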

STUN is effective when both peers are behind NAT types that accept inbound packets from a remote address once the device has sent an outbound packet to that address. If both sides send a packet to each other's server-reflexive address at the same time, a technique called hole-punching, the outbound packets from each side create NAT mappings that allow the other side's packets in. When hole-punching works, you get a true peer-to-peer connection with minimal latency and no relay cost. However, hole-punching fails in a significant percentage of real-world network configurations, most notably when either peer sits behind a symmetric NAT.
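Hole-punching can be illustrated with a toy model of two port-restricted NATs. This is a sketch under simplifying assumptions: each NAT has a single fixed public endpoint and only remembers which remote endpoints its device has sent to. The class name and addresses are hypothetical.

```typescript
// Toy port-restricted NAT: inbound traffic is accepted only from endpoints
// the inside device has already sent a packet to.
class PortRestrictedNat {
  readonly publicEndpoint: string;
  private contacted = new Set<string>();

  constructor(publicEndpoint: string) {
    this.publicEndpoint = publicEndpoint;
  }

  sendTo(remoteEndpoint: string): void {
    this.contacted.add(remoteEndpoint);
  }

  accepts(remoteEndpoint: string): boolean {
    return this.contacted.has(remoteEndpoint);
  }
}

const natA = new PortRestrictedNat("203.0.113.7:40000");
const natB = new PortRestrictedNat("198.51.100.9:52000");

// Before either side sends anything, both NATs would drop the other's packets.
const reachableBefore =
  natA.accepts(natB.publicEndpoint) || natB.accepts(natA.publicEndpoint); // false

// Each peer sends to the other's server-reflexive address at about the same time.
natA.sendTo(natB.publicEndpoint);
natB.sendTo(natA.publicEndpoint);

const reachableAfter =
  natA.accepts(natB.publicEndpoint) && natB.accepts(natA.publicEndpoint); // true
```

The first packet in each direction may still be dropped if it arrives before the other side has sent, which is why ICE retransmits its connectivity checks.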

TURN Servers: The Reliable Fallback

TURN (Traversal Using Relays around NAT) is what happens when hole-punching fails. A TURN server acts as a relay: both peers connect to it, and all traffic flows through it rather than directly between endpoints. From a connectivity standpoint, TURN works with essentially any NAT type, because each peer only needs an ordinary outbound connection to the relay; on networks that block UDP, TURN can also run over TCP or TLS. You're no longer establishing a peer-to-peer connection. You're establishing two separate connections that meet in the middle at the relay.

TURN servers are significantly more expensive to operate than STUN servers because they must relay every byte of audio in both directions. A STUN server handling a million ICE negotiations uses trivial bandwidth; a TURN server handling the same million calls handles all the voice data for those calls. This is why many WebRTC implementations try STUN first and only fall back to TURN for the minority of calls where direct connection fails. It's a bandwidth and cost optimization.
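A rough back-of-the-envelope calculation shows the scale of the difference. The 32 kbps bitrate and five-minute duration below are assumptions chosen for illustration, not VoiceMeet measurements, and the figures ignore packet header overhead.

```typescript
// Bytes a relay must carry for one two-party audio call (payload only).
function relayedBytes(
  bitrateKbps: number,
  durationSeconds: number,
): { ingress: number; egress: number; total: number } {
  const oneWay = ((bitrateKbps * 1000) / 8) * durationSeconds; // bytes sent by one caller
  const ingress = 2 * oneWay; // both callers send their audio to the relay
  const egress = 2 * oneWay; // the relay forwards each stream to the other caller
  return { ingress, egress, total: ingress + egress };
}

// Assumed example: 32 kbps audio for 5 minutes.
const fiveMinuteCall = relayedBytes(32, 300);
// ingress 2,400,000 bytes, egress 2,400,000 bytes, total 4,800,000 bytes
```

A STUN exchange for the same call is a handful of packets of a few dozen bytes each, so the relay carries several orders of magnitude more data per call.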

TURN is the reason WebRTC works everywhere, not just in ideal network conditions. Without it, a significant percentage of calls would simply never connect.

— WebRTC Infrastructure Working Notes

Why VoiceMeet Routes All Calls Through TURN

Most WebRTC applications treat TURN as a last resort. VoiceMeet does the opposite: we use TURN relay candidates for every single call, even when a direct connection would technically be possible. This decision comes down to one principle that defines everything we build: your IP address can reveal roughly where you are, and the stranger you're talking to should never see it.

If VoiceMeet allowed direct peer-to-peer connections via host or server-reflexive candidates, the other peer's ICE agent would receive your IP address as part of the candidate exchange. With a public IP, someone can look up your approximate city, infer your ISP, and in some cases narrow your location significantly. For an app built on anonymity, allowing that level of information leakage would be a fundamental contradiction of our core promise. With relay-only connections, the other caller sees only the relay's address, never yours.

There is a secondary benefit: network consistency. When all calls go through a TURN relay, connection establishment is faster and more predictable. We don't spend time testing dozens of candidate pairs hoping one works. The relay candidate is offered first and exclusively, connection happens reliably, and you're talking within two to three seconds of matching. The privacy guarantee and the user experience improvement point in the same direction.
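In the browser API, a relay-only policy is expressed with the `iceTransportPolicy: "relay"` option of the `RTCPeerConnection` configuration. The sketch below shows one way to build such a configuration. It defines its own minimal types so it runs outside a browser, and the TURN URL, username, and credential are placeholders rather than VoiceMeet's real values. The candidate filter is an optional extra safeguard we are assuming here, not something the source describes.

```typescript
// Minimal local types so the sketch type-checks outside a browser.
interface TurnCredentials {
  urls: string[];
  username: string;
  credential: string;
}

interface RelayOnlyConfig {
  iceServers: TurnCredentials[];
  iceTransportPolicy: "relay";
}

// With iceTransportPolicy "relay", the ICE agent only uses relay candidates,
// so host and server-reflexive addresses are never offered to the other peer.
function buildRelayOnlyConfig(creds: TurnCredentials): RelayOnlyConfig {
  return { iceServers: [creds], iceTransportPolicy: "relay" };
}

// Optional defense in depth: refuse to signal anything that is not a relay candidate.
function isRelayCandidate(candidateLine: string): boolean {
  return /\btyp relay\b/.test(candidateLine);
}

// Placeholder values for illustration only.
const relayConfig = buildRelayOnlyConfig({
  urls: ["turn:turn.example.com:3478?transport=udp"],
  username: "short-lived-user",
  credential: "short-lived-secret",
});

// In a browser: const pc = new RTCPeerConnection(relayConfig);
```

Because the policy is enforced by the ICE agent itself, the application does not have to trust its own filtering to keep addresses private.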

Cloudflare's TURN Infrastructure

VoiceMeet uses Cloudflare's TURN relay service, specifically their Calls API, which provides access to Cloudflare's global network of edge nodes for WebRTC relay. Cloudflare operates infrastructure in over 300 cities worldwide, which means the edge node where your packets enter the relay network is usually only a few milliseconds away from you, and the same holds for your call partner. Proximity to the relay is a major factor in call latency when traffic is being relayed.

Cloudflare's network also provides security properties beyond simple relay. Traffic between your device and the Cloudflare edge is encrypted, the relay infrastructure is protected by DDoS mitigation systems, and the global Anycast routing ensures you connect to the nearest edge node automatically without any configuration. For an application where users may be connecting from anywhere in the world — including regions with degraded or throttled internet infrastructure — this global distribution is important.

The ICE Candidate Exchange in VoiceMeet

ICE candidates need to be communicated between peers through a signaling channel — WebRTC itself doesn't define how this happens; it only defines what to exchange. VoiceMeet uses Supabase Realtime as its signaling layer. When two users are matched, they're assigned a shared Supabase channel. ICE candidates discovered by each peer's ICE agent are serialized to JSON and published to this channel as they arrive, a pattern called Trickle ICE.

Trickle ICE is important because candidate gathering takes time. Waiting for all candidates to be gathered before sending any of them would add noticeable delay to call setup. With Trickle ICE, candidates are sent as soon as they're discovered. The other peer's ICE agent begins testing pairs immediately, often completing connectivity checks before all candidates have even been gathered. The practical effect is a faster call connection with no penalty for sending a candidate that turns out not to be the best one.
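The trickle pattern can be sketched independently of the transport. In the example below, `SignalingChannel` is a hypothetical stand-in for whatever wraps the shared Supabase Realtime channel, and the message shape (`kind`, `from`, `candidate`) is an assumption for illustration, not VoiceMeet's actual wire format.

```typescript
// Same shape as the object returned by RTCIceCandidate.toJSON() in a browser.
interface CandidateInit {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
}

// Hypothetical wrapper around the shared signaling channel.
interface SignalingChannel {
  send(message: string): void;
}

interface SignalMessage {
  kind: "ice-candidate";
  from: string;
  candidate: CandidateInit;
}

// Publish one candidate as soon as the ICE agent reports it.
function publishCandidate(channel: SignalingChannel, selfId: string, candidate: CandidateInit): void {
  const message: SignalMessage = { kind: "ice-candidate", from: selfId, candidate };
  channel.send(JSON.stringify(message));
}

// Decode an incoming message, ignoring malformed payloads and our own echoes.
function readCandidate(raw: string, selfId: string): CandidateInit | null {
  let message: Partial<SignalMessage> | null;
  try {
    message = JSON.parse(raw) as Partial<SignalMessage> | null;
  } catch {
    return null;
  }
  if (!message || message.kind !== "ice-candidate" || message.from === selfId || !message.candidate) {
    return null;
  }
  return message.candidate;
}

// In-memory channel for demonstration.
const outbox: string[] = [];
const demoChannel: SignalingChannel = {
  send: (m) => {
    outbox.push(m);
  },
};

publishCandidate(demoChannel, "alice", {
  candidate: "candidate:3 1 udp 16777215 198.51.100.5 50000 typ relay",
  sdpMid: "0",
  sdpMLineIndex: 0,
});

const seenByBob = readCandidate(outbox[0], "bob"); // the candidate object
const seenByAlice = readCandidate(outbox[0], "alice"); // null, own message ignored
```

In a browser, `publishCandidate` would typically be called from the peer connection's `icecandidate` event handler whenever `event.candidate` is non-null.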

A critical implementation detail is ICE candidate buffering. There's a race condition in Trickle ICE: candidates from the remote peer can arrive via the signaling channel before the local RTCPeerConnection has received the remote session description (SDP) and is ready to accept them. VoiceMeet's client code maintains a buffer of received candidates and adds them to the peer connection once the remote description is set. Without this buffer, addIceCandidate rejects candidates that arrive during the setup window, and if that error goes unhandled they are lost, which can degrade connection reliability.
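A minimal sketch of such a buffer is shown below. It is not VoiceMeet's actual client code: `CandidateBuffer` and `PeerLike` are illustrative names, and `PeerLike` declares only the two `RTCPeerConnection` methods the buffer needs, so the sketch can run outside a browser.

```typescript
interface CandidateInit {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
}

// The subset of RTCPeerConnection this sketch relies on.
interface PeerLike {
  setRemoteDescription(description: { type: string; sdp: string }): Promise<void>;
  addIceCandidate(candidate: CandidateInit): Promise<void>;
}

class CandidateBuffer {
  private pending: CandidateInit[] = [];
  private remoteReady = false;
  private pc: PeerLike;

  constructor(pc: PeerLike) {
    this.pc = pc;
  }

  // Called for every candidate that arrives over the signaling channel.
  async onRemoteCandidate(candidate: CandidateInit): Promise<void> {
    if (!this.remoteReady) {
      this.pending.push(candidate); // too early: hold it until the SDP is applied
      return;
    }
    await this.pc.addIceCandidate(candidate);
  }

  // Called once with the remote offer or answer; flushes anything buffered.
  async onRemoteDescription(description: { type: string; sdp: string }): Promise<void> {
    await this.pc.setRemoteDescription(description);
    this.remoteReady = true;
    const queued = this.pending;
    this.pending = [];
    for (const candidate of queued) {
      await this.pc.addIceCandidate(candidate);
    }
  }
}
```

Candidates that arrive while `setRemoteDescription` is still in flight are also buffered, because the ready flag flips only after that promise resolves.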

Failover and Resilience Design

TURN servers, like any infrastructure, can be slow or temporarily unavailable. VoiceMeet's connection architecture handles this through a combination of credential prefetching and connection timeout monitoring. TURN credentials are fetched from our backend before the user enters the matchmaking queue, so credential retrieval latency doesn't add to call setup time. If the credential fetch fails, the user sees an error before being matched rather than experiencing a confusing failed call.
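The prefetch step might look like the sketch below. The fetcher and queue functions are injected and hypothetical, and the five-second timeout is an assumed value. The point is only the ordering: credentials first, queue second, error surfaced early.

```typescript
interface TurnCredentials {
  urls: string[];
  username: string;
  credential: string;
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Fetch credentials before joining the queue; report failure before any match happens.
async function enterQueue(
  fetchCredentials: () => Promise<TurnCredentials>, // hypothetical backend call
  joinQueue: (credentials: TurnCredentials) => void, // hypothetical matchmaking entry point
  timeoutMs = 5000, // assumed value
): Promise<"queued" | "error"> {
  try {
    const credentials = await withTimeout(fetchCredentials(), timeoutMs);
    joinQueue(credentials);
    return "queued";
  } catch {
    return "error"; // the caller shows an error instead of matching the user
  }
}
```

Because `joinQueue` is only reached after the credentials resolve, a user can never be matched into a call that has no relay to connect through.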

Once a call is in progress, the ICE agent continuously monitors connection health using periodic STUN binding requests over the established path. If the relay path degrades — for example, if a Cloudflare edge node becomes temporarily unreachable — WebRTC's ICE restart mechanism can renegotiate a new set of candidates without ending the call. The user may hear a brief interruption, but the call itself doesn't drop. This is the same mechanism that allows WebRTC calls to survive network switches, such as moving from WiFi to cellular.
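One way to wire up ICE restarts is sketched below. `restartIce()` and `iceConnectionState` are real `RTCPeerConnection` members; the handler name, the retry limit of three, and the give-up callback are assumptions made for illustration. `RestartablePeer` declares only what the handler uses, so the sketch runs outside a browser.

```typescript
// The subset of RTCPeerConnection this sketch relies on.
interface RestartablePeer {
  iceConnectionState: string;
  restartIce(): void;
}

interface RestartState {
  restarts: number;
}

// Call this from the peer connection's "iceconnectionstatechange" event.
function handleIceStateChange(
  pc: RestartablePeer,
  state: RestartState,
  onGiveUp: () => void,
  maxRestarts = 3, // assumed limit
): void {
  const current = pc.iceConnectionState;
  if (current === "connected" || current === "completed") {
    state.restarts = 0; // healthy again, reset the budget
    return;
  }
  if (current !== "failed") {
    return; // "disconnected" often recovers on its own, so wait
  }
  if (state.restarts >= maxRestarts) {
    onGiveUp();
    return;
  }
  state.restarts += 1;
  // Marks the connection for an ICE restart; the browser then fires
  // "negotiationneeded" and the app sends a fresh offer over signaling.
  pc.restartIce();
}
```

The restart only schedules renegotiation. The new offer and answer still travel over the same signaling channel used for the original call setup.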

For most users, all of this infrastructure operates silently in the background. The complexity of NAT traversal, ICE negotiation, TURN relay selection, and candidate buffering collapses into a single user experience: you tap a button, and a few seconds later you're talking to someone. The engineering underneath that simplicity is what this post has tried to make visible — not because you need to understand it to use VoiceMeet, but because understanding it reveals why privacy-by-default in WebRTC requires intentional infrastructure choices, not just good intentions.

Anonymity on the internet isn't automatic. Every layer of the networking stack leaks information by default. Building true privacy means examining each layer and deciding what to expose and what to protect.

— VoiceMeet Engineering

The next time you tap the green button and a stranger's voice comes through, you'll know what happened in those two seconds: your ICE agent gathered relay candidates on a Cloudflare edge node near you, exchanged them with the other peer over Supabase Realtime, and established an encrypted SRTP audio stream through the relay — all without either of you ever learning the other's real IP address. That's the infrastructure behind the conversation.

#webrtc #turn-servers #networking #technical