WebRTC Security Deep Dive: How VoiceMeet Encrypts Every Call
A technical but accessible guide to WebRTC security — covering DTLS, SRTP, ICE, STUN, TURN, and exactly how VoiceMeet's architecture protects your voice data.
12 min read · The VoiceMeet team
Every time you pick up the phone you make a silent assumption: that the conversation belongs to the two of you. Most internet calling software asks you to take that assumption on faith. WebRTC, the open standard powering VoiceMeet, doesn't ask for faith. It builds cryptographic protection directly into the protocol, so your voice data is encrypted before the first syllable leaves your device.
Security built in from day one is a fundamentally different animal from security bolted on afterward. WebRTC was designed by the IETF and W3C with the explicit requirement that media must be encrypted, not as an option but as a baseline. There is no version of a standards-compliant WebRTC implementation that ships voice in the clear. That design decision shapes everything downstream.
What WebRTC Is — and Why Security Comes First
WebRTC stands for Web Real-Time Communication. It is a collection of browser APIs and network protocols that let two peers exchange audio, video, and data directly — ideally without any server touching the media stream. The browser vendors — Google, Mozilla, Apple — all ship native implementations, which means the crypto primitives are audited, battle-tested, and updated with browser releases rather than locked inside a proprietary black box.
The IETF's RFC 8827 mandates that all WebRTC implementations use DTLS for key negotiation and SRTP for media encryption, a combination known as DTLS-SRTP. These are not suggestions. A browser that skipped them would not be standards-compliant and could not interoperate with other WebRTC endpoints, which refuse to set up unencrypted media. This mandatory-crypto model is why WebRTC is structurally more secure than many legacy VoIP systems, which often treat encryption as an enterprise add-on rather than a baseline requirement.
For VoiceMeet specifically, this means we inherit a security foundation that the world's largest browser vendors maintain. We don't write our own crypto. We don't negotiate our own ciphers. We use the same primitives that secure HTTPS traffic for billions of people every day. Our job is to configure that foundation correctly and layer our own operational choices on top of it.
The DTLS Handshake: Authenticating Without a Central Authority
DTLS — Datagram Transport Layer Security — is the same handshake logic as TLS, adapted for UDP packets. When two VoiceMeet peers connect, they run a DTLS handshake before any audio flows. Each peer generates a self-signed certificate and sends a fingerprint of that certificate in the SDP offer or answer — the negotiation message that travels through VoiceMeet's signaling layer.
When the actual DTLS handshake begins over the media path, each side verifies that the certificate presented by the remote peer matches the fingerprint it received through signaling. If the fingerprints don't match, the connection is dropped immediately. This is what prevents a man-in-the-middle attack even when the media is being relayed through a TURN server: the TURN server sees encrypted packets it cannot decrypt, and any attempt to substitute a different certificate would fail the fingerprint check.
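The fingerprint check itself is small enough to sketch in Python. This is an illustration only, not VoiceMeet's code: browsers perform this check internally during the DTLS handshake. It assumes the peer's certificate is available as DER bytes and formats the SHA-256 digest as colon-separated uppercase hex, the form SDP uses; the helper names are hypothetical.

```python
import hashlib
import hmac


def sdp_fingerprint(der_cert: bytes) -> str:
    """SHA-256 digest of a DER certificate, formatted as SDP carries it:
    uppercase hex byte pairs separated by colons."""
    digest = hashlib.sha256(der_cert).digest()
    return ":".join(f"{b:02X}" for b in digest)


def fingerprint_matches(signaled: str, der_cert: bytes) -> bool:
    """True only if the certificate presented in the DTLS handshake hashes
    to the fingerprint the peer committed to through signaling."""
    # Normalise letter case, then compare in constant time.
    return hmac.compare_digest(sdp_fingerprint(der_cert), signaled.strip().upper())


# Placeholder bytes stand in for real DER-encoded X.509 certificates.
genuine = b"peer-certificate-der"
signaled = sdp_fingerprint(genuine)
print(fingerprint_matches(signaled, genuine))                # True
print(fingerprint_matches(signaled, b"attacker-cert-der"))   # False
```

A substituted certificate produces a different digest, so the comparison fails and the connection is torn down before any audio flows.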
Because the certificates are self-signed and ephemeral, there is no central certificate authority that could be compromised to forge them. The trust anchor is the fingerprint in the signaling message, which means security depends on the integrity of the signaling channel — a point we return to shortly. Each call session generates fresh certificates, so there is no long-lived credential that accumulates attack surface over time.
The fingerprint in the SDP is your handshake contract. If the certificate doesn't match, the connection dies. That's not a feature — it's the foundation.
SRTP: Why Your Voice Stays Encrypted Through Every Hop
Once the DTLS handshake completes, both peers export shared keying material from it. That material supplies the master keys and salts for SRTP, the Secure Real-time Transport Protocol, which encrypts every voice packet before it leaves the device. The negotiated protection profile typically uses AES with a 128-bit key, either in counter mode with an HMAC-SHA1 authentication tag or in Galois/Counter Mode (GCM), depending on what both browsers support. The authentication tag on each packet detects tampering; if a relay modified even a single bit of the protected packet, the receiving peer would discard it.
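To make the key handoff concrete, here is a sketch of how exported DTLS keying material is split into SRTP master keys and salts, following the layout RFC 5764 defines: client key, server key, client salt, server salt. The byte lengths are those of the named protection profiles. In a real stack the input comes from the DTLS exporter (label "EXTRACTOR-dtls_srtp"), which this sketch does not implement; the function and type names are hypothetical.

```python
from typing import NamedTuple

# (master key bytes, master salt bytes) for common DTLS-SRTP protection profiles.
PROFILE_LENGTHS = {
    "SRTP_AES128_CM_HMAC_SHA1_80": (16, 14),
    "SRTP_AEAD_AES_128_GCM": (16, 12),
    "SRTP_AEAD_AES_256_GCM": (32, 12),
}


class SrtpMasterKeys(NamedTuple):
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def split_keying_material(material: bytes, profile: str) -> SrtpMasterKeys:
    """Split exported DTLS keying material in RFC 5764 order:
    client key, server key, client salt, server salt."""
    key_len, salt_len = PROFILE_LENGTHS[profile]
    if len(material) != 2 * (key_len + salt_len):
        raise ValueError(f"{profile} needs {2 * (key_len + salt_len)} bytes of keying material")
    salts_start = 2 * key_len
    return SrtpMasterKeys(
        client_key=material[:key_len],
        server_key=material[key_len:salts_start],
        client_salt=material[salts_start:salts_start + salt_len],
        server_salt=material[salts_start + salt_len:],
    )


# Placeholder material; a real value comes from the DTLS exporter.
keys = split_keying_material(bytes(range(60)), "SRTP_AES128_CM_HMAC_SHA1_80")
print(len(keys.client_key), len(keys.client_salt))  # 16 14
```

The peer acting as DTLS client encrypts its outgoing media with the client key and salt, and the DTLS server uses the other pair, so each direction of the call has its own keys.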
The critical implication is that SRTP encryption happens at the endpoint, not at the relay. When VoiceMeet routes a call through a TURN server, which we do by default for reasons explained below, the TURN server handles encrypted UDP datagrams it holds no key to decrypt. It forwards bytes. It never sees audio. The call is end-to-end encrypted by construction, not by policy.
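SRTP also protects against replay attacks. Each packet carries a sequence number that is covered by the authentication tag, and the receiver keeps a sliding window of the packet indices it has already accepted. If an attacker captured a burst of packets and replayed them, the receiver would recognize the duplicates and discard them. This matters even for voice: without it, packets captured earlier in the same call could be re-injected to repeat convincing audio fragments out of context. Packets from a previous call are useless to an attacker, because each call uses fresh keys and those packets would fail authentication.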
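A minimal sketch of the receiver-side replay check, modeled on the sliding-window approach described in RFC 3711. It assumes the full packet index (rollover counter times 65536 plus the 16-bit sequence number) has already been computed and that the packet has passed authentication; the class name and the 64-packet default are illustrative choices.

```python
class ReplayWindow:
    """Receiver-side replay protection using a sliding bitmap window.

    Indices are full SRTP packet indices: rollover_counter * 65536 + sequence
    number. Only feed it packets that already passed authentication, so
    forged packets cannot move the window.
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1   # highest index accepted so far
        self.bitmap = 0     # bit i set means index (highest - i) was accepted

    def accept(self, index: int) -> bool:
        """Return True for a fresh packet, False for a replay or a stale one."""
        if index > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = index - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False    # older than the window can track: reject
        if self.bitmap & (1 << offset):
            return False    # already accepted once: replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
print([window.accept(i) for i in (10, 11, 9, 11, 10)])
# [True, True, True, False, False]
```

Late but unseen packets (index 9 above) are still accepted, which matters for voice over UDP where reordering is common; only exact repeats and packets older than the window are dropped.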
ICE, STUN, and TURN: The Path Your Packets Actually Take
WebRTC's Interactive Connectivity Establishment protocol — ICE — solves a practical problem: most devices sit behind NAT routers with no public IP address. Before two peers can exchange media, they must discover a network path that actually works. ICE does this by gathering a list of candidate addresses, testing each one, and selecting the best viable path.
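How does ICE decide which path is "best"? RFC 8445 gives each candidate a numeric priority derived from its type, and each candidate pair a priority derived from both sides' candidates. The sketch below uses the RFC's recommended type preferences (host 126, peer-reflexive 110, server-reflexive 100, relay 0) and assumes a single local interface (local preference 65535); the function names are hypothetical.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


def pair_priority(controlling: int, controlled: int) -> int:
    """pair = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0),
    where G and D are the controlling and controlled agents' candidate priorities."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


for kind in ("host", "srflx", "relay"):
    print(kind, candidate_priority(kind))
# host 2130706431
# srflx 1694498815
# relay 16777215
```

Because relay candidates score lowest, unrestricted ICE tries direct paths first and falls back to a relay only when they fail. Forcing relay-only ICE, as VoiceMeet does, deliberately overrides that default in favor of privacy.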
STUN servers help a peer discover its own public IP address and port as seen by the internet — essentially a NAT reflection service. In an ideal world, ICE would find a direct peer-to-peer path using STUN-discovered addresses, and the two peers would exchange encrypted packets without any server in the middle. In practice, symmetric NATs and corporate firewalls block a large fraction of direct connections.
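The "NAT reflection" a STUN server performs comes back in an XOR-MAPPED-ADDRESS attribute. The address is XORed with a fixed magic cookie so that NATs which rewrite IP-looking bytes inside packet payloads leave it intact. Here is a sketch of a decoder for the IPv4 case; it takes only the attribute value (not the surrounding STUN message) and skips IPv6, which additionally mixes in the transaction ID.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442  # fixed constant defined by the STUN specification


def decode_xor_mapped_ipv4(value: bytes) -> tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute to (ip, port).

    Layout: 1 reserved byte, 1 family byte (0x01 = IPv4), 2-byte X-Port,
    4-byte X-Address, all in network byte order.
    """
    _reserved, family, x_port, x_addr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("this sketch only handles IPv4 (family 0x01)")
    port = x_port ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
    addr = x_addr ^ MAGIC_COOKIE           # IPv4 address is XORed with the whole cookie
    return str(ipaddress.IPv4Address(addr)), port


print(decode_xor_mapped_ipv4(bytes.fromhex("0001a147e112a643")))
# ('192.0.2.1', 32853)
```

This decoded address is exactly what a server-reflexive candidate advertises, which is why the STUN-reflexive path still exposes a caller's public IP.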
TURN servers relay media when direct connections fail. The peer sends its encrypted SRTP packets to the TURN relay, which forwards them to the other peer. VoiceMeet routes all calls through TURN by default rather than trying direct connections first. The reason is privacy: a direct peer-to-peer connection exposes both parties' real IP addresses to each other. Using TURN as a relay means neither caller learns the other's IP, which is essential for an anonymous voice platform.
- Direct (host candidate): fastest, but leaks real IP addresses to both parties
- STUN-reflexive: uses NAT-mapped public address, still exposes public IP
- TURN relay: both parties see only the TURN server's IP — VoiceMeet's preferred path
- Relay-only ICE policy: VoiceMeet restricts ICE to TURN relay candidates, so the browser never surfaces host or reflexive addresses during the ICE gathering phase
- Candidate filtering: VoiceMeet strips host and reflexive candidates from SDP before sending
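In the browser, the primary control is the `iceTransportPolicy: "relay"` option of `RTCConfiguration`, which stops an `RTCPeerConnection` from surfacing host and server-reflexive candidates at all. Candidate filtering is a second line of defense. A minimal Python sketch of it, assuming the SDP is available as text, might look like the following; the function name is hypothetical, VoiceMeet's own implementation may differ, and candidates trickled individually after the offer would need the same check.

```python
def strip_non_relay_candidates(sdp: str) -> str:
    """Remove every a=candidate line whose candidate type is not 'relay'.

    Candidate lines look like:
    a=candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ...
    """
    kept = []
    for line in sdp.splitlines():
        if line.startswith("a=candidate:"):
            fields = line.split()
            try:
                cand_type = fields[fields.index("typ") + 1]
            except (ValueError, IndexError):
                continue  # malformed candidate: drop it rather than risk a leak
            if cand_type != "relay":
                continue  # host, srflx, or prflx candidate: never send it
        kept.append(line)
    # SDP lines are terminated by CRLF.
    return "".join(line + "\r\n" for line in kept)


offer = (
    "v=0\r\n"
    "a=candidate:1 1 udp 2130706431 192.168.1.20 54400 typ host\r\n"
    "a=candidate:2 1 udp 16777215 203.0.113.5 3478 typ relay raddr 0.0.0.0 rport 0\r\n"
)
print(strip_non_relay_candidates(offer).count("a=candidate"))  # 1
```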
Certificate Fingerprints in SDP: The Chain of Trust
The SDP (Session Description Protocol) document exchanged during call setup contains an a=fingerprint attribute holding a hash, typically SHA-256, of the sending peer's DTLS certificate. This fingerprint is the linchpin of WebRTC's security model. Before the media handshake, each peer commits to a specific certificate. During the media handshake, each peer verifies the commitment. Break either step and the call fails to connect.
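On the receiving side, the commitment is simply whatever the signaled SDP declares. A sketch that collects the SHA-256 fingerprints from `a=fingerprint` lines, assuming the whole SDP is available as text (the function name is hypothetical). When media is bundled over one DTLS transport, every media section normally repeats the same value, so a typical WebRTC answer yields a single entry.

```python
def committed_fingerprints(sdp: str) -> set[str]:
    """Collect the SHA-256 certificate fingerprints declared in an SDP blob."""
    found = set()
    for line in sdp.splitlines():
        if line.startswith("a=fingerprint:"):
            # Format: a=fingerprint:<hash-function> <colon-separated hex>
            algo, _, value = line[len("a=fingerprint:"):].partition(" ")
            if algo.lower() == "sha-256":
                found.add(value.strip().upper())
    return found


# Fingerprints shortened for readability; real SHA-256 values have 32 byte pairs.
answer = (
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "a=fingerprint:sha-256 4A:AD:B9:B1:3F\r\n"
    "m=application 9 UDP/DTLS/SCTP webrtc-datachannel\r\n"
    "a=fingerprint:sha-256 4a:ad:b9:b1:3f\r\n"
)
print(committed_fingerprints(answer))  # {'4A:AD:B9:B1:3F'}
```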
In VoiceMeet, SDP offer and answer messages travel through Supabase Realtime, our signaling channel. This raises a legitimate question: what prevents someone who controls the signaling layer from replacing the fingerprint with their own certificate's fingerprint, effectively becoming a man-in-the-middle? The honest answer is that a compromised signaling layer could perform this attack. The same is true of any WebRTC application that relies on its signaling channel alone, without identity assertions or out-of-band fingerprint verification.
Our mitigations are layered. Supabase Realtime channels are protected by row-level security policies and JWT-scoped access tokens. The signaling channel for a room is only writable by the authenticated session that created it. The connection is itself TLS-encrypted. We also publish our signaling architecture publicly so researchers can audit the trust assumptions. Perfect forward secrecy, discussed next, does not stop an attacker who actively intercepts a call in progress, but it does keep recorded traffic protected even if keys are compromised later.
The Signaling Layer Is Not the Media Layer
A key principle in WebRTC security thinking is that the signaling channel and the media channel have different trust models. Signaling is application-controlled and can be secured with standard HTTPS/WSS mechanisms. Media security is protocol-controlled: the browser enforces DTLS-SRTP, and the application developer cannot turn it off even if they wanted to. VoiceMeet benefits from both layers: our Supabase infrastructure secures signaling, and the WebRTC spec secures media.
Perfect Forward Secrecy: Protection for Recorded Traffic
Perfect Forward Secrecy means that compromising today's keys cannot decrypt yesterday's traffic. In WebRTC, DTLS achieves this with ephemeral Diffie-Hellman or elliptic-curve Diffie-Hellman (ECDHE) key exchange. The certificates are used only for authentication. The actual session key is derived fresh for every call from temporary key pairs that are discarded afterward, so nothing stored in the certificates is enough to reconstruct it.
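The ephemeral exchange can be illustrated with a toy finite-field Diffie-Hellman in Python. This is deliberately not what browsers run: DTLS negotiates standardized elliptic-curve groups, and the parameters here (the Mersenne prime 2^127 - 1 with base 3) are chosen for readability and are not secure. The point is the shape of the protocol: each call creates fresh private values, both sides derive the same secret without sending it, and the private values are thrown away afterward.

```python
import secrets

# TOY parameters for illustration only, NOT secure: a Mersenne prime modulus
# and a small base. Browsers negotiate standardised elliptic-curve groups.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> tuple[int, int]:
    """Fresh (private, public) pair; a new one is generated for every call."""
    private = secrets.randbelow(P - 3) + 2    # private exponent in [2, P - 2]
    return private, pow(G, private, P)


# One call: each side creates a throwaway key pair and exchanges only the public half.
alice_priv, alice_pub = ephemeral_keypair()
bob_priv, bob_pub = ephemeral_keypair()

# Both sides arrive at the same shared secret without ever sending it.
alice_secret = pow(bob_pub, alice_priv, P)
bob_secret = pow(alice_pub, bob_priv, P)
print(alice_secret == bob_secret)  # True

# After the call, the private exponents are discarded. In a properly chosen
# group, an eavesdropper holding only alice_pub and bob_pub cannot recompute
# the secret, and no long-term key can help them.
```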
The practical implication is significant. Suppose an adversary recorded every encrypted packet you sent through VoiceMeet over the past year. Then suppose they somehow obtained the private keys behind your DTLS certificates. They still cannot decrypt those recordings, because the session keys that encrypted them were never stored anywhere: they were computed in memory and discarded when the call ended. Only an attacker who actively intercepted a call while it was happening, or who could break the key exchange algorithm itself, would have any path to plaintext.
This property is especially relevant for anonymous voice platforms. Users who discuss sensitive topics — mental health, political views, personal relationships — need assurance that a future data breach or legal demand cannot retroactively expose their past conversations. Perfect Forward Secrecy provides exactly that assurance, at the protocol level, without requiring any trust in VoiceMeet's promises.
Common WebRTC Attack Vectors and How VoiceMeet Mitigates Them
Understanding the real attack surface helps calibrate realistic expectations. WebRTC applications face several distinct classes of threats, and each one has a different mitigation strategy. No system is perfectly secure, but a clear-eyed enumeration of the risks is more useful than vague assurances.
- MITM via signaling: attacker replaces SDP fingerprints to intercept media — mitigated by TLS-secured signaling and signed room tokens
- ICE candidate harvesting: gathering reflexive/host candidates exposes the caller's real IP — mitigated by forcing TURN-only ICE policy
- TURN credential abuse: stolen TURN credentials let attackers relay arbitrary traffic at our cost — mitigated by short-lived rotating credentials issued per session
- Replay attacks on SRTP packets: injecting recorded audio — mitigated by SRTP sequence-number authentication
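- Browser DTLS downgrade: attacker tricks browser into negotiating weak ciphers — mitigated by the browsers themselves, which offer only modern ECDHE cipher suites and authenticate the full handshake transcript, so a tampered negotiation fails (DTLS cipher suites are not carried in SDP, so SDP munging cannot enforce them)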
- Room enumeration: attacker cycles through room codes to find active calls — mitigated by rate-limiting and cryptographically random room IDs
- Denial of service on TURN: flooding the relay — mitigated by per-session bandwidth caps and Cloudflare's DDoS protection layer
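Two of the mitigations above are easy to sketch. The room-ID generator draws from Python's `secrets` module; the 16-byte size (128 bits, far beyond any feasible enumeration) is an assumption, since the exact ID length VoiceMeet uses is not stated here. The token bucket shows one common way to rate-limit room lookups per client; its rate and capacity are hypothetical values.

```python
import secrets
import time


def new_room_id(num_bytes: int = 16) -> str:
    """Unguessable, URL-safe room ID; 16 random bytes = 128 bits of entropy."""
    return secrets.token_urlsafe(num_bytes)


class TokenBucket:
    """Simple per-client rate limiter: refills `rate` tokens per second,
    holds at most `capacity`, and spends one token per room lookup."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limiter = TokenBucket(rate=0.5, capacity=5)   # hypothetical: ~1 lookup per 2 s, burst of 5
print(new_room_id())
print([limiter.allow() for _ in range(7)])
```

Either measure alone helps; together they mean an attacker must guess from a 128-bit space while being held to a handful of guesses per client.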
Cloudflare TURN: An Additional Relay Privacy Layer
VoiceMeet's TURN infrastructure runs on Cloudflare's network. This adds an operational privacy layer on top of the protocol privacy layer. Cloudflare terminates the relay connection at an edge node geographically close to each caller, which means the relay address the other caller sees is a Cloudflare IP, not a datacenter IP that could be correlated with VoiceMeet specifically. The actual user IP is known only to the Cloudflare edge node, subject to Cloudflare's own privacy commitments.
The Cloudflare infrastructure also provides automatic failover across hundreds of edge locations, ensuring that TURN relay availability does not depend on a single datacenter. TURN credentials are issued as short-lived HMAC-signed tokens with a 24-hour TTL; once a token expires, it cannot be reused to relay new connections, limiting the blast radius of any credential leak to at most 24 hours.
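Short-lived HMAC credentials of this kind commonly follow the shared-secret scheme often called the "TURN REST API", which TURN servers such as coturn support: the username embeds an expiry timestamp, and the password is an HMAC over that username keyed with a secret shared between the application backend and the TURN server. Cloudflare's TURN service issues credentials through its own API, so treat this as a sketch of the general technique rather than VoiceMeet's integration; the secret and user ID are hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_rest_credentials(
    shared_secret: bytes,
    user_id: str,
    ttl_seconds: int = 24 * 3600,
    now: Optional[float] = None,
) -> Tuple[str, str]:
    """Issue (username, password) for a TURN server using the shared-secret scheme.

    username = "<expiry unix time>:<user id>"
    password = base64(HMAC-SHA1(shared_secret, username))
    """
    issued = time.time() if now is None else now
    username = f"{int(issued) + ttl_seconds}:{user_id}"
    mac = hmac.new(shared_secret, username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(mac).decode("ascii")


def turn_server_accepts(shared_secret: bytes, username: str, password: str,
                        now: Optional[float] = None) -> bool:
    """Simplified server-side check: expiry not passed and the HMAC matches.

    A real TURN server uses the recomputed password as the long-term credential
    when verifying STUN message integrity, rather than receiving it in the clear.
    """
    expiry = int(username.split(":", 1)[0])
    current = time.time() if now is None else now
    expected = base64.b64encode(
        hmac.new(shared_secret, username.encode(), hashlib.sha1).digest()
    ).decode("ascii")
    return current < expiry and hmac.compare_digest(expected, password)


secret = b"hypothetical-shared-secret"
user, pw = turn_rest_credentials(secret, "session-42", now=1_700_000_000)
print(user)                                                         # 1700086400:session-42
print(turn_server_accepts(secret, user, pw, now=1_700_000_000))     # True
print(turn_server_accepts(secret, user, pw, now=1_700_086_400))     # False: expired
```

Because the expiry is part of the signed username, a client cannot extend a leaked credential's lifetime without the shared secret, and the TURN server needs no database lookup to validate it.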
We deliberately chose a managed edge network over running our own TURN servers for this reason: the operational security of TURN infrastructure is a full-time job. Hardware security modules, BGP route protection, DDoS scrubbing, and global anycast routing are not problems a small team should try to solve from scratch when Cloudflare has spent a decade and billions of dollars solving them at scale.
Security is not a product you ship once. It is an operational posture you maintain every day. Cloudflare's edge handles the physical layer so we can focus on the application layer.
Why WebRTC Security Beats Most Proprietary VoIP Systems
Legacy VoIP systems — SIP-based enterprise PBXes, older conferencing platforms, many carrier-grade solutions — were designed in an era when encryption was considered optional. Many still ship with unencrypted SIP signaling as the default, offer SRTP as an enterprise feature, and rely on perimeter security rather than end-to-end encryption. The assumption is that the network is trusted. That assumption has been wrong for thirty years.
WebRTC's mandatory encryption model means that even a developer who has no interest in security cannot accidentally ship a plaintext voice call. The browser enforces it. The standard requires it. There is no configuration knob that disables DTLS-SRTP. This makes WebRTC applications structurally more secure than most proprietary systems even without any additional hardening.
The open-source nature of the WebRTC codebase matters too. The DTLS and SRTP implementations in Chrome and Firefox have been audited by independent researchers, scrutinized by academic cryptographers, and refined through years of real-world deployment. Proprietary VoIP stacks rarely receive this level of public scrutiny. When vulnerabilities are found in WebRTC, they are patched in browser updates that reach users automatically — without requiring an enterprise maintenance window or a manual firmware update.
None of this means WebRTC is invulnerable. Application-layer mistakes — weak signaling security, predictable room IDs, misconfigured TURN credentials — can undermine an otherwise solid protocol. VoiceMeet's security posture is only as strong as the weakest link in the chain. That's why we publish our architecture, accept security reports through a responsible disclosure program, and revisit our configuration with each major browser update. The protocol gives us a strong foundation. What we build on top of it is our responsibility.
#webrtc #security #encryption #technical