Is VoiceMeet Secure? End-to-End Encryption and WebRTC Safety Explained
A plain-English breakdown of how VoiceMeet secures every call with WebRTC DTLS-SRTP encryption, no audio storage, and privacy-first infrastructure.
14 min read · The VoiceMeet team
When you speak into a microphone on a voice app, your words become data almost immediately — packets of audio information traveling across networks you do not control, through infrastructure owned by companies you may never have heard of, toward a device you have never physically touched. Whether those packets can be intercepted, stored, or sold to someone is not a matter of trust. It is a matter of cryptographic architecture. Here is exactly how VoiceMeet makes that architecture work in your favor.
What DTLS-SRTP Actually Means in Plain English
WebRTC, the open standard that powers VoiceMeet's calls, mandates encryption on every audio and video stream. The mechanism it uses is called DTLS-SRTP, which sounds like alphabet soup but has a sensible structure once unpacked. DTLS stands for Datagram Transport Layer Security: it is essentially TLS (the same cryptography that puts the padlock in your browser's address bar) adapted to run over datagram transports such as UDP, which real-time media prefers because a lost packet does less harm than a delayed one. In WebRTC, DTLS handles the handshake and key agreement. SRTP stands for Secure Real-time Transport Protocol, the encrypted envelope that actually carries your voice, using keys derived from that handshake.
When two VoiceMeet users connect, their browsers perform a DTLS handshake: an automated cryptographic negotiation that generates unique encryption keys that exist only in the memory of those two devices. Neither key ever travels to a server. Neither key is ever written to disk. The encrypted audio stream is then transmitted between the two endpoints — and because the keys live nowhere except those two devices' RAM, anyone intercepting the packets in transit would see nothing but random noise.
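The idea that two devices can arrive at the same key without that key ever crossing the network can be illustrated with a toy Diffie-Hellman exchange. This is only a sketch of the principle: real DTLS handshakes in browsers use standardized elliptic-curve groups and certificate authentication, and the prime, generator, and function names below are illustrative assumptions, not VoiceMeet's or any browser's code.

```python
import hashlib
import secrets

# Toy parameters for illustration only; far too weak for real-world use.
P = 2**127 - 1   # a Mersenne prime
G = 3

def make_keypair():
    """Return (private, public); only the public value is ever sent."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def derive_key(own_private, peer_public):
    """Combine our private value with the peer's public value into a 32-byte key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only alice_public and bob_public cross the network.
alice_key = derive_key(alice_private, bob_public)
bob_key = derive_key(bob_private, alice_public)
```

Both sides compute the same 32-byte key, while an observer who sees only the two public values would have to solve a discrete logarithm to recover it.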
This is not the server-terminated encryption-in-transit that most services mean when they say 'your data is encrypted.' In that model, data is scrambled between your device and a server, but the server can decrypt it once it arrives. In a one-to-one WebRTC call, the DTLS-SRTP session runs between the two browsers themselves, so it is end-to-end: a server in the middle cannot decrypt the content because it never has the keys. That remains true when a call must route through a TURN relay server, which changes the path the packets take but not who holds the keys. Relays are covered in detail below.
Why VoiceMeet Never Records Your Audio
Many services claim they do not record your data, but the claim is a policy commitment rather than an architectural fact. A policy can change with a terms-of-service update. An architecture either routes audio through a recordable server or it does not. VoiceMeet's answer is architectural: in the default peer-to-peer call path, audio flows directly between the two participants' devices and is never present on VoiceMeet's servers in any form. There is nothing to record because the audio is never there.
Even in cases where a TURN relay server is used — when network topology prevents a direct peer connection — the relay server forwards encrypted packets without decrypting them. The TURN server acts as a blind courier: it knows the source and destination IP addresses of the packets it handles, but it cannot read the payload. The audio content remains end-to-end encrypted throughout. This is why no internal policy decision, subpoena, or data breach at VoiceMeet could produce a recording of your voice.
Security that depends entirely on promises is not security. The only guarantee worth trusting is one baked into the math.
ICE, STUN, and TURN: Connecting Without Exposing Your IP
Establishing a peer-to-peer connection between two devices behind firewalls and NAT routers is a genuinely hard networking problem. WebRTC solves it with a protocol called ICE (Interactive Connectivity Establishment), which works by gathering multiple candidate connection paths and selecting the best one. To gather candidates, ICE uses two helper protocols: STUN and TURN. Understanding how they work is essential to understanding VoiceMeet's privacy posture on connection establishment.
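STUN (Session Traversal Utilities for NAT) lets your device discover its public IP address so that address can be shared with the person you want to call. This is where a privacy concern arises: if your IP address is shared with another user, they could potentially use it to approximate your location. VoiceMeet mitigates part of this by relying on mDNS ICE candidates in supported browsers, which replace your device's local network address with a randomized .local hostname during the negotiation phase, so details of your private network are not exposed to the other party. mDNS does not hide the public address discovered through STUN: in a direct peer-to-peer connection each side necessarily learns the address it is exchanging packets with, and only a call relayed through TURN conceals it.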
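To make the candidate types concrete, the sketch below parses a few ICE candidate lines and reports which ones carry an mDNS hostname instead of an IP address. The sample lines are made up for illustration (the addresses come from documentation-only ranges), and `parse_candidate` is a hypothetical helper, not part of any WebRTC library.

```python
from dataclasses import dataclass

@dataclass
class IceCandidate:
    address: str
    port: int
    kind: str       # "host", "srflx", "prflx", or "relay"
    is_mdns: bool   # True when the address is an obfuscated .local hostname

def parse_candidate(line: str) -> IceCandidate:
    """Parse an SDP candidate attribute into its address, port, and type."""
    fields = line.split()
    # Layout: foundation, component, transport, priority, address, port, "typ", type, ...
    address, port = fields[4], int(fields[5])
    kind = fields[fields.index("typ") + 1]
    return IceCandidate(address, port, kind, address.endswith(".local"))

examples = [
    "candidate:1 1 udp 2113937151 3f1c9a2e-6b1d-4c57-9e0a-2d4f8b7c1a55.local 54321 typ host",
    "candidate:2 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 0.0.0.0 rport 0",
    "candidate:3 1 udp 33562367 198.51.100.20 3478 typ relay raddr 0.0.0.0 rport 0",
]
parsed = [parse_candidate(c) for c in examples]
```

In this sample only the host candidate is obfuscated; the srflx (STUN-discovered) and relay (TURN) candidates carry ordinary IP addresses.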
TURN (Traversal Using Relays around NAT) is the fallback when a direct connection cannot be established — typically when one or both users are behind symmetric NAT or restrictive firewalls. In this case, audio is relayed through VoiceMeet's TURN infrastructure, operated via Cloudflare. As described above, the relay only sees encrypted packets. TURN sessions are authenticated with short-lived credentials that expire after the call ends, limiting the window of any potential credential compromise.
Signaling Security: How the Connection Setup Is Protected
Before audio flows, two parties must exchange session descriptions — the WebRTC offer and answer that describe each side's audio capabilities and ICE candidates. This exchange is called signaling, and it happens over VoiceMeet's backend rather than peer-to-peer. VoiceMeet uses Supabase Realtime for signaling, which means the session description messages travel over a TLS-encrypted WebSocket connection. An attacker who intercepted these messages could see connection metadata but could not use it to decrypt the audio stream, since audio keys are negotiated independently via DTLS.
Signaling channels are also a potential vector for man-in-the-middle attacks, where an attacker tries to insert themselves between the two callers. The DTLS handshake uses certificate fingerprints to detect this: each side embeds a hash of its DTLS certificate in the session description, and if an attacker on the network path presents a different certificate during the handshake, the fingerprint mismatch causes the connection to fail. This check is built into the WebRTC specification itself. It is only as trustworthy as the channel that delivers the fingerprints, which is why the signaling messages travel over TLS: an attacker who could rewrite the session description could also swap in their own fingerprint.
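The fingerprint comparison described above can be sketched as follows. In a real call the browser's WebRTC engine performs this check internally; the helper names and the stand-in certificate bytes here are hypothetical and exist only to show the logic.

```python
import hashlib
import re

def fingerprint_from_sdp(sdp):
    """Return (hash_name, fingerprint) from the first a=fingerprint line."""
    match = re.search(r"^a=fingerprint:(\S+) ([0-9A-Fa-f:]+)\s*$", sdp, re.MULTILINE)
    if match is None:
        raise ValueError("SDP has no a=fingerprint attribute")
    return match.group(1).lower(), match.group(2).upper()

def fingerprint_of_certificate(cert_der):
    """SHA-256 over the DER-encoded certificate, as colon-separated hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def certificate_matches_sdp(cert_der, sdp):
    """True when the certificate seen in the handshake matches the signaled hash."""
    hash_name, expected = fingerprint_from_sdp(sdp)
    if hash_name != "sha-256":
        raise ValueError(f"unsupported hash in this sketch: {hash_name}")
    return fingerprint_of_certificate(cert_der) == expected

# Stand-in bytes; a real check would use the peer's DER-encoded certificate.
genuine_cert = b"stand-in bytes for a DER-encoded certificate"
attacker_cert = b"a different certificate"
sdp = ("v=0\r\n"
       "a=fingerprint:sha-256 " + fingerprint_of_certificate(genuine_cert) + "\r\n"
       "a=setup:actpass\r\n")
```

With these values, `certificate_matches_sdp(genuine_cert, sdp)` holds and the same call with `attacker_cert` does not, which is the condition under which a WebRTC endpoint aborts the connection.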
What VoiceMeet Does Not Collect
The clearest way to understand VoiceMeet's privacy posture is to be explicit about what is not collected. There are no user accounts, which means no email addresses, phone numbers, names, or profile pictures. There are no persistent call logs linking specific users to specific conversations. There are no behavioral profiles tracking which topics you discuss, how long you typically talk, or what time of day you are active. There is no advertising technology, no third-party tracking pixels, and no sale of user data to any entity.
- No user accounts or identity verification of any kind
- No audio recordings — not in transit, not at rest
- No persistent call history tied to any user identifier
- No behavioral profiles or interest graphs for advertising
- No sale of user data to third parties under any circumstance
- No third-party tracking SDKs or advertising frameworks embedded in the app
What Minimal Metadata Is Stored and Why
Complete transparency requires disclosing what is stored. VoiceMeet retains aggregate call duration data for infrastructure capacity planning. Report flags — when a user taps the report button during a call — are stored with a short-lived device identifier to enable pattern detection of repeat offenders. These report records are retained for 30 days and then deleted unless they are part of an active abuse investigation. VoiceMeet also maintains per-session risk scores that decay over time and are not linked to any persistent identity.
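As a rough illustration of those two rules, the sketch below encodes the 30-day retention window and one possible decay model. The 30-day figure comes from the policy above; the exponential decay, the 24-hour half-life, and all function names are assumptions made for the example, since the actual scoring model is not described here.

```python
from datetime import datetime, timedelta, timezone

REPORT_RETENTION = timedelta(days=30)   # retention window stated in the policy
RISK_HALF_LIFE = timedelta(hours=24)    # hypothetical value, chosen for illustration

def report_should_be_deleted(created_at, now, under_investigation=False):
    """A report is deleted after 30 days unless an abuse investigation is active."""
    return not under_investigation and now - created_at >= REPORT_RETENTION

def decayed_risk(initial_score, scored_at, now):
    """Assumed model: the score halves every RISK_HALF_LIFE."""
    elapsed_half_lives = (now - scored_at) / RISK_HALF_LIFE   # timedelta ratio is a float
    return initial_score * 0.5 ** elapsed_half_lives

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
old_report = datetime(2025, 1, 1, tzinfo=timezone.utc)   # exactly 30 days earlier
```

Under these assumptions `old_report` is due for deletion unless it is flagged as part of an investigation, and a score of 8.0 assigned 48 hours ago has decayed to 2.0.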
Comparing VoiceMeet to Phone Calls and WhatsApp
Traditional phone calls — the kind made over the public switched telephone network — have no end-to-end encryption at all. The SS7 protocol that underlies most cellular and landline infrastructure was designed in the 1970s with no consideration for modern threat models. SS7 attacks are well-documented and available to nation-state actors, telecom insiders, and sophisticated criminal groups. A phone call between two people who trust each other completely can still be intercepted by a third party at the carrier level.
WhatsApp offers genuine end-to-end encryption using the Signal Protocol for both messages and calls, which is cryptographically strong. The significant caveat is that using WhatsApp requires a phone number, which ties your identity to your communications history. WhatsApp is owned by Meta, whose business model depends on building advertising profiles. While WhatsApp claims it does not use call content for advertising, the metadata — who you called, when, how often — is available to Meta and subject to its data practices and law enforcement requests.
TURN Server Security and Cloudflare Infrastructure
VoiceMeet's TURN servers run on Cloudflare's global network, which brings both geographic distribution and a mature security posture. Cloudflare operates under SOC 2 Type II compliance and publishes detailed transparency reports on government data requests. TURN credentials issued by VoiceMeet are time-limited tokens generated using HMAC-SHA1 with a server-side secret — they are specific to a single call session and expire when the call ends. An attacker who captured a valid TURN credential could not use it for any other session.
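One common way to build time-limited TURN credentials with HMAC-SHA1 is the shared-secret scheme supported by the open-source coturn server: the username carries an expiry timestamp, and the password is the base64-encoded HMAC of that username. Whether VoiceMeet's deployment uses exactly this layout is an assumption; the function names, the session label, and the 600-second lifetime below are hypothetical.

```python
import base64
import hashlib
import hmac
import time

def make_turn_credentials(secret, session_id, ttl_seconds=600, now=None):
    """Return (username, password) valid until the expiry embedded in the username."""
    issued = int(now if now is not None else time.time())
    username = f"{issued + ttl_seconds}:{session_id}"
    mac = hmac.new(secret, username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(mac).decode()

def credentials_are_valid(secret, username, password, now=None):
    """What a relay holding the same secret would check: correct MAC, not expired."""
    expiry = int(username.split(":", 1)[0])
    mac = hmac.new(secret, username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(mac).decode()
    current = now if now is not None else time.time()
    return hmac.compare_digest(expected, password) and current < expiry

secret = b"demo-secret"   # placeholder; a real secret stays on the server
username, password = make_turn_credentials(secret, "call-123", now=1_700_000_000)
```

Because the expiry is part of the signed username, a client cannot extend its own credential, and the relay needs no database lookup to reject one that has lapsed.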
Cloudflare's infrastructure also provides DDoS protection for VoiceMeet's signaling layer, which is important because real-time communication servers are attractive targets for disruption attacks. Rate limiting on the signaling endpoint prevents credential stuffing and session flooding. These protections operate at the infrastructure level, meaning they are active regardless of what any individual user does and require no configuration from VoiceMeet's application layer.
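Rate limiting of this kind is often implemented as a token bucket. The sketch below shows the general technique only; it is not a description of Cloudflare's or VoiceMeet's actual limiter, and the capacity and refill rate are arbitrary example values.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]   # five requests at the same instant
later = bucket.allow(now=2.0)                        # one more request two seconds later
```

The first three requests in the burst pass and the next two are rejected; two seconds later the bucket has refilled enough to admit another.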
Security Audit Posture and Responsible Disclosure
VoiceMeet maintains a responsible disclosure policy: a documented process for security researchers to report vulnerabilities without fear of legal retaliation. Reports submitted through the disclosure program are triaged within 48 hours and, where valid, patched and disclosed publicly in the project changelog. VoiceMeet's calls rely on the browser's native WebRTC engine (Chrome's libwebrtc, Firefox's WebRTC stack) rather than a custom implementation, inheriting the security properties of those well-audited, widely deployed codebases.
Security is never finished. WebRTC is a complex specification, Supabase Realtime is a third-party dependency, and Cloudflare's infrastructure is outside VoiceMeet's direct control. The team's commitment is to be honest about the threat model, transparent about what the architecture can and cannot protect against, and responsive when the gap between the two requires closing. That posture — not marketing language about 'military-grade encryption' — is what meaningful security looks like in practice.
Security is a process, not a product. The goal is not to achieve a state of perfect safety but to reduce the surface area of failure and respond quickly when it appears.
Every time you open VoiceMeet and speak with a stranger, several layers of cryptographic protection activate automatically without requiring any action on your part. The keys are generated, the handshake happens, the audio flows encrypted from one device to another, and when the call ends the keys are discarded. No recording exists. No server has heard your voice. That is not a feature. That is the architecture — and it will not change because it cannot change without rebuilding the product from the ground up.
#security #encryption #webrtc #privacy