Recursively Encapsulated TURN (RETURN) for Connectivity and Privacy in
WebRTC
Google, Inc., 111 8th Ave, New York, NY 10011, USA, bemasc@webrtc.org
Google, Inc., 747 6th Street South, Kirkland, WA 98033, USA, justin@uberti.name
RAI
In the context of WebRTC, the concept of a local TURN proxy has been
suggested, but not reviewed in detail. WebRTC applications are already
using TURN to enhance connectivity and privacy. This document explains how
local TURN proxies and WebRTC applications can work together.

TURN is a protocol for communication
between a client and a TURN server, in order to route UDP traffic to and
from one or more peers. As noted in the TURN specification, the
TURN relay server “typically sits in the public Internet”. In
a WebRTC context, if a TURN server is to be used, it is typically provided
by the application, either to provide connectivity between users whose NATs
would otherwise prevent it, or to obscure the identity of the participants
by concealing their IP addresses from one another.

In many enterprises, direct UDP transmissions are not permitted between
clients on the internal networks and external IP addresses, so media must
flow over TCP. To enable WebRTC services in such a situation, clients
must use TURN-TCP, or TURN-TLS. These configurations are not ideal: they
send all traffic over TCP, which leads to higher latency than would
otherwise be necessary, and they force the application provider to operate
a TURN server because WebRTC endpoints behind NAT cannot typically act as
TCP servers. These configurations may result in especially bad behaviors
when operating through TCP or HTTP proxies that were not designed to carry
real-time media streams.

To avoid forcing WebRTC media streams through a TCP stage, enterprise
network operators may operate a TURN server for their network, which can
be discovered by clients using TURN Auto-Discovery, or through a
proprietary mechanism. This TURN server may be placed inside the network,
with a firewall configuration allowing it to communicate with the public
internet, or it may be operated by a third party outside the network,
with a firewall configuration that allows hosts inside the network to
communicate with it. Use of the specified TURN server may be the only
way for clients on the network to achieve a high quality WebRTC
experience. This scenario is required to be supported by the WebRTC
requirements document, Section 2.3.5.1.

When the application intends to use a TURN server for identity
cloaking, and the enterprise network administrator intends to use a TURN
server for connectivity, there is a conflict. In current WebRTC
implementations, TURN can only be used on a single-hop basis in each
candidate,
but using only the enterprise's TURN server reveals information about the
user (e.g. organizational affiliation), and using only the application's
TURN server may be blocked by the network administrator, or may require
using TURN-TCP or TURN-TLS, resulting in a significant sacrifice in
latency.

To resolve this conflict, we introduce Recursively Encapsulated TURN,
a procedure that allows a WebRTC endpoint to route traffic through
multiple TURN servers, and get improved connectivity and privacy in
return.

The basic scenario has a browser located
inside a home or enterprise network which connects to the Internet
through a Network Address Translator and Firewall (NAT/FW). A
TURN server in the Internet cloud is also present, which is provided
by the WebRTC application via the JavaScript RTCIceServers object.

A WebRTC application can use a TURN server to provide NAT traversal,
but also to provide privacy, routing optimizations, logging, or
possibly other functionality. The application can accomplish this by
forcing all traffic to flow through the TURN server using the JavaScript
RTCIceTransportPolicy object.
Since this TURN server is injected by the application, we will refer to
it as an Application TURN server.

A second scenario has a TURN server co-resident with
the NAT/FW, i.e. in the DMZ of the FW. This TURN server might be used by an
enterprise, ISP, or home network to enable WebRTC media flows that would
otherwise be blocked by the firewall, or to improve quality of service on
flows that pass through this TURN server. This TURN server is not part of
a particular application, and is managed as part of the border control
system, so we call it a Border TURN Server.

The port allocated on this TURN
server is labeled "(border)", not any particular candidate type, to distinguish it
from the other ports, which have been represented as ICE candidates in
accordance with the WebRTC specifications. This case is different, because
unlike an Application TURN server, there is not yet any specification for
how WebRTC should interact with a Border TURN server. Under what conditions
should WebRTC allocate a port on a Border TURN server? How should WebRTC
represent that port as an ICE candidate? This draft serves to answer these
two questions.

In the combined scenario, there is both an
Application TURN server and a Border TURN server. The Firewall
is blocking UDP traffic except for UDP traffic to/from the Border
TURN server, so only the "(border)" port allocation will work. However,
there is no specified way for WebRTC to use this port as a candidate.
Moreover, this port on its own would not be sufficient to satisfy
the user's needs.
Both TURN servers provide important functionality, so we need a
way for WebRTC to select a candidate that uses both TURN servers.

The solution proposed in this draft is for the browser to implement
RETURN, which provides a candidate that traverses both TURN servers.

A browser that implements
RETURN allocates a port on the Border TURN server, now
referred to as a Border TURN Proxy by analogy to an HTTP
CONNECT or SOCKS Proxy,
and then runs STUN and TURN over this allocation, resulting
in three candidates: relay2, srflx2, and host2. The relay2
candidate causes traffic to flow through both TURN servers by
encapsulating TURN within TURN - hence the name Recursively
Encapsulated TURN (RETURN).

The host2 and srflx2 candidates are probably identical, so
one will be dropped by ICE. If the NAT/FW blocks UDP and the
application uses only relay candidates, then the relay2
candidate will be selected. Otherwise, the other candidates will
be used, in accordance with the usual ICE procedure.

Only the browser needs to implement the RETURN behavior:
the TURN protocol usage of both the Border TURN Proxy and the Application
TURN server is unchanged.
Note that this arrangement preserves the end-to-end security
and privacy features of WebRTC media flows. The ability to
steer the media flows through multiple TURN servers while still
allowing end-to-end encryption and authentication is a key
benefit of RETURN.
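
The candidate set described above can be sketched in a few lines. The following Python model is illustrative only: the `Candidate` class, the `gather_via_proxy` and `relay_only` functions, and the documentation-range addresses are assumptions made for this example, not part of any WebRTC API. It reports the proxy allocation as host2, drops srflx2 when it duplicates host2, and sets the related address of relay2 to the proxy allocation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class Candidate:
    label: str                    # name used in this document, e.g. "relay2"
    typ: str                      # "host", "srflx", or "relay"
    addr: Addr                    # transport address offered to the peer
    raddr: Optional[Addr] = None  # related address (raddr/rport)


def gather_via_proxy(proxy_alloc: Addr,
                     stun_mapped: Optional[Addr],
                     app_relay: Optional[Addr]) -> List[Candidate]:
    """Model the candidates a RETURN browser derives from one Border TURN Proxy.

    proxy_alloc: address allocated on the Border TURN Proxy.
    stun_mapped: address reported by the application's STUN server when
                 queried through the proxy (None if STUN is not configured).
    app_relay:   address allocated on the Application TURN server through
                 the proxy (None if no such server is configured).
    """
    candidates = [Candidate("host2", "host", proxy_alloc)]
    # srflx2 is redundant when it equals host2, so ICE would drop it.
    if stun_mapped is not None and stun_mapped != proxy_alloc:
        candidates.append(Candidate("srflx2", "srflx", stun_mapped, proxy_alloc))
    # relay2 is TURN inside TURN; its related address is the proxy allocation.
    if app_relay is not None:
        candidates.append(Candidate("relay2", "relay", app_relay, proxy_alloc))
    return candidates


def relay_only(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only relay candidates, as when the application forces relay use."""
    return [c for c in candidates if c.typ == "relay"]


# Hypothetical addresses taken from the documentation ranges.
proxy = ("192.0.2.10", 54606)
found = gather_via_proxy(proxy, stun_mapped=proxy,
                         app_relay=("198.51.100.7", 52631))
print([c.label for c in found])              # ['host2', 'relay2']
print([c.label for c in relay_only(found)])  # ['relay2']
```

With a relay-only policy, relay2 is the only candidate left, which matches the case where the NAT/FW blocks UDP and the application uses only relay candidates.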
These goals are requirements on this document (not on implementations
of the specification).

As noted in Section 2.3.5.1 and requirement F20 of the WebRTC requirements
document, a WebRTC browser endpoint MUST be able to direct UDP
connections through a designated TURN server configured by enterprise
policy (a “proxy”).

It MUST be possible to configure a WebRTC endpoint that supports
proxies to achieve connectivity no worse than if the endpoint were
operating at the proxy's address.

For efficiency, network administrators SHOULD be able to prevent
browsers from attempting to send traffic through routes that are already
known to be blocked.

Both network administrators and application developers may wish to
direct all their UDP flows through a particular TURN server. There are
many goals that might motivate such a choice, including

  o  improving quality of service by tunneling packets through a network
     that is faster than the public internet,
  o  monitoring the usage of UDP services,
  o  troubleshooting and debugging problematic services,
  o  logging connection metadata for legal or auditing reasons,
  o  recording the entire contents of all connections, or
  o  providing partial IP address anonymization (as described in
     Section 4.2.4 and Section 5.4).

To achieve our goals, we introduce the following new concepts:

In this document a “proxy” is any TURN server that was
provided by any mechanism other than through the standard
WebRTC-application ICE candidate provisioning API. We call it a "proxy" by
analogy with SOCKS proxies and similar network services, because it
performs a similar function and can be configured in a similar
fashion.

If a proxy is to be used,
it will be the destination of traffic generated by the client. (There is
no analogue to the transparent/intercepting HTTP proxy configuration,
which modifies traffic at the network layer.) Mechanisms to configure a
proxy include Auto-Discovery and local
policy (Section 3.5.3, "ICE candidate policy").

In an application context, a proxy may be “active”
(producing candidates) or “inactive” (not in use, having no
effect on the context).

A typical WebRTC browser endpoint may have multiple network
interfaces available, such as wired ethernet, wireless ethernet, and
WAN. In this document, a “virtual interface” is a procedure
for generating ICE candidates that are not simply generated by a
particular physical interface. A virtual interface can produce
“host”, “server-reflexive”,
and “relay” candidates, but may be restricted to only some
type of candidate (e.g. UDP-only).

“Leakiness” is an attribute of a proxy configuration.
This document defines two values for the “leakiness” of a
proxy configuration: “leaky” and “sealed”. Proxy
configuration, including leakiness, may be set by local policy
(“ICE candidate policy”) or other mechanisms.

A leaky configuration adds a proxy and also allows the browser to use
routes that transit directly via the endpoint's physical interfaces (not
through the proxy). In a leaky configuration, setting a proxy augments
the available set of ICE candidates. Multiple leaky-configuration
proxies may therefore be active simultaneously.

A sealed proxy configuration requires the browser to route all WebRTC
traffic through the proxy, eliminating all ICE candidates that do not go
through the proxy. Only one sealed proxy may be active at a time.

Leaky proxy configurations allow more efficient routes to be selected.
For example, two peers on the same LAN can connect directly (peer to
peer) if a leaky proxy is enabled, but must "hairpin" through the TURN
proxy if the configuration is sealed. However, sealed proxy
configurations can be faster to connect, especially if many of the
peer-to-peer routes that ICE will try first are blocked by the network's
firewall policies.

In some configurations, an endpoint may be subject to multiple sealed
proxy settings at the same time. In that case, one of those settings
will have highest rank, and it will be the active proxy. In a given
application context (e.g. a webpage), there is at most one active sealed
proxy. This document does not specify a representation for rank.

When a proxy is configured, by Auto-Discovery or a proprietary means,
the browser MUST NOT report a “relay” candidate representing
the proxy. Instead, the browser MUST connect to
the proxy and then, if the connection is successful, treat the TURN
tunnel as a UDP-only virtual interface.

For a virtual interface representing a TURN proxy, this means that
the browser MUST report the public-facing IP address and port acquired
through TURN as a “host” candidate, the browser MUST perform
STUN through the TURN proxy (if STUN is configured), and it MUST perform
TURN by recursive encapsulation through the TURN proxy, resulting in
TURN candidates whose “raddr” and “rport”
attributes match the acquired public-facing IP address and port on the
proxy.

Because the virtual interface has some additional overhead due to
indirection, it SHOULD have lower priority than the physical interfaces
if physical interfaces are also active. Specifically, even host
candidates generated by a virtual interface SHOULD have priority 0 when
physical interfaces are active (similar to RFC 5245
Section 4.1.2.2, “the local
preference for host candidates from a VPN interface SHOULD have a
priority of 0”).

If the active proxy for an application is leaky, the browser should
undertake the standard ICE candidate discovery mechanism
on the available physical and virtual interfaces.

If the active proxy for an application is sealed, the browser MUST
NOT gather or produce any candidates on physical interfaces. The WebRTC
implementation MUST direct its traffic from those interfaces only to the
proxy, and perform ICE candidate discovery only on the single virtual
interface representing the active proxy.

Any browser mechanism for specifying a proxy SHOULD allow the caller
to indicate a higher rank than the proxy provided by Auto-Discovery.

Some operating systems allow the browser to use multiple interfaces
to contact a single remote IP address. To avoid producing an excessive
number of candidates, WebRTC endpoints MUST NOT use multiple physical
interfaces to connect to a single proxy simultaneously. (If this were
violated, it could produce a number of virtual interfaces equal to the
product of the number of physical interfaces and the number of active
proxies.)

Mechanisms for configuring a RETURN proxy SHOULD allow configuring a
proxy that only applies to connections made from a single physical
interface. This is useful to optimize efficiency in modes 2 and 3 of .

A proxy MAY have both an IPv4 and an IPv6 address (e.g. if the proxy
is specified by DNS and has both A and AAAA records). The client MAY
try both of these addresses, but MUST select one, preferring IPv6,
before allocating any remote addresses. This corresponds to
the Happy Eyeballs procedure for
dual-stack clients.

A proxy MAY provide both IPv4 and IPv6 remote addresses to clients.
A client SHOULD request both address
families. If both requests are granted, the client SHOULD treat the two
addresses as host candidates on a dual-stack virtual interface.
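
The leaky and sealed rules above reduce to a small selection step that runs before candidate gathering. The sketch below is a minimal model under stated assumptions: the `Proxy` class, the `interfaces_to_gather` function, and the integer `rank` field are hypothetical (this document deliberately does not specify a representation for rank), and interface names are plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Proxy:
    name: str
    sealed: bool = False  # unspecified leakiness is treated as leaky
    rank: int = 0         # hypothetical encoding: larger means higher rank


def interfaces_to_gather(physical: List[str], proxies: List[Proxy]) -> List[str]:
    """Return the interfaces on which ICE candidate discovery should run."""
    sealed = [p for p in proxies if p.sealed]
    if sealed:
        # Only the highest-ranked sealed proxy is active; physical interfaces
        # and all other proxies produce no candidates.
        active = max(sealed, key=lambda p: p.rank)
        return [f"virtual:{active.name}"]
    # Leaky case: physical interfaces plus one virtual interface per proxy.
    # Proxies are never chained through each other.
    return list(physical) + [f"virtual:{p.name}" for p in proxies]


print(interfaces_to_gather(["eth0", "wlan0"], [Proxy("auto")]))
# ['eth0', 'wlan0', 'virtual:auto']
print(interfaces_to_gather(["eth0"], [Proxy("auto"),
                                      Proxy("local", sealed=True, rank=1)]))
# ['virtual:local']
```

In the leaky case the virtual interfaces would additionally receive a lower local preference than the physical ones; that ordering is left out of this sketch.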
If a proxy configuration mechanism does not specify leakiness,
browsers SHOULD treat the proxy as leaky. This is similar to current
WebRTC implementations' behavior in the presence of SOCKS and HTTP
proxies: the candidate allocation code continues to generate UDP
candidates that do not transit through the proxy.

The SOCKS5 proxy standard permits
compliant SOCKS proxies to support UDP traffic. However, most
implementations of SOCKS5 today do not support UDP. Accordingly, WebRTC
browsers MUST by default (i.e. unless deliberately configured otherwise)
treat SOCKS5 proxies as leaky and having lower rank than any configured
TURN proxies.

Encapsulating a link in TURN adds overhead on the path between the
client and the TURN server, because each packet must be wrapped in a
TURN message. This overhead is sometimes doubled in RETURN proxying.
To avoid excessive overhead, client implementations SHOULD use
ChannelBind and ChannelData messages to connect and send data through
proxies and application TURN servers when possible. Clients MAY buffer
messages to be sent until the ChannelBind command completes (requiring
one round trip to the proxy), or they MAY use CreatePermission and Send
messages for the first few packets to reduce startup latency at the cost
of higher overhead.

Adding overhead to packets on a link decreases the effective
Maximum Transmission Unit (MTU) on that link. Accordingly, clients that
support proxying MUST NOT rely on the effective MTU complying with the
Internet Protocol's minimum MTU requirement.

ChannelData messages have constant overhead, enabling consistent
effective PMTU, but Send messages do not necessarily have constant
overhead. TURN messages may be fragmented and reassembled if they are not
marked with the Don't Fragment (DF) IP bit or the DONT-FRAGMENT TURN
attribute. Client implementors should keep this in mind, especially if
they choose to implement PMTU discovery through the proxy.

As per the TURN specification, a TURN server MAY respond to
an Allocate request with an error code of 300 and an ALTERNATE-SERVER
indication. When connecting to proxies or application TURN servers,
clients SHOULD attempt to connect to the specified alternate server.
The client MUST route
a connection to the alternate server through the proxy if and only if
the original connection attempt was routed through the proxy.

It is possible that the same TURN server may appear more than once
in the network path. For example, if both endpoints configure the
same sealed proxy, then each peer will only provide candidates on this
proxy. This is not a problem, and will work as expected.

It is also possible that the same TURN server could be used by both
the enterprise and the application. It might appear attractive to
connect to this server only once, rather than connecting to it through
itself, in order to avoid imposing unnecessary server load. However,
a RETURN client MUST connect to the server twice, even when this
appears redundant, to ensure correct session attribution.

For example, consider a TURN service operator that issues different
authentication credentials to different customers, and then allows each
customer to observe the source and destination IP addresses used with
their credentials. Suppose the application and enterprise both have
accounts on this service: the application uses it to prevent the
enterprise from learning its peers' IP addresses, and the enterprise uses
it to prevent the application from learning its employees' IP addresses.
If the client only connects to the service once, then either the enterprise
or the application will learn IP address information (via the TURN
provider's metadata reporting) that was meant to be kept secret.

As a result of this requirement, it is possible for the same TURN
server to appear up to four times in a RETURN network path: once as each
peer's application's TURN server, and once as each peer's sealed proxy.

In this example, an enterprise network is configured with a firewall
that blocks all UDP traffic, and a TURN server is advertised for
Auto-Discovery.
The proxy leakiness of the TURN server is unspecified, so the browser
treats it as leaky.

The application specifies a STUN and TURN server on the public net.
In accordance with the ICE candidate gathering algorithm
(RFC 5245), it receives a set of candidates like:

  o  A host candidate acquired from one interface.
     e.g. candidate:1610808681 1 udp 2122194687 [internal ip addr
     for interface 0] 63555 typ host generation 0

  o  A host candidate acquired from a different interface.
     e.g. candidate:1610808681 1 udp 2122194687 [internal ip addr
     for interface 1] 54253 typ host generation 0

  o  The proxy, as a host candidate.
     e.g. candidate:3458234523 1 udp 24584191 [public ip addr for
     the proxy] 54606 typ host generation 0
     The virtual interface also generates a STUN candidate, but it is
     eliminated because it is redundant with the host candidate, as noted
     in Sec 4.1.2.

  o  The application-provided TURN server as seen through the virtual
     interface. (Traffic through this candidate is recursively
     encapsulated.)
     e.g. candidate:702786350 1 udp 24583935 [public ip addr of the
     application TURN server] 52631 typ relay raddr [public ip addr for
     the proxy] rport 54606 generation 0
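
The priority values in the example candidates can be unpacked with the RFC 5245 formula, priority = 2^24 * (type preference) + 2^8 * (local preference) + (256 - component ID). The helper below is a small illustration; the function name `decompose_priority` is an assumption made for this sketch.

```python
from typing import Tuple


def decompose_priority(priority: int) -> Tuple[int, int, int]:
    """Split an ICE candidate priority into its three fields.

    Returns (type preference, local preference, component ID), inverting
    priority = 2**24 * type_pref + 2**8 * local_pref + (256 - component_id).
    """
    type_pref = priority >> 24
    local_pref = (priority >> 8) & 0xFFFF
    component_id = 256 - (priority & 0xFF)
    return type_pref, local_pref, component_id


# Priorities copied from the example candidates above.
print(decompose_priority(2122194687))  # (126, 32286, 1)  physical host
print(decompose_priority(24584191))    # (1, 30495, 1)    proxy as host
print(decompose_priority(24583935))    # (1, 30494, 1)    relay via proxy
```

Under this decomposition the two physical host candidates carry a type preference of 126, while both candidates from the virtual interface carry a type preference of 1, so they sort far below the physical ones.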
There are no STUN or TURN candidates on the physical interfaces, because
the application-specified STUN and TURN servers are not reachable
through the firewall.

If the remote peer is within the same network, it may be possible to
establish a direct connection using both peers' host candidates. If the
network prevents this kind of direct connection, the path will instead
take a “hairpin” route through the enterprise's proxy, using
one peer's physical “host” candidate and the other's virtual
“host” candidate, or (if that is also disallowed by the
network configuration) a “double hairpin” using both
endpoints' virtual “host” candidates.

Consider an enterprise network with TURN and HTTP proxies advertised
for Auto-Discovery with unspecified leakiness (thus defaulting to
leaky). The browser endpoint configures an additional TURN proxy by a
proprietary local mechanism.

If the locally configured proxy is leaky, then the browser MUST
produce candidates representing any physical interfaces (including
SSLTCP routes through the HTTP proxy), plus candidates for both UDP-only
virtual interfaces created by the two TURN servers.

There MUST NOT be any candidate that uses both proxies. Multiple
configured proxies are not chained recursively.

If the locally configured proxy is “sealed”, then the
browser MUST produce only candidates from the virtual interface
associated with that proxy.

If both proxies are configured for “sealed” use, then the
browser MUST produce only candidates from the virtual interface
associated with the proxy with higher rank.

A RETURN proxy can capture, block, and otherwise interfere with all of
its clients' WebRTC network activity. Therefore, browsers and other WebRTC
endpoints MUST NOT use RETURN proxies that are provided by untrusted
sources. For example, endpoints MUST NOT implement a configuration based
on unauthenticated network multicast (e.g. mDNS) unless the endpoint will
only be used on networks where all other users are fully trusted to
intercept all WebRTC traffic. In contrast, endpoints MAY implement
mechanisms to configure RETURN proxies by system-wide policy, which can only
be modified by trusted system administrators.

This document describes web browser behaviors that, if implemented
correctly, allow users to achieve greater identity-confidentiality during
WebRTC calls under certain configurations.

If a site administrator offers the site's users a TURN proxy, websites
running in the users' browsers will be able to initiate a UDP-based WebRTC
connection to any UDP transport address via the proxy. Websites'
connections will quickly terminate if the remote endpoint does not reply
with a positive indication of ICE consent, but no such restriction applies
to other applications that access the TURN server. Administrators should
take care to provide TURN access credentials only to the users who are
authorized to have global UDP network access.

TURN proxies and application TURN servers can provide some privacy
protection by obscuring the identity of one peer from the other.
However, unencrypted TURN provides no additional privacy from an observer
who can monitor the link between the TURN client and server, and even
encrypted TURN (Section
4.6) does not provide significant privacy from an observer who can sniff
traffic on both legs of the TURN connection, due to packet timing
correlations.

This document requires no actions from IANA.

Thanks to Harald Alvestrand, Philipp Hancke, Tirumaleswar Reddy, Alan
Johnston, John Yoakum, and Cullen Jennings for suggestions to improve the
content and presentation. Special thanks to Alan Johnston for
contributing the visual overview.