The FlowQueue-CoDel Packet Scheduler and Active Queue Management
Algorithm

T. Hoiland-Jorgensen (Karlstad University), P. McKenney (IBM Linux
Technology Center), D. Taht (Teklibre), J. Gettys, E. Dumazet (Google, Inc.)

AQM Working Group

This memo presents the FQ-CoDel hybrid packet scheduler/Active Queue
Management algorithm, a powerful tool for fighting bufferbloat and
reducing latency.

FQ-CoDel mixes packets from multiple flows and reduces the impact of
head of line blocking from bursty traffic. It provides isolation for
low-rate traffic such as DNS, web, and videoconferencing traffic. It
improves utilisation across the networking fabric, especially for
bidirectional traffic, by keeping queue lengths short; and it can be
implemented in a memory- and CPU-efficient fashion across a wide range
of hardware.

The FlowQueue-CoDel (FQ-CoDel) algorithm is a combined packet scheduler
and Active Queue Management (AQM) algorithm developed as
part of the bufferbloat-fighting community effort. It is
based on a modified Deficit Round Robin (DRR) queue
scheduler, with the CoDel AQM algorithm
operating on each queue. This document describes the combined algorithm;
reference implementations are available for the ns2 and ns3
network simulators, and it is included in the mainline Linux
kernel as the fq_codel queueing discipline.

FQ-CoDel is a general, efficient, nearly parameterless queue management
approach combining flow queueing with CoDel. It is a powerful tool for
solving bufferbloat, and we believe it to be safe to turn on
by default, as has already happened in a number of Linux distributions.
This document describes the Linux implementation in sufficient
detail for an independent implementation, to enable deployment outside
of the Linux ecosystem.

Since the FQ-CoDel algorithm was originally developed in the Linux
kernel, that implementation is still considered canonical. This document
strives to describe the algorithm in the abstract in the first sections
and separate out most implementation details in subsequent sections, but
does use the Linux implementation as reference for default behaviour in
the algorithm description itself.

The rest of this document is structured as follows: This section gives
some concepts and terminology used in the rest of the document, and
gives a short informal summary of the FQ-CoDel algorithm. The
following sections give an overview of the CoDel algorithm; cover the
flow hashing and DRR portion; describe the working of the algorithm
in detail; and discuss implementation details and considerations,
along with some of the limitations of using flow queueing. Finally,
the document outlines the current status of FQ-CoDel deployment,
lists some possible future areas of inquiry, and reiterates some
important security points that must be observed in the
implementation.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
document are to be interpreted as described in RFC 2119.

In this document, these words will appear with that interpretation only
when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying significance.

Flow: A flow is typically identified by a 5-tuple of source IP,
destination IP, source port, destination port, and protocol number. It
can also be identified by a superset or subset of those parameters, or
by media access control (MAC) address, or other means. FQ-CoDel hashes
flows into a configurable number of buckets to assign packets to
internal Queues.

Queue: A queue of packets represented internally in FQ-CoDel. In most
instances each flow gets its own queue; however because of the
possibility of hash collisions, this is not always the case. In an
attempt to avoid confusion, the word ‘queue’ is used to refer to the
internal data structure, and ‘flow’ to refer to the actual stream of
packets being delivered to the FQ-CoDel algorithm.

Scheduler: A mechanism to select which queue a packet is dequeued from.

CoDel AQM: The Active Queue Management algorithm employed by FQ-CoDel.

DRR: Deficit round-robin scheduling.

Quantum: The maximum amount of bytes to be dequeued from a queue at
once.

Interval: Characteristic time period used by the control loop of CoDel
to detect when a persistent Queue is developing (see Section 4.3 of the
CoDel document).

Target: Setpoint value of the minimum sojourn time of packets in a Queue
used as the target of the control loop in CoDel (see Section 4.4 of the
CoDel document).

FQ-CoDel is a hybrid of DRR and CoDel,
with an optimisation for sparse flows similar to Shortest Queue First
(SQF) and DRR++. We call this “Flow Queueing” rather
than “Fair Queueing” as flows that build a queue are treated differently
from flows that do not.

By default, FQ-CoDel stochastically classifies incoming packets into
different queues by hashing the 5-tuple of IP protocol number and source
and destination IP and port numbers, perturbed with a random number
selected at initiation time (although other flow classification schemes
can optionally be configured instead; see below). Each
queue is managed by the CoDel AQM algorithm. Packet ordering
within a queue is preserved, since queues have FIFO ordering.

The FQ-CoDel algorithm consists of two logical parts: the scheduler
which selects which queue to dequeue a packet from, and the CoDel AQM
which works on each of the queues. The subtleties of FQ-CoDel are mostly
in the scheduling part, whereas the interaction between the scheduler
and the CoDel algorithm is fairly straightforward:

At initialisation, each queue is set up to have a separate set of CoDel
state variables. By default, 1024 queues are created. The Linux
implementation at the time of writing supports anywhere from one to 64K
separate queues, and each queue maintains the state variables throughout
its lifetime, and so acts the same as the non-FQ CoDel variant would.
This means that with only one queue, FQ-CoDel behaves essentially the
same as CoDel by itself.
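As an illustration of this equivalence, the Linux implementation could
be instantiated with a single queue (a hypothetical invocation for
illustration, not a recommended configuration):

<CODE BEGINS>
tc qdisc add dev eth0 root fq_codel flows 1
<CODE ENDS>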
On dequeue, FQ-CoDel selects a queue from which to dequeue by a
two-tier round-robin scheme, in which each queue is allowed to dequeue
up to a
configurable quantum of bytes for each iteration. Deviations from this
quantum are maintained as byte credits for the queue, which serves to
make the fairness scheme byte-based rather than packet-based. The
two-tier round-robin mechanism distinguishes between “new” queues (which
don’t build up a standing queue) and “old” queues, which have queued
enough data to be around for more than one iteration of the round-robin
scheduler.

This new/old queue distinction has a particular consequence for queues
that don’t build up more than a quantum of bytes before being visited
by the scheduler: Such queues are removed from the list, and then
re-added as a new queue each time a packet arrives for it, and so will
get priority over queues that do not empty out each round (except for
a minor modification to protect against starvation, detailed below).
Exactly how little data a flow has to send to keep its queue in this
state is somewhat difficult to reason about, because it depends on
both the egress link speed and the number of concurrent
flows. However, in practice many things that are beneficial to have
prioritised for typical internet use (ACKs, DNS lookups, interactive
SSH, HTTP requests, VoIP) tend to fall in this category,
which is why FQ-CoDel performs so well for many practical
applications. However, the implicitness of the prioritisation means
that for applications that require guaranteed priority (for instance
multiplexing the network control plane over the network itself),
explicit classification is still needed.

This scheduling scheme has some subtlety to it, which is explained in
detail in the remainder of this document.

CoDel is described in the ACM Queue paper “Controlling Queue Delay”
and in the IETF CoDel document. The basic idea is to control queue
length, maintaining sufficient queueing to keep the outgoing link busy,
but avoiding building up the queue beyond that point. This is done by
preferentially dropping packets that remain in the queue for “too long”.
Packets are dropped by head drop, which lowers the time for the drop
signal to propagate back to the sender by the length of the queue, and
helps trigger TCP fast retransmit sooner.

The CoDel algorithm itself will not be described here; instead we refer
the reader to the CoDel draft.

The intention of FQ-CoDel’s scheduler is to give each flow its own
queue, hence the term Flow Queueing. Rather than a perfect realisation
of this, a hashing-based scheme is used, where flows are hashed into a
number of buckets, each of which has its own queue. The number of buckets is
configurable, and presently defaults to 1024 in the Linux
implementation. This is enough to avoid hash collisions on a moderate
number of flows as seen for instance in a home gateway. Depending on the
characteristics of the link, this can be tuned to trade off memory for a
lower probability of hash collisions. See Section 6 for a more in-depth
discussion of this.

By default, the flow hashing is performed on the 5-tuple of source and
destination IP addresses and port numbers and IP protocol number. While
the hashing can be customised to match on arbitrary packet bytes, care
should be taken when doing so: Much of the benefit of the FQ-CoDel
scheduler comes from this per-flow distinction. However, the default
hashing does have some limitations, as discussed below.

FQ-CoDel’s DRR scheduler is byte-based, employing a deficit round-robin
mechanism between queues. This works by keeping track of the current
number of byte credits of each queue. This number is initialised to the
configurable quantum; each time a queue gets a dequeue opportunity, it
gets to dequeue packets, decreasing the number of credits by the packet
size for each packet. This continues until the value of byte credits
becomes zero or less, at which point it is increased by one quantum, and
the dequeue opportunity ends.

This means that if one queue contains packets of, for instance, size
quantum/3, and another contains quantum-sized packets, the first queue
will dequeue three packets each time it gets a turn, whereas the second
only dequeues one. This means that flows that send small packets are not
penalised by the difference in packet sizes; rather, the DRR scheme
approximates a (single-)byte-based fairness queueing scheme. The size of
the quantum determines the scheduling granularity, with the tradeoff
from too small a quantum being scheduling overhead. For small
bandwidths, lowering the quantum from the default MTU size can be
advantageous.

Unlike plain DRR, there are two sets of flows - a “new” list for flows
that have not built a queue recently, and an “old” list for queues that
build a backlog. This distinction is an integral part of the FQ-CoDel
scheduler and is described in more detail below.

To make its scheduling decisions, FQ-CoDel maintains two ordered lists
of active queues, called “new” and “old” queues. When a packet is added
to a queue that is not currently active, that queue becomes active by
being added to the list of new queues. Later on, it is moved to the list
of old queues, from which it is removed when it is no longer active.
This behaviour is the source of some subtlety in the packet scheduling
at dequeue time, explained below.

The packet enqueue mechanism consists of three stages: classification
into a queue, timestamping and bookkeeping, and optionally dropping a
packet when the total number of enqueued packets goes over the maximum.

When a packet is enqueued, it is first classified into the appropriate
queue. By default, this is done by hashing (using a Jenkins hash
function ) on the 5-tuple of IP protocol, and source and
destination IP addresses and port numbers (if they exist), and taking
the hash value modulo the number of queues. The hash is salted by modulo
addition of a random value selected at initialisation time, to prevent
possible DoS attacks if the hash is predictable ahead of time (see
). The Linux kernel implements the Jenkins hash function by
mixing three 32-bit values into a single 32-bit output value. Inputs
larger than 96 bits are reduced by additional mixing steps, 96 bits at
a time.
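As an illustration, the classification step might look like the
following sketch (the helper and field names are hypothetical and the
reduction step is simplified; the kernel's actual code differs in
detail):

<CODE BEGINS>
/* Sketch: classify a packet into one of q->flows_cnt queues by
 * Jenkins-hashing its 5-tuple, salted with the random perturbation
 * value chosen at initialisation time. */
static u32 fq_codel_classify(u32 src_ip, u32 dst_ip, u32 ports,
                             u8 ip_proto,
                             const struct fq_codel_sched_data *q)
{
        u32 hash = jhash_3words(src_ip, dst_ip, ports ^ ip_proto,
                                q->perturbation);

        return hash % q->flows_cnt;    /* reduce to a queue index */
}
<CODE ENDS>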
Once the packet has been successfully classified into a queue, it is
handed over to the CoDel algorithm for timestamping. It is then added to
the tail of the selected queue, and the queue’s byte count is updated by
the packet size. Then, if the queue is not currently active (i.e., if it
is not in either the list of new or the list of old queues), it is added
to the end of the list of new queues, and its number of credits is
initialised to the configured quantum. Otherwise, the queue is left in its
current queue list.

Finally, the total number of enqueued packets is compared with the
configured limit, and if it is above this value (which can happen
since a packet was just enqueued), a packet is dropped from the head
of the queue with the largest current byte count. Note that this in most
cases means that the packet that gets dropped is different from the one
that was just enqueued, and may even be from a different queue.
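Putting the three enqueue stages together, the logic can be summarised
in the following sketch (helper names such as classify(),
codel_timestamp(), and biggest_flow() are hypothetical, and error
handling is omitted):

<CODE BEGINS>
/* Sketch of enqueue: classify, timestamp and bookkeeping, and
 * head-drop from the largest queue when the hard limit is hit. */
static void fq_codel_enqueue(struct packet *pkt,
                             struct fq_codel_sched_data *q)
{
        u32 idx = classify(pkt, q);
        struct fq_codel_flow *flow = &q->flows[idx];

        codel_timestamp(pkt);              /* record enqueue time   */
        flow_queue_add_tail(flow, pkt);    /* FIFO within the queue */
        q->backlogs[idx] += pkt->len;

        if (!flow_is_active(flow)) {       /* on neither list       */
                list_add_tail(&flow->flowchain, &q->new_flows);
                flow->deficit = q->quantum;
        }

        if (++q->packet_count > q->limit)  /* over the hard limit:  */
                drop_head(biggest_flow(q), q);
}
<CODE ENDS>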
As mentioned previously, it is possible to modify the classification
scheme to provide a different notion of a ‘flow’. The Linux
implementation provides this option in the form of the tc filter
command. While this can add capabilities (for instance, matching on
other possible parameters such as MAC address, diffserv code point
values, firewall rules, flow specific markings, IPv6 flow label, etc.),
care should be taken to preserve the notion of ‘flow’ as much of the
benefit of the FQ-CoDel scheduler comes from keeping flows in separate
queues.

For protocols that do not contain a port number (such as ICMP), the
Linux implementation simply sets the port numbers to zero and performs
the hashing as usual. In practice, this results in such protocols
each getting their own queue (except in the case of hash collisions). An
implementation can perform other classifications for protocols that have
their own notion of a flow, but SHOULD fall back to simply hashing on
source and destination IP address and IP protocol number in the absence
of other information.

The default classification scheme can additionally be improved by
performing decapsulation of tunnelled packets prior to hashing on the
5-tuple in the encapsulated payload. The Linux implementation does this
for common encapsulations known to the kernel, such as 6in4,
IP-in-IP, and GRE (Generic Routing Encapsulation). This helps to
distinguish between flows that share the same
(outer) 5-tuple, but of course is limited to unencrypted tunnels (see
the discussion of encapsulated traffic below).

Most of FQ-CoDel’s work is done at packet dequeue time. It consists of
three parts: selecting a queue from which to dequeue a packet,
actually dequeuing it (employing the CoDel algorithm in the process),
and some final bookkeeping.

For the first part, the scheduler first looks at the list of new queues;
for the queue at the head of that list, if that queue has a negative
number of credits (i.e., it has already dequeued at least a quantum of
bytes), it is given an additional quantum of credits, the queue is put
onto the end of the list of old queues, and the routine selects the
next queue and starts again.

Otherwise, that queue is selected for dequeue. If the list of new queues
is empty, the scheduler proceeds down the list of old queues in the same
fashion (checking the credits, and either selecting the queue for
dequeuing, or adding credits and putting the queue back at the end of
the list).

After having selected a queue from which to dequeue a packet, the CoDel
algorithm is invoked on that queue. This applies the CoDel control law,
which is the mechanism CoDel uses to determine when to drop packets (see
the CoDel document). As a result of this, one or more packets may be
discarded from the head of the selected queue, before the packet that
should be dequeued is returned (or nothing is returned if the queue is
or becomes empty while being handled by the CoDel algorithm).

Finally, if the CoDel algorithm does not return a packet, then the queue
must be empty, and the scheduler does one of two things: if the queue
selected for dequeue came from the list of new queues, it is moved to
the end of the list of old queues. If instead it came from the list of
old queues, that queue is removed from the list, to be added back (as a
new queue) the next time a packet arrives that hashes to that queue.
Then (since no packet was available for dequeue), the whole dequeue
process is restarted from the beginning.

If, instead, the scheduler did get a packet back from the CoDel
algorithm, it subtracts the size of the packet from the byte credits for
the selected queue and returns the packet as the result of the dequeue
operation.
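In sketch form, the dequeue operation described above might be
structured as follows (again with hypothetical helper names; the
kernel code differs in detail):

<CODE BEGINS>
/* Sketch of dequeue: two-tier queue selection, CoDel invocation,
 * and byte-credit bookkeeping. */
static struct packet *fq_codel_dequeue(struct fq_codel_sched_data *q)
{
        struct fq_codel_flow *flow;
        struct packet *pkt;

begin:
        if (!list_empty(&q->new_flows))           /* new queues first */
                flow = first_flow(&q->new_flows);
        else if (!list_empty(&q->old_flows))
                flow = first_flow(&q->old_flows);
        else
                return NULL;                      /* nothing queued   */

        if (flow->deficit <= 0) {                 /* quantum used up  */
                flow->deficit += q->quantum;
                list_move_tail(&flow->flowchain, &q->old_flows);
                goto begin;
        }

        pkt = codel_dequeue(flow);                /* may head-drop    */
        if (!pkt) {                               /* queue is empty   */
                if (on_list(flow, &q->new_flows)) /* anti-starvation  */
                        list_move_tail(&flow->flowchain, &q->old_flows);
                else
                        list_del_init(&flow->flowchain);
                goto begin;
        }
        flow->deficit -= pkt->len;
        return pkt;
}
<CODE ENDS>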
The step that moves an empty queue from the list of new queues to the
end of the list of old queues before it is removed is crucial to
prevent starvation. Otherwise the queue could reappear (the next time a
packet arrives for it) before the list of old queues is visited; this
can go on indefinitely even with a small number of active flows, if the
flow providing packets to the queue in question transmits at just the
right rate. This is prevented by first moving the queue to the end of
the list of old queues, forcing a pass through that, and thus preventing
starvation. Moving it to the end of the list, rather than the front, is
crucial for this to work.

The resulting migration of queues between the different states is
summarised in the following state diagram:
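(The figure is rendered here schematically from the transitions
described in the text.)

                       (packet arrival)
      +----------+ ----------------------> +-----+
      | Inactive |                         | New |
      +----------+                         +-----+
           ^                                  |
           |                  (credits exhausted, or
           |                   queue emptied while on
           |                   the list of new queues)
           |                                  |
           |   (queue emptied                 v
           |    while on the list          +-----+
           +---- of old queues) ---------- | Old |
                                           +-----+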
Figure 1: Partial state diagram for queues between different states.
Both the new and old queue states can additionally have arrival and
dequeue events that do not change the state; these are omitted
here.

This section contains implementation details for the FQ-CoDel algorithm.
This includes the data structures and parameters used in the Linux
implementation, as well as discussion of some required features of the
target platform and other considerations.

The main data structure of FQ-CoDel is the array of queues, which is
instantiated with the number of queues specified by the flows parameter
at instantiation time. Each queue consists simply of an ordered list of
packets with FIFO semantics, two state variables tracking the queue
credits and total number of bytes enqueued, and the set of CoDel state
variables. Other state variables to track queue statistics can also be
included: for instance, the Linux implementation keeps a count of
dropped packets.

In addition to the queue structures themselves, FQ-CoDel maintains two
ordered lists containing references to the subset of queues that are
currently active. These are the list of ‘new’ queues and the list of
‘old’ queues, as explained above.

In the Linux implementation, queue space is shared: there’s a global
limit on the number of packets the queues can hold, but not one per
queue.

The following are the user configuration parameters exposed by the Linux
implementation of FQ-CoDel.

The interval parameter has the same semantics as CoDel and is used to
ensure that the minimum sojourn time of packets in a queue used as an
estimator by the CoDel control algorithm is a relatively up-to-date
value. That is, CoDel only reacts to delay experienced in the last epoch
of length interval. It SHOULD be set to be on the order of the
worst-case RTT through the bottleneck to give end-points sufficient time
to react.

The default interval value is 100 ms.

The target parameter has the same semantics as CoDel. It is the
acceptable minimum standing/persistent queue delay for each FQ-CoDel
Queue. This minimum delay is identified by tracking the local minimum
queue delay that packets experience.

The default target value is 5 ms, but this value should be tuned to be
at least the transmission time of a single MTU-sized packet at the
prevalent egress link speed (which for, e.g., 1Mbps and MTU 1500 is
~15ms), to prevent CoDel from being too aggressive at low bandwidths. It
should otherwise be set to be on the order of 5-10% of the configured
interval.
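A minimal sketch of this tuning rule (illustrative only; this helper
is not part of the Linux code):

<CODE BEGINS>
/* Lower bound on the CoDel target: the time to serialise one
 * MTU-sized packet at the egress link rate. */
static double min_target_seconds(double mtu_bytes, double rate_bps)
{
        return mtu_bytes * 8.0 / rate_bps;
}

/* min_target_seconds(1500, 1e6) is 0.012 s; with a little headroom
 * on top, this yields the ~15 ms guideline mentioned above for
 * 1 Mbps links, in place of the 5 ms default. */
<CODE ENDS>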
Routers do not have infinite memory, so some packet limit MUST be
enforced.

The limit parameter is the hard limit on the real queue size, measured
in number of packets. This limit is a global limit on the number of
packets in all queues; each individual queue does not have an upper
limit. When the limit is reached and a new packet arrives for enqueue, a
packet is dropped from the head of the largest queue (measured in bytes)
to make room for the new packet.

In Linux, the default packet limit is 10240 packets, which is suitable
for up to 10 Gigabit Ethernet speeds. In practice, the hard limit is
rarely, if ever, hit, as drops are performed by the CoDel algorithm long
before the limit is hit. For platforms that are severely memory
constrained, a lower limit can be used.

The quantum parameter is the number of bytes each queue gets to
dequeue on each round of the scheduling algorithm. The default is set to
1514 bytes which corresponds to the Ethernet MTU plus the hardware
header length of 14 bytes.

In systems employing TCP Segmentation Offload (TSO), where a “packet”
consists of an offloaded packet train, it can presently be as large as
64K bytes. In systems using Generic Receive Offload (GRO), they can be
up to 17 times the TCP max segment size (or 25K bytes). These
mega-packets severely impact FQ-CoDel’s ability to schedule traffic, and
hurt latency needlessly. There is ongoing work in Linux to make smarter
use of offload engines.

The flows parameter sets the number of queues into which the
incoming packets are classified. Due to the stochastic nature of
hashing, multiple flows may end up being hashed into the same slot.

This parameter can be set only at initialisation time in the current
implementation, since memory has to be allocated for the hash table.

The default value is 1024 in the current Linux implementation.

ECN is enabled by default. Rather than do anything special with
misbehaved ECN flows, FQ-CoDel relies on the packet scheduling system to
minimise their impact; thus, the number of unresponsive packets in a flow
being marked with ECN can grow to the overall packet limit, but will not
otherwise affect the performance of the system.

ECN can be disabled by specifying the noecn parameter.

The ce_threshold parameter enables Data Centre TCP (DCTCP)-like
processing, resulting
in CE (Congestion Encountered) marking on ECN-Capable Transport (ECT)
packets starting at a lower sojourn delay setpoint than the
default CoDel Target. Details of DCTCP can be found in the DCTCP
specification.

The parameter is disabled by default and can be set to
a number of microseconds to enable.
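For illustration, the parameters described in this section map onto
the tc command line roughly as follows (an example invocation using
the default values plus a ce_threshold; see the tc-fq_codel(8) man
page for the authoritative syntax):

<CODE BEGINS>
tc qdisc add dev eth0 root fq_codel limit 10240 flows 1024 \
        quantum 1514 target 5ms interval 100ms ecn ce_threshold 1ms
<CODE ENDS>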
Since the Linux FQ-CoDel implementation by default uses 1024 hash
buckets, the probability that (say) 100 flows will all hash to the same
bucket is something like ten to the power of minus 300. Thus, at least
one of the flows will almost certainly hash to some other queue.

Expanding on this, based on analytical equations for hash collision
probabilities, for 100 flows, the probability of no collision is 90.78%;
the probability that no more than two of the 100 flows will be involved
in any given collision is 99.57%; and the probability that no more than
three of the 100 flows will be involved in any given collision is 99.99%.
These probabilities assume a hypothetical perfect hashing function, so
in practice they may be a bit lower. We have not found this difference
to matter in practice.
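These figures can be reproduced with a short calculation that treats
the number of other flows sharing a given flow's bucket as binomially
distributed (a sketch assuming an ideal uniform hash; this program is
not part of the FQ-CoDel code):

<CODE BEGINS>
#include <math.h>
#include <stdio.h>

/* Probability that a given flow shares its bucket with at most k of
 * the other n-1 flows, for n flows hashed uniformly into m buckets. */
static double p_at_most(int n, int m, int k)
{
        double p = 0.0;
        int i;

        for (i = 0; i <= k; i++)
                p += exp(lgamma(n) - lgamma(i + 1) - lgamma(n - i)
                         + i * log(1.0 / m)
                         + (n - 1 - i) * log(1.0 - 1.0 / m));
        return p;
}

int main(void)
{
        printf("no collision:        %.2f%%\n",
               100 * p_at_most(100, 1024, 0));  /* ~90.78% */
        printf("<= 2 flows together: %.2f%%\n",
               100 * p_at_most(100, 1024, 1));  /* ~99.57% */
        printf("<= 3 flows together: %.2f%%\n",
               100 * p_at_most(100, 1024, 2));  /* ~99.99% */
        return 0;
}
<CODE ENDS>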
These probabilities can be improved upon by using set-associative
hashing, a technique used in the Cake algorithm currently being
developed as a further development upon the FQ-CoDel principles. For a
4-way associative hash with the same number of total queues, the
probability of no collisions for 100 flows is 99.93%, while for an 8-way
associative hash it is ~100%.

FQ-CoDel can be implemented with a low memory footprint (less than 64
bytes per queue on 64 bit systems). These are the data structures used
in the Linux implementation (shown as found in sch_fq_codel.c at the
time of writing, lightly abbreviated):

<CODE BEGINS>
struct fq_codel_flow {
    struct sk_buff    *head;
    struct sk_buff    *tail;
    struct list_head  flowchain;
    int               deficit;
    u32               dropped;   /* number of drops (or ECN marks)
                                    on this flow */
    struct codel_vars cvars;
};
<CODE ENDS>

The master table managing all queues looks like this:

<CODE BEGINS>
struct fq_codel_sched_data {
    struct tcf_proto     *filter_list;  /* optional external
                                           classifier */
    struct fq_codel_flow *flows;        /* flows table [flows_cnt] */
    u32                  *backlogs;     /* backlog table [flows_cnt] */
    u32                  flows_cnt;     /* number of flows */
    u32                  perturbation;  /* hash perturbation */
    u32                  quantum;       /* quantum in bytes */
    struct codel_params  cparams;
    struct codel_stats   cstats;
    u32                  drop_overlimit;
    u32                  new_flow_count;

    struct list_head     new_flows;     /* list of new flows */
    struct list_head     old_flows;     /* list of old flows */
};
<CODE ENDS>

The CoDel portion of the algorithm requires per-packet timestamps be
stored along with the packet. While this approach works well for
software-based routers, it may be impossible to retrofit devices that do
most of their processing in silicon and lack space or mechanism for
timestamping.

Also, while perfect resolution is not needed, timestamp resolution finer
than the CoDel target setting is necessary. Furthermore, timestamping
functions in the core OS need to be efficient as they are called at
least once on each packet enqueue and dequeue.

When deploying a queue management algorithm such as FQ-CoDel, it is
important to ensure that the algorithm actually runs in the right place
to control the queue. In particular, lower layers of the operating system
networking stack can have queues of their own, as can device drivers and
hardware. Thus, it is desirable that the queue management algorithm runs
as close to the hardware as possible. However, scheduling such
complexity at interrupt time is difficult, so a small standing queue
between the algorithm and the wire is often needed at higher transmit
rates.

In Linux, the mechanism to ensure these different needs are balanced is
called “Byte Queue Limits”, which controls the device driver
ring buffer (for physical line rates). For cases where this
functionality is not available, the queue can be controlled by means of
a software rate limiter such as Hierarchical Token Bucket or
Hierarchical Fair-Service Curve. The Cake algorithm
integrates a software rate limiter for this purpose.

Other issues with queues at lower layers are described elsewhere.

Much of the scheduling portion of FQ-CoDel is derived from DRR and is
substantially similar to DRR++. Versions based on Stochastic Fair
Queueing have also been produced and tested in ns2. Other forms
of Fair Queueing, such as Weighted Fair Queueing or Quick Fair
Queueing, have not been thoroughly explored, but there’s no a
priori reason why the round-robin scheduling of FQ-CoDel couldn’t be
replaced with something else.

For a comprehensive discussion of fairness queueing algorithms and their
combination with AQM, see “On Queuing, Marking, and Dropping”.

CoDel can be applied to a single-queue system as a straight AQM, where
it converges towards an “ideal” drop rate (i.e., one that minimises delay
while keeping a high link utilisation), and then optimises around that
control point.

The scheduling of FQ-CoDel mixes packets of competing flows, which acts
to pace bursty flows to better fill the pipe. Additionally, a new flow
gets substantial leeway over other flows until CoDel finds an ideal drop
rate for it. However, for a new flow that exceeds the configured
quantum, more time passes before all of its data is delivered (as
packets from it, too, are mixed across the other existing queue-building
flows). Thus, FQ-CoDel takes longer (as measured in time) to converge
towards an ideal drop rate for a given new flow, but does so within
fewer delivered packets from that flow.

Finally, the flow isolation FQ-CoDel provides means that the CoDel drop
mechanism operates on the flows actually building queues, which results
in packets being dropped more accurately from the largest flows than
CoDel alone manages. Additionally, flow isolation radically improves the
transient behaviour of the network when traffic or link characteristics
change (e.g., when new flows start up or the link bandwidth changes);
while CoDel itself can take a while to respond, FQ-CoDel reacts almost
immediately.

While FQ-CoDel has been shown in many scenarios to offer significant
performance gains compared to alternative queue management strategies,
there are some scenarios where the scheduling algorithm in particular is
not a good fit. This section documents some of the known cases which
either may require tweaking the default behaviour, or where alternatives
to flow queueing should be considered.

In some parts of the network, enforcing flow-level fairness may not be
desirable, or some other form of fairness may be more important. An
example of this can be an Internet Service Provider that may be more
interested in ensuring fairness between customers than between flows. Or
a hosting or transit provider that wishes to ensure fairness between
connecting Autonomous Systems or networks. Another issue can be that the
number of simultaneous flows experienced at a particular link can be too
high for flow-based fairness queueing to be effective.

Whatever the reason, in a scenario where fairness between flows is not
desirable, reconfiguring FQ-CoDel to match on a different characteristic
can be a way forward. The implementation in Linux can leverage the
packet matching mechanism of the tc subsystem to use any
available packet field to partition packets into virtual queues, for
instance matching on address or subnet source/destination pairs,
application layer characteristics, etc.

Furthermore, as commonly deployed today, FQ-CoDel is used with three or
more tiers of service classification: priority, best effort and
background, based on diffserv markings. Some products do more detailed
classification, including deep packet inspection and
destination-specific filters to achieve their desired result.

Where possible, FQ-CoDel will attempt to decapsulate packets before
matching on the header fields for the flow hashing. However, for some
encapsulation techniques, most notably encrypted VPNs, this is not
possible. If several flows are bunched into one such encapsulated
tunnel, they will be seen as one flow by the FQ-CoDel algorithm. This
means that they will share a queue, and drop behaviour, and so flows
inside the encapsulation will not benefit from the implicit
prioritisation of FQ-CoDel, but will continue to benefit from the
reduced overall queue length from the CoDel algorithm operating on the
queue. In addition, when such an encapsulated bunch competes against
other flows, it will count as one flow and will not be assigned a share of the
bandwidth based on how many flows are inside the encapsulation.

Depending on the application, this may or may not be desirable
behaviour. In cases where it is not, changing FQ-CoDel’s matching to not
be flow-based (as detailed in the previous subsection above) can be a
mitigation. Going forward, having some mechanism for opaque
encapsulations to express to the outer layer which flow a packet belongs
to, could be a way to mitigate this. Naturally, care needs to be taken
when designing such a mechanism to ensure no new privacy and security
issues are raised by exposing information from inside the encapsulation
to the outside world. Keeping the extra information out-of-band and
dropping it before it hits the network could be one way to achieve this.

In the presence of queue management schemes that limit latency under
load, low-priority congestion control algorithms such as LEDBAT
(or, in general, algorithms that try to voluntarily use up
less than their fair share of bandwidth) experience little added
latency when the link is congested. Thus, they lack the signal to back
off that added latency previously afforded them. This effect is seen
with FQ-CoDel as well as with any effective AQM.

As such, these delay-based algorithms tend to revert to loss-based
congestion control, and will consume the fair share of bandwidth
afforded to them by the FQ-CoDel scheduler. However, low-priority
congestion control mechanisms may be able to take steps to continue to
be low priority, for instance by taking into account the vastly reduced
level of delay afforded by an AQM, or by using a coupled approach to
observing the behaviour of multiple flows.

The FQ-CoDel algorithm as described in this document has been shipped as
part of the Linux kernel since version 3.5, released on the 21st of
July, 2012, with the ce_threshold being added in version 4.2. The
algorithm has seen widespread testing in a variety of contexts and is
configured as the default queueing discipline in a number of mainline
Linux distributions (as of this writing at least OpenWRT, Arch Linux and
Fedora). We believe it to be a safe default and encourage people running
Linux to turn it on: It is a massive improvement over the previous
default FIFO queue.

Of course there is always room for improvement, and this document has
listed some of the known limitations of the algorithm. As such, we
encourage further research into algorithm refinements and addressing of
limitations. One such effort is undertaken by the bufferbloat community
in the form of the Cake queue management scheme. In addition to
this we believe the following (non-exhaustive) list of issues to be
worthy of further enquiry:

o  Variations on the flow classification mechanism to fit different
   notions of flows. For instance, an ISP might want to deploy
   per-subscriber scheduling, while in other cases several flows can
   share a 5-tuple, as exemplified by the RTCWEB QoS recommendations.

o  Interactions between flow queueing and delay-based congestion
   control algorithms and scavenger protocols.

o  Other scheduling mechanisms to replace the DRR portion of the
   algorithm, e.g., QFQ or WFQ.

o  Sensitivity of parameters, most notably the number of queues and
   the CoDel parameters.

There are no specific security exposures associated with FQ-CoDel that
are not also present in current FIFO systems. On the contrary, some
vulnerabilities of FIFO systems are reduced with FQ-CoDel (e.g., simple
minded packet floods). However, some care is needed in the
implementation to ensure this is the case. These are included in the
description above; however, we reiterate them here:

To prevent packets in the new queues from starving old queues, it is
important that when a queue on the list of new queues empties, it is
moved to the end of the list of old queues. This is described at the
end of the dequeue description above.

To prevent an attacker targeting a specific flow for a denial of
service attack, the hash that maps packets to queues should not be
predictable. To achieve this, FQ-CoDel salts the hash, as described in
the enqueue description above. The size of the salt and the strength of
the hash function are obviously a tradeoff between performance and
security. The Linux implementation uses a 32-bit random value as the
salt and a Jenkins hash function. This makes it possible to achieve
high throughput, and we consider it sufficient to ward off the most
obvious attacks.

Packet fragments without a layer 4 header can be hashed into different
bins than the first fragment with the header intact. This can cause
reordering and/or adversely affect the performance of the flow.
Keeping state to match the fragments to the beginning of the packet,
or simply putting all packet fragments (including the first fragment
of each fragmented packet) into the same queue, are two ways to
alleviate this.

This document has no actions for IANA.

Our deepest thanks to Kathie Nichols, Van Jacobson, and all the members
of the bufferbloat.net effort for all the help on developing and testing
the algorithm. In addition, our thanks to Anil Agarwal for his help with
getting the hash collision probabilities in this document right.

References

   IP Encapsulation within IP (RFC 2003)
   Key words for use in RFCs to Indicate Requirement Levels (RFC 2119)
   Key and Sequence Number Extensions to GRE (RFC 2890)
   The Addition of Explicit Congestion Notification (ECN) to IP (RFC 3168)
   Basic Transition Mechanisms for IPv6 Hosts and Routers (RFC 4213)
   Low Extra Delay Background Transport (LEDBAT) (RFC 6817)
   On Queuing, Marking, and Dropping (RFC 7806)
   Controlled Delay Active Queue Management (the CoDel document)
   DSCP and other packet markings for WebRTC QoS
   Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters
   Bufferbloat: Dark buffers in the Internet
   Cake comprehensive queue management system
   Bufferbloat web site
   NS2 web site
   NS3 web site
   Current FQ-CoDel Linux source code
   Network Byte Queue Limits
   Hierarchical Token Bucket
   Hierarchical fair-service curve
   Efficient Fair Queueing Using Deficit Round Robin
   Deficits for Bursty Latency-critical Flows: DRR++
   On the impact of TCP and per-flow scheduling on Internet Performance
   Controlling Queue Delay
   Fighting the bufferbloat: on the coexistence of AQM and low priority
   congestion control
   A Hash Function for Hash Table Lookup
   Stochastic fairness queueing
   QFQ: Efficient packet scheduling with tight guarantees
   Analysis and simulation of a fair queueing algorithm