Compressed BGP Update Message

Juniper
1137 Innovation Way
Sunnyvale, CA
USA
prz@juniper.net

AT&T
200 S Laurel Ave
Middletown, NJ
USA
ar977m@att.com

Futurewei Technologies Inc
jefftant.ietf@gmail.com
This document specifies an optional
compressed BGP update
message format that allows an address-family-independent
reduction in BGP control traffic volume.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Over the years, BGP has evolved to carry ever larger volumes of
information, and this trend seems to continue unabated.
While much of the growth can be attributed to the advent of new address families spurred by
, the steady increase in the number and size of attributes amplifies
this tendency.
More recently, even the same NLRI may be advertised multiple times by means of ADD-PATH extensions.
All these developments drive up the volume of information BGP needs to exchange to synchronize the RIBs of its peers.
Although the BGP update format provides a simple "semantic" compression mechanism that avoids repeating attributes
when multiple NLRIs share them, in practice
packing updates has proven a difficult challenge. Packing attempts are further undermined by the plethora of "per-NLRI tagging"
attributes such as extended
communities.
One could, of course, dismiss the growing raw volume of data exchanged between two BGP peers as
a mere trifle given ever-rising link bandwidths. Alas, we are facing other sustained trends that make a
reduction of the data volume exchanged by BGP highly desirable:
Link delays will remain constant until radically new transmission mechanisms become commonplace.
Barring such developments, and given the prevailing constant Ethernet MTU, the increasing volume of BGP traffic will
cause more and more IP packets
to be sent, with
BGP synchronization speed being limited by the
growing bandwidth-delay product.
A data volume that is reasonable for a single peer becomes less so when many peers need to be refreshed due to
and
interactions. Use of these techniques is expected to increase with growing demands on BGP reliability and
novel variants of state synchronization between peers.
The BGP message length is limited to 4096 octets, which in itself is a recognized problem. Extensions to the message length are
being worked on, but they impose
their own requirements and memory pressure on implementations and ultimately will not help with attributes
exceeding the 4096-octet limit in mixed environments.
Virtualization techniques increase
the number of context switches an IP packet has to traverse between two
BGP instances.
Coupled with the difficulty of estimating a reasonable TCP MSS in virtualized environments and
the resulting number of
IP packets TCP generates, more
and more context-switching overhead is incurred per update
before goodput BGP processing can happen.
Obviously, unless we change the BGP encoding drastically, e.g., by introducing
more context to allow for semantic compression, we cannot expect a reduction in
data volume without paying some kind of price.
Changing the BGP format to decouple attribute value updates from NLRI
updates could be a viable course of action. However, the challenges of such a scheme are significant: since such
"compression" would extend the semantics and formats of updates as we have them today, existing and future drafts may
interact with it in ways not discernible today. Last but not least, introducing a smarter, context-rich encoding
is likely to cause dependency problems and slow down BGP encoding procedures.
Fortunately, some observations can be made and emerging
trends exploited to attempt a reduction in
BGP data volumes without the mentioned disadvantages:
BGP updates are very repetitive. The smallest change in attribute values causes extensive repetition of all attributes,
and any difference prevents packing NLRIs into the same update. On top of that, each BGP update message still carries
a 16-octet marker that lost most of its practical value some time ago. One could generalize these facts by saying that BGP updates tend to
exhibit very low
entropy.
CPU cycles available to run control protocols are becoming more and more abundant, as, to a certain
extent, is memory. However, they are no longer available in the easily harvested "single core with higher frequency" form factor but rather as
multiple cores that introduce the usual pitfalls of parallelization. In short, getting a lot of
independent work done is becoming cheaper and cheaper, while speeding up a single strand of
execution that depends on previous results is not. Nevertheless, this opens the possibility to apply
different filters to BGP streams, possibly even executing in parallel threads. One such filter can
compress the data in a manner completely transparent to the rest of an existing implementation.
Hence, this draft suggests removing the redundancy in the BGP update stream
via Huffman codes, which can be applied per peer as a filter on the BGP update stream, concurrently with
the rest of BGP processing.
Accordingly, this document describes an optional scheme to compress BGP update traffic with the DEFLATE variant of Huffman
encoding.
In the broadest terms, such a scheme will be beneficial if a BGP
implementation finds itself in an I/O-constrained scenario
while having spare CPU cycles available. Compression will
ease the pressure on TCP processing and synchronization as well
as reduce the raw number of IP packets exchanged between peers.
This document will request IANA to assign a new BGP message type value
and a new optional capability value in the BGP Capability
Codes registry. The suggested value for the Compressed
Updates message type in this process is 6, and
the suggested value for the Capability Code is 76. IANA will also be requested to assign a new subcode in the "BGP Cease
NOTIFICATION message subcodes" registry. The suggested name for the
code point is "Decompression Error" and the suggested value
is 10.
The capability to *decompress* a new, optional message type carrying
compressed updates is advertised via the usual BGP optional capability
negotiation technique.
A peer MUST NOT send any compressed updates towards peers that did
not advertise the capability to decompress. A peer MAY send
compressed updates towards peers that advertised such
capability.
A new BGP message is introduced under the name "Compressed BGP Update".
It contains an arbitrary number of messages of the following types:
normal BGP updates
Enhanced Route Refresh subtypes 1 and 2 (BoRR and EoRR)
Route Refresh with Options subtypes 4 and 5 (BoRR and EoRR with options)
These messages follow each other and are compressed according to the rules below:
Compressed and uncompressed BGP updates MAY follow each other in
arbitrary order, with the exception of the compressor overflow scenario
described below.
After decompression of a stream of interleaved compressed and
uncompressed BGP update messages, the resulting sequence of updates
does not have to be identical to the sequence in a stream generated
without compression. However, the decompressed sequence MUST convey
the same ultimate semantics to the peer as in the no-compression case.
The updates contained within the compressed BGP update message
MUST be stripped of the initial 16-octet marker while preserving the
remaining fields (Length and Type) of the BGP message header. The Length
field retains its original value, i.e., it still includes the 16 octets
of the stripped marker.
Each compressed BGP Update MUST carry a sequence of
complete (non-fragmented) original updates,
i.e., it cannot contain only part of an original BGP update.
The compressor overflow procedure described below presents the only exception
to this rule.
Each compressed BGP Update MUST be sent as a block, i.e., decompression
MUST be able to yield the complete decompressed contents of the update
without waiting for further compressed updates. This differs from the
commonly used stream compression mode. The compressor overflow procedure
described below presents the only exception to this rule.
The compressed data MAY exceed the maximum BGP
message size, but in such a case the compressor overflow procedure below
MUST be invoked.
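The rules above can be illustrated with a minimal Python sketch of one compressor/decompressor pair. It assumes the RFC 4271 header layout (16-octet all-ones marker, 2-octet Length, 1-octet Type), raw DEFLATE via Python's `zlib`, and a `Z_SYNC_FLUSH` after each block, so that the receiver can decode the block immediately while the sliding-window history is kept for later blocks. The flush mode and all class and function names are illustrative assumptions, not requirements of this document.

```python
import struct
import zlib

MARKER = b"\xff" * 16   # RFC 4271 marker: 16 octets, all ones
HEADER_LEN = 19         # marker (16) + Length (2) + Type (1)


def strip_marker(msg: bytes) -> bytes:
    """Drop the marker; Length keeps its original value (it still counts the marker)."""
    if len(msg) < HEADER_LEN or msg[:16] != MARKER:
        raise ValueError("not a BGP message")
    if struct.unpack("!H", msg[16:18])[0] != len(msg):
        raise ValueError("Length field does not match message size")
    return msg[16:]


def restore_markers(block: bytes) -> list:
    """Re-frame a decompressed block into complete BGP messages."""
    msgs, off = [], 0
    while off < len(block):
        if len(block) - off < 3:
            raise ValueError("truncated header")
        length = struct.unpack("!H", block[off:off + 2])[0]
        end = off + length - 16  # stripped message is 16 octets shorter
        if length < HEADER_LEN or end > len(block):
            raise ValueError("truncated or malformed message")
        msgs.append(MARKER + block[off:end])
        off = end
    return msgs


class BlockCompressor:
    """One compressor; the DEFLATE history persists across blocks."""

    def __init__(self, wbits: int = -15):  # negative wbits: raw DEFLATE, 32 KiB window
        self._c = zlib.compressobj(9, zlib.DEFLATED, wbits)

    def compress_block(self, msgs: list) -> bytes:
        # Only whole messages are placed into a block.
        data = b"".join(strip_marker(m) for m in msgs)
        # Z_SYNC_FLUSH emits all pending output without ending the stream.
        return self._c.compress(data) + self._c.flush(zlib.Z_SYNC_FLUSH)


class BlockDecompressor:
    def __init__(self, wbits: int = -15):
        self._d = zlib.decompressobj(wbits)

    def decompress_block(self, block: bytes) -> list:
        # A sync-flushed block decodes completely without further input.
        return restore_markers(self._d.decompress(block))
```

Because every original message keeps its own Length field, the receiver can re-frame the decompressed data without any additional delimiters.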
To achieve optimal compression ratios, it is desirable to
provide the compressor with enough data that the
resulting compressed update is as close to the maximum
BGP update size as possible. Unfortunately, Huffman coding
with an adaptive dictionary compresses at a constantly varying ratio,
which can lead to an overflow unless it is used very conservatively.
A special provision, to be used optionally at the sender's
discretion, allows for such overruns and simplifies
the handling of overflow events.
If the compressed block size exceeds the maximum BGP update size,
the compressing peer MUST set the O-bit in the generated compressed
update and MUST follow it with exactly one compressed update that
carries the remainder of the block and has both the O-bit and the
R-bit (compressor restart) cleared.
No other BGP update messages are allowed in the TCP
stream between the compressed update of a given compressor
and its overflow fragment.
In case of any deviation, the decompression error procedures described below
MUST be followed.
The receiving peer MUST concatenate the first compressed
update and the following overflow update into a single
compressed block and apply decompression to it.
The first update MAY be smaller than the maximum BGP
update size.
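The overflow provision can be sketched in Python as follows. The sketch assumes that a compressed update consists of the 19-octet BGP header, the 2-octet field of the message format, and the compressed data, which bounds each fragment's payload at 4096 - 21 octets. The helper names, and the check that the overflow fragment carries the same compressor number, are illustrative assumptions.

```python
MAX_MSG = 4096                   # RFC 4271 maximum BGP message size
OVERHEAD = 19 + 2                # BGP header + 2-octet compressed update field (assumed)
MAX_PAYLOAD = MAX_MSG - OVERHEAD


def split_block(block: bytes) -> list:
    """Return [(o_bit, payload)]: a single message, or a first part plus one overflow."""
    if len(block) <= MAX_PAYLOAD:
        return [(False, block)]
    first, rest = block[:MAX_PAYLOAD], block[MAX_PAYLOAD:]
    if len(rest) > MAX_PAYLOAD:
        # Only one overflow fragment is allowed; the sender must compress less data.
        raise ValueError("block needs more than one overflow fragment")
    return [(True, first), (False, rest)]


class Reassembler:
    """Receiver side: joins an update with the O-bit set and its overflow fragment."""

    def __init__(self):
        self._pending = None  # (compressor number, first payload)

    @property
    def waiting(self) -> bool:
        # While True, any other BGP update on the session is an error.
        return self._pending is not None

    def feed(self, cid: int, o_bit: bool, payload: bytes):
        """Return a complete compressed block, or None while an overflow is expected."""
        if self._pending is not None:
            first_cid, first = self._pending
            self._pending = None
            if o_bit or cid != first_cid:
                raise ValueError("Decompression Error: bad overflow fragment")
            return first + payload
        if o_bit:
            self._pending = (cid, payload)
            return None
        return payload
```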
In certain scenarios it is beneficial for the
compressing peer to be able to restart any of the
compressors at any point in the ongoing BGP session.
To indicate such an occurrence, each compressed
update can carry a flag (the R-bit) signaling to the decompressing
peer that it MUST restart the given decompressor before
attempting to handle the update.
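On the receiving side, a restart can be handled by replacing the affected decompressor with a fresh instance, which discards its accumulated history. A minimal Python sketch, assuming raw DEFLATE via `zlib`; `DecompressorBank` and `new_compressor` are hypothetical names, and treating the first update seen for a compressor as an implicit restart is an assumption based on the R-bit description of the message format.

```python
import zlib

NUM_COMPRESSORS = 8  # the compressor number is a 3-bit field


class DecompressorBank:
    def __init__(self, wbits: int = -15):
        self._wbits = wbits
        self._slots = [None] * NUM_COMPRESSORS

    def handle(self, cid: int, r_bit: bool, block: bytes) -> bytes:
        if not 0 <= cid < NUM_COMPRESSORS:
            raise ValueError("invalid compressor number")
        if r_bit or self._slots[cid] is None:
            # Restart: drop all history and start from an empty dictionary.
            self._slots[cid] = zlib.decompressobj(self._wbits)
        return self._slots[cid].decompress(block)


def new_compressor(wbits: int = -15):
    # Sender side of a restart: a fresh compressor, whose next update has the R-bit set.
    return zlib.compressobj(9, zlib.DEFLATED, wbits)
```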
If decompression fails for any reason,
the failure MUST cause an
immediate Cease NOTIFICATION with the newly introduced subcode
"Decompression Error" (as registered in the IANA "BGP Cease
NOTIFICATION message subcodes" registry).
The peer that experienced the failure MAY re-establish the connection, but
it SHOULD NOT advertise the decompressor capability until an administrative
reset of the session or a reconfiguration of the peer. This
achieves self-stabilization of the feature in case of
implementation problems.
The compressing peer MAY send such a Cease NOTIFICATION as well and close the session.
Upon receiving such a notification, the decompressing peer may, at its
discretion, omit the decompression capability in the next OPEN.
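The error path reduces to building a standard Cease NOTIFICATION and remembering not to offer the capability again. The Python sketch below uses the RFC 4271 NOTIFICATION layout (message Type 3, Error Code 6 for Cease) together with the suggested subcode 10, which is still subject to IANA assignment; the `PeerPolicy` class is hypothetical.

```python
import struct

MARKER = b"\xff" * 16
MSG_NOTIFICATION = 3          # RFC 4271 message type
ERR_CEASE = 6                 # RFC 4271 error code
SUB_DECOMPRESSION_ERROR = 10  # suggested value, pending IANA assignment


def decompression_error_notification() -> bytes:
    body = struct.pack("!BBB", MSG_NOTIFICATION, ERR_CEASE, SUB_DECOMPRESSION_ERROR)
    length = len(MARKER) + 2 + len(body)  # 21 octets, no Data field
    return MARKER + struct.pack("!H", length) + body


class PeerPolicy:
    """Tracks whether the decompressor capability goes into the next OPEN."""

    def __init__(self):
        self.advertise_decompressor = True

    def on_decompression_failure(self):
        # SHOULD NOT advertise again until an admin reset or reconfiguration.
        self.advertise_decompressor = False

    def on_admin_reset_or_reconfig(self):
        self.advertise_decompressor = True
```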
Network sniffing tools today can
monitor an ongoing BGP session and try to reconstruct
the state of the peers from the parsed updates. Obviously,
with compression enabled, such a monitor cannot
follow the compressed updates unless the session is
monitored from the first compressed update on.
Several ways to deal with this problem exist,
the simplest being to restart the compressors periodically
so that the monitoring tool can 'sync up'. It goes without
saying that this is detrimental to the achieved compression
ratio.
Another possibility would have been to periodically send
the Huffman dictionary over the wire, but this complexity
has been left out so as not to overburden this specification.
Moreover, at the current time,
such a capability is not part of any standard
Huffman implementation that could easily be referred to.
The Decompressor Capability follows the normal BGP capability
advertisement procedures. In its generic form, the capability
can support different compressors in the future.
This document specifies only DEFLATE (Huffman) support, per
RFC 1951.
Capability Code: To be obtained by early allocation; the
suggested value in this process is 76.
Capability Length: 1 octet.
Capability Value:
CM: 4 bits indicating the DEFLATE compressed format value (CM = 8)
as specified in RFC 1950.
CINFO: 4 bits as specified in RFC 1950.
Invalid values MUST lead to the capability being ignored.
The compressing peer MUST use this value for the parametrization
of its algorithm.
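Encoding and validating the capability value might look as follows in Python. The sketch assumes that the octet is laid out like the RFC 1950 CMF byte (CM in the low-order 4 bits, CINFO in the high-order 4 bits), so that the window size is 2**(CINFO+8) octets; the function names are hypothetical. Python's `zlib` does not support raw DEFLATE with a 256-octet window, so the sketch rejects CINFO 0 when building a compressor rather than silently using a larger window than the peer advertised.

```python
import zlib

DEFLATE_CM = 8  # RFC 1950: CM = 8 denotes DEFLATE


def encode_capability(cinfo: int) -> bytes:
    if not 0 <= cinfo <= 7:
        raise ValueError("CINFO must be in 0..7 (window of at most 32 KiB)")
    return bytes([(cinfo << 4) | DEFLATE_CM])


def decode_capability(value: bytes):
    """Return log2 of the window size, or None if the capability is ignored."""
    if len(value) != 1:
        return None
    cm, cinfo = value[0] & 0x0F, value[0] >> 4
    if cm != DEFLATE_CM or cinfo > 7:
        return None  # invalid values: ignore the capability
    return cinfo + 8


def compressor_for(window_bits: int):
    # Never use a larger window than the decompressing peer advertised.
    if window_bits < 9:
        raise ValueError("zlib raw DEFLATE does not support a 256-octet window")
    return zlib.compressobj(9, zlib.DEFLATED, -window_bits)
```

For example, CINFO 7 encodes as the octet 0x78 and selects a 32 KiB window.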
The Compressed BGP Update message carries the original updates in a single message, with content
adhering to the message content rules above.
Message Type: To be obtained by early allocation; the
suggested value in this process is 6.
2 octets, containing the following fields:
Compressor Number: 3 bits. Indicates the number of
the compressor used. Up to 8 compressors MAY
be used by the compressing peer to allow for
multiple threads of execution compressing the
BGP update stream. Accordingly, the decompressing
side MUST support up to 8 independent decompressors.
R-bit (Restart): If the bit is set, the
corresponding decompressor MUST
be initialized before the following compressed data is
decompressed, per the compressor restart procedure above.
The bit MAY be set on the first compressed
update sent for the compressor on the session; otherwise, the
initialization is implied (sapienti sat).
The bit MUST NOT be set on the overflow fragment in case
of overflow.
O-bit (Overflow): If the bit is set, the compressor overflow procedure
above MUST be applied.
If both the R-bit and the O-bit are set, the decompressor must
be re-initialized before the update and its overflow fragment are assembled and
decompression is attempted.
ULI: Original uncompressed length indication, to be interpreted as
2**(11+ULI) octets. This MUST indicate a buffer large enough for the decompressed
data (including any overflow) to fit in. The receiver MAY ignore the indication,
but it should allow for efficient buffer allocation. The field MUST be
ignored on the overflow fragment.
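The 2-octet field and the ULI arithmetic can be illustrated with the Python sketch below. The bit positions are an assumption made for illustration only (compressor number in the three most significant bits, then the R-bit and the O-bit, then a 4-bit ULI, with the remaining 7 bits zero), since this text does not fix the ULI width or the exact layout. With 2**(11+ULI), ULI 0 indicates 2048 octets and ULI 3 indicates 16384 octets.

```python
import struct


def uli_for(decompressed_len: int) -> int:
    """Smallest ULI such that 2**(11 + ULI) octets hold the decompressed data."""
    uli = 0
    while (1 << (11 + uli)) < decompressed_len:
        uli += 1
    return uli


def pack_field(cid: int, r_bit: bool, o_bit: bool, uli: int) -> bytes:
    # Hypothetical layout: CCC R O UUUU 0000000 (most significant bit first).
    if not 0 <= cid <= 7 or not 0 <= uli <= 15:
        raise ValueError("field out of range")
    word = (cid << 13) | (int(r_bit) << 12) | (int(o_bit) << 11) | (uli << 7)
    return struct.pack("!H", word)


def unpack_field(data: bytes):
    (word,) = struct.unpack("!H", data[:2])
    cid = (word >> 13) & 0x7
    r_bit = bool((word >> 12) & 1)
    o_bit = bool((word >> 11) & 1)
    uli = (word >> 7) & 0xF
    return cid, r_bit, o_bit, uli
```

A receiver that honors ULI could preallocate `1 << (11 + uli)` octets per block, ignoring the field on an overflow fragment.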
This document introduces no new security concerns to BGP or other
specifications referenced in this document.

Thanks to John Scudder for some bar discussions that primed
the creative process. Thanks to Eric Rosen, Jeff Haas,
Acee Lindem, and Jeff Tantsura for their careful reviews.

Extended Message support for BGP
Extension to BGP's Route Refresh Message
Worldwide Quantum Web May Be Possible with Help from Graphs, New Journal of Physics