Network Working Group V. Krasnov Internet-Draft Cloudflare Inc. Intended status: Informational March 13, 2017 Expires: September 14, 2017 Compression Dictionaries for HTTP/2 draft-vkrasnov-h2-compression-dictionaries-02 Abstract This document specifies a new HTTP/2 frame type and new HTTP/2 settings values that would enable the use of previously transferred data as compression dictionaries, significantly improving overall compression ratio for a given connection. In addition, this document proposes to define a set of industry standard, static, dictionaries to be used with any Lempel-Ziv based compression for the common textual MIME types prevalent on the web. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 14, 2017. Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must Krasnov Expires September 14, 2017 [Page 1] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Conventions and Terminology . . . . . . . . . . . . . . . 3 2. HTTP/2 Extension . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Extension Settings . . . . . . . . . . . . . . . . . . . 3 2.2. Extension Frames . . . . . . . . . . . . . . . . . . . . 4 2.2.1. The SET_COMPRESSION_CONTEXT . . . . . . . . . . . . . 4 2.2.2. The SET_DICTIONARY Frame . . . . . . . . . . . . . . 4 2.2.3. The USE_DICTIONARY Frame . . . . . . . . . . . . . . 5 2.3. Static Dictionaries . . . . . . . . . . . . . . . . . . . 5 3. Dictionary State . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Server Behavior . . . . . . . . . . . . . . . . . . . . . 6 3.2. Client Behavior . . . . . . . . . . . . . . . . . . . . . 7 4. Security Considerations . . . . . . . . . . . . . . . . . . . 7 5. HTTP/1.1 Mappings . . . . . . . . . . . . . . . . . . . . . . 8 5.1. The mapping . . . . . . . . . . . . . . . . . . . . . . . 8 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 6.1. Normative References . . . . . . . . . . . . . . . . . . 9 6.2. Informative References . . . . . . . . . . . . . . . . . 10 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 10 1. Introduction The HTTP/2 [RFC7540] protocol encourages the use of many small assets for CSS/JS/HTML, due to its multiplexed nature. Prior to HTTP/2, asset inlining was encouraged, resulting in fewer, larger assets per website. The HTTP/2 protocol also allows for transmitted data to be compressed with a lossless compression format. The format used is specified in the "Content-Encoding" (see [RFC2616], section 14.11) header field. For example, "Content-Encoding: br" means the data was compressed using the Brotli format. The nature of the compression algorithms, such as DEFLATE [RFC1951] and Brotli [RFC7932], used with HTTP in practice, require a certain "window" of data to perform backward matching. Therefore, larger files have much better compression ratio. To improve compression for smaller files, these algorithms allow to use a chunk of arbitrary data as a "Custom Dictionary" and function as the initial sliding window. Krasnov Expires September 14, 2017 [Page 2] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 While compression, especially of dynamic resources, is a compute- heavy operation, where investing more compute power results in diminishing returns (in terms of compression ratio). This technique is known to improve compression ratio significantly, while not requiring significant additional processing time, and is supported by most LZ based compression formats. This document introduces a mechanism for using previously transmitted data over HTTP/2 as a dictionary to be used with an underlying compression algorithm. 1.1. Conventions and Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. HTTP/2 Extension 2.1. Extension Settings The extension introduces two new SETTINGS values. SETTINGS_COMPRESSION_SETTINGS(0xTBA): For greater compression, and to prevent setting identifier depletion, the 32-bit value for this setting is defined as follows: +---------------+---------+-----------+-----------+ | SDVersion (8) | Fmt (8) | DSize (8) | NDict (8) | +---------------+---------+-----------+-----------+ NDict: Indicates the number of dictionaries the client is willing to maintain. The default value is 0, the maximal value is 255. DSize: Log2 of the maximal size of each dictionary. The default value is 0, the maximal value is 255. For example value of 17 indicates each dictionary MUST be smaller or equal to 2^17 (131,072 octets). Fmt: Compression format to use. 1 indicates brotli, 2 indicates zlib. The default value is 0, for which cross-stream compression is disabled. Values 3-255 are reserved for additional formats. SDVersion: If greater than 0, indicates the version of static dictionaries to use. Maximal value is 255, the default value is 0, which indicates no static dictionaries are used. Krasnov Expires September 14, 2017 [Page 3] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 2.2. Extension Frames 2.2.1. The SET_COMPRESSION_CONTEXT The SET_COMPRESSION_CONTEXT frame (type=0xTBA). +-------------+ | Context (8) | +-------------+ The SET_COMPRESSION_CONTEXT frame can be sent by the client on any stream in the idle state (prior to the first HEADER frames on the stream). It indicates that the stream MAY be compressed by dictionaries with the given Context ID. The SET_COMPRESSION_CONTEXT frame contains the following fields: Context: an 8-bit Context ID that indicates the compression context for the stream. A stream can only be compressed by dictionaries with the same Context. If a stream SETs a dictionary, it will be assigned the Context indicated by this frame. If the frame is ommited, then the context value is assumed to be 0. The allowed Context values are 0 through 253. A special Context ID of 255 indicates the stream SHALL NOT be compressed at all and passed as is. A special Context ID of 254 indicates the stream SHALL NOT be compressed with any dictionary, and it MUST NOT be used as dictionary for other streams. 2.2.2. The SET_DICTIONARY Frame The SET_DICTIONARY frame (type=0xTBA). +-------------+-------------+ | Dict ID (8) | Size (8) | +-------------+-------------+ The SET_DICTIONARY frame can be sent from the server to the client, on any client initiated stream in the open or half-closed (remote) states. The SET_DICTIONARY frame MUST precede any DATA frames on that stream. The SET_DICTIONARY frame SHOULD be followed by sufficient DATA frames to fill Size octets after decompression, even if an RST frame was received for that stream. If not enough DATA was sent, the Dictionary for the given ID is considered uninitialized. More than one SET_DICTIONARY frames MAY be sent on a given stream. The SET_DICTIONARY frame contains the following fields: Krasnov Expires September 14, 2017 [Page 4] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 Dict ID: an 8-bit ID, indicates the dictionary. MUST be lower than the value agreed by the SETTINGS_COMPRESSION_SETTINGS setting. Size: Indicates how many octets (as a power of 2) of the given stream will be used. If Size is greater than the length of the transmitted data, then all of the data will be used. The SET_DICTIONARY frame defines the following flag: APPEND (0x1): Indicates that the data is to be appended to the existing dictionary with the given ID, as opposed to replacing it with the new data. 2.2.3. The USE_DICTIONARY Frame The USE_DICTIONARY frame (type=0xTBA). +-------------+ | Dict ID (8) | +-------------+ The USE_DICTIONARY frame indicates that the current stream is to be decompressed with the indicated dictionary. The USE_DICTIONARY frame MUST be sent before any DATA frame on a given stream. SET_DICTIONARY and USE_DICTIONARY frames MAY be sent on the same stream. Only one USE_DICTIONARY frame MAY be sent for a stream. The USE_DICTIONARY frame contains the following fields: Dict ID: an 8-bit ID that indicates which dictionary to use. The dictionary MUST be previously defined by a SET_DICTIONARY frame, or by a static dictionary. 2.3. Static Dictionaries This document proposes to generate a set of up to 8 standard dictionaries to be optionally bundled with supporting implementations. Each dictionary should be 32,768 or 65,536 octets long. Each static dictionary will be identified by an integer ID in the range {0..7}. If either endpoint supports the use of static dictionaries, it will indicate this by setting the SDVersion value of SETTINGS_COMPRESSION_SETTINGS to greater than 0. The number will indicate the highest version of the dictionaries known. Krasnov Expires September 14, 2017 [Page 5] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 The actual version used will be the lowest of the two values set by the endpoints. If the client and the server agreed on the use of static dictionaries, then both will initialize the first 8 dictionaries (IDs 0 through 7), with the contents of the static dictionaries. The Context ID for those dictionaries will be 0. If the value of the field NDict is lower than 8, then up to NDict dictionaries will be initialized. 3. Dictionary State Both the server and the client MUST process the SET_DICTIONARY and USE_DICTIONARY frames in the order they are sent/received, with the exception when both are sent over the same stream. In that case USE_DICTIONARY is processed prior to the SET_DICTIONARY frames. Doing otherwise will result in an illegal state of the dictionaries. This is similar to the way HEADER frames are processed in order to maintain legal HPACK state on the server and the client. Initially the dictionaries are uninitialized, unless static dictionary use is agreed upon by both endpoints. In that case the dictionaries are initialized as described in Section 2.3. When SET_DICTIONARY is used with the APPEND flag cleared, both the server and the client SHALL use the first Size octets of the decompressed stream as the dictionary for the given ID. The Context ID for the dictionary is inherited from the stream. When SET_DICTIONARY is used with the APPEND flag set, both the server and the client SHALL take the existing dictionary with the given ID, append the first Size octets of the decompressed stream to it from the right, and use the last 2^DSize octets as the dictionary for the given ID. If the Context ID of the dictionary before the appendage was different from the stream ID this is considered as an error of type COMPRESSION_ERROR. 3.1. Server Behavior The server MAY send a SET_DICTIONARY frame on any client initiated stream in the open or half-closed (remote) states, prior to sending any DATA on that stream. After the server sends a SET_DICTIONARY stream with a given ID, it MUST not use the dictionary for compression, until it sent sufficient Krasnov Expires September 14, 2017 [Page 6] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 data for the dictionary to become usable by the client. Sufficient data is computed as the size of the data after decompression. After sufficient data was sent, the server MAY use the associated dictionary on any subsequent stream by sending the USE_DICTIONARY frame. The server MUST only use dictionaries on streams with the same Context ID. 3.2. Client Behavior If the client wants to use cross-stream compression it must take into consideration that the data transferred might not be compressible. This can happen when an application level compression is used, encrypted data is transferred or for specific binary file formats, such as video, images etc. To maximize the benefits of cross-stream compression, the client SHOULD disable application level compression. In addition when binary data is expected on the stream, the clinet SHOULD hint to the server by sending a SET_COMPRESSION_CONTEXT with the special value of 255. When receiving a USE_DICTIONARY frame, the client MUST use the specified dictionary to decompress the DATA. A given stream MAY receive a SET_DICTIONARY and USE_DICTIONARY with the same ID. In that case the stream is decompressed with the existing dictionary and the dictionary is updated using the decompressed data. If a USE_DICTIONARY frame arrives for an uninitialized dictionary, this is considered as stream error of type COMPRESSION_ERROR. If a USE_DICTIONARY frame arrives for a stream with Context ID different from the dictionary Context ID, this is considered as a stream error of type COMPRESSION_ERROR. 4. Security Considerations As with any compression scheme, using cross-stream compression is potentially more sensitive to [BREACH] type of attacks. Therefore, this extension SHALL be disabled by default by all server implementations. Krasnov Expires September 14, 2017 [Page 7] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 If a server acts as an intermediary, then it MUST NOT enable cross- stream compression, unless the origin also enables cross-stream compression. If the origin does not use the HTTP/2 protocol, the intermediary server MUST NOT use cross-stream compression, unless notified by the origin explicitly by either a receipt of a pre-agreed upon HTTP header, or out-of-band. If the origin does use the HTTP/2 protocol, then the intemediary server MUST preserve the context ids from the client to the origin. A client MAY indicate to the server a request SHALL NOT be compressed using a dictionary or used as a dictionary by sending a SET_COMPRESSION_CONTEXT with the special value of 254. The client MAY also indicate to the server other Context values for any stream, when it is desirable to prevent cross-stream side channel leaks, such as when connection coalescing is used. The server SHOULD avoid the use of cross-stream compression on cross- site requests. 5. HTTP/1.1 Mappings Cross-stream compression by definition is very efficient for HTTP/2, because it allows for a very fine-grained dictionary defention and consumption on the protocol level, and also because during the life span of an HTTP/2 connection it will serve more assets on average than an HTTP/1.1 connection. However there are cases when a behavior similar to cross-stream compression is desirable for HTTP/1.1 as well. This is especially true for long lived connections, for example from a CDN server to origin server. Therefore the following mapping to the HTTP/1.1 protocol is suggested. 5.1. The mapping The client SHALL indicate support for "cross-asset" compression by sending the CSCS header with the first request for the connection: CSCS = "CSCS" ":" 1("nd" "=" value ";" "ds" "=" value ";" "fmt" "=" value ";" "sd" "=" value) Krasnov Expires September 14, 2017 [Page 8] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 Where "nd", "ds", "fmt" and "sd" map to the NDict, DSize, Fmt, and SDVersion settings. It MAY then indicate the desired Context ID for the request with the CSCC header: CSCC = "CSCC" ":" 1(value) Where the values are the same as defined for the SET_COMPRESSION_CONTEXT frame. The server MAY then compress the response body, similarly indicating the USE and SET functions using the CSCD header: CSCD = "CSCD" ":" ["u" "=" value ";"] 0#("s" "=" value [ ";" "a" "=" value] ) Where "u" indicates the dictionary id to use if any, maps to the USE_DICTIONARY frame. "s" indicates one or more dictionary to set, maps to the SET_DICTIONARY frame. The optional "a" value indicates if the dictionary is set in the "append" mode, maps to the APPEND flag of SET_DICTIONARY frame. Such mapping only exists for a single persistent HTTP/1.1 connection, and not between multiple connections. It therfore requires only connection-level state keeping. 6. References 6.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, DOI 10.17487/RFC2616, June 1999, . [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015, . Krasnov Expires September 14, 2017 [Page 9] Internet-Draft Compression Dictionaries for HTTP/2 March 2017 6.2. Informative References [BREACH] Prado, A., Harris, N., and Y. Gluck, "BREACH: SSL, Gone in 30 Seconds", 2013, . [RFC1951] Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996, . [RFC7932] Alakuijala, J. and Z. Szabadka, "Brotli Compressed Data Format", RFC 7932, DOI 10.17487/RFC7932, July 2016, . Author's Address Vlad Krasnov Cloudflare Inc. Email: vlad@cloudflare.com Krasnov Expires September 14, 2017 [Page 10]