Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 5234 (if approved)                                   P. Kyzivat
Intended Status: Standards Track
Expires: September 14, 2017                               March 13, 2017


                            Constrained ABNF
                    draft-seantek-constrained-abnf-02

Abstract

   This document extends the base definition of ABNF (Augmented
   Backus-Naur Form) to express a rule that is constrained by another
   rule. If a rule B is constrained by rule A, then every production
   generated by rule B must also be generated by rule A. By creating
   subordinate production forms, ABNF-using specifications can formally
   denote the relationship between a general rule and specific subsets
   of that rule, while preserving ABNF's context-free nature.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

1.  Introduction

   Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
   is popular among many Internet specifications. Because ABNF is a
   context-free grammar, all rules heretofore are expressible in the
   form:

      rule = elements

   where a rule can be applied regardless of the context of the
   elements. Many Internet documents employ this syntax.

   However, many specifications define protocols with extension points
   where certain rules identify a field that can take on different
   productions with different semantics. For example, a field might
   generally permit any , but when the field has "IPv6:" *VCHAR, it has
   a peculiar meaning. The traditional ABNF approach is to enumerate a
   long list of production rules that comply with the general pattern,
   in the alternative, and then to tack on the generic pattern named as
   the . This syntax works okay for a base specification but makes it
   difficult to extend a rule in subsequent specifications in a way that
   formally names the conformance to the (as opposed to extending the
   rule with novel syntax). Furthermore, since ABNF does not imply an
   order of operations, a production that matches a specific rule will
   also match the generic catch-all rule. The traditional approach
   constructs an ambiguous grammar, even though the standards authors do
   not intend the grammar to be ambiguous.

   These limitations hamper computational ABNF parsers as well as ABNF
   efforts for services such as syntax highlighting, automatic grammar
   checking, and compiling into target computer languages.

2.  Constrained Grammar

   This document provides a syntax for an ABNF rule that is constrained
   by another rule. We observe the following relation:

      If rule B is constrained by rule A, then every production
      generated by rule B must also be generated by rule A.
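
   The relation above can be spot-checked on sample strings. The
   following Python sketch is illustrative only: regular expressions
   stand in for ABNF rules, GENERAL is an assumed general rule
   (1*VCHAR) that is not taken from this document, and SPECIFIC models
   the "IPv6:" *VCHAR form mentioned in Section 1. The function name
   counterexamples is hypothetical. Testing samples can reveal a
   violation of the relation but cannot prove that it holds.

```python
import re

# Assumed stand-in for a general rule A: 1*VCHAR (hypothetical choice).
GENERAL = re.compile(r"[\x21-\x7e]+")
# Stand-in for a specific rule B: "IPv6:" *VCHAR. ABNF quoted strings are
# case-insensitive, hence IGNORECASE.
SPECIFIC = re.compile(r"IPv6:[\x21-\x7e]*", re.IGNORECASE)


def counterexamples(constrained, constraint, samples):
    """Return the samples generated by `constrained` (rule B) that are
    not generated by `constraint` (rule A).

    An empty result means no violation was found among the samples.
    """
    return [
        s for s in samples
        if constrained.fullmatch(s) and not constraint.fullmatch(s)
    ]
```

   A non-empty result identifies productions of B that A does not
   generate, which means the two regular expressions do not satisfy the
   relation.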

   A few comments are in order with this proposal.

   First of all, ABNF is a context-free grammar; this proposal attempts
   to preserve this nature. There are other ways to express
   "constraints" that are context-sensitive, but extending ABNF in such
   ways would make it a context-sensitive grammar (which implies more
   capable automata for parsing and generating such languages).

   Second of all, the relation "A constrains B" can, in other contexts,
   be understood as a "subclassing", "subtyping", or "conjunctive"
   operation. As with conjunction, the underlying operation is
   commutative: a rule "B-constrained-by-A" produces the same results as
   "A-constrained-by-B". However, from a rule-naming perspective, the
   first name (the constrained) is the name; the second name (the
   constraint) is the ternary operand. Therefore, the syntax proposed in
   Section 3 is not commutative.

   Third of all, it is possible to express a constraint relationship
   that cannot generate any production--not even the empty string. This
   is consonant with [RFC5234] ABNF. With generic [RFC5234] ABNF, it is
   possible to repeat rule definitions in a way that makes them
   impossible to be satisfied, such as followed by . This syntax behaves
   exactly the same way, except that it permits creating new rule names.

3.  Constraint Syntax

   To write a constrained rule, use the ternary syntax
   rulename ^ constraint = elements. The constraint can be a list of
   elements, but in most formulations will simply be another rulename.
   The reflexive case where constraint is rulename has no effect and is
   analogous to duplicating a rule in a list of rules as in [RFC5234]
   ABNF.
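
   As an illustration of the ternary form, the following Python sketch
   splits a single-line constrained rule into its three operands. It is
   a simplification under stated assumptions, not a full ABNF parser:
   the constraint is assumed to be a single rulename, comments and
   continuation lines are not handled, and the names ConstrainedRule and
   split_constrained_rule are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ConstrainedRule(NamedTuple):
    rulename: str
    constraint: str
    elements: str


# RFC 5234 rulename: ALPHA *(ALPHA / DIGIT / "-")
_RULENAME = r"[A-Za-z][A-Za-z0-9-]*"

# Assumption: the constraint is one rulename. "=/" is deliberately not
# accepted after the constraint in this sketch.
_CONSTRAINED = re.compile(
    rf"^({_RULENAME})\s*\^\s*({_RULENAME})\s*=(?!/)\s*(.*?)\s*$"
)


def split_constrained_rule(line: str) -> Optional[ConstrainedRule]:
    """Return (rulename, constraint, elements) for a constrained rule,
    or None when the line is not in the 'name ^ constraint = elements'
    form."""
    match = _CONSTRAINED.match(line)
    return ConstrainedRule(*match.groups()) if match else None
```

   A regular rule such as field = field-name ":" unstructured CRLF has
   no "^" before its "=", so the function returns None for it.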

   The following enhancement to [RFC5234] permits this referenced-rule
   syntax as an incremental element:

      rulelist        =/ constrainedrule

      constrainedrule =  rulename constrained-by constraint
                         *c-wsp "=" *c-wsp elements c-nl

      constraint      =  elements

      constrained-by  =  *c-wsp "^" *c-wsp

   In the constrained-rule production (constrainedrule), the "^" and "="
   together act as a ternary operator that takes the name of the rule,
   the constraint, and the elements.

   A processor that wishes to reduce constrainedrule to an [RFC5234]
   rule can do so by conjoining the constraint with the elements. The
   synthesized rule is then appended in situ to the constraining rule in
   the alternative (with "/"), in elements where the constraining rule
   is named. A basic [RFC5234] processor will match or generate
   productions to both the constraining and constrained rules.

   [[NB: This synthesis demonstrates that constraint syntax amounts to
   syntactic sugar, rather than a fundamental change to ABNF.]]

   A rule is defined either regularly ("=") or as constrained
   ("^" "="). The incremental alternative cannot be used with the
   constraint syntax. [[NB: This is a change from -00 to -01.]]
   However, a rule first defined as a constrained rule can be further
   refined with incremental alternatives.

   While this syntax suggests that a binary "^" operator could be
   defined, such an operator is undesirable for a couple of reasons.

   Firstly, arbitrary syntax serves no parsing or generating purpose: it
   would be clearer for an author to combine the two prior to writing a
   specification. A hypothetical is better written in the first place.
   Similarly, is better written in the first place.

   [[NB: From this perspective, idealized ABNF can be understood as
   being in Disjunctive Normal Form, given the existence of "=/".]]

   Secondly, conjunction inverts the dependency relationship between
   symbols in ABNF, reversing the directionality of edges in the
   reachability graph. In [RFC5234] ABNF, every nonterminal symbol (rule
   name) declares its dependent symbols, which are known at the time of
   parsing to be terminal or nonterminal. Having parsed a rule, an ABNF
   parser knows exactly which nonterminal symbols to look for before
   returning. Therefore, the parsing of subsequent rule definitions that
   do not have sought-after names can be skipped. The existence of a
   general-purpose conjunctive operator, however, implies that any rule
   definition could name a rule that constrains the subject rule,
   meaning that an ABNF parser would have to parse every rule to
   completion before returning. By moving the constraint to the
   left-hand side of the "=", a parser need not parse the right-hand
   side of every rule. (Notably, a parser will have to parse all of the
   elements of the constraint, which is a reason why authors should
   exercise REstraint by limiting the constraint to a single named
   rule.)

   Stylistically, authors are encouraged to put constraint syntax below
   the rule that defines the constraint. If the constraining rule refers
   to novel rules, those rules may be defined prior to the constrained
   rules.
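
   The reduction of constrained rules to plain [RFC5234] rules described
   earlier in this section can be sketched in Python as follows. This is
   one reading of that description, not a definitive algorithm. It
   assumes that each constrained rule's elements already conform to the
   constraint, so the synthesized rule is simply the elements; that each
   constraint is a single rulename; and that rules arrive already split
   into names and element strings. The function name reduce_to_rfc5234
   and its data shapes are hypothetical.

```python
import re
from collections import defaultdict


def reduce_to_rfc5234(rules, constrained):
    """rules: dict of rulename -> elements for regular rules.
    constrained: list of (rulename, constraint, elements) tuples.
    Returns a dict of rulename -> elements using only RFC 5234 syntax.
    """
    reduced = dict(rules)
    by_constraint = defaultdict(list)
    for name, constraint, elements in constrained:
        # Assumption: elements already satisfy the constraint, so the
        # synthesized rule is just the elements.
        reduced[name] = elements
        by_constraint[constraint].append(name)

    for constraint, names in by_constraint.items():
        # Match the constraint only as a whole rulename, not as part of
        # a longer name such as "resent-field" or "field-name".
        ref = re.compile(
            rf"(?<![A-Za-z0-9-]){re.escape(constraint)}(?![A-Za-z0-9-])"
        )
        widened = "(" + " / ".join([constraint, *names]) + ")"
        for target in list(reduced):
            if target in names:
                continue
            # Odd-indexed parts are quoted strings; leave them untouched.
            parts = re.split(r'("[^"]*")', reduced[target])
            parts[0::2] = [ref.sub(widened, p) for p in parts[0::2]]
            reduced[target] = "".join(parts)
    return reduced
```

   Each reference to a constraining rule becomes a group that also
   offers the constrained rules in the alternative, so a basic
   [RFC5234] processor can match either.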

   For example, the relevant parts of [RFC5322] could be written:

      ; trace is not relevant to this example
      fields          =   *(trace *field / *resent-field) *field

      resent-field    =   "Resent-" field-name ":" unstructured CRLF

      field           =   field-name ":" unstructured CRLF

      field-name      =   1*ftext

      ftext           =   %d33-57 / %d59-126

      ; obs-unstruct removed
      unstructured    =   (*([FWS] VCHAR) *WSP)

      resent-date ^ resent-field   = "Resent-Date:" date-time CRLF
      resent-from ^ resent-field   = "Resent-From:" mailbox-list CRLF
      resent-sender ^ resent-field = "Resent-Sender:" mailbox CRLF
      resent-to ^ resent-field     = "Resent-To:" address-list CRLF
      resent-cc ^ resent-field     = "Resent-Cc:" address-list CRLF
      resent-bcc ^ resent-field    = "Resent-Bcc:"
                                     [address-list / CFWS] CRLF
      resent-msg-id ^ resent-field = "Resent-Message-ID:" msg-id CRLF

      orig-date ^ field   = "Date:" date-time CRLF
      from ^ field        = "From:" mailbox-list CRLF
      sender ^ field      = "Sender:" mailbox CRLF
      reply-to ^ field    = "Reply-To:" address-list CRLF
      to ^ field          = "To:" address-list CRLF
      cc ^ field          = "Cc:" address-list CRLF
      bcc ^ field         = "Bcc:" [address-list / CFWS] CRLF
      message-id ^ field  = "Message-ID:" msg-id CRLF
      in-reply-to ^ field = "In-Reply-To:" 1*msg-id CRLF
      references ^ field  = "References:" 1*msg-id CRLF
      subject ^ field     = "Subject:" unstructured CRLF
      comments ^ field    = "Comments:" unstructured CRLF
      keywords ^ field    = "Keywords:" phrase *("," phrase) CRLF

4.  Effects on RFC 5234

   Formally, this document updates [RFC5234] but does not modify it in
   situ. Authors need to reference this document if they want to include
   these enhancements; bare references to [RFC5234] do not include this
   specification. This directive follows a model whereby document
   authors can choose whether to invoke particular enhancements to ABNF.
   As time goes on, the IETF can determine how often these enhancements
   are invoked, and can decide whether to include them as part of a
   revision to the base [RFC5234].

5.  IANA Considerations

   This document implies no IANA considerations.

6.  Security Considerations

   Security is truly believed to be irrelevant to this document.

7.  References

7.1.  Normative References

   [RFC5234]  Crocker, D., Ed., and P. Overell, "Augmented BNF for
              Syntax Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

7.2.  Informative References

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              DOI 10.17487/RFC5322, October 2008.

Authors' Addresses

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/


   Paul Kyzivat
   Massachusetts
   United States

   EMail: pkyzivat@alum.mit.edu