6lo Working Group                                          M. Richardson
Internet-Draft                                  Sandelman Software Works
Intended status: Informational                            April 16, 2017
Expires: October 18, 2017


             Constrained firmware update problem statement
               draft-richardson-fud-constrained-update-00

Abstract

   This document details the problems of upgrading small devices that
   need complete firmware replacements, but which do not have enough
   storage to keep an entire copy of the replacement image.  In addition
   to detailing the specific challenge, a conceptual architecture for a
   solution is posited involving use of DTLS session resumption tickets
   with CoAP Block Transfer mode.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 18, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of


Richardson              Expires October 18, 2017                [Page 1]

Internet-Draft               inplace update                   April 2017


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Fundamental Postulate . . . . . . . . . . . . . . . . . .   3
     1.2.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   3
     1.3.  Challenges  . . . . . . . . . . . . . . . . . . . . . . .   3
       1.3.1.  Entire Image Validation . . . . . . . . . . . . . . .   3
       1.3.2.  Incrementally Validation  . . . . . . . . . . . . . .   4
       1.3.3.  Incrementally Decrypt . . . . . . . . . . . . . . . .   4
       1.3.4.  Securely transferred  . . . . . . . . . . . . . . . .   4
       1.3.5.  Use of CoAP . . . . . . . . . . . . . . . . . . . . .   4
       1.3.6.  Storage in the Network  . . . . . . . . . . . . . . .   4
     1.4.  Compromises . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Straw-man proposal  . . . . . . . . . . . . . . . . . . . . .   5
     2.1.  Things to standardize . . . . . . . . . . . . . . . . . .   6
   3.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   6
   4.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   6
   5.  Normative References  . . . . . . . . . . . . . . . . . . . .   6
   Appendix A.  Change history . . . . . . . . . . . . . . . . . . .   6
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   6

1.  Introduction

   Class 2 constrained devices typically have a few hundred kilobytes of
   program store.

   A Freescale mc1322x device contains 128K of flash with 96K of ram
   (which is mirrored from flash).  A CC2538 ("OpenMote") comes with up
   to 512KB of flash, but options for 256KB and 128KB exist.

   A basic build of the Contiki OS containing the simplest of of CoAP
   (erbium) server consumes approximately 60Kbyte of store on the
   Freescale device.

   This number does not include any bootstrap or application level TLS-
   style security, nor L2-security code.  Nor is the application code at
   all sophisticated.  Fitting all of these things would likely push the
   96K of space significantly, although there is also 80K of ROM that
   may be leveraged in some situations.

   An OpenMote would have little difficulty if the with 512KB of flash,
   but if the application fitted into the 256K device, there would be
   significant pressure to cost optimize into a smaller device.  Even if
   that pressure was resisted, at some point the 250K image that could
   be easily doubled buffered might grow bigger due a need to debug some


Richardson              Expires October 18, 2017                [Page 2]

Internet-Draft               inplace update                   April 2017


   real life customer problem.  It might be necessary to turn some
   significant portion of the flash memory into storage for debug
   information..

1.1.  Fundamental Postulate

   It is therefore postulated that regardless of the ratio of device
   capabilities to image size when initially specified, that the image
   size will grow over time to do more things (or do the same things
   more correctly) over time, and at some point the device will be
   unable to double buffer image updates.  Two things can occur that
   that point:

   1.  the device no longer receives updates.

   2.  the device can only be updated via JTAG or other invasive
       proceedure.

   A device which is hard to reach, or a which requires an update
   process that depends upon decade old equipment: laptops running
   ancient versions of Windows with hard to find cabling is effectively
   not able to be updated.

   The result is devices which present significant security risks
   because of the difficulty in deploying even the simplest of fixes to
   them.

1.2.  Terminology

   Terminology from [RFC7228] is used extensively in this document.

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for compliant STuPiD
   implementations.

1.3.  Challenges

   The firmware update process must accomplish the following things.

1.3.1.  Entire Image Validation

   The process MUST be able to cryptographically validate the entire
   image after writing to flash, and before rebooting into the image.


Richardson              Expires October 18, 2017                [Page 3]

Internet-Draft               inplace update                   April 2017


1.3.2.  Incrementally Validation

   The process SHOULD be able to incrementally validate blocks as
   received from the server before writing to flash.  Efficient flash
   operation may require that blocks of 4K or 128K must be received
   before executing the erase/flash operation.

1.3.3.  Incrementally Decrypt

   The image MAY be object encrypted (in addition to transfer
   encryption) and must be decrypted prior to writing to flash.

1.3.4.  Securely transferred

   The image MUST be transfered privately and integrally from a content
   server.  A mechanism like (D)TLS or OSCOAP is appropriate.  Access
   control to the content may be by private URL, username/password, or
   preferably, via ACE token.

1.3.5.  Use of CoAP

   The image SHOULD be transfered using CoAP, possibly using CoAP Block
   Transfer.  The transfer SHOULD be pausable if bandwidth in a LLN is
   unavailable.  The transfer MUST not be a single large Block Transfer,
   but MUST instead be a series of "bite" sized chunks.

1.3.6.  Storage in the Network

   In a network with many identical nodes, it SHOULD be possible for one
   node (once upgraded) to offer it's own image to another node.  This
   has significant latency and bandwidth savings in an LLN.

   For this optional feature to be possible, it implies that the image
   is not transformed in any way that breaks the cryptographic
   signature.

1.4.  Compromises

   In an LLN, one node may provide routing (mesh-under or route over)
   services to other nodes.  It is acceptable that for the entire
   duration of the upgrade tha the node is not capable of forwarding
   packets.  In a well provisioned LLN, alternate routes will be
   available.  In order to reduce the downtime, the protocol SHOULD
   provide an ability to focus the available upgrade bandwidth on a
   single node such that it is upgraded as quickly as possible and
   returned to service.


Richardson              Expires October 18, 2017                [Page 4]

Internet-Draft               inplace update                   April 2017


   A network may wish to upgrade leaf nodes first, or may wish to
   upgrade core nodes first.  Each method has advantages, and the choice
   of which one to do first SHOULD be managed by the operator.

2.  Straw-man proposal

   This is a sketch of one way of doing the upgrade.  The use of DTLS
   and CoAP is assumed.  It is assumed that the device can update flash
   in 4K blocks, and has approximately 8K of ram, possibly twice that.
   The image size is approximately 128K.

   The upgrade server provides the images 4K chunks, accessible at a
   predicable base URL, such as coaps://example.com/device/1234/
   chunk/00000 through coaps://example.com/device/1234/chunk/1f000.  The
   URLs are encoded as hex 4K blocks.

   The process starts with the network controller contacting the node to
   be updated, and a way to get the access token from the update
   server's ACE AS.  The client initiates the DTLS connection, along
   with the appropriate token.

   The client then stores all the details of the DTLS session in a
   stable part of it's flash, possibly encrypting the information of a
   randomly generated and known only locally key.  The use of a TPM
   module or other TEEP mechanism would be appropriate for this data.

   The stored information should include: the server's address, the port
   numbers, the DTLS sequence number, the current blocks URL, etc.  An
   appropriate hash of the next block to receive would be recorded.  It
   would not be inappropriate to craft much of the IP/UDP/DTLS/CoAP
   headers to use.

   The client then restarts into a very small recovery code.  This code
   can decrypt the saved information and must be capable of: 1.
   initiating a CoAP block transfer for a 4K block.  2. receiving the
   results.  3. validating a hash of the block.  4. flashing the block
   to the right location.  5. updating the stored information to update
   to the next block.  This may be done using information appended to
   the transfer itself, or via additional CoAP headers.  (Such as a 2.01
   header) 6. "rebooting" to start again.

   The design above should guarantee that the transfer can continue
   where it left off in the event of power failure or other transfer
   failure that causes a reboot (a watchdog timer would be appropriate).


Richardson              Expires October 18, 2017                [Page 5]

Internet-Draft               inplace update                   April 2017


2.1.  Things to standardize

   The data block interface between the main operating system in the
   device and the recovery bootloader are rather specific to
   implementations, but it may be worth standardizing the abstract of
   what should be transfered.  The recover bootloader may be in ROM (and
   difficult to change) and main system in flash, and subject to upgrade
   over a period of decades.

   The format of the contents of the block that is transfered each time
   may need to include meta information (such as the location of the
   next block), as well as additional meta information to permit
   validation of the final image.  This may be need algorithm agility,
   which may be difficult to accomplish given that the recovery
   bootloader ROM itself may be unchangeable.  It may be necessary to do
   two-step validation.

3.  IANA Considerations

   This document details a problem and does not define any specific
   protocols, so no allocations are defined.

4.  Acknowledgements

   none yet.

5.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014,
              <http://www.rfc-editor.org/info/rfc7228>.

Appendix A.  Change history

   version 00 of document.

Author's Address

   Michael Richardson
   Sandelman Software Works

   Email: mcr+ietf@sandelman.ca


Richardson              Expires October 18, 2017                [Page 6]