Network Working Group                                           Y. Zheng
Internet-Draft                                              China Unicom
Intended status: Informational                                     S. Xu
Expires: September 14, 2017                                     D. Dhody
                                                     Huawei Technologies
                                                          March 13, 2017


           Usecases for Network Artificial Intelligence (NAI)
               draft-zheng-opsawg-network-ai-usecases-00

Abstract

   This document discusses the scope of Network Artificial Intelligence (NAI) and describes use cases that demonstrate the advantages of applying NAI.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Zheng, et al.          Expires September 14, 2017                [Page 1]

Internet-Draft              Usecases of NAI                   March 2017

Table of Contents

   1.  Introduction
   2.  NAI Architecture
   3.  NAI Use Cases
     3.1.  Traffic Prediction and Re-Optimization/Adjustment
     3.2.  Route Monitoring and Analytics
     3.3.  Multilayer Fault Detection In NFV Framework
     3.4.  Data Center Network Use Cases
       3.4.1.  Service Function Chaining
   4.  Contributors
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgement
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Current networks have become much more dynamic and complex, which poses new challenges for network management and optimization. In particular, network management and optimization should be automated to reduce human intervention (and thus minimize operational expense). Artificial Intelligence (AI) and Machine Learning (ML) are promising approaches to realizing such automation and, for some tasks, may even outperform human operators. Furthermore, the growing popularity of the Software-Defined Networking (SDN) paradigm makes applying AI in networks feasible, since the SDN controller has a complete view of the network status and can control the behavior of network nodes to implement AI decisions.

   AI and ML technologies can learn from historical data and make predictions or decisions, rather than strictly following static program instructions. They can dynamically adapt to changing situations and improve by learning from new data, which allows them to learn and complete complicated tasks.
   AI also has potential in the networking area, especially in combination with SDN and Network Function Virtualization (NFV).

   This document presents the concept of Network Artificial Intelligence (NAI). It first discusses the scope of NAI and then describes several use cases that demonstrate the advantages of applying NAI.

2.  NAI Architecture

   The definition of the NAI architecture can be found in [I-D.li-rtgwg-network-ai-arch]. In this architecture, the central controller is the core of NAI and can be regarded as the "Network Brain". A Network Telemetry and Analytics (NTA) engine can be introduced alongside the central controller. The NTA engine includes a data collector, an analytics framework, data persistence, and NAI applications.

                   ^                                ^
                (4)|                                |(4)
   +---------------|--------------+ +---------------|--------------+
   | Domain 1      |              | |               |    Domain 2  |
   |         +------------+       | |         +------------+       |
   |         |  Central   |       | |         |  Central   |       |
   |      (1)| Controller |-------------------| Controller |(1)    |
   |         |    with    |       | |         |    with    |       |
   |         |    NTA     |       | |         |    NTA     |       |
   |         +------------+       | |         +------------+       |
   |           /        \         | |           /        \         |
   |       (3)/          \        | |          /          \(3)     |
   |         /            \       | |         /            \       |
   |  +--------+      +--------+  | |  +--------+      +--------+  |
   |  |        |      |        |  | |  |        |      |        |  |
   |  |Network |......|Network |  | |  |Network |......|Network |  |
   |  | Device |  (2) | Device |  | |  | Device |  (2) | Device |  |
   |  |   1    |      |   N    |  | |  |   1    |      |   N    |  |
   |  +--------+      +--------+  | |  +--------+      +--------+  |
   |                              | |                              |
   +------------------------------+ +------------------------------+

       Figure 1: An Architecture of Network Artificial Intelligence (NAI)

3.  NAI Use Cases

3.1.  Traffic Prediction and Re-Optimization/Adjustment

   This subsection introduces Path Computation Element (PCE) [RFC4655] use cases in wide area networks (WANs).
   In the PCE scenario, network data is collected through control-plane protocols such as the PCE Communication Protocol (PCEP) and BGP-LS [RFC7752], and is passed to the PCE application. PCEP conveys the state of Label Switched Paths (LSPs) from the network, and BGP-LS conveys topology information. If network telemetry is used, traffic information can also be received directly from the network at the NTA engine, using protocols such as gRPC.

   The PCE application (APP) maintains only the latest information. To enable NAI, the history of all LSP and topology changes is stored in an external data repository. If network telemetry is used, traffic monitoring data can also be collected and stored.

   There are two use cases in this scenario:

   (1)  rerouting/re-optimization using historical trends and AI-based predictions;

   (2)  traffic congestion avoidance and AI-enabled auto-bandwidth adjustment.

   For use case (1), the analytics component of the NTA engine can use the stored data to build models that predict the impact of network events and the state of the LSPs. For example, it can use historical trends to guide path computation to include or exclude specific links. Correlating data, detecting anomalies, and visualizing data are also possible. The analytics component can also use the stored data to detect and predict network events and request the PCE to take the necessary actions; for example, it can use historical bandwidth-utilization trends to request re-optimization.

   For use case (2), with network telemetry, the NTA engine can collect per-link and per-LSP traffic statistics from the network using gRPC. Such telemetry data includes statistics for tunnels, links, bandwidth reservations, actual usage, delay, jitter, packet loss, etc. The NTA engine also collects data about network events and their impact on traffic flows.
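   As a rough illustration of use case (1), the following Python sketch extrapolates per-link bandwidth utilization from the stored history using an ordinary least-squares linear trend, and returns the links whose forecast exceeds a threshold. A PCE could treat such links as candidates for exclusion in path computation, or as a trigger for requesting re-optimization. The function names, the data layout (a link identifier mapped to equally spaced utilization samples, expressed as fractions of capacity), and the linear model are illustrative assumptions, not part of PCEP, BGP-LS, or any NTA specification; a real analytics component would likely use richer models.

```python
from typing import Dict, List, Set


def forecast_utilization(samples: List[float], steps_ahead: int) -> float:
    """Extrapolate a least-squares linear trend, fitted to equally spaced
    utilization samples, `steps_ahead` intervals past the last sample."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    if n == 1:
        return samples[0]  # a single point carries no trend information
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def links_to_exclude(history: Dict[str, List[float]], steps_ahead: int,
                     threshold: float) -> Set[str]:
    """Return the (hypothetical) link IDs whose forecast utilization,
    as a fraction of capacity, exceeds `threshold`."""
    return {link for link, samples in history.items()
            if forecast_utilization(samples, steps_ahead) > threshold}


if __name__ == "__main__":
    # Hypothetical history: one utilization sample per interval, oldest first.
    history = {
        "A-B": [0.50, 0.55, 0.60, 0.65],  # rising trend
        "B-C": [0.40, 0.40, 0.41, 0.40],  # roughly flat
    }
    print(links_to_exclude(history, steps_ahead=4, threshold=0.8))  # prints {'A-B'}
```

   With this sample history, the A-B link, whose utilization rises by 5% of capacity per interval, is forecast to reach about 85% four intervals ahead and is flagged, while the flat B-C link is not. Looking at the trend rather than only the latest sample is what lets the NTA engine act before the link actually becomes congested.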
   The analytics component can use telemetry data to build traffic models that predict congestion ahead of events such as New Year holidays or major sporting events. Based on the congestion prediction, the PCE application could reroute traffic to avoid congested links.

   Beyond congestion avoidance, the NTA engine can also perform prediction and make the necessary changes to the network. In particular, the PCE application performs bandwidth usage prediction (i.e., bandwidth calendaring) by looking at the historical trends of all sampled data instead of only the instantaneous samples. The collected data include the Traffic Engineering Database (TEDB) and the LSP Database (LSP-DB), and can also include scheduling information. In addition, the collected data include auto-bandwidth-related changes that occur under particular network events. Using machine learning algorithms, the analytics component can correlate such changes with the events and predict network events and their impact.

3.2.  Route Monitoring and Analytics

   This subsection introduces BGP Monitoring Protocol (BMP) [RFC7854] use cases in WANs. BGP is known for its flexibility and its ability to handle a large number of neighbors and routes. It is also the basis for many overlay services, such as L3VPN and L2VPN. The controller can use BMP to monitor BGP neighbor status and routing information on the routers.

   According to [RFC7854], the BMP client located on the router collects BGP neighbor status, the routes for each neighbor, and user-defined events. It then passes this information via BMP to the management station located on the controller.

   Based on BMP monitoring of BGP, there are three use cases:

   (1)  BGP route leak monitoring;

   (2)  BGP hijack monitoring;

   (3)  traffic analytics.
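   All three use cases start from the BMP stream received by the management station on the controller. As a minimal, hypothetical sketch of that first step, the following Python code decodes the BMP common header defined in Section 4.1 of [RFC7854] (a 1-octet version, a 4-octet message length that includes the header, and a 1-octet message type) and splits a buffer of back-to-back BMP messages into individual messages. The function names are illustrative assumptions; decoding the per-peer header and the encapsulated BGP messages, which route leak or hijack analysis would need, is not shown.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

# BMP message types defined in RFC 7854, Section 4.1.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation Message",
    5: "Termination Message",
    6: "Route Mirroring Message",
}

# Common header in network byte order: version (1), length (4), type (1).
COMMON_HEADER = struct.Struct("!BIB")


class BmpHeader(NamedTuple):
    version: int
    length: int    # total message length in octets, including this header
    msg_type: str


def parse_common_header(data: bytes) -> BmpHeader:
    """Decode the 6-octet BMP common header at the start of `data`."""
    if len(data) < COMMON_HEADER.size:
        raise ValueError("truncated BMP common header")
    version, length, type_code = COMMON_HEADER.unpack_from(data)
    if version != 3:
        raise ValueError(f"unsupported BMP version {version}")
    if length < COMMON_HEADER.size:
        raise ValueError("BMP message length shorter than the common header")
    name = BMP_MSG_TYPES.get(type_code, f"Unknown ({type_code})")
    return BmpHeader(version, length, name)


def split_messages(buffer: bytes) -> Iterator[Tuple[BmpHeader, bytes]]:
    """Yield (header, raw message) pairs from back-to-back BMP messages."""
    offset = 0
    while offset < len(buffer):
        header = parse_common_header(buffer[offset:])
        end = offset + header.length
        if end > len(buffer):
            raise ValueError("truncated BMP message")
        yield header, buffer[offset:end]
        offset = end
```

   Framing messages by the length field lets a collector hand each complete message to the NAI application that needs it (for example, Route Monitoring messages to a hijack detector) without parsing message bodies it does not use.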
   Route leaks involve the illegitimate advertisement of prefixes (blocks of IP addresses), which then propagate across networks and lead to incorrect or suboptimal routing. For case (1), NAI applications can analyze BGP route leaks based on the data collected through BMP.

   For case (2), by manipulating BGP, an attacker can have traffic rerouted in its favor, allowing it to intercept or modify that traffic. If the malicious announcement is more specific than the legitimate one, or claims to offer a shorter path, traffic may be directed to the attacker. By broadcasting false announcements, a compromised router may poison the RIBs of its peers. After poisoning one peer, the malicious routing information could propagate to other peers, to other Autonomous Systems, and onto the Internet at large. Based on monitored BGP routes, ML algorithms can be trained to determine when a hijack has taken place, so that the necessary actions can be taken.

   For case (3), with BMP providing BGP changes and telemetry providing network traffic information, NAI applications can analyze traffic trends, predict traffic changes, and optimize traffic.

3.3.  Multilayer Fault Detection In NFV Framework

   The high reliability and high availability required for carrier-class applications are a major challenge in virtualized, software-based environments, where failures are common. The interdependence between NFV's abstraction levels and virtual resources is complex, as shown in Figure 2. The dynamic nature of resources in the cloud environment makes faults difficult to locate. Multilayer fault detection for NFV networks and cloud environments is therefore very useful.
                    +--------------------+
                    |      Central       |
                    |     Controller     |
                    |        with        |
                    |        NTA         |
                    +--------------------+
                         |     |     |
                         |     |     |
                         |     |     |
                         V     V     V
   +-------------------------------------------------------+
   |                                                       |
   |      +-----------+  +-----------+  +-----------+      |
   |      |   VNF1    |  |   VNF2    |  |   VNF3    |      |
   |      +-----|-----+  +-----|-----+  +-----|-----+      |
   |            |              | VN-NF        |            |
   |    +-------|--------------|--------------|-------+    |
   |    | NFVI                                        |    |
   |    | +-----------+  +-----------+  +-----------+ |    |
   |    | |  Virtual  |  |  Virtual  |  |  Virtual  | |    |
   |    | | Computing |  |  Storage  |  |  Network  | |    |
   |    | +-----------+  +-----------+  +-----------+ |    |
   |    | +-----------------------------------------+ |    |
   |    | |          VIRTUALIZATION LAYER           | |    |
   |    | +--------------------|--------------------+ |    |
   |    |                VI-Ha |                      |    |
   |    |+---------------------|---------------------+|    |
   |    ||            Hardware Resources             ||    |
   |    ||+-----------+  +-----------+  +-----------+||    |
   |    ||| Computing |  |  Storage  |  |  Network  |||    |
   |    ||| Hardware  |  | Hardware  |  | Hardware  |||    |
   |    ||+-----------+  +-----------+  +-----------+||    |
   |    |+-------------------------------------------+|    |
   |    +---------------------------------------------+    |
   |                                                       |
   +-------------------------------------------------------+

             Figure 2: NAI in Multi-layer NFV Framework

   For the virtualization layer, CPU performance, memory usage, interface bandwidth, and other key performance indicators (KPIs) can be monitored. At the same time, resource occupancy and the life cycle of VNF software processes can also be monitored. Through NAI, the relevant statistical data at multiple layers can be analyzed, and models can be set up to locate the root cause of a possible fault in the multi-layer environment.

3.4.  Data Center Network Use Cases

   Traditionally, data center networks have comprised a large number of switches and routers that direct traffic based on each device's limited view of the network.
   With the help of SDN/NFV, data center networks are becoming more agile and responsive to changing usage and traffic patterns. Real-time traffic and usage data can be used to make data center management and operations intelligent. Protocols such as sFlow and IPFIX can be used to obtain port statistics as well as traffic samples. Over time, this information can help build traffic usage models on a per-port and per-flow basis. Using historical data as a baseline, the NTA engine can predict traffic usage and issue the necessary instructions to the SDN controller or NFV orchestrator. For example, these instructions could reroute a flow to avoid a congested port, or scale out by adding another switch to share the load, based on the predicted traffic demand.

   The NTA engine should correlate the various network data to build models and predict the impact of network events, congestion, network utilization patterns, etc. Furthermore, the NTA engine could detect anomalies based on historical patterns and help with root cause analysis. The policy framework can be enhanced to take the analytics into account. The NTA engine could also obtain usage and health information from the hosts (servers). Correlating this information with the information received from the network could help identify security flaws and anomalies when the two do not match.

3.4.1.  Service Function Chaining

   This subsection describes how NAI can be applied to the SFC scenario to intelligently reroute/re-optimize service chains, increase the utilization of both Service Functions (SFs) and the network, and intelligently select the Service Function Path (SFP) based on traffic trends.
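   The intelligent SFP selection described above can be sketched in Python under several assumptions: each candidate SFP is represented as a hypothetical list of the SFs and Service Function Forwarders (SFFs) it traverses, each of those elements has a history of normalized utilization samples (for example, CPU/memory load for an SF or port load for an SFF), and an exponentially weighted moving average (EWMA) stands in for the trend model. The identifiers and the EWMA choice are illustrative only and do not come from [RFC7665] or any controller API; an NTA engine could substitute its own statistical models.

```python
from typing import Dict, List, Optional


def ewma(samples: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average of samples in time order;
    each new sample gets weight `alpha`, so recent load counts more."""
    if not samples:
        raise ValueError("at least one sample is required")
    estimate = samples[0]
    for value in samples[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def select_sfp(candidates: Dict[str, List[str]],
               load_history: Dict[str, List[float]],
               max_load: float = 0.9,
               alpha: float = 0.5) -> Optional[str]:
    """Return the candidate SFP whose most loaded element (SF or SFF) has the
    lowest smoothed utilization, skipping SFPs whose bottleneck exceeds
    `max_load`; return None if every candidate is too heavily loaded."""
    best_path: Optional[str] = None
    best_load = float("inf")
    for path_id, elements in candidates.items():
        bottleneck = max(ewma(load_history[e], alpha) for e in elements)
        if bottleneck <= max_load and bottleneck < best_load:
            best_path, best_load = path_id, bottleneck
    return best_path


if __name__ == "__main__":
    # Hypothetical SFPs sharing an ingress SFF but using different firewalls.
    candidates = {
        "sfp-1": ["sff-a", "fw-1", "sff-b"],
        "sfp-2": ["sff-a", "fw-2", "sff-c"],
    }
    # Hypothetical normalized utilization samples, oldest first.
    load_history = {
        "sff-a": [0.3, 0.3], "sff-b": [0.4, 0.5], "sff-c": [0.2, 0.2],
        "fw-1": [0.6, 0.9], "fw-2": [0.5, 0.5],
    }
    print(select_sfp(candidates, load_history))  # prints sfp-2
```

   Ranking paths by their most loaded element reflects the fact that a single overloaded SF or SFF tends to limit the whole chain; here the rising load on fw-1 steers new traffic onto sfp-2 even though both paths share the same ingress SFF.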
   As per [RFC7665], Service Function Chaining (SFC) enables the creation of composite (network) services that consist of an ordered set of SFs that must be applied for the specific treatment of received packets and/or frames and/or flows selected as a result of classification. The SFs of a chain are connected via Service Function Forwarders (SFFs), each of which is responsible for forwarding traffic to one or more connected SFs according to information carried in the SFC encapsulation, as well as handling traffic coming back from the SFs.

   Network telemetry information such as delay, jitter, and packet loss from the network, and CPU/memory utilization from the SFs, can be collected using protocols such as sFlow or gRPC and stored in a persistent data repository. The analytics component of the NTA engine can use the stored data to build statistical models that predict the impact of network events, traffic, and SFP state on the various Service Function Paths, and instruct the SDN controller to take the necessary actions. The SDN controller can then calculate new paths or reroute the SFP to avoid congested ports/SFFs or overloaded SFs. Correlating application analytics from the SFs with network analytics from the SFFs could enhance operators' intelligent management of service chains. Usage and traffic patterns observed over time can help increase the utilization of the SFs as well as of the underlay network.

4.  Contributors

   The following people have substantially contributed to the use cases of NAI:

   Lizhao You
   Huawei
   Email: youlizhao@huawei.com

   Kalyankumar Asangi
   Huawei
   Email: kalyana@huawei.com

5.  Security Considerations

   TBD

6.  IANA Considerations

   This document has no actions for IANA.

7.  Acknowledgement

   Thanks to Li Zhenbin and Liu Shucheng for their comments and contributions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

8.2.  Informative References

   [I-D.li-rtgwg-network-ai-arch]  Li, Z. and J. Zhang, "An Architecture of Network Artificial Intelligence(NAI)", draft-li-rtgwg-network-ai-arch-00 (work in progress), October 2016.

   [RFC4655]  Farrel, A., Vasseur, J., and J. Ash, "A Path Computation Element (PCE)-Based Architecture", RFC 4655, DOI 10.17487/RFC4655, August 2006.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function Chaining (SFC) Architecture", RFC 7665, DOI 10.17487/RFC7665, October 2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and S. Ray, "North-Bound Distribution of Link-State and Traffic Engineering (TE) Information Using BGP", RFC 7752, DOI 10.17487/RFC7752, March 2016.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP Monitoring Protocol (BMP)", RFC 7854, DOI 10.17487/RFC7854, June 2016.

Authors' Addresses

   Yi Zheng
   China Unicom
   No.9, Shouti Nanlu, Haidian District
   Beijing  100048
   China

   Email: zhengyi39@chinaunicom.cn


   Xu Shiping
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing  100095
   P.R. China

   Email: xushiping7@huawei.com


   Dhruv Dhody
   Huawei Technologies
   Divyashree Techno Park, Whitefield
   Bangalore, Karnataka  560066
   India

   Email: dhruv.ietf@gmail.com