This is the second part of a two-part blog about one of ONF’s newest projects in software-defined networking: SD-Fabric. The first blog covers the background and accomplishments to date; this blog outlines exciting development opportunities going forward.
ONF and the SD-Fabric community have opportunities to gradually expand the SDN control domain and introduce additional data plane programmability. These opportunities can be categorized into three threads:
- Extending SD-Fabric to NIC and vSwitch
- Generalize Network Function Embedding
- Bring Benefits of SDN and Data Plane Programmability to Existing Switch Stacks
Thread 1: Extending SD-Fabric to NIC and vSwitch
We envision extending SD-Fabric to the NIC and to the server in a few notable steps:
First Step: Expand the SDN control domain to include SmartNICs
The goal is to extend existing SD-Fabric features to the server, such as:
- Enforcing slicing isolation and QoS to manage the NIC buffers between competing applications running on the same server, and
- Expanding INT visibility to the NIC pipeline to detect drops, congestion, and other anomalies.
At the control plane level, the ONOS core needs to be extended to support the server and NIC abstractions, and new southbound protocols and drivers will need to be developed. These enhancements will enable new control plane applications that configure and program the NIC, as well as changes to existing applications such as routing and INT.
From a data plane programmability perspective, the pipeline definition (e.g. P4 program) will need to be adapted to the NIC architecture. The switch configuration and telemetry models will need to be extended to support the NIC use case. An agent will need to run either on the NIC or on the server CPU. While the intention is to use an existing open source tool for this (i.e. SONiC, Stratum, or P4 OvS), it will need to be adapted to support the NIC abstractions (e.g. OS level networking capabilities, QoS model), hardware SDK, and server management model.
This step provides the foundation for controlling additional server NIC capabilities in SD-Fabric, including slicing, INT, and QoS.
Second Step: Expand the SDN control domain to include the vSwitch
This step expands the control domain by another layer, enabling new capabilities. For example:
- Providing early classification for slicing and QoS at the container (or VM) virtual interface,
- Expanding INT visibility to detect drops (e.g., due to container security policies, or missing entries in the vSwitch tables) and other anomalies, and
- Exploring opportunities to unify/flatten underlay and overlay network for better visibility, less overhead, and simpler network design.
It is important to note that the vSwitch layer is significantly more dynamic and changes more frequently than the hardware infrastructure layer (switches, NICs), and ONOS will need to be tested (and possibly extended) to support the order-of-magnitude increase in topology changes and network state. Some exploration and broader consensus building are needed before prototyping and development begin. There is an opportunity to explore bi-directional orchestrator integration: the orchestrator informs the controller when new switches and interfaces are added, and the controller provides the orchestrator with information about the state of the network to better inform placement decisions. The vSwitch is also an area that has adopted a variety of programming methods for the data plane, including C and eBPF, and the plan is to explore if and how these other approaches can be reconciled.
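To make the bi-directional integration idea concrete, here is a minimal sketch of the two directions of information flow. Everything in it is hypothetical: the `Controller` class, its method names, and the utilization metric are invented for illustration and do not correspond to the ONOS API or to any orchestrator interface.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy stand-in for an SDN controller; not the ONOS API."""
    # vSwitch port name -> name of the server hosting it
    ports: dict = field(default_factory=dict)
    # server name -> observed uplink utilization in the range 0.0 to 1.0
    uplink_utilization: dict = field(default_factory=dict)

    # Orchestrator -> controller: topology change notifications.
    def port_added(self, port: str, server: str) -> None:
        self.ports[port] = server

    def port_removed(self, port: str) -> None:
        self.ports.pop(port, None)

    # Controller -> orchestrator: network state used as a placement hint.
    def least_loaded_server(self, candidates: list) -> str:
        # Servers with no measurement are treated as idle (0.0).
        return min(candidates, key=lambda s: self.uplink_utilization.get(s, 0.0))


def place_workload(controller: Controller, port: str, candidates: list) -> str:
    """Orchestrator side: ask for a hint, place the workload, report the new port."""
    server = controller.least_loaded_server(candidates)
    controller.port_added(port, server)
    return server
```

In a real deployment the notifications would arrive at a much higher rate than hardware topology events, which is the scaling concern raised above; this sketch only illustrates the shape of the exchange.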
Completion of this step will enable the first truly end-to-end programmable fabric that allows for packets to be controlled and monitored from their source application to their destination in the network domain.
For both steps, we hope to work with key industry players to reach consensus on the choice of interfaces and use cases to demonstrate along the way.
Thread 2: Generalize Network Function Embedding
The focus of this thread is making SD-Fabric a framework that allows operators to take advantage of the diversity of accelerators to execute network functions, such as the 5G UPF.
Today, SD-Fabric’s control plane apps are designed with the assumption that the UPF function is executed by the fabric switches. While this option works for certain deployments, it is not ideal for small edge clouds where it’s hard to justify the footprint of a top of rack (ToR) switch. In those cases, executing the UPF using SmartNICs is a more cost-effective option. Similarly, SmartNICs and other accelerators, such as FPGAs found in switch-server units, provide flexibility not found in switch ASICs, which allows the realization of more advanced features such as HQoS and DPI.
This thread envisions accelerators such as programmable switch ASICs, SmartNICs, and FPGAs as a more integral part of the fabric. To achieve this objective, the control plane will need to be extended to provide first-class modeling of network functions. The agents that control these functions will likely need to be normalized, including a move to common interfaces (i.e. P4Runtime, gNMI, gNOI). The use of common interfaces and models will enable us to build out a library of pipeline definitions (e.g. P4 programs) and configuration models (based on OpenConfig) that can be deployed on a variety of network targets. From a topology perspective, apps running on ONOS will be able to deploy network functions to standalone accelerators such as switches or NICs, or embedded ones like FPGAs included in newer switch-server units.
This thread will enable SD-Fabric to control offloaded network functions, like the UPF, running on accelerators other than switches. For example, it could control a switch-less fabric that relies on NICs alone for both routing and UPF. This will provide a more uniform treatment for different network targets than the siloed approach taken today, and will facilitate the option to deploy functions to different target types with minimal or no changes to the control plane and application layers.
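One way to picture first-class modeling of network functions is as a capability match between a function and the targets available in the fabric. The sketch below is illustrative only: the `Target` and `NetworkFunction` types, the feature names, and the inventory are assumptions made for this example, not part of SD-Fabric or ONOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    kind: str              # e.g. "switch", "smartnic", "fpga"
    features: frozenset    # capabilities this target advertises


@dataclass(frozen=True)
class NetworkFunction:
    name: str
    required: frozenset    # capabilities the function's pipeline needs


def eligible_targets(nf, targets):
    """Return the targets whose features cover everything the function requires."""
    return [t for t in targets if nf.required <= t.features]


# Illustrative inventory; the feature names are made up for this sketch.
TARGETS = [
    Target("leaf1", "switch", frozenset({"routing", "upf"})),
    Target("nic1", "smartnic", frozenset({"routing", "upf", "hqos"})),
]
UPF = NetworkFunction("upf", frozenset({"upf"}))
UPF_HQOS = NetworkFunction("upf-hqos", frozenset({"upf", "hqos"}))
```

With this inventory, the plain UPF can land on either target, while the HQoS variant is eligible only on the SmartNIC. The application expresses what it needs, and the target type becomes a placement detail rather than an assumption baked into the app.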
Thread 3: Bring Benefits of SDN and Data Plane Programmability to Existing Switch Stacks
This thread focuses on two symbiotic objectives: allowing SD-Fabric to run on a broader set of targets by leveraging open source platforms that already support them (e.g. SONiC, OvS), and bringing the benefits of SDN and data plane programmability to the broader networking community. This helps keep the community focused and maximizes the benefits participants can gain from each other. It may be helpful to think of SDN interfaces and data plane programmability as an application that can be ported to existing switch operating systems or distributions rather than as a competitive full-stack solution.
P4 Integrated Network Stack (PINS)
To bring the benefits of SDN to the switching community, ONF plans to continue to work with partners to extend SONiC in the following ways:
- Support P4 programmability of the pipeline (after building consensus with key stakeholders in the SONiC community on the right approach),
- Extend PINS to meet SD-Fabric requirements,
- Provide an open source SDN testing framework and test cases to SONiC for PINS features,
- Expand the fixed set of SAI functionality, including hash configuration, L2 FDB, VLAN, SVI, and VxLAN,
- Migrate existing SONiC components to take advantage of PINS features, including the response status, table entity ownership, and configurable retry, and
- Introduce critical state (which helps avoid gray failures) and operational interfaces (gNOI).
It is also important to recognize that the mainstream network operator does not deploy a full SDN stack today. A user may want to define and deploy the UPF using P4, while continuing to use BGP or OSPF to manage the routing table. Another user might want to use P4 to customize an ACL or use SDN to manage routes in a VRF without forklifting the entire control plane. To meet these users’ needs, the new capabilities that are exposed must be modular and independent. This will enable hybrid control planes that deliver the familiarity of traditional networking, while allowing users to realize the benefits of SDN and data plane programmability.
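A hybrid control plane implies that different control sources write to different parts of the forwarding state without stepping on each other. The following is a toy sketch of that idea, loosely inspired by the table entity ownership feature mentioned above. The `TableRegistry` class, the table names, and the owner names are hypothetical and do not reflect how PINS or SONiC actually implement ownership.

```python
class OwnershipError(Exception):
    """Raised when a control source writes to a table it does not own."""


class TableRegistry:
    """Toy model: each forwarding table is owned by exactly one control source."""

    def __init__(self, owners):
        self.owners = dict(owners)                    # table name -> owner name
        self.entries = {t: {} for t in self.owners}   # table name -> {key: value}

    def write(self, source, table, key, value):
        # Unknown tables have no owner, so writes to them are rejected as well.
        if self.owners.get(table) != source:
            raise OwnershipError(f"{source} does not own {table}")
        self.entries[table][key] = value


# BGP keeps managing routes while an SDN controller manages the ACL.
registry = TableRegistry({"ipv4_routes": "bgp", "acl": "p4runtime"})
registry.write("bgp", "ipv4_routes", "10.0.0.0/24", "port1")
registry.write("p4runtime", "acl", "deny-ssh", "drop")
```

Here a write from `p4runtime` to `ipv4_routes` would raise `OwnershipError`, which is the kind of isolation that lets an operator adopt one SDN-managed capability at a time.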
This PINS effort also includes enhancements to meet the requirements of SD-Fabric, including support for full programmability of the data plane and interface features that are not currently exposed (i.e. new P4Runtime entities and OpenConfig paths). This part allows SD-Fabric to benefit from the hardware support and hardening provided by the SONiC community. To prepare SD-Fabric to support PINS, some exploration will be needed to determine the right structure for the fabric pipeline, likely including some adaptation to the SAI pipeline.
Server Agent
This thread also envisions a data plane agent for server-based targets. P4 OvS, Stratum, and PINS are the potential candidates. With any of these options, we are looking at Intel’s recently announced Infrastructure Programming Development Kit (IPDK) to map the control plane interface (P4Runtime) and configuration interface (gNMI) to hardware. IPDK includes the open source Table Driven Interface (TDI) for programming a variety of targets, including CPU, SmartNIC, and switching ASIC. The use of IPDK should dramatically reduce the time and effort required to deliver a switch agent for the SmartNIC and vSwitch.
This thread aims to provide switching agents for the switch and server that expose SDN interfaces and data plane programming to the user. Rather than providing an SD-Fabric specific or a pure-play SDN solution, this thread will put special emphasis on extending the agents that are favored by existing communities for the respective targets.
Getting Involved
ONF is currently working on a plan to deliver enhancements to SD-Fabric, which includes resources from ONF and aligned companies in the ONF community. We invite additional ecosystem participation in this exciting endeavor. If you are interested, please contact Brian O’Connor (brian@opennetworking.org) to learn how to get involved.