Overview

Networking is all about data exchange and therefore plays no small role in the operation of the internet. Linux, which runs the majority of servers on the internet, has been ported to many processor architectures and includes a considerable number of network protocols. Take a deeper dive into these protocols and one is confronted with what seems to be an endless number of function pointers, which makes it easy to see why the network stack is large, complex, exacting, and demanding. Given that complexity, it is arguably no wonder that the network stack is slow to change.

This is the first of a three-part series highlighting key areas in understanding how the Linux kernel network stack operates:

  • Architecture Overview (Part 1)
  • Network Layers (Part 2)
  • Packet Management (Part 3)

By examining these three areas, we gain a better understanding of how protocols are layered and implemented, and how packets are manipulated and routed. This in turn allows us to fine-tune the performance of our networks, troubleshoot issues, or implement an entirely new protocol.

License

Just a few minor details first. Proprietary code is not included in the network stack: all of it is licensed under the GNU General Public License (GPL) and is freely available. Patches to the network stack are managed with git and submitted to a mailing list separate from the main kernel list, known as the Kernel Networking Development mailing list (netdev).

Development Model

Next, much like the TCP/IP reference model, or the lower four layers of the ISO/OSI model, the network architecture is a layered implementation in which each layer handles a specific task and communicates only with the layer directly above or below it through well-defined interfaces. These interfaces are implemented in C, as is the majority of the kernel. Because so many combinations of layers are available, this flexibility allows some very interesting protocol stacks to be built. Layering generally adds overhead and has other disadvantages, but over time its advantages appear to have outweighed the drawbacks.

Execution Space

Each layer of these models is located in one of three areas: 1) user space, 2) kernel space, and 3) device space. User space runs the application layer and issues system calls, while kernel space executes the core networking code, whether built into the kernel image or loaded as kernel modules; both are executed by the CPU. The last, device space, is the network interface card (NIC), where packets are actually sent and received. Note also that some modern NICs (typically 1 Gb/s or faster) incorporate advanced features such as a TCP offload engine (TOE), which can process the entire TCP/IP stack on the controller itself. We will cover the layers in more detail in part 2.

Data Management

Data passing through those layers is held in a data structure known as struct sk_buff, more commonly referred to as an skb within the code. The SKB API is how network packets are managed; the sk_buff structure itself, which represents the data and headers of our protocols, is defined in include/linux/skbuff.h. We will deal with this data structure in part 3.

Observations

Dynamic features such as virtual machines and byte-code interpreters seem to be the current trend in networking. These features move some of the complexity into user space and increase the level of abstraction. The general direction of development also continues to favor packet processing, performance, and efficiency. Packet processing is probably the most noticeable of these, as it lets users configure traffic flow to suit their requirements by inserting their own code into the packet-handling path. The good news is that network stack development continues to keep pace with the progress of the internet.

Summary

From the network models, we now have an idea of how the layers in the network stack are organized, where each layer is located, and which data structure carries packets through the stack. This tells us a good deal about the overall architecture, but we would still like to know how protocols are implemented, become more familiar with the data structures, and see how packets are routed. In the upcoming part 2 we will cover the implementation details of how data travels through the network stack, and in part 3 we will look at how packets are managed.



Gary is the principal software developer at NeuroQuest Software. An accomplished developer with over 26 years of experience, largely dedicated to Open Source, he spent nearly 15 years working with NASA in his previous position.