Thursday, May 28, 2009

Review: Survey and Taxonomy of IP Address Lookup Algorithms

I will first start with some basic definitions:
  • Address Lookup: is when a router uses the packet's destination address as a key to consults its forwarding table in order to decide the next packet's destination.
  • A trie is a tree based data structure allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching. A node on level l represents the set of all addresses that begin with the sequence of l bits consisting of the string of bits labeling the path from the root to the node. (According to the survey).
  • Address Class Scheme: In IPv4 the address was first designed as a two part identifiers: the prefix designate the network and a suffix that identifies a host. The prefix refers to class category A (8 bits), B (16 bits) or C (24 bits). The prefix 130.86/16 is a class B with a 16 bit length (number after the slash).
  • Address aggregation: Since routers forwards packets based on the prefix IP part, forwarding tables need only to store a single entry in order to forward packets to all hosts attached to the same network. This technique is called address aggregation.
  • CIDR (Class Inter Domain Routing) Address Scheme: the CIDR was introduced to slow down the growth of the backbone forwarding tables. CIDR prefixes can be of arbitrary length rather than (classified addresses) 8, 16 or 24 bits long. Hence, this will allow recursive address aggregation. Forwarding Tables will reduce the number of entries of the backbone routers and will hold aggregate information rather than network information. Example: Consider the following networks 208.12.16/24 through 208.12.31/24 which are reachable from the same service provider. These networks share the same leftmost 20 bits. Instead of storing all these networks IP in a backbone router, these addresses can be aggregated into one "super router" represented by the same 20 bit prefix 208.12.16/20 and the backbone router will hold only one entry to the "super router".
  • Network renumbering: this operation occurs when a network changes its service provider. Hence the change of the IP prefix which imposes the change of the IP addresses assigned in the network, this is called network renumbering.
What is the longest prefix matching?

CIDR scheme reduces the forwarding tables through aggregations, however few elements can still interfere with the process of aggregation. Such as when a network 208.12.21/24 changes its service provider but still wants to use the same IP prefix and IP addresses assigned in the network. Consequently all networks from 208.12.16/24 through 208.12.31/24 can still be reached from the same service provider except 208.12.21/24 which break the aggregation process. The "super router" cannot aggregate 208.12.16/20 anymore, it will store in its forwarding table both 208.12.16/20 and the exception entry 208.12.21/24. Since the two entries share the same 20-bit prefix, when matching we need to look for the longest prefix in order to find the most specific forwarding information to the correct destination. Therefore, we are talking about a search in two dimensions: value dimension and length dimension.

Why do we need scalable and fast address lookup algorithms?
  • Number of entries in routing tables are still growing due to the increasing number of networks and exceptions.
  • Increasing number of update operations in routing tables. Forwarding tables needs to be updated dynamically to reflect the route change. According to [1] route changes occurs 100 times/second. Hence update operations must be fast (less than 10 msec).
Solutions:

Binary Trie:
In a binary trie, every node has at most 2 branches with values (0/1). The search starts at the root level and goes to the left or to the right according to the bits of the destination address. While traversing the trie, every time we visit a node marked as prefix (nodes with letters inside), the prefix is remembered as the long prefix visited so far. The search ends when there is no branches matching the next bit in the destination address and the best matching prefix (BMP) is the last prefix remembered. The Binary trie is a sequential prefix search by length, at each step the search space is reduced hierarchically . Update operations are simple, inserting a prefix begins with a search until the arrival to a node with no branch the node can be inserted. Deleting operations starts also with a search unmarking the node as a prefix and deleting unused nodes. Nodes don't store prefixes since the prefixes are represented by the trie structure.


Path-compressed Tries has been introduced to improve space and time performance for tries with sequences of one-child node. The prefix b in the figure above shows a one-child node sequence. When searching in a binary trie, The bits separating prefix a from b need to be inspected even though no other branches exist. The additional search is a waste of time and the additional nodes (between prefix b and a) consume additional memory. In a path-compressed trie, one-child nodes are removed, when performing a search we can jump directly to the bit where a decision can be made. Hence, a bit number field is kept to indicate the next bit to inspect. In a path-compressed trie, the bit-number field of the nodes traversed indicates the position to inspect and the bit value in the branch reveals the destination to follow (left if 0, right if 1) depending on the inspection result. When a node marked as prefix is encountered, a comparison with the actual prefix value is performed. This step is essential since we are skipping some bits. When a match is found, the search continues and the prefix is kept as the BMP so far. The search ends when a mismatch or a leaf is encountered. Path compression is useful when the binary trie is not well populated because using comparison has little benefits. In general binary and path-compressed tries requires many memory access. (worst case 32 for IPv4 address).

Simple Search on values approach:
The data structure needed is just an array containing unordered prefixes. The search goes through all entries comparing destination address. When a prefix match is found, it keeps the longest match. At the end, the last prefix remembered is the BMP. Clearly this scheme is a function of the number of entries and is not scalable.

Prefix transformation:
Forwarding information is based on prefixes that represents address ranges. One of the most common prefix transformation techniques is prefix expansion which expands one prefix into several longer and more specific prefixes that cover the same range of addresses. Appropriate prefix expansion results in fewer different lengths and can be used to make a faster search. In order to avoid the longest prefix match a method of disjoint prefixes is used. Disjoint prefixes do not overlap and no prefix is a prefix of another one. A trie representing disjoint prefixes will have prefixes at the leaves but not at internal nodes. A disjoint-prefix binary trie is obtained by adding leaves to nodes with only one-child. The following figure is a transformation of the figure above. Prefixes a1, a2 and a3 replaced prefix a. Prefixes are pushed from internal nodes to leaves. (Which will make prefixes disjoint).


Search on prefix lengths using Multibit Tries:
Multibit trie is a trie structure that allows multibit inspection at each step. Multibit tries increase the search speed, instead of inspecting one bit per step it allows multibit inspection. The number k of bits allowed per inspection is called stride. Every node in a multibit trie has at most 2^k child. Since search traverses strides of several bits at a time, multibit tries cannot support arbitrary prefix length. Height of the trie (comparing to binary tries) decreases and so the number of memory access when searching.
The strides choice influence the search speed and memory consumption. Actually if the stride increases so does the search speed (trie’s height decreases) and memory consumption to store larger amounts of entries. With a stride of 32 bit for IPv4, there is only one level access but memory to store 2^32 entries is required.
When prefix expansion is applied, the local BMP for each node of the subtrie is computed. Inserting or deleting a prefix needs to be applied on the subtries. Hence prefix update is local, if the new prefix will be stored in a subtrie with a stride of k bits then the update requires to modify at most 2^(k-1) nodes.
Choosing larger strides will make faster searches but more memory is needed and updates will modify more entries due to expansion.
Multibit based solutions emerged in order to optimize memory consumption and increase search speed such as the Level Compressed trie, Lulea scheme, Full Expansion/Compression and many others. However these approaches are inadequate when it comes to update operations, rebuilding the whole structure is more often the only solution.

Binary Search on Prefix Lengths:
Waldvogel [2] proposes another method for prefix lookup without using a trie structure. He proposes to organize prefixes in different tables according to their lengths and to use a binary search to reduce the search space in half in each step. A comparison is applied in each step in order to decide to which half the search should proceed. If a match is found, then the search can proceed to longer prefixes but if there is no match than no decision can be made regarding the direction to take.(Longer or shorter prefixes?). Waldvogel proposes to use precomputed BMP on each prefix and extra prefixes called markers to be sure that when no match is found the binary search should proceed towards shorter prefixes.
For example, if we search the BMP for the address 11000010, since the length is 8 we ll start searching in the table prefix with length 4.(half length). We compare entries with 1100, a match is found with prefix f then f is a BMP so far and we can proceed with 110000 (the other half is 4 bit length, we take a half and continue the search it means we have a 6 bit length address). If another match occurs with a prefix when searching in the 6 bit length tables then the prefix becomes the BMP so far. The other part of the address left is 10, 2 bit length, then we proceed with address 1100001 in the 7 bit length tables etc. When going towards a next step extra markers prefixes are inserted in the adequate length in order to direct the search to shorter (resp. longer) prefixes in there is a no match (resp. match). Usually I think that extra markers are inserted when the one child subtrie situation occurs.
The markers and precomputed BMP increases the memory usage and make updates more difficult because these values need also to be updated.

Prefix Range Search:
The prefix range search flatten the two dimension (length and value) search into only one dimension by getting rid of the length dimension. One way of doing this is by applying a prefix full expansion in order to have the same length of all prefixes (expansion to 32 bit address IPv4).
Once the transformation is applied, we can proceed with range search. Every prefix determines a well defined range of address. For example for 5 bit length address, a prefix a= 0* would define the range [0,15] since after expansion the prefix range would be [00000,01111]. Hence, instead of storing addresses we will store range endpoints. The BMP of the endpoints is in theory the same of all addresses in the same range. Finding a BMP for an address would be reduced to finding the endpoints of the corresponding interval. However, this technique will not work because prefix ranges may overlap. Range addresses are not disjoint, a range may be contained in another range.
However, endpoints divide the total address space into disjoint basic intervals. The BMP of an address can be found by using the endpoints of the basic intervals (see figure below). Some basic intervals do not have explicit endpoints, so it maintains two BMPs.
The figure below shows a search tree. For example, if we search the BMP for the address (22) 10110, the search begins with the key value 26 and the search proceeds following the comparison decision until arriving to the BMP value "d".
Since the BMP for each basic interval needs to be precomputed, update operation requires to recompute the BMP for many basic intervals.
A multiway search tree can be used just like the multibit trie in order to accelerate the search speed and reduce the tree's height.

IPv6 related:
According to the survey, multibit tries scale badly with longer addresses and hence prefixes when dealing with IPv6 128 bit addresses while the range search seems to be a good approach to face scalability issues for IPv6.


This survey reveals important limitations of the Internet infrastructure and address classification design in IP addresses. It exposes the challenges facing routers and Internet growth. It points to the need of a scalable algorithm that increases the search speed of addresses and reduces memory consumption and access. Such algorithm should handle incremental updates with a minimum change to the structure.
In my opinion, every naming system will face such challenges when dealing with delivering packets to the appropriate destination or retrieving pointers through addresses as keys.
Other reviews will be added to cover the state of the art of such algorithms proposed beyond year 2001.

References:
1 - C. Labovitz, “Scalability of the Internet Backbone Routing Infrastructure,” Ph. D. Thesis, University of Michigan, 1999.
2- M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, “Scalable High Speed IP Routing Lookups,” Proceedings
of ACM SIGCOMM’97, pp. 25-36, September 1997.

Link to the survey

1 comment:

  1. I read this article and know the taxonomy and survey about the ip address algorithms and entire history etc., Because i know only how to get the ip adddress for our own website from ip details

    ReplyDelete