Thursday, May 28, 2009

Review: Survey and Taxonomy of IP Address Lookup Algorithms

I will first start with some basic definitions:
  • Address Lookup: is when a router uses the packet's destination address as a key to consults its forwarding table in order to decide the next packet's destination.
  • A trie is a tree based data structure allowing the organization of prefixes on a digital basis by using the bits of prefixes to direct the branching. A node on level l represents the set of all addresses that begin with the sequence of l bits consisting of the string of bits labeling the path from the root to the node. (According to the survey).
  • Address Class Scheme: In IPv4 the address was first designed as a two part identifiers: the prefix designate the network and a suffix that identifies a host. The prefix refers to class category A (8 bits), B (16 bits) or C (24 bits). The prefix 130.86/16 is a class B with a 16 bit length (number after the slash).
  • Address aggregation: Since routers forwards packets based on the prefix IP part, forwarding tables need only to store a single entry in order to forward packets to all hosts attached to the same network. This technique is called address aggregation.
  • CIDR (Class Inter Domain Routing) Address Scheme: the CIDR was introduced to slow down the growth of the backbone forwarding tables. CIDR prefixes can be of arbitrary length rather than (classified addresses) 8, 16 or 24 bits long. Hence, this will allow recursive address aggregation. Forwarding Tables will reduce the number of entries of the backbone routers and will hold aggregate information rather than network information. Example: Consider the following networks 208.12.16/24 through 208.12.31/24 which are reachable from the same service provider. These networks share the same leftmost 20 bits. Instead of storing all these networks IP in a backbone router, these addresses can be aggregated into one "super router" represented by the same 20 bit prefix 208.12.16/20 and the backbone router will hold only one entry to the "super router".
  • Network renumbering: this operation occurs when a network changes its service provider. Hence the change of the IP prefix which imposes the change of the IP addresses assigned in the network, this is called network renumbering.
What is the longest prefix matching?

CIDR scheme reduces the forwarding tables through aggregations, however few elements can still interfere with the process of aggregation. Such as when a network 208.12.21/24 changes its service provider but still wants to use the same IP prefix and IP addresses assigned in the network. Consequently all networks from 208.12.16/24 through 208.12.31/24 can still be reached from the same service provider except 208.12.21/24 which break the aggregation process. The "super router" cannot aggregate 208.12.16/20 anymore, it will store in its forwarding table both 208.12.16/20 and the exception entry 208.12.21/24. Since the two entries share the same 20-bit prefix, when matching we need to look for the longest prefix in order to find the most specific forwarding information to the correct destination. Therefore, we are talking about a search in two dimensions: value dimension and length dimension.

Why do we need scalable and fast address lookup algorithms?
  • Number of entries in routing tables are still growing due to the increasing number of networks and exceptions.
  • Increasing number of update operations in routing tables. Forwarding tables needs to be updated dynamically to reflect the route change. According to [1] route changes occurs 100 times/second. Hence update operations must be fast (less than 10 msec).
Solutions:

Binary Trie:
In a binary trie, every node has at most 2 branches with values (0/1). The search starts at the root level and goes to the left or to the right according to the bits of the destination address. While traversing the trie, every time we visit a node marked as prefix (nodes with letters inside), the prefix is remembered as the long prefix visited so far. The search ends when there is no branches matching the next bit in the destination address and the best matching prefix (BMP) is the last prefix remembered. The Binary trie is a sequential prefix search by length, at each step the search space is reduced hierarchically . Update operations are simple, inserting a prefix begins with a search until the arrival to a node with no branch the node can be inserted. Deleting operations starts also with a search unmarking the node as a prefix and deleting unused nodes. Nodes don't store prefixes since the prefixes are represented by the trie structure.


Path-compressed Tries has been introduced to improve space and time performance for tries with sequences of one-child node. The prefix b in the figure above shows a one-child node sequence. When searching in a binary trie, The bits separating prefix a from b need to be inspected even though no other branches exist. The additional search is a waste of time and the additional nodes (between prefix b and a) consume additional memory. In a path-compressed trie, one-child nodes are removed, when performing a search we can jump directly to the bit where a decision can be made. Hence, a bit number field is kept to indicate the next bit to inspect. In a path-compressed trie, the bit-number field of the nodes traversed indicates the position to inspect and the bit value in the branch reveals the destination to follow (left if 0, right if 1) depending on the inspection result. When a node marked as prefix is encountered, a comparison with the actual prefix value is performed. This step is essential since we are skipping some bits. When a match is found, the search continues and the prefix is kept as the BMP so far. The search ends when a mismatch or a leaf is encountered. Path compression is useful when the binary trie is not well populated because using comparison has little benefits. In general binary and path-compressed tries requires many memory access. (worst case 32 for IPv4 address).

Simple Search on values approach:
The data structure needed is just an array containing unordered prefixes. The search goes through all entries comparing destination address. When a prefix match is found, it keeps the longest match. At the end, the last prefix remembered is the BMP. Clearly this scheme is a function of the number of entries and is not scalable.

Prefix transformation:
Forwarding information is based on prefixes that represents address ranges. One of the most common prefix transformation techniques is prefix expansion which expands one prefix into several longer and more specific prefixes that cover the same range of addresses. Appropriate prefix expansion results in fewer different lengths and can be used to make a faster search. In order to avoid the longest prefix match a method of disjoint prefixes is used. Disjoint prefixes do not overlap and no prefix is a prefix of another one. A trie representing disjoint prefixes will have prefixes at the leaves but not at internal nodes. A disjoint-prefix binary trie is obtained by adding leaves to nodes with only one-child. The following figure is a transformation of the figure above. Prefixes a1, a2 and a3 replaced prefix a. Prefixes are pushed from internal nodes to leaves. (Which will make prefixes disjoint).


Search on prefix lengths using Multibit Tries:
Multibit trie is a trie structure that allows multibit inspection at each step. Multibit tries increase the search speed, instead of inspecting one bit per step it allows multibit inspection. The number k of bits allowed per inspection is called stride. Every node in a multibit trie has at most 2^k child. Since search traverses strides of several bits at a time, multibit tries cannot support arbitrary prefix length. Height of the trie (comparing to binary tries) decreases and so the number of memory access when searching.
The strides choice influence the search speed and memory consumption. Actually if the stride increases so does the search speed (trie’s height decreases) and memory consumption to store larger amounts of entries. With a stride of 32 bit for IPv4, there is only one level access but memory to store 2^32 entries is required.
When prefix expansion is applied, the local BMP for each node of the subtrie is computed. Inserting or deleting a prefix needs to be applied on the subtries. Hence prefix update is local, if the new prefix will be stored in a subtrie with a stride of k bits then the update requires to modify at most 2^(k-1) nodes.
Choosing larger strides will make faster searches but more memory is needed and updates will modify more entries due to expansion.
Multibit based solutions emerged in order to optimize memory consumption and increase search speed such as the Level Compressed trie, Lulea scheme, Full Expansion/Compression and many others. However these approaches are inadequate when it comes to update operations, rebuilding the whole structure is more often the only solution.

Binary Search on Prefix Lengths:
Waldvogel [2] proposes another method for prefix lookup without using a trie structure. He proposes to organize prefixes in different tables according to their lengths and to use a binary search to reduce the search space in half in each step. A comparison is applied in each step in order to decide to which half the search should proceed. If a match is found, then the search can proceed to longer prefixes but if there is no match than no decision can be made regarding the direction to take.(Longer or shorter prefixes?). Waldvogel proposes to use precomputed BMP on each prefix and extra prefixes called markers to be sure that when no match is found the binary search should proceed towards shorter prefixes.
For example, if we search the BMP for the address 11000010, since the length is 8 we ll start searching in the table prefix with length 4.(half length). We compare entries with 1100, a match is found with prefix f then f is a BMP so far and we can proceed with 110000 (the other half is 4 bit length, we take a half and continue the search it means we have a 6 bit length address). If another match occurs with a prefix when searching in the 6 bit length tables then the prefix becomes the BMP so far. The other part of the address left is 10, 2 bit length, then we proceed with address 1100001 in the 7 bit length tables etc. When going towards a next step extra markers prefixes are inserted in the adequate length in order to direct the search to shorter (resp. longer) prefixes in there is a no match (resp. match). Usually I think that extra markers are inserted when the one child subtrie situation occurs.
The markers and precomputed BMP increases the memory usage and make updates more difficult because these values need also to be updated.

Prefix Range Search:
The prefix range search flatten the two dimension (length and value) search into only one dimension by getting rid of the length dimension. One way of doing this is by applying a prefix full expansion in order to have the same length of all prefixes (expansion to 32 bit address IPv4).
Once the transformation is applied, we can proceed with range search. Every prefix determines a well defined range of address. For example for 5 bit length address, a prefix a= 0* would define the range [0,15] since after expansion the prefix range would be [00000,01111]. Hence, instead of storing addresses we will store range endpoints. The BMP of the endpoints is in theory the same of all addresses in the same range. Finding a BMP for an address would be reduced to finding the endpoints of the corresponding interval. However, this technique will not work because prefix ranges may overlap. Range addresses are not disjoint, a range may be contained in another range.
However, endpoints divide the total address space into disjoint basic intervals. The BMP of an address can be found by using the endpoints of the basic intervals (see figure below). Some basic intervals do not have explicit endpoints, so it maintains two BMPs.
The figure below shows a search tree. For example, if we search the BMP for the address (22) 10110, the search begins with the key value 26 and the search proceeds following the comparison decision until arriving to the BMP value "d".
Since the BMP for each basic interval needs to be precomputed, update operation requires to recompute the BMP for many basic intervals.
A multiway search tree can be used just like the multibit trie in order to accelerate the search speed and reduce the tree's height.

IPv6 related:
According to the survey, multibit tries scale badly with longer addresses and hence prefixes when dealing with IPv6 128 bit addresses while the range search seems to be a good approach to face scalability issues for IPv6.


This survey reveals important limitations of the Internet infrastructure and address classification design in IP addresses. It exposes the challenges facing routers and Internet growth. It points to the need of a scalable algorithm that increases the search speed of addresses and reduces memory consumption and access. Such algorithm should handle incremental updates with a minimum change to the structure.
In my opinion, every naming system will face such challenges when dealing with delivering packets to the appropriate destination or retrieving pointers through addresses as keys.
Other reviews will be added to cover the state of the art of such algorithms proposed beyond year 2001.

References:
1 - C. Labovitz, “Scalability of the Internet Backbone Routing Infrastructure,” Ph. D. Thesis, University of Michigan, 1999.
2- M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, “Scalable High Speed IP Routing Lookups,” Proceedings
of ACM SIGCOMM’97, pp. 25-36, September 1997.

Link to the survey

Wednesday, May 27, 2009

Review: RFC 4941 Privacy Extensions for Stateless Address Autoconfiguration in IPv6

RFC 4941 reveals privacy issues related to IPv6 stateless auto configuration.

The IPv6 stateless auto configuration allows hosts to generate IP addresses without the need of a central node to coordinate and distribute unique addresses. The mechanism allows to generate a unique IP address by using the IEEE interface identifier. Actually the interface identifier is generated based on address identifier such as a MAC address which is supposed to be unique, a characteristic guaranteed by the constructor/manufacturer of the network card.
Using a unique interface identifier allows nodes to generate unique IPv6 addresses. The Node adds the network prefix to the interface identifier in order to obtain a 128 bit unique (local or global) address.
Consequently, when a node moves to another network the prefix is the only changing part in the IPv6 address since the interface identifier is always the same and unique.
This unchanging part of the IP address allows for tracking and user localization which violates privacy. ( An employee is at home?, active?, with whom he is communicating? etc).
This privacy problem occurred in the IPv6 and did not exist in the IPv4 which assigned IP addresses independently from the interface identifier.

Any possible Approaches?
  • Use DHCPv6 to assign and manage addresses. Those addresses are also temporary and never changed. RFC 4941 claims to propose a similar DHCPv6 approach when using temporary addresses.
  • Change the interface identifier portion of the address over time and generate new addresses from the interface identifier.
  • Caller ID approach: Many machines function as clients and servers. When acting as a server the machine would need a DNS name. The privacy issue appears when the machine is acting like a client and its identity is revealed. (The similarity with the caller ID approach is when a user lists his telephone numbers publicly but disable the display of its number when initiating calls.)
RFC 4941 proposal:
Their approach proposes the generation of Temporary addresses, a pseudo-random sequence of interface identifiers using the MD5 hash. These addresses would be used for a short period of time. New temporary addresses will be generated to replace the expired ones. Nodes concerned about privacy may use different interface identifiers on different prefixes.

Generation of Randomized Interface Identifiers:
They propose to use 2 approaches in order to generate randomized interface identifiers.
  • The first requires a 64 bit stable storage for generated temporary addresses, so the new generated address is based on the previous one. This technique prevents two nodes of generating the same random number.
  • The non stable storage technique will use configuration parameters like user ID, serial numbers with a randomized data and an MD5 algorithm is order to generate random numbers.
  • Alternate approaches can be used like CGA (Cryptographically Generated Addresses) to generate a random interface ID based on the node's public key. The purpose is to prove ownership of an IPv6 address and prevents stealing and spoofing of addresses. However this technique requires that a node holds a public key. The node can still be identified by its key (transactions etc). The process is intensive and discourages frequent regeneration. (Especially on low cost machines).
However, every time a node generates a value it should checks that the address in not reserved for particular usage or already assigned. If so, the node should repeat the process of generating random interface identifiers. The node also generates temporary addresses of every public address created. New Temporary addresses should be generated to replace expired ones.

The use of temporary addresses is an approach proposed to resolve privacy issues, however this solution have the following impacts on the Internet:
  • The widespread use of temporary addresses complicating the flexibility of generating global unique addresses from interface identifiers since for each generated address DAD should be applied.
  • Clients having their addresses changing over time will make packet tracking more difficult and so debugging when unknown behaviors occurs. Hence the packet's source cannot be determined if it is from one machine or multiple ones.
  • Some servers refuses access to clients for which no DNS names exists. Temporary addresses are not registered.
  • How to distinguish in a large network with a high rate of changing temporary addresses between the new generated addresses and spoofed addresses.

We can see clearly that stateless auto configuration generates addresses without the need of a central node and no need to apply DAD. However this unchanging part is causing privacy issue and permits identification and node tacking across the Internet due to the prefix modification when entering another network.
This prefix modification is the result of the hierarchical nature of the Internet addresses to facilitate routing and delegate address management between organizations. An unstructured architecture will hide the topology of the network but will add more burden on routers in order to transfer and deliver packets.
A new naming and addressing approach should make a good balance between simplicity and flexibility in generating global unique addresses in a distributed, self manner and providing privacy hence not exposing identity and node's location.

Link to the RFC

Tuesday, May 26, 2009

Review: RFC 4862 - IPv6 Stateless Address Autoconfiguration

RFC 4862 specifies the steps for a host (except routers) to apply in order to auto configure its interface and generate an IPv6 address.

Why Stateless auto configuration?
  • Assign a unique address to an interface.
  • Hosts in small networks would not require a DHCP server or a router to obtain a unique address. Such hosts should be able to generate unique addresses in the network.
  • Large networks will not require DHCP servers for address auto configuration. Hosts should be able to generate global unique addresses.
  • Facilitate address renumbering on a site or subnet. A site renumber it's nodes when it switches to a new network service provider (IP prefix modification). When renumbering, old IP address will coexist with the new IP address during a period until the old IP becomes invalid.
How stateless auto configuration works?
We can divide the process of auto configuration into the following phases:
  1. A node generate a link-local address when its interface becomes enabled. (Reboot, start time, attachment to another link etc). This link-local address is formed by concatenating the interface identifier (generated from the interface MAC address for example) to the well-know link-local prefix 0xFE80::0.
  2. The node checks if the address is unique by using Duplication Address Detection (DAD) techniques. If the address is unique then it will be assigned to the interface, if not another interface identifier is required to generate a unique link-local address. Administrators can supply an alternate interface identifier. If not a manual configuration is required. In this phase, the node generates a link-local unique address.
  3. Nodes will listen to routers advertisements holding information to generate global addresses. Solicitation messages can also be sent by nodes to routers to avoid waiting for advertisement messages. If some specific flags are set then the node can use a prefix carried in the advertisement and apply it to the generated address. (Prefix concatenated with Interface ID). Actually this prefix is usually the subnet's prefix.
  4. Duplication Address Detection is required before assigning the global unique address to the interface. Some implementations applies DAD only to link local addresses and assumes it is globally unique if it passes the local test. However new techniques have been developed for privacy protection issues. An interface identifier can be generated randomly for example, then a clash with another global address might occurs [RFC 4941 - Privacy extensions for stateless address autoconfiguration].

Nodes still listens to routers advertisements mainly to reset/increase the prefix lifetime or valid time. When the prefix advertised is different from the one generated, then it is a renumbering case, the node will form a new address (new prefix, Interface ID) and adds it to the list of addresses assigned to the interface.


According to this RFC, the Duplication Address Detection is not fully reliable, it will generate a large overhead when testing a global unique IP address.
Other mechanisms to detect address collision should be pushed further.
Site renumbering is another argument to be added to the list in order to separate a host's name from its identity.
Generating the same suffix every time can expose the identity and location of a node which in some cases can be unwanted.

Link to RFC 4862

Monday, May 25, 2009

Review: 6LowPAN

Why IPv6 can’t be applied directly on LowPANs?
6LoWPAN defines IPv6 protocol over Low power Wireless Personal Area Networks.
LowPAN devices uses IEEE 802.15.4 radios. In order to understand why we
can’t use IPv6 directly on top of LowPAN devices let’s list the characteristics of such
devices [2]:
  • Small packet size. 81 bytes for data packets in the networking layer. (The maximum physical layer packet is 127 bytes, consequently the maximum frame size at the media access control layer is 102 octets. 21 bytes at maximum are usedfor securing link-layer communications).
  • Support for both 16-bit short or IEEE 64-bit MAC addresses.
  • Low cost devices: low bandwidth with data rates of (250 kbps, 40 kbps, 20 kbps) for (2.4 GHz, 915 MHz, 868 MHz). Low power typically battery dependent. Low processing and storage capabilities (8KB RAM, limited buffering, etc).
  • Topologies include star and mesh operation.
  • Large number of devices expected to be deployed.
The main problems with IP for LowPANs are the following:
The term Maximum Transmission Unit (MTU) refers to the size (in bytes) of the largest PDU that a given layer of a communications protocol can pass to other layers. A higher MTU brings greater efficiency because each packet carries more user data than protocol overheads. Large packets can occupy a slow link for some time, causing greater delays to following packets and increasing lag and minimum latency.
IPv6 protocol uses 128 bits IP address and a header of 40 bytes long. The MTU is at least 1280 bytes in order to maximize the efficiency of the transmission ((useful data)/(overhead data)).
LowPAN devices supports small packets with 81 bytes available for the networking layer and above. Using IPv6 on IEEE 802.15.4 leaves only 41 bytes for transport and applications layers. This is obviously not enough for data exchange, packet fragmentation and reassembly is needed but will use even more bytes. An IP header compression is needed.
Since large number of devices will be deployed, address auto configuration is attractive because it will reduce the overhead between devices. There is a need for a method to generate and assign an Identifier from the EUI-64 bits to a LowPAN device.
Routing protocols in mesh and star networks must be adapted to a small overhead. LowPAN devices have limited resources (memory, bandwidth, CPU, energy), processing 128 bits addresses and large headers will decrease the system’s efficiency and increase data treatment latency.

Why use IP based protocol in LowPANs?
The benefits of using IP based networks are the following [1][2]:
  • The hierarchy of naming and addressing in IP networks which simplify the connectivity model.
  • IP-based technologies already exist, are well-known, and proven to be working.
  • IP networking technology is specified in open and freely available specifications.
  • Use existing tools for diagnostics, management, and debugging of IP networks instead of designing and developing new ones.
  • IP-based devices can be connected easily to other IP-based networks, without the need for intermediate entities like translation gateways or proxies.
Addressing Modes
IEEE 802.15.4 uses IEEE 64 bit and 16 bit addresses. Short addresses are assigned by a PAN coordinator during an event which means that validity and uniqueness of such addresses are limited by the lifetime of the association, failure of the coordinator, etc [3].
For short addresses (16 bits), a pseudo 48 bit address is formed by concatenating 16 zero bits to the 16 bit PAN ID (Personal Area Network). If no PAN ID assigned then 16 bits of zeros are used and the resulting concatenation is a 32 bits address. These 32 bits are concatenated with the short address in order to obtain a 48 bit address.
From a 48 bit address, a 64 bit interface identifier is formed as in [5] and [6] by adding 0xFFFE in the middle of the 48 bits (24 bits,0xFFFE,24bits). IPv6 local addresses are formed using the 64 interface identifier by appending the prefix FE80::/64. (0xFE80::EUI-64) or global addresses based on information advertised by routers [7].
Routers are the link between IP and LowPAN networks, those routers will handle address transition.

Header types
6LowPAN proposes specific header encoding and compression mechanisms to adapt IPv6 into IEEE 802.15.4 frames. Separating headers will reduce overhead. If a device is sending short packets directly to another node it does not pay for extra fields such as Mesh networking or fragmentation. The overhead reduction is an energy saving.
The header types are the following:
  • The Dispatch Header (1 byte), define the type of header to follow. The dispatchheader is identified by the first two bits set to either 00 (non-6LowPAN frames)or 01. The remaining 6 bits indicate if the following field is an uncompressedIPv6 header or an HC1 header (IPv6 compressed header). To accomplish compression[1] the protocol uses a combination of the following facts: the low order64 bits of an IPv6 address (the link local address) can be the device’s MAC address,the 802.15.4 frame carries these MAC addresses, a number of the fields inthe IPv6 header are static.Combining all of these features allows the protocol to compress the standard 40 byte IPv6 header down to just 2 bytes (including the HC1 Header byte) for most intra-PAN unicast communication where source and destination addresses are deleted and generated from Link level frames (IEEE 802.15.4). All of the rest of the fields can be reconstituted without any state information at any of the receiving or intermediate nodes. Additionally by assigning the link local address to the device’s MAC address 6lowpan can use Stateless Address Auto configuration(Zero-conf) and eliminates the need to infrastructure servers like DHCP servers.
  • The Mesh Header (4 bytes) is used to encode the hop limit and the source and destination of the packet. It includes two single bit fields to indicate if the originating
    or final address is a short or long address. The “hops left” field is a 4 bit
    field used to limit the number of intermediate hops between the source and destination.
    The value of 0xF was reserved to indicate that an extra byte is included
    allowing for network depths of up to 255 hops.
  • The Fragmentation Header (4 bytes for the first fragment and 5 bytes for subsequent
    fragments) supports the fragmentation and reassembly of frames larger
    than the size of the 802.15.4 frame.



Today there is at least 6 implementations of 6LowPAN on multiple 802.15.4 radio platforms. The working group is still continuing to investigate the areas of neighbor discovery: IPv6 network prefix, local routers and other network configurations parameters.
The area of service discovery to locate other sensors and controllers and higher layer services.
Is it a good choice to push further IP based protocols to other areas such as LowPAN while efforts increases to redesign the Internet?
Should we inherit the limitation of IP which merges between naming and addressing to LowPANs?
What about cross-layering violation while compressing Ipv6 headers and regenerating source and destination address for Link level frames? According to the OSI scheme, Layers should be independent and unable to understand other layer’s data.
Even though in some cases header compression is efficient (with layer violation) other compressions remains unoptimized in LowPAN. Additional work should push routing protocols and reduce overhead of such protocols.

References:
1. The 6LoWPAN Architecture, Geoff Mulligan and 6LoWPAN Working Group, EmNets '07: Proceedings of the 4th workshop on Embedded networked sensors
2. RFC 4919: IPv6 over Low-Power Wireless Personal Area Networks (6LoWPANs):
Overview, Assumptions, Problem Statement, and Goals.
3. RFC4944 - Transmission of IPv6 Packets over IEEE 802.15.4 Networks
4. 6LoWPAN: Incorporating IEEE 802.15.4 into the IP architecture Internet Protocol
for Smart Objects (IPSO) alliance.
5. RFC 2464: Transmission of IPv6 Packets over Ethernet Networks
6. http://technet.microsoft.com/en-us/library/cc736439(WS.10).aspx
7. RFC 4862: IPv6 Stateless Address Auto configuration

Thursday, May 7, 2009

The Internet sky really is falling

This is really an intresting short review about the internet limitations.

Link to the page