Showing posts with label P2P. Show all posts
Showing posts with label P2P. Show all posts

Friday, June 12, 2009

The shadow network architecture

Interesting link reveals the possible use of hidden botnets in networks.

Link

Thursday, June 11, 2009

Review: Design considerations for a network of Information

In this position paper they insist on the fact that the Internet has to be reshaped and be focused on data instead of endpoints, in order to have a data centric network or network of information.
According to Jacobson [1], the first generation of the network dealt with connecting wires. The second one focused on end nodes hosting data while the third generation should refocus on what humans care the most about Information.

Their information models distinguish between 2 main objects:
Data Objects (DO): are the actual bit-patterns that contain the information: such as file, phone call, video, web page, a song(Beethoven's 9th symphony) etc. These data objects can be divided into "chunks" smaller pieces in order to simplify the transfer.
Information Objects (IO): holds semantic information and meta-data related to data objects, such as Beethoven's 9th symphony is an mp3 file encoded with 128kbps, IOs can be composed of other IOs or directly pointing to one or multiple DOs. An IO can represent the Eiffel tower and point to DO like pictures, a wiki page or service to buy tickets, etc.

Versioning and Revocation:
Since some information is frequently changing such as news papers. An IO can represent today's version, however the IO should adapt dynamically by binding to another DO (web page) the next day, and so on for the IO pointing to yesterday's news.
They suggest that objects invalidate themselves in order to conserve consistency. After an amount of time, objects should be recertified before it can be used. By applying this technique, they maintain consistency due to disconnected operation during information update of other replicas for example. DO can be deleted same way when no certification is given.

Security considerations:
Security in today's architecture is based on confidence (Encryption keys) of the host delivering the object, they propose to reshape security conventions so we can handle secured data instead of secured tunnels.
Integrity and authenticity is directly tied to object's names which means there would be a cryptographic relation between the name and object such as self-certifying names. However to enable off-line verification, DO must carry private keys which can be compromised. Another approach is to assign precomputed signatures to objects. It remains a research field.

Name resolution(NR):
Data objects are retrieved based on their unique identity (UID). NR starts with locating the object in the network then routing forwards the object retrieval query to its storage location and finally the DO is sent to the requesting client.
The naming resolution resolves an UID into one or more locations and should work on global and local scale by cooperating between NR systems for example. They show the side effects if ID/address split mechanism is adopted with the following example, if a laptop hosting numerous Data Objects moves its location then all Data objects location changes too. This will lead to huge number of updates in the NR system.
The NR system will be influenced by the characteristics of namespaces. They would like to adopt flat names which respects the non right of ownership and other characteristics revealed by [2].
Off course, using flat names prevents the use of hierarchical names spaces and systems like DNS.
DHT based solutions are promising since the are decentralized, scalable, self-organized and don't need central structure. However when going globally DHT uses flat names hence non hierarchical names which prevents cooperation with other systems.

Routing:
Addressable entities are still increasing and will reach millions even billions in few years with the emergence of sensor networks, Internet of things, growing data etc. They claim that routing research are not encouraging according to [3] (will be reviewed later). Hence, they ll investigate the efficiency of name based routing which integrate both resolution and retrieval paths. Name based routing will locate DO based on their ID by transforming the ID directly into a path without going through ID-address transition. Other techniques such as LLc and NodeID are to be investigated also (Soon will be reviewed).

Storage:
The information network can be implemented following two different models:
  • Network based storage model where storage resources are provided by the network infrastructure, like dedicated storage servers.
  • Network managed storage model where network nodes control portions of storage memory of users connected to the network. Users will be able to decide what DO goes public or be shared only with friends etc.

Search:
Search systems are expected to go far beyond text match search, such as semantic search or even search functionality based on GPS position, location positioning. For example when a picture of Eiffel tower is taken, a search mechanism will handle the identification of the monument based on GPS or other techniques and points to DO related informations such as web page, history etc.

This position paper gives many ideas and anticipations about the future Internet architecture and reveals the weakness in the current addressing system. They distinguish between DO and IO and argued that a network of information needs a scalable naming system supported by an efficient routing system.

References:
1 - V. Jacobson, M. Mosko, D. Smetters, and J. Garcia-Luna-Aceves. Content-centric networking. Whitepaper, Palo Alto Research Center, Jan. 2007.
2 - M. Walfish, H. Balakrishnan, and S. Shenker. Untangling the web from DNS. In NSDI’04: Proc. 1st Symp. on Networked Systems Design and Implementation, San Francisco, CA, USA, 2004.

Link to the article

Tuesday, April 21, 2009

Review: SENS - a Scalable and Expressive Naming System using CAN Routing Algorithm

This paper proposes a Scalable and Expressive Naming System (SENS) based on CAN routing algorithm. SENS is a descriptive naming scheme that uses a tuple of attribute/value pairs to name each resource. A resource such a computer is named as ( String OS= "Linux", string CPU-name = "Pentium 4", etc). Those informations are stored at a large number of name servers (NS).

Their design claims to achieve scalable and efficient resource information distribution and retrieval. SENS handles exact (MEMORY = 512 MB) and multi-attribute range queries (MEMORY > 512 MB) with small overhead as well as load balancing.

DESIGN OF SENS:
Mapping resource names to resource IDs:
A resource ID is considered as a set of d coordinates of a point in the d-dimensional resource ID space. (In the example below d = 6). A resource name is mapped to a resource ID by assigning the hash value of each attribute/value (a/v) pairs of the resource name to a coordinate value of the resource ID. Name servers are responsible for resource ID sets just like in the CAN system.
Ha is a function that uniformly hashes every attribute from 1 to d and Hv hashes every attribute value in a [1, 2^(m-1)] interval where m is the maximum size of coordinate value in bits.












If multiple attributes in a resource name are hashed to the same value Ha (attr i) = Ha (attr j) then the corresponding attribute values will be mapped to multiple coordinate values in the same dimension. This means that resource names are mapped to multiple resource IDs which are distributed on NSs.


Their mapping scheme is not injective, several resource names can be mapped to the same resource ID. Consequently resource ID is not a unique identifier of resource name, in order to identify a resource, the resource_ID (resulting form Hash) and its name are required to uniquely identify a resource.

In the case of numerical attribute values, a locality preserving hashing function is used. Such hashing function is defined as if (val1 > val2) then (Hval1 > Hval2). the main purpose behind using a locality preserving hashing function is to deal with range queries, in fact it ensures that a resource ID will be in a interval of resource IDs between a min and max value. By doing this they limit the number of NSs responsible for a query range.

If the attribute/value pairs number is lower than d then the resource ID is filled with zeros.
However when the attribute/value pairs number is higher than d then the set of attributes should be divided to multiple sets of attributes which corresponds to multiple resource names. (this aspect of fragmentation is not treated in this paper).

Resource Information Distribution:
Since their scheme is CAN based, zones are assigned to NS, consequently each NS manages resource information according to the resource ID.
In the case of a resource name mapping to multiple resource IDs. If the NS is responsible for several resource IDs of the same resource, only one copy in maintained at the NS. When resource IDs belongs to different zones i.e. different NS, they use a multicast routing algorithm based on Spanning Binomial Trees. Their algorithm sends minimum amount of messages to deliver information to a NS.
Resource IDs corresponding to a resource name construct a hypercube in the resource ID space according to this article (Optimum Broadcasting and Personalized Communication in the Hypercube) . (They don't explain how the hypercube is built, so further reviews will detail the construction algorithm of hypercubes).


The registration message containing information of the resource is first delivered to the NS responsible for the resource ID created from the lowest values of each resource IDs coordinates. (In this example root node is (0.0.0)). This NS becomes the root node and forwards the message to its descendants according to the tree. Those descendants also will forward the registration message to their descendants according to the tree. etc

Query resolution:
SENS supports:
  • Exact queries: A query host sends a message to a NS which will map the query resource name to resource IDs and select the nearest destination resource ID. The message arrives to destination using the CAN routing algorithm. The NS responsible for the resource ID will lookup its database to find the queried information and send it back to the initial NS (the one first queried by the host).
  • Range queries, in the case of a range query it will be limited by the hash values of the upper and lower limit of the queried value ranges in each dimension. When a host sends a range query message to a NS, the latter will map the query range to a range query segment in the resource ID space. A query message will be broadcasted to all NSs whose zones overlap the segment. They propose a broadcasting algorithm based on the SBT and hypercube in order to reduce the number of messages broadcasted. (Further details will be added once I read the article that treats the hypercube and SBT formation).
Related works:
This article propose a more expressive naming scheme than DNS which offers a value-limited resource name space without the possibility to realize the range query.
Other systems such as the Intentional naming system uses a descriptive name space based also on attribute/value pairs. The message routing for a name query is realized by look up of the query name on forwarding tables. Main limitation of such systems is scalability since the forwarding tables will grow with the number of resource names.
DHT-based routing protocols like Chord, CAN achieve a scalable and efficient lookup by hashing a resource name to a DHT key. Range query is the limitation of such systems, a query may spread to the hole DHT key space.
Other systems proposes range queries like MAAN where nodes are responsible for attribute values of an attribute in the query range. The node responsible for the attribute/value pairs (String OS = "Linux") must keeps information related to Linux OS. The main limitation is load balancing since popular attribute/value pairs may appear in resource names with high probability.

SENS is naming system capable of handling resource information with exact and multi-attribute range queries. They propose a multicast/broadcast algorithm to deliver/retrieve information.
However some issues remains unclear:
Several resource names might be mapped to same resource ID and the unique identification is done by resource ID and resource name. Is it enough to uniquely identify a resource? Is it possible to map 2 different resource names having same attribute values in common into the same resource ID? In 2005 Xiaoyun Wang and Hongbo Yu achieved collision on purpose in the MD5 hashing algorithm (2 different values were hashed to same ID).
Is SENS trying to merge between search engines and DNS-like systems?
Do we really need to merge between such systems? is it faster ? More reliable? More scalable ?
Is it a good approach when a resource is mapped to many resource IDs? Having multiple IDs will accelerate routing and finding the nearest ID? What if we mapped essential and important data to multiple IDs and restrained non important data to a single resource ID? How to design such a ranking system to classify data according to it's importance?
What if I updated my RAM from 512 MB to 1024 MB, how SENS manages such updates?
Is space partition efficient? when a node joins the system, a lot of information is handled to the new joining node ? Consequently huge messages are transmitted to the new node.

Link to the article