Their design claims to achieve scalable and efficient resource information distribution and retrieval. SENS handles exact (MEMORY = 512 MB) and multi-attribute range queries (MEMORY > 512 MB) with small overhead as well as load balancing.
DESIGN OF SENS:
Mapping resource names to resource IDs:
A resource ID is considered as a set of d coordinates of a point in the d-dimensional resource ID space. (In the example below d = 6). A resource name is mapped to a resource ID by assigning the hash value of each attribute/value (a/v) pairs of the resource name to a coordinate value of the resource ID. Name servers are responsible for resource ID sets just like in the CAN system.
Ha is a function that uniformly hashes every attribute from 1 to d and Hv hashes every attribute value in a [1, 2^(m-1)] interval where m is the maximum size of coordinate value in bits.
If multiple attributes in a resource name are hashed to the same value Ha (attr i) = Ha (attr j) then the corresponding attribute values will be mapped to multiple coordinate values in the same dimension. This means that resource names are mapped to multiple resource IDs which are distributed on NSs.
Their mapping scheme is not injective, several resource names can be mapped to the same resource ID. Consequently resource ID is not a unique identifier of resource name, in order to identify a resource, the resource_ID (resulting form Hash) and its name are required to uniquely identify a resource.
In the case of numerical attribute values, a locality preserving hashing function is used. Such hashing function is defined as if (val1 > val2) then (Hval1 > Hval2). the main purpose behind using a locality preserving hashing function is to deal with range queries, in fact it ensures that a resource ID will be in a interval of resource IDs between a min and max value. By doing this they limit the number of NSs responsible for a query range.
If the attribute/value pairs number is lower than d then the resource ID is filled with zeros.
However when the attribute/value pairs number is higher than d then the set of attributes should be divided to multiple sets of attributes which corresponds to multiple resource names. (this aspect of fragmentation is not treated in this paper).
Resource Information Distribution:
Since their scheme is CAN based, zones are assigned to NS, consequently each NS manages resource information according to the resource ID.
In the case of a resource name mapping to multiple resource IDs. If the NS is responsible for several resource IDs of the same resource, only one copy in maintained at the NS. When resource IDs belongs to different zones i.e. different NS, they use a multicast routing algorithm based on Spanning Binomial Trees. Their algorithm sends minimum amount of messages to deliver information to a NS.
Resource IDs corresponding to a resource name construct a hypercube in the resource ID space according to this article (Optimum Broadcasting and Personalized Communication in the Hypercube) . (They don't explain how the hypercube is built, so further reviews will detail the construction algorithm of hypercubes).
The registration message containing information of the resource is first delivered to the NS responsible for the resource ID created from the lowest values of each resource IDs coordinates. (In this example root node is (0.0.0)). This NS becomes the root node and forwards the message to its descendants according to the tree. Those descendants also will forward the registration message to their descendants according to the tree. etc
Query resolution:
SENS supports:
- Exact queries: A query host sends a message to a NS which will map the query resource name to resource IDs and select the nearest destination resource ID. The message arrives to destination using the CAN routing algorithm. The NS responsible for the resource ID will lookup its database to find the queried information and send it back to the initial NS (the one first queried by the host).
- Range queries, in the case of a range query it will be limited by the hash values of the upper and lower limit of the queried value ranges in each dimension. When a host sends a range query message to a NS, the latter will map the query range to a range query segment in the resource ID space. A query message will be broadcasted to all NSs whose zones overlap the segment. They propose a broadcasting algorithm based on the SBT and hypercube in order to reduce the number of messages broadcasted. (Further details will be added once I read the article that treats the hypercube and SBT formation).
This article propose a more expressive naming scheme than DNS which offers a value-limited resource name space without the possibility to realize the range query.
Other systems such as the Intentional naming system uses a descriptive name space based also on attribute/value pairs. The message routing for a name query is realized by look up of the query name on forwarding tables. Main limitation of such systems is scalability since the forwarding tables will grow with the number of resource names.
DHT-based routing protocols like Chord, CAN achieve a scalable and efficient lookup by hashing a resource name to a DHT key. Range query is the limitation of such systems, a query may spread to the hole DHT key space.
Other systems proposes range queries like MAAN where nodes are responsible for attribute values of an attribute in the query range. The node responsible for the attribute/value pairs (String OS = "Linux") must keeps information related to Linux OS. The main limitation is load balancing since popular attribute/value pairs may appear in resource names with high probability.
SENS is naming system capable of handling resource information with exact and multi-attribute range queries. They propose a multicast/broadcast algorithm to deliver/retrieve information.
However some issues remains unclear:
Several resource names might be mapped to same resource ID and the unique identification is done by resource ID and resource name. Is it enough to uniquely identify a resource? Is it possible to map 2 different resource names having same attribute values in common into the same resource ID? In 2005 Xiaoyun Wang and Hongbo Yu achieved collision on purpose in the MD5 hashing algorithm (2 different values were hashed to same ID).
Is SENS trying to merge between search engines and DNS-like systems?
Do we really need to merge between such systems? is it faster ? More reliable? More scalable ?
Is it a good approach when a resource is mapped to many resource IDs? Having multiple IDs will accelerate routing and finding the nearest ID? What if we mapped essential and important data to multiple IDs and restrained non important data to a single resource ID? How to design such a ranking system to classify data according to it's importance?
What if I updated my RAM from 512 MB to 1024 MB, how SENS manages such updates?
Is space partition efficient? when a node joins the system, a lot of information is handled to the new joining node ? Consequently huge messages are transmitted to the new node.
Link to the article
No comments:
Post a Comment