Tuesday, April 21, 2009

Review: SENS - a Scalable and Expressive Naming System using CAN Routing Algorithm

This paper proposes a Scalable and Expressive Naming System (SENS) based on the CAN routing algorithm. SENS is a descriptive naming scheme that uses a tuple of attribute/value pairs to name each resource. A resource such as a computer is named as (string OS = "Linux", string CPU-name = "Pentium 4", ...). This information is stored on a large number of name servers (NSs).

Their design claims to achieve scalable and efficient resource information distribution and retrieval. SENS handles exact queries (MEMORY = 512 MB) and multi-attribute range queries (MEMORY > 512 MB) with small overhead, while also balancing the load across name servers.

DESIGN OF SENS:
Mapping resource names to resource IDs:
A resource ID is a set of d coordinates, i.e. a point in the d-dimensional resource ID space (in the paper's example, d = 6). A resource name is mapped to a resource ID by assigning the hash value of each attribute/value (a/v) pair of the resource name to a coordinate of the resource ID. Name servers are responsible for zones of the resource ID space, just as in the CAN system.
Ha is a function that uniformly hashes every attribute name to a dimension in [1, d], and Hv hashes every attribute value into the interval [1, 2^(m-1)], where m is the maximum size of a coordinate value in bits.
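
To make the mapping concrete, here is a minimal sketch (my own illustration, not the paper's code) of how a resource name could be turned into a resource ID, with d = 6 as in the paper's example; the hash implementations, the value of m, and the attribute names are placeholder assumptions.

    import java.util.Map;

    // Sketch of the SENS name-to-ID mapping with placeholder hash functions.
    public class SensMapping {
        static final int D = 6;    // dimensions of the resource ID space (paper's example)
        static final int M = 16;   // assumed coordinate size in bits

        // Ha: hash an attribute name to a dimension (here 0..d-1).
        static int ha(String attribute) {
            return Math.floorMod(attribute.hashCode(), D);
        }

        // Hv: hash an attribute value to a coordinate (placeholder, not locality preserving).
        static long hv(String value) {
            return Math.floorMod((long) value.hashCode(), 1L << M);
        }

        // Map a resource name (a set of attribute/value pairs) to one resource ID.
        static long[] toResourceId(Map<String, String> name) {
            long[] id = new long[D];                     // unused dimensions stay at 0
            for (Map.Entry<String, String> av : name.entrySet()) {
                id[ha(av.getKey())] = hv(av.getValue()); // if Ha collides, SENS would produce several IDs
            }
            return id;
        }

        public static void main(String[] args) {
            long[] id = toResourceId(Map.of("OS", "Linux", "CPU-name", "Pentium 4"));
            System.out.println(java.util.Arrays.toString(id));
        }
    }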

If multiple attributes in a resource name are hashed to the same dimension, i.e. Ha(attr_i) = Ha(attr_j), then the corresponding attribute values map to multiple coordinate values in that dimension. As a result, a resource name is mapped to multiple resource IDs, which are distributed over several NSs.


Their mapping scheme is not injective: several resource names can be mapped to the same resource ID. Consequently, the resource ID alone is not a unique identifier; the pair (resource ID, resource name) is required to uniquely identify a resource.

For numerical attribute values, a locality-preserving hash function is used, i.e. one for which val1 > val2 implies Hv(val1) > Hv(val2). The main purpose of a locality-preserving hash is to support range queries: it guarantees that the resource IDs matching a range fall between the IDs obtained from the range's minimum and maximum values, which limits the number of NSs responsible for a query range.
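
As a toy illustration (my own construction, not the paper's actual function), an order-preserving hash for numerical values in a known range can simply rescale the value into the coordinate range, so that larger values always get larger or equal coordinates:

    // Toy locality-preserving hash for numerical values in a known range [min, max]:
    // if val1 > val2 then hv(val1) >= hv(val2), so a value range maps to a bounded
    // interval of coordinates, limiting the NSs that a range query must visit.
    public class LocalityPreservingHash {
        static final long COORD_MAX = (1L << 16) - 1;   // assumed coordinate space size

        static long hv(double value, double min, double max) {
            double normalized = (value - min) / (max - min);   // in [0, 1]
            return Math.round(normalized * COORD_MAX);
        }

        public static void main(String[] args) {
            // MEMORY in [0, 4096] MB: the query MEMORY > 512 maps to coordinates
            // between hv(512) and hv(4096), a bounded segment of the ID space.
            System.out.println(hv(512, 0, 4096));    // lower bound of the query segment
            System.out.println(hv(4096, 0, 4096));   // upper bound of the query segment
        }
    }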

If a resource name has fewer than d attribute/value pairs, the remaining coordinates of the resource ID are filled with zeros.
If it has more than d pairs, the set of attributes must be split into multiple attribute sets, i.e. multiple resource names (this fragmentation aspect is not treated in the paper).

Resource Information Distribution:
Since the scheme is CAN-based, zones of the resource ID space are assigned to NSs; each NS therefore manages the resource information whose resource IDs fall in its zone.
A resource name may map to multiple resource IDs. If one NS is responsible for several resource IDs of the same resource, only one copy is maintained at that NS. When the resource IDs belong to different zones, i.e. different NSs, a multicast routing algorithm based on Spanning Binomial Trees (SBTs) is used; it sends the minimum number of messages needed to deliver the information to every responsible NS.
The resource IDs corresponding to a resource name form a hypercube in the resource ID space, following the article "Optimum Broadcasting and Personalized Communication in the Hypercube". (The paper does not explain how the hypercube is built, so a further review will detail the construction algorithm.)


The registration message containing the resource information is first delivered to the NS responsible for the resource ID built from the lowest value of each coordinate (in the paper's example, the root node is (0,0,0)). This NS becomes the root of the tree and forwards the message to its children, which in turn forward it to their own children, and so on down the tree.
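
Although the paper does not detail the hypercube construction, the classic binomial-tree broadcast over a hypercube works roughly as sketched below; this is a generic sketch of that standard technique, not necessarily SENS's exact procedure, and the node labels are simply the corner indices of a k-dimensional hypercube:

    // Generic binomial-tree broadcast over a k-dimensional hypercube, rooted at node 0:
    // each node forwards the message to the neighbors obtained by setting a bit above its
    // own highest set bit, so every corner receives the message exactly once
    // (2^k - 1 messages in total, the minimum possible).
    public class HypercubeBroadcast {
        static void forward(int node, int k) {
            int highestSetBit = (node == 0) ? -1 : 31 - Integer.numberOfLeadingZeros(node);
            for (int j = highestSetBit + 1; j < k; j++) {
                int child = node | (1 << j);
                System.out.println(node + " -> " + child);   // one registration message per edge
                forward(child, k);                           // the child forwards down its subtree
            }
        }

        public static void main(String[] args) {
            forward(0, 3);   // 3-dimensional hypercube: the root (0,0,0) reaches all 8 corners
        }
    }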

Query resolution:
SENS supports:
  • Exact queries: a query host sends a message to an NS, which maps the queried resource name to resource IDs and selects the nearest destination resource ID. The message reaches the destination via the CAN routing algorithm. The NS responsible for that resource ID looks up its database to find the queried information and sends it back to the initial NS (the one first contacted by the host).
  • Range queries: a range query is bounded, in each dimension, by the hash values of the lower and upper limits of the queried value range. When a host sends a range query to an NS, the NS maps the query range to a range-query segment in the resource ID space, and the query message is broadcast to all NSs whose zones overlap that segment. The authors propose a broadcasting algorithm based on the SBT and the hypercube to reduce the number of messages (a sketch of the segment construction follows this list; further details will be added once I read the article that treats the hypercube and SBT formation).
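
As a rough sketch of the segment construction (my own illustration, reusing the toy hash functions from the sketches above), the per-dimension bounds of the query segment are obtained by hashing the lower and upper limits of the queried range in the attribute's dimension:

    // Sketch: map a single-attribute range query (e.g. MEMORY in [512, 2048] MB) to a
    // segment of the resource ID space by hashing the range bounds in that attribute's
    // dimension; the query is then broadcast to all NSs whose zones overlap the segment.
    public class RangeQuerySegment {
        static final long COORD_MAX = (1L << 16) - 1;   // assumed coordinate space size
        static final int D = 6;                          // dimensions (paper's example)

        static long hv(double v, double min, double max) {       // toy order-preserving hash
            return Math.round((v - min) / (max - min) * COORD_MAX);
        }

        static long[][] toSegment(String attribute, double lo, double hi,
                                  double attrMin, double attrMax) {
            long[] lower = new long[D];                           // other dimensions unconstrained
            long[] upper = new long[D];
            java.util.Arrays.fill(upper, COORD_MAX);
            int dim = Math.floorMod(attribute.hashCode(), D);     // Ha(attribute)
            lower[dim] = hv(lo, attrMin, attrMax);
            upper[dim] = hv(hi, attrMin, attrMax);
            return new long[][] { lower, upper };
        }

        public static void main(String[] args) {
            long[][] segment = toSegment("MEMORY", 512, 2048, 0, 4096);
            System.out.println(java.util.Arrays.toString(segment[0]));  // per-dimension lower bounds
            System.out.println(java.util.Arrays.toString(segment[1]));  // per-dimension upper bounds
        }
    }
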
Related works:
The article proposes a naming scheme more expressive than DNS, which offers only a limited resource name space and no support for range queries.
Other systems, such as the Intentional Naming System, also use a descriptive name space based on attribute/value pairs. Message routing for a name query is realized by looking up the query name in forwarding tables. The main limitation of such systems is scalability, since the forwarding tables grow with the number of resource names.
DHT-based routing protocols like Chord and CAN achieve scalable and efficient lookup by hashing a resource name to a DHT key. Range queries are their main limitation: a single query may spread over the whole DHT key space.
Other systems, such as MAAN, support range queries by making nodes responsible for the attribute values falling in the query range. The node responsible for the attribute/value pair (string OS = "Linux") must keep all the information related to the Linux OS. The main limitation is load balancing, since popular attribute/value pairs appear in resource names with high probability.

SENS is a naming system capable of handling resource information with exact and multi-attribute range queries. The authors propose multicast/broadcast algorithms to deliver and retrieve the information.
However, some issues remain unclear:
Several resource names might be mapped to the same resource ID, and unique identification relies on the pair (resource ID, resource name). Is that enough to uniquely identify a resource? Can two different resource names sharing attribute values be mapped to the same resource ID? In 2005, Xiaoyun Wang and Hongbo Yu deliberately produced collisions in the MD5 hash function (two different inputs hashed to the same digest).
Is SENS trying to merge search engines and DNS-like systems?
Do we really need to merge such systems? Is it faster? More reliable? More scalable?
Is mapping a resource to many resource IDs a good approach? Does having multiple IDs actually accelerate routing by letting queries find the nearest ID? What if essential data were mapped to multiple IDs while less important data were restricted to a single resource ID? How would one design a ranking system to classify data according to its importance?
What if I upgrade my RAM from 512 MB to 1024 MB; how does SENS manage such updates?
Is the space partitioning efficient? When a node joins the system, a lot of information must be handed over to it, which means very large messages are transmitted to the new node.

Link to the article

Friday, April 17, 2009

Review: The SATIN Component System—A Metamodel for Engineering Adaptable Mobile Systems

Readers might be wondering why I am writing about components now; I am interested in the metamodels. Such models can be useful when designing a naming and addressing system, since the metadata associated with each component helps locate a specific component.

This article proposes a component model for dealing with mobile adaptation.
Applications are most often monolithic (a single code block), which makes them hard to update and maintain when an event or a change of context occurs.
Their model allows logical mobility, defined as the migration of a partial or complete application or process from one host to another.

Component-based development divides a system into interacting components with well-defined interfaces. Traditional approaches do not support code migration and execution at runtime; SATIN applies logical mobility to components.

The SATIN component model allows components to be sent and received dynamically at runtime. Instead of relying on the invocation of remote services, SATIN components are cloned from remote hosts and instantiated locally.
This solution provides autonomy when networks are disconnected. Applications can introspect which components are available locally for a task, and dynamically change the system configuration by adding or removing components.

Component metadata is essential to describe a component's properties, which are a set of attributes (ID, version, dependencies on other components, hardware requirements, etc.). The SATIN component system uses attributes similar to those of the Debian packaging system, which allows applications running on heterogeneous hardware/frameworks to decide whether or not to deploy a component (a sketch of such metadata follows below).
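
As an illustration of what such Debian-like attributes might look like (the attribute names and the deployment check below are my own guesses, not SATIN's actual schema), an application could compare a component's declared requirements against the local platform before deploying it:

    import java.util.Map;

    // Sketch of Debian-style component metadata and a naive deployment decision.
    // The property keys are illustrative, not SATIN's actual attribute names.
    public class ComponentMetadata {
        static boolean canDeploy(Map<String, String> component, Map<String, String> platform) {
            // Deploy only if the component's hardware requirement is satisfied locally.
            int requiredRam = Integer.parseInt(component.getOrDefault("Requires-RAM-MB", "0"));
            int localRam = Integer.parseInt(platform.getOrDefault("RAM-MB", "0"));
            return localRam >= requiredRam;
        }

        public static void main(String[] args) {
            Map<String, String> component = Map.of(
                    "ID", "org.example.mapviewer",   // hypothetical component identifier
                    "Version", "1.2",
                    "Depends", "org.example.gfx",    // dependency on another component
                    "Requires-RAM-MB", "32");
            Map<String, String> platform = Map.of("RAM-MB", "64");
            System.out.println(canDeploy(component, platform));   // true on this platform
        }
    }
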
The central component of a SATIN system is the container, which acts as a registry and holds a reference to each component installed on the system. A registrar component is used for loading and removing components, managing dependencies, etc.

They consider four aspects of logical mobility: components, classes, instances, and data types (a data type being a bit stream). A Logical Mobility Entity (LME) is an abstract generalization of a class, an instance, or a data type. A Logical Mobility Unit (LMU) is an encapsulation of LMEs. An LMU provides operations for inspecting its content, plus a handler class to deploy and manipulate that content. The LMU's properties (a set of attributes) express dependencies, the target environment, etc. LMUs and their contents are serialized and deserialized for transfer and deployment between physical hosts. An LMU is always deployed into a reflective component, which can adapt at runtime by receiving LMUs from other hosts.
When a change in the context/environment occurs (triggered by a user or a component), the system asks the deployer for a component satisfying a set of attributes. The local deployer contacts a remote deployer and requests such a component; the remote container packs an LMU and sends it back to the requesting host (a sketch of the LMU abstraction follows below).
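
A minimal sketch of what the LMU abstraction might look like (the types and method below are my own invention for illustration; SATIN's real API differs):

    import java.io.Serializable;
    import java.util.List;
    import java.util.Map;

    // Illustrative (invented) shape of a Logical Mobility Unit: a serializable container
    // of LMEs plus properties and a handler used to deploy its contents on the receiving host.
    interface LogicalMobilityEntity extends Serializable { }   // class, instance, or data type

    class LogicalMobilityUnit implements Serializable {
        Map<String, String> properties;        // dependencies, target environment, etc.
        List<LogicalMobilityEntity> contents;  // the encapsulated LMEs
        String handlerClassName;               // class that deploys/manipulates the contents

        List<LogicalMobilityEntity> inspect() {   // inspection of the LMU's content
            return contents;
        }
        // The unit would be serialized, transferred to the remote host, deserialized,
        // and handed to its handler inside a reflective component for deployment.
    }
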
Component discovery and advertising are kept generic; they can use Jini, UPnP, etc.
In their implementation they used an IP multicast advertising group and a simple centralized publish/subscribe protocol.

So far my knowledge of adaptive mobile component systems is limited; further comments on this topic will be added later.
However, some classic issues deserve to be explored, such as:
How much time is needed to transfer a component in a mobile system?
Can we guarantee such a transfer time in a dynamic, mobile system, and how? Mobile nodes appear and disappear, and an LMU transfer may be interrupted by a node departure (issues similar to those of MANETs).
How can services and new nodes be advertised with minimal overhead and message exchange between nodes?
How should the resources allocated to each component be managed on resource-limited platforms?

Link to the article

Wednesday, April 15, 2009

Review: Toward a search architecture for software components

This paper proposes a design of a component search engine for Grid applications.
With the development of the component-based programming model, applications will increasingly be formed dynamically by associating components. Developers should be able to reuse already developed components that match their needs; for that, a component search engine seems essential.
The component search engine for Grid applications offers two facilities:
  1. Developers will be able to find the best component for their needs.
  2. The framework can replace a malfunctioning or slow component dynamically (at runtime); the application should be able to decide which component best replaces the malfunctioning one.
They assume that open-source Grid applications will appear and that software components will be available on portals. These components are ranked according to their usage: the more a component is used by applications, the more important it is considered. This ranking establishes a trust index, in the same way Google ranks pages to improve search results (a sketch of such link-based ranking follows below).
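
As a sketch of such a link-based trust index (a simplified PageRank-style iteration over "component A uses component B" links; this is my own illustration, not GRIDLE's actual algorithm):

    // Simplified PageRank-style ranking over component usage links:
    // uses[i] lists the components that component i uses (i "votes" for them).
    public class ComponentRank {
        static double[] rank(int[][] uses, int iterations, double damping) {
            int n = uses.length;
            double[] rank = new double[n];
            java.util.Arrays.fill(rank, 1.0 / n);
            for (int it = 0; it < iterations; it++) {
                double[] next = new double[n];
                java.util.Arrays.fill(next, (1 - damping) / n);
                for (int i = 0; i < n; i++) {
                    if (uses[i].length == 0) continue;         // component with no outgoing links
                    double share = damping * rank[i] / uses[i].length;
                    for (int j : uses[i]) next[j] += share;    // pass rank to the used components
                }
                rank = next;
            }
            return rank;
        }

        public static void main(String[] args) {
            // Three components: 0 and 1 both use component 2, which in turn uses component 0.
            int[][] uses = { {2}, {2}, {0} };
            System.out.println(java.util.Arrays.toString(rank(uses, 50, 0.85)));
        }
    }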

One of the related works:
The Agora component search engine supports the location and indexing of components, as well as component search and retrieval. Agora automatically discovers sites containing software components by crawling the web (using Google's web crawler); when it finds a page containing an Applet tag, it downloads and indexes the related component. Agora supports JavaBeans and CORBA components. The database search is keyword-based and can be refined by users.

Workflows:
A workflow can be described as a process description of how tasks are done, by whom, in what order and how quickly.
Workflows are represented in low-level languages such as BPEL4WS, which require too much user effort even to describe a simple workflow.
Higher-level languages and graphical user interfaces built on top of BPEL4WS are being introduced to generate the BPEL code.

Their approach is workflow-based: components can be adapted and coordinated through workflows. Applications should be able to choose and bind to components from different sources on the Grid. Such an application first searches its own local repository for components previously used or installed, and then uses a search engine to find other suitable components.

The application development process can be divided into 3 stages:
  1. Application sketching: developers specify (1) an abstract workflow plan describing how information flows between the application's parts, and (2) place-holders describing the functions and operations to be carried out. This description helps in finding a list of suitable components.
  2. Component discovery proceeds in two steps. First, the place-holder query is resolved by searching the local repository; if a suitable component is found locally, its identifier is returned to the application. Second, if no component was found, a query session is started on remote sites, and a list of ranked components is returned and refined by the user's specifications.
  3. Application assembling is the binding phase. Data or protocol conversions are often needed due to heterogeneous inputs/outputs between components (e.g. converting a string to an array of doubles).
GRIDLE is their component search engine: a Google-like Ranking, Indexing and Discovery service for a Link-based Eco-system of software components. Its main modules are the following:
  1. The Component Crawler behaves like a web crawler: it retrieves new components, updates the links (bindings) between components, and passes the results to the indexer.
  2. The Indexer builds GRIDLE's index data structure. The characteristics and metadata associated with each component must be carefully selected for indexing, since this metadata is what allows the suitable component to be retrieved. Such metadata can be: (1) functional information, such as interfaces (published methods, names, signatures) and the runtime environment; (2) non-functional information, such as QoS and a textual description; (3) linking information to other components.
  3. The Query Analyzer resolves queries against the index; it uses a ranking module to retrieve the most relevant components, and the search is then refined by the user.

At this stage I do not have advanced knowledge of such systems and search engines, but I find the approach interesting since the world of component development is emerging.
In the near future, thousands of components will be developed and ready to use. One of the main reasons for the wide adoption of the component-based programming model is the ability to reuse already developed components and save development time; a search engine seems necessary to find and locate suitable components.
Some issues in their approach remain unexplained or unclear, such as:
  • Components will be updated, deleted, and added, so how should the crawler's iteration frequency be chosen to keep the index up to date?
  • The same question arises for component bindings. The model is inspired by web pages, but components are more dynamic when it comes to binding with other components: bindings appear and disappear dynamically (at runtime) when a component is replaced. How is a component's ranking maintained, and how often should the ranking algorithm run?
  • In their approach, the local repository is searched first. What if remote sites hold better-suited components with higher ranks than those already in the local repository? What policy should be used to keep the local repository up to date?
  • The crawling module searches for new components; do we need to install an agent on every repository?
  • How are heterogeneous components (e.g. COM and CORBA components) managed?
  • Using the semantic web and ontologies might simplify the mapping and querying, even though GRIDLE's designers consider it a disadvantage because it forces the use of a single unified taxonomy.

Link to the article
PS: According to the ranking algorithm, the rank of the page hosting the article increases while the rank of my blog decreases; in effect, I am giving away a portion of my own page's rank.

Review: An IPv6-Based Identification Scheme

This article presents an IPv6-based identification scheme for physical objects carrying RFID tags. Such identification is needed in many fields, such as locating objects, health-care monitoring, military operations, etc.

Their scheme is based on the IPv6 unicast address format:

010.Registry_ID[5].Provider_ID[16].0[8].Subscriber_ID[24].0[8].Subnet_ID[16].Interface[48]

(note that "." is used as a concatenation operator, and X[n] indicates that n bits are used to encode field X).

Registry_ID is allocated to the organizations responsible for assigning network addresses. Provider_ID is allocated to the Internet service provider. The 0[8] fields are reserved for future extension.
To identify objects, they propose to use the unassigned IPv6 namespace with the binary prefix "001", in two formats:

General ID:
0010.Agency_ID[5].Domain_name[48].0[7].Object_Class[16].Serial_Number[48].

Agency_ID is analogous to the Registry_ID: the agency is responsible for allocating the identifier. Domain_name identifies the company or organization. The 0[7] field is reserved for future use. Object_Class identifies the object type, and Serial_Number identifies an individual object of that type.
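
For illustration, packing the General ID fields into a single 128-bit value could be sketched as follows (the field widths are those of the format above; the helper class and example values are mine):

    import java.math.BigInteger;

    // Sketch: pack the General ID fields into a 128-bit identifier following
    // 0010.Agency_ID[5].Domain_name[48].0[7].Object_Class[16].Serial_Number[48].
    public class GeneralId {
        static BigInteger pack(int agencyId, long domainName, int objectClass, long serialNumber) {
            BigInteger id = BigInteger.valueOf(0b0010);                 // 4-bit format prefix
            id = id.shiftLeft(5).or(BigInteger.valueOf(agencyId));      // Agency_ID[5]
            id = id.shiftLeft(48).or(BigInteger.valueOf(domainName));   // Domain_name[48]
            id = id.shiftLeft(7);                                       // 0[7], reserved
            id = id.shiftLeft(16).or(BigInteger.valueOf(objectClass));  // Object_Class[16]
            id = id.shiftLeft(48).or(BigInteger.valueOf(serialNumber)); // Serial_Number[48]
            return id;                                                  // 4+5+48+7+16+48 = 128 bits
        }

        public static void main(String[] args) {
            // Example with made-up field values, printed in hexadecimal.
            System.out.println(pack(3, 0x123456789ABL, 42, 1000L).toString(16));
        }
    }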

Pseudo Random ID:
0011.Agency_ID[5].Random_Number[119]

This scheme provides more privacy, since it does not reveal the company ID or other information. The authors clearly distinguish between an IPv6 address, used to locate an object, and an IPv6 ID, used to identify an object. They propose prefix translation to convert between an IPv6 ID (prefix 001) and an IPv6 address (prefix 010), which means that an IPv6 ID may be used to obtain an IPv6 address. Since physical objects are mobile, they propose the following two methods to track objects:

Name System: (Same ID, multiple addresses)
They propose to use a canonical name written in the reverse order of its construction, so that it can be used like a DNS name. This allows the scheme to be integrated into existing systems such as DNS.

Serial_number.object-class.company.organization.obj.
A DNS query starts at the "obj" level and then proceeds from right to left.

When an object moves, only the DNS record mapping its name to an address needs to be updated. When an object moves to a different domain, the new owner should update the DNS record. Since DNS is not well suited to frequent updates, a localization service provided by a proxy should handle them.
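
A toy sketch of building the reversed, DNS-style canonical name from the naming fields (the field values in the example are made up):

    // Sketch: build the reversed canonical name serial_number.object_class.company.organization.obj,
    // so that it can be resolved like a DNS name, from the most general label on the right
    // ("obj") down to the most specific one on the left (the serial number).
    public class ObjectName {
        static String canonicalName(String organization, String company,
                                    String objectClass, String serialNumber) {
            return String.join(".", serialNumber, objectClass, company, organization, "obj");
        }

        public static void main(String[] args) {
            // Made-up example: object 0001 of class "razor", owned by acme under example.
            System.out.println(canonicalName("example", "acme", "razor", "0001"));
            // -> 0001.razor.acme.example.obj ; a resolver starts at "obj" and works right to left.
        }
    }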

Address forward scheme:
They use the home agent approach (see Mobile IP paragraph).
They assume that routers are configured to distinguish identifiers from addresses.
We can search for an object by its ID because routers translate the IPv6 ID into an IPv6 address by modifying the 3-bit prefix; the ID, and hence the derived address, remains the same. Objects of the same owner are assigned to a dedicated proxy whose address has the same domain prefix. When a router receives an object ID, it translates it and forwards the packet to the corresponding proxy according to the domain name. When an object moves, its ID, and therefore its address, stays the same: the object updates its home proxy with its new location and informs the new proxy of its ID. Routers keep forwarding packets to the home proxy according to the domain name; this proxy plays the same role as a home agent in Mobile IP and forwards requests to the new proxy where the object has moved.

The approach is very comprehensive, especially the mapping between ID and address via the 3-bit prefix, and it integrates easily into today's systems without major modifications and without the need to query directories like DNS.
However, objects are usually manufactured in the thousands or millions (e.g. Gillette razors), so when a container moves from one owner to another, the update traffic will generate a massive overhead between owners.
We also face the same problems as Mobile IP: an object's trace can be lost during the transition from one proxy to another.
Not all companies have the same production capacity. Small companies manufacture few items and will not use up the IDs assigned to them, while big companies will exhaust their IDs quickly by comparison. Should small companies share a domain? If so, the proxies of small companies cannot use a routing mask when dealing with IDs and IP addresses.
If two companies share the same domain, they have the same prefix and a mask cannot be applied; a proxy must then list every ID of the objects it owns. Proxies would be overloaded, and query I/O time would be slow due to the huge amount of data.
As with the Mobile IP approach, what happens if the home proxy, or the company owning the domain, shuts down for economic or other reasons? How is address forwarding maintained then?

Link to the article

Thursday, April 2, 2009

Review: Infrastructure for Pervasive Computing: Challenges

This paper enumerates the challenges faced today in designing an infrastructure for pervasive computing.
Their conceptual model is based on four components: devices, users, software components, and user interfaces.
The following paragraphs summarize the challenges related to the naming and addressing of devices and software components.

Devices:
  • Heterogeneity is expected to increase in the future; devices will include sensors and everyday objects such as watches, shoes, etc. These devices will need to interact despite differences in resources (hardware and software capabilities), and an infrastructure is needed to manage their integration into the system.
  • Mobility raises problems of maintaining connectivity while devices move between networks (see the Mobile IP section). Disconnected operation can be handled through data replication.

Software Components:
  • Mobility and distribution: since users are mobile, they may access several devices simultaneously. Mechanisms should enable the mobility and distribution of software, and the support for distribution needs to be extended to address mobility, synchronization, and coordination between software components. Platforms like CORBA offer only transparency of distributed communication.
  • Context awareness: applications will rely more on context knowledge than on information based on user input.
  • Adaptation: the infrastructure should facilitate adaptation, such as reconfiguring, binding, adding, and removing software components.
  • Interoperability: applications will be required to respond to new tasks and situations and will be dynamically assembled from existing, available components. These components must acquire knowledge of each other's interfaces and behavior in order to interact with previously unknown, heterogeneous components.
  • Component discovery has been addressed in many research areas. Resource discovery can be supported by a type-management repository maintaining descriptions of service interface types; another solution relies on network directory protocols such as LDAP. Since each approach tackles the problem differently, the main challenge is to integrate them into a scalable resource discovery service by mapping requests between resource discovery domains.
  • Scalability: the increasing number of devices and software components requires a powerful, scalable software platform on which distributed components can be built.
The following paragraphs present existing approaches in the domain of service platforms:
Service platforms allow the rapid creation and deployment of services, as well as dynamic service discovery.
JINI is based on Java and RMI; services can be added and removed dynamically.
Among other things, JINI provides a lookup service used by clients to locate services.

MOCA aims to satisfy the requirements of mobile environments. It provides dynamic service discovery, limited forms of adaptation to mobility-induced changes, and support for device heterogeneity.
JINI and MOCA provide solutions for small-scale environments.

The NINJA service framework addresses large-scale services; service discovery is supported by a hierarchical arrangement of service discovery services.



Link to the article

Review: BonAHA - Service Discovery Framework for Mobile Ad-Hoc Applications

This article proposes a framework for service discovery based on multicast DNS (also used by Apple's Bonjour). BonAHA, as its name indicates, targets ad-hoc applications (Bonjour for Ad-Hoc Applications).

Service discovery protocols for ad-hoc networks require the developer to add code for network monitoring and for node arrivals/departures; applications need to be aware of such changes in these mobile, dynamic networks.
However, Bonjour itself is not suitable for ad-hoc networks because it provides no stateful view of the network.
BonAHA is built around the concept of a service: an application can register or listen to a particular service on the network, and service names are DNS-like.
Services are discovered by instantiating a service object and registering it to respond to network events.

An application announcing a service follows these steps:
* Create a service object with the name of the service
* Set any metadata associated with the object
* Register it

An application listening for service announcements follows these steps:
* Create a service object with the name of the service
* Set an event handler object for this service
* The event handler class receives the events corresponding to node entries and updates
* Metadata associated with a node can then be retrieved from that node

The BonAHA API has two classes and one interface:
Class BService: constructs a service instance.
Interface BListener: handles node entry/update and departure.
Class BNode: corresponds to a node in the network offering a service and exposes metadata such as the host name, host address, service name, etc.
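
To give an idea of how this API might be used, here is a hypothetical sketch based on the steps above; the method names (register, setMetadata, setListener, etc.) are my assumptions, not necessarily BonAHA's real signatures:

    // Hypothetical usage sketch of the BonAHA classes described above; all method names
    // are guesses for illustration, not the framework's documented API.
    public class FileShareExample implements BListener {

        void announce() {
            BService service = new BService("_fileshare._tcp");   // DNS-like service name
            service.setMetadata("files", "42");                   // metadata attached to the service
            service.register();                                   // announce the service on the network
        }

        void listen() {
            BService service = new BService("_fileshare._tcp");
            service.setListener(this);                            // receive node entry/update/exit events
        }

        public void serviceUpdated(BNode node) {                  // a node appeared or changed
            System.out.println(node.getHostName() + " shares " + node.getMetadata("files") + " files");
        }

        public void serviceExited(BNode node) {                   // a node left the network
            System.out.println(node.getHostName() + " is gone");
        }
    }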



In summary, the article proposes a service discovery framework and an API that let developers concentrate on the application logic without handling topology changes themselves.
Some issues remain unclear:
Services are not published on a central node, given the ad-hoc setting; a node simply listens for service announcements. A node that arrives after the announcements were made is therefore unaware of the services already available in the network.
When a new service is registered, is the announcement broadcast over the whole network? Doesn't this result in a large overhead for every service announcement?
Could a DHT-based approach be used to store the metadata associated with the services available on the network?
What is the cost of maintaining service updates due to node arrivals and departures?

Link to the article