Wednesday, April 15, 2009

Review: Toward a search architecture for software components

This paper proposes the design of a component search engine for Grid applications.
With the development of the component-based programming model, applications will increasingly be formed dynamically by assembling components. Developers should be able to reuse already developed components that match their needs, and for that a component search engine seems essential.
A component search engine for Grid applications offers two facilities:
  1. Developers can find the best component for their needs.
  2. The framework can dynamically (at run time) replace a malfunctioning or slow component. The application should be able to choose the best replacement for the malfunctioning component.
They assume that open source Grid applications will appear and that software components will be available on portals. These components will be ranked according to their usage: the more a component is used by applications, the more important it is considered. This ranking establishes a trust index, the same approach Google uses to rank pages and improve search results.
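To make this concrete, here is a minimal sketch of how such a usage-based trust index could be computed, in the spirit of PageRank. The paper gives no algorithm; the function name, damping factor, and graph representation below are my own assumptions.

# Minimal sketch of a PageRank-style trust index over a component
# "uses" graph: rank flows from applications to the components they use.
# Dangling rank mass is ignored for simplicity.
def trust_index(links, damping=0.85, iterations=50):
    # links: dict mapping a component/application to the components it uses
    nodes = set(links) | {c for used in links.values() for c in used}
    n = len(nodes)
    rank = {c: 1.0 / n for c in nodes}
    for _ in range(iterations):
        new_rank = {c: (1.0 - damping) / n for c in nodes}
        for src, used in links.items():
            for dst in used:
                new_rank[dst] += damping * rank[src] / len(used)
        rank = new_rank
    return rank

# A component used by two applications ends up with a higher trust index:
print(trust_index({"app1": ["matrixlib"], "app2": ["matrixlib", "fftlib"]}))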

Among the related work:
The Agora component search engine supports the location and indexing of components, as well as the search and retrieval of a component. Agora automatically discovers sites containing software components by crawling the Web (using Google's web crawler); when it finds a page containing an Applet tag, it downloads and indexes the related component. Agora supports JavaBeans and CORBA components. The database search is keyword based and can be refined by users.
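As an illustration of Agora's discovery cue (not Agora's actual implementation), a crawler can spot Applet tags in a fetched page with a standard HTML parser and treat each referenced class as a candidate component to download and index:

# Illustrative only: detect <applet> tags in a page, the signal Agora
# uses when deciding which JavaBeans components to download and index.
from html.parser import HTMLParser

class AppletFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.applets = []

    def handle_starttag(self, tag, attrs):
        if tag == "applet":
            self.applets.append(dict(attrs).get("code"))

finder = AppletFinder()
finder.feed('<html><applet code="Plot.class" width="300"></applet></html>')
print(finder.applets)  # ['Plot.class'] -> candidate component to index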

Workflows:
A workflow can be described as a process description of how tasks are done, by whom, in what order and how quickly.
Workflows are represented in low-level languages such as BPEL4WS, which requires too much user effort to describe even a simple workflow.
Higher-level languages and graphical user interfaces that generate BPEL code are being built on top of BPEL4WS.
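As a toy illustration of what such a description captures (which tasks run, by whom, in what order), here is a hypothetical workflow plan as plain data; a real system would express this in BPEL4WS or through a GUI that generates it.

# Toy workflow plan: task -> (performer, tasks it depends on).
workflow = {
    "fetch-data": ("data-service",  []),
    "transform":  ("converter",     ["fetch-data"]),
    "compute":    ("matrix-solver", ["transform"]),
    "visualize":  ("plotter",       ["compute"]),
}

# A task can run once all of its dependencies have completed:
done = set()
while len(done) < len(workflow):
    ready = [t for t, (_, deps) in workflow.items()
             if t not in done and all(d in done for d in deps)]
    print("run:", ready)
    done.update(ready)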

Their approach is workflow based: components can be adapted and coordinated through workflows. Applications should be able to choose and bind to components from different sources on the Grid. Such an application first searches its own local repository for components previously used or installed, then uses a search engine to find suitable components.

The application development process can be divided into three stages:
  1. Application sketching: developers specify (1) an abstract workflow plan describing how information passes through the application's parts, and (2) place-holders describing the functions and operations to be carried out. These descriptions help produce a list of suitable components.
  2. Component discovery, in two steps: first, the place-holder query is resolved by searching the local repository; if a suitable component is found locally, its identifier is returned to the application. Second, if no component was found, a query session is started on remote sites; a list of ranked components is returned and refined by user specifications (see the sketch after this list).
  3. Application assembling is the binding phase. Data or protocol conversions are often needed because of heterogeneous inputs and outputs between components (e.g., converting a string to an array of doubles).
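Here is a hedged sketch of the two-step discovery above; every name in it (local_repo, remote_search, the rank field) is hypothetical, not taken from the paper.

# Step 1: check the local repository; step 2: query remote sites and
# return a ranked candidate list for the user to refine.
def discover(placeholder_query, local_repo, remote_search):
    component_id = local_repo.get(placeholder_query)
    if component_id is not None:
        return [component_id]
    candidates = remote_search(placeholder_query)
    return sorted(candidates, key=lambda c: c["rank"], reverse=True)

# Usage with stubbed-in data:
local = {}  # nothing suitable installed locally
remote = lambda q: [{"id": "naive-mm", "rank": 0.2}, {"id": "blas-mm", "rank": 0.9}]
print(discover("matrix-multiply", local, remote))  # blas-mm ranked first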
GRIDLE is their component search engine: a Google-like Ranking, Indexing and Discovery service for a Link-based Eco-system of software components. Its main modules are the following:
  1. The Component Crawler works like a web crawler: it retrieves new components, updates the links (bindings) between components, and passes the results to the indexer.
  2. The Indexer builds GRIDLE's index data structure. The characteristics and metadata associated with a component must be carefully selected for indexing, since this meta information is what allows a suitable component to be retrieved. Such metadata can include: (1) functional information such as interfaces (published methods, names, signatures) and the runtime environment; (2) non-functional information such as QoS and textual descriptions; (3) linking information to other components (see the sketch after this list).
  3. The Query Analyzer resolves queries against the index; it uses a ranking module to retrieve the most relevant components, and the search can then be refined by the user.
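To illustrate what the indexer might store, here is a sketch of a metadata record covering the three categories above; the field names and values are my own invention, not GRIDLE's schema.

# Hypothetical index entry for one component, mirroring the three
# metadata categories: functional, non-functional, and linking.
component_entry = {
    "functional": {
        "methods": [{"name": "multiply",
                     "signature": "(double[][], double[][]) -> double[][]"}],
        "runtime": "JVM 1.4",
    },
    "non_functional": {
        "qos": {"avg_latency_ms": 12},
        "description": "dense matrix multiplication component",
    },
    "links": ["linear-solver", "plotter"],  # bindings to other components
}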

At this stage, I don't have advanced knowledge of such systems and search engines, but I find this approach interesting since the world of component development is emerging.
In the near future, thousands of components will be developed and ready to use. One of the main reasons for the wide adoption of the component-based programming model is the ability to reuse already developed components and save time during development. A search engine seems necessary in order to find and locate suitable components.
Some issues in their approach remain unexplained or unclear, such as:
  • Components will be updated, deleted, and added, so how should the crawler's iteration frequency be chosen to keep the index up to date?
  • The same question arises for component binding. Since the model is inspired by Web pages, I think components are more dynamic where binding is concerned: bindings will appear and disappear dynamically (at run time) when a component is replaced. How is a component's rank maintained, and how often should the ranking algorithm run?
  • In their approach, the local repository is searched first for a suitable component. What if remote sites hold better-suited components, with higher ranks than those already in the local repository? What policy should be used to keep the local repository up to date?
  • The crawling module searches for new components; does this require placing an agent on every repository?
  • How are the heterogeneous aspects of components managed, for instance COM versus CORBA components?
  • Using the Semantic Web and ontologies might simplify mapping and querying, even though the designers of GRIDLE consider it a disadvantage because it would impose a single unified taxonomy.

Link to the article
PS: According to the ranking algorithm, the rank of the page hosting the article increases while the rank of my blog decreases; in effect, I am giving away a portion of my page's rank.
