Introduction to IDIS
The Integrated Database Information System (IDIS) is an online data sharing platform that provides access to scientific data on water, agriculture and the environment in support of poverty alleviation. The main goal of IDIS is to help IWMI and CPWF scientists, as well as their research partners, improve the productivity of water in river basins. IDIS contains over 1 billion time series records, with a geographical focus on the IWMI and CPWF river basins. IDIS is funded by the CPWF and IWMI and hosted in Colombo, Sri Lanka. The following sections describe IDIS in more detail.

 

Rationale for IDIS
It is widely recognized that researchers spend significant time and effort gathering, managing and analyzing data. Such data is usually located in different places, stored in different file formats, organized according to varying data structures and very often not documented. As a result it is very difficult to make full use of it. The lack of appropriate data inventories, data sharing agreements and understanding of intellectual property rights further restrains the identification and usage of the data. These constraints significantly reduce the efficiency of the research lifecycle and raise the cost of data management. Advances in computer science and internet communication have created significant opportunities to access shared as well as distributed data. The costs of storage and data management software have decreased drastically while their quality has improved. As a result, high volumes of data are now increasingly available through the internet. Intellectual property rights are now better understood and recognized.

 

Vision and Goal
Researchers should spend less time on data management and focus more on research and data analysis. Institutional data needs to be stored and managed adequately to ensure it can be used by future scientists, thereby enhancing the utility of the data and helping to shorten the research lifecycle. Data banks need to be established at appropriate levels to facilitate easy access to data. Existing opportunities in technology and communication must be used to help resolve these constraints.

 

IDIS Activities
Four pillar activities support IDIS. Each one is described below.

 

  • Data Inventory
    The goal of data inventory is to build a searchable repository that allows researchers, through the internet, to clearly identify what data is available from whom in each basin and, when possible, to directly access it through IDIS for inspection and download. Since late 2004 detailed data inventories have been carried out in the Olifants, Ruhuna, Karkheh and Volta basins to determine what data is available and from which source. These inventories were often carried out for projects which required rapid access to such information. Where the data could not be retrieved, it was documented in the repository and links to the original sources were provided. Where the data could be retrieved, it was documented and incorporated into IDIS2 so that it became available through IDIS2. All available data for the above basins was handled in this way. Similarly, all rainfall and streamflow data consolidated across basins for the IDIS2 prototype was documented and incorporated into IDIS2. In line with best practices, the documentation of all these data complies with the FGDC metadata schema, which is the de facto international metadata standard. Each metadata record clearly states the originator of the data, the data unit, the geographical coverage of the data, the temporal extent of the data, the accuracy of the data, etc. The metadata repository is supported by GeoNetwork, an open-source metadata search engine that makes it easy to share metadata between the CGIAR centers, UN agencies, NGOs and other institutions.
  • Data Loading
    The goal of data loading is to efficiently and promptly extract data identified as relevant to IDIS2 through data inventory and load it into IDIS2 so that it becomes available for data sharing. In 2004 all rainfall and streamflow data available across basins was loaded into the IDIS2 prototype for data sharing. From early 2005 up to now all data made available by the Olifants, Ruhuna, Karkheh, Volta and Sao Francisco basins has been loaded into IDIS2. Such data loading activities can hardly be fully automated because the datasets received are very diverse, e.g. in file format, units, locations and data structure (how the data is arranged in the file). For this reason a generic data extraction, transformation and loading (ETL) workflow was designed and implemented to supply data to IDIS2. This ETL workflow ensures that optimized procedures are applied whenever possible. It also permits complete tracking and lineage monitoring for every single record entering IDIS2. Since its implementation this workflow has delivered an average of 100,000 records per day into IDIS2, independently of the nature of the data, its file structure, its format, its unit, etc.
  • System Development
    The goal of system development is to design, implement, maintain and continuously improve the technical architecture of IDIS. Simply put, this architecture is composed of the web layer (all web pages and services), which retrieves data from the database layer (which contains all stored data) as well as from the metadata layer (which contains the associated metadata). All these layers are closely linked and depend on each other for the adequate operation of IDIS. Since early 2004 the IDIS2 system development has followed an iterative approach, as it was both cheaper and better suited. Under this approach a series of progressively larger system development lifecycles is carried out to gradually meet all Users’ requirements while always making sure core Users’ needs are satisfied first. This approach calls for the sequential release of system versions which increasingly match the functions and data of the target system. The prototype demonstrates the proof of concept and validates Users’ requirements. The pilot features all core functions as well as all data that can be easily made available. Finally, the target system operates all functions and all data. During 2004 the IDIS2 prototype was designed, tested and implemented; it was released in October 2004 and very successfully evaluated by its Users in December 2004. Each layer of this prototype (web, database and metadata) followed a generic structure based on best practices that was easy to understand, maintain and operate. In contrast, the linkages between the database, the web and the metadata layers were hard coded to save time and quickly deliver a proof of concept. As a result the prototype’s operation required a high level of support and no additional data type could be added without further hard coding. Such a limitation is common to all prototypes; it is well known and can be dealt with.
    Throughout 2005 the architecture of the target system was designed and tested based on best practices and lessons learned from the IDIS2 prototype. The design and implementation of each layer was revisited, optimized and automated whenever possible. The linkage between all layers was designed to be as generic and as automated as possible. While in the IDIS2 prototype all data was stored in a few common tables, the design of the pilot split the data storage for each basin into a separate, dedicated database for faster execution of queries and improved maintenance and backup services. A new security layer was introduced to control all data access according to policies, data agreements and intellectual property rights. A new metadata search engine, GeoNetwork, was incorporated into the IDIS2 architecture so that all IDIS2 metadata can easily be exchanged with all CGIAR centers, UN agencies, NGOs and other institutions. In 2006, the final design of the IDIS2 pilot was implemented and all available data from all basins was loaded into it. The web layer was coded and tested extensively to ensure a reliable operational environment. In May 2006 the IDIS2 pilot will be launched.
  • Data usage enhanced
    The goal of this activity is to promote data management best practices and ensure adequate support to projects which require assistance in dealing with data. In 2006 such assistance was mainly provided to two projects: the E-Flow project (Vladimir Smakhtin) and the Wetland project (Max Finlayson & Sanjiv de Silva). In May 2006, MS Excel based data templates will be launched to promote adequate and standardized storage of data. These templates are IDIS2 compliant, hence any data they contain can easily be harvested and loaded into IDIS2. The design of these data templates is generic and allows the storage of any form of data. In May 2006, “basin kits” will be sent out to all basins and projects. For each basin the kit includes a wide array of geographical layers that are clipped to the boundary of the basin. Among others, each kit includes high resolution topography, 102 years of high resolution climate data, high resolution population data, soil and water access data, etc.
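
The ETL workflow described under Data Loading can be illustrated with a minimal Python sketch. It is not the IDIS2 implementation: the record fields, the unit conversion table, the file names and the function names (`extract`, `transform`, `load`, `run_etl`) are all hypothetical. It only shows how source files with different units might be normalized to one unit while every record keeps lineage information pointing back to its source file and row.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical conversion factors to millimetres; not taken from IDIS2.
TO_MM = {"mm": 1.0, "cm": 10.0, "inch": 25.4}


@dataclass(frozen=True)
class Record:
    station: str
    date: str
    value_mm: float
    source_file: str  # lineage: which file the record came from
    source_row: int   # lineage: which data row in that file


def extract(name, text):
    """Yield (file name, row number, raw row dict) for each CSV data row."""
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        yield name, row_number, row


def transform(name, row_number, row, unit):
    """Convert one raw row to a Record in millimetres, keeping its lineage."""
    return Record(
        station=row["station"].strip(),
        date=row["date"].strip(),
        value_mm=float(row["value"]) * TO_MM[unit],
        source_file=name,
        source_row=row_number,
    )


def load(store, record):
    """Append the record to an in-memory list standing in for a database."""
    store.append(record)


def run_etl(sources):
    """Run extract, transform and load over (name, text, unit) sources."""
    store = []
    for name, text, unit in sources:
        for file_name, row_number, row in extract(name, text):
            load(store, transform(file_name, row_number, row, unit))
    return store


# Two made-up rainfall files that report values in different units.
sources = [
    ("volta_rain.csv", "station,date,value\nA1,2004-01-01,12.5\n", "mm"),
    ("olifants_rain.csv", "station,date,value\nB7,2004-01-01,2\n", "inch"),
]
records = run_etl(sources)
for r in records:
    print(r.station, r.date, r.value_mm, f"({r.source_file}, row {r.source_row})")
```

Keeping the source file and row on every record is one simple way to support the kind of per-record tracking the text describes; a production workflow would presumably also record load time, operator and the transformations applied.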

A brief history of IDIS
Past data sharing mechanisms for IWMI and CPWF involved file based data sharing over email and shared network drives. Data was often not documented, not standardized and not protected from deletion and overwriting. In late 2002 an initiative taking advantage of open-source technology was launched to resolve these constraints. This initiative was named IDIS. Its approach was based on an exhaustive model description of all the components involved in the research analysis process according to various domains. Due to the inherent complexity of the task this approach was unfortunately not successful. In early 2004 a new initiative based on data warehousing best practices was launched, following a participatory approach. This second initiative was named IDIS2. After a thorough consultation, a detailed Users’ requirements survey was carried out. Conclusions from this survey guided the development of the IDIS2 prototype, which was aimed at validating a proof of concept. The IDIS2 prototype was launched in October 2004 and very successfully evaluated by its target group of Users in December 2004. This prototype featured only rainfall and streamflow data. In 2005 the approach used by the IDIS2 prototype was extended to allow the storage of any data type as well as its geographical representation on a web map. This development work also included the design and implementation of a data extraction, transformation and loading (ETL) workflow within the architecture of IDIS2. This workflow has been used since then to load all available data into IDIS2. In May 2006 IDIS2 was launched to provide access to a wide range of indicators and parameters available from the IWMI and CPWF basins.



 

IDIS Working papers


IDIS Presentations

 
© 2007 Integrated Database Information System
Headquarters : 127, Sunil Mawatha, Pelwatte, Battaramulla, Sri Lanka. Telephone +94-11 2880000 | Fax: +94-11 2786854 | Email: idis@cgiar.org