Introduction to IDIS
The Integrated Database Information System (IDIS) is
an on-line data sharing platform that provides access to water, agriculture and
environment scientific data for poverty alleviation. The main goal of IDIS is
to help IWMI and CPWF scientists as well as their research partners improve the
productivity of water in river basins. IDIS contains over 1 billion time series
records with a geographical focus on the IWMI and CPWF river basins. IDIS is
funded by the CPWF and IWMI and hosted in Colombo, Sri Lanka. Please go through
the following sections to learn more about IDIS. You can use the links in the
table of contents on the right to jump directly to your subject of interest.
Rationale for IDIS
It is widely recognized that the time and effort researchers spend gathering,
managing and analyzing data are significant. Such data is usually located in
different places, stored in different file formats, organized according to
varying data structures and very often not documented. As a result it is very
difficult to make use of such data and to ensure it is used to its full
potential. The lack of appropriate data inventories, data sharing agreements
and understanding of intellectual property rights further restricts the
identification and use of the data. These constraints significantly reduce the
efficiency of the research lifecycle and result in high data management costs.
Advances in computer science and internet communication have created
significant opportunities to access shared data as well as distributed data.
The costs of storage and data management software have decreased drastically
while their quality has improved. As a result high volumes of data are
increasingly available through the internet. Intellectual property rights are
now better understood and recognized.
Vision and Goal
Researchers should spend less time on data management
and focus more on research and data analysis. Institutional data needs to be
stored and managed adequately to ensure it can be used by future scientists,
thereby enhancing the utility of the data and helping shorten the research
lifecycle. Data banks need to be established at appropriate levels to
facilitate easy access to data. Existing opportunities in technology and
communication must be used to help resolve these constraints.
IDIS Activities
Four pillar activities support IDIS. Each one is described below.
- Data Inventory
The goal of data inventory is to build a searchable
repository that allows researchers, through the internet, to clearly
identify what data is available from whom in each basin and, when
possible, to access it directly through IDIS for inspection and download. Since
late 2004 detailed data inventories have been carried out in the Olifants, Ruhuna,
Karkheh and Volta basins to determine what data is available and from which
source. These inventories were often carried out for projects which required rapid
access to such information. In cases where the data could not be retrieved it
was documented in the repository and links to original sources were provided.
In cases where data could be retrieved it was documented and incorporated into
IDIS so that it becomes available through IDIS2. All available data for the
above basins was documented and incorporated into IDIS2. Similarly, all
rainfall and streamflow data consolidated for all basins for the IDIS2
prototype was documented and incorporated into IDIS2. In line with best
practices, the documentation of all these data complies with the FGDC metadata
schema, which is the de facto international metadata standard. Each metadata
record clearly states the originator of the data, the data unit, the
geographical coverage of the data, the temporal extent of the data, the
accuracy of the data, etc. The metadata repository is supported by GeoNetwork.
GeoNetwork is an open source (freely licensed) metadata search engine that
makes it easy to share metadata between the CGIAR centers, UN agencies, NGOs
and other institutions.
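To make the content of a metadata record more concrete, here is a minimal Python sketch of a record holding the fields listed above (originator, unit, geographical coverage, temporal extent, accuracy) together with a simple search over a small catalogue. The class name, field names, `search` function and sample values are all hypothetical and invented for illustration. They are not the FGDC element names, the GeoNetwork API or actual IDIS entries.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, simplified record; not the actual FGDC element names."""
    title: str
    originator: str   # who produced the data
    unit: str         # measurement unit of the values
    basin: str        # river basin the dataset belongs to
    bbox: Tuple[float, float, float, float]  # (west, south, east, north), degrees
    start: date       # temporal extent: first observation
    end: date         # temporal extent: last observation
    accuracy: str     # free-text accuracy statement


def search(records, basin=None, keyword=None):
    """Return the records matching an optional basin and title keyword."""
    hits = []
    for rec in records:
        if basin is not None and rec.basin.lower() != basin.lower():
            continue
        if keyword is not None and keyword.lower() not in rec.title.lower():
            continue
        hits.append(rec)
    return hits


# Illustrative entries only; every value below is invented for the example.
catalogue = [
    MetadataRecord("Daily rainfall", "Example Met Office", "mm/day", "Volta",
                   (-5.5, 5.5, 2.5, 14.5), date(1990, 1, 1), date(2000, 12, 31),
                   "not assessed"),
    MetadataRecord("Monthly streamflow", "Example Water Authority", "m3/s",
                   "Olifants", (28.0, -26.0, 32.0, -23.0), date(1985, 1, 1),
                   date(1999, 12, 31), "not assessed"),
]

volta_rain = search(catalogue, basin="volta", keyword="rainfall")
```

A real catalogue would delegate searching to the metadata engine. The sketch only shows what kind of information each record carries and how it supports finding out what data is available from whom in each basin.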
- Data Loading
The goal of data loading is to efficiently and promptly extract data
identified as relevant to IDIS2 through data inventory and load it into IDIS2
so that it becomes available for data sharing. In 2004 all rainfall and
streamflow data available across basins was loaded into the IDIS2 prototype for
data sharing. From early 2005 up to now all data made available by the
Olifants, Ruhuna, Karkheh, Volta and Sao Francisco basins was loaded into
IDIS2. Such data loading activities can hardly be automated because the
datasets received are very diverse, e.g. in data file formats, units,
locations, data structures (how the data is arranged in the file), etc. As a
result a generic data extraction, transformation and loading (ETL) workflow was
designed and implemented to supply data to IDIS2. This ETL workflow guarantees
that optimized procedures can efficiently be applied whenever possible. It also
permits complete tracking and lineage monitoring for every single record
entering IDIS2. Since its implementation this workflow has loaded an average of
100,000 records per day into IDIS2, regardless of the nature of the data, its
file structure, its format, its unit, etc.
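The idea of a generic ETL workflow with per-record lineage can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual IDIS2 workflow: the file names, column names, unit conversion table and `Record` layout are hypothetical. Each source is described by a small specification so that the same extract, transform and load steps can be reused, and every loaded record keeps the file and line it came from.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    station: str
    day: str          # ISO date string
    value_mm: float   # rainfall normalised to millimetres
    source: str       # lineage: which file the value came from
    source_line: int  # lineage: which line of that file


# Hypothetical per-source description: column names and a factor to millimetres.
SOURCE_SPECS = {
    "basin_a.csv": {"station": "site", "day": "date",
                    "value": "rain_mm", "to_mm": 1.0},
    "basin_b.csv": {"station": "Station", "day": "Day",
                    "value": "rain_inch", "to_mm": 25.4},
}


def extract(text):
    """Yield (line_number, row_dict) pairs from one delimited text file."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield reader.line_num, row


def transform(name, line_no, row):
    """Map a source-specific row onto the common record layout."""
    spec = SOURCE_SPECS[name]
    return Record(
        station=row[spec["station"]].strip(),
        day=row[spec["day"]].strip(),
        value_mm=round(float(row[spec["value"]]) * spec["to_mm"], 2),
        source=name,
        source_line=line_no,
    )


def load(store, records):
    """Append records to the target store and return how many were loaded."""
    count = 0
    for rec in records:
        store.append(rec)
        count += 1
    return count


def run_etl(files, store):
    """Run extract, transform and load for every file; return the record count."""
    total = 0
    for name, text in files.items():
        rows = extract(text)
        total += load(store, (transform(name, n, r) for n, r in rows))
    return total


# Two invented sample files with different column names and units.
files = {
    "basin_a.csv": "site,date,rain_mm\nA1,2004-01-01,12.5\n",
    "basin_b.csv": "Station,Day,rain_inch\nB7,2004-01-01,0.5\n",
}
warehouse = []
loaded = run_etl(files, warehouse)
```

In this sketch, supporting a new dataset means adding one entry to `SOURCE_SPECS` and leaves the steps themselves unchanged. That is one way to keep the procedure generic even though the incoming files differ.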
- System Development
The goal of system development is to design,
implement, maintain and continuously improve the technical architecture of
IDIS. Simply put this architecture is composed of the web layer (all web pages
& services) which retrieves data from the database layer (which contains
all stored data) as well as from the metadata layer (which contains the
associated metadata). All these layers are intricately linked and depend on
each other for adequate operation of IDIS. Since early 2004 the IDIS2 system
development has followed an iterative approach as it was both cheaper and better
suited. Under this approach a series of progressively larger system
development lifecycles is carried out to gradually meet all Users’
requirements while always making sure core Users’ needs are satisfied first.
This approach advocates the sequential release of system versions which
increasingly match the functions and data of the target system. The prototype
demonstrates the proof of concept and validates Users’ requirements. The pilot
features all core functions as well as all data that can be easily made
available. Finally, the target system operates all functions and all data.
During 2004 the IDIS2 prototype was designed, tested and implemented. It was
released in October 2004 and very successfully evaluated by its Users in
December 2004. Each layer of this prototype (web, database & metadata) followed
a generic structure based on best practices that was easy to understand,
maintain and operate. In contrast, the linkages between the database, the web
and the metadata layers were hard coded to save time and quickly deliver a
proof-of-concept. As a result the prototype’s operation required a high level
of support and no additional data type could be easily added without being hard
coded again. Such a limitation is common to prototypes and does not create
issues which are unknown or cannot be dealt with. Throughout 2005 the
architecture of the target system was designed and tested based on best
practices and lessons learned from the IDIS2 prototype. The design and
implementation of each layer was revisited, optimized and automated whenever
possible. The interlinkage of all layers was designed to be as generic and as
automated as possible. While in the IDIS2 prototype all data was stored in a
few common tables, the design of the pilot involved splitting the data storage
for each basin into a separate, dedicated database for faster execution of
queries and improved maintenance and backup services. A new security layer was
introduced to control all data access according to policies, data agreements
and intellectual property rights. A new metadata search engine, GeoNetwork, was
incorporated into the IDIS2 architecture to ensure all IDIS2 metadata was
easily exchanged with all CGIAR centers, UN agencies, NGOs and other
institutions. In 2006, the final design of the IDIS2 pilot was implemented and
all available data from all basins was loaded into it. The web layer was coded
and tested extensively to ensure a flawless operational environment. In May
2006 the IDIS2 pilot will be launched.
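The layering described above (a web layer that reads from per-basin databases and from the metadata layer, behind a security layer) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: plain dictionaries stand in for the databases, and the user names, sample values and the `fetch` function are hypothetical rather than part of IDIS2.

```python
class AccessDenied(Exception):
    """Raised when a user's agreements do not cover the requested basin."""


# Database layer: one dedicated store per basin (dicts stand in for databases).
BASIN_DATABASES = {
    "volta": {"rainfall": [("2004-01-01", 3.2)]},
    "karkheh": {"streamflow": [("2004-01-01", 41.0)]},
}

# Metadata layer: descriptions keyed by (basin, data type).
METADATA = {
    ("volta", "rainfall"): {"unit": "mm/day"},
    ("karkheh", "streamflow"): {"unit": "m3/s"},
}

# Security layer: basins each (invented) user may read under a data agreement.
AGREEMENTS = {"alice": {"volta"}, "bob": {"volta", "karkheh"}}


def fetch(user, basin, data_type):
    """Web-layer entry point: check access, then combine data with metadata."""
    if basin not in AGREEMENTS.get(user, set()):
        raise AccessDenied(f"{user} has no agreement covering {basin}")
    database = BASIN_DATABASES[basin]  # route to the basin's own database
    return {
        "values": database[data_type],
        "metadata": METADATA[(basin, data_type)],
    }


result = fetch("bob", "karkheh", "streamflow")
```

Because the lookups are driven by keys and not by hard-coded links, a new basin or data type can be added by registering it in the corresponding layer. This is the kind of generic interlinkage the target design aims for.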
- Data Usage Enhanced
The goal of enhanced data usage is to promote
data management best practices and ensure adequate support to projects which
require assistance in dealing with data. In 2006 such assistance was mainly
provided to two projects: the E-Flow project (Vladimir Smakhtin) and the
Wetland project (Max Finlayson & Sanjiv de Silva). In May 2006, MS Excel-based
data templates will be launched to promote adequate and standardized
storage of data. These templates are IDIS2 compliant, hence any data they
contain can easily be harvested and loaded into IDIS2. The design of these data
templates is generic and allows the storage of any form of data. In May 2006,
“basin kits” will be sent out to all basins and projects. For each basin the
kit includes a wide array of geographical layers that are clipped to the
boundary of the basin. Amongst others, each kit includes high resolution
topography, 102 years of high resolution climate data, high resolution
population data, soil and water access data, etc.
A brief history of IDIS
Past data sharing mechanisms for IWMI and CPWF
involved file based data sharing over email and shared network drives. Data was
often not documented, not standardized and not protected from deletion and
overwriting. In late 2002 an initiative taking advantage of open-source (freely
licensed) technology was launched to resolve the above constraints. This
initiative was named IDIS. Its approach was based on an exhaustive model
description of all the components involved in the research analysis process
according to various domains. Due to the inherent complexity of the task this
approach was unfortunately not successful. In early 2004 a new initiative based on
data warehousing best practices was launched following a participatory
approach. This second initiative was named IDIS2. After a thorough
consultation, a detailed Users’ requirements survey was carried out. Conclusions
from this survey guided the development of the IDIS2 prototype targeted at
validating a proof-of-concept. The IDIS2 prototype was launched in October 2004
and very successfully evaluated by its target group Users in December 2004.
This prototype featured only rainfall and streamflow data. In 2005 the approach
used by the IDIS2 prototype was extended to allow the storage of any data type as
well as its geographical representation on a web map. This development work also
included a significant effort to design and implement a data extraction,
transformation and loading (ETL) workflow within the architecture of IDIS2. This
workflow has since been used to load all available data into IDIS2. In May 2006
IDIS2 was launched to provide access to a wide range of indicators and
parameters available from the IWMI and CPWF basins.
IDIS Working papers
IDIS Presentations
- IDIS Launch (PPT format) on 16 June 2006 in IWMI HQ, Colombo, Sri Lanka
- 20060610 IWMI EPMR, Colombo, Sri Lanka, 10 June 2006
- Advisory Group Meeting IWMI-CPWF Data Platform, Colombo, Sri Lanka, 14 December 2005
- Advisory Group Meeting IWMI-CPWF Data Platform, Colombo, Sri Lanka, 14 October 2005
- An introduction to Metadata @ IWMI, Colombo, Sri Lanka, 23 September 2004
- Data Management @ IWMI Ghana, Accra, Ghana, 28 August 2004
- Making Data Sharing Happen in the CP: It’s All about the Users, MRC, Cambodia, 15 March 2004
- Sharing data @ IWMI: it's all about the users!, Colombo, Sri Lanka, 17 October 2003
- IDIS for Sri Lanka Steering Committee (PPT format) in IWMI HQ, Colombo, Sri Lanka