Startup Pitch: HYDRA - an Engine for Ad Hoc Querying and Data Federation for Bioinformatics and Clinical Intelligence
Abstract
Need for Data Federation and Self-Service Querying.
Finding and integrating information from multiple heterogeneous and distributed resources accounts for a large share of the time and cost of knowledge management in Bioinformatics and Cheminformatics. Typically, users need to draw information from hundreds of completely autonomous resources, such as online biomedical databases, nomenclatures, clinical databases and specialised analytical Web services. State-of-the-art approaches to data integration - data warehousing and workflow scripting - are both costly and limited in scope.
In the data federation paradigm, querying multiple heterogeneous distributed resources works the same way as querying a single database. In addition, real-world business use cases require querying to be self-service, so that non-technical users - scientists and biotechnologists - can run ad hoc queries without help from programmers. The need for self-service querying is equally acute in the Clinical Intelligence context, where data - typically relational and textual - has to be analysed by clinical trial professionals, health care managers, surveillance practitioners and clinical researchers.
SADI Semantic Web services.
Our solution leverages the power of the SADI (Semantic Automated Discovery and Integration) framework - a set of conventions that turn simple HTTP-based Web services into Semantic Web services that can be fully automatically discovered, composed and called by client programs. Practically, this means that numerous databases and algorithms can be queried as a single database. This is achieved by associating a description with each SADI service that unambiguously defines what the service does, thus facilitating the discovery of the service by client programs when they need the corresponding functionality.
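The discovery step described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the SADI libraries or their API: the `ServiceDescription` and `Registry` classes, the class and predicate names, and the identifiers and data are all hypothetical. Each service is registered with a description stating which class of input it accepts and which relation it provides, and a client finds a service purely by stating what it needs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical description: what a service accepts and what it provides."""
    name: str
    input_class: str   # class of entity the service accepts
    predicate: str     # relation the service attaches to its input
    call: Callable[[str], List[str]]


class Registry:
    """Toy registry that clients consult to discover services."""

    def __init__(self) -> None:
        self._services: List[ServiceDescription] = []

    def register(self, service: ServiceDescription) -> None:
        self._services.append(service)

    def discover(self, input_class: str, predicate: str) -> List[ServiceDescription]:
        # A service matches when its description covers the requested need.
        return [
            s for s in self._services
            if s.input_class == input_class and s.predicate == predicate
        ]


# Two made-up services backed by made-up data.
def gene_to_protein(gene_id: str) -> List[str]:
    return {"gene:G1": ["protein:P1"]}.get(gene_id, [])


def protein_to_pathway(protein_id: str) -> List[str]:
    return {"protein:P1": ["pathway:W1"]}.get(protein_id, [])


registry = Registry()
registry.register(ServiceDescription("geneToProtein", "Gene", "encodes", gene_to_protein))
registry.register(
    ServiceDescription("proteinToPathway", "Protein", "participatesIn", protein_to_pathway)
)

# The client states a need ("what does this Gene encode?") and lets the
# registry pick the service; it never hard-codes which service to call.
matches = registry.discover("Gene", "encodes")
result = matches[0].call("gene:G1")
```

In this sketch the client code contains no service-specific logic, which is the property that lets new resources be added without changing the client.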
The schema of the virtual database represented by a network of SADI services is, essentially, a controlled vocabulary containing concepts and relations from the subject domain, e.g., biology, chemistry or health care. Unlike most real-life relational and XML database schemas, such a vocabulary can be understood by non-technical users. This semantic exposition of the data represented by networks of SADI services facilitates self-service ad hoc querying.
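To make the idea of querying in domain vocabulary concrete, the following sketch answers a question such as "which pathways involve the proteins encoded by this gene?" by following a chain of relations. It is an illustration under assumptions, not HYDRA or SADI code: the `SERVICES` table, the `run_path_query` function, the vocabulary terms and the data are all hypothetical. The user supplies only concepts and relations; the engine decides which service resolves each step.

```python
from typing import Callable, Dict, List, Tuple

# (function, class of the values it returns); hypothetical stand-ins for
# remote services, keyed by (input class, relation) from the vocabulary.
Service = Tuple[Callable[[str], List[str]], str]

SERVICES: Dict[Tuple[str, str], Service] = {
    ("Gene", "encodes"): (
        lambda g: {"gene:G1": ["protein:P1", "protein:P2"]}.get(g, []),
        "Protein",
    ),
    ("Protein", "participatesIn"): (
        lambda p: {
            "protein:P1": ["pathway:W1"],
            "protein:P2": ["pathway:W1", "pathway:W2"],
        }.get(p, []),
        "Pathway",
    ),
}


def run_path_query(start_class: str, start_ids: List[str], predicates: List[str]) -> List[str]:
    """Follow a chain of vocabulary relations, one service call per step."""
    current_class, current = start_class, list(start_ids)
    for predicate in predicates:
        key = (current_class, predicate)
        if key not in SERVICES:
            raise LookupError(f"no service provides {predicate!r} for {current_class}")
        call, output_class = SERVICES[key]
        collected: List[str] = []
        for item in current:
            for value in call(item):
                if value not in collected:  # drop duplicates, keep order
                    collected.append(value)
        current_class, current = output_class, collected
    return current


# The query names only domain terms, never a database, table or service.
pathways = run_path_query("Gene", ["gene:G1"], ["encodes", "participatesIn"])
```

A real engine would also have to handle filters, joins and remote failures; the sketch covers only the vocabulary-to-service resolution.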
SADI has been under development since 2007 at several bioinformatics laboratories in North America and Europe. It currently comprises high-quality open-source libraries for easy service creation, infrastructure for service discovery and a few client program prototypes. Several case studies have been performed to test the technology in data integration scenarios in genomics, cheminformatics, lipidomics, toxicology and Clinical Intelligence. More than 600 SADI services for Bioinformatics and Cheminformatics have been created.
Startup, Technology and Products.
IPSNP Computing Inc., based in Saint John, Canada, was set up to commercialize prior university-based research on data federation and semantic querying with SADI. The core technology is a high-performance query engine (working title HYDRA) operating on networks of SADI services representing various distributed resources. HYDRA will be packaged and licensed as two products: an intuitive end user-oriented querying and data browsing tool, including a software-as-a-service edition, and an OEM-oriented Java toolkit. IPSNP will target Bioinformatics and Clinical Intelligence markets and, later, other verticals requiring self-service ad hoc federated querying.

This work is licensed under a Creative Commons Attribution 3.0 License.