Biological data (functional, experimental, and modeling) do not easily fit into
a relational schema. One problem is capturing ambiguity and multiple hypotheses
in the model. A classical object-relational model must be revised and retested
to keep up with advances in the science being modeled. This can make software
development the rate-limiting step in drug discovery.
The MetaGraph Framework was built to solve this problem. It supports a very
granular, graph-like data structure built from a fixed number of relationship types
(edges) that can link any two data elements (nodes). Edge types represent concepts
like ownership, description, equality, and temporal order. There are also a fixed
number of root node types that are sub-classed to represent specific types of
data. Edges are a subclass of nodes, allowing edges to point to other edges.
As a result, both a piece of data and the reason that a relationship exists
between two pieces of data can be documented.
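
The node and edge model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the names MetaGraphSketch, Node, Edge, and EdgeType are hypothetical and are not taken from the MetaGraph toolkit, and the four edge types are simply the concepts listed in the text. The sketch shows one idea, that an edge is itself a node and can therefore be the target of another edge.

```java
// Hypothetical names throughout; this is not the MetaGraph toolkit's API.
final class MetaGraphSketch {

    // A fixed set of relationship kinds, taken from the concepts named in the text.
    enum EdgeType { OWNERSHIP, DESCRIPTION, EQUALITY, TEMPORAL_ORDER }

    // Root node type; specific kinds of data would subclass it.
    static class Node {
        private final String label;

        Node(String label) { this.label = label; }

        String label() { return label; }
    }

    // An edge is itself a node, so another edge can point at it.
    static class Edge extends Node {
        private final EdgeType type;
        private final Node source;
        private final Node target;

        Edge(EdgeType type, Node source, Node target) {
            super(type + ": " + source.label() + " -> " + target.label());
            this.type = type;
            this.source = source;
            this.target = target;
        }

        EdgeType type() { return type; }
        Node source() { return source; }
        Node target() { return target; }
    }

    public static void main(String[] args) {
        Node geneA = new Node("gene A");
        Node geneB = new Node("gene B");

        // The relationship between the two pieces of data.
        Edge same = new Edge(EdgeType.EQUALITY, geneA, geneB);

        // The reason for that relationship, attached to the edge itself.
        Node evidence = new Node("sequence alignment");
        Edge why = new Edge(EdgeType.DESCRIPTION, evidence, same);

        System.out.println(why.label());
    }
}
```

In this sketch the evidence node describes the equality edge rather than either gene, which is one way the reason for a relationship could be recorded alongside the relationship.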
We developed an open source Java toolkit in three layers. The base layer provides
persistence; SQL persistence (through JDBC), XML persistence, and temporary
in-memory persistence are all currently implemented. The middle layer, which sits
directly above the persistence layer, is used to find and manipulate individual
node and edge objects. The top layer supports "clusters," which are used to find
and manipulate functional groups of nodes and edges. The Framework has been
deployed for several internal projects and serves as the basis of the GeneTrove
functional genomics database.
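
The three-layer arrangement can be pictured with a small Java sketch. Everything here is an assumption made for illustration: LayerSketch, Element, Store, InMemoryStore, Graph, and cluster are hypothetical names, not the toolkit's classes, and the "cluster" shown (a node plus whatever it points to directly) is only one simple guess at what a functional group might be. Only an in-memory store is shown; a JDBC or XML store would implement the same interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three layers; none of these names come from the toolkit.
final class LayerSketch {

    // A stored node or edge; an edge carries the ids of its two endpoints.
    static final class Element {
        final int id;
        final String label;
        final Integer sourceId; // null for a plain node
        final Integer targetId; // null for a plain node

        Element(int id, String label, Integer sourceId, Integer targetId) {
            this.id = id;
            this.label = label;
            this.sourceId = sourceId;
            this.targetId = targetId;
        }

        boolean isEdge() { return sourceId != null && targetId != null; }
    }

    // Layer 1: persistence, hidden behind an interface.
    interface Store {
        void save(Element e);
        Element load(int id);
        List<Element> all();
    }

    static final class InMemoryStore implements Store {
        private final Map<Integer, Element> rows = new LinkedHashMap<>();

        @Override public void save(Element e) { rows.put(e.id, e); }
        @Override public Element load(int id) { return rows.get(id); }
        @Override public List<Element> all() { return new ArrayList<>(rows.values()); }
    }

    // Layer 2: find and manipulate individual nodes and edges.
    static final class Graph {
        private final Store store;
        private int nextId = 1;

        Graph(Store store) { this.store = store; }

        Element addNode(String label) {
            Element e = new Element(nextId++, label, null, null);
            store.save(e);
            return e;
        }

        Element addEdge(String label, Element source, Element target) {
            Element e = new Element(nextId++, label, source.id, target.id);
            store.save(e);
            return e;
        }

        Element get(int id) { return store.load(id); }

        List<Element> edgesFrom(Element source) {
            List<Element> out = new ArrayList<>();
            for (Element e : store.all()) {
                if (e.isEdge() && e.sourceId == source.id) out.add(e);
            }
            return out;
        }
    }

    // Layer 3: a "cluster", here a root node with its outgoing edges and their targets.
    static List<Element> cluster(Graph g, Element root) {
        List<Element> group = new ArrayList<>();
        group.add(root);
        for (Element edge : g.edgesFrom(root)) {
            group.add(edge);
            group.add(g.get(edge.targetId));
        }
        return group;
    }

    public static void main(String[] args) {
        Graph g = new Graph(new InMemoryStore());
        Element gene = g.addNode("gene X");
        Element note = g.addNode("annotation");
        g.addEdge("description", gene, note);
        for (Element e : cluster(g, gene)) System.out.println(e.label);
    }
}
```

Because the Graph class talks only to the Store interface, the storage backend could be swapped without touching the upper two layers, which is the design benefit the layering suggests.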