sones Whitepaper
GraphDB on Windows Azure
The sones GraphDB
The sones GraphDB is a new, object-oriented data management system based on graphs.
It enables efficient storage, management and evaluation of extremely complex, highly
interconnected data sets. The integrated solution combines the advantages of
file storage with the capabilities of a database management system. Unstructured data
(e.g. video files), semi-structured data (metadata, e.g. log files) and structured data can be
linked to each other, managed from a single source and evaluated as needed.
The idea of developing a new database based on graph theory arose at the
beginning of 2005 out of dissatisfaction with the solutions then available on the
market. These often offered no, or only insufficient, ad hoc analysis functions for queries
on strongly linked data. In addition, most of these systems could not cope with the
frequently changing data and information structures that applications have to process,
especially in the web environment. The development therefore focused on providing
efficient data handling and real-time analysis for tasks that, because of their
complexity, could previously be solved only with great effort or hardly at all. A
fundamentally new database technology was developed, based on independent, new concepts.
The solution comprises an object-oriented storage solution (sones GraphFS) for
high-performance data storage and the graph-oriented database management system
(sones GraphDB), which links the data to each other and enables their high-performance
evaluation. Combining these two parts into one piece of software allows highly
efficient processing, since intermediate abstraction layers can be avoided. The
effort usually required to operate a database solution is thereby greatly
reduced.
The object-oriented storage solution (sones GraphFS) also stores unstructured and
semi-structured data, such as document formats and photos, more efficiently than the
file systems that underlie the operating system. This principle successfully overcomes
one of the significant performance problems of previous object-oriented
databases, namely the mapping of object models onto relational tables. A patent has been
filed for the file system.
The graph-oriented data management (sones GraphDB) offers high flexibility in
structuring data and enables data to be linked directly, without auxiliary
constructs such as the mapping tables used in relational databases. Since the data
model follows that of established programming languages, there is no mismatch
between application logic and data storage. The complexity is in some cases drastically
lower than with existing relational solutions. One resulting advantage is that, as
the data becomes more interlinked, complexity grows linearly rather than
exponentially. A further decisive advantage is the ability of the
database to change or extend data structures at runtime. New application
requirements can therefore be implemented without extensive rework of a
relational database schema.
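The idea of direct linking and runtime-changeable structure can be illustrated with a small sketch. This is plain Python, not the sones GraphDB API; the `Vertex` class, its methods and the sample data are hypothetical names chosen only for illustration.

```python
class Vertex:
    """A node with free-form attributes and direct references to other nodes."""

    def __init__(self, **attributes):
        self.attributes = dict(attributes)
        self.edges = {}  # edge name -> list of target Vertex objects

    def link(self, edge_name, target):
        """Attach a direct reference to another vertex under the given edge name."""
        self.edges.setdefault(edge_name, []).append(target)

    def neighbours(self, edge_name):
        """Return the vertices reachable over one edge of the given name."""
        return list(self.edges.get(edge_name, []))


alice = Vertex(name="Alice")
bob = Vertex(name="Bob")
carol = Vertex(name="Carol")

alice.link("friends", bob)
bob.link("friends", carol)

# Friends of friends: follow the references directly, with no mapping table in between.
friends_of_friends = [
    second.attributes["name"]
    for first in alice.neighbours("friends")
    for second in first.neighbours("friends")
]

# The structure can be extended at runtime: a new attribute and a new edge type.
alice.attributes["age"] = 30
carol.link("colleagues", alice)
```

In a relational design, the `friends` relationship would typically live in a separate mapping table that has to be joined for every hop; here each hop is a single reference lookup.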
The Graph Query Language (sones GQL) was developed to simplify data
management for the user. While its scope and syntax are modeled on SQL, it was adapted
to the graph-oriented approach and makes the advantages of this model
available through direct, easily comprehensible queries.
The Windows Azure platform
Azure Services Platform is an application platform in the cloud that allows applications to
be hosted and run at Microsoft datacenters. It provides a cloud operating system called
Windows Azure that serves as a runtime for the applications and provides a set of services
that allows development, management and hosting of applications off-premises. All Azure
Services and applications built using them run on top of Windows Azure.
Windows Azure has three core components: Compute, Storage and Fabric. As the names
suggest, Compute provides a computation environment with Web Roles and Worker Roles,
while Storage provides scalable storage (Blobs, Tables, Queues, Drives) for
large-scale needs.
The hosting environment of Windows Azure is called the Fabric Controller. It pools
individual systems into a network that automatically manages resources, load balancing,
geo-replication and application lifecycle without requiring the hosted applications to
deal explicitly with those requirements. In addition, it provides other services that most
applications require, such as the Windows Azure Storage Service, which lets
applications store unstructured data such as binary large objects,
queues, drives and non-relational tables. Applications can also use other services that
are part of the Azure Services Platform.
The Azure Services Platform provides an API built on REST, HTTP and XML that allows a
developer to interact with the services provided by Windows Azure. A client-side
managed class library that encapsulates the interaction with the services is also
provided. The platform also integrates with Microsoft Visual Studio, which can be used
as the IDE to develop and publish Azure-hosted applications. Windows Azure has been
commercially available since February 1, 2010. Users can purchase Windows Azure service
time from the http://www.microsoft.com/azure website.
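As a rough illustration of what addressing a storage resource over REST and HTTP looks like, the sketch below builds (but does not send) a GET request for a blob. It assumes the usual Blob service address pattern of account, container and blob name; the account `exampleaccount`, the container `videos` and the blob name are hypothetical, and authentication headers are deliberately left out.

```python
from urllib.parse import quote
from urllib.request import Request


def blob_request(account, container, blob_name):
    """Build a GET request for a blob; nothing is sent over the network."""
    url = "https://{0}.blob.core.windows.net/{1}/{2}".format(
        account, quote(container), quote(blob_name)
    )
    return Request(url, method="GET")


# Hypothetical account, container and blob, used for illustration only.
request = blob_request("exampleaccount", "videos", "intro clip.mp4")
```

Because every resource has its own URL, any HTTP client can reach the services; the managed class library wraps calls of this kind for .NET developers.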
Azure also offers Content Delivery Network (CDN) services as an option. Currently
available as a no-cost "Community Technology Preview", the Azure CDN enables low-latency
delivery of static content from Azure Storage to end users from 18 data centers
worldwide.
Current State of sones GraphDB on Windows Azure
Currently, the sones GraphDB 1.1 is available as a Windows Azure platform edition,
specifically optimized to run in the Windows Azure cloud. It is important to note that
this is the first step in a plan to fully utilize the platform and tools that Windows
Azure offers.
(Deployment on Windows Azure)
The sones GraphDB is available as a Windows Azure deployment package and can be
configured using the service configuration file. The Windows Azure version of the sones
GraphDB matches all features of the 1.1 OpenSource release:
• Full-featured In-Memory Graph Database
• Graph Query Language 1.1
• All Graph* APIs available
• REST Interface to use the GraphDB as a service
• Intuitive browser-based AJAX WebShell Interface for easy user access
• The C# API to embed the sones GraphDB into 3rd party Windows Azure services
• API and Plug-In Interface for Graph Algorithms
• Ships with a “Shortest Path” algorithm
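The feature list names a “Shortest Path” algorithm but does not describe how it is implemented. As a minimal sketch of the general technique, the following uses breadth-first search on an unweighted, directed graph held in a plain dictionary. It is not the sones plug-in API; the function name and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached over a path that is at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor chain back to the start, then reverse it.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical sample graph: keys are nodes, values are outgoing edges.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
path = shortest_path(graph, "A", "E")
```

Breadth-first search visits nodes in order of their distance from the start, so the first time the goal is reached the path found has the fewest possible edges.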
Since the Windows Azure version of the sones GraphDB shares its code base with the
OpenSource and Enterprise versions, it includes essentially all the
technology and APIs that the other editions ship with and therefore
achieves the same performance metrics as those versions.
That said, the sones GraphDB 1.1 on Windows Azure is the perfect match for those who
want to utilize the full flexibility of data and schema using a full-featured graph database.
The current maximum RAM size of an extra-large Windows Azure instance is 14 GB,
which is at the moment also the effective size limit of a single instance of the sones
GraphDB 1.1 on Windows Azure. To provide virtually unlimited storage space, and
thus increase the number of stored objects beyond the limits of RAM, we plan to add
a Windows Azure Storage module.
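To get a feel for what this RAM ceiling means for the number of stored objects, a back-of-the-envelope estimate can help. The sketch below is not based on measured figures: the average object size of 1 KiB and the 25% of memory reserved for the runtime and indices are assumptions chosen for illustration, and the 14 GB are read as binary gigabytes.

```python
GIB = 1024 ** 3  # bytes per binary gigabyte


def max_objects(ram_bytes, avg_object_bytes, reserved_fraction=0.25):
    """Estimate how many objects fit in RAM after reserving a share for overhead."""
    usable_bytes = ram_bytes * (1 - reserved_fraction)
    return int(usable_bytes // avg_object_bytes)


# Assumed values: 1 KiB per object, 25% of RAM reserved (both hypothetical).
estimate = max_objects(14 * GIB, avg_object_bytes=1024)
```

Under these assumptions a single instance would hold roughly 11 million objects; real numbers depend on the actual object sizes and on the memory overhead of the database itself.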
In this first step, the sones GraphDB benefits from the easy and hassle-free deployment
of additional instances. In further steps, the following features are planned:
• using the Windows Azure Service Update features to seamlessly update running
GraphDB instances to newer versions, for example through staging
• using Master/Slave replication across many sones GraphDB Windows Azure
instances to overcome the in-memory limitation
• using a specialized Windows Azure storage service to create a persistence layer
that fully utilizes the Windows Azure platform and helps to overcome the
in-memory size limitation.
According to the current development plans for the sones GraphDB 1.2, these features
will be available around the end of 2010.