This project has retired. For details please refer to its Attic page.
Apache Marmotta Platform: Cloud Setup

Introduction

Starting with version 3.2, Apache Marmotta can be configured to run in a clustered or cloud environment. Instances in the cluster are managed using Apache Zookeeper, which needs to be installed separately. A clustered setup only works with the KiWi triple store backend, and all servers in a cluster must share a common database (or database cluster).

[Diagram: overview of a typical Apache Marmotta cluster setup]

Zookeeper Setup

Setting up Zookeeper for Apache Marmotta is straightforward, as the Zookeeper module does most of the initialisation for you. To connect to a running Zookeeper server, the following configuration options must be passed to Marmotta, either as system properties or as servlet context parameters:

  • zookeeper.server (required): comma-separated list of Zookeeper server names, each with an optional port number, covering all Zookeeper instances in your setup
  • zookeeper.timeout (optional, default 60000): timeout in milliseconds for connecting to the Zookeeper servers; if a server does not respond within this time, the connection is ignored
  • zookeeper.cluster (optional, default “default”): name of the cluster to which this instance belongs
  • zookeeper.instance (optional, default random UUID): name of the instance inside the cluster; if not given, a random UUID is generated
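
As an illustration of the system-property route, the options above can be turned into standard Java `-Dkey=value` flags. The following is a minimal sketch: the helper name `zookeeper_jvm_options` and the host names are hypothetical, and how the resulting string reaches the JVM depends on your servlet container.

```python
import shlex


def zookeeper_jvm_options(servers, timeout_ms=None, cluster=None, instance=None):
    """Build -D system property flags for the Zookeeper module options.

    Only zookeeper.server is required; options left as None are omitted so
    that Marmotta falls back to its documented defaults.
    """
    if not servers:
        raise ValueError("zookeeper.server is required")
    props = {"zookeeper.server": ",".join(servers)}
    if timeout_ms is not None:
        props["zookeeper.timeout"] = str(timeout_ms)
    if cluster is not None:
        props["zookeeper.cluster"] = cluster
    if instance is not None:
        props["zookeeper.instance"] = instance
    # shlex.quote keeps each flag safe to paste into a shell command line.
    return " ".join(shlex.quote(f"-D{key}={value}") for key, value in props.items())


# Hypothetical host, cluster, and instance names.
print(zookeeper_jvm_options(
    ["zk1.example.org:2181", "zk2.example.org:2181"],
    cluster="production",
    instance="node-1",
))
```

Giving each instance an explicit zookeeper.instance name, rather than relying on the random UUID, makes it easier to find its instance-level configuration node later.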

The main feature of the Zookeeper module is that Marmotta instances can be automatically configured through Zookeeper. Marmotta instances will react to configuration changes in Zookeeper and update their local configuration accordingly. The Apache Marmotta configuration stored in Zookeeper has the following structure:

+ marmotta
    + config   - global configuration options
    |   + <config_key> <config_value>
    |   + ...
    + clusters
      + default
      |   + config   - cluster-level configuration options
      |   |   + <config_key> <config_value>
      |   |   + ...
      |   + snowflake  - used for generating unique IDs
      |   + instances
      |       + <instance_name>
      |       |   + config - instance-level configuration options
      |       |       + <config_key> <config_value>
      |       |       + ...
      |       + <instance_name2>
      |       |   + ...
      |       + ...
      + <cluster1>
          + config
          |   + ...
          + ...

Configuration values can be stored on three levels: the global level, where they apply to all Marmotta instances; the cluster level, where they apply only to the Marmotta instances in a single cluster; and the instance level, where they apply only to a specific Marmotta instance. More specific configurations take precedence over more generic ones.
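
The precedence rule can be sketched as a lookup that walks from the global level down to the instance level and keeps the last value it finds. This is only an illustration, not Marmotta's implementation: the plain dictionary `store` stands in for the Zookeeper tree, the paths assume that the structure shown above is rooted at `/marmotta`, and the function names and sample values are hypothetical.

```python
def config_paths(key, cluster="default", instance=None):
    """Return the node paths for a key, ordered from most generic to most specific."""
    paths = [
        f"/marmotta/config/{key}",                     # global level
        f"/marmotta/clusters/{cluster}/config/{key}",  # cluster level
    ]
    if instance is not None:
        # instance level
        paths.append(f"/marmotta/clusters/{cluster}/instances/{instance}/config/{key}")
    return paths


def resolve(key, store, cluster="default", instance=None):
    """Return the most specific value set for key, or None if it is unset everywhere."""
    value = None
    for path in config_paths(key, cluster, instance):
        if path in store:
            value = store[path]  # a more specific level overrides earlier ones
    return value


# Hypothetical tree contents, keyed by node path.
store = {
    "/marmotta/config/database.user": "global-user",
    "/marmotta/clusters/production/config/database.user": "cluster-user",
    "/marmotta/clusters/production/instances/node-1/config/database.user": "node-user",
}

print(resolve("database.user", store, "production", "node-1"))  # instance level wins
print(resolve("database.user", store, "production", "node-2"))  # falls back to cluster level
print(resolve("database.user", store, "staging"))               # falls back to global level
```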

Database Setup

All servers in a cluster need to be configured to access the same database. Therefore, the following configuration properties should be defined on the cluster level in Zookeeper:

  • database.type: type of the database to use; PostgreSQL or MySQL are preferable
  • database.url: JDBC URL for accessing the database
  • database.user: user name for accessing the database
  • database.password: password for accessing the database
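
To make the cluster-level placement concrete, the sketch below maps the four properties to the node paths under a cluster's `config` node. It only builds the path-to-value mapping; writing it to Zookeeper is left to whichever client you use. The function name, the `/marmotta` root, the UTF-8 encoding of node data, the `postgres` type value, and all connection details are assumptions made for illustration.

```python
def cluster_database_entries(cluster, db_type, url, user, password):
    """Map the database properties to cluster-level node paths and node data."""
    base = f"/marmotta/clusters/{cluster}/config"
    settings = {
        "database.type": db_type,
        "database.url": url,
        "database.user": user,
        "database.password": password,
    }
    # Assumption: values are stored as UTF-8 encoded strings in the node data.
    return {f"{base}/{key}": value.encode("utf-8") for key, value in settings.items()}


# Hypothetical connection details for a shared PostgreSQL database.
entries = cluster_database_entries(
    cluster="production",
    db_type="postgres",  # assumed type identifier
    url="jdbc:postgresql://db.example.org:5432/marmotta",
    user="marmotta",
    password="change-me",
)
for path in sorted(entries):
    print(path)
```

Keeping these four values at the cluster level, rather than per instance, means a newly added instance picks up the shared database without any local database configuration.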

When initialising the cluster for the first time, it is advisable to start only a single Marmotta instance first so that it can set up the necessary database tables. When the database initialisation is complete, all other instances can be started in any order. When a new instance starts up, the Zookeeper module automatically creates a proper datacenter ID for generating database IDs that are unique across the cluster.
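
Why a per-instance datacenter ID yields cluster-wide unique database IDs can be seen in a generic Snowflake-style generator, where the ID is composed of a timestamp, the datacenter ID, and a per-millisecond sequence number. The sketch below illustrates that general technique only; the bit widths, the epoch, and the class name are assumptions and are not taken from Marmotta's source.

```python
import threading
import time

DATACENTER_BITS = 10          # assumed width of the datacenter ID field
SEQUENCE_BITS = 12            # assumed width of the per-millisecond sequence
MAX_DATACENTER_ID = (1 << DATACENTER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_400_000_000_000  # arbitrary custom epoch (assumption)


class SnowflakeSketch:
    """Generates IDs laid out as: timestamp | datacenter ID | sequence."""

    def __init__(self, datacenter_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= datacenter_id <= MAX_DATACENTER_ID:
            raise ValueError("datacenter_id out of range")
        self.datacenter_id = datacenter_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (DATACENTER_BITS + SEQUENCE_BITS))
                | (self.datacenter_id << SEQUENCE_BITS)
                | self._sequence
            )


# Two instances with different datacenter IDs never collide,
# even when they generate IDs in the same millisecond.
node_a = SnowflakeSketch(datacenter_id=1)
node_b = SnowflakeSketch(datacenter_id=2)
print(node_a.next_id() != node_b.next_id())
```

Because the datacenter ID occupies its own bit field, two instances can only produce the same ID if they were assigned the same datacenter ID, which is what the coordination through Zookeeper is meant to prevent.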

When running in a high-load environment, it is also useful to run the database itself as a database cluster. PostgreSQL, for example, supports this; the setup of a high-availability cluster is described in the PostgreSQL documentation.

Additional Configuration

To run Apache Marmotta properly in a cluster, the following additional configuration options need to be considered:

  • kiwi.context should be set on the cluster level so that all URIs are constructed using the same URI prefix; normally, this option is automatically set to the host name of the server, but for a cluster it should be set to the host name of the load balancer
  • kiwi.host should also be set on the cluster level so that webpage links in the admin interface are properly configured; note, however, that the Marmotta admin user interface should not be used for changing the configuration; this should be done through Zookeeper
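
Both options can be derived from the address of the load balancer, as in the following sketch. The exact value format is an assumption here (absolute URLs with a trailing slash, and a `/marmotta/` context path); the function name and the load balancer address are hypothetical. Check the values a single-server installation sets automatically before copying this format.

```python
from urllib.parse import urlsplit


def kiwi_cluster_settings(load_balancer_url, context_path="/marmotta/"):
    """Derive cluster-level kiwi.host and kiwi.context from the load balancer URL."""
    parts = urlsplit(load_balancer_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as http://lb.example.org")
    host = f"{parts.scheme}://{parts.netloc}/"
    # Normalise the context path so that it starts and ends with a slash.
    if not context_path.startswith("/"):
        context_path = "/" + context_path
    if not context_path.endswith("/"):
        context_path += "/"
    return {
        "kiwi.host": host,
        "kiwi.context": host.rstrip("/") + context_path,
    }


# Hypothetical load balancer address.
for key, value in kiwi_cluster_settings("http://lb.example.org").items():
    print(key, "=", value)
```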