This project has retired. For details please refer to its Attic page.
Apache Marmotta - Linked Data Client - Usage

Linked Data Client: Usage

Currently, all libraries of the Linked Data Client are available only via Maven or as part of the Marmotta source code.

Maven Artifacts

To use the Linked Data Client in your own projects, please add the following Maven dependencies to your project build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-api</artifactId>
    <version>3.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-core</artifactId>
    <version>3.3.0</version>
</dependency>

This adds basic Linked Data Client support to your project. In addition, you will need at least one data provider backend. Typically, you would add the RDF backend for accessing resources that conform to the Linked Data principles:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-rdf</artifactId>
    <version>3.3.0</version>
</dependency>

Backends are automatically used by the Linked Data Client as soon as they are found on the classpath. We provide many more backends for different legacy systems. Please see the list in the Introduction and the detailed description in Modules. Some of the backends will be applied automatically, others need to be configured explicitly using so-called “endpoint” configurations (see below).

Code Usage

Basic usage of the Linked Data Client library is straightforward. To create a new Linked Data Client with the default configuration, add the following statement:

LDClient ldclient = new LDClient();

Optionally, you can pass a client configuration to customize how LDClient handles requests, e.g. socket and connection timeouts, the number of parallel requests, and default endpoint configurations. The client configuration can also be updated later using the getClientConfiguration() method.

A resource is requested using the following statement:

ClientResponse result = ldclient.retrieveResource("http://...");

The result is a client response object that gives you access to the retrieved triples as well as some metadata about the request (e.g. the expiry time as communicated by the server). If the retrieval fails (e.g. because of a timeout or a parse error), the method call throws a DataRetrievalException, which the caller should handle. You can access the retrieved triples as follows:

RepositoryConnection con = result.getTriples().getConnection();
try {
    con.begin();

    // access the connection, e.g. using SPARQL
    ...

    con.commit();
} finally {
    // always release the connection, even if the access above fails
    con.close();
}

The LDClient instance manages an internal connection pool and connection monitor. To avoid resource leakage, you should:

  • reuse the same LDClient instance for all Linked Data requests that have the same configuration
  • shut down each LDClient instance with ldclient.shutdown() once it is no longer needed
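
The two guidelines above can be sketched as a single try/finally block. To keep the sketch runnable without Marmotta on the classpath, it uses a hypothetical StubClient that only mirrors the retrieveResource and shutdown method names mentioned in this document; it is not the real LDClient and performs no network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedClientSketch {

    /** Hypothetical stand-in for LDClient; it only records the calls it receives. */
    static final class StubClient {
        final List<String> requested = new ArrayList<>();
        boolean shutDown = false;

        String retrieveResource(String uri) {
            if (shutDown) {
                throw new IllegalStateException("client already shut down");
            }
            requested.add(uri);
            return "response for " + uri;
        }

        void shutdown() {
            shutDown = true;
        }
    }

    /** Sends all requests through one shared client and always shuts it down. */
    static StubClient fetchAll(List<String> uris) {
        StubClient client = new StubClient(); // one instance for all requests
        try {
            for (String uri : uris) {
                client.retrieveResource(uri);
            }
        } finally {
            client.shutdown(); // runs even if a request throws
        }
        return client;
    }

    public static void main(String[] args) {
        StubClient client = fetchAll(
                Arrays.asList("http://example.org/a", "http://example.org/b"));
        System.out.println(client.requested.size() + " requests, shut down: " + client.shutDown);
    }
}
```

With the real library, the same shape would presumably apply: create the LDClient once, issue all requests that share its configuration, and call shutdown() in the finally block.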

Providers and Endpoints

To allow access to different kinds of data sources, and therefore to wrap different formats into RDF, the LDClient library has two important concepts, providers and endpoints:

  • a provider implements the functionality of retrieving the data from a data source of a given type, e.g. MediaWiki systems; a provider typically carries out a mapping from a legacy data format to RDF triples, and in many cases even rewrites the resource URL to the actual URL that is used to retrieve the data (e.g. mapping the wiki article to the MediaWiki API endpoint), as well as implements the actual access protocol (e.g. HTTP or LDAP); for the most common case of HTTP resources, the ldclient-core package offers an abstract superclass already implementing HTTP access
  • an endpoint defines how to access a concrete class of resources; for example, it could define that all URLs that match a Wikipedia page should use the MediaWiki provider with a certain configuration; through endpoint definitions it is also possible to exclude retrieval of certain URL patterns by using the “NONE” provider and thus create blacklists
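
To make the relation between endpoints and providers more concrete, the following minimal sketch models an endpoint as a URL pattern bound to a provider name and picks the first endpoint that matches a requested URL. It is an illustration of the concept only and does not use the Marmotta API: the EndpointDef class, the regular expressions, and the "mediawiki" label are assumptions made for this example; only the “NONE” provider name comes from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class EndpointMatchSketch {

    /** Hypothetical endpoint definition: a URL pattern bound to a provider name. */
    static final class EndpointDef {
        final String name;
        final Pattern uriPattern;
        final String provider;

        EndpointDef(String name, String uriRegex, String provider) {
            this.name = name;
            this.uriPattern = Pattern.compile(uriRegex);
            this.provider = provider;
        }
    }

    /** Returns the provider of the first endpoint whose pattern matches the URI. */
    static Optional<String> providerFor(List<EndpointDef> endpoints, String uri) {
        return endpoints.stream()
                .filter(e -> e.uriPattern.matcher(uri).find())
                .map(e -> e.provider)
                .findFirst();
    }

    /** A URI is blacklisted when its matching endpoint names the "NONE" provider. */
    static boolean isBlacklisted(List<EndpointDef> endpoints, String uri) {
        return providerFor(endpoints, uri).filter("NONE"::equals).isPresent();
    }

    /** Illustrative endpoint list; patterns and provider labels are made up. */
    static List<EndpointDef> exampleEndpoints() {
        return Arrays.asList(
                new EndpointDef("Blocked host", "^https?://blocked\\.example\\.org/", "NONE"),
                new EndpointDef("Wikipedia", "^https?://[a-z]+\\.wikipedia\\.org/wiki/", "mediawiki"));
    }

    public static void main(String[] args) {
        List<EndpointDef> endpoints = exampleEndpoints();
        System.out.println(providerFor(endpoints, "http://en.wikipedia.org/wiki/Linked_data"));
        System.out.println(isBlacklisted(endpoints, "http://blocked.example.org/page"));
        System.out.println(providerFor(endpoints, "http://example.org/thing"));
    }
}
```

The sketch returns an empty Optional when no endpoint matches and leaves the fallback to the caller, since how the library treats unmatched URLs is not described here.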

In some cases, a provider will only have a single endpoint definition, e.g. when accessing popular web services like YouTube. Similarly, in some cases there will be reasonable default endpoint configurations (e.g. mapping Wikipedia to the MediaWiki API). In both cases, such endpoints are registered automatically through the Java service registry as soon as the library is found on the classpath. In all other cases, it is necessary to explicitly register endpoint configurations with the LDClient instance. See the API Javadoc for more details.
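
The automatic registration described above relies on classpath discovery. As a rough sketch of how such discovery works in general, the example below uses the standard java.util.ServiceLoader. The EndpointSource interface is hypothetical and stands in for whatever service interface the library actually declares; consult the API Javadoc for the real types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceDiscoverySketch {

    /** Hypothetical service interface; a backend jar would ship an implementation. */
    public interface EndpointSource {
        String endpointName();
    }

    /**
     * Collects the names reported by every implementation that is registered
     * in a META-INF/services provider-configuration file on the classpath.
     */
    static List<String> discoverEndpointNames() {
        List<String> names = new ArrayList<>();
        for (EndpointSource source : ServiceLoader.load(EndpointSource.class)) {
            names.add(source.endpointName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Without any registered implementation the list is simply empty.
        System.out.println("discovered endpoints: " + discoverEndpointNames());
    }
}
```

An implementation becomes visible once its jar contains a file under META-INF/services named after the fully qualified interface and listing the implementing class, which is why adding a backend to the classpath can be enough to activate it.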