DCAT/RDF API Reference

This page documents the DCAT and RDF export functionality.

Overview

The DCAT module provides functionality to export MTNA RDS metadata as RDF graphs following the Data Catalog Vocabulary (DCAT) standard.

Main Class

class dartfx.mtnards.dcat.MtnaRdsDcat(server, datasets=None, catalog=None, data_product=None, uri_style=None)

Bases: object

DCAT exporter for MTNA RDS servers, catalogs, and data products.

__init__(server, datasets=None, catalog=None, data_product=None, uri_style=None)

Initializes the DCAT exporter.

Parameters:
  • server (MtnaRdsServer) – The MTNA RDS server to export from.

  • datasets (set[MtnaRdsDataProduct | str] | None) – A set of data products to include.

  • catalog (MtnaRdsCatalog | None) – Optional single catalog to include.

  • data_product (MtnaRdsDataProduct | None) – Optional single data product to include.

  • uri_style (str | None) – Optional URI generation style (currently unused by this exporter).

server: MtnaRdsServer
uri_style: str | None
catalogs: set[MtnaRdsCatalog]
datasets: set[MtnaRdsDataProduct]
graph()

Alias for get_graph() for backward compatibility with documentation.

get_prefixes_ttl(dataset)

add_catalog(catalog)

Adds a single catalog to the exporter.

add_catalogs(catalogs)

Adds catalogs to the exporter.

add_dataset(dataset)

Adds a single dataset to the exporter.

add_datasets(datasets)

Adds datasets to the exporter.

get_graph()

Generates the RDF graph for the server, catalogs, and datasets. The workflow is as follows:

  • Generate the DCAT graph.

  • Initialize the server catalog.

  • Initialize the catalogs.

  • Initialize the datasets.

  • Add the datasets to the graph.

  • Add the server catalog to the graph.

URI Styles

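The uri_style parameter names one of the following URI generation styles. The constructor documentation above describes uri_style as currently unused by this exporter, so these values may not yet change the generated URIs: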

  • "uuid" - UUID without namespace (e.g., 123e4567-e89b-12d3-a456-426614174000)

  • "uuid_urn" - UUID as URN (e.g., urn:uuid:123e4567-e89b-12d3-a456-426614174000)

  • "hostname" - Based on server hostname (e.g., https://rds.example.com/resource/123)
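The three styles can be sketched with Python's standard uuid module. The helper make_uri is hypothetical. The /resource/ path segment for the "hostname" style is taken only from the example above, so the exporter's actual URI construction may differ.

```python
import uuid


def make_uri(style: str, resource_id: uuid.UUID, host: str = "rds.example.com") -> str:
    """Hypothetical helper returning an identifier in one of the documented styles."""
    if style == "uuid":
        # Bare UUID string, no scheme or namespace.
        return str(resource_id)
    if style == "uuid_urn":
        # RFC 4122 URN form: urn:uuid:<uuid>.
        return resource_id.urn
    if style == "hostname":
        # Assumed pattern built from the server hostname.
        return f"https://{host}/resource/{resource_id}"
    raise ValueError(f"Unknown URI style: {style!r}")


rid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
for style in ("uuid", "uuid_urn", "hostname"):
    print(style, make_uri(style, rid))
```

A bare UUID has no scheme, so it is not an absolute IRI. When used as an RDF subject it would typically need a base URI to resolve against. The "uuid_urn" and "hostname" forms are absolute as written.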

Example Usage

Basic RDF Generation

from dartfx.mtnards import MtnaRdsServer
from dartfx.mtnards.dcat import MtnaRdsDcat

server = MtnaRdsServer(host="rds.highvaluedata.net")
catalog = server.catalogs['us-anes']
data_product = catalog.data_products_by_id['anes_1948']

# Create DCAT exporter
dcat = MtnaRdsDcat(
    server=server,
    catalog=catalog,
    data_product=data_product,
    uri_style="uuid"
)

# Generate RDF graph
graph = dcat.graph()

# Serialize to Turtle format
turtle = graph.serialize(format='turtle')
print(turtle)

Different Output Formats

# Turtle (readable text format)
turtle = graph.serialize(format='turtle')

# RDF/XML
xml = graph.serialize(format='xml')

# JSON-LD
jsonld = graph.serialize(format='json-ld')

# N-Triples
nt = graph.serialize(format='nt')

# N3
n3 = graph.serialize(format='n3')

Querying RDF Graphs

from rdflib import Namespace, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

# Query for all datasets
for subj in graph.subjects(RDF.type, DCAT.Dataset):
    title = graph.value(subj, DCT.title)
    description = graph.value(subj, DCT.description)
    print(f"Dataset: {title}")
    print(f"Description: {description}")
    print()

RDF Namespaces

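The RDF output uses standard namespaces, including these prefixes referenced on this page:

  • dcat - http://www.w3.org/ns/dcat#

  • dct - http://purl.org/dc/terms/

  • foaf - http://xmlns.com/foaf/0.1/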

DCAT Classes Used

The exporter generates instances of these DCAT classes:

  • dcat:Catalog - Represents the RDS catalog

  • dcat:Dataset - Represents individual data products

  • dcat:Distribution - Data access information

  • dcat:DataService - RDS server information

  • foaf:Agent - Publishers and creators

Properties

Common properties included in the RDF output:

  • dct:title - Dataset title

  • dct:description - Dataset description

  • dct:identifier - Dataset identifier

  • dct:issued - Release date

  • dct:modified - Last modification date

  • dct:publisher - Publishing organization

  • dcat:keyword - Keywords/tags

  • dcat:theme - Subject categories

  • dcat:contactPoint - Contact information

  • dcat:distribution - Links a dataset to its distributions, which carry the access URLs

Integration with RDFLib

The module uses RDFLib for RDF graph manipulation, so the exported graph can be extended with custom triples:

from rdflib import Graph, URIRef, Literal, Namespace

# Create custom graph
g = Graph()

# Add custom triples
CUSTOM = Namespace("http://example.org/ns#")
subject = URIRef("http://example.org/dataset/123")
g.add((subject, CUSTOM.customProperty, Literal("value")))

# Merge with DCAT graph
dcat_graph = dcat.graph()
dcat_graph += g

# Serialize merged graph
print(dcat_graph.serialize(format='turtle'))

Best Practices

Choosing URI Styles

  • Use "uuid_urn" for maximum interoperability

  • Use "hostname" for human-readable URIs

  • Use "uuid" for compact identifiers

Performance Considerations

  • RDF graph generation involves API calls to the RDS server

  • Results are cached where possible

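  • For large catalogs, consider generating RDF incrementally, for example by exporting data products in smaller batches rather than building one graph for the entire catalog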

Validation

Run basic sanity checks on the generated RDF:

# Check graph size
print(f"Triples: {len(graph)}")

# Verify key properties exist
from rdflib import Namespace, RDF
DCAT = Namespace("http://www.w3.org/ns/dcat#")

datasets = list(graph.subjects(RDF.type, DCAT.Dataset))
print(f"Datasets found: {len(datasets)}")