DCAT/RDF API Reference
This page documents the DCAT and RDF export functionality.
Overview
The DCAT module provides functionality to export MTNA RDS metadata as RDF graphs following the Data Catalog Vocabulary (DCAT) standard.
Main Class
- class dartfx.mtnards.dcat.MtnaRdsDcat(server, datasets=None, catalog=None, data_product=None, uri_style=None)
Bases: object
DCAT exporter for MTNA RDS servers, catalogs, and data products.
- __init__(server, datasets=None, catalog=None, data_product=None, uri_style=None)
Initializes the DCAT exporter.
- Parameters:
server (MtnaRdsServer) – The MTNA RDS server to export from.
datasets (set[MtnaRdsDataProduct | str] | None) – A set of data products to include.
catalog (MtnaRdsCatalog | None) – Optional single catalog to include.
data_product (MtnaRdsDataProduct | None) – Optional single data product to include.
uri_style (str | None) – Optional URI generation style (currently unused by this exporter).
- server: MtnaRdsServer
- catalogs: set[MtnaRdsCatalog]
- datasets: set[MtnaRdsDataProduct]
- graph()
Alias for get_graph(), retained for backward compatibility with earlier documentation.
URI Styles
The following URI generation styles are defined. Note that the uri_style parameter is documented above as currently unused by this exporter.
- "uuid" - UUID without namespace (e.g., 123e4567-e89b-12d3-a456-426614174000)
- "uuid_urn" - UUID as URN (e.g., urn:uuid:123e4567-e89b-12d3-a456-426614174000)
- "hostname" - Based on server hostname (e.g., https://rds.example.com/resource/123)
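To make the three styles concrete, here is a minimal standard-library sketch that builds each form from a UUID. The helper `make_uri`, its `hostname` default, and its `base_path` default of `/resource/` are hypothetical, modeled on the examples above; they are not part of `dartfx.mtnards`, and the package's own URI generation may differ.

```python
import uuid


def make_uri(style: str, resource_id: uuid.UUID,
             hostname: str = "rds.example.com",
             base_path: str = "/resource/") -> str:
    """Hypothetical helper illustrating the three URI styles listed above."""
    if style == "uuid":
        # Bare UUID string: no scheme, no namespace
        return str(resource_id)
    if style == "uuid_urn":
        # uuid.UUID.urn returns "urn:uuid:<uuid>"
        return resource_id.urn
    if style == "hostname":
        # HTTPS URI rooted at the (assumed) server hostname
        return f"https://{hostname}{base_path}{resource_id}"
    raise ValueError(f"unknown uri_style: {style!r}")


rid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
for style in ("uuid", "uuid_urn", "hostname"):
    print(style, make_uri(style, rid))
```

Of the three, `uuid_urn` and `hostname` yield absolute IRIs; the bare `uuid` form has no scheme, so RDF serializers and parsers will generally treat it as a relative reference.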
Example Usage
Basic RDF Generation
from dartfx.mtnards import MtnaRdsServer
from dartfx.mtnards.dcat import MtnaRdsDcat
server = MtnaRdsServer(host="rds.highvaluedata.net")
catalog = server.catalogs['us-anes']
data_product = catalog.data_products_by_id['anes_1948']
# Create DCAT exporter
dcat = MtnaRdsDcat(
server=server,
catalog=catalog,
data_product=data_product,
uri_style="uuid"
)
# Generate RDF graph
graph = dcat.graph()
# Serialize to Turtle format
turtle = graph.serialize(format='turtle')
print(turtle)
Different Output Formats
# Turtle (readable text format)
turtle = graph.serialize(format='turtle')
# XML/RDF
xml = graph.serialize(format='xml')
# JSON-LD
jsonld = graph.serialize(format='json-ld')
# N-Triples
nt = graph.serialize(format='nt')
# N3
n3 = graph.serialize(format='n3')
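To write every format above to disk in one pass, a small helper can pair each rdflib format name with a conventional file extension. The `save_all` function and the `EXTENSIONS` table are illustrative assumptions, not part of this package. The helper accepts any callable that behaves like rdflib's `Graph.serialize(format=...)` and returns a string, as it does in rdflib 6 and later.

```python
from pathlib import Path
from typing import Callable

# Conventional file extensions for the rdflib format names shown above
EXTENSIONS = {
    "turtle": ".ttl",
    "xml": ".rdf",
    "json-ld": ".jsonld",
    "nt": ".nt",
    "n3": ".n3",
}


def save_all(serialize: Callable[..., str], stem: Path) -> list[Path]:
    """Hypothetical helper: write one file per format, named stem + extension."""
    written = []
    for fmt, ext in EXTENSIONS.items():
        # Append the extension instead of using with_suffix(), so a stem
        # containing dots is kept whole
        path = stem.parent / f"{stem.name}{ext}"
        # serialize(format=...) must return the serialized document as text
        path.write_text(serialize(format=fmt), encoding="utf-8")
        written.append(path)
    return written
```

With an rdflib graph, a call such as `save_all(graph.serialize, Path("anes_1948"))` would produce `anes_1948.ttl`, `anes_1948.rdf`, and so on in the current directory.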
Querying RDF Graphs
from rdflib import Namespace, RDF
DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")
# Query for all datasets
for subj in graph.subjects(RDF.type, DCAT.Dataset):
title = graph.value(subj, DCT.title)
description = graph.value(subj, DCT.description)
print(f"Dataset: {title}")
print(f"Description: {description}")
print()
RDF Namespaces
The DCAT exporter uses the following standard namespaces:
DCAT: http://www.w3.org/ns/dcat# - Data Catalog Vocabulary
DCT: http://purl.org/dc/terms/ - Dublin Core Terms
FOAF: http://xmlns.com/foaf/0.1/ - Friend of a Friend
RDF: http://www.w3.org/1999/02/22-rdf-syntax-ns# - RDF Syntax
RDFS: http://www.w3.org/2000/01/rdf-schema# - RDF Schema
XSD: http://www.w3.org/2001/XMLSchema# - XML Schema
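Prefixed names such as `dcat:Dataset` in the sections that follow are shorthand for a full IRI formed by concatenating a namespace with a local name. The sketch below shows that expansion with a plain dictionary of the namespaces listed above; `PREFIXES` and `expand` are illustrative names, not part of the exporter. rdflib's `Namespace` objects perform the same concatenation, which is why `DCAT.Dataset` works in the querying example.

```python
# Namespaces used by the DCAT exporter, keyed by conventional prefix
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name like 'dcat:Dataset' into a full IRI string."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("dcat:Dataset"))  # http://www.w3.org/ns/dcat#Dataset
print(expand("dct:title"))     # http://purl.org/dc/terms/title
```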
DCAT Classes Used
The exporter generates instances of these DCAT classes:
- dcat:Catalog - Represents the RDS catalog
- dcat:Dataset - Represents individual data products
- dcat:Distribution - Data access information
- dcat:DataService - RDS server information
- foaf:Agent - Publishers and creators
Properties
Common properties included in the RDF output:
- dct:title - Dataset title
- dct:description - Dataset description
- dct:identifier - Dataset identifier
- dct:issued - Release date
- dct:modified - Last modification date
- dct:publisher - Publishing organization
- dcat:keyword - Keywords/tags
- dcat:theme - Subject categories
- dcat:contactPoint - Contact information
- dcat:distribution - Access URLs
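To show how these classes and properties fit together, the sketch below hand-writes N-Triples for a single dataset using `rdf:type`, `dct:title`, `dct:identifier`, and the repeatable `dcat:keyword`. It uses only the standard library and a hypothetical `dataset_triples` function; the exporter itself builds its graph with RDFLib, and its real output may use different or additional properties. The title and keyword values are placeholders.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def literal(text: str) -> str:
    """Quote a plain string literal using N-Triples escaping rules."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def dataset_triples(uri: str, title: str, identifier: str,
                    keywords: list[str]) -> list[str]:
    """Hypothetical example: describe one dcat:Dataset as N-Triples lines."""
    s = f"<{uri}>"
    lines = [
        f"{s} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"{s} <{DCT}title> {literal(title)} .",
        f"{s} <{DCT}identifier> {literal(identifier)} .",
    ]
    # dcat:keyword is repeatable: one triple per keyword
    lines += [f"{s} <{DCAT}keyword> {literal(k)} ." for k in keywords]
    return lines


for line in dataset_triples("https://rds.example.com/resource/123",
                            "Example dataset", "anes_1948",
                            ["survey", "example"]):
    print(line)
```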
Integration with RDFLib
The module uses RDFLib for RDF graph manipulation:
from rdflib import Graph, URIRef, Literal, Namespace
# Create custom graph
g = Graph()
# Add custom triples
CUSTOM = Namespace("http://example.org/ns#")
subject = URIRef("http://example.org/dataset/123")
g.add((subject, CUSTOM.customProperty, Literal("value")))
# Merge with DCAT graph
dcat_graph = dcat.graph()
dcat_graph += g
# Serialize merged graph
print(dcat_graph.serialize(format='turtle'))
Best Practices
Choosing URI Styles
- Use "uuid_urn" for maximum interoperability
- Use "hostname" for human-readable URIs
- Use "uuid" for compact identifiers
Performance Considerations
RDF graph generation involves API calls to the RDS server
Results are cached where possible
For large catalogs, consider generating RDF incrementally, for example one data product at a time via the data_product parameter
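One way to bound memory on a large catalog is to serialize and write each item's triples as soon as they are produced, instead of accumulating everything in a single graph. The generic sketch below assumes a caller-supplied `to_ntriples` callable that returns N-Triples text for one item (for instance, a function you write that runs a per-product `MtnaRdsDcat(server, data_product=...)` export and calls `serialize(format='nt')` on the result); `write_incrementally` is a hypothetical helper, not part of this package. N-Triples is used because it is line-based, so independently serialized chunks can simply be concatenated.

```python
from typing import Callable, Iterable, TextIO, TypeVar

T = TypeVar("T")


def write_incrementally(items: Iterable[T], to_ntriples: Callable[[T], str],
                        out: TextIO) -> int:
    """Write N-Triples for each item to `out` as it is produced.

    Returns the number of items written. Caveat: blank-node labels are
    scoped to one N-Triples document, so separate chunks that happen to
    reuse a label (e.g. _:b0) would be merged once concatenated.
    """
    count = 0
    for item in items:
        chunk = to_ntriples(item)
        # Keep one triple per line across chunk boundaries
        if chunk and not chunk.endswith("\n"):
            chunk += "\n"
        out.write(chunk)
        count += 1
    return count
```

Assuming `catalog.data_products_by_id` is a mapping, as its use in the basic example suggests, the items could be `catalog.data_products_by_id.values()` and `out` an open text file.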
Validation
Validate generated RDF:
# Check graph size
print(f"Triples: {len(graph)}")
# Verify key properties exist
from rdflib import RDF, Namespace
DCAT = Namespace("http://www.w3.org/ns/dcat#")
datasets = list(graph.subjects(RDF.type, DCAT.Dataset))
print(f"Datasets found: {len(datasets)}")
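The checks above only count datasets. A slightly stricter check also confirms that every `dcat:Dataset` carries a `dct:title`. The standard-library sketch below works on N-Triples text, such as the output of `graph.serialize(format='nt')` in rdflib 6 or later. The `missing_titles` name and the choice of `dct:title` as the required property are illustrative assumptions, not rules enforced by the exporter.

```python
RDF_TYPE_TERM = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
DATASET_TERM = "<http://www.w3.org/ns/dcat#Dataset>"
TITLE_TERM = "<http://purl.org/dc/terms/title>"


def missing_titles(ntriples: str) -> set[str]:
    """Return subjects typed dcat:Dataset that have no dct:title triple.

    Assumes one triple per line with whitespace between terms, as common
    serializers emit; IRIs and blank-node labels contain no whitespace.
    """
    datasets, titled = set(), set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subj, pred, rest = line.split(None, 2)
        obj = rest.rstrip()[:-1].rstrip()  # drop the terminating "."
        if pred == RDF_TYPE_TERM and obj == DATASET_TERM:
            datasets.add(subj)
        elif pred == TITLE_TERM:
            titled.add(subj)
    return datasets - titled
```

An empty result from `missing_titles(graph.serialize(format='nt'))` means every dataset in the graph has at least one title.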