Pydantic ↔ RDF Integration
The dartfx.rdf.pydantic module adds a thin mixin that lets you
annotate Pydantic models with RDF metadata, build rdflib.Graph instances, and
reconstruct the models from existing graphs. This page walks through the most
important building blocks and patterns.
Quick start
- Import RdfBaseModel and RdfProperty.
- Define a namespace for your resources and declare any prefixes you want to be emitted in the resulting graph.
- Annotate each serialisable field with an RDF predicate.
from typing import Annotated, List

from rdflib import Namespace, URIRef

from dartfx.rdf.pydantic import RdfBaseModel, RdfProperty

EX = Namespace("https://example.org/ns/")


class Organisation(RdfBaseModel):
    # Model-level RDF configuration
    rdf_type = EX.Organisation
    rdf_namespace = EX
    rdf_prefixes = {"ex": EX}

    # `id` feeds the subject URI; the annotated fields become triples
    id: str
    name: Annotated[str, RdfProperty(EX.name)]
    homepage: Annotated[URIRef, RdfProperty(EX.homepage)]
    keywords: Annotated[List[str], RdfProperty(EX.keyword)]


org = Organisation(
    id="toolkit",
    name="RDF Toolkit",
    homepage=URIRef("https://example.org/toolkit"),
    keywords=["python", "metadata"],
)

turtle = org.to_rdf(format="turtle")
RdfBaseModel takes care of creating a subject identifier, emitting RDF
triples for every annotated field, and binding the default prefixes. The graph
returned by to_rdf_graph() can be
serialised in any format supported by rdflib.
Mapping rules
- The predicate argument on RdfProperty can be either a full rdflib.term.URIRef or a string. Strings will be coerced into URIRef instances at runtime.
- A model-level rdf_type constant adds rdf:type triples for every instance.
- If rdf_namespace is defined and the model exposes an id value (or the field configured via rdf_id_field), the identifier is appended to the namespace. Absolute identifiers, such as UUID URNs or HTTP URLs, are used as provided.
- Lists of annotated fields are emitted as repeated predicate/object pairs. The same applies to nested RdfBaseModel subclasses, which are recursively serialised.
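To make the "repeated predicate/object pairs" rule concrete, here is a minimal, library-independent sketch. It is not the toolkit's implementation: triples are modelled as plain tuples of strings instead of rdflib terms, and the helper name `triples_for` is hypothetical.

```python
from __future__ import annotations

from typing import Any

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triples_for(
    subject: str,
    rdf_type: str | None,
    fields: dict[str, Any],
) -> list[tuple[str, str, Any]]:
    """Flatten predicate -> value pairs into (subject, predicate, object) tuples."""
    triples: list[tuple[str, str, Any]] = []
    if rdf_type is not None:
        # A model-level type contributes one rdf:type triple per instance.
        triples.append((subject, RDF_TYPE, rdf_type))
    for predicate, value in fields.items():
        # A list becomes one triple per item, all sharing the same predicate.
        items = value if isinstance(value, list) else [value]
        for item in items:
            triples.append((subject, predicate, item))
    return triples


subject = "https://example.org/ns/toolkit"
triples = triples_for(
    subject,
    "https://example.org/ns/Organisation",
    {
        "https://example.org/ns/name": "RDF Toolkit",
        "https://example.org/ns/keyword": ["python", "metadata"],
    },
)
```

With the values above, the two keywords produce two separate triples with the same subject and predicate, in addition to the rdf:type and name triples.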
Reading data back
Instances can be rehydrated from either a graph object or a textual serialisation.
clone = Organisation.from_rdf(turtle, format="turtle")
assert clone == org
When a model sets rdf_type the parser uses it to locate the correct subject
in the graph. Otherwise it expects the graph to contain exactly one subject and
raises an error if there are multiple candidates. You can always bypass the
heuristics by passing the subject keyword argument to
from_rdf_graph() or
from_rdf().
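The subject lookup described above can be sketched without the library. This is an illustration only: the graph is a list of string tuples, `find_subject` is a hypothetical helper, and raising when several subjects share the requested type is an assumption of this sketch, not documented behaviour.

```python
from __future__ import annotations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = tuple[str, str, str]


def find_subject(
    triples: list[Triple],
    rdf_type: str | None = None,
    subject: str | None = None,
) -> str:
    """Pick the subject to rehydrate from a set of triples."""
    if subject is not None:
        # An explicit subject bypasses every heuristic.
        return subject
    if rdf_type is not None:
        # Prefer subjects that carry the model's rdf:type.
        candidates = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    else:
        # Without a type, every subject in the graph is a candidate.
        candidates = {s for s, _, _ in triples}
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one subject, found {len(candidates)}")
    return next(iter(candidates))


graph = [
    ("ex:toolkit", RDF_TYPE, "ex:Organisation"),
    ("ex:toolkit", "ex:name", "RDF Toolkit"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
]
```

Here `find_subject(graph, rdf_type="ex:Organisation")` selects `ex:toolkit`, while calling it with neither `rdf_type` nor `subject` raises because the graph contains two subjects.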
Custom datatypes
RdfProperty accepts an optional datatype parameter to fine-tune literal
serialisation. Datatypes may be defined as strings, namespace terms, or full
URIRef instances.
URIs need particular care, because they can be emitted either as resource identifiers or as typed literals:
- Resource identifiers: use rdflib.URIRef as the field type. The toolkit will ensure these are emitted as URI nodes in the graph.
- XSD.anyURI literals: use str (or Pydantic's AnyUrl) and set datatype=XSD.anyURI. This emits a literal with an explicit datatype.
from pydantic import AnyUrl
# rdflib exposes the schema.org namespace as SDO; alias it to keep the names below readable
from rdflib import SDO as SCHEMA, XSD, URIRef


class Dataset(RdfBaseModel):
    rdf_type = EX.Dataset
    rdf_namespace = EX

    id: str
    created: Annotated[str, RdfProperty(EX.created, datatype=XSD.date)]

    # Serialized as a URI Resource
    see_also: Annotated[URIRef | None, RdfProperty(SCHEMA.seeAlso)] = None

    # Serialized as "..."^^xsd:anyURI
    download_url: Annotated[AnyUrl | None, RdfProperty(SCHEMA.downloadUrl, datatype=XSD.anyURI)] = None


dataset = Dataset(
    id="demo",
    created="2024-03-01",
    see_also=URIRef("https://example.org/docs"),
    download_url="https://example.org/files/data.zip",
)

graph = dataset.to_rdf_graph()
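The difference between the two URI representations is easiest to see in N-Triples syntax. The sketch below only formats strings and does not use the toolkit or rdflib; the helper names `as_uri_node` and `as_typed_literal` are hypothetical.

```python
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"


def as_uri_node(value: str) -> str:
    """Render a value as an N-Triples IRI term (a resource)."""
    return f"<{value}>"


def as_typed_literal(value: str, datatype: str) -> str:
    """Render a value as an N-Triples literal with an explicit datatype."""
    # Escape backslashes and double quotes, as required inside a quoted literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


resource = as_uri_node("https://example.org/docs")
literal = as_typed_literal("https://example.org/files/data.zip", XSD_ANYURI)
```

A resource can be the subject of further triples, whereas the typed literal is only a value; that is the practical reason to choose one field type over the other.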
Custom serialisation hooks
When you need more control, RdfProperty allows you to pass serializer
and parser callables. serializer receives the field value and must
return an rdflib node; parser runs during deserialisation and receives
whatever node was found in the graph.
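from rdflib import Literal


def to_uppercase(value: str) -> Literal:
    # Serializers must return an rdflib node, so wrap the result in a Literal.
    return Literal(value.upper())


def parse_lower(node) -> str:
    # Parsers receive the node found in the graph and return the field value.
    return str(node).lower()


class TaggedConcept(RdfBaseModel):
    rdf_type = EX.Concept
    rdf_namespace = EX

    id: str
    label: Annotated[
        str,
        RdfProperty(EX.label, serializer=to_uppercase, parser=parse_lower),
    ]


concept = TaggedConcept(id="term", label="Toolkit")
round_trip = TaggedConcept.from_rdf(concept.to_rdf())
# Stored as "TOOLKIT" in the graph, lower-cased again on the way back
assert round_trip.label == "toolkit"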
Advanced scenarios
- Override rdf_id_field if your identifier lives on a different field name.
- Supply rdf_prefixes to bind additional prefixes on the emitted graph.
- Set base_uri when serialising or parsing if you want generated identifiers to be relative to an external namespace instead of rdf_namespace.
The tests in tests.test_pydantic_rdf provide additional examples that
cover nested resources, optional values, and custom datatypes.
Subject URI generation
By default, RdfBaseModel delegates subject URI
creation to an RdfUriGenerator, a simple
typing.Protocol satisfied by any callable with the signature:
(model: RdfBaseModel, *, base_uri: str | None = None) -> URIRef | BNode
The default strategy is DefaultUriGenerator,
which resolves a subject in the following order:
1. If rdf_id_field is set and non-None: build a URI from the value (prepend namespace / base_uri, or use as-is if already an absolute URI).
2. If no identifier: mint a UUID URI (auto_uuid=True).
3. If auto_uuid=False: return a BNode.
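The resolution order above can be sketched with the standard library alone. This is not the code of DefaultUriGenerator: URIs are plain strings, `None` stands in for a BNode, the `urn:uuid:` form of the minted URI is an assumption, and so is the handling of a relative identifier when no namespace is available. The function name `resolve_subject` is hypothetical.

```python
from __future__ import annotations

import uuid
from urllib.parse import urlparse


def is_absolute(identifier: str) -> bool:
    """Treat anything with a URI scheme (https:, urn:, ...) as absolute."""
    return bool(urlparse(identifier).scheme)


def resolve_subject(
    identifier: str | None,
    namespace: str | None = None,
    base_uri: str | None = None,
    auto_uuid: bool = True,
) -> str | None:
    """Return a subject URI string, or None where a BNode would be used."""
    if identifier is not None:
        if is_absolute(identifier):
            return identifier  # step 1: absolute identifiers are used as-is
        prefix = base_uri or namespace  # base_uri overrides the model namespace
        # Assumption: with no prefix at all, the identifier is returned unchanged.
        return f"{prefix}{identifier}" if prefix else identifier
    if auto_uuid:
        return uuid.uuid4().urn  # step 2: mint a fresh UUID URI
    return None  # step 3: anonymous resource
```

For example, `resolve_subject("toolkit", namespace="https://example.org/ns/")` yields `https://example.org/ns/toolkit`, while `resolve_subject(None, auto_uuid=False)` yields `None`.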
Note
Why auto_uuid=True is the default
Strictly speaking, an anonymous resource should be a Blank Node. However, UUID URIs are the practical default because they:
- travel across graph boundaries (BNodes cannot),
- survive round-trips through parse/serialise cycles, and
- never collide when two graphs are merged.
Use DefaultUriGenerator(auto_uuid=False) when you explicitly want
anonymous, locally-scoped resources (e.g. reified statements).
Replacing the default generator
Assign any RdfUriGenerator to the
rdf_uri_generator field, either at the class level (as a default for
all instances) or at the instance level (to override per object):
from dartfx.rdf.pydantic import RdfBaseModel, DefaultUriGenerator


# Class-level: all instances use BNodes unless they have an id
class Statement(RdfBaseModel):
    rdf_uri_generator = DefaultUriGenerator(auto_uuid=False)
    ...


# Instance-level: one specific object gets a custom generator
person = Person(
    id="alice",
    rdf_uri_generator=lambda model, *, base_uri=None: EX[type(model).__name__],
)
You can also pass a generator at the call site, which takes priority over the instance-level generator:
graph = person.to_rdf_graph(rdf_uri_generator=my_call_site_generator)
Built-in generators
The _uri_generators module provides four
ready-to-use implementations beyond DefaultUriGenerator.
All are exported from dartfx.rdf.pydantic.
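| Generator | Use when… |
|---|---|
| TemplateUriGenerator | The URI shape is known and model fields supply the parts. |
| HashUriGenerator | No stable id; need deterministic, content-addressable URIs. |
| CompositeUriGenerator | Multiple strategies needed with a clear priority order. |
| PrefixedUriGenerator | Lightest option: just a fixed prefix plus the value of a single field. |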
TemplateUriGenerator
Builds URIs from a Python format-string where {field_name} placeholders
are replaced by model field values. Returns a BNode if a
required field is None.
from dartfx.rdf.pydantic import TemplateUriGenerator


class Dataset(RdfBaseModel):
    rdf_type = EX.Dataset
    rdf_uri_generator = TemplateUriGenerator(
        "https://example.org/datasets/{year}/{slug}"
    )

    year: int | None = None
    slug: str | None = None


ds = Dataset(year=2024, slug="climate")
# Subject: <https://example.org/datasets/2024/climate>
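As a rough illustration of the placeholder substitution and the "missing field" fallback, here is a standard-library sketch. It is not TemplateUriGenerator itself: field values come from a plain dict, `None` stands in for a BNode, values are inserted without percent-encoding, and `render_template` is a hypothetical name.

```python
from __future__ import annotations

from string import Formatter


def render_template(template: str, values: dict[str, object]) -> str | None:
    """Fill {field} placeholders, or return None if any required field is missing."""
    # Formatter().parse yields (literal, field_name, format_spec, conversion) tuples.
    names = [name for _, name, _, _ in Formatter().parse(template) if name]
    if any(values.get(name) is None for name in names):
        return None  # a BNode would be returned here
    return template.format(**{name: values[name] for name in names})


template = "https://example.org/datasets/{year}/{slug}"
full = render_template(template, {"year": 2024, "slug": "climate"})
partial = render_template(template, {"year": 2024, "slug": None})
```

`full` is the complete dataset URI, while `partial` is `None` because the `slug` placeholder cannot be filled.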
HashUriGenerator
Produces a deterministic URI by hashing the concatenated values of specified model fields. Useful for deduplication across separate serialisations.
from dartfx.rdf.pydantic import HashUriGenerator


class Publication(RdfBaseModel):
    rdf_uri_generator = HashUriGenerator(
        namespace="https://example.org/pub/",
        fields=["doi", "title"],
        algorithm="sha256",  # default
    )

    doi: str | None = None
    title: str | None = None


pub = Publication(doi="10.1234/ex", title="My Paper")
# Subject: <https://example.org/pub/<sha256-digest>>
The hash is computed over "|".join(str(v) for v in fields if v is not None).
Returns a BNode if all specified fields are None.
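The hashing rule above can be reproduced with hashlib. This sketch follows the documented join expression, but the UTF-8 encoding and the hexadecimal digest are assumptions, `None` stands in for a BNode, and `hash_uri` is a hypothetical helper rather than part of the toolkit.

```python
from __future__ import annotations

import hashlib


def hash_uri(
    namespace: str,
    values: list[object],
    algorithm: str = "sha256",
) -> str | None:
    """Build a deterministic URI from the non-None values, or None if all are None."""
    present = [str(v) for v in values if v is not None]
    if not present:
        return None  # a BNode would be returned here
    payload = "|".join(present).encode("utf-8")  # assumed encoding
    digest = hashlib.new(algorithm, payload).hexdigest()  # assumed hex form
    return f"{namespace}{digest}"


ns = "https://example.org/pub/"
first = hash_uri(ns, ["10.1234/ex", "My Paper"])
second = hash_uri(ns, ["10.1234/ex", "My Paper"])
```

Because the digest depends only on the field values, `first` and `second` are identical, which is what makes such URIs usable for deduplication across separate serialisations.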
CompositeUriGenerator
Tries a sequence of generators in order and returns the result of the first
one that produces a URIRef. Falls back to
BNode if all generators fail.
from dartfx.rdf.pydantic import (
    CompositeUriGenerator,
    DefaultUriGenerator,
    HashUriGenerator,
)

gen = CompositeUriGenerator(
    DefaultUriGenerator(auto_uuid=False),  # use id if set, else try next
    HashUriGenerator("https://example.org/h/", ["title"]),
)


class Article(RdfBaseModel):
    rdf_uri_generator = gen

    id: str | None = None
    title: str | None = None
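The "first generator that produces a URI wins" behaviour is a simple chain. The sketch below shows the pattern with plain functions over dicts; it is not CompositeUriGenerator, `None` stands in for a BNode, and the names `compose`, `by_id`, and `by_title` are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

Generator = Callable[[dict], Optional[str]]


def compose(*generators: Generator) -> Generator:
    """Return a generator that tries each strategy in order."""

    def composite(model: dict) -> str | None:
        for generate in generators:
            result = generate(model)
            if result is not None:
                return result  # first strategy that yields a URI wins
        return None  # every strategy declined: fall back to a BNode

    return composite


def by_id(model: dict) -> str | None:
    ident = model.get("id")
    return f"https://example.org/ns/{ident}" if ident is not None else None


def by_title(model: dict) -> str | None:
    title = model.get("title")
    return f"https://example.org/h/{title}" if title is not None else None


gen = compose(by_id, by_title)
```

A model with an `id` is named by the first strategy, a model with only a `title` falls through to the second, and a model with neither ends up anonymous.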
PrefixedUriGenerator
The simplest option: concatenates a fixed prefix with the value of a single model field.
from dartfx.rdf.pydantic import PrefixedUriGenerator


class Concept(RdfBaseModel):
    rdf_uri_generator = PrefixedUriGenerator(
        prefix="https://vocab.example.org/concepts/",
        field="code",
    )

    code: str | None = None
    label: str | None = None


c = Concept(code="001", label="Agriculture")
# Subject: <https://vocab.example.org/concepts/001>
Returns a BNode when the field value is None.
Custom generators
Any callable with the right signature qualifies:
from rdflib import URIRef, BNode

from dartfx.rdf.pydantic import RdfBaseModel, RdfUriGenerator


def my_generator(
    model: RdfBaseModel,
    *,
    base_uri: str | None = None,
) -> URIRef | BNode:
    return EX[f"{type(model).__name__}/{model.id}"]


assert isinstance(my_generator, RdfUriGenerator)  # True (the protocol is runtime-checkable)


# Or as a class with __call__:
class MyGenerator:
    def __call__(
        self,
        model: RdfBaseModel,
        *,
        base_uri: str | None = None,
    ) -> URIRef | BNode:
        ...
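It helps to know what a runtime-checkable protocol actually verifies. The sketch below defines a stand-in protocol named `UriGenerator` (hypothetical, and not the library's RdfUriGenerator) to show that isinstance only checks that the object has a `__call__` member; it does not validate the signature.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class UriGenerator(Protocol):
    """Stand-in callback protocol with the same call shape as in the text."""

    def __call__(self, model: object, *, base_uri: str | None = None) -> str: ...


def by_class_name(model: object, *, base_uri: str | None = None) -> str:
    prefix = base_uri or "https://example.org/ns/"
    return f"{prefix}{type(model).__name__}"


def wrong_signature() -> str:
    return "not a generator"


# Any callable passes, because only the presence of __call__ is checked.
matches_function = isinstance(by_class_name, UriGenerator)
matches_wrong_signature = isinstance(wrong_signature, UriGenerator)
# A non-callable object fails the check.
matches_string = isinstance("plain string", UriGenerator)
```

Since `wrong_signature` also passes the isinstance check, a static type checker or a small test call remains the more reliable way to confirm that a custom generator accepts `model` and `base_uri`.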