DDI-Codebook to DDI-CDI CDIF Mappings
Overview
This document describes the mappings implemented in the codebook_to_cdif method that converts DDI-Codebook (version 2.5) metadata into DDI-CDI (version 1.0) CDIF profile resources.
The conversion process transforms DDI-Codebook elements into a dictionary of DDI-CDI resources that can be serialized to RDF or other formats. The mapping follows the Cross Domain Integration Framework (CDIF) profile specifications.
General Approach
Each DDI-Codebook resource is assigned a UUID-based identifier
Codebook IDs are preserved as non-DDI identifiers with type "ddi-codebook"
The method supports two modes for representing value domains:
SKOS mode (use_skos=True): Uses SKOS ConceptSchemes and Concepts
Standard mode (use_skos=False): Uses DDI-CDI Code, CodeList, Category, and CategorySet
Variable-Level Mappings
DDI-Codebook Variable → DDI-CDI InstanceVariable
For each variable (var) in the codebook:
| Source (DDI-Codebook) | Target (DDI-CDI) | Notes |
|---|---|---|
|  |  | Used as part of the unique identifier |
|  |  | Preserved with type |
|  |  | Set via |
|  |  | Set via |
Value Domain Mappings
Variables with categories are mapped to value domains. The mapping depends on whether categories are substantive (data values) or sentinel (missing values).
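As a rough illustration of the rule above, the sketch below counts missing and non-missing categories and reports which value domains a variable would need. The `CodebookCategory` dataclass and the `domains_for` helper are hypothetical stand-ins, not part of the library; only the `is_missing` flag and the two "greater than zero" conditions come from this document.

```python
from dataclasses import dataclass


@dataclass
class CodebookCategory:
    """Hypothetical stand-in for a DDI-Codebook catgry element."""
    value: str
    is_missing: bool = False


def domains_for(categories: list) -> list:
    """Return the kinds of value domain needed for these categories."""
    n_missing = sum(1 for c in categories if c.is_missing)
    n_non_missing = len(categories) - n_missing

    domains = []
    if n_non_missing > 0:  # mirrors var.n_non_missing_catgry > 0
        domains.append("SubstantiveValueDomain")
    if n_missing > 0:  # mirrors var.n_missing_catgry > 0
        domains.append("SentinelValueDomain")
    return domains


cats = [
    CodebookCategory("1"),
    CodebookCategory("2"),
    CodebookCategory("-9", is_missing=True),
]
print(domains_for(cats))  # ['SubstantiveValueDomain', 'SentinelValueDomain']
```

A variable with no categories yields an empty list, so no value domain would be created for it under this reading.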
Substantive Value Domain (Non-Missing Categories)
Created when var.n_non_missing_catgry > 0
Standard Mode (use_skos=False)
Mapping hierarchy:
DDI-Codebook Variable (with non-missing categories)
↓
DDI-CDI SubstantiveValueDomain
↓ (takesValuesFrom)
DDI-CDI CodeList
↓ (has CategorySet)
DDI-CDI CategorySet
| Source | Target | Relationship |
|---|---|---|
|  |  | Created with |
| SubstantiveValueDomain | InstanceVariable | Relationship: |
| SubstantiveValueDomain | CodeList | Relationship: |
| CodeList | CategorySet | Relationship: |
SKOS Mode (use_skos=True)
Mapping hierarchy:
DDI-Codebook Variable (with non-missing categories)
↓
DDI-CDI SubstantiveValueDomain
↓ (takesValuesFrom)
SKOS ConceptScheme
↓ (hasTopConcept)
SKOS Concept(s)
| Source | Target | Notes |
|---|---|---|
|  |  | Created with |
| SubstantiveValueDomain | SKOS ConceptScheme | URI: |
Sentinel Value Domain (Missing Categories)
Created when var.n_missing_catgry > 0
Standard Mode (use_skos=False)
Mapping hierarchy:
DDI-Codebook Variable (with missing categories)
↓
DDI-CDI SentinelValueDomain
↓ (takesValuesFrom)
DDI-CDI CodeList (sentinel)
↓ (has CategorySet)
DDI-CDI CategorySet (sentinel)
| Source | Target | Relationship |
|---|---|---|
|  |  | Created with |
| SentinelValueDomain | InstanceVariable | Relationship: |
| SentinelValueDomain | CodeList | Relationship: |
| CodeList | CategorySet | Relationship: |
SKOS Mode (use_skos=True)
Mapping hierarchy:
DDI-Codebook Variable (with missing categories)
↓
DDI-CDI SentinelValueDomain
↓ (takesValuesFrom)
SKOS ConceptScheme (sentinel)
↓ (hasTopConcept)
SKOS Concept(s)
| Source | Target | Notes |
|---|---|---|
|  |  | Created with |
| SentinelValueDomain | SKOS ConceptScheme | URI: |
Category and Code Mappings
For each category (catgry) within a variable:
Standard Mode (use_skos=False)
Mapping hierarchy:
DDI-Codebook catgry
↓
DDI-CDI Code ← (usesNotation) ← Notation
↓ (denotes)
DDI-CDI Category
| Source (DDI-Codebook) | Target (DDI-CDI) | Notes |
|---|---|---|
|  |  | Sanitized and URL-encoded as |
|  | Non-DDI identifier on Code | Type: |
|  |  | If label exists; otherwise uses |
|  |  | Set via |
| Code | Notation | Relationship: |
| Code | Category | Relationship: |
| Notation | Category | Relationship: |
Code-Category Distribution:
If catgry.is_missing == False: Added to substantive CodeList and CategorySet
If catgry.is_missing == True: Added to sentinel CodeList and CategorySet
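The distribution rule, together with the label fallback described in the table above, can be sketched as follows. This is a minimal illustration rather than the library's code: the `Catgry` dataclass and the `distribute` function are hypothetical, and plain `(code, label)` tuples stand in for the Code and Category resources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Catgry:
    """Hypothetical stand-in for a DDI-Codebook catgry element."""
    value: str
    label: Optional[str] = None
    is_missing: bool = False


def distribute(categories):
    """Split categories into substantive and sentinel (code, label) pairs."""
    substantive, sentinel = [], []
    for cat in categories:
        # Label fallback: use the code value when no label is present.
        label = cat.label if cat.label else cat.value
        # Missing categories go to the sentinel list, all others to the substantive one.
        target = sentinel if cat.is_missing else substantive
        target.append((cat.value, label))
    return substantive, sentinel


substantive, sentinel = distribute([
    Catgry("1", "Yes"),
    Catgry("2"),  # no label, so the code value is reused
    Catgry("-9", "Refused", is_missing=True),
])
print(substantive)  # [('1', 'Yes'), ('2', '2')]
print(sentinel)     # [('-9', 'Refused')]
```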
SKOS Mode (use_skos=True)
Mapping hierarchy:
DDI-Codebook catgry
↓
SKOS Concept
| Source (DDI-Codebook) | Target (SKOS) | Notes |
|---|---|---|
|  |  | Added via |
|  |  | Added via |
| Concept | ConceptScheme | Relationship: |
Concept URI Format:
{base_uuid}_Concept_{var.id}_{code_value_uid}
Where code_value_uid is the URL-encoded, sanitized version of catValu.
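A small sketch of how such a URI could be assembled is shown below. The function names are hypothetical, and the order of the sanitization steps (spaces replaced with underscores first, then percent-encoding) is an assumption; the document only states that both steps happen.

```python
import uuid
from urllib.parse import quote


def code_value_uid(cat_value: str) -> str:
    """Make a category value safe for use inside a URI."""
    # Assumed order: replace spaces with underscores, then percent-encode.
    return quote(cat_value.replace(" ", "_"), safe="")


def concept_uri(base_uuid: str, var_id: str, cat_value: str) -> str:
    """Build a Concept URI following the documented format."""
    return f"{base_uuid}_Concept_{var_id}_{code_value_uid(cat_value)}"


base_uuid = str(uuid.uuid4())  # one UUID shared by the whole conversion
print(concept_uri(base_uuid, "V12", "not applicable"))
```

With `safe=""`, characters such as `/` are percent-encoded as well, so a value like `a/b` cannot introduce an extra path segment.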
Dataset and Structure Mappings
For each file description (fileDscr) in the codebook:
DDI-Codebook fileDscr → DDI-CDI DataSet
| Source (DDI-Codebook) | Target (DDI-CDI) | Notes |
|---|---|---|
|  |  | Used as part of unique identifier |
|  |  | Preserved with type |
DDI-Codebook fileDscr → DDI-CDI LogicalRecord
| Source (DDI-Codebook) | Target (DDI-CDI) | Notes |
|---|---|---|
|  |  | Used as part of unique identifier |
|  |  | Preserved with type |
| LogicalRecord | DataSet | Relationship: |
| LogicalRecord | InstanceVariable(s) | Relationship: |
DDI-Codebook → DDI-CDI DataStructure
| Source (DDI-Codebook) | Target (DDI-CDI) | Notes |
|---|---|---|
|  |  | Uses codebook ID, not file ID |
|  |  | Preserved with type |
| DataStructure | DataSet | Relationship: |
Variable Positioning in DataStructure
For each variable in a file:
| Attribute | Value / Notes |
|---|---|
| Position | Sequential (0, 1, 2…) - Zero-based order within file |
| ComponentPosition | Created for each variable to track its ordinal sequence in the data structure |
Mapping hierarchy:
DataStructure
↓ (has_ComponentPosition)
ComponentPosition (value = pos)
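The zero-based numbering can be illustrated with `enumerate`. The `ComponentPosition` dataclass below is a simplified, hypothetical stand-in for the DDI-CDI class, and `variable_id` is an illustrative field name.

```python
from dataclasses import dataclass


@dataclass
class ComponentPosition:
    """Simplified, hypothetical stand-in for the DDI-CDI ComponentPosition."""
    value: int
    variable_id: str


def positions_for(variable_ids):
    """Create one ComponentPosition per variable, in file order."""
    return [
        ComponentPosition(value=pos, variable_id=var_id)
        for pos, var_id in enumerate(variable_ids)  # enumerate starts at 0
    ]


for cp in positions_for(["V1", "V2", "V3"]):
    print(cp.value, cp.variable_id)
```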
Resource Organization
All created resources are stored in a dictionary with their URI as the key:
{
    "uri1": InstanceVariable,
    "uri2": SubstantiveValueDomain,
    "uri3": CodeList,
    "uri4": Category,
    ...
}
This structure allows for:
Efficient lookup by URI
Easy serialization to RDF via add_to_rdf_graph()
Preservation of all relationships between resources
Identifier Strategy
UUID Generation
A single base_uuid is generated for the entire conversion
All resource IDs use this base UUID with unique suffixes
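The idea of one shared base UUID with per-resource suffixes can be sketched as below. The `IdFactory` class is hypothetical, the underscore separator is an assumption, and the suffix strings are purely illustrative rather than the library's actual patterns.

```python
import uuid


class IdFactory:
    """Hypothetical helper: one base UUID, many suffixed resource IDs."""

    def __init__(self):
        # Generated once and reused for every resource in the conversion.
        self.base_uuid = str(uuid.uuid4())

    def make(self, suffix: str) -> str:
        # The underscore separator is an assumption for this sketch.
        return f"{self.base_uuid}_{suffix}"


ids = IdFactory()
first = ids.make("ExampleResource_V1")   # illustrative suffix only
second = ids.make("ExampleResource_V2")  # illustrative suffix only
print(first)
print(second)
```

Because every ID shares the same prefix, all resources from one conversion run can be recognized as belonging together.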
ID Suffix Patterns
| Resource Type | Suffix Pattern | Example |
|---|---|---|
| InstanceVariable |  |  |
| SubstantiveValueDomain |  |  |
| SentinelValueDomain |  |  |
| Substantive CodeList |  |  |
| Sentinel CodeList |  |  |
| Substantive CategorySet |  |  |
| Sentinel CategorySet |  |  |
| Code/Category/Notation |  |  |
| DataSet |  |  |
| LogicalRecord |  |  |
| DataStructure |  |  |
Non-DDI Identifiers
All resources that map from DDI-Codebook elements preserve the original ID:
Type: "ddi-codebook"
Value: Original @ID attribute from Codebook
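As a minimal sketch of this convention, the original ID can be carried as a small type/value record. The `NonDdiIdentifier` dataclass and the `preserve_codebook_id` helper below are hypothetical; the library's own class and field names may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonDdiIdentifier:
    """Hypothetical type/value record; the library's own class may differ."""
    type: str
    value: str


def preserve_codebook_id(original_id: str) -> NonDdiIdentifier:
    """Wrap a DDI-Codebook @ID so it survives the conversion."""
    return NonDdiIdentifier(type="ddi-codebook", value=original_id)


ident = preserve_codebook_id("V12")
print(ident.type, ident.value)  # ddi-codebook V12
```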
Important Assumptions
ID Requirements: The codebook files and variables must have their @ID attribute set
File Subsetting: The files parameter for selective conversion is not yet implemented
Category Classification: Categories are classified as missing/sentinel based on the is_missing attribute
Label Fallback: If a category has no label, the code value is used as the label
URI Sanitization: Code values are URL-encoded and spaces are replaced with underscores for URI safety
Processing Order
Variables: Process all variables and their categories first
Datasets: Process file descriptions and create dataset structures
Associations: Link variables to logical records and data structures
This order ensures that all InstanceVariable objects are created before they are referenced by LogicalRecords and DataStructures.
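The three passes can be sketched with plain dictionaries. Everything here is illustrative: the `convert_in_order` function, the `var:` and `rec:` key prefixes, and the dictionary layout are assumptions made for the example, not the library's data model.

```python
def convert_in_order(variable_ids, files):
    """Sketch of the three passes.

    variable_ids: list of variable IDs.
    files: dict mapping a file ID to the variable IDs it contains.
    """
    resources = {}

    # Pass 1: create every InstanceVariable first.
    for var_id in variable_ids:
        resources[f"var:{var_id}"] = {"kind": "InstanceVariable", "id": var_id}

    # Pass 2: create one LogicalRecord per file description.
    for file_id in files:
        resources[f"rec:{file_id}"] = {
            "kind": "LogicalRecord",
            "id": file_id,
            "variables": [],
        }

    # Pass 3: link records to variables. The lookup would raise KeyError
    # if a variable had not been created in pass 1.
    for file_id, var_ids in files.items():
        record = resources[f"rec:{file_id}"]
        for var_id in var_ids:
            record["variables"].append(resources[f"var:{var_id}"])

    return resources


result = convert_in_order(["V1", "V2"], {"F1": ["V1", "V2"]})
print([v["id"] for v in result["rec:F1"]["variables"]])  # ['V1', 'V2']
```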
SKOS vs Standard Mode Comparison
| Aspect | SKOS Mode | Standard Mode |
|---|---|---|
| Value representation | SKOS ConceptScheme + Concepts | CodeList + CategorySet + Code + Category |
| Notation | On Concept | Separate Notation resource |
| Label | prefLabel on Concept | Name on Category + content on Notation |
| Hierarchy | hasTopConcept relationship | Code denotes Category |
| Complexity | Simpler (2 resource types) | More complex (4 resource types) |
| Standards alignment | Uses W3C SKOS | Pure DDI-CDI |
Method Signature
def codebook_to_cdif(
    codebook: codeBookType,
    baseuri: str = None,
    files: list[str] = None,
    use_skos: bool = True
) -> dict[str, DdiCdiResource]
Parameters
- codebook: The DDI-Codebook object to convert (must be codeBookType)
- baseuri: Optional base URI for resources (currently not used; UUID-based IDs are generated)
- files: Optional list of file IDs to process (not yet implemented)
- use_skos: Boolean flag to use SKOS mode (True) or standard DDI-CDI mode (False)
Returns
A dictionary mapping resource URIs to DdiCdiResource objects.
Usage Example
Basic Conversion
from dartfx.ddi import ddicodebook
from dartfx.ddi.utils import codebook_to_cdif
# Load DDI-Codebook
cb = ddicodebook.loadxml('survey_data.xml')
# Convert to DDI-CDI CDIF resources (using SKOS)
resources = codebook_to_cdif(cb, use_skos=True)
# Access specific resources
for uri, resource in resources.items():
    print(f"{type(resource).__name__}: {uri}")
Standard Mode Conversion
# Convert using standard DDI-CDI mode (without SKOS)
resources = codebook_to_cdif(cb, use_skos=False)
# Find all InstanceVariables
from dartfx.ddi.ddicdi.model_1_0_0 import InstanceVariable
variables = [r for r in resources.values()
             if isinstance(r, InstanceVariable)]
print(f"Found {len(variables)} variables")
Converting to RDF Graph
from dartfx.ddi.utils import codebook_to_cdif_graph
# Convert directly to RDF graph
graph = codebook_to_cdif_graph(cb, use_skos=True)
# Serialize to Turtle format
turtle_output = graph.serialize(format='turtle')
print(turtle_output)
# Save to file
graph.serialize('output.ttl', format='turtle')
Exploring Resources
from dartfx.ddi.ddicdi.model_1_0_0 import (
    InstanceVariable,
    SubstantiveValueDomain,
    CodeList,
    Category
)
resources = codebook_to_cdif(cb, use_skos=False)
# Count different resource types
resource_counts = {}
for resource in resources.values():
    type_name = type(resource).__name__
    resource_counts[type_name] = resource_counts.get(type_name, 0) + 1
print("Resource counts:")
for type_name, count in sorted(resource_counts.items()):
    print(f"  {type_name}: {count}")
Version Information
DDI-Codebook Version: 2.5
DDI-CDI Version: 1.0
Profile: CDIF (Cross Domain Integration Framework)
References
CDIF Profile Documentation
See Also
DDI-Codebook Processing - DDI-Codebook API reference
DDI Cross Domain Integration (CDI) - DDI-CDI API reference
Examples - More conversion examples
This documentation describes the implementation in src/dartfx/ddi/utils.py