DDI-Codebook to DDI-CDI CDIF Mappings ====================================== Overview -------- This document describes the mappings implemented in the ``codebook_to_cdif`` method that converts DDI-Codebook (version 2.5) metadata into DDI-CDI (version 1.0) CDIF profile resources. The conversion process transforms DDI-Codebook elements into a dictionary of DDI-CDI resources that can be serialized to RDF or other formats. The mapping follows the Cross Domain Integration Framework (CDIF) profile specifications. General Approach ---------------- - Each DDI-Codebook resource is assigned a UUID-based identifier - Codebook IDs are preserved as non-DDI identifiers with type ``"ddi-codebook"`` - The method supports two modes for representing value domains: - **SKOS mode** (``use_skos=True``): Uses SKOS ConceptSchemes and Concepts - **Standard mode** (``use_skos=False``): Uses DDI-CDI Code, CodeList, Category, and CategorySet Variable-Level Mappings ----------------------- DDI-Codebook Variable → DDI-CDI InstanceVariable ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For each variable (``var``) in the codebook: .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (DDI-CDI) - Notes * - ``var/@ID`` - ``InstanceVariable.id_suffix`` - Used as part of the unique identifier * - ``var/@ID`` - ``InstanceVariable.non_ddi_id`` - Preserved with type ``"ddi-codebook"`` * - ``var/varName`` or ``var/@name`` - ``InstanceVariable.name`` - Set via ``set_simple_name()`` * - ``var/labl`` - ``InstanceVariable.displayLabel`` - Set via ``set_simple_display_label()`` Value Domain Mappings --------------------- Variables with categories are mapped to value domains. The mapping depends on whether categories are substantive (data values) or sentinel (missing values). Substantive Value Domain (Non-Missing Categories) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Created when ``var.n_non_missing_catgry > 0`` Standard Mode (use_skos=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mapping hierarchy:: DDI-Codebook Variable (with non-missing categories) ↓ DDI-CDI SubstantiveValueDomain ↓ (takesValuesFrom) DDI-CDI CodeList ↓ (has CategorySet) DDI-CDI CategorySet .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source - Target - Relationship * - ``var`` (with categories) - ``SubstantiveValueDomain`` - Created with ``id_suffix=var.id`` * - SubstantiveValueDomain - InstanceVariable - Relationship: ``takesSubstantiveValues`` * - SubstantiveValueDomain - CodeList - Relationship: ``takesValuesFrom`` * - CodeList - CategorySet - Relationship: ``has`` (via ``set_category_set()``) SKOS Mode (use_skos=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mapping hierarchy:: DDI-Codebook Variable (with non-missing categories) ↓ DDI-CDI SubstantiveValueDomain ↓ (takesValuesFrom) SKOS ConceptScheme ↓ (hasTopConcept) SKOS Concept(s) .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source - Target - Notes * - ``var`` (with categories) - ``SubstantiveValueDomain`` - Created with ``id_suffix=var.id`` * - SubstantiveValueDomain - SKOS ConceptScheme - URI: ``{base_uuid}_SubstantiveConceptScheme_{var.id}`` Sentinel Value Domain (Missing Categories) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Created when ``var.n_missing_catgry > 0`` Standard Mode (use_skos=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mapping hierarchy:: DDI-Codebook Variable (with missing categories) ↓ DDI-CDI SentinelValueDomain ↓ (takesValuesFrom) DDI-CDI CodeList (sentinel) ↓ (has CategorySet) DDI-CDI CategorySet (sentinel) .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source - Target - Relationship * - ``var`` (with missing categories) - ``SentinelValueDomain`` - Created with ``id_suffix=var.id`` * - SentinelValueDomain - InstanceVariable - Relationship: ``takesSentinelValues`` * - SentinelValueDomain - CodeList - Relationship: ``takesValuesFrom``, ID suffix: ``var.id + "_sentinel"`` * - CodeList - CategorySet - Relationship: ``has``, ID suffix: ``var.id + "_sentinel"`` SKOS Mode (use_skos=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mapping hierarchy:: DDI-Codebook Variable (with missing categories) ↓ DDI-CDI SentinelValueDomain ↓ (takesValuesFrom) SKOS ConceptScheme (sentinel) ↓ (hasTopConcept) SKOS Concept(s) .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source - Target - Notes * - ``var`` (with missing categories) - ``SentinelValueDomain`` - Created with ``id_suffix=var.id`` * - SentinelValueDomain - SKOS ConceptScheme - URI: ``{base_uuid}_SentinelConceptScheme_{var.id}`` Category and Code Mappings --------------------------- For each category (``catgry``) within a variable: Standard Mode (use_skos=False) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mapping hierarchy:: DDI-Codebook catgry ↓ DDI-CDI Code ← (usesNotation) ← Notation ↓ (denotes) DDI-CDI Category .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (DDI-CDI) - Notes * - ``catgry/catValu`` - ``Code.identifier`` - Sanitized and URL-encoded as ``code_value_uid`` * - ``catgry/catValu`` - Non-DDI identifier on Code - Type: ``"code-value"`` * - ``catgry/labl`` - ``Notation.content`` - If label exists; otherwise uses ``catValu`` * - ``catgry/labl`` - ``Category.name`` - Set via ``set_simple_name()`` * - Code - Notation - Relationship: ``usesNotation`` * - Code - Category - Relationship: ``denotes`` (via ``set_category()``) * - Notation - Category - Relationship: ``formats`` (via ``set_category()``) **Code-Category Distribution:** - If ``catgry.is_missing == False``: Added to substantive CodeList and CategorySet - If ``catgry.is_missing == True``: Added to sentinel CodeList and CategorySet SKOS Mode (use_skos=True) ~~~~~~~~~~~~~~~~~~~~~~~~~~ Mapping hierarchy:: DDI-Codebook catgry ↓ SKOS Concept .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (SKOS) - Notes * - ``catgry/catValu`` - ``Concept.notation`` - Added via ``add_notation()`` * - ``catgry/labl`` - ``Concept.prefLabel`` - Added via ``add_pref_label()`` if exists * - Concept - ConceptScheme - Relationship: ``hasTopConcept`` (substantive or sentinel based on ``is_missing``) **Concept URI Format:** .. code-block:: text {base_uuid}_Concept_{var.id}_{code_value_uid} Where ``code_value_uid`` is the URL-encoded, sanitized version of ``catValu``. Dataset and Structure Mappings ------------------------------- For each file description (``fileDscr``) in the codebook: DDI-Codebook fileDscr → DDI-CDI DataSet ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (DDI-CDI) - Notes * - ``fileDscr/@ID`` - ``DataSet.id_suffix`` - Used as part of unique identifier * - ``fileDscr/@ID`` - ``DataSet.non_ddi_id`` - Preserved with type ``"ddi-codebook"`` DDI-Codebook fileDscr → DDI-CDI LogicalRecord ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (DDI-CDI) - Notes * - ``fileDscr/@ID`` - ``LogicalRecord.id_suffix`` - Used as part of unique identifier * - ``fileDscr/@ID`` - ``LogicalRecord.non_ddi_id`` - Preserved with type ``"ddi-codebook"`` * - LogicalRecord - DataSet - Relationship: ``correspondsTo`` (via ``add_dataset()``) * - LogicalRecord - InstanceVariable(s) - Relationship: ``has`` (via ``add_variable()``) for each variable in file DDI-Codebook → DDI-CDI DataStructure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Source (DDI-Codebook) - Target (DDI-CDI) - Notes * - ``codebook/@ID`` - ``DataStructure.id_suffix`` - Uses codebook ID, not file ID * - ``codebook/@ID`` - ``DataStructure.non_ddi_id`` - Preserved with type ``"ddi-codebook"`` * - DataStructure - DataSet - Relationship: ``structures`` (via ``add_data_structure()``) Variable Positioning in DataStructure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For each variable in a file: .. list-table:: :header-rows: 1 :widths: 30 70 * - Attribute - Value / Notes * - Position - Sequential (0, 1, 2...) - Zero-based order within file * - ComponentPosition - Created for each variable to track its ordinal sequence in the data structure Mapping hierarchy:: DataStructure ↓ (has_ComponentPosition) ComponentPosition (value = pos) Resource Organization --------------------- All created resources are stored in a dictionary with their URI as the key: .. code-block:: python { "uri1": InstanceVariable, "uri2": SubstantiveValueDomain, "uri3": CodeList, "uri4": Category, ... } This structure allows for: - Efficient lookup by URI - Easy serialization to RDF via ``add_to_rdf_graph()`` - Preservation of all relationships between resources Identifier Strategy ------------------- UUID Generation ~~~~~~~~~~~~~~~ - A single ``base_uuid`` is generated for the entire conversion - All resource IDs use this base UUID with unique suffixes ID Suffix Patterns ~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Resource Type - Suffix Pattern - Example * - InstanceVariable - ``{var.id}`` - ``VAR001`` * - SubstantiveValueDomain - ``{var.id}`` - ``VAR001`` * - SentinelValueDomain - ``{var.id}`` - ``VAR001`` * - Substantive CodeList - ``{var.id}`` - ``VAR001`` * - Sentinel CodeList - ``{var.id}_sentinel`` - ``VAR001_sentinel`` * - Substantive CategorySet - ``{var.id}`` - ``VAR001`` * - Sentinel CategorySet - ``{var.id}_sentinel`` - ``VAR001_sentinel`` * - Code/Category/Notation - ``{var.id}_{code_value_uid}`` - ``VAR001_1`` * - DataSet - ``{file.id}`` - ``FILE001`` * - LogicalRecord - ``{file.id}`` - ``FILE001`` * - DataStructure - ``{codebook.id}`` - ``CODEBOOK001`` Non-DDI Identifiers ~~~~~~~~~~~~~~~~~~~ All resources that map from DDI-Codebook elements preserve the original ID: - Type: ``"ddi-codebook"`` - Value: Original ``@ID`` attribute from Codebook Important Assumptions --------------------- 1. **ID Requirements**: The codebook files and variables must have their ``@ID`` attribute set 2. **File Subsetting**: The ``files`` parameter for selective conversion is not yet implemented 3. **Category Classification**: Categories are classified as missing/sentinel based on the ``is_missing`` attribute 4. **Label Fallback**: If a category has no label, the code value is used as the label 5. **URI Sanitization**: Code values are URL-encoded and spaces are replaced with underscores for URI safety Processing Order ---------------- 1. **Variables**: Process all variables and their categories first 2. **Datasets**: Process file descriptions and create dataset structures 3. **Associations**: Link variables to logical records and data structures This order ensures that all InstanceVariable objects are created before they are referenced by LogicalRecords and DataStructures. SKOS vs Standard Mode Comparison --------------------------------- .. list-table:: :header-rows: 1 :widths: 20 40 40 * - Aspect - SKOS Mode - Standard Mode * - Value representation - SKOS ConceptScheme + Concepts - CodeList + CategorySet + Code + Category * - Notation - On Concept - Separate Notation resource * - Label - prefLabel on Concept - Name on Category + content on Notation * - Hierarchy - hasTopConcept relationship - Code denotes Category * - Complexity - Simpler (2 resource types) - More complex (4 resource types) * - Standards alignment - Uses W3C SKOS - Pure DDI-CDI Method Signature ---------------- .. code-block:: python def codebook_to_cdif( codebook: codeBookType, baseuri: str = None, files: list[str] = None, use_skos: bool = True ) -> dict[str, DdiCdiResource] Parameters ~~~~~~~~~~ :codebook: The DDI-Codebook object to convert (must be ``codeBookType``) :baseuri: Optional base URI for resources (currently not used; UUID-based IDs are generated) :files: Optional list of file IDs to process (not yet implemented) :use_skos: Boolean flag to use SKOS mode (True) or standard DDI-CDI mode (False) Returns ~~~~~~~ A dictionary mapping resource URIs to ``DdiCdiResource`` objects. Usage Example ------------- Basic Conversion ~~~~~~~~~~~~~~~~ .. code-block:: python from dartfx.ddi import ddicodebook from dartfx.ddi.utils import codebook_to_cdif # Load DDI-Codebook cb = ddicodebook.loadxml('survey_data.xml') # Convert to DDI-CDI CDIF resources (using SKOS) resources = codebook_to_cdif(cb, use_skos=True) # Access specific resources for uri, resource in resources.items(): print(f"{type(resource).__name__}: {uri}") Standard Mode Conversion ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Convert using standard DDI-CDI mode (without SKOS) resources = codebook_to_cdif(cb, use_skos=False) # Find all InstanceVariables from dartfx.ddi.ddicdi.model_1_0_0 import InstanceVariable variables = [r for r in resources.values() if isinstance(r, InstanceVariable)] print(f"Found {len(variables)} variables") Converting to RDF Graph ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from dartfx.ddi.utils import codebook_to_cdif_graph # Convert directly to RDF graph graph = codebook_to_cdif_graph(cb, use_skos=True) # Serialize to Turtle format turtle_output = graph.serialize(format='turtle') print(turtle_output) # Save to file graph.serialize('output.ttl', format='turtle') Exploring Resources ~~~~~~~~~~~~~~~~~~~ .. code-block:: python from dartfx.ddi.ddicdi.model_1_0_0 import ( InstanceVariable, SubstantiveValueDomain, CodeList, Category ) resources = codebook_to_cdif(cb, use_skos=False) # Count different resource types resource_counts = {} for resource in resources.values(): type_name = type(resource).__name__ resource_counts[type_name] = resource_counts.get(type_name, 0) + 1 print("Resource counts:") for type_name, count in sorted(resource_counts.items()): print(f" {type_name}: {count}") Related Functions ----------------- codebook_to_cdif_graph() ~~~~~~~~~~~~~~~~~~~~~~~~~ Helper function that wraps ``codebook_to_cdif()`` and converts the result to an RDF Graph: .. code-block:: python def codebook_to_cdif_graph( codebook: codeBookType, baseuri: str = None, files: list[str] = None, use_skos: bool = True ) -> Graph ddi_cdi_resources_to_graph() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Utility function to convert a dictionary of DDI-CDI resources to an RDF Graph: .. code-block:: python def ddi_cdi_resources_to_graph( resources: dict[str, DdiCdiResource] ) -> Graph Version Information ------------------- - **DDI-Codebook Version**: 2.5 - **DDI-CDI Version**: 1.0 - **Profile**: CDIF (Cross Domain Integration Framework) References ---------- - `DDI-Codebook 2.5 Specification `_ - `DDI-CDI 1.0 Specification `_ - CDIF Profile Documentation - `W3C SKOS (Simple Knowledge Organization System) `_ See Also -------- - :doc:`ddicodebook` - DDI-Codebook API reference - :doc:`ddicdi` - DDI-CDI API reference - :doc:`examples` - More conversion examples ---- *This documentation describes the implementation in* ``src/dartfx/ddi/utils.py``