Quick Start Guide

This guide will walk you through the basics of using the MTNA RDS Toolkit.

Prerequisites

Make sure you have installed the toolkit. See Installation for details.

Basic Workflow

The typical workflow involves three main steps:

Connect to an RDS server
Browse catalogs and data products
Work with data product metadata and variables

Step 1: Connect to an RDS Server

First, create a connection to an MTNA RDS server:

from dartfx.mtnards import MtnaRdsServer

# Connect to the High-Value Data Network (public server)
server = MtnaRdsServer(host="rds.highvaluedata.net")

# View server information
print(server.info.name)
print(server.info.version)

For servers requiring authentication:

server = MtnaRdsServer(
    host="rds.example.com",
    api_key="your-api-key-here"
)

Or using environment variables:

import os

server = MtnaRdsServer(
    host=os.getenv("MTNA_RDS_HOST"),
    api_key=os.getenv("MTNA_RDS_API_KEY")
)

Step 2: Browse Catalogs

Explore the available catalogs on the server:

# List all catalogs
for catalog_id, catalog in server.catalogs.items():
    print(f"ID: {catalog_id}")
    print(f"Name: {catalog.name}")
    print(f"Description: {catalog.description}")
    print("---")

# Access a specific catalog
anes_catalog = server.catalogs['us-anes']
print(f"Catalog: {anes_catalog.name}")

Step 3: Explore Data Products

Browse data products (datasets) within a catalog:

# List all data products in a catalog
for data_product in anes_catalog.data_products:
    print(f"ID: {data_product.id}")
    print(f"Name: {data_product.name}")
    print("---")

# Access a specific data product by ID
anes_1948 = anes_catalog.data_products_by_id['anes_1948']
print(f"Dataset: {anes_1948.name}")
print(f"Description: {anes_1948.description}")

Step 4: Work with Variables

Examine the variables (fields/columns) in a data product:

# List all variables
for var_id, variable in anes_1948.variables.items():
    print(f"Variable ID: {var_id}")
    print(f"Name: {variable.name}")
    print(f"Label: {variable.label}")
    print(f"Type: {variable.data_type}")
    print("---")

# Access a specific variable
var = anes_1948.variables['V480018']
print(f"Variable: {var.label}")
print(f"Description: {var.description}")

Working with Classifications

For categorical variables, you can access their classification codes:

# Check if a variable has classifications
if var.classification_stub:
    classification = var.classification
    print(f"Classification: {classification.name}")

    # List classification codes
    for code in classification.codes.values():
        print(f"{code.value}: {code.label}")

Common Use Cases

Generate Croissant Metadata

Create ML-ready dataset documentation in Croissant format:

# Generate Croissant metadata for a data product
croissant_metadata = anes_1948.croissant()

# Save to file
with open('anes_1948.croissant.jsonld', 'w') as f:
    f.write(croissant_metadata.to_json())

# Or get as dictionary
metadata_dict = croissant_metadata.metadata.to_json()

Export DCAT/RDF

Generate semantic web metadata in DCAT format:

from dartfx.mtnards.dcat import MtnaRdsDcat

# Create DCAT exporter
dcat = MtnaRdsDcat(
    server=server,
    catalog=anes_catalog,
    data_product=anes_1948,
    uri_style="uuid"  # Options: "uuid", "uuid_urn", "hostname"
)

# Generate RDF graph
graph = dcat.graph()

# Export as Turtle
turtle = graph.serialize(format='turtle')
print(turtle)

# Or save to file
with open('anes_1948.ttl', 'w') as f:
    f.write(turtle)

Generate Markdown Documentation

Create human-readable documentation for a dataset:

# Generate markdown documentation
markdown = anes_1948.markdown()

# Save to file
with open('anes_1948.md', 'w') as f:
    f.write(markdown)

Complete Example

Here’s a complete example that ties everything together:

import os
from dartfx.mtnards import MtnaRdsServer
from dartfx.mtnards.dcat import MtnaRdsDcat

# 1. Connect to server
server = MtnaRdsServer(host="rds.highvaluedata.net")
print(f"Connected to: {server.info.name}")

# 2. Get a catalog
catalog = server.catalogs['us-anes']
print(f"Working with catalog: {catalog.name}")

# 3. Get a data product
data_product = catalog.data_products_by_id['anes_1948']
print(f"Dataset: {data_product.name}")
print(f"Variables: {len(data_product.variables)}")

# 4. Generate metadata in multiple formats

# Croissant for ML
croissant = data_product.croissant()
with open('output.croissant.jsonld', 'w') as f:
    f.write(croissant.to_json())
print("✓ Croissant metadata generated")

# DCAT for semantic web
dcat = MtnaRdsDcat(
    server=server,
    catalog=catalog,
    data_product=data_product
)
with open('output.ttl', 'w') as f:
    f.write(dcat.graph().serialize(format='turtle'))
print("✓ DCAT/RDF metadata generated")

# Markdown documentation
with open('output.md', 'w') as f:
    f.write(data_product.markdown())
print("✓ Markdown documentation generated")

print("\nAll metadata files generated successfully!")

Error Handling

Always handle potential errors when working with remote servers:

try:
    server = MtnaRdsServer(host="rds.highvaluedata.net")
    catalogs = server.catalogs
except Exception as e:
    print(f"Error connecting to server: {e}")

# Check if a catalog exists before accessing
if 'us-anes' in server.catalogs:
    catalog = server.catalogs['us-anes']
else:
    print("Catalog not found")

Next Steps

Explore more detailed Examples
Read the Core API Reference reference
Check out the GitHub repository for more examples