Query Patterns

This guide covers common query patterns for consuming EA Consent data from the Delta Share.

Table of Contents

  1. Query Patterns
    1. Overview
    2. Incremental Retrieval
      1. Watermark Pattern Using issuance_ts
      2. Why issuance_ts, Not ingestion_ts?
      3. Tracking Your Watermark
    3. Subject Lookup
      1. Query by Subject Binding Digest
      2. Query by Linkage Tokens
    4. Direct Consent Lookup
    5. Common Query Examples
      1. Find All Active Consents for a Subject
      2. Get Newly Issued Consents Since Last Run
      3. Count Consents by Status
      4. Retrieve Trust Block for Verification
    6. Performance Tips
      1. Use Partition Pruning
      2. Limit Result Sets
      3. Select Only Needed Columns
    7. Next Steps

Overview

The EA Consent Delta Share uses the ea-consent-tb table as the primary table for both discovery and verification.

Typical workflow:

  1. Query ea-consent-tb to find consents (by subject binding digest, linkage tokens, status, time, etc.)
  2. Extract the Trust Block JWT from the trust_block field
  3. Verify the Trust Block signature using keys from ea-consent-verification-keys

Incremental Retrieval

The most common pattern is incremental retrieval - fetching only new consents since your last processing run.

Watermark Pattern Using issuance_ts

Use issuance_ts as your watermark field. This represents the business time when the consent was issued (not when it was ingested).

SQL Pattern:

SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '<last_seen_issuance_ts>'
ORDER BY issuance_ts ASC;

Python Example:

import delta_sharing
from datetime import datetime

profile_file = "/path/to/profile.json"
tb_url = f"{profile_file}#ea-consent-share.ea-consent-schema.ea-consent-tb"

# Load Trust Block table
df = delta_sharing.load_as_pandas(tb_url)

# Filter for new consents since last run
last_seen = datetime(2025, 1, 15, 0, 0, 0)  # Your stored watermark
new_consents = df[df['issuance_ts'] > last_seen].sort_values('issuance_ts')

print(f"Found {len(new_consents)} new consents")

# Update your watermark to the latest issuance_ts
if len(new_consents) > 0:
    new_watermark = new_consents['issuance_ts'].max()
    print(f"New watermark: {new_watermark}")

Why issuance_ts, Not ingestion_ts?

Field Meaning Use For
issuance_ts When the consent was issued (business time) Watermarking, ordering
ingestion_ts When the record was loaded into Delta Debugging, auditing

Consents may be issued before they’re ingested (batch processing, delays). Using issuance_ts ensures you process consents in business order.

Tracking Your Watermark

Store your watermark between processing runs:

import json
from pathlib import Path

WATERMARK_FILE = "watermark.json"

def load_watermark():
    """Load last seen timestamp from file."""
    if Path(WATERMARK_FILE).exists():
        with open(WATERMARK_FILE) as f:
            data = json.load(f)
            return datetime.fromisoformat(data['last_seen_ts'])
    return datetime.min  # First run - get everything

def save_watermark(ts):
    """Save new watermark to file."""
    with open(WATERMARK_FILE, 'w') as f:
        json.dump({'last_seen_ts': ts.isoformat()}, f)

Subject Lookup

Query consents using privacy-preserving correlation keys to link consent data to your existing user records.

Query by Subject Binding Digest

The subject_binding_digest is a cryptographic digest that binds a consent to a specific subject’s identity.

SELECT *
FROM ea_consent_tb
WHERE subject_binding_digest = 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l'
  AND status_code = 'active';

Python:

df = delta_sharing.load_as_pandas(tb_url)

# Find active consents for a subject
user_consents = df[
    (df['subject_binding_digest'] == 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l') &
    (df['status_code'] == 'active')
]

Query by Linkage Tokens

Linkage tokens enable cross-party record linking without exposing PII. The table supports up to 3 linkage token slots.

-- Find by Datavant token
SELECT *
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
  AND linkage_1_token = 'DV:abc123def456...'
  AND status_code = 'active';

-- Find by any linkage token (check all three slots)
SELECT *
FROM ea_consent_tb
WHERE (linkage_1_token = 'MMID:patient-12345-67890-abcde-fghij')
   OR (linkage_2_token = 'MMID:patient-12345-67890-abcde-fghij')
   OR (linkage_3_token = 'MMID:patient-12345-67890-abcde-fghij');

If you already know the consent ID, retrieve it directly from ea-consent-tb:

SELECT trust_block, trust_block_format_type, status_code, issuance_ts
FROM ea_consent_tb
WHERE consent_issuer = 'https://issuer.example.org'
  AND consent_id = '85389cfb-75c5-434e-b2e6-651fc75a6ae5';

Python:

# Direct lookup by primary key
df = delta_sharing.load_as_pandas(tb_url)
consent = df[
    (df['consent_issuer'] == 'https://issuer.example.org') &
    (df['consent_id'] == '85389cfb-75c5-434e-b2e6-651fc75a6ae5')
]

if not consent.empty:
    trust_block = consent.iloc[0]['trust_block']
    print(f"Found Trust Block: {trust_block[:50]}...")

Common Query Examples

Find All Active Consents for a Subject

SELECT
    consent_id,
    consent_issuer,
    trust_block_id,
    issuance_ts,
    subject_binding_digest,
    linkage_1_token
FROM ea_consent_tb
WHERE subject_binding_digest = 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l'
  AND status_code = 'active'
ORDER BY issuance_ts DESC;

Get Newly Issued Consents Since Last Run

SELECT
    consent_id,
    consent_issuer,
    trust_block_id,
    issuance_ts,
    status_code,
    subject_binding_digest
FROM ea_consent_tb
WHERE issuance_ts > '2025-01-15T00:00:00Z'
ORDER BY issuance_ts ASC;

Count Consents by Status

SELECT
    status_code,
    COUNT(*) as count
FROM ea_consent_tb
GROUP BY status_code;

Retrieve Trust Block for Verification

SELECT
    consent_id,
    consent_issuer,
    trust_block,
    trust_block_format_type,
    issuance_ts
FROM ea_consent_tb
WHERE consent_issuer = 'https://issuer.example.org'
  AND consent_id = '85389cfb-75c5-434e-b2e6-651fc75a6ae5';

Performance Tips

Use Partition Pruning

The ea-consent-tb table is partitioned by issuance_date. Include date filters when possible:

-- Efficient: Uses partition pruning
SELECT *
FROM ea_consent_tb
WHERE issuance_date >= '2025-01-01'
  AND issuance_ts > '2025-01-15T00:00:00Z';

-- Less efficient: Scans all partitions
SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '2025-01-15T00:00:00Z';

Limit Result Sets

For exploratory queries, always use LIMIT:

SELECT *
FROM ea_consent_tb
WHERE status_code = 'active'
LIMIT 100;

Select Only Needed Columns

# Efficient: Load only needed columns
df = delta_sharing.load_as_pandas(tb_url)
df_subset = df[['consent_id', 'consent_issuer', 'subject_binding_digest', 'status_code']]

Next Steps