Query Patterns
This guide covers common query patterns for consuming EA Consent data from the Delta Share.
Table of Contents
- Query Patterns
Overview
The EA Consent Delta Share uses the ea-consent-tb table as the primary table for both discovery and verification.
Typical workflow:
- Query
ea-consent-tbto find consents (by subject binding digest, linkage tokens, status, time, etc.) - Extract the Trust Block JWT from the
trust_blockfield - Verify the Trust Block signature using keys from
ea-consent-verification-keys
Incremental Retrieval
The most common pattern is incremental retrieval - fetching only new consents since your last processing run.
Watermark Pattern Using issuance_ts
Use issuance_ts as your watermark field. This represents the business time when the consent was issued (not when it was ingested).
SQL Pattern:
SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '<last_seen_issuance_ts>'
ORDER BY issuance_ts ASC;
Python Example:
import delta_sharing
from datetime import datetime
profile_file = "/path/to/profile.json"
tb_url = f"{profile_file}#ea-consent-share.ea-consent-schema.ea-consent-tb"
# Load Trust Block table
df = delta_sharing.load_as_pandas(tb_url)
# Filter for new consents since last run
last_seen = datetime(2025, 1, 15, 0, 0, 0) # Your stored watermark
new_consents = df[df['issuance_ts'] > last_seen].sort_values('issuance_ts')
print(f"Found {len(new_consents)} new consents")
# Update your watermark to the latest issuance_ts
if len(new_consents) > 0:
new_watermark = new_consents['issuance_ts'].max()
print(f"New watermark: {new_watermark}")
Why issuance_ts, Not ingestion_ts?
| Field | Meaning | Use For |
|---|---|---|
issuance_ts |
When the consent was issued (business time) | Watermarking, ordering |
ingestion_ts |
When the record was loaded into Delta | Debugging, auditing |
Consents may be issued before they’re ingested (batch processing, delays). Using issuance_ts ensures you process consents in business order.
Tracking Your Watermark
Store your watermark between processing runs:
import json
from pathlib import Path
WATERMARK_FILE = "watermark.json"
def load_watermark():
"""Load last seen timestamp from file."""
if Path(WATERMARK_FILE).exists():
with open(WATERMARK_FILE) as f:
data = json.load(f)
return datetime.fromisoformat(data['last_seen_ts'])
return datetime.min # First run - get everything
def save_watermark(ts):
"""Save new watermark to file."""
with open(WATERMARK_FILE, 'w') as f:
json.dump({'last_seen_ts': ts.isoformat()}, f)
Subject Lookup
Query consents using privacy-preserving correlation keys to link consent data to your existing user records.
Query by Subject Binding Digest
The subject_binding_digest is a cryptographic digest that binds a consent to a specific subject’s identity.
SELECT *
FROM ea_consent_tb
WHERE subject_binding_digest = 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l'
AND status_code = 'active';
Python:
df = delta_sharing.load_as_pandas(tb_url)
# Find active consents for a subject
user_consents = df[
(df['subject_binding_digest'] == 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l') &
(df['status_code'] == 'active')
]
Query by Linkage Tokens
Linkage tokens enable cross-party record linking without exposing PII. The table supports up to 3 linkage token slots.
-- Find by Datavant token
SELECT *
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
AND linkage_1_token = 'DV:abc123def456...'
AND status_code = 'active';
-- Find by any linkage token (check all three slots)
SELECT *
FROM ea_consent_tb
WHERE (linkage_1_token = 'MMID:patient-12345-67890-abcde-fghij')
OR (linkage_2_token = 'MMID:patient-12345-67890-abcde-fghij')
OR (linkage_3_token = 'MMID:patient-12345-67890-abcde-fghij');
Direct Consent Lookup
If you already know the consent ID, retrieve it directly from ea-consent-tb:
SELECT trust_block, trust_block_format_type, status_code, issuance_ts
FROM ea_consent_tb
WHERE consent_issuer = 'https://issuer.example.org'
AND consent_id = '85389cfb-75c5-434e-b2e6-651fc75a6ae5';
Python:
# Direct lookup by primary key
df = delta_sharing.load_as_pandas(tb_url)
consent = df[
(df['consent_issuer'] == 'https://issuer.example.org') &
(df['consent_id'] == '85389cfb-75c5-434e-b2e6-651fc75a6ae5')
]
if not consent.empty:
trust_block = consent.iloc[0]['trust_block']
print(f"Found Trust Block: {trust_block[:50]}...")
Common Query Examples
Find All Active Consents for a Subject
SELECT
consent_id,
consent_issuer,
trust_block_id,
issuance_ts,
subject_binding_digest,
linkage_1_token
FROM ea_consent_tb
WHERE subject_binding_digest = 'kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l'
AND status_code = 'active'
ORDER BY issuance_ts DESC;
Get Newly Issued Consents Since Last Run
SELECT
consent_id,
consent_issuer,
trust_block_id,
issuance_ts,
status_code,
subject_binding_digest
FROM ea_consent_tb
WHERE issuance_ts > '2025-01-15T00:00:00Z'
ORDER BY issuance_ts ASC;
Count Consents by Status
SELECT
status_code,
COUNT(*) as count
FROM ea_consent_tb
GROUP BY status_code;
Retrieve Trust Block for Verification
SELECT
consent_id,
consent_issuer,
trust_block,
trust_block_format_type,
issuance_ts
FROM ea_consent_tb
WHERE consent_issuer = 'https://issuer.example.org'
AND consent_id = '85389cfb-75c5-434e-b2e6-651fc75a6ae5';
Performance Tips
Use Partition Pruning
The ea-consent-tb table is partitioned by issuance_date. Include date filters when possible:
-- Efficient: Uses partition pruning
SELECT *
FROM ea_consent_tb
WHERE issuance_date >= '2025-01-01'
AND issuance_ts > '2025-01-15T00:00:00Z';
-- Less efficient: Scans all partitions
SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '2025-01-15T00:00:00Z';
Limit Result Sets
For exploratory queries, always use LIMIT:
SELECT *
FROM ea_consent_tb
WHERE status_code = 'active'
LIMIT 100;
Select Only Needed Columns
# Efficient: Load only needed columns
df = delta_sharing.load_as_pandas(tb_url)
df_subset = df[['consent_id', 'consent_issuer', 'subject_binding_digest', 'status_code']]
Next Steps
- Trust Block Verification - Verify cryptographic signatures on the Trust Blocks you retrieve
- Schema Reference - Complete column definitions and constraints