Delta Share Schema
Overview
The EA Consent Delta Share system uses two Delta tables to manage and share consent data:
- ea-consent-tb for full verifiable payloads/audit and discovery
- ea-consent-verification-keys for signature verification
How to consume
- Watermark pull: select rows with issuance_ts > the last-seen watermark from ea-consent-tb.
- Filter on status_code and subject_binding_digest or linkage tokens.
- Fetch active key for issuer+kid from ea-consent-verification-keys.
- Verify trust block → enforce consent.
Design Principles
- Append-only: Consents are never updated; new records are created for opt-outs or changes.
- Ordering: Determined by issuance_ts (business time), not update time. issuance_date is derived for partitioning only.
- Immutable SubjectSnapshot: Subject data is frozen at issuance; downstream identity linking handles changes in email/phone/etc.
- Payload Integrity: Preserve full JSON/JWT payload with digests and signatures for non-repudiation.
Expected Queries
The EA Consent Delta Share Schema must efficiently support the following query patterns:
- Subject Linking
- Retrieve consents using privacy-preserving correlation keys (subject_binding_digest or linkage_* tokens).
- Enable joining or linking consents to other Personal Privacy Network (PPN) records.
- Example: “Find all consents with subject_binding_digest = X or linkage_1_token = Y.”
- Incremental Retrieval
- Fetch all new consent records issued since the last processing run.
- Use issuance_ts as the primary ordering and watermark field.
- Example: “Give me all consents where issuance_ts > 2025-08-01T00:00:00Z.”
- Consent Classification
- Quickly determine whether a consent is relevant based on status.
- Filter by classification field (status_code).
- Example: “Find all active consents.”
- Consent Verification
- Once a relevant consent is identified, extract and verify the full Trust Block (compact JWT).
Schema Details
ea-consent-tb
Append-only, full verifiable payload table
Purpose: Stores the complete Trust Block payload (which contains the EA Consent Verifiable Credential). This is the primary table for consent discovery, retrieval, and verification. Consumers filter and retrieve rows from this table, then verify signatures using keys from ea-consent-verification-keys.
Consumer Guidance
Partners should treat this table as:
- The authoritative record – contains the exact Trust Block as issued.
- The discovery surface – use subject_binding_digest and linkage tokens for filtering & incremental pulls.
- The verification surface – consumers must verify the Trust Block / VC signature before enforcement.
- The audit surface – this is the source for compliance review and non-repudiation.
Primary Key: (consent_issuer, consent_id). This pair is globally unique per issued consent.
Table Properties:
- delta.appendOnly = true
- Partitioned by issuance_date
- Recommended clustering (Z-ORDER) on subject_binding_digest, linkage_1_token, status_code for efficient filtering
Columns:
Core Identifiers
- consent_id (STRING) (REQUIRED) - Unique per consent within issuer. FK from VC.
- consent_issuer (STRING) (REQUIRED) - Issuing entity/system. FK from VC.
- trust_block_id (STRING) (REQUIRED) - Unique identifier for the Trust Block containing this consent. Source: Trust Block id field.
- trust_block_issuer (STRING) (REQUIRED) - Issuing entity of the Trust Block. Source: Trust Block issuer field.
Consent Classification
- status_code (STRING) (REQUIRED) - Normalized consent status for enforcement.
  - Allowed values: lowercase “active” or “inactive”
  - Source: ConsentVC credentialSubject.consent.status
  - Example: “active”
Time Fields
- ingestion_ts (TIMESTAMP) (REQUIRED) - Load/arrival timestamp (UTC), to second precision.
- issuance_ts (TIMESTAMP) (REQUIRED) - Exact issuance instant (UTC) of the EA Consent, to second precision. Often earlier than ingestion.
- issuance_date (DATE) (DERIVED) - Derived from issuance_ts; used for partitioning and watermarks.
Payload fields
- trust_block_format_type (STRING) (REQUIRED) - Format of trust_block. Allowed values: COMPACT_JWT, B64_JSON.
- trust_block (STRING) (REQUIRED) - The complete Trust Block payload (contains the EA Consent VC). Stored exactly as issued.
Subject Binding
- subject_binding_digest (STRING) (REQUIRED) - Cryptographic digest (base64url-encoded SHA-256) binding this consent to the subject’s identity at consent time. Enables querying all consents for a specific subject.
  - Source: credentialSubject.person_credential_ref.variants[0].digest.value
  - Purpose: Privacy-preserving correlation key for finding all consents belonging to the same subject
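The encoding named above (base64url-encoded SHA-256) can be sketched as follows. This is only an illustration of the encoding: the input bytes are hypothetical, since the exact material the issuer hashes is defined by the EA Person VC and is not specified here. Stripping the trailing `=` padding is also an assumption, suggested by the 43-character value in the example row.

```python
import base64
import hashlib


def b64url_sha256(data: bytes) -> str:
    """Return the unpadded base64url encoding of SHA-256(data)."""
    digest = hashlib.sha256(data).digest()  # 32 raw bytes
    # Assumption: padding is stripped, as is common for base64url values.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical input; the real digest input is defined by the issuer.
example = b64url_sha256(b"hypothetical-subject-material")
print(len(example))  # 43 characters for a 32-byte digest without padding
```

A consumer normally never recomputes this value; it only compares the stored digest for equality when correlating consents.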
Linkage Tokens (Privacy-Preserving Cross-Party Correlation)
Enable cross-party record linking without exposing raw PII using industry-standard tokenization services. Up to 3 linkage token slots are available:
- linkage_1_system (STRING) (OPTIONAL) - Tokenization system identifier for first linkage token (e.g., “datavant-health-v3”, “milliman-deterministic-v1”)
- linkage_1_token (STRING) (OPTIONAL) - Opaque token value for first linkage token (e.g., “DV:abc123…”)
- linkage_2_system (STRING) (OPTIONAL) - Tokenization system identifier for second linkage token
- linkage_2_token (STRING) (OPTIONAL) - Opaque token value for second linkage token
- linkage_3_system (STRING) (OPTIONAL) - Tokenization system identifier for third linkage token
- linkage_3_token (STRING) (OPTIONAL) - Opaque token value for third linkage token
Source: EA Person VC credentialSubject.linkage[] array
Purpose: Match consents to external systems using deterministic tokens without sharing identifying information. Organizations can correlate records for the same individual across different systems using these privacy-preserving identifiers.
Example Query Pattern:
-- Find all consents for subjects matching a Datavant token
SELECT consent_id, consent_issuer, subject_binding_digest
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
AND linkage_1_token = 'DV:abc123def456...';
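The query above checks only the first slot. This document does not say whether a given tokenization system always lands in the same slot, so a consumer filtering rows client-side may prefer to check all three. A minimal sketch, assuming rows are available as dictionaries keyed by column name; the helper name `matches_linkage` and the shortened token values are hypothetical:

```python
from typing import Mapping, Optional


def matches_linkage(row: Mapping[str, Optional[str]], system: str, token: str) -> bool:
    """True if any of the three linkage slots holds the (system, token) pair."""
    for slot in (1, 2, 3):
        if (row.get(f"linkage_{slot}_system") == system
                and row.get(f"linkage_{slot}_token") == token):
            return True
    return False


# Illustrative row with shortened, made-up token values.
row = {
    "linkage_1_system": "datavant-health-v3",
    "linkage_1_token": "DV:abc123",
    "linkage_2_system": "milliman-deterministic-v1",
    "linkage_2_token": "MMID:patient-12345",
    "linkage_3_system": None,
    "linkage_3_token": None,
}
print(matches_linkage(row, "milliman-deterministic-v1", "MMID:patient-12345"))  # True
```

The system and token are compared as a pair within one slot, so a token value is never matched against a different system's identifier.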
Metadata
- privacy_algorithm_id (STRING) (OPTIONAL) - ID of the privacy algorithm applied to table contents. Metadata only; not part of the consent.
Constraints/Expectations:
- Append-only: a change/opt-out is a new row with a new issuance_ts.
- Canonical ordering/watermark: issuance_ts (not ingestion time).
- Payload integrity: trust_block is preserved verbatim; consumers should not assume JSON re-serialization.
- Both payload fields always present: trust_block and trust_block_format_type are never NULL.
Example Row (simplified)
{
"consent_id": "85389cfb-75c5-434e-b2e6-651fc75a6ae5",
"consent_issuer": "https://issuer.example.org",
"trust_block_id": "urn:uuid:tb-abc123-def456-ghi789",
"trust_block_issuer": "https://trustblock.issuer.example.org",
"status_code": "active",
"issuance_ts": "2025-02-10T15:30:00Z",
"issuance_date": "2025-02-10",
"ingestion_ts": "2025-02-10T16:00:05Z",
"trust_block_format_type": "COMPACT_JWT",
"trust_block": "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9....",
"subject_binding_digest": "kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l",
"linkage_1_system": "datavant-health-v3",
"linkage_1_token": "DV:abc123def456ghi789jkl012mno345pqr678stu901vwx234yz",
"linkage_2_system": "milliman-deterministic-v1",
"linkage_2_token": "MMID:patient-12345-67890-abcde-fghij",
"linkage_3_system": null,
"linkage_3_token": null,
"privacy_algorithm_id": null
}
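The column rules above can be checked on the consumer side before a row is used. The sketch below is one possible validator, not part of the schema: it assumes rows arrive as dictionaries with ISO-8601 string timestamps like the example row (a Delta reader may instead return native timestamp types), and the names `validate_row` and `parse_utc` are hypothetical.

```python
from datetime import datetime, timezone

REQUIRED = (
    "consent_id", "consent_issuer", "trust_block_id", "trust_block_issuer",
    "status_code", "ingestion_ts", "issuance_ts",
    "trust_block_format_type", "trust_block", "subject_binding_digest",
)
ALLOWED_STATUS = {"active", "inactive"}
ALLOWED_FORMATS = {"COMPACT_JWT", "B64_JSON"}


def parse_utc(ts: str) -> datetime:
    """Parse a second-precision UTC timestamp such as 2025-02-10T15:30:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def validate_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row looks valid."""
    problems = [f"missing {name}" for name in REQUIRED if row.get(name) is None]
    if problems:
        return problems
    if row["status_code"] not in ALLOWED_STATUS:
        problems.append("status_code must be 'active' or 'inactive'")
    if row["trust_block_format_type"] not in ALLOWED_FORMATS:
        problems.append("unknown trust_block_format_type")
    issued = parse_utc(row["issuance_ts"])
    # issuance_date is derived, so it should agree with issuance_ts when present.
    if row.get("issuance_date") not in (None, issued.date().isoformat()):
        problems.append("issuance_date does not match issuance_ts")
    return problems


sample = {
    "consent_id": "85389cfb-75c5-434e-b2e6-651fc75a6ae5",
    "consent_issuer": "https://issuer.example.org",
    "trust_block_id": "urn:uuid:tb-abc123-def456-ghi789",
    "trust_block_issuer": "https://trustblock.issuer.example.org",
    "status_code": "active",
    "issuance_ts": "2025-02-10T15:30:00Z",
    "issuance_date": "2025-02-10",
    "ingestion_ts": "2025-02-10T16:00:05Z",
    "trust_block_format_type": "COMPACT_JWT",
    "trust_block": "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9....",
    "subject_binding_digest": "kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l",
}
print(validate_row(sample))  # []
```

This checks shape only; it does not replace signature verification of the Trust Block.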
Typical Query Patterns
Retrieve consent by primary key
SELECT *
FROM ea_consent_tb
WHERE consent_issuer = '<issuer>'
AND consent_id = '<consent_id>'
LIMIT 1;
Find consents by subject binding digest
SELECT *
FROM ea_consent_tb
WHERE subject_binding_digest = '<digest>'
ORDER BY issuance_ts DESC;
Find consents by linkage token
SELECT *
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
AND linkage_1_token = '<token>'
ORDER BY issuance_ts DESC;
Incremental pull (watermark pattern)
SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '<last_seen_issuance_ts>'
ORDER BY issuance_ts ASC;
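The watermark pattern above can also be expressed client-side. The following is a minimal sketch over in-memory rows, assuming timestamps are fixed-width UTC strings (which makes plain string comparison safe); the function name `incremental_pull` and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def incremental_pull(rows: Iterable[dict], last_seen: str) -> Tuple[List[dict], str]:
    """Return rows issued after last_seen (oldest first) and the new watermark.

    Timestamps are compared as strings, which is safe only because they are
    assumed to share one fixed-width UTC format (e.g. 2025-02-10T15:30:00Z).
    """
    fresh = sorted(
        (r for r in rows if r["issuance_ts"] > last_seen),
        key=lambda r: r["issuance_ts"],
    )
    # Keep the old watermark when nothing new arrived.
    new_watermark = fresh[-1]["issuance_ts"] if fresh else last_seen
    return fresh, new_watermark


rows = [
    {"consent_id": "a", "issuance_ts": "2025-02-10T15:30:00Z"},
    {"consent_id": "b", "issuance_ts": "2025-02-11T09:00:00Z"},
    {"consent_id": "c", "issuance_ts": "2025-02-09T08:00:00Z"},
]
batch, watermark = incremental_pull(rows, "2025-02-10T00:00:00Z")
print([r["consent_id"] for r in batch], watermark)
# ['a', 'b'] 2025-02-11T09:00:00Z
```

Because timestamps have second-level precision, two consents can share an issuance_ts. A consumer that wants to guard against a row arriving later with the same timestamp as the watermark could compare with >= instead and de-duplicate on (consent_issuer, consent_id).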
ea-consent-verification-keys
Append-only, ingestion-versioned verification key registry.
Purpose: Stores all public keys required to verify signatures on Trust Blocks and EA Consent Verifiable Credentials. Keys may rotate or be revoked; new rows are appended to reflect changes.
Consumers always fetch the latest non-revoked key for a given (issuer, kid) pair before verifying a Trust Block.
Consumer Guidance
Partners should treat this table as:
- The authoritative key registry – required for signature verification of Trust Blocks and Consent VCs.
- Versioned key history – every publish, rotation, or revocation appends a new row.
- Multi-issuer surface – contains keys for any issuer in the network (both consent issuers and trust block issuers).
- Not a discovery table – typically small; use precise filters on issuer + kid.
Verification workflow (high-level):
- Extract kid + issuer from the JWT header.
- Query this table for the latest row where:
- issuer = issuer from token
- kid = kid from token
- revocation_ts IS NULL
- Decode base64-encoded key.
- Verify the Trust Block / VC signature.
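The first and third steps of this workflow can be sketched with the standard library alone. The sketch assumes a compact JWT whose header carries a `kid` field, and a key row whose value is standard padded base64 of JWK JSON; the helper names are hypothetical. The final signature check is left out because it needs a JOSE/JWT library of your choice.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring any stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_jwt_header(compact_jwt: str) -> dict:
    """Parse the (still unverified) header of a compact JWT."""
    header_segment = compact_jwt.split(".")[0]
    return json.loads(b64url_decode(header_segment))


def decode_key_row(key_row: dict) -> dict:
    """Turn a BASE64_JWK key row into a JWK dictionary."""
    fmt = key_row.get("format_type") or "BASE64_JWK"  # documented default
    if fmt != "BASE64_JWK":
        raise ValueError(f"unsupported format_type: {fmt}")
    return json.loads(base64.b64decode(key_row["value"]))


# Build a toy token and key row purely for demonstration.
header = {"alg": "ES256", "kid": "key-2025-01", "typ": "JWT"}
segment = base64.urlsafe_b64encode(json.dumps(header).encode()).rstrip(b"=").decode()
toy_token = segment + ".e30.signature-placeholder"
print(read_jwt_header(toy_token)["kid"])  # key-2025-01

toy_jwk = {"kty": "EC", "crv": "P-256"}
toy_row = {"format_type": "BASE64_JWK",
           "value": base64.b64encode(json.dumps(toy_jwk).encode()).decode()}
print(decode_key_row(toy_row)["kty"])  # EC
```

The header is read before verification only to select the key; nothing else in the token should be trusted until the signature has been checked.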
Primary Key: (issuer, kid, ingestion_ts)
Every version of a key appends a new row with a unique ingestion timestamp.
Table Properties:
- delta.appendOnly = true
- Partitioning: to_date(ingestion_ts) (optional — small table)
- Key stability expectation:
  - An (issuer, kid) pair may appear many times over its lifecycle.
  - The latest row (by ingestion_ts) is authoritative.
Columns:
Key Identity
- issuer (STRING) – REQUIRED - Issuing authority/entity (generic term for multi-source keys).
- kid (STRING) – REQUIRED - JWT key ID; unique per issuer at any moment.
Timestamps
- ingestion_ts (TIMESTAMP, UTC) – REQUIRED - When this key-version row was appended.
- revocation_ts (TIMESTAMP, UTC) – OPTIONAL - When the key was revoked. NULL = active.
Key Material
- format_type (STRING) – OPTIONAL - Format of value. Default: “BASE64_JWK”. Future: “BASE64_PEM”.
- value (STRING) – REQUIRED - Base64-encoded key material (e.g., base64 of JWK JSON or PEM).
  - When format_type='BASE64_JWK': Base64 of complete JWK JSON (e.g., {"kty":"RSA","n":"...","e":"AQAB"})
  - When format_type='BASE64_PEM': Base64 of PEM-formatted key
- alg (STRING) – REQUIRED - Signature algorithm, e.g., “ES256”, “RS256”.
- kty (STRING) – REQUIRED - JWK key type, e.g., “EC”, “RSA”.
Constraints/Expectations:
- Append-only: New publication or revocation ⇒ append a new row with a new ingestion_ts.
- Active key: The active key for (issuer, kid) is the latest row (max ingestion_ts) where revocation_ts IS NULL.
- Revocation: A key is considered revoked when the most recent row has revocation_ts NOT NULL.
- Format stability: Base64 encoding prevents JSON object reorder issues across Python/Parquet/Delta.
- Future compatibility: Extra fields (e.g., not_before_ts, not_after_ts, thumbprint_sha256) may be added without breaking clients.
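These rules can be sketched as a small resolver over key rows. The sketch reads them as "the latest row per (issuer, kid) is authoritative, and a revocation_ts on that row means the key is revoked", which is one interpretation of the text above. It assumes rows are dictionaries with fixed-width UTC string timestamps; the name `active_key` is hypothetical.

```python
from typing import Iterable, Optional


def active_key(rows: Iterable[dict], issuer: str, kid: str) -> Optional[dict]:
    """Return the latest row for (issuer, kid), or None if absent or revoked."""
    candidates = [r for r in rows if r["issuer"] == issuer and r["kid"] == kid]
    if not candidates:
        return None
    # Fixed-width UTC strings sort chronologically, so max() finds the newest row.
    latest = max(candidates, key=lambda r: r["ingestion_ts"])
    return latest if latest.get("revocation_ts") is None else None


iss = "https://issuer.example.org"
rows = [
    {"issuer": iss, "kid": "key-2025-01",
     "ingestion_ts": "2025-01-15T12:00:00Z", "revocation_ts": None},
    {"issuer": iss, "kid": "key-2025-01",
     "ingestion_ts": "2025-03-01T08:00:00Z", "revocation_ts": "2025-03-01T07:59:00Z"},
    {"issuer": iss, "kid": "key-2025-02",
     "ingestion_ts": "2025-03-01T08:00:00Z", "revocation_ts": None},
]
print(active_key(rows, iss, "key-2025-01"))         # None
print(active_key(rows, iss, "key-2025-02")["kid"])  # key-2025-02
```

Picking the latest row first and checking revocation second matters: filtering on revocation_ts before ordering would return the older, pre-revocation row for key-2025-01.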
Example Rows
- Active Key Example
{
  "issuer": "https://issuer.example.org",
  "kid": "key-2025-01",
  "ingestion_ts": "2025-01-15T12:00:00Z",
  "format_type": "BASE64_JWK",
  "value": "eyJrdHkiOiJFQyIsImNydiI6IlAtMjU2IiwiY3J2X2tl...",
  "alg": "ES256",
  "kty": "EC",
  "revocation_ts": null
}
Typical Query Patterns
Get the active key for token verification
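-- Take the latest row for (issuer, kid) first, then keep it only if it is not revoked.
-- Filtering on revocation_ts before ordering would return a stale pre-revocation row.
SELECT *
FROM (
  SELECT *
  FROM ea_consent_verification_keys
  WHERE issuer = '<issuer>'
    AND kid = '<kid>'
  ORDER BY ingestion_ts DESC
  LIMIT 1
) latest
WHERE revocation_ts IS NULL;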
List all keys for an issuer
SELECT *
FROM ea_consent_verification_keys
WHERE issuer = '<issuer>'
ORDER BY kid, ingestion_ts DESC;
Detect newly rotated or revoked keys
SELECT *
FROM ea_consent_verification_keys
WHERE ingestion_ts > '<last_seen_ts>';
Technical Implementation Guidelines
- Timestamp precision: All timestamp fields (issuance_ts, ingestion_ts, revocation_ts) use UTC with second-level precision. Consumers should not expect sub-second granularity.
- Append-only behavior: Both tables (ea-consent-tb, ea-consent-verification-keys) are configured as append-only Delta tables. State changes (e.g., opt-outs, key rotations) generate new rows rather than in-place updates.
- Partitioning: The ea-consent-tb table is partitioned on issuance_date to optimize time-based queries and incremental pulls.
- Payload preservation: All Trust Blocks and key material are stored verbatim (e.g., base64 encoding for keys, original compact JWT for trust blocks).
- Schema evolution: Delta tables are configured with mergeSchema=true on writes, enabling automatic schema evolution. New fields can be added without requiring table recreation, which keeps the schema forward compatible.
AWS Storage Layout for Delta Sharing
The EA Consent Delta Share deployment stores all Delta tables in an S3 bucket following a consistent directory and naming structure.
This section provides an example layout (your environment may differ by prefix or workspace).
The S3 bucket structure is as follows:
s3://webshield-delta-sharing/
└── laptop/
    └── ea-consent-delta/
        ├── ea-consent-tb/
        │   ├── _delta_log/
        │   └── [parquet data files]
        │
        └── ea-consent-verification-keys/
            ├── _delta_log/
            └── [parquet data files]
Notes
- Directory names use kebab-case, matching table names published via Delta Sharing.
- Each table directory is a valid Delta Lake table containing _delta_log/ and data files.
- The prefix laptop/ea-consent-delta/ is environment-specific; production will typically use a different prefix.
Delta Sharing Configuration (Example)
The EA Consent Delta Sharing deployment stores the two Delta tables in S3 and exposes them through a Delta Sharing Share → Schema → Tables structure.
shares:
  - name: "sandbox-webshield-ea-consent-share"
    schemas:
      - name: "ea-consent-schema"
        tables:
          - name: "ea-consent-tb"
            location: "s3a://webshield-delta-sharing/sandbox/ea-consent-schema/ea-consent-tb"
            id: "00000000-0000-0000-0000-000000000001"
          - name: "ea-consent-verification-keys"
            location: "s3a://webshield-delta-sharing/sandbox/ea-consent-schema/ea-consent-verification-keys"
            id: "00000000-0000-0000-0000-000000000002"
Partner Guidance
- This layout is representative, not prescriptive; partners only need the Delta Sharing host URL and the share/table names to consume data.
- Partners do not access S3 directly; the Delta Sharing Server abstracts storage.
- S3 paths are required only when setting up your own share or syncing to an internal environment.
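For partners consuming the share from Python, the open-source delta-sharing connector addresses a table as `<profile-file>#<share>.<schema>.<table>`. The sketch below builds that address from the example share and schema names above and loads the table with `delta_sharing.load_as_pandas`. The profile file path `config.share` is hypothetical; the real profile file is issued by the share provider.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by delta-sharing."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_consents(profile_path: str):
    """Load ea-consent-tb as a pandas DataFrame (requires `pip install delta-sharing`)."""
    import delta_sharing  # imported lazily so table_url() has no dependencies

    url = table_url(profile_path, "sandbox-webshield-ea-consent-share",
                    "ea-consent-schema", "ea-consent-tb")
    return delta_sharing.load_as_pandas(url)


# "config.share" is a hypothetical profile file path.
print(table_url("config.share", "sandbox-webshield-ea-consent-share",
                "ea-consent-schema", "ea-consent-tb"))
# config.share#sandbox-webshield-ea-consent-share.ea-consent-schema.ea-consent-tb
```

Loading the whole table into pandas suits small pulls and exploration; larger incremental pulls may be better served by a Spark-based reader with the watermark filter applied.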