Delta Share Schema

Overview

The EA Consent Delta Share system uses two Delta tables to manage and share consent data:

  1. ea-consent-tb: full verifiable payloads, used for discovery, retrieval, and audit
  2. ea-consent-verification-keys: public keys for signature verification

How to consume

  1. Watermark pull: read rows from ea-consent-tb where issuance_ts > the last processed watermark (the highest issuance_ts seen so far).
  2. Filter on status_code and subject_binding_digest or linkage tokens.
  3. Fetch active key for issuer+kid from ea-consent-verification-keys.
  4. Verify trust block → enforce consent.
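The four steps above can be sketched in Python as a single batch function. This is a minimal illustration, not a reference client: it assumes rows from ea-consent-tb have already been read into dicts with timezone-aware issuance_ts values, and consume_batch, resolve_key, and verify are hypothetical names standing in for the key lookup and signature check.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Optional, Tuple


def consume_batch(
    consent_rows: Iterable[dict],
    last_watermark: datetime,
    subject_digest: str,
    resolve_key: Callable[[str], Optional[dict]],
    verify: Callable[[str, dict], bool],
    wanted_status: str = "active",
) -> Tuple[List[dict], datetime]:
    """Apply the four consumption steps to rows already read from ea-consent-tb.

    resolve_key and verify are hypothetical hooks: resolve_key maps a
    trust_block to its active key row (None if revoked or unknown), and
    verify checks the trust_block signature with that key row.
    """
    # Step 1: watermark pull on issuance_ts (business time), oldest first.
    pulled = sorted(
        (r for r in consent_rows if r["issuance_ts"] > last_watermark),
        key=lambda r: r["issuance_ts"],
    )
    # Next watermark covers every pulled row, including ones filtered out below.
    next_watermark = max((r["issuance_ts"] for r in pulled), default=last_watermark)

    enforceable = []
    for row in pulled:
        # Step 2: filter on classification and subject binding.
        if row["status_code"] != wanted_status:
            continue
        if row["subject_binding_digest"] != subject_digest:
            continue
        # Step 3: fetch the active key for the trust block's issuer + kid.
        key_row = resolve_key(row["trust_block"])
        # Step 4: verify the trust block; only verified consents are enforced.
        if key_row is not None and verify(row["trust_block"], key_row):
            enforceable.append(row)
    return enforceable, next_watermark
```

Advancing the watermark over every pulled row, not just the enforced ones, keeps rows that were filtered out in step 2 from being re-read on the next run.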

Design Principles

  • Append-only: Consents are never updated; new records are created for opt-outs or changes.
  • Ordering: Determined by issuance_ts (business time), not update time. issuance_date is derived for partitioning only.
  • Immutable SubjectSnapshot: Subject data is frozen at issuance; downstream identity linking handles changes in email/phone/etc.
  • Payload Integrity: Preserve full JSON/JWT payload with digests and signatures for non-repudiation.

Expected Queries

The EA Consent Delta Share Schema must efficiently support the following query patterns:

  1. Subject Linking
    • Retrieve consents using privacy-preserving correlation keys (subject_binding_digest or linkage_* tokens).
    • Enable joining or linking consents to other Personal Privacy Network (PPN) records.
    • Example: “Find all consents with subject_binding_digest = X or linkage_1_token = Y.”
  2. Incremental Retrieval
    • Fetch all new consent records issued since the last processing run.
    • Use issuance_ts as the primary ordering and watermark field.
    • Example: “Give me all consents where issuance_ts > 2025-08-01T00:00:00Z.”
  3. Consent Classification
    • Quickly determine whether a consent is relevant based on status.
    • Filter by classification field (status_code).
    • Example: “Find all active consents.”
  4. Consent Verification
    • Once a relevant consent is identified, extract and verify the full Trust Block (compact JWT).

Schema Details

ea-consent-tb: append-only, full verifiable payload table

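Purpose: Stores the complete Trust Block payload (which contains the EA Consent Verifiable Credential). This is the primary table for consent discovery, retrieval, and verification. Consumers filter and retrieve rows from this table, then verify signatures using keys from ea-consent-verification-keys. The SQL examples in this document use the underscore forms ea_consent_tb and ea_consent_verification_keys; to query the hyphenated table names directly in Spark SQL, quote them with backticks (e.g., `ea-consent-tb`).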

Consumer Guidance

Partners should treat this table as:

  1. The authoritative record – contains the exact Trust Block as issued.
  2. The discovery surface – use subject_binding_digest and linkage tokens for filtering, and issuance_ts for incremental pulls.
  3. The verification surface – consumers must verify the Trust Block / VC signature before enforcement.
  4. The audit surface – this is the source for compliance review and non-repudiation.

Primary Key: (consent_issuer, consent_id). This pair is globally unique per issued consent.

Table Properties:

  • delta.appendOnly = true
  • Partitioned by issuance_date
  • Recommended clustering (Z-ORDER) on subject_binding_digest, linkage_1_token, status_code for efficient filtering

Columns:

Core Identifiers

  • consent_id (STRING) (REQUIRED) – Unique per consent within issuer. Foreign-key value taken from the VC.
  • consent_issuer (STRING) (REQUIRED) - Issuing entity/system. Foreign-key value taken from the VC.
  • trust_block_id (STRING) (REQUIRED) - Unique identifier for the Trust Block containing this consent. Source: Trust Block id field.
  • trust_block_issuer (STRING) (REQUIRED) - Issuing entity of the Trust Block. Source: Trust Block issuer field.

Consent Classification

  • status_code (STRING) (REQUIRED) - Normalized consent status for enforcement.
    • Allowed values (lowercase): “active”, “inactive”
    • Source: ConsentVC credentialSubject.consent.status
    • Example: “active”

Time Fields

  • ingestion_ts (TIMESTAMP) (REQUIRED) – load/arrival timestamp (UTC) down to seconds.
  • issuance_ts (TIMESTAMP) (REQUIRED) – exact issuance instant (UTC) of the EA Consent down to seconds. Often earlier than ingestion.
  • issuance_date (DATE) (DERIVED) – UTC calendar date of issuance_ts; used for partitioning only (watermarks use issuance_ts).
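Because issuance_date is derived from the UTC instant, a consent issued late in the evening in a timezone west of UTC lands in the next day's partition. The sketch below shows the derivation, assuming timezone-aware inputs and truncation (rather than rounding) to whole seconds; derive_time_fields is a hypothetical helper name.

```python
from datetime import date, datetime, timezone
from typing import Tuple


def derive_time_fields(issuance_ts: datetime) -> Tuple[datetime, date]:
    """Normalize issuance_ts to UTC whole seconds and derive issuance_date."""
    if issuance_ts.tzinfo is None:
        # Assumption: naive timestamps are rejected rather than guessed at.
        raise ValueError("issuance_ts must be timezone-aware")
    # Assumption: sub-second parts are truncated, not rounded.
    ts_utc = issuance_ts.astimezone(timezone.utc).replace(microsecond=0)
    # issuance_date is the UTC calendar date of issuance_ts.
    return ts_utc, ts_utc.date()
```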

Payload fields

  • trust_block_format_type (STRING) (REQUIRED) - Format of trust_block. Allowed values: COMPACT_JWT, B64_JSON.
  • trust_block (STRING) (REQUIRED) - The complete Trust Block payload (contains the EA Consent VC), stored exactly as issued.
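For inspection or routing, a consumer can decode a trust_block without verifying it. The sketch assumes COMPACT_JWT values are standard JWS compact serializations (three base64url segments) and that B64_JSON values use standard base64; read_trust_block is a hypothetical name. Decoded contents are not trustworthy until the signature over the original trust_block string has been verified.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_trust_block(trust_block: str, format_type: str) -> dict:
    """Decode (without verifying) a trust_block value according to its format type."""
    if format_type == "COMPACT_JWT":
        parts = trust_block.split(".")
        if len(parts) != 3:
            raise ValueError("compact JWS must have exactly three segments")
        header = json.loads(b64url_decode(parts[0]))   # carries alg and kid
        payload = json.loads(b64url_decode(parts[1]))  # carries claims such as iss
        return {"header": header, "payload": payload}
    if format_type == "B64_JSON":
        # Assumption: standard base64 of the JSON document.
        return {"document": json.loads(base64.b64decode(trust_block))}
    raise ValueError(f"unknown trust_block_format_type: {format_type}")
```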

Subject Binding

  • subject_binding_digest (STRING) (REQUIRED) - Cryptographic digest binding this consent to the subject’s identity at consent time. Enables querying all consents for a specific subject. (base64url-encoded SHA-256)
    • Source: credentialSubject.person_credential_ref.variants[0].digest.value
    • Purpose: Privacy-preserving correlation key for finding all consents belonging to the same subject
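Because the digest is an unpadded base64url encoding of a 32-byte SHA-256 value, a well-formed subject_binding_digest is exactly 43 characters from the base64url alphabet. The hypothetical helper below checks only that shape; it cannot recompute the digest, since the hashed input is not described here, and it assumes digests are always stored without padding.

```python
import base64
import binascii
import re

# Unpadded base64url: 32 bytes -> ceil(256 / 6) = 43 characters.
_DIGEST_RE = re.compile(r"[A-Za-z0-9_-]{43}")


def is_valid_binding_digest(value: str) -> bool:
    """Shape check for an unpadded base64url SHA-256 digest (does not recompute it)."""
    if not _DIGEST_RE.fullmatch(value):
        return False
    try:
        raw = base64.urlsafe_b64decode(value + "=")  # restore the single pad char
    except binascii.Error:
        return False
    return len(raw) == 32
```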

Linkage Tokens (Privacy-Preserving Cross-Party Correlation)

Linkage tokens enable cross-party record linking without exposing raw PII, using industry-standard tokenization services. Up to 3 linkage token slots are available:

  • linkage_1_system (STRING) (OPTIONAL) - Tokenization system identifier for first linkage token (e.g., “datavant-health-v3”, “milliman-deterministic-v1”)
  • linkage_1_token (STRING) (OPTIONAL) - Opaque token value for first linkage token (e.g., “DV:abc123…”)
  • linkage_2_system (STRING) (OPTIONAL) - Tokenization system identifier for second linkage token
  • linkage_2_token (STRING) (OPTIONAL) - Opaque token value for second linkage token
  • linkage_3_system (STRING) (OPTIONAL) - Tokenization system identifier for third linkage token
  • linkage_3_token (STRING) (OPTIONAL) - Opaque token value for third linkage token

Source: EA Person VC credentialSubject.linkage[] array

Purpose: Match consents to external systems using deterministic tokens without sharing identifying information. Organizations can correlate records for the same individual across different systems using these privacy-preserving identifiers.

Example Query Pattern:

-- Find all consents for subjects matching a Datavant token
SELECT consent_id, consent_issuer, subject_binding_digest
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
  AND linkage_1_token = 'DV:abc123def456...';
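The example query checks only slot 1. This document does not say that a given tokenization system always occupies the same slot, so a cautious consumer might match the (system, token) pair against all three slots, which in SQL means OR-ing three (linkage_N_system, linkage_N_token) conditions. A Python equivalent over already-loaded rows (matches_linkage and find_by_linkage are hypothetical names):

```python
from typing import Iterable, List

LINKAGE_SLOTS = (1, 2, 3)


def matches_linkage(row: dict, system: str, token: str) -> bool:
    """True if any linkage slot holds this exact (system, token) pair."""
    return any(
        row.get(f"linkage_{i}_system") == system and row.get(f"linkage_{i}_token") == token
        for i in LINKAGE_SLOTS
    )


def find_by_linkage(rows: Iterable[dict], system: str, token: str) -> List[dict]:
    # Match the system together with the token, never the token alone.
    return [r for r in rows if matches_linkage(r, system, token)]
```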

Metadata

  • privacy_algorithm_id (STRING) (OPTIONAL) - ID of the privacy algorithm applied to the table contents; not part of the consent itself. Metadata only.

Constraints/Expectations:

  • Append-only: a change/opt-out is a new row with a new issuance_ts.
  • Canonical ordering/watermark: issuance_ts (not ingestion time).
  • Payload integrity: trust_block is preserved verbatim; consumers should not assume JSON re-serialization.
  • Both payload fields always present: trust_block and trust_block_format_type are never NULL.
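A consumer can sanity-check rows against these constraints and the column definitions before processing them. The sketch below is hypothetical (check_consent_row is not part of any published client) and assumes rows are dicts with timezone-aware issuance_ts and date-typed issuance_date; it reports problems rather than raising so that a batch can be triaged.

```python
from datetime import date, datetime, timezone
from typing import List

REQUIRED_COLUMNS = (
    "consent_id", "consent_issuer", "trust_block_id", "trust_block_issuer",
    "status_code", "ingestion_ts", "issuance_ts", "issuance_date",
    "trust_block_format_type", "trust_block", "subject_binding_digest",
)
ALLOWED_STATUS = {"active", "inactive"}
ALLOWED_FORMATS = {"COMPACT_JWT", "B64_JSON"}


def check_consent_row(row: dict) -> List[str]:
    """Return human-readable problems for one ea-consent-tb row (empty list if none)."""
    problems = [f"missing {col}" for col in REQUIRED_COLUMNS if row.get(col) is None]

    status = row.get("status_code")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"bad status_code {status!r}")

    fmt = row.get("trust_block_format_type")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        problems.append(f"bad trust_block_format_type {fmt!r}")

    ts = row.get("issuance_ts")
    day = row.get("issuance_date")
    if isinstance(ts, datetime) and isinstance(day, date):
        if ts.tzinfo is None:
            problems.append("issuance_ts is not timezone-aware")
        elif ts.astimezone(timezone.utc).date() != day:
            # issuance_date must be the UTC date of issuance_ts.
            problems.append("issuance_date does not match issuance_ts")
    return problems
```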

Example Row (simplified)

{
  "consent_id": "85389cfb-75c5-434e-b2e6-651fc75a6ae5",
  "consent_issuer": "https://issuer.example.org",
  "trust_block_id": "urn:uuid:tb-abc123-def456-ghi789",
  "trust_block_issuer": "https://trustblock.issuer.example.org",
  "status_code": "active",
  "issuance_ts": "2025-02-10T15:30:00Z",
  "issuance_date": "2025-02-10",
  "ingestion_ts": "2025-02-10T16:00:05Z",
  "trust_block_format_type": "COMPACT_JWT",
  "trust_block": "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9....",
  "subject_binding_digest": "kJ9gS3mK7pL2nQ8rT4vW6xY0zA1bC2dE3fG4hI5jK6l",
  "linkage_1_system": "datavant-health-v3",
  "linkage_1_token": "DV:abc123def456ghi789jkl012mno345pqr678stu901vwx234yz",
  "linkage_2_system": "milliman-deterministic-v1",
  "linkage_2_token": "MMID:patient-12345-67890-abcde-fghij",
  "linkage_3_system": null,
  "linkage_3_token": null,
  "privacy_algorithm_id": null
}

Typical Query Patterns

Retrieve consent by primary key

SELECT *
FROM ea_consent_tb
WHERE consent_issuer = '<issuer>'
  AND consent_id = '<consent_id>'
LIMIT 1;

Find consents by subject binding digest

SELECT *
FROM ea_consent_tb
WHERE subject_binding_digest = '<digest>'
ORDER BY issuance_ts DESC;

Find consents by linkage token

SELECT *
FROM ea_consent_tb
WHERE linkage_1_system = 'datavant-health-v3'
  AND linkage_1_token = '<token>'
ORDER BY issuance_ts DESC;

Incremental pull (watermark pattern)

SELECT *
FROM ea_consent_tb
WHERE issuance_ts > '<last_seen_issuance_ts>'
ORDER BY issuance_ts ASC;

ea-consent-verification-keys: append-only, ingestion-versioned verification key registry

Purpose: Stores all public keys required to verify signatures on Trust Blocks and EA Consent Verifiable Credentials. Keys may rotate or be revoked; new rows are appended to reflect changes.

Before verifying a Trust Block, consumers fetch the latest row (by ingestion_ts) for the given (issuer, kid) pair and use that key only if the row is not revoked.

Consumer Guidance

Partners should treat this table as:

  1. The authoritative key registry – required for signature verification of Trust Blocks and Consent VCs.
  2. Versioned key history – every publish, rotation, or revocation appends a new row.
  3. Multi-issuer surface – contains keys for any issuer in the network (both consent issuers and trust block issuers).
  4. Not a discovery table – typically small; use precise filters on issuer + kid.

Verification workflow (high-level):

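  1. Extract kid from the JWT header and the issuer from the token (the iss claim, normally carried in the payload).
  2. Query this table for the latest row (max ingestion_ts) where:
    • issuer = issuer from token
    • kid = kid from token
    If that row has revocation_ts NOT NULL, the key is revoked: reject the token.
  3. Decode the base64-encoded key material in value (per format_type).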
  4. Verify the Trust Block / VC signature.
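Key lookup hinges on ordering: select the latest row for the (issuer, kid) pair first, and only then check revocation_ts, as the revocation rule under Constraints/Expectations below requires. Applying revocation_ts IS NULL before picking the latest row would return an older, pre-revocation row for a revoked key. A minimal sketch over in-memory rows (resolve_active_key is a hypothetical name):

```python
from typing import Iterable, Optional


def resolve_active_key(rows: Iterable[dict], issuer: str, kid: str) -> Optional[dict]:
    """Return the authoritative key row for (issuer, kid), or None.

    The latest row by ingestion_ts wins; if that row carries a revocation_ts
    the key is revoked, even if older rows for the same kid were not.
    """
    candidates = [r for r in rows if r["issuer"] == issuer and r["kid"] == kid]
    if not candidates:
        return None  # unknown key
    latest = max(candidates, key=lambda r: r["ingestion_ts"])
    return latest if latest["revocation_ts"] is None else None
```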

Primary Key: (issuer, kid, ingestion_ts)

Every version of a key appends a new row with a unique ingestion timestamp.

Table Properties:

  • delta.appendOnly = true
  • Partitioning: to_date(ingestion_ts) (optional; the table is small)
  • Key stability expectation:
    • An (issuer, kid) pair may appear many times over its lifecycle.
    • The latest row (by ingestion_ts) is authoritative.

Columns:

Key Identity

  • issuer (STRING) – REQUIRED - issuing authority/entity (generic term for multi-source keys).
  • kid (STRING) – REQUIRED - JWT key ID; unique per issuer at any moment.

Timestamps

  • ingestion_ts (TIMESTAMP, UTC) – REQUIRED - When this key-version row was appended
  • revocation_ts (TIMESTAMP, UTC) – OPTIONAL - When key was revoked. NULL = active.

Key Material

  • format_type (STRING) – OPTIONAL - Format of value. Default: “BASE64_JWK”. Reserved for future use: “BASE64_PEM”.
  • value (STRING) – REQUIRED - base64-encoded key material (e.g., base64 of JWK JSON or PEM).
    • When format_type='BASE64_JWK': Base64 of the complete JWK JSON (e.g., {"kty":"RSA","n":"...","e":"AQAB"})
    • When format_type='BASE64_PEM' (future): Base64 of the PEM-formatted key
  • alg (STRING) – REQUIRED - Signature algorithm, e.g., “ES256”, “RS256”.
  • kty (STRING) – REQUIRED - JWK key type, e.g., “EC”, “RSA”.
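Decoding value for the default BASE64_JWK format amounts to a base64 decode followed by JSON parsing. The sketch assumes the standard (not URL-safe) base64 alphabet, since the column is described only as "base64-encoded", and cross-checks the decoded JWK against the row's kty and alg columns; decode_key_value is a hypothetical helper, and BASE64_PEM handling is left out.

```python
import base64
import json


def decode_key_value(row: dict) -> dict:
    """Decode a verification-key row's value into a JWK dict (BASE64_JWK only)."""
    fmt = row.get("format_type") or "BASE64_JWK"  # documented default when unset
    if fmt != "BASE64_JWK":
        raise NotImplementedError(f"format_type {fmt!r} is not handled in this sketch")
    # Assumption: standard base64 alphabet; validate=True rejects stray characters.
    jwk = json.loads(base64.b64decode(row["value"], validate=True))
    # The JWK must agree with the row's own kty/alg columns.
    if jwk.get("kty") != row["kty"]:
        raise ValueError("JWK kty does not match the kty column")
    if "alg" in jwk and jwk["alg"] != row["alg"]:
        raise ValueError("JWK alg does not match the alg column")
    return jwk
```

The resulting JWK can then be handed to a JOSE/JWT library that accepts JWK input to perform the actual signature check.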

Constraints/Expectations:

  • Append-only. New publication or revocation ⇒ append a new row with a new ingestion_ts.
  • Active key for (issuer, kid): the latest row (max ingestion_ts), provided that row has revocation_ts IS NULL.
  • Revocation: a key is considered revoked when its most recent row has revocation_ts NOT NULL, regardless of earlier non-revoked rows.
  • Format stability: Base64 encoding prevents JSON object key-reordering issues across Python/Parquet/Delta.
  • Future compatibility: extra fields (e.g., not_before_ts, not_after_ts, thumbprint_sha256) may be added without breaking clients.

Example Rows

  1. Active Key Example
    {
      "issuer": "https://issuer.example.org",
      "kid": "key-2025-01",
      "ingestion_ts": "2025-01-15T12:00:00Z",
      "format_type": "BASE64_JWK",
      "value": "eyJrdHkiOiJFQyIsImNydiI6IlAtMjU2IiwiY3J2X2tl...",
      "alg": "ES256",
      "kty": "EC",
      "revocation_ts": null
    }
    

Typical Query Patterns

Get the active key for token verification

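SELECT *
FROM (
  -- Latest row for this (issuer, kid), revoked or not
  SELECT *
  FROM ea_consent_verification_keys
  WHERE issuer = '<issuer>'
    AND kid = '<kid>'
  ORDER BY ingestion_ts DESC
  LIMIT 1
) latest
-- An empty result means the key is revoked or unknown
WHERE revocation_ts IS NULL;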

List all keys for an issuer

SELECT *
FROM ea_consent_verification_keys
WHERE issuer = '<issuer>'
ORDER BY kid, ingestion_ts DESC;

Detect newly rotated or revoked keys

SELECT *
FROM ea_consent_verification_keys
WHERE ingestion_ts > '<last_seen_ts>';
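Because the table is append-only, any (issuer, kid) pair that appears in this result has a new version (a rotation or a revocation), so a consumer that caches resolved keys can evict just those pairs and re-resolve them using the latest-row rule. A minimal sketch (keys_to_refresh is a hypothetical name) that also returns the next watermark:

```python
from datetime import datetime
from typing import Iterable, Set, Tuple


def keys_to_refresh(
    rows: Iterable[dict], last_seen_ts: datetime
) -> Tuple[Set[Tuple[str, str]], datetime]:
    """Return (issuer, kid) pairs with rows newer than last_seen_ts, plus the new watermark."""
    changed: Set[Tuple[str, str]] = set()
    watermark = last_seen_ts
    for r in rows:
        if r["ingestion_ts"] > last_seen_ts:
            changed.add((r["issuer"], r["kid"]))
            watermark = max(watermark, r["ingestion_ts"])
    return changed, watermark
```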

Technical Implementation Guidelines

  • Timestamp precision: All timestamp fields (issuance_ts, ingestion_ts, revocation_ts) use UTC with second-level precision. Consumers should not expect sub-second granularity.

  • Append-only behavior: Both tables (ea-consent-tb, ea-consent-verification-keys) are configured as append-only Delta tables. Updates or state changes (e.g., opt-outs, key rotations) generate new rows, not updates.

  • Partitioning: The ea-consent-tb table is partitioned on issuance_date to optimize time-based queries and incremental pulls.

  • Payload preservation: All Trust Blocks and key materials are stored verbatim (trust blocks in their original form per trust_block_format_type, e.g., the original compact JWT; keys as base64-encoded material).

  • Schema evolution: Writers use the Delta mergeSchema=true write option, so new columns can be added without recreating the table. Consumers should reference columns by name so that added columns do not break them.

AWS Storage Layout for Delta Sharing

The EA Consent Delta Share deployment stores all Delta tables in an S3 bucket following a consistent directory and naming structure.

This section provides an example layout (your environment may differ by prefix or workspace).

The S3 bucket structure is as follows:

s3://webshield-delta-sharing/
└── laptop/
    └── ea-consent-delta/
        ├── ea-consent-tb/
        │   ├── _delta_log/
        │   └── [parquet data files]
        │
        └── ea-consent-verification-keys/
            ├── _delta_log/
            └── [parquet data files]

Notes

  • Directory names use kebab-case, matching table names published via Delta Sharing.
  • Each table directory is a valid Delta Lake table containing _delta_log/ and data files.
  • The prefix laptop/ea-consent-delta/ is environment-specific; production will typically use a different prefix.

Delta Sharing Configuration (Example)

The EA Consent Delta Sharing deployment stores the two Delta tables in S3 and exposes them through a Delta Sharing Share → Schema → Tables structure.

shares:
- name: "sandbox-webshield-ea-consent-share"
  schemas:
  - name: "ea-consent-schema"
    tables:
    - name: "ea-consent-tb"
      location: "s3a://webshield-delta-sharing/sandbox/ea-consent-schema/ea-consent-tb"
      id: "00000000-0000-0000-0000-000000000001"
    - name: "ea-consent-verification-keys"
      location: "s3a://webshield-delta-sharing/sandbox/ea-consent-schema/ea-consent-verification-keys"
      id: "00000000-0000-0000-0000-000000000002"

Partner Guidance

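  • This layout is representative, not prescriptive; partners need only a Delta Sharing profile (endpoint URL and bearer token) plus the share, schema, and table names to consume data.
  • Partners do not need S3 credentials; the Delta Sharing server abstracts storage and serves data files through short-lived pre-signed URLs.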
  • S3 paths are required only if setting up your own share or syncing to an internal environment.
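On the consumer side, the open-source Delta Sharing connectors address a shared table with a URL of the form <profile-file>#<share>.<schema>.<table>, where the profile file holds the endpoint and bearer token issued by the data provider. The sketch below only builds these strings; profile_json and table_url are hypothetical helper names, and the endpoint, token, and file name used with them are placeholders, not real values for this deployment.

```python
import json


def profile_json(endpoint: str, bearer_token: str) -> str:
    """Body of a Delta Sharing profile file (shareCredentialsVersion 1)."""
    return json.dumps(
        {
            "shareCredentialsVersion": 1,
            "endpoint": endpoint,
            "bearerToken": bearer_token,
        },
        indent=2,
    )


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Connector table URL of the form <profile>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"
```

For example, table_url("ea-consent.share", "sandbox-webshield-ea-consent-share", "ea-consent-schema", "ea-consent-tb") yields a URL that can be passed to delta_sharing.load_as_pandas from the delta-sharing Python package.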