Double Anonymization

Two-stage anonymization with provenance tracking for maximum privacy protection and verifiable data transformation.

Overview

MediPact anonymizes data in two stages and tracks provenance on the Hedera blockchain, so that each transformation can be verified. The two-stage approach provides defense in depth while maintaining data utility for research.

Stage 1: Storage

Optimized for research queries while protecting privacy. Preserves 5-year age ranges and exact dates.

Stage 2: Chain

Maximum privacy for immutable blockchain storage. Further generalizes to 10-year ranges and month/year dates.

Why Double Anonymization?

Defense in Depth

Two layers of protection ensure that if one layer fails, the other still protects patient privacy. Different anonymization strategies at each stage provide comprehensive coverage.

Different Purposes

Storage anonymization is optimized for research queries (preserves detail), while chain anonymization is optimized for privacy on immutable blockchain storage (maximum generalization).

Provenance Tracking

Both hashes are stored together on Hedera with a provenance proof, which lets anyone verify that the chain hash was derived from the storage hash. This provides a complete audit trail and verification of the transformation.

Compliance Ready

Meets strict regulatory requirements including GDPR and HIPAA. Demonstrates layered privacy protection and exceeds Safe Harbor de-identification standards.

Two-Stage Process

Stage 1: Storage Anonymization

Purpose: Research-Optimized Privacy

Stage 1 anonymization is designed to protect privacy while preserving data utility for research queries.

What Gets Removed

  • Patient names
  • Patient IDs (original)
  • Specific addresses (street, city)
  • Phone numbers
  • Exact dates of birth
  • Exact age (replaced with age range)

What Gets Preserved

  • Age Range: 5-year ranges (e.g., "35-39")
  • Location: Country, region, district
  • Dates: Exact event dates (YYYY-MM-DD), such as the observation date; dates of birth are removed
  • Gender: Male, Female, Other, Unknown
  • Occupation: Specific categories (e.g., "Healthcare Worker")
  • Medical Data: All clinical information intact

Example

{
  "anonymousPatientId": "PID-001",
  "ageRange": "35-39",
  "country": "Uganda",
  "region": "Central",
  "gender": "Male",
  "occupationCategory": "Healthcare Worker",
  "effectiveDate": "2024-03-15",
  "observationCodeLoinc": "4548-4",
  "valueQuantity": "8.1"
}
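The removal and preservation rules above can be sketched as an allow-list transform. This is a minimal illustration, not MediPact's actual implementation: the function names and the input field names (name, patientId, age, phone, and so on) are assumptions, and only the output field names are taken from the example.

```javascript
// Minimal sketch of Stage 1 (storage) anonymization.
// Input field names and function names are assumed for illustration;
// only the output field names come from the example above.

// Map an exact age to a 5-year range, e.g. 37 -> "35-39".
function toFiveYearRange(age) {
  const lower = Math.floor(age / 5) * 5;
  return `${lower}-${lower + 4}`;
}

function anonymizeForStorage(raw, anonymousPatientId) {
  // Build the output from an allow-list so that direct identifiers
  // (name, original ID, street, city, phone, date of birth) are never copied.
  return {
    anonymousPatientId,
    ageRange: toFiveYearRange(raw.age),
    country: raw.country,
    region: raw.region,
    gender: raw.gender,
    occupationCategory: raw.occupationCategory,
    effectiveDate: raw.effectiveDate, // exact event date is kept at this stage
    observationCodeLoinc: raw.observationCodeLoinc,
    valueQuantity: raw.valueQuantity,
  };
}

const stage1 = anonymizeForStorage(
  {
    name: "Example Patient",
    patientId: "HOSP-12345",
    phone: "000-000-0000",
    age: 37,
    country: "Uganda",
    region: "Central",
    gender: "Male",
    occupationCategory: "Healthcare Worker",
    effectiveDate: "2024-03-15",
    observationCodeLoinc: "4548-4",
    valueQuantity: "8.1",
  },
  "PID-001"
);
console.log(stage1.ageRange); // "35-39"
```

Copying from an allow-list rather than deleting known identifiers means a new, unexpected input field is dropped by default instead of leaking through.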

Stage 2: Chain Anonymization

Purpose: Maximum Blockchain Privacy

Stage 2 anonymization applies further generalization specifically for immutable blockchain storage where data cannot be deleted or modified.

Additional Generalizations

  • Age Ranges: 5-year → 10-year (e.g., "35-39" → "30-39")
  • Dates: Exact → Month/Year (e.g., "2024-03-15" → "2024-03")
  • Location: Remove region/district (keep only country)
  • Occupation: Further generalize (e.g., "Healthcare Worker" → "Healthcare")
  • Rare Values: Suppress values that could identify individuals
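The list above does not say how rare values are detected. One common approach is a frequency threshold: a value shared by fewer than k records is replaced with a placeholder. The sketch below assumes that approach; the function name, the threshold k, and the "Suppressed" placeholder are all hypothetical.

```javascript
// Sketch of rare-value suppression using a frequency threshold.
// The threshold k and the placeholder string are assumptions, not MediPact settings.
function suppressRareValues(records, field, k, placeholder = "Suppressed") {
  // Count how many records share each value of the field.
  const counts = new Map();
  for (const record of records) {
    counts.set(record[field], (counts.get(record[field]) || 0) + 1);
  }
  // Replace values that appear fewer than k times; leave the rest untouched.
  return records.map((record) =>
    counts.get(record[field]) < k ? { ...record, [field]: placeholder } : record
  );
}

const sample = [
  { occupationCategory: "Healthcare" },
  { occupationCategory: "Healthcare" },
  { occupationCategory: "Astronaut" },
];
const suppressed = suppressRareValues(sample, "occupationCategory", 2);
console.log(suppressed.map((r) => r.occupationCategory));
// [ 'Healthcare', 'Healthcare', 'Suppressed' ]
```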

Example

{
  "anonymousPatientId": "PID-001",
  "ageRange": "30-39",
  "country": "Uganda",
  "gender": "Male",
  "occupationCategory": "Healthcare",
  "effectiveDate": "2024-03",
  "observationCodeLoinc": "4548-4",
  "valueQuantity": "8.1"
}

Note: Region/district removed, age range expanded, date rounded to month, occupation generalized.
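The step from the Stage 1 example to the Stage 2 example can be sketched as a single function. The function names, the occupation mapping table, and the "Other" fallback are assumptions made for illustration; the field names come from the two examples.

```javascript
// Sketch of Stage 2 (chain) generalization applied to a Stage 1 record.
// The mapping table and the "Other" fallback are hypothetical.
const BROAD_OCCUPATION = { "Healthcare Worker": "Healthcare" };

// Widen a 5-year range to its enclosing 10-year range, e.g. "35-39" -> "30-39".
function toTenYearRange(fiveYearRange) {
  const lower = parseInt(fiveYearRange.split("-")[0], 10);
  const decade = Math.floor(lower / 10) * 10;
  return `${decade}-${decade + 9}`;
}

function anonymizeForChain(storageRecord) {
  // Drop region and district, copy the rest, then overwrite the generalized fields.
  const { region, district, ...rest } = storageRecord;
  return {
    ...rest,
    ageRange: toTenYearRange(storageRecord.ageRange),
    occupationCategory:
      BROAD_OCCUPATION[storageRecord.occupationCategory] ?? "Other",
    effectiveDate: storageRecord.effectiveDate.slice(0, 7), // "YYYY-MM-DD" -> "YYYY-MM"
  };
}

const stage2 = anonymizeForChain({
  anonymousPatientId: "PID-001",
  ageRange: "35-39",
  country: "Uganda",
  region: "Central",
  gender: "Male",
  occupationCategory: "Healthcare Worker",
  effectiveDate: "2024-03-15",
  observationCodeLoinc: "4548-4",
  valueQuantity: "8.1",
});
console.log(stage2.ageRange, stage2.effectiveDate); // 30-39 2024-03
```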

Provenance Records

What is a Provenance Record?

A provenance record contains both hashes (storage and chain) and a cryptographic proof linking them, stored immutably on the Hedera Consensus Service (HCS).

Structure

{
  "storage": {
    "hash": "abc123def456...",
    "anonymizationLevel": "storage",
    "timestamp": "2024-03-15T10:30:00Z"
  },
  "chain": {
    "hash": "def456ghi789...",
    "anonymizationLevel": "chain",
    "derivedFrom": "abc123def456...",
    "timestamp": "2024-03-15T10:30:00Z"
  },
  "anonymousPatientId": "PID-001",
  "resourceType": "Patient",
  "hospitalId": "HOSP-XXX",
  "timestamp": "2024-03-15T10:30:00Z",
  "provenanceProof": "xyz789abc123..."
}

Storage Hash (H1)

SHA-256 hash of Stage 1 anonymized data. Used for backend storage verification.

Chain Hash (H2)

SHA-256 hash of Stage 2 anonymized data. Used for immutable blockchain storage.
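Both H1 and H2 are SHA-256 hashes of a record, so the record must be serialized the same way every time or the hash will not be reproducible. The sketch below uses Node's built-in crypto module and sorts keys before hashing. The serialization rule and the function names are assumptions; the source does not state how MediPact serializes records.

```javascript
const crypto = require("node:crypto");

// Serialize with sorted keys so the same data always yields the same string.
// Handles flat records only; this serialization rule is an assumption.
function canonicalize(record) {
  return JSON.stringify(record, Object.keys(record).sort());
}

// Return the SHA-256 digest of the canonical form as a hex string.
function sha256Hex(record) {
  return crypto
    .createHash("sha256")
    .update(canonicalize(record), "utf8")
    .digest("hex");
}

const h1 = sha256Hex({ ageRange: "35-39", country: "Uganda" });
const h2 = sha256Hex({ country: "Uganda", ageRange: "35-39" });
console.log(h1 === h2); // true: key order does not affect the hash
```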

Provenance Proof

Cryptographic proof linking both hashes together. Proves transformation chain.
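The source does not specify how the proof is computed. One plausible construction, shown here purely as an assumption, hashes the two hashes together with the patient ID and resource type in a fixed order. The function name and argument order follow the verification snippet in this document; the body is hypothetical.

```javascript
const crypto = require("node:crypto");

// Hypothetical construction: hash the four linked values in a fixed order.
// The real MediPact proof format may differ.
function generateProvenanceProof(storageHash, chainHash, anonymousPatientId, resourceType) {
  // A delimiter keeps ("ab", "c") and ("a", "bc") from producing the same payload.
  const payload = [storageHash, chainHash, anonymousPatientId, resourceType].join("|");
  return crypto.createHash("sha256").update(payload, "utf8").digest("hex");
}

const proof = generateProvenanceProof("abc123", "def456", "PID-001", "Patient");
console.log(proof.length); // 64
```

A plain hash like this lets anyone recompute the proof and check that the record is internally consistent. It does not by itself show who created the record; that would need a keyed construction such as an HMAC or a digital signature.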

Verification Process

Anyone can verify the provenance chain on Hedera HashScan:

1. Origin Verification

Verify both hashes exist and match expected values:

assert(provenanceRecord.storage.hash === expectedStorageHash);
assert(provenanceRecord.chain.hash === expectedChainHash);

2. Transformation Verification

Verify chain hash was derived from storage hash:

assert(provenanceRecord.chain.derivedFrom === provenanceRecord.storage.hash);

3. Provenance Proof Verification

Verify the provenance proof links both hashes:

const expectedProof = generateProvenanceProof(
  provenanceRecord.storage.hash,
  provenanceRecord.chain.hash,
  provenanceRecord.anonymousPatientId,
  provenanceRecord.resourceType
);
assert(provenanceRecord.provenanceProof === expectedProof);
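The three checks can be combined into one function that reports which step failed instead of throwing on the first one. This is a sketch under stated assumptions: verifyProvenanceRecord is a hypothetical name, and generateProvenanceProof is a stand-in whose body is assumed, since the source does not define it.

```javascript
const crypto = require("node:crypto");

// Stand-in for generateProvenanceProof; the construction is an assumption.
function generateProvenanceProof(storageHash, chainHash, anonymousPatientId, resourceType) {
  const payload = [storageHash, chainHash, anonymousPatientId, resourceType].join("|");
  return crypto.createHash("sha256").update(payload, "utf8").digest("hex");
}

// Run all three verification steps and report each result separately.
function verifyProvenanceRecord(record, expectedStorageHash, expectedChainHash) {
  const checks = {
    // 1. Origin: both hashes match the expected values.
    origin:
      record.storage.hash === expectedStorageHash &&
      record.chain.hash === expectedChainHash,
    // 2. Transformation: the chain hash points back to the storage hash.
    transformation: record.chain.derivedFrom === record.storage.hash,
    // 3. Proof: the stored proof matches a freshly computed one.
    proof:
      record.provenanceProof ===
      generateProvenanceProof(
        record.storage.hash,
        record.chain.hash,
        record.anonymousPatientId,
        record.resourceType
      ),
  };
  return { ...checks, valid: checks.origin && checks.transformation && checks.proof };
}

const exampleRecord = {
  storage: { hash: "abc123" },
  chain: { hash: "def456", derivedFrom: "abc123" },
  anonymousPatientId: "PID-001",
  resourceType: "Patient",
  provenanceProof: generateProvenanceProof("abc123", "def456", "PID-001", "Patient"),
};
console.log(verifyProvenanceRecord(exampleRecord, "abc123", "def456").valid); // true
```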

Comparison Table

| Feature       | Stage 1 (Storage)            | Stage 2 (Chain)          |
|---------------|------------------------------|--------------------------|
| Age Range     | 5-year (e.g., "35-39")       | 10-year (e.g., "30-39")  |
| Dates         | Exact (YYYY-MM-DD)           | Month/Year (YYYY-MM)     |
| Location      | Country + Region + District  | Country only             |
| Occupation    | Specific category            | Broad category           |
| Purpose       | Research queries             | Blockchain storage       |
| Privacy Level | High                         | Maximum                  |
| Data Utility  | High (preserves detail)      | Medium (generalized)     |

Benefits

Double Protection

Two layers of anonymization ensure maximum privacy protection with defense in depth.

Provenance Chain

Verifiable transformation chain on Hedera allows anyone to verify origin and transformation.

Origin Proof

Both hashes trace back to the same source record, providing a complete audit trail for compliance.

Transformation Proof

Anyone can verify that the chain hash was derived from the storage hash, which proves the transformation chain.

Public Verification

Anyone can verify provenance records on HashScan, ensuring transparency and trust.

Compliance Ready

Meets strict regulatory requirements including GDPR and HIPAA Safe Harbor standards.

HashScan Verification

Each provenance record is stored on Hedera and can be verified on HashScan:

  1. Visit the HashScan link from the adapter output
  2. View provenance record JSON
  3. Verify both hashes (storage + chain)
  4. Verify derivedFrom link
  5. Verify provenance proof
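HashScan is a web interface, but the same messages can also be read programmatically from a Hedera mirror node, whose topic message responses carry the message content base64-encoded in a message field. The sketch below decodes such a response into a provenance record. It assumes the record was submitted as a single UTF-8 JSON message (not chunked); decodeProvenanceMessage is a hypothetical helper, and the response object is simulated locally so the example runs offline.

```javascript
// Decode the base64 "message" field of a mirror node topic message into a record.
// Assumes the record was submitted as one UTF-8 JSON message (not chunked).
function decodeProvenanceMessage(topicMessage) {
  const json = Buffer.from(topicMessage.message, "base64").toString("utf8");
  return JSON.parse(json);
}

// Simulated mirror node response, built locally for illustration.
const simulatedResponse = {
  sequence_number: 1,
  message: Buffer.from(
    JSON.stringify({
      storage: { hash: "abc123" },
      chain: { hash: "def456", derivedFrom: "abc123" },
    }),
    "utf8"
  ).toString("base64"),
};

const decoded = decodeProvenanceMessage(simulatedResponse);
console.log(decoded.chain.derivedFrom === decoded.storage.hash); // true
```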