Sharing Tokenized Data Between Organizations
How to securely exchange OpenToken outputs for cross-organization person matching.
Overview
Organizations often need to identify overlapping individuals across datasets without exposing raw person data. OpenToken enables this by generating deterministic, cryptographically secure tokens that can be shared and matched externally.
Typical scenario:
- Organization A and Organization B each hold patient records
- Both organizations run OpenToken on their data using the same secrets
- They exchange only the token output (no raw PII)
- Matching tokens indicate the same person exists in both datasets
┌─────────────────┐                    ┌─────────────────┐
│ Organization A  │                    │ Organization B  │
│ (Patient Data)  │                    │ (Patient Data)  │
└────────┬────────┘                    └────────┬────────┘
         │                                      │
         ▼                                      ▼
┌─────────────────┐                    ┌─────────────────┐
│    OpenToken    │                    │    OpenToken    │
│ (same secrets)  │                    │ (same secrets)  │
└────────┬────────┘                    └────────┬────────┘
         │                                      │
         ▼                                      ▼
┌─────────────────┐      Exchange      ┌─────────────────┐
│  Tokens + Meta  │ ←───────────────→  │  Tokens + Meta  │
└────────┬────────┘                    └────────┬────────┘
         │                                      │
         └───────────────┬──────────────────────┘
                         ▼
                ┌─────────────────┐
                │ Token Matching  │
                │ (find overlap)  │
                └─────────────────┘
Sender Workflow
The sending organization prepares tokenized data for sharing.
Step 1: Agree on Shared Secrets
Before tokenization, both parties must agree on:
- Hashing secret (required): Used for HMAC-SHA256
- Encryption key (recommended for external sharing): Used for AES-256-GCM
Best practice: Use a secure channel (encrypted email, secure file transfer, or direct key exchange in person) to share secrets. Never send secrets alongside token files.
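A minimal sketch of generating high-entropy secrets before exchanging them over the secure channel. It assumes the CLI accepts an arbitrary printable string for -h and a 32-character string for -e; both function names are illustrative and not part of OpenToken.

```python
import secrets
import string

# Letters and digits only, so the values survive shell quoting and
# environment variables without escaping.
ALPHABET = string.ascii_letters + string.digits


def generate_secret(length: int) -> str:
    """Return a random string of `length` characters drawn from a CSPRNG."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_shared_secrets() -> dict:
    """Generate one hashing secret and one encryption key (hypothetical helper)."""
    return {
        # Assumption: any sufficiently long string is accepted as the hashing secret.
        "SHARED_HASHING_SECRET": generate_secret(48),
        # Assumption: the encryption key is supplied as a 32-character string.
        "SHARED_ENCRYPTION_KEY": generate_secret(32),
    }
```

Generate the values once, store them in a vault, and hand them to the partner out of band rather than regenerating them on each side.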
Step 2: Run OpenToken
Generate tokens using the agreed-upon secrets.
Encrypted mode (recommended for external sharing):
java -jar opentoken-cli-*.jar \
-i patient_data.csv \
-t csv \
-o tokens_for_partner.csv \
-h "$SHARED_HASHING_SECRET" \
-e "$SHARED_ENCRYPTION_KEY"
Hash-only mode (overlap analysis helper, internal artifact):
Hash-only output is primarily used inside your environment to support overlap analysis against encrypted tokens received from a partner:
java -jar opentoken-cli-*.jar \
--hash-only \
-i local_patient_data.csv \
-t csv \
-o local_hash_only_tokens.csv \
-h "$SHARED_HASHING_SECRET"
Typical pattern:
- Both parties exchange encrypted tokens only.
- You decrypt the partner’s encrypted tokens to the hash-only equivalent (see Decrypting Tokens).
- You generate hash-only tokens for your own dataset.
- You perform overlap analysis by joining the two hash-only datasets.
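The join in the last step of this pattern can be sketched in Python as a per-rule set intersection. The sketch assumes both hash-only files have been loaded into rows with RuleId and Token columns, and that blank tokens appear as empty strings; adjust both assumptions to your actual output.

```python
from collections import defaultdict


def tokens_by_rule(rows):
    """Group non-blank tokens by RuleId: {rule_id: {token, ...}}."""
    grouped = defaultdict(set)
    for row in rows:
        token = row["Token"]
        if token:  # assumption: blank tokens are empty strings
            grouped[row["RuleId"]].add(token)
    return grouped


def overlap_by_rule(local_rows, partner_rows):
    """Count distinct tokens present in both datasets, per rule."""
    local = tokens_by_rule(local_rows)
    partner = tokens_by_rule(partner_rows)
    return {
        rule: len(local[rule] & partner[rule])
        for rule in sorted(set(local) & set(partner))
    }
```

Rows can come from csv.DictReader over each hash-only file. The result is a count per rule, which is often enough for a first overlap estimate before any record-level join.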
See Hash-Only Mode for trade-offs between encrypted and hash-only tokens.
Step 3: Review Metadata
Check the generated .metadata.json for processing statistics:
{
"TotalRows": 50000,
"TotalRowsWithInvalidAttributes": 120,
"InvalidAttributesByType": {
"SocialSecurityNumber": 80,
"BirthDate": 40
},
"BlankTokensByRule": {
"T1": 80,
"T3": 80
},
"HashingSecretHash": "e0b4e60b...",
"EncryptionSecretHash": "a1b2c3d4..."
}
Key checks:
- TotalRowsWithInvalidAttributes: High counts may indicate data quality issues
- BlankTokensByRule: A rule yields a blank token when an attribute it requires is missing or invalid, so blanks are expected for SSN-based rules if SSN is often missing
- HashingSecretHash / EncryptionSecretHash: Share these hashes (not the secrets) so the recipient can verify they used the correct keys
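The first two checks can be turned into rates with a few lines of Python. This is a sketch that assumes the metadata file has the field names shown in the example above; the function names are hypothetical.

```python
import json


def load_metadata(path: str) -> dict:
    """Read a .metadata.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_metadata(meta: dict) -> dict:
    """Express invalid rows and blank tokens as fractions of TotalRows."""
    total = meta["TotalRows"]
    invalid = meta.get("TotalRowsWithInvalidAttributes", 0)
    blanks = meta.get("BlankTokensByRule", {})
    return {
        "invalid_row_rate": invalid / total if total else 0.0,
        "blank_rate_by_rule": {
            rule: (count / total if total else 0.0)
            for rule, count in blanks.items()
        },
    }
```

For the example metadata, 120 invalid rows out of 50000 is a rate of 0.24%. What counts as too high depends on the dataset, so any threshold is a local policy decision.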
Step 4: Prepare Transfer Package
Include in the transfer:
| File | Purpose | Contains Secrets? |
|---|---|---|
| tokens.csv (or .parquet) | Token output | No |
| tokens.metadata.json | Processing stats, secret hashes | Hashes only (not secrets) |
| Data dictionary (optional) | Column definitions, RecordId mapping | No |
Do NOT include:
- Raw input data (PII)
- Hashing secret or encryption key (share separately via secure channel)
- Decrypted tokens
Step 5: Transfer Securely
Use encrypted file transfer:
- SFTP (encrypted in transit over SSH)
- Cloud storage with encryption at rest (S3, Azure Blob, GCS)
- Secure email with encrypted attachment
Recipient Workflow
The receiving organization ingests shared tokens and matches against their own data.
Step 1: Obtain Shared Secrets
Receive the hashing secret (and encryption key, if applicable) through a secure channel separate from the token files.
Step 2: Verify Secret Hashes
Before processing, verify that your secrets match the sender’s:
python tools/hash_calculator.py \
--hashing-secret "$SHARED_HASHING_SECRET" \
--encryption-key "$SHARED_ENCRYPTION_KEY"
Compare the output hashes with HashingSecretHash and EncryptionSecretHash in the received metadata file. If the hashes differ, the two parties used different secrets and their tokens will not match.
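The comparison itself can be automated. The sketch below does not compute the hashes; it takes the values printed by the hash calculator as plain strings and compares them with the received metadata. The function names are hypothetical.

```python
import hmac
from typing import Optional


def _same(expected: str, actual: str) -> bool:
    """Compare two hash strings after trimming surrounding whitespace."""
    return hmac.compare_digest(
        expected.strip().encode("utf-8"), actual.strip().encode("utf-8")
    )


def secret_hashes_match(
    metadata: dict, hashing_hash: str, encryption_hash: Optional[str] = None
) -> bool:
    """Return True only if every supplied hash equals the metadata value."""
    expected_hashing = metadata.get("HashingSecretHash")
    if not expected_hashing or not _same(expected_hashing, hashing_hash):
        return False
    if encryption_hash is None:
        # No encryption key in use, so there is nothing further to compare.
        return True
    expected_encryption = metadata.get("EncryptionSecretHash")
    return bool(expected_encryption) and _same(expected_encryption, encryption_hash)
```

A False result is a signal to stop and re-confirm the secrets with the sender before generating any tokens.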
Step 3: Generate Your Own Tokens
Run OpenToken on your local data using the same secrets:
java -jar opentoken-cli-*.jar \
-i local_patient_data.csv \
-t csv \
-o local_tokens.csv \
-h "$SHARED_HASHING_SECRET" \
-e "$SHARED_ENCRYPTION_KEY"
Step 4: Match Tokens
Join token files to find matching records:
-- Find overlapping patients
SELECT
partner.RecordId AS PartnerRecordId,
local.RecordId AS LocalRecordId,
partner.RuleId
FROM partner_tokens partner
JOIN local_tokens local
ON partner.Token = local.Token
AND partner.RuleId = local.RuleId;
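For small files, the same join can be sketched in Python without a database. It assumes the RecordId, RuleId, and Token columns used in the query above, and it skips blank tokens (assumed to be empty strings) so that they never match each other. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def read_rows(path):
    """Load a token CSV into a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def match_tokens(partner_rows, local_rows):
    """Pair partner and local records that share a token under the same rule."""
    # Index local records by (RuleId, Token) for constant-time lookups.
    index = defaultdict(list)
    for row in local_rows:
        if row["Token"]:
            index[(row["RuleId"], row["Token"])].append(row["RecordId"])

    matches = []
    for row in partner_rows:
        if not row["Token"]:
            continue  # assumption: a blank token is an empty string
        for local_id in index.get((row["RuleId"], row["Token"]), []):
            matches.append({
                "PartnerRecordId": row["RecordId"],
                "LocalRecordId": local_id,
                "RuleId": row["RuleId"],
            })
    return matches
```

Each output row has the same three fields as the query result, so downstream steps can consume either one.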
Interpretation:
- T4 match: Very high confidence (SSN + Sex + BirthDate)
- T3 match: High confidence (Last name + full first name + Sex + BirthDate)
- Multiple rule matches: Stronger confidence (more agreement across attributes)
See Matching Model for matching strategies.
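To act on the "multiple rule matches" point, match rows can be grouped per record pair and ordered by how many rules agree. This is a sketch that assumes match rows with PartnerRecordId, LocalRecordId, and RuleId fields, as produced by the query above; the ranking by rule count is an illustrative heuristic, not OpenToken's matching model.

```python
from collections import defaultdict


def summarize_pairs(matches):
    """Group matches by record pair and rank pairs by number of agreeing rules."""
    rules_by_pair = defaultdict(set)
    for match in matches:
        pair = (match["PartnerRecordId"], match["LocalRecordId"])
        rules_by_pair[pair].add(match["RuleId"])

    # Most agreeing rules first; ties are ordered by pair for stable output.
    ranked = sorted(rules_by_pair.items(), key=lambda item: (-len(item[1]), item[0]))
    return [
        {"PartnerRecordId": partner_id, "LocalRecordId": local_id, "Rules": sorted(rules)}
        for (partner_id, local_id), rules in ranked
    ]
```

Pairs that match on a single lower-confidence rule land at the bottom of the list, where they can be reviewed separately.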
Step 5: Handle Encrypted Tokens (If Applicable)
If tokens are encrypted and you need to debug or verify:
java -jar opentoken-cli-*.jar \
-d \
-i partner_tokens.csv \
-t csv \
-o partner_decrypted.csv \
-e "$SHARED_ENCRYPTION_KEY"
See Decrypting Tokens for details.
Security Considerations
Use Encrypted Tokens for External Sharing
Encrypted mode (-e flag) adds AES-256-GCM encryption on top of HMAC-SHA256:
| Mode | External Sharing | Defense in Depth | Reversible |
|---|---|---|---|
| Encrypted | ✓ Recommended | Yes | To HMAC hash (with key) |
| Hash-only | ⚠ Use with caution | No | Not reversible |
Encrypted tokens provide an additional security layer if token files are intercepted.
Protect Shared Secrets
- Never send secrets with token files. Use a separate secure channel.
- Store secrets in a vault (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault)
- Limit access to secrets to authorized personnel only
- Rotate secrets periodically; after a rotation, both parties must re-tokenize, because tokens generated with different secrets do not match
Verify Partner Identity
Before sharing:
- Confirm the partner organization’s identity through established business relationships
- Use signed data-sharing agreements
- Verify contact information through independent channels
Audit and Logging
- Log token generation events: Who ran OpenToken, when, with which input file
- Log token transfers: When tokens were sent/received, to/from whom
- Log matching events: Who performed matching, what results were produced
Minimize Data Exposure
- Share only tokens: Never share raw input data alongside tokens
- Limit token rules if appropriate: If only T2 matching is needed, consider generating only T2 tokens
- Use RecordId mappings: Map internal IDs to opaque identifiers before sharing
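A minimal sketch of the RecordId mapping idea: replace internal identifiers with random UUIDs before tokenization and keep the mapping inside your organization. It assumes rows are dictionaries with a RecordId column; the function names are hypothetical.

```python
import uuid


def build_record_id_map(internal_ids):
    """Assign one random, opaque identifier to each distinct internal ID."""
    mapping = {}
    for internal_id in internal_ids:
        if internal_id not in mapping:
            mapping[internal_id] = str(uuid.uuid4())
    return mapping


def apply_record_id_map(rows, mapping, id_column="RecordId"):
    """Return copies of the rows with the ID column replaced by its opaque value."""
    return [{**row, id_column: mapping[row[id_column]]} for row in rows]
```

The mapping is the only way back from a matched opaque identifier to the internal record, so it should be stored with the same care as the raw data and never included in the transfer package.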
What Tokens Do NOT Reveal
Tokens are one-way transformations. Without the hashing secret:
- Attackers cannot reverse tokens to original attributes
- Attackers cannot generate new tokens for known individuals
- Attackers cannot determine which attributes produced a token
With only the token file, an attacker cannot identify individuals.
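The role of the hashing secret can be illustrated with Python's standard hmac module. This sketch shows only the keyed-hash property (the same secret and input give the same digest, a different secret gives an unrelated one); it does not reproduce OpenToken's full token pipeline, and the attribute string below is a made-up example rather than the real token signature format.

```python
import hashlib
import hmac


def keyed_digest(secret: str, message: str) -> str:
    """Return the HMAC-SHA256 of `message` under `secret` as a hex string."""
    return hmac.new(
        secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Hypothetical attribute string, for illustration only.
attributes = "DOE|JANE|F|1980-01-31"

same_a = keyed_digest("shared-secret", attributes)
same_b = keyed_digest("shared-secret", attributes)
other = keyed_digest("different-secret", attributes)

print(same_a == same_b)  # both parties with the shared secret agree
print(same_a == other)   # a party without it computes a different value
```

This is why an attacker holding only a token file cannot confirm a guess about a known individual: checking the guess requires computing the keyed hash, which requires the secret.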
Common Pitfalls
| Issue | Cause | Solution |
|---|---|---|
| Zero matches between datasets | Different secrets used | Verify secret hashes match in metadata files |
| Partial matches only | Normalization differences | Ensure both parties use the same OpenToken version |
| High invalid record counts | Data quality issues | Clean data before tokenization; review validation rules |
| Secrets exposed in logs | Logging misconfiguration | Configure logging to exclude sensitive parameters |
| Token file intercepted | Insecure transfer | Use encrypted file transfer; prefer encrypted tokens |
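For the "Secrets exposed in logs" row, one option in Python wrapper scripts is a logging filter that masks known secret values before a record is written. This is a sketch under the assumption that the wrapper has the secrets in memory; the class name is hypothetical and the filter is not part of OpenToken.

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets_to_hide):
        super().__init__()
        # Ignore empty strings, which would otherwise match everywhere.
        self._secrets = [secret for secret in secrets_to_hide if secret]

    def filter(self, record):
        # Render the message first so secrets passed as arguments are caught too.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted
```

Attach the filter to each handler with handler.addFilter(...) rather than to a single logger, so that records propagated from child loggers are also redacted. Masking is a safety net; not passing secrets to log calls in the first place remains the primary control.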
Checklist: Before Sharing
Sender:
- Agreed on secrets with recipient (via secure channel)
- Generated tokens with correct secrets
- Verified metadata shows expected row counts
- Prepared transfer package (tokens + metadata only)
- Using encrypted file transfer
Recipient:
- Received secrets via secure channel (separate from token files)
- Verified secret hashes match sender’s metadata
- Generated own tokens with same secrets
- Matching logic uses correct RuleId and Token columns
Related Documentation
- Hash-Only Mode — When to skip encryption
- Decrypting Tokens — Reversing encrypted tokens for verification
- Security — Cryptographic details and key management
- Key Management & Secrets — Secret handling best practices
- Configuration — CLI arguments and environment variables
- Matching Model — How token matching works