Metadata Format
Complete reference for the OpenToken metadata JSON file structure, fields, and usage for audit and verification.
Overview
OpenToken generates a metadata file alongside every token output file. Metadata files provide:
- Processing statistics: Counts of total records, invalid attributes, and blank tokens
- System information: Platform (Java/Python), runtime version, library version
- Secure hashes: SHA-256 hashes of secrets for verification (not the secrets themselves)
- Audit trail: What was processed and how (platform, version, and validation statistics)
Metadata files:
- Always use JSON format with .metadata.json extension
- Are automatically generated (e.g., output.csv → output.metadata.json)
- Contain no raw person data or actual secrets
Metadata Structure
File Format
Filename: <output-file-name>.metadata.json
Format: JSON (UTF-8)
Extension: .metadata.json
Example filenames:
- output.csv → output.metadata.json
- tokens.parquet → tokens.metadata.json
- /data/results.csv → /data/results.metadata.json
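The example filenames suggest that the output file's final extension is swapped for .metadata.json. A minimal Python sketch of that convention follows; the helper name metadata_path_for is hypothetical (not part of OpenToken), and the rule is inferred only from the examples shown here.

```python
from pathlib import PurePosixPath


def metadata_path_for(output_path: str) -> str:
    """Return the metadata filename expected next to an output file.

    Assumption: the final extension (.csv, .parquet, ...) is replaced by
    .metadata.json, as in the example filenames in this section.
    """
    return str(PurePosixPath(output_path).with_suffix(".metadata.json"))


for path in ("output.csv", "tokens.parquet", "/data/results.csv"):
    print(path, "->", metadata_path_for(path))
```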
JSON Schema
{
  "Platform": "string",
  "JavaVersion": "string (optional, Java only)",
  "PythonVersion": "string (optional, Python only)",
  "OpenTokenVersion": "string",
  "TotalRows": integer,
  "TotalRowsWithInvalidAttributes": integer,
  "InvalidAttributesByType": {
    "AttributeName": integer,
    ...
  },
  "BlankTokensByRule": {
    "RuleId": integer,
    ...
  },
  "HashingSecretHash": "string (hex)",
  "EncryptionSecretHash": "string (hex, optional)"
}
Field Descriptions
Platform Information
| Field | Type | Description | Example |
|---|---|---|---|
| Platform | String | Processing platform/language | "Java" or "Python" |
| JavaVersion | String | Java runtime version (Java only) | "21.0.0" |
| PythonVersion | String | Python runtime version (Python only) | "3.11.5" |
| OpenTokenVersion | String | OpenToken library version | "1.12.2" |
Notes:
- Only JavaVersion OR PythonVersion appears (not both)
- Platform value determines which version field is present
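The two notes above can be checked mechanically when auditing a metadata file. The following is a sketch only: version_field_problems is a hypothetical helper, and it assumes Platform takes exactly the values "Java" or "Python" as listed in the table.

```python
def version_field_problems(metadata: dict) -> list[str]:
    """List inconsistencies between Platform and the runtime version fields."""
    # Assumption: each platform is paired with exactly one version field.
    expected = {"Java": "JavaVersion", "Python": "PythonVersion"}
    platform = metadata.get("Platform")
    if platform not in expected:
        return [f"unexpected Platform value: {platform!r}"]

    problems = []
    for name, field in expected.items():
        present = field in metadata
        if name == platform and not present:
            problems.append(f"{field} is missing for Platform {platform}")
        elif name != platform and present:
            problems.append(f"{field} should not appear for Platform {platform}")
    return problems


print(version_field_problems({"Platform": "Java", "JavaVersion": "21.0.0"}))
```

An empty list means the version fields agree with the Platform value.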
Processing Statistics
| Field | Type | Description | Example |
|---|---|---|---|
| TotalRows | Integer | Total input records processed | 101 |
| TotalRowsWithInvalidAttributes | Integer | Records with ≥1 invalid attribute | 9 |
| InvalidAttributesByType | Object | Count of invalid values by attribute name | {"FirstName": 1, "BirthDate": 3} |
| BlankTokensByRule | Object | Count of blank tokens by rule ID | {"T1": 5, "T2": 12} |
InvalidAttributesByType:
- Keys: Attribute names (e.g., FirstName, BirthDate, SocialSecurityNumber)
- Values: Count of invalid occurrences across all records
- A single record with 2 invalid attributes contributes 2 to the sum
- Sum of counts ≥ TotalRowsWithInvalidAttributes
BlankTokensByRule:
- Keys: Rule IDs (T1, T2, T3, T4, T5)
- Values: Count of blank tokens for that rule
- Blank tokens occur when a rule requires an invalid attribute
- Example: Invalid BirthDate causes blank tokens for T1, T2, T3, T4 (but not T5)
Secret Hashes
| Field | Type | Description | Example |
|---|---|---|---|
| HashingSecretHash | String | SHA-256 hash of hashing secret (hex) | "e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d..." |
| EncryptionSecretHash | String | SHA-256 hash of encryption key (hex, optional) | "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6..." |
Security:
- Hashes are not reversible (SHA-256 is one-way)
- Used for verification: calculate hash of your secret and compare to metadata
- EncryptionSecretHash is omitted in --hash-only mode (no encryption used)
Example Metadata
Full Example (Encryption Mode)
{
  "Platform": "Java",
  "JavaVersion": "21.0.0",
  "OpenTokenVersion": "1.13.2",
  "TotalRows": 101,
  "TotalRowsWithInvalidAttributes": 9,
  "InvalidAttributesByType": {
    "SocialSecurityNumber": 2,
    "FirstName": 1,
    "PostalCode": 1,
    "LastName": 2,
    "BirthDate": 3
  },
  "BlankTokensByRule": {
    "T1": 5,
    "T2": 12,
    "T3": 3,
    "T4": 8,
    "T5": 7
  },
  "HashingSecretHash": "e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d4e3f2a9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d2e1f0",
  "EncryptionSecretHash": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8"
}
Hash-Only Mode Example
{
  "Platform": "Python",
  "PythonVersion": "3.11.5",
  "OpenTokenVersion": "1.13.2",
  "TotalRows": 50,
  "TotalRowsWithInvalidAttributes": 2,
  "InvalidAttributesByType": {
    "PostalCode": 2
  },
  "BlankTokensByRule": {
    "T2": 2
  },
  "HashingSecretHash": "abc123def456789abc123def456789abc123def456789abc123def456789abc123"
}
Note: No EncryptionSecretHash because --hash-only mode doesn’t use encryption.
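Because the encryption hash is present only when encryption was used, the field's presence can serve as a rough indicator of how a run was configured. A small sketch, assuming that presence of EncryptionSecretHash is the only signal needed (processing_mode is a hypothetical helper name):

```python
def processing_mode(metadata: dict) -> str:
    """Guess the run mode from which secret hashes are recorded."""
    # Assumption: EncryptionSecretHash is absent exactly when --hash-only was used.
    return "encryption" if "EncryptionSecretHash" in metadata else "hash-only"


print(processing_mode({"HashingSecretHash": "abc123"}))
```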
Interpreting Metadata
Valid vs Invalid Records
Calculate valid records:
Valid Records = TotalRows - TotalRowsWithInvalidAttributes
Example:
{
  "TotalRows": 100,
  "TotalRowsWithInvalidAttributes": 5
}
- 100 records processed
- 5 records had errors
- 95 records were fully valid
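The same calculation in Python, as a minimal sketch. The function name valid_record_count is illustrative, and the range check is an added sanity guard rather than documented OpenToken behavior.

```python
def valid_record_count(metadata: dict) -> int:
    """Return TotalRows minus TotalRowsWithInvalidAttributes."""
    total = metadata["TotalRows"]
    invalid = metadata["TotalRowsWithInvalidAttributes"]
    # Guard against corrupted metadata: invalid rows cannot exceed total rows.
    if not 0 <= invalid <= total:
        raise ValueError(f"inconsistent counts: {invalid} invalid of {total} rows")
    return total - invalid


summary = {"TotalRows": 100, "TotalRowsWithInvalidAttributes": 5}
print(valid_record_count(summary))  # 95
```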
Invalid Attribute Counts
Count totals:
Sum of InvalidAttributesByType values ≥ TotalRowsWithInvalidAttributes
Why ≥? A single record can have multiple invalid attributes.
Example:
{
  "TotalRows": 100,
  "TotalRowsWithInvalidAttributes": 5,
  "InvalidAttributesByType": {
    "FirstName": 2,
    "PostalCode": 3,
    "BirthDate": 1
  }
}
- Total invalid attribute instances: 2 + 3 + 1 = 6
- Records with errors: 5
- At least one record had 2+ invalid attributes
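The comparison above can be automated when reviewing many metadata files. This is a sketch under the relationship stated in this section (sum of counts ≥ rows with errors); invalid_attribute_summary and its result keys are hypothetical names.

```python
def invalid_attribute_summary(metadata: dict) -> dict:
    """Compare invalid attribute instances against rows with errors."""
    instances = sum(metadata["InvalidAttributesByType"].values())
    rows = metadata["TotalRowsWithInvalidAttributes"]
    return {
        "invalid_instances": instances,
        "rows_with_errors": rows,
        # The sum should never be smaller than the number of affected rows.
        "consistent": instances >= rows,
        # A strictly larger sum means some row had two or more invalid attributes.
        "some_row_has_multiple": instances > rows,
    }


example = {
    "TotalRowsWithInvalidAttributes": 5,
    "InvalidAttributesByType": {"FirstName": 2, "PostalCode": 3, "BirthDate": 1},
}
print(invalid_attribute_summary(example))
```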
Blank Token Analysis
Blank tokens occur when a rule requires an invalid attribute.
Token rule dependencies:
- T1: LastName, FirstName, Sex, BirthDate
- T2: LastName, FirstName, BirthDate, PostalCode
- T3: LastName, FirstName, Sex, BirthDate
- T4: SocialSecurityNumber, Sex, BirthDate
- T5: LastName, FirstName, Sex
Example:
{
  "InvalidAttributesByType": {
    "BirthDate": 3
  },
  "BlankTokensByRule": {
    "T1": 3,
    "T2": 3,
    "T3": 3,
    "T4": 3,
    "T5": 0
  }
}
- 3 records had invalid BirthDate
- T1–T4 all use BirthDate → 3 blank tokens each
- T5 doesn’t use BirthDate → 0 blank tokens
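The rule dependencies listed in this section can be expressed as a lookup that reports which rules are affected by a given invalid attribute. A sketch only: RULE_ATTRIBUTES and rules_blanked_by are illustrative names rather than OpenToken APIs, and the mapping simply copies the dependency list above.

```python
# Attribute dependencies per token rule, copied from the list in this section.
RULE_ATTRIBUTES = {
    "T1": {"LastName", "FirstName", "Sex", "BirthDate"},
    "T2": {"LastName", "FirstName", "BirthDate", "PostalCode"},
    "T3": {"LastName", "FirstName", "Sex", "BirthDate"},
    "T4": {"SocialSecurityNumber", "Sex", "BirthDate"},
    "T5": {"LastName", "FirstName", "Sex"},
}


def rules_blanked_by(invalid_attributes) -> list[str]:
    """Return the rule IDs that depend on at least one invalid attribute."""
    invalid = set(invalid_attributes)
    return [rule for rule, needed in RULE_ATTRIBUTES.items() if needed & invalid]


print(rules_blanked_by(["BirthDate"]))   # ['T1', 'T2', 'T3', 'T4']
print(rules_blanked_by(["PostalCode"]))  # ['T2']
```

This identifies which rules are affected for a record; the per-rule totals in BlankTokensByRule still depend on how many records carry each invalid attribute.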
Hash Verification
Purpose
Verify that the secrets used for token generation match expected values without exposing the secrets themselves.
Verification Process
- Calculate hash of your secret:
  python tools/hash_calculator.py --hashing-secret "HashingKey"
- Compare to metadata:
  cat output.metadata.json | grep HashingSecretHash
- Match = correct secret used
Hash Calculation
The hash is computed as:
SHA-256(secret) → hex-encoded string (64 hex characters)
Python implementation:
import hashlib

def calculate_hash(secret: str) -> str:
    return hashlib.sha256(secret.encode('utf-8')).hexdigest()

hashing_hash = calculate_hash("HashingKey")
encryption_hash = calculate_hash("Secret-Encryption-Key-Goes-Here.")
Java implementation:
import java.security.MessageDigest;
import java.nio.charset.StandardCharsets;

public static String calculateHash(String secret) throws Exception {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] hash = digest.digest(secret.getBytes(StandardCharsets.UTF_8));
    return bytesToHex(hash);
}

private static String bytesToHex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) {
        result.append(String.format("%02x", b));
    }
    return result.toString();
}
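The hash calculation and the comparison step can be combined into one check against a loaded metadata file. The following Python sketch assumes the metadata stores the hex digest exactly as computed above; secret_matches and load_metadata are hypothetical helper names, and hmac.compare_digest is used only as a cautious way to compare the two strings.

```python
import hashlib
import hmac
import json


def load_metadata(path: str) -> dict:
    """Read a .metadata.json file (UTF-8 JSON) into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def secret_matches(metadata: dict, secret: str,
                   field: str = "HashingSecretHash") -> bool:
    """Return True if SHA-256(secret) equals the hash stored under `field`."""
    expected = metadata.get(field)
    if expected is None:
        # For example, EncryptionSecretHash is absent in --hash-only mode.
        return False
    actual = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    # Lowercasing tolerates uppercase hex in the stored value (an assumption).
    return hmac.compare_digest(actual, expected.lower())
```

For example, secret_matches(load_metadata("output.metadata.json"), "HashingKey") returns True only when the run used that hashing secret.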
Using the Hash Calculator Tool
The tools/hash_calculator.py script provides command-line hash calculation:
# Calculate both hashes
python tools/hash_calculator.py \
  --hashing-secret "HashingKey" \
  --encryption-key "Secret-Encryption-Key-Goes-Here."
# Output:
# HashingSecretHash: e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d...
# EncryptionSecretHash: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6...
Usage Notes
Audit Trail
Metadata provides an audit record of:
- What was processed (record counts and attribute-level statistics)
- When it was processed (inferred from surrounding system logs or job metadata)
- How it was processed (platform, version)
- What secrets were used (via hashes)
- What errors occurred (invalid attributes)
Store metadata files alongside token outputs for compliance and troubleshooting.
Cross-Language Consistency
Both the Java and Python implementations produce an identical metadata structure. The only differences are:
- JavaVersion vs PythonVersion field name
- Timestamp format may vary slightly (both ISO 8601 compliant)
Retention
Consider retaining metadata longer than token files:
- Metadata contains no person data
- Provides audit trail for compliance
- Useful for troubleshooting historic runs
Security
Metadata files contain SHA-256 hashes of secrets:
- ✓ Safe to log, store, and share (no secrets exposed)
- ✓ Enables verification without revealing secrets
- ✗ Cannot reverse hashes to recover secrets
- ✗ Attacker with metadata alone cannot generate tokens
Next Steps
- View token rules: Concepts: Token Rules
- Understand validation: Security
- Use hash calculator: tools/hash_calculator.py
- See full examples: Quickstarts