Metadata Format

Complete reference for the OpenToken metadata JSON file structure, fields, and usage for audit and verification.

Overview

OpenToken generates a metadata file alongside every token output file. Metadata files provide:

Processing statistics: Counts of total records, invalid attributes, and blank tokens
System information: Platform (Java/Python), runtime version, library version
Secure hashes: SHA-256 hashes of secrets for verification (not the secrets themselves)
Audit trail: What was processed and how (platform, version, and validation statistics)

Metadata files:

Always use JSON format with .metadata.json extension
Are automatically generated (e.g., output.csv → output.metadata.json)
Contain no raw person data or actual secrets

Metadata Structure

File Format

Filename: <output-file-name>.metadata.json
Format: JSON (UTF-8)
Extension: .metadata.json

Example filenames:

output.csv → output.metadata.json
tokens.parquet → tokens.metadata.json
/data/results.csv → /data/results.metadata.json

JSON Schema

{
  "Platform": "string",
  "JavaVersion": "string (optional, Java only)",
  "PythonVersion": "string (optional, Python only)",
  "OpenTokenVersion": "string",
  "TotalRows": integer,
  "TotalRowsWithInvalidAttributes": integer,
  "InvalidAttributesByType": {
    "AttributeName": integer,
    ...
  },
  "BlankTokensByRule": {
    "RuleId": integer,
    ...
  },
  "HashingSecretHash": "string (hex)",
  "EncryptionSecretHash": "string (hex, optional)"
}

Field Descriptions

Platform Information

Field	Type	Description	Example
`Platform`	String	Processing platform/language	`"Java"` or `"Python"`
`JavaVersion`	String	Java runtime version (Java only)	`"21.0.0"`
`PythonVersion`	String	Python runtime version (Python only)	`"3.11.5"`
`OpenTokenVersion`	String	OpenToken library version	`"1.12.2"`

Notes:

Only JavaVersion OR PythonVersion appears (not both)
Platform value determines which version field is present

Processing Statistics

Field	Type	Description	Example
`TotalRows`	Integer	Total input records processed	`101`
`TotalRowsWithInvalidAttributes`	Integer	Records with ≥1 invalid attribute	`9`
`InvalidAttributesByType`	Object	Count of invalid values by attribute name	`{"FirstName": 1, "BirthDate": 3}`
`BlankTokensByRule`	Object	Count of blank tokens by rule ID	`{"T1": 5, "T2": 12}`

InvalidAttributesByType:

Keys: Attribute names (e.g., FirstName, BirthDate, SocialSecurityNumber)
Values: Count of invalid occurrences across all records
A single record with 2 invalid attributes contributes 2 to the sum
Sum of counts ≥ TotalRowsWithInvalidAttributes

BlankTokensByRule:

Keys: Rule IDs (T1, T2, T3, T4, T5)
Values: Count of blank tokens for that rule
Blank tokens occur when a rule requires an invalid attribute
Example: Invalid BirthDate causes blank tokens for T1, T2, T3, T4 (but not T5)

Secret Hashes

Field	Type	Description	Example
`HashingSecretHash`	String	SHA-256 hash of hashing secret (hex)	`"e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d..."`
`EncryptionSecretHash`	String	SHA-256 hash of encryption key (hex, optional)	`"a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6..."`

Security:

Hashes are not reversible (SHA-256 is one-way)
Used for verification: calculate hash of your secret and compare to metadata
EncryptionSecretHash omitted in --hash-only mode (no encryption used)

Example Metadata

Full Example (Encryption Mode)

{
  "Platform": "Java",
  "JavaVersion": "21.0.0",
  "OpenTokenVersion": "1.13.2",
  "TotalRows": 101,
  "TotalRowsWithInvalidAttributes": 9,
  "InvalidAttributesByType": {
    "SocialSecurityNumber": 2,
    "FirstName": 1,
    "PostalCode": 1,
    "LastName": 2,
    "BirthDate": 3
  },
  "BlankTokensByRule": {
    "T1": 5,
    "T2": 12,
    "T3": 3,
    "T4": 8,
    "T5": 7
  },
  "HashingSecretHash": "e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d4e3f2a9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d2e1f0",
  "EncryptionSecretHash": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8"
}

Hash-Only Mode Example

{
  "Platform": "Python",
  "PythonVersion": "3.11.5",
  "OpenTokenVersion": "1.13.2",
  "TotalRows": 50,
  "TotalRowsWithInvalidAttributes": 2,
  "InvalidAttributesByType": {
    "PostalCode": 2
  },
  "BlankTokensByRule": {
    "T2": 2
  },
  "HashingSecretHash": "abc123def456789abc123def456789abc123def456789abc123def456789abc123"
}

Note: No EncryptionSecretHash because --hash-only mode doesn’t use encryption.

Interpreting Metadata

Valid vs Invalid Records

Calculate valid records:

Valid Records = TotalRows - TotalRowsWithInvalidAttributes

Example:

{
  "TotalRows": 100,
  "TotalRowsWithInvalidAttributes": 5
}

100 records processed
5 records had errors
95 records were fully valid

Invalid Attribute Counts

Count totals:

Sum of InvalidAttributesByType values ≥ TotalRowsWithInvalidAttributes

Why ≥? A single record can have multiple invalid attributes.

Example:

{
  "TotalRows": 100,
  "TotalRowsWithInvalidAttributes": 5,
  "InvalidAttributesByType": {
    "FirstName": 2,
    "PostalCode": 3,
    "BirthDate": 1
  }
}

Total invalid attribute instances: 2 + 3 + 1 = 6
Records with errors: 5
At least one record had 2+ invalid attributes

Blank Token Analysis

Blank tokens occur when a rule requires an invalid attribute.

Token rule dependencies:

T1: LastName, FirstName, Sex, BirthDate
T2: LastName, FirstName, BirthDate, PostalCode
T3: LastName, FirstName, Sex, BirthDate
T4: SocialSecurityNumber, Sex, BirthDate
T5: LastName, FirstName, Sex

Example:

{
  "InvalidAttributesByType": {
    "BirthDate": 3
  },
  "BlankTokensByRule": {
    "T1": 3,
    "T2": 3,
    "T3": 3,
    "T4": 3,
    "T5": 0
  }
}

3 records had invalid BirthDate
T1–T4 all use BirthDate → 3 blank tokens each
T5 doesn’t use BirthDate → 0 blank tokens

Hash Verification

Purpose

Verify that the secrets used for token generation match expected values without exposing the secrets themselves.

Verification Process

Calculate hash of your secret:

python tools/hash_calculator.py --hashing-secret "HashingKey"

Compare to metadata:

cat output.metadata.json | grep HashingSecretHash

Match = correct secret used

Hash Calculation

The hash is computed as:

SHA-256(secret) → hex-encoded string (64 hex characters)

Python implementation:

import hashlib

def calculate_hash(secret: str) -> str:
    return hashlib.sha256(secret.encode('utf-8')).hexdigest()

hashing_hash = calculate_hash("HashingKey")
encryption_hash = calculate_hash("Secret-Encryption-Key-Goes-Here.")

Java implementation:

import java.security.MessageDigest;
import java.nio.charset.StandardCharsets;

public static String calculateHash(String secret) throws Exception {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] hash = digest.digest(secret.getBytes(StandardCharsets.UTF_8));
    return bytesToHex(hash);
}

private static String bytesToHex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) {
        result.append(String.format("%02x", b));
    }
    return result.toString();
}

Using the Hash Calculator Tool

The tools/hash_calculator.py script provides command-line hash calculation:

# Calculate both hashes
python tools/hash_calculator.py \
  --hashing-secret "HashingKey" \
  --encryption-key "Secret-Encryption-Key-Goes-Here."

# Output:
# HashingSecretHash: e0b4e60b6a9f7ea3b13c0d6a6e1b8c5d...
# EncryptionSecretHash: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6...

Usage Notes

Audit Trail

Metadata provides an audit record of:

What was processed (record counts and attribute-level statistics)
When it was processed (inferred from surrounding system logs or job metadata)
How it was processed (platform, version)
What secrets were used (via hashes)
What errors occurred (invalid attributes)

Store metadata files alongside token outputs for compliance and troubleshooting.

Cross-Language Consistency

Both Java and Python implementations produce identical metadata structure. Only differences:

JavaVersion vs PythonVersion field name
Timestamp format may vary slightly (both ISO 8601 compliant)

Retention

Consider retaining metadata longer than token files:

Metadata contains no person data
Provides audit trail for compliance
Useful for troubleshooting historic runs

Security

Metadata files contain SHA-256 hashes of secrets:

✓ Safe to log, store, and share (no secrets exposed)
✓ Enables verification without revealing secrets
✗ Cannot reverse hashes to recover secrets
✗ Attacker with metadata alone cannot generate tokens

Next Steps

View token rules: Concepts: Token Rules
Understand validation: Security
Use hash calculator: tools/hash_calculator.py
See full examples: Quickstarts