CLI Reference
Complete reference for OpenToken CLI arguments, modes, and examples. This page is the single source of truth for CLI flags and options; other documentation (such as Configuration) links here instead of duplicating them.
Security Note
Treat generated token outputs and metadata as sensitive. In particular, --hash-only output is intended for internal use and should not be shared externally (for example, in tickets, chats, or public repos).
Hash-only mode is primarily used to build internal overlap-analysis datasets that can be joined against encrypted tokens received from external partners (after decryption). If you need to exchange tokens across organizations, use encrypted mode and follow a controlled exchange process: Sharing Tokenized Data.
Command Syntax
# Java
java -jar opentoken-cli-*.jar [OPTIONS]
# Python
python -m opentoken_cli.main [OPTIONS]
Required Arguments
| Argument | Short | Description |
|---|---|---|
--input |
-i |
Path to input file (CSV or Parquet) |
--output |
-o |
Path to output file |
--type |
-t |
File type: csv or parquet |
--hashingsecret |
-h |
Secret key for HMAC-SHA256 hashing |
Optional Arguments
| Argument | Short | Description | Default |
|---|---|---|---|
--encryptionkey |
-e |
32-character key for AES-256 encryption | Required unless --hash-only |
--hash-only |
Generate hashed tokens without encryption | false |
|
--output-type |
-ot |
Output file type if different from input | Same as input |
--decrypt |
-d |
Decrypt mode (input must be encrypted) | false |
Modes of Operation
Encrypted Mode (Default)
Generates fully encrypted tokens using AES-256-GCM. Tokens can be decrypted later with the encryption key.
java -jar opentoken-cli-*.jar \
-i input.csv -t csv -o output.csv \
-h "HashingSecret" \
-e "EncryptionKey-Exactly32Chars!!"
Token Pipeline:
Signature → SHA-256 → HMAC-SHA256 → AES-256-GCM → Base64
Hash-Only Mode
Generates one-way hashed tokens. Faster but tokens cannot be decrypted.
java -jar opentoken-cli-*.jar \
-i input.csv -t csv -o output.csv \
-h "HashingSecret" \
--hash-only
Token Pipeline:
Signature → SHA-256 → HMAC-SHA256 → Base64
File Format Examples
CSV Input
RecordId,FirstName,LastName,BirthDate,Sex,PostalCode,SSN
patient_001,John,Doe,1980-01-15,Male,98004,123-45-6789
patient_002,Jane,Smith,1975-03-22,Female,90210,987-65-4321
Column Aliases Accepted:
| Standard Name | Accepted Aliases |
|---|---|
| RecordId | Id |
| FirstName | GivenName |
| LastName | Surname |
| BirthDate | DateOfBirth |
| Sex | Gender |
| PostalCode | ZipCode, ZIP3, ZIP4, ZIP5 |
| SSN | SocialSecurityNumber, NationalIdentificationNumber |
CSV Output
RecordId,RuleId,Token
patient_001,T1,Gn7t1Zj16E5Qy+z9iINtczP6fRDYta6C0XFr...
patient_001,T2,pUxPgYL9+cMxkA+8928Pil+9W+dm9kISwHYP...
patient_001,T3,rwjfwIo5OcJUItTx8KCoSZMtr7tVGSyXsWv/...
patient_001,T4,9o7HIYZkhizczFzJL1HFyanlllzSa8hlgQWQ...
patient_001,T5,QpBpGBqaMhagfcHGZhVavn23ko03jkyS9Vo4...
Parquet Schema
Input:
RecordId: string
FirstName: string
LastName: string
BirthDate: string (YYYY-MM-DD)
Sex: string
PostalCode: string
SSN: string
Output:
RecordId: string
RuleId: string
Token: string
Metadata Output
Every run generates a .metadata.json file:
{
"Platform": "Java",
"JavaVersion": "21.0.0",
"OpenTokenVersion": "1.13.2",
"TotalRows": 100,
"TotalRowsWithInvalidAttributes": 3,
"InvalidAttributesByType": {
"BirthDate": 2,
"SSN": 1
},
"BlankTokensByRule": {
"T1": 2,
"T4": 1
},
"HashingSecretHash": "e0b4e60b...",
"EncryptionSecretHash": "a1b2c3d4..."
}
Docker Script Options
Bash (run-opentoken.sh)
./run-opentoken.sh \
-i ./input.csv \
-o ./output.csv \
-t csv \
-h "HashingKey" \
-e "EncryptionKey" \
[--skip-build] \
[--verbose]
| Option | Description |
|---|---|
--skip-build |
Skip Docker image rebuild |
--verbose |
Show detailed output |
--help |
Show help message |
PowerShell (run-opentoken.ps1)
.\run-opentoken.ps1 `
-i .\input.csv `
-o .\output.csv `
-FileType csv `
-h "HashingKey" `
-e "EncryptionKey" `
[-SkipBuild] `
[-Verbose]
Error Messages
| Error | Cause | Solution |
|---|---|---|
| “Encryption key not provided” | Missing -e in encrypted mode |
Add -e "key" or use --hash-only |
| “Encryption key must be 32 characters” | Key length wrong | Use exactly 32 characters |
| “Input file not found” | Invalid path | Check file exists |
| “Unknown file type” | Invalid -t value |
Use csv or parquet |
| “Invalid attribute: BirthDate” | Date validation failed | Use YYYY-MM-DD format |
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Invalid arguments |
| 2 | File not found |
| 3 | Processing error |
Performance Tips
- Use Parquet for large datasets (faster I/O, compression)
- Use
--hash-onlyif decryption not needed (20-30% faster) - For very large files, consider PySpark integration
Next Steps
- Java API Reference - Programmatic usage
- Python API Reference - Programmatic usage
- Configuration - Advanced settings
- Decrypting Tokens - Reverse encrypted tokens