Running Batch Jobs

How to run OpenToken in batch mode across CSV or Parquet files at scale using CLI or Docker.


Overview

OpenToken processes input files (CSV or Parquet) and produces two outputs:

  1. Tokens file (CSV or Parquet): Contains RecordId, RuleId, Token columns
  2. Metadata file (JSON): Processing statistics, secret hashes, and validation counts

CLI Batch Processing

Basic Syntax

opentoken-cli [OPTIONS] -i <input> -t <type> -o <output> -h <hashing-secret> [-e <encryption-key>]

Required Arguments

Argument Alias Description Example
-i --input Input file path -i data.csv
-t --type Input file type (csv or parquet) -t csv
-o --output Output file path -o tokens.csv
-h --hashingsecret HMAC-SHA256 hashing secret -h "MyKey"

Optional Arguments

Argument Alias Description Default
-e --encryptionkey AES-256 encryption key Required unless --hash-only
-ot --output-type Output file type Same as input
  --hash-only Hash-only mode (no encryption) False
-d --decrypt Decrypt mode False

Java CLI Example

cd lib/java
mvn clean install -DskipTests

java -jar opentoken-cli/target/opentoken-cli-*.jar \
  -i ../../resources/sample.csv \
  -t csv \
  -o ../../resources/output.csv \
  -h "HashingKey" \
  -e "Secret-Encryption-Key-Goes-Here."

Python CLI Example

cd lib/python/opentoken-cli
source ../../.venv/bin/activate
pip install -r requirements.txt -e . -e ../opentoken

python -m opentoken_cli.main \
  -i ../../../resources/sample.csv \
  -t csv \
  -o ../../../resources/output.csv \
  -h "HashingKey" \
  -e "Secret-Encryption-Key-Goes-Here."

Docker Batch Processing

Bash (Linux/Mac):

cd /path/to/OpenLinkToken

./run-opentoken.sh \
  -i ./resources/sample.csv \
  -o ./resources/output.csv \
  -t csv \
  -h "HashingKey" \
  -e "Secret-Encryption-Key-Goes-Here."

PowerShell (Windows):

cd C:\path\to\OpenLinkToken

.\run-opentoken.ps1 `
  -i .\resources\sample.csv `
  -o .\resources\output.csv `
  -FileType csv `
  -h "HashingKey" `
  -e "Secret-Encryption-Key-Goes-Here."

Script Options

Option Bash PowerShell Description
File type -t -FileType csv or parquet
Skip rebuild -s -SkipBuild Reuse existing image
Verbose -v -Verbose Show detailed output

Manual Docker Commands

# Build the image
docker build -t opentoken:latest .

# Run with sample data
docker run --rm -v $(pwd)/resources:/app/resources \
  opentoken:latest \
  -i /app/resources/sample.csv \
  -t csv \
  -o /app/resources/output.csv \
  -h "HashingKey" \
  -e "Secret-Encryption-Key-Goes-Here."

# View output
cat resources/output.csv
cat resources/output.metadata.json

Exit Codes

Exit Code Meaning
0 Success
1 General error (invalid arguments, file not found, etc.)
Non-zero Processing failure; check stderr for details

Output Files

Tokens File (CSV)

RecordId,RuleId,Token
ID001,T1,Gn7t1Zj16E5Qy+z9iINtczP6fRDYta6C0XFrQtpjnVQSEZ5pQXAzo02Aa9LS9oNMOog6Ssw9GZE6fvJrX2sQ/cThSkB6m91L
ID001,T2,pUxPgYL9+cMxkA+8928Pil+9W+dm9kISwHYPdkZS+I2nQ/bQ/8HyL3FOVf3NYPW5NKZZO1OZfsz7LfKYpTlaxyzMLqMF2Wk7
...

Rows per input record: 5 (one per rule T1–T5)

Metadata File (JSON)

{
  "Platform": "Java",
  "JavaVersion": "21.0.0",
  "OpenTokenVersion": "1.0.0",
  "TotalRows": 100,
  "TotalRowsWithInvalidAttributes": 3,
  "InvalidAttributesByType": { "BirthDate": 2, "PostalCode": 1 },
  "BlankTokensByRule": { "T1": 2, "T2": 1 },
  "HashingSecretHash": "abc123...",
  "EncryptionSecretHash": "def456..."
}

See Reference: Metadata Format for complete field descriptions.


Common Patterns

Environment Variables for Secrets

export OPENTOKEN_HASHING_SECRET="MyHashingKey"
export OPENTOKEN_ENCRYPTION_KEY="MyEncryptionKey32CharactersLong"

java -jar opentoken-cli-*.jar \
  -i data.csv -t csv -o tokens.csv \
  -h "$OPENTOKEN_HASHING_SECRET" \
  -e "$OPENTOKEN_ENCRYPTION_KEY"

Logging and Monitoring

Check the metadata file after each run for:

  • TotalRowsWithInvalidAttributes: Records that failed validation
  • InvalidAttributesByType: Breakdown by attribute type
  • BlankTokensByRule: Rules that produced blank tokens

Troubleshooting

Problem Solution
“Encryption key not provided” Add -e "YourKey" or use --hash-only
“Invalid BirthDate” Use YYYY-MM-DD format; date must be 1910-01-01 to today
“Column not found” Check column names match accepted aliases
Docker build fails Ensure Docker is running; use absolute paths

Next Steps