Python Quickstart

For a high-level overview and other entry points, see Quickstarts.

Install the Python packages and run the OpenToken CLI with a virtual environment.

Prerequisites

  • Python 3.10+
  • pip (usually included with Python)

Verify your installation:

python --version   # Should show 3.10 or higher
pip --version

Setup Virtual Environment

Important: The virtual environment should be created at the repository root.

cd /path/to/OpenLinkToken

# Create virtual environment at repo root
python -m venv .venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
.\.venv\Scripts\activate

Install Dependencies

# Install core library
cd lib/python/opentoken
pip install -r requirements.txt -e .

# Install CLI
cd ../opentoken-cli
pip install -r requirements.txt -e .

Run Token Generation

Basic Encrypted Tokens

python -m opentoken_cli.main \
  -i ../../../resources/sample.csv \
  -t csv \
  -o ../../../resources/output.csv \
  -h "YourHashingSecret" \
  -e "YourEncryptionKey-32Chars-Here!"

Hash-Only Mode (No Encryption)

python -m opentoken_cli.main \
  -i ../../../resources/sample.csv \
  -t csv \
  -o ../../../resources/output.csv \
  -h "YourHashingSecret" \
  --hash-only

Parquet Format

python -m opentoken_cli.main \
  -i input.parquet \
  -t parquet \
  -o output.parquet \
  -h "YourHashingSecret" \
  -e "YourEncryptionKey-32Chars-Here!"

Verify Output

# View token output
head ../../../resources/output.csv

# View metadata
cat ../../../resources/output.metadata.json

Using the Python API Programmatically

from opentoken.attributes.person.birth_date_attribute import BirthDateAttribute
from opentoken.attributes.person.first_name_attribute import FirstNameAttribute
from opentoken.attributes.person.last_name_attribute import LastNameAttribute
from opentoken.attributes.person.postal_code_attribute import PostalCodeAttribute
from opentoken.attributes.person.sex_attribute import SexAttribute
from opentoken.attributes.person.social_security_number_attribute import SocialSecurityNumberAttribute
from opentoken.tokens.token_definition import TokenDefinition
from opentoken.tokens.token_generator import TokenGenerator
from opentoken.tokens.tokenizer.sha256_tokenizer import SHA256Tokenizer
from opentoken.tokentransformer.encrypt_token_transformer import EncryptTokenTransformer
from opentoken.tokentransformer.hash_token_transformer import HashTokenTransformer

record_id = "patient_123"

person_attributes = {
  FirstNameAttribute: "John",
  LastNameAttribute: "Doe",
  BirthDateAttribute: "1980-01-15",
  SexAttribute: "Male",
  PostalCodeAttribute: "98004",
  SocialSecurityNumberAttribute: "123-45-6789",
}

token_definition = TokenDefinition()
tokenizer = SHA256Tokenizer([
  HashTokenTransformer("HashingSecret"),
  EncryptTokenTransformer("Secret-Encryption-Key-Goes-Here."),
])

generator = TokenGenerator(token_definition, tokenizer)
result = generator.get_all_tokens(person_attributes)
if result.invalid_attributes:
  print(f"Invalid attributes: {sorted(result.invalid_attributes)}")

for rule_id, token in result.tokens.items():
  print(f"{record_id},{rule_id},{token}")

Cross-Language Parity

OpenToken guarantees that Java and Python produce identical tokens for the same input. This is verified by interoperability tests:

cd tools/interoperability
pip install -r requirements.txt
python java_python_interoperability_test.py

The test:

  1. Generates tokens using Java CLI
  2. Generates tokens using Python CLI
  3. Compares all tokens byte-by-byte
  4. Fails if any mismatch is found

PySpark Integration

For distributed processing on Spark or Databricks:

cd lib/python/opentoken-pyspark
pip install -r requirements.txt -e .

See Spark or Databricks for usage.

Troubleshooting

“ModuleNotFoundError: No module named ‘opentoken’”

Make sure you installed with -e . (editable mode) from the correct directory.

“Python version not supported”

OpenToken requires Python 3.10+. Check with python --version.

Virtual Environment Not Activated

If commands fail, ensure venv is active:

source /path/to/OpenLinkToken/.venv/bin/activate

Import Errors After Updates

Reinstall the packages:

pip install -e . --force-reinstall

Development Setup

For contributing to OpenToken:

# Install development dependencies
pip install -r dev-requirements.txt

# Run tests
pytest

# Run with coverage
pytest --cov=opentoken --cov-report=html

Next Steps