Token Rules

OpenToken generates five distinct token types (T1-T5). Each rule defines a token signature (a deterministic, normalized string), which is then transformed into the output token by hashing and, optionally, encryption.
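
As a rough illustration of the signature-to-token step, the sketch below hashes a signature string and returns a hex digest. The choice of SHA-256, the hex encoding, and the function name signature_to_token are assumptions made for illustration only; the optional encryption step is omitted.

```python
import hashlib

def signature_to_token(signature: str) -> str:
    """Hash a token signature into a fixed-length hex string.

    SHA-256 is an assumed choice for this sketch.
    """
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()

token = signature_to_token("OREILLY|T|M|1995-11-03")
print(len(token))  # 64 hex characters for a SHA-256 digest
```

Because the signature is deterministic, the same normalized inputs always yield the same token, which is what makes tokens comparable across datasets.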

Overview

Each token rule defines:

Required attributes: Must all be present and valid
Concatenation order: Determines the token signature
Use case: When this rule is most effective

Notation used below:

U(X) = uppercase(X)
[0] = first character
[0:3] = first 3 characters
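
For readers who think in code, the notation maps directly onto Python string operations. This is only a reading aid; the helper name U mirrors the notation and is not taken from any library.

```python
def U(x: str) -> str:
    """U(X) = uppercase(X)."""
    return x.upper()

first_name = "Thomas"
print(U(first_name))       # THOMAS
print(U(first_name[0]))    # T   ([0] = first character)
print(U(first_name[0:3]))  # THO ([0:3] = first 3 characters)
```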

T1: Last Name + First Initial + Sex + Birth Date

Purpose: Higher-recall matching that tolerates first-name variation, as long as the first initial is unchanged

T1 Signature

T1 = U(LastName) | U(FirstName[0]) | U(Sex) | BirthDate

T1 Characteristics

Precision: Medium-high
Recall: High
Best for: Candidate generation and matching when nicknames or other first-name variations are common

T1 Example

Input:
  FirstName: Thomas
  LastName: O'Reilly
  BirthDate: 11/03/1995
  Sex: Male

Normalized:
  FirstName: THOMAS
  LastName: OREILLY
  BirthDate: 1995-11-03
  Sex: M

Token Signature: "OREILLY|T|M|1995-11-03"
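
The T1 example can be reproduced with a short sketch. The normalization rules here (letters only for names, MM/DD/YYYY input dates, first letter of the sex value) are assumptions inferred from this one example, and all function names are hypothetical.

```python
import re
from datetime import datetime

def normalize_name(name: str) -> str:
    # Assumed rule: keep ASCII letters only, then uppercase ("O'Reilly" -> "OREILLY").
    return re.sub(r"[^A-Za-z]", "", name).upper()

def normalize_birth_date(value: str) -> str:
    # Assumes MM/DD/YYYY input, as in the example above.
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")

def normalize_sex(value: str) -> str:
    # Assumed rule: first letter, uppercased ("Male" -> "M").
    return value.strip()[0].upper()

def t1_signature(first_name: str, last_name: str, birth_date: str, sex: str) -> str:
    first = normalize_name(first_name)
    parts = [
        normalize_name(last_name),
        first[0],
        normalize_sex(sex),
        normalize_birth_date(birth_date),
    ]
    return "|".join(parts)

print(t1_signature("Thomas", "O'Reilly", "11/03/1995", "Male"))
# OREILLY|T|M|1995-11-03
```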

T2: Last Name + First Name + Birth Date + ZIP-3

Purpose: Adds geography; useful when postal code is available

T2 Signature

T2 = U(LastName) | U(FirstName) | BirthDate | U(PostalCode[0:3])

T2 Characteristics

Precision: High
Recall: Good (requires postal code)
Best for: Matching within regions; cases where sex is missing or unreliable

T2 Example

Input:
  FirstName: Maria
  LastName: Garcia
  BirthDate: 03/22/1988
  PostalCode: 90210-4455

Normalized:
  FirstName: MARIA
  LastName: GARCIA
  BirthDate: 1988-03-22
  PostalCode: 90210

Token Signature: "GARCIA|MARIA|1988-03-22|902"
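
A minimal sketch of the T2 signature follows, assuming the birth date is already in YYYY-MM-DD form and that a ZIP+4 suffix is dropped by splitting on the hyphen. Both function names are hypothetical.

```python
def normalize_postal_code(value: str) -> str:
    # Assumed rule: drop the ZIP+4 suffix ("90210-4455" -> "90210").
    return value.strip().split("-")[0]

def t2_signature(last_name: str, first_name: str, birth_date_iso: str, postal_code: str) -> str:
    zip3 = normalize_postal_code(postal_code)[0:3].upper()
    return "|".join([last_name.upper(), first_name.upper(), birth_date_iso, zip3])

print(t2_signature("Garcia", "Maria", "1988-03-22", "90210-4455"))
# GARCIA|MARIA|1988-03-22|902
```

Only the first three characters of the postal code enter the signature, so two records for the same person in different ZIP codes still match if the codes share a ZIP-3 prefix.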

T3: Last Name + First Name + Sex + Birth Date

Purpose: Higher precision than T1 by requiring the full first name

T3 Signature

T3 = U(LastName) | U(FirstName) | U(Sex) | BirthDate

T3 Characteristics

Precision: High
Recall: Medium-high
Best for: Stricter matching when you expect name stability

T3 Example

Using the same person as in the T2 example, with Sex normalized to F:

Token Signature: "GARCIA|MARIA|F|1988-03-22"
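
To make the precision/recall difference between T1 and T3 concrete, the sketch below compares two records that differ only in how the first name is written. It assumes the inputs are already normalized apart from case; the second record is a hypothetical variant invented for illustration.

```python
def t1(last: str, first: str, sex: str, birth_date: str) -> str:
    return "|".join([last.upper(), first[0].upper(), sex.upper(), birth_date])

def t3(last: str, first: str, sex: str, birth_date: str) -> str:
    return "|".join([last.upper(), first.upper(), sex.upper(), birth_date])

a = ("Garcia", "Maria", "F", "1988-03-22")
b = ("Garcia", "Mari", "F", "1988-03-22")  # hypothetical shortened first name

print(t3(*a))            # GARCIA|MARIA|F|1988-03-22
print(t1(*a) == t1(*b))  # True: same first initial
print(t3(*a) == t3(*b))  # False: full first names differ
```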

T4: SSN (Digits Only) + Sex + Birth Date

Purpose: Very high precision matching when SSN is present

T4 Signature

T4 = SSN_digits | U(Sex) | BirthDate

Notes:

The SSN is normalized to digits only (dashes removed).

T4 Characteristics

Precision: Very high
Recall: Low (SSN often missing)
Best for: High-confidence linking when SSN exists and is valid

T4 Example

Input SSN: 452-38-7291
SSN_digits: 452387291

Token Signature: "452387291|F|1988-03-22"
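
The SSN normalization can be sketched as below. Stripping every non-digit character (not only dashes) is an assumption of this sketch, SSN validity checks are not shown, and the function names are hypothetical.

```python
import re

def ssn_digits(ssn: str) -> str:
    # Keep digits only; this removes dashes and, by assumption, any other separators.
    return re.sub(r"\D", "", ssn)

def t4_signature(ssn: str, sex: str, birth_date_iso: str) -> str:
    return "|".join([ssn_digits(ssn), sex.upper(), birth_date_iso])

print(t4_signature("452-38-7291", "F", "1988-03-22"))
# 452387291|F|1988-03-22
```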

T5: Last Name + First 3 Letters of First Name + Sex

Purpose: Highest recall / lowest precision (use cautiously)

T5 Signature

T5 = U(LastName) | U(FirstName[0:3]) | U(Sex)

T5 Characteristics

Precision: Lower
Recall: Highest
Best for: Broad matching / candidate generation, followed by stricter confirmation

T5 Example

FirstName: Jonathan -> FirstName[0:3] = JON
Token Signature: "SMITH|JON|M"
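
A short sketch shows why T5 has the highest recall and the lowest precision: any first names sharing their first three letters collapse to the same signature. Inputs are assumed to be normalized apart from case, and the function name is hypothetical.

```python
def t5_signature(last_name: str, first_name: str, sex: str) -> str:
    return "|".join([last_name.upper(), first_name[0:3].upper(), sex.upper()])

print(t5_signature("Smith", "Jonathan", "M"))  # SMITH|JON|M
print(t5_signature("Smith", "Jon", "M"))       # SMITH|JON|M
print(t5_signature("Smith", "Jo", "M"))        # SMITH|JO|M
```

Python slicing simply returns the whole string for first names shorter than three characters; how such names are handled by OpenToken itself is not specified in this section.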

Token Rule Summary

RuleId	Signature attributes	Typical precision	Typical recall
T1	Last, First[0], Sex, BirthDate	Medium-high	High
T2	Last, First, BirthDate, ZIP3	High	Good
T3	Last, First, Sex, BirthDate	High	Medium-high
T4	SSN(digits), Sex, BirthDate	Very high	Low
T5	Last, First[0:3], Sex	Lower	Highest
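
The summary table can also be read as data. The sketch below encodes each rule as an ordered list of field extractors over an already-normalized record and joins the results with "|". The record layout, the attribute keys (borrowed from the metadata field names), and the combined sample person are assumptions for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Each rule is an ordered list of extractors; order determines the signature.
RULES: Dict[str, List[Callable[[Record], str]]] = {
    "T1": [lambda r: r["LastName"], lambda r: r["FirstName"][0],
           lambda r: r["Sex"], lambda r: r["BirthDate"]],
    "T2": [lambda r: r["LastName"], lambda r: r["FirstName"],
           lambda r: r["BirthDate"], lambda r: r["PostalCode"][0:3]],
    "T3": [lambda r: r["LastName"], lambda r: r["FirstName"],
           lambda r: r["Sex"], lambda r: r["BirthDate"]],
    "T4": [lambda r: r["SocialSecurityNumber"], lambda r: r["Sex"],
           lambda r: r["BirthDate"]],
    "T5": [lambda r: r["LastName"], lambda r: r["FirstName"][0:3],
           lambda r: r["Sex"]],
}

def signature(rule_id: str, record: Record) -> str:
    return "|".join(extract(record) for extract in RULES[rule_id])

# Hypothetical normalized record combining the example values used above.
record = {
    "FirstName": "MARIA", "LastName": "GARCIA", "Sex": "F",
    "BirthDate": "1988-03-22", "PostalCode": "90210",
    "SocialSecurityNumber": "452387291",
}

for rule_id in RULES:
    print(rule_id, signature(rule_id, record))
# T1 GARCIA|M|F|1988-03-22
# T2 GARCIA|MARIA|1988-03-22|902
# T3 GARCIA|MARIA|F|1988-03-22
# T4 452387291|F|1988-03-22
# T5 GARCIA|MAR|F
```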

Attribute Fallback

When a required attribute is missing or invalid:

No token is generated for that rule.
Other rules are still computed if their attributes are valid.
The metadata file records aggregate counts of invalid attributes by type and of blank tokens by rule, for example:

{
  "TotalRows": 105,
  "TotalRowsWithInvalidAttributes": 12,
  "InvalidAttributesByType": {
    "BirthDate": 3,
    "FirstName": 1,
    "LastName": 2,
    "PostalCode": 2,
    "Sex": 0,
    "SocialSecurityNumber": 4
  },
  "BlankTokensByRule": {
    "T1": 6,
    "T2": 8,
    "T3": 6,
    "T4": 7,
    "T5": 3
  }
}
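
The fallback behaviour can be sketched as a per-rule check of required attributes. In this sketch "valid" is simplified to "present and non-empty", and a blank token is counted whenever a rule cannot be generated for a row. Both simplifications, along with the function names, are assumptions rather than a description of the actual implementation.

```python
REQUIRED = {
    "T1": ["LastName", "FirstName", "Sex", "BirthDate"],
    "T2": ["LastName", "FirstName", "BirthDate", "PostalCode"],
    "T3": ["LastName", "FirstName", "Sex", "BirthDate"],
    "T4": ["SocialSecurityNumber", "Sex", "BirthDate"],
    "T5": ["LastName", "FirstName", "Sex"],
}

def rules_to_generate(record: dict) -> list:
    """Return the rule IDs whose required attributes are all non-empty."""
    return [rule for rule, attrs in REQUIRED.items()
            if all(record.get(attr) for attr in attrs)]

def count_blank_tokens(records: list) -> dict:
    """Count, per rule, the rows for which no token can be generated."""
    blanks = {rule: 0 for rule in REQUIRED}
    for record in records:
        generated = set(rules_to_generate(record))
        for rule in REQUIRED:
            if rule not in generated:
                blanks[rule] += 1
    return blanks

records = [
    {"FirstName": "MARIA", "LastName": "GARCIA", "Sex": "F",
     "BirthDate": "1988-03-22", "PostalCode": "90210",
     "SocialSecurityNumber": "452387291"},
    # No postal code or SSN: T2 and T4 are skipped, the rest are still computed.
    {"FirstName": "THOMAS", "LastName": "OREILLY", "Sex": "M",
     "BirthDate": "1995-11-03"},
]

print(rules_to_generate(records[1]))  # ['T1', 'T3', 'T5']
print(count_blank_tokens(records))
# {'T1': 0, 'T2': 1, 'T3': 0, 'T4': 1, 'T5': 0}
```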