This presentation provides an introduction to tokenization. It describes what tokenization is and how it is implemented, and compares it with encryption. Most people try to separate tokenization from encryption. However, the two may not really be separate, as tokenization can itself be a form of encryption.
2. What is tokenization?
• Replace a value with a surrogate value called a “token”
value → Tokenize → token
• Examples
Value | Token | Comment
1344 6423 1231 1521 | aX73pQ43T1#+4oxT4 | Token consists of alphanumeric values
1344 6423 1231 1521 | 3124224578918001 | Token consists of numeric values only
1344 6423 1231 1521 | aX73pQ43T1#+y1521 | Token replaces the first 12 digits with an alphanumeric value
3. Properties of a Good Token
• Format and length preserving
• Some characteristics may be preserved (e.g. last four
digits of CC#s)
• Irreversible without some private information (i.e.
given a token, it is difficult to find the value)
• Distinguishable from the value
– If the token is not distinguishable from the value,
customers won’t be able to identify sensitive data and
apply proper protection mechanisms; further, customers
may inadvertently leak sensitive data thinking they are
tokens
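The properties above can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names `luhn_valid` and `make_token` are hypothetical, and making tokens fail the Luhn checksum is only one assumed way to keep them distinguishable from real card numbers.

```python
import secrets


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def make_token(card_number: str) -> str:
    """Build a numeric token that keeps the length and the last four digits.

    Assumption: tokens are made distinguishable by rejecting any candidate
    that passes the Luhn check, so a token can never look like a valid CC#.
    """
    digits = card_number.replace(" ", "")
    last_four = digits[-4:]
    while True:
        # Random prefix: the token is not mathematically derived from the value.
        prefix = "".join(secrets.choice("0123456789") for _ in range(len(digits) - 4))
        token = prefix + last_four
        if token != digits and not luhn_valid(token):
            return token


token = make_token("1344 6423 1231 1521")
print(token)  # 16 digits ending in 1521; the first 12 digits vary per run
```

Because the prefix is random, the token can only be mapped back to the value by whoever stores the mapping, which is the "private information" the slide refers to.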
4. What is de-tokenization?
• The reverse process of finding the actual value
from a token
token → De-tokenize → value
5. Why tokenize?
• Reduced risk due to limited exposure of
sensitive information (sensitive information is
centralized in one location and downstream
apps work with tokens)
• Reduced PCI scope (fewer nodes hold sensitive data)
• Minimal changes to applications to support
tokenization (tokenization is format and
length preserving)
6. An Example – Tokenizing CC#s
[Figure: flow of a credit card number (CC) between the Internet and the merchant data center. Components: Point of Sale, Payment App, Tokenization System, Customer Data Warehouse, Order Processing App, CRM App. Labeled steps: (1) Payment, CC; (2) Tokenize CC; (3) Tokenized CC; (4) Tokenized CC; (5) Tokenized CC.]
7. Single-use vs. Multi-use tokens
Single-use token | Multi-use token
Usually used to represent a single transaction | Usually used to represent a unique value (for example, a CC#), usually across multiple transactions
A given value may map to multiple tokens | Token maps to a unique value within the tokenization system
Short lived | Long lived
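The difference between the two token types can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the design of any particular product: the class `TokenizationSystem` and its method names are hypothetical, tokens are random hex strings, and "short lived" is modeled by expiring a single-use token after its first de-tokenization.

```python
import secrets


class TokenizationSystem:
    """Toy in-memory token store (hypothetical; real systems persist and protect this)."""

    def __init__(self):
        self._token_to_value = {}   # every live token -> original value
        self._multi_use = {}        # value -> its one long-lived token
        self._single_use = set()    # tokens that expire after one use

    def _new_token(self, value):
        token = secrets.token_hex(8)
        while token in self._token_to_value:  # avoid the (unlikely) collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def multi_use_token(self, value):
        """Same value always yields the same token."""
        if value not in self._multi_use:
            self._multi_use[value] = self._new_token(value)
        return self._multi_use[value]

    def single_use_token(self, value):
        """Each call yields a fresh token, so one value maps to many tokens."""
        token = self._new_token(value)
        self._single_use.add(token)
        return token

    def detokenize(self, token):
        """Return the original value; raises KeyError for unknown or expired tokens."""
        if token in self._single_use:
            self._single_use.discard(token)
            return self._token_to_value.pop(token)  # single-use: remove after use
        return self._token_to_value[token]


system = TokenizationSystem()
cc = "1344 6423 1231 1521"
assert system.multi_use_token(cc) == system.multi_use_token(cc)
assert system.single_use_token(cc) != system.single_use_token(cc)
```

A multi-use token lets downstream applications correlate transactions for the same card without seeing the card number, while a single-use token limits what an attacker gains from capturing any one token.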
8. How to Generate Tokens?
• Use a mathematically reversible cryptographic
function (e.g. Format Preserving Encryption)
• Use a one-way non-reversible cryptographic
function (e.g. a hash function such as SHA-2)
• Static tables mapping values to random
tokens (tokens are not mathematically
derived from values)
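As an illustration of the second option, the sketch below derives a numeric token with a keyed one-way function. It is a sketch under assumptions rather than a recommended scheme: the function name `hash_token` and the demo key are hypothetical, and HMAC-SHA-256 is used in place of a bare SHA-2 hash because the space of card numbers is small enough to enumerate, so an unkeyed hash could be reversed by brute force.

```python
import hashlib
import hmac


def hash_token(value: str, key: bytes, length: int = 16) -> str:
    """Derive a fixed-length numeric token from a value with HMAC-SHA-256.

    The mapping is deterministic (same value and key give the same token)
    and one-way: there is no function that recovers the value from the token.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Fold the 256-bit digest into `length` decimal digits. This truncation
    # makes collisions possible, so a real system would have to handle them.
    number = int(digest, 16) % (10 ** length)
    return str(number).zfill(length)


key = b"demo-key-not-for-production"  # assumption: the key is stored securely elsewhere
token = hash_token("1344642312311521", key)
assert token == hash_token("1344642312311521", key)  # deterministic
assert len(token) == 16 and token.isdigit()          # format and length preserving
```

Unlike the static-table approach, no mapping table is required to generate the token, but de-tokenization is only possible if the value-to-token mapping is stored separately.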
11. How to manage tokens?
• Two options
– In-house
– Third-party service provider
• In-house tokenization server
– Company owns and operates the token system and token database
– The token server stores the original sensitive data
– Usually used by large companies that want to keep sensitive data in-house
• Third-party tokenization server (TaaS – Tokenization as a Service)
– Third-party service providers generate tokens and give them to companies
– Usually used by small companies that do not want to hold the actual sensitive data
– E.g. in CC transactions, the payment processor generates a token and gives only the token to the merchant for future reference (e.g. recurring fees, refunds). The merchant sacrifices control and pays higher fees in exchange for convenience, reduced liability and cheaper PCI compliance.
12. Tokenization vs. Encryption
Tokenization | Encryption
Output is format and length preserving | Output is not generally format or length preserving (e.g. AES, RSA); exceptions: FPE (Format Preserving Encryption), OPE (Order Preserving Encryption)
May or may not use encryption as the mapping function (could use a hash function or a static mapping table) | Encryption does not use tokenization internally
Output may or may not be reversible | Output is always reversible given the key
Regulatory compliance: PCI DSS | Regulatory compliance: Safe Harbor, HIPAA
A main use case is to reduce PCI scope by passing tokens to downstream applications | A main use case is to ensure the confidentiality of data at rest (even if the storage media is compromised or lost, attackers cannot see the actual data because they don’t have the keys)
13. How is Tokenization Currently Used in the Corporate Market?
• Use tokenization to replace sensitive data such as
CC# with random numbers (3rd method of
tokenization mentioned earlier)
• Keep the sensitive data encrypted in a database
• Since tokens preserve the length and format, changes to applications are minimal
• The sensitive data is exposed only when it is
necessary; otherwise, apps work with the tokens