My talk from Swissjs 2015. It covers issues to keep in mind when localising software, including some WTF moments that people probably don't consider when they think about localisation.
14. Let’s talk terms
• Language is the language people speak or write
• Locale is the name given to a set of parameters that define how things should be done for users speaking a certain language in a certain place
• There are many more locales than countries
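To make the distinction concrete, here is a minimal sketch. It assumes a runtime that implements `Intl.Locale`, which was added to ECMA-402 well after this talk. It splits a few locale tags for Switzerland into their language and region parts: one country, several locales.

```javascript
// One country, several locales: a locale tag pairs a language with a region.
const swissLocales = ["de-CH", "fr-CH", "it-CH", "rm-CH"];

// Split each tag into what people speak (language) and where they are (region).
const parsed = swissLocales.map((tag) => {
  const locale = new Intl.Locale(tag);
  return { tag, language: locale.language, region: locale.region };
});

for (const { tag, language, region } of parsed) {
  console.log(`${tag}: language=${language}, region=${region}`);
}
```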
24. Date Formatting
• Names of months and weekdays, obviously
• Order of the individual parts (day, month, year)
• Separator characters
• Commonly used formats in different contexts
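A small sketch of the first three points, assuming a current engine with full ICU locale data: `formatToParts` exposes the field order and the literal separators a locale uses, and month names come from the same data. The helper names `dateLayout` and `monthName` are made up for this example.

```javascript
// A fixed instant so the output does not depend on the current date.
const date = new Date(Date.UTC(2015, 6, 18, 14, 58));
const numericDate = { year: "numeric", month: "2-digit", day: "2-digit", timeZone: "UTC" };

// Report the order of the date fields and the literal separators between them.
function dateLayout(locale) {
  const parts = new Intl.DateTimeFormat(locale, numericDate).formatToParts(date);
  return {
    order: parts.filter((p) => p.type !== "literal").map((p) => p.type),
    separators: parts.filter((p) => p.type === "literal").map((p) => p.value),
  };
}

// Month names come from the locale data as well.
function monthName(locale) {
  return new Intl.DateTimeFormat(locale, { month: "long", timeZone: "UTC" }).format(date);
}

for (const locale of ["en-US", "de-CH", "fr-FR"]) {
  console.log(locale, dateLayout(locale), monthName(locale));
}
```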
25. Date Formatting
• Libraries usually provide generic short/medium/long formats
• Libraries also provide templates
• If your library's template language contains any characters that are not placeholders to be replaced (literal separators, for example), it is doing it wrong: the locale should decide those
• Apple has done it right since OS X 10.11 and iOS 9
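In JavaScript, the options object of `Intl.DateTimeFormat` behaves much like the template language described above: you list the fields you want and the locale decides order and punctuation, with no way to sneak in a literal separator. A minimal sketch, assuming a modern engine with full locale data; `formatFields` is a hypothetical helper.

```javascript
const date = new Date(Date.UTC(2015, 6, 18, 14, 58));

// The "template": only fields and their widths, no literal characters.
const fields = { year: "numeric", month: "short", day: "numeric", timeZone: "UTC" };

// Format the same fields for several locales; each locale picks its own layout.
function formatFields(locales) {
  return Object.fromEntries(
    locales.map((locale) => [locale, new Intl.DateTimeFormat(locale, fields).format(date)])
  );
}

console.log(formatFields(["en-US", "fr-FR", "de-CH"]));
```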
26. 2015-07-18 17:47
Locale | Long                             | Medium                   | Short
en-US  | July 18, 2015 at 4:58:00 PM CEST | Jul 18, 2015, 4:58:00 PM | 7/18/15, 4:58 PM
fr-CA  | 18 juillet 2015 16:58:00 UTC+2   | 18 juil. 2015 16:58:00   | 15-07-18 16:58
fr-CH  | 18 juillet 2015 16:58:00 UTC+2   | 18 juil. 2015 16:58:00   | 18.07.15 16:58
fr-FR  | 18 juillet 2015 16:58:00 UTC+2   | 18 juil. 2015 16:58:00   | 18/07/2015 16:58
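A table like this can be produced with the generic styles. This sketch assumes an engine that supports the `dateStyle`/`timeStyle` options (added to ECMA-402 years after this talk) and uses `Europe/Zurich` as the time zone to match CEST. Current CLDR data has changed since 2015, so some cells (the fr-CA short format, for instance) may not match the table above exactly.

```javascript
// 2015-07-18 16:58 in Zurich (CEST, UTC+2), i.e. 14:58 UTC.
const instant = new Date(Date.UTC(2015, 6, 18, 14, 58));
const styles = ["long", "medium", "short"];

// One table row: the same instant in the locale's long, medium and short styles.
function row(locale) {
  return styles.map((style) =>
    new Intl.DateTimeFormat(locale, {
      dateStyle: style,
      timeStyle: style,
      timeZone: "Europe/Zurich",
    }).format(instant)
  );
}

for (const locale of ["en-US", "fr-CA", "fr-CH", "fr-FR"]) {
  console.log(locale.padEnd(6), row(locale).join(" | "));
}
```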
27. Choice of calendar
• Most of the world uses the Gregorian calendar
• The Julian calendar uses the same month names but is currently 13 days behind (on the day of this talk, 18 July 2015, it was 5 July there)
• Other calendars use different month names
• This can affect holiday calculations
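ECMA-402 lets you pick a calendar through the `-u-ca-` Unicode extension in the locale tag. A minimal sketch; which calendars are available depends on the engine's ICU data, and I would not count on a plain Julian calendar being among them. `inCalendar` is a hypothetical helper.

```javascript
const date = new Date(Date.UTC(2015, 6, 18, 12));

// Format the same instant in a given calendar via the "-u-ca-" locale extension.
function inCalendar(calendar) {
  const fmt = new Intl.DateTimeFormat(`en-US-u-ca-${calendar}`, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
  // resolvedOptions() reports which calendar the engine actually used.
  return { text: fmt.format(date), calendar: fmt.resolvedOptions().calendar };
}

for (const calendar of ["gregory", "buddhist", "hebrew", "islamic", "japanese"]) {
  console.log(calendar.padEnd(9), inCalendar(calendar));
}
```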
28. Collation order
• How do you compare two strings? Which one comes first?
• Where do the characters with pesky accents go?
• How do you deal with case differences?
• What about non-Latin scripts?
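JavaScript's default `Array.prototype.sort()` compares UTF-16 code units, which answers none of these questions well. A minimal comparison, assuming an engine with German collation data:

```javascript
const words = ["zebra", "Äpfel", "apple", "Zürich", "éclair", "eagle"];

// Default sort compares UTF-16 code units: uppercase before lowercase,
// accented letters after "z".
const byCodeUnit = [...words].sort();

// A locale-aware collator puts accented letters next to their base letters.
const byCollator = [...words].sort(new Intl.Collator("de").compare);

console.log(byCodeUnit);
console.log(byCollator);
```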
29. Collation fun*
• Phonebook German vs. ordinary German vs. Austrian German (handling of umlauts)
• Contractions (in traditional Spanish, ch counts as one letter; in Czech, ch sorts after h, while c still sorts after b, etc.)
• Handling of accents is language-dependent
• Case-insensitive comparison is a mess
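Two of these can be tried directly with `Intl.Collator`: German phonebook collation is requested with the `-u-co-phonebk` extension, and Czech treats ch as a letter of its own. A sketch assuming an engine with full ICU data; `sortWith` is a hypothetical helper. Traditional Spanish is left out because its availability varies more between engines.

```javascript
// Sort a copy of the list with a collator for the given locale tag.
const sortWith = (locale, list) => [...list].sort(new Intl.Collator(locale).compare);

// German: standard collation treats Ä like A (with an accent difference),
// phonebook collation ("-u-co-phonebk") treats Ä like "AE".
const german = ["Alter", "Ärger", "Affe"];
const germanStandard = sortWith("de", german);
const germanPhonebook = sortWith("de-u-co-phonebk", german);

// Czech: "ch" is a single letter that sorts after "h".
const czech = ["ihned", "chata", "hrad", "cesta"];
const czechOrder = sortWith("cs", czech);
const englishOrder = sortWith("en", czech);

console.log(germanStandard, germanPhonebook);
console.log(czechOrder, englishOrder);
```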
30. Case folding
• Some languages don’t differentiate between upper- and lowercase
• Mapping between upper- and lowercase is inconsistent (ß uppercases to SS, but SS does not always lowercase back to ß)
• Uppercasing accented characters is language- (and sometimes locale-) dependent. French characters often lose their accents when uppercased
• Some languages have their own case mappings (in Turkish, uppercase i is İ and lowercase I is ı)
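The ß and Turkish cases are easy to reproduce with the built-in string methods. Note that `toUpperCase()` keeps French accents, so the typographic habit of dropping them is something you would have to handle yourself. A minimal sketch:

```javascript
// ß uppercases to "SS", but lowercasing does not restore the ß.
const upperStrasse = "straße".toUpperCase();
const roundTrip = upperStrasse.toLowerCase();

// Turkish has a dotted and a dotless i; the default mapping ignores that.
const upperI = { default: "i".toUpperCase(), turkish: "i".toLocaleUpperCase("tr") };
const lowerI = { default: "I".toLowerCase(), turkish: "I".toLocaleLowerCase("tr") };

// Accents survive uppercasing in JavaScript, whatever French typography prefers.
const frenchUpper = "élève".toUpperCase();

console.log(upperStrasse, roundTrip, upperI, lowerI, frenchUpper);
```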
31. Double the fun
• Collation and case folding make an interesting team
• Depending on locale, upper- and lowercase should be sorted together or apart
• In some locales, case doesn’t matter at all when sorting
• In some locales, case always matters when sorting
• It also depends on the use case
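`Intl.Collator` exposes some of these choices as options: `caseFirst` decides whether upper- or lowercase wins when words otherwise tie, and `sensitivity: "base"` ignores case entirely. A minimal sketch, assuming an engine that supports `caseFirst` (current V8 does):

```javascript
const words = ["banana", "Apple", "apple", "Banana"];

// Code-unit order keeps the cases apart: all uppercase words come first.
const apart = [...words].sort();

// A collator keeps upper- and lowercase together; caseFirst decides which wins ties.
const upperFirst = [...words].sort(new Intl.Collator("en", { caseFirst: "upper" }).compare);
const lowerFirst = [...words].sort(new Intl.Collator("en", { caseFirst: "lower" }).compare);

// sensitivity "base" ignores case (and accents) completely.
const ignoreCase = new Intl.Collator("en", { sensitivity: "base" });

console.log(apart, upperFirst, lowerFirst, ignoreCase.compare("apple", "APPLE"));
```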
32. Collation strength
• ICU exposes the concept of “collation strength”
• Strength 1 is the most lenient
• Strength 5 is the most exact
• Example: strength 1 ignores accents, so a and å compare equal, unless the language is Danish, where å is a letter of its own
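ECMA-402 does not expose ICU's numeric strengths; its `sensitivity` option maps onto them roughly: "base" is about strength 1, "accent" about strength 2, "variant" about strength 3, and "case" is strength 1 plus case. This sketch (with a hypothetical `equalTo` helper) shows which spellings each setting treats as equal, plus the Danish exception, assuming an engine with Danish collation data.

```javascript
const variants = ["resume", "Resume", "résumé"];

// List the variants that a collator with the given sensitivity considers equal to "resume".
function equalTo(sensitivity, locale = "en") {
  const collator = new Intl.Collator(locale, { sensitivity });
  return variants.filter((word) => collator.compare("resume", word) === 0);
}

for (const sensitivity of ["base", "accent", "case", "variant"]) {
  console.log(sensitivity.padEnd(7), equalTo(sensitivity));
}

// At the primary level, "a" and "å" are equal in English but not in Danish,
// where "å" is a letter of its own (sorted near the end of the alphabet).
const primaryEn = new Intl.Collator("en", { sensitivity: "base" }).compare("a", "å");
const primaryDa = new Intl.Collator("da", { sensitivity: "base" }).compare("a", "å");

console.log(primaryEn, primaryDa);
```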
37. Locale handling is like escaping
• Always store raw unformatted data
• Format near the end of the chain
• Just before you escape
• Parse user input as early as possible
• Use native data types
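Intl formats but does not parse, so parsing user input early needs some help. The `parseLocaleNumber` function below is a hypothetical helper, not a library API: it learns the locale's separators from `Intl.NumberFormat.prototype.formatToParts` and converts the input to a plain `Number`. It is a sketch under that assumption and does not handle users typing a different grouping character than the locale data uses (an ASCII apostrophe for de-CH, or a plain space for fr-FR).

```javascript
// Hypothetical helper: learn the locale's group and decimal separators from
// Intl.NumberFormat, then turn user input into a plain Number.
function parseLocaleNumber(input, locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(1234567.8);
  const group = parts.find((p) => p.type === "group")?.value;
  const decimal = parts.find((p) => p.type === "decimal")?.value ?? ".";

  let text = input.trim();
  if (group) text = text.split(group).join("");
  text = text.split(decimal).join(".");

  const value = Number(text);
  if (text === "" || Number.isNaN(value)) {
    throw new RangeError(`Cannot parse "${input}" as a ${locale} number`);
  }
  return value;
}

// Parse early: only the raw number is stored.
const stored = { amount: parseLocaleNumber("1.234,5", "de-DE") };

// Format late: right before output (and before escaping).
const display = new Intl.NumberFormat("de-CH", { minimumFractionDigits: 2 }).format(stored.amount);

console.log(stored, display);
```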
38. UI Language is not locale
• Users might prefer to use the OS in a different language than the one inferred from their locale
• Just because I’m in de_CH doesn’t mean I want your software to speak German to me
• UI language is a completely separate setting from the user’s locale
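One way to keep the two apart is to store them as separate settings. In this sketch the `settings` object, the `messages` catalogue and `totalMessage` are all hypothetical: the wording follows the UI language, while the number formatting follows the locale.

```javascript
// Hypothetical user settings: the two choices are stored separately.
const settings = { uiLanguage: "en", locale: "de-CH" };

// Hypothetical message catalogue, keyed by UI language only.
const messages = {
  en: (total) => `Your total is ${total}`,
  de: (total) => `Ihr Total beträgt ${total}`,
};

// Wording follows the UI language; number formatting follows the locale.
function totalMessage(amount, { uiLanguage, locale }) {
  const total = new Intl.NumberFormat(locale, { style: "currency", currency: "CHF" }).format(amount);
  return messages[uiLanguage](total);
}

console.log(totalMessage(1234.5, settings));
```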
42. Mixing Locales
• Forming sentences in the UI language with locale-formatted data is… challenging
• Be mindful that language might influence some locale formatting.
• “This talk lasts ”
• or rather “This talk lasts 30 minutes”
• It depends. Does the locale also use hours and minutes?
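For the duration example, the quantity and its unit (including plural rules) can come from the locale while the sentence around it comes from the UI language's messages. This sketch assumes an engine that supports the `unit` style of `Intl.NumberFormat`, added to ECMA-402 after this talk; `duration` is a hypothetical helper.

```javascript
// The number and unit come from the locale (including plural rules);
// the surrounding sentence comes from the UI language's message catalogue.
function duration(minutes, locale) {
  return new Intl.NumberFormat(locale, {
    style: "unit",
    unit: "minute",
    unitDisplay: "long",
  }).format(minutes);
}

console.log(`This talk lasts ${duration(30, "en")}`);
console.log(`This talk lasts ${duration(1, "en")}`);
console.log(`Dieser Vortrag dauert ${duration(30, "de")}`);
```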
49. What about web sites?
• Never, ever infer UI language from IP geolocation.
• Ever. Ever. EVER.
• Promise!
• You may infer the locale from IP geolocation, though
People from Google: This slide is for you!
50. Rely on HTTP
• Trust Accept-Language: by now, browsers set it correctly
• Use the header to determine the UI language
• Use the header to determine the default locale
• But ask the user
• Same goes for time zones
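A simplified take on language-tag lookup, as a sketch: `parseAcceptLanguage` and `pickUiLanguage` are hypothetical helpers, not a library API. The first orders the header's tags by q-value; the second picks the best UI language the application actually ships, matching the full tag or just its language subtag. A preference the user has chosen explicitly should still win over the header.

```javascript
// Parse an Accept-Language header into language tags, best first.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((entry, index) => {
      const [tag, ...params] = entry.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam === undefined ? 1 : Number(qParam.slice(2));
      return { tag: tag.trim(), q, index };
    })
    // Drop empty entries, the wildcard, q=0 and unparsable q values (NaN).
    .filter((e) => e.tag !== "" && e.tag !== "*" && e.q > 0)
    // Highest q first; equal q keeps header order.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((e) => e.tag);
}

// Pick the best UI language we ship, matching the full tag or its language subtag.
function pickUiLanguage(header, available, fallback) {
  for (const tag of parseAcceptLanguage(header)) {
    let language;
    try {
      language = new Intl.Locale(tag).language;
    } catch {
      continue; // skip malformed tags
    }
    if (available.includes(tag)) return tag;
    if (available.includes(language)) return language;
  }
  return fallback;
}

console.log(pickUiLanguage("de-CH,de;q=0.9,en;q=0.8", ["en", "de"], "en"));
```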
52. The past
• There has always been date formatting (Date.prototype.toLocaleString()), but it was mostly useless
• People were “self-nebling” (doing it themselves; search YouTube for “ich neble selber”), for example in date pickers and libraries
• Hint: applying substr() to Date.prototype.toDateString() is not a correct solution
• The same goes for using replace(‘.’, ‘,’) on a number
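Both hacks side by side with the Intl equivalent, as a minimal sketch assuming an engine with full locale data:

```javascript
const date = new Date(Date.UTC(2015, 6, 18, 12));
const amount = 1234567.891;

// Broken: toDateString() always produces English ("Sat Jul 18 2015"),
// so cutting out the month gives "Jul" regardless of the user's locale.
const hackedMonth = date.toDateString().slice(4, 7);

// Better: let Intl produce the month name for the locale.
const frenchMonth = date.toLocaleDateString("fr-FR", { month: "long", timeZone: "UTC" });

// Broken: swapping "." for "," ignores digit grouping and only suits
// locales that happen to use a decimal comma.
const hackedNumber = String(amount).replace(".", ",");

// Better: let Intl handle decimal and group separators.
const germanNumber = new Intl.NumberFormat("de-DE", { maximumFractionDigits: 2 }).format(amount);

console.log(hackedMonth, frenchMonth, hackedNumber, germanNumber);
```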
53. The present
• Microsoft has donated a huge chunk of localisation code to the jQuery project.
• It’s not integrated into jQuery, but maintained by the jQuery project
• Check out https://github.com/jquery/globalize
• Doesn’t support collation
• The library is big
• But most of it is data, and this problem can only be solved with a huge database of special cases
61. The future
• ECMA-402, from 2012
• Yes. Specs from 2012 are “the future” in JS land
• Provides the global Intl object
• Date and number formatting, and collation
• See: http://www.ecma-international.org/ecma-402/1.0/
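Since support varied by engine, ECMA-402 also lets you check what is available: each service has a `supportedLocalesOf` method that returns the requested locales it has data for. A minimal sketch; `supportedLocales` is a hypothetical helper, and the result depends on how much locale data the engine ships.

```javascript
// Check whether the engine provides ECMA-402 at all.
const hasIntl = typeof Intl === "object" && Intl !== null;

// Ask each service which of the requested locales it has data for.
function supportedLocales(requested) {
  if (!hasIntl) return { dateTime: [], number: [], collator: [] };
  return {
    dateTime: Intl.DateTimeFormat.supportedLocalesOf(requested),
    number: Intl.NumberFormat.supportedLocalesOf(requested),
    collator: Intl.Collator.supportedLocalesOf(requested),
  };
}

console.log(supportedLocales(["de-CH", "fr-CH", "it-CH", "rm-CH"]));
```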
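64.
// Date with long weekday and month names, following Swiss German conventions
var f = new Intl.DateTimeFormat('de-CH', {
  weekday: 'long', year: 'numeric',
  month: 'long', day: 'numeric'
});
console.log(f.format(new Date()));
// Plain decimal number, always shown with at least two fraction digits
var n = new Intl.NumberFormat('de-CH', {
  style: "decimal",
  minimumFractionDigits: 2
});
console.log(n.format(1234.5));
// Currency: the currency code is chosen separately from the locale
var currency = new Intl.NumberFormat('de-CH', {
  style: "currency",
  currency: 'EUR'
});
console.log(currency.format(1234.5));
// sort() needs a compare function, not the Collator object itself
var comp = new Intl.Collator('de-CH');
var words = [
  "Swissjs", "swissjs", "is",
  "loads", "of", "fun"
];
console.log(words.sort(comp.compare));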
72. Conclusion
• Proper localisation is part of our job to make the web useful for everybody
• Use the libraries provided
• Whenever you think you know better than the library: No. You don’t.
• Remember that UI language and Locale are not always connected
• Don’t do IP geolocation for language choice
• When in doubt: Ask the user. She’ll know for sure.