Data Wars: The Bloody Enterprise strikes back

I would like to describe cases in which we accidentally create problems for our future selves. I will show how different Java data types can ease or increase the pain of supporting an application later, and cover the most common pitfalls and tricky corner cases you have probably never thought about.

Published in: Software

  1. Data Wars: The Bloody Enterprise strikes back
  2. Victor Polischuk, @alkovictor
  3. It is GOOD when we have a lot of data, and when the data is several years old (the older the better). It is BAD when we have to remove historical data.
  4. It is BAD when we have a lot of code, and when the code is several years old (the older the worse). It is GOOD when we have to remove historical code.
  5. Money, numbers, and arithmetic • Identities • Text data and Strings • Date and time
  6. Please participate
  7. Money
  8. Money • Float & Double • Convert to Integer
  9. Money: Float & Double: Problem • Developers usually have no idea how these values are represented: • e = 2.718281828459045 • π = 3.141592653589793 • tan(π/2) = 1.633123935319537E16
  10. Money: Float & Double: Quiz #1 • Float: 0.6 + 0.1 = ? • Double: 0.6 + 0.1 = ?
  11. Money: Float & Double: Quiz #1 • Float: 0.6 + 0.1 = 0.70000005 • Double: 0.6 + 0.1 = 0.7
  12. Money: Float & Double: stackoverflow.com
  13. Money: Float & Double: Quiz #2 • Float: 0.2 + 0.1 = ? • Double: 0.2 + 0.1 = ?
  14. Money: Float & Double: Quiz #2 • Float: 0.2 + 0.1 = 0.3 • Double: 0.2 + 0.1 = 0.30000000000000004
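
The two quizzes above can be reproduced with a few lines of Java. This is only a minimal sketch; the class and method names (`FloatQuiz`, `floatSum`, `doubleSum`) are made up for illustration and do not come from the talk.

```java
import java.math.BigDecimal;

// Hypothetical helper class that replays Quiz #1 and Quiz #2.
class FloatQuiz {

    // Adds two floats using plain 32-bit IEEE 754 arithmetic.
    static float floatSum(float a, float b) {
        return a + b;
    }

    // Adds two doubles using plain 64-bit IEEE 754 arithmetic.
    static double doubleSum(double a, double b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(floatSum(0.6f, 0.1f));  // Quiz #1, float
        System.out.println(doubleSum(0.6, 0.1));   // Quiz #1, double
        System.out.println(floatSum(0.2f, 0.1f));  // Quiz #2, float
        System.out.println(doubleSum(0.2, 0.1));   // Quiz #2, double

        // new BigDecimal(double) exposes the exact binary value behind the literal 0.1f.
        System.out.println(new BigDecimal(0.1f));
    }
}
```

Whether a sum "looks right" depends on rounding luck: the same operands can be exact in one width and off in the other, which is why neither type is safe for money.
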
  15. Money: Float & Double: Drill Down • Binary representation: [sign] [exponent] [mantissa] • Float: 1 bit, 8 bits, 23 bits • Double: 1 bit, 11 bits, 52 bits • Value, for sign bit $s$, stored exponent $e$ of $E$ bits, and stored mantissa $m$ of $M$ bits: $(2(1 - s) - 1) \cdot 2^{e - (2^{E-1} - 1)} \cdot (1 + m / 2^{M})$ • 0.1f = 0-01111011-10011001100110011001101 • $0.1f = +2^{-127 + 123} \cdot (1 + 2^{-1} + 2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + 2^{-12} + 2^{-13} + 2^{-16} + 2^{-17} + \dots)$ • $0.1f = 2^{-4} \cdot (1 + 5033165 / 2^{23}) = 0.100000001490116119384765625$
  16. Money: Float & Double: Example
  17. Money: Float & Double: Quiz #3 • +0.0f = 0-00000000-00000000000000000000000 • -0.0f = 1-00000000-00000000000000000000000 • +0.0f == -0.0f?
  18. Money: Float & Double: Drill Down

          /**
           * Get or create float value for the given float.
           *
           * @param d the float
           * @return the value
           */
          public static ValueFloat get(float d) {
              if (d == 1.0F) {
                  return ONE;
              } else if (d == 0.0F) {
                  // -0.0 == 0.0, and we want to return 0.0 for both
                  return ZERO;
              }
              return (ValueFloat) Value.cache(new ValueFloat(d));
          }
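
The signed-zero question from Quiz #3 has more than one answer in Java, depending on how the comparison is made. The sketch below uses a hypothetical `SignedZero` class to show the difference between the primitive `==` operator and the boxed `Float.equals`.

```java
// Hypothetical helper class illustrating +0.0f versus -0.0f.
class SignedZero {

    // Primitive comparison follows IEEE 754: both zeros compare equal.
    static boolean primitiveEquals(float a, float b) {
        return a == b;
    }

    // Boxed comparison looks at the bit pattern, so the sign bit matters.
    static boolean boxedEquals(float a, float b) {
        return Float.valueOf(a).equals(Float.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(0.0f, -0.0f));
        System.out.println(boxedEquals(0.0f, -0.0f));
        // Float.compare orders -0.0f strictly before +0.0f.
        System.out.println(Float.compare(0.0f, -0.0f));
        // The sign bit is the only bit set in -0.0f.
        System.out.println(Integer.toHexString(Float.floatToIntBits(-0.0f)));
    }
}
```

This matters for anything that puts floats into a `HashMap`, a cache, or a sorted collection: the two zeros are distinct keys there even though `==` says they are equal.
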
  19. Money: Float & Double: Summary • Just never use it • Forget it exists • Unless you are working on a video codec
  20. Money: Convert to Integer • Multiply decimals up to integers (probably by a constant) • Keep the “scale” somewhere else
  21. Money: Convert to Integer: Quiz #1 • 10 * 230 + 5 = ?
  22. Money: Convert to Integer: Quiz #1 • 10 * 230 + 5 = 28, where 10 means 10%
  23. Money: Convert to Integer: Drill Down

          @Embeddable
          public class Amount implements Serializable {
              private int rate;
              @Transient
              private final int scale;

              public Amount() {
                  scale = 6;
              }

              public Amount(int rate, int scale) {
                  this.scale = scale;
                  setRate(rate);
              }
              …
  24. Money: Convert to Integer: Summary • It is better to keep the precision close to the number • It is better when arithmetic just works • It is better when equals and compareTo work • int * int and int + int can exceed the int range (same with long) • Consistency is almost always more important than performance
  25. Money: Solution: BigDecimal • Precision and accuracy are known and adjustable • Arithmetic is included • Supported by the JDK, JDBC, etc. • Performance is quite good
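
As a rough illustration of the BigDecimal recommendation, the sketch below redoes the "10% of 230 plus 5" quiz with the scale kept next to the number. The class `MoneyMath`, the method `percentOfPlusFee`, the two-digit scale, and the rounding mode are all assumptions chosen for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class; scale and rounding mode are illustrative choices.
class MoneyMath {

    private static final BigDecimal HUNDRED = new BigDecimal("100");

    // Computes (percent % of base) + fee, rounded to 2 decimal places.
    static BigDecimal percentOfPlusFee(BigDecimal base, BigDecimal percent, BigDecimal fee) {
        BigDecimal share = base.multiply(percent)
                .divide(HUNDRED, 2, RoundingMode.HALF_EVEN);
        return share.add(fee);
    }

    public static void main(String[] args) {
        BigDecimal total = percentOfPlusFee(
                new BigDecimal("230"), new BigDecimal("10"), new BigDecimal("5"));
        System.out.println(total);

        // Decimal literals built from Strings add up exactly.
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2")));
    }
}
```

One thing to keep in mind: `BigDecimal.equals` also compares the scale, so `28.00` and `28` are not equal, while `compareTo` treats them as the same amount.
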
  26. Identity
  27. Identity • Unexpected Overflow • UUID
  28. Identity: Overflow • Integer and Long are finite types • Sometimes they overflow • Moreover, they are usually half the size you think
  29. Identity: Overflow: Example
  30. Identity: Overflow: Example
  31. Identity: Overflow: Sucker Punch • Also, “some languages” cannot work with integer types wider than 53 bits • In addition, “some languages” work with custom 32-bit integer types
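
Both overflow problems can be shown from the Java side. In this sketch the class `IdOverflow` and its methods are hypothetical; the round trip through `double` stands in for a client language whose only number type is a 64-bit float.

```java
// Hypothetical helper class illustrating identity overflow.
class IdOverflow {

    // Plain int arithmetic wraps around silently.
    static int nextIdUnchecked(int current) {
        return current + 1;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static int nextIdChecked(int current) {
        return Math.addExact(current, 1);
    }

    // Simulates a client that stores every number as a 64-bit float.
    static long throughDouble(long id) {
        return (long) (double) id;
    }

    // Text form of an id, which survives any client number type.
    static String toApiId(long id) {
        return Long.toString(id);
    }

    public static void main(String[] args) {
        System.out.println(nextIdUnchecked(Integer.MAX_VALUE));
        long big = (1L << 53) + 1;
        System.out.println(big + " -> " + throughDouble(big));
        System.out.println(toApiId(big));
    }
}
```

Exposing ids as text in the API sidesteps the precision loss, while the database can keep using a numeric column.
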
  32. Identity: Overflow: Summary • There is a difference between DB identity and API identity • Always use integer types as identity in the DB • Always use text types as identity in the API • Avoid 32-bit types as identity altogether, unless you are 99.9% sure
  33. Identity: UUID • Is not guaranteed to be globally unique • Is not K-ordered • Is in most cases excessively big (128 bits) • Can cause serious performance degradation • Comes in different versions, which may suit you better or worse • Strangely enough, RDBMSs rarely support UUID/GUID data types • Weird: the time is based on 100-nanosecond intervals since 15 October 1582; UUIDs were invented/published around 1999
  34. Identity: UUID: Store as String • A UUID is a 128-bit value, that is, 16 bytes • The text form takes 36 symbols, which is more than 2 times bigger • A96A0D4C-49D0-4431-B126-4C66688ADEF3
  35. Identity: UUID: Drill Down • High long: time_low (32 bits), time_mid (16 bits), version (4 bits), time_hi (12 bits) • Low long: variant (4 bits), clock_seq (12 bits), node (48 bits)
  36. Identity: UUID: Example

          $ uuidgen
          0c8aa0f6-9f6f-4fad-9662-1b683f2f4a0d
          $ uuidgen
          1ee09695-3a04-4e7a-8bab-e67dabc4b5a2
          $ uuidgen -t
          3770f4d0-88b3-11e6-bba6-005056bb68cb
          $ uuidgen -t
          3b14dd54-88b3-11e6-8c53-005056bb68cb
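
The fields from the drill-down can be inspected with `java.util.UUID`, and the 16-byte binary form is easy to produce by hand. This is a sketch only: the class `UuidLayout` and its `toBytes`/`fromBytes` helpers are made-up names, and the sample values are the ones from the `uuidgen` output above.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper class for the compact binary form of a UUID.
class UuidLayout {

    // Packs the high and low longs into 16 bytes (big-endian).
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuilds the UUID from its 16-byte form.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long high = buffer.getLong();
        long low = buffer.getLong();
        return new UUID(high, low);
    }

    public static void main(String[] args) {
        UUID random = UUID.fromString("0c8aa0f6-9f6f-4fad-9662-1b683f2f4a0d");
        UUID timeBased = UUID.fromString("3770f4d0-88b3-11e6-bba6-005056bb68cb");

        System.out.println(random.version());     // random UUIDs report version 4
        System.out.println(timeBased.version());  // time-based UUIDs report version 1
        // timestamp() and node() are only defined for time-based UUIDs.
        System.out.println(timeBased.timestamp());
        System.out.println(Long.toHexString(timeBased.node()));

        System.out.println(toBytes(random).length + " bytes vs "
                + random.toString().length() + " chars");
    }
}
```
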
  37. Identity: Solution • Use a text representation for public identities (API) • Database sequences (Long) • UUID + database sequences (Long) • UUID (BigInteger/Binary) • Twitter Snowflake (Long), outdated • UUID (String) • *Flake (128 bit)
  38. String
  39. String • Java and encoding • JDBC drivers and DB types
  40. String: Encoding • Java uses UTF-16 for String encoding • UTF-16 has the symbol range 0x0000..0x10FFFF • String uses char[] (byte[] in JDK 9) • char has the range 0x0000..0xFFFF
  41. String: Encoding: Quiz #1 • How to represent the range 0x10000..0x10FFFF using char?
  42. String: Encoding: Quiz #1 • Define a surrogate range: 0xD800..0xDFFF (0x800 characters) • Split it equally into “High”: 0xD800..0xDBFF and “Low”: 0xDC00..0xDFFF • Combine each “High” with each “Low” to get 0x400 * 0x400 = 0x100000 symbols • Profit??? • Profit!!!
  43. String: Encoding: Quiz #2

          String x = new String(new char[]{
              'z', 0xD801, 0xDC37, 'a', 'b', 'c'
          });
          System.out.println(x);
          System.out.println(x.substring(0, 2));
          System.out.println(x.substring(2));

  44. String: Encoding: Quiz #2

          z𐐷abc
          z?
          ?abc

  45. String: Encoding: Solution • No solution, just be aware • Yet, it might become more sophisticated soon
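
Being aware mostly means counting code points instead of chars when cutting strings. The sketch below shows one way to do that with `String.offsetByCodePoints`; the class `CodePoints` and the method `codePointSubstring` are hypothetical names, not JDK API.

```java
// Hypothetical helper class for surrogate-safe substrings.
class CodePoints {

    // Like String.substring, but both indices count code points, not chars.
    static String codePointSubstring(String s, int beginCodePoint, int endCodePoint) {
        int begin = s.offsetByCodePoints(0, beginCodePoint);
        int end = s.offsetByCodePoints(begin, endCodePoint - beginCodePoint);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String x = "z\uD801\uDC37abc";  // same content as in Quiz #2

        System.out.println(x.length());                        // number of chars
        System.out.println(x.codePointCount(0, x.length()));   // number of symbols
        System.out.println(Integer.toHexString(x.codePointAt(1)));

        // Cutting by code points never splits the surrogate pair.
        System.out.println(codePointSubstring(x, 0, 2));
        System.out.println(codePointSubstring(x, 2, 5));
    }
}
```
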
  46. String: JDBC vs DB • Mapping DB-specific types to JDBC • Some DB or driver exceptional cases • BLOB vs CLOB • Narrower DB encoding
  47. String: JDBC vs DB: Mapping

          public enum JDBCType implements SQLType {
              CHAR(Types.CHAR),
              VARCHAR(Types.VARCHAR),
              LONGVARCHAR(Types.LONGVARCHAR),
              ...
              BLOB(Types.BLOB),
              CLOB(Types.CLOB),
              ...
              NCHAR(Types.NCHAR),
              NVARCHAR(Types.NVARCHAR),
              LONGNVARCHAR(Types.LONGNVARCHAR),
              NCLOB(Types.NCLOB),

  48. String: JDBC vs DB: Mapping • CHARACTER [(len)] or CHAR [(len)] • VARCHAR (len) • BOOLEAN • SMALLINT • INTEGER or INT • DECIMAL [(p[,s])] or DEC [(p[,s])] • NUMERIC [(p[,s])] • REAL • FLOAT(p) • DOUBLE PRECISION • DATE • TIME • TIMESTAMP • CLOB [(len)] or CHARACTER LARGE OBJECT [(len)] or CHAR LARGE OBJECT [(len)] • BLOB [(len)] or BINARY LARGE OBJECT [(len)]
  49. String: JDBC vs DB: Quiz #1 • What is the JDBC type LONGVARCHAR?
  50. String: JDBC vs DB: Drill Down

          setStringInternal(int var1, String var2) throws SQLException {
              ...
              int var6 = var2 != null ? var2.length() : 0;
              ...
              if (var6 <= this.maxVcsCharsSql) {
                  this.basicBindString(var1, var2);
              } else if (var6 <= this.maxStreamNCharsSql) {
                  this.setStringForClobCritical(var1, var2);
              } else {
                  this.setStringForClobCritical(var1, var2);
              }
  51. String: JDBC vs DB: CHAR & VARCHAR
  52. String: JDBC vs DB: BLOB & CLOB • Binary Large OBject: just bytes • Character Large OBject: just characters in your DB encoding
  53. String: JDBC vs DB: Char & NChar • Char, Varchar, CLOB…: use your DB encoding or no encoding • NChar, NVarchar, NCLOB…: use a specified encoding • BLOB: there is no NBLOB
  54. String: JDBC vs DB: Cp1251 vs Cp1252 • Sometimes the encoding does not matter much • Unless overly smart drivers spoil it • Unless the encodings are not compatible
  55. String: JDBC vs DB: Solution • Check your DB encoding upfront • If needed, use N* DB types together with N* JDBC types
  56. String: JDBC vs DB: Solution • Losing data because of encoding is lame • If you expect strange strings, use N* types • Never forget that a symbol is not a char or a byte; it may save you one day • Your JDBC driver can screw you
  57. Date
  58. Date • Time zones • DST and leap miracles
  59. Date: Time Zone • Do the DB and App time zones match? • What can go wrong if they don’t?
  60. Date: Time Zone: Quiz #1 • Database: Oracle 11g • Database time zone: CET/CEST (+01:00/+02:00) • Application: Java 8 • Application time zone: EET/EEST (+02:00/+03:00) • setTimestamp(‘2016-10-14 15:35:01’)? • getTimestamp()?
  61. Date: Time Zone: Quiz #1: Hint

          final int oracleYear(int var1) {
              int var2 = ((this.rowSpaceByte[0 + var1] & 255) - 100) * 100
                      + (this.rowSpaceByte[1 + var1] & 255) - 100;
              return var2 <= 0 ? var2 + 1 : var2;
          }
          final int oracleMonth(int var1) { return this.rowSpaceByte[2 + var1] - 1; }
          final int oracleDay(int var1) { return this.rowSpaceByte[3 + var1]; }
          final int oracleHour(int var1) { return this.rowSpaceByte[4 + var1] - 1; }
          final int oracleMin(int var1) { return this.rowSpaceByte[5 + var1] - 1; }
          final int oracleSec(int var1) { return this.rowSpaceByte[6 + var1] - 1; }
          final int oracleTZ1(int var1) { return this.rowSpaceByte[11 + var1]; }
          final int oracleTZ2(int var1) { return this.rowSpaceByte[12 + var1]; }
  62. Date: Time Zone: Quiz #1 • Database: 2016-10-14 15:35:01 • Application: 2016-10-14 15:35:01
  63. Date: Time Zone: Quiz #1 • Database: 2016-10-14 15:35:01 • Application with UTC time zone: 2016-10-14 15:35:01
  64. Date: Time Zone: Quiz #2 • JavaScript client: time zone unknown • Java server: EET time zone • How to pass dates?
  65. Date: Time Zone: Quiz #2 • Date… ehmm… JSON does not know what it is… • Long is a bit of a problem for languages whose integers are limited to 53 bits (41 bits are in use now; in ~140,000 years we will cross the border) • String as ISO 8601 is the lesser evil
  66. Date: Time Zone: Solution • Use the same App and DB time zone • Check your DB driver to ensure conversion safety • Store timestamps as long: DB and API • Store timestamps as String: API
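
A small `java.time` sketch of the two wire formats and of why the zone matters. The class `TimestampWire` and its methods are hypothetical, and `Europe/Helsinki` and `Europe/Paris` are stand-ins chosen here for the EET/EEST and CET/CEST zones from the quiz.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper class for passing timestamps between systems.
class TimestampWire {

    // ISO 8601 text in UTC, safe for JSON clients.
    static String toWire(Instant instant) {
        return instant.toString();
    }

    static Instant fromWire(String text) {
        return Instant.parse(text);
    }

    // Interprets a zone-less wall-clock value in the given zone.
    static Instant atZone(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2016, 10, 14, 15, 35, 1);

        // The same wall-clock value denotes different instants in different zones.
        Instant inApp = atZone(wallClock, ZoneId.of("Europe/Helsinki"));
        Instant inDb = atZone(wallClock, ZoneId.of("Europe/Paris"));
        System.out.println(toWire(inApp));
        System.out.println(toWire(inDb));

        // Epoch milliseconds are the long-based alternative.
        long millis = inApp.toEpochMilli();
        System.out.println(millis + " uses " + (64 - Long.numberOfLeadingZeros(millis)) + " bits");
    }
}
```
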
  67. Date: DST & Magic • Missing and extra hours • Leap seconds
  68. Date: DST & Magic: Calculations • 24 hours in a day • 60 minutes in an hour • 60 seconds in a minute • FTW: 24 * 60 * 60 * 1000
  69. Date: DST & Magic: Quiz #1 • 27.03.2016 00:00:00 - 28.03.2016 00:00:00 (EET/EEST)
  70. Date: DST & Magic: Quiz #1 • 26.03.2016 22:00:00 - 27.03.2016 21:00:00 (UTC): 23h
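
The 23-hour day from Quiz #1 can be checked with `java.time`. This is a minimal sketch: `DayLength` and `lengthOfDay` are made-up names, and `Europe/Helsinki` is assumed here as a representative EET/EEST zone.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper class measuring real day lengths.
class DayLength {

    // Elapsed time between local midnight of the date and of the next date.
    static Duration lengthOfDay(LocalDate date, ZoneId zone) {
        return Duration.between(
                date.atStartOfDay(zone),
                date.plusDays(1).atStartOfDay(zone));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Helsinki");
        long naiveDayMillis = 24L * 60 * 60 * 1000;

        Duration springForward = lengthOfDay(LocalDate.of(2016, 3, 27), zone);
        Duration fallBack = lengthOfDay(LocalDate.of(2016, 10, 30), zone);

        System.out.println(springForward);
        System.out.println(fallBack);
        System.out.println(springForward.toMillis() == naiveDayMillis);
    }
}
```

Adding `24 * 60 * 60 * 1000` milliseconds to a local midnight therefore does not always land on the next local midnight; `plusDays(1)` on a date type does.
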
  71. Date: DST & Magic: Quiz #2 • 31.12.2016 00:00:00 – 01.01.2017 00:00:00 (EET/EEST) • 01.01.2017 00:00:00 – 02.01.2017 00:00:00 (EET/EEST)
  72. Date: DST & Magic: Quiz #2 • 30.12.2016 22:00:00 – 31.12.2016 22:00:00 (UTC): 24h • 31.12.2016 22:00:00 – 01.01.2017 22:00:00 (UTC): 24h + 1s
  73. Date: DST & Magic: Quiz #2 • It will happen: 31.12.2016 23:59:60 (UTC) • It has happened before: 30.06.2015 23:59:60 (UTC) • Blame the Earth, and Moon, and Sun • Blame software developers
  74. Date: One Last Thing: Date vs Instant • A Date is a tuple of year, month, day, hour, etc. • An Instant is a precise point on the timeline
  75. Date: One Last Thing: Date vs Instant • A Date can be converted to an Instant • An Instant can be converted to a Date, even within a Chronology • However, the “conversion rate” is not constant
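
One way to see that the conversion is not constant is to ask a zone how many offsets are valid for a given wall-clock value. In this sketch the class `LocalToInstant` and its method are hypothetical, and `Europe/Helsinki` is again an assumed stand-in for EET/EEST.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper class for Date-to-Instant ambiguity.
class LocalToInstant {

    // 1 = normal, 0 = the wall-clock value is skipped (gap), 2 = it occurs twice (overlap).
    static int candidateOffsets(LocalDateTime wallClock, ZoneId zone) {
        return zone.getRules().getValidOffsets(wallClock).size();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Helsinki");

        LocalDateTime normal = LocalDateTime.of(2016, 10, 14, 15, 35, 1);
        LocalDateTime inGap = LocalDateTime.of(2016, 3, 27, 3, 30);
        LocalDateTime inOverlap = LocalDateTime.of(2016, 10, 30, 3, 30);

        System.out.println(candidateOffsets(normal, zone));
        System.out.println(candidateOffsets(inGap, zone));
        System.out.println(candidateOffsets(inOverlap, zone));

        // atZone resolves a skipped value by moving it later by the length of the gap.
        System.out.println(inGap.atZone(zone).toLocalDateTime());
    }
}
```
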
  76. Date: Summary • Use UTC as much as possible • Keep in mind the difference between Date and Instant • Think of Date/Instant interoperation as if it was designed and used by idiots • 24*60*60*1000 is, basically, a simplification, and quite a harmful one at times • Use proper date libraries; you wouldn’t want to reinvent them • GMT is not just another name for UTC, beware!
  77. Thank you