Java BigData Full Stack Development (version 2.0)

632 views

LETI lecture for students, 15.11.2016

Published in: Technology

  1. Java BigData Full Stack Development as is ... Alexey Zinovyev, Java Trainer at EPAM
  2. About
      • In IT since 2007
      • With Java since 2009
      • With Hadoop since 2012
      • With EPAM since 2015
  3. Contacts
      • E-mail: Alexey_Zinovyev@epam.com
      • Twitter: @zaleslaw, @BigDataRussia
      • vk.com/big_data_russia (Big Data Russia)
      • vk.com/java_jvm (Java & JVM langs)
  4. The Good Old Days
  5. HRs & RMs are looking for Java developers
  6. Is the Java Dream Team waiting for you?
  7. Required Skills
      • Advanced SQL
      • Basic Linux
      • Core Java & JVM
      • Backend Development Experience
      • Basic Computer Science Level
  8. REAL WORLD
  9. Let’s just use JavaScript in the frontend ONLY
  10. In the frontend ONLY?
  11. Cruel world
  12. Do you know any ML library for JS?
  13. Wild animals everywhere
  14. And what I tell you
  15. And what I tell you
  16. It’s time for a Java Superhero, yeah!
  17. Before discovering patterns you should ...
      • Select small pieces of the data
      • Define default values for missing data
      • Remove strange signals from the data
      • Merge some tables into one if required
  18. How it really works
      • Share your data with us
      • Our magic manipulations
      • Building an answering machine
      • PROFIT!!!
  19. How to start?
  20. (slide without text)
  21. WHAT IS BIG DATA?
  22. Joke about Excel
  23. 5V
  24. Every 60 seconds…
  25. From Mobile Devices
  26. From Industry
  27. We started to keep and handle stupid new things!
  28. 10^6 rows in MySQL
  29. GB->TB->PB->?
  30. Is BigData about PBs?
  31. Is BigData about PBs?
  32. It’s hard to ...
      • ... store
      • ... handle
      • ... search in
      • ... visualize
      • ... send over the network
  33. Likes in Classmates: how to count?
  34. Crazy Zoo 2012
  35. Crazy Zoo 2016
  36. What this training will cover
  37. NOSQL
  38. What’s the problem with RDBMSs?
      • Caching
      • Master/Slave
      • Cluster
      • Table Partitioning
      • Sharding
  39. Family
  40. Database party
  41. Spring Data
  42. How to start?
  43. Java MongoDB Driver + Robomongo
  44. BIG DATA TOOL MASTER VS DATA SCIENTIST
  45. TRAIN MODEL
  46. Datasets
      • Facebook users, tweets
      • Trade transactions
      • Government
      • Medicine (genomic data)
      • Telecommunications
  47. Data Sources
      • Relational databases
      • Data warehouses (historical data)
      • Files in CSV or binary format
      • Internet or e-mail
      • Scientific and research tools (R, Octave, Matlab)
  48. Hey, man, predict something!
  49. Man or sofa?
  50. Typical questions for DM (data mining)
      • Which loan applicants are high-risk?
  51. Typical questions for DM (data mining)
      • Which loan applicants are high-risk?
      • How do we detect phone card fraud?
  52. Typical questions for DM (data mining)
      • Which loan applicants are high-risk?
      • How do we detect phone card fraud?
      • What is the revenue prediction for next year?
  53. Typical questions for DM (data mining)
      • Which loan applicants are high-risk?
      • How do we detect phone card fraud?
      • What is the revenue prediction for next year?
      • Can you recommend music for users?
  54. Is the green circle a blue square or a red triangle? Let’s ask its neighbors! kNN (k-nearest neighbors)
  55. Collaborative Filtering
  56. Machine Learning vs Traditional Programming
  57. Data Science
  58. Can a Java programmer be a Data Scientist?
  59. Sexy Data Scientist
  60. Real Data Scientist
  61. How to start?
  62. Weka
  63. HADOOP
  64. Hadoop and Data Knights
  65. Hadoop
  66. MapReduce in different languages
  67. MapReduce for WordCount
  68. Hadoop Jobs
  69. Hadoop frameworks
      • Universal (MapReduce, Tez, RDD in Spark)
      • Abstract (Pig, Spark Pipeline API)
      • SQL-like (Hive, Impala, Spark SQL)
      • Graph processing (Giraph, GraphX)
      • Machine Learning (Mahout, MLlib)
      • Stream processing (Spark Streaming, Storm)
  70. SPARK
  71. SPARK: the bloody son of MR
      • MapReduce in memory
      • Up to 50x faster than Hadoop
      • RDD is the basic building block (an immutable distributed collection of objects)
      • Pipeline API (no need for Pig)
  72. Spark Family
  73. MLlib supports
      • Classification and regression
      • Collaborative filtering
      • Clustering
      • Dimensionality reduction
      • Optimization
  74. Code sample MLlib (K-Means)

      import org.apache.spark.mllib.clustering.KMeans;
      import org.apache.spark.mllib.clustering.KMeansModel;

      // Assumes parsedData is a JavaRDD<Vector> and sc is a JavaSparkContext,
      // both created earlier (not shown on the slide).

      // Cluster the data into two classes using KMeans
      int numClusters = 2;
      int numIterations = 20;
      KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

      // Evaluate clustering by computing Within Set Sum of Squared Errors
      double WSSSE = clusters.computeCost(parsedData.rdd());
      System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

      // Save and load model
      clusters.save(sc.sc(), "myModelPath");
      KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");

  75. MLlib ...
      • ... builds on ideas from scikit-learn (Python lib) and Mahout
      • ... runs fully on Spark and supports Spark’s Pipeline API
      • ... represents a dataset as Spark SQL’s SchemaRDD (called DataFrame since Spark 1.3)
      • ... supports Hive as an external data source
      • ... works well for large datasets and parallelized algorithms
  76. It solves all problems!
  77. How to start?
  78. HDP Zoo
  79. Ok, Google!
  80. AWS Amazon
  81. Infrastructure issues are waiting for YOU!
  82. DEEP LEARNING
  83. Deep Learning helps us build a NEW FUTURE
  84. Deep Learning helps us build a NEW FUTURE
  85. HOW TO LEARN?
  86. DIFFERENT WAYS
      1. Read books and write ‘pet’ projects
  87. DIFFERENT WAYS
      1. Read books and write ‘pet’ projects
      2. Become a mentee in a mentoring process
  88. DIFFERENT WAYS
      1. Read books and write ‘pet’ projects
      2. Become a mentee in a mentoring process
      3. MOOC
  89. DIFFERENT WAYS
      1. Read books and write ‘pet’ projects
      2. Become a mentee in a mentoring process
      3. MOOC
      4. Take a training course
  90. DIFFERENT WAYS
      1. Read books and write ‘pet’ projects
      2. Become a mentee in a mentoring process
      3. MOOC
      4. Take a training course
      5. Visit conferences
  91. Recommended Books
  92. Contacts
      • E-mail: Alexey_Zinovyev@epam.com
      • Twitter: @zaleslaw, @BigDataRussia
      • vk.com/big_data_russia (Big Data Russia)
      • vk.com/java_jvm (Java & JVM langs)
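
The preparation steps listed on slide 17 can be sketched in plain Java. This is only a minimal illustration, not code from the lecture: the class name, the method names and the thresholds are assumptions, and the "merge tables" step is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: class name, method names and thresholds are assumptions.
final class PreprocessingSketch {

    // Select a small piece of the data: here simply the first n values.
    static List<Double> sample(List<Double> values, int n) {
        return new ArrayList<>(values.subList(0, Math.min(n, values.size())));
    }

    // Define a default for missing data: every null becomes defaultValue.
    static List<Double> fillMissing(List<Double> values, double defaultValue) {
        return values.stream()
                .map(v -> v == null ? Double.valueOf(defaultValue) : v)
                .collect(Collectors.toList());
    }

    // Remove "strange signals": keep only values inside [min, max].
    // Expects a list without nulls, so call fillMissing first.
    static List<Double> removeOutliers(List<Double> values, double min, double max) {
        return values.stream()
                .filter(v -> v >= min && v <= max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sensor readings with one gap and one obvious outlier.
        List<Double> raw = Arrays.asList(36.6, null, 37.1, 980.0, 36.9, 36.4);
        List<Double> cleaned = removeOutliers(fillMissing(sample(raw, 5), 36.6), 30.0, 45.0);
        System.out.println(cleaned);
    }
}
```

On real data the default value and the valid range would come from domain knowledge rather than from constants in the code.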
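
Slide 54 introduces kNN by asking the neighbors of the green circle which class it belongs to. A minimal sketch of that idea follows, assuming 2D points, squared Euclidean distance and a simple majority vote; the `KnnSketch` and `Point` names are hypothetical and not taken from the lecture or from any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative kNN classifier; all names here are assumptions.
final class KnnSketch {

    // A labelled point in the plane, e.g. a "blue square" or a "red triangle".
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    static double squaredDistance(Point p, double x, double y) {
        double dx = p.x - x;
        double dy = p.y - y;
        return dx * dx + dy * dy;
    }

    // Returns the most frequent label among the k training points closest to (x, y).
    // On a tie, the label that reached the winning count first (nearest first) wins.
    static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> squaredDistance(p, x, y)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(p.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = new ArrayList<>();
        training.add(new Point(1, 1, "blue square"));
        training.add(new Point(1, 2, "blue square"));
        training.add(new Point(2, 1, "blue square"));
        training.add(new Point(5, 5, "red triangle"));
        training.add(new Point(6, 5, "red triangle"));
        training.add(new Point(5, 6, "red triangle"));
        // The "green circle" at (2, 2) asks its 3 nearest neighbors.
        System.out.println(classify(training, 2, 2, 3));
    }
}
```

Sorting the whole training set is fine for a toy example; for large datasets an index structure or a distributed implementation would be needed.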
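
Slide 67 names WordCount as the MapReduce example but the transcript carries no code. The sketch below imitates the map, shuffle and reduce phases in a single JVM with plain Java collections. It deliberately does not use the Hadoop API, and the class and method names are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process imitation of the MapReduce phases; not the Hadoop API.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs the three phases over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(emitted).forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data big java", "Java and Big Data")));
    }
}
```

In a real cluster the map and reduce calls run on different nodes and the framework performs the shuffle over the network; only the shape of the computation is the same.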
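
The K-Means sample on slide 74 evaluates the model with the Within Set Sum of Squared Errors (WSSSE). As a hedged illustration of what that number means, the sketch below computes the sum of squared distances from each point to its nearest cluster center in plain Java. It is not Spark's implementation; the class name and the array-based data layout are assumptions.

```java
// Illustrative WSSSE computation; not Spark's implementation.
final class WssseSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Sum over all points of the squared distance to the nearest center.
    static double wssse(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] p : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                best = Math.min(best, squaredDistance(p, c));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical clusters with their centers.
        double[][] points = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] centers = {{0, 1}, {10, 1}};
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, centers));
    }
}
```

A lower value means tighter clusters, but it always decreases as the number of clusters grows, so it is usually compared across several values of `numClusters` rather than read in isolation.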
