BDI: The Beginning (Big Data Training in Coimbatore)

The main objective of “Big Data Intelligence” is to understand all of us better in order to predict the future. Be it 4 billion Google queries a day or 1 billion Facebook users, we need smarter AI algorithms to learn from the ocean of data and connect the dots. With massive parallelism and Map-Reduce techniques, millions of servers take us one step closer to “Turing’s intelligent machine”. Near-AI success stories include Google, Facebook, Twitter, YouTube and Amazon. Let's begin our journey by looking at the big hype, the big dreams of the 1950s, the big laws, the big growth, and the basic operations used to extract big data intelligence. For more information on Big Data training in Coimbatore, please visit https://bigzettab.wordpress.com/. - Prof. Ashok.R, +91-9943900101, ashok@zettab.com.



  1. Big Data Intelligence: The Beginning
      Prof.Ashok.R | +91-9943900101 | ashok@zettab.com
      ZettaB.com, Big Data Training in Coimbatore
      Ref: Ullman et al., Mining of Massive Datasets
  2. Caution
      • The grass is always greener on the other side
      Be inspired!
      • Stories... and more stories...
      Be informed!
      • The devil is in the details
      Be challenged!
      Source: Hsuan-Tien Lin
  3. The Dream in 1945
      • A dream machine of Vannevar Bush (1945)
      • An extended supplement to human memory
      • A device that stores an individual's library, such as books, records and communications
      • Microfilms can be searched, copied and shared
      • Useful for storing and sharing information among lawyers, patent attorneys, doctors and chemists
      • The base concept from which the WWW evolved
  4. Leads to the Web and Web-Scale Data
      • Data Volume: doubles every 1.2 years
        – 5 EB: total data produced in 5,000 years until 2003
        – 20 EB: data collected by Google alone in a day now
      • Data Variety: structured, semi-structured and unstructured (XML, JSON, .doc, .pdf, HTML, email body, .mp4, .jpeg…)
      • Data Velocity: a lot happens in a minute
        – 72 hours of new video uploaded to YouTube
        – 3 million searches on Google
        – 200 million emails sent
        – 350 thousand tweets | 1 million searches on Twitter
        – 690 thousand shares | 420 GB of data handled on FB
        – 20 million photo views on Flickr
      Source: Qmee, Wikibon, https://blog.kissmetrics.com/facebook-statistics/
  5. How Big is Big Data? (scale labels on the slide: Desktop, Hobbyist, Internet, Big Data (PB/EB/ZB))
      • Byte (1 byte): one grain of rice
      • Kilobyte (2^10 bytes): a cup of rice
      • Megabyte (2^20 bytes): 8 bags of rice
      • Gigabyte (2^30 bytes): 3 trucks of rice
      • Terabyte (2^40 bytes): 2 container ships
      • Petabyte (2^50 bytes): fills half the area of Tirupur
      • Exabyte (2^60 bytes): fills the area of South India
      • Zettabyte (2^70 bytes): fills the Indian Ocean twice
      Source: What is big data, Slideshare.net
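Each step on this ladder is a factor of 2^10 = 1,024. A minimal Python sketch that expresses a byte count in these binary units; the helper `human_size` and the `UNITS` list are hypothetical names for illustration, not from the slides.

```python
# Binary units from the slide: each step up is 2**10 = 1024 times larger.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte",
         "Terabyte", "Petabyte", "Exabyte", "Zettabyte"]


def human_size(num_bytes: int) -> str:
    """Express num_bytes in the largest unit for which the value is >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1024 until the value fits the current unit (or units run out).
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


print(human_size(2**50))      # 1.0 Petabyte
print(human_size(5 * 2**60))  # 5.0 Exabyte
```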
  6. Big Data Intelligence (BDI)
      The ability to understand all of us better by connecting the dots from massive data sets (with TB/PB volume, streaming velocity and variety in sources) to predict the future.
  7. WHY DO WE PREDICT?
  8. To Survive
      • With the brain, the largest neural network, to store and process a big volume of data: 100 billion neurons and the equivalent of 2.5 PB of memory @ 100 million MIPS (33K i7 cores)
      • Vision | Touch | Hearing | Smell | Taste
        1250 MB/s | 125 MB/s | 12.5 MB/s | 1.25 MB/s
      • You only feel 0.7% of what you sense
      Source: Scientificamerican.com, Storagecraft.com
  9. The Prediction Power
      • 10,000 hours (7-8 years) of rigorous practice are required to become a world-class expert in anything (Daniel Levitin, neuroscientist)
      • This enables the ability to predict 2 seconds before others, the “Two-Second Advantage”, which wins against the competitors
        – Wayne Gretzky, the greatest ice-hockey player of all time, was able to predict where the puck was going to be, an instant before it arrived
        – Sachin Tendulkar
        – Warren Buffett
        – Viswanathan Anand
  10. Can Machines Think (to Predict)?
      Alan Turing asked this question in 1950 and proposed a test to validate it.
  11. Turing Test
      • Which is machine, and which is woman?
      • Can a machine imitate the brain?
      • Which is man, and which is woman?
  12. Did any machine pass? No.
      Any machine nearer? (near AI)
      • Jeopardy! quiz contest clue: “William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.”
      • The challenge is to predict the question and bet with reasonable confidence.
  13. IBM Watson Computer
      “William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.”
  14. Near AI Solutions
      • Natural language processing
      • Machine learning
      • Predictive analytics
      • Face recognition
      • Language translation
      • Speech recognition
  15. Can a machine predict?
      • Share price in a stock market the next day
      • Top 5 products consumers want to buy next week
      • Price of tomatoes (1 kg) next month
      • Number of cars to be sold next quarter
      • Potential criminals in the city / at a mega event
      • When a machine/human will become sick
      • Best-matched course/school to study
      • Best-matched job/company to work for
  16. Google Story: Where it all began
      • 50 billion indexed pages
      • Thousands/millions/billions of pages may match each search query
      • How do we rank them so that the most relevant (important) pages are displayed at the top?
      • Predict what you want to see, not what you asked.
      Do You Know?
      • 4 billion searches happen in a day
      • Each query uses 1,000 nodes
      • Results are returned in 0.2 seconds
      • 20 billion pages are crawled per day
      • 20 exabytes of data are collected in a day
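  17. Page Rank
      • Give pages ranks (scores) based on links to them
        – Links from many pages → high rank
        – Link from a high-rank page → high rank
      • “Rank” r_j for page j: $r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$, where the sum runs over the pages i that link to j and d_i is the number of out-links of page i
      Source: Parallel Programming With Spark, Matei Zaharia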
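The flow equation says that each page splits its current rank evenly among its out-links. A minimal Python sketch of one update step, assuming a hypothetical three-page graph (pages `y`, `a`, `m`, the small example used in Mining of Massive Datasets) and a hypothetical helper `flow_step`; every linked page must also appear as a key in `links`.

```python
from typing import Dict, List


def flow_step(links: Dict[str, List[str]], rank: Dict[str, float]) -> Dict[str, float]:
    """One application of r_j = sum over pages i linking to j of r_i / d_i."""
    new_rank = {page: 0.0 for page in links}
    for i, out_links in links.items():
        d_i = len(out_links)  # out-degree of page i
        for j in out_links:
            new_rank[j] += rank[i] / d_i  # page i passes an equal share to each j
    return new_rank


# Hypothetical graph: y links to y and a; a links to y and m; m links to a.
links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
rank = {page: 1 / 3 for page in links}
print(flow_step(links, rank))
```

Starting from equal ranks, one step moves rank towards `a`, which receives a full share from `m` and half of `y`'s rank.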
  18. Matrix-Vector Multiplication
      • Page rank equation in a practical form, for iteration t+1: $r^{(t+1)} = A \cdot r^{(t)}$, where A is the Google matrix
      • At convergence r = A·r, so the rank vector r is the eigenvector of A with eigenvalue 1
      • The iteration is repeated until the rank vector converges (or the maximum number of iterations is reached)
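Repeating r(t+1) = A·r(t) until the rank vector stops changing is power iteration. A minimal pure-Python sketch, assuming a small column-stochastic matrix `A` (column i holds 1/d_i in the rows of the pages that i links to); the tolerance, the iteration cap and the function names are illustrative assumptions. It omits the teleportation (damping) term that a full Google matrix adds to handle dead ends and spider traps.

```python
from typing import List

Matrix = List[List[float]]


def mat_vec(A: Matrix, r: List[float]) -> List[float]:
    """Dense matrix-vector product A . r (A stored as a list of rows)."""
    return [sum(a * x for a, x in zip(row, r)) for row in A]


def power_iteration(A: Matrix, tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Repeat r(t+1) = A . r(t) until the L1 change drops below tol."""
    n = len(A)
    r = [1.0 / n] * n  # r(0): uniform ranks
    for _ in range(max_iter):
        r_next = mat_vec(A, r)
        if sum(abs(new - old) for new, old in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r  # maximum number of iterations reached


# Column-stochastic A for y -> {y, a}, a -> {y, m}, m -> {a}
A = [
    [0.5, 0.5, 0.0],  # row y
    [0.5, 0.0, 1.0],  # row a
    [0.0, 0.5, 0.0],  # row m
]
print(power_iteration(A))  # approximately [0.4, 0.4, 0.2]
```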
  19. RAM is not Enough
      • Won’t be a problem for small dimensions (N×N)
      • Consider N = 1 billion (pages that match a query)
      • The dimension is now in the billions
        – A is a billion × billion matrix
        – r is a rank vector of size one billion
        – r(old) and r(new) together have two billion entries (16 GB for 8-byte double values)
        – A has a billion × billion entries (8 exabytes)
      • In actual implementations, methods such as sparse matrix representations (storing only the non-zero entries, i.e., the actual links) reduce the storage needed.
      • RAM size of a highly configured server node: 128-512 GB
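The memory figures are simple arithmetic. A short Python sketch that reproduces them and, for contrast, estimates a sparse link-list representation of A; the average of 10 out-links per page and the 4-byte page ids are hypothetical assumptions for illustration only, and per-page bookkeeping (such as storing each out-degree) is ignored.

```python
N = 10**9                # pages, i.e. rows/columns of A
BYTES_PER_DOUBLE = 8

# Two rank vectors, r(old) and r(new), of N doubles each
rank_vectors = 2 * N * BYTES_PER_DOUBLE
# Dense A: N x N doubles
dense_matrix = N * N * BYTES_PER_DOUBLE

# Sparse A (assumption): store only the links, ~10 out-links per page,
# each link kept as a 4-byte destination page id.
AVG_OUT_LINKS = 10
BYTES_PER_ID = 4
sparse_matrix = N * AVG_OUT_LINKS * BYTES_PER_ID

print(rank_vectors / 1e9, "GB")   # 16.0 GB
print(dense_matrix / 1e18, "EB")  # 8.0 EB
print(sparse_matrix / 1e9, "GB")  # 40.0 GB
```

Under these assumptions the sparse form is about 40 GB, small enough for one 128-512 GB node, while the dense form is hopeless on any machine.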
  20. Worker Node (datacenter node)
      • CPU: 16 cores
      • Main memory: 128-512 GB RAM, 40-60 GB/s
      • SSD: 1-4 TB, 1-4 GB/s (x4 disks)
      • Secondary storage: 10-30 TB disks, 0.2-1 GB/s (x10 disks), plus seek time
      • Network: 1-10 Gbps
      Source: AmpLab, UCB, Dell
  21. Disk is Slowest and not Enough
      • 50 billion web pages × 20 KB = 1 PB
      • 1 computer reads 30-35 MB/sec from disk: about 11-13 months (roughly a year) to read it all
      • It also requires 1,000 hard drives (at 1 TB each) to store it all
      • Can you wait that long for one search?
      Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
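The read-time estimate is the data size divided by the disk throughput. A Python sketch of that arithmetic; the hypothetical `read_days` helper and the 1,000-disk parallel case (assuming ideal scaling with no coordination overhead) are illustrative assumptions that motivate the cluster described on the next slide.

```python
SECONDS_PER_DAY = 86_400


def read_days(total_bytes: float, bytes_per_sec: float, disks: int = 1) -> float:
    """Days to scan total_bytes when `disks` disks each stream at bytes_per_sec."""
    return total_bytes / (bytes_per_sec * disks) / SECONDS_PER_DAY


corpus = 50e9 * 20e3  # 50 billion pages x 20 KB = 1e15 bytes (1 PB)
for rate in (30e6, 35e6):
    print(f"{rate / 1e6:.0f} MB/s: {read_days(corpus, rate):.0f} days on 1 disk, "
          f"{read_days(corpus, rate, disks=1000) * 24:.1f} hours on 1,000 disks")
```

A single disk needs roughly 330 to 390 days; spreading the same scan over 1,000 disks brings it down to a few hours.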
  22. Parallelism using a Cluster
      • 8-64 nodes/rack, 4-16 racks in a cluster
      • 1 Gbps bandwidth within a rack, 8 Gbps out of a rack
      • Node specs: 8-16 cores, 128-512 GB RAM, 10×1 TB disks
      [Figure: racks of nodes, each with a top-of-rack (ToR) rack switch, connected through an aggregation switch]
  23. But Nodes Fail at Scale
      • One server may last up to 3 years (1,000 days)
      • If you have 1,000 servers, expect to lose 1 per day
      • Google has 1 million servers
        – Hence 1,000 machines will fail every day.
      Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
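The failure estimate divides the fleet size by the lifetime of one server. A Python sketch of the expected count, plus the chance that at least one machine fails on a given day; treating failures as independent, with a constant daily failure probability of 1/1,000 per server, is a simplifying assumption, and the function names are hypothetical.

```python
def expected_failures_per_day(servers: int, lifetime_days: float) -> float:
    """Each server fails on a given day with probability 1/lifetime_days."""
    return servers / lifetime_days


def prob_any_failure(servers: int, lifetime_days: float) -> float:
    """P(at least one failure today), assuming independent failures."""
    p_fail = 1.0 / lifetime_days
    return 1.0 - (1.0 - p_fail) ** servers


print(expected_failures_per_day(1_000, 1_000))      # 1.0
print(expected_failures_per_day(1_000_000, 1_000))  # 1000.0
print(round(prob_any_failure(1_000, 1_000), 3))     # 0.632
```

Even with only 1,000 servers, a failure is more likely than not on any given day, which is why failure handling must be automatic.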
  24. Traditional RDBMS Fail
      • Not designed for a variety of data types (text, video, images)
      • Not capable of handling big volume (PB/EB/ZB)
      • Not designed for parallelism
      • Poor fault tolerance at scale (millions of servers)
      • Slowdown due to joins, volume, ACID checks and high-velocity requests
      • Designed for transaction processing, not for deep analytics (intensive computing)
  25. Google Solution: DFS
      • Distributed File System
      • Divide the bigger data file into smaller chunks of 16-64 MB and store them on different nodes in different racks
      • Chunks are replicated (2-3 copies) for fault tolerance
      [Figure: chunks C0 to C5 and D0 to D1, with their replicas spread across chunk servers 1 to N]
      Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
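Chunking and replication can be sketched in a few lines. This Python sketch splits a file into fixed-size chunks and assigns each chunk's replicas to distinct chunk servers in round-robin order. The 64 MB chunk size and 3 replicas come from the ranges on the slide; the round-robin placement, the server names and the helpers `num_chunks` and `place_chunks` are hypothetical. Real systems such as GFS and HDFS also spread replicas across racks.

```python
from typing import Dict, List

CHUNK_SIZE = 64 * 2**20  # 64 MB, upper end of the slide's 16-64 MB range
REPLICAS = 3             # upper end of the slide's 2-3 copies


def num_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed (the last chunk may be partially filled)."""
    return -(-file_size // chunk_size)  # ceiling division


def place_chunks(file_name: str, file_size: int, servers: List[str],
                 replicas: int = REPLICAS) -> Dict[str, List[str]]:
    """Map each chunk id to `replicas` distinct servers (round-robin placement)."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for c in range(num_chunks(file_size)):
        # Consecutive servers starting at offset c are always distinct.
        placement[f"{file_name}{c}"] = [servers[(c + k) % len(servers)]
                                        for k in range(replicas)]
    return placement


servers = [f"chunkserver{n}" for n in range(1, 5)]
for chunk, hosts in place_chunks("C", 200 * 2**20, servers).items():
    print(chunk, hosts)
```

If one chunk server fails, every chunk it held still has two other copies, so reads can continue while the lost replicas are re-created elsewhere.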
  26. & Map-Reduce
      The Map-Reduce environment (master) takes care of:
      • Handling machine failures (with replica nodes)
      • Partitioning the input data
      • Scheduling workers
      • Performing the group-by-key step
      • Managing inter-machine communication
      Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
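The map, group-by-key and reduce phases that the master coordinates can be shown on a single machine. A minimal word-count sketch in Python; the in-memory `groupby_key` stands in for the distributed shuffle, and all function names are hypothetical, not the Hadoop or Spark API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def groupby_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group by key: collect all values per key (done by the framework in a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in groupby_key(mapped).items())


print(word_count(["big data big ideas", "data beats ideas"]))
```

In a real cluster each document would be a chunk on a different node, map tasks would run next to their chunks, and the group-by-key step would move pairs across the network to the reducers.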
  27. Big Data Platform (stack, top to bottom)
      • M-R App
      • MapReduce Stack (Hadoop & Spark)
      • Distributed File Systems (HDFS/GFS)
  28. BUT WITH RESTRICTIONS
  29. DFS is useful only when
      • Size is big (> 1 TB)
      • Files are rarely updated
        – Works for Google (to store indexed pages)
        – Will not be effective for an airline reservation system (where data is updated frequently)
  30. M-R is useful only when
      • Dimension is in the billions
        – Matrix-vector multiplication in Google PageRank
      • Graph with millions of nodes and billions of edges
        – FB network graph
      • Deep analytical applications with intensive computing
        – Useful for finding users with similar buying patterns for product recommendations at Amazon
        – But not useful for managing Amazon's online retail sales (frequent data updates, transactions)
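For finding users with similar buying patterns, one common measure (Jaccard similarity, also on the course outline) treats each user as the set of products bought. A small Python sketch with made-up purchase data; the user names, products, threshold and helper names are hypothetical. At Amazon scale, the all-pairs comparison is the part that gets distributed with Map-Reduce or approximated with minhashing and locality sensitive hashing.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of intersection / size of union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(baskets: Dict[str, Set[str]], threshold: float) -> List[Tuple[str, str, float]]:
    """All user pairs whose purchase sets have Jaccard similarity >= threshold."""
    pairs = []
    for u, v in combinations(sorted(baskets), 2):
        s = jaccard(baskets[u], baskets[v])
        if s >= threshold:
            pairs.append((u, v, s))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical purchase histories
baskets = {
    "asha": {"laptop", "mouse", "bag"},
    "ravi": {"laptop", "mouse", "keyboard"},
    "meena": {"novel", "bookmark"},
}
print(similar_pairs(baskets, threshold=0.3))  # [('asha', 'ravi', 0.5)]
```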
  31. BDI PARADIGM
  32. Google Creates
      • DFS (GFS)
      • Map-Reduce
      • Dremel (BigQuery)
      • Pregel
  33. & Apache Follows
      • GFS → HDFS
      • Map-Reduce → Hadoop, Spark
      • Dremel → Drill
      • Pregel → Giraph
  34. SCALA
      • Uses and runs on the Java Virtual Machine
      • Yet simpler (more succinct) to write than Java
        – Strong type inference (statically typed)
        – Less code
      • Functional programming (+ OOP)
        – First-class functions
      • Used to develop the Spark stack (Hadoop 2.0)
      • Most suited for Map-Reduce applications
        – Traits, collections, nested classes
        – Immutable datasets
        – Scalable
  35. Mining: Machine Learning (Supervised/Unsupervised)
      • Link analysis
      • Classification
      • Content-based recommendation
      • Collaborative filtering
      • Finding similar items
      • Clustering, decomposition…
  36. Cloud: Spark as a service, Hadoop as a service
      • Amazon AWS
      • Google Cloud Platform
      • IBM Bluemix
      • OpenStack
      • Databricks, Cloudera, Hortonworks, MapR,…
      • SAP, Oracle…
  37. One Circle
  38. BIG POTENTIAL
  39. Big Market
      • $16.9 billion in 2015
      • $50 billion by 2017
      • 90 percent of the Fortune 500 have already initiated big data projects
      • Big data spending: $8M per company
      • 200 TB of stored data per company (with >1,000 employees)
      Ref: McKinsey 2011
  40. Big Players
      • Leaders
        – IBM, HP, Dell, SAP, Teradata, Oracle, SAS, Accenture (>$400 million)
      • Pure players (100% of revenue from big data)
        – Palantir, Pivotal, Splunk, Mu Sigma, Actian, Opera Solutions (>$100 million)
      • Indian players
        – TCS, CapGemini (>$10 million)
      Source: WikiBon 2013
  41. Big Jobs
      Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  42. BIG ENABLERS
  43. Smartphones
      • 1.2 billion sold in 2014
        – a 23.1% increase over 2013
      • Account for 27% of global handsets
        – but consume 95% of global traffic (1.5 EB/month)
      • Daily SMS count exceeded the world population
      Ref: Znet, Fool.com
  44. Nielsen’s Law
      • A high-end user's internet bandwidth grows about 50% per year, i.e., it doubles every twenty-one months
      • 5G in 2020 and 6G in 2030
  45. Moore's Law
      [Figure: Zilog PC (1980) compared with iPhone (2007)]
  46. Kryder's Law
      • In 2020, a 2.5-inch disk drive would store ~40 TB and cost about $40.
      • Storage capacity (doubles every 12 months) grows faster than processing capacity under Moore’s law (doubles every 18-24 months).
  47. All Together: annual growth rates
      • Nielsen's Law (Internet bandwidth): 50%
      • Moore's Law (computing power): 60%
      • Kryder’s Law (storage capacity): 100%
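An annual growth rate g corresponds to a doubling time of ln 2 / ln(1 + g) years, which ties these rates to the doubling periods quoted for Nielsen's, Moore's and Kryder's laws. A quick Python check; the helper name `doubling_time_months` is hypothetical.

```python
import math


def doubling_time_months(annual_growth: float) -> float:
    """Months for a quantity growing by `annual_growth` per year to double."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


for law, g in [("Nielsen (bandwidth)", 0.50),
               ("Moore (computing)", 0.60),
               ("Kryder (storage)", 1.00)]:
    print(f"{law}: {g:.0%} per year -> doubles every {doubling_time_months(g):.1f} months")
```

The results (about 20.5, 17.7 and 12.0 months) match the "twenty-one months", "18-24 months" and "12 months" figures given for the three laws.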
  48. BIG SOURCES
  49. Social Networks
      [Chart: number of active users per social network, in millions, as of August 2015. Source: http://www.statista.com/]
  50. Facebook
      • 60 million posts per day
      • 2.6 billion likes per day
      • 375 million photos uploaded per day
      • 15 TB of data uploaded per day
      • 600 TB of data handled per day
      • 700 TB graph search DB
      • 300 PB of user data
      Ref: Chassis-plans.com, Wikibon, http://allfacebook.com/orcfile b130817
  51. Twitter
      • 500 million tweets per day
      • 1.6 billion search queries per day
      • 316 million monthly active users
      • 80% of active users on mobile
      Ref: Chassis-plans.com, Wikibon
  52. YouTube
      • 100 hours of new video every minute
      • 53% of mobile traffic is video
      • Avg. human vision input: ½ million hours per lifetime
      • YouTube new uploads: about 52.6 million hours per year (100 hours/minute × 525,600 minutes/year)
  53. MORE STORIES
  54. House of Cards
      • Big data analytics picked up on the success of the British version of House of Cards and the popularity of movies directed by David Fincher and starring Kevin Spacey
      • Netflix then made a major decision to commit $100 million to two 13-episode seasons of a US remake with that team, streamed online
      • Netflix earned $1 billion in that quarter
      • The first Emmy-winning streaming show
      Sources: The Atlantic, May 2012; https://gigaom.com/2013/04/22/netflix-q1-2014-earnings/
  55. Lumiata
      • Creates personalized treatment recommendations based on patients' health data, using 170 million data points
      • Raised US$10 million from VCs
      • Ash Damle, Founder & CEO
  56. MedAware
      • Avoids prescription errors due to:
        – Drug mix-up
        – Patient mix-up
        – Unawareness of clinical data
        – Dosage mix-up
      • Example: Chlorambucil (chemotherapy) prescribed to a patient without cancer, instead of Chloramphenicol (antibiotic)
      • Uses a mathematical model, derived from millions of EMRs (electronic medical records), that represents real-world treatment patterns
      • Raised US$1 million in funding
  57. Windward
      • The only platform to analyze maritime data from ships and the ocean to maintain ship history, predict threats and help make huge financial decisions on shipping and commodity flows
      • Before 2010, it was impossible to know a vessel’s location once it sailed more than 30 miles offshore. Then commercial satellites were introduced, but the big data collected from ships gave a corrupted picture
      • Raised $15.8 million
  58. mnubo
      • Analytics of IoT data
      • Analytics of data from connected cars for driving habits, vehicle failure patterns, inventory management, usage-based insurance, etc. (36M connected cars will be on the road in 2020)
      • Raised $6 million
  59. rocana
      • How many of your servers are talking to blacklisted IPs?
      • How long has your business been hacked?
      • Rocana helps IT identify the root cause of performance or security issues at any scale and complexity and resolve the underlying issues in real time.
      • Instead of employing “brute force” searches against millions of log entries, advanced analytics identifies anomalies for investigation.
      • Raised $19.4 million
  60. Whetlab
      • Only 5 data scientists worked there
      • Twitter acquired it in an undisclosed deal to improve its ability to show users the kinds of tweets and content they actually want to see.
  61. Applied Predictive Technologies
      • Cloud-based cause-and-effect analytics platform that accurately measures the profit impact of pricing, marketing, merchandising, operations and capital initiatives, tailoring investments in these areas to maximize ROI
      • Acquired by MasterCard for $600 million
  62. Netflix Challenge
      • Data: how users have rated movies
        – 100.5 million ratings by about 5 lakh (500,000) users for 18K movies
      • Goal: predict how a user would rate an unrated movie
        – A recommender system problem
        – 10% improvement (in RMSE over Netflix's own recommender): 1 million dollar prize
      Source: Hsuan-Tien Lin
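A recommender predicts a missing rating from the ratings of similar users. A minimal user-based collaborative-filtering sketch in Python; the tiny ratings table, cosine similarity over co-rated movies, the similarity-weighted average and the function names are illustrative choices, not the prize-winning method. Because the Netflix Prize scored submissions by root-mean-square error (RMSE), an `rmse` helper is included as well.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {movie: rating}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity over the movies both users rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict(ratings: Ratings, user: str, movie: str) -> float:
    """Similarity-weighted average of other users' ratings for `movie` (NaN if none)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else float("nan")


def rmse(pairs: List[Tuple[float, float]]) -> float:
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Hypothetical 1-5 star ratings
ratings: Ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "u3": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}
print(round(predict(ratings, "u1", "m4"), 2))
```

The prediction for u1 leans towards u2's rating of m4, since u2's tastes match u1's more closely. Because raw 1-5 ratings are all positive, cosine never goes negative here; centering each user's ratings on their mean (Pearson correlation) is a common refinement.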
  63. KDD Cup Challenge
      • Data: how users rated songs
        – 252.8 million ratings by 1 million users for 650K songs (Yahoo!)
      • Goal: recommend new songs that a user would like
      Source: Hsuan-Tien Lin
  64. BDI for National Security
      • TIA (Total Information Awareness) after 11/9
      • NATGRID after the Mumbai attack (26/11)
        – We could have stopped both if we had connected the pieces of intel from all security agencies with the information tracked from suspects.
  65. More Applications
      • Building a Stock Investment Strategy Model
      • Predicting Customer Transaction Behavior
      • Failure Prediction
      • Opinion Mining to Determine User Sentiments
      • Financial Loss Prediction
      • Insurance Claim Prediction Model
      • Bond Trade Price Prediction
      • Prediction of Number of Days in the Hospital
      • Accelerating Discovery of Drugs for Mutants of H1N1
      • Molecular Activity Prediction
      • Job Recommendation Engine
      Source: https://insofeprojects.wordpress.com/insofe-projects/
  66. WHAT NEXT
  67. A first course on BDI
      • Day 1 FN (forenoon): BDI: The Beginning; DFS and Map-Reduce; Distributed Graph (Pregel); Page Rank algorithm
      • Day 1 AN (afternoon): BDI Tools Landscape; Dremel and BigQuery; Naïve Bayes Classifier
      • Day 2 FN: TF-IDF, Jaccard and Cosine; Collaborative filtering; Shingling, Minhashing; Locality Sensitive Hashing
      • Day 2 AN: Scala Basics for MR apps; Practice session; More fun with Scala
      • Day 3 FN: Spark projects using Scala
      • Day 3 AN: Student project ideas; Q&A
  68. M.S. Options in USA
      • Stanford University: M.S.-CS, Specialization in Information Management and Analytics; four-course graduate certificate in mining massive datasets (link)
      • Northwestern University: Master of Science in Analytics
      • DePaul University: Master of Science in Predictive Analytics
      • North Carolina State University: Master of Science in Analytics
      • University of Ottawa, Canada: M.Sc. in Analytics
      • University of Connecticut: MS in Business Analytics and Project Management
      Source: informationweek.com, IBM Director Dr. Spohrer's short list
  69. PG options in India
      • Indian School of Business: Certified Program in Business Analytics (CBA)
      • Great Lakes Institute of Management: PGP in Business Analytics
      • IIM Bangalore: Analytics Essentials, BAI
      • IIM Ahmedabad: Advanced Analytics for Management
      Source: AnalyticsVidya.com, analyticsindiamag.com
  70. Road Ahead
      “The ultimate search engine would understand exactly what you mean and give back exactly what you want.” - Larry Page
  71. Evolution of Manager Desk
  72. Tree is God and above all
  73. Prof.Ashok.R | +91-9943900101 | ashok@zettab.com
      ZettaB.com, Big Data Training in Coimbatore
