Big Data Architectures: Beyond Hadoop - Impetus Webinar
- 1. Impetus Technologies Inc.
1 © 2014 Impetus Technologies
Big Data Architectures
Beyond the Elephant Ride
Recorded version available at
http://www.impetus.com/webinar_registration?event=archived&eid=60
- 2. Outline
• The Hadoop ecosystem and challenges
• Big Data solutions beyond Hadoop
- How and where to use them?
• Some use cases
• Big Data Architecture Strategy
- 3. Disclaimer
• We are not advising you to discard Hadoop
• We will discuss Big Data technologies that complement and supplement Hadoop
- 4. What is Hadoop?
Scalable data processing engine
• HDFS: scalable, fault-tolerant distributed file system
• MapReduce: parallel processing programming model
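The MapReduce programming model can be sketched in plain Python. This is an in-process illustration of the map, shuffle, and reduce phases for a word count, not Hadoop's Java API. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in every input line
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: combine the values for each key (here, sum the counts)
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["the elephant rides", "the elephant sleeps"]))
    # prints {'the': 2, 'elephant': 2, 'rides': 1, 'sleeps': 1}
```

On a real cluster, many mappers and reducers run in parallel on different nodes, and HDFS supplies their input splits. The three-phase shape stays the same.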
- 5. Hadoop Ecosystem
- 6. Where to use Hadoop?
• Risk Analysis
– Intrusion detection, Credit scoring
• Recommendation
– Customers who purchased this also liked
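A common way to compute "customers who purchased this also liked" is item co-occurrence counting, and this fits the map/reduce shape. The mapper emits every pair of items bought in the same order, and the reducer sums the counts for each pair. Below is a minimal in-memory sketch, assuming each order is a list of item IDs. The names `co_occurrence` and `also_bought` are hypothetical, and real recommenders usually normalize these raw counts.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, List, Tuple


def co_occurrence(orders: Iterable[List[str]]) -> Counter:
    # Map: for each order, emit every ordered pair (a, b) of distinct items.
    # Reduce: the Counter sums the emitted pairs across all orders.
    pairs: Counter = Counter()
    for order in orders:
        items = sorted(set(order))  # ignore duplicate items within one order
        pairs.update(permutations(items, 2))
    return pairs


def also_bought(pairs: Counter, item: str, top_n: int = 3) -> List[Tuple[str, int]]:
    # Rank the items that most often appear in the same order as `item`;
    # ties are broken alphabetically so results are deterministic
    scores = {b: n for (a, b), n in pairs.items() if a == item}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For example, with the orders `[["book", "lamp"], ["book", "lamp", "pen"], ["book", "pen"], ["lamp", "rug"]]`, calling `also_bought(pairs, "lamp")` ranks `book` first because it shares two orders with `lamp`.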
- 7. Where to use Hadoop?
• Sentiment Analysis
– Positive, Negative or Neutral sentiment in sentences
• Targeted Ads
– Display ads based on user behavior and preferences
- 8. Where to use Hadoop?
• Machine Learning
– Spam vs. Not Spam
• And a lot of other areas…
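"Spam vs. not spam" is a classic supervised-learning task. The slide does not name an algorithm, so the sketch below uses one common choice: multinomial Naive Bayes with Laplace smoothing. On Hadoop, the per-class word counts are the part you would compute with MapReduce over a large labeled corpus. The class name and the "spam"/"ham" labels are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, examples: List[Tuple[str, str]]) -> None:
        # Count documents per class and word occurrences per class
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())

    def predict(self, text: str) -> str:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "ham", -math.inf
        for label, counts in self.word_counts.items():
            # log P(class), Laplace-smoothed so an unseen class does not hit log(0)
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.word_counts)))
            denom = sum(counts.values()) + len(vocab)
            for word in text.lower().split():
                # log P(word | class), Laplace-smoothed
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.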
- 9. Challenges with Hadoop
- 10. Limitations
• Data security
• Dependence on OS/Language
• MapReduce programming
• Batch processing only
- 11. Beyond Hadoop
- 12. Faster Hadoop
• MapR
– Simple to manage (NFS)
– MapR Express Lane
– Handles real-time data flows
Dominant players: Hortonworks, Cloudera, Hadapt
- 13. Transactional Systems
• Use cases: e-commerce websites, ATMs
• Traditional solutions: MySQL, Oracle, MS SQL Server
Go NewSQL: VoltDB, Clustrix
- 14. Explaining VoltDB
- 15. Real Time Computation
• Continuous computation
– Trending topics
• Stream processing
– Twitter Firehose
Try Storm, Esper, S4, CloudScale!
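Continuously computing trending topics comes down to counting items over a sliding time window and reporting the top few as events stream in. Below is a minimal single-process sketch, assuming events arrive in time order as `(timestamp, hashtag)` pairs. `SlidingWindowCounter` is a hypothetical name. A stream processor such as Storm would partition this counting across many workers.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts hashtags seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, tag: str) -> None:
        self.events.append((timestamp, tag))
        self.counts[tag] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window, oldest first
        while self.events and self.events[0][0] <= now - self.window:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def trending(self, n: int = 3) -> List[Tuple[str, int]]:
        # Highest counts first
        return self.counts.most_common(n)
```

Each event is added and evicted exactly once, so the cost per event stays constant on average no matter how long the stream runs.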
- 16. Explaining Storm
• Spouts: data sources
• Bolts: data processors
• Topologies: combinations of spouts and bolts
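The spout/bolt/topology vocabulary can be illustrated without a cluster. A spout emits tuples, each bolt transforms the stream, and the topology wires them together. This is a single-process analogy built on Python generators, not Storm's actual API. All function names below are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, List, Tuple

# A bolt consumes one stream and produces another
Bolt = Callable[[Iterator[Any]], Iterator[Any]]


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    # Spout: the data source that emits raw tuples into the topology
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    # Bolt: splits each sentence into lowercase words
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    # Bolt: keeps running counts and emits (word, count_so_far)
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(spout: Iterator[Any], bolts: List[Bolt]) -> List[Any]:
    # Topology: chain the spout through each bolt in order
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)
```

In Storm the stream is unbounded and each bolt runs as many parallel tasks across the cluster. The generators here only model the data flow.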
- 17. Real-time Traffic
- 18. Doing it Right
- 19. Graph Computation
• Page Rank
• Shortest Path
• “Friends of my friends’ friends”
We suggest – Giraph, Pregel
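Pregel and Giraph use a vertex-centric, bulk-synchronous model. In each superstep, every vertex reads its incoming messages, updates its value, and sends messages to its neighbours. The computation halts once no messages are in flight. Below is a single-machine sketch of shortest paths in that style. With unit edge weights, the result is exactly the degrees of separation ("friends of my friends' friends"). The function name and the adjacency-dict graph format are assumptions, not Giraph's API, and every vertex must appear as a key.

```python
import math
from typing import Dict, List


def pregel_shortest_paths(graph: Dict[str, List[str]], source: str) -> Dict[str, float]:
    # Each vertex holds its current best distance from `source`
    dist = {v: math.inf for v in graph}
    # Superstep 0: the source receives a message carrying distance 0
    inbox: Dict[str, List[float]] = {source: [0]}
    while inbox:  # halt when no messages are in flight
        outbox: Dict[str, List[float]] = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                dist[vertex] = best
                # Vertex improved: offer each neighbour a path one hop longer
                for neighbour in graph[vertex]:
                    outbox.setdefault(neighbour, []).append(best + 1)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return dist
```

Vertices that do not improve stay silent, so the message volume shrinks as distances settle. Unreachable vertices keep the distance `math.inf`.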
- 20. LinkedIn Degrees of Separation
- 21. Fast Key-Value Access
• Show the latest items listing on your homepage
• Caching
Explore NoSQL - Redis and Riak!
- 22. How It Works
Latest comments posted by user
Traditional approach: query at runtime
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
Redis live cache approach
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time DESC LIMIT ...")
    END
    RETURN id_list
END
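Here is a runnable sketch of the same cache pattern. It adds the write path that keeps the cached list current: push each new ID onto the head of the list, then trim the list to a fixed length. To stay self-contained, it uses a tiny in-memory stand-in for the three Redis list commands involved (LPUSH, LTRIM, LRANGE, including LRANGE's inclusive end index) and an in-memory SQLite table as the SQL database. With the redis-py client, the calls would be `lpush`, `ltrim`, and `lrange` on a `redis.Redis` instance. `FakeRedisList`, `init_db`, `post_comment`, and the `CACHE_SIZE` cap are hypothetical. Ordering by the autoincrement `id` stands in for ordering by time.

```python
import sqlite3
from typing import Dict, List

CACHE_SIZE = 5000  # hypothetical cap on the number of cached comment IDs


class FakeRedisList:
    """In-memory stand-in for Redis LPUSH/LTRIM/LRANGE (non-negative indices only)."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[int]] = {}

    def lpush(self, key: str, value: int) -> None:
        self.lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key: str, start: int, stop: int) -> None:
        self.lists[key] = self.lists.get(key, [])[start:stop + 1]

    def lrange(self, key: str, start: int, stop: int) -> List[int]:
        # Like Redis, the range includes `stop`
        return self.lists.get(key, [])[start:stop + 1]


def init_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT)")
    return db


def post_comment(db: sqlite3.Connection, cache: FakeRedisList, text: str) -> int:
    # Write path: store in SQL, then push the new ID onto the head of the cached list
    with db:
        cur = db.execute("INSERT INTO comments (text) VALUES (?)", (text,))
    comment_id = cur.lastrowid
    cache.lpush("latest.comments", comment_id)
    cache.ltrim("latest.comments", 0, CACHE_SIZE - 1)
    return comment_id


def get_latest_comments(db: sqlite3.Connection, cache: FakeRedisList,
                        start: int, num_items: int) -> List[int]:
    # Read path: serve from the cache when it covers the requested range
    id_list = cache.lrange("latest.comments", start, start + num_items - 1)
    if len(id_list) < num_items:
        # Cache too short: fall back to the database, newest first
        rows = db.execute(
            "SELECT id FROM comments ORDER BY id DESC LIMIT ? OFFSET ?",
            (num_items, start),
        ).fetchall()
        id_list = [row[0] for row in rows]
    return id_list
```

Most homepage requests ask for the first page, which the capped list always covers. The SQL fallback therefore runs only for deep pagination or after the cache has been lost.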
- 23. How It Works
- 24. Good to Know
- 25. Recap
• Already invested in Hadoop: explore faster Hadoop (Hortonworks, Cloudera, MapR, Hadapt)
• Alternatives to Hadoop: HPCC, Disco
• Complex business queries, online transaction processing: “New Gen” SQL (VoltDB, Clustrix, Hadapt)
• Real-time analytics: CloudScale, Storm, Esper
• Fast key-value access: Redis, Riak
- 26. Bleeding Edge: A Peek into the Future
• High-performance supercomputing: Open MPI, BSP
• Highly efficient, large-scale graph computing: Pregel, Giraph
• Low-latency queries over very large data sets: Dremel
• Incremental updates on massive data sets: Percolator (Caffeine)
- 27. Architecture Strategy
- 28. Recommendations
• Think beyond the warehouse
• Time for real-time
• Not only Hadoop
– Hadoop is an enabler for better data warehouse solutions, not a replacement
• Back to SQL?
– SQL is not bad
– Hadoop and SQL complement each other
• Integrations & visualizations
• Real-time
- 29. Recommendations
• Big Data in upstream operational systems
– Forecasting systems
– Supply chains
– CRMs
- 30. Our Architecture Strategy
- 31. About Impetus
- 32. Our Expertise
• Strategic partners for software product engineering and R&D
• Thought leaders in cutting-edge technologies
• Mature processes and practices that are methodical, yet flexible
• Diverse domain expertise
- 33. Q & A
- 34. Thank You
Write to us at inquiry@impetus.com
Follow us on Twitter @impetustech