1. For Ahmedabad Java Meetup Group (300+ members strong now!)
Big Data Workshop
– An introduction
and workshop launch session
May, 2014
Dhruv Gohil
From Ishi systems
2. Welcome!
● Why a workshop and not a presentation?
● What should you do in the workshop?
● What is expected from you in this session?
● What should you expect from this session?
● What are the upcoming sessions going to be like?
4. OK... So what are we gonna do today?
➔Workshop setup and series introduction
➔Already done! (See it's easy!)
➔Big is not only ‘big’.
➔Why we need 'Big data'?
➔What 'Big data' is NOT?
➔Fear of Big data? Kick it off!
5. Let me tell you a story..
http://en.wikipedia.org/wiki/Information_Management_System
6. If you still think about 'Entities' and 'Tables'
Everything you have been taught in college
about Database is ALL WRONG.
http://slideshot.epfl.ch/play/suri_stonebraker
9. Big Data is not only ‘big’
Volume, Velocity, Variety
GB/TB vs PB/EB
Centralized vs Distributed
Structured vs Semi-Structured/Unstructured
Data Model vs Schema
Known relationships vs Flexible associations
10. What 'Big data' is NOT?
Hadoop exists because of Big data; Big data does not exist because of Hadoop!
11. What 'Big data' is NOT?
Applying for a job here?
Anything less than Hadoop is as good as an insult!
12. What 'Big data' is NOT?
Why does Hadoop always come to mind with big
data?
What else should we know?
Tools vs. Methodologies
Being too futuristic vs. being
practical/economical
13. Big Data in your organization
http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/
We brought RTSC. Right To Source Code.
Now, deal with it.
14. Big Data in your organization
➢ Cost of tools/software decreases, but cost of
knowledge increases
➢ Being agile is the only way to deal with competition
➢ Are you working with...
✔ Social networking and media
✔ Mobile devices
✔ Internet transactions
✔ Networked devices and sensors
15. Big Data in your product/service
● Change your thinking: focus on access, not on storage
● Design based on when/where data is used, not when/where
data is produced.
● Use redundancy, even at the expense of storage cost
● Understand NoSQL = Not Only SQL
✔ Streams
✔ In memory analytics
✔ Massively parallel processing (Data crunching)
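The access-vs-storage points above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ORDERS` records and the `build_read_views` helper are hypothetical, and plain Python dictionaries stand in for whatever data store you would really use. The same order is deliberately stored twice, once per access pattern, so each read is a single lookup.

```python
from collections import defaultdict

# Hypothetical order events, in the shape they are produced (one record per purchase).
ORDERS = [
    {"order_id": 1, "customer": "asha", "product": "phone", "amount": 300},
    {"order_id": 2, "customer": "ravi", "product": "phone", "amount": 310},
    {"order_id": 3, "customer": "asha", "product": "cover", "amount": 10},
]


def build_read_views(orders):
    """Store each order twice, once per access pattern (deliberate redundancy)."""
    by_customer = defaultdict(list)  # serves "show my orders"
    by_product = defaultdict(list)   # serves "who bought this product"
    for order in orders:
        by_customer[order["customer"]].append(order)
        by_product[order["product"]].append(order)
    return by_customer, by_product


by_customer, by_product = build_read_views(ORDERS)
print([o["order_id"] for o in by_customer["asha"]])  # [1, 3]
print([o["customer"] for o in by_product["phone"]])  # ['asha', 'ravi']
```

The price is extra storage and extra work on every write; the benefit is that neither query has to scan or join anything.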
16. Big Data in your project
Random Research says..
➔ 99% of your clients who asked for a Big Data
project ended up with fewer paying customers
than you have fingers.
A project hits business scalability limits much, much
earlier than technical scalability limits.
17. Big Data for your clients
➢ Business first - technology second
➢ Current reality for client projects:
✔ Use big data tools that also work at small scale :-)
✔ Design with the domain in mind, not the database
the client suggests.
➢ Always design with read optimization in mind
(the golden rule)
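One way to read the golden rule: do a little extra work on every write so that the common read becomes a single lookup. The sketch below assumes a hypothetical `PageViewStore` class that keeps a running counter next to the raw events; it is not tied to any particular database.

```python
from collections import Counter


class PageViewStore:
    """Hypothetical store: pays extra on write so that reads are one lookup."""

    def __init__(self):
        self.raw_events = []             # data as it is produced
        self.views_per_page = Counter()  # read-optimized aggregate

    def record_view(self, page, user):
        # Write path: keep the raw event and update the aggregate together.
        self.raw_events.append((page, user))
        self.views_per_page[page] += 1

    def view_count(self, page):
        # Read path: no scan over raw_events is needed.
        return self.views_per_page[page]


store = PageViewStore()
store.record_view("/home", "asha")
store.record_view("/home", "ravi")
store.record_view("/cart", "asha")
print(store.view_count("/home"))  # 2
```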
18. Big Data project for small data customers
If you can do it in PostgreSQL, then do it in PostgreSQL
(the blue elephant rule)
20. The CAP theorem - Basics of NoSQL Databases
Read a lot about the design of a database before
using any non-traditional database. Or read
good negative posts to learn when NOT to use
it.
e.g. : http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
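To get a feel for the kind of trade-off that CAP-style reading prepares you for, here is a toy simulation of quorum reads and writes. It is a sketch under stated assumptions, not how any specific database works: the `Replica` class and the `write`, `read` and `demo` helpers are hypothetical, and the "unreachable" replicas are chosen adversarially (writes reach the first `w` replicas, reads ask the last `r`). With `n` replicas, a read is guaranteed to see the latest write only when `r + w > n`.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None


def write(replicas, w, value, version):
    """Update only the first w replicas, as if the rest were unreachable."""
    for replica in replicas[:w]:
        replica.version, replica.value = version, value


def read(replicas, r):
    """Ask the last r replicas and return the value with the newest version."""
    newest = max(replicas[-r:], key=lambda rep: rep.version)
    return newest.value


def demo(n, w, r):
    replicas = [Replica() for _ in range(n)]
    write(replicas, n, "old", 1)  # all replicas start in agreement
    write(replicas, w, "new", 2)  # a later write reaches only w replicas
    return read(replicas, r)


print(demo(n=3, w=2, r=2))  # new  (r + w > n: read set overlaps write set)
print(demo(n=3, w=1, r=1))  # old  (r + w <= n: a stale read is possible)
```

Smaller `w` and `r` keep the system responsive when replicas are down, at the cost of possibly stale reads; larger values do the opposite.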
21. Now... the good parts !
It's your time to speak now!
Workshop session:
About practical selection of technology and
design for real-world use cases.
22. All references used in the workshop
➔ Basic Hadoop introductory material : http://www.coreservlets.com/hadoop-tutorial/
➔ Evaluate Hadoop without installation : http://go.cloudera.com/cloudera-live.html
➔ PostgreSQL good parts : http://www.slideshare.net/Aveic/postgresql-34323147
➔ PostgreSQL as a NoSQL column store : http://postgresguide.com/sexy/hstore.html
➔ PostgreSQL for basic Elasticsearch functionality : http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
➔ Good big data compatible OSS software : http://netflix.github.io/
➔ Practical HBase usage : https://www.facebook.com/UsingHbase
➔ Using Cassandra for write-heavy applications : http://www.datastax.com/1-million-writes
➔ Online analytics in Storm : http://hortonworks.com/hadoop/storm/
➔ E-commerce domain-specific use case : http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
➔ Good use case of selecting a data store based on a proper understanding of the CAP theorem : http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/
➔ Recommendation engine in Big Data scenarios : http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791
➔ High-volume log processing : http://www.splunk.com/view/product-tour/SP-CAAAAGV
➔ Open source alternatives for log processing : http://logstash.net/ and http://graylog2.org/