Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming Stack
Why do we still need SQL for Big Data?
How can we make Big Data more responsive and faster?
1. Apache Phoenix with Actor Model (Akka.io) for Real-time Big Data Programming Stack
Why do we still need SQL for Big Data?
How can we make Big Data more responsive and faster?
By http://nguyentantrieu.info
Tech Lead at eClick team - FPT Online
2. Contents
1. What is Big Data, and why?
2. When a standard relational database (Oracle, MySQL, ...) is not good enough
3. Common problems in Big Data systems
4. Introducing open-source tools for Big Data systems
a. Apache Phoenix for ad-hoc query
b. Actor Model and Akka.io for reactive data processing
3. What Does Big Data Actually Mean?
“Big data means data that cannot fit easily into a standard relational database.”
Hal Varian, Chief Economist, Google
http://www.brookings.edu/blogs/techtank/posts/2014/09/11-big-data-definition
4. When a standard relational database (Oracle, MySQL, ...) is not good enough
Example: the “analytic system” MySQL database of a startup, tracking all actions in its mobile games (iOS, Android, ...)
6. Definition from the crowd
“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
Jonathan Stuart Ward and Adam Barker
Source:
http://arxiv.org/abs/1309.5821
http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
7. “Chaotic” fact and the demand
80% of that data is unstructured or “chaotic”
Photos, videos and social media posts: data that says so much about us, but that cannot be analyzed via traditional methods
Demand:
“Finding order among chaos”
8. Three common problems in Big Data systems
1. Size: the volume of the datasets is a critical factor.
2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor.
3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
9. Introducing open-source tools in Big Data System
Apache Phoenix as SQL ad-hoc query engine
Actor Model as nano-service for reactive data computation in the dawn of “Fast data”
16. Interesting features of Apache Phoenix
● Embedded JDBC driver implements the majority of java.sql interfaces,
including the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push down and optimal scan key
formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for
adding/removing columns.
● Versioned schema repository. Snapshot queries use the schema that was
in place when data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT
SELECT for mass data transfer between the same or different tables, and
DELETE for deleting rows.
● Limited transaction support through client-side batching.
● Single table only - no joins yet and secondary indexes are a work in
progress.
● Follows ANSI SQL standards whenever possible
● Requires HBase v 0.94.2 or above
● 100% Java
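A minimal sketch of how the DDL, UPSERT and query features listed above look through the JDBC driver. The table social_event, its columns, the sample values and the connection URL jdbc:phoenix:localhost (a ZooKeeper quorum on the local machine) are assumptions made for illustration, not part of the original deck. The run method needs a reachable HBase/Phoenix cluster and the Phoenix client jar on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PhoenixSketch {
    // Hypothetical table: the composite PRIMARY KEY is stored as a multi-part row key,
    // the remaining column (likes) as a key/value cell.
    public static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS social_event ("
        + " network VARCHAR NOT NULL,"
        + " user_id VARCHAR NOT NULL,"
        + " likes BIGINT"
        + " CONSTRAINT pk PRIMARY KEY (network, user_id))";

    // Builds a parameterized UPSERT statement for the given table and columns.
    public static String upsertSql(String table, String... columns) {
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            marks.append(i == 0 ? "?" : ", ?");
        }
        return "UPSERT INTO " + table + " (" + String.join(", ", columns)
            + ") VALUES (" + marks + ")";
    }

    // Runs DDL, one row-by-row UPSERT, and a query against an assumed cluster.
    public static void run(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            try (Statement ddl = conn.createStatement()) {
                ddl.execute(CREATE_TABLE);
            }
            String sql = upsertSql("social_event", "network", "user_id", "likes");
            try (PreparedStatement up = conn.prepareStatement(sql)) {
                up.setString(1, "twitter");
                up.setString(2, "u42");
                up.setLong(3, 7L);
                up.executeUpdate();
            }
            // Upserts are batched on the client until commit() is called.
            conn.commit();

            // The predicate is on the leading row-key column.
            String query = "SELECT user_id, likes FROM social_event WHERE network = ?";
            try (PreparedStatement q = conn.prepareStatement(query)) {
                q.setString(1, "twitter");
                try (ResultSet rs = q.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        run(args.length > 0 ? args[0] : "jdbc:phoenix:localhost");
    }
}
```

Filtering on the first column of the primary key is a deliberate choice in this sketch: it is the kind of predicate the "optimal scan key formation" bullet refers to, since it can be answered with a row-key range scan rather than a full table scan.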
25. What is the Actor Model?
● Carl Hewitt defined the Actor
Model in 1973 as a mathematical
theory that treats “Actors” as the
universal primitives of concurrent
digital computation.
● A fitting model for heavily-parallel
processing in a cloud environment
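To make the idea concrete, here is a rough sketch of a single actor written with only the Java standard library. It is not Akka's API: the class HashtagCounter and its messages Seen and GetCount are hypothetical names chosen for illustration. It shows the core of the model: private state, a mailbox, and one message processed at a time, so no locks are needed around the state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Stdlib-only actor sketch (not Akka): a mailbox drained by a single thread. */
public class HashtagCounter {
    /** Messages are plain immutable objects. */
    public interface Message {}

    public static final class Seen implements Message {
        final String tag;
        public Seen(String tag) { this.tag = tag; }
    }

    public static final class GetCount implements Message {
        final String tag;
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
        public GetCount(String tag) { this.tag = tag; }
    }

    private static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    // Private state: read and written only by the actor's own thread.
    private final Map<String, Integer> counts = new HashMap<>();
    private final Thread worker = new Thread(this::loop, "hashtag-counter");

    public HashtagCounter() {
        worker.setDaemon(true);
        worker.start();
    }

    /** Fire-and-forget send: callers never touch the actor's state directly. */
    public void tell(Message m) {
        mailbox.add(m);
    }

    /** Request/reply: the answer arrives asynchronously through a future. */
    public CompletableFuture<Integer> ask(String tag) {
        GetCount q = new GetCount(tag);
        mailbox.add(q);
        return q.reply;
    }

    /** Asks the actor to finish after the messages already queued, then waits. */
    public void stop() {
        mailbox.add(new Stop());
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Messages are handled strictly one at a time, in arrival order.
    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take();
                if (m instanceof Seen) {
                    counts.merge(((Seen) m).tag, 1, Integer::sum);
                } else if (m instanceof GetCount) {
                    GetCount q = (GetCount) m;
                    q.reply.complete(counts.getOrDefault(q.tag, 0));
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller sends events with actor.tell(new HashtagCounter.Seen("#bigdata")) and reads a count with actor.ask("#bigdata").join(). A toolkit such as Akka adds what this sketch leaves out, for example supervision, scheduling many actors over a thread pool, and remote messaging.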
31. Quick demo
Using Akka (Rfx) and Apache Phoenix
for Social Media Real-time Analytics
32. Links for self-study and research
Actor Model and Programming:
● http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model
● http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
● http://www.infoq.com/articles/reactive-cloud-actors
● http://www.mc2ads.com/p/rfx-for-big-data-developer.html
Apache Phoenix
● http://java.dzone.com/articles/apache-phoenix-sql-driver
● http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
Big Data and Data Science
● http://www.mc2ads.com and http://www.mc2ads.org
● http://datascience101.wordpress.com
● http://lambda-architecture.net
● http://www.bigdata-startups.com
● https://www.coursera.org/course/datasci