4. Approach
1. Created a table in HBase
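A minimal sketch of the table-creation step, using the HBase 1.x client Admin API from Scala (the table name `dummy_table` and column family `cf` are placeholder names, not from the original; assumes `hbase-site.xml` is on the classpath):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object CreateDummyTable {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin
    // Placeholder table name and column family
    val descriptor = new HTableDescriptor(TableName.valueOf("dummy_table"))
    descriptor.addFamily(new HColumnDescriptor("cf"))
    if (!admin.tableExists(descriptor.getTableName)) admin.createTable(descriptor)
    connection.close()
  }
}
```

The same table could equally be created interactively with `create 'dummy_table', 'cf'` in the HBase shell.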
2. Generated dummy data in Spark as an RDD and saved it to HBase
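The generate-and-save step can be sketched as follows, assuming Spark 1.x and the HBase `TableOutputFormat` (row keys `row-<n>` and the `cf:value` column are illustrative placeholders):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object GenerateDummyData {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-dummy-writer"))

    // Point the Hadoop output format at the target HBase table
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "dummy_table")
    val job = Job.getInstance(hbaseConf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // 1,000 dummy rows: row key "row-<n>", single column cf:value
    val dummy = sc.parallelize(1 to 1000).map { n =>
      val put = new Put(Bytes.toBytes(s"row-$n"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(n.toString))
      (new ImmutableBytesWritable, put)
    }
    dummy.saveAsNewAPIHadoopDataset(job.getConfiguration)
    sc.stop()
  }
}
```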
3. Read the HBase table as a HadoopRDD in Spark
4. Traversed the HadoopRDD elements and printed them out
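Steps 3 and 4 together can be sketched as a `newAPIHadoopRDD` read over `TableInputFormat` (table and column names are the same placeholders as above):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object ReadHBaseTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-reader"))

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "dummy_table")

    // Each element is a (rowKey, Result) pair
    val hbaseRDD = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Traverse the elements and print one column value per row
    hbaseRDD.foreach { case (key, result) =>
      val rowKey = Bytes.toString(key.get())
      val value  = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("value")))
      println(s"$rowKey -> $value")
    }
    sc.stop()
  }
}
```

Note that `foreach` prints on the executors; for a small table, `hbaseRDD.take(n)` on the driver makes the output visible in the submitting terminal.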
5. Coded the entire Spark transformation in Scala
6. Generated an executable uber ("fat") JAR with SBT
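A typical `build.sbt` for this setup, assuming the sbt-assembly plugin (version numbers and artifact names here are illustrative, not from the original):

```scala
// build.sbt — assumes project/plugins.sbt contains:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
name := "spark-hbase-demo"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // "provided" keeps Spark itself out of the uber JAR, since the cluster supplies it
  "org.apache.spark" %% "spark-core"   % "1.6.3" % "provided",
  "org.apache.hbase" %  "hbase-client" % "1.2.6",
  "org.apache.hbase" %  "hbase-server" % "1.2.6"
)
```

Running `sbt assembly` then produces the uber JAR under `target/scala-2.11/`.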
7. Submitted the Spark job with spark-submit
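A representative submit command for the assembled JAR (class name, master, and JAR path are placeholders):

```shell
# Submit the uber JAR built by sbt assembly; names/paths are illustrative
spark-submit \
  --class com.example.ReadHBaseTable \
  --master yarn \
  target/scala-2.11/spark-hbase-demo-assembly-0.1.jar
```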
8. TODO
1. Read the HBase table and store the HBaseRDD in Apache Hive with Spark.
a. Incremental load from HBase to Hive
b. SparkSQL / Spark RDD and Apache Drill performance benchmark for reading HBase.
2. Save summarized Spark aggregations to PostgreSQL
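The planned HBase-to-Hive step (TODO 1) could be sketched with a Spark 1.x `HiveContext`; the sample rows, column names, and Hive table name below are all hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object HBaseToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-hive"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Placeholder: in practice these (rowKey, value) pairs come from the HBase read above
    val rows = sc.parallelize(Seq(("row-1", "1"), ("row-2", "2")))
    val df = rows.toDF("row_key", "value")

    // Append mode lets repeated runs serve as a crude incremental load
    df.write.mode(SaveMode.Append).saveAsTable("hbase_snapshot")
    sc.stop()
  }
}
```

A true incremental load would additionally need a watermark (e.g. an HBase timestamp range scan) to avoid re-reading old rows.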
9. Assumed: High-Level Architecture
The architecture diagram (figure omitted) comprises the following components:
a. Event queuing: Kafka
b. Data ingestion pipeline: batch incremental load / transformations
c. Processed DW: ad-hoc analysis / 1st-level aggregation
d. Summarized DB: KPI reporting
e. RESTful APIs
f. Reporting tool (dashboard)