Energy Usage Insights
with Hadoop & HBase
July 25, 2013
Scott Kuehn, Data Architect
Oren Benjamin, Senior Software Engineer
Our Utility Partners
Australia, New Zealand, France, Nova Scotia, UK
Energy Usage Insights
26 July 2013
Home Energy Report
Energy Savings
[Chart: energy saved (0% to 4.5%) by months since program start (1 to 38)]
Average steady-state savings: ~1.5% to 3.5%
Impact
$300,000,000
2,500,000,000 kWh
4,000,000,000 lbs CO2
Web Portal
Data Overview: Energy Usage Streams
meter  usage   cost   start                end
0001   719.23  57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61  59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050         2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250         2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350         2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050         2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250         2013-01-01T01:00:00  2013-01-01T01:15:00
0001 – Meter (Bills)
0002 – Smart Meter (Quarter-hourly reads)
Data Overview: Smart Meter
Data Overview: Entities
[Diagram: relationships among the Customer, Account, Site, and Meter entities]
Data Overview: Size
» Billing data: 60M households
» Smart meter data: 15M households
» On disk: 5TB (raw)
» More smart meter data than all other data combined
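A back-of-the-envelope count shows why smart meter data dominates. It rests on assumptions that are not stated in the slides: one meter per household, 96 quarter-hourly reads per day per smart meter, and 12 bills per year per billing household.

```latex
\begin{aligned}
\text{smart meter reads/year} &\approx 15\times10^{6} \times 96 \times 365 \approx 5.26\times10^{11} \\
\text{billing reads/year} &\approx 60\times10^{6} \times 12 = 7.2\times10^{8} \\
\text{ratio} &\approx \frac{5.26\times10^{11}}{7.2\times10^{8}} \approx 730
\end{aligned}
```

Under these assumptions smart meters produce roughly 730 times as many reads per year as billing data, even though the smart meter population is a quarter of the billing population.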
Architecture: Usage Data Store
[Diagram: relationships among the Customer, Account, Site, and Meter entities]
HBase + Hadoop Architecture v1.0
[Diagram components: MySQL report/AMI DBs (meter metadata, usage data), Sqoop, HDFS, M/R, HBase, batch workers, web servers]
HBase + Hadoop Architecture v2.0
[Diagram components: MySQL report/AMI DBs, meter metadata, metadata requests, HDFS file upload, usage data, HDFS, M/R, HBase, batch workers, web servers]
Data Schema: Kiji
Kiji Schema
»  Table layout definition
»  Schema management
»  Object serialization
»  Entity-centric data model
Supporting Projects
»  Kiji MR
»  Kiji Hive Adapter
»  Kiji REST
»  ...
Entity-centric Table: Row Key
Hash prefix  Utility company  Site ID
1 byte       4 bytes          8 bytes
"keys_format": {
  "encoding": "FORMATTED",
  "salt": { "hash_type": "MD5", "hash_size": 1 },
  "components": [
    { "name": "utility_company", "type": "INTEGER" },
    { "name": "site_id", "type": "LONG" }
  ]
}
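The byte layout above can be sketched in plain Java. This is an illustration only: it assumes the 1-byte hash prefix is the first byte of an MD5 digest over the big-endian encoded `utility_company` and `site_id` components. Kiji's actual component encoding, and which components it hashes, may differ. The class name `SiteRowKey` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of the 13-byte Site row key: [1-byte hash][4-byte utility][8-byte site].
final class SiteRowKey {
    static final int KEY_LENGTH = 1 + Integer.BYTES + Long.BYTES;

    static byte[] encode(int utilityCompany, long siteId) {
        // Assumption: components are written big-endian; Kiji's real encoding may differ.
        byte[] components = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(utilityCompany)
                .putLong(siteId)
                .array();
        // Assumption: the salt is the first byte of an MD5 digest of the encoded components.
        byte salt = md5(components)[0];
        return ByteBuffer.allocate(KEY_LENGTH).put(salt).put(components).array();
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = encode(42, 123456789L);
        System.out.println("key length: " + key.length); // key length: 13
    }
}
```

Because the prefix is derived from the key itself, a reader can recompute it from (utility company, site ID) for a point lookup, while consecutive site IDs get different prefixes and so spread writes across regions instead of hot-spotting one.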
Entity-centric Table: Site
[Diagram: a single Site row]
Usage Data column family: stream:0, stream:1, stream:2, stream:3 (example values: 0.12 kWh, 1.3 Therm, 24 Therm, 356 kWh)
Insights column family: uua:0, bill_forecast:0 (example values: UUA; June 18 - July 17; $25)
Insight Example: Rate Calculation
Insights: Jobs & Services
»  M/R jobs to compute insights in batch
»  Services to read pre-computed insights or compute insights on demand
»  An insight for a Site is calculated from the data in that Site’s row
»  The calculated insight is saved back to the Site row
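The row-local pattern in these bullets can be sketched without Kiji or HBase. In this hypothetical in-memory model, `SiteRow` stands in for one Site row, with a `usage_data` family of timestamped reads per stream and an `insights` family; `computeTotals` reads only that row and writes its result back into the same row. That independence is what lets a batch job process rows in parallel and a service compute the same insight on demand for one row. All names are illustrative, not the Kiji API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InsightJob {
    // Computes one insight per usage stream (streams may use different units, e.g. kWh vs therms)
    // from this row only, and saves the result back into the same row.
    static void computeTotals(SiteRow row) {
        for (Map.Entry<String, TreeMap<Long, Double>> stream : row.usageData.entrySet()) {
            double total = 0.0;
            for (double usage : stream.getValue().values()) {
                total += usage;
            }
            row.insights.put("total:" + stream.getKey(), total);
        }
    }

    // Batch mode: rows are independent, so a MapReduce job could give each map task its own slice.
    static void runBatch(List<SiteRow> rows) {
        for (SiteRow row : rows) {
            computeTotals(row);
        }
    }

    public static void main(String[] args) {
        SiteRow row = new SiteRow();
        row.putRead("stream:0", 0L, 0.5);
        row.putRead("stream:0", 900_000L, 0.25);
        row.putRead("stream:1", 0L, 1.0);
        List<SiteRow> rows = new ArrayList<>();
        rows.add(row);
        runBatch(rows);
        System.out.println(row.insights.get("total:stream:0")); // 0.75
    }
}

// Hypothetical in-memory stand-in for one entity-centric Site row (not the Kiji API).
final class SiteRow {
    // "usage_data" family: stream qualifier -> (timestamp -> usage value)
    final Map<String, TreeMap<Long, Double>> usageData = new HashMap<>();
    // "insights" family: qualifier -> computed value
    final Map<String, Double> insights = new HashMap<>();

    void putRead(String stream, long timestamp, double usage) {
        usageData.computeIfAbsent(stream, s -> new TreeMap<>()).put(timestamp, usage);
    }
}
```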
Insight Example: Rate Calculation
[Diagram: a Rate Calculation MapReduce job reads a site's usage data column family (stream:0 … stream:n) and writes a rate calculation into its insights column family, alongside other insights such as the bill forecast]
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData, ProducerContext context) {
    // Compute the insight from this Site's row and write it back to the row
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }
}
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData, ProducerContext context) {
    // Compute the insight from this Site's row and write it back to the row
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }

  // The column that context.put(...) writes to
  @Override
  public String getOutputColumn() {
    return "rate_calculation";
  }
}
public class RateCalculationProducer extends KijiProducer {
  @Override
  public KijiDataRequest getDataRequest() {
    Configuration conf = getConf();
    long startTime = parseLong(conf.get(START_PARAM));

    // Request every version of the usage_data family written from startTime onward
    return KijiDataRequest.builder()
        .withTimeRange(startTime, END_OF_TIME)
        .addColumns(ColumnsDef.create()
            .withMaxVersions(ALL_VERSIONS)
            .addFamily("usage_data"))
        .build();
  }

  @Override
  public void produce(KijiRowData siteRowData, ...
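The data request above asks for every version of every `usage_data` column from `startTime` onward. A framework-free sketch of that selection follows. It assumes, as with HBase time ranges, that the lower bound is inclusive and the upper bound exclusive, and it uses `Long.MAX_VALUE` as a stand-in for `END_OF_TIME`; `TimeRangeRequest` and `select` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TimeRangeRequest {
    // Stand-in for END_OF_TIME: assumed here to be the largest possible timestamp.
    static final long END_OF_TIME = Long.MAX_VALUE;

    // Returns every version of every column in `family` whose timestamp t satisfies
    // minTs <= t < maxTs (an HBase-style half-open range, assumed for this sketch).
    static Map<String, NavigableMap<Long, Double>> select(
            Map<String, NavigableMap<Long, Double>> family, long minTs, long maxTs) {
        Map<String, NavigableMap<Long, Double>> result = new HashMap<>();
        for (Map.Entry<String, NavigableMap<Long, Double>> column : family.entrySet()) {
            NavigableMap<Long, Double> versions =
                    new TreeMap<>(column.getValue().subMap(minTs, true, maxTs, false));
            if (!versions.isEmpty()) {
                result.put(column.getKey(), versions);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Double> stream0 = new TreeMap<>();
        stream0.put(100L, 0.20);
        stream0.put(200L, 0.25);
        stream0.put(300L, 0.30);
        Map<String, NavigableMap<Long, Double>> usageData = new HashMap<>();
        usageData.put("stream:0", stream0);

        long startTime = 200L; // in the producer this comes from the job configuration
        System.out.println(select(usageData, startTime, END_OF_TIME)); // {stream:0={200=0.25, 300=0.3}}
    }
}
```

Restricting the request to recent timestamps means an incremental job only reads the new reads in each row rather than the full history.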
  	
  
In practice
»  ETL to an entity-centric schema
»  Bulk loading
»  Mixed workloads
Design decisions and challenges
In practice: ETL to an entity-centric schema
meter  usage   cost   start                end
0001   719.23  57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61  59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050         2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250         2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350         2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050         2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250         2013-01-01T01:00:00  2013-01-01T01:15:00
0001 – Meter (Bills)
0002 – Smart Meter (Quarter-hourly reads)
In practice: ETL to an entity-centric schema
»  Use bulk loading for performance
»  Make the ingest process idempotent
»  Introduce a read-log to handle billing corrections from utility companies
»  ETL steps:
1. Ingest all reads into a read-log table
2. Load the reads into the corresponding Site row
[Diagram: (1) billing files are loaded into the read-log table with an M/R bulkload; (2) the reads are pivoted and loaded into the Site table with an M/R bulkload]
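The idempotent two-step ETL can be sketched in memory. Every detail here is an assumption for illustration: the read-log is keyed by (meter ID, read start), so replaying a file overwrites rather than duplicates reads, and a later billing correction for the same key replaces the earlier value; the pivot step uses a meter-to-site mapping to gather each site's reads into one row. In the pipeline described above, both steps run as M/R bulkloads into HBase tables; `ReadLogEtl` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class ReadLogEtl {
    // Read-log table stand-in, keyed by "meterId|start"; sorted like HBase row keys.
    final TreeMap<String, Double> readLog = new TreeMap<>();

    private static String key(String meterId, String start) {
        return meterId + "|" + start;
    }

    // Step 1: ingest. The same (meter, start) always maps to the same key, so replaying
    // a file is idempotent and a later correction simply overwrites the earlier value.
    void ingest(String meterId, String start, double usage) {
        readLog.put(key(meterId, start), usage);
    }

    // Step 2: pivot the read-log into one row per site: siteId -> (meterId|start -> usage).
    Map<String, TreeMap<String, Double>> pivot(Map<String, String> meterToSite) {
        Map<String, TreeMap<String, Double>> siteRows = new TreeMap<>();
        for (Map.Entry<String, Double> read : readLog.entrySet()) {
            String meterId = read.getKey().substring(0, read.getKey().indexOf('|'));
            String siteId = meterToSite.get(meterId);
            if (siteId == null) {
                continue; // unmapped meter: skipped in this sketch
            }
            siteRows.computeIfAbsent(siteId, s -> new TreeMap<>())
                    .put(read.getKey(), read.getValue());
        }
        return siteRows;
    }

    public static void main(String[] args) {
        ReadLogEtl etl = new ReadLogEtl();
        for (int run = 0; run < 2; run++) { // the same file ingested twice
            etl.ingest("0002", "2013-01-01T00:00:00", 0.2050);
            etl.ingest("0002", "2013-01-01T00:15:00", 0.2250);
        }
        etl.ingest("0002", "2013-01-01T00:15:00", 0.2150); // made-up correction value
        System.out.println(etl.readLog.size()); // 2
        System.out.println(etl.pivot(Map.of("0002", "site-1")));
    }
}
```

Keeping corrections in the read-log rather than patching Site rows directly means the pivot can always be rerun from one authoritative source.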
In practice: bulk loading
»  Bulk-loaded files are not assigned sequence numbers
»  As a result, all compactions become major compactions
»  Solution: find a temporary fix and monitor the HBase JIRA
In practice: Mixed workloads
[Diagram: the Site table serves three workloads at once: web servers (ad-hoc reads and forecasts), M/R (batch insight calculations), and reporting apps (bulk scans)]
In practice: Mixed workloads
»  Supporting mixed workloads requires adapting jobs and configurations
»  I/O: switch to bulk loading; enable direct (short-circuit) HDFS reads
»  Major compactions: disabled
»  Memory: increase heap and region sizes; use MSLAB (MemStore-Local Allocation Buffers)
»  Verify performance by simulating nominal and high-load scenarios
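Verifying performance under simulated nominal and high load can start from a small harness like the one below. It is a generic sketch, not the authors' test setup: the hypothetical `LoadSimulation.run` issues a fixed number of calls to a supplied operation from a configurable number of threads and reports mean latency and throughput. In practice the operation would be an HBase read or a forecast request, and the two scenarios would differ in thread count and request mix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class LoadSimulation {
    static final class Result {
        final int operations;
        final double meanLatencyMs;
        final double opsPerSecond;

        Result(int operations, double meanLatencyMs, double opsPerSecond) {
            this.operations = operations;
            this.meanLatencyMs = meanLatencyMs;
            this.opsPerSecond = opsPerSecond;
        }
    }

    // Runs `operations` calls of `op` on `threads` worker threads; reports mean latency and throughput.
    static Result run(Runnable op, int threads, int operations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong totalLatencyNanos = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        long wallStart = System.nanoTime();
        try {
            for (int i = 0; i < operations; i++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    op.run();
                    totalLatencyNanos.addAndGet(System.nanoTime() - start);
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from the operation
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for operations", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("operation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        double wallSeconds = Math.max(1L, System.nanoTime() - wallStart) / 1e9;
        double meanLatencyMs = totalLatencyNanos.get() / 1e6 / operations;
        return new Result(operations, meanLatencyMs, operations / wallSeconds);
    }

    public static void main(String[] args) {
        // Stand-in for a read or forecast request that takes about 2 ms.
        Runnable simulatedRead = () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Result nominal = run(simulatedRead, 4, 200);
        Result high = run(simulatedRead, 32, 2000);
        System.out.printf("nominal: %.2f ms mean, %.0f ops/s%n", nominal.meanLatencyMs, nominal.opsPerSecond);
        System.out.printf("high:    %.2f ms mean, %.0f ops/s%n", high.meanLatencyMs, high.opsPerSecond);
    }
}
```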
In practice: Mixed workloads
Results Visualized
Animation of jobs in progress
Mixed Workload Success
[Chart: read times; labeled values 9 ms and 2 ms]
»  Mean read time: ~2 ms
»  Nearly 200 forecasts/sec on the performance-testing cluster
Recap
Opower
»  Save energy
»  Make money
»  Big (enough) data
Oren Benjamin
oren.benjamin@opower.com
We’re hiring.
http://opower.com/careers
Scott Kuehn
scott.kuehn@opower.com
Rate Calculation: Rate Engine
public interface RateEngine {
  /**
   * Compute the cost per usage read for the given Site
   * over the requested time interval.
   * @return a RateCalculation containing the result
   */
  RateCalculation calculate(Site site, List<UsageRead> usageReads);
}
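To make the interface concrete, here is a toy engine under loudly stated assumptions: `Site`, `UsageRead`, and `RateCalculation` are simplified hypothetical stand-ins (the real types are not shown in the talk), reads are in kWh and in chronological order, and the tariff is a made-up two-tier rate where usage above a cumulative threshold is billed at a higher price. Real utility tariffs are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-tier engine: usage beyond a cumulative threshold is billed at a higher price.
final class TieredRateEngine implements RateEngine {
    private final double tierLimitKwh;
    private final double tier1Rate;
    private final double tier2Rate;

    TieredRateEngine(double tierLimitKwh, double tier1Rate, double tier2Rate) {
        this.tierLimitKwh = tierLimitKwh;
        this.tier1Rate = tier1Rate;
        this.tier2Rate = tier2Rate;
    }

    @Override
    public RateCalculation calculate(Site site, List<UsageRead> usageReads) {
        // A real engine would look up the site's tariff; this toy applies one tariff to every site.
        List<Double> costs = new ArrayList<>();
        double cumulativeKwh = 0.0;
        for (UsageRead read : usageReads) { // reads assumed to be in chronological order
            double tier1Kwh = Math.max(0.0, Math.min(read.kwh, tierLimitKwh - cumulativeKwh));
            double tier2Kwh = read.kwh - tier1Kwh;
            costs.add(tier1Kwh * tier1Rate + tier2Kwh * tier2Rate);
            cumulativeKwh += read.kwh;
        }
        return new RateCalculation(costs);
    }

    public static void main(String[] args) {
        RateEngine engine = new TieredRateEngine(1.0, 0.10, 0.20); // made-up tariff
        List<UsageRead> reads = List.of(new UsageRead(0L, 0.6), new UsageRead(900_000L, 0.6));
        RateCalculation result = engine.calculate(new Site(1L), reads);
        System.out.println(result.costPerRead + " total=" + result.total());
    }
}

interface RateEngine {
    RateCalculation calculate(Site site, List<UsageRead> usageReads);
}

// Simplified hypothetical stand-ins for the talk's domain types.
final class Site {
    final long id;
    Site(long id) { this.id = id; }
}

final class UsageRead {
    final long startMillis;
    final double kwh;
    UsageRead(long startMillis, double kwh) {
        this.startMillis = startMillis;
        this.kwh = kwh;
    }
}

final class RateCalculation {
    final List<Double> costPerRead;
    RateCalculation(List<Double> costPerRead) { this.costPerRead = costPerRead; }

    double total() {
        double sum = 0.0;
        for (double cost : costPerRead) {
            sum += cost;
        }
        return sum;
    }
}
```

Tiered pricing is one reason the cost is computed per read rather than as usage times a single price: the same 0.6 kWh costs more once cumulative usage has crossed the threshold.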
Rate Calculation: Application Context
public class RateCalculationProducer extends KijiProducer {
  private ConfigurableApplicationContext appContext;
  private RateEngine rateEngine;

  @Override
  public void setup(KijiContext context) {
    // Load the Spring context once per task and look up the RateEngine bean
    String contextPath = getConf().get(CONTEXT_PATH_KEY);
    appContext = new XmlAppContext(contextPath);
    rateEngine = appContext.getBean(RateEngine.class);
  }

  @Override
  public void produce(KijiRowData siteRowData, …
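The point of `setup` here is to build the rate engine once per task rather than once per row. The sketch below shows that lifecycle without Spring or Kiji, swapping in plain Java reflection to pick the engine implementation from a configuration entry. `EngineHostProducer`, `Engine`, `FlatRateEngine`, and the `rate.engine.class` key are hypothetical names, not the authors' code.

```java
import java.util.Map;

final class EngineHostProducer {
    static final String ENGINE_CLASS_KEY = "rate.engine.class"; // hypothetical configuration key

    private Engine engine;

    // Once per task: resolve and construct the engine named in the configuration.
    void setup(Map<String, String> conf) {
        String className = conf.get(ENGINE_CLASS_KEY);
        if (className == null) {
            throw new IllegalStateException("missing " + ENGINE_CLASS_KEY);
        }
        try {
            engine = Class.forName(className)
                    .asSubclass(Engine.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create engine " + className, e);
        }
    }

    // Once per row: reuse the engine built in setup() instead of rebuilding it.
    double produce(double rowKwh) {
        if (engine == null) {
            throw new IllegalStateException("setup() has not been called");
        }
        return engine.cost(rowKwh);
    }

    public static void main(String[] args) {
        EngineHostProducer producer = new EngineHostProducer();
        producer.setup(Map.of(ENGINE_CLASS_KEY, FlatRateEngine.class.getName()));
        for (double kwh : new double[] {0.2, 0.3, 0.5}) {
            System.out.println(producer.produce(kwh));
        }
    }
}

interface Engine {
    double cost(double kwh);
}

final class FlatRateEngine implements Engine {
    @Override
    public double cost(double kwh) {
        return kwh * 0.10; // made-up flat rate per kWh
    }
}
```

Because the slide keeps a `ConfigurableApplicationContext`, which has a `close()` method, releasing that context at the end of the task would be the natural counterpart to this `setup` step.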

