Olist
A Brazilian E-commerce Company
APAN 5310 Project Team 1
Juno Zhu | Manasa Damera | Sarah Faye Wu | Yuhuan Su
Agenda
Background - Client Scenario & Data Overview
Database Normalization
ETL Process Optimization
Analytics Insights Automation - Benefits & Procedure
Dashboard Demo
Client Scenario: Powering Business Intelligence at Olist
● Database Normalization: Create a normalized relational database as a central data repository.
● ETL Process Optimization: Clean and manipulate the data in Python, then upload it to a PostgreSQL database.
● Analytics Insights Automation: Generate analytical insights through an interactive Metabase dashboard.
Current Pain Points
● Scattered data storage across multiple flat files
● Inefficient information query process
● Lack of analytics insights to support business decisions
Future Impact
● Reduce data storage redundancy
● Create an efficient data query and analytics procedure
● Empower data-driven decision-making
Original Data Sample
Geolocation File and Customer File
● Repetitive columns that should be combined
● File size too large to be uploaded into Codio
Data Overview
● Data consists of 100,000 orders from 2016 through 2018 placed by customers on Olist from several sellers located across Brazil
● 9 Flat CSV Files: Customers, Geolocation, Order Items, Order Payments, Order Reviews, Orders, Products, Sellers and Category File
● Total size 123.4 MB
● Merging the geolocation dataset (61.3 MB) with the customers dataset (9 MB) to link them would push the customers dataset over 150 MB.
● We therefore sample these two datasets for further use.
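The sampling step could look roughly like the sketch below. It is not the team's code (the deck only links to it). The column names `customer_zip_code_prefix` and `geolocation_zip_code_prefix`, the sample size, and the seed are assumptions for illustration, and small in-memory lists stand in for the two CSV files.

```python
import random


def sample_linked(customers, geolocation, n_customers, seed=42):
    """Sample customers, then keep only the geolocation rows they can link to."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(customers, min(n_customers, len(customers)))
    wanted = {c["customer_zip_code_prefix"] for c in sampled}
    geo = [g for g in geolocation if g["geolocation_zip_code_prefix"] in wanted]
    return sampled, geo


# Tiny stand-ins for the customers and geolocation files (hypothetical values).
customers = [
    {"customer_id": f"c{i}", "customer_zip_code_prefix": str(1000 + i % 5)}
    for i in range(20)
]
geolocation = [
    {"geolocation_zip_code_prefix": str(1000 + i), "geolocation_city": f"city{i}"}
    for i in range(10)
]

sampled_customers, sampled_geo = sample_linked(customers, geolocation, n_customers=8)
```

Sampling the customers first and filtering the geolocation rows afterwards keeps the two samples linkable, which sampling each file independently would not guarantee.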
Underlying duplicates that are difficult to detect
● The geolocation dataset contains underlying “duplicates” that the pandas “drop_duplicates()” function cannot detect, because the same value may be written in a different language.
● In the reviews dataset, one review_id can link to different order_id values, with different information in the other columns, so the primary key is composite: (review_id, order_id).
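One way to surface such hidden duplicates is to normalize the text before comparing rows. The sketch below assumes the variants differ only in accents, capitalization, or surrounding whitespace. The deck does not say how the team resolved them, so this is an illustration, and the row values are made up.

```python
import unicodedata


def normalize_text(value):
    """Trim, case-fold, and strip accents so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", value.strip().casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def drop_hidden_duplicates(rows, key_columns):
    """Keep the first row for each normalized key; later variants are dropped."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(normalize_text(str(row[col])) for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical geolocation rows: the first two differ only in accent and case.
rows = [
    {"zip": "01037", "city": "São Paulo", "state": "SP"},
    {"zip": "01037", "city": "sao paulo", "state": "SP"},
    {"zip": "20010", "city": "Rio de Janeiro", "state": "RJ"},
]
deduped = drop_hidden_duplicates(rows, ["zip", "city", "state"])
```

An exact-match comparison treats the first two rows as distinct, which is why a plain duplicate drop misses them.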
Products File, Orders File, Order Items File
● Scattered data storage across multiple files
● Different languages across different files
● Information about orders, delivery, and products ordered is stored in separate files.
Normalization Plan: Creating an Optimized Data Schema
1st Normal Form
● Added primary keys such as geolocation_id to Address table
● Added foreign keys such as product_category_id to Product Category table
● Dropped duplicated data such as address data from Customers/Sellers tables
2nd Normal Form
● No changes to the tables, as all non-key attributes were already fully dependent on the primary key
3rd Normal Form
● The Orders table was split into Orders and Delivery tables
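The normalization plan can be sketched as a small schema. This is an illustration, not the team's DDL: apart from geolocation_id, review_id, and order_id, every column name is hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the sketch runs without a database server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE address (
    geolocation_id INTEGER PRIMARY KEY,      -- surrogate key added in 1NF
    zip_code_prefix TEXT, city TEXT, state TEXT
);
CREATE TABLE customer (
    customer_id TEXT PRIMARY KEY,
    geolocation_id INTEGER NOT NULL REFERENCES address (geolocation_id)
);
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customer (customer_id),
    order_status TEXT,
    purchase_timestamp TEXT
);
CREATE TABLE delivery (                      -- split out of orders for 3NF
    order_id TEXT PRIMARY KEY REFERENCES orders (order_id),
    estimated_delivery_date TEXT,
    delivered_customer_date TEXT
);
CREATE TABLE review (
    review_id TEXT NOT NULL,
    order_id TEXT NOT NULL REFERENCES orders (order_id),
    review_score INTEGER,
    PRIMARY KEY (review_id, order_id)        -- composite primary key
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)
conn.execute("INSERT INTO address VALUES (1, '01037', 'sao paulo', 'SP')")
conn.execute("INSERT INTO customer VALUES ('c1', 1)")
conn.execute("INSERT INTO orders VALUES ('o1', 'c1', 'delivered', '2018-01-02')")
conn.execute("INSERT INTO orders VALUES ('o2', 'c1', 'delivered', '2018-01-05')")
# The same review_id may appear with two different orders.
conn.execute("INSERT INTO review VALUES ('r1', 'o1', 5)")
conn.execute("INSERT INTO review VALUES ('r1', 'o2', 4)")
conn.commit()
```

Address data lives only in the address table, so customers (and, in the same way, sellers) carry just a geolocation_id reference.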
ETL Process:
Extract: Extracting e-commerce data from multiple CSV flat files
Transform: Performing data cleaning and manipulation on the extracted data via Python
Load: Uploading the transformed data to a centralized repository in a PostgreSQL database
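The three stages can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the CSV content and column names are made up, and sqlite3 stands in for PostgreSQL so the example is self-contained. A PostgreSQL driver such as psycopg2 would use `%s` placeholders instead of `?`.

```python
import csv
import io
import sqlite3

# Hypothetical raw file content with stray whitespace and a repeated row.
RAW_CSV = """customer_id,customer_city,customer_state
c1, Sao Paulo ,SP
c2,Rio de Janeiro,RJ
c1, Sao Paulo ,SP
"""


def extract(csv_text):
    """Extract: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: trim whitespace and drop exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        record = tuple(value.strip() for value in row.values())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def load(conn, records):
    """Load: insert the cleaned records with a parameterized statement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(customer_id TEXT PRIMARY KEY, city TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
```

Keeping each stage in its own function lets the transform logic be tested on small samples before anything is written to the database.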
ETL - Transform Process Debrief
Step 1 Extract, rename, and reorder the columns
Step 2 Get the relevant information for each table by merging datasets
Step 3 Drop duplicated entries
Step 4 Ensure that the primary key column contains only unique values and uniquely identifies each record in the table
Step 5 Construct the "id" variable if necessary
Step 6 If the table has foreign keys, merge the current dataset with the dataset each key refers to and keep the intersection. Drop unnecessary columns and rename columns after merging.
Step 7 Change the data types of the raw dataset's variables to match the column data types we designed.
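The steps above can be sketched for the Customer and Address tables as follows. This is a simplified illustration using plain Python lists of dicts rather than the team's code. The raw column names are assumptions, and the merge in Step 2 is folded into the foreign-key merge of Step 6.

```python
def build_address_table(raw_geo):
    """Steps 3 and 5: drop repeated zip prefixes, then construct the id column."""
    unique_zips = dict.fromkeys(r["geolocation_zip_code_prefix"] for r in raw_geo)
    return [
        {"geolocation_id": i, "zip_code_prefix": z}
        for i, z in enumerate(unique_zips, start=1)
    ]


def build_customer_table(raw_customers, address_rows):
    # Step 1: keep, rename, and reorder the columns we need.
    rows = [
        {"customer_id": r["customer_id"], "zip_code_prefix": r["customer_zip_code_prefix"]}
        for r in raw_customers
    ]
    # Step 3: drop duplicated entries while preserving order.
    rows = [dict(items) for items in dict.fromkeys(tuple(r.items()) for r in rows)]
    # Step 4: the primary key must uniquely identify each record.
    ids = [r["customer_id"] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("customer_id does not uniquely identify each record")
    # Step 6: merge with the referenced table and keep only the intersection.
    geo_by_zip = {a["zip_code_prefix"]: a["geolocation_id"] for a in address_rows}
    joined = [
        {"customer_id": r["customer_id"], "geolocation_id": geo_by_zip[r["zip_code_prefix"]]}
        for r in rows
        if r["zip_code_prefix"] in geo_by_zip
    ]
    # Step 7: cast values to the designed column types.
    return [
        {"customer_id": str(r["customer_id"]), "geolocation_id": int(r["geolocation_id"])}
        for r in joined
    ]


# Hypothetical raw rows: one exact duplicate and one customer with no matching address.
raw_geo = [
    {"geolocation_zip_code_prefix": "01037"},
    {"geolocation_zip_code_prefix": "20010"},
    {"geolocation_zip_code_prefix": "01037"},
]
raw_customers = [
    {"customer_id": "c1", "customer_zip_code_prefix": "01037", "customer_city": "sao paulo"},
    {"customer_id": "c1", "customer_zip_code_prefix": "01037", "customer_city": "sao paulo"},
    {"customer_id": "c2", "customer_zip_code_prefix": "20010", "customer_city": "rio de janeiro"},
    {"customer_id": "c3", "customer_zip_code_prefix": "99999", "customer_city": "unknown"},
]
address = build_address_table(raw_geo)
customer = build_customer_table(raw_customers, address)
```

Customer c3 is dropped in Step 6 because its zip prefix has no row in the Address table, which is the "intersection" behavior the step describes.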
New Customer Table
New Address Table
Analytical Procedures Benefits (WHY):
Customer Insights
CMO: Understand customers’ demographics, shopping behavior, and product preferences to build targeted marketing strategies. Identify customers’ city distribution/ customer lifetime value/ top categories/ peak purchase time/ number of customers by year and month.
Seller Insights
Client Account Executive: Understand sellers’ demographics, sales performance, and product rank to help sellers improve performance. Identify top sellers/ categories with highest growth.
Financials Insights
CFO: Analyze platform revenue and cost in real time to make quick decisions and identify potential performance issues. Understand order value/ monthly and annual sales.
Operations Insights
COO: Oversee logistics performance and react promptly when significant shipment delays occur. Monitor the monthly on-time delivery rate.
Post Purchase Service Insights
Customer Service Executive: Review customer review metrics to ensure a high-quality closed-loop service. Analyze order review scores/ customer complaints.
Empower C-level executives and analysts to understand business performance from a 360-degree view
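As an example of how one of these metrics might be defined, the sketch below computes a monthly on-time delivery rate. The definition used here (an order is on time when it is delivered on or before its estimated date, bucketed by delivery month) is an assumption, as are the field names and sample dates. On the dashboard the same logic would be expressed as a query.

```python
from collections import defaultdict
from datetime import date


def monthly_on_time_rate(deliveries):
    """Return {YYYY-MM: share of orders delivered on or before the estimate}."""
    totals, on_time = defaultdict(int), defaultdict(int)
    for d in deliveries:
        month = d["delivered"].strftime("%Y-%m")
        totals[month] += 1
        if d["delivered"] <= d["estimated"]:
            on_time[month] += 1
    return {month: on_time[month] / totals[month] for month in sorted(totals)}


# Hypothetical delivery records.
deliveries = [
    {"delivered": date(2018, 1, 10), "estimated": date(2018, 1, 15)},  # on time
    {"delivered": date(2018, 1, 20), "estimated": date(2018, 1, 18)},  # late
    {"delivered": date(2018, 2, 3), "estimated": date(2018, 2, 3)},    # on time
]
rates = monthly_on_time_rate(deliveries)
```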
Analytical Procedures Instructions (HOW):
Vision: C-level executives communicate to the analysts the key metrics used to review each department’s performance.
Creation: Analysts build customized dashboard metrics by writing queries in Python and PostgreSQL on the Metabase platform.
Action: C-level executives review the dashboard daily to oversee business performance. When they notice an issue, such as a drop in sales, they should ask the analysts to perform further analysis so that decisions are data-driven.
Implementation: Analysts should seek feedback from the executives and revise the metrics to further improve the analytical procedure.
Further Considerations
● On-premises solution for sensitive and personally identifiable customer data
● Anonymization of customer data for cloud upload
● Offsite/cloud for less sensitive data and anonymized customer data
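One possible way to prepare customer rows for a cloud copy is sketched below. It replaces the identifier with a keyed hash and drops a column assumed to be unnecessary offsite. Strictly speaking this is pseudonymization, which is weaker than full anonymization, and the deck does not specify a technique. The key, column names, and choice of dropped column are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_customer(row, secret_key, drop_columns=("customer_city",)):
    """Hash the customer ID and drop columns not needed in the cloud copy."""
    cleaned = {k: v for k, v in row.items() if k not in drop_columns}
    cleaned["customer_id"] = pseudonymize(row["customer_id"], secret_key)
    return cleaned


SECRET_KEY = b"replace-with-a-key-kept-on-premises"  # hypothetical placeholder
row = {"customer_id": "c1", "customer_city": "sao paulo", "customer_state": "SP"}
cloud_row = anonymize_customer(row, SECRET_KEY)
```

Because the digest is deterministic for a given key, the same customer maps to the same token across tables, so joins still work on the cloud copy while the key itself stays on-premises.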
Database Interaction Demo
http://35.237.178.81:3000/dashboard/1
Thank you!
Q&A
References
Data Sources:
1. Kaggle (Brazilian E-Commerce Public Dataset by Olist), https://www.kaggle.com/olistbr/brazilian-ecommerce/home
2. Silberschatz, A., Korth, H. F., and Sudarshan, S. (2011). Database System Concepts (6th Edition). McGraw-Hill. ISBN-13: 978-0073523323
Code - Data sampling [Link]
Code - Create database & Extract, Transform, Load in Python [Link]
