Compressed Linear Algebra for Large Scale Machine Learning
Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald
IBM Research - Almaden; San Jose, CA, USA
University of Maryland; College Park, MD, USA
Presented by: Issa Memari
25/1/2018
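Motivation
Machine learning algorithms are iterative.
Each iteration performs matrix-vector multiplications, which require a full scan of the data matrix.
Large-scale data + I/O-bound operations: performance drops sharply once the data matrix no longer fits in memory.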
Solution: Fit more data into memory
Data → compression algorithm → compressed data, stored as compressed data blocks.
At run time, a decompression algorithm turns each compressed data block back into a data block, which is then passed to the machine learning algorithm.
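The decompress-then-compute pipeline above can be sketched in Python as follows. This is a minimal illustration, assuming zlib as a stand-in for a general-purpose codec and a dense matrix given as a list of rows; `compress_blocks` and `matvec_decompressing` are hypothetical names. Every scan of the data pays the full decompression cost again, which is the overhead compressed linear algebra aims to avoid.

```python
import array
import zlib


def compress_blocks(rows, block_rows):
    """Compress a dense row-major matrix block by block with zlib (general-purpose codec)."""
    blocks = []
    for start in range(0, len(rows), block_rows):
        # Flatten the block's rows into 8-byte doubles, then compress the raw bytes.
        flat = array.array("d", [x for row in rows[start:start + block_rows] for x in row])
        blocks.append((start, zlib.compress(flat.tobytes())))
    return blocks


def matvec_decompressing(blocks, n_cols, v):
    """Compute q = X v, decompressing every block before it can be used."""
    q = []
    for _start, payload in blocks:
        flat = array.array("d")
        flat.frombytes(zlib.decompress(payload))  # decompression is paid on every scan
        for r in range(len(flat) // n_cols):
            row = flat[r * n_cols:(r + 1) * n_cols]
            q.append(sum(a * b for a, b in zip(row, v)))
    return q
```

For example, `matvec_decompressing(compress_blocks(X, 2), 2, v)` returns one entry of X v per row of X.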
Compression techniques
General-purpose compression:
Heavyweight (e.g., Gzip): high compression ratio, low decompression speed.
Lightweight (e.g., Snappy): low compression ratio, high decompression speed.
Compressed linear algebra: high compression ratio and no decompression; operations run directly on the compressed data.
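Column-wise compression: Motivation
1. Low column cardinalities: many columns contain only a few distinct values.
2. Non-uniform sparsity across columns: each column can use the encoding that suits it.
3. Tall and skinny matrices: many rows and few columns, so per-column overhead is amortized over many rows.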
Column encoding formats
1. Run-Length Encoding (RLE)
2. Offset-List Encoding (OLE)
3. Uncompressed Columns (UC)
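Column encoding formats: Run-Length Encoding
Example column (rows 1-10, 10 values): 9, 9, 9, 9, 0, 8.2, 9, 9, 9, 0
Encoded column (8 values), one list of (start offset, run length) pairs per distinct non-zero value:
{9}: (1, 4), (3, 3)
{8.2}: (6, 1)
Each start offset is stored relative to the last row of the previous run of the same value (the second run of 9 starts at row 4 + 3 = 7); zeros are not stored.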
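A minimal Python sketch of the RLE example above, assuming 1-based row offsets, zeros left implicit, and run starts stored relative to the last row of the previous run, as the numbers on the slide suggest. `rle_encode` and `stored_values` are hypothetical helpers that count stored values, not bytes.

```python
def rle_encode(column):
    """Encode a column as {value: [(start_delta, run_length), ...]} for non-zero values."""
    absolute = {}  # value -> list of [first_row, last_row] runs (1-based rows)
    for row, value in enumerate(column, start=1):
        if value == 0:
            continue  # zeros are implicit and not stored
        value_runs = absolute.setdefault(value, [])
        if value_runs and value_runs[-1][1] == row - 1:
            value_runs[-1][1] = row  # this row extends the current run
        else:
            value_runs.append([row, row])  # start a new run
    encoded = {}
    for value, value_runs in absolute.items():
        prev_end = 0
        pairs = []
        for first, last in value_runs:
            # Start offset relative to the last row of the previous run, plus run length.
            pairs.append((first - prev_end, last - first + 1))
            prev_end = last
        encoded[value] = pairs
    return encoded


def stored_values(encoded):
    """One dictionary entry per distinct value plus two numbers per run."""
    return sum(1 + 2 * len(pairs) for pairs in encoded.values())
```

For the slide's column, `rle_encode([9, 9, 9, 9, 0, 8.2, 9, 9, 9, 0])` yields `{9: [(1, 4), (3, 3)], 8.2: [(6, 1)]}`, i.e., 8 stored values.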
Column encoding formats: Offset-List Encoding
Example column (rows 1-10, 10 values): 9, 9, 9, 9, 0, 8.2, 9, 9, 9, 0
Encoded column (10 values), one list of row offsets per distinct non-zero value:
{9}: 1, 2, 3, 4, 7, 8, 9
{8.2}: 6
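The OLE example can be sketched the same way. This assumes 1-based row offsets and implicit zeros, and it ignores the physical layout (for instance, how offset lists are split into segments); `ole_encode` and `stored_values` are hypothetical helpers that count stored values.

```python
def ole_encode(column):
    """Encode a column as {non-zero value: [1-based row offsets]}."""
    offsets = {}
    for row, value in enumerate(column, start=1):
        if value != 0:  # zeros are implicit and not stored
            offsets.setdefault(value, []).append(row)
    return offsets


def stored_values(encoded):
    """One dictionary entry per distinct value plus one number per offset."""
    return sum(1 + len(rows) for rows in encoded.values())
```

For the slide's column, `ole_encode([9, 9, 9, 9, 0, 8.2, 9, 9, 9, 0])` yields `{9: [1, 2, 3, 4, 7, 8, 9], 8.2: [6]}`, i.e., 10 stored values, no smaller than the uncompressed column.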
RLE vs. OLE
Example column (rows 1-10, 10 values): 0, 1.7, 0, 1.7, 0, 1.7, 0, 2, 0, 0
RLE-encoded column (10 values):
{1.7}: (2, 1), (2, 1), (2, 1)
{2}: (8, 1)
OLE-encoded column (6 values):
{1.7}: 2, 4, 6
{2}: 8
With no consecutive repeats, every RLE run has length 1 and costs two values, so OLE is smaller here.
Column encoding formats: Uncompressed Columns
Example column (rows 1-10, 10 values): 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Encoded column (10 values, stored as-is): 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Column co-coding
Example (rows 1-10, 20 values):
Column 1: 1.7, 2, 0, 2, 2, 1.7, 0, 3, 1.7, 2
Column 2: 1.7, 6, 0, 6, 0, 1.7, 0, 6, 1.7, 6
OLE-encoded group (16 values), one offset list per distinct non-zero tuple:
{1.7, 1.7}: 1, 6, 9
{2, 6}: 2, 4, 10
{2, 0}: 5
{3, 6}: 8
The all-zero tuple (rows 3 and 7) is not stored.
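The co-coding example, and the 20-value figure from the speaker notes for encoding the two columns separately, can be reproduced with a small Python sketch. It assumes OLE with 1-based offsets and all-zero tuples left implicit; `ole_encode_group` and `stored_values` are hypothetical helpers.

```python
def ole_encode_group(columns):
    """OLE-encode equal-length columns as one group: {non-zero tuple: [1-based rows]}."""
    offsets = {}
    for row, values in enumerate(zip(*columns), start=1):
        if any(v != 0 for v in values):  # all-zero tuples are not stored
            offsets.setdefault(values, []).append(row)
    return offsets


def stored_values(encoded, n_cols):
    """Each distinct tuple stores n_cols values plus one number per offset."""
    return sum(n_cols + len(rows) for rows in encoded.values())


col1 = [1.7, 2, 0, 2, 2, 1.7, 0, 3, 1.7, 2]
col2 = [1.7, 6, 0, 6, 0, 1.7, 0, 6, 1.7, 6]
```

Encoding both columns as one group stores 16 values; encoding them separately stores 11 + 9 = 20, and column 1 alone (11 values) is larger than its 10 uncompressed values.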
Data layout: OLE
Data layout: RLE
Compressed linear algebra:
Matrix-vector multiplication
Example on the whiteboard
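The whiteboard example is not part of the slides, so the following is only a sketch of the idea, assuming OLE-encoded column groups with 0-based row offsets: compute the dot product of each distinct tuple with the matching entries of v once, then add that scalar to q at every row offset of the tuple. Function names are hypothetical, and uncompressed (UC) groups are left out for brevity.

```python
def ole_encode_group(columns, col_indices):
    """OLE-encode the given columns as one group: {non-zero tuple: [0-based rows]}."""
    offsets = {}
    for row, values in enumerate(zip(*(columns[c] for c in col_indices))):
        if any(v != 0 for v in values):  # all-zero tuples are not stored
            offsets.setdefault(values, []).append(row)
    return offsets


def compressed_matvec(groups, v, n_rows):
    """Compute q = X v directly on OLE-encoded column groups, without decompressing.

    groups: list of (col_indices, encoded) pairs produced by ole_encode_group.
    """
    q = [0.0] * n_rows
    for col_indices, encoded in groups:
        for tup, rows in encoded.items():
            # One dot product per distinct tuple, reused for every row containing it.
            u = sum(t * v[c] for t, c in zip(tup, col_indices))
            for r in rows:
                q[r] += u
    return q
```

The work per group is proportional to the number of distinct tuples plus the number of offsets, rather than to rows × columns.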
Compression planning
Compression planning involves three tasks:
1. Estimating column compression ratios
2. Partitioning columns into groups
3. Choosing the encoding format for each group
Estimating column compression ratios
Instead of scanning the full data matrix, estimate the size parameters of each column group (e.g., the number of distinct tuples, offsets, and runs) from a random sample of rows.
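A naive illustration of sample-based estimation, assuming a single column and OLE value counts: the number of non-zero offsets is scaled up from the sample, while the distinct count is simply what the sample contains, which can underestimate the true count. This is not the estimator used in the paper; `estimate_ole_values` is a hypothetical helper.

```python
import random


def estimate_ole_values(column, sample_fraction, seed=0):
    """Naive estimate of OLE stored values (distinct values + offsets) from a row sample."""
    n = len(column)
    k = max(1, int(n * sample_fraction))
    rng = random.Random(seed)
    sample = [column[i] for i in rng.sample(range(n), k)]  # rows drawn without replacement
    nonzeros = [x for x in sample if x != 0]
    z_est = len(nonzeros) * n / k  # scale the sampled non-zero count to all n rows
    d_est = len(set(nonzeros))  # distinct values seen in the sample (a lower bound)
    return d_est + z_est
```

With `sample_fraction=1.0` the estimate equals the exact OLE count; smaller samples trade accuracy for speed.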
Partitioning columns into groups
1. Enumerating all possible partitions is infeasible: m columns can be partitioned in Bell(m) ways, e.g., Bell(12) = 4,213,597.
2. Greedy brute force.
3. Bin packing + greedy brute force.
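The following Python sketch checks the Bell-number figure and shows one plausible reading of greedy brute-force grouping: start from single-column groups and repeatedly merge the pair whose merge saves the most, stopping when no merge helps. The cost used here (OLE stored-value counts) and all function names are assumptions for illustration, not the authors' size estimates; bin packing is not shown.

```python
from itertools import combinations


def bell(n):
    """Bell number B(n) via the Bell triangle: the number of partitions of n columns."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]


def ole_values(columns, group):
    """OLE stored-value count for a column group (tuple of column indices)."""
    counts = {}
    for values in zip(*(columns[c] for c in group)):
        if any(v != 0 for v in values):  # all-zero tuples are not stored
            counts[values] = counts.get(values, 0) + 1
    return sum(len(group) + count for count in counts.values())


def greedy_grouping(columns):
    """Greedily merge the pair of groups with the largest size reduction."""
    groups = [(c,) for c in range(len(columns))]
    while True:
        best = None
        for a, b in combinations(groups, 2):
            gain = ole_values(columns, a) + ole_values(columns, b) - ole_values(columns, a + b)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, a, b)
        if best is None:
            return groups  # no merge reduces the size any further
        _, a, b = best
        groups = [g for g in groups if g not in (a, b)] + [a + b]
```

For the two co-coding columns, `greedy_grouping` merges them into one group (20 values → 16).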
Choosing the encoding format for each group
1. Scan the data matrix and compute actual compressed sizes for chosen groups.
2. For each group, compute compressed size as the minimum of OLE and RLE sizes.
3. If a group is incompressible, repeatedly remove the column with the largest estimated compressed size until the group is compressible or empty.
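A sketch of the encoding choice, assuming stored-value counts as the size measure, exact counts in place of sample estimates, and that removed columns fall back to the uncompressed (UC) format; all names are hypothetical.

```python
def group_tuples(columns, group):
    """Map each non-zero tuple of the group to its sorted 1-based row offsets."""
    offsets = {}
    for row, values in enumerate(zip(*(columns[c] for c in group)), start=1):
        if any(v != 0 for v in values):
            offsets.setdefault(values, []).append(row)
    return offsets


def ole_values(columns, group):
    return sum(len(group) + len(rows) for rows in group_tuples(columns, group).values())


def rle_values(columns, group):
    total = 0
    for rows in group_tuples(columns, group).values():
        runs = 1 + sum(1 for a, b in zip(rows, rows[1:]) if b != a + 1)  # count run breaks
        total += len(group) + 2 * runs
    return total


def choose_encoding(columns, group):
    """Return (format, compressed_group, uncompressed_columns)."""
    group = list(group)
    n_rows = len(columns[0])
    uncompressed = []
    while group:
        ole, rle = ole_values(columns, group), rle_values(columns, group)
        if min(ole, rle) < n_rows * len(group):  # compressible: smaller than raw values
            return ("OLE" if ole <= rle else "RLE"), group, uncompressed
        # Remove the column with the largest individual compressed size.
        worst = max(group, key=lambda c: min(ole_values(columns, [c]), rle_values(columns, [c])))
        group.remove(worst)
        uncompressed.append(worst)
    return None, [], uncompressed
```

For the co-coding example, `choose_encoding([col1, col2], [0, 1])` picks OLE (16 values vs. 24 for RLE), while column 1 on its own (11 values for OLE vs. 10 uncompressed) is removed and left uncompressed.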
Experiments: CLA (Compressed Linear Algebra)
Experiments: Datasets
Dataset   #Rows  #Columns  Sparsity*  In-memory size
Higgs     11M    28        0.92       2.5 GB
Census    2.5M   68        0.43       1.3 GB
Covtype   600K   54        0.22       0.14 GB
ImageNet  1.2M   900       0.31       4.4 GB
Mnist8m   8.1M   784       0.25       19 GB
*Sparsity = fraction of non-zero entries.
Experiments: Compression ratio (uncompressed size / compressed size)
Dataset   Gzip   Snappy  CSR-VI  D-VI  CLA
Higgs     1.93   1.38    1.04    1.9   2.03
Census    17.11  6.04    3.62    7.99  27.46
Covtype   10.4   6.13    3.56    2.48  12.73
ImageNet  5.54   3.35    2.07    1.93  7.38
Mnist8m   4.12   2.60    2.53    N/A   6.14
Experiments: Matrix-vector multiplication time
Thank you for listening

Speaker notes

  1. In fact, if we encode these two columns individually with OLE, we end up with an encoding that contains 20 values. The first column is even incompressible (its encoded size is larger than its uncompressed size).
  2. Let us now look at how the encoded data is actually stored in memory; this will help us understand how to compute the size of a compressed column group.
  3. B_ij is the number of segments of tuple t_ij, Z_i is the total number of offsets in the column group, D_i is the number of distinct tuples, and |G_i| is the number of columns in the group.
  4. Size in bytes of an OLE-encoded column group and of an RLE-encoded column group.
  5. Explain what the compression ratio is, what partitioning the columns into groups means, and what choosing the encoding format involves.
  6. B_ij is the number of segments of tuple t_ij, Z_i is the total number of offsets in the column group, D_i is the number of distinct tuples, |G_i| is the number of columns in the group, and R_ij is the number of runs of tuple t_ij.