Cubes
light-weight OLAP

Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012

source: github.com/Stiivi/cubes
documentation: packages.python.org/cubes/
Overview

■   purpose
■   analytical modelling and OLAP
■   slicing and dicing
■   OLAP server
■   SQL backend
analytical data modelling: lightweight
http://tendre.sme.sk
aggregation browsing: slicing and dicing
modelling, reporting, aggregation browsing
Architecture
model, browser, backends, HTTP server
Logical Model
multidimensional, analytical
business/analyst's point of view

transactions: OLTP, application (operational) data
analysis: OLAP, analytical data
Model
{
    "name": "My Model",
    "description": ...,
    "cubes": [...],
    "dimensions": [...]
}

cubes: measures
dimensions: levels, attributes, hierarchy
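A minimal sketch of reading such a model file with nothing but the standard `json` module, assuming the file uses standard JSON syntax. The model contents are hypothetical: they reuse the "contracts" cube and the "date"/"category" dimensions from this deck, and the levels are simplified to plain name lists. The reference check is only an illustration of the kind of thing a model validator looks for, not what `slicer model validate` actually does.

```python
import json

# Hypothetical minimal model in standard JSON syntax (levels simplified).
MODEL_TEXT = """
{
    "name": "My Model",
    "description": "Contracts by date and category",
    "cubes": [
        {"name": "contracts",
         "dimensions": ["date", "category"],
         "measures": [{"name": "amount", "aggregations": ["sum"]}]}
    ],
    "dimensions": [
        {"name": "date", "levels": ["year", "month", "day"]},
        {"name": "category", "levels": ["category"]}
    ]
}
"""


def index_model(text):
    """Parse the model JSON and index cubes and dimensions by name."""
    model = json.loads(text)
    cube_index = {cube["name"]: cube for cube in model.get("cubes", [])}
    dimension_index = {dim["name"]: dim for dim in model.get("dimensions", [])}
    return cube_index, dimension_index


def undefined_dimensions(cube_index, dimension_index):
    """List (cube, dimension) pairs where a cube uses an undeclared dimension."""
    return [(cube_name, dim_name)
            for cube_name, cube in cube_index.items()
            for dim_name in cube.get("dimensions", [])
            if dim_name not in dimension_index]


cube_index, dimension_index = index_model(MODEL_TEXT)
print(sorted(cube_index), sorted(dimension_index))       # ['contracts'] ['category', 'date']
print(undefined_dimensions(cube_index, dimension_index))  # []
```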
Facts
fact: measurable, the most detailed information (a fact data cell)
dimensions: location, type, time
Dimension

■ provide context for facts
■ used to filter queries or reports
■ control scope of aggregation of
  facts
Hierarchy
levels: 2010 → May → 1st
Dimension
■ levels and attributes
■ hierarchy*
■ key attributes
■ label attributes

"dimensions": [
    {
        "name": "date",
        "levels": ...,
        "hierarchy": ...
    },
    ...
]

*partial support for multiple hierarchies

label attribute: what is displayed; key attribute: used for links to slices
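To illustrate how a path of level keys walks down a hierarchy, and how a key attribute differs from a label attribute, here is a small sketch. The level names `year`, `month`, `day` and the month-name labels are assumptions for illustration; they are not taken from a Cubes model.

```python
# Hypothetical "date" hierarchy; the level names are assumptions.
DATE_LEVELS = ["year", "month", "day"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def level_label(level, key):
    """Label attribute for display, derived from the level's key attribute."""
    if level == "month":
        return MONTH_NAMES[key - 1]
    return str(key)


def describe_path(path, levels=DATE_LEVELS):
    """Pair each key of a path with its level, from the most general level."""
    if len(path) > len(levels):
        raise ValueError("path is deeper than the hierarchy")
    return [(level, key, level_label(level, key))
            for level, key in zip(levels, path)]


print(describe_path([2010, 5, 1]))
# [('year', 2010, '2010'), ('month', 5, 'May'), ('day', 1, '1')]
print(describe_path([2012, 6]))
# [('year', 2012, '2012'), ('month', 6, 'June')]
```

The key (5) is what a link or a cut would carry; the label ("May") is only for display.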
Cube
■ dimensions
■ measures

"cubes": [
    {
        "name": "contracts",
        "dimensions": ["date", "category"],
        "measures": [
            {
                "name": "amount",
                "label": "Contract Amount",
                "aggregations": ["sum"]
            }
        ]
    },
    ...
]
localizable model and attributes

"attributes": [
    {
        "name": "group",
        "label": "Group code"
    },
    {
        "name": "group_label",
        "label": "Group",
        "locales": ["en", "sk"]
    }
]
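A sketch of how a backend might pick the data for a localizable attribute. The `<attribute>_<locale>` column naming and the fallback to the first listed locale are assumptions of this sketch, not documented Cubes behaviour; the attribute definitions are the ones from the slide above.

```python
# Hypothetical: choose the physical column for an attribute in a locale.
# The "<attribute>_<locale>" naming is an assumption of this sketch.
ATTRIBUTES = [
    {"name": "group", "label": "Group code"},
    {"name": "group_label", "label": "Group", "locales": ["en", "sk"]},
]


def column_for(attribute, locale):
    """Return the column holding `attribute` for the requested locale."""
    locales = attribute.get("locales")
    if not locales:
        return attribute["name"]      # not localizable: a single column
    if locale not in locales:
        locale = locales[0]           # fall back to the first listed locale
    return f"{attribute['name']}_{locale}"


print([column_for(a, "sk") for a in ATTRIBUTES])  # ['group', 'group_label_sk']
print(column_for(ATTRIBUTES[1], "de"))            # group_label_en
```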
Aggregation Browser
■ ∑ aggregate measures
■ get more details
Aggregation Browser
■ SQL Snowflake Browser
■ SQL Denormalized Browser
■ MongoDB Browser
■ Some HTTP Data Service Browser
■ ?

"batteries" that are included
Browser Workspace
logical model + data
Cell
context of interest
Path
list of level keys, e.g. [2012, 6] or [45, 2]
Application → cubes → ∑ Aggregation Browser → backend

import cubes

model = cubes.load_model("model.json")
workspace = cubes.create_workspace("sql", model, url="sqlite:///data.sqlite")
cube = model.cube("sales")
browser = workspace.browser(cube)
summary and drill-down

summary:
result = browser.aggregate(cell)

drill-down:
result = browser.aggregate(cell, drilldown=["sector"])
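for row in result.drilldown:
    print(row["amount_sum"],      # aggregated measure
          row[label_attribute],   # label to display
          row[key])               # level key

aggregate names are measure + aggregation, e.g. received_amount_sum (measure received_amount, aggregation sum); record_count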
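A sketch of consuming drill-down rows by the measure + aggregation naming convention. The rows and their values are made up, and the `sector_label` key is an assumption; the real browser's row keys may differ.

```python
# Hypothetical drill-down rows with made-up values; the "sector_label"
# key name is an assumption, not necessarily what the browser returns.
rows = [
    {"sector": "A", "sector_label": "Agriculture",
     "received_amount_sum": 120, "record_count": 3},
    {"sector": "F", "sector_label": "Construction",
     "received_amount_sum": 80, "record_count": 2},
]


def aggregate_name(measure, aggregation):
    """Aggregated measure name: <measure>_<aggregation>."""
    return f"{measure}_{aggregation}"


amount_key = aggregate_name("received_amount", "sum")  # "received_amount_sum"
for row in rows:
    # key for links to slices, label for display, then the aggregates
    print(row["sector"], row["sector_label"], row[amount_key], row["record_count"])

total_amount = sum(row[amount_key] for row in rows)       # 200
total_records = sum(row["record_count"] for row in rows)  # 5
```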
browser.facts(cell)
browser.values(cell, dimension)
browser.cell_details(cell)
Slicing and Dicing
dimensions: type, supplier, date
type = construction work, date = April 2012 → construction work in April 2012
cut types
■ point: [2010]
■ set: [[2010,10], [2010,12]]
■ range: from=[2010,10] to=[2010,12]
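The three cut types can be read as predicates over paths. The following is a sketch with hypothetical helper functions, not the Cubes cut classes: a point cut keeps everything under one path, a set cut under any of several paths, and a range cut between two bounds, each bound compared only on the levels it specifies.

```python
# Hypothetical matchers for the three cut types (not the Cubes classes).
# A path is a list of level keys, e.g. [2010, 10] for October 2010.


def under(prefix, path):
    """True if `path` equals `prefix` or lies below it in the hierarchy."""
    return list(path[:len(prefix)]) == list(prefix)


def point_cut(cut_path, path):
    return under(cut_path, path)


def set_cut(cut_paths, path):
    return any(under(p, path) for p in cut_paths)


def range_cut(from_path, to_path, path):
    """Inclusive range; each bound compares only the levels it specifies."""
    return (list(path[:len(from_path)]) >= list(from_path)
            and list(path[:len(to_path)]) <= list(to_path))


dates = [[2010, 9, 30], [2010, 10, 1], [2010, 11, 15], [2010, 12, 24], [2011, 1, 2]]
point = [d for d in dates if point_cut([2010], d)]
month_set = [d for d in dates if set_cut([[2010, 10], [2010, 12]], d)]
month_range = [d for d in dates if range_cut([2010, 10], [2010, 12], d)]
print(point)        # the four 2010 dates
print(month_set)    # [[2010, 10, 1], [2010, 12, 24]]
print(month_range)  # [[2010, 10, 1], [2010, 11, 15], [2010, 12, 24]]
```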
Implicit Hierarchy drilldown

from cubes import Cell, PointCut

# whole cube
cell = Cell(cube)
browser.aggregate(cell)                       # Total
browser.aggregate(cell, drilldown=["date"])   # 2006 2007 2008 2009 2010

cut = PointCut("date", [2010])
cell = cell.slice(cut)
browser.aggregate(cell, drilldown=["date"])   # Jan Feb Mar Apr May ...
Drill-down Level
drilldown = ["date"]            # implicit: next level from the cell
drilldown = {"date": "month"}   # explicit
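A sketch of how the drill-down level could be resolved: implicitly as the level right after the cell's current path depth, or explicitly when a level is named. The function and the level names are hypothetical.

```python
# Hypothetical: resolve which level a drill-down goes to.
DATE_LEVELS = ["year", "month", "day"]  # assumed level names


def drilldown_level(levels, cell_path, explicit=None):
    """Implicit: the level after the cell's current depth; explicit wins."""
    if explicit is not None:
        if explicit not in levels:
            raise ValueError(f"unknown level: {explicit}")
        return explicit
    depth = len(cell_path)
    if depth >= len(levels):
        raise ValueError("cell is already at the most detailed level")
    return levels[depth]


print(drilldown_level(DATE_LEVELS, []))           # year  (whole cube)
print(drilldown_level(DATE_LEVELS, [2010]))       # month (after a point cut [2010])
print(drilldown_level(DATE_LEVELS, [], "month"))  # month (explicit)
```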
Cross Table
experimental interface

Category    Subcategory                2009     2010
Assets      Due from Banks             3044     1803
Assets      Investments               41012    36012
Assets      Loans Outstanding        103657   118104
Assets      Nonnegotiable              1202     1123
Assets      Other Assets               2247     3071
Assets      Other Receivables           984      811
Assets      Receivables                 176      171
Assets      Securities                   33      289
Equity      Capital Stock             11491    11492
Equity      Deferred Amounts            359      313
Equity      Other                     -1683    -3043
Equity      Retained Earnings         29870    28793
Liabilities Borrowings               110040   128577
Liabilities Derivative Liabilities   115642   110418
Liabilities Other                        57        8
Liabilities Other Liabilities          7321     5454
Liabilities Sold or Lent               2323      998
rows = ["item.category",
        "item.subcategory"]

columns = ["year"]

measures = ["amount_sum"]

table = result.cross_table(
              rows,
              columns,
              measures
        )
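What a cross table does with drill-down rows can be sketched as a plain pivot: row headers come from the `rows` attributes, column headers from the `columns` attributes, and each cell holds the `measures`. The sample rows use values from the table above; the function is a hypothetical illustration, not the experimental `cross_table()` itself.

```python
from collections import defaultdict

# Sample drill-down rows; values taken from the table above.
drilldown = [
    {"item.category": "Assets", "item.subcategory": "Investments", "year": 2009, "amount_sum": 41012},
    {"item.category": "Assets", "item.subcategory": "Investments", "year": 2010, "amount_sum": 36012},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2009, "amount_sum": 11491},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2010, "amount_sum": 11492},
]


def cross_table(records, rows, columns, measures):
    """Return (column headers, {row header: {column header: measure tuple}})."""
    table = defaultdict(dict)
    column_headers = []
    for record in records:
        row_hdr = tuple(record[r] for r in rows)
        col_hdr = tuple(record[c] for c in columns)
        if col_hdr not in column_headers:
            column_headers.append(col_hdr)
        table[row_hdr][col_hdr] = tuple(record[m] for m in measures)
    return column_headers, dict(table)


headers, table = cross_table(drilldown, ["item.category", "item.subcategory"],
                             ["year"], ["amount_sum"])
for row_hdr, cells in table.items():
    # missing combinations are printed as empty cells
    print(*row_hdr, *(cells.get(h, ("",))[0] for h in headers))
```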
Slicer
The HTTP OLAP Server

Application ↔ (HTTP, JSON) ↔ Slicer ↔ ∑ Aggregation Browser
GET /model

GET /aggregate

GET /values

GET /report
Slicer needs: logical model, configuration, data

$ slicer serve slicer.ini
[server]
backend: sql
log_level: info

[model]
path: model.json
locales: en,sk

[workspace]
url: postgres://localhost/database
schema: datamart
fact_prefix: ft_
dimension_prefix: dm_
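The configuration above is an ordinary INI file, so its structure can be checked with the standard `configparser` module (which accepts `key: value` lines by default). This only illustrates the file layout; Slicer reads the file itself.

```python
import configparser

# The slicer.ini from above, read with the standard library.
SLICER_INI = """
[server]
backend: sql
log_level: info

[model]
path: model.json
locales: en,sk

[workspace]
url: postgres://localhost/database
schema: datamart
fact_prefix: ft_
dimension_prefix: dm_
"""

config = configparser.ConfigParser()
config.read_string(SLICER_INI)

# "locales" is a comma-separated list
locales = [loc.strip() for loc in config.get("model", "locales").split(",")]
print(config.get("server", "backend"), config.get("model", "path"), locales)
print(dict(config["workspace"]))
```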



∑ amount

GET /aggregate
{
    "cell": [],
    "drilldown": [],
    "summary": {
        "record_count": 62,
        "amount_sum": 1116860
    }
}
∑ amount, ✂ cut

GET /aggregate?cut=year:2010
{
    "cell": [
        {
            "path": ["2010"],
            "type": "point",
            "dimension": "year",
            "level_depth": 1
        }
    ],
    "drilldown": [],
    "summary": {
        "record_count": 31,
        "amount_sum": 566020
    }
}
GET /aggregate?drilldown=year



{
     "cell": [],
     "total_cell_count": 2,
     "drilldown": [
         {
             "record_count": 31,
             "amount_sum": 550840,
             "year": 2009
         },
         {
             "record_count": 31,
             "amount_sum": 566020,
             "year": 2010
         }
     ],
     "summary": {
         "record_count": 62,
         "amount_sum": 1116860
     }
}
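The drill-down response above can be consumed with plain JSON handling. For additive aggregations such as sum and count, and with no cut applied, the drill-down cells should add up to the summary, which makes a cheap sanity check on the client side.

```python
import json

# The drill-down response from above.
RESPONSE = """
{"cell": [], "total_cell_count": 2,
 "drilldown": [
   {"record_count": 31, "amount_sum": 550840, "year": 2009},
   {"record_count": 31, "amount_sum": 566020, "year": 2010}],
 "summary": {"record_count": 62, "amount_sum": 1116860}}
"""

result = json.loads(RESPONSE)
by_year = {row["year"]: row["amount_sum"] for row in result["drilldown"]}

# additive aggregations: the cells must sum to the summary
for measure in ("record_count", "amount_sum"):
    total = sum(row[measure] for row in result["drilldown"])
    assert total == result["summary"][measure], measure

print(by_year)  # {2009: 550840, 2010: 566020}
```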
GET /report
Content-Type: application/json
"cell": list of cuts; "queries": list of named queries

{
    "cell": [
        {
            "dimension": "date",
            "type": "range",
            "from": [2009],
            "to": [2011, 6]
        }
    ],
    "queries": {
        "by_segment": {
            "query": "aggregate",
            "drilldown": ["segment"]
        },
        "by_year": {
            "query": "aggregate",
            "drilldown": {"date": "year"}
        }
    }
}
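A client-side sketch that builds the `/aggregate` query strings and the `/report` request with the standard library, without sending anything. The server address is hypothetical, and the report is sent as a JSON body on a GET request as shown on the slide. Note that `urlencode` percent-encodes the colon in `year:2010`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical Slicer address

# Query strings for the /aggregate examples.
cut_url = f"{BASE_URL}/aggregate?{urlencode({'cut': 'year:2010'})}"
drilldown_url = f"{BASE_URL}/aggregate?{urlencode({'drilldown': 'year'})}"

# JSON body for /report: a list of cuts plus named queries.
report = {
    "cell": [{"dimension": "date", "type": "range", "from": [2009], "to": [2011, 6]}],
    "queries": {
        "by_segment": {"query": "aggregate", "drilldown": ["segment"]},
        "by_year": {"query": "aggregate", "drilldown": {"date": "year"}},
    },
}
request = Request(f"{BASE_URL}/report", data=json.dumps(report).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="GET")
# urllib.request.urlopen(request) would send it; this sketch does not.
print(cut_url)
print(drilldown_url)
```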
SQL Backend
What data does it work with?
■ ★ star schema: fact table and dimension tables
■ ❄ snowflake schema: fact table and dimension tables, with dimensions further normalized
Aggregation Browser
Browsing Context
Snowflake Mapper or Denormalized Mapper

mapping from the logical model to the physical data: ❄ snowflake schema or denormalized view
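A sketch of what "logical to physical" mapping means. It assumes, for illustration only, that snowflake tables are named by the prefixes from the `slicer.ini` example (`ft_` for facts, `dm_` for dimensions) and that a denormalized view keeps the logical `dimension.attribute` reference as its column name. These naming rules are hypothetical, not the mappers' actual conventions.

```python
# Hypothetical logical-to-physical mapping rules (assumed, see above).
FACT_PREFIX, DIMENSION_PREFIX = "ft_", "dm_"


def snowflake_column(cube, attribute_ref):
    """'date.year' -> ('dm_date', 'year'); bare names live in the fact table."""
    if "." in attribute_ref:
        dimension, attribute = attribute_ref.split(".", 1)
        return DIMENSION_PREFIX + dimension, attribute
    return FACT_PREFIX + cube, attribute_ref


def denormalized_column(view, attribute_ref):
    """In one wide view every attribute is a column of that view."""
    return view, attribute_ref


print(snowflake_column("contracts", "date.year"))   # ('dm_date', 'year')
print(snowflake_column("contracts", "amount"))      # ('ft_contracts', 'amount')
print(denormalized_column("contracts_view", "date.year"))
```

The snowflake mapper has to join the fact table with each dimension table, while the denormalized mapper reads a single pre-joined view, which is what makes denormalized browsing and indexing cheap.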
SQL Features
■ does not require DB write access
■ denormalisation
 ■   denormalised browsing, indexing


■ simple date datatype dimension
 ■   extraction of date parts during mapping


■ multiple schema support
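To make "extraction of date parts during mapping" concrete, here is a minimal `sqlite3` sketch of the kind of query an SQL backend could issue for a drill-down by year over a date column. The table, columns and rows are hypothetical, and this is not the SQL that Cubes generates.

```python
import sqlite3

# Hypothetical fact table with a plain date column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ft_contracts (signed_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO ft_contracts VALUES (?, ?)",
    [("2009-03-01", 100), ("2009-11-20", 50), ("2010-04-02", 70)],
)

# The year is extracted from the date while mapping, then grouped on.
query = """
    SELECT CAST(strftime('%Y', signed_on) AS INTEGER) AS year,
           COUNT(*)    AS record_count,
           SUM(amount) AS amount_sum
    FROM ft_contracts
    GROUP BY year
    ORDER BY year
"""
rows = conn.execute(query).fetchall()
for year, record_count, amount_sum in rows:
    print(year, record_count, amount_sum)
# 2009 2 150
# 2010 1 70
```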
Slicer
command-line tool
■ model validation
  slicer model validate model.json



■ model translation
  slicer model translate model.json translation.json



■ workspace testing
  slicer test config.ini



■ denormalization
  slicer denormalize --materialize --index config.ini
Future
■ formatters for visualisation libraries
■ JavaScript library* (help needed)

■ backends
■ derived measures


                        *http://github.com/Stiivi/cubes-js
Open Data

■ shared repository of models
■ shared repository of dimensions
■ public cubes
 ■   open Slicer HTTP APIs




                           http://github.com/Stiivi/cubes/wiki
stay light
 Nutrition Facts
 Serving Size 1 cube

 Amount Per Serving
                       % Daily Value
 Total Fat 0g                    0%

   Saturated Fat 0g
   Trans Fat 0g
Thank You
              source:
    github.com/Stiivi/cubes
           documentation:
  packages.python.org/cubes/
             examples:
github.com/Stiivi/cubes-examples
Backup
Transactions                   Reporting
object–relational modelling    multidimensional modelling
ORM mapping                    logical model (and mapping)
database connection            browser
database engine                workspace
Limitations

■ one cut per dimension in a cell
 ■   logical conjunction of cuts (cut1 AND cut2 AND cut3 ...)


■ dimension-only selection
■ one (default) hierarchy
 ■   some internals are ready for multiple

 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Cubes light-weight OLAP analytical modelling

  • 1. Cubes light-weight OLAP Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
  • 2. Source: github.com/Stiivi/cubes; documentation: packages.python.org/cubes/
  • 3. Overview ■ purpose ■ analytical modelling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend
  • 6. aggregation browsing slicing and dicing
  • 7. modelling reporting aggregation browsing
  • 9. ✂ Components: model, browser, backends, server (HTTP)
  • 12. Transactions (OLTP): application (operational) data. Analysis (OLAP): analytical data.
  • 13. Model { "name": "My Model", "description": ..., "cubes": [...], "dimensions": [...] }. Cubes carry measures; dimensions carry levels, attributes and a hierarchy.
  • 14. Facts: a fact is measurable, is a data cell, and holds the most detailed information.
  • 15. Dimensions: location, type, time
  • 16. Dimension ■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts
  • 17. Hierarchy: 2010 → May → 1st (levels)
  • 18. Dimension ■ levels and attributes ■ hierarchy* ■ key attributes ■ label attributes. "dimensions": [ { "name": "date", "levels": ..., "hierarchy": ... }, ... ] *partial support for multiple hierarchies
  • 19. Label attribute; key attribute (used for links to slices)
  • 20. Cube ■ dimensions ■ measures. "cubes": [ { "name": "contracts", "dimensions": ["date", "category"], "measures": [ { "name": "amount", "label": "Contract Amount", "aggregations": ["sum"] } ] }, ... ]
  • 21. Localizable model and attributes: "attributes": [ { "name": "group", "label": "Group code" }, { "name": "group_label", "label": "Group", "locales": ["en", "sk"] } ]
  • 25. Aggregation Browser implementations: SQL Snowflake Browser, SQL Denormalized Browser, MongoDB Browser, a browser for some HTTP data service, and more (?). The "batteries" that are included.
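Because the model is plain JSON, it can be inspected with ordinary tools. The sketch below shows one cross-reference check a model validator could perform: every dimension a cube refers to must be declared in the model. `check_model` is a hypothetical helper, not Cubes code, and the levels are simplified to plain names, whereas real Cubes levels carry more metadata.

```python
import json

# Hypothetical model in the spirit of the slides; levels are simplified to
# plain level names, which is an assumption made for brevity.
MODEL_JSON = """
{
    "name": "My Model",
    "dimensions": [
        {"name": "date", "levels": ["year", "month", "day"]},
        {"name": "category", "levels": ["category"]}
    ],
    "cubes": [
        {
            "name": "contracts",
            "dimensions": ["date", "category"],
            "measures": [
                {"name": "amount", "label": "Contract Amount",
                 "aggregations": ["sum"]}
            ]
        }
    ]
}
"""


def check_model(model):
    """Return a list of problems found in a model dictionary (empty if none)."""
    problems = []
    declared = {dim["name"] for dim in model.get("dimensions", [])}
    for cube in model.get("cubes", []):
        # Every dimension a cube refers to must be declared in the model.
        for dim_name in cube.get("dimensions", []):
            if dim_name not in declared:
                problems.append(f"cube {cube['name']}: unknown dimension {dim_name}")
        # Every measure needs a name to be addressable (e.g. amount_sum).
        for measure in cube.get("measures", []):
            if "name" not in measure:
                problems.append(f"cube {cube['name']}: measure without a name")
    return problems


model = json.loads(MODEL_JSON)
print(check_model(model))  # []

model["cubes"][0]["dimensions"].append("supplier")
print(check_model(model))  # ['cube contracts: unknown dimension supplier']
```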
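A localizable attribute such as `group_label` has one value per locale. One plausible way for a SQL mapper to resolve it is to store one column per locale and append the locale to the column name. The sketch assumes that `<attribute>_<locale>` naming scheme and a fallback to the first listed locale; both are assumptions for illustration, not a description of how Cubes maps localized attributes.

```python
def column_for(attribute, locale=None):
    """Return the physical column name for a (possibly localizable) attribute.

    Assumed scheme: localizable attributes are stored in one column per
    locale, named "<attribute>_<locale>"; unknown or missing locales fall
    back to the first listed locale.
    """
    locales = attribute.get("locales")
    if not locales:
        return attribute["name"]  # not localizable: a single column
    chosen = locale if locale in locales else locales[0]
    return f"{attribute['name']}_{chosen}"


attributes = [
    {"name": "group", "label": "Group code"},
    {"name": "group_label", "label": "Group", "locales": ["en", "sk"]},
]
print([column_for(a, "sk") for a in attributes])  # ['group', 'group_label_sk']
print(column_for(attributes[1], "de"))            # group_label_en
```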
  • 27. Cell
  • 29. cell
  • 30. Path: a list of level keys, e.g. [45, 2] or [2012, 6]
  • 31. Application → ∑ cubes → Aggregation Browser → backend. (1) model = load_model("model.json"); (2) workspace = create_workspace("sql", model, url="sqlite:///data.sqlite"); (3) cube = model.cube("sales"); (4) browser = workspace.browser(cube)
  • 34. Drill-down: browser.aggregate(cell, drilldown=["sector"])
  • 35. for row in result.drilldown: row["amount_sum"], row[label_attribute], row[key]
  • 36. received_amount_sum = measure (received_amount) + aggregation (sum); record_count
  • 37. browser.facts(cell), browser.values(cell, dimension), browser.cell_details(cell)
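A path works like a file-system path: each element is the key of one level, read from the top of the hierarchy down. The hypothetical helper below pairs a path with the level names of an assumed year/month/day date hierarchy; a shorter path simply addresses fewer levels.

```python
def path_to_levels(levels, path):
    """Pair each key of a path with the hierarchy level it addresses."""
    if len(path) > len(levels):
        raise ValueError("path is deeper than the hierarchy")
    # zip stops at the shorter list, so a partial path maps only its levels.
    return dict(zip(levels, path))


date_levels = ["year", "month", "day"]  # assumed hierarchy of a date dimension
print(path_to_levels(date_levels, [2012, 6]))  # {'year': 2012, 'month': 6}
```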
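The aggregation result has two parts: a summary for the whole cell and one row per drill-down member, with the aggregated field named after the measure plus the aggregation (`amount` → `amount_sum`). The in-memory sketch below reproduces that shape over a list of fact dictionaries. It is a hypothetical illustration with made-up data; the real browser delegates the work to its backend (for example SQL).

```python
from collections import defaultdict


def aggregate(facts, measure, drilldown=None):
    """Sum one measure over all facts, optionally split by a drill-down key.

    Returns a dict with a "summary" (<measure>_sum and record_count) and a
    "drilldown" list with one row per distinct value of the drill-down key.
    """
    field = f"{measure}_sum"  # aggregated field name: measure + aggregation
    summary = {field: sum(f[measure] for f in facts), "record_count": len(facts)}
    rows = []
    if drilldown:
        groups = defaultdict(list)
        for fact in facts:
            groups[fact[drilldown]].append(fact[measure])
        for key in sorted(groups):
            values = groups[key]
            rows.append({drilldown: key, field: sum(values),
                         "record_count": len(values)})
    return {"summary": summary, "drilldown": rows}


facts = [
    {"sector": "construction", "amount": 100},
    {"sector": "construction", "amount": 50},
    {"sector": "health", "amount": 30},
]
result = aggregate(facts, "amount", drilldown="sector")
for row in result["drilldown"]:
    print(row["sector"], row["amount_sum"], row["record_count"])
# construction 150 2
# health 30 1
```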
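`values` is handy for filling selection boxes, for example listing the months available within the current cell. The hypothetical sketch below does the same over facts that store their date as a path: it keeps facts under the cut path and returns the distinct path prefixes at the requested depth. The data and the `level_values` name are made up for illustration.

```python
def level_values(facts, dimension, depth, cut_path=()):
    """Distinct level keys (as path prefixes) at the given depth, within a cut."""
    cut_path = list(cut_path)
    seen = set()
    for fact in facts:
        path = fact[dimension]
        # Keep only facts that fall under the current cut.
        if path[:len(cut_path)] == cut_path:
            seen.add(tuple(path[:depth]))
    return sorted(seen)


facts = [
    {"date": [2010, 5, 1], "amount": 10},
    {"date": [2010, 5, 7], "amount": 20},
    {"date": [2010, 6, 2], "amount": 5},
    {"date": [2011, 1, 3], "amount": 8},
]
print(level_values(facts, "date", 1))          # [(2010,), (2011,)]
print(level_values(facts, "date", 2, [2010]))  # [(2010, 5), (2010, 6)]
```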
  • 38. Slicing and Dicing ✂
  • 39. ✂ Cuts on the date (April 2012) and type (construction work) dimensions select construction work in April 2012. Dimensions: type, supplier, date.
  • 40. Cut types ✂: point [2010]; set [[2010,10], [2010,12]]; range from=[2010,10] to=[2010,12]
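The three cut types can be read as predicates on a fact's path. In the hypothetical sketch below, a fact's path is truncated to the depth of the cut before comparing, so a day-level date such as [2010, 11, 5] falls under the month [2010, 11] and the year [2010]. Ranges are treated as inclusive at both ends, which is an assumption.

```python
def _prefix(path, depth):
    """Truncate a path to the given depth so paths of different depth compare."""
    return list(path[:depth])


def point_cut(cut_path):
    """Match facts whose path starts with cut_path, e.g. [2010]."""
    return lambda path: _prefix(path, len(cut_path)) == list(cut_path)


def set_cut(cut_paths):
    """Match facts that fall under any of the given paths."""
    points = [point_cut(p) for p in cut_paths]
    return lambda path: any(match(path) for match in points)


def range_cut(from_path, to_path):
    """Match facts between from_path and to_path, both ends inclusive."""
    def match(path):
        low = _prefix(path, len(from_path)) >= list(from_path)
        high = _prefix(path, len(to_path)) <= list(to_path)
        return low and high
    return match


day = [2010, 11, 5]
print(point_cut([2010])(day))                      # True
print(set_cut([[2010, 10], [2010, 12]])(day))      # False
print(range_cut([2010, 10], [2010, 12])(day))      # True
```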
  • 41. Implicit Hierarchy drilldown
  • 42. Whole cube: cell = Cell(cube); browser.aggregate(cell) → Total; browser.aggregate(cell, drilldown=["date"]) → 2006, 2007, 2008, 2009, 2010. ✂ cut = PointCut("date", [2010]); cell = cell.slice(cut); browser.aggregate(cell, drilldown=["date"]) → January, February, March, April, May, ...
  • 43. Drill-down level: drilldown = ["date"] (implicit: the next level after the one in the cell); drilldown = {"date": "month"} (explicit)
  • 45.
      Category     Subcategory                2009     2010
      Assets       Due from Banks             3044     1803
      Assets       Investments               41012    36012
      Assets       Loans Outstanding        103657   118104
      Assets       Nonnegotiable              1202     1123
      Assets       Other Assets               2247     3071
      Assets       Other Receivables           984      811
      Assets       Receivables                 176      171
      Assets       Securities                   33      289
      Equity       Capital Stock             11491    11492
      Equity       Deferred Amounts            359      313
      Equity       Other                     -1683    -3043
      Equity       Retained Earnings         29870    28793
      Liabilities  Borrowings               110040   128577
      Liabilities  Derivative Liabilities   115642   110418
      Liabilities  Other                        57        8
      Liabilities  Other Liabilities          7321     5454
      Liabilities  Sold or Lent               2323      998
  • 46. rows = ["item.category", "item.subcategory"]; columns = ["year"]; measures = ["amount_sum"]; table = result.cross_table(rows, columns, measures)
  • 47. Slicer: the HTTP OLAP server ✂
  • 48. Application ⇄ Slicer (HTTP, JSON) ⇄ ∑ Aggregation Browser
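The implicit rule, "drill down to the level after the one the cell already fixes", can be sketched as a small function over a dimension's list of levels. It is hypothetical, assumes a year/month/day date hierarchy, and raises an error when the cut is already at the last level, matching the behaviour described in the speaker notes.

```python
def next_drilldown_level(levels, cut_path=None, explicit=None):
    """Return the level to drill down to within a dimension hierarchy.

    Implicit: the level right after the deepest level fixed by the cell's
    cut (or the first level when the dimension is not cut at all).
    Explicit: the named level, which must exist in the hierarchy.
    """
    if explicit is not None:
        if explicit not in levels:
            raise ValueError(f"unknown level: {explicit}")
        return explicit
    depth = len(cut_path or [])
    if depth >= len(levels):
        raise ValueError("cannot drill down: cut is already at the last level")
    return levels[depth]


date_levels = ["year", "month", "day"]  # assumed hierarchy
print(next_drilldown_level(date_levels))                    # year
print(next_drilldown_level(date_levels, [2010]))            # month
print(next_drilldown_level(date_levels, explicit="month"))  # month
```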
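A cross table pivots flat drill-down records: the `rows` attributes become row headers, the `columns` attributes become column headers, and the measures fill the cells. The pure-Python sketch below is a hypothetical stand-in for `result.cross_table`, not its implementation; the sample records are taken from the 2009/2010 table on slide 45.

```python
def cross_table(records, rows, columns, measures):
    """Pivot flat aggregation records into row headers, column headers, cells."""
    row_headers, column_headers, cells = [], [], {}
    for rec in records:
        r = tuple(rec[name] for name in rows)
        c = tuple(rec[name] for name in columns)
        if r not in row_headers:
            row_headers.append(r)
        if c not in column_headers:
            column_headers.append(c)
        cells[(r, c)] = [rec[m] for m in measures]
    # Missing row/column combinations are left as None.
    data = [[cells.get((r, c)) for c in column_headers] for r in row_headers]
    return row_headers, column_headers, data


records = [
    {"item.category": "Assets", "item.subcategory": "Investments", "year": 2009, "amount_sum": 41012},
    {"item.category": "Assets", "item.subcategory": "Investments", "year": 2010, "amount_sum": 36012},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2009, "amount_sum": 11491},
    {"item.category": "Equity", "item.subcategory": "Capital Stock", "year": 2010, "amount_sum": 11492},
]
row_h, col_h, data = cross_table(records, ["item.category", "item.subcategory"],
                                 ["year"], ["amount_sum"])
for header, values in zip(row_h, data):
    print(*header, *(v[0] for v in values))
# Assets Investments 41012 36012
# Equity Capital Stock 11491 11492
```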
  • 49. GET /model GET /aggregate GET /values GET /report
  • 50. Inputs: logical model, configuration, data. Run: $ slicer serve slicer.ini
  • 51. slicer.ini:
      [server]
      backend: sql
      log_level: info
      [model]
      path: model.json
      locales: en,sk
      [workspace]
      url: postgres://localhost/database
      schema: datamart
      fact_prefix: ft_
      dimension_prefix: dm_
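The server configuration is an INI-style file, so Python's standard `configparser` can read it, for example to check a deployment before starting the server. This sketch only inspects the file shown on the slide; it does not show how `slicer` itself loads its configuration.

```python
import configparser

SLICER_INI = """
[server]
backend: sql
log_level: info

[model]
path: model.json
locales: en,sk

[workspace]
url: postgres://localhost/database
schema: datamart
fact_prefix: ft_
dimension_prefix: dm_
"""

config = configparser.ConfigParser()
config.read_string(SLICER_INI)

# Options are split on the first ":" so the URL value survives intact.
print(config.get("workspace", "url"))  # postgres://localhost/database
locales = [loc.strip() for loc in config.get("model", "locales").split(",")]
print(locales)  # ['en', 'sk']
```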
  • 52. amount GET /aggregate
  • 53. GET /aggregate { "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 } }
  • 54. amount ✂ GET /aggregate?cut=date:2010
  • 55. GET /aggregate?cut=year:2010 { "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 } }
  • 56. GET /aggregate?drilldown=year { "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 } }
  • 57. GET /report, Content-Type: application/json. Body: { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { "by_segment": { "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date": "year"} } } }, where "cell" is a list of cuts and "queries" holds named queries.
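A client only needs to build the query string and decode the JSON. The sketch below builds `/aggregate` URLs with the standard library and checks the drill-down response from slide 56 for consistency (the rows add up to the summary). The base URL is a hypothetical address; nothing is actually fetched here, though `urllib.request.urlopen` could be used against a running server.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # hypothetical address of a running Slicer


def aggregate_url(cut=None, drilldown=None):
    """Build a GET /aggregate URL with optional cut and drilldown parameters."""
    params = {}
    if cut:
        params["cut"] = cut
    if drilldown:
        params["drilldown"] = drilldown
    # safe=":" keeps "year:2010" readable instead of "year%3A2010".
    query = urlencode(params, safe=":")
    return f"{BASE_URL}/aggregate" + (f"?{query}" if query else "")


# Response body taken from the drill-down example on slide 56.
RESPONSE = """
{"cell": [], "total_cell_count": 2,
 "drilldown": [{"record_count": 31, "amount_sum": 550840, "year": 2009},
               {"record_count": 31, "amount_sum": 566020, "year": 2010}],
 "summary": {"record_count": 62, "amount_sum": 1116860}}
"""

print(aggregate_url(cut="year:2010"))  # http://localhost:5000/aggregate?cut=year:2010
result = json.loads(RESPONSE)
# The drill-down rows should add up to the summary of the whole cell.
total = sum(row["amount_sum"] for row in result["drilldown"])
print(total == result["summary"]["amount_sum"])  # True
```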
  • 58. SQL Backend: what data does it work with?
  • 59. ★ or ❄ (star or snowflake schema)
  • 60. ★ dimensions fact table
  • 61. fact table dimensions
  • 62.
  • 63. Aggregation Browser → Browsing Context → Snowflake Mapper (snowflake ❄) or Denormalized Mapper (denormalized view)
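The two browsing modes answer the same question from different physical layouts: joining dimension tables at query time, or reading a pre-joined (denormalized) view that can be materialized and indexed. The sqlite3 sketch below shows this with a single-table dimension (a star schema, the simplest case of a snowflake). Table names follow the `ft_`/`dm_` prefixes from the slicer.ini slide; the SQL is hand-written for illustration and is not what Cubes generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dm_date (id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE ft_contracts (
        id INTEGER PRIMARY KEY,
        date_id INTEGER REFERENCES dm_date (id),
        amount INTEGER
    );
    INSERT INTO dm_date VALUES (1, 2009, 12), (2, 2010, 1), (3, 2010, 2);
    INSERT INTO ft_contracts (date_id, amount) VALUES (1, 100), (2, 40), (3, 60);

    -- Denormalized view: one wide row per fact with dimension attributes inlined.
    CREATE VIEW contracts_denorm AS
        SELECT f.id, f.amount, d.year AS date_year, d.month AS date_month
        FROM ft_contracts AS f JOIN dm_date AS d ON f.date_id = d.id;

    -- "Materialize" the view as a table and index it by a level key.
    CREATE TABLE contracts_mat AS SELECT * FROM contracts_denorm;
    CREATE INDEX contracts_mat_year ON contracts_mat (date_year);
""")

# Browsing the normalized schema: join the dimension table at query time.
joined = conn.execute("""
    SELECT d.year, SUM(f.amount), COUNT(*)
    FROM ft_contracts AS f JOIN dm_date AS d ON f.date_id = d.id
    GROUP BY d.year ORDER BY d.year
""").fetchall()

# Browsing the materialized denormalized table: no join needed.
flat = conn.execute("""
    SELECT date_year, SUM(amount), COUNT(*)
    FROM contracts_mat GROUP BY date_year ORDER BY date_year
""").fetchall()

print(joined)          # [(2009, 100, 1), (2010, 100, 2)]
print(joined == flat)  # True
```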
  • 64. logical physical ❄
  • 65. SQL Features ■ does not require DB write access ■ denormalisation ■ denormalised browsing, indexing ■ simple date datatype dimension ■ extraction of date parts during mapping ■ multiple schema support
  • 67. ■ model validation: slicer model validate model.json ■ model translation: slicer model translate model.json translation.json ■ workspace testing: slicer test config.ini ■ denormalization: slicer denormalize --materialize --index config.ini
  • 69. Help needed: ■ formatters for visualisation libraries ■ JavaScript library* ■ backends ■ derived measures (*http://github.com/Stiivi/cubes-js)
  • 70. Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes: open Slicer HTTP APIs (http://github.com/Stiivi/cubes/wiki)
  • 71. stay light Nutrition Facts Serving Size 1 cube Amount Per Serving % Daily Value Total Fat 0g 0% Saturated Fat 0g Trans Fat 0g
  • 72. Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples: github.com/Stiivi/cubes-examples
  • 74. Transactions: object–relational modelling, ORM mapping, database connection, database engine. Reporting: multidimensional modelling, logical model (and mapping), browser, workspace.
  • 75. Limitations ■ one cut per dimension in a cell ■ cuts combined by logical conjunction (cut1 AND cut2 AND cut3 ...) ■ dimension-only selection ■ only one (default) hierarchy; some internals are ready for multiple hierarchies
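The first two limitations describe how a cell behaves: it holds at most one cut per dimension, and a fact belongs to the cell only if it satisfies all cuts. The sketch below models that with two hypothetical classes, `MiniPointCut` and `MiniCell`; they only mirror the ideas behind `PointCut` and `cell.slice(cut)` from the slides and are not the Cubes classes. Here, slicing an already-cut dimension replaces its cut, which is an assumption about the intended behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniPointCut:
    """Hypothetical point cut: a dimension name and a path prefix."""
    dimension: str
    path: tuple

    def matches(self, fact):
        return tuple(fact[self.dimension][:len(self.path)]) == self.path


@dataclass(frozen=True)
class MiniCell:
    """Hypothetical cell: at most one cut per dimension, combined with AND."""
    cuts: tuple = ()

    def slice(self, cut):
        # Slicing an already-cut dimension replaces its cut instead of adding one.
        kept = tuple(c for c in self.cuts if c.dimension != cut.dimension)
        return MiniCell(kept + (cut,))

    def contains(self, fact):
        # Logical conjunction: the fact must satisfy every cut.
        # An empty cell (the whole cube) contains every fact.
        return all(cut.matches(fact) for cut in self.cuts)


cell = MiniCell().slice(MiniPointCut("date", (2010,)))
cell = cell.slice(MiniPointCut("type", ("construction",)))
cell = cell.slice(MiniPointCut("date", (2011,)))  # replaces the 2010 cut
print(len(cell.cuts))  # 2
print(cell.contains({"date": [2011, 4, 1], "type": ["construction"]}))  # True
print(cell.contains({"date": [2010, 4, 1], "type": ["construction"]}))  # False
```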

Speaker notes

  1. OLAP and Logical Model, Architecture, Slicing and Dicing, HTTP Server, SQL Backend
  4. Q: Who is familiar with OLAP?
  5. Quick setup and reporting; does not cover everything (intentionally).
  6. Example application: public procurements of Slovakia.
  7. Quick setup and reporting; does not cover everything (intentionally).
  8. I will talk about modelling first, then reporting, and then mix the two.
  9. What does it look like and what does it do?
  10. FIXME: add slicer tool here
  11. Not going into details, just aligning terminology and defining the context.
  12. It is not rare to see reports created directly from whatever data is available, instead of starting with business needs and finding a way to derive them from what is available.
  13. Different approach to data use, different needs: while applications focus on transactions (transactional data, OLTP), reporting focuses on analysis (analytical data). The two are logically separate (they do not have to be physically separate).
  17. CONTEXT: where did the sale happen? Who signed the contract? FILTER: how much was spent on construction work? AGGREGATION SCOPE: what was the revenue by country? Dimensions are also used for ordering or sorting and define master-detail relationships.
  20. Provides metadata that makes it easy to create applications.
  27. What does the browser do?
  28. Aggregating measures.
  30. The aggregation browser has to have a concrete backend implementation.
  31. Plus a bunch of other things.
  32. Context.
  33. Before I talk about the aggregation browser, I have to introduce a cell.
  36. Our filter/selection defines the cell; it is a kind of multidimensional "breadcrumbs".
  37. Path: taken from file-system terminology for easier understanding. The path elements are keys; note that what is displayed is the level label, not the key.
  38. ...let's put it into a picture.
  40. The "aggregation result" was designed to follow the usual look of a report.
  41. FIXME: add picture
  42. You can specify multiple dimensions and an explicit level to drill down to (for example, the "month" level of a date dimension).
  43. It provides a list of records represented as dictionaries; you have to find out which field is the level attribute and which is the key.
  44. No need to find the context of the dimension of interest; if that is not sufficient, one can still fall back to the manual method.
  46. facts: get details. values: can be used to create selection boxes; a level can also be specified. cell_details: used for creating the multidimensional breadcrumbs mentioned before; it contains data that describes the current context of interest in human-readable form. Ordering and pagination are supported.
  47. What was that "cell" thing?
  49. Also show the hierarchy.
  52. Same drill-down, different cell.
  53. Implicit: raises an error if the current level is the last one. Example: you are exploring the year 2010 (cell) and would like to see a split by year (a higher level).
  62. Just to name a few...
  72. Powered by SQLAlchemy.
  73. Powered by a great abstraction framework for constructing SQL statements.
  77. Denormalized.
  78. Thanks to the new browser and browsing context, it is possible to switch transparently between the original snowflake and a generated denormalized view (which can be materialized and indexed by dimension level keys).
  79. In which table and which column is the attribute?
  84. Anyone who would like to contribute their skills is more than welcome, and I will help.
  85. So if you have an open-source app, such as a Django app used by many people, you can publish its reporting model for others. Put your cube in the wiki.
  87. MIT license