4. How do you know that the data you are looking at is the data you are looking at?
5. How fast can a domain expert (very likely a technical layperson) get understandable answers to their questions when your data is stored in so many ways and places?
6. How do you embrace the unknown and change?
Requirements (questions) change, labelling (naming things) changes, source systems (structures) change, …
[Figure: categories A, B and C in 2014 and in 2015]
12. Data Processing, Data Use
[Diagram labels: Governance; Technologies; Quality Management; ETL Management; Extraction; Discovery and Acquisition; Cleansing and Conforming; Analytical Modeling; Presentation, Analysis; Language; Master Data Management; sources; Storage Infrastructure; …]
15. Transformations
■ how to make data understandable to the users?
■ mappings – structures and mechanisms
■ how to handle manual corrections?
■ metadata based transformations
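The bullets above can be illustrated with a minimal sketch of a metadata-based transformation, in which the mapping is data rather than code and manual corrections live in a separate override table. All field names, the `FIELD_MAPPINGS` and `MANUAL_CORRECTIONS` structures, and the `transform` function are hypothetical and only serve the example; they do not come from any particular tool.

```python
# Hypothetical mapping metadata: each entry names a source field, the
# user-facing target field, and optionally a value mapping or a cast.
FIELD_MAPPINGS = [
    {"source": "cust_nm", "target": "customer_name"},
    {"source": "ctry_cd", "target": "country",
     "values": {"SK": "Slovakia", "HU": "Hungary"}},
    {"source": "amt", "target": "amount", "cast": float},
]

# Hypothetical manual corrections, keyed by (record id, target field).
# Keeping them outside the mappings means a reload never overwrites them.
MANUAL_CORRECTIONS = {
    (2, "country"): "Hungary",
}


def transform(record, mappings=FIELD_MAPPINGS, corrections=MANUAL_CORRECTIONS):
    """Apply the mapping metadata to one record, then manual corrections."""
    result = {"id": record["id"]}
    for mapping in mappings:
        value = record.get(mapping["source"])
        if "values" in mapping:
            # Unknown codes pass through unchanged so they stay visible.
            value = mapping["values"].get(value, value)
        if "cast" in mapping and value is not None:
            value = mapping["cast"](value)
        result[mapping["target"]] = value
    for (record_id, field), corrected in corrections.items():
        if record_id == record["id"]:
            result[field] = corrected
    return result


rows = [
    {"id": 1, "cust_nm": "Acme", "ctry_cd": "SK", "amt": "10.5"},
    {"id": 2, "cust_nm": "Bolt", "ctry_cd": "XX", "amt": "3"},
]
clean = [transform(row) for row in rows]
```

Because the mapping is plain data, renaming a label or adding a code value means editing metadata, not the transformation code.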
16. Multidimensional Modeling
■ why? how can it help?
■ conceptual hierarchies, drill-downs
■ ★ star and ❄ snowflake schemas
logical approach, query generators
■ relational database implementation (SQL)
fact and dimension table modelling, stars in a cluster (as a concept)
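As a rough illustration of the points above, the following sketch builds a tiny star schema in an in-memory SQLite database and uses a small query generator to drill down along a date hierarchy (year, then month). The table names, columns, sample rows and the `sales_by` helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL,
    name        TEXT NOT NULL
);
-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(201401, 2014, 1), (201402, 2014, 2), (201501, 2015, 1)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "A", "widget"), (2, "B", "gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(201401, 1, 10.0), (201402, 1, 5.0),
                  (201402, 2, 7.0), (201501, 2, 20.0)])

DATE_LEVELS = ("year", "month")  # hierarchy, coarsest level first


def sales_by(levels):
    """Generate and run an aggregation query for the given date levels."""
    if any(level not in DATE_LEVELS for level in levels):
        raise ValueError("unknown hierarchy level")
    columns = ", ".join(f"d.{level}" for level in levels)
    sql = (
        f"SELECT {columns}, SUM(f.amount) "
        "FROM fact_sales AS f "
        "JOIN dim_date AS d ON d.date_key = f.date_key "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(sql).fetchall()


by_year = sales_by(["year"])            # top of the hierarchy
by_month = sales_by(["year", "month"])  # drill-down one level
```

The logical model (dimensions, levels, measures) is what the user sees; the SQL is generated from it, so the physical tables can change without changing the questions.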
17. Slowly Changing Dimensions
■ why? what is it?
■ how to load dimensions
UPSERT on steroids
■ what about not-so-slowly changing dimensions?
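One common way to read "UPSERT on steroids" is the Type 2 approach to slowly changing dimensions: instead of overwriting a changed row, the load closes the old version and appends a new one. The slide does not name a specific technique, so the sketch below is only one possible interpretation, kept in memory for brevity. The `load_dimension` function and the column names (`sk`, `valid_from`, `valid_to`) are hypothetical.

```python
from datetime import date


def load_dimension(dimension, incoming, load_date,
                   key="customer_id", tracked=("city",)):
    """Type 2 load: expire changed rows and append new versions."""
    # Index the currently valid version of each business key.
    current = {row[key]: row for row in dimension if row["valid_to"] is None}
    next_sk = max((row["sk"] for row in dimension), default=0) + 1

    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(
            existing[attr] == record[attr] for attr in tracked
        ):
            continue  # unchanged: keep the current version as is
        if existing is not None:
            existing["valid_to"] = load_date  # close the old version
        new_row = {"sk": next_sk, key: record[key],
                   "valid_from": load_date, "valid_to": None}
        new_row.update({attr: record[attr] for attr in tracked})
        dimension.append(new_row)
        current[record[key]] = new_row
        next_sk += 1
    return dimension


dim = []
load_dimension(dim, [{"customer_id": 1, "city": "Bratislava"}],
               date(2014, 1, 1))
load_dimension(dim, [{"customer_id": 1, "city": "Budapest"},
                     {"customer_id": 2, "city": "Vienna"}],
               date(2015, 1, 1))
```

After the second load the dimension holds three rows: the expired Bratislava version, the current Budapest version, and the new Vienna customer. Facts keep pointing at the surrogate key (`sk`) that was valid when they were recorded. For attributes that change often, this approach multiplies rows quickly, which is the concern behind the last bullet.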
25. mETL
MAIN COMPONENTS
Every process starts with a source file from which the data is read.
Sources come in distinct types, each with its own settings.
Once the data has been read from the source and the transformations are complete, the finished record is passed to the Target, which creates the output file and writes the final data to it.
github.com/ceumicrodata/mETL
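The source, transformation and target flow described above can be sketched in plain Python. This is not mETL's API or configuration format; the function names `read_source`, `strip_and_title` and `write_target` are hypothetical and only show the shape of the pipeline, using in-memory CSV text in place of real files.

```python
import csv
import io


def read_source(text):
    """Source: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def strip_and_title(record):
    """Transformation: trim whitespace and normalise capitalisation."""
    return {field: value.strip().title() for field, value in record.items()}


def write_target(records, fieldnames):
    """Target: write the finished records as CSV and return the text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return out.getvalue()


SOURCE_TEXT = "name,city\n alice ,budapest\nbob, vienna \n"

# Records stream through one at a time: source -> transformation -> target.
records = (strip_and_title(r) for r in read_source(SOURCE_TEXT))
result = write_target(records, ["name", "city"])
```

Keeping the three stages separate means a different source type or target type can be swapped in without touching the transformations.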
34. The New Data Brewery
■ projects helping to solve data warehouse problems
■ curated and incubated catalogue
■ community
knowledge sharing, tool development, professional help
■ complementary to the NumPy/SciPy ecosystem
not competitive – different purpose
36. Topics Summary
■ categorical data
■ multi-dimensional modeling
star and snowflake schemas, metadata, dimension modelling, slowly changing dimensions
■ data quality management
approaches, data quality indicators
■ master data management
concepts and their implementation
37. Meetup
■ meet experts in the domain
■ share stories, ask for stories
■ discuss a problem, get feedback
■ get questions answered