SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
Získáváme, čistíme a
ukládáme data
Digital Humanities, Lekce druhá
Josef Šlerka, Studia nových médií, 15. 10. 2012
ETL (light verze)
Extracting data from outside sources
Transforming it to fit operational needs (which can
include quality levels)
Loading it into the end target (database, more
specifically, operational data store, data mart or data
warehouse)
(viz Wikipedie)
Real-life podle Wiki
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for
data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
Real-life podle Wiki

7. Audit reports (for example, on compliance with
business rules. Also, in case of failure, helps to
diagnose/repair)
8. Publish (to target tables)
9. Archive
10. Clean up
Extracting
co se vám bude hodit...
Extract
strukturovaná data vs nestrukturovaná
pro DH nejčastěji databáze vs web
web API vs scrapping
lze si vystačit i jen malým znalostmi
statická data vs real-time mohou být zákeřná, ale jde
to řešit
XPATH

XPath, the XML Path Language, is a query language
for selecting nodes from an XML document. In
addition, XPath may be used to compute values (e.g.,
strings, numbers, or Boolean values) from the content
of an XML document. XPath was defined by the World
Wide Web Consortium (W3C)
Jednoduché nástroje
Google Docs (hlavně statická data)
http://drive.google.com
YQL (hlavně statická data)
http://developer.yahoo.com/yql/console/
Yahoo Pipes (hlavně dynamická data)
http://pipes.yahoo.com/pipes/
IFTTT (hlavně dynamická data)
https://ifttt.com/
Ale mocné....


Twitter Archiving Google Spreadsheet TAGS v3
http://mashe.hawksey.info/2012/01/twitter-archive-
tagsv3/
Transforming
Hlavně o čištění a sjednocování dat ...
Google Refine
http://code.google.com/p/google-refine/downloads/list?
can=1
Google Refine is a standalone desktop application
provided by Google for data cleanup and
transformation to other formats. It is similar to
spreadsheet applications (and can work with
spreadsheet file formats), however acts more like
database.
Loading
kam s nimi, když ne do tradiční databáze...
Google Fusion Tables
jednoduché řešení pro běžné uživatele
http://www.google.com/fusiontables/Home/
Web service provided by Google for data
management. Data is stored in multiple tables that
Internet users can view and download. The Web
service provides means for visualizing data with pie
charts, bar charts, lineplots, scatterplots, timelines as
well as geographical maps. Data is exported in a
comma-separated values file format.
A teď ještě jedno
demo....

Weitere ähnliche Inhalte

Was ist angesagt?

Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaObjectRocket
 
Ehr.care system introduction
Ehr.care system introduction Ehr.care system introduction
Ehr.care system introduction xudong_lu
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Andy Petrella
 
Packages for data wrangling データ前処理のためのパッケージ
Packages for data wrangling データ前処理のためのパッケージPackages for data wrangling データ前処理のためのパッケージ
Packages for data wrangling データ前処理のためのパッケージHiroki K
 
Preparing for BIT – IT2301 Database Management Systems 2001e
Preparing for BIT – IT2301 Database Management Systems 2001ePreparing for BIT – IT2301 Database Management Systems 2001e
Preparing for BIT – IT2301 Database Management Systems 2001eGihan Wikramanayake
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebookAshwini Mathur
 

Was ist angesagt? (6)

Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and Kibana
 
Ehr.care system introduction
Ehr.care system introduction Ehr.care system introduction
Ehr.care system introduction
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
 
Packages for data wrangling データ前処理のためのパッケージ
Packages for data wrangling データ前処理のためのパッケージPackages for data wrangling データ前処理のためのパッケージ
Packages for data wrangling データ前処理のためのパッケージ
 
Preparing for BIT – IT2301 Database Management Systems 2001e
Preparing for BIT – IT2301 Database Management Systems 2001ePreparing for BIT – IT2301 Database Management Systems 2001e
Preparing for BIT – IT2301 Database Management Systems 2001e
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebook
 

Andere mochten auch

Věštění (s) Wikipedií
Věštění (s) WikipediíVěštění (s) Wikipedií
Věštění (s) WikipediíJosef Šlerka
 
Několik čísel na téma děti (?) a internet (?)
Několik čísel na téma děti (?) a internet (?)Několik čísel na téma děti (?) a internet (?)
Několik čísel na téma děti (?) a internet (?)Josef Šlerka
 
Hranice se stírají
Hranice se stírajíHranice se stírají
Hranice se stírajíJosef Šlerka
 
Strojová cesta do zákazníkovy duše
Strojová cesta do zákazníkovy dušeStrojová cesta do zákazníkovy duše
Strojová cesta do zákazníkovy dušeJosef Šlerka
 
Český a slovenský Twitter pod lupou
Český a slovenský Twitter pod lupouČeský a slovenský Twitter pod lupou
Český a slovenský Twitter pod lupouJosef Šlerka
 
Každý bude jiný... Bohužel...
Každý bude jiný... Bohužel...Každý bude jiný... Bohužel...
Každý bude jiný... Bohužel...Josef Šlerka
 
The Art of Trolling 2.0 For Dummies
The Art of Trolling 2.0 For DummiesThe Art of Trolling 2.0 For Dummies
The Art of Trolling 2.0 For DummiesJosef Šlerka
 
Ways to understand fans - social network analysis
Ways to understand fans - social network analysisWays to understand fans - social network analysis
Ways to understand fans - social network analysisJosef Šlerka
 
The Art of Trolling 2.0
The Art of Trolling 2.0The Art of Trolling 2.0
The Art of Trolling 2.0Josef Šlerka
 
All about Facebook? All about you!
All about Facebook? All about you!All about Facebook? All about you!
All about Facebook? All about you!Josef Šlerka
 
Malý velký svět bublin na Facebooku
Malý velký svět bublin na FacebookuMalý velký svět bublin na Facebooku
Malý velký svět bublin na FacebookuJosef Šlerka
 
Úvod do studia nových médií
Úvod do studia nových médiíÚvod do studia nových médií
Úvod do studia nových médiíJosef Šlerka
 
Některé obecně rozšířené mýty o Facebooku
Některé obecně rozšířené  mýty o FacebookuNěkteré obecně rozšířené  mýty o Facebooku
Některé obecně rozšířené mýty o FacebookuJosef Šlerka
 

Andere mochten auch (18)

Věštění (s) Wikipedií
Věštění (s) WikipediíVěštění (s) Wikipedií
Věštění (s) Wikipedií
 
Social Insider
Social InsiderSocial Insider
Social Insider
 
Několik čísel na téma děti (?) a internet (?)
Několik čísel na téma děti (?) a internet (?)Několik čísel na téma děti (?) a internet (?)
Několik čísel na téma děti (?) a internet (?)
 
Internet of things
Internet of thingsInternet of things
Internet of things
 
Hranice se stírají
Hranice se stírajíHranice se stírají
Hranice se stírají
 
Strojová cesta do zákazníkovy duše
Strojová cesta do zákazníkovy dušeStrojová cesta do zákazníkovy duše
Strojová cesta do zákazníkovy duše
 
Český a slovenský Twitter pod lupou
Český a slovenský Twitter pod lupouČeský a slovenský Twitter pod lupou
Český a slovenský Twitter pod lupou
 
Last.fm
Last.fmLast.fm
Last.fm
 
Každý bude jiný... Bohužel...
Každý bude jiný... Bohužel...Každý bude jiný... Bohužel...
Každý bude jiný... Bohužel...
 
The Art of Trolling 2.0 For Dummies
The Art of Trolling 2.0 For DummiesThe Art of Trolling 2.0 For Dummies
The Art of Trolling 2.0 For Dummies
 
Ways to understand fans - social network analysis
Ways to understand fans - social network analysisWays to understand fans - social network analysis
Ways to understand fans - social network analysis
 
Shall we dance
Shall we danceShall we dance
Shall we dance
 
The Art of Trolling 2.0
The Art of Trolling 2.0The Art of Trolling 2.0
The Art of Trolling 2.0
 
All about Facebook? All about you!
All about Facebook? All about you!All about Facebook? All about you!
All about Facebook? All about you!
 
Just metadata
Just metadataJust metadata
Just metadata
 
Malý velký svět bublin na Facebooku
Malý velký svět bublin na FacebookuMalý velký svět bublin na Facebooku
Malý velký svět bublin na Facebooku
 
Úvod do studia nových médií
Úvod do studia nových médiíÚvod do studia nových médií
Úvod do studia nových médií
 
Některé obecně rozšířené mýty o Facebooku
Některé obecně rozšířené  mýty o FacebookuNěkteré obecně rozšířené  mýty o Facebooku
Některé obecně rozšířené mýty o Facebooku
 

Ähnlich wie Získáváme, čistíme a ukládáme data

Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Mahmoud Sabri
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01David Smiley
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita DubeyAnkita Dubey
 
Real time data processing frameworks
Real time data processing frameworksReal time data processing frameworks
Real time data processing frameworksIJDKP
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
XML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBXML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBMarco Gralike
 
From Data Hell to Bliss: Getting the Most Out of Your Acumatica Data
From Data Hell to Bliss: Getting the Most Out of Your Acumatica DataFrom Data Hell to Bliss: Getting the Most Out of Your Acumatica Data
From Data Hell to Bliss: Getting the Most Out of Your Acumatica DataTim Rodman, (CPA-Inactive)
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolThierry Badard
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction ProcessingStefanie Yang
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesVasu S
 
Etl design document
Etl design documentEtl design document
Etl design documentsgyazuddin
 
Hadoop and IoT Sinergija 2014
Hadoop and IoT Sinergija 2014Hadoop and IoT Sinergija 2014
Hadoop and IoT Sinergija 2014Darko Marjanovic
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...Trieu Nguyen
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabeDataiku
 

Ähnlich wie Získáváme, čistíme a ukládáme data (20)

Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?
 
Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01
 
ETL DW-RealTime
ETL DW-RealTimeETL DW-RealTime
ETL DW-RealTime
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita Dubey
 
Real time data processing frameworks
Real time data processing frameworksReal time data processing frameworks
Real time data processing frameworks
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
XML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBXML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDB
 
From Data Hell to Bliss: Getting the Most Out of Your Acumatica Data
From Data Hell to Bliss: Getting the Most Out of Your Acumatica DataFrom Data Hell to Bliss: Getting the Most Out of Your Acumatica Data
From Data Hell to Bliss: Getting the Most Out of Your Acumatica Data
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL tool
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction Processing
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
 
Etl design document
Etl design documentEtl design document
Etl design document
 
ETL
ETL ETL
ETL
 
notes
notesnotes
notes
 
Lighty
LightyLighty
Lighty
 
Hadoop and IoT Sinergija 2014
Hadoop and IoT Sinergija 2014Hadoop and IoT Sinergija 2014
Hadoop and IoT Sinergija 2014
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabe
 

Mehr von Josef Šlerka

Znaky, znaky, modely
Znaky, znaky, modelyZnaky, znaky, modely
Znaky, znaky, modelyJosef Šlerka
 
LLM a mixed methods v humanitních vědách
LLM a mixed methods v humanitních vědáchLLM a mixed methods v humanitních vědách
LLM a mixed methods v humanitních vědáchJosef Šlerka
 
Vliv AI na mediální trh
Vliv AI na mediální trhVliv AI na mediální trh
Vliv AI na mediální trhJosef Šlerka
 
Informační věda - Pravděpodobnosti
Informační věda - PravděpodobnostiInformační věda - Pravděpodobnosti
Informační věda - PravděpodobnostiJosef Šlerka
 
Informacni veda: Pocitace
Informacni veda: PocitaceInformacni veda: Pocitace
Informacni veda: PocitaceJosef Šlerka
 
Inforamační věda: Algoritmus
Inforamační věda: AlgoritmusInforamační věda: Algoritmus
Inforamační věda: AlgoritmusJosef Šlerka
 
Co je to datova novinarina
Co je to datova novinarinaCo je to datova novinarina
Co je to datova novinarinaJosef Šlerka
 
Algoritmy a sociální sítě - stručný úvod
Algoritmy a sociální sítě - stručný úvodAlgoritmy a sociální sítě - stručný úvod
Algoritmy a sociální sítě - stručný úvodJosef Šlerka
 
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...Josef Šlerka
 
Dezinformační weby a zpravodajství v ČR
Dezinformační weby a zpravodajství v ČRDezinformační weby a zpravodajství v ČR
Dezinformační weby a zpravodajství v ČRJosef Šlerka
 
INFOWAR IN CZECH REPUBLIC
INFOWAR IN CZECH REPUBLICINFOWAR IN CZECH REPUBLIC
INFOWAR IN CZECH REPUBLICJosef Šlerka
 
Česká média dnes aneb Pokus o kontext k aktuální debatě
Česká média dnes aneb Pokus o kontext k aktuální debatěČeská média dnes aneb Pokus o kontext k aktuální debatě
Česká média dnes aneb Pokus o kontext k aktuální debatěJosef Šlerka
 
Svět viděný cizíma očima
Svět viděný cizíma očimaSvět viděný cizíma očima
Svět viděný cizíma očimaJosef Šlerka
 
Do Birds of a Feather Flock Together?
Do Birds of a Feather Flock Together?Do Birds of a Feather Flock Together?
Do Birds of a Feather Flock Together?Josef Šlerka
 
Projekt Navigátor - datová část
Projekt Navigátor - datová částProjekt Navigátor - datová část
Projekt Navigátor - datová částJosef Šlerka
 
Stručná zpráva o jednom experimentu
Stručná zpráva o jednom experimentuStručná zpráva o jednom experimentu
Stručná zpráva o jednom experimentuJosef Šlerka
 
Wikipedie ve službách zla?!
Wikipedie ve službách zla?!Wikipedie ve službách zla?!
Wikipedie ve službách zla?!Josef Šlerka
 

Mehr von Josef Šlerka (20)

Znaky, znaky, modely
Znaky, znaky, modelyZnaky, znaky, modely
Znaky, znaky, modely
 
LLM a mixed methods v humanitních vědách
LLM a mixed methods v humanitních vědáchLLM a mixed methods v humanitních vědách
LLM a mixed methods v humanitních vědách
 
Vliv AI na mediální trh
Vliv AI na mediální trhVliv AI na mediální trh
Vliv AI na mediální trh
 
Informační věda - Pravděpodobnosti
Informační věda - PravděpodobnostiInformační věda - Pravděpodobnosti
Informační věda - Pravděpodobnosti
 
Informacni veda: Pocitace
Informacni veda: PocitaceInformacni veda: Pocitace
Informacni veda: Pocitace
 
Inforamační věda: Algoritmus
Inforamační věda: AlgoritmusInforamační věda: Algoritmus
Inforamační věda: Algoritmus
 
Co je to datova novinarina
Co je to datova novinarinaCo je to datova novinarina
Co je to datova novinarina
 
Algoritmy a sociální sítě - stručný úvod
Algoritmy a sociální sítě - stručný úvodAlgoritmy a sociální sítě - stručný úvod
Algoritmy a sociální sítě - stručný úvod
 
Atlas konspirací
Atlas konspiracíAtlas konspirací
Atlas konspirací
 
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...
Parallel Polis Revisited: Way from concept of Parallel Polis to Distributed R...
 
Dezinformační weby a zpravodajství v ČR
Dezinformační weby a zpravodajství v ČRDezinformační weby a zpravodajství v ČR
Dezinformační weby a zpravodajství v ČR
 
INFOWAR IN CZECH REPUBLIC
INFOWAR IN CZECH REPUBLICINFOWAR IN CZECH REPUBLIC
INFOWAR IN CZECH REPUBLIC
 
Česká média dnes aneb Pokus o kontext k aktuální debatě
Česká média dnes aneb Pokus o kontext k aktuální debatěČeská média dnes aneb Pokus o kontext k aktuální debatě
Česká média dnes aneb Pokus o kontext k aktuální debatě
 
Svět viděný cizíma očima
Svět viděný cizíma očimaSvět viděný cizíma očima
Svět viděný cizíma očima
 
Do Birds of a Feather Flock Together?
Do Birds of a Feather Flock Together?Do Birds of a Feather Flock Together?
Do Birds of a Feather Flock Together?
 
Projekt Navigátor - datová část
Projekt Navigátor - datová částProjekt Navigátor - datová část
Projekt Navigátor - datová část
 
AI a žurnalistika
AI a žurnalistikaAI a žurnalistika
AI a žurnalistika
 
Stručná zpráva o jednom experimentu
Stručná zpráva o jednom experimentuStručná zpráva o jednom experimentu
Stručná zpráva o jednom experimentu
 
Volba a metoda
Volba a metodaVolba a metoda
Volba a metoda
 
Wikipedie ve službách zla?!
Wikipedie ve službách zla?!Wikipedie ve službách zla?!
Wikipedie ve službách zla?!
 

Kürzlich hochgeladen

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Kürzlich hochgeladen (20)

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Získáváme, čistíme a ukládáme data

  • 1. Získáváme, čistíme a ukládáme data Digital Humanities, Lekce druhá Josef Šlerka, Studia nových médií, 15. 10. 2012
  • 2. ETL (light verze) Extracting data from outside sources Transforming it to fit operational needs (which can include quality levels) Loading it into the end target (database, more specifically, operational data store, data mart or data warehouse) (viz Wikipedie)
  • 3. Real-life podle Wiki 1. Cycle initiation 2. Build reference data 3. Extract (from sources) 4. Validate 5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates) 6. Stage (load into staging tables, if used)
  • 4. Real-life podle Wiki 7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair) 8. Publish (to target tables) 9. Archive 10. Clean up
  • 5. Extracting co se vám bude hodit...
  • 6. Extract strukturovaná data vs nestrukturovaná pro DH nejčastěji databáze vs web web API vs scrapping lze si vystačit i jen malým znalostmi statická data vs real-time mohou být zákeřná, ale jde to řešit
  • 7. XPATH XPath, the XML Path Language, is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C)
  • 8. Jednoduché nástroje Google Docs (hlavně statická data) http://drive.google.com YQL (hlavně statická data) http://developer.yahoo.com/yql/console/ Yahoo Pipes (hlavně dynamická data) http://pipes.yahoo.com/pipes/ IFTTT (hlavně dynamická data) https://ifttt.com/
  • 9. Ale mocné.... Twitter Archiving Google Spreadsheet TAGS v3 http://mashe.hawksey.info/2012/01/twitter-archive- tagsv3/
  • 10. Transforming Hlavně o čištění a sjednocování dat ...
  • 11. Google Refine http://code.google.com/p/google-refine/downloads/list? can=1 Google Refine is a standalone desktop application provided by Google for data cleanup and transformation to other formats. It is similar to spreadsheet applications (and can work with spreadsheet file formats), however acts more like database.
  • 12. Loading kam s nimi, když ne do tradiční databáze...
  • 13. Google Fusion Tables jednoduché řešení pro běžné uživatele http://www.google.com/fusiontables/Home/ Web service provided by Google for data management. Data is stored in multiple tables that Internet users can view and download. The Web service provides means for visualizing data with pie charts, bar charts, lineplots, scatterplots, timelines as well as geographical maps. Data is exported in a comma-separated values file format.
  • 14. A teď ještě jedno demo....