Amu Prabhjot Singh 10BM60011
 Divya Hamirwasia 10BM60025
• An interactive data transformation tool developed by the Stanford Visualization Group
• Lets users manipulate data directly in a visual interface
• Automatically suggests relevant transformations
• Used for tasks such as reformatting data values and formats, integrating data from multiple sources, and handling missing values
• Significantly reduces the time needed to specify transformations
• When the user selects data, Wrangler suggests applicable transformations based on the current context of interaction
• Wrangler uses a model to enumerate and rank the possible transformations
• The model combines the user's input with the diversity, frequency, and specification difficulty of the applicable transform types
• Wrangler gives a short natural-language description of each suggested transform and a visual preview of its result
• This helps analysts quickly assess which transforms are viable
• Wrangler's interactive history viewer records and displays the sequence of transforms applied to the data set, making them easier to reuse
• Wrangler scripts can be run in a web browser as JavaScript or exported as Python code
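The enumerate-and-rank step described above can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not Wrangler's actual model: the `Candidate` fields, the frequency-over-difficulty score, the one-per-type diversity pass, and the example descriptions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    transform_type: str  # e.g. "split", "extract", "delete" (hypothetical labels)
    description: str     # short natural-language summary shown to the user
    frequency: int       # hypothetical: how often this type appears in a usage corpus
    difficulty: int      # hypothetical: how hard the transform is to specify by hand


def score(c: Candidate) -> float:
    # Assumed heuristic: favour commonly used types, penalise hard-to-specify ones.
    return c.frequency / (1 + c.difficulty)


def rank(candidates: list[Candidate], k: int = 5) -> list[Candidate]:
    ordered = sorted(candidates, key=score, reverse=True)
    # Diversity pass: the best candidate of each type goes first, the rest follow.
    seen: set[str] = set()
    diverse: list[Candidate] = []
    rest: list[Candidate] = []
    for c in ordered:
        if c.transform_type in seen:
            rest.append(c)
        else:
            seen.add(c.transform_type)
            diverse.append(c)
    return (diverse + rest)[:k]


cands = [
    Candidate("split", "Split data repeatedly on ','", 40, 1),
    Candidate("split", "Split data on ' '", 40, 2),
    Candidate("extract", "Extract from data after ':'", 25, 1),
    Candidate("delete", "Delete rows where data is empty", 30, 0),
]
for c in rank(cands):
    print(f"{score(c):6.2f}  {c.description}")
```

Here the empty-row deletion ranks first because it is both common and easy to specify, and the second split suggestion is pushed below the extract suggestion by the diversity pass even though its raw score is higher.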
• Wrangler is built on an underlying declarative data transformation language
• The language consists of 8 classes of transformations:
    ◦ Map
        - One-to-zero
        - One-to-one
        - One-to-many
    ◦ Lookups and joins
    ◦ Reshape
        - Fold
        - Unfold
    ◦ Positional
        - Fill
        - Lag
    ◦ Sorting
    ◦ Aggregation
    ◦ Key generation
    ◦ Schema transforms
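The Reshape and Positional classes are the least self-explanatory of the eight. The plain-Python sketch below shows what Fold, Unfold, Fill and Lag do to a table held as a list of dicts; the function names, the `key`/`value` output columns, the fill-down direction and the toy values are assumptions for illustration, not Wrangler's own API.

```python
from typing import Any, Optional

Row = dict[str, Any]


def fold(rows: list[Row], id_cols: list[str], fold_cols: list[str]) -> list[Row]:
    """Reshape wide -> long: one output row per (input row, folded column)."""
    out: list[Row] = []
    for r in rows:
        for col in fold_cols:
            new = {k: r[k] for k in id_cols}
            new["key"] = col        # the former column name
            new["value"] = r[col]   # the cell value under that column
            out.append(new)
    return out


def unfold(rows: list[Row], id_cols: list[str],
           key_col: str = "key", value_col: str = "value") -> list[Row]:
    """Reshape long -> wide: each distinct key becomes a new column."""
    grouped: dict[tuple, Row] = {}
    for r in rows:
        ident = tuple(r[k] for k in id_cols)
        target = grouped.setdefault(ident, {k: r[k] for k in id_cols})
        target[r[key_col]] = r[value_col]
    return list(grouped.values())


def fill_down(values: list[Optional[Any]]) -> list[Optional[Any]]:
    """Positional Fill: replace empty entries with the nearest non-empty value above."""
    out: list[Optional[Any]] = []
    last = None
    for v in values:
        if v is None or v == "":
            out.append(last)
        else:
            last = v
            out.append(v)
    return out


def lag(values: list[Any], n: int = 1, default: Any = None) -> list[Any]:
    """Positional Lag: each entry takes the value n rows earlier (n >= 0)."""
    n = min(n, len(values))
    return [default] * n + values[: len(values) - n]


# Toy wide table: one row per state, one column per year.
wide = [{"state": "A", "2004": 1, "2005": 2}]
long_rows = fold(wide, ["state"], ["2004", "2005"])
print(long_rows)
print(unfold(long_rows, ["state"]))
print(fill_down(["A", "", None, "B", ""]))
print(lag([1, 2, 3]))
```

In this sketch Fold and Unfold are inverses: folding the wide table and unfolding the result recovers the original rows.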
• This example data set is available with Wrangler:
    ◦ House crime data from the U.S. Bureau of Justice Statistics
    ◦ CSV format
[Figure: Wrangler architecture. Inputs: user interactions, the current working transform, data descriptions, and a corpus of historical usage statistics. Wrangler uses them to infer transform parameters, generate candidate transforms, and rank the results.]
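The "inferring transform parameters" stage in the figure can be pictured as turning a user's text selection into concrete transform parameters. The function below is a hypothetical sketch: its heuristics (a lone punctuation mark is treated as a delimiter, a pattern is kept only if its first match reproduces the selection) are assumptions, and the candidate descriptions only imitate the style of Wrangler's natural-language suggestions.

```python
import re


def infer_candidates(cell: str, start: int, end: int) -> list[str]:
    """Hypothetical sketch: map a text selection cell[start:end] to candidate transforms."""
    selected = cell[start:end]
    # A single punctuation character is most plausibly a delimiter.
    if len(selected) == 1 and not selected.isalnum() and not selected.isspace():
        return [f"Split column on {selected!r}", f"Cut {selected!r} from column"]

    candidates: list[str] = []
    # If punctuation (ignoring spaces) precedes the selection, offer "extract after" it.
    prefix = cell[:start].rstrip()
    if prefix and not prefix[-1].isalnum():
        candidates.append(f"Extract text after {prefix[-1]!r}")
    # Generalise the selection to a pattern, keeping it only if the pattern's
    # first match in the cell reproduces exactly what the user selected.
    for pattern in (r"\d+", r"[A-Za-z]+"):
        m = re.search(pattern, cell)
        if m and m.span() == (start, end):
            candidates.append(f"Extract first match of /{pattern}/")
    # Fall back to purely positional parameters.
    candidates.append(f"Extract characters {start}-{end}")
    return candidates


print(infer_candidates("Year: 2004", 6, 10))
print(infer_candidates("a,b,c", 1, 2))
```

For the cell `Year: 2004` with characters 6-10 selected, the sketch proposes extracting after the colon, extracting the first run of digits, or extracting by position; the ranking stage would then order these candidates.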
• GETTING STARTED
    ◦ Browser-based tool: http://vis.stanford.edu/wrangler/
• DATA ENTRY
    ◦ Copy and paste the data to be wrangled into the input window
    ◦ Input formats: CSV, TSV, and manual entry
• TRANSFORMS
    ◦ Cut, Delete, Drop, Edit, Extract, Fill, Fold, Merge, Promote, Split, Translate, Transpose, Unfold
• OUTPUT: two types
    ◦ Data: CSV, TSV, row-oriented JSON, column-oriented JSON, lookup tables
    ◦ Script: Python, JavaScript
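The transform, script and output ideas above can be combined in one small sketch: a recorded list of steps is replayed on raw CSV text, and the result is exported as row-oriented and column-oriented JSON. The step functions, the `SCRIPT` list and the toy input are hypothetical; this is not the script format Wrangler actually exports.

```python
import csv
import io
import json
from typing import Callable

Table = list[list[str]]


def delete_empty_rows(rows: Table) -> Table:
    # "Delete"-style step: drop rows in which every cell is blank.
    return [r for r in rows if any(cell.strip() for cell in r)]


def fill_column(col: int) -> Callable[[Table], Table]:
    # "Fill"-style step: blank cells in column `col` take the value above them.
    def step(rows: Table) -> Table:
        out: Table = []
        last = ""
        for row in rows:
            row = list(row)  # copy so the input table is not mutated
            if row[col].strip():
                last = row[col]
            else:
                row[col] = last
            out.append(row)
        return out
    return step


# Hypothetical recorded history: an ordered list of steps that can be replayed.
SCRIPT: list[Callable[[Table], Table]] = [delete_empty_rows, fill_column(0)]


def run_script(text: str) -> Table:
    rows = list(csv.reader(io.StringIO(text)))
    for step in SCRIPT:
        rows = step(rows)
    return rows


def to_row_json(rows: Table) -> str:
    # Row-oriented: one object per record; the first row supplies the keys.
    header, *body = rows
    return json.dumps([dict(zip(header, r)) for r in body])


def to_column_json(rows: Table) -> str:
    # Column-oriented: one array of values per column name.
    header, *body = rows
    return json.dumps({name: [r[i] for r in body] for i, name in enumerate(header)})


# Toy input with a blank cell to fill and an empty line to delete.
raw = "state,year,count\nA,2004,1\n,2005,2\n\nB,2004,3\n"
table = run_script(raw)
print(to_row_json(table))
print(to_column_json(table))
```

Replaying `SCRIPT` on another file of the same shape applies the same cleanup, which is the kind of reuse the history viewer is meant to facilitate. Row-oriented JSON holds one object per record, while column-oriented JSON holds one array per column.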
• Speeds up data manipulation
• Lets managers spend more time analyzing and learning from their data instead of just rearranging it
• Allows interactive transformation of messy, real-world data and exports it for use in Excel, R, Tableau, Protovis, etc.
• LIMITATION: data sets with more than 40 columns and 1,000 rows cannot be wrangled

DataWrangler @VGSOM
