Setup data for fast, error-free analysis in Q
QRESEARCHSOFTWARE.COM
WEBINAR
Other resources for learning about data cleaning
Online Training (within Q: Help > Online Training)
Lots of resources on the wiki (wiki.q-researchsoftware.com):
• Technical detail about how to clean (use Search)
• Video library (more about this coming soon)
We will send you a detailed eBook on data tidying and cleaning
in the next couple of weeks.
support@q-researchsoftware.com (Help > Email Support)
Conceptualizing the data analysis workflow
Extracting metadata-rich source data -> Data shaping -> Tidy raw data set(s) -> Data cleaning cycle -> Clean data
Housekeeping -> Imputation -> Weighting -> Transformation -> Analysis -> Reporting -> Production
Metadata is the key to understanding the data
Metadata
• ID: Each organization has one value on this variable and no other organizations have the same value.
• Industry: The industry classification of the firm.
• Shop: Agree (A) or disagree (D) that “It is important to shop around”
• Understand: Agree (A) or disagree (D) that “I understand my company's communication needs”
• Key: Agree (A) or disagree (D) that “Communications technology is key to our business”
• Interested: Agree (A) or disagree (D) that “I am interested in communications technology”
• Value: Agree (A) or disagree (D) that “Value for money is more important to us than getting the best technology”
• Profit ($): An estimate of the gross profit provided by each firm to the industry (excluding fixed costs). Constructed from a series of survey questions about the types of products held, usage levels and bill payments.
• # Employees: Number of employees of the business

Data
ID  Industry                      Shop  Understand  Key  Interest  Value  Profit ($)  # Employees
1   Retail Trade                  A     A           A    A         D      9777.47     12
2   Retail Trade                  A     A           A    D         A      3595.79     12
3   Cult. and Rec. Services       A     A           A    A         D      2660.15     20
4   Retail Trade                  A     A           D    A         A      2303.08     30
5   Manufacturing                 A     D           A    D         D      644.57      6
6   Mining                        D     A           A    A         D      3517.85     99
7   Agr., Forest. & Fishing       A     D           A    D         D      6905.25     8
8   Retail Trade                  D     D           A    A         D      9916.39     60
9   Health & Community Services   A     A           A    A         A      1855.43     56
10  Property & Business Services  A     A           A    A         D      765.10      4
11  Communication Services        D     A           D    D         A      838.13      1
12  Manufacturing                 A     A           A    A         A      2303.08     30
13  Manufacturing                 D     D           D    D         D      2151.92     7
14  Manufacturing                 A     A           A    A         D      1263.65     1
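The split between data and metadata on this slide can be sketched in a few lines of Python. This is an illustration only, not how Q stores metadata: the dictionary names and the `describe` helper are hypothetical, and the two rows are IDs 1 and 6 from the slide's data.

```python
# Metadata: what each variable and each code means (hypothetical structure).
VARIABLE_LABELS = {
    "Shop": "It is important to shop around",
    "Key": "Communications technology is key to our business",
}
VALUE_LABELS = {"A": "Agree", "D": "Disagree"}

# Data: the raw values for IDs 1 and 6 on the slide.
ROWS = [
    {"ID": 1, "Industry": "Retail Trade", "Shop": "A", "Key": "A"},
    {"ID": 6, "Industry": "Mining", "Shop": "D", "Key": "A"},
]


def describe(row, variable):
    """Turn a raw code into a readable statement using the metadata."""
    return f"{VARIABLE_LABELS[variable]}: {VALUE_LABELS[row[variable]]}"


for row in ROWS:
    print(row["ID"], describe(row, "Shop"))
```

Without the two label dictionaries, the value "D" in the Shop column would be uninterpretable, which is the point the slide makes.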
Key bits of metadata in Q
Variable labels
Value labels
Multiple response set information
Missing data
Unique identifiers
Why it matters:
• Faster project setup
• Reduces the risk of errors
• Reduces the time to report data
• Helps spot changes in definitions
Extracting metadata-rich source data
“Good” data file types versus metadata-“poor” data. File types shown on the slide:
• Excel files (.xls or .xlsx)
• SPSS format (.sav)
• Triple S (.sss)
• SPSS Dimensions (.mdd)
• SQL databases
• Text format (.txt, .tab, .tsv)
• CSV files (.csv)
Search wiki: Setting Up Files With No Metadata & Excel and CSV Data File Specifications
Search wiki: SPSS Data File Specifications
The first aim: Getting a tidy raw data set into your project
Right shape versus wrong shape. The right shape has:
• Rows & columns
• Row = unit of analysis
• Column = variable
• Column has a name
Ideally with:
• Unique identifier
• Associated metadata
Data shaping
Tidy raw data set(s)
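As a rough illustration of the "right shape" checklist, the sketch below checks that every row has the same named columns and that the identifier is unique. It is a minimal Python sketch under stated assumptions: `check_tidy` and the sample rows are hypothetical and are not part of Q.

```python
def check_tidy(rows, id_column):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not rows:
        return problems
    expected_columns = set(rows[0])
    seen_ids = set()
    for position, row in enumerate(rows, start=1):
        # Every row (unit of analysis) should carry the same named columns.
        if set(row) != expected_columns:
            problems.append(f"row {position}: columns differ from row 1")
        identifier = row.get(id_column)
        if identifier is None:
            problems.append(f"row {position}: missing {id_column}")
        elif identifier in seen_ids:
            problems.append(f"row {position}: duplicate {id_column} {identifier}")
        else:
            seen_ids.add(identifier)
    return problems


TIDY = [
    {"ID": 1, "Industry": "Retail Trade"},
    {"ID": 2, "Industry": "Mining"},
]
UNTIDY = [
    {"ID": 1, "Industry": "Retail Trade"},
    {"ID": 1, "Industry": "Mining"},  # repeated identifier
    {"ID": 3},                        # missing a column
]

print(check_tidy(TIDY, "ID"))
print(check_tidy(UNTIDY, "ID"))
```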
Data set reshaping tools in Q
Each way of reshaping can be done entirely by code (R Data Set), by clicking buttons, or both:
• Aggregation: by code
• Sorting: by code or buttons. Sort columns on the Data tab
• Filtering: by code or buttons. Filter the whole report and/or Data tab
• Deleting: by code or buttons. Delete rows on the Data tab; if required, export as an SPSS data file
• Partitioning (splitting): by code or buttons. Delete rows on the Data tab, then export as an SPSS file. Repeat with different rows deleted.
• Sampling: by code or buttons. Create a filter using random numbers, then see Deleting.
• Stretching: by code
• Stacking: by code or buttons. Tools > Stack SPSS .sav Data File (using Tools > Save Data as SPSS.sav Data File first, if necessary)
• Widening (flattening): by code
• Merging data by case (appending): by code or buttons. Tools > Merge Data Files > Add New cases
• Deduping (deduplicating): by code or buttons. Create a new R variable with the expression duplicated(variableName), then see Deleting
• Merging by variable (augmenting): by code or buttons. Tools > Merge Data Files > Add New Variables
• String splitting: by code or buttons. Create > Variables R Variable
• Creating a Distance Matrix: by code or buttons. Create > Correlations > Distance Matrix
Data shaping
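The deduping entry relies on R's duplicated(), which flags every value that has already appeared earlier in the variable. The same logic can be sketched in Python for readers who want to see what the flag looks like. This is an illustration outside Q; the function and the sample IDs are hypothetical.

```python
def duplicated(values):
    """Flag each value that has already appeared earlier in the list.

    The first occurrence is False; later repeats are True.
    """
    seen = set()
    flags = []
    for value in values:
        flags.append(value in seen)
        seen.add(value)
    return flags


# Hypothetical respondent IDs with two repeats.
ids = [101, 102, 101, 103, 102]
flags = duplicated(ids)
print(flags)

# Keeping only the unflagged rows is the "see Deleting" step.
deduped = [value for value, is_dup in zip(ids, flags) if not is_dup]
print(deduped)
```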
Stacking
From: one row per respondent

    Likelihood to recommend   This brand is fun         This brand is exciting
ID  Apple  Microsoft  IBM     Apple  Microsoft  IBM     Apple  Microsoft  IBM
1   6      9          7       1      0          0       1      1          0
2   8      7          7       1      0          0       1      0          0
3   0      9          8       0      1          0       0      0          0
4   0      0          0       0      0          0       0      0          0

To: one row per brand per respondent

ID  Brand      Likelihood to recommend  This brand is fun  This brand is exciting
1   Apple      6                        1                  1
1   Microsoft  9                        0                  1
1   IBM        7                        0                  0
2   Apple      8                        1                  1
2   Microsoft  7                        0                  0
2   IBM        7                        0                  0
3   Apple      0                        0                  0
3   Microsoft  9                        1                  0
3   IBM        8                        0                  0
4   Apple      0                        0                  0
4   Microsoft  0                        0                  0
4   IBM        0                        0                  0
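The move from one row per respondent to one row per brand per respondent can be sketched as follows. This is a minimal Python sketch rather than Q's Stack tool: the `stack` function and the short measure names are hypothetical, and the column order (recommend, then fun, then exciting, each across Apple, Microsoft, IBM) is an assumption about the slide's layout.

```python
BRANDS = ["Apple", "Microsoft", "IBM"]
MEASURES = ["recommend", "fun", "exciting"]

# Wide data from the slide: ID followed by one block of brand columns per measure.
WIDE = [
    [1, 6, 9, 7, 1, 0, 0, 1, 1, 0],
    [2, 8, 7, 7, 1, 0, 0, 1, 0, 0],
    [3, 0, 9, 8, 0, 1, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def stack(wide_rows, brands, measures):
    """Turn one row per respondent into one row per brand per respondent."""
    stacked = []
    for row in wide_rows:
        respondent_id, values = row[0], row[1:]
        for b, brand in enumerate(brands):
            record = {"ID": respondent_id, "Brand": brand}
            for m, measure in enumerate(measures):
                # Each measure occupies a block of len(brands) columns.
                record[measure] = values[m * len(brands) + b]
            stacked.append(record)
    return stacked


STACKED = stack(WIDE, BRANDS, MEASURES)
for record in STACKED[:3]:
    print(record)
```

Four respondents and three brands give twelve stacked rows, and the ID is repeated on each so the rows can still be traced back to the respondent.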
Widening
Widening, which is also known as flattening, is the reverse of stacking.
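Because widening reverses stacking, a sketch of it simply regroups the per-brand rows by respondent. The Python below is illustrative only: `widen`, the short measure names, and the column order are assumptions, and the input is the stacked form of respondent 1 from the slides.

```python
BRANDS = ["Apple", "Microsoft", "IBM"]
MEASURES = ["recommend", "fun", "exciting"]

# Stacked rows for respondent 1: one row per brand.
STACKED_INPUT = [
    {"ID": 1, "Brand": "Apple", "recommend": 6, "fun": 1, "exciting": 1},
    {"ID": 1, "Brand": "Microsoft", "recommend": 9, "fun": 0, "exciting": 1},
    {"ID": 1, "Brand": "IBM", "recommend": 7, "fun": 0, "exciting": 0},
]


def widen(stacked_rows, brands, measures):
    """Turn one row per brand per respondent into one row per respondent."""
    by_id = {}
    for record in stacked_rows:
        by_id.setdefault(record["ID"], {})[record["Brand"]] = record
    wide = []
    for respondent_id, per_brand in by_id.items():
        row = [respondent_id]
        # Lay the columns out measure by measure, brand by brand.
        for measure in measures:
            for brand in brands:
                row.append(per_brand[brand][measure])
        wide.append(row)
    return wide


WIDENED = widen(STACKED_INPUT, BRANDS, MEASURES)
print(WIDENED)
```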
Conceptualizing the data analysis workflow
1. Wrong Question Type
2. Incorrect Base Size
3. Unusual Values
4. Too-small categories
5. Poor Metadata
6. Multi-variable problems
Importing source data -> Data shaping -> Tidy raw data set(s) -> Data cleaning cycle -> Clean data
The Cleaning Cycle
Dirt
#1 Wrong Question Type
#2 Incorrect Base Size
#3 Unusual Values
#4 Too-small Categories
#5 Poor Metadata
#6 Multi-variable problems
Summary Tables
The Cleaning Cycle
#1 Wrong Question Type
How to detect: Variables and Questions tab; Summary Tables
Cleaning action: Change Question Type & setting
#2 Incorrect Base Size
How to detect: Summary Tables
Cleaning action: Recode; Delete cases; Get new data
#3 Unusual Values
How to detect: Summary Tables
Cleaning action: Recode; Change values of raw data; Delete cases; Back code
#4 Too-small Categories
How to detect: Summary Tables
Cleaning action: Merge
#5 Poor Metadata
How to detect: Summary Tables
Cleaning action: Manually change; Search and replace
#6 Multi-variable problems
How to detect: Crosstabs; Sankey Diagrams; Missing Value Patterns; Flatlining; Validation Rules
Cleaning action: Nets; Recode; Delete cases; Get new data
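Two of the detection steps can be illustrated outside Q with a short Python sketch: a summary table of counts that exposes too-small categories, and a flatlining check across a set of agree/disagree items. The function names and the minimum count of 2 are hypothetical choices; the industry values and the two answer patterns come from the 14-row example data set earlier in the deck.

```python
from collections import Counter

# Industry values for IDs 1 to 14 in the example data set.
INDUSTRY = [
    "Retail Trade", "Retail Trade", "Cult. and Rec. Services", "Retail Trade",
    "Manufacturing", "Mining", "Agr., Forest. & Fishing", "Retail Trade",
    "Health & Community Services", "Property & Business Services",
    "Communication Services", "Manufacturing", "Manufacturing", "Manufacturing",
]


def summary_table(values):
    """Count how many cases fall in each category."""
    return dict(Counter(values))


def small_categories(values, min_count):
    """List categories with fewer than min_count cases (candidates for merging)."""
    counts = Counter(values)
    return sorted(cat for cat, count in counts.items() if count < min_count)


def is_flatlining(answers):
    """True when a respondent gives the same answer to every item."""
    return len(answers) > 1 and len(set(answers)) == 1


SUMMARY = summary_table(INDUSTRY)
SMALL = small_categories(INDUSTRY, min_count=2)  # threshold is an assumption
print(SUMMARY)
print(SMALL)
print(is_flatlining(["D", "D", "D", "D", "D"]))  # ID 13
print(is_flatlining(["A", "A", "A", "A", "D"]))  # ID 1
```

A flatlining respondent is not necessarily invalid, so the flag is a prompt to inspect the case rather than a rule for deleting it.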
phone.sav
1. Wrong Question Type
2. Incorrect Base Size
3. Unusual Values
4. Too-small categories
5. Poor Metadata
6. Multi-variable problems
Data cleaning cycle
Housekeeping
Four useful tips for a tidy Variables and Questions tab:
1. Include questionnaire numbering in the Question labels (makes for quick search)
2. Hide (H-tag) irrelevant questions/variables
3. Move questions to the top/bottom using the blue buttons (or see Move Data in the Automate menu)
4. Clean variable labels (tip: use Find/Replace and the asterisk *)
Keep learning more
Q wiki: Basic Workflow For Checking and Cleaning a Project
eBook on Data Tidying and Cleaning – coming soon!
Subscribe to Q blog (on website) – www.q-researchsoftware.com


Data setup for fast, error free analysis (webinar)

  • 1. Setup data for fast, error-free analysis in Q Q R E S E A R C H S O F T W A R E . C O M W E B I N A R
  • 2. Other resources for learning about data cleaning Online Training (within Q: Help > Online Training) Lots of resources on the wiki (wiki.q-researchsoftware.com): • Technical detail about how to clean (use Search) • Video library (more about this coming soon) We will send you a detailed eBook on data tidying and cleaning in the next couple of weeks. support@q-researchsoftware.com (Help > Email Support)
  • 3. Conceptualizing the data analysis workflow 3 Housekeeping Imputation Weighting Transformation Analysis Reporting Production Tidy raw data set(s) Data cleaning cycle Data shaping Clean data Extracting metadata-rich source data  
  • 4. Metadata is the key to understanding the data ID Each organization has one value on this variable and no other organizations have the same value. Industry The industry classification of the firm. Shop Agree (A) or disagree (D) that “It is important to shop around” Understand Agree (A) or disagree (D) that “I understand my company's communication needs” Key Agree (A) or disagree (D) that “Communications technology is key to our business” Interested Agree (A) or disagree (D) that “I am interested in communications technology” Value Agree (A) or disagree (D) that “Value for money is more important to us than getting the best technology” Profit ($) An estimate of the gross profit provided by each firm to the industry (excluding fixed costs). Constructed from a series of survey questions about the types of products held, usage levels and bill payments. # Employees Number of employees of the business ID Industry Shop Understa nd Key Interest Value Profit ($) # Employees 1 Retail Trade A A A A D 9777.47 12 2 Retail Trade A A A D A 3595.79 12 3 Cult. and Rec. Services A A A A D 2660.15 20 4 Retail Trade A A D A A 2303.08 30 5 Manufacturing A D A D D 644.57 6 6 Mining D A A A D 3517.85 99 7 Agr., Forest. & Fishing A D A D D 6905.25 8 8 Retail Trade D D A A D 9916.39 60 9 Health & Community Services A A A A A 1855.43 56 10 Property & Business Services A A A A D 765.10 4 11 Communication Services D A D D A 838.13 1 12 Manufacturing A A A A A 2303.08 30 13 Manufacturing D D D D D 2151.92 7 14 Manufacturing A A A A D 1263.65 1 Data Metadata
  • 5. Key bits of metadata in Q Variable labels Value labels Multiple response set information Missing data Unique identifiers  Faster project setup  Reduce the risk of errors  Reduce the time to report data  Helps spotting changes in definitions
  • 6. Extracting metadata-rich source data Excel files .xls or xlsx SPSS format .sav Triple S .sss SPSS Dimensions .mdd SQL databases Text format .txt, .tab, .tsv CSV files .csv files Metadata “poor” data“Good” datafile types   Search wiki: Setting Up Files With No Metadata & Excel and CSV Data File Specifications Search wiki: SPSS Data File Specifications
  • 7. The first aim: Getting a tidy raw data set into your project Right shapeWrong Shape  Rows & Columns  Row = unit of analysis  Column = variable  Column has a name Ideally with: • Unique identifier • Associated metadata Data shaping Tidy raw data set(s)
  • 8. Data set reshaping tools in Q Ways of reshaping Do entirely by code (R Data Set) Do by clicking buttons Aggregation  Sorting   Sort columns on the Data tab Filtering   Filter the whole report and/or Data tab Deleting   Delete rows on Data tab; if required, export as an SPSS data file Partitioning (splitting)   Delete rows on the data tab, then export as an SPSS file. Repeat with different rows deleted. Sampling   Create a filter using a random numbers, then see Deleting. Stretching  Stacking   Tools > Stack SPSS .sav Data File (using Tools > Save Data as SPSS.sav Data File first, if necessary) Widening (flattening)  Merging data by case (appending)   Tools > Merge Data Files > Add New cases Deduping (deduplicating)   Create a new R variable with expression of duplicated(variableName) and see Deleting Merging by variable (augmenting)   Tools > Merge Data Files > Add New Variables String splitting   Create > Variables R Variable Creating a Distance Matrix   Create > Correlations > Distance Matrix Data shaping
  • 9. Stacking 9 ID Apple Microsoft IBM Apple Microsoft IBM Apple Microsoft IBM 1 6 9 7 1 0 0 1 1 0 2 8 7 7 1 0 0 1 0 0 3 0 9 8 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 This brand is fun This brand is exciting Likelihood to recommend
  • 10. Stacking ID Apple Microsoft IBM Apple Microsoft IBM Apple Microsoft IBM 1 6 9 7 1 0 0 1 1 0 2 8 7 7 1 0 0 1 0 0 3 0 9 8 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 This brand is fun This brand is exciting Likelihood to recommend ID Brand Likelihood to recommend This brand is fun This brand is exciting 1 Apple 6 1 1 1 Microsoft 9 0 1 1 IBM 7 0 0 2 Apple 6 1 1 2 Microsoft 9 0 1 2 IBM 7 0 0 3 Apple 6 1 1 3 Microsoft 9 0 1 3 IBM 7 0 0 4 Apple 6 1 1 4 Microsoft 9 0 1 4 IBM 7 0 0 From: one row per respondent To: one row per brand per respondent
  • 11. Widening ID Apple Microsoft IBM Apple Microsoft IBM Apple Microsoft IBM 1 6 9 7 1 0 0 1 1 0 2 8 7 7 1 0 0 1 0 0 3 0 9 8 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 This brand is fun This brand is exciting Likelihood to recommend ID Brand Likelihood to recommend This brand is fun This brand is exciting 1 Apple 6 1 1 1 Microsoft 9 0 1 1 IBM 7 0 0 2 Apple 6 1 1 2 Microsoft 9 0 1 2 IBM 7 0 0 3 Apple 6 1 1 3 Microsoft 9 0 1 3 IBM 7 0 0 4 Apple 6 1 1 4 Microsoft 9 0 1 4 IBM 7 0 0 Widening, which is also known as flattening, is the reverse of stacking.
  • 13. Conceptualizing the data analysis workflow 13 1. Wrong Question Type 2. Incorrect Base Size 3. Unusual Values 4. Too-small categories 5. Poor Metadata 6. Multi-variable problems Tidy raw data set(s) Data cleaning cycle Data shaping Importing source data Clean data
  • 14. The Cleaning Cycle Dirt #1 Wrong Question Type #2 Incorrect Base Size #3 Unusual Values #4 Too-small Categories #5 Poor Metadata #6 Multi-variable problems Summary Tables
  • 15. The Cleaning Cycle Dirt How to detect Cleaning action #1 Wrong Question Type • Variables and Questions Tab • Summary Tables • Change Question Type & setting #2 Incorrect Base Size • Summary Tables • Recode • Delete Cases • Get new data #3 Unusual Values • Summary Tables • Recode • Change values of raw data • Delete cases • Back code #4 Too-small Categories • Summary Tables • Merge #5 Poor Metadata • Summary Tables • Manually change • Search and replace #6 Multi-variable problems • Crosstabs • Sankey Diagrams • Missing Value Patterns • Flatlining • Validation Rules • Nets • Recode • Delete cases • Get new data
  • 16. phone.sav
    Data cleaning cycle: 1. Wrong Question Type; 2. Incorrect Base Size; 3. Unusual Values; 4. Too-small categories; 5. Poor Metadata; 6. Multi-variable problems
  • 17. Housekeeping: Four useful tips for a Tidy Variables and Questions tab
    1. Include questionnaire numbering in the Question labels (makes for quick search)
    2. Hide (H-tag) irrelevant questions/variables
    3. Move questions to the top/bottom using the blue buttons (or see Move Data in the Automate Menu)
    4. Clean variable labels (Tip: use Find/Replace and the asterisk *)
  • 18. Keep learning more
    Q wiki: Basic Workflow For Checking and Cleaning a Project
    eBook on Data Tidying and Cleaning (coming soon!)
    Subscribe to the Q blog (on the website): www.q-researchsoftware.com
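
The stacking and widening slides can be illustrated outside Q with a minimal sketch in plain Python. This is not how Q performs the reshaping (the notes describe doing it through Q's interface or an R data set); it only shows the row-level logic. The short measure names (`recommend`, `fun`, `exciting`) and the function names are hypothetical, and the stacked rows are derived from the wide table, so the values for respondents 2 to 4 follow that table.

```python
# Wide data from the stacking slide: one row per respondent.
# Each measure holds one value per brand, in BRANDS order.
BRANDS = ["Apple", "Microsoft", "IBM"]
MEASURES = ["recommend", "fun", "exciting"]  # hypothetical short names

wide = [
    {"ID": 1, "recommend": [6, 9, 7], "fun": [1, 0, 0], "exciting": [1, 1, 0]},
    {"ID": 2, "recommend": [8, 7, 7], "fun": [1, 0, 0], "exciting": [1, 0, 0]},
    {"ID": 3, "recommend": [0, 9, 8], "fun": [0, 1, 0], "exciting": [0, 0, 0]},
    {"ID": 4, "recommend": [0, 0, 0], "fun": [0, 0, 0], "exciting": [0, 0, 0]},
]


def stack(wide_rows):
    """Turn one row per respondent into one row per brand per respondent."""
    stacked_rows = []
    for row in wide_rows:
        for i, brand in enumerate(BRANDS):
            record = {"ID": row["ID"], "Brand": brand}
            for measure in MEASURES:
                record[measure] = row[measure][i]
            stacked_rows.append(record)
    return stacked_rows


def widen(stacked_rows):
    """Reverse of stack: collect each respondent's brand rows into one row."""
    by_id = {}
    for record in stacked_rows:
        if record["ID"] not in by_id:
            empty = {m: [None] * len(BRANDS) for m in MEASURES}
            by_id[record["ID"]] = {"ID": record["ID"], **empty}
        i = BRANDS.index(record["Brand"])
        for measure in MEASURES:
            by_id[record["ID"]][measure][i] = record[measure]
    return list(by_id.values())


stacked = stack(wide)
for record in stacked[:3]:
    print(record)
```

Four respondents and three brands give twelve stacked rows, and `widen(stacked)` returns the original four rows, which is what "the reverse of stacking" means in practice.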

Editor's Notes

  1. Hello and welcome to this webinar on setting up your data for fast, error-free analysis in Q. My name is Matt Steele and I’m part of Q’s London-based Customer Success team. Today, we’re exploring the topic of Data Setup. The idea is that it will make your analysis fast and error-free. You can submit questions as we go along using the GoToWebinar system. We’ll be collating and posting the Q&As on our website/wiki. We’ll also have a recording of this session in our archive, so you can rewatch any parts you want to.
  2. The aim today is to inspire you to get the most out of Q. These leave-behind materials are going to be crucial as you explore the ideas and processes that I’m going to showcase today. Importantly, an eBook will be released that covers, in detail and with step-by-step instructions, how to perform various tasks in Q. In the meantime, we also have a variety of free materials available, including our support email address.
  3. If you’re here, you already know that quantitative research is a process, and that process has a preamble even before this diagram. From the moment you get the data, though, it follows a journey. The journey is not completely linear, of course, as sometimes you have to go back to do some cleaning after you start your analysis. But getting your data clean from the get-go enables you to do better and faster analysis. So, we will spend about a minute on extracting metadata-rich files, 8 or so minutes on data shaping, and a good 20 minutes on data cleaning. And I’m going to throw in a couple of housekeeping tips for use in Q.
  4. So what is metadata? Metadata is information that is necessary to correctly interpret survey data. Metadata is sometimes referred to as a data dictionary. It tells you what the information in the datafile means. So in this example the metadata on the right is explaining the data on the left. For example, it tells us that the first variable, ID, is a unique identifier. Furthermore, it tells us that A refers to Agree and D refers to Disagree for the middle variables, and which statements they pertain to.
  5. Q loves metadata. And users of Q reap the benefits of datafiles that have good metadata. The better the metadata, the better the automatic setup of the datafile. That’s why we have datafile specifications we recommend you give to your suppliers.
  6. The SPSS .sav format is the file type we see most often for survey data, and it’s an industry standard. If you’re getting survey data in Excel or CSV, you should be asking for it in .sav format. Sometimes data is not available in .sav, and you then need to go in and do more setup work to compensate for the lack of metadata. But if you’re working with survey data, then I don’t see any reason why you shouldn’t be getting it in .sav format (or another good datafile format) at some point.
  7. What we’re aiming for in this first bit is a tidy raw data set. That’s the data set (or sets, plural) that gets loaded into your Q Project. Market research data is usually already in the right shape. It may be a little “dirty” (hence the black spots here), but generally it’s in the right shape (i.e., typically each row is a respondent and each column is a question variable). But sometimes the data needs to be restructured.
  8. There are lots of different types of reshaping. In the associated eBook we’ll explain what each of these means. The important thing to realise is that Q can do them all, and can do so in two ways. All of them can be done by code, which involves bringing an R data set into Q. Almost all of them can be done through Q’s Graphical User Interface (i.e., button clicking).
  9. The first one we’re going to look at is stacking. A data file is “stacked” when a single respondent’s data appears as multiple cases (i.e., multiple rows in the Data tab). Most commonly, this is because the respondent has provided data about multiple occasions or about different people. In this example, each respondent rated three brands on likelihood to recommend, and so on. A common situation is when we want to generalise a model over brands, such as in Driver Analysis. Or perhaps we’ve captured some diary data and we want to stack the data so it’s based on occasions rather than respondents.
  10. In this example, we need to stack the datafile so each brand becomes a case. Q can convert an un-stacked .sav file into a stacked file, which can then be imported and analyzed within Q in the standard way. **DEMONSTRATE TECHNOLOGY.SAV
  11. Widening, which is also known as flattening, is the reverse of stacking. So in our technology example, it would be the reverse of the situation we saw before. This is an example of something you can achieve with code (i.e., an R data set).
  12. So to convert it into a Tidy Raw Data Set, we need to reshape the data… so that rather than having one column of variables and another of values, we instead have 5 columns of values. **SHOW IN Q
  13. So with reshaping out of the way, let’s now turn our attention to the 6 aspects of “dirt” we can find… and then how we’re going to clean them in Q. (Don’t read each aloud).
  14. The basic workflow for each is to create Summary Tables. A summary table is a table in the Q report for each question, showing the basic statistics and base size. You then work through the Summary Tables one by one. For really big studies there are more automated approaches, and after showing you this workflow for the next 10 or so minutes, I’ll show you some of those automated approaches within Q.
  15. Each type of dirt has its own associated cleaning action. I could sit here and explain all of these, but they really are best explained through example.
  16. And to do this we’re going to be using phone.sav. It’s a tidy raw dataset, but it’s full of juicy dirt and errors. This phone study is what I would call an old-school study: data captured by a face-to-face interviewer and then entered manually (called data punching back in the day). Here’s a copy of the questionnaire. It’s a complete mess of a datafile, either from bad face-to-face interviewing (data capture) or perhaps bad data entry. When you’re doing this on your own (be it with phone.sav or your own datafile), I find it useful to have the questionnaire to hand. Now I’m not going to clean the entire datafile today, and I couldn’t anyway in 10 minutes or so. I’m just going to show you how to detect and tackle the main types of dirt… and then if you want to read up more, you can explore the eBook we’ll release and some other links we’ll leave behind. ** GO TO Q
  17. Housekeeping. You should show what you want here.  You’re a much better housekeeper than me I think. Update Cola Tracking - January to December.sav
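
To make the Summary Table workflow from notes 14 and 15 concrete, here is a rough sketch in plain Python of the kind of checks a summary table supports: counting each category, reporting the base size, flagging too-small categories, and spotting a flatlining respondent. It is an illustration only and does not reflect how Q builds its tables. The sample answers, the unexpected code `X`, the function names, and the `min_count` threshold are all assumptions made for the example; the `A`/`D` codes follow the Agree/Disagree metadata shown earlier.

```python
from collections import Counter


def summary_table(values, missing=(None,)):
    """Count each category and report the base (number of non-missing cases)."""
    valid = [v for v in values if v not in missing]
    base = len(valid)
    counts = Counter(valid)
    rows = {
        category: (n, round(100 * n / base, 1) if base else 0.0)
        for category, n in counts.items()
    }
    return {"base": base, "rows": rows}


def small_categories(table, min_count):
    """Categories with fewer than min_count cases (candidates for merging)."""
    return sorted(c for c, (n, _pct) in table["rows"].items() if n < min_count)


def is_flatliner(ratings):
    """True when a respondent gave the same answer to every item in a grid."""
    return len(ratings) > 1 and len(set(ratings)) == 1


# Hypothetical answers to one Agree (A) / Disagree (D) question.
answers = ["A", "D", "A", "A", None, "X", "A", "D"]
table = summary_table(answers)

print("Base:", table["base"])
for category, (n, pct) in sorted(table["rows"].items()):
    print(category, n, f"{pct}%")
print("Too-small categories:", small_categories(table, min_count=2))
```

Here the base of 7 (rather than 8) points to a missing answer, and the lone `X` shows up both as an unusual value and as a too-small category, which is the sort of thing working through the tables one by one is meant to surface.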