Data analysis in R for beginners

A fast-paced beginner's course for those who are new to R and want a quick, broad demonstration of R under the hood. We cover data ingestion, manipulating data frames, data summary and exploration, interactive visualization, creating dashboards, predictive modelling, and big data integrations.

Data Analysis in R for Beginners
Alton Alexander
Data Science Consultant

Why R?
• R is open source, like Python and unlike SAS
• Out of the box, R is a single-machine, in-memory statistical computing engine
– Download from https://www.r-project.org/
• Use an IDE
– RStudio https://www.rstudio.com/
– Revolution Analytics (MSFT)
– Jupyter (formerly IPython Notebook)

Essential Learning Resources
A new book for learning R
Q: What have you tried
and what works?

Topics
• Data ingestion
• Manipulation
• Summary and exploration
• Writing Reports
• Interactive visualization and dashboarding
• Predictive Modeling & Forecasting
• Big Data Integrations

Data ingestion
• Load data
– read.csv()
– library(RJDBC)
– library(RODBC)
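
A minimal sketch of the ingestion step, assuming a small CSV with made-up columns (id, name, spend) that is written to a temporary file so the example is self-contained. The commented RODBC lines only illustrate the general pattern; the DSN "my_dsn" and the table name "orders" are placeholders, not anything from the talk.

```r
# Write a tiny CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,name,spend",
    "1,Ann,10.5",
    "2,Bob,20.0",
    "3,Cy,7.25"),
  csv_path
)

# read.csv() returns a data frame; keep text columns as character
customers <- read.csv(csv_path, stringsAsFactors = FALSE)
str(customers)

# Database sources follow a similar pattern once a driver/DSN is configured.
# "my_dsn" and "orders" are hypothetical placeholders:
# library(RODBC)
# con <- odbcConnect("my_dsn")
# orders <- sqlQuery(con, "SELECT * FROM orders")
# odbcClose(con)
```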

Data Structures and Manipulation
• Another major reason for using R
– Ability to work with data in Data Frames
– Like pandas in Python and data sets in SAS
• Reasons for doing data manipulation (munging)
– Feature extraction
– ETL
– Data cleansing
– Pivots, stack/unstack, aggregate, groupby, reshape
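
The munging steps listed above can be sketched in base R. The sales data frame and its column names are invented for illustration; packages such as dplyr or reshape2 offer alternatives, but only base functions are used here.

```r
# Hypothetical sales data
sales <- data.frame(
  region  = c("East", "East", "West", "West", "West"),
  product = c("A", "B", "A", "A", "B"),
  revenue = c(100, 150, 200, 50, 300),
  stringsAsFactors = FALSE
)

# Feature extraction: derive a new column from an existing one
sales$high_value <- sales$revenue >= 150

# Group-by and aggregate: total revenue per region
by_region <- aggregate(revenue ~ region, data = sales, FUN = sum)
print(by_region)

# Pivot: regions as rows, products as columns, summed revenue in the cells
pivot <- xtabs(revenue ~ region + product, data = sales)
print(pivot)
```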

Set Theory
SQL joins and their results
merge, sqldf in R
http://www.r-bloggers.com/manipulating-data-frames-using-sqldf-a-brief-overview/
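
As a rough illustration of how SQL join types map onto merge(), here is a sketch with two invented tables that share an id column. The sqldf package accepts the equivalent SQL as a string, but it is not used here so the example needs only base R.

```r
# Hypothetical tables sharing the key column "id"
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(20, 35, 50))

# INNER JOIN: only ids present in both tables
inner <- merge(customers, orders, by = "id")

# LEFT JOIN: every customer, NA where there is no matching order
left <- merge(customers, orders, by = "id", all.x = TRUE)

# FULL OUTER JOIN: every id from either table
full <- merge(customers, orders, by = "id", all = TRUE)

print(inner)
print(left)
print(full)
```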

Summary and Exploration
• Powerful summary functions for programmatically quantifying datasets
• Functions include:
– summary(), hist(), levels(), aggregate()
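
The four functions named above can be tried on the built-in iris dataset. This is only a sketch of typical usage; the choice of iris and of the Sepal.Length column is an assumption made for illustration.

```r
# Five-number summary plus mean for numeric columns, counts for factors
print(summary(iris))

# Category labels of a factor column
species_levels <- levels(iris$Species)
print(species_levels)

# Histogram bins; drop plot = FALSE to draw the chart instead
h <- hist(iris$Sepal.Length, plot = FALSE)
print(h$counts)

# Mean sepal length per species
means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(means)
```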

Interactive Visualization
and Dashboarding
• Shiny from RStudio
• Like Tableau
– Local and server options
• Much more customizable, but more coding: no GUI or click-to-edit
• But you can bring in powerful libraries to build web apps comparatively fast
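
To give a feel for the amount of code involved, here is a minimal Shiny sketch: a slider that controls the number of histogram bins. It assumes the shiny package is installed and is skipped otherwise; the built-in faithful dataset and the helper bin_breaks() are choices made for this illustration.

```r
# Helper (hypothetical name): equally spaced breaks for a given bin count
bin_breaks <- function(x, bins) {
  seq(min(x), max(x), length.out = bins + 1)
}

if (requireNamespace("shiny", quietly = TRUE)) {
  # UI: a title, one slider input, one plot output
  ui <- shiny::fluidPage(
    shiny::titlePanel("Old Faithful waiting times"),
    shiny::sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
    shiny::plotOutput("hist")
  )

  # Server: redraw the histogram whenever the slider changes
  server <- function(input, output) {
    output$hist <- shiny::renderPlot({
      x <- faithful$waiting
      hist(x, breaks = bin_breaks(x, input$bins),
           main = NULL, xlab = "Minutes between eruptions")
    })
  }

  app <- shiny::shinyApp(ui = ui, server = server)

  # Launch only in an interactive session
  if (interactive()) shiny::runApp(app)
} else {
  message("Install shiny with install.packages('shiny') to run this sketch")
}
```

Running the same app on a server rather than locally needs a hosting option, which is outside this sketch.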

Predictive Modeling & Forecasting
• Examples
– Customer segmentation
• Unsupervised classification
– Marketing mix models
• Explain the coefficients
– Attribution modeling
• Supervised time series of events
– Multivariate testing
• (A/B tests with statistical significance, ANOVA)
– Lead scoring
• P2B (propensity-to-buy) models, topic of interest, expected spend
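
As one possible illustration of lead scoring, the sketch below fits a logistic regression with glm() on simulated leads and uses the predicted probability as a propensity-to-buy score. The data, column names (visits, opened_email, bought), and coefficients are all invented; this is not the model used in the talk.

```r
set.seed(1)
n <- 500

# Simulated leads (hypothetical features)
leads <- data.frame(
  visits       = rpois(n, lambda = 3),   # site visits per lead
  opened_email = rbinom(n, 1, 0.4)       # 1 if the lead opened a campaign email
)

# Simulate the outcome from a known logistic relationship
true_logit   <- -2 + 0.8 * leads$visits + 1.0 * leads$opened_email
leads$bought <- rbinom(n, 1, plogis(true_logit))

# Fit the propensity-to-buy model
fit <- glm(bought ~ visits + opened_email, data = leads, family = binomial)
print(summary(fit)$coefficients)

# Score every lead with its predicted purchase probability
leads$score <- predict(fit, type = "response")

# Highest-scoring leads first
print(head(leads[order(-leads$score), ]))
```

The coefficients are on the log-odds scale, so exp(coef(fit)) gives odds ratios, which is usually the easier form to explain.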

5 Libraries for Machine Learning
Allowing the machine to capture complexity:
1. gbm [Gradient Boosting Machine]
2. randomForest [Random Forest]
3. e1071 [Support Vector Machines]
Taking advantage of high-cardinality categorical or text data:
4. glmnet [Lasso and Elastic-Net Regularized Generalized Linear Models]
5. tau [Text Analysis Utilities]
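
A short sketch of how one of these libraries is typically used, taking randomForest as the example. It assumes the package is installed and is skipped otherwise; the iris dataset, the 100/50 split, and ntree = 200 are arbitrary choices for illustration.

```r
set.seed(42)

# Hold out 50 of the 150 iris rows for testing
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

if (requireNamespace("randomForest", quietly = TRUE)) {
  # Fit a classifier on all four measurements
  fit <- randomForest::randomForest(Species ~ ., data = train, ntree = 200)

  # Evaluate on the held-out rows
  pred <- predict(fit, newdata = test)
  accuracy <- mean(pred == test$Species)
  print(accuracy)
} else {
  message("Install randomForest with install.packages('randomForest') to run this sketch")
}
```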

Big Data Integration
• A single laptop is often sufficient
– Millions of rows on a 32 GB i7 laptop
• Scale up using a larger server
– Often sufficient, but has limits (100s of GB)
• Clustered compute engine
– Algorithm choice affects performance
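
One way to judge whether a dataset fits on a single machine is to measure a sample and extrapolate. The sketch below assumes a made-up two-column table (user_id, spend) and projects the size of 100 million rows; real tables with strings or factors will behave differently.

```r
n_rows <- 1e6

# Hypothetical event table: one integer column and one double column
events <- data.frame(
  user_id = sample.int(1e5, n_rows, replace = TRUE),  # integers: 4 bytes each
  spend   = runif(n_rows)                             # doubles: 8 bytes each
)

# Measured in-memory size of the sample
size_bytes <- as.numeric(object.size(events))
cat(sprintf("%.1f MB for %d rows\n", size_bytes / 1024^2, nrow(events)))

# Extrapolate linearly to 100 million rows
bytes_per_row <- size_bytes / n_rows
projected_gb  <- bytes_per_row * 100e6 / 1024^3
cat(sprintf("Projected size at 100M rows: %.2f GB\n", projected_gb))
```

Many R operations copy their inputs, so it is prudent to leave a few multiples of the projected size free.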

RServer
• For datasets that don’t fit in memory, or for convenience, there is a server option
– A shared compute engine
– Shares resources
– Think 100+ GB of RAM

Big Data Integration - Frameworks
• H2O.ai
• SparkR
• Revolution Analytics
• In-database processing
– Applying a lead score or segmentation model in real time
– Spark, Teradata, Vertica

Get Alton’s FREE Reports!
Go to http://frontanalysis.com/bigdatameetup/
Complete the survey including your email
I’ll email you the two reports:
1. Anonymized Summary of the Survey
2. LinkedIn Job Suggestions for a Utah Data Scientist

3. RStudio Download Overview

6. Demo Options: data, RStudio

17. Why R? In High Demand Nationally