R-Package DescTools
Why and where to go?
Andri Signorell, Helsana Health Sciences,
Zurich R-Group 21.01.2016
Randomized clinical trials (RCTs)
do not represent the reality of health care
• The population included in RCTs does not correspond to
the population that ultimately receives the treatment
Andri Signorell, 21.01.2016
Only 1/3 of the people ultimately treated
would fulfil the inclusion
criteria at all
The elderly are underrepresented in
clinical trials
Medication of one patient…
Is this evidence-based medicine?
Real example from
our database:
Mrs. G. H. in G.
received, in 2013,
drugs with
101 different active agents
(ATC codes),
533 prescriptions
in total
Unnecessary cardiac catheterizations in
Switzerland
ni. A cardiac catheter can be used to detect and treat
dangerous blockages in a patient's coronary arteries.
But because the procedure is expensive, invasive and not free of
complications, it should only be performed when there is a
well-founded suspicion of narrowing: that is what the guidelines
prescribe. Is this followed in Switzerland?
Researchers investigated this question in a study.
Their results, recently published in «Plos One»,
suggest that three out of ten cardiac catheterizations
are unnecessary. (NZZ, 5.3.2015)
Drawing: Felix Schaad
Orders of magnitude
• Analytical data warehouse (Teradata),
updated daily, with a bitemporal history
• 492 tables and 7494 attributes
• 1'468'893 insured persons in 2014
• complete treatment information since about 2005
• 201'875'131 claims with a total of
949'392'044 detailed positions
• Analysed with
Where's the pain point?
Cross-Industry Standard Process
for Data Mining (CRISP-DM)
Shearer C., The CRISP-DM model: the new blueprint for
data mining, J Data Warehousing (2000); 5:13-22.
80% of the analysts' resources
are lost to data understanding
and preparation … and no one
is doing anything about it!
Users, even expert statisticians, do not always
screen the data.
B. D. Ripley, Robust statistics (2004)
Get the Right Tool for the Job!
• Datasets with 150
variables and 500'000 rows
are not unusual
• R might not always be
optimal for this order of
magnitude (performance,
RAM)
• The programming paradigm lets
the screening code grow
and become confusing!
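The RAM concern can be made concrete with a back-of-the-envelope estimate. This sketch assumes every column is stored as a double (8 bytes per value) and ignores the small per-column overhead of a data.frame; operations that modify the data can create copies, so peak memory use can be a multiple of this figure.

```r
# Rough memory footprint of a 500'000 x 150 dataset.
# Assumption: all columns are doubles (8 bytes each); data.frame
# overhead and temporary copies are ignored.
rows  <- 500000
cols  <- 150
bytes <- rows * cols * 8
bytes           # 6e+08 bytes
bytes / 2^20    # about 572 MiB
```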
DescTools focus
• provide elaborate descriptive routines for
– numeric, factor, logical, table, numeric ~ factor, ...
– data.frame, formula interface
• integrate descriptive plots
• easy output to MS Word documents
> library(DescTools)
> Desc(d.pizza$temperature) # describe a single variable
> wrd <- GetNewWrd() # open a new MS Word document
> Desc(d.pizza, wrd=wrd) # describe the whole data.frame and send
# it directly to Word
> Desc(. ~ driver, d.pizza)
> Desc(driver ~ ., d.pizza)
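Describing a whole data.frame, as `Desc(d.pizza)` does, boils down to dispatching on the type of each column. The base R sketch below illustrates only that idea; `describe_df()` is a hypothetical helper, not the way `Desc()` is implemented, and the statistics it returns are a small, arbitrary selection.

```r
# Hypothetical helper: describe each column of a data.frame by its type.
# Illustration only, not DescTools' Desc() implementation.
describe_df <- function(df) {
  lapply(df, function(col) {
    if (is.numeric(col)) {
      # numeric: counts plus a few location/scale statistics
      c(length = length(col),
        n      = sum(!is.na(col)),
        NAs    = sum(is.na(col)),
        mean   = mean(col, na.rm = TRUE),
        sd     = sd(col, na.rm = TRUE))
    } else {
      # factor, character or logical: frequency table, keeping NAs visible
      table(col, useNA = "ifany")
    }
  })
}

d <- data.frame(x = c(1, 2, NA),
                f = factor(c("a", "b", "a")))
res <- describe_df(d)
res$x
res$f
```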
Describe numeric
> summary(d.pizza$temperature) # base R
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  19.30   42.22   50.00   47.94   55.30   64.80      40
> describe(d.pizza$temperature) # library(Hmisc)
d.pizza$temperature
      n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
   1170      39     375       1   47.94   26.70   33.29   42.23   50.00   55.30   58.80   60.50
lowest : 19.30 19.40 20.00 20.20 20.35, highest: 63.80 64.10 64.60 64.70 64.80
> Desc(d.pizza$temperature) # library(DescTools)
--------------------------------------------------
d.pizza$temperature (numeric)
  length       n     NAs  unique      0s    mean  meanSE
   1'210   1'170      40     375       0  47.937   0.291
     .05     .10     .25  median     .75     .90     .95
  26.700  33.290  42.225      50  55.300  58.800  60.500
     rng      sd   vcoef     mad     IQR    skew    kurt
  45.500   9.938   0.207   9.192  13.075  -0.842   0.051
lowest : 19.3, 19.4, 20, 20.2 (2), 20.35
highest: 63.8, 64.1, 64.6, 64.7, 64.8
Screening questions:
• What happens at the edges?
• Are there missing values?
• Are all elements unique?
• Has 0 been misused as NA?
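Each screening question maps onto a simple statistic. As a sketch, the base R functions below compute most of the quantities in the `Desc()` printout; the helper names `screen_numeric()` and `edges()` are hypothetical, and the quantile definition (R's default, type 7) and the MAD scaling (R's default constant 1.4826) are assumptions that may not match what `Desc()` uses. Skewness and kurtosis are left out because their estimators differ between packages.

```r
# Hypothetical helper: screening statistics for a numeric vector,
# named after the columns printed by Desc().
screen_numeric <- function(x) {
  xx <- x[!is.na(x)]                       # non-missing values
  n  <- length(xx)
  c(length = length(x),
    n      = n,
    NAs    = sum(is.na(x)),                # are there missing values?
    unique = length(unique(xx)),           # are all elements unique?
    zeros  = sum(xx == 0),                 # has 0 been misused as NA?
    mean   = mean(xx),
    meanSE = sd(xx) / sqrt(n),
    quantile(xx, c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)),
    rng    = diff(range(xx)),
    sd     = sd(xx),
    vcoef  = sd(xx) / mean(xx),            # coefficient of variation
    mad    = mad(xx),
    IQR    = IQR(xx))
}

# What happens at the edges? The k smallest and largest distinct values.
edges <- function(x, k = 5) {
  v <- sort(unique(x[!is.na(x)]))
  list(lowest = head(v, k), highest = tail(v, k))
}

x <- c(0, 1, 2, 2, NA, 5)
screen_numeric(x)
edges(x, k = 2)
```

Applied to `d.pizza$temperature`, the output should be comparable to the listing above, subject to the quantile and MAD conventions just mentioned.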
• Base R
plot(d.pizza$temperature)
• DescTools
plot(Desc(d.pizza$temperature))
Visualization excellence …
… is that which gives to the viewer the greatest number of ideas in the shortest
time with the least ink in the smallest space.
… requires telling the truth about the data.
Edward Tufte, The Visual Display of Quantitative Information and Envisioning Information, Graphics Press, Cheshire, CT.
Describe table
> tab <- table(d.pizza$driver, d.pizza$area)
> summary(tab)
Number of cases in table: 1194
Number of factors: 2
Test for independence of all factors:
Chisq = 1009.5, df = 12, p-value = 1.697e-208
> describe(tab)
tab
3 Variables 7 Observations
----------------------------------------------------
Brent
      n missing  unique    Info    Mean
      7       0       7       1   67.57
             6  19  29  42  72 128 177
Frequency    1   1   1   1   1   1   1
%           14  14  14  14  14  14  14
----------------------------------------------------
Camden
      n missing  unique    Info    Mean
      7       0       7       1   48.71
             1   4  19  41  47  87 142
Frequency    1   1   1   1   1   1   1
%           14  14  14  14  14  14  14
----------------------------------------------------
...
base R: reduced to the bare minimum…
Hmisc:
Oops! Misinterpreted: the table's columns are treated as variables and its cells as observations…
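A contingency table is a compact store of case counts, not a data.frame whose cells are observations. To illustrate, the sketch below expands a table back into one row per case, in the spirit of the `Untable` function that DescTools lists among its utilities; `untable_base()` is a hypothetical base R helper, not that function's implementation.

```r
# A 2 x 2 contingency table with named dimensions
tab <- as.table(matrix(c(1, 2, 3, 4), nrow = 2,
                       dimnames = list(a = c("x", "y"), b = c("u", "v"))))

# Hypothetical helper: expand a table of counts into one row per case.
untable_base <- function(tab) {
  freq <- as.data.frame(tab)                     # one row per cell, plus Freq
  idx  <- rep(seq_len(nrow(freq)), freq$Freq)    # repeat each cell Freq times
  long <- freq[idx, setdiff(names(freq), "Freq"), drop = FALSE]
  rownames(long) <- NULL
  long
}

long <- untable_base(tab)
nrow(long)                  # one row per observation
table(long$a, long$b)       # re-tabulating gives back the original counts
```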
> tab <- as.table(apply(HairEyeColor, c(1,2), sum))[
+ , c("Brown","Hazel","Green","Blue")]
> (z <- Desc(tab, row.vars=c(3, 1), rfrq="011",
+   plotit=FALSE, main="Hair ~ Eye"))
Hair ~ Eye
Summary:
n: 592, rows: 4, columns: 4
Pearson's Chi-squared test:
X-squared = 138.29, df = 9, p-value < 2.2e-16
Likelihood Ratio:
X-squared = 146.44, df = 9, p-value < 2.2e-16
Mantel-Haenszel Chi-squared:
X-squared = 109.64, df = 1, p-value < 2.2e-16
Phi-Coefficient 0.483
Contingency Coeff. 0.435
Cramer's V 0.279
                Eye
                Brown   Hazel   Green    Blue     Sum
      Hair
freq  Black        68      15       5      20     108
      Brown       119      54      29      84     286
      Red          26      14      14      17      71
      Blond         7      10      16      94     127
      Sum         220      93      64     215     592
p.row Black       63%   13.9%    4.6%   18.5%       .
      Brown     41.6%   18.9%   10.1%   29.4%       .
      Red       36.6%   19.7%   19.7%   23.9%       .
      Blond      5.5%    7.9%   12.6%     74%       .
      Sum       37.2%   15.7%   10.8%   36.3%       .
p.col Black     30.9%   16.1%    7.8%    9.3%   18.2%
      Brown     54.1%   58.1%   45.3%   39.1%   48.3%
      Red       11.8%   15.1%   21.9%    7.9%     12%
      Blond      3.2%   10.8%     25%   43.7%   21.5%
      Sum           .       .       .       .       .
> # do the plot by hand, while setting the colours
> cols1 <- SetAlpha(c("sienna4", "burlywood",
+   "chartreuse3", "slategray1"), 0.6)
> cols2 <- SetAlpha(c("moccasin", "salmon1", "wheat3",
+   "gray32"), 0.8)
> plot(z, col1=cols1, col2=cols2, horiz=FALSE)
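The association measures in the Hair ~ Eye summary are simple functions of the Pearson chi-squared statistic $\chi^2$, the sample size $n$ and $k = \min(\text{rows}, \text{columns})$: $\phi = \sqrt{\chi^2/n}$, $C = \sqrt{\chi^2/(\chi^2+n)}$ and Cramér's $V = \sqrt{\chi^2/(n(k-1))}$. The base R sketch below recomputes them, together with the row and column percentages, for the same collapsed `HairEyeColor` table; it uses only `stats` functions and is not the DescTools code path.

```r
# Collapse HairEyeColor over Sex and reorder the eye colours as above
tab <- apply(HairEyeColor, c(1, 2), sum)[, c("Brown", "Hazel", "Green", "Blue")]

n  <- sum(tab)
X2 <- unname(chisq.test(tab)$statistic)     # Pearson chi-squared
k  <- min(dim(tab))

phi    <- sqrt(X2 / n)                      # phi coefficient
contc  <- sqrt(X2 / (X2 + n))               # Pearson's contingency coefficient
cramer <- sqrt(X2 / (n * (k - 1)))          # Cramer's V

# Row and column proportions (the p.row and p.col blocks)
p_row <- prop.table(tab, margin = 1)
p_col <- prop.table(tab, margin = 2)

round(c(X2 = X2, phi = phi, C = contc, V = cramer), 3)
```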
Describe factors in Word
Desc(d.pizza$driver, wrd=GetNewWrd())
factor ~ numeric
> Desc(diabetes ~ age, data=d.pima,
+   digits=2, breaks=5, margin=TRUE, conf.level=0.90)
Summary:
n pairs: 768, valid: 768 (100%), missings: 0 (0%), groups: 2
            neg      pos    Total
mean      31.19    37.07    33.24
median    27.00    36.00    29.00
sd        11.67    10.97    11.76
IQR       14.00    16.00    17.00
n           500      268      768
np        65.1%    34.9%     100%
NAs           0        0        0
0s            0        0        0
Kruskal-Wallis rank sum test:
Kruskal-Wallis chi-squared = 73.253, df = 1, p-value < 2.2e-16
Proportions of diabetes in the quantiles of age:
         Q1      Q2      Q3      Q4      Q5
neg   86.7%   76.1%     57%   54.3%   46.8%
pos   13.3%   23.9%     43%   45.7%   53.2%
Further:
factor ~ factor
numeric ~ factor
numeric ~ numeric
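The factor ~ numeric description combines three pieces: a per-group summary of the numeric variable, a rank-based test, and the group proportions within quantile classes. The sketch below reproduces that structure in base R on toy data, since `d.pima` is not assumed to be available; `group_summary()` and `prop_by_quantile()` are hypothetical helpers, the NAs and 0s rows are omitted, and `prop_by_quantile()` assumes the quantile breaks are distinct (ties in the breaks would make `cut()` fail).

```r
# Hypothetical helper: per-group summary of x, plus a "Total" column
group_summary <- function(x, g) {
  stats <- function(v) {
    v <- v[!is.na(v)]
    c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v),
      n = length(v))
  }
  cbind(sapply(split(x, g), stats), Total = stats(x))
}

# Hypothetical helper: proportions of each level of g within the
# quantile classes Q1..Qbreaks of x (assumes distinct breaks)
prop_by_quantile <- function(g, x, breaks = 5) {
  br  <- quantile(x, probs = seq(0, 1, length.out = breaks + 1), na.rm = TRUE)
  bin <- cut(x, breaks = br, include.lowest = TRUE,
             labels = paste0("Q", seq_len(breaks)))
  prop.table(table(g, bin), margin = 2)   # proportions within each class
}

# Toy data: the "pos" group has the larger values
x <- 1:10
g <- factor(rep(c("neg", "pos"), each = 5))

group_summary(x, g)
prop_by_quantile(g, x)
kruskal.test(x ~ g)     # base R Kruskal-Wallis rank sum test
```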
Plus about 440 functions
• Statistical functions and Confidence Intervals
Skew, Kurt, CramerV, SomersDelta, CohenKappa, HuberM, MeanCI,
BinomCI, …
• Additional tests not found in base R
HotellingsT2Test, JarqueBeraTest, BreslowDayTest, DurbinWatsonTest,
LeveneTest, ScheffeTest, …
• Date functions
Today, AddMonths, Day, Month, Year, Weekday, IsWeekend, Zodiac, …
• String functions
StrAlign, StrTrim, StrDist, StrCountW, StrVal, …
• Operators and others
%()%, Untable, CollapseTable, Dummy, Large, Small, …
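Two of the listed functions, `MeanCI` and `BinomCI`, compute confidence intervals. As a hedged illustration of what such functions compute, the base R sketch below implements the classical t interval for a mean and the Wilson score interval for a proportion; `mean_ci()` and `wilson_ci()` are hypothetical helpers, not the DescTools implementations, whose defaults and available methods may differ. Both are checked against base R's `t.test()` and `prop.test(correct = FALSE)`, which report the same intervals.

```r
# Classical t-based confidence interval for the mean
mean_ci <- function(x, conf.level = 0.95) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)
  tq <- qt(1 - (1 - conf.level) / 2, df = n - 1)
  c(est = m, lwr = m - tq * se, upr = m + tq * se)
}

# Wilson score interval for a proportion x / n
wilson_ci <- function(x, n, conf.level = 0.95) {
  z      <- qnorm(1 - (1 - conf.level) / 2)
  p      <- x / n
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(est = p, lwr = centre - half, upr = centre + half)
}

mean_ci(c(1, 2, 3, 4, 5))
wilson_ci(37, 43)
```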
Pain Point «Speed»
> x <- runif(1e8)
> system.time(e1071::kurtosis(x))
user system elapsed
5.67 0.55 6.21
> system.time(DescTools::Kurt(x))
user system elapsed
0.47 0.00 0.47
http://www.noamross.net/blog/2013/4/25/faster-talk.html
-> Get a Bigger Computer
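Much of such a speed difference comes down to doing the work in a few vectorised passes over the data. The sketch below computes excess kurtosis with the simple moment estimator $g_2 = m_4 / m_2^2 - 3$; it is an illustration, and it does not claim which (possibly bias-corrected) variant `e1071::kurtosis` or `DescTools::Kurt` uses by default. For uniform data the population excess kurtosis is $-1.2$.

```r
# Excess kurtosis, moment estimator g2 = m4 / m2^2 - 3,
# computed with vectorised operations on the centred data.
kurt_moment <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)       # centred values
  d2 <- d * d
  m2 <- mean(d2)          # second central moment
  m4 <- mean(d2 * d2)     # fourth central moment
  m4 / m2^2 - 3
}

kurt_moment(c(1, 2, 3, 4, 5))   # -1.3

# Timing on a large vector (results depend on the machine):
# x <- runif(1e8); system.time(kurt_moment(x))
```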
Pain point «Import»
R Data Import/Export
This is a guide to importing and exporting data
to and from R.
This manual is for R, version 3.1.2 (2014-10-31).
Copyright © 2000–2014 R Core Team
DescTools::XLGetRange()
• Import directly from Excel
Pain Point «User Interface»
Can one be a good data analyst without being a half-good programmer?
The short answer to that is, 'No.' The long answer to that is, 'No.'
-- Frank Harrell, 1999 S-PLUS User Conference, New Orleans (October 1999)
Could you spontaneously produce the R code needed to present today's date?
“Donnerstag, 21. Januar 2016” (Thursday, 21 January 2016)
• Solution in base R*):
> format(Sys.Date(), "%A, %d. %B %Y")
[1] "Donnerstag, 21. Januar 2016"
• Solution with DescTools:
> Format(Today(), fmt="dddd, dd. mmmm yyyy")
[1] "Donnerstag, 21. Januar 2016"
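The DescTools call uses Excel-like tokens, while base R uses `strftime` conversion codes. As a hedged sketch, the hypothetical helper `to_strftime()` translates a few such tokens into base R codes; it covers only the tokens shown and says nothing about how `Format()` works internally. Day and month names (`%A`, `%B`) follow the current locale, so they come out in German only under a German locale, whereas numeric codes such as `%d.%m.%Y` are locale-independent.

```r
# Hypothetical helper: translate a few Excel-style date tokens into
# strftime codes. Longer tokens are replaced first so that "dddd" is
# not consumed as two "dd" tokens.
to_strftime <- function(fmt) {
  map <- c(dddd = "%A", ddd = "%a", dd = "%d",
           mmmm = "%B", mmm = "%b", mm = "%m",
           yyyy = "%Y", yy = "%y")
  for (tok in names(map)) fmt <- gsub(tok, map[[tok]], fmt, fixed = TRUE)
  fmt
}

to_strftime("dddd, dd. mmmm yyyy")                         # "%A, %d. %B %Y"
format(as.Date("2016-01-21"), to_strftime("dd.mm.yyyy"))   # "21.01.2016"
```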
The reasonable man adapts himself to the world; the
unreasonable one persists in trying to adapt the world
to himself.
Therefore, all progress depends on the unreasonable
man.
George Bernard Shaw
Be unreasonable and contact me
with feedback or feature ideas!
andri@signorell.net
Thanks to
• All the R Core members and R contributors
• Frank E Harrell Jr, with contributions from Charles Dupont and many others (2014). Hmisc:
Harrell Miscellaneous. R package version 3.14-6. http://CRAN.R-project.org/package=Hmisc
• Revelle, W. (2015). psych: Procedures for Personality and Psychological Research.
Northwestern University, Evanston, Illinois, USA. R package version 1.5.1.
http://CRAN.R-project.org/package=psych
• Lemon, J. (2006). Plotrix: a package in the red light district of R. R-News, 6(4): 8-12.
• Hans Peter Wolf and Uni Bielefeld (2014). aplpack: Another Plot PACKage: stem.leaf, bagplot,
faces, spin3R, plotsummary, plothulls, and some slider functions. R package version 1.3.0.
http://CRAN.R-project.org/package=aplpack
• Martin Maechler et al. (2015). sfsmisc: Utilities from Seminar fuer Statistik ETH Zurich.
R package version 1.0-27. http://CRAN.R-project.org/package=sfsmisc
• Christian W. Hoffmann <http://www.echoffmann.ch> (2014). cwhmisc: Miscellaneous
Functions for math, plotting, printing, statistics, strings, and tools. R package version 5.0.
http://CRAN.R-project.org/package=cwhmisc
• And many more! See the DescTools authors list!

Weitere ähnliche Inhalte

Was ist angesagt?

Three problems of probability
Three problems of probabilityThree problems of probability
Three problems of probability
azmatmengal
 
Discrete probability
Discrete probabilityDiscrete probability
Discrete probability
Ranjan Kumar
 
Sesión de aprendizaje mat
Sesión de aprendizaje matSesión de aprendizaje mat
Sesión de aprendizaje mat
Claudia Velande
 
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
Ivonne Caicedo Salcedo
 

Was ist angesagt? (20)

Discrete probability distributions
Discrete probability distributionsDiscrete probability distributions
Discrete probability distributions
 
Three problems of probability
Three problems of probabilityThree problems of probability
Three problems of probability
 
Complements and Conditional Probability, and Bayes' Theorem
 Complements and Conditional Probability, and Bayes' Theorem Complements and Conditional Probability, and Bayes' Theorem
Complements and Conditional Probability, and Bayes' Theorem
 
Mate basica i
Mate basica iMate basica i
Mate basica i
 
Practice test ch 8 hypothesis testing ch 9 two populations
Practice test ch 8 hypothesis testing ch 9 two populationsPractice test ch 8 hypothesis testing ch 9 two populations
Practice test ch 8 hypothesis testing ch 9 two populations
 
Solution to the Practice Test 3A, Chapter 6 Normal Probability Distribution
Solution to the Practice Test 3A, Chapter 6 Normal Probability DistributionSolution to the Practice Test 3A, Chapter 6 Normal Probability Distribution
Solution to the Practice Test 3A, Chapter 6 Normal Probability Distribution
 
Discrete probability
Discrete probabilityDiscrete probability
Discrete probability
 
Anova (Analysis of variation)
Anova (Analysis of variation)Anova (Analysis of variation)
Anova (Analysis of variation)
 
Permutations and combinations
Permutations and combinationsPermutations and combinations
Permutations and combinations
 
Binomial Probability Distributions
Binomial Probability DistributionsBinomial Probability Distributions
Binomial Probability Distributions
 
Sesión de aprendizaje mat
Sesión de aprendizaje matSesión de aprendizaje mat
Sesión de aprendizaje mat
 
Sampling Distributions
Sampling DistributionsSampling Distributions
Sampling Distributions
 
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
Ejerciciossolucionariosnmerosdecimales 100826185300-phpapp01
 
Halba
HalbaHalba
Halba
 
Operaciones con polinomios sexto grado
Operaciones con polinomios sexto gradoOperaciones con polinomios sexto grado
Operaciones con polinomios sexto grado
 
Histograms
HistogramsHistograms
Histograms
 
Business Statistics Chapter 6
Business Statistics Chapter 6Business Statistics Chapter 6
Business Statistics Chapter 6
 
Chapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample MeanChapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample Mean
 
Correlation
CorrelationCorrelation
Correlation
 
Discrete Distribution.pptx
Discrete Distribution.pptxDiscrete Distribution.pptx
Discrete Distribution.pptx
 

Ähnlich wie Zurich R User group: Desc tools

Fall 1998 review questions for comprehensive final
Fall 1998 review questions for comprehensive finalFall 1998 review questions for comprehensive final
Fall 1998 review questions for comprehensive final
arbi
 
Scientific Notation
Scientific NotationScientific Notation
Scientific Notation
Awais Khan
 
Statistics and Data Mining with Perl Data Language
Statistics and Data Mining with Perl Data LanguageStatistics and Data Mining with Perl Data Language
Statistics and Data Mining with Perl Data Language
maggiexyz
 

Ähnlich wie Zurich R User group: Desc tools (20)

Project_Report_RMD
Project_Report_RMDProject_Report_RMD
Project_Report_RMD
 
1504 basic statistics
1504 basic statistics1504 basic statistics
1504 basic statistics
 
Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin
 
Mnh csv python
Mnh csv pythonMnh csv python
Mnh csv python
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and Visualization
 
Mnh csv python
Mnh csv pythonMnh csv python
Mnh csv python
 
Advanced Statistics And Probability (MSC 615
Advanced Statistics And Probability (MSC 615Advanced Statistics And Probability (MSC 615
Advanced Statistics And Probability (MSC 615
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical MethodsJavier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
 
Fall 1998 review questions for comprehensive final
Fall 1998 review questions for comprehensive finalFall 1998 review questions for comprehensive final
Fall 1998 review questions for comprehensive final
 
Scientific Notation
Scientific NotationScientific Notation
Scientific Notation
 
The R of War
The R of WarThe R of War
The R of War
 
Data Envelopment Analysis
Data Envelopment AnalysisData Envelopment Analysis
Data Envelopment Analysis
 
Statistics and Data Mining with Perl Data Language
Statistics and Data Mining with Perl Data LanguageStatistics and Data Mining with Perl Data Language
Statistics and Data Mining with Perl Data Language
 
Engineering Data Analysis-ProfCharlton
Engineering Data  Analysis-ProfCharltonEngineering Data  Analysis-ProfCharlton
Engineering Data Analysis-ProfCharlton
 
Piano rubyslava final
Piano rubyslava finalPiano rubyslava final
Piano rubyslava final
 
Low cost data acquisition from digital caliper to pc
Low cost data acquisition from digital caliper to pcLow cost data acquisition from digital caliper to pc
Low cost data acquisition from digital caliper to pc
 
1
11
1
 
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...
 

Mehr von Zurich_R_User_Group

Mehr von Zurich_R_User_Group (11)

Anomaly detection - database integrated
Anomaly detection - database integratedAnomaly detection - database integrated
Anomaly detection - database integrated
 
R at Sanitas - Workflow, Problems and Solutions
R at Sanitas - Workflow, Problems and SolutionsR at Sanitas - Workflow, Problems and Solutions
R at Sanitas - Workflow, Problems and Solutions
 
Modeling Bus Bunching
Modeling Bus BunchingModeling Bus Bunching
Modeling Bus Bunching
 
Visualizing the frequency of transit delays using QGIS and the Leaflet javasc...
Visualizing the frequency of transit delays using QGIS and the Leaflet javasc...Visualizing the frequency of transit delays using QGIS and the Leaflet javasc...
Visualizing the frequency of transit delays using QGIS and the Leaflet javasc...
 
Introduction to Renjin, the alternative engine for R
Introduction to Renjin, the alternative engine for R Introduction to Renjin, the alternative engine for R
Introduction to Renjin, the alternative engine for R
 
How to use R in different professions: R for Car Insurance Product (Speaker: ...
How to use R in different professions: R for Car Insurance Product (Speaker: ...How to use R in different professions: R for Car Insurance Product (Speaker: ...
How to use R in different professions: R for Car Insurance Product (Speaker: ...
 
How to use R in different professions: R In Finance (Speaker: Gabriel Foix, M...
How to use R in different professions: R In Finance (Speaker: Gabriel Foix, M...How to use R in different professions: R In Finance (Speaker: Gabriel Foix, M...
How to use R in different professions: R In Finance (Speaker: Gabriel Foix, M...
 
Where South America is Swinging to the Right: An R-Driven Data Journalism Pr...
Where South America is Swinging to the Right:  An R-Driven Data Journalism Pr...Where South America is Swinging to the Right:  An R-Driven Data Journalism Pr...
Where South America is Swinging to the Right: An R-Driven Data Journalism Pr...
 
Visualization Challenge: Mapping Health During Travel
Visualization Challenge: Mapping Health During TravelVisualization Challenge: Mapping Health During Travel
Visualization Challenge: Mapping Health During Travel
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
December 2015 Meetup - Shiny: Make Your R Code Interactive - Craig Wang
December 2015 Meetup - Shiny: Make Your R Code Interactive - Craig WangDecember 2015 Meetup - Shiny: Make Your R Code Interactive - Craig Wang
December 2015 Meetup - Shiny: Make Your R Code Interactive - Craig Wang
 

Kürzlich hochgeladen

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 

Kürzlich hochgeladen (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

Zurich R User group: Desc tools

  • 1. R-Package DescTools Why and where to go? Andri Signorell, Helsana Health Sciences, Zurich R-Group 21.01.2016
  • 2. Randomized clinical trials (RCT) do not represent the reality in health care • Population included in RCT does not correspond to the population finally receiving the treatment 2Andri Signorell, 21.01.2016 Only 1/3 of the ultimatlely treated people would at all fulfill the inclusion criteria Elderly underrepresented in clinical trials
  • 3. Medication of one patient… Is this evidence-based medicine? 3 Real example from our database: Mrs. G. H. in G. received in 2013 drugs with 101 different agents (ATC-Codes) in total 533 prescriptions Andri Signorell, 21.01.2016
  • 4. Unnötige Herzkatheteruntersuchungen in der Schweiz ni. Mit einem Herzkatheter können beim Patienten gefährliche Verschlüsse in den Herzkranzarterien nachgewiesen und behoben werden. Weil die Untersuchung aber teuer, invasiv und nicht frei von Komplikationen ist, sollte sie nur bei begründetem Verdacht auf Engnisse durchgeführt werden – so sehen es die Richtlinien vor. Wird das in der Schweiz befolgt? Dieser Frage sind Forscher in einer Studie nachgegangen. Ihre vor kurzem in «Plos One» veröffentlichten Resultate legen nahe, dass drei von zehn Herzkathetern unnötig sind. (NZZ, 5.3.2015) Zeichnung: Felix Schaad Andri Signorell, 21.01.2016
  • 5. Orders of magnitude • Analytical DataWareHouse (TeraData), updated daily and in a bitemporal history • 492 tables und 7494 attributes • 1'468'893 insured in 2014 • complete treatment information since ~ 2005 • 201'875'131 claims with all in all 949'392'044 detailed positions • Analysed with Andri Signorell, 21.01.2016
  • 6. Where's the pain point? Cross-Industry Standard Process for Data-Mining Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13—22. 80% of the analysts ressources are lost for data understanding and preparation – … and no one is doing something about it! Andri Signorell, 21.01.2016
  • 7. Users, even expert statisticians, do not always screen the data. B. D. Ripley, Robust statistics (2004) Andri Signorell, 21.01.2016
  • 8. Get the Right Tool for the Job! • Datasets with 150 Variablen, 500’000 rows not unusal • R might not always be optimal for this order of magnitude (performance, RAM) • Programming paradigm let grow the screening code and make it confusing! Andri Signorell, 21.01.2016
  • 9. DescTools focus • provide elaborated descriptive routines – numeric, factor, logical, table, numeric ~ factor, ... – data.frame, formula interface • integrate descriptive plots • easy output to MS-Word document > Desc(d.pizza$temperature) # describe single variable > wrd <- GetNewWrd() > Desc(d.pizza, wrd=wrd) # describe data.frame and send # it directly to Word > Desc(. ~ driver, d.pizza) > Desc(driver ~ ., d.pizza) Andri Signorell, 21.01.2016
  • 10. Describe numeric > summary(d.pizza$temperature) # base R Min. 1st Qu. Median Mean 3rd Qu. Max. NA's 19.30 42.22 50.00 47.94 55.30 64.80 40 > describe(d.pizza$temperature) # library(Hmisc) d.pizza$temperature n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95 1170 39 375 1 47.94 26.70 33.29 42.23 50.00 55.30 58.80 60.50 lowest : 19.30 19.40 20.00 20.20 20.35, highest: 63.80 64.10 64.60 64.70 64.80 > Desc(d.pizza$temperature) # library(DescTools) -------------------------------------------------- d.pizza$temperature (numeric) length n NAs unique 0s mean meanSE 1'210 1'170 40 375 0 47.937 0.291 .05 .10 .25 median .75 .90 .95 26.700 33.290 42.225 50 55.300 58.800 60.500 rng sd vcoef mad IQR skew kurt 45.500 9.938 0.207 9.192 13.075 -0.842 0.051 lowest : 19.3, 19.4, 20, 20.2 (2), 20.35 highest: 63.8, 64.1, 64.6, 64.7, 64.8 Screening-Fragen: • What happens at the edges? • Are there Missings? • Are all elements unique? • Has 0 been misused as NA? Andri Signorell, 21.01.2016
  • 11. Base R: plot(d.pizza$temperature) vs. DescTools: plot(Desc(d.pizza$temperature))
    Visualization excellence …
      … is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
      … requires telling the truth about the data.
    Edward Tufte, The Visual Display of Quantitative Information and Envisioning Information, Graphics Press, PO Box 430, Cheshire, CT 06410.
  • 12. Describe table
    > tab <- table(d.pizza$driver, d.pizza$area)
    > summary(tab)
    Number of cases in table: 1194
    Number of factors: 2
    Test for independence of all factors:
            Chisq = 1009.5, df = 12, p-value = 1.697e-208
    > describe(tab)
    tab
     3  Variables      7  Observations
    ----------------------------------------------------
    Brent
           n missing  unique    Info    Mean
           7       0       7       1   67.57
                6  19  29  42  72 128 177
    Frequency   1   1   1   1   1   1   1
    %          14  14  14  14  14  14  14
    ----------------------------------------------------
    Camden
           n missing  unique    Info    Mean
           7       0       7       1   48.71
                1   4  19  41  47  87 142
    Frequency   1   1   1   1   1   1   1
    %          14  14  14  14  14  14  14
    ----------------------------------------------------
    ...
    base R: reduced to the bare minimum…
    Hmisc: Oops! Misinterpreted… (the 3 areas are treated as variables and the 7 drivers as observations)
  • 13.
    > tab <- as.table(apply(HairEyeColor, c(1,2), sum))[
    +   , c("Brown","Hazel","Green","Blue")]
    > (z <- Desc(tab, row.vars=c(3, 1), rfrq="011", plotit=FALSE, main="Hair ~ Eye"))
    Hair ~ Eye
    Summary:
    n: 592, rows: 4, columns: 4
    Pearson's Chi-squared test:
      X-squared = 138.29, df = 9, p-value < 2.2e-16
    Likelihood Ratio:
      X-squared = 146.44, df = 9, p-value < 2.2e-16
    Mantel-Haenszel Chi-squared:
      X-squared = 109.64, df = 1, p-value < 2.2e-16
    Phi-Coefficient        0.483
    Contingency Coeff.     0.435
    Cramer's V             0.279
              Eye  Brown  Hazel  Green   Blue    Sum
           Hair
    freq   Black      68     15      5     20    108
           Brown     119     54     29     84    286
           Red        26     14     14     17     71
           Blond       7     10     16     94    127
           Sum       220     93     64    215    592
    p.row  Black     63%  13.9%   4.6%  18.5%      .
           Brown   41.6%  18.9%  10.1%  29.4%      .
           Red     36.6%  19.7%  19.7%  23.9%      .
           Blond    5.5%   7.9%  12.6%    74%      .
           Sum     37.2%  15.7%  10.8%  36.3%      .
    p.col  Black   30.9%  16.1%   7.8%   9.3%  18.2%
           Brown   54.1%  58.1%  45.3%  39.1%  48.3%
           Red     11.8%  15.1%  21.9%   7.9%    12%
           Blond    3.2%  10.8%    25%  43.7%  21.5%
           Sum         .      .      .      .      .
    > # do the plot by hand, while setting the colours
    > cols1 <- SetAlpha(c("sienna4", "burlywood", "chartreuse3", "slategray1"), 0.6)
    > cols2 <- SetAlpha(c("moccasin", "salmon1", "wheat3", "gray32"), 0.8)
    > plot(z, col1=cols1, col2=cols2, horiz=FALSE)
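The three association measures in the Desc() output are simple functions of Pearson's chi-squared statistic χ², the number of observations n and the smaller table dimension k: phi = sqrt(χ²/n), contingency coefficient C = sqrt(χ²/(χ² + n)) and Cramér's V = sqrt(χ²/(n(k - 1))). With χ² = 138.29, n = 592 and k = 4 these give 0.483, 0.435 and 0.279, matching the slide. The base-R sketch below recomputes them for the same Hair × Eye table; it is an illustration of the formulas, not DescTools' own code.

```r
# Hair x Eye counts, summed over Sex (column order does not affect chi-squared)
tab <- margin.table(HairEyeColor, c(1, 2))

chi <- unname(chisq.test(tab)$statistic)   # Pearson's chi-squared
n   <- sum(tab)                            # total number of observations
k   <- min(dim(tab))                       # smaller table dimension

phi       <- sqrt(chi / n)                 # phi coefficient
cont_coef <- sqrt(chi / (chi + n))         # Pearson's contingency coefficient
cramer_v  <- sqrt(chi / (n * (k - 1)))     # Cramer's V

round(c(chi = chi, phi = phi, C = cont_coef, V = cramer_v), 3)
```

Cramér's V divides by k - 1 so that it stays within [0, 1] for tables of any size, which is why it is smaller than phi for this 4 × 4 table.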
  • 14. Describe factors in Word
    > Desc(d.pizza$driver, wrd=GetNewWrd())
  • 15. factor ~ numeric
    > Desc(diabetes ~ age, data=d.pima, digits=2, breaks=5, margin=TRUE, conf.level=0.90)
    Summary:
    n pairs: 768, valid: 768 (100%), missings: 0 (0%), groups: 2
               neg     pos   Total
    mean     31.19   37.07   33.24
    median   27.00   36.00   29.00
    sd       11.67   10.97   11.76
    IQR      14.00   16.00   17.00
    n          500     268     768
    np       65.1%   34.9%    100%
    NAs          0       0       0
    0s           0       0       0
    Kruskal-Wallis rank sum test:
      Kruskal-Wallis chi-squared = 73.253, df = 1, p-value < 2.2e-16
    Proportions of diabetes in the quantiles of age:
            Q1     Q2     Q3     Q4     Q5
    neg  86.7%  76.1%    57%  54.3%  46.8%
    pos  13.3%  23.9%    43%  45.7%  53.2%
    Further: factor ~ factor, numeric ~ factor, numeric ~ numeric
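The factor ~ numeric summary can be approximated in base R: cut the numeric variable into quintiles, tabulate the column proportions of the factor, and run a Kruskal-Wallis test of the numeric variable across the factor levels. The sketch does not use d.pima; it simulates made-up data instead. That Desc() runs the test in this direction is an assumption, although the df = 1 on the slide is consistent with comparing age across two groups.

```r
set.seed(1)
# Made-up data standing in for d.pima: a binary outcome whose
# probability of "pos" increases with age.
n_obs    <- 500
age      <- runif(n_obs, 21, 81)
p_pos    <- plogis(-3 + 0.05 * age)
diabetes <- factor(ifelse(runif(n_obs) < p_pos, "pos", "neg"),
                   levels = c("neg", "pos"))

# Cut age into five quantile groups Q1..Q5
brks  <- quantile(age, probs = seq(0, 1, by = 0.2))
age_q <- cut(age, breaks = brks, include.lowest = TRUE,
             labels = paste0("Q", 1:5))

# Share of neg/pos within each age quintile (each column sums to 1)
prop <- prop.table(table(diabetes, age_q), margin = 2)
print(round(100 * prop, 1))

# Kruskal-Wallis rank sum test of age across the two outcome groups
kruskal.test(age ~ diabetes)
```

Quantile-based breaks keep the groups roughly equal in size, so the proportions in each column rest on a comparable number of observations.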
  • 16. Plus ~440 functions
    • Statistical functions and confidence intervals: Skew, Kurt, CramerV, SomersDelta, CohenKappa, HuberM, MeanCI, BinomCI, …
    • Additional tests not found in base R: HotellingsT2Test, JarqueBeraTest, BreslowDayTest, DurbinWatsonTest, LeveneTest, ScheffeTest, …
    • Date functions: Today, AddMonths, Day, Month, Year, Weekday, IsWeekend, Zodiac, …
    • String functions: StrAlign, StrTrim, StrDist, StrCountW, StrVal, …
    • Operators and other: %()%, Untable, CollapseTable, Dummy, Large, Small, …
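As an illustration of what an interval operator such as %()% saves in everyday screening code, here is a base-R sketch of two "between" operators. The names %btw_open% and %btw_closed% are hypothetical and were chosen so as not to shadow DescTools' own operators; the open-interval reading of %()% is an assumption based on its notation.

```r
# Hypothetical base-R infix operators (names invented for this sketch,
# not DescTools' implementation).
`%btw_open%`   <- function(x, rng) x >  rng[1] & x <  rng[2]   # a <  x <  b
`%btw_closed%` <- function(x, rng) x >= rng[1] & x <= rng[2]   # a <= x <= b

x <- c(1, 2, 3, 4, 5)
x %btw_open%   c(2, 4)   # FALSE FALSE  TRUE FALSE FALSE
x %btw_closed% c(2, 4)   # FALSE  TRUE  TRUE  TRUE FALSE
```

Compared with writing x > 2 & x < 4 by hand, the operator states the interval once, which keeps longer filter expressions readable.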
  • 17. Pain point «Speed»
    > x <- runif(1e8)
    > system.time(e1071::kurtosis(x))
       user  system elapsed
       5.67    0.55    6.21
    > system.time(DescTools::Kurt(x))
       user  system elapsed
       0.47    0.00    0.47
    http://www.noamross.net/blog/2013/4/25/faster-talk.html -> "Get a Bigger Computer"
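For orientation, the moment-based excess kurtosis needs only a few vectorised passes over the data in base R. The function kurt_moment() below is a hypothetical sketch of the simple estimator m4 / m2² - 3. Both DescTools::Kurt and e1071::kurtosis offer several estimator variants, and this sketch makes no claim about either package's default or about the reason for their timing difference.

```r
# Hypothetical sketch: moment-based excess kurtosis m4 / m2^2 - 3,
# computed with vectorised base-R operations only.
kurt_moment <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d  <- x - mean(x)     # deviations from the mean
  m2 <- mean(d^2)       # second central moment
  m4 <- mean(d^4)       # fourth central moment
  m4 / m2^2 - 3
}

kurt_moment(1:5)        # -1.3

set.seed(42)
x <- runif(1e6)
system.time(k <- kurt_moment(x))
k                       # close to -1.2, the excess kurtosis of a uniform
```

Avoiding explicit R-level loops is what keeps such a computation fast; the elapsed time will of course depend on the machine.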
  • 19. Pain point «Import»
    R Data Import/Export: "This is a guide to importing and exporting data to and from R. This manual is for R, version 3.1.2 (2014-10-31)." Copyright © 2000–2014 R Core Team
  • 20. DescTools::XLGetRange()
    • Import data directly from Excel
  • 21. Pain point «User Interface»
    "Can one be a good data analyst without being a half-good programmer? The short answer to that is, 'No.' The long answer to that is, 'No.'"
    -- Frank Harrell, 1999 S-PLUS User Conference, New Orleans (October 1999)
    Could you spontaneously produce the R code needed to present today's date as "Donnerstag, 21. Januar 2016" (Thursday, 21 January 2016, German locale)?
    • Solution base R*):
      > format(Sys.Date(), "%A, %d. %B %Y")
      [1] "Donnerstag, 21. Januar 2016"
    • Solution DescTools:
      > Format(Today(), fmt="dddd, dd. mmmm yyyy")
      [1] "Donnerstag, 21. Januar 2016"
  • 22. "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man."
    -- George Bernard Shaw
    Be unreasonable and contact me with feedback or feature ideas! andri@signorell.net
  • 23. Thanks to
    • All the R Core members and R contributors
    • Frank E Harrell Jr, with contributions from Charles Dupont and many others (2014). Hmisc: Harrell Miscellaneous. R package version 3.14-6. http://CRAN.R-project.org/package=Hmisc
    • Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA. R package version 1.5.1. http://CRAN.R-project.org/package=psych
    • Lemon, J. (2006). Plotrix: a package in the red light district of R. R-News, 6(4): 8–12.
    • Hans Peter Wolf and Uni Bielefeld (2014). aplpack: Another Plot PACKage: stem.leaf, bagplot, faces, spin3R, plotsummary, plothulls, and some slider functions. R package version 1.3.0. http://CRAN.R-project.org/package=aplpack
    • Martin Maechler et al. (2015). sfsmisc: Utilities from Seminar fuer Statistik ETH Zurich. R package version 1.0-27. http://CRAN.R-project.org/package=sfsmisc
    • Christian W. Hoffmann <http://www.echoffmann.ch> (2014). cwhmisc: Miscellaneous Functions for math, plotting, printing, statistics, strings, and tools. R package version 5.0. http://CRAN.R-project.org/package=cwhmisc
    • And many more! See DescTools' authors list!