SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
R Statistics with MongoDB

R Statistics with Mon‐
goDB
Dr. Markus Schmidberger
October 14th, 2013 Munich, Germany
Email: markus@mongosoup.de
Twitter: @cloudHPC

1 von 36
Dr. Markus Schmidberger

R Statistics with MongoDB

2 von 36
R Statistics with MongoDB

Outline

Introduction to Big Data, MongoSoup and R
R statistics with MongoDB and Examples
Summary & Questions

3 von 36
R Statistics with MongoDB

Big Data
Wikipedia: … a collection of data sets so large and complex that it
becomes difficult to process using on-hand database management
tools or traditional data processing. …
storing
processing

4 von 36
Storing: NoSQL - MongoDB

R Statistics with MongoDB

databases using looser consistency models to store data
German MongoDB as a Service: MongoSoup
cloudControl Add-On
currently running on AWS EU-Region (Ireland)
all features available: shared / dedicated hosting, replica
set, sharding
24/7 support available

5 von 36
R Statistics with MongoDB

MongoSoup in < 5 min

go to cloudControl: www.cloudcontrol.com
add an account and a billing address
create a new app, e.g. “rmongodb”
install cloudControl command line tools: cctrlapp
enable your preferred MongoSoup hosting: cctrlapp
rmongodb/default addon.add mongosoup.medium
go to the cloudControl Web-Console-AddOns and get your
credentials
https://www.cloudcontrol.com/console/app/rmongodb

6 von 36
Processing: Analyzing with R and Hadoop
R Statistics with MongoDB

backward-looking analysis is outdated
today: quasi real-time analysis
tomorrow: forward-looking predictive analysis
more complex methods, more data available, more
processing time required
Check my Strata London Tutorial “Big Data Analyses with R”

7 von 36
R Statistics with MongoDB

Introduction to R

R is a free software environment for statistical computing
and graphics
offers tools to manage and analyze data
standard statistical methods are implemented
compiles and runs under different OS
support via huge community

www.r-project.org

8 von 36
huge online-libraries with > 5000 R-packages:

R Statistics with MongoDB

http://cran.r-project.org
possibility to write personalized code and to contribute new
packages
really famous since January 6, 2009: The New York Times,
“Data Analysts Captivated by R's Power”

9 von 36
R Statistics with MongoDB

RStudio IDE

http://www.rstudio.com

10 von 36
R Statistics with MongoDB

R as calculator

(5+5) - 1 * 3
[1] 7
x <- 3
x
[1] 3
x^2 + 4
[1] 13

11 von 36
R Statistics with MongoDB

y <- c(1,2,3)
y
[1] 1 2 3
x <- 1:10
x
[1]

1

2

3

4

5

6

7

8

9 10

x < 5
[1] TRUE TRUE TRUE TRUE FALSE FALSE
FALSE FALSE FALSE FALSE

12 von 36
R Statistics with MongoDB

x[3:7]

[1] 3 4 5 6 7
mean(x)
[1] 5.5
help("mean")
?mean

13 von 36
R Statistics with MongoDB

14 von 36
Many Statistical Functions

R Statistics with MongoDB

kmeans(dat, 4)
K-means clustering with 4 clusters of sizes
21, 18, 30, 31
Cluster means:
[,1]
[,2]
1 0.7755 0.8509
2 -0.1557 -0.2305
3 1.2299 1.1472
4 0.1510 0.1507
Clustering vector:
[1] 4 2 4 4 2 4 4
2 2 4 4 4 2 4 2 4 4
[36] 4 4 4 4 4 4 4
3 1 3 3 3 1 1 3 3 3
[71] 1 3 1 1 3 3 3
1 3 1 3 3 3 3 1 3 3

4
2
4
3
3
3

2
4
2
1
1

4
2
4
3
1

4
2
2
1
3

4
4
2
3
3

2 2 4 4 1 4 2
4
4 2 2 1 1 1 1
3
1 1 1 3 3 3 3

Within cluster sum of squares by cluster:
[1] 3.318 1.166 4.019 3.195
(between_SS / total_SS = 83.0 %)
Available components:
[1] "cluster"
"centers"
"totss"
"withinss"
[5] "tot.withinss" "betweenss"
"size"

15 von 36
R Statistics with MongoDB

plot(dat, col = cl$cluster, cex=2, pch=16)
points(cl$centers, col = 1:4, pch = 13, cex
= 4)

16 von 36
R Shiny - easy web application

R Statistics with MongoDB

developed by RStudio
turns R analyses into interactive web applications that
anyone can use
let your users choose input parameters using friendly
controls like sliders, drop-downs, and text fields
easily incorporate any number of outputs like plots, tables,
and summaries
no HTML or JavaScript knowledge is necessary, only R
http://www.rstudio.com/shiny/

17 von 36
R Statistics with MongoDB

R and Databases
SQL provides a standard language to filter, aggregate, group,
sort data
SQL in new places: Hive, Impala, …
ODBC provides SQL interface to non-database data (Excel,
CSV, text files)
R stores relational data in data.frames (extended lists)

18 von 36
R Statistics with MongoDB

data(iris)
head(iris, n=3)
Sepal.Length Sepal.Width Petal.Length
Petal.Width Species
1
5.1
3.5
1.4
0.2 setosa
2
4.9
3.0
1.4
0.2 setosa
3
4.7
3.2
1.3
0.2 setosa
class(iris)
[1] "data.frame"

19 von 36
R Statistics with MongoDB

R package: sqldf

running SQL statements on R data frames
library(sqldf)
sqldf("select * from iris limit 2")
Sepal_Length Sepal_Width Petal_Length
Petal_Width Species
1
5.1
3.5
1.4
0.2 setosa
2
4.9
3.0
1.4
0.2 setosa
sqldf("select count(*) from iris")
count(*)
1
150

20 von 36
Other relational R package

R Statistics with MongoDB

RMySQL package provides an interface to MySQL
RPostgreSQL package provides an interface to PostgreSQL
ROracle package provides an interface for Oracle
RJDBC package provides access to databases through a
JDBC interface
RSQLite package provides access to SQLite
(SQLite engine is included)
One big problem:
all packages read the full result in R memory

21 von 36
R Statistics with MongoDB

R and MongoDB

on CRAN there are two packages to connect R with MongoDB
rmongodb supported by MongoDB, Inc.
powerful for big data
difficult to use due to BSON objects
RMongo
easy to use
limited functionality
reads full results in R memory
does not work on MAC OS X

22 von 36
R Statistics with MongoDB

R package: RMongo

library(Rmongo)
mongo <- mongoDbConnect("cc_JwQcDLJSYQJb",
"dbs001.mongosoup.de", 27017)
dbAuthenticate(mongo,
username="JwQcDLJSYQJb",
password="RSXPkUkXXXXX")
dbShowCollections(mongo)
dbGetQuery(mongo, "zips","{'state':'AL'}")
dbInsertDocument(mongo, "test_data",
'{"foo": "bar", "size": 5 }')
dbDisconnect(mongo)

23 von 36
R Statistics with MongoDB

R package: rmongodb

developed on top of the MongoDB supported C driver
library(rmongodb)
mongo <mongo.create(host="dbs001.mongosoup.de",
db="cc_JwQcDLJSYQJb",
username="JwQcDLJSYQJb",
password="RSXPkUkXXXXX")
mongo
[1] 0
attr(,"mongo")
<pointer: 0x105a1de80>
attr(,"class")
[1] "mongo"
attr(,"host")
[1] "dbs001.mongosoup.de"
attr(,"name")
[1] ""
attr(,"username")
[1] "JwQcDLJSYQJb"
attr(,"password")
[1] "RSXPkUkxRdOX"
attr(,"db")
[1] "cc_JwQcDLJSYQJb"
attr(,"timeout")
[1] 0

24 von 36
R Statistics with MongoDB

mongo.get.database.collections(mongo,
"cc_JwQcDLJSYQJb")
[1] "cc_JwQcDLJSYQJb.zips"
"cc_JwQcDLJSYQJb.ccp" "cc_JwQcDLJSYQJb.test"
mongo <- mongo.disconnect(mongo)

25 von 36
R Statistics with MongoDB

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "state", "AL")
[1] TRUE
query <- mongo.bson.from.buffer(buf)
query
state : 2

26 von 36

AL
R Statistics with MongoDB

res <- mongo.find.one(mongo,
"cc_JwQcDLJSYQJb.zips", query)
res
city : 2
loc : 4
0 : 1
1 : 1
pop : 16
state : 2
_id : 2

27 von 36

ACMAR

6055
AL
35004

-86.515570
33.584132
R Statistics with MongoDB

out <- mongo.bson.to.list(res)
out$loc
[1] -86.52

33.58

typeof(out$loc)
[1] "double"
out$pop
[1] 6055
out$state
[1] "AL"

28 von 36
R Statistics with MongoDB

cursor <- mongo.find(mongo,
"cc_JwQcDLJSYQJb.zips", query)
res <- NULL
while (mongo.cursor.next(cursor)){
value <- mongo.cursor.value(cursor)
Rvalue <- mongo.bson.to.list(value)
res <- rbind(res, Rvalue)
}
err <- mongo.cursor.destroy(cursor)
head(res, n=4)
city
_id
Rvalue "ACMAR"
"35004"
Rvalue "ADAMSVILLE"
"35005"
Rvalue "ADGER"
"35006"
Rvalue "KEYSTONE"
"35007"

29 von 36

loc

pop

Numeric,2 6055

state
"AL"

Numeric,2 10616 "AL"
Numeric,2 3205

"AL"

Numeric,2 14218 "AL"
It is all about creating BSON query or field objects

R Statistics with MongoDB

b <- mongo.bson.from.list(
list(name="Fred", age=29, city="Boston"))
b
name : 2
age : 1
city : 2

Fred
29.000000
Boston

mongo.bson.to.list(b)
$name
[1] "Fred"
$age
[1] 29
$city
[1] "Boston"

30 von 36
R Statistics with MongoDB

?mongo.bson
?mongo.bson.buffer.append
?mongo.bson.buffer.start.array
?mongo.bson.buffer.start.object
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "aggregate",
"zips")
mongo.bson.buffer.start.array(buf,
"pipeline")
mongo.bson.buffer.start.object(buf,
"$group")
mongo.bson.buffer.append(buf, "_id",
"$state")
mongo.bson.buffer.start.object(buf,
"totalPop")
mongo.bson.buffer.append(buf, "$sum",
"$pop")
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, "$match")
mongo.bson.buffer.start.object(buf,
"totalPop")
mongo.bson.buffer.append(buf, "$gte",
"10000")
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)

31 von 36
CCP Web Analytics Challenge

R Statistics with MongoDB

buf <- mongo.bson.buffer.create()
query <- mongo.bson.from.buffer(buf)
buf <- mongo.bson.buffer.create()
err <- mongo.bson.buffer.append(buf, "user",
1)
err <- mongo.bson.buffer.append(buf, "type",
1)
field <- mongo.bson.from.buffer(buf)
out <- mongo.find(mongo,
"cc_JwQcDLJSYQJb.ccp", query, fields=field,
limit=1000)
res <- NULL
while (mongo.cursor.next(out)){
value <- mongo.cursor.value(out)
Rvalue <- mongo.bson.to.list(value)
res <- rbind(res, Rvalue)
}

32 von 36
R Statistics with MongoDB

boxplot( as.integer(table(unlist(res[,2]))
), cex=4, horizontal=TRUE, main="Number of
actions per user")

33 von 36
R Statistics with MongoDB

Shiny Mongo
R based MongoDB User Interface
R packages shiny and rmongodb
less than 200 lines of code
DEMO: http://localhost:8100

https://github.com/comsysto/ShinyMongo

34 von 36
R Statistics with MongoDB

Summary
R is a powerful statistical tool to analyse many different kind
of data
R can access databases
MongoDB and rmongodb ready for Big Data
start playing around with R, Big Data and MongoDB
http://www.r-project.org
http://www.mongodb.org
http://www.mongosoup.de 

35 von 36
R Statistics with MongoDB

See you soon

thanks a lot for your attention
there are R trainings in December 2013 in Munich
http://comsysto.com/events.html#r
we are hosting many events and meetups
meet you at the MongoSoup booth

Email: markus@mongosoup.de
Twitter: @cloudHPC

36 von 36

Weitere ähnliche Inhalte

Was ist angesagt?

Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 

Was ist angesagt? (20)

Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB
 
Xadoop - new approaches to data analytics
Xadoop - new approaches to data analyticsXadoop - new approaches to data analytics
Xadoop - new approaches to data analytics
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoSupercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Indexing and Performance Tuning
Indexing and Performance TuningIndexing and Performance Tuning
Indexing and Performance Tuning
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
 
Webinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation FrameworkWebinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation Framework
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
 
Webinar: MongoDB + Hadoop
Webinar: MongoDB + HadoopWebinar: MongoDB + Hadoop
Webinar: MongoDB + Hadoop
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
 

Andere mochten auch

Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
Chris Clarke
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
MongoDB
 
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범) Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
InBum Kim
 

Andere mochten auch (20)

Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine Learning
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and Fortune
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Seefeld stats r_bio
Seefeld stats r_bioSeefeld stats r_bio
Seefeld stats r_bio
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
R Statistics
R StatisticsR Statistics
R Statistics
 
Extending and customizing ibm spss statistics with python, r, and .net (2)
Extending and customizing ibm spss statistics with python, r, and .net (2)Extending and customizing ibm spss statistics with python, r, and .net (2)
Extending and customizing ibm spss statistics with python, r, and .net (2)
 
Statistics with R
Statistics with RStatistics with R
Statistics with R
 
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범) Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
Using R with MongoDB(R User Conference Korea 2015, SK C&C 김인범)
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 

Ähnlich wie R statistics with mongo db

ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
alexstorer
 

Ähnlich wie R statistics with mongo db (20)

MongoDB FabLab León
MongoDB FabLab LeónMongoDB FabLab León
MongoDB FabLab León
 
Data science and OSS
Data science and OSSData science and OSS
Data science and OSS
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane Fine
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
India software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreIndia software developers conference 2013 Bangalore
India software developers conference 2013 Bangalore
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهمعرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
MongoDB installation,CRUD operation & JavaScript shell
MongoDB installation,CRUD operation & JavaScript shellMongoDB installation,CRUD operation & JavaScript shell
MongoDB installation,CRUD operation & JavaScript shell
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
MongoDB Basics Unileon
MongoDB Basics UnileonMongoDB Basics Unileon
MongoDB Basics Unileon
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Data stores: beyond relational databases
Data stores: beyond relational databasesData stores: beyond relational databases
Data stores: beyond relational databases
 
Jumpstart: MongoDB BI Connector & Tableau
Jumpstart: MongoDB BI Connector & TableauJumpstart: MongoDB BI Connector & Tableau
Jumpstart: MongoDB BI Connector & Tableau
 
20180420 hk-the powerofmysql8
20180420 hk-the powerofmysql820180420 hk-the powerofmysql8
20180420 hk-the powerofmysql8
 
Bridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationBridging data analysis and interactive visualization
Bridging data analysis and interactive visualization
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 

Mehr von MongoDB

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

R statistics with mongo db

  • 1. R Statistics with MongoDB R Statistics with Mon‐ goDB Dr. Markus Schmidberger October 14th, 2013 Munich, Germany Email: markus@mongosoup.de Twitter: @cloudHPC 1 von 36
  • 2. Dr. Markus Schmidberger R Statistics with MongoDB 2 von 36
  • 3. R Statistics with MongoDB Outline Introduction to Big Data, MongoSoup and R R statistics with MongoDB and Examples Summary & Questions 3 von 36
  • 4. R Statistics with MongoDB Big Data Wikipedia: … a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing. … storing processing 4 von 36
  • 5. Storing: NoSQL - MongoDB R Statistics with MongoDB databases using looser consistency models to store data German MongoDB as a Service: MongoSoup cloudControl Add-On currently running on AWS EU-Region (Ireland) all features available: shared / dedicated hosting, replica set, sharding 24/7 support available 5 von 36
  • 6. R Statistics with MongoDB MongoSoup in < 5 min go to cloudControl: www.cloudcontrol.com add an account and a billing address create a new app, e.g. “rmongodb” install cloudControl command line tools: cctrlapp enable your preferred MongoSoup hosting: cctrlapp rmongodb/default addon.add mongosoup.medium go to the cloudControl Web-Console-AddOns and get your credentials https://www.cloudcontrol.com/console/app/rmongodb 6 von 36
  • 7. Processing: Analyzing with R and Hadoop R Statistics with MongoDB backward-looking analysis is outdated today: quasi real-time analysis tomorrow: forward-looking predictive analysis more complex methods, more data available, more processing time required Check my Strata London Tutorial “Big Data Analyses with R” 7 von 36
  • 8. R Statistics with MongoDB Introduction to R R is a free software environment for statistical computing and graphics offers tools to manage and analyze data standard statistical methods are implemented compiles and runs under different OS support via huge community www.r-project.org 8 von 36
  • 9. huge online-libraries with > 5000 R-packages: R Statistics with MongoDB http://cran.r-project.org possibility to write personalized code and to contribute new packages really famous since January 6, 2009: The New York Times, “Data Analysts Captivated by R's Power” 9 von 36
  • 10. R Statistics with MongoDB RStudio IDE http://www.rstudio.com 10 von 36
  • 11. R Statistics with MongoDB R as calculator (5+5) - 1 * 3 [1] 7 x <- 3 x [1] 3 x^2 + 4 [1] 13 11 von 36
  • 12. R Statistics with MongoDB y <- c(1,2,3) y [1] 1 2 3 x <- 1:10 x [1] 1 2 3 4 5 6 7 8 9 10 x < 5 [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE 12 von 36
  • 13. R Statistics with MongoDB x[3:7] [1] 3 4 5 6 7 mean(x) [1] 5.5 help("mean") ?mean 13 von 36
  • 14. R Statistics with MongoDB 14 von 36
  • 15. Many Statistical Functions R Statistics with MongoDB kmeans(dat, 4) K-means clustering with 4 clusters of sizes 21, 18, 30, 31 Cluster means: [,1] [,2] 1 0.7755 0.8509 2 -0.1557 -0.2305 3 1.2299 1.1472 4 0.1510 0.1507 Clustering vector: [1] 4 2 4 4 2 4 4 2 2 4 4 4 2 4 2 4 4 [36] 4 4 4 4 4 4 4 3 1 3 3 3 1 1 3 3 3 [71] 1 3 1 1 3 3 3 1 3 1 3 3 3 3 1 3 3 4 2 4 3 3 3 2 4 2 1 1 4 2 4 3 1 4 2 2 1 3 4 4 2 3 3 2 2 4 4 1 4 2 4 4 2 2 1 1 1 1 3 1 1 1 3 3 3 3 Within cluster sum of squares by cluster: [1] 3.318 1.166 4.019 3.195 (between_SS / total_SS = 83.0 %) Available components: [1] "cluster" "centers" "totss" "withinss" [5] "tot.withinss" "betweenss" "size" 15 von 36
  • 16. R Statistics with MongoDB plot(dat, col = cl$cluster, cex=2, pch=16) points(cl$centers, col = 1:4, pch = 13, cex = 4) 16 von 36
  • 17. R Shiny - easy web application R Statistics with MongoDB developed by RStudio turns R analyses into interactive web applications that anyone can use let your users choose input parameters using friendly controls like sliders, drop-downs, and text fields easily incorporate any number of outputs like plots, tables, and summaries no HTML or JavaScript knowledge is necessary, only R http://www.rstudio.com/shiny/ 17 von 36
  • 18. R Statistics with MongoDB R and Databases SQL provides a standard language to filter, aggregate, group, sort data SQL in new places: Hive, Impala, … ODBC provides SQL interface to non-database data (Excel, CSV, text files) R stores relational data in data.frames (extended lists) 18 von 36
  • 19. R Statistics with MongoDB data(iris) head(iris, n=3) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2 1.3 0.2 setosa class(iris) [1] "data.frame" 19 von 36
  • 20. R Statistics with MongoDB R package: sqldf running SQL statements on R data frames library(sqldf) sqldf("select * from iris limit 2") Sepal_Length Sepal_Width Petal_Length Petal_Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa sqldf("select count(*) from iris") count(*) 1 150 20 von 36
  • 21. Other relational R package R Statistics with MongoDB RMySQL package provides an interface to MySQL RPostgreSQL package provides an interface to PostgreSQL ROracle package provides an interface for Oracle RJDBC package provides access to databases through a JDBC interface RSQLite package provides access to SQLite (SQLite engine is included) One big problem: all packages read the full result in R memory 21 von 36
  • 22. R Statistics with MongoDB R and MongoDB on CRAN there are two packages to connect R with MongoDB rmongodb supported by MongoDB, Inc. powerful for big data difficult to use due to BSON objects RMongo easy to use limited functionality reads full results in R memory does not work on MAC OS X 22 von 36
  • 23. R Statistics with MongoDB R package: RMongo library(Rmongo) mongo <- mongoDbConnect("cc_JwQcDLJSYQJb", "dbs001.mongosoup.de", 27017) dbAuthenticate(mongo, username="JwQcDLJSYQJb", password="RSXPkUkXXXXX") dbShowCollections(mongo) dbGetQuery(mongo, "zips","{'state':'AL'}") dbInsertDocument(mongo, "test_data", '{"foo": "bar", "size": 5 }') dbDisconnect(mongo) 23 von 36
  • 24. R Statistics with MongoDB R package: rmongodb developed on top of the MongoDB supported C driver library(rmongodb) mongo <mongo.create(host="dbs001.mongosoup.de", db="cc_JwQcDLJSYQJb", username="JwQcDLJSYQJb", password="RSXPkUkXXXXX") mongo [1] 0 attr(,"mongo") <pointer: 0x105a1de80> attr(,"class") [1] "mongo" attr(,"host") [1] "dbs001.mongosoup.de" attr(,"name") [1] "" attr(,"username") [1] "JwQcDLJSYQJb" attr(,"password") [1] "RSXPkUkxRdOX" attr(,"db") [1] "cc_JwQcDLJSYQJb" attr(,"timeout") [1] 0 24 von 36
  • 25. R Statistics with MongoDB mongo.get.database.collections(mongo, "cc_JwQcDLJSYQJb") [1] "cc_JwQcDLJSYQJb.zips" "cc_JwQcDLJSYQJb.ccp" "cc_JwQcDLJSYQJb.test" mongo <- mongo.disconnect(mongo) 25 von 36
  • 26. R Statistics with MongoDB buf <- mongo.bson.buffer.create() mongo.bson.buffer.append(buf, "state", "AL") [1] TRUE query <- mongo.bson.from.buffer(buf) query state : 2 26 von 36 AL
  • 27. R Statistics with MongoDB res <- mongo.find.one(mongo, "cc_JwQcDLJSYQJb.zips", query) res city : 2 loc : 4 0 : 1 1 : 1 pop : 16 state : 2 _id : 2 27 von 36 ACMAR 6055 AL 35004 -86.515570 33.584132
  • 28. R Statistics with MongoDB out <- mongo.bson.to.list(res) out$loc [1] -86.52 33.58 typeof(out$loc) [1] "double" out$pop [1] 6055 out$state [1] "AL" 28 von 36
  • 29. R Statistics with MongoDB cursor <- mongo.find(mongo, "cc_JwQcDLJSYQJb.zips", query) res <- NULL while (mongo.cursor.next(cursor)){ value <- mongo.cursor.value(cursor) Rvalue <- mongo.bson.to.list(value) res <- rbind(res, Rvalue) } err <- mongo.cursor.destroy(cursor) head(res, n=4) city _id Rvalue "ACMAR" "35004" Rvalue "ADAMSVILLE" "35005" Rvalue "ADGER" "35006" Rvalue "KEYSTONE" "35007" 29 von 36 loc pop Numeric,2 6055 state "AL" Numeric,2 10616 "AL" Numeric,2 3205 "AL" Numeric,2 14218 "AL"
  • 30. It is all about creating BSON query or field objects R Statistics with MongoDB b <- mongo.bson.from.list( list(name="Fred", age=29, city="Boston")) b name : 2 age : 1 city : 2 Fred 29.000000 Boston mongo.bson.to.list(b) $name [1] "Fred" $age [1] 29 $city [1] "Boston" 30 von 36
  • 31. R Statistics with MongoDB ?mongo.bson ?mongo.bson.buffer.append ?mongo.bson.buffer.start.array ?mongo.bson.buffer.start.object buf <- mongo.bson.buffer.create() mongo.bson.buffer.append(buf, "aggregate", "zips") mongo.bson.buffer.start.array(buf, "pipeline") mongo.bson.buffer.start.object(buf, "$group") mongo.bson.buffer.append(buf, "_id", "$state") mongo.bson.buffer.start.object(buf, "totalPop") mongo.bson.buffer.append(buf, "$sum", "$pop") mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.start.object(buf, "$match") mongo.bson.buffer.start.object(buf, "totalPop") mongo.bson.buffer.append(buf, "$gte", "10000") mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.finish.object(buf) query <- mongo.bson.from.buffer(buf) 31 von 36
  • 32. CCP Web Analytics Challenge R Statistics with MongoDB buf <- mongo.bson.buffer.create() query <- mongo.bson.from.buffer(buf) buf <- mongo.bson.buffer.create() err <- mongo.bson.buffer.append(buf, "user", 1) err <- mongo.bson.buffer.append(buf, "type", 1) field <- mongo.bson.from.buffer(buf) out <- mongo.find(mongo, "cc_JwQcDLJSYQJb.ccp", query, fields=field, limit=1000) res <- NULL while (mongo.cursor.next(out)){ value <- mongo.cursor.value(out) Rvalue <- mongo.bson.to.list(value) res <- rbind(res, Rvalue) } 32 von 36
  • 33. R Statistics with MongoDB boxplot( as.integer(table(unlist(res[,2])) ), cex=4, horizontal=TRUE, main="Number of actions per user") 33 von 36
  • 34. R Statistics with MongoDB Shiny Mongo R based MongoDB User Interface R packages shiny and rmongodb less than 200 lines of code DEMO: http://localhost:8100 https://github.com/comsysto/ShinyMongo 34 von 36
  • 35. R Statistics with MongoDB Summary R is a powerful statistical tool to analyse many different kind of data R can access databases MongoDB and rmongodb ready for Big Data start playing around with R, Big Data and MongoDB http://www.r-project.org http://www.mongodb.org http://www.mongosoup.de  35 von 36
  • 36. R Statistics with MongoDB See you soon thanks a lot for your attention there are R trainings in December 2013 in Munich http://comsysto.com/events.html#r we are hosting many events and meetups meet you at the MongoSoup booth Email: markus@mongosoup.de Twitter: @cloudHPC 36 von 36