Cool Features Presentation at Mongo Seattle

Map/Reduce, geospatial search, and other Cool Features

Richard M Kreuter
10gen Inc.
richard@10gen.com

July 27, 2010

Things I’ll cover

Array tricks
Geospatial searches
Map/Reduce
The findAndModify command


Array tricks

Suppose your collection looked like this:
> db.arrays.find()
{ "_id" : 1, "tags" : [ "a", "b", "c" ] }
{ "_id" : 2, "tags" : [ "b", "c", "d" ] }
{ "_id" : 3, "tags" : [ "c", "d", "e" ] }


Array tricks, continued

... then consider the following queries:
db.arrays.find({tags: 'a'})                  // matches 1
db.arrays.find({tags: 'c'})                  // matches 1, 2, 3
db.arrays.find({tags: {$in: ['a', 'e']}})    // matches 1 and 3
db.arrays.find({tags: {$all: ['a', 'e']}})   // no match
db.arrays.find({tags: {$all: ['c', 'd']}})   // matches 2 and 3
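
The matching rules behind these queries can be sketched in plain JavaScript, without a server. The helper names (matchesValue, matchesIn, matchesAll, ids) are hypothetical and only mimic the semantics for arrays of scalars; this is not how MongoDB implements them.

```javascript
// Sample documents from the earlier slide.
const docs = [
  { _id: 1, tags: ['a', 'b', 'c'] },
  { _id: 2, tags: ['b', 'c', 'd'] },
  { _id: 3, tags: ['c', 'd', 'e'] },
];

// {tags: v}: matches when the array contains v.
const matchesValue = (doc, v) => doc.tags.includes(v);
// {tags: {$in: vs}}: matches when the array contains at least one of vs.
const matchesIn = (doc, vs) => vs.some((v) => doc.tags.includes(v));
// {tags: {$all: vs}}: matches when the array contains every one of vs.
const matchesAll = (doc, vs) => vs.every((v) => doc.tags.includes(v));

// Return the _ids of the documents accepted by a predicate.
const ids = (pred) => docs.filter(pred).map((d) => d._id);

console.log(ids((d) => matchesValue(d, 'c')));
console.log(ids((d) => matchesIn(d, ['a', 'e'])));
console.log(ids((d) => matchesAll(d, ['c', 'd'])));
```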


Array tricks, continued continued

As of v1.5.1 (and so v1.6) you can project slices of arrays with the
$slice operator:
// Will return { _id: 1, tags: ["a"] }
db.arrays.find({_id: 1}, {tags: {$slice: 1}})
// Will return { _id: 1, tags: ["c"] }
db.arrays.find({_id: 1}, {tags: {$slice: -1}})
// Will return { _id: 1, tags: ["b", "c"] }
db.arrays.find({_id: 1}, {tags: {$slice: [1, 2]}})
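
As a rough illustration of what the three $slice forms select, here is a plain JavaScript sketch. The function name sliceProjection is hypothetical; it only imitates the projection on an in-memory array.

```javascript
// Hypothetical helper: apply a $slice-style spec to an array.
//   n >= 0        -> first n elements
//   n < 0         -> last |n| elements
//   [skip, limit] -> limit elements after skipping skip
function sliceProjection(arr, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    // A negative skip counts from the end of the array.
    const start = skip < 0 ? Math.max(arr.length + skip, 0) : skip;
    return arr.slice(start, start + limit);
  }
  return spec >= 0 ? arr.slice(0, spec) : arr.slice(spec);
}

const tags = ['a', 'b', 'c'];
console.log(sliceProjection(tags, 1));
console.log(sliceProjection(tags, -1));
console.log(sliceProjection(tags, [1, 2]));
```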


Array tricks, continued continued continued

You can also update the array element matched by the update’s
selector with the positional operator, $:
> db.arrays.update({tags: 'b'},
                   {$set: {'tags.$': 'X'}},
                   false, true);   // upsert = false, multi = true
> db.arrays.find();
{ "_id" : 1, "tags" : [ "a", "X", "c" ] }
{ "_id" : 2, "tags" : [ "X", "c", "d" ] }
{ "_id" : 3, "tags" : [ "c", "d", "e" ] }
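
To make the role of $ concrete, the following plain JavaScript sketch imitates that multi-document update on an in-memory array. The function name setFirstMatch is hypothetical, and the sketch assumes every document has a tags array.

```javascript
// Hypothetical stand-in for
//   update({tags: match}, {$set: {'tags.$': value}}, false, true)
function setFirstMatch(docs, match, value) {
  for (const doc of docs) {
    const i = doc.tags.indexOf(match); // index of the first matching element
    if (i !== -1) doc.tags[i] = value; // '$' stands for that index
  }
  return docs;
}

const arrays = [
  { _id: 1, tags: ['a', 'b', 'c'] },
  { _id: 2, tags: ['b', 'c', 'd'] },
  { _id: 3, tags: ['c', 'd', 'e'] },
];
console.log(setFirstMatch(arrays, 'b', 'X'));
```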


geospatial data

Consider a collection whose documents look like this:
> db.zips.findOne({zip: '98105'});
{
    "_id" : ObjectId("4c4ee17c97af873c2208857a"),
    "city" : "SEATTLE",
    "zip" : "98105",
    "loc" : {
        "y" : 47.663266,
        "x" : 122.302236
    },
    "pop" : 37120,
    "state" : "WA"
}


geospatial data, continued

You can do various kinds of queries; here’s a regex query:
> db.zips.findOne({zip: /^9810/});
{
    "_id" : ObjectId("4c4ee17c97af873c22088576"),
    "city" : "SEATTLE",
    "zip" : "98101",
    "loc" : {
        "y" : 47.611435,
        "x" : 122.330456
    },
    "pop" : 5801,
    "state" : "WA"
}


geospatial data, continued continued

... here’s a range query ...
> db.zips.find({zip: {$gte: '98101', $lt: '98110'}},
               {zip: 1, city: 1});
{ "_id" : ..., "city" : "SEATTLE", "zip" : "98101" }
{ "_id" : ..., "city" : "TUKWILA", "zip" : "98108" }
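
Because zip is stored as a string, the range operators compare strings rather than numbers. That works here because every zip code has the same width. A small plain JavaScript sketch of the same comparison follows; two of the values are made up for illustration.

```javascript
// Zip codes as strings; '98101' and '98108' come from the slide,
// the other two are made-up values for illustration.
const zips = ['98055', '98101', '98108', '98110'];

// Equivalent of {zip: {$gte: '98101', $lt: '98110'}} on strings.
const inRange = zips.filter((z) => z >= '98101' && z < '98110');
console.log(inRange);
```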


geospatial data, continued continued continued

... and here are some simple aggregate queries and a sorted query.
> db.zips.count({city: 'SEATTLE', state: {$ne: "WA"}})
0
> db.zips.count({city: 'PHILADELPHIA',
                 state: {$ne: "PA"}})
4
> db.zips.count({pop: 1})
10
> db.zips.count({pop: 0})
67
> db.zips.find().sort({pop: -1}).limit(10)


geospatial queries

Since v1.4, MongoDB supports a few kinds of geospatial queries,
enabled by creating an index of type ’2d’ on the collection:
> db.zips.ensureIndex({loc: '2d', state: 1});

// subsequent queries will use our_loc
// as a reference point:
> our_loc = db.zips.findOne({zip: '98105'}).loc;


geospatial queries, continued

> db.zips.find({loc: {$near: our_loc}}).limit(3);
{ "_id" : ..., "city" : "SEATTLE", "zip" : "98105",
  "loc" : { "y" : 47.663266, "x" : 122.302236 },
  "pop" : 37120, "state" : "WA" }
  "loc" : { "y" : 47.684918, "x" : 122.296828 },
  "pop" : 40454, "state" : "WA" }
  "loc" : { "y" : 47.630115, "x" : 122.297157 },
  "pop" : 19760, "state" : "WA" }


geospatial queries, continued continued

> db.zips.count({loc:
                 {$within:
                  {$box: [[46, 120], [48, 124]]}}})
249
> db.zips.count({loc: {$within:
                       {$center: [our_loc, 3]}}});
507
> db.zips.count({loc: {$within:
                       {$center: [our_loc, 3]}},
                 state: {$ne: 'WA'}});
147
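
The geometry behind $box and $center can be sketched in plain JavaScript. The helpers inBox and inCenter are hypothetical. They assume flat-plane distances in the same units as the stored coordinates and treat the boundary as inside, which may not match the server exactly.

```javascript
// Point inside the rectangle given by two corners, each as [y, x]
// (the same field order as the documents' loc).
function inBox(p, [lo, hi]) {
  return p.y >= lo[0] && p.y <= hi[0] && p.x >= lo[1] && p.x <= hi[1];
}

// Point within radius r of center c, using flat (Euclidean) distance.
function inCenter(p, [c, r]) {
  return Math.hypot(p.y - c.y, p.x - c.x) <= r;
}

const ourLoc = { y: 47.663266, x: 122.302236 };   // zip 98105, from the slides
const downtown = { y: 47.611435, x: 122.330456 }; // zip 98101, from the slides
const farAway = { y: 40, x: 75 };                 // made-up point

console.log(inBox(ourLoc, [[46, 120], [48, 124]]));
console.log(inCenter(downtown, [ourLoc, 3]));
console.log(inCenter(farAway, [ourLoc, 3]));
```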


Map/Reduce with geospatial data

Here’s a simple Map/Reduce job that sums up populations by
state.
> function map1() { emit(this.state, this.pop); }
> function reduce1(key, values) {
    return Array.sum(values); }
> db.zips.mapReduce(map1, reduce1,
                    {out: 'states.simple'});
...
> db.states.simple.findOne({_id: 'WA'})
{ "_id" : "WA", "value" : 4866692 }
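
For readers without a server at hand, the data flow of this job can be imitated in plain JavaScript. The driver runMapReduce is a hypothetical in-memory stand-in, not MongoDB's implementation. It passes emit as an argument instead of providing it as a global, and it replaces the shell's Array.sum helper with a plain reduce.

```javascript
// Hypothetical in-memory driver: group emitted values by key,
// then reduce each group. Only handles string keys.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document
  return [...groups].map(([key, values]) => ({
    _id: key,
    // reduce is not called for a key that has only one value
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

function map1(emit) { emit(this.state, this.pop); }
function reduce1(key, values) { return values.reduce((a, b) => a + b, 0); }

// Two WA documents from the slides plus one made-up OR document.
const sample = [
  { state: 'WA', pop: 37120 },
  { state: 'OR', pop: 1000 },
  { state: 'WA', pop: 5801 },
];
const totals = runMapReduce(sample, map1, reduce1);
console.log(totals);
```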


Map/Reduce with geospatial data, continued

Here’s a slightly more complex map/reduce that counts zipcodes
and sums populations by state.
function map2() {
    emit(this.state, {pop: this.pop, count: 1});
}
function reduce2(key, values) {
    for (var i = 1; i < values.length; i++) {
        values[0].count += values[i].count;
        values[0].pop += values[i].pop;
    }
    return values[0];
}
db.zips.mapReduce(map2, reduce2,
                  {out: 'states.more'});
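
Note that reduce2 returns a value of the same shape as the values map2 emits. That matters because the reduce function may be run again on its own earlier output. The plain JavaScript check below is only a sketch of that property; the three populations are taken from WA documents on earlier slides.

```javascript
// Same reduce function as on the slide.
function reduce2(key, values) {
  for (let i = 1; i < values.length; i++) {
    values[0].count += values[i].count;
    values[0].pop += values[i].pop;
  }
  return values[0];
}

// Fresh emitted values on each call, because reduce2 mutates values[0].
const emitted = () => [
  { pop: 37120, count: 1 },
  { pop: 5801, count: 1 },
  { pop: 40454, count: 1 },
];

// Reduce everything at once ...
const allAtOnce = reduce2('WA', emitted());
// ... or reduce a partial result together with a later value.
const [a, b, c] = emitted();
const inTwoSteps = reduce2('WA', [reduce2('WA', [a, b]), c]);

console.log(allAtOnce, inTwoSteps);
```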


Map/Reduce with geospatial data, continued continued

> db.states.more.find().sort({'value.pop': -1})
    .limit(3);
{ "_id" : "CA",
  "value" : { "pop" : 29760021, "count" : 1523 } }
{ "_id" : "NY",
  "value" : { "pop" : 17990455, "count" : 1596 } }
{ "_id" : "TX",
  "value" : { "pop" : 16986510, "count" : 1676 } }


Map/Reduce with geospatial data, continued cubed

Finally, a map/reduce job that computes the average population
per zipcode for each city (the key is the state and city together).
function map3() {
    emit({state: this.state, city: this.city},
         {pop: this.pop, count: 1});
}
function avg(key, value) {
    value.avg = value.pop / value.count;
    return value;
}
db.zips.mapReduce(map3, reduce2,
                  {out: 'cities', finalize: avg});
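
One way to see why the division lives in finalize rather than in reduce: an average of partial averages is generally not the overall average. The plain JavaScript sketch below applies the slide's avg function to the reduced SPOKANE value shown on the next slide, then contrasts the two calculations using made-up populations.

```javascript
// Same finalize function as on the slide.
function avg(key, value) {
  value.avg = value.pop / value.count;
  return value;
}

// Reduced value for SPOKANE, taken from the output on the next slide.
const spokane = avg({ state: 'WA', city: 'SPOKANE' },
                    { pop: 283986, count: 12 });
console.log(spokane.avg);

// Made-up populations 10, 20 and 60, one zipcode each.
const trueAvg = (10 + 20 + 60) / 3;         // divide once, at the end
const stagedAvg = ((10 + 20) / 2 + 60) / 2; // average of a partial average
console.log(trueAvg, stagedAvg);
```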


Map/Reduce with geospatial data, continued^4

> db.cities.find({'_id.state': 'WA'})
    .sort({'value.pop': -1}).limit(3);
{ "_id" : { "state" : "WA", "city" : "SEATTLE" },
  "value" : { "pop" : 520096, "count" : 24,
              "avg" : 21670.666666666668 } }
{ "_id" : { "state" : "WA", "city" : "SPOKANE" },
  "value" : { "pop" : 283986, "count" : 12,
              "avg" : 23665.5 } }
{ "_id" : { "state" : "WA", "city" : "TACOMA" },
  "value" : { "pop" : 194282, "count" : 13,
              "avg" : 14944.76923076923 } }


findAndModify for an atomically incrementing counter

db.counters.save({_id: 'some_id', value: 0});
function get() {
    return db.counters.findAndModify({
        query: {_id: 'some_id'},
        update: {$inc: {value: 1}}});
}
> get()
{ "_id" : "some_id", "value" : 0 }
> get()
{ "_id" : "some_id", "value" : 1 }
> get()
{ "_id" : "some_id", "value" : 2 }
> get()
{ "_id" : "some_id", "value" : 3 }
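
The first call returns 0 because findAndModify hands back the document as it was before the update, unless new: true is passed. The plain JavaScript sketch below imitates that behavior with an in-memory Map. The names counters and findAndIncrement are hypothetical, and the sketch says nothing about how the server achieves atomicity.

```javascript
// Hypothetical in-memory counter store keyed by _id.
const counters = new Map([['some_id', { _id: 'some_id', value: 0 }]]);

// Increment the counter and return either the old or the new document.
function findAndIncrement(id, returnNew = false) {
  const doc = counters.get(id);
  const before = { ...doc }; // snapshot taken before the update
  doc.value += 1;            // the $inc step
  return returnNew ? { ...doc } : before;
}

const first = findAndIncrement('some_id');        // pre-update document
const second = findAndIncrement('some_id');
const third = findAndIncrement('some_id', true);  // like passing new: true
console.log(first, second, third);
```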

