RHadoop, R meets Hadoop

(Presented by Antonio Piccolboni at the Strata 2012 Conference, Feb 29, 2012.)

RHadoop is an open source project, spearheaded by Revolution Analytics, that gives data scientists access to Hadoop's scalability from their favorite language, R. RHadoop consists of three packages:

- rhdfs provides file-level manipulation for HDFS, the Hadoop file system
- rhbase provides access to HBase, the Hadoop database
- rmr lets you write mapreduce programs in R

rmr lets R developers program in the mapreduce framework, and it gives all developers an alternative way to implement mapreduce programs, one that strikes a delicate compromise between power and usability. It lets you write general mapreduce programs with the full power and ecosystem of an existing, established programming language. It doesn't force you to replace the R interpreter with a special runtime; it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It consists of a handful of functions, with a modest number of arguments and sensible defaults, that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it makes possible, and we will do that, covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
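
The programming model described above can be tried out without a cluster. What follows is a minimal Python sketch, not rmr: `mapreduce`, `keyval`, `map_fn` and `reduce_fn` are hypothetical names that only mirror the shape of rmr's `mapreduce(input, output, map, reduce)` call, the input is an in-memory list of key-value pairs, and Hadoop's distribution, sorting and HDFS I/O are left out. A job with no reduce step plays the role of `sapply`; adding a reduce step turns it into a per-key aggregation.

```python
from collections import defaultdict


def keyval(k, v):
    """Hypothetical stand-in for rmr's keyval(): just a (key, value) tuple."""
    return (k, v)


def mapreduce(data, map_fn, reduce_fn=None):
    """Simulate one map/shuffle/reduce pass over in-memory (key, value) pairs.

    map_fn(k, v) may return one keyval, a list of keyvals, or None (emit nothing).
    Without reduce_fn the job is map-only, which is the sapply-like case.
    """
    mapped = []
    for k, v in data:
        out = map_fn(k, v)
        if out is None:
            continue
        mapped.extend(out if isinstance(out, list) else [out])
    if reduce_fn is None:
        return mapped
    # Shuffle: collect every value emitted under the same key.
    groups = defaultdict(list)
    for k, v in mapped:
        groups[k].append(v)
    # Reduce: one call per distinct key, in first-seen key order.
    return [reduce_fn(k, vv) for k, vv in groups.items()]


# Map-only job, analogous to sapply(data, function): square every value.
squares = mapreduce([(None, x) for x in range(1, 5)], lambda k, v: keyval(k, v * v))

# Aggregation: count records per key, like reduce = function(k, vv) keyval(k, length(vv)).
counts = mapreduce([("a", 1), ("b", 2), ("a", 3)],
                   lambda k, v: keyval(k, v),
                   lambda k, vv: keyval(k, len(vv)))
```

Returning `None` from `map_fn` drops a record, which is how a filter job is expressed in this sketch.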

  1. RHadoop, Hadoop for R
  2. [Figure: Scholarly activity, 05-09 change, for R, SAS, SPSS, S-Plus and Stata; Packages (log scale), 2002-2010. Source: http://r4stats.com/popularity]
  5. David Champagne, CTO
  6. rhdfs, rhbase, rmr
  9. rmr
  10. sapply(data, function)
      mapreduce(data, map = function)
  11. library(rmr)
      mapreduce(…)
  12. [Diagram: Hadoop programming tools, labelled "Expose MR" and "Hide MR". Tools shown: Hive, Pig; Rmr, Rhipe, Dumbo, Pydoop, Hadoopy; Cascalog, Scalding, Scrunch; Cascading, Crunch; Java, C++]
  20. mapreduce(input, output, map, reduce)
  26. map = function(k, v) if (hash(k) %% 10 == 0) keyval(k, v)
      reduce = function(k, vv) keyval(k, length(vv))
  34. condition = function(x) x > 10
      out = mapreduce(
        input = input,
        map = function(k, v)
          if (condition(v)) keyval(k, v))
  40. x = from.dfs(hdfs.object)
      hdfs.object = to.dfs(x)
  41. INSERT OVERWRITE TABLE pv_gender_sum
      SELECT pv_users.gender, count(DISTINCT pv_users.userid)
      FROM pv_users
      GROUP BY pv_users.gender;
      mapreduce(input =
                  mapreduce(input = "pv_users",
                            map = function(k, v) keyval(v$userid, v$gender),
                            reduce = function(uid, genders)
                              lapply(unique(genders), function(g) keyval(NULL, g))),
                output = "pv_gender_sum",
                map = function(x, gender) keyval(gender, 1),
                reduce = function(gender, counts) keyval(gender, sum(unlist(counts))))
  43. kmeans = function(points, ncenters, iterations = 10,
                        distfun = function(a, b) norm(as.matrix(a - b), type = "F")) {
        newCenters = kmeans.iter(points, distfun, ncenters = ncenters)
        for (i in 1:iterations) {
          newCenters = kmeans.iter(points, distfun, centers = newCenters)
        }
        newCenters
      }
      kmeans.iter = function(points, distfun, ncenters = dim(centers)[1], centers = NULL) {
        from.dfs(
          mapreduce(
            input = points,
            map = if (is.null(centers)) {
              function(k, v) keyval(sample(1:ncenters, 1), v)
            } else {
              function(k, v) {
                distances = apply(centers, 1, function(c) distfun(c, v))
                keyval(centers[which.min(distances), ], v)
              }
            },
            reduce = function(k, vv) keyval(NULL, apply(do.call(rbind, vv), 2, mean))),
          to.data.frame = TRUE)
      }
  48. #!/usr/bin/python
      import sys
      from math import fabs
      from org.apache.pig.scripting import Pig
      filename = "student.txt"
      k = 4
      tolerance = 0.01
      MAX_SCORE = 4
      MIN_SCORE = 0
      MAX_ITERATION = 100
      # initial centroid, equally divide the space
      initial_centroids = ""
      last_centroids = [None] * k
      for i in range(k):
          last_centroids[i] = MIN_SCORE + float(i)/k*(MAX_SCORE-MIN_SCORE)
          initial_centroids = initial_centroids + str(last_centroids[i])
          if i!=k-1:
              initial_centroids = initial_centroids + ":"
      P = Pig.compile("""register udf.jar
          DEFINE find_centroid FindCentroid('$centroids');
          raw = load 'student.txt' as (name:chararray, age:int, gpa:double);
          centroided = foreach raw generate gpa, find_centroid(gpa) as centroid;
          grouped = group centroided by centroid;
          result = foreach grouped generate group, AVG(centroided.gpa);
          store result into 'output';
          """)
      converged = False
      iter_num = 0
      while iter_num<MAX_ITERATION:
          Q = P.bind({'centroids':initial_centroids})
          results = Q.runSingle()
  49.     if not results.isSuccessful():
              raise Exception("Pig job failed")
          iter = results.result("result").iterator()
          centroids = [None] * k
          distance_move = 0
          # get new centroid of this iteration, calculate the moving distance with last iteration
          for i in range(k):
              tuple = iter.next()
              centroids[i] = float(str(tuple.get(1)))
              distance_move = distance_move + fabs(last_centroids[i]-centroids[i])
          distance_move = distance_move / k
          Pig.fs("rmr output")
          print("iteration " + str(iter_num))
          print("average distance moved: " + str(distance_move))
          if distance_move<tolerance:
              sys.stdout.write("k-means converged at centroids: [")
              sys.stdout.write(",".join(str(v) for v in centroids))
              sys.stdout.write("]\n")
              converged = True
              break
          last_centroids = centroids[:]
          initial_centroids = ""
          for i in range(k):
              initial_centroids = initial_centroids + str(last_centroids[i])
              if i!=k-1:
                  initial_centroids = initial_centroids + ":"
          iter_num += 1
      if not converged:
          print("did not converge after " + str(iter_num) + " iterations")
          sys.stdout.write("last centroids: [")
          sys.stdout.write(",".join(str(v) for v in last_centroids))
          sys.stdout.write("]\n")
  50. import java.io.IOException;
      import org.apache.pig.EvalFunc;
      import org.apache.pig.data.Tuple;
      public class FindCentroid extends EvalFunc<Double> {
          double[] centroids;
          public FindCentroid(String initialCentroid) {
              String[] centroidStrings = initialCentroid.split(":");
              centroids = new double[centroidStrings.length];
              for (int i=0;i<centroidStrings.length;i++)
                  centroids[i] = Double.parseDouble(centroidStrings[i]);
          }
          @Override
          public Double exec(Tuple input) throws IOException {
              double min_distance = Double.MAX_VALUE;
              double closest_centroid = 0;
              for (double centroid : centroids) {
                  double distance = Math.abs(centroid - (Double)input.get(0));
                  if (distance < min_distance) {
                      min_distance = distance;
                      closest_centroid = centroid;
                  }
              }
              return closest_centroid;
          }
      }
  51. mapreduce(mapreduce(…
      mapreduce(input = c(input1, input2), …)
      equijoin = function(
        left.input, right.input, input, output,
        outer, map.left, map.right, reduce, reduce.all)
  54. out1 = mapreduce(…)
      mapreduce(input = out1, <xyz>)
      mapreduce(input = out1, <abc>)
      abstract.job = function(input, output, …) {
        …
        result = mapreduce(input = input, output = output)
        …
        result
      }
  56. input.format, output.format, format
      combine
      reduce.on.data.frame
      local, hadoop backends
      backend.parameters
      profiling
      verbose
  63. RHADOOP USER
      ONE FAT CLUSTER AVE.
      HYDROPOWER CITY, OR 0x0000
      RHADOOP@REVOLUTIONANALYTICS.COM
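
The sampling job in the list above (a `map` that keeps a record only when `hash(k) %% 10 == 0`, and a `reduce` that emits `length(vv)` per key) can be sketched in plain Python. This is an illustration under assumptions: the slides do not say which hash function is meant, so `stable_hash` (CRC-32 of the key's text) is a hypothetical choice, made because Python's built-in `hash()` of strings changes between runs.

```python
import zlib
from collections import defaultdict


def stable_hash(key):
    """Hypothetical deterministic hash: CRC-32 of the key's string form."""
    return zlib.crc32(str(key).encode("utf-8"))


def sample_and_count(records, buckets=10):
    """Keep records whose key hashes to bucket 0, then count records per kept key."""
    # Map: emit (k, v) only when stable_hash(k) % buckets == 0.
    mapped = [(k, v) for k, v in records if stable_hash(k) % buckets == 0]
    # Shuffle: group the surviving values by key.
    groups = defaultdict(list)
    for k, v in mapped:
        groups[k].append(v)
    # Reduce: keyval(k, length(vv)) becomes (k, len(vv)).
    return {k: len(vv) for k, vv in groups.items()}


records = [("user%d" % (i % 37), i) for i in range(1000)]
print(sample_and_count(records))
```

Because the filter looks only at the key, a key is either sampled together with all of its records or dropped entirely, so the counts for the keys that survive are exact rather than estimates.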
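
The two chained jobs that replace the Hive query above (the first keys each row by user id and emits each of that user's distinct genders once, the second counts per gender) can be traced on toy data with an in-memory Python sketch. The row format (dicts with `userid` and `gender` fields) and the helper `run_job` are assumptions made for illustration; `run_job` is not an rmr function.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn):
    """Hypothetical in-memory job: map_fn and reduce_fn each return a list of (key, value) pairs."""
    groups = defaultdict(list)
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    out = []
    for k, vv in groups.items():
        out.extend(reduce_fn(k, vv))
    return out


def distinct_users_per_gender(pv_users):
    """Equivalent of: SELECT gender, count(DISTINCT userid) FROM pv_users GROUP BY gender."""
    # Job 1: key rows by userid; each user emits each of its genders once (like unique()).
    per_user = run_job(
        [(None, row) for row in pv_users],
        lambda k, row: [(row["userid"], row["gender"])],
        lambda uid, genders: [(None, g) for g in dict.fromkeys(genders)],
    )
    # Job 2: key by gender and add up one count per (user, gender) pair.
    totals = run_job(
        per_user,
        lambda k, gender: [(gender, 1)],
        lambda gender, counts: [(gender, sum(counts))],
    )
    return dict(totals)


rows = [
    {"userid": 1, "gender": "F"},
    {"userid": 1, "gender": "F"},
    {"userid": 2, "gender": "M"},
    {"userid": 3, "gender": "F"},
]
print(distinct_users_per_gender(rows))  # {'F': 2, 'M': 1}
```

User 1 appears twice but is counted once, because the de-duplication happens in the first job's reduce step, before the second job ever sees the data.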
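
The rmr k-means listing above runs one job per iteration: the map step keys each point by its nearest current center (or by a random cluster on the first pass), and the reduce step replaces each group by its column means. Here is the same structure as a small in-memory Python sketch, assuming points are tuples of floats and using Euclidean distance (`math.dist`, Python 3.8+) in place of the listing's `distfun`. Clusters are keyed by center index rather than by the center's coordinates, which groups the points the same way as long as no two centers coincide.

```python
import math
import random
from collections import defaultdict


def kmeans_iter(points, centers=None, ncenters=None, rng=random):
    """One k-means pass written as map (assign) + reduce (average)."""
    groups = defaultdict(list)
    for p in points:
        if centers is None:
            # First pass: random cluster id, like sample(1:ncenters, 1) in the listing.
            key = rng.randrange(ncenters)
        else:
            # Later passes: index of the nearest current center.
            key = min(range(len(centers)), key=lambda i: math.dist(centers[i], p))
        groups[key].append(p)
    # Reduce: each new center is the coordinate-wise mean of its group.
    # As in the listing, a cluster that receives no points simply disappears.
    return [tuple(sum(col) / len(vv) for col in zip(*vv)) for _, vv in sorted(groups.items())]


def kmeans(points, ncenters, iterations=10, rng=random):
    centers = kmeans_iter(points, ncenters=ncenters, rng=rng)
    for _ in range(iterations):
        centers = kmeans_iter(points, centers=centers)
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_iter(points, centers=[(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```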
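
The `equijoin` signature above takes two inputs and an `outer` option. A reduce-side join, the usual way such a join is done in mapreduce, can be sketched in Python as follows. This is a generic illustration, not rmr's implementation: the function name is reused only for readability, the inputs are in-memory lists of (key, value) pairs, and the accepted `outer` values (None, "left", "right", "full") are this sketch's own convention.

```python
from collections import defaultdict


def equijoin(left, right, outer=None):
    """Reduce-side equijoin of two lists of (key, value) pairs.

    outer: None (inner join), "left", "right" or "full" (this sketch's convention).
    Returns (key, left_value, right_value) triples; a missing side is None.
    """
    # Map + shuffle: group values by key, keeping left and right apart
    # (the effect of tagging each record with the input it came from).
    groups = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, v in right:
        groups[k][1].append(v)
    # Reduce: per key, emit every pairing of a left value with a right value.
    out = []
    for k, (lvals, rvals) in groups.items():
        if not lvals and outer in ("right", "full"):
            lvals = [None]
        if not rvals and outer in ("left", "full"):
            rvals = [None]
        out.extend((k, lv, rv) for lv in lvals for rv in rvals)
    return out


users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
print(equijoin(users, orders))  # [(1, 'ann', 'book'), (1, 'ann', 'pen')]
print(equijoin(users, orders, outer="full"))
```

Keeping all values for a key in memory during the reduce step is the weak point of this approach: a key with very many records on both sides produces a large cross product.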
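
Among the options listed near the end of the slides is `combine`. In Hadoop generally, a combiner applies reduce-style aggregation to each mapper's output before the shuffle, which is only safe when the reduction is associative and commutative (a sum, for example). The Python sketch below shows the effect on a word count; the split of the input into per-mapper chunks and the function names are hypothetical, and it does not model how rmr wires the option.

```python
from collections import defaultdict


def group(pairs):
    """Shuffle step: collect the values of (key, value) pairs by key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def word_count(chunks, use_combiner):
    """Count words over input chunks, one chunk per (hypothetical) mapper.

    Returns (counts, shuffled): the final counts and the number of
    (key, value) pairs that crossed from the map side to the reduce side.
    """
    shuffled_pairs = []
    for chunk in chunks:
        mapped = [(word, 1) for word in chunk.split()]
        if use_combiner:
            # Combiner: pre-sum each mapper's own output before the shuffle.
            mapped = [(word, sum(ones)) for word, ones in group(mapped).items()]
        shuffled_pairs.extend(mapped)
    # Reduce: the same sum, applied to whatever the map side sent.
    counts = {word: sum(vals) for word, vals in group(shuffled_pairs).items()}
    return counts, len(shuffled_pairs)


chunks = ["a b a", "b b c"]
print(word_count(chunks, use_combiner=False))  # ({'a': 2, 'b': 3, 'c': 1}, 6)
print(word_count(chunks, use_combiner=True))   # ({'a': 2, 'b': 3, 'c': 1}, 4)
```

The counts agree and only the shuffle volume changes. Replacing the sum with a non-associative reduction, such as averaging the received values, would make the two runs disagree, which is why a combiner has to be chosen with care.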
