Functional Ideas for
a Cloudy Future
              Richard Minerich
              @Rickasaurus
              Senior Researcher
              at Bayard Rock
Functional Programming?




Properties of FP?
           (It Depends on Who You Ask)

- First Class Functions
- Currying, Composition, Combinators
- Low Level Abstraction, Metaprogramming
- Immutability, Fancy Types, Constraints
- Fast Tail Recursion, Scope Minimization
The Spectrum of Functional




[Venn diagram: Convenience (Make Life Easy Now) overlapping Constraints (Make Life Easy Later); the overlap is “FP”]
Referential Transparency
- It's all about scope!
- Mutation only infects as far as its scope
- Global variables can be ok, if your referential
   transparency scope is a process
- That scope can be a function, class, thread, process, or
   even a whole computer
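
A minimal Python sketch of the scoping idea above. The `running_totals` helper is hypothetical (it is not from the talk): it mutates local state freely, but because that mutation never escapes the function's scope, callers see a referentially transparent function.

```python
from typing import Sequence, Tuple


def running_totals(values: Sequence[float]) -> Tuple[float, ...]:
    """Return cumulative sums; imperative inside, pure from the outside."""
    totals = []              # local mutable state, invisible to callers
    acc = 0.0
    for v in values:
        acc += v             # mutation confined to this function's scope
        totals.append(acc)
    return tuple(totals)     # hand back an immutable result


data = (1.0, 2.0, 3.5)
first = running_totals(data)
second = running_totals(data)  # same input, same output, input untouched
```

Widening the scope (a module-level `totals` list, say) would let the mutation leak to every caller in the process, which is the trade-off the slide describes.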
What is Functional Programming?
- Complementary convenience and constraints
- A highly constrained set of approaches to
  programming
- Where you lose in order to gain
- Low level constraints that propagate upwards
  to the top level of your program
Program Scope over my Career
• Largest scope was usually a process with one
  thread
• Then a process with a few threads
• Then a process with many threads
• Then a few machines
• Now a ton of machines
We need to scale out
• Desktop apps are going away
• Hosted hardware is on the way out
• No one cares about little data
                   - But! -
• Old algorithms don’t generalize well
• New tradeoffs between speed and scope
• Too many costs to keep track of
Thinking about Resource Costs

Far Machines      Far Network
Machines          Network
Processes         Disk
Threads           Memory
Instructions      Cache
What is Cloud Computing?
- More than just a sneaky way to charge a ton
  for hosting

- Paradigms that simplify resource management
- You always lose in order to gain
- High level constraints that propagate
  downward into your subtasks
Papers Published Over Time
(Microsoft Academic Search April 2012)


[Chart: papers published over time for “Cloud Computing” and “Type System”]
Properties of Cloud Computing
- Resources (Network, Disk, Memory, Cache)
- What constraints can make this easier?
  - Force everything into one of a few styles of
    computation?
  - What if what we want to do is still possible but
    doesn't fit our cluster’s paradigm?
  - Where's the escape hatch?
Cloud Computing Methodologies
     (Warning: gross oversimplifications ahead!)

- MPI (Fixed Processes)
     (OpenMPI, Tempest)
- Agents (Dynamic Processes)
     (Erlang, Parallel Haskell, Akka)
- MapReduce       (More Like Collect-GroupBy-Fold)
     (Hadoop, Google)
- Others/Hybrids
     (Iterative Map-Reduce, Mesos/Spark)
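
To make the "Collect-GroupBy-Fold" reading of MapReduce concrete, here is a small single-process Python sketch. It is an illustration only, not the Hadoop API: the `collect_groupby_fold` function, its parameters, and the sample documents are all hypothetical, and a real cluster would distribute each phase across machines.

```python
from collections import defaultdict
from functools import reduce


def collect_groupby_fold(records, collect, fold, initial):
    """Run the three phases in-process on an iterable of records."""
    # Collect: every record may emit zero or more (key, value) pairs.
    pairs = (pair for record in records for pair in collect(record))
    # GroupBy: gather values under their key (the "shuffle" step).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Fold: reduce each key's values to one result, starting from `initial`.
    return {key: reduce(fold, values, initial) for key, values in groups.items()}


# Hypothetical documents as (doc_id, text) pairs.
docs = [
    (1, "spark runs on mesos"),
    (2, "hadoop runs mapreduce"),
    (3, "mesos runs hadoop"),
]

# Build an inverted index: word -> set of documents containing it.
index = collect_groupby_fold(
    docs,
    collect=lambda doc: [(word, doc[0]) for word in doc[1].split()],
    fold=lambda seen, doc_id: seen | {doc_id},
    initial=frozenset(),
)
```

The caller supplies only two small pure functions; ordering, grouping, and (on a cluster) distribution and fault tolerance belong to the framework. That is the constraint-for-convenience trade the talk keeps returning to.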
From: Flexible and Efficient Distributed Resolution of Large Entities
(Molnar, et al.)
This is Word Count.
Seriously.
63 Lines!
Hadoop
           (without losing your mind)




-   Pig/Hive if your problem is simple
-   Scoobi with Scala
-   Scalding (on Cascading) on Scala
-   F# + .NET API once Microsoft Ships
Scoobi
Scalding
From the Pangool Website: http://pangool.net/benchmark.html
Which to Pick?
MPI/Agents =
 Difficult to get right, Extremely Powerful
MapReduce =
 Limiting, Easier to use, Robust to failures
Middle Road =
 Iterative MapReduce, Mesos/Spark
Mesos: You don’t have to choose




Spark
This is Word Count.
Seriously!?
10 Lines.
That’s 53 Fewer
or ~16% of the Original
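
The code shown on this slide did not survive in the transcript. As a rough stand-in, the following plain-Python sketch has the same shape as a compact functional word count: split the lines, pair each word with a 1, combine by word, and add. It uses no Spark API, and the `word_count` name and sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    words = chain.from_iterable(line.split() for line in lines)  # split the lines
    pairs = ((word, 1) for word in words)                        # pair each word with 1
    counts = defaultdict(int)
    for word, n in pairs:                                        # combine by word and add
        counts[word] += n
    return dict(counts)


counts = word_count(["to be or not", "to be"])
```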
Cloud Computing is
       Functional Programming




- Can’t Escape Referential Transparency
- Simple Composition is Key to Small Programs
- Object Oriented: a Square Peg in a Round Hole
Thanks for Listening!
            Any Questions?
Visit my blog for ants and rants:
  RichardMinerich.com
Follow me on Twitter:
  @Rickasaurus
Come to NYC for the SkillsMatter F# Tutorials
  June 5th and 6th: is.gd/fsharptutorials

Editor's Notes

  1. We do research and development on anti-money laundering, and some of the largest banks in the world use our products every day.
  2. How many of you loved Legos when you were kids? Isn’t this what we really want programming to be like? A big pile of little parts: it’s obvious how they work, but we don’t want to have to make them all from scratch. I loved Legos as a child. They’re simple and intuitive, but you can compose them into amazing things. If you had to make the Legos yourself from smaller pieces, it would be tedious. But big blocks like Duplos severely constrain your imagination. Legos sit at just the right level of abstraction. Learning to program in BASIC was more of a challenge, though; it ended up being more like weaving and less like building with Legos. Tragically, I didn’t discover functional programming until I was almost 30.
  3. At one time I was an imperative guy who wrote image processing code in C++ and C#. Those were hard times, full of null exceptions and race conditions. It could take several months to make a product releasable with a sizable team. Then I went to a talk by Rich Hickey and he showed me just how awesome FP can be. After I learned functional programming in F#, my productivity skyrocketed (it more than quadrupled), bugs disappeared, and I was able to make much cooler stuff in a much shorter time. I even had more time because the old stuff required less maintenance. Now I spend all that newly found free time giving talks and arguing with object-oriented programmers on the internet.
  4. Here’s where it gets tricky. Just what is functional programming? Well, it depends on who you ask. Python users will tell you it’s something that comes in a module, while Haskell programmers will tell you that just about everyone else is faking it. Really, it’s more of a spectrum: as you get more and more functional you gain more and more benefits, but you also have to give up some things along the way.
  5. FP is the intersection of some set of convenience features and some set of constraints. The more convenience features you have, like nice tuple syntax or comprehensions, the easier it is to get stuff done fast. The more constraint features you have, like immutability and fancy types, the easier it is to revisit and refactor later. This isn’t purely true, though; for example, you may find you need to write fewer tests with fancy types. The way I see it, the more your conveniences share with your constraints, the more “functional” your language is. As you squish the two circles of this Venn diagram together, more and more of the features on either side complement each other. The constraint features keep you protected from the convenience features getting out of hand. The convenience features keep the constraints from slowing down your productivity. Think of Python as when there’s just a tiny bit of overlap and Haskell as when they’re almost completely overlapping. Everyone else lands somewhere in the middle: F# and Haskell closer to squished; C#, JavaScript and Ruby a bit further out.
  6. Now, I could spend days telling you all about functional programming, but there’s one idea in FP that I would say is the most important. That idea is referential transparency. All referential transparency means is that from here I can understand what all the stuff in scope does. It’s all deterministic: for a given input, you’ll always get the same output (unless there’s something like hardware failure). The most interesting thing about referential transparency is that it doesn’t need to hold for your entire program to hold for most of it. You can write that algorithm you know is fast in an imperative style, and if you wrap it intelligently it’s just as useful from the outside as if it were done with pure functional programming. But you do lose some confidence about the properties of that function.
  7. As a fuzzy definition: a complementary set of convenience features and constraints. For example, if you know how all the code underneath you works, it is much easier to ensure safety at a higher level. Constraints make it much easier to think about what your program is doing.
  8. We’re all being dragged along, some faster than others depending on the kind of work we do. In fact, some older programming languages like Python and OCaml were effectively crippled by decisions they made years ago, when it looked like we could scale on one CPU forever. In both languages the root of this problem is called “the giant lock”. I hope that as we move to even larger scopes those locks become somewhat irrelevant, as the cost of having many processes becomes dwarfed by other factors.
  9. Source: http://www.indybay.org/newsitems/2006/05/18/18240941.php
     This is the mandatory Moore’s law slide before talking about cloud computing. All aspects of computing will eventually end up looking like the graph on the right. It’s strange that computation peaked way before storage and memory, but that’s just how it ended up. Now we’re left with having to find interesting ways to deal with it.
  10. (Raise your hand if you’ve seen this slide before.) However, something like Moore’s Law still lives, for now. As you’ve probably heard, we’re still gaining ground on the power efficiency front. We can scale out instead of up.
  11. You really can’t escape it: tablets are just the beginning, and desktop computers as we know them are on the way out for most people. The unfortunate part of this whole deal is that we can’t apply most of our work directly in any of the scaled-out models. So we’re stuck in a world that boldly marches on, dragging us kicking and screaming into a much harder way of doing things. There are a lot of new things to consider in this brave new world beyond clock cycles.
  12. This is a complete conceptual hazard. It’s really hard to keep all of this in your head at once and still come up with solutions to interesting problems. Each of these things has different sub-properties to consider in different situations as well. For example, sometimes just network bandwidth matters, sometimes latency, sometimes both. To get things done in the global network, we’re going to need a way to reason about these things without having to keep them all in our heads at once.
  13. At first, I was convinced that cloud hosting in general was pretty much a big scam, but as the data has grown bigger I’ve seen the error of my ways.
      - Often you don’t need these computers for very long.
      - Maintaining the cluster
      - For many problems you can get near-linear scaling, so it’s pretty awesome to be able to fire up a ton of instances; in general this costs about the same as using fewer computers for longer.
      - They provide great frameworks and tools for thinking about these kinds of problems that reduce the need to worry about resources so much.
      - They allow you to think in more general terms (like O notation).
      - With each methodology you take on some communication constraints in order to make problems easier to think about.
      - While type systems are like a constraining floor that you can’t fall through, cloud computing methodologies in general are more like a ceiling that dictates how parts of your program are combined.
  14. In the past three years the number of papers published on algorithms in the cloud has skyrocketed.
      - Rich Hickey: working on Datomic
      - Simon Peyton Jones: working on Parallel Haskell
      Why? Because it’s hugely useful for solving hard problems and computers just aren’t getting much faster. And there’s the fact that the amount of data lying around is skyrocketing as our storage capabilities continue to increase. What are we going to do with all that data?
  15. - MapReduce: you can only do two things, and in this order
      - MPI/Agents: more about how you communicate
      - Generalizations of MapReduce
      So what do we do? We force you into one of several choices of computing methodology. Each has different constraints. With multi-paradigm problems you can often fake it for smaller data sets, but as they grow it becomes more and more important to be flexible. Finally, unlike functional programming, there is no magic escape hatch. Calling a C library is no longer the answer to all of life’s performance problems.
  16. Whirlwind tour! When you first get the cloud computing bug it can all be a bit overwhelming. There’s just a ton of frameworks, each sporting very nice benchmarks for the things they choose to benchmark on. Some are mature and others are small projects. They all have limitations and cohorts of enthusiastic followers. You may be familiar with some of these, but we’re just going to focus on a few.
      - MPI: centrally controlled
      - Agents: can launch each other
      - MapReduce: very constraining, but people always break the rules
      - Iterative Map-Reduce: academic/unpolished
      - Spark: I see it as the future of cloud computing; constraining, but not so much that you must constantly break the rules
  17. This research was done with 14 off-the-shelf computers put together by college students in Hungary. It’s one of the first examples of entity resolution that can handle the data that exists right now in the real world. In my business, large-scale entity resolution with any kind of guarantees seemed like a pipe dream. This paper changed my whole perspective. Sure, we’re measuring time in hours, but if your task is already taking hours on one computer, what does it matter?
  18. Word Count in MapReduce with Java, pretty much the “Hello World” of the MapReduce paradigm. This hurts to even look at. Just when I thought I had escaped the tedium of object-oriented programming, here I was trying to use a paradigm that fits functional programming like a glove, and yet I was reduced to writing pages of code to do a simple word count. To make matters even worse, I had lost my beautiful Visual Studio tooling. The friction was just unbelievable. The thought of how many lines of code a real entity resolution system would be made me a bit queasy, to say the least. Note the iteration over elements. Is this really necessary when we can have higher-level abstractions?!
  19. - Map -> Choose (Open Map): one to many
      - Partition -> Sort and group by key
      - Reduce -> Constrained Reduce: many to many (or fewer)
      Now, don’t get me wrong. There are reasons why many of the largest software companies (including IBM and Microsoft) are embracing Hadoop. It’s big, it schedules well, it’s pretty darn fast and, most importantly, it’s mature.
  20. Very simple and clean: you say what you want, not how you want it done.
  21. Very similar to Scoobi, although based on Cascading. For some reason the Scalding folks love to use a lot of type annotations.
  22. Both of these programs are doing almost exactly what that Java code before was, except they are composing little functional subprograms instead of trying to do it all by hand. Actually, Scalding is doing a bit more work in this code, as it lowercases and removes punctuation. Otherwise they’re quite similar and fairly beautiful to look at. Against the pages of Java it took to do Word Count before, this is a godsend. I do think the Scoobi version looks a bit nicer because it’s a bit less verbose. That’s a matter of style and comfort with the language and less about the frameworks, though.
  23. There are just a ton of Hadoop toolkits, but let’s focus on the blue and green ones. Seeing as how Scalding is built on top of Cascading, it seems like a poor choice for a small company with limited resources. Here we run into a bit of a problem, though. On one hand we have Scalding, a big project by the folks at Twitter. On the other we have Scoobi, made by OpenNICTA, which is a small institution with a bent toward scientific computing.
      - Pangool is a slightly less horrible API for Java
      - Scrunch for Crunch
  24. But MapReduce is just one of many choices for cloud computing paradigms. When you go solve a difficult problem with the cloud, your choice should depend on a ton of factors. Can you accomplish what you’re looking to do? What technologies are you comfortable with? Are you comfortable using research software? Most importantly, can I get this done without talking to anyone from IT?
  25. For a smaller company with limited resources like mine, Mesos is quite significant. It allows you to build one cluster and perform many different styles of computation, all sharing the same scheduler. Notice that Spark writes a lot like Scoobi, but without all of the ceremony. It also loosens the straps of the standard MapReduce straitjacket a bit by allowing you to keep things in memory between iterations.
  26. For a lot of difficult problems Spark is hugely better, but Hadoop has better tooling and a lot of people using it. With Mesos you get the best of both worlds. It’s a recent discovery for me, but I’m already a huge fan.
  27. Just to come back to this for a minute, look at this code and imagine it was your future. The thought of this for myself almost brought me to tears. While the giants of tech like Microsoft and IBM have been asleep at the wheel, functional programmers have been busy solving the hard problems. There’s absolutely no reason to go back to this kind of nightmare; you’d have to be certifiably insane.
  28. Now imagine this: you split the lines, add a number to each word, combine them up and then add them. It reads almost like English.
  29. Not just the ideas from functional programming. So, just as predicted years ago, functional programming has come to dominate at least one aspect of modern computing. Even when using Java you are forced to write small programs that are referentially transparent and operate in parallel. But with functional programming you can have small, composable, referentially transparent parts with hidden implementations you don’t have to care about.
  30. We’ll have some of the Cloud Numerics guys there giving a tutorial on how to do linear algebra in the cloud.