SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Bob Sutor – VP, Open Systems and Linux
5 November, 2010




Data, Problems, and Languages
ApacheCon 2010




                                         © 2009 IBM Corporation
Abstract



    Much research work over the next decade will be driven by those seeking
     to solve complex problems employing the cloud, multicore processors,
           distributed data, business analytics, and mobile computing.
    In this talk I’ll discuss some past approaches but also look at work being
       done in the labs on languages like X10 that extend the value of Java
     through parallelism, technologies that drive cross-stack interoperability,
          and approaches to handling and analyzing both structured and
                                  unstructured data.


                        The content of this presentation is my own and does not necessarily
                            represent my employer’s positions, strategies or opinions.




2     5 November 2010    ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages           © 2010 IBM Corporation
Rules of conversation


! For any programming language I mention that does something, there will
  be 10 that do something similar that I won’t mention, including your
  favorites.
! For any feature in any programming language, someone else has done it
  before, probably.
! No syntax wars today, nor arguments about weakly vs. strongly typed
  languages, braces, indentation, or tabs.
! IDEs and tools like debuggers are critical to the success of languages and
  thus to solving problems, but I don’t address them here. (Someone has
  probably done something in Emacs or Eclipse, anyway.)




3   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Some personal history


! Started programming in 1973 at the age of 15 using BASIC on a time-
  sharing system.
! Problem solved with this was education.
! Next moved on to APL, an interactive, non-compiled language optimized
  for manipulating arrays with an unusual and interesting notation going
  back to 1957.
! I “misused” this to create a text processor for
  our high school newspaper.
! That is, I used a language for processing
  data outside of the usual application areas,
  though I didn’t have a lot of choice.


                                                                                   Photo courtesy of Wikipedia

4   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages                         © 2010 IBM Corporation
IBM Research, 1980s-90s: Axiom / A# / Aldor


! Core research was with AXIOM, a computer algebra system.
! These systems manipulate equations and formulas symbolically, not just
  numerically.
! Example computations involve integration, expression simplification,
  differential equations, matrices, and so on.
! Much of the research around them is in algorithms and the software
  needed to implement them efficiently.
! The better known systems today include Maple and Mathematica though
  some open source systems like Sage are available.




5   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Categories and parametric polymorphism


! Category- and                                        -- A function returning an integer.!
                                                       factorial(n: Integer): Integer == {!
  domain-producing                                         if n = 0 then 1 else n*factorial(n-1)!
  functions use the                                    }!
                                                       !
  same language as                                     -- Functions returning a category and a domain.!
  first-order                                          Module(R: Ring): Category == Ring with {!
  functions.                                               *: (R, %) -> %!
                                                       }!
                                                       !
                                                       Complex(R: Ring): Module(R) with {!
                                                           complex: (%, %) -> R;!
                                                           real: % -> R;!
                                                           imag: % -> R;!
                                                           conjugate: % -> %; ...!
                                                       } == add {!
                                                           Rep == Record(real: R, imag: R);!
                                                           real(z:%): R == rep(z).real;!
                                                           (w: %) + (z: %): % == ...!
                                                       }!


6   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages           © 2010 IBM Corporation
Conditional types


! Type producing                                   UnivariatePolynomial(R: Ring): Module(R) with {!
                                                       coeff: (%, Integer) -> R;!
  expressions may be                                   monomial: (R, Integer) -> %;!
  conditional.                                         if R has Field then EuclideanDomain;!
                                                       ...!
                                                   } == add {!
                                                       ...!
                                                   }!




!  http://www.aldor.org/
!  A major problem was how to make this strongly typed library usable
   in a weakly typed interface that allowed easy calculations.
!  http://axiom-developer.org/axiom-website/bookvol0.pdf



7   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages         © 2010 IBM Corporation
Why special programming language work for math?


! By looking at math you have
    – Rich relationships among non-trivial concepts, with complex interfaces
    – A well-defined domain
    – Many programming language problems that had early use here: algebraic
      expressions, arrays, big integers, garbage collection, pattern matching,
      parametric polymorphism, ...
    – Large libraries requiring efficient code
! Simple programming interfaces quickly become insufficient.
! In the case of Axiom, the library was built on ideas borrowed from
  mathematical category theory, and so things like AbelianMonoid, Ring,
  Field, and Module(R) become part of the implementation.




8   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Problem areas driving programming evolution today


! Big Data, including streams
! Multicore and concurrent processing
! Cloud computing
! Simplification
! Cost and efficiency
! All the above, at the same time




9   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Data is getting big ! and wide, and tall, and deep



                                                                                        �
                                                        �                                   �
                                                            �




                                             �                                                  �




                                                                                    �
10   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages                   © 2010 IBM Corporation
Elements of a programming model


! Programming language
     – Safety, portability, productivity
     – Concurrency, synchronization, memory model

! Tools
     – Compiler, debugger, test and performance, IDE
     – Parallelization, multi-threaded analysis

! Runtime
     – Performance, security, reliability, monitoring, management
     – Scalability, OS interaction, races, deadlocks
! Ecosystem
     – Skills, support, documentation, libraries, ISVs
     – Parallel algorithms


11   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Elements of a parallel programming model


! Programming language
     – Safety, portability, productivity
     – Concurrency, synchronization, memory model

! Tools
     – Compiler, debugger, test and performance, IDE
     – Parallelization, multi-threaded analysis

! Runtime
     – Performance, security, reliability, monitoring, management
     – Scalability, OS interaction, races, deadlocks
! Ecosystem
     – Skills, support, documentation, libraries, ISVs
     – Parallel algorithms


12   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Emerging parallel programming models




13   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
System S and streaming computing

                            Data processed as it arrives,
                            not as batches or transactions


                            NYSE                                                                               Transforms
Millions of Events




                                                                                                                                       Dynamic P/E
                                                                                                 VWAP                                     Ratio
                                                                                               Calculation
                                                                                                                                       Calculation
                                                                                                                                                     Annotate
                                                             10 Q
                                                           Earnings
                                                           Extraction
                                                                                                    Correlate                                            Join P/E
                         SEC Edgar                                                                                        Earnings
                                                                                                                           Moving
                                                                                                                                                           with
                                                                                                                                                        Aggregate
                                                                                                                          Average                         Impact
                                                   Caption                                                               Calculation
                                                    Caption
                                                  Extraction
                                                     Caption
                                                   Extraction                             Earnings
                                                    Extraction            Topic            Earnings
                                                                                          Related            Earnings
                                                                            Topic           Earnings
                                                                                           Related
                                                                        Filtration         News               News
                                                                             Topic
                                                                         Filtration          Related
                                                                                            News
                                                                                          Analysis                                                                        Trade Decision
                                                   Speech                                                      Join
                        Video News                  Speech
                                                 Recognition
                                                                           Filtration         News
                                                                                           Analysis
                                                                                            Analysis
                                                     Speech
                                                  Recognition
                         Video News                Recognition

                                                                                 Hurricane
                                                           Hurricane             Forecast
                                                                                  Hurricane
                                                            Weather               Model 1
                                                                                   Forecast                  Hurricane                      Hurricane
                                                                                     Hurricane                 Risk       Hurricane          Industry
                                                             Data                  Model 2
                                                                                      Forecast                             Impact
                                                                                        Hurricane            Encoder                          Impact
                                                           Extraction
                       Weather Data                                                   Model !
                                                                                        Forecast
                                                                                         Model N
                                                           Filter                                               Classify

                                                                                        millisecond latency
                                                       http://www-01.ibm.com/software/sw-library/en_US/detail/R924335M43279V91.html
       14            5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages                                                                   © 2010 IBM Corporation
Example of SPL

 stream<string8 buyer, string8 item, decimal64 price> Bid = FileSource() { !
! !param !fileName!: "BidSource.dat”;!
!}!
!stream<string8 seller, string8 item, decimal64 price> Quote = FileSource() { !
! !param !fileName!: "SaleSource.dat”;!
!}!
!stream<string8 buyer, string8 seller, string8 item> Sale = Join(Bid; Quote) {!
! !window !Bid     !: sliding, time(30);!
!!        !Quote   !: sliding, count(50);!
! !param !match    !: Bid.item == Quote.item && Bid.price >= Quote.price;!
! !output !Sale    !: item = Bid.item;!
!}!




                                                              Bid
                                   FileSource
                                                                                            Sale
                                                                                     Join
                                   FileSource
                                                              Quote

 15   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages                                 15
                                                                                                   © 2010 IBM Corporation
Apache technologies for Big Data include !


! Hadoop MapReduce
     – “Hadoop MapReduce is a programming model and software framework for
       writing applications that rapidly process vast amounts of data in parallel on
       large clusters of compute nodes.”
     – http://hadoop.apache.org/mapreduce/

! Pig
     – “Ease of programming. It is trivial to achieve parallel execution of simple,
       ‘embarrassingly parallel’ data analysis tasks. Complex tasks comprised of
       multiple interrelated data transformations are explicitly encoded as data flow
       sequences, making them easy to write, understand, and maintain.”
     – “Pig's infrastructure layer consists of a compiler that produces sequences of
       Map-Reduce programs, for which large-scale parallel implementations already
       exist”
     – http://pig.apache.org/


16   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
X10


! Productivity goal
     – More convenient than Java: type inference, closures, syntactic slickness
     – More accurate than Java: constrained types, better generics

! Performance goal
     – Concurrency is in the language
     – Detailed model of multicore, accelerators, etc.
     – For sequential programming, structs and fewer required runtime checks

! Target areas of application are scientific computing and business
  analytics
! Two back ends
     – Java
     – C++: faster and required for many multi-core processors


17   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
X10 and concurrency


! Concurrency is hard
! X10 does not try to hide concurrency or the multiprocessor, or pretend
  that concurrency is just a minor extension.
! From the developers:
                                       “If you are not used to concurrency,
                                       X10 looks horrible and complicated.
                                          If you are used to concurrency,
                                               X10 looks pretty slick.”
! http://x10.codehaus.org/




18   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Asynchronous Partitioned Global Address Space




19   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Simplification


! At this point the discussion often jumps right to scripting and dynamic
  languages.
! Simplicity is relative to other approaches in the same application domain.
! Simple on client frontend isn’t the same as simple on the server backend.
! We need more information like “it’s probably best not to try to solve that
  kind of problem in this language.”
! Things to avoid when simplifying:
     – Lack of interoperability
     – Bad scalability
     – Universal but ugly client-side user interfaces
     – Write-only syntax



20   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Scripting


! “Scripting is like fashion, it varies by the
  season and personal taste.”
! Scripting languages like JavaScript, PHP,
  Python, and Ruby are not going away anytime
  soon, in that order (?).
! Many of the people to whom I’ve spoken
  predict great things for the future of JavaScript,
  beyond its near ubiquity and despite its “bad
  parts.”
! Google’s V8 JavaScript engine should lead to
  the language getting embedded in more
  programming languages.
! Would WoW use JavaScript today if Blizzard
  started over?
21   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Pay by the bite (byte?)

Sample cloud computing costs
Instance                                                                            Linux Usage
Small                                                                               $0.085 per hour
Large                                                                               $0.34 per hour


Future computing costs?
Instance                                                                            Linux Usage
Object creation                                                                     $? per instance
Loop (small)                                                                        $? per instance
List reduction                                                                      $? per instance
Hash table lookup                                                                   $? per instance



22   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages                     © 2010 IBM Corporation
Customers will want to measure other things


! Customers will want to pay by the task accomplished, not by the
  programming features or operations used.
! So programming languages and execution environments, both local and
  distributed, will get measured by how efficiently they use processor,
  memory, and disk resources as well as how quickly and correctly they
  produce their results.
! Time-to-execution will also be critical: a programming language that
  allows me to code something in one line in one minute might be favored
  over the one that requires ten lines done in one hour.
! Languages that are unnecessarily resource hogs will get marginalized.
! How do you answer: how much energy is used in computing that result?



23   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Conclusions and summary


! Add features to solve problems, not just add features.
! Programming Language Design: A Problem Solving Approach
! One language does not fit all and never will.
! People are terrified of being stuck with huge libraries of source code
  written in now abandoned languages.
! Java is not going away and will remain critical for enterprises for many
  decades to come.
! “It’s the JVM, stupid.”
! Programming languages will be divided between languages that compile
  down to Java bytecodes and those that don’t.
! We’ll see a lot more languages in the future.


24   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation
Thanks to !


! Colleagues who have shared their time and wisdom with me on these
  topics.
! They include John Duimovich, Sam Ruby, Brent Hailpern, David Boloker,
  Bob Blainey, Stephen Watt, Vijay Saraswat, David Ungar, Tessa Lau,
  Rodric Rabbah, John Field, Martin Hirzel, and members of the IBM
  Research staff.
! I thank them for their conversations and sharing material with me, much of
  which I have liberally borrowed.




25   5 November 2010   ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages   © 2010 IBM Corporation

Weitere ähnliche Inhalte

Ähnlich wie ApacheCon 2010 Keynote: Problems, Data, and Languages

SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
Multi Lingual Websites In Umbraco
Multi Lingual Websites In UmbracoMulti Lingual Websites In Umbraco
Multi Lingual Websites In UmbracoPaul Marden
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studioDerek Kane
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R StudioRupak Roy
 
Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementationBilal Maqbool ツ
 
R Programming Language
R Programming LanguageR Programming Language
R Programming LanguageNareshKarela1
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsDaniel Koller
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptxranapoonam1
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxrajalakshmi5921
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationAlvaro Gil
 
SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!melbats
 
RSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroRSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroYosuke Matsusaka
 
Javascript, DOM, browsers and frameworks basics
Javascript, DOM, browsers and frameworks basicsJavascript, DOM, browsers and frameworks basics
Javascript, DOM, browsers and frameworks basicsNet7
 
Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Gilbert Paquette
 

Ähnlich wie ApacheCon 2010 Keynote: Problems, Data, and Languages (20)

SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Multi Lingual Websites In Umbraco
Multi Lingual Websites In UmbracoMulti Lingual Websites In Umbraco
Multi Lingual Websites In Umbraco
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
 
R programming Language
R programming LanguageR programming Language
R programming Language
 
Getting started with R
Getting started with RGetting started with R
Getting started with R
 
Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementation
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny Apps
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptx
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
 
SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!
 
RSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroRSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI Intro
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
Javascript, DOM, browsers and frameworks basics
Javascript, DOM, browsers and frameworks basicsJavascript, DOM, browsers and frameworks basics
Javascript, DOM, browsers and frameworks basics
 
Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...
 
Java
JavaJava
Java
 

Mehr von Robert Sutor

Considering New Data Sources
Considering New Data SourcesConsidering New Data Sources
Considering New Data SourcesRobert Sutor
 
5 mistakes to avoid when creating a mobile app
5 mistakes to avoid when creating a mobile app5 mistakes to avoid when creating a mobile app
5 mistakes to avoid when creating a mobile appRobert Sutor
 
For the Love of Big Data
For the Love of Big DataFor the Love of Big Data
For the Love of Big DataRobert Sutor
 
IBM Mobile Strategy - Mobile World Congress 2012
IBM Mobile Strategy - Mobile World Congress 2012IBM Mobile Strategy - Mobile World Congress 2012
IBM Mobile Strategy - Mobile World Congress 2012Robert Sutor
 
Open Source Governance for your Organization
Open Source Governance for your OrganizationOpen Source Governance for your Organization
Open Source Governance for your OrganizationRobert Sutor
 
Landmines for Open Source in the Mobile Space
Landmines for Open Source in the Mobile SpaceLandmines for Open Source in the Mobile Space
Landmines for Open Source in the Mobile SpaceRobert Sutor
 
Regarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRegarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRobert Sutor
 
Linux Everywhere? Matching the Workload to the Computer
Linux Everywhere? Matching the Workload to the ComputerLinux Everywhere? Matching the Workload to the Computer
Linux Everywhere? Matching the Workload to the ComputerRobert Sutor
 
Linux, Virtualisation, and Clouds
Linux, Virtualisation, and CloudsLinux, Virtualisation, and Clouds
Linux, Virtualisation, and CloudsRobert Sutor
 
The Intersection of Ideas in Open Source and Open Standards
The Intersection of Ideas in Open Source and Open StandardsThe Intersection of Ideas in Open Source and Open Standards
The Intersection of Ideas in Open Source and Open StandardsRobert Sutor
 
Smaller, Flatter, Smarter
Smaller, Flatter, SmarterSmaller, Flatter, Smarter
Smaller, Flatter, SmarterRobert Sutor
 

Mehr von Robert Sutor (11)

Considering New Data Sources
Considering New Data SourcesConsidering New Data Sources
Considering New Data Sources
 
5 mistakes to avoid when creating a mobile app
5 mistakes to avoid when creating a mobile app5 mistakes to avoid when creating a mobile app
5 mistakes to avoid when creating a mobile app
 
For the Love of Big Data
For the Love of Big DataFor the Love of Big Data
For the Love of Big Data
 
IBM Mobile Strategy - Mobile World Congress 2012
IBM Mobile Strategy - Mobile World Congress 2012IBM Mobile Strategy - Mobile World Congress 2012
IBM Mobile Strategy - Mobile World Congress 2012
 
Open Source Governance for your Organization
Open Source Governance for your OrganizationOpen Source Governance for your Organization
Open Source Governance for your Organization
 
Landmines for Open Source in the Mobile Space
Landmines for Open Source in the Mobile SpaceLandmines for Open Source in the Mobile Space
Landmines for Open Source in the Mobile Space
 
Regarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRegarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and Linux
 
Linux Everywhere? Matching the Workload to the Computer
Linux Everywhere? Matching the Workload to the ComputerLinux Everywhere? Matching the Workload to the Computer
Linux Everywhere? Matching the Workload to the Computer
 
Linux, Virtualisation, and Clouds
Linux, Virtualisation, and CloudsLinux, Virtualisation, and Clouds
Linux, Virtualisation, and Clouds
 
The Intersection of Ideas in Open Source and Open Standards
The Intersection of Ideas in Open Source and Open StandardsThe Intersection of Ideas in Open Source and Open Standards
The Intersection of Ideas in Open Source and Open Standards
 
Smaller, Flatter, Smarter
Smaller, Flatter, SmarterSmaller, Flatter, Smarter
Smaller, Flatter, Smarter
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

ApacheCon 2010 Keynote: Problems, Data, and Languages

  • 1. Bob Sutor – VP, Open Systems and Linux 5 November, 2010 Data, Problems, and Languages ApacheCon 2010 © 2009 IBM Corporation
  • 2. Abstract Much research work over the next decade will be driven by those seeking to solve complex problems employing the cloud, multicore processors, distributed data, business analytics, and mobile computing. In this talk I’ll discuss some past approaches but also look at work being done in the labs on languages like X10 that extend the value of Java through parallelism, technologies that drive cross-stack interoperability, and approaches to handling and analyzing both structured and unstructured data. The content of this presentation is my own and does not necessarily represent my employer’s positions, strategies or opinions. 2 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 3. Rules of conversation ! For any programming language I mention that does something, there will be 10 that do something similar that I won’t mention, including your favorites. ! For any feature in any programming language, someone else has done it before, probably. ! No syntax wars today, nor arguments about weakly vs. strongly typed languages, braces, indentation, or tabs. ! IDEs and tools like debuggers are critical to the success of languages and thus to solving problems, but I don’t address them here. (Someone has probably done something in Emacs or Eclipse, anyway.) 3 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 4. Some personal history ! Started programming in 1973 at the age of 15 using BASIC on a time- sharing system. ! Problem solved with this was education. ! Next moved on to APL, an interactive, non-compiled language optimized for manipulating arrays with an unusual and interesting notation going back to 1957. ! I “misused” this to create a text processor for our high school newspaper. ! That is, I used a language for processing data outside of the usual application areas, though I didn’t have a lot of choice. Photo courtesy of Wikipedia 4 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 5. IBM Research, 1980s-90s: Axiom / A# / Aldor ! Core research was with AXIOM, a computer algebra system. ! These systems manipulate equations and formulas symbolically, not just numerically. ! Example computations involve integration, expression simplification, differential equations, matrices, and so on. ! Much of the research around them is in algorithms and the software needed to implement them efficiently. ! The better known systems today include Maple and Mathematica though some open source systems like Sage are available. 5 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 6. Categories and parametric polymorphism ! Category- and -- A function returning an integer.! factorial(n: Integer): Integer == {! domain-producing if n = 0 then 1 else n*factorial(n-1)! functions use the }! ! same language as -- Functions returning a category and a domain.! first-order Module(R: Ring): Category == Ring with {! functions. *: (R, %) -> %! }! ! Complex(R: Ring): Module(R) with {! complex: (%, %) -> R;! real: % -> R;! imag: % -> R;! conjugate: % -> %; ...! } == add {! Rep == Record(real: R, imag: R);! real(z:%): R == rep(z).real;! (w: %) + (z: %): % == ...! }! 6 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 7. Conditional types ! Type producing UnivariatePolynomial(R: Ring): Module(R) with {! coeff: (%, Integer) -> R;! expressions may be monomial: (R, Integer) -> %;! conditional. if R has Field then EuclideanDomain;! ...! } == add {! ...! }! !  http://www.aldor.org/ !  A major problem was how to make this strongly typed library usable in a weakly typed interface that allowed easy calculations. !  http://axiom-developer.org/axiom-website/bookvol0.pdf 7 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 8. Why special programming language work for math? ! By looking at math you have – Rich relationships among non-trivial concepts, with complex interfaces – A well-defined domain – Many programming language problems that had early use here: algebraic expressions, arrays, big integers, garbage collection, pattern matching, parametric polymorphism, ... – Large libraries requiring efficient code ! Simple programming interfaces quickly become insufficient. ! In the case of Axiom, the library was built on ideas borrowed from mathematical category theory, and so things like AbelianMonoid, Ring, Field, and Module(R) become part of the implementation. 8 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 9. Problem areas driving programming evolution today ! Big Data, including streams ! Multicore and concurrent processing ! Cloud computing ! Simplification ! Cost and efficiency ! All the above, at the same time 9 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 10. Data is getting big ! and wide, and tall, and deep � � � � � � � 10 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 11. Elements of a programming model ! Programming language – Safety, portability, productivity – Concurrency, synchronization, memory model ! Tools – Compiler, debugger, test and performance, IDE – Parallelization, multi-threaded analysis ! Runtime – Performance, security, reliability, monitoring, management – Scalability, OS interaction, races, deadlocks ! Ecosystem – Skills, support, documentation, libraries, ISVs – Parallel algorithms 11 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 12. Elements of a parallel programming model ! Programming language – Safety, portability, productivity – Concurrency, synchronization, memory model ! Tools – Compiler, debugger, test and performance, IDE – Parallelization, multi-threaded analysis ! Runtime – Performance, security, reliability, monitoring, management – Scalability, OS interaction, races, deadlocks ! Ecosystem – Skills, support, documentation, libraries, ISVs – Parallel algorithms 12 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 13. Emerging parallel programming models 13 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 14. System S and streaming computing Data processed as it arrives, not as batches or transactions NYSE Transforms Millions of Events Dynamic P/E VWAP Ratio Calculation Calculation Annotate 10 Q Earnings Extraction Correlate Join P/E SEC Edgar Earnings Moving with Aggregate Average Impact Caption Calculation Caption Extraction Caption Extraction Earnings Extraction Topic Earnings Related Earnings Topic Earnings Related Filtration News News Topic Filtration Related News Analysis Trade Decision Speech Join Video News Speech Recognition Filtration News Analysis Analysis Speech Recognition Video News Recognition Hurricane Hurricane Forecast Hurricane Weather Model 1 Forecast Hurricane Hurricane Hurricane Risk Hurricane Industry Data Model 2 Forecast Impact Hurricane Encoder Impact Extraction Weather Data Model ! Forecast Model N Filter Classify millisecond latency http://www-01.ibm.com/software/sw-library/en_US/detail/R924335M43279V91.html 14 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 15. Example of SPL stream<string8 buyer, string8 item, decimal64 price> Bid = FileSource() { ! ! !param !fileName!: "BidSource.dat”;! !}! !stream<string8 seller, string8 item, decimal64 price> Quote = FileSource() { ! ! !param !fileName!: "SaleSource.dat”;! !}! !stream<string8 buyer, string8 seller, string8 item> Sale = Join(Bid; Quote) {! ! !window !Bid !: sliding, time(30);! !! !Quote !: sliding, count(50);! ! !param !match !: Bid.item == Quote.item && Bid.price >= Quote.price;! ! !output !Sale !: item = Bid.item;! !}! Bid FileSource Sale Join FileSource Quote 15 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages 15 © 2010 IBM Corporation
  • 16. Apache technologies for Big Data include ! ! Hadoop MapReduce – “Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.” – http://hadoop.apache.org/mapreduce/ ! Pig – “Ease of programming. It is trivial to achieve parallel execution of simple, ‘embarrassingly parallel’ data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.” – “Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist” – http://pig.apache.org/ 16 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 17. X10 ! Productivity goal – More convenient than Java: type inference, closures, syntactic slickness – More accurate than Java: constrained types, better generics ! Performance goal – Concurrency is in the language – Detailed model of multicore, accelerators, etc. – For sequential programming, structs and fewer required runtime checks ! Target areas of application are scientific computing and business analytics ! Two back ends – Java – C++: faster and required for many multi-core processors 17 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 18. X10 and concurrency ! Concurrency is hard ! X10 does not try to hide concurrency or the multiprocessor, or pretend that concurrency is just a minor extension. ! From the developers: “If you are not used to concurrency, X10 looks horrible and complicated. If you are used to concurrency, X10 looks pretty slick.” ! http://x10.codehaus.org/ 18 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 19. Asynchronous Partitioned Global Address Space 19 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 20. Simplification ! At this point the discussion often jumps right to scripting and dynamic languages. ! Simplicity is relative to other approaches in the same application domain. ! Simple on client frontend isn’t the same as simple on the server backend. ! We need more information like “it’s probably best not to try to solve that kind of problem in this language.” ! Things to avoid when simplifying: – Lack of interoperability – Bad scalability – Universal but ugly client-side user interfaces – Write-only syntax 20 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 21. Scripting ! “Scripting is like fashion, it varies by the season and personal taste.” ! Scripting languages like JavaScript, PHP, Python, and Ruby are not going away anytime soon, in that order (?). ! Many of the people to whom I’ve spoken predict great things for the future of JavaScript, beyond its near ubiquity and despite its “bad parts.” ! Google’s V8 JavaScript engine should lead to the language getting embedded in more programming languages. ! Would WoW use JavaScript today if Blizzard started over? 21 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 22. Pay by the bite (byte?) Sample cloud computing costs Instance Linux Usage Small $0.085 per hour Large $0.34 per hour Future computing costs? Instance Linux Usage Object creation $? per instance Loop (small) $? per instance List reduction $? per instance Hash table lookup $? per instance 22 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 23. Customers will want to measure other things ! Customers will want to pay by the task accomplished, not by the programming features or operations used. ! So programming languages and execution environments, both local and distributed, will get measured by how efficiently they use processor, memory, and disk resources as well as how quickly and correctly they produce their results. ! Time-to-execution will also be critical: a programming language that allows me to code something in one line in one minute might be favored over the one that requires ten lines done in one hour. ! Languages that are unnecessarily resource hogs will get marginalized. ! How do you answer: how much energy is used in computing that result? 23 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 24. Conclusions and summary ! Add features to solve problems, not just add features. ! Programming Language Design: A Problem Solving Approach ! One language does not fit all and never will. ! People are terrified of being stuck with huge libraries of source code written in now abandoned languages. ! Java is not going away and will remain critical for enterprises for many decades to come. ! “It’s the JVM, stupid.” ! Programming languages will be divided between languages that compile down to Java bytecodes and those that don’t. ! We’ll see a lot more languages in the future. 24 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation
  • 25. Thanks to ! ! Colleagues who have shared their time and wisdom with me on these topics. ! They include John Duimovich, Sam Ruby, Brent Hailpern, David Boloker, Bob Blainey, Stephen Watt, Vijay Saraswat, David Ungar, Tessa Lau, Rodric Rabbah, John Field, Martin Hirzel, and members of the IBM Research staff. ! I thank them for their conversations and sharing material with me, much of which I have liberally borrowed. 25 5 November 2010 ApacheCon 2010 - Bob Sutor - Data, Problems, and Languages © 2010 IBM Corporation