BigData and Modern XML (XML Amsterdam 2012 Keynote)
Jim Fuller
email: jim.fuller@marklogic.com twitter: @xquery
Senior Engineer, Europe
19/09/12
http://jim.fuller.name
http://exslt.org
@xquery
XSLT UK 2001
http://www.xmlprague.cz
@perl6
Perlmonks Pilgrim
Kickoff
• XML current status
• Modern XML & BigData
‘ontogeny recapitulates phylogeny’
                   or
      A (Very) Brief History of ML
• Late 1950s: Noam Chomsky ‘generative
  grammars’
• 1969: Charles Goldfarb (w/ Ed Mosher and
  Ray Lorie) created GML
• 1986: SGML formalized
• 1998: XML 1.0 W3C recommendation
• 1998 – 2012: A lot of stuff happened
• Future: XML 2.0 … microXML ?
RDBMS Goliath vs XML David
• Back then, XML was the proto ‘nosql’
• X in AJAX

• Now many ‘davids’
• AJAJ
Documents
• Back then, it wasn’t unusual for a vendor to say
  ‘tough luck’ with your data (pay up)
• Now, most office documents are in XML
The ‘long tail’ of XML Vocabularies

 • Back then, vocabularies built with
   proprietary approaches

 • Today, thousands of vocabularies are based on
   XML
   – ‘2012 U.S. GAAP Taxonomy Adopted by SEC;
     FASB Webcast April 3’
Anyone heard of Shipdex?
Back then, XML/Markup Conferences
•   Software Development 99 East, November 8-13, 1999, Washington D.C.
•    XML One Fall 99, November 8-11, 1999, Santa Clara, CA
•    XML '99 December 6-9, 1999, Philadelphia PA
•    Markup Technologies '99 Conference December 5-9, 1999, Philadelphia
•    Web Design 2000, February 7-9, 2000, Atlanta
•    XTech '2000, February 27-March 2, San Jose
•    Software Development 2000 West, March 20-24, 2000, San Jose
•     Sixteenth International Unicode Conference, Boston, March 27-30, 2000
•    The Ninth International World Wide Web Conference, May 15-19, 2000,
    Amsterdam
•    DL 2000: Fifth ACM Conference on Digital Libraries, June 3-6 2000, Texas
•    XML Europe 2000, June 12-16, Paris
•    Web Design World 2000, July 17-21, 2000, Seattle, Washington
•    MetaStructures, August 14-16, 2000, Montreal, Quebec, Canada
•    XML Developers Conference, August 17-18, 2000, Montreal, Quebec
•    Internet World Expo, October 25-27, 2000, New York City
•    XML 2000/Markup Technologies 2000, December 3-7, Washington
•    … even a Geek Cruises XML Excursion, January 2001
Today - XML/Markup Conferences
• The XML ‘parallelogram’
    –   Balisage
    –   XML Summer School
    –   XML Prague
    –   XML Amsterdam
•   Xtech*
•   markupForum
•   XATA
•   MarkLogic World (600 people)
•   databaseX (London, November 2013?)
Other important good stuff
• Evolution of the Operating System
  – Unix is the operating system for text
  – Windows tried to be the operating system for
    binaries, then adopted XML … a mixed bag
  – Java (VM) has a strong XML stack
• The web moved everything to text-based
  markup.
• Cheap RAM/disk/CPU
• Virtualization = scale out
Other important good stuff
http://googleblog.blogspot.cz/2012/02/unicode-over-60-percent-of-web.html
Unfair to point out failures?
•   Namespaces
•   XLink
•   WS-* ‘architecture astronautics’
•   Draconian error checking
•   XML Schema
•   XForms
•   XSLT 1.0 (or any XML) in the browser
•   XHTML vs HTML5
•   Too many specs (modularity good, complexity
    bad)
“Winning isn’t everything. There
should be no conceit in victory and no
   despair in defeat.” - Matt Busby

• In 2001, I was the RDBMS serial killer
  – ‘kill RDBMS’
• How do we define success?
  –   Adoption?
  –   Cheaper?
  –   Faster?
  –   Better?
Drill-down distraction: why is XQuery
        successful (productive)?
• I chose my most successful technology (ad hoc stories,
  visible success)
• Functional, dynamic … works with structure,
  text and values … stored procedures + query language
• XPath^
• Is it possible to qualify/quantify XQuery
  productivity?
Programming Language Productivity
Data compiled from studies by Prechelt and Garret of a particular string
              processing problem - public domain 2006.
* 28msec – 2011
  http://www.28msec.com/html/home

           Java       XQuery
SimpleDB   2905       572
S3         8589       1469
SNS        2309       455
Total      13803      2496
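The totals row is simple addition; the slide's point is the ratio between the two columns. A minimal Python sketch, assuming the figures are lines of code (as the following slides on LOC imply), that recomputes the totals and the Java-to-XQuery ratio for each row:

```python
# Figures copied from the 28msec (2011) slide, assumed to be lines of code:
# (Java, XQuery) per service.
LOC = {
    "SimpleDB": (2905, 572),
    "S3": (8589, 1469),
    "SNS": (2309, 455),
}


def totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum each column: (total Java LOC, total XQuery LOC)."""
    java = sum(j for j, _ in table.values())
    xquery = sum(x for _, x in table.values())
    return java, xquery


def ratios(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Java LOC divided by XQuery LOC for each row, rounded to one decimal."""
    return {name: round(j / x, 1) for name, (j, x) in table.items()}


if __name__ == "__main__":
    java_total, xquery_total = totals(LOC)
    print(java_total, xquery_total)             # 13803 2496
    print(round(java_total / xquery_total, 2))  # 5.53
    print(ratios(LOC))                          # {'SimpleDB': 5.1, 'S3': 5.8, 'SNS': 5.1}
```

A ratio of roughly 5:1 says nothing by itself about quality or effort; the LOC slides that follow explain why.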
Developing an Enterprise Web Application in XQuery - 2009, Martin Kaufmann, Donald Kossmann

             Java/J2EE   XQuery
Model        3100        240
View         4100        1500
Controller   900         1180
Total        8100 (?)    2920 (3490)
Nooooo! The problem with LOC

The correlation of failure with very high LOC
is the only certain fact about LOC.

That’s about it.
An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl
for a search/string-processing program
Lutz Prechelt (prechelt@ira.uka.de), Fakultät für Informatik,
Universität Karlsruhe

Language   #loc per Function Point
C          91
C++        53
Java       54
Perl       21
* Designing and writing programs in dynamic languages tended to
  take half as long and to produce half as much code.
Function Point Method
Nooooo!

#loc per FP = lines of code per function point
Project Uncertainty Principle
* Dilbert Comic 2003 United Features Syndicate Inc
Reviewed 11 projects

FP analysis
• Count FP from inputs/outputs
• Calculate VAF = 0.65 + (ΣCi / 100)
• AFP = VAF × ΣFP

#loc counted using cloc

#loc ÷ AFP = #loc per FP
* FP overview - http://www.softwaremetrics.com/fpafund.htm
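The FP analysis steps above reduce to a little arithmetic. A minimal Python sketch, assuming the IFPUG convention of 14 general system characteristics (Ci), each rated 0 to 5, so the value adjustment factor ranges from 0.65 to 1.35. The unadjusted FP count is taken as given, and the project figures in the usage lines are hypothetical:

```python
from typing import Sequence


def value_adjustment_factor(gsc_ratings: Sequence[int]) -> float:
    """VAF = 0.65 + (sum of the general system characteristic ratings) / 100.

    Assumes the IFPUG convention: 14 characteristics, each rated 0..5,
    so the VAF ranges from 0.65 to 1.35.
    """
    if len(gsc_ratings) != 14:
        raise ValueError("expected 14 general system characteristic ratings")
    if any(not 0 <= c <= 5 for c in gsc_ratings):
        raise ValueError("each rating must be between 0 and 5")
    return 0.65 + sum(gsc_ratings) / 100


def adjusted_function_points(unadjusted_fp: float, gsc_ratings: Sequence[int]) -> float:
    """Adjusted FP = VAF x unadjusted FP (the count from inputs/outputs etc.)."""
    return value_adjustment_factor(gsc_ratings) * unadjusted_fp


def loc_per_function_point(loc: int, unadjusted_fp: float, gsc_ratings: Sequence[int]) -> float:
    """#loc per FP = lines of code (e.g. as counted by cloc) / adjusted FP."""
    return loc / adjusted_function_points(unadjusted_fp, gsc_ratings)


if __name__ == "__main__":
    # Hypothetical project: 120 unadjusted FP, every characteristic rated 3,
    # and 4,000 lines of code.
    ratings = [3] * 14
    print(round(value_adjustment_factor(ratings), 2))            # 1.07
    print(round(adjusted_function_points(120, ratings), 1))      # 128.4
    print(round(loc_per_function_point(4000, 120, ratings), 1))  # 31.2
```

Validating the 14 ratings up front keeps a miscounted characteristic from silently skewing the VAF, which every later figure depends on.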
Language     #loc per Function Point
Perl         21
Eiffel       21
SQL          13-30
XQuery       27-33
Haskell      38
Erlang       40
Python       42-47
Java         50-80
JavaScript   50-55
Scheme       53
C++          59-80
C            128-140
http://www.qsm.com/resources/function-point-languages-table
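Read the other way round, the table gives a rough size estimate: lines of code ≈ function points × #loc per FP. A minimal Python sketch using a subset of the ranges above; the 100-FP project is hypothetical, and the earlier caveats about LOC still apply:

```python
# #loc per function point ranges, copied from the QSM-based table above.
# Single values are stored as (n, n).
LOC_PER_FP = {
    "Perl": (21, 21),
    "SQL": (13, 30),
    "XQuery": (27, 33),
    "Python": (42, 47),
    "Java": (50, 80),
    "C": (128, 140),
}


def estimated_loc(function_points: int, language: str) -> tuple[int, int]:
    """Return a (low, high) LOC estimate: function points times #loc per FP."""
    low, high = LOC_PER_FP[language]
    return function_points * low, function_points * high


if __name__ == "__main__":
    # Hypothetical 100-FP application.
    for lang in ("XQuery", "Java", "C"):
        print(lang, estimated_loc(100, lang))
    # XQuery (2700, 3300)
    # Java (5000, 8000)
    # C (12800, 14000)
```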
XQuery 2011 Survey
Preferred Programming Language
[Bar chart: 73%, 55%, 45%, 32%, 22%]
Which data formats do you use the most?
[Bar chart: 95%, 40%, 39%, 32%, 27%, 18%, 15%]
Do you think XQuery makes you a more productive programmer?
[Bar chart: 67%, 14%, 10%, 8%]
Is XQuery more productive than (with???) Java
in developing web-based data applications?
[Bar chart: 58%, 22%, 12%, 8%]
Time to bust one myth
• XML is too slow and bloated

• http://www.navioo.com/ajax/ajax_json_xml_Benchmarking.php

• In data-oriented AJAJ scenarios, JSON is best: most
  benchmarks today show it about 30% faster with less load
  (so more with fewer resources)
MongoDB
* http://www.linkedin.com/skills/skill

JavaScript
* http://www.linkedin.com/skills/skill

XQuery
* http://www.linkedin.com/skills/skill

XSLT
* http://www.linkedin.com/skills/skill

Hadoop
* http://www.linkedin.com/skills/skill

Java
* http://www.linkedin.com/skills/skill

JSON
* http://www.linkedin.com/skills/skill

XML
* http://www.linkedin.com/skills/skill
Back When SQL Was Invented…
born in the ’90s
XML ?
Might even be
Channel Effect of Aging in Technology
• “Average age of @guardian Facebook
  audience is 29. Website is 37, print paper 44.
  Amazing channel effect, really. #newsrw”
• Baby boomers, Gen X, Y and Z
• I feel a bit uneasy framing generational
  arguments …
Death of the XML Child
…Overachieving Child Prodigies
          grow up
Let’s not get distracted.
Don’t mention the war
XML Hard Core - XML Hype cycle
[Hype cycle chart: 1998, 2002, 2006, 2007, 2009 (XML’s reported death), 2012]
REST of the World - XML Hype cycle
[Hype cycle chart: 1998, 2002, 2006, 2009, 2012; XML’s reported death]
Hype cycle
*2012 Gartner Hype Cycle http://www.gartner.com
2001 Edd Dumbill – xml.com
‘Stop the XML hype, I want to get off’

‘As editor of XML.com, I welcome the massive
success XML has had. But things prized by the XML
community — openness and interoperability — are
getting swallowed up in a blaze of marketing hype. Is this
the price of success, or something we can avoid?’

                Source: Edd Dumbill (March 2001)
2012 Edd Dumbill g+ post
‘For many years I was the editor of XML.com,
and the chair of the XML Europe conference.
Today, it seems that XML's mission to be a web
language is mostly dead. I'm not saying XML is
useless: it has proved itself as a more easily-used
SGML, but I'm not sure it's expanded too far
outside of that.’
               Source: Edd Dumbill (March 2012)
Current Status: XML is dead
• XML fought too many battles (RDBMS, NoSQL,
  web developers, HTML5)
• Age channeling and Hype curve in effect
• But XML technology stack is embracing JSON
  etc …
• No room for sentimentality in technology
XML is dead boring
Halftime Break
Big Data & Modern XML
What’s the problem?
Is XML Applicable to Big Data?
•   We know it is; that’s why I am here
•   Some of you already know
•   We need to dig into the detail
•   But first we need to simplify things
http://kensall.com/big-picture/bigpix22.html
* http://gigaom.com/cloud/big-data-equals-big-opportunities-for-businesses-infographic/
BigData Opportunity
Managing data variability, volume & velocity is hard
You need to be a (data) scientist to build this rocket ship.
So what’s the problem again?

     #1 – How to apply Modern XML to your BigData
     problems?

         #1a – The XML milieu is too complicated; we need to identify
         what is successful as Modern XML

         #1b – BigData is a huge opportunity

         #1c – BigData has a huge learning curve and high risks
Solving #1 – Defining Modern XML
• Identify the technologies
• Identify and classify the Scenarios
Modern XML Technology analysis
• Internal survey of ML Customer projects &
External survey of projects (w/ pref towards
Big/Complex projects)
• Informal Survey (polldaddy)
• Qualitative and quantitative
Eisenhower: “What is important is seldom urgent and what is urgent is seldom important.”

                URGENT          NOT URGENT
IMPORTANT       Critical        Goals
NOT IMPORTANT   Interruptions   Distractions
Survey Interpretations
•   XML 1.0 and Namespaces are important now
•   XProc and XHTML are important now
•   XSLT 2 and XQuery 1 are very important now
•   XSLT 2 and XQuery 2 in the browser: near future
•   XQuery 3.0 is important in the near future
•   SAX/DOM now, XOM a possible future
•   XML Schema 1.0 now, 1.1 for the near future
•   Schematron: a surprise
•   Semweb is for the future
•   SVG and MathML rank highly due to web browser support
•   XML vocabularies have a very ‘long tail’
Modern XML
               Technology Candidates
Core           XML 1.0
               Namespaces
Other
Transform      XSLT 2.0
               XQuery 1.0
Processing     SAX, DOM
Schema         Schematron
               XML Schema 1.0
Semantics      RDF
               OWL
Vocabularies   Office Doc ML
               SVG

These technologies trended highly across all analysis.
Bold: could be trending due to browser implementation/historical dependency.
Modern XML
               Tier 1
Core           XML 1.0
               Namespaces
Other          XProc
Transform      XSLT 2.0 / 3.0 / browser
               XQuery 1.0 / 3.0
Processing     SAX, DOM
Schema         Schematron
               XML Schema 1.0 / 1.1
Semantics      RDF
               OWL
Vocabularies   Office Doc ML
               SVG

These technologies trended highly across all analysis.
Bold: could be trending due to browser implementation/historical dependency.
Italic: strong signal, early usage, interest in unproven spec/tech.
Modern XML                 Modern XML
               Tier 1                     Tier 2
Core           XML 1.0                    XML Canonicalization
               Namespaces                 xml:id
Other          XProc                      XHTML*
Transform      XSLT 2.0 / 3.0 / browser   XSLT 1.0
               XQuery 1.0 / 3.0
Processing     SAX, DOM                   XOM, StAX
Schema         Schematron                 RELAX NG
               XML Schema 1.0 / 1.1
Semantics      RDF                        SPARQL
               OWL
Vocabularies   Office Doc ML              MathML
               SVG                        DocBook
                                          SOAP*, DITA, EPUB
Modern XML                 Modern XML
               Tier 1                     Tier 2
Core           XML 1.0                    XML Canonicalization
               Namespaces                 xml:id
                                          XML infoset
Other          XProc                      XHTML*
Transform      XSLT 2.0 / 3.0 / browser   XSLT 1.0
               XQuery 1.0 / 3.0
Processing     SAX, DOM                   XOM, StAX
Schema         Schematron                 RELAX NG
               XML Schema 1.0 / 1.1
Semantics      RDF                        SPARQL
               OWL
Vocabularies   Office Doc ML              MathML
               SVG                        DocBook
                                          SOAP, DITA, EPUB
Data Formats   XML, text, binary, JSON
The technology triggers
• XML Database – reduce the complexity/risk of
  BigData
  –   MarkLogic
  –   eXist
  –   Zorba
  –   Sedna
  –   BaseX
  –   Others (Oracle!)
• XQuery: rapid prototyping
• Avoid purist architectures, embrace
  heterogeneity
Modern XML / BigData Scenarios
• Classic Scenarios
    –   Document (xml) Database
    –   Aggregation
    –   Enterprise Search
    –   Heterogeneous Content store
    –   Publishing

• BigData Scenarios
    –   BigData ‘classic’
    –   Extreme personalisation
    –   Predictive analytics
    –   Financial analysis
    –   Realtime analysis (management/financial)
    –   Actionable intelligence

• Semantic Web – too early to categorize, but it’s for real
Solving Problem #2 – Focus on the
             Practicalities
• What type of Big Data problem do you have?
  – The urgent, important ones you know about
  – The urgent, important ones you don’t know about
• Create a dedicated team (analytics, problem
  domain experts) to identify the latter
• Assess data maturity (Data Audit)
• With power comes responsibility … Ethical
  Analytics
BigData Tech Advice
• Start using an XML database asap!
• Don’t get distracted by the zoo … start
  Hadooping right away
• ‘Data outlives code’: spend more time on the
  data (clean, cogent abstractions; opening it up)
Size appropriately
Volume – relative to your current capability: is the
requirement an order of magnitude beyond what your
current infrastructure can scale to?

Velocity – updates versus reads? High volatility
with realtime queries?

Variety – managing versioning?

Complexity – multiple, complex processes
Size Appropriately: Are you a
         ‘Facebook’ (Google, Yahoo…) ?
•     2.5 billion content items shared per day (status updates + wall posts +
    photos + videos + comments)
•     2.7 billion Likes per day
•     300 million photos uploaded per day
•     100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS)
    clusters
•     105 terabytes of data scanned via Hive, Facebook’s Hadoop query
    language, every 30 minutes
•     70,000 queries executed on these databases per day
•     500+ terabytes of new data ingested into the databases every day

• Are you planning to scale out to ~180,900 servers?
• ~18,000 database servers ingesting 500+ terabytes of data through a
  guesstimated 50+ billion calls … a day!

               http://www.datacenterknowledge.com/the-facebook-data-center-faq/
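To size these figures against your own infrastructure, it helps to turn daily totals into average rates. A back-of-envelope Python sketch using only the numbers on this slide, assuming decimal units (1 TB = 10^12 bytes) and load spread evenly over 24 hours. Real traffic is bursty, so peak rates exceed these averages, and "500+" makes the ingest figure a lower bound:

```python
# Back-of-envelope average rates from the Facebook figures on the slide.
# Assumptions: decimal units (1 TB = 10**12 bytes) and load spread evenly
# over 24 hours; real traffic is bursty, so peaks exceed these averages.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB = 10**12


def per_second(amount_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


ingest_bytes_per_day = 500 * TB                     # "500+ terabytes ... every day"
windows_per_day = SECONDS_PER_DAY // (30 * 60)      # 48 half-hour windows
hive_scan_tb_per_day = 105 * windows_per_day        # 105 TB scanned every 30 minutes
queries_per_day = 70_000

if __name__ == "__main__":
    print(round(per_second(ingest_bytes_per_day) / 10**9, 2), "GB/s ingested")  # 5.79
    print(hive_scan_tb_per_day, "TB scanned via Hive per day")                 # 5040
    print(round(per_second(queries_per_day), 2), "queries/s")                  # 0.81
```

Roughly 5.8 GB/s of sustained ingest and about 5 PB scanned per day is the scale the "Are you a Facebook?" question is asking you to compare against.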
Solving Problem #3 – Understanding
                the risks
• Biggest mistakes seen with BigData adoption
• ‘data scientists themselves don't have much of
  intuition either…and that is a problem. I saw an
  estimate recently that said 70 to 80 percent of the
  results that are found in the machine learning
  literature, which is a key Big Data scientific field, are
  probably wrong because the researchers didn't
  understand that they were overfitting the data’. –
  Alex Pentland MIT's “Big Data guy”
Summary
• We reviewed some aspects of XML current
  status in the dataverse
• Identified a subset of the XML Milieu – calling
  it Modern XML
• Identified the scenarios where Modern XML
  is being brought to bear on BigData
• Reviewed common mistakes and risks with
  BigData
Final Thesis
• Modern XML provides a great foundation today
  – Great for ‘classic’ scenarios
  – Great technical positioning for addressing
    challenges of BigData
  – Great technical positioning for semweb
• Adopting an XML database mitigates risk
• Knowing Bigdata/Modern XML scenarios helps
  us mitigate risks
• There is a big prize if you get BigData right
Avoid stereotypes
I’m an RDBMS
I’m a Protocol Buffer
I’m a JSON
I’m an XML
Jeni Tennison XML Prague 2012 talk
[Diagram: JSON, XML, RDF, HTML]
Be wary of paradigm shifts
• RedMonk – language divergence
• Andreessen – software is eating the world
• 128-bit and beyond the current von
  Neumann/Harvard architectures?
• Power wall (at server farms/mobile devices)
• The web revolution is not done yet
  (http://www.firebase.com/index.html)
Embrace change
‘Form is temporary.
         Class is permanent.’
• XML is emerging from its ‘trough of
  disillusionment’ because it’s useful, productive
  and reacting to new requirements.
• Modern XML is successful by many different
  measures: mature and dead boring
• Modern XML can help solve your BigData
  problems
Pull the Technology Trigger –
          Try an XML Database Today!
• MarkLogic 6
    –   Web dev ‘surface area’, work with JSON
    –   REST API
    –   Java API
    –   Work across different data
•   Zorba
•   eXist
•   BaseX
•   Sedna

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

XML Amsterdam 2012 Keynote

  • 1. BigData and Modern XML. Jim Fuller, Senior Engineer, Europe. Email: jim.fuller@marklogic.com, Twitter: @xquery. 19/09/12
  • 2. Senior engineer: http://jim.fuller.name, http://exslt.org, @xquery, XSLT UK 2001, http://www.xmlprague.cz, @perl6, Perlmonks Pilgrim
  • 3. Kickoff: XML current status; Modern XML & BigData
  • 4. ‘ontogeny recapitulates phylogeny’ or A (very) Brief History of ML • Late 1950s: Noam Chomsky ‘generative grammars’ • 1969: Charles Goldfarb (w/ Ed Mosher and Ray Lorie) created GML • 1986: SGML formalized • 1998: XML 1.0 W3C recommendation • 1998 – 2012: A lot of stuff happened • Future: XML 2.0 … microXML?
  • 5. RDBMS Goliath vs XML David • Back then, XML was the proto ‘nosql’ • X in AJAX • Now many ‘davids’ • AJAJ
  • 6. Documents • Back then, it wasn’t unusual for a vendor to say ‘tough luck’ with your data (pay up) • Now, most office documents are in XML
  • 7. The ‘long tail’ of XML Vocabularies • Back then, vocabularies were built with proprietary approaches • Today, thousands of vocabularies are based on XML – ‘2012 U.S. GAAP Taxonomy Adopted by SEC; FASB Webcast April 3’
  • 8. Has anyone heard of Shipdex?
  • 9. Back then, XML/Markup Conferences • Software Development 99 East, November 8-13, 1999, Washington D.C. • XML One Fall 99, November 8-11, 1999, Santa Clara, CA • XML '99 December 6-9, 1999, Philadelphia PA • Markup Technologies '99 Conference December 5-9, 1999, Philadelphia • Web Design 2000, February 7-9, 2000, Atlanta • XTech '2000, February 27-March 2, San Jose • Software Development 2000 West, March 20-24, 2000, San Jose • Sixteenth International Unicode Conference, Boston, March 27-30, 2000 • The Ninth International World Wide Web Conference, May 15-19, 2000, Amsterdam • DL 2000: Fifth ACM Conference on Digital Libraries, June 3-6 2000, Texas • XML Europe 2000, June 12-16, Paris • Web Design World 2000, July 17-21, 2000, Seattle, Washington • MetaStructures, August 14-16, 2000, Montreal, Quebec, Canada • XML Developers Conference, August 17-18, 2000, Montreal, Quebec • Internet World Expo, October 25-27, 2000, New York City • XML 2000/Markup Technologies 2000, December 3-7, Washington • ….. Even a Geek Cruises XML Excursion - January 2001
  • 10. Today - XML/Markup Conferences • The XML ‘parallelogram’ – Balisage – XML Summer School – XML Prague – XML Amsterdam • Xtech* • markupForum • XATA • MarkLogic World (600 ppl) • databaseX (London November 2013 ?)
  • 11. Other important good stuff • Evolution of the operating system – Unix is the operating system for text – Windows tried to be the operating system for binaries, then adopted XML: a mixed bag – Java (VM) has a strong XML stack • The web changed everything to text-based markup • Cheap RAM/disk/CPU • Virtualization = scale out
  • 12. Other important good stuff http://googleblog.blogspot.cz/2012/02/unicode-over-60-percent-of-web.html
  • 13. Unfair to point out failure? • Namespaces • XLink • WS* astronautics • Draconian error checking • XML Schema • XForms • XSLT 1.0 (or any XML) in the browser • XHTML vs HTML5 • Too many specs (modularity good, complexity bad)
  • 14. “Winning isn’t everything. There should be no conceit in victory and no despair in defeat.” - Matt Busby • In 2001 I was the RDBMS serial killer: ‘kill RDBMS’ • Define successful? – Adoption? – Cheaper? – Faster? – Better?
  • 15. Drill-down distraction – Why is XQuery successful, or rather productive? • Choose my most successful (ad hoc stories, visible success) • Functional, dynamic … works with structure, text and values … stored proc + query language • XPath^ • Is it possible to qualify/quantify XQuery productivity?
  • 16. Programming Language Productivity Data compiled from studies by Prechelt and Garret of a particular string processing problem - public domain 2006.
  • 17. Programming Language Productivity Data compiled from studies by Prechelt and Garret of a particular string processing problem - public domain 2006.
  • 18. Source: 28msec, 2011 (http://www.28msec.com/html/home). Lines of code:
                     Java     XQuery
        SimpleDB     2905     572
        S3           8589     1469
        SNS          2309     455
        Total        13803    2496
  • 19. Developing an Enterprise Web Application in XQuery (Martin Kaufmann, Donald Kossmann, 2009). Lines of code:
                     Java/J2EE    XQuery
        Model        3100         240
        View         4100         1500
        Controller   900          1180
        Total        8100 (?)     2920 (3490)
  • 20. Nooooo! The problem with LOC: correlation of failure with very high LOC is the only certain fact about LOC. That’s about it.
  • 21. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a search/string-processing program. Lutz Prechelt (prechelt@ira.uka.de), Fakultät für Informatik, Universität Karlsruhe
        Language   #loc per Function Point
        C          91
        C++        53
        Java       54
        Perl       21
      * Designing and writing programs using dynamic languages tended to take half as long and to result in half as much code.
  • 22. Function Point Method: Nooooo! #loc per FP = lines of code per function point
  • 23. Project Uncertainty Principle * Dilbert Comic 2003 United Features Syndicate Inc
  • 24. Reviewed 11 projects • FP analysis: calculate FP from inputs/outputs • Calculate VAF = 0.65 + (sum(Ci) / 100) • AVP = VAF * sum(FP) • #loc counted using cloc; #loc = #loc per FP * FP • Overview: http://www.softwaremetrics.com/fpafund.htm
  • 25. Lines of code per function point (http://www.qsm.com/resources/function-point-languages-table):
        Language     #loc per Function Point
        Perl         21
        Eiffel       21
        SQL          13-30
        XQuery       27-33
        Haskell      38
        Erlang       40
        Python       42-47
        Java         50-80
        JavaScript   50-55
        Scheme       53
        C++          59-80
        C            128-140
  • 27. Preferred Programming Language 73% 55% 45% 32% 22%
  • 28. Which data formats do you use the most ? 95% 40% 39% 32% 27% 18% 15%
  • 29. Do you think XQuery makes you a more productive programmer ? 67% 14% 10% 8%
  • 30. Is XQuery more productive than (or with) Java in developing web-based data applications? 58% 22% 12% 8%
  • 31. Time to bust one myth • XML is too slow and bloated • http://www.navioo.com/ajax/ajax_json_xml_Benchmarking.php • In data-oriented AJAJ scenarios, most benchmarks today show JSON at best about 30% faster, with less load (so more with fewer resources)
  • 32. mongodb * http://www.linkedin.com/skills/skill
  • 33. Javascript * http://www.linkedin.com/skills/skill
  • 34. XQuery * http://www.linkedin.com/skills/skill
  • 35. XSLT * http://www.linkedin.com/skills/skill
  • 36. hadoop * http://www.linkedin.com/skills/skill
  • 37. Java * http://www.linkedin.com/skills/skill
  • 38. JSON * http://www.linkedin.com/skills/skill
  • 39. XML * http://www.linkedin.com/skills/skill
  • 40. Back When SQL Was Invented…
  • 41. born in the 90’s
  • 42. XML ?
  • 44. Channel effect of aging in technology • “Average age of @guardian Facebook audience is 29. Website is 37, print paper 44. Amazing channel effect, really. #newsrw” • Baby boomers, Gen X, Y and Z • I feel a bit uneasy framing generational arguments …
  • 45. Death of the XML child … overachieving child prodigies grow up
  • 46. Let’s not get distracted.
  • 48. XML Hard Core: XML hype cycle [chart: 1998, 2002, 2006, 2007, 2009 and 2012 placed on the curve; label “XML’s reported death ->”]
  • 49. REST of the World: XML hype cycle [chart: 1998, 2002, 2006, 2009 and 2012 placed on the curve; label “XML’s reported death ->”]
  • 50. hype cycle *2012 Gartner Hype Cycle http://www.gartner.com
  • 51. 2001 Edd Dumbill – xml.com ‘Stop the XML hype, I want to get off As editor of XML.com, I welcome the massive success XML has had. But things prized by the XML community — openness and interoperability — are getting swallowed up in a blaze of marketing hype. Is this the price of success, or something we can avoid? ‘ Source: Edd Dumbill (March 2001)
  • 52. 2012 Edd Dumbill g+ post ‘For many years I was the editor of XML.com, and the chair of the XML Europe conference. Today, it seems that XML's mission to be a web language is mostly dead. I'm not saying XML is useless: it has proved itself as a more easily-used SGML, but I'm not sure it's expanded too far outside of that.’ Source: Edd Dumbill (March 2012)
  • 53. Current Status: XML is dead • XML fought too many battles (RDBMS, NoSQL, web developers, HTML5) • Age channeling and Hype curve in effect • But XML technology stack is embracing JSON etc … • No room for sentimentality in technology
  • 54. XML is dead boring
  • 56. Big Data & Modern XML
  • 58. Is XML Applicable to Big Data ? • We know it is, that’s why I am here • Some of you already know • Need to dig into the detail • But we first need to simplify things
  • 62. managing data variability, volume & velocity is hard You need to be a (data) scientist to build this rocket ship.
  • 63. So what’s the problem again? #1 – How to apply Modern XML to your BigData problems? #1a – The XML milieu is too complicated; we need to identify what is successful as Modern XML #1b – BigData is a huge opportunity #1c – BigData has a huge learning curve and high risks
  • 64. Solving #1 – Defining Modern XML • Identify the technologies • Identify and classify the Scenarios
  • 65. Modern XML Technology analysis • Internal survey of ML Customer projects & External survey of projects (w/ pref towards Big/Complex projects) • Informal Survey (polldaddy) • Qualitative and quantitative
  • 66. Eisenhower: "What is important is seldom urgent and what is urgent is seldom important."
                        URGENT          NOT URGENT
        IMPORTANT       Critical        Goals
        NOT IMPORTANT   Interruptions   Distractions
  • 67. Survey Interpretations • XML 1.0 and Namespaces are important now • XProc, XHTML important now • XSLT 2 and XQuery 1 very important now • XSLT 2 and XQuery 2 in the browser near future • XQuery 3.0 important near future • SAX/DOM now, XOM possible future • XML Schema 1.0 now, 1.1 for the near future • Schematron surprising • Semweb is for the future • SVG and MathML due to web browser support • XML vocabulary has a very ‘long tail’
  • 68. Modern XML Technology Candidates
        Core           XML 1.0, Namespaces
        Other
        Transform      XSLT 2.0, XQuery 1.0
        Processing     SAX, DOM
        Schema         Schematron, XML Schema 1.0
        Semantics      RDF, OWL
        Vocabularies   Office Doc ML, SVG
      These technologies trended highly across all analysis. Bold: could be trending due to browser implementation/historical dependency.
  • 69. Modern XML Tier 1
        Core           XML 1.0, Namespaces
        Other          XProc
        Transform      XSLT 2.0 / 3.0 / browser, XQuery 1.0 / 3.0
        Processing     SAX, DOM
        Schema         Schematron, XML Schema 1.0 / 1.1
        Semantics      RDF, OWL
        Vocabularies   Office Doc ML, SVG
      These technologies trended highly across all analysis. Bold: could be trending due to browser implementation/historical dependency. Italic: strong signal, early usage, interest in unproven spec/tech.
  • 70. Modern XML Tier 1 vs Modern XML Tier 2
                       Tier 1                                        Tier 2
        Core           XML 1.0, Namespaces                           XML Canonicalization, xml:id
        Other          XProc                                         XHTML*
        Transform      XSLT 2.0 / 3.0 / browser, XQuery 1.0 / 3.0    XSLT 1.0
        Processing     SAX, DOM                                      XOM, StAX
        Schema         Schematron, XML Schema 1.0 / 1.1              RELAX NG
        Semantics      RDF, OWL                                      SPARQL
        Vocabularies   Office Doc ML, SVG                            MathML, DocBook, SOAP*, DITA, EPUB
  • 71.
  • 72. Modern XML Tier 1 vs Modern XML Tier 2
                       Tier 1                                        Tier 2
        Core           XML 1.0, Namespaces                           XML Canonicalization, xml:id, XML infoset
        Other          XProc                                         XHTML*
        Transform      XSLT 2.0 / 3.0 / browser, XQuery 1.0 / 3.0    XSLT 1.0
        Processing     SAX, DOM                                      XOM, StAX
        Schema         Schematron, XML Schema 1.0 / 1.1              RELAX NG
        Semantics      RDF, OWL                                      SPARQL
        Vocabularies   Office Doc ML, SVG                            MathML, DocBook, SOAP, DITA, EPUB
      Data formats: XML, text, binary, JSON
  • 73. The technology triggers • XML database – reduce the complexity/risk of BigData – MarkLogic – eXist – Zorba – Sedna – BaseX – Others (Oracle!) • XQuery – rapid prototyping • Avoid purist architectures, embrace heterogeneity
  • 74. Modern XML / BigData Scenarios • Classic Scenarios – Document (XML) Database – Aggregation – Enterprise Search – Heterogeneous Content store – Publishing • BigData Scenarios – BigData ‘classic’ – Extreme personalisation – Predictive analytics – Financial analysis – Realtime analysis (management/financial) – Actionable intelligence • Semantic Web – too early to categorize, but it’s for real
  • 75. Solving Problem #2 – Focus on the Practicalities • What type of Big Data problem do you have? – The urgent, important ones you know about – The urgent, important ones you don’t know about • Create a dedicated team (analytics, problem domain experts) to identify the latter • Assess data maturity (Data Audit) • With power comes responsibility … Ethical Analytics
  • 76. BigData Tech Advice • Start using an XML database ASAP! • Don’t get distracted by the zoo … start hadooping right away • ‘Data outlives code’: spend more time on the data, clean abstractions, cogent, opening it up
  • 77. Size appropriately • Volume – relative to your current capability: is the requirement an order of magnitude beyond what your current infrastructure can scale to? • Velocity – updates versus reads? High volatility with realtime queries? • Variety – managing versioning? • Complexity – multiples, complex processes
  • 78. Size Appropriately: Are you a ‘Facebook’ (Google, Yahoo…)? • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments) • 2.7 billion Likes per day • 300 million photos uploaded per day • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes • 70,000 queries executed on these databases per day • 500+ terabytes of new data ingested into the databases every day • Are you planning to scale out to ~180,900 servers? • ~18000 database servers ingesting 500+ terabytes of data through a guesstimated 50+ billion calls …. A day! http://www.datacenterknowledge.com/the-facebook-data-center-faq/
  • 79. Solving Problem #3 – Understanding the risks • Biggest mistakes seen with BigData adoption • ‘data scientists themselves don't have much of intuition either…and that is a problem. I saw an estimate recently that said 70 to 80 percent of the results that are found in the machine learning literature, which is a key Big Data scientific field, are probably wrong because the researchers didn't understand that they were overfitting the data’. – Alex Pentland MIT's “Big Data guy”
  • 80. Summary • We reviewed some aspects of XML’s current status in the dataverse • Identified a subset of the XML milieu, calling it Modern XML • Identified the scenarios where Modern XML is being brought to bear on BigData • Reviewed common mistakes and risks with BigData
  • 81. Final Thesis • Modern XML provides great foundation today – Great for ‘classic’ scenarios – Great technical positioning for addressing challenges of BigData – Great technical positioning for semweb • Adopting an XML database mitigates risk • Knowing Bigdata/Modern XML scenarios helps us mitigate risks • There is a big prize if you get BigData right
  • 82. Avoid stereotypes I’m a RDBMS I’m a Protocol Buffer I’m a Json I’m an XML
  • 83. Jeni Tennison XML Prague 2012 talk JSON XML RDF HTML
  • 84. Be wary of Paradigm Shifts • RedMonk – language divergence • Andreessen – software is eating the world • 128-bit and beyond current von Neumann/Harvard architectures? • Power wall (at server farms/mobile devices) • The web revolution is not done yet (http://www.firebase.com/index.html)
  • 86. ‘Form is temporary. Class is permanent’ • XML is emerging from its ‘Trough of disillusionment’, because it’s useful, productive and reacting to new requirements. • Modern XML is successful on many different measures, mature and dead boring • Modern XML can help solve your BigData problems
  • 87. Pull the Technology Trigger – Try an XML Database Today! • MarkLogic 6 – Web dev ‘surface area’, work with JSON – REST API – Java API – Work across different data • Zorba • eXist • BaseX • Sedna
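Slides 18 and 19 give Java and XQuery line counts side by side. The short Python sketch below recomputes the totals and the relative reduction using only the numbers printed on those slides. The dictionary and function names are illustrative choices, not part of the studies. The parenthetical 3490 on slide 19 is not explained in the source, so it is left out.

```python
# Java vs XQuery line counts as transcribed from slides 18 and 19.
# Each entry maps a component to (java_loc, xquery_loc).
AMAZON_CLIENTS = {  # 28msec, 2011
    "SimpleDB": (2905, 572),
    "S3": (8589, 1469),
    "SNS": (2309, 455),
}
WEB_APP = {  # Kaufmann and Kossmann, 2009 (parenthetical 3490 omitted)
    "Model": (3100, 240),
    "View": (4100, 1500),
    "Controller": (900, 1180),
}


def totals(rows):
    """Sum the Java and XQuery columns of a {component: (java, xquery)} table."""
    java = sum(j for j, _ in rows.values())
    xquery = sum(x for _, x in rows.values())
    return java, xquery


def reduction(java, xquery):
    """Fraction of lines saved by the XQuery version relative to Java."""
    return 1 - xquery / java


if __name__ == "__main__":
    for name, rows in (("Amazon client libraries", AMAZON_CLIENTS),
                       ("Enterprise web application", WEB_APP)):
        java, xquery = totals(rows)
        print(f"{name}: Java {java}, XQuery {xquery}, "
              f"{reduction(java, xquery):.0%} fewer lines")
```

The recomputed totals match the slide totals (13803 vs 2496, and 8100 vs 2920). On these numbers the Amazon libraries come out at roughly 82% fewer lines, in line with the "80% less code" remark in the notes, and the web application at roughly 64%.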
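Slide 24 describes the function point arithmetic behind the #loc per FP figures: VAF = 0.65 + sum(Ci) / 100, AVP = VAF * sum(FP), and #loc per FP from a cloc line count. Here is a minimal Python sketch of that calculation. It assumes the usual IFPUG convention of 14 general system characteristics, each rated 0 to 5, which bounds VAF to 0.65 to 1.35. The function names and the sample project in the usage block are hypothetical.

```python
def value_adjustment_factor(ratings):
    """VAF = 0.65 + sum(Ci) / 100, the formula on slide 24.

    `ratings` are the general system characteristic scores Ci; the standard
    IFPUG method uses 14 of them, each rated 0-5, so VAF lies in [0.65, 1.35].
    """
    if any(not 0 <= c <= 5 for c in ratings):
        raise ValueError("each rating Ci must be between 0 and 5")
    return 0.65 + sum(ratings) / 100


def adjusted_function_points(fp_counts, ratings):
    """AVP on the slide: VAF multiplied by the sum of the counted function points."""
    return value_adjustment_factor(ratings) * sum(fp_counts)


def loc_per_fp(loc, fp_counts, ratings):
    """Lines of code (for example as counted by cloc) per adjusted function point."""
    return loc / adjusted_function_points(fp_counts, ratings)


if __name__ == "__main__":
    # Hypothetical project: two requirement groups worth 100 and 150 FP,
    # 14 ratings summing to 35 (so VAF is 1.0), and 7500 lines of code.
    ratings = [3] * 7 + [2] * 7
    print(f"VAF = {value_adjustment_factor(ratings):.2f}")
    print(f"AVP = {adjusted_function_points([100, 150], ratings):.1f}")
    print(f"#loc per FP = {loc_per_fp(7500, [100, 150], ratings):.1f}")
```

With ratings close to the middle of the scale, VAF stays near 1.0. This is consistent with the observation in the notes that VAF remained close to 1.0 for most of the reviewed projects, so #loc per FP is then driven mostly by the raw FP count.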

Editor's Notes

  1. First encounter with BigData – mapmaking (Gravity map of Rhode Island) late 1980’s – geophysics generates a lot of data points
  2. Apologies for the gratuitous football analogies … it was either that or Jaws
  3. Chomsky proposed the notion of grammar to capture the structural constraints of a particular language. A grammar is described as a set of production rules. Depending on the kind of rules one is allowed to write, Chomsky distinguished four types of grammars of decreasing complexity, from type 0 (unconstrained) to type 3 (regular grammar). Type 0 grammars need a full-fledged Turing machine and type 1 grammars a linear-bounded automaton, whereas type 2 or context-free grammars (CFG) only need a stack machine (pushdown automaton), and type 3 or regular grammars only need a finite state automaton. The last two are interesting from a computer science perspective, as they require less complex algorithms.
  4. Binaries have been replaced in most office programs. Avg 200 Word docs on each PC (comScore tech matrix study 2008), 1 billion * 100 billion XML files latently living on PC users' hard drives. Gartner study 2010: as little as a few hundred billion XML-based MS Word docs on the web. What's in email, SharePoint, websites? These are all lowball figures, not including open source file formats or ebooks. 80% of all companies use some form of Office (a few years ago MS quoted that there were a billion instances of Office worldwide), with nearly half of these being versions that generate XML by default … that’s a lot of XML.
Australia: Australia's Department of Finance has released a desktop policy that required all agencies to adopt Office Open XML as the standard document format.[37]
Belgium: Belgium's Federal Public Service for Information and Communication Technology in 2006 was evaluating the adoption of the Office Open XML format. It already then confirmed that it would consider all ISO standards to be open standards, mentioning Office Open XML as such a possible future ISO standard.[38]
Denmark: In June 2007, the Danish Ministry of Science, Technology and Innovation recommended that, beginning with January 1, 2008, public authorities must support at least one of the two word processing document formats Office Open XML or Open Document Format in all new IT solutions, where appropriate.[39]
Germany: In Germany the Office Open XML standard is currently under observation by the Federal Commissioner for Information Technology ("Die Beauftragte der Bundesregierung für Informationstechnik"). The latest release of "SAGA" (Standards and Architectures for E-Government-Applications) includes Office Open XML file formats in both its strict and transitional variant. The ISO/IEC 29500 standard may be used to exchange complex documents when further processing is required.[40]
Japan: On June 29, 2007, the government of Japan published a new interoperability framework which gives preference to the procurement of products that follow open standards.[41][42] On July 2 the government declared its view that formats like Office Open XML, which organizations such as Ecma International and ISO had also approved, were open standards.[43] They also said that whether a format is open was one of the preferences in choosing which software the government shall deploy.
Lithuania: The Lithuanian Standards Board has adopted the ISO/IEC 29500:2008 Office Open XML format standard as the Lithuanian national standard. The decision was made by Technical Committee 4 Information Technology on March 5, 2009. The proposal to adopt the Office Open XML format standard was submitted by the Lithuanian Archives Department of the Government of the Republic of Lithuania.[44]
Norway: Norway's Ministry of Government Administration and Reform is evaluating the adoption of the Office Open XML format. The ministry put the document standard under observation in December 2007.[45]
Sweden: The Kingdom of Sweden has adopted Office Open XML as a four-part Swedish National Standard SS-ISO/IEC 29500:2009.[46][47][48][49]
Switzerland: In July 2007, the Swiss Federal Council announced that adherence to the SAGA.ch e-Government standards is mandatory for its departments as well as for cantons, cities and municipalities. The latest version of SAGA.ch includes Office Open XML file formats.[50]
United Kingdom: The UK has put out an action plan for use of open standards, which includes ISO/IEC 29500 as one of several formats to be supported.[51][52]
United States of America: On April 15, 2009, the ANSI-accredited INCITS organisation voted to adopt ISO/IEC 29500:2008 as an American National Standard.[53] The state of Massachusetts has been examining its options for implementing XML-based document processing. In early 2005, Eric Kriss, Secretary of Administration and Finance in Massachusetts, was the first government official in the United States to publicly connect open formats to a public policy purpose: "It is an overriding imperative of the American democratic system that we cannot have our public documents locked up in some kind of proprietary format, perhaps unreadable in the future, or subject to a proprietary system license that restricts access".[54] Since 2007 Massachusetts has classified Office Open XML as "Open Format" and has amended its approved technical standards list, the Enterprise Technical Reference Model (ETRM), to include Office Open XML. Massachusetts, under heavy pressure from some vendors, now formally endorses Office Open XML formats for its public records.[55]
  5. http://en.wikipedia.org/wiki/List_of_XML_markup_languages; http://www.service-architecture.com/xml/articles/specific_xml_vocabularies.html; http://www.iso20022.org/the_iso20022_standard.page; http://www.pcmag.com/encyclopedia_term/0,1237,t=XML+vocabulary&i=55060,00.asp; https://www.oasis-open.org/standards#ublv2.0; NIEM: http://www.ibm.com/developerworks/xml/library/x-NIEM1/index.html; PMML: http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
  6. The Ninth International World Wide Web Conference, May 15-19, 2000, Amsterdam had an XML track: http://www9.org/; http://www9.org/w9-devxml.html
  7. Smaller. More focused. There are also conferences on vocabularies, but they are less about XML and more about the problem domain itself.
  8. C/C++ are the languages for binaries. Java heavily adopted XML and is good at text/binaries. HTML is the single preferred markup language.
  9. HTML5 + JavaScript kills Flash. It remains to be seen what will kill PDFs. Virtualisation. Cheaper hardware/software.
  10. Instead of focusing on the negatives we know about, I thought I would spend some time being more precise on the positives
  11. It’s the ML special sauce
  12. Searched around in the literature for how to measure a programming language’s productivity
  13. Amazon client libraries written in XQuery have 80% less code than their equivalent written in Java.
  14. Useful study on implementing an entire enterprise web application. Dave Thomas mentioned that how big a program gets is the single worst thing. A long paper trail of software engineering studies has shown that many internal code metrics (such as methods per class, depth of inheritance tree, coupling among classes etc.) are correlated with external attributes, the most important of which is bugs. What the authors of this paper show is that when they introduce a second variable, namely, the total size of the program, into the statistical analysis and control for it, the correlation between all these code metrics and bugs disappears. Furthermore, this relates to larger development teams, who by dint of their size generate large LOC counts; e.g. the failure rate of projects with over 300-400 developers working on them skyrockets.
  15. Probably not to do with loc itself, but with the fact that larger programs usually have more features to fail!
  16. The following study (related to previous study) discovered the avg number of lines of code to implement a single function pointDesigning and writing programs using dynamic languages tend to take half as long, resulting in half the codeMore code = more bugs, studies have shown a direct relationship to failure with high loc
  17. Settled on #LOC per function point. LOC: lines of code. Function points (FP): a method of decomposing a project's requirements in the hope of being able to estimate the effort needed to do the project.
  18. Before you start throwing stuff at me for mentioning LOC and FP: I do not subscribe to using LOC and FP for project estimation … though clearly there is a lot of historical analysis which I will leverage.
  19. Projects have been anonymised to protect the innocent (my colleagues, clients, etc.) … disclaimer: I did 4 of these XQuery projects. I tried to reduce the mixed-language effect … but because of XQuery's 'DSL-ness', for things like data apps this was no problem. FP ranged between ~250 and 1200. It took me 4 days. In practice the VAF remained close to 1.0.
      Methodology:
      • analysed 11 reasonably sized projects (4 were done by me)
      • used cloc to count lines of code
      • defined FP from the user's point of view and summed them
      • defined the VAF for each project: VAF = (TDI × 0.01) + 0.65
      • AVP = VAF × sum of FP
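The VAF arithmetic above can be sketched as follows. This assumes the usual IFPUG convention behind that formula: TDI is the sum of 14 general system characteristics, each rated 0 to 5, so TDI runs 0 to 70 and VAF 0.65 to 1.35. The ratings, per-feature FP counts and LOC figure in the usage lines are hypothetical, not from the analysed projects.

```python
def value_adjustment_factor(gsc_ratings):
    """VAF = TDI * 0.01 + 0.65, where TDI (total degree of influence) is the sum
    of 14 general system characteristic ratings, each in the range 0..5."""
    if len(gsc_ratings) != 14 or any(not 0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    tdi = sum(gsc_ratings)  # 0..70
    return tdi * 0.01 + 0.65

def adjusted_fp(function_points, vaf):
    """Adjusted count (AVP on the slide): VAF times the sum of the function points."""
    return vaf * sum(function_points)

def loc_per_fp(loc, afp):
    """Productivity measure used in the talk: lines of code per adjusted function point."""
    return loc / afp

# Hypothetical project: middling ratings give a neutral VAF of about 1.0.
ratings = [2] * 7 + [3] * 7             # TDI = 35
vaf = value_adjustment_factor(ratings)
afp = adjusted_fp([120, 200, 180], vaf)  # hypothetical FP counts per feature
print(round(vaf, 2), round(afp), round(loc_per_fp(15_000, afp), 1))
```

A VAF near 1.0 means the adjustment barely moves the raw FP total, which is why the LOC/FP ratios in the study are essentially LOC per raw function point.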
  20. Close to SQL: XQuery is a query language and a 'good enough' stored-procedure language for working with XML. On paper it seems to match up as twice as productive as Java. The VAF modifier can range between 0.65 and 1.35 … in practice, for most of the projects it was very close to 1.0 (which confirms its usage across the industry). Largest project: ~15,000 LOC.
      I was quite surprised by the results … they seem to confirm what people are feeling, that XQuery does the job with less code. I would need to analyse a lot more projects … there are probably not enough XQuery projects in existence to match the historical function point data tables for other languages.
      Threats to validity:
      • Low sample size
      • Inaccurate FP analysis
      • Selection bias
      • Mixed-language effect
      A job survey demonstrates that XQuery jobs are in demand … an indirect measure that shows there is something cooking with XQuery. An ad hoc survey shows that a significant % of XQuery developers think they are more productive when using XQuery … specifically when programming with XQuery and Java, C++ and JS, working with XML, text, RDBMS and JSON. The LOC/FP analysis confirms that XQuery is about as productive as SQL but has much wider applicability … the ad hoc survey seems to indicate that XQuery is significantly leveraged when used in conjunction with an XML datastore.
      Findings:
      • XQuery is a DSL; though expansive, it is not yet a GPL, and it's unclear if it should be
      • Needs better docs, tooling, libraries
      • Is good because of functional programming
      • Is bad because of functional programming
      • Very good with XML
      • XQuery's most suitable purpose is making semi-structured (i.e. XML) information repositories accessible, scrutable and tractable
      • XSLT is complementary, generating the view
      • XRX is productive
      • Productive when used in conjunction with Java
  21. Ran from Sept 20 to Oct 1st. 102 people responded. 15,000,000 programmers worldwide (Wikipedia). With ~100 people and 95% certainty, the confidence interval is ±9.8%.
      Respondents by country: United States 43%, United Kingdom 15%, Germany 10%, France 8%, Czech Republic 3%, Netherlands 2%, Switzerland 2%.
      50% of people put their name to the poll.
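The ±9.8% figure is the usual worst-case (p = 0.5) margin of error for a proportion at 95% confidence (z = 1.96). Below is a minimal sketch under that normal-approximation assumption; the finite population correction is included only to show it makes no practical difference against ~15,000,000 programmers. Strictly, the formula assumes a simple random sample, which a self-selected poll is not.

```python
import math

def margin_of_error(n, z=1.96, p=0.5, population=None):
    """Half-width of a normal-approximation confidence interval for a proportion.
    p = 0.5 is the worst case; z = 1.96 corresponds to 95% confidence."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; negligible when population >> n.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

print(f"{margin_of_error(100):.1%}")                          # ~100 respondents
print(f"{margin_of_error(102, population=15_000_000):.1%}")   # the actual 102
```

With 102 respondents the interval tightens only slightly, to about ±9.7%.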
  22. This survey targeted developers who used XQuery. Strong correlation between usage of XQuery, Java and XSLT.
      Multiple choice (responses, share of all selections):
      XQuery 73 (22%)
      Java 55 (17%)
      XSLT 45 (14%)
      JavaScript 32 (10%)
      C++ 22 (7%)
      Python 18 (5%)
      C++ 14 (4%)
      Perl 12 (4%)
      C# 12 (4%)
      Ruby 10 (3%)
      PHP 10 (3%)
      Haskell 9 (3%)
      Scala 9 (3%)
      Lisp 7 (2%)
      Erlang 1 (<1%)
  23. Strong correlation between XML and usage of text, RDBMS and JSON.
      Multiple choice (responses, share of all selections):
      XML 95 (36%)
      Text 40 (15%)
      RDBMS 39 (15%)
      JSON 32 (12%)
      Binaries (images, video, etc.) 27 (10%)
      Office documents 18 (7%)
      Semantic web stuff (RDF, OWL, etc.) 15 (6%)
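For the multiple-choice questions, the percentages are each option's share of all selections, so they understate how many respondents picked an option (the counts sum to more than the 102 people who answered). A small sketch computing both views for the data-format question, using the counts above and the 102 respondents from the survey summary:

```python
def shares(counts):
    """Each option's share of all selections (multiple-choice answers overlap)."""
    total = sum(counts.values())
    return {option: count / total for option, count in counts.items()}

data_formats = {
    "XML": 95, "Text": 40, "RDBMS": 39, "JSON": 32,
    "Binaries": 27, "Office documents": 18, "Semantic web": 15,
}
respondents = 102  # from the survey summary

for option, share in shares(data_formats).items():
    per_respondent = data_formats[option] / respondents
    print(f"{option:17} {share:4.0%} of selections, {per_respondent:4.0%} of respondents")
```

Read per respondent, XML is chosen by 95 of 102 people (about 93%), not 36%.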
  24. Single option: Yes 67 (67%), Maybe 14 (14%), No 10 (10%), Don't know 8 (8%).
  25. OK, we've drilled down into XQuery … we don't have time to drill down into every technology we deem productive … but clearly there is something real to this XML stack.
  26. http://www.navioo.com/ajax/ajax_json_xml_Benchmarking.php, claiming 2 to 10 times faster. Now there is little difference: http://www.navioo.com/ajax/examples/json/test.php. Optimisations in the browser, and in programming languages, have helped both.
      Evidence:
      • With IE8, CSS2 started getting its act together (nightmares of IE6 fading in the distance) … earlier, XSLT 1.0 looked promising; CSS3 even more promising
      • Safari/Chrome/Opera: data vs document orientation … clearly only some scenarios
      • XML is too slow or bloated
      • XML is not HTML … and the whole XHTML …
      • Forced XML processing with XSLT 1.0 in the browser
      • Dynamic dispatch and functional programming = big learning curve for most web developers
      • Tooling and browsers misinterpreted draconian well-formedness
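The "data vs document orientation" point can be illustrated with Python's standard library (`json` and `xml.etree.ElementTree`): a flat record maps one-to-one onto JSON, while mixed content needs an extra convention on the JSON side. The JsonML-style encoding below is one such convention, shown as an illustration with attributes deliberately ignored; the element names and sample text are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def record_to_dict(xml_text):
    """Flat, data-oriented XML record -> dict (assumes no attributes or nesting)."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}

def to_jsonml(elem):
    """Element -> JsonML-style nested list [tag, text, child, tail, ...].
    Attributes are ignored in this sketch."""
    node = [elem.tag]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        if child.tail:  # text that follows the child inside the parent
            node.append(child.tail)
    return node

# Data-oriented: maps directly onto a JSON object.
record_xml = "<person><name>Jim</name><city>Prague</city></person>"
record_json = '{"name": "Jim", "city": "Prague"}'
print(record_to_dict(record_xml) == json.loads(record_json))

# Document-oriented: mixed content (text interleaved with markup).
para = ET.fromstring("<p>XQuery is <em>very</em> good with <b>XML</b>.</p>")
print("".join(para.itertext()))
print(json.dumps(to_jsonml(para)))
```

The record round-trips trivially; the paragraph only survives in JSON because the list layout encodes where each text run sits relative to the markup.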
  27. hstore for PostgreSQL is a key-value store with ACID. Dropping ACID.
  28. hstore for PostgreSQL is a key-value store with ACID. Dropping ACID.
  29. If we told people that Goldfarb's GML was born in the 1960s … which begot SGML, and hence XML, it …
  30. Jeni Tennison evoked wonderful imagery in her XML Prague 2012 keynote.
  31. Though sometimes it's hard not to fight a war when encountering people with well-meaning sentiments.
  32. We fought many wars:
      • RDBMS
      • Web browsers (the browser people won; HTML5 is markup)
      • Interchange (JSON won)
      We are in a 'don't mention the war' period. Not necessarily isolationist … the modern XML technology stack (as we will identify later) is very active in embracing JSON. Web people think textual markup is dead whilst using it? There is a strange irony to that, but they are just emerging from the trough of disillusionment. XML folks are embracing how to integrate with JSON … web dev people don't want to know about it.
      Lost the war with RDBMS. Lost the war for the browser. Lost the war for interchange.
  33. 2002: lots of books, lots of adoption, lots of hype.
      2006: in December 2005, Yahoo! began offering some of its web services in JSON, and Google started providing JSON for GData.
      XML's perception was tainted by the financial crisis (lots of content providers going out of business). Yet XML Prague attendance doubled (and sold out between 2009 and 2011).
      BigData and the semantic web are showing that we need more.
  34. WS-* astronauts were shooting XML into orbit. Heavy on the enterprise. Investment by browsers, Sun, Microsoft, etc. The XML hype cycle was several years in the making (we are now on the slope of enlightenment = modern XML). Switching from relational to hierarchical (text, structure (mixed content), values = semi-structured data).
      Though I find it a bit unfair … BigData is mentioned on this list as if it were a 'thing', but it's an underpinning. The map/reduce hype cycle?
      I disagree with some things: HTML5 is probably just about starting down the trough of disillusionment … Ian Hickson / Anne … HTML is looking like PDF these days (HTML5 + JS + CSS3) … it's great progress, but not on things I consider important.
      http://www.itworld.com/it-managementstrategy/293397/gartner-dead-wrong-about-big-data-hype-cycle argues that the hype cycle is wrong because BigData has 'real' benefits … he is missing the point.
  35. Don't get upset if your pet technology goes in and out of fashion … expect this to happen a few times in your career. Sentimentality is like saying you should start using goto statements because you miss them … XML needs to have a meaning, a use, a valid domain to be applied to.
  36. We've talked about where XML has been and where it is today, as well as updating some of the older perma-topics. But I mainly wanted to talk to you about XML's place in the dataverse … as it relates to BigData.
  37. Is the problem that XML is dead, or that XML's time is up? Not really … because XML is everywhere … it's not going anywhere soon. It's everywhere in a way that JSON will never be … which is one of the reasons for JSON's success/uptake. The problem is not XML vs JSON; we've been over that debate, and I think everyone here can see the benefits of each data format.
  38. http://kensall.com/big-picture/bigpix22.html
  39. I said the NoSQL word; now I will say the other word, i.e. BigData … Curt Monash, a well-known database analyst, calls this polystructured data … many call it unstructured, but even text data has some structure. You have probably all heard that 85% of data goes unused. Just a 10% increase in using a company's existing data can result in giant gains.
  40. Show how this relates to specific industry sectors …
  41. The three Vs of data (volume, velocity, variety) are hard to manage. When I first saw this graphic I thought it was a pair of programmers (mostly because the guys look kind of like Larry Wall), but I think these guys are business guys, and it occurred to me that we are in a strange place now where business folk are making commercial decisions based on algorithms … algorithms are absolutely crucial to our craft, but this trivialises the solution … like saying we will use hammers to build a house; of course we will use hammers. Developers need to balance their desire to learn algorithms with the reality of getting stuff done.
  42. http://jimfuller2011.polldaddy.com/surveys/1906925/report/locations
      Caveat: we are talking about solutions with databases!
  43. If things are in the urgent/important cell, that's what you work on first; try to push everything into the important, not urgent category. Never read 'The 7 Habits of Highly Effective People'.
  44. http://jimfuller2011.polldaddy.com/surveys/1906925/report
  45. Items in bold are almost certainly skewed by a large historical dependency and/or by browsers now implementing them. Items in italic/underlined are either at the Recommendation stage or were just 'on the line' in terms of ranking data.
  46. Items in bold are almost certainly skewed by a large historical dependency and/or by browsers now implementing them. Items in italic/underlined are either at the Recommendation stage or were just 'on the line' in terms of ranking data.
  47. This is a much better subset. Items in bold are skewed by a large historical dependency and/or by browser support.
  48. http://kensall.com/big-picture/bigpix22.html
  49. This is a much better subset. Items in bold are skewed by a large historical dependency and/or by browser support.
  50. Data maturity:
      • Stage one, ‘no usable data’.
      • Stage two, ‘too much data’, isn't much better, though. When you are swamped with data it will take up too much of your time to sort through it, and the chances are that many, if not most, of your insights will be unrelated to your core business strategy. Before you know it you're running around in woods dense with trees and inhabited by wild geese.
      • Stage three, ‘the right data’, is better, as you may well assume. With the ‘right’ data you can get the insights that support your primary business focus, ensuring that you have as much information as possible to facilitate success in your chosen field.
      • Stage four, ‘predictive’, is one that many consider to be the optimum stage. This is where you make the transition from reactive to proactive. When you reach the predictive stage you can start to understand how certain influences in the future will affect your business and plan accordingly. A slightly banal yet illustrative example is calculating the expected peaks in website visitors following an advertising campaign, so that enough bandwidth can be provisioned to cope. Something more complex might involve a simulation of market patterns and supply chain effects should a large-scale natural disaster occur.
      • The final stage, ‘strategic’, is the most data intensive.
  51. So far most of the scenarios I showed are BigData … or at a minimum represent maximums for their industry sector. Common mistakes:
      • No feasibility study: an initial sanity check of whether what you want to do is possible
      • No organized selection process: self-selection means no support/buy-in at the various levels needed … FOSS selects itself!
      • No proof of concept
      • Premature project initiation, before the data is ready
      • Overfitting: in statistics and machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model which has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.
      It seems like common sense, but our most successful clients avoided most of these mistakes, which reduced risk immeasurably. Shoot for the stars, but you don't really want to build a rocket ship. FOSS can be an important on-ramp to BigData, but eventually you will want to be able to create commercial partnerships. A PoC is a scaled-down version; the goal is to identify gaps in your skill set … you should help build the PoC. Starting a project early is common in the enterprise; it's a mistake. Vendors want to sell their software … resist the urge to let them do the work, and drill down into the detail with your own problem-domain experts.
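A minimal sketch of overfitting, assuming hypothetical data generated from y = 2x plus small alternating noise of ±0.3. A degree-5 polynomial through all six points (one parameter per observation, evaluated here by Lagrange interpolation) has zero training error, while an ordinary least-squares line does not; on an unseen point the picture flips.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 passing through every point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

# Hypothetical data: underlying relationship y = 2x, plus alternating noise.
xs = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
ys = [2 * x + e for x, e in zip(xs, noise)]

line = linear_fit(xs, ys)
train_err_poly = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))
train_err_line = max(abs(line(x) - y) for x, y in zip(xs, ys))

x_new, y_true = 7, 14  # unseen point on the underlying relationship
print(f"training error: degree-5 {train_err_poly:.2f}, linear {train_err_line:.2f}")
print(f"error at x=7:   degree-5 {abs(lagrange_predict(xs, ys, x_new) - y_true):.1f}, "
      f"linear {abs(line(x_new) - y_true):.2f}")
```

With this data the interpolating polynomial misses x = 7 by roughly 96, while the straight line is off by about 0.2: the complex model has memorised the noise.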
  52. Marc Andreessen, 'software is eating the world': http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html
      RedMonk reports that programming languages have never been as diverse as today.
      RedMonk Tier 1 Languages (02/12): 1 C, 2 C#, 3 C++, 4 Java, 5 JavaScript, 6 Objective-C, 7 Perl, 8 PHP, 9 Python, 10 Ruby, 11 Shell script
      Source:
      RedMonk Tier 2 Languages (02/12): 1 ASP, 2 ActionScript, 3 Assembly, 4 Clojure, 5 CoffeeScript, 6 ColdFusion, 7 Common Lisp, 8 D, 9 Delphi, 10 Emacs Lisp
      We've been living in a fairly stable hardware bubble for 30 years, i.e. the same techniques, yet smaller and faster.
      The power wall: About five years ago, however, the top speed for most microprocessors peaked when their clocks hit about 3 gigahertz. The problem is not that the individual transistors themselves can't be pushed to run faster; they can. But doing so for the many millions of them found on a typical microprocessor would require that chip to dissipate impractical amounts of heat. Computer engineers call this the power wall. Given that obstacle, it's clear that all kinds of computers, including supercomputers, are not going to advance at nearly the rates they have in the past.
      Advances:
      • Tissue engineering
      • Terascale neuromorphic chips (memristor synapses, nanostore memory (logic and memory together))
      • Many billions and probably trillions of electronic tattoos (less than a penny each in most cases) with processing, sensors, memory, wireless
      • 2000-qubit adiabatic quantum computers
      • The Human Brain Project (if funded, it would be done; if not, there are other DARPA and Asian projects of comparable scale)
      • Memristors at exascale (supercomputer class), petascale for very affordable systems
      • Sensors even more capable
      • Electronic tattoos even cheaper and more capable
      • Deep robotics commercialization and adoption
      • Beamed power and persistent UAVs
      • Megascale or gigascale adiabatic quantum computers
      Hardware:
      • Optical computing: trapping, storing and manipulating light is difficult.
      • Quantum computing
      • Neuronal computing
      • DNA computing
      • Reversible computing: normally every computational operation that involves losing a bit of information also discards the energy used to represent it. Reversible computing aims to recover and reuse this energy.
      • Billiard ball computing: involves chain reactions of electrons passing from molecule to molecule inside a circuit.
      • Magnetic (NMR) computing: every glass of water contains a computer, if you just know how to operate it.
      • Glooper computer: one of the weirdest computers ever built forsakes traditional hardware in favour of "gloopware". Andrew Adamatzky at the University of the West of England, UK, can make interfering waves of propagating ions in a chemical goo behave like logic gates, the building blocks of computers.
      • Mouldy computers
      • Water wave computing: perhaps the most unlikely place to see computing power is in the ripples in a tank of water. Using a ripple tank and an overhead camera, Chrisantha Fernando and Sampsa Sojakka at the University of Sussex used wave patterns to make a type of logic gate called an "exclusive OR gate", or XOR gate.
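The power wall follows from the standard first-order model for dynamic (switching) power in CMOS logic, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. It is a simplification that ignores leakage power:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad
V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}
```

Because higher clock rates have historically needed higher supply voltages, scaling both f and V by a factor k scales dynamic power by roughly k³. Pushing a 3 GHz part to 4.5 GHz (k = 1.5) would need about 1.5³ ≈ 3.4 times the power, all of which has to leave the chip as heat.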
  53. Remember 'write once, run everywhere'? Programming for the browser. Well, things change: Java was originally designed for interactive television. "Write Once, Run Anywhere" (WORA). Java applets. Be skeptical of purity. XML is for data.
  54. When extensibility is not required, XML will always lose against DSLs. Diversity makes strong ecosystems. As I've shown you, modern XML is being applied to BigData problems today. It provides:
      • A stable, fast and mature toolset of technologies to work with textual markup, text and, in many cases, lots of different kinds of data
      • A foundation for the semantic web
      • Applicability to a wide range of BigData scenarios