Apache Content Technologies


  1. 1. If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects
  2. 2. Nick Burch Software Engineer Alfresco
  3. 3. Apache Projects • 79 Top Level Projects • 40 Incubating Projects • 30 “Content Related” Main Projects • 7 “Content Related” Incubating Projects
  4. 4. 37 Projects in 50 minutes With time for questions... This is not a comprehensive guide!
  5. 5. Different Technologies • Serving • Storing • Transforming • Generating • Hosting • Web Framework Rendering / Templating / etc
  6. 6. What can we get in 50 mins? • A quick overview of each project • When talks on the project are happening • When meetups on the project are happening • Anything new/exciting about the project? • What interests me in the project!
  7. 7. Serving up your Content
  8. 8. Apache HTTPD Server http://httpd.apache.org/ • Talks – All day Wednesday • Meetup – Thursday evening • Very wide range of features • (Fairly) easy to extend • Can host most programming languages • Can front most content systems • Can proxy your content applications • Can host code and content
  9. 9. Apache TrafficServer http://trafficserver.apache.org/ • High performance web proxy • Forward and reverse proxy • Ideally suited to sitting between your content application and the internet • For proxy-only use cases, will probably be better than httpd • Fewer other features though • Often used as a cloud-edge http router
  10. 10. Apache Tomcat http://tomcat.apache.org/ • Talks – All day Friday! • Java based, as many of the Apache Content Technologies are • Java Servlet Container • And you probably all know the rest!
  11. 11. Tomcat – What's New http://tomcat.apache.org/ • Memory leak detection – for your applications, and for the JVM! • Easier to embed – no need for large numbers of config files! • Asynchronous request processing for things like Comet / Bayeux • Servlet 3.0 • Improved JMX configurability
  12. 12. Storing all that Content
  13. 13. Apache Cassandra http://cassandra.apache.org/ • Talk - 11am Wednesday • Meetup - Wednesday evening • One of our many NoSQL Databases • Column-Family store • Eventually consistent • Distributed, replicating, no single point of failure • Can elastically add machines
  14. 14. Apache CouchDB http://couchdb.apache.org/ • 12pm Wednesday • Relax! • Erlang • NoSQL • Document orientated distributed store • Eventually consistent if replicating • Map-Reduce queries
  15. 15. Apache HBase http://hbase.apache.org/ • 2pm Wednesday • Recently graduated from Hadoop • Another NoSQL Database • Column-Family store, modelled on Google's Bigtable paper • Some transactions and locking • Fast range queries and sorting • Built on HDFS
  16. 16. Which Apache NoSQL? • Do you have tuples, documents, variable key/values or complex objects? • Must data always be consistent? • If you lose a chunk of machines (partition), should read/write still work? • Query by id, range, arbitrary key/value or map-reduce function? • How much human interaction is required to add or remove nodes?
  17. 17. Apache DB: Derby http://db.apache.org/derby/ • Small, easy to embed SQL database • Can be embedded and accessed via an embedded JDBC driver • Can be accessed over the network • Can be run entirely in-memory • Efficient on-disk format • Has a JavaME version – run it on basic cell phones!
  18. 18. Apache Directory http://directory.apache.org/ • LDAP Directory • Optimised for many reads per write • Hierarchical, class/attribute based storage • Triggers, stored procedures, queries and views • Multi-master replication • Rich permissions model built in
  19. 19. Apache Jackrabbit http://jackrabbit.apache.org/ • 1.30pm Thursday • JCR (Java Content Repository) • Hierarchical content store • Supports structured and unstructured data • Transactional • Supports versions • Full text search built in
  20. 20. Apache Lucene http://lucene.apache.org/ • All day Friday + Meetup Tuesday night • Inverted index store • (Each term lists its documents, rather than each document listing terms) • Searching is faster than adding • Normally stores text, but additional data can be associated with it • Can hold indexed and un-indexed data
  21. 21. Lucene – What's New? http://lucene.apache.org/ • Lucene and Solr have merged • Near real-time support when indexing • Better storing of attributes and other data in the token stream • Numeric fields improved – no need to externally process numbers into range buckets yourself • Fast vector highlighter for large docs
  22. 22. Apache Subversion http://subversion.apache.org/ • Meetup Thursday evening • Versioning content store • Efficient at storing changes • Normally stores code, text and the odd binary blob • If you have textual data and you want a versioning store, it's a good fit! • Used by the new Apache CMS
  23. 23. Apache Xindice http://xml.apache.org/xindice/ • Native XML Database • No need to map your complex XML files to a different data structure • Ideally suited to problems where you have large numbers of XML files, and little / no other content • Schema independent model • XPath queries
  24. 24. Transforming and Reading Content
  25. 25. Apache PDFBox http://pdfbox.apache.org/ • 4pm Wednesday • Read, Write, Create and Edit PDFs • Create PDFs from text • Fill in PDF forms • Extract text and formatting (Lucene, Tika etc) • Edit existing files, add images, add text etc
  26. 26. Apache POI http://poi.apache.org/ • 3pm Wednesday + Fast Feather Track • File format reader and writer for Microsoft Office file formats • Supports binary & OOXML formats • Strong read edit write for .xls & .xlsx • Read and basic edit for .doc & .docx • Read and basic edit for .ppt & .pptx • Read for Visio, Publisher, Outlook
  27. 27. Apache Tika http://tika.apache.org/ • 9am Friday + Fast Feather Track • Java (+ command line) toolkit for detecting and extracting content • Identifies what a blob of content is • Gives you consistent metadata back for it • Parses the contents into plain text, HTML, XHTML or SAX events
  28. 28. Tika – What's New? http://tika.apache.org/ • Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc • Long standing parsers improved – better HTML from Word for example • Embedded resources and containers • Usage is expanding – used by many Solr users, Alfresco, lots of people crunching masses of data on Hadoop
  29. 29. Apache Cocoon http://cocoon.apache.org/ • Component Pipeline framework • Plug together “Lego-Like” generators, transformers and serialisers • Generate your content once in your application, serve to different formats • Read in formats, translate and publish • Can power your own “Yahoo Pipes” • Modular, powerful and easy
  30. 30. Apache Xalan http://xalan.apache.org/ • XSLT processor • XPath engine • Java and C++ flavours • Cross platform • Library and command line executables • Transform your XML • Fast and reliable XSLT transformation engine
  31. 31. Apache XML Graphics: Batik http://xmlgraphics.apache.org/#batik • Java SVG toolkit + library • SVG Parser – read and process existing SVG files • SVG Generator – Graphics2D implementation that outputs SVG • SVG DOM – easy way to manipulate your SVG files • SVG viewer program (Squiggle) • Command line SVG rasteriser
  32. 32. Apache XML Graphics: FOP http://xmlgraphics.apache.org/#fop • XSL-FO processor in Java • Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it • Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc • Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
  33. 33. Apache Commons: Codec http://commons.apache.org/codec/ • Commons Track – Thursday Morning • Encode and decode a variety of encoding formats • Base64, Hex, Phonetic and URLs • Handy when interchanging content with external systems
  34. 34. Apache Commons: Compress http://commons.apache.org/compress/ • Commons Track – Thursday Morning • Standard way to deal with archive formats • Read and write support • zip, tar, gzip, bzip, cpio and ar • Wider range of capabilities than java.util.zip • Common API across all formats
  35. 35. Apache Commons: Sanselan http://commons.apache.org/sanselan/ • Commons Track – Thursday Morning • Pure Java image reader and writer • Fast parsing of image metadata and information (size, color space, icc etc) • Much easier to use than ImageIO • Slower though, as pure Java • Wider range of formats supported • PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
  36. 36. Generating Content
  37. 37. Apache Forrest http://forrest.apache.org/ • Document rendering solution built on top of Cocoon • Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats • Heavily used for documentation and websites • eg read in a file, format as changelog and readme, output as html + pdf
  38. 38. Apache Abdera http://abdera.apache.org/ • Atom – syndication and publishing • High performance Java implementation of RFC 4287 + 5023 • Generate Atom feeds from Java or by converting • Parse and process Atom feeds • Atompub server and clients • Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
  39. 39. Apache Droids (Incubating) http://incubator.apache.org/droids/ • Intelligent Robots! • Generic standalone crawler framework • Easy to extend existing common crawlers • Easy to write custom ones • Queue requests for content, protocol handler gets it, multi threaded • Uses Apache Tika for core of handling fetched resources
  40. 40. Apache JSPWiki (Incubating) http://incubator.apache.org/jspwiki/ • Feature-rich extensible wiki • Written in Java (Servlets + JSP) • Fairly easy to extend • Can be used as a wiki out of the box • Provides a good platform for new wiki-based applications • Rich wiki markup and syntax • Attachments, security, templates etc
  41. 41. Apache ManifoldCF (Incubating) http://incubator.apache.org/connectors/ • Name has changed a few times... (Lucene/Apache Connectors) • Provides a standard way to get content out of other systems, ready for sending to Lucene etc • Different goals to CMIS (Chemistry) • Uses many parsers and libraries to talk to the different repositories / systems • Analogous to Tika but for repos
  42. 42. Apache PhotArk (Incubating) http://incubator.apache.org/photark/ • 5pm Thursday • Open Source Photo Gallery application • Standalone or servlet modes • Can host photos locally • Can aggregate external photo albums (Flickr, Picasa) for a unified view • SCA programming model – uses Apache Tuscany to power it
  43. 43. Hosting Content
  44. 44. Apache Chemistry (Incubating) http://incubator.apache.org/chemistry/ • 2pm Wednesday • Java, Python and PHP, Atom and WS* • OASIS CMIS (Content Management Interoperability Services) • Client and Server bindings • “SQL for Content” • Consistent view on content across different repositories • Read / Write / Manipulate content
  45. 45. Chemistry vs ManifoldCF incubator /chemistry/ /connectors/ • ManifoldCF treats repo as nasty black box, and handles talking to the parsers • Chemistry talks / exposes repo's contents through CMIS • ManifoldCF supports a wider range of repositories • Chemistry supports read and write • Chemistry delivers a richer model • ManifoldCF great for getting text out
  46. 46. Apache Lenya http://lenya.apache.org/ • 9am Thursday • XML Content Management system • Powered by Apache Cocoon • WYSIWYG editors onto Relax-NG XML • Rich workflow engine + staging • Clean URLs, CSS for styling • Sensible handling of metadata, assets, internal links, users, permissions etc
  47. 47. Apache Roller http://roller.apache.org/ • Multi-user blog server • Used by the ASF internally • Scales to thousands of users & blogs • Should work with any JavaEE servlet container and SQL database • Comment moderation and spam filters • Each author has full layout control • Indexes, feeds and Metaweblog API support for 3rd party clients
  48. 48. Apache Shindig http://shindig.apache.org/ • OpenSocial Application Container • Hosts your OpenSocial widgets • Renders OpenSocial applications into HTML + JavaScript • Stores the data for your application • Full client-side JavaScript libraries to deliver gadget functionality • Reference implementation
  49. 49. Apache Wookie (Incubating) http://incubator.apache.org/wookie/ • 5.30pm Wednesday • W3C Widgets server • Upload, Deploy and Host Widgets • Widgets can range from a badge, through a small app to a full-blown collaborative system like chat • Connector framework to make it easy to write widgets in many languages
  50. 50. Web Frameworks (those with a strong Content focus to them)
  51. 51. Apache Sling http://sling.apache.org/ • 12pm Wednesday • “Fun” and easy web framework • REST based • Backed by Jackrabbit content repo • Powered by OSGi • Easy to script, supports multiple output languages (JSP, server side javascript, scala etc) • Stores both templates and content
  52. 52. Apache Tapestry http://tapestry.apache.org/ • Object Orientated web applications • Build your application in terms of objects, methods and properties • Tapestry handles URLs, query parameters and state for you • Pages built with simple HTML • Concentrate on the content that backs each part, and the business logic for it • Tapestry glues it together for you
  53. 53. Apache Tiles http://tiles.apache.org/ • Templating framework for Java • Works well with Struts and Shale • Lets you build your page from lots of tiles (components), which can nest • Build tiles together to make templates • Clean separation between your content, the business logic to select it, and the rendering rules
  54. 54. Apache Velocity http://velocity.apache.org/ • Templating engine • MVC webapp or standalone • Can generate HTML, SQL, PostScript, XML, Java Code or email from templates • Anakia lets you make an xdoc file available to a Velocity template, handy when generating HTML from xdoc • Fairly rich templating language
  55. 55. Apache Wicket http://wicket.apache.org/ • Build your web applications in Java • Uses Java in preference to JavaScript, CSS etc • Handy if you have a strong Java team and you need to do some web stuff • Fits well with your Java components • But JS / CSS front end devs tend to be cheaper than Java ones....
  56. 56. Apache Clerezza (Incubating) http://incubator.apache.org/clerezza/ • OSGi based modular semantic web application framework • Lets you build applications that fit into the Semantic Web • Stores and easily manipulates RDF • Full control over REST and URIs • Build applications that both consume semantic data (eg RDF files), and that expose content to others
  57. 57. Any Questions? Any cool projects that I happened to miss?
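
The Lucene slide describes an inverted index as a store where each term lists its documents, rather than each document listing its terms. A minimal sketch of that idea in plain Python follows. It does not use Lucene or its API; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look up each term's posting set, then intersect them.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Apache Tika extracts text and metadata",
    2: "Apache Lucene indexes text for search",
    3: "Apache POI reads office file formats",
}
index = build_inverted_index(docs)
print(search(index, "apache text"))  # prints [1, 2]
```

In this sketch, adding a document touches one entry per term it contains, while a lookup is a dictionary access plus a set intersection, which loosely matches the slide's remark that searching is faster than adding. A real engine such as Lucene adds analysis, scoring and on-disk segment storage on top of this basic structure.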