  If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects
  Nick Burch Software Engineer Alfresco
  Apache Projects • 79 Top Level Projects • 40 Incubating Projects • 30 "Content Related" Main Projects • 7 "Content Related" Incubating Projects
  37 Projects in 50 minutes With time for questions... This is not a comprehensive guide!
  Different Technologies • Serving • Storing • Transforming • Generating • Hosting • Web Framework Rendering / Templating / etc
  What can we get in 50 mins? • A quick overview of each project • When talks on the project are happening • When meetups on the project are happening • Anything new/exciting about the project? • What interests me in the project!
  Serving up your Content
  Apache HTTPD Server http://httpd.apache.org/ • Talks – All day Wednesday • Meetup – Thursday evening • Very wide range of features • (Fairly) easy to extend • Can host most programming languages • Can front most content systems • Can proxy your content applications • Can host code and content
  Apache TrafficServer http://trafficserver.apache.org/ • High performance web proxy • Forward and reverse proxy • Ideally suited to sitting between your content application and the internet • For proxy-only use cases, will probably be better than httpd • Fewer other features though • Often used as a cloud-edge http router
  Apache Tomcat http://tomcat.apache.org/ • Talks – All day Friday! • Java based, as many of the Apache Content Technologies are • Java Servlet Container • And you probably all know the rest!
  Tomcat – What's New http://tomcat.apache.org/ • Memory leak detection – for your applications, and for the JVM! • Easier to embed – no need for large numbers of config files! • Asynchronous request processing for things like Comet / Bayeux • Servlet 3.0 • Improved JMX configurability
  Storing all that Content
  Apache Cassandra http://cassandra.apache.org/ • Talk - 11am Wednesday • Meetup - Wednesday evening • One of our many NoSQL Databases • Column-Family store • Eventually consistent • Distributed, replicating, no SPOF (single point of failure) • Can elastically add machines
  Apache CouchDB http://couchdb.apache.org/ • 12pm Wednesday • Relax! • Erlang • NoSQL • Document orientated distributed store • Eventually consistent if replicating • Map-Reduce queries
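The "Map-Reduce queries" bullet can be illustrated with a small sketch. This is a conceptual Python illustration only, not CouchDB's API; the sample documents and the `map_doc`, `reduce_values` and `run_view` helpers are hypothetical names made up for this example.

```python
from collections import defaultdict

# Hypothetical documents, standing in for JSON docs in a document store.
DOCS = [
    {"_id": "a", "type": "post", "author": "nick", "words": 120},
    {"_id": "b", "type": "post", "author": "nick", "words": 80},
    {"_id": "c", "type": "post", "author": "alice", "words": 200},
    {"_id": "d", "type": "comment", "author": "alice", "words": 15},
]

def map_doc(doc):
    """Map step: emit (key, value) pairs for the documents we care about."""
    if doc["type"] == "post":
        yield doc["author"], doc["words"]

def reduce_values(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)

def run_view(docs):
    """Group the mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_values(vals) for key, vals in sorted(groups.items())}

print(run_view(DOCS))
```

The query is expressed as a pair of functions rather than a declarative statement, so the store can run the map step over each document independently and combine results afterwards.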
  Apache HBase http://hbase.apache.org/ • 2pm Wednesday • Recently graduated from Hadoop • Another NoSQL Database • Column-Family store, modelled on Google's Bigtable paper • Some transactions and locking • Fast range queries and sorting • Built on HDFS
  Which Apache NoSQL? • Do you have tuples, documents, variable key/values or complex objects? • Must data always be consistent? • If you lose a chunk of machines (partition), should read/write still work? • Query by id, range, arbitrary key/value or map-reduce function? • How much human interaction is required to add or remove nodes?
  Apache DB: Derby http://db.apache.org/derby/ • Small, easy to embed SQL database • Can be embedded and accessed via an embedded JDBC driver • Can be accessed over the network • Can be run entirely in-memory • Efficient on-disk format • Has a JavaME version – run it on basic cell phones!
  Apache Directory http://directory.apache.org/ • LDAP Directory • Optimised for many reads per write • Hierarchical, class/attribute based storage • Triggers, stored procedures, queries and views • Multi-master replication • Rich permissions model built in
  Apache Jackrabbit http://jackrabbit.apache.org/ • 1.30pm Thursday • JCR (Java Content Repository) • Hierarchical content store • Supports structured and unstructured data • Transactional • Supports versioning • Full text search built in
  Apache Lucene http://lucene.apache.org/ • All day Friday + Meetup Tuesday night • Inverted index store • (Each term lists its documents, rather than each document listing its terms) • Searching is faster than adding • Normally stores text, but additional data can be associated with it • Can hold indexed and un-indexed data
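The inverted index idea is easy to show in miniature. The following is a toy Python sketch of the data structure only; it does not use or mirror Lucene's API, and `build_index`, `search` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
DOCS = {
    1: "Apache Lucene is a search library",
    2: "Apache Tika extracts text for search",
    3: "Tomcat is a servlet container",
}

INDEX = build_index(DOCS)
print(search(INDEX, "apache search"))
```

Adding a document touches one entry per term it contains, while a search only intersects a few ready-made sets, which is one way to see why searching is cheaper than adding.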
  Lucene – What's New? http://lucene.apache.org/ • Lucene and Solr have merged • Near real-time support when indexing • Better storing of attributes and other data in the token stream • Numeric fields improved – no need to externally process numbers into range buckets yourself • Fast vector highlighter for large docs
  Apache Subversion http://subversion.apache.org/ • Meetup Thursday evening • Versioning content store • Efficient at storing changes • Normally stores code, text and the odd binary blob • If you have textual data and you want a versioning store, it's a good fit! • Used by the new Apache CMS
  Apache Xindice http://xml.apache.org/xindice/ • Native XML Database • No need to map your complex XML files to a different data structure • Ideally suited to problems where you have large numbers of XML files, and little / no other content • Schema independent model • XPath queries
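As a rough illustration of querying XML in place with path expressions, here is a Python sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. It is not Xindice's API; the catalogue document and `titles_in_language` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; no schema or table mapping is defined anywhere.
CATALOGUE = """<catalogue>
  <book lang="en"><title>Content Repositories</title><year>2009</year></book>
  <book lang="de"><title>Suchmaschinen</title><year>2010</year></book>
  <book lang="en"><title>Search Engines</title><year>2010</year></book>
</catalogue>"""

def titles_in_language(xml_text, lang):
    """Select book titles by attribute, using a path expression on the raw XML."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f"./book[@lang='{lang}']/title")]

print(titles_in_language(CATALOGUE, "en"))
```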
  Transforming and Reading Content
  Apache PDFBox http://pdfbox.apache.org/ • 4pm Wednesday • Read, Write, Create and Edit PDFs • Create PDFs from text • Fill in PDF forms • Extract text and formatting (Lucene, Tika etc) • Edit existing files, add images, add text etc
  Apache POI http://poi.apache.org/ • 3pm Wednesday + Fast Feather Track • File format reader and writer for Microsoft Office file formats • Supports binary & OOXML formats • Strong read, edit and write for .xls & .xlsx • Read and basic edit for .doc & .docx • Read and basic edit for .ppt & .pptx • Read for Visio, Publisher, Outlook
  Apache Tika http://tika.apache.org/ • 9am Friday + Fast Feather Track • Java (+ command line) toolkit for detecting and extracting content • Identifies what a blob of content is • Gives you consistent metadata back for it • Parses the contents into plain text, HTML, XHTML or SAX events
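One common way to identify "what a blob of content is" is to look at its leading "magic" bytes. The Python sketch below shows only that general idea; it is not Tika's API or its actual detection logic, and the `MAGIC` table and `detect` function are hypothetical and deliberately tiny.

```python
# A tiny, hand-picked table of magic byte prefixes (an assumption for this
# sketch; real detectors know far more formats and use more than prefixes).
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]

def detect(blob: bytes) -> str:
    """Guess a media type from the first bytes of the content."""
    for prefix, mime in MAGIC:
        if blob.startswith(prefix):
            return mime
    try:
        # Fall back to plain text if the bytes decode cleanly as UTF-8.
        blob.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"

print(detect(b"%PDF-1.4 example"))
```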
  Tika – What's New? http://tika.apache.org/ • Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc • Long standing parsers improved – better HTML from Word for example • Embedded resources and containers • Usage expanding – used by many Solr users, Alfresco, lots of people crunching masses of data on Hadoop
  Apache Cocoon http://cocoon.apache.org/ • Component Pipeline framework • Plug together "Lego-Like" generators, transformers and serialisers • Generate your content once in your application, serve to different formats • Read in formats, translate and publish • Can power your own "Yahoo Pipes" • Modular, powerful and easy
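The generator / transformer / serialiser pipeline can be sketched in plain Python. This shows the pattern only, not Cocoon's own components or configuration; every function name and the sample data are hypothetical.

```python
import json

def generate():
    """Generator: produce the content once, as plain data."""
    yield {"title": "content pipelines", "kind": "talk"}
    yield {"title": "lego bricks", "kind": "toy"}

def title_case(items):
    """Transformer: tidy up each item as it flows through."""
    for item in items:
        yield {**item, "title": item["title"].title()}

def to_html(items):
    """Serialiser: render the stream as an HTML list."""
    rows = "".join(f"<li>{i['title']} ({i['kind']})</li>" for i in items)
    return f"<ul>{rows}</ul>"

def to_json(items):
    """Serialiser: render the same stream as JSON."""
    return json.dumps(list(items))

def run_pipeline(generator, transformers, serialiser):
    """Plug the pieces together: one generator, any transformers, one serialiser."""
    stream = generator()
    for transform in transformers:
        stream = transform(stream)
    return serialiser(stream)

print(run_pipeline(generate, [title_case], to_html))
print(run_pipeline(generate, [title_case], to_json))
```

Swapping only the last stage gives a different output format from the same content, which is the "generate once, serve to different formats" point from the slide.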
  Apache Xalan http://xalan.apache.org/ • XSLT processor • XPath engine • Java and C++ flavours • Cross platform • Library and command line executables • Transform your XML • Fast and reliable XSLT transformation engine
  Apache XML Graphics: Batik http://xmlgraphics.apache.org/#batik • Java SVG toolkit + library • SVG Parser – read and process existing SVG files • SVG Generator – Graphics2D implementation that outputs SVG • SVG Dom – easy way to manipulate your SVG files • SVG viewer program (Squiggle) • Command line SVG rasteriser
  Apache XML Graphics: FOP http://xmlgraphics.apache.org/#fop • XSL-FO processor in Java • Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it • Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc • Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
  Apache Commons: Codec http://commons.apache.org/codec/ • Commons Track – Thursday Morning • Encode and decode a variety of encoding formats • Base64, Hex, Phonetic and URLs • Handy when interchanging content with external systems
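For a feel of the Base64, hex and URL encodings listed above, here is a Python sketch using the Python standard library as a stand-in. It is not the Commons Codec API, phonetic encoders are left out, and `encode_all` is a hypothetical helper.

```python
import base64
import binascii
from urllib.parse import quote, unquote_to_bytes

def encode_all(data: bytes) -> dict:
    """Encode the same bytes three ways for interchange with other systems."""
    return {
        "base64": base64.b64encode(data).decode("ascii"),
        "hex": binascii.hexlify(data).decode("ascii"),
        "url": quote(data),
    }

encoded = encode_all(b"Hi there!")
print(encoded)

# Each encoding is reversible, so the original bytes come back unchanged.
assert base64.b64decode(encoded["base64"]) == b"Hi there!"
assert binascii.unhexlify(encoded["hex"]) == b"Hi there!"
assert unquote_to_bytes(encoded["url"]) == b"Hi there!"
```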
  Apache Commons: Compress http://commons.apache.org/compress/ • Commons Track – Thursday Morning • Standard way to deal with archive formats • Read and write support • zip, tar, gzip, bzip2, cpio and ar • Wider range of capabilities than java.util.zip • Common API across all formats
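The value of a common API across archive formats is that calling code need not care which format it was handed. The Python sketch below imitates that idea with the standard library's `zipfile` and `tarfile`; it is not the Commons Compress API, and `list_entries`, `make_zip` and `make_tar_gz` are hypothetical helpers.

```python
import io
import tarfile
import zipfile

def list_entries(blob: bytes) -> list:
    """Return sorted entry names from a zip or (possibly compressed) tar archive."""
    buf = io.BytesIO(blob)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            return sorted(zf.namelist())
    buf.seek(0)
    # "r:*" lets tarfile work out the compression (none, gzip, bzip2, ...) itself.
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        return sorted(tf.getnames())

def make_zip(files: dict) -> bytes:
    """Build a zip archive in memory from a name -> bytes mapping."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return out.getvalue()

def make_tar_gz(files: dict) -> bytes:
    """Build a gzip-compressed tar archive in memory from a name -> bytes mapping."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return out.getvalue()

FILES = {"a.txt": b"one", "b.txt": b"two"}
print(list_entries(make_zip(FILES)), list_entries(make_tar_gz(FILES)))
```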
  Apache Commons: Sanselan http://commons.apache.org/sanselan/ • Commons Track – Thursday Morning • Pure Java image reader and writer • Fast parsing of image metadata and information (size, color space, icc etc) • Much easier to use than ImageIO • Slower though, as pure Java • Wider range of formats supported • PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
  Generating Content
  Apache Forrest http://forrest.apache.org/ • Document rendering solution built on top of Cocoon • Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats • Heavily used for documentation and websites • eg read in a file, format as changelog and readme, output as html + pdf
  Apache Abdera http://abdera.apache.org/ • Atom – syndication and publishing • High performance Java implementation of RFC 4287 + 5023 • Generate Atom feeds from Java or by converting • Parse and process Atom feeds • AtomPub server and clients • Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
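To show roughly what generating and parsing an Atom feed involves, here is a Python sketch using the standard library's `xml.etree.ElementTree`. It is not Abdera's API, and the feed it builds is deliberately incomplete: RFC 4287 also requires elements such as a feed-level `id` and `updated`. `build_feed` and `entry_titles` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_feed(title, entries):
    """Generate a minimal (not fully RFC 4287 compliant) Atom feed as a string."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for entry_id, entry_title in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = entry_title
    return ET.tostring(feed, encoding="unicode")

def entry_titles(feed_xml):
    """Parse a feed and pull out the title of each entry."""
    root = ET.fromstring(feed_xml)
    return [e.findtext(f"{{{ATOM_NS}}}title")
            for e in root.findall(f"{{{ATOM_NS}}}entry")]

feed_xml = build_feed("Demo feed", [("urn:example:1", "First"),
                                    ("urn:example:2", "Second")])
print(entry_titles(feed_xml))
```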
  Apache Droids (Incubating) http://incubator.apache.org/droids/ • Intelligent Robots! • Generic standalone crawler framework • Easy to extend existing common crawlers • Easy to write custom ones • Queue requests for content, protocol handler gets it, multi threaded • Uses Apache Tika for core of handling fetched resources
  Apache JSPWiki (Incubating) http://incubator.apache.org/jspwiki/ • Feature-rich extensible wiki • Written in Java (Servlets + JSP) • Fairly easy to extend • Can be used as a wiki out of the box • Provides a good platform for new wiki based applications • Rich wiki markup and syntax • Attachments, security, templates etc
  Apache ManifoldCF (Incubating) http://incubator.apache.org/connectors/ • Name has changed a few times... (Lucene/Apache Connectors) • Provides a standard way to get content out of other systems, ready for sending to Lucene etc • Different goals to CMIS (Chemistry) • Uses many parsers and libraries to talk to the different repositories / systems • Analogous to Tika but for repos
  Apache PhotArk (Incubating) http://incubator.apache.org/photark/ • 5pm Thursday • Open Source Photo Gallery application • Standalone or servlet modes • Can host photos locally • Can aggregate external photo albums (Flickr, Picasa) for a unified view • SCA programming model – uses Apache Tuscany to power it
  Hosting Content
  Apache Chemistry (Incubating) http://incubator.apache.org/chemistry/ • 2pm Wednesday • Java, Python and PHP, Atom and WS-* • OASIS CMIS (Content Management Interoperability Services) • Client and Server bindings • "SQL for Content" • Consistent view on content across different repositories • Read / Write / Manipulate content
  Chemistry vs ManifoldCF incubator /chemistry/ /connectors/ • ManifoldCF treats repo as nasty black box, and handles talking to the parsers • Chemistry talks / exposes repo's contents through CMIS • ManifoldCF supports a wider range of repositories • Chemistry supports read and write • Chemistry delivers a richer model • ManifoldCF great for getting text out
  Apache Lenya http://lenya.apache.org/ • 9am Thursday • XML Content Management system • Powered by Apache Cocoon • WYSIWYG editors onto Relax-NG XML • Rich workflow engine + staging • Clean URLs, CSS for styling • Sensible handling of metadata, assets, internal links, users, permissions etc
  Apache Roller http://roller.apache.org/ • Multi-user blog server • Used by the ASF internally • Scales to thousands of users & blogs • Should work with any JavaEE servlet container and SQL database • Comment moderation and spam filters • Each author has full layout control • Indexes, feeds and MetaWeblog API support for 3rd party clients
  Apache Shindig http://shindig.apache.org/ • OpenSocial Application Container • Hosts your OpenSocial widgets • Renders OpenSocial applications into HTML + JavaScript • Stores the data for your application • Full client-side JavaScript libraries to deliver gadget functionality • Reference implementation
  Apache Wookie (Incubating) http://incubator.apache.org/wookie/ • 5.30pm Wednesday • W3C Widgets server • Upload, Deploy and Host Widgets • Widgets can range from a badge, through a small app to a full-blown collaborative system like chat • Connector framework to make it easy to write widgets in many languages
  Web Frameworks (those with a strong Content focus to them)
  Apache Sling http://sling.apache.org/ • 12pm Wednesday • "Fun" and easy web framework • REST based • Backed by Jackrabbit content repo • Powered by OSGi • Easy to script, supports multiple output languages (JSP, server side JavaScript, Scala etc) • Stores both templates and content
  Apache Tapestry http://tapestry.apache.org/ • Object Orientated web applications • Build your application in terms of objects, methods and properties • Tapestry handles URLs, query parameters and state for you • Pages built with simple HTML • Concentrate on the content that backs each part, and the business logic for it • Tapestry glues it together for you
  Apache Tiles http://tiles.apache.org/ • Templating framework for Java • Works well with Struts and Shale • Lets you build your page from lots of tiles (components), which can nest • Build tiles together to make templates • Clean separation between your content, the business logic to select it, and the rendering rules
  Apache Velocity http://velocity.apache.org/ • Templating engine • MVC webapp or standalone • Can generate HTML, SQL, PostScript, XML, Java Code or email from templates • Anakia lets you make an xdoc file available to a Velocity template, handy when generating HTML from xdoc • Fairly rich
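The core idea of a templating engine, one set of data rendered through different templates, can be sketched with Python's standard `string.Template`. This is a stand-in for illustration only: it is not Velocity, and Velocity's own template language is considerably richer. The templates and the `render` helper are hypothetical.

```python
from string import Template

# Two hypothetical templates over the same data: an email body and an HTML snippet.
EMAIL = Template("Hello $name,\nyour talk '$talk' starts at $time.")
HTML = Template("<p><b>$talk</b> by $name at $time</p>")

def render(template, **context):
    """Fill the template's placeholders from the given context values."""
    return template.substitute(**context)

context = {"name": "Nick", "talk": "Content Technologies", "time": "9am"}
print(render(EMAIL, **context))
print(render(HTML, **context))
```

Keeping the data in `context` and the layout in the templates is what lets the same content come out as HTML, email or anything else text-based.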
  Apache Wicket http://wicket.apache.org/ • Build your web applications in Java • Uses Java in preference to JavaScript, CSS etc • Handy if you have a strong Java team and you need to do some web stuff • Fits well with your Java components • But JS / CSS front end devs tend to be cheaper than Java ones....
  Apache Clerezza (Incubating) http://incubator.apache.org/clerezza/ • OSGi based modular semantic web application framework • Lets you build applications that fit into the Semantic Web • Stores and easily manipulates RDF • Full control over REST and URIs • Build applications that both consume semantic data (eg RDF files), and that expose content to others
  Any Questions? Any cool projects that I happened to miss?