Tika is a toolkit for extracting metadata and text from many document formats. Developers parse a document and obtain its metadata and text content in three main steps. Because Tika wraps many individual parsing libraries behind one interface, systems such as Alfresco do not need to integrate each parsing component separately. Alfresco uses Tika to index content in various formats by passing file streams through Tika's parsers instead of maintaining multiple custom transformers.
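A minimal sketch of parsing a stream with Tika, assuming the Tika 2.x API (`AutoDetectParser`, `BodyContentHandler`, `Metadata`, `ParseContext`) with the standard parser package on the classpath. The class name `TikaExtractDemo` is hypothetical, and mapping the "three steps" to parser, handler, and `parse` call is one reading, not necessarily the original slide's.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Tika or Alfresco.
public class TikaExtractDemo {

    /** Parses one stream, returns its plain text, and fills {@code metadata} as a side effect. */
    public static String extractText(InputStream stream, Metadata metadata)
            throws IOException, SAXException, TikaException {
        // Step 1: a parser that detects the format and delegates to a matching concrete parser.
        Parser parser = new AutoDetectParser();
        // Step 2: a SAX handler that collects body text; -1 disables the default write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        // Step 3: parse; text goes to the handler, document properties go to metadata.
        parser.parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args[0]);
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(path)) {
            String text = extractText(stream, metadata);
            // Print every metadata key Tika found, then the extracted text.
            for (String name : metadata.names()) {
                System.out.println(name + ": " + metadata.get(name));
            }
            System.out.println(text);
        }
    }
}
```

The caller never names a format-specific parser; detection and delegation happen inside `AutoDetectParser`, which is what lets a system like Alfresco hand Tika arbitrary file streams.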
Extension use case: Adding support for Microsoft Office Open XML documents (Office 2007+)
Apache POI: Apache POI provides text extraction support for the Office Open XML formats and advanced coverage of the SpreadsheetML specification (WordprocessingML and PresentationML to come).
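The extension use case can be sketched as a Tika parser that delegates text extraction to POI. This assumes Tika 2.x and POI 5.x on the classpath; `DocxTextParser` is a hypothetical name, it covers only WordprocessingML (.docx), and current Tika distributions already ship a fuller OOXML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

/** Hypothetical Tika parser that extracts .docx text via Apache POI. */
public class DocxTextParser implements Parser {

    // MIME type of Word 2007+ documents (WordprocessingML).
    private static final MediaType DOCX = MediaType.application(
            "vnd.openxmlformats-officedocument.wordprocessingml.document");

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        // Tells Tika's auto-detection which formats this parser handles.
        return Collections.singleton(DOCX);
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // POI opens the OOXML package; closing the extractor also closes the document.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(stream))) {
            // Tika parsers emit XHTML SAX events instead of returning a String.
            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            xhtml.startDocument();
            xhtml.element("p", extractor.getText());
            xhtml.endDocument();
        }
    }
}
```

To let `AutoDetectParser` discover a parser like this, list its fully qualified class name in a `META-INF/services/org.apache.tika.parser.Parser` file on the classpath.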