WebExpo 2008 Newstin

Presentation about Newstin at the WebExpo 2008 conference, covering real-time categorization of text content on the web.

More at http://2008.webexpo.cz/prednaska/kategorizace-weboveho-obsahu-v-realnem-case/


  1. Newstin: Real-time Web Content Categorization. Presentation to WebExpo 2008, October 18, 2008
  2. Company Background
     • Newstin a.s. founded in 1998 as I2S in Prague
     • Team of 30 employees
     • 26 engineers
     • 14 nations
     • Since 2005
     • Real-time semantic content categorization
     • Multiple patent filings on cross-language solution
     • Past activities
     • Business & government projects in information management and security
     • Partnership with Business Objects/SAP
     • RedHerring Europe 100 Winner Award
  3. What is Newstin?
     • Patented technology
     • Largest news database, catalog of news in the world
     • 150,000+ information sources in 11 languages
     • 250,000+ articles daily fully processed into 1,000,000+ categories
     • Editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese
     • Japanese, Korean, Turkish coming in Q4 2008
     • Newstin.com
     • Popular user applications
     • Business Intelligence
     • Enterprise content organization
  4. What is Newstin? (Details)
     • Newstin is an innovative technology that takes a completely new approach to content organization. Newstin technology and its service-oriented architecture are the foundation of a unique system that features fully scalable, real-time semantic, multi-language and cross-language document categorization. Newstin's patented technology has the potential to become the core platform for organizing any unstructured textual data, including data from all sources on the Internet and potentially the hidden Web.
     • Newstin is a powerful engine that harnesses a variety of cutting-edge technologies and implements linguistic processing with semantic analysis, multilevel content categorization and cross-language taxonomy structures. Applications of Newstin technology make use of context in addition to conventional keyword approaches.
     • Newstin is the largest news database/catalogue in the world, currently comprising 40 million documents and 2.2 billion metadata items, and it is constantly growing. The Newstin article collection is continuously updated from over 160,000 global, weighted sources selected from a pool of over 3 million preprocessed sources in 12 languages. Up to 200,000+ articles are fully processed daily into 1.1 million categories in 15 supported editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese and Korean, with more languages and editions coming soon.
     • Newstin is a complex system incorporating content retrieval, metadata processing, analysis and visualization. The extensive operation behind Newstin makes it a perfect platform for SaaS solutions.
     • Newstin is a bi-directional application in its own right. By imposing order on unstructured data, Newstin leverages its own extensive metadata collection for business intelligence and enterprise performance management. Content must be organized first to maximize knowledge-mining capability.
  5. Web Content Chaos: An inspiration for Newstin to develop a solution for organizing web content
  6. Semantic Web 2.0 Organization: A portion of Newstin’s taxonomy structure, a step toward organizing web content
  7. Live Demonstration – Newstin.com
  8. Live Demonstration – NewstinMap
  9. Live Demonstration – Connecting VIP
  10. Live Demonstration – BI Example
  11. Live Demonstration – BI Example
  12. Live Demonstration – EmergingStories
  13. B2B: Online Categorization (diagram labels): Firewall; Enterprise Intranet; Unstructured Data; Newstin Semantic Organization; Contextual Search; Categorization; Visual Navigation; Metadata Engine; Cross-language; Mash-up internal/external; Semantic / Web 2.0 Capability; SaaS to Enterprise; Market Standard for Tagging; Product synergy / enhancement; Competitive advantage
  14. Cross-language Information Retrieval: Newstin makes it possible to reach a particular topic in all supported languages through its original definitions
  15. Life Cycle: Newstin is a comprehensive information system
  16. Presentation Summary (CZ). Main topic: Real-time web content categorization. Newstin a.s. is a Czech technology company based in Prague, employing 30 engineers from 15 countries. Over 3.5 years it has built a unique technology for organizing text documents in real time using semantic and linguistic technologies. The core, patented component of Newstin technology is its so-called cross-lingual solution, which links internet content in different languages without the use of translation. Newstin has built the largest current database of online news articles in 11 world languages, including Czech, containing 37 million articles from the past 9 months and 2 billion metadata items. Newstin servers currently process 250,000 unique articles a day from the 160,000 most important sources worldwide. Further uses of Newstin technology lie in media analysis and the organization of enterprise data.
  17. Real-time Web Content Categorization. Thank you. Julius Rusnak, CTO, Newstin a.s., Lomnickeho 9, 140 00 Prague, Czech Republic
