Open Source Search Engines
Veit Schiele
cusy GmbH, Berlin
de.slideshare.net/cusyio/open-source-search-engines
Veit Schiele
• Founder and managing director of Cusy GmbH, a privacy-compliant development and operations platform
• Close collaboration with the Gesellschaft für Datenschutz und Datensicherheit e.V. (GDD)
you + me + cusy
Agenda
• Evaluation
• Search appliances
• Third-party hosted services
• Self-hosted services
• Integrating third-party systems
• collective.elasticindex
• Summary and discussion
Alternative Search Appliances
MaxxCAT, Mindbreeze InSpire, Thunderstone
Search Appliances
Alternative search appliances: pros and cons

+ Easy and quick to deploy
+ Low maintenance effort
- No redundancy, no backup
- Configuration changes can often only be tested in production
- License utilization must be monitored
Third-party hosted services
Examples: SearchBlox, N2SM OSS, Elastic Cloud

Third-party hosted services
Pros and cons

+ Easy and quick to deploy for publicly accessible information
+ No maintenance effort
- Integrating internal services, file systems etc. is difficult or impossible
- License utilization must be monitored
Self-hosted services
Examples: Fess, OpenSearchServer, Elastic Stack
Self-hosted services
Pros and cons

+ Internal services, file systems etc. can be integrated
- Higher effort for installation, configuration and maintenance
- Extensible, though usually only with considerable effort
Self-hosted services
Search appliances essentially consist of two components:
1. A search engine based on Apache Lucene
   1. Elasticsearch
   2. Solr
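Both Lucene-based engines are driven over HTTP. As a minimal sketch of the search-engine component, the following Python code prepares a request for the Elasticsearch `_bulk` endpoint, which expects newline-delimited JSON (an action line followed by the document source, with a trailing newline), plus a simple `match` query body. It assumes a recent Elasticsearch version without mapping types (the 5.x line mentioned in this deck additionally required a `_type` in each action line) and a node at `http://localhost:9200`; the index name `docs` and the field `title` are placeholders.

```python
import json
from urllib.request import Request


def build_bulk_body(index: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: target index and _id of the document that follows
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself
        lines.append(json.dumps(source))
    # The bulk API requires a newline after the last line
    return "\n".join(lines) + "\n"


def bulk_request(base_url: str, body: str) -> Request:
    """Prepare (but do not send) a POST to <base_url>/_bulk."""
    return Request(
        f"{base_url.rstrip('/')}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def match_query(field: str, text: str) -> dict:
    """Full-text query body for POST /<index>/_search."""
    return {"query": {"match": {field: text}}}


# Hypothetical example data; "docs" and "title" are placeholder names
body = build_bulk_body("docs", {"1": {"title": "Apache Lucene"}, "2": {"title": "Solr"}})
req = bulk_request("http://localhost:9200", body)
```

Against a running node, `urllib.request.urlopen(req)` would send the batch; the sketch deliberately stops at building the request so it runs without a cluster.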
Self-hosted services
2. A crawler

Suitable options include, for example:
   1. Scrapy
      1. scrapy-elasticsearch
   2. Apache Nutch
   3. Elasticsearch River Web
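The crawler discovers pages and hands their content to the search engine. Scrapy (with scrapy-elasticsearch as the indexing pipeline) and Apache Nutch do this at scale; the following standard-library Python sketch only illustrates the core loop: extract links, stay on one host, never revisit a page. The `fetch` callable is a hypothetical parameter standing in for the HTTP layer, so the sketch runs without network access, and it omits robots.txt handling, rate limiting and retries.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterator
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from all <a href="..."> tags of one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop #fragments
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(url)


def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 50) -> Iterator[tuple[str, str]]:
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        yield url, html  # hand the page to the indexing step
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

In a real setup, each `(url, html)` pair would go through text extraction and then into the index; that is the part scrapy-elasticsearch takes over for Scrapy.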

Integrating third-party systems
Self-hosted services: enhancements
e.g. with Apache ManifoldCF
• Microsoft SharePoint
• EMC Documentum
• Dropbox
• RSS feeds
• E-mail
…
Integrating third-party systems
Self-hosted services: enhancements
or, for Elasticsearch 5.2:
• FSCrawler
• IMAP/POP3/Mail importer
• …
• see also Elasticsearch Plugins and Integrations
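Whatever protocol fetches the messages (IMAP or POP3, for instance via Python's `imaplib` or `poplib`), each message must be turned into an indexable document. This sketch shows only that mapping step, using the standard `email` package; the field names are hypothetical and are not the mapping used by the Elasticsearch mail importer plugin.

```python
from email import message_from_string, policy
from email.utils import parsedate_to_datetime


def mail_to_document(raw: str) -> dict:
    """Map a raw RFC 5322 message to a flat dict that could be sent to an index.

    The field names are illustrative, not the schema of any particular importer.
    """
    msg = message_from_string(raw, policy=policy.default)
    # Prefer the text/plain part; for a non-multipart text message this is msg itself
    body_part = msg.get_body(preferencelist=("plain",))
    date = msg.get("Date")
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        # Normalize the date to ISO 8601, which Elasticsearch date fields accept
        "date": parsedate_to_datetime(str(date)).isoformat() if date else None,
        "body": body_part.get_content() if body_part is not None else "",
    }
```

Attachments are ignored here; indexing them would need a text extractor such as the one FSCrawler uses for files.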
Integrating third-party systems
Self-hosted services: enhancements
X-Pack:
• Security (formerly Shield)
• Alerting (formerly Watcher)
• Monitoring (formerly Marvel)
• Reporting
• Graph
• Machine Learning
Example: Fraunhofer ISE – 1. Indexing
• Crawling various sources
  • Project websites
  • Task management
  • File system
  • …
• Indexing permissions
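Indexing permissions means storing, next to each document, who may read it, so that the search itself can be restricted. The slide does not say how Fraunhofer ISE modelled this; a minimal sketch, assuming a hypothetical `allowed_groups` keyword field and hypothetical `title`, `content` and `source` fields, combines full-text matching with a non-scoring `terms` filter on the user's groups.

```python
def index_document(title: str, content: str, source: str,
                   allowed_groups: list[str]) -> dict:
    """Document source as indexed: content plus the groups allowed to read it."""
    return {
        "title": title,
        "content": content,
        "source": source,  # e.g. "website", "tasks", "filesystem"
        # Intended to be mapped as a keyword field so values match exactly
        "allowed_groups": sorted(set(allowed_groups)),
    }


def secured_query(text: str, user_groups: list[str]) -> dict:
    """Full-text query restricted to documents the user's groups may read."""
    return {
        "query": {
            "bool": {
                "must": {"multi_match": {"query": text, "fields": ["title", "content"]}},
                # A terms filter matches if ANY of the user's groups is listed
                "filter": {"terms": {"allowed_groups": sorted(user_groups)}},
            }
        }
    }


def is_visible(doc: dict, user_groups: list[str]) -> bool:
    """Local equivalent of the terms filter above, useful for tests."""
    return bool(set(doc["allowed_groups"]) & set(user_groups))
```

Putting the restriction in the `filter` clause keeps it out of relevance scoring and lets Elasticsearch cache it.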
Example: Fraunhofer ISE – 2. Data storage
• Each repository has its own data model
• Queries across all repositories via wildcards
• Aliases / pipelines
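With one data model per repository, a natural layout is one index per repository under a common naming scheme. A minimal sketch with hypothetical index names (`repo-websites`, `repo-tasks`, `repo-files`) and field names: a wildcard pattern or an alias lets a single query span all repositories, and an ingest pipeline with `rename` processors can map repository-specific fields onto shared ones. The helpers only build request bodies; `fnmatchcase` approximates Elasticsearch's index-pattern matching for simple `*` patterns only.

```python
from fnmatch import fnmatchcase


def alias_actions(alias: str, indices: list[str]) -> dict:
    """Body for POST /_aliases putting several repository indices behind one alias."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


def matching_indices(pattern: str, indices: list[str]) -> list[str]:
    """Index names a simple pattern such as "repo-*" would cover in GET /<pattern>/_search."""
    return [name for name in indices if fnmatchcase(name, pattern)]


def rename_pipeline(field_map: dict[str, str]) -> dict:
    """Body for PUT /_ingest/pipeline/<id> mapping repository fields to shared ones."""
    return {
        "description": "map repository-specific fields to common search fields",
        "processors": [
            {"rename": {"field": src, "target_field": dst, "ignore_missing": True}}
            for src, dst in field_map.items()
        ],
    }
```

An alias has the advantage over a raw wildcard that repositories can be added or removed without changing client queries.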
Example: Fraunhofer ISE – 3. Security
Fraunhofer ISE – 3. Authorization
• Each repository has its own authorization model
• Some of the information comes from the certification authority server
• A security proxy checks authorization before results are delivered
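The security proxy sits between users and the search engine and checks each result before passing it on. A minimal sketch of that check, assuming a hypothetical `User` type and a pluggable `can_read` rule; the default rule compares groups against a hypothetical `allowed_groups` field, whereas at ISE the rule would draw on each repository's own authorization model. The HTTP forwarding itself is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str]


def group_acl(user: User, source: dict) -> bool:
    """Hypothetical rule: readable if the user shares a group with the document."""
    return bool(user.groups & set(source.get("allowed_groups", [])))


def authorize_hits(response: dict, user: User,
                   can_read: Callable[[User, dict], bool] = group_acl) -> list[dict]:
    """Keep only the hits of an Elasticsearch search response the user may see."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit for hit in hits if can_read(user, hit.get("_source", {}))]
```

Filtering after the search means a result page may contain fewer hits than requested and totals may be overstated; restricting the query itself by permission avoids that, which is one reason to index permissions as well.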
Positioning: Gartner Magic Quadrant (as of August 2015)
[Figure: axes Ability to Execute vs. Completeness of Vision; quadrants Leaders, Challengers, Visionaries, Niche Players. Vendors shown: Lucidworks, Expert System, Recommind, BA Insight, IBM, IHS, Coveo, Sinequa, HP, Mindbreeze, Google, Dassault Systèmes, Attivio, Lexmark, Squiz]
What we can do for you
• Privacy-compliant hosting on the Cusy platform
• Installation, maintenance and support on your machines
• Consulting, customization and custom development
Contact
www.cusy.io/veit
info@cusy.io
@cusyio
+CusyIo
Image credits
• Portrait; Ingo Kniest
• Icons; André Henze; © Cusy GmbH
• Michael Gernhardt in space during STS-69 in 1995; PUBLIC DOMAIN: NASA