Building the search engine: from thorns to stars

Alexander Tokarev, a database performance architect, presented his experience using Oracle In-Memory for a faceted search project. The project tagged 3 million objects with 42 million tags, all loaded into an Oracle database. Initial performance tests without In-Memory showed slow queries. With Oracle In-Memory enabled, queries ran up to 21 times faster. However, Tokarev found that not all Oracle In-Memory features delivered significant benefits, and some caused issues. After tuning, the final In-Memory implementation gave a 4x overall performance boost for the faceted search queries.
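To make the term concrete, here is a minimal sketch of what a faceted search over tagged objects computes: the objects matching the selected tags, plus counts of the remaining tags among those matches. It is an in-memory Python illustration only, not the Oracle implementation from the presentation; the toy data and the names `OBJECT_TAGS`, `build_inverted_index`, and `faceted_search` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: object id -> set of tags.
OBJECT_TAGS = {
    1: {"color:red", "size:m"},
    2: {"color:red", "size:l"},
    3: {"color:blue", "size:m"},
}


def build_inverted_index(object_tags):
    """Map each tag to the set of object ids carrying it."""
    index = defaultdict(set)
    for obj_id, tags in object_tags.items():
        for tag in tags:
            index[tag].add(obj_id)
    return index


def faceted_search(object_tags, index, selected_tags):
    """Return (matching object ids, counts of the other tags on those objects)."""
    if selected_tags:
        # An object matches only if it carries every selected tag.
        matches = set.intersection(*(index.get(t, set()) for t in selected_tags))
    else:
        matches = set(object_tags)
    facets = Counter()
    for obj_id in matches:
        for tag in object_tags[obj_id]:
            if tag not in selected_tags:
                facets[tag] += 1
    return matches, facets
```

Calling `faceted_search(OBJECT_TAGS, build_inverted_index(OBJECT_TAGS), {"color:red"})` returns objects 1 and 2 together with one count each for `size:m` and `size:l`. At the scale described above (42 million tags), the counting step is the expensive part, which is why the storage engine matters.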

Tagging search solution design Advanced edition

Optimizing Your Search Experience

Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...

This document discusses efficient scalable search in a multi-tenant environment. It describes Bloomberg Vault, which hosts large volumes of enterprise communications and documents for compliance. The system uses a distributed architecture with shards that are loaded on demand to serve search queries. Security is ensured by dynamically generating field values that encapsulate access permissions for each user's view of a document.

Data Analysis with Apache Flink (Hadoop Summit, 2015)

Aljoscha Krettek

Tagging search solution design

(ATS6-PLAT02) Accelrys Catalog and Protocol Validation

Accelrys Catalog is a powerful new technology for creating an index of the protocols and components within your organization. You will learn about strategies for indexing and how search capabilities can be deployed to professional client and Web Port end users. You will also learn how to use this technology to find out about system usage to aid with system upgrades, server consolidations, and general system maintenance. The protocol validation capability in the admin portal allows administrators to create standard reports on server usage characteristics. You will learn how to report on violations of IT policies (e.g. around security), bad protocol authoring practices, or missing or incomplete protocol documentation. Developers will also learn how to extend and customize the rules used to create these reports.

INTEGRASI ORCID DENGAN CROSSREF

Relawan Jurnal Indonesia

Crossref's ORCID Auto-Update allows publishers to deposit author ORCID IDs with article metadata. With authors' permission, Crossref will automatically post their works to their ORCID profile and update it with any future publications. This benefits authors by compiling their works in one place. Over 1.4 million works have been deposited with ORCIDs so far, with hundreds of thousands automatically updated on authors' profiles through this service.

Recommended

Oracle InMemory hardcore edition

Solr4 nosql search_server_2013

Lucidworks (Archived)

The document discusses Solr 4, an open source search platform built on Apache Lucene. Some key points:

- Solr 4 is a NoSQL search server that provides distributed indexing, fault tolerance, and real-time search capabilities.
- SolrCloud is Solr's distributed architecture, which uses ZooKeeper for coordination to provide features like automatic sharding and replication of indexes across multiple servers.
- The document outlines Solr 4's capabilities, including schema-less options, atomic updates, optimistic concurrency, and a REST API for managing the schema dynamically.

Sumo Logic - Optimizing Your Search Experience (2016-08-17)

The document discusses optimizing searches in Sumo Logic. It covers basic search structure, setting performance expectations, and optimization tools like field extraction rules, partitions, and scheduled views. Field extraction rules extract fields during ingestion to standardize searches and simplify parsing. Partitions divide data to improve search performance by searching smaller chunks. Scheduled views pre-aggregate data to significantly improve performance for selective queries and long-term trend analysis. The document provides recommendations on when and how to use these optimization tools to improve search performance.

Dev411

guest2130e

This document summarizes best practices for improving ASP.NET performance based on testing with various tools. Key findings include that a DataReader is faster than a DataSet, inline SQL is faster than stored procedures, caching improves performance, and reducing ViewState usage and templates improves speed. The presenter advocates using profiling tools to test performance and recommends strategies like optimizing database queries, caching data, and minimizing unnecessary page elements.

Flink Community Update 2015 June

Márton Balassi

This document summarizes Apache Flink community updates from June 2015. It discusses the 0.9.0 release of Apache Flink, an open source platform for scalable batch and stream data processing. Key points include the addition of two new committers, blog posts and workshops promoting Flink, and various conference and meetup talks about Flink occurring that month. It encourages registration for the Flink Forward conference in October 2015.

Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks

Timothy Potter presented at a Big Data conference in Boston from October 11-14, 2016. He discussed how Lucidworks Fusion provides an alternative to traditional big data stacks that emphasizes fast access, agility and automation over integration. Fusion allows for common access patterns like fast lookups, ranked retrieval and distributed scans while integrating technologies like Solr, Spark, HDFS and more. It provides tools for data ingestion, time-based partitioning, analytics, machine learning and more to solve business problems rather than focus on infrastructure.

Sumo Logic QuickStart Webinar

Document Summarizer

Aditya Lunawat

Consuming External Content and Enriching Content with Apache Camel

therealgaston

This document discusses using Apache Camel as a document processing platform to enrich content from Adobe Experience Manager (AEM) before indexing it into a search engine like Solr. It presents the typical direct integration of AEM and search that has limitations, and proposes using Camel to offload processing and make the integration more fault tolerant. Key aspects covered include using Camel's enterprise integration patterns to extract content from AEM, transform and enrich it through multiple processing stages, and submit it to Solr. The presentation includes examples of how to model content as messages in Camel and build the integration using its Java DSL.

How Rackspace Cloud Monitoring uses Cassandra

gdusbabek

This document summarizes a Cassandra usage example. It describes using Cassandra for a monitoring control (CM) cluster and a separate data cluster. The CM cluster stores metadata and configuration in Cassandra and uses a Node.js API and ORM. The data cluster ingests high-volume time-series metrics data, performs rollups to different granularities (minutes, hours, days), and stores the results in Cassandra for fast retrieval.

(ATS6-PLAT04) Query service

The Query Service is the new platform solution for querying a variety of data sources. The goal of Query Service is that administrators can configure a metadata description of the data source that can then be used by end users without detailed knowledge of the underlying data source. This session explains how to configure Query Service data sources and use them with the RESTful API or component collection.

hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba

Michael Stack

Introduction to Lucene and Solr - 1

YI-CHING WU

Solr中国6月21日企业搜索

longkeyy

This document provides an introduction to enterprise search and its key components. It discusses how search engines work by building indexes on text and answering queries using those indexes. The two main components are indexing, which structures data for easy searching, and search, which returns results based on user queries against the index. It introduces common file formats that can be indexed like text, HTML, PDFs. Lucene and Solr are introduced as open source search libraries, with Solr building on Lucene and adding features like indexing, querying via HTTP, and admin interfaces. The document demonstrates adding, deleting, and searching for documents in Solr.

Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target

This document summarizes Target's implementation of Solr as its search platform. It discusses how Target transitioned from Oracle-Endeca to Solr to handle its large scale data and enable more flexible relevancy controls. It describes how Target tested Solr through handling live guest traffic in two sprints and moving its typeahead functionality to the public cloud. Finally, it outlines how Target leverages key Solr capabilities like collection aliases, atomic updates, and configurable facets to synchronize designer and product launches.

SharePoint Search Topology and Optimization

Mike Maadarani

This document contains a presentation on SharePoint 2013 search topology and optimization. It discusses the search architecture in SharePoint 2010 and 2013, and how the search components like crawl, query processing, and indexing are distributed across servers. It provides guidance on how to configure search topologies for small, medium and large farms based on the number of items and queries per second. It also covers search configuration options like authorities, query rules and the query builder to tune search relevance.

Azure search

Raju Kumar

Azure Search is a cloud search service that allows developers to add search functionality to applications. Key features include scalability, powerful querying abilities, scoring profiles, and search navigation options. To use Azure Search, developers first create a search service, then define indexes and documents. Documents are added to indexes which are optimized data structures for search. Queries can be executed against indexes to retrieve relevant documents based on search terms. Results can be filtered and scored using various options in Azure Search.

Andrzej bialecki lr-2013-dublin

lucenerevolution

Presented by Andrzej Bialecki, LucidWorks. This session presents a set of Solr components for easy management of "sidecar indexes": indexes that extend the main index with additional stored and/or indexed fields. Conceptually this can be viewed as an extension of the ExternalFileField or as a static join between documents from two collections. This functionality is useful in applications that require very different update regimes for the two parts of the index (e.g. main catalogue items combined with clickthroughs).

Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...

Ontico

HighLoad++ 2017, "Rio de Janeiro" hall, November 7, 13:00. Abstract: http://www.highload.ru/2017/abstracts/2923.html Alibaba built a data warehouse service named HybridDB in its public cloud, based on the open-source Greenplum Database, and keeps enhancing HybridDB's performance. This presentation describes how Alibaba improves HybridDB's performance for columnar tables with data block metadata (MIN/MAX values of block data) and sort keys (pre-defined keys by which data is sorted and stored). Test results show that block metadata can be generated on the fly without much overhead, yet can achieve better performance than even an index scan. With sort keys, a constant response time can be achieved for GROUP BY and ORDER BY queries.
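The block metadata idea in this abstract can be sketched in a few lines: keep the MIN and MAX of each block, and skip any block whose range cannot overlap the query predicate. The Python below is a toy illustration under that assumption, not HybridDB's or Greenplum's actual code; `Block`, `make_blocks`, and `range_scan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    values: List[int]
    min_value: int  # block metadata, computed once when the block is written
    max_value: int


def make_blocks(values, block_size):
    """Split a column into fixed-size blocks and record MIN/MAX per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    hits = []
    scanned = 0
    for block in blocks:
        # Skip blocks whose [min, max] range cannot contain a match.
        if block.max_value < low or block.min_value > high:
            continue
        scanned += 1
        hits.extend(v for v in block.values if low <= v <= high)
    return hits, scanned
```

With 100 sorted values in blocks of 10, a scan for the range 25 to 34 reads only 2 of the 10 blocks. Sorting by the filtered column, as a sort key does, keeps block ranges narrow and non-overlapping, which is what makes this kind of pruning effective.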

Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology

This document describes a custom Solr plugin for fuzzy name matching. The plugin handles challenges like name variations and ambiguity. It creates a custom field type that scores name matches and supports multiple fields and values per document. At query time, it generates a custom Lucene query to find candidates, then uses Solr's rerank feature to rescore the top results based on the name matching algorithm. The plugin is configurable to trade off accuracy versus speed and supports multi-lingual name matching.

Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE

DataStax Academy

The document provides 5 tips for using Cassandra and DSE: 1) Data modeling best practices to avoid secondary indexes, 2) Understanding compaction choices like size-tiered, leveled, and date-tiered and their use cases, 3) Common mistakes in proofs-of-concept like testing on different hardware and empty nodes, 4) Hardware recommendations like using moderate sized nodes with SSDs, and 5) Anti-patterns like loading large batches of data and modeling queues with improper partitioning.

Taking Splunk to the Next Level - Architecture Breakout Session

This document discusses strategies for scaling a Splunk deployment. It begins by describing how customers typically start with a single use case but then need to scale to handle more data and use cases. It then covers strategies for scaling the forwarding, indexing, search, and management components of Splunk. Key topics include load balancing forwarders, using indexer clustering for high availability, scaling search heads by clustering, and using the deployment server and distributed management console for centralized management. The document emphasizes planning storage capacity and I/O when scaling indexers and considering Splunk's application support when scaling search heads.

ALM Search Presentation for the VSS Arch Council

Sunita Shrivastava

The document discusses Microsoft's ALM Search service architecture and design. It describes plans for the search indexing and query pipelines, including using Elasticsearch for indexing and querying across artifacts. It addresses security, performance, deployment topology, and future directions such as semantic search and integration with on-premise systems. Key points include indexing millions of files in hours, scaling out the indexing pipeline, and supporting cross-account and public repository search.

Similar to Building the search engine: from thorns to stars

Taking Splunk to the Next Level – Architecture

Are you outgrowing your initial Splunk deployment? Is Splunk becoming mission critical and you need to make sure it's Enterprise ready? Attend this session led by Splunk experts to learn about taking your Splunk deployment to the next level. Learn about Splunk high availability architectures with Splunk Search Head Clustering and Index Replication. Additionally, learn how to manage your deployment with Splunk’s operational and management controls to manage Splunk capacity and end user experience.

SQLlite and Full Text Search Presentation

leximo

SQLite is a small, self-contained, zero-configuration, transactional SQL database that requires no setup or administration. It is widely used due to its small size and simplicity, with no server dependency or configuration required. SQLite is ideal for lightweight database needs such as embedded systems and websites, and includes a full-text search engine for indexing and searching large bodies of text.

Getting Started with Splunk

Elasticsearch from the trenches

Jai Jones

The document describes challenges faced when building a search application using Elasticsearch to index and search 6 billion documents. The initial approach of using default shard counts and indexing strategies led to out of memory errors and slow searches. Key problems identified were high field data usage bringing down nodes, searching all indices being slow, and the garbage collector being unable to free enough memory. Improvements involved right-sizing the number of shards, monitoring and reducing field data, targeting specific indices in searches, changing garbage collectors, and dedicating hardware roles.

5 multi-instance management

sqlserver.co.il

This document discusses the challenges of managing a multi-instance SQL Server environment and outlines 888 Casino's approach. It describes their architecture with centralized management and data collection. Key areas covered include installations/upgrades, high availability, data retention, maintenance, monitoring, version uploads, and troubleshooting. Automation is emphasized through tools like Object Builder, monitoring with Precise i3, and version upload tools.

Web scale MySQL at Facebook (Domas Mituzas)

Ontico

This document summarizes Facebook's use of MySQL at web scale. Key points include:

- Facebook has over 800 million monthly active users generating huge volumes of queries and data.
- MySQL is customized with patches and extra resiliency to handle these loads.
- Performance optimizations focus on avoiding stalls and fully utilizing hardware.
- Tools help identify stalls and bottlenecks like table extensions, purge contention, and I/O pressure.
- Memory and disk efficiency are improved through techniques like compact records and flash caching.
- Group commit and admission control were added to MySQL to further optimize high concurrency workloads.
- Ongoing work looks to leverage new hardware, improve replication, and incorporate compression.

Expert summit SQL Server 2016

Łukasz Grala

Practical SQL query monitoring and optimization

Ivo Andreev

Today, project owners demand results as soon as possible, and most often by yesterday. Time to market is crucial, and it is practical to deliver bit by bit, get feedback, and grow with the number of your customers. But as the project grows, the team grows too, and not everyone has the same expertise. Also, the requirements are rarely clear enough at the beginning to allow performance-wise SQL interaction. In most cases no ORM can solve this task for you, and you will need a strong T-SQL writer on the team. If you already know this story or are heading this way, this practical session shares how to monitor, measure, and optimize your SQL code and DB layer interaction.
