Chicago Solr Meetup - June 10th: Exploring Hadoop with Search

Lucidworks (Archived)


Exploring Hadoop with Search
Pritesh Patel, Principal Architect Search and Big Data Analytics @ Avalon Consulting, LLC

Why Search + Big Data?
What Hadoop is good at      What Search is good at
Distributed file storage    Free text retrieval
Store large data sets       Index large data sets
Distributed processing      Textual analysis
                            Filtering and sorting
= Intelligence Discovery System of large textual data sets

How we Integrated Search and Big Data
• HBase Replication Facade
• Take advantage of the results of analytical Pig and Hive jobs in Hadoop to make retrieval more intelligent
• Done with built-in replication, so it scales
• Fast access since it is in memory
• Push architecture, so it's near real time
• CRUD
• Store in HDFS and search in LW/Solr
• Gives a reference to the source when integrated this way
• HBase has a RESTful API to retrieve data given the ID that Solr would have after replication/indexing
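
The last point, fetching the source record from HBase by the ID that Solr holds, could look roughly like the sketch below. It is only an illustration, not the demo's code: the server address, the table name `tweets`, and the column `t:url` are assumptions. It relies on the HBase REST server returning row keys, column names, and values base64-encoded when JSON is requested.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def hbase_row_url(base_url, table, row_key):
    """Build the REST URL for a single row: <base_url>/<table>/<row_key>."""
    return f"{base_url.rstrip('/')}/{quote(table, safe='')}/{quote(row_key, safe='')}"


def decode_hbase_row(payload):
    """Turn the REST server's JSON (base64-encoded fields) into {column: value}."""
    def b64(text):
        return base64.b64decode(text).decode("utf-8")

    cells = {}
    for row in payload.get("Row", []):
        for cell in row.get("Cell", []):
            # "column" is "family:qualifier"; "$" holds the cell value.
            cells[b64(cell["column"])] = b64(cell["$"])
    return cells


def fetch_row(base_url, table, row_key):
    """GET one row as JSON. Needs a running HBase REST server (address is an assumption)."""
    request = Request(hbase_row_url(base_url, table, row_key),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return decode_hbase_row(json.load(response))
```

With a REST server running, `fetch_row("http://localhost:8080", "tweets", doc_id)` would return the stored columns for the document ID that a Solr hit carries.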

Our Demo Architecture
Diagram by Varun Rao @ Avalon Consulting, LLC

A Use Case of this Architecture
• Monitor tweets containing the words “Hadoop”, “Lucidworks”, and “Big Data”
• Automatically extract the URLs mentioned when talking about these terms
• Visualize, in near real time, which URLs are mentioned with these terms
• Discover the URLs that are becoming the most popular when mentioned with the topics “Big Data”, “Lucidworks”, and “Hadoop”; those might be URLs you want to read
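
The extraction and popularity steps above can be sketched in a few lines. This is a simplified, in-memory illustration only; in the demo the counting and visualization happen in Solr and the dashboard, and the regular expression and function names here are assumptions.

```python
import re
from collections import Counter

# Terms the use case monitors (matched case-insensitively).
TRACKED_TERMS = ("hadoop", "lucidworks", "big data")

# Deliberately simple pattern: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>]+", re.IGNORECASE)


def mentions_tracked_term(text):
    """True if the tweet text contains at least one tracked term."""
    lowered = text.lower()
    return any(term in lowered for term in TRACKED_TERMS)


def extract_urls(text):
    """Return the URLs in a tweet, with trailing punctuation stripped."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


def count_urls(tweets):
    """Count URL mentions across tweets that contain a tracked term."""
    counts = Counter()
    for text in tweets:
        if mentions_tracked_term(text):
            counts.update(extract_urls(text))
    return counts
```

Calling `count_urls(tweets).most_common(10)` then gives the ten most-mentioned URLs for the tracked topics.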

Demo
• Anyone want to send a tweet? Just use one or more of the words “Hadoop”, “Lucidworks”, “Big Data”
• Add any URL you'd like to share to the tweet. Try: www.avalonconsult.com or www.lucidworks.com

So much potential
• You can apply this to so many things.
• Do intelligent entity extraction to discover topics with Solr's UIMA integration
• Do a similar analysis of popular mentions and people for the topics of your choice
• Endless …
• Any questions?

Team
• Client implementation done by Kevin Risden @ Avalon (risdenk@avalonconsult.com)
• Demo Architecture Team
• Varun Rao @ Avalon (raov@avalonconsult.com)
• Pritesh Patel @ Avalon (patelp@avalonconsult.com)

2. Hadoop Ecosystem

3. Possible Integration Points

Editor's notes

We’ve all seen this. You see search showing up there, but what does that really mean? Is it push or is it pull? Well, we have multiple options.
-- Directly from ingestion, you can send to Solr with the respective serializer classes.
-- HBase is interesting. It’s the table-oriented NoSQL store on top of HDFS.
-- Notice that all of these are pushes. I haven’t included pull yet, but pull options do exist.
-- One thing to note, however, is that HBase does have a web access layer where you can make RESTful calls to grab data.
Complementary = Intelligence system of large textual data sets
-- HBase is the NoSQL store on top of HDFS.
-- It is distributed, with a Master and RegionServers.
-- There is an open source project called the HBase Indexer that creates a façade.
Most importantly, you can store data in HDFS and search it with Solr without storing it in Solr, taking advantage of the strengths of both.
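
As a rough illustration of searching with Solr without storing the data there, a query can ask Solr for matching document IDs only (`fl=id`) and leave the full records in HBase. The sketch below uses Solr's standard `select` handler parameters; the Solr address, the collection name `tweets`, and the field name `text` are assumptions, not details from the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(solr_url, collection, query, rows=10):
    """Build a select URL that returns only the id field of matching documents."""
    params = urlencode({"q": query, "fl": "id", "rows": rows, "wt": "json"})
    return f"{solr_url.rstrip('/')}/{collection}/select?{params}"


def ids_from_response(body):
    """Pull the document IDs out of a parsed Solr JSON response."""
    return [doc["id"] for doc in body["response"]["docs"]]


def search_ids(solr_url, collection, query, rows=10):
    """Run the query against a live Solr instance (address is an assumption)."""
    with urlopen(solr_select_url(solr_url, collection, query, rows)) as response:
        return ids_from_response(json.load(response))
```

Each returned ID is the key for looking up the full record in HBase, so the index stays small while the source data stays in Hadoop.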
This is what the architecture of this setup looks like.
-- Our data source is Twitter.
-- Flume is serializing it and writing directly to HBase.
-- HBase is set up with a façade replication that, behind the scenes, is an indexer to Solr.
-- Then we are using SiLK (i.e. Banana) to visualize the data that comes through.
You can apply this type of architecture to many use cases …