Webinar: Fusion for Data Science


Lucidworks

Learn how Lucidworks Fusion simplifies data exploration and analysis.


Fusion for Data Science: Scalable search and analytics in one
Grant Ingersoll
CTO, Lucidworks
@gsingers

Get Started
https://github.com/LucidWorks/fusion-examples/tree/master/fusion-for-datascience-webinar

Fusion Foundations
• Best-in-breed search solution built on Apache Lucene and Solr
• Easily capture signals such as clicks, shares, and ratings, and make them actionable
• Powerful data ingestion and analysis capabilities that enable machine learning, recommendations, and positive user feedback loops
• Effortless scale, leveraging proven frameworks and algorithms
• Easy integration with big data tools like Hadoop

Fusion Architecture
[Architecture diagram. Labels: REST; Proxy/LB; security woven throughout; components (Connectors, Pipes, NLP, Recs, Signals, Sched., Blobs, Msging, Metrics); Spark (Workers, Cluster Mgr.); Solr (Shards); HDFS; ZooKeeper (ZK 1 ... ZK N: shared config mgmt, leader election, load balancing); one element marked "Optional". Scale: billions of docs, millions of users.]

Fusion Data Science Use Cases
• Data exploration and visualization
• Easy ingestion, feature selection, and data reduction
• REST APIs for easy integration with commonly used tools
• Quick and dirty: classification, clustering
• Powerful and scalable aggregations, plus a math/stats framework leveraging Apache Spark
• Out-of-the-box NLP tools for part-of-speech tagging, sentence detection, named entity recognition, and more
• Out-of-the-box (OOTB) recommenders plus Mahout extensions

Fusion: Standing on the shoulders of giants
Lucene: Core search, pluggable ranking, advanced storage, sparse matrix
Solr: Faceting, function queries, basic stats, scaling, easy setup, UIMA, basic NLP, search clustering
Fusion: Pipelines, Connectors/Crawlers, Dashboards/UI, Spark integration, advanced stats, large-scale aggregations

Data Exploration Demo

Ingestion, Selection, Reduction
• Ingestion
  • 60+ connectors, plus easily push data in using REST APIs
• Feature Selection
  • Analyzers for all types
  • Easily get/calculate weights for terms and attach payloads
  • Term Vectors/Term Dictionary
• Data Reduction
  • Filters
  • Analyzers
  • Data quality tools
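To make the idea of calculating term weights concrete, here is a minimal sketch in plain Python. It uses a textbook TF-IDF formula, which is an assumption chosen for illustration and not the scoring formula Fusion, Solr, or Lucene applies; the whitespace `tokenize` helper is a hypothetical stand-in for a real analyzer, and the sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = term frequency * log(N / document frequency)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical sample documents.
docs = ["solr search engine", "spark search analytics", "spark spark streaming"]
weights = tf_idf(docs)
```

Terms that occur in few documents receive higher weights, which is the kind of signal feature selection can use to keep informative terms and drop common ones.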

Aggregations and Math
• Math:
  • Search is essentially Vector * Matrix
• Aggregations
  • Enable advanced computation over both core content and Fusion’s signals
  • Make it easy to try out by leveraging Solr
  • Ship with prebuilt “named” aggregations to cover common scenarios
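The "Vector * Matrix" view of search can be sketched in a few lines of plain Python: a query is a vector over terms, the index is a term-by-document matrix, and their product is one score per document. The terms, counts, and the `score` helper below are hypothetical, and raw counts stand in for whatever term weights a real engine would use.

```python
def score(query_vec, term_doc_matrix):
    """Multiply a 1 x T query vector by a T x D term-document matrix."""
    n_docs = len(term_doc_matrix[0])
    return [
        sum(q * row[d] for q, row in zip(query_vec, term_doc_matrix))
        for d in range(n_docs)
    ]


terms = ["solr", "spark", "search"]
# Rows are terms, columns are documents; entries are term weights (raw counts here).
matrix = [
    [1, 0, 0],  # solr
    [0, 1, 3],  # spark
    [1, 1, 0],  # search
]
query = [0, 1, 1]  # the query "spark search" as a vector over `terms`

scores = score(query, matrix)
# Document indices ordered from best to worst match.
ranking = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
```

In practice the matrix is very sparse, which is why an inverted index only visits the rows for terms that actually appear in the query.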

Spark FTW!
• Effortless scale, integrated with Fusion and Solr
• Leverage existing libraries like:
  • Mahout
  • Deeplearning4j
  • GraphX, MLlib
• As easy as:
  • bin/spark start
  • http://.../aggregator/jobs/twitter/hashtags_per_author?spark=true
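As a rough illustration of what an aggregation named hashtags_per_author might compute, the sketch below groups tweets by author and counts hashtags in plain Python. This is an assumption based only on the job name in the URL above: it is not Fusion's aggregator API, it does not use Spark, and the `author` and `hashtags` field names and the sample tweets are hypothetical.

```python
from collections import Counter, defaultdict


def hashtags_per_author(tweets):
    """Group tweets by author and count how often each hashtag appears."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        counts[tweet["author"]].update(tweet["hashtags"])
    return {author: dict(c) for author, c in counts.items()}


# Hypothetical sample records.
tweets = [
    {"author": "alice", "hashtags": ["solr", "spark"]},
    {"author": "bob", "hashtags": ["spark"]},
    {"author": "alice", "hashtags": ["solr"]},
]
result = hashtags_per_author(tweets)
```

A group-and-count of this shape partitions cleanly by key, which is what makes it a natural fit for a distributed engine such as Spark.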

Aggregations Demo

Recommendations
• Fusion powers recommendation use cases such as:
  • People who bought this, bought that
  • Related searches, spellings, and more
  • Session analysis
• Fusion ships with several built-in recommendation options
  - Graph- and collaborative-filtering-based approaches
• Easily enable multi-modal recommendations that combine:
  - Content
  - Collaborative Filtering
  - Spatial
  - Historic/Context
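The "people who bought this, bought that" use case can be sketched with simple item co-occurrence counts. This is one basic collaborative filtering technique chosen for illustration, not a description of Fusion's built-in recommenders; the function names and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, item, k=2):
    """Return up to k items most often bought together with `item`."""
    # Sort by count descending, then by name, so ties are deterministic.
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase baskets.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag", "charger"],
    ["mouse", "pad"],
]
co = build_cooccurrence(baskets)
top_for_laptop = recommend(co, "laptop")
```

A multi-modal recommender would combine counts like these with other evidence, such as content similarity or location, before ranking.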

What's Next
• Spark
  • APIs for running non-Lucid Spark jobs
  • Integration with 3rd-party Spark instances (from major Hadoop distros)
  • Solr RDD extensions for term dictionary, term vectors
• UI for managing Aggregations
• Full-fledged Graph API
• More Math: matrices, functions, etc.

Resources
• Lucidworks: http://www.lucidworks.com
• Me: grant@lucidworks.com
• Key Docs:
  • https://docs.lucidworks.com
  • https://docs.lucidworks.com/display/fusion/Signals+Aggregator+API
  • https://docs.lucidworks.com/display/fusion/Aggregator+Functions
  • https://docs.lucidworks.com/display/fusion/Signals+Aggregations+and+Recommendations
