Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.
Cloudera Search Webinar: Big Data Search, Bigger Insights
1. Cloudera Search: Embracing Apache Solr into Cloudera's Platform for Big Data
Eva Andreasson, Sr. Product Manager, Cloudera
Steven Noels, Co-founder and SVP of Products, NGDATA
2. Who is Cloudera?
What the Enterprise Requires
§ Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
§ Suite of system and data management software
§ Comprehensive support and consulting services
§ Broadest Hadoop training and certification programs
Extensive Partner Ecosystem
§ Over 600 partners across hardware, software and services
The Leader in Big Data Management
§ Deliver a revolutionary data management platform powered by Apache Hadoop
§ World's leading commercial vendor of Apache Hadoop
§ Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
§ More production deployments than all other vendors combined
3. Cloudera Enterprise
A revolutionary solution powered by Apache Hadoop
[Diagram labels: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE; CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT]
BRINGS STORAGE & COMPUTE TOGETHER
WORKS WITH EVERY TYPE OF DATA
CHANGES THE ECONOMICS OF DATA MANAGEMENT
4. About NGDATA
A Next Generation Customer Intelligence Company
NGDATA is the next generation Customer Intelligence company that enables actionable customer insights, personalized product offers and intimate customer experience with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution.
[Diagram labels: VISION & EXPERTISE: Business Expertise; Enterprise Architectures; Big Data Technology; Machine Learning, Algorithms, Analytics; Customer Intelligence. SOLUTION (Lily): Customer Database; Enterprise Data; Reference Data; Customer Data; Customer Engagement; Governance and Risk Management; Insights, Trends and Analysis]
5. Agenda
§ Why Search?
§ What is Cloudera Search?
§ Using Cloudera Search
§ Learn more
7. Cloudera's Enterprise Strategy
An Integrated Part of the Hadoop System
One pool of data
One security framework
One set of system resources
One management interface
9. Benefits of Search
Improved Big Data ROI
• An interactive experience without technical knowledge
• Single data set for multiple computing frameworks
Faster time to insight
• Exploratory analysis, esp. unstructured data
• Broad range of indexing options to accommodate needs
Cost efficiency
• Single scalable platform; no incremental investment
• No need for separate systems, storage
Solid foundations and reliability
• Solr in production environments for years
• Hadoop-powered reliability and scalability
11. Cloudera Search
Interactive search for Hadoop
• Full-text and faceted navigation
• Batch, near real-time, and on-demand indexing
Apache Solr integrated with CDH
• Established, mature search with vibrant community
• Separate runtime like MapReduce, Impala
• Incorporated as part of the Hadoop ecosystem
Open Source
• 100% Apache, 100% Solr
• Standard Solr APIs
12. Scalable and Robust Index Storage
[Diagram labels: Querying API, Indexing API, Solr, SolrCloud, ZooKeeper, Extraction, Mapping, Lucene, HDFS]
Solr and HDFS
• Scalable, cost-efficient index storage
• Higher availability
• Search and process data in one platform
13. Near Real Time Indexing at Ingest
[Diagram labels: Log File, Other Log File, Flume Agent with Indexer (twice), HDFS]
Solr and Flume
• Data ingest at scale
• Flexible extraction and mapping
• Indexing at data ingest
• Document-level ACL
14. Streamlined Extraction and Mapping
Cloudera Morphlines
• Simple and flexible data transformation
• Reusable across multiple index workloads
• Over time, extend and re-use across platform workloads
[Diagram labels: syslog, Flume Agent, Solr sink, Command: readLine, Command: grok, Command: loadSolr, Solr. Data labels in order: Event, Record, Record, Record, Document]
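The command chain named on this slide (readLine, grok, loadSolr) can be pictured as a series of small steps that each take a record and hand it to the next. The Python below is only a conceptual sketch of that idea, not the Morphlines API or its configuration syntax: the function names mirror the slide's commands, while the syslog pattern, the field names and the in-memory `index` list standing in for Solr are assumptions made for illustration.

```python
import re

# Hypothetical syslog-like pattern (an assumption for this sketch);
# real grok pattern dictionaries are far richer.
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<program>[^:\[]+)(?:\[\d+\])?:\s(?P<message>.*)$"
)


def read_line(event_body):
    """Split a raw event body into one record per non-empty line."""
    for line in event_body.splitlines():
        if line.strip():
            yield {"message": line}


def grok(record):
    """Extract named fields from the line; return None if it does not match."""
    match = SYSLOG_PATTERN.match(record["message"])
    return match.groupdict() if match else None


def load_solr(record, index):
    """Stand-in for loading a document into Solr: append to an in-memory list."""
    index.append(record)


def run_pipeline(event_body, index):
    """Chain the three commands: event -> records -> structured documents."""
    for record in read_line(event_body):
        fields = grok(record)
        if fields is not None:  # lines that do not match are dropped
            load_solr(fields, index)
    return index


index = run_pipeline(
    "Feb  4 10:46:14 host1 sshd[607]: listening on port 22\nnot a syslog line\n",
    [],
)
```

Because each step only consumes and produces records, the same chain could in principle be reused by different ingest paths, which is the reuse point the slide makes.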
15. Scalable Batch Indexing
[Diagram labels: HDFS, Files, Indexer, Index shard, Solr server (each shown twice)]
Solr and MapReduce
• Flexible, scalable batch indexing
• Start serving new indices with no downtime
• On-demand indexing, cost-efficient re-indexing
16. Scalable Batch Indexing
• Mapper (x3): Parse input into indexable document
• Arbitrary reducing steps of indexing and merging
• End-Reducer (shard 1): Index document (Index shard 1)
• End-Reducer (shard 2): Index document (Index shard 2)
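The flow on this slide (mappers parse input into indexable documents, end-reducers index the documents for one shard each) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Cloudera Search MapReduce indexer: the tab-separated input format, the CRC-based routing in `shard_for`, and the dictionary used as an "index" are all hypothetical, and the intermediate merging steps shown on the slide are left out.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # the slide shows two index shards


def mapper(raw_line):
    """Parse one input line into an indexable document.

    Assumes a hypothetical 'id<TAB>text' input format.
    """
    doc_id, text = raw_line.split("\t", 1)
    return {"id": doc_id, "text": text}


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Deterministically route a document to a shard by hashing its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def end_reducer(docs):
    """Build a toy inverted index (term -> sorted doc ids) for one shard."""
    inverted = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            inverted[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in inverted.items()}


def build_shards(lines):
    """Map every line, group documents by shard, then index each group."""
    grouped = defaultdict(list)
    for line in lines:
        doc = mapper(line)
        grouped[shard_for(doc["id"])].append(doc)
    return {shard: end_reducer(docs) for shard, docs in grouped.items()}


shards = build_shards(["a\thello world", "b\thello solr", "c\tbig data"])
```

Routing by a stable hash of the document id means a re-run sends each document to the same shard, which is one simple way to keep re-indexing repeatable.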
17. Searchable Real-Time Data
[Diagram labels: Indexing; interactive load; HBase; HDFS; Indexer(s); Triggers on updates; Solr server (x5)]
HBase (planet-sized tabular data, immediate access & updates) + Search (fast & flexible information discovery) = BIG DATA MANAGEMENT
18. Searchable Real-Time Data
HBase & Search
HBase SEP Triggers & Indexer
• HBase replication mechanism for reliable indexing
• light-weight, zero impact on write performance
• easy to set up & integrate
• flexible, configuration-based mapping & content extraction
Many use cases
• indexes near-real-time HBase updates into Solr
• fielded search on HBase columns
• faceted search
• query by example
• datacube
• secondary indexes
19. Simple, Customizable Search Interface
Hue
• Simple UI
• Navigated, faceted drill down
• Customizable display
• Full text search, standard Solr API and query language
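Since the interface relies on the standard Solr API and query language, a full-text query with faceted drill-down comes down to an HTTP request against a collection's `select` handler. The sketch below only builds such a URL using common Solr query parameters (`q`, `wt`, `rows`, `facet`, `facet.field`); the host, port, collection name `logs` and the field names are assumptions for illustration and would differ in a real deployment.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, collection, text, facet_fields, rows=10):
    """Build a Solr select URL for a full-text query with optional field facets."""
    params = [("q", text), ("wt", "json"), ("rows", str(rows))]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical endpoint, collection and fields.
url = build_solr_query(
    "http://localhost:8983/solr", "logs", "error", ["host", "program"]
)
print(url)
```

Fetching that URL from a running Solr instance would return matching documents together with per-field facet counts, which is the kind of data a faceted drill-down UI is built on.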
20. Simplified Management
Cloudera Manager
• Install, configure, deploy Solr services on the cluster
• Unified management and monitoring
• Resource management
22. Skybox
• Advanced parallel image processing on images stored in HDFS
• Before: difficult to interactively evaluate image quality and correlate with satellite logs
• Now: Index images and satellite logs at acquisition and on demand, interactively introspect image quality
Scalable, efficient image search for analysis and process improvement
23. Explorys Medical
"Hadoop has been Explorys' center of gravity for data management since the company's inception. The addition of Search to Cloudera's platform expands its usability by supporting more workloads and reducing data movement between infrastructure systems. Deploying Cloudera Search supports Explorys' mission to help healthcare providers deliver better, more cost efficient care through fast, flexible data analysis."
-- Michael Onders, SVP & CTO, Explorys
Event, exploration, and data correlation to meet SLAs
24. Patterns and Predictions
• Identify patterns in social media and perform analytics on term usage to improve suicide predictive capability
• Before: Social media data sets too large; traditional enterprise search
• Now: Near real-time correlation of medical records, notes, social media; access for doctors and non-tech staff
Proactive healthcare for returning military veterans
25. Questions
• Ask on the Q&A tab
• Recording will be available at cloudera.com
• After webinar, inquire at: info@cloudera.com
• Presenters contact info: eva@cloudera.com, stevenn@ngdata.com
Thank you for attending!
Download Cloudera Search: cloudera.com/downloads
Learn more about Cloudera Search, powered by Solr: cloudera.com/search
Learn more about NGDATA and Lily: www.ngdata.com