SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Analytics for Object Storage Simplified
- Unified File and Object for Hadoop
Sandeep R Patil
STSM, Master Inventor, IBM Spectrum Scale
Smita Raut
Object Development Lead, IBM Spectrum Scale
Acknowledgement : Bill Owen, Tomer Perry, Dean Hildebrand,
Piyush Chaudhary, Yong Zeng, Wei Gong, Theodore Hoover Jr,
Muthuannamalai Muthiah.
• Part 1 : Need as well as Design Points for Unified File and Object
 Introduction to Object Storage
 Unified File & Object Access
 Use Cases Enabled By UFO
• Part 2: Analytics with Unified File and Object
 Big Data Analytics and Challanges
 Design Points, Approach and Solution
 Unified File & Object Store and Congnitive Computing
Agenda
 Object Storage Introduction
3
Part 1 : Need as well as Design Points for Unified File and Object
Introduction to Object Store
• Object storage is highly available, distributed, eventually consistent storage.
• Data is stored as individual objects with unique identifier
● Flat addressing scheme that allows for greater scalability
• Has simpler data management and access
• REST-based data access
• Simple atomic operations:
• PUT, POST, GET, DELETE
● Usually software based that runs on commodity hardware
● Capable of scaling to 100s of petabytes
● Uses replication and/or erasure coding for availability instead of RAID
● Access over RESTful API over HTTP, which is a great fit for cloud and mobile applications
– Amazon S3, Swift, CDMI API
4
Object Storage Enables The Next Generation of Data
Management
Multi-Site
Cloud
Storage
Multi-Tenancy
Simpler
management
and flatter
namespace
Simple
APIs/Semanti
cs (Swift/S3,
Versioning,
Whole File
Updates)
Scalable
Metadata
Access
Scalable and
Highly-
Available
Cost Savings
Ubiquitous
Access
But Does it Create Yet Another Storage Island in Your Data Center…??
6
 Unified File and Object Access
7
What is Unified File and Object Access ?
• Accessing object using file interfaces
(SMB/NFS/POSIX) and accessing file using object
interfaces (REST) helps legacy applications
designed for file to seamlessly start integrating into the
object world.
• It allows object data to be accessed using
applications designed to process files. It allows file
data to be published as objects.
• Multi protocol access for file and object in the same
namespace (with common User ID management
capability) allows supporting and hosting data oceans
of different types of data with multiple access options.
• Optimizes various use cases and solution
architectures resulting in better efficiency as well as
cost savings.
8
<Clustered file system>
Swift (With Swift on File)
NFS/SMB/POSIXObject(http)
2
1
<Container>
File Exports created
on container level
OR
POSIX access from
container level
Objects accessed
as FilesData ingested
as Objects
3
Data ingested
as Files4
Files accessed as
Objects
Flexible Identity Management Modes
• Two Identity Management Modes
• Administrators can choose based on their need and use-case
9
Local_Mode Unified_Mode
Identity Management Modes
Object created by Object interface
will be owned by internal “swift” user
Application processing the object data
from file interface will need the required
file ACL to access the data.
Object authentication setup
is independent of File
Authentication setup
Object created from Object interface should be
owned by the user doing the Object PUT (i.e
FILE will be owned by UID/GID of the user)
Users from Object and File are expected to be
common auth and coming from same directory
service (only AD+RFC 2307 or LDAP)
Owner of the object will own and
have access to the data from file
interface.
Suitable for unified file and object access for
end users. Leverage common ILM policies
for file and object data based on data
ownership
Suitable when auth schemes for file and
object are different and unified access
is for applications
 Use Cases Enabled by Unified File Object
10
Use case 1 : Process Object Data with File-Oriented Applications and Publish Outcomes
as Objects
11
Swift on file
Container1
Virtual
Machine
Instances
Virtual
Machine
Instances
Container2
Subsidiary 1 Subsidiary 2
NFS Export
on
Container 1
NFS Export
on
Container 2
Virtual
Machine
Instances
Virtual
Machine
Instances
VM Farm for Subsidiary 1
for video processing
VM Farm for Subsidiary 2
for video processing
…. ….
Ingest
Media Objects
Media House OpenStack Cloud Platform
(Tenant = Media House Subsidiaries)
Manila Shares (NFS) exported only for Subsidiary1
Publishing Channels
Final Video (as objects)
available for streaming
Final processed videos available as
Objects in container which is used for
external publishing
Raw media content sent for media
processing which happens over files
(Object to File access)
NFS Export
on
Container 1’
Container
1’
Manila Shares (NFS) exported only for Subsidiary2
Files converted into objects for publishing
(File to Object access)
Use case 2 : Users read/write data via File and Object with Common
User Authentication and Identity
12
Clustered file system
Data
N
F
S
S
M
B
O
b
je
c
t
Data
N
F
S
S
M
B
O
b
je
c
t
User: John User: Riya
Access Common Data using the same User Credentials across all protocols
Corporate User
Directory
(Active Directory/LDAP)
Riya’s data Read/Written
from Object should be
owned by Riya when
accessed from File
(SMB/NFS/POSIX)
User: Riya
UID: 1001
GID: 2000
Domain: XYZ
We have now understood
Part 1: Need as well as Design for Unified File and Object
….. Let us now deep dive on
Part 2: Analytics with Unified File and Object
 Big Data Analytics and Challenges
14
Analytics – Broadly Categorized Into Two Sets
CongnitiveTraditional Analytics
Big Data
 Big data is a term for data sets that are so large or complex that traditional data
processing applications (database management tools or traditional data processing
applications) are inadequate.
 The challenges include capture, curation, storage, search, sharing, transfer, analysis, and
visualization.
Characteristics
 Volume
– The quantity of generated and stored data.
 Variety
‒ The type and nature of the data.
 Velocity
‒ Speed at which the data is generated and processed
 Variability
‒ Inconsistency of data sets can hamper processes manage it
 Veracity
‒ Quality of captured data can vary greatly, affecting accuracy
Challenges with the Early Big Data Storage Models
Move data to the
analytics engine
...
Perform
analytics
Ingest data at
various end points
Repeat!
It takes hours or days
to move the data!!
Key Business processes are
now depend on the analytics!
Can’t just throw away data due to
regulations or business requirement
!
...
It’s not just
one type of
analytics
!
!
More data source than ever
before, not just data you own,
but public or rented data
 Design Points, Approach & Solution
18
Powered by
Disk Tape Shared Nothing
Cluster
Flash Off Premise
Geographically
dispersed
management of
data including
disaster recovery
Compute
farm
Traditional
applications
New Gen
applications
common name space
What are the Solution Design Points that we came across?
Bring analytics to the data
Encryption for protection of your data
Single Name Space to
house all your data (Files
and Object) Unified data access with File & Object
1
2
5
6
3
Optimize economics based on value of the data4
Block
iSCSI
Client
workstations
Users and
applications
Compute
farm
Traditional
applications
Global Namespace
Analytics
Transparent
HDFS
Spark
OpenStack
Cinder
Glance
Manilla
Object
Swift S3
Powered by
Clustered File System
Automated data placement and data migration
Disk Tape Shared Nothing
Cluster
Flash
New Gen
applications
Transparent
Cloud Tier
| 20
Worldwide Data
Distribution
Site B
Site A
Site C
How Did We Approach The Solution & address the Design Points?
- Took the Data Ocean Approach
SMBNFS
POSIX
File
Encryption DR Site
JBOD/JBOF
Spectrum Scale
RAID
4000+
customers
Unified File and Object – as explained previously
Meeting Design Point 1
Meeting Design Point 2
Meeting Design Point 3
Meeting Design Point 4Meeting Design Point 5
Meeting Design Point 6
Meeting Design Point 6 – Bring Analytics to Data
Apache Hadoop - Key Platform for Big Data and Analytics
 An open-source software framework and most popular BD&A platform
 Designed for distributed storage and processing of very large data sets on computer
clusters built from commodity hardware
 Core of Hadoop consists of
o A processing part called MapReduce
o A storage part, known as Hadoop Distributed File System (HDFS)
o Hadoop common libraries and components
 Leading Hadoop Distro: HortonWorks, CloudEra, MapR, IBM IOP/BigInsights
 HDFS is a shared nothing architecture, which is very inefficient for
high throughput jobs (disks and cores grow in same ratio)
 Costly data protection:
 uses 3-way replication; limited RAID/erasure coding
 Works only with Hadoop i.e weak support for File or Object
protocols
 Clients have to copy data from enterprise storage to HDFS in order
to run Hadoop jobs, this can result in running on stale data.
Meeting Design Point 6 – Bring Analytics to Data
HDFS Shortcomings
Meeting Design Point 6 – How to Bring Analytics to Data ?
Desired Solution: Need In place Analytics (No
Copies Required). Clustered Filesystem
should support HDFS Connectors
How did we design to overcome the inhibitors: Developing a HDFS Transparency
Connector with Unified File and Object Access
6/26/2
017
24
Map/Reduce API
Hadoop FS APIs
Higher-level languages:
Hive, BigSQL JAQL, Pig …
Applications
HDFS Client
Spectrum Scale
HDFS RPC
Hadoop client
Hadoop FileSystem
API
Connector on
libgpfs,posix API
Hadoop client
Hadoop FileSystem
API
Connector on
libgpfs,posix API
GPFS node
Hadoop client
Hadoop FileSystem
API
GPFS node
Connector
Power
Linux
Power
Linux
Commodity hardware Shared storage
hdfs://hostnameX:portnumber
HDFS Client HDFS Client HDFS Client
GPFS Connector
Service
GPFS Connector
Service
HDFS RPC over
network
Spectrum Scale
Connector Server
Meeting Design Point 6- “In-Place” Analytics for Unified File and
Object Data Achieved.
25
IBM Spectrum Scale
Unified File and Object fileset
Object
(http)
Data ingested
as Objects
Spark or Hadoop
MapReduce
In-Place Analytics
Source:https://aws.amazon.com/elasticmapreduce/
Traditional object store – Data to be copied from
object store to dedicated cluster , do the analysis
and copy the result back to object store for
publishing
Object store with Unified File and Object Access –
Object Data available as File on the same fileset. Analytics systems like
Hadoop MapReduce or Spark allow the data to be directly leveraged for
analytics.
No data movement i.e. In-Place immediate data analytics.
Analytics With Unified File and Object AccessAnalytics on Traditional Object Store
Explicit Data movement
Results Published
as Objects with
no data movement
Results returned
in place
 Unified File & Object Store and Congnitive Computing
26
• Cognitive services are based on a technology that encompasses machine learning, reasoning, natural language processing,
speech and vision and more
• IBM Watson Developer Cloud enables cognitive computing features in your app using IBM Watson’s Language, Vision,
Speech and Data APIs.
Examples of Watson Services
Visual Recognition
Input Audio / Voice
Speech to Text Tone Analyzer
Cognitive Services help derive the meaning
of data
Cognitive Services
Some more…
• Alchemy Language – Text Analysis to give
Sentiments of the Document
• Language Translation – Translate and
publish content in multiple languages
• Personality insights – Uncover a deeper
understanding of people's personality
• Retrieve and Rank – Enhance information
retrieval with machine learning
• Natural language Classifier – Interpret and
classify natural language with confidence
And Many More …
Cognitive Service with Unified File and Object
We need a provision to
Cognitively Auto-tag
Heterogeneous Unstructured data
in form of Object to Leverage its
benefits….
28
Solution : Integration of Cognitive
Computing Services with Object Storage
for auto-tagging of unstructured data in
the form of objects.
IBM Spectrum Scale
Object Storage
Clicks
Photo
From Phone
Service to feed Object
to Watson
{
“WatsonTags” :
“Eiffel“: 94.78%,
“Tower“ :
85.81%,
“Tour“: 66.82%
}
How it works
• IBM has open sourced a sample code that will allow you to auto tag objects in
form of images ( hosted over IBM Spectrum Scale Object)
• Solution can be extended to other object types depending on business need
• The code and instruction are available on GitHub under Apache 3.0 license
https://github.com/SpectrumScale/watson-spectrum-scale-object-integration
SwiftonFile based
object storage
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...xKinAnx
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsStorage Switzerland
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)Odinot Stanislas
 
Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2BradDesAulniers2
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined StorageSandeep Patil
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
 
SoftLayer Object Storage Overview
SoftLayer Object Storage OverviewSoftLayer Object Storage Overview
SoftLayer Object Storage OverviewMichael Fork
 
VTU 6th Sem Elective CSE - Module 5 cloud computing
VTU 6th Sem Elective CSE - Module 5 cloud computingVTU 6th Sem Elective CSE - Module 5 cloud computing
VTU 6th Sem Elective CSE - Module 5 cloud computingSachin Gowda
 
VTU Open Elective 6th Sem CSE - Module 2 - Cloud Computing
VTU Open Elective 6th Sem CSE - Module 2 - Cloud ComputingVTU Open Elective 6th Sem CSE - Module 2 - Cloud Computing
VTU Open Elective 6th Sem CSE - Module 2 - Cloud ComputingSachin Gowda
 
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Sandeep Patil
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewNordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewTravis Wright
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016Joe Krotz
 
IBM's Cloud Storage Options
IBM's Cloud Storage OptionsIBM's Cloud Storage Options
IBM's Cloud Storage OptionsTony Pearson
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsTony Pearson
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayDataWorks Summit
 
IBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudIBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudTony Pearson
 

Was ist angesagt? (20)

Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be Backups
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)
 
Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Iaas storage-170302090824
Iaas storage-170302090824Iaas storage-170302090824
Iaas storage-170302090824
 
IBM Spectrum Scale Slidecast
IBM Spectrum Scale SlidecastIBM Spectrum Scale Slidecast
IBM Spectrum Scale Slidecast
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined Storage
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
SoftLayer Object Storage Overview
SoftLayer Object Storage OverviewSoftLayer Object Storage Overview
SoftLayer Object Storage Overview
 
VTU 6th Sem Elective CSE - Module 5 cloud computing
VTU 6th Sem Elective CSE - Module 5 cloud computingVTU 6th Sem Elective CSE - Module 5 cloud computing
VTU 6th Sem Elective CSE - Module 5 cloud computing
 
VTU Open Elective 6th Sem CSE - Module 2 - Cloud Computing
VTU Open Elective 6th Sem CSE - Module 2 - Cloud ComputingVTU Open Elective 6th Sem CSE - Module 2 - Cloud Computing
VTU Open Elective 6th Sem CSE - Module 2 - Cloud Computing
 
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux OverviewNordic infrastructure Conference 2017 - SQL Server on Linux Overview
Nordic infrastructure Conference 2017 - SQL Server on Linux Overview
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016
 
IBM's Cloud Storage Options
IBM's Cloud Storage OptionsIBM's Cloud Storage Options
IBM's Cloud Storage Options
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
 
Implementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right WayImplementing Security on a Large Multi-Tenant Cluster the Right Way
Implementing Security on a Large Multi-Tenant Cluster the Right Way
 
IBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudIBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the Cloud
 

Ähnlich wie Analytics with unified file and object

Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
 
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...DSquare Technologies
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Doug O'Flaherty
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by SunnyDignitasDigital1
 
Storage solutions for High Performance Computing
Storage solutions for High Performance ComputingStorage solutions for High Performance Computing
Storage solutions for High Performance Computinggmateesc
 
Cochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsCochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsFuture Perfect 2012
 
Object Storage Overview
Object Storage OverviewObject Storage Overview
Object Storage OverviewCloudian
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxAnkitChauhan817826
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Security Threats to Hadoop: Data Leakage Attacks and Investigation
Security Threats to Hadoop: Data Leakage Attacks  and InvestigationSecurity Threats to Hadoop: Data Leakage Attacks  and Investigation
Security Threats to Hadoop: Data Leakage Attacks and Investigation Kiran Gajbhiye
 

Ähnlich wie Analytics with unified file and object (20)

Object storage
Object storageObject storage
Object storage
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...
DSpace Current State, Concerns and Solution by DSquare Technologies (DSpace S...
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by Sunny
 
Storage solutions for High Performance Computing
Storage solutions for High Performance ComputingStorage solutions for High Performance Computing
Storage solutions for High Performance Computing
 
Cochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsCochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and Formats
 
Object Storage Overview
Object Storage OverviewObject Storage Overview
Object Storage Overview
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Security Threats to Hadoop: Data Leakage Attacks and Investigation
Security Threats to Hadoop: Data Leakage Attacks  and InvestigationSecurity Threats to Hadoop: Data Leakage Attacks  and Investigation
Security Threats to Hadoop: Data Leakage Attacks and Investigation
 

Mehr von Sandeep Patil

IBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestIBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestSandeep Patil
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSandeep Patil
 
IBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowIBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowSandeep Patil
 
IBM Spectrum Scale Authentication for Protocols
IBM Spectrum Scale Authentication for ProtocolsIBM Spectrum Scale Authentication for Protocols
IBM Spectrum Scale Authentication for ProtocolsSandeep Patil
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security Sandeep Patil
 
IBM Spectrum Scale and Its Use for Content Management
 IBM Spectrum Scale and Its Use for Content Management IBM Spectrum Scale and Its Use for Content Management
IBM Spectrum Scale and Its Use for Content ManagementSandeep Patil
 

Mehr von Sandeep Patil (6)

IBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and RestIBM Spectrum Scale Secure- Secure Data in Motion and Rest
IBM Spectrum Scale Secure- Secure Data in Motion and Rest
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf Weiser
 
IBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowIBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking Flow
 
IBM Spectrum Scale Authentication for Protocols
IBM Spectrum Scale Authentication for ProtocolsIBM Spectrum Scale Authentication for Protocols
IBM Spectrum Scale Authentication for Protocols
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security
 
IBM Spectrum Scale and Its Use for Content Management
 IBM Spectrum Scale and Its Use for Content Management IBM Spectrum Scale and Its Use for Content Management
IBM Spectrum Scale and Its Use for Content Management
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Analytics with unified file and object

  • 1. Analytics for Object Storage Simplified - Unified File and Object for Hadoop Sandeep R Patil STSM, Master Inventor, IBM Spectrum Scale Smita Raut Object Development Lead, IBM Spectrum Scale Acknowledgement : Bill Owen, Tomer Perry, Dean Hildebrand, Piyush Chaudhary, Yong Zeng, Wei Gong, Theodore Hoover Jr, Muthuannamalai Muthiah.
  • 2. • Part 1 : Need as well as Design Points for Unified File and Object  Introduction to Object Storage  Unified File & Object Access  Use Cases Enabled By UFO • Part 2: Analytics with Unified File and Object  Big Data Analytics and Challanges  Design Points, Approach and Solution  Unified File & Object Store and Congnitive Computing Agenda
  • 3.  Object Storage Introduction 3 Part 1 : Need as well as Design Points for Unified File and Object
  • 4. Introduction to Object Store • Object storage is highly available, distributed, eventually consistent storage. • Data is stored as individual objects with unique identifier ● Flat addressing scheme that allows for greater scalability • Has simpler data management and access • REST-based data access • Simple atomic operations: • PUT, POST, GET, DELETE ● Usually software based that runs on commodity hardware ● Capable of scaling to 100s of petabytes ● Uses replication and/or erasure coding for availability instead of RAID ● Access over RESTful API over HTTP, which is a great fit for cloud and mobile applications – Amazon S3, Swift, CDMI API 4
  • 5. Object Storage Enables The Next Generation of Data Management Multi-Site Cloud Storage Multi-Tenancy Simpler management and flatter namespace Simple APIs/Semanti cs (Swift/S3, Versioning, Whole File Updates) Scalable Metadata Access Scalable and Highly- Available Cost Savings Ubiquitous Access
  • 6. But Does it Create Yet Another Storage Island in Your Data Center…?? 6
  • 7.  Unified File and Object Access 7
  • 8. What is Unified File and Object Access ? • Accessing object using file interfaces (SMB/NFS/POSIX) and accessing file using object interfaces (REST) helps legacy applications designed for file to seamlessly start integrating into the object world. • It allows object data to be accessed using applications designed to process files. It allows file data to be published as objects. • Multi protocol access for file and object in the same namespace (with common User ID management capability) allows supporting and hosting data oceans of different types of data with multiple access options. • Optimizes various use cases and solution architectures resulting in better efficiency as well as cost savings. 8 <Clustered file system> Swift (With Swift on File) NFS/SMB/POSIXObject(http) 2 1 <Container> File Exports created on container level OR POSIX access from container level Objects accessed as FilesData ingested as Objects 3 Data ingested as Files4 Files accessed as Objects
  • 9. Flexible Identity Management Modes • Two Identity Management Modes • Administrators can choose based on their need and use-case 9 Local_Mode Unified_Mode Identity Management Modes Object created by Object interface will be owned by internal “swift” user Application processing the object data from file interface will need the required file ACL to access the data. Object authentication setup is independent of File Authentication setup Object created from Object interface should be owned by the user doing the Object PUT (i.e FILE will be owned by UID/GID of the user) Users from Object and File are expected to be common auth and coming from same directory service (only AD+RFC 2307 or LDAP) Owner of the object will own and have access to the data from file interface. Suitable for unified file and object access for end users. Leverage common ILM policies for file and object data based on data ownership Suitable when auth schemes for file and object are different and unified access is for applications
  • 10.  Use Cases Enabled by Unified File Object 10
  • 11. Use case 1 : Process Object Data with File-Oriented Applications and Publish Outcomes as Objects 11 Swift on file Container1 Virtual Machine Instances Virtual Machine Instances Container2 Subsidiary 1 Subsidiary 2 NFS Export on Container 1 NFS Export on Container 2 Virtual Machine Instances Virtual Machine Instances VM Farm for Subsidiary 1 for video processing VM Farm for Subsidiary 2 for video processing …. …. Ingest Media Objects Media House OpenStack Cloud Platform (Tenant = Media House Subsidiaries) Manila Shares (NFS) exported only for Subsidiary1 Publishing Channels Final Video (as objects) available for streaming Final processed videos available as Objects in container which is used for external publishing Raw media content sent for media processing which happens over files (Object to File access) NFS Export on Container 1’ Container 1’ Manila Shares (NFS) exported only for Subsidiary2 Files converted into objects for publishing (File to Object access)
  • 12. Use case 2 : Users read/write data via File and Object with Common User Authentication and Identity 12 Clustered file system Data N F S S M B O b je c t Data N F S S M B O b je c t User: John User: Riya Access Common Data using the same User Credentials across all protocols Corporate User Directory (Active Directory/LDAP) Riya’s data Read/Written from Object should be owned by Riya when accessed from File (SMB/NFS/POSIX) User: Riya UID: 1001 GID: 2000 Domain: XYZ
  • 13. We have now understood Part 1: Need as well as Design for Unified File and Object ….. Let us now deep dive on Part 2: Analytics with Unified File and Object
  • 14.  Big Data Analytics and Challenges 14
  • 15. Analytics – Broadly Categorized Into Two Sets CongnitiveTraditional Analytics
  • 16. Big Data  Big data is a term for data sets that are so large or complex that traditional data processing applications (database management tools or traditional data processing applications) are inadequate.  The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. Characteristics  Volume – The quantity of generated and stored data.  Variety ‒ The type and nature of the data.  Velocity ‒ Speed at which the data is generated and processed  Variability ‒ Inconsistency of data sets can hamper processes manage it  Veracity ‒ Quality of captured data can vary greatly, affecting accuracy
  • 17. Challenges with the Early Big Data Storage Models Move data to the analytics engine ... Perform analytics Ingest data at various end points Repeat! It takes hours or days to move the data!! Key Business processes are now depend on the analytics! Can’t just throw away data due to regulations or business requirement ! ... It’s not just one type of analytics ! ! More data source than ever before, not just data you own, but public or rented data
  • 18.  Design Points, Approach & Solution 18
  • 19. Powered by Disk Tape Shared Nothing Cluster Flash Off Premise Geographically dispersed management of data including disaster recovery Compute farm Traditional applications New Gen applications common name space What are the Solution Design Points that we came across? Bring analytics to the data Encryption for protection of your data Single Name Space to house all your data (Files and Object) Unified data access with File & Object 1 2 5 6 3 Optimize economics based on value of the data4
  • 20. Block iSCSI Client workstations Users and applications Compute farm Traditional applications Global Namespace Analytics Transparent HDFS Spark OpenStack Cinder Glance Manilla Object Swift S3 Powered by Clustered File System Automated data placement and data migration Disk Tape Shared Nothing Cluster Flash New Gen applications Transparent Cloud Tier | 20 Worldwide Data Distribution Site B Site A Site C How Did We Approach The Solution & address the Design Points? - Took the Data Ocean Approach SMBNFS POSIX File Encryption DR Site JBOD/JBOF Spectrum Scale RAID 4000+ customers Unified File and Object – as explained previously Meeting Design Point 1 Meeting Design Point 2 Meeting Design Point 3 Meeting Design Point 4Meeting Design Point 5 Meeting Design Point 6
  • 21. Meeting Design Point 6 – Bring Analytics to Data Apache Hadoop - Key Platform for Big Data and Analytics  An open-source software framework and most popular BD&A platform  Designed for distributed storage and processing of very large data sets on computer clusters built from commodity hardware  Core of Hadoop consists of o A processing part called MapReduce o A storage part, known as Hadoop Distributed File System (HDFS) o Hadoop common libraries and components  Leading Hadoop Distro: HortonWorks, CloudEra, MapR, IBM IOP/BigInsights
  • 22.  HDFS is a shared nothing architecture, which is very inefficient for high throughput jobs (disks and cores grow in same ratio)  Costly data protection:  uses 3-way replication; limited RAID/erasure coding  Works only with Hadoop i.e weak support for File or Object protocols  Clients have to copy data from enterprise storage to HDFS in order to run Hadoop jobs, this can result in running on stale data. Meeting Design Point 6 – Bring Analytics to Data HDFS Shortcomings
  • 23. Meeting Design Point 6 – How to Bring Analytics to Data ? Desired Solution: Need In place Analytics (No Copies Required). Clustered Filesystem should support HDFS Connectors
  • 24. How did we design to overcome the inhibitors: Developing a HDFS Transparency Connector with Unified File and Object Access 6/26/2 017 24 Map/Reduce API Hadoop FS APIs Higher-level languages: Hive, BigSQL JAQL, Pig … Applications HDFS Client Spectrum Scale HDFS RPC Hadoop client Hadoop FileSystem API Connector on libgpfs,posix API Hadoop client Hadoop FileSystem API Connector on libgpfs,posix API GPFS node Hadoop client Hadoop FileSystem API GPFS node Connector Power Linux Power Linux Commodity hardware Shared storage hdfs://hostnameX:portnumber HDFS Client HDFS Client HDFS Client GPFS Connector Service GPFS Connector Service HDFS RPC over network Spectrum Scale Connector Server
  • 25. Meeting Design Point 6- “In-Place” Analytics for Unified File and Object Data Achieved. 25 IBM Spectrum Scale Unified File and Object fileset Object (http) Data ingested as Objects Spark or Hadoop MapReduce In-Place Analytics Source:https://aws.amazon.com/elasticmapreduce/ Traditional object store – Data to be copied from object store to dedicated cluster , do the analysis and copy the result back to object store for publishing Object store with Unified File and Object Access – Object Data available as File on the same fileset. Analytics systems like Hadoop MapReduce or Spark allow the data to be directly leveraged for analytics. No data movement i.e. In-Place immediate data analytics. Analytics With Unified File and Object AccessAnalytics on Traditional Object Store Explicit Data movement Results Published as Objects with no data movement Results returned in place
  • 26.  Unified File & Object Store and Congnitive Computing 26
  • 27. • Cognitive services are based on a technology that encompasses machine learning, reasoning, natural language processing, speech and vision and more • IBM Watson Developer Cloud enables cognitive computing features in your app using IBM Watson’s Language, Vision, Speech and Data APIs. Examples of Watson Services Visual Recognition Input Audio / Voice Speech to Text Tone Analyzer Cognitive Services help derive the meaning of data Cognitive Services Some more… • Alchemy Language – Text Analysis to give Sentiments of the Document • Language Translation – Translate and publish content in multiple languages • Personality insights – Uncover a deeper understanding of people's personality • Retrieve and Rank – Enhance information retrieval with machine learning • Natural language Classifier – Interpret and classify natural language with confidence And Many More …
  • 28. Cognitive Service with Unified File and Object We need a provision to Cognitively Auto-tag Heterogeneous Unstructured data in form of Object to Leverage its benefits…. 28 Solution : Integration of Cognitive Computing Services with Object Storage for auto-tagging of unstructured data in the form of objects. IBM Spectrum Scale Object Storage Clicks Photo From Phone Service to feed Object to Watson { “WatsonTags” : “Eiffel“: 94.78%, “Tower“ : 85.81%, “Tour“: 66.82% } How it works • IBM has open sourced a sample code that will allow you to auto tag objects in form of images ( hosted over IBM Spectrum Scale Object) • Solution can be extended to other object types depending on business need • The code and instruction are available on GitHub under Apache 3.0 license https://github.com/SpectrumScale/watson-spectrum-scale-object-integration SwiftonFile based object storage