Hadoop Based SQL and Big Data Analytics Solution
Hadoop Data Tagging and Metadata Extension
What is MetaData?
• Metadata is simply “data about data”.
• In a file system, metadata is information about a file, such as its size, creation
time, last-modified time, type, and owner.
• The file system manages access to both the content of files and the metadata
about those files.
• Metadata characterizes data. It provides documentation so that data can be
understood and more readily consumed by your organization. Metadata answers
the who, what, when, where, why, and how questions for users of the data.
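As a rough illustration of file-system metadata, the sketch below reads a few of the attributes listed above for a file on the local file system. It uses only the Python standard library and does not touch HDFS or QueryIO; the helper name `file_metadata` and the sample file are assumptions made for this example.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Return a few basic file-system metadata fields for `path` (hypothetical helper)."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "owner_uid": info.st_uid,
        "is_regular_file": stat.S_ISREG(info.st_mode),
        "extension": os.path.splitext(path)[1],
    }


# Write a small sample file, then inspect its metadata.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello metadata")
    sample_path = handle.name

meta = file_metadata(sample_path)
print(meta["size_bytes"], meta["extension"], meta["modified"].isoformat())

os.remove(sample_path)  # clean up the sample file
```

None of these values come from the file's content; they all describe the file itself, which is what makes them metadata.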
MetaData Extension with QueryIO
• QueryIO provides an on-ingest metadata extraction service: extended metadata is
extracted from files as they are ingested, so you don't need to run costly batch
jobs later. This makes unstructured data on the cluster searchable as soon as it
is ingested.
• To help make unstructured data searchable on a Hadoop cluster, QueryIO stores the
metadata for each file stored on Hadoop in a relational database.
• Since all the metadata and tags associated with a file are kept in a relational
database, you can leverage the existing infrastructure built around SQL to search
the data on the Hadoop cluster.
• It understands dozens of file formats, including PDF, XLS, and DOC documents,
image files, and audio and video files.
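To show why keeping file metadata in a relational database makes files searchable with plain SQL, here is a minimal sketch using an in-memory SQLite database. The table name `file_metadata`, its columns, and the sample rows are all assumptions for illustration; they are not QueryIO's actual schema.

```python
import sqlite3

# Hypothetical schema: one row of extracted metadata per file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_metadata ("
    " path TEXT PRIMARY KEY, file_type TEXT, size_bytes INTEGER, author TEXT)"
)
sample_rows = [
    ("/docs/report.pdf", "pdf", 48213, "asha"),
    ("/docs/budget.xls", "xls", 9120, "ravi"),
    ("/media/intro.mp4", "mp4", 7340032, "asha"),
]
conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", sample_rows)


def find_files(connection, author, min_size):
    """Return paths of files by `author` that are at least `min_size` bytes."""
    cursor = connection.execute(
        "SELECT path FROM file_metadata"
        " WHERE author = ? AND size_bytes >= ? ORDER BY path",
        (author, min_size),
    )
    return [row[0] for row in cursor]


matches = find_files(conn, "asha", 10000)
print(matches)
```

The query never opens the files themselves; it only reads the metadata rows, which is the point of extracting metadata once at ingest.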
What are Data Tags?
• A tag is a label attached to someone or something for identification or to give
other information.
• A data tag is a tag attached to data or a file to provide extra information about
it.
• Data tags can be used to categorize data by various criteria, which helps manage
vast amounts of data. The data can then be extracted, sorted, and processed
based on these categories.
• Adding data tags to data, either conditionally or unconditionally, is called
data tagging.
Data Tagging with QueryIO
• QueryIO provides advanced manual and automated data tagging features that allow
you to define properties for files as they are being written to HDFS. It
automatically stores the basic metadata of files stored in HDFS and further extends
the metadata layer by enabling you to define additional metadata (data tags).
• Here again, the tags defined for all the files on the cluster are stored in a
relational database. QueryIO keeps the metadata and tags stored in the database in
sync with the files stored on the Hadoop cluster.
• Data tagging lets you define the data tag and operator that should be applied to
files on the cluster. You can define data tags using a table you have already
created with Hive DDL, or choose the system-defined MetaStore tables for different
file formats. You can also provide expressions to apply tags conditionally, either
on ingest or at a scheduled time.
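The slides do not say how the tag database is kept in sync with the cluster. One simple way to picture it is a reconciliation pass that drops tag rows for files that no longer exist. The sketch below assumes a hypothetical `file_tags` table and a hypothetical `remove_stale_tags` helper; it is not QueryIO's implementation.

```python
import sqlite3


def remove_stale_tags(connection, existing_paths):
    """Delete tag rows whose file path is not in `existing_paths`; return removed paths."""
    stored = [
        row[0] for row in connection.execute("SELECT DISTINCT path FROM file_tags")
    ]
    stale = [path for path in stored if path not in existing_paths]
    connection.executemany(
        "DELETE FROM file_tags WHERE path = ?", [(path,) for path in stale]
    )
    return sorted(stale)


# Hypothetical tag table: one row per (file, tag) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_tags (path TEXT, tag TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO file_tags VALUES (?, ?, ?)",
    [
        ("/logs/a.log", "category", "web"),
        ("/logs/b.log", "category", "db"),
        ("/logs/b.log", "priority", "high"),
    ],
)

# Pretend /logs/b.log has been deleted from the cluster.
removed = remove_stale_tags(conn, existing_paths={"/logs/a.log"})
remaining = conn.execute("SELECT COUNT(*) FROM file_tags").fetchone()[0]
print(removed, remaining)
```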
Unconditional Data Tagging
• QueryIO provides both conditional and unconditional tagging. Users can choose to
tag hand-picked files or files in a particular folder.
• To do so, open the HDFS data browser, choose the files you want to tag, and click
the “add tag” button.
• Unconditional tagging is useful when you already know the HDFS location of the
files you want to tag and the tagging does not depend on other file attributes
such as file type or file length.
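Unconditional tagging can be pictured as applying the same tag to a known set of paths, whether hand-picked or selected by folder. The sketch below uses a plain dictionary as a stand-in for the tag store; the helper names, paths, and tag names are all hypothetical and do not reflect QueryIO's API.

```python
def tag_paths(tag_store, paths, tag, value):
    """Attach `tag` = `value` to every path in `paths`, regardless of file attributes."""
    for path in paths:
        tag_store.setdefault(path, {})[tag] = value


def paths_in_folder(all_paths, folder):
    """Return the paths that sit under `folder` (prefix match on a whole path segment)."""
    prefix = folder.rstrip("/") + "/"
    return [path for path in all_paths if path.startswith(prefix)]


# Hypothetical listing of files on the cluster.
cluster_paths = [
    "/sales/2023/q1.csv",
    "/sales/2023/q2.csv",
    "/hr/staff.csv",
    "/sales2/x.csv",
]

tags = {}
# Tag one hand-picked file.
tag_paths(tags, ["/hr/staff.csv"], "confidential", "yes")
# Tag every file under a particular folder.
tag_paths(tags, paths_in_folder(cluster_paths, "/sales"), "department", "sales")

for path in sorted(tags):
    print(path, tags[path])
```

Matching on `"/sales/"` rather than `"/sales"` keeps a sibling folder such as `/sales2` from being tagged by accident.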
Conditional Data Tagging
• Conditions can be defined using the values of file attributes (e.g., if Length > 1000)
or by parsing the content of the file (e.g., if NumberOfLines > 100).
• The tag value can also be obtained by parsing the content of the file (e.g.,
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
• Conditional data tags can be added to chosen file types or to all files present on
the HDFS cluster.
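The conditions above can be sketched as a small function that inspects a file's length and content and returns the tags that apply. The thresholds mirror the examples in the bullets (Length > 1000, NumberOfLines > 100); the tag names `LargeFile`, `ManyLines`, and `MentionsQueryIO` and the function itself are assumptions for illustration, not QueryIO's expression syntax.

```python
import re


def conditional_tags(content, length_threshold=1000, line_threshold=100):
    """Return tags for `content` based on its length and on parsing its text."""
    tags = {}

    # Attribute-based condition: file length in bytes.
    if len(content.encode("utf-8")) > length_threshold:
        tags["LargeFile"] = True

    # Content-based condition: number of lines.
    if len(content.splitlines()) > line_threshold:
        tags["ManyLines"] = True

    # Tag values derived by parsing the content.
    tags["NumberOfWords"] = len(content.split())
    tags["MentionsQueryIO"] = re.search(r"\bQueryIO\b", content) is not None
    return tags


sample = "QueryIO tags files on ingest\n" * 120
result = conditional_tags(sample)
small = conditional_tags("short note")
print(result)
print(small)
```

The first two tags are only present when their condition holds, while the last two are always computed from the content, matching the distinction between a condition and a parsed tag value.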
Download QueryIO Now!
http://QueryIO.com/download/big-data-analytics-download.html
OR
Take a Demo
http://demo.QueryIO.com/queryio
“It's Free”

Weitere ähnliche Inhalte

Andere mochten auch

Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformNavneet Gupta
 
It's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksIt's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksLucidworks
 
Requirements document for big data use cases
Requirements document for big data use casesRequirements document for big data use cases
Requirements document for big data use casesAllied Consultants
 
Enterprise Architecture in the Era of Big Data and Quantum Computing
Enterprise Architecture in the Era of Big Data and Quantum ComputingEnterprise Architecture in the Era of Big Data and Quantum Computing
Enterprise Architecture in the Era of Big Data and Quantum ComputingKnowledgent
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...Denodo
 
Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309DrVictorFang
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoopskaluska
 
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...Amazon Web Services
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)Amazon Web Services
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 
하둡 HDFS 훑어보기
하둡 HDFS 훑어보기하둡 HDFS 훑어보기
하둡 HDFS 훑어보기beom kyun choi
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkTodd Fritz
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 

Andere mochten auch (16)

Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data Platform
 
It's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksIt's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, Lucidworks
 
Requirements document for big data use cases
Requirements document for big data use casesRequirements document for big data use cases
Requirements document for big data use cases
 
Enterprise Architecture in the Era of Big Data and Quantum Computing
Enterprise Architecture in the Era of Big Data and Quantum ComputingEnterprise Architecture in the Era of Big Data and Quantum Computing
Enterprise Architecture in the Era of Big Data and Quantum Computing
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
 
Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309Video Analytics on Hadoop webinar victor fang-201309
Video Analytics on Hadoop webinar victor fang-201309
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoop
 
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 
Video Analysis in Hadoop
Video Analysis in HadoopVideo Analysis in Hadoop
Video Analysis in Hadoop
 
하둡 HDFS 훑어보기
하둡 HDFS 훑어보기하둡 HDFS 훑어보기
하둡 HDFS 훑어보기
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 

Kürzlich hochgeladen

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Kürzlich hochgeladen (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Advanced Computer Architecture – An Introduction

Hadoop Data Tagging and Metadata Extension

  • 1. Hadoop Based SQL and Big Data Analytics Solution
  • 2. Hadoop Data Tagging and Metadata Extension
  • 7. What is MetaData?
      • Metadata is simply “data about data”.
      • In a file system, metadata is the information about a file, such as its size, creation time, last-modified time, type, and owner.
      • The file system manages access to both the content of files and the metadata about those files.
      • Metadata characterizes data. It provides documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
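The file system attributes listed above can be read directly in most languages. The following is a minimal Python sketch using only the standard library; the function name `basic_metadata` and the dictionary keys are illustrative choices, not part of QueryIO or HDFS.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_metadata(path):
    """Return a small dict of file system metadata for `path`.

    The key names are hypothetical, chosen only for this illustration.
    """
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        # Last-modified time, converted to a timezone-aware UTC datetime.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        # Numeric owner id (reported as 0 on Windows).
        "owner_uid": st.st_uid,
        # File type approximated by the lower-cased extension.
        "extension": os.path.splitext(path)[1].lower(),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
        f.write(b"hello metadata")
    print(basic_metadata(f.name))
    os.remove(f.name)
```

This only covers the basic attributes a local file system already tracks; extended metadata of the kind discussed in the following slides has to be extracted from the file content itself.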
  • 12. MetaData Extension with QueryIO
      • QueryIO provides an on-ingest metadata extraction service: extended metadata is extracted from files as they are ingested, so you don't need to run costly batch jobs later. This makes unstructured data on the cluster searchable as soon as it is ingested.
      • To help make unstructured data searchable on a Hadoop cluster, QueryIO stores the metadata for each file stored on Hadoop in a relational database.
      • Since all the metadata and tags associated with a file are kept in a relational database, you can leverage the existing infrastructure built around SQL to search the data on the Hadoop cluster.
      • It understands dozens of file formats, including PDF, XLS, and DOC documents, image files, and audio and video files.
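To illustrate why keeping per-file metadata in a relational database makes the data searchable with ordinary SQL, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the relational database; the table name `file_metadata`, its columns, and the sample rows are assumptions made for this example and do not reflect QueryIO's actual schema.

```python
import sqlite3

# Hypothetical sample rows: (path, file_type, length in bytes, owner).
SAMPLE_ROWS = [
    ("/data/reports/q1.pdf", "pdf", 20480, "alice"),
    ("/data/reports/q2.pdf", "pdf", 512, "bob"),
    ("/data/images/logo.png", "png", 4096, "alice"),
]


def build_metadata_store(rows):
    """Create an in-memory table holding one metadata row per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        " path TEXT PRIMARY KEY,"
        " file_type TEXT,"
        " length INTEGER,"
        " owner TEXT)"
    )
    conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_files(conn, file_type, min_length):
    """Return paths of files of `file_type` that are at least `min_length` bytes."""
    cur = conn.execute(
        "SELECT path FROM file_metadata"
        " WHERE file_type = ? AND length >= ?"
        " ORDER BY path",
        (file_type, min_length),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_metadata_store(SAMPLE_ROWS)
    print(find_files(conn, "pdf", 1000))
```

The query touches only the metadata table, never the file contents, which is the point of extracting metadata once at ingest rather than scanning files at search time.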
  • 18. What are Data Tags?
      • A tag is a label attached to someone or something for identification or to give other information.
      • A data tag is a tag attached to data or a file to provide extra information about it.
      • Data tags can be used to categorize data by various criteria, which helps manage vast amounts of data. The data can then be extracted, sorted, and processed based on these categories.
      • Adding data tags to data, either conditionally or unconditionally, is called data tagging.
  • 25. Data Tagging with QueryIO
      • QueryIO provides advanced manual and automated data tagging features that let you define properties for files as they are being written to HDFS. It automatically stores the basic metadata of files stored in HDFS and further extends the metadata layer by letting you define additional metadata (data tags).
      • The tags defined for all files on the cluster are likewise stored in a relational database. QueryIO keeps the metadata and tags stored in the database in sync with the files stored on the Hadoop cluster.
      • Data tagging lets you define a data tag and the operator to be applied to files on the cluster. You can define data tags using a table you have already created with Hive DDL, or choose system-defined MetaStore tables for different file formats. You can also provide expressions to apply tags conditionally, either on ingest or at a scheduled time.
      • [Figure: Data Tagging]
  • 31. Unconditional Data Tagging
      • QueryIO provides both conditional and unconditional tagging. With unconditional tagging, the user can tag hand-picked files or files in a particular folder.
      • All the user needs to do is open the HDFS data browser, choose the files to tag, and click the “add tag” button.
      • Unconditional tagging is useful when you already know the HDFS location of the files you want to tag and the tagging does not depend on other file attributes such as file type or file length.
  • 38. Conditional Data Tagging
      • Conditions can be defined using the values of file attributes (e.g., if Length > 1000) or by parsing the content of the file (e.g., if NumberOfLines > 100).
      • The tag value can also be obtained by parsing the content of the file (e.g., NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
      • Conditional data tags can be added to chosen file types or to all files present on the HDFS cluster.
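The idea of conditional tagging can be sketched in a few lines of Python. This is not QueryIO's expression syntax or API: the classes `FileInfo` and `TagRule`, the function `apply_rules`, and the tag names are hypothetical, and the rules simply mirror the slide's examples (a condition on Length, a condition on NumberOfLines, and a tag value derived from the file content).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileInfo:
    """Hypothetical in-memory stand-in for a file being ingested."""
    path: str
    content: str

    @property
    def length(self) -> int:
        # File length in bytes, assuming UTF-8 encoding.
        return len(self.content.encode("utf-8"))

    @property
    def number_of_lines(self) -> int:
        return len(self.content.splitlines())


@dataclass
class TagRule:
    """A tag name, a condition deciding whether it applies, and a value function."""
    tag: str
    condition: Callable[[FileInfo], bool]
    value: Callable[[FileInfo], object]


def apply_rules(info: FileInfo, rules: List[TagRule]) -> Dict[str, object]:
    """Return the tags whose condition holds for `info`, with computed values."""
    return {r.tag: r.value(info) for r in rules if r.condition(info)}


RULES = [
    # Condition on a file attribute: Length > 1000.
    TagRule("large", lambda f: f.length > 1000, lambda f: True),
    # Condition on parsed content (NumberOfLines > 100); value is the word count.
    TagRule("word_count", lambda f: f.number_of_lines > 100,
            lambda f: len(f.content.split())),
    # Value obtained by parsing content: does the word "QueryIO" occur?
    TagRule("mentions_queryio", lambda f: f.length > 0,
            lambda f: "QueryIO" in f.content),
]

if __name__ == "__main__":
    doc = FileInfo("/data/notes.txt", "QueryIO line\n" * 101)
    print(apply_rules(doc, RULES))
```

Separating the condition from the value function matches the slide's distinction between deciding whether a tag applies and deciding what the tag's value is.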
  • 39. Download QueryIO Now! http://QueryIO.com/download/big-data-analytics-download.html OR Take a Demo http://demo.QueryIO.com/queryio “It's Free”