SlideShare ist ein Scribd-Unternehmen logo
1 von 7
© 2016 MapR Technologies© 2016 MapR Technologies
Importing Big Data Sets in JSON
© 2016 MapR Technologies
Big Datasets and MapR-DB JSON
● Introducing jq
● maprdb ImportJSON
● Write Your Own OJAI Document Builder
MapR-DB JSON & OJAI
● Scalable JSON Document Datastore
● Utilizes the OJAI Open Source API
● Command Line Utility
● Developer Preview Sandbox
© 2016 MapR Technologies
Using ‘jq’ -- The ‘sed’ for JSON
● Unix Style Filter (standard in/standard out)
● Can Used to Select JSON Data
● Can be Used to Transform JSON Data
head -n 2000 chicago_crime_data-complete.json > ....
cat chicago_crime_data-partial.json | jq -c '.data[]'
cat chicago_crime_data-partial.json | jq -c '.data[] | {
"id":.[1],"case":.[0]}'
cat citylots.json | jq -c '.features[]'
python -mjson.tool chicago_crime_data-partial.json
© 2016 MapR Technologies
maprdb ImportJSON
● Import a Single File or Multiple Files
● Specify a Row Key (_id)
● One JSON Document per Line
● Can Run as YARN Map-Reduce Job
maprdb Examples
maprdb importJSON -src /user/mapr/foo.json -dst /user/mapr/foobar
maprdb shell
maprdb mapr:> find '/user/mapr/foobar' --id 605460
maprdb importJSON -src /user/mapr/import.file -dst /user/mapr/foobar -idField
properties.BLKLOT
© 2016 MapR Technologies
Roll Your Own
● Parse JSON from File, Stream, Message Queue, Unix Pipe
● Use DocumentBuilder API to Construct OJAI Document
● Specify the Key or Have ‘_id’ Field in Document
java -jar ojai-import-json-0.1-SNAPSHOT-jar-with-dependencies.jar --help
usage: ojaiimport
-f,--file <arg> Use this file as input instead of standard in
-h,--help This message
-k,--key <arg> JSON element to use as the '_id' (rowkey)
cat single-record.json |
java -jar ojai-import-json-0.1-SNAPSHOT-jar-with-dependencies.jar
/user/mapr/foobar
© 2016 MapR Technologies
Links
● https://stedolan.github.io/jq/tutorial/
● https://robots.thoughtbot.com/jq-is-sed-for-json
● http://ojai.github.io/javadocs/latest/
● https://www.mapr.com/hadoop-developer-quicklinks
● https://www.mapr.com/developer-preview/mapr-db-document-
database
● https://s3.amazonaws.com/mapr-fcf/datasets.zip
● https://github.com/mapr-demos/ojai-import-json
● https://data.cityofchicago.org/
● https://data.sfgov.org/
© 2015 MapR Technologies 7
Q & A
@mapr maprtech
pcurtis@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Weitere ähnliche Inhalte

Andere mochten auch

Carta de Ezequiel Pelaia
Carta de Ezequiel PelaiaCarta de Ezequiel Pelaia
Carta de Ezequiel PelaiaUnidiversidad
 
Atrial Fibrillation in Women - St Vincent's, Birmingham, AL
Atrial Fibrillation in Women - St Vincent's, Birmingham, ALAtrial Fibrillation in Women - St Vincent's, Birmingham, AL
Atrial Fibrillation in Women - St Vincent's, Birmingham, ALJose Osorio
 
Realm: Building a mobile database
Realm: Building a mobile databaseRealm: Building a mobile database
Realm: Building a mobile databaseChristian Melchior
 
Ibm csoc response_future _ai 20160907 v5
Ibm csoc response_future _ai 20160907 v5Ibm csoc response_future _ai 20160907 v5
Ibm csoc response_future _ai 20160907 v5ISSIP
 

Andere mochten auch (7)

Carta de Ezequiel Pelaia
Carta de Ezequiel PelaiaCarta de Ezequiel Pelaia
Carta de Ezequiel Pelaia
 
Hypermedia Systems
Hypermedia SystemsHypermedia Systems
Hypermedia Systems
 
Keegan Law
Keegan LawKeegan Law
Keegan Law
 
Atrial Fibrillation in Women - St Vincent's, Birmingham, AL
Atrial Fibrillation in Women - St Vincent's, Birmingham, ALAtrial Fibrillation in Women - St Vincent's, Birmingham, AL
Atrial Fibrillation in Women - St Vincent's, Birmingham, AL
 
τα ντραμς της άλκηστης
τα ντραμς της άλκηστηςτα ντραμς της άλκηστης
τα ντραμς της άλκηστης
 
Realm: Building a mobile database
Realm: Building a mobile databaseRealm: Building a mobile database
Realm: Building a mobile database
 
Ibm csoc response_future _ai 20160907 v5
Ibm csoc response_future _ai 20160907 v5Ibm csoc response_future _ai 20160907 v5
Ibm csoc response_future _ai 20160907 v5
 

Mehr von MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

Mehr von MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Kürzlich hochgeladen

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Import Large JSON Data Sets with OJAI

  • 1. © 2016 MapR Technologies© 2016 MapR Technologies Importing Big Data Sets in JSON
  • 2. © 2016 MapR Technologies Big Datasets and MapR-DB JSON ● Introducing jq ● maprdb ImportJSON ● Write Your Own OJAI Document Builder MapR-DB JSON & OJAI ● Scalable JSON Document Datastore ● Utilizes the OJAI Open Source API ● Command Line Utility ● Developer Preview Sandbox
  • 3. © 2016 MapR Technologies Using ‘jq’ -- The ‘sed’ for JSON ● Unix Style Filter (standard in/standard out) ● Can Used to Select JSON Data ● Can be Used to Transform JSON Data head -n 2000 chicago_crime_data-complete.json > .... cat chicago_crime_data-partial.json | jq -c '.data[]' cat chicago_crime_data-partial.json | jq -c '.data[] | { "id":.[1],"case":.[0]}' cat citylots.json | jq -c '.features[]' python -mjson.tool chicago_crime_data-partial.json
  • 4. © 2016 MapR Technologies maprdb ImportJSON ● Import a Single File or Multiple Files ● Specify a Row Key (_id) ● One JSON Document per Line ● Can Run as YARN Map-Reduce Job maprdb Examples maprdb importJSON -src /user/mapr/foo.json -dst /user/mapr/foobar maprdb shell maprdb mapr:> find '/user/mapr/foobar' --id 605460 maprdb importJSON -src /user/mapr/import.file -dst /user/mapr/foobar -idField properties.BLKLOT
  • 5. © 2016 MapR Technologies Roll Your Own ● Parse JSON from File, Stream, Message Queue, Unix Pipe ● Use DocumentBuilder API to Construct OJAI Document ● Specify the Key or Have ‘_id’ Field in Document java -jar ojai-import-json-0.1-SNAPSHOT-jar-with-dependencies.jar --help usage: ojaiimport -f,--file <arg> Use this file as input instead of standard in -h,--help This message -k,--key <arg> JSON element to use as the '_id' (rowkey) cat single-record.json | java -jar ojai-import-json-0.1-SNAPSHOT-jar-with-dependencies.jar /user/mapr/foobar
  • 6. © 2016 MapR Technologies Links ● https://stedolan.github.io/jq/tutorial/ ● https://robots.thoughtbot.com/jq-is-sed-for-json ● http://ojai.github.io/javadocs/latest/ ● https://www.mapr.com/hadoop-developer-quicklinks ● https://www.mapr.com/developer-preview/mapr-db-document- database ● https://s3.amazonaws.com/mapr-fcf/datasets.zip ● https://github.com/mapr-demos/ojai-import-json ● https://data.cityofchicago.org/ ● https://data.sfgov.org/
  • 7. © 2015 MapR Technologies 7 Q & A @mapr maprtech pcurtis@mapr.com Engage with us! MapR maprtech mapr-technologies