SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Cascalog: an Interactive Query
Language for Hadoop
Nathan Marz
BackType
What is Cascalog?


Cascading (the job execution engine)

    +

 Datalog (basis of the API design)

    +

 Clojure (the host programming language)
Why another query language for Hadoop?




 Existing tools cause too much

 Accidental Complexity
Accidental complexity




 Complexity caused by the tool used
 to solve a problem rather than the
 problem itself
Accidental complexity in existing tools




Pig               The query language is different
                  than the programming language
Hive
When query tool is separate from
programming language


 Friction when embedding custom operations

 Interlacing queries with regular application logic
  is unnatural

 Generating queries dynamically is difficult
Clojure

 General purpose programming language

 Dialect of Lisp that compiles to Java bytecode

 “Programmable programming language”: Easy to
  build Domain Specific Languages (DSL) in Clojure
Clojure examples



    Clojure code             Result

      (+ 1 2 3)                6


     (> 20 18)                true


   (defn incr [x] (+ 1 x))     4
   (incr 3)
Cascalog




 Domain Specific Language in Clojure for
 processing data using Hadoop
Cascalog




       Full power of a general purpose
 programming language available at all times
Cascalog




       Full power of a general purpose
 programming language available at all times


         Cascalog is a Clojure library
Example query: (?<- (stdout) [?p ?a] (age ?p 25))
Demo time!
Some of Cascalog’s features

   Inner and outer joins
   Aggregators
   Functions
   Subqueries
   Sorting
   Read from and write to arbitrary data sources
    › HDFS
    › HBase
    › MySQL
    › Etc.
When query tool is separate from
programming language


 Friction when embedding custom operations

 Interlacing queries with regular application logic
  is unnatural

 Generating queries dynamically is difficult
Cascalog, on the other hand...



 Custom operations defined just like any other
  function

 Interlacing queries with regular application logic
  is trivial

 Generating queries dynamically is easy and
  idiomatic
Try Cascalog yourself!


Project Page
http://www.github.com/nathanmarz/cascalog

Introductory Tutorial
http://nathanmarz.com/blog/introducing-
cascalog/

       5 minutes to install Clojure, Hadoop, and
       Cascalog locally! See project README
Questions?

Twitter: @nathanmarz
Email: nathan.marz@gmail.com
More benefits to being Clojure DSL


 Excellent module system

 Interactive REPL

 Make use of any Clojure function in queries

Weitere ähnliche Inhalte

Andere mochten auch

Handling of Large Data by Salesforce
Handling of Large Data by SalesforceHandling of Large Data by Salesforce
Handling of Large Data by SalesforceThinqloud
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Data Con LA
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMartin Zapletal
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudStreamsets Inc.
 
Understanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoUnderstanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoSalesforce Developers
 
Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar
Salesforce API Series: Fast Parallel Data Loading with the Bulk API WebinarSalesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar
Salesforce API Series: Fast Parallel Data Loading with the Bulk API WebinarSalesforce Developers
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
 
Microservice-based Architecture on the Salesforce App Cloud
Microservice-based Architecture on the Salesforce App CloudMicroservice-based Architecture on the Salesforce App Cloud
Microservice-based Architecture on the Salesforce App Cloudpbattisson
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesTodd Palino
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of ThingsSujee Maniyam
 
Secure Kafka at Salesforce.com
Secure Kafka at Salesforce.comSecure Kafka at Salesforce.com
Secure Kafka at Salesforce.comRajasekar Elango
 
Developing real-time data pipelines with Spring and Kafka
Developing real-time data pipelines with Spring and KafkaDeveloping real-time data pipelines with Spring and Kafka
Developing real-time data pipelines with Spring and Kafkamarius_bogoevici
 

Andere mochten auch (16)

Handling of Large Data by Salesforce
Handling of Large Data by SalesforceHandling of Large Data by Salesforce
Handling of Large Data by Salesforce
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
Salesforce REST API
Salesforce  REST API Salesforce  REST API
Salesforce REST API
 
Understanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoUnderstanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We Do
 
Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar
Salesforce API Series: Fast Parallel Data Loading with the Bulk API WebinarSalesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar
Salesforce API Series: Fast Parallel Data Loading with the Bulk API Webinar
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Microservice-based Architecture on the Salesforce App Cloud
Microservice-based Architecture on the Salesforce App CloudMicroservice-based Architecture on the Salesforce App Cloud
Microservice-based Architecture on the Salesforce App Cloud
 
Large Data Management Strategies
Large Data Management StrategiesLarge Data Management Strategies
Large Data Management Strategies
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier Architectures
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of Things
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Secure Kafka at Salesforce.com
Secure Kafka at Salesforce.comSecure Kafka at Salesforce.com
Secure Kafka at Salesforce.com
 
Developing real-time data pipelines with Spring and Kafka
Developing real-time data pipelines with Spring and KafkaDeveloping real-time data pipelines with Spring and Kafka
Developing real-time data pipelines with Spring and Kafka
 

Mehr von nathanmarz

Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processingnathanmarz
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easynathanmarz
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineeringnathanmarz
 
Your Code is Wrong
Your Code is WrongYour Code is Wrong
Your Code is Wrongnathanmarz
 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itnathanmarz
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Become Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypeBecome Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypenathanmarz
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackTypenathanmarz
 
Cascalog workshop
Cascalog workshopCascalog workshop
Cascalog workshopnathanmarz
 
Cascalog at Strange Loop
Cascalog at Strange LoopCascalog at Strange Loop
Cascalog at Strange Loopnathanmarz
 
Cascalog at Hadoop Day
Cascalog at Hadoop DayCascalog at Hadoop Day
Cascalog at Hadoop Daynathanmarz
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Groupnathanmarz
 

Mehr von nathanmarz (18)

Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easy
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineering
 
Your Code is Wrong
Your Code is WrongYour Code is Wrong
Your Code is Wrong
 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop it
 
Storm
StormStorm
Storm
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
ElephantDB
ElephantDBElephantDB
ElephantDB
 
Become Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypeBecome Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackType
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackType
 
Cascalog workshop
Cascalog workshopCascalog workshop
Cascalog workshop
 
Cascalog at Strange Loop
Cascalog at Strange LoopCascalog at Strange Loop
Cascalog at Strange Loop
 
Cascalog at Hadoop Day
Cascalog at Hadoop DayCascalog at Hadoop Day
Cascalog at Hadoop Day
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
 
Cascalog
CascalogCascalog
Cascalog
 
Cascading
CascadingCascading
Cascading
 

Kürzlich hochgeladen

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Kürzlich hochgeladen (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Cascalog at Hadoop Summit

  • 1. Cascalog: an Interactive Query Language for Hadoop Nathan Marz BackType
  • 2. What is Cascalog? Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • 3. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
  • 4. Accidental complexity Complexity caused by the tool used to solve a problem rather than the problem itself
  • 5. Accidental complexity in existing tools Pig The query language is different than the programming language Hive
  • 6. When query tool is separate from programming language  Friction when embedding custom operations  Interlacing queries with regular application logic is unnatural  Generating queries dynamically is difficult
  • 7. Clojure  General purpose programming language  Dialect of Lisp that compiles to Java bytecode  “Programmable programming language”: Easy to build Domain Specific Languages (DSL) in Clojure
  • 8. Clojure examples Clojure code Result (+ 1 2 3) 6 (> 20 18) true (defn incr [x] (+ 1 x)) 4 (incr 3)
  • 9. Cascalog Domain Specific Language in Clojure for processing data using Hadoop
  • 10. Cascalog Full power of a general purpose programming language available at all times
  • 11. Cascalog Full power of a general purpose programming language available at all times Cascalog is a Clojure library Example query: (?<- (stdout) [?p ?a] (age ?p 25))
  • 13. Some of Cascalog’s features  Inner and outer joins  Aggregators  Functions  Subqueries  Sorting  Read from and write to arbitrary data sources › HDFS › HBase › MySQL › Etc.
  • 14. When query tool is separate from programming language  Friction when embedding custom operations  Interlacing queries with regular application logic is unnatural  Generating queries dynamically is difficult
  • 15. Cascalog, on the other hand...  Custom operations defined just like any other function  Interlacing queries with regular application logic is trivial  Generating queries dynamically is easy and idiomatic
  • 16. Try Cascalog yourself! Project Page http://www.github.com/nathanmarz/cascalog Introductory Tutorial http://nathanmarz.com/blog/introducing- cascalog/ 5 minutes to install Clojure, Hadoop, and Cascalog locally! See project README
  • 18. More benefits to being Clojure DSL  Excellent module system  Interactive REPL  Make use of any Clojure function in queries

Hinweis der Redaktion

  1. This is the Title slide. Please use the name of the presentation that was used in the abstract submission.
  2. This is a topic/content slide. Duplicate as many of these as are needed. Generally, there is one slide per three minutes of talk time.
  3. This is a topic/content slide. Duplicate as many of these as are needed. Generally, there is one slide per three minutes of talk time.
  4. This is a topic/content slide. Duplicate as many of these as are needed. Generally, there is one slide per three minutes of talk time.
  5. - UDFs, custom duct tape for registering and finding dependencies, separate files - separate files, testing?, error handling - things that you didn&amp;#x2019;t think were possible become idiomatic. compose queries, parameterize, pass queries and operations around
  6. This is the final slide; generally for questions at the end of the talk. Please post your contact information here.