Data.Engineers.Toolkit
Tools for Cloud Data Engineering
The Most Popular Tools in Use Today across 10 Skillsets in Data Engineering
Business Platform Success
We help our clients build their global business platforms on scalable data platforms with our Playbook, Framework, and Knowledge Base.
Who We Help Succeed
ARCHITECT
noun: architect; chief builder
verb: architect; design or make (COMPUTING)
“We create and manage global platforms that run on Cassandra and related technologies.”
5
Things We Love: Scalable Fast Data
Without DataStax
With DataStax
The Landscape of Cloud Data Engineering
Query, BI, Data Warehouse, DataOps, DevOps, Data Engineering, SQL, NoSQL, Queues, Data Lake
SQL - The foundation of data engineering. Still very relevant.
SQL / Relational Databases in Data Engineering
Tools
1. MySQL - The most popular SQL database in use (including its fork, MariaDB).
2. PostgreSQL - Increasingly used to replace Oracle.
3. Microsoft SQL Server - Still relevant. Not going anywhere.
4. Oracle - Used by big companies. Still relevant.
Factors
1. Popularity - Very popular because most software, commercial or open source, runs on relational databases.
2. Function - What SQL offers for ACID transactions is currently hard to beat in NoSQL.
3. Staying Power - Open-source, commercial, and cloud options. No reason to expect it to disappear.
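The ACID point above is easiest to see in a transaction that must be all-or-nothing. Below is a minimal sketch using Python's built-in `sqlite3` module, chosen only because it needs no server; the `accounts` table and the `transfer()` helper are hypothetical. MySQL, PostgreSQL, SQL Server, and Oracle provide the same atomicity guarantee through their own drivers.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit has to undo a write that already happened.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
    except sqlite3.IntegrityError:  # raised when the CHECK constraint is violated
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)        # commits
failed = transfer(conn, "alice", "bob", 500)   # debit would go negative, so both writes roll back
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)  # True False {'alice': 70, 'bob': 30}
```

Bob's balance stays at 30 even though his credit of 500 had already been applied when the debit failed. That is the atomicity many NoSQL stores only offer in limited form.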
NoSQL - The foundation for big data applications. Lots of variants.
NoSQL / Non-relational DBs in Data Engineering
Tools
1. MongoDB - In use everywhere due to its popularity in the Node.js world.
2. Redis - Needed not only for applications but also within data engineering processes.
3. DynamoDB - Easy to get started. Lots of AWS apps are built on DynamoDB.
4. Cassandra - In use by the largest companies for critical operations.
Factors
1. Popularity - Popular because they are easy to get started with.
2. Function - Each has its own specific reason to be useful.
3. Staying Power - The many variants, implementations, and managed services for these databases show there is enough demand to sustain these additional service markets.
Data Lakes on HDFS - The standard for storage and retrieval of files: structured, unstructured, semi-structured, or binary.
Data Lakes on HDFS / S3 Distributed File Storage
Tools
1. HDFS - The Hadoop Distributed File System; a universal protocol for distributed file system access.
2. Amazon S3 - Usable from Hadoop tools in place of HDFS, and the S3 object API is now a standard as well.
3. Google Cloud Storage - Does what S3 does, on Google Cloud.
4. Azure Blob Storage - Does what S3 does, on Azure.
Factors
1. Popularity - Popularized because big data and the clouds needed their own distributed file storage.
2. Function - Used as object storage (key:value), or to store raw files or structured data for later use in query engines.
3. Staying Power - Holds the massive volume of “cold” data that doesn’t need to be in a database. The HDFS/S3 standards are now universal.
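To make the “object storage (key:value)” idea concrete, here is a minimal sketch that uses a local temporary directory as a stand-in for a bucket. `LocalObjectStore` and its methods are hypothetical names; real services (S3, Google Cloud Storage, Azure Blob Storage) expose similar put / get / list-by-prefix operations through their own SDKs. The `dt=...` key layout follows the Hive-style partitioning convention that query engines commonly read.

```python
import json
import tempfile
from pathlib import Path


class LocalObjectStore:
    """Hypothetical stand-in for an object-storage bucket: string keys -> bytes.

    Mimics only the key:value model (put / get / list by prefix), not the real
    API of S3, Google Cloud Storage, Azure Blob Storage, or HDFS.
    """

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)  # "folders" are just key prefixes
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def list_keys(self, prefix: str = "") -> list[str]:
        keys = (p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))


with tempfile.TemporaryDirectory() as tmp:
    store = LocalObjectStore(Path(tmp))
    # Raw, unstructured data is kept exactly as it arrived.
    store.put("raw/logs/app-2024-01-01.log", b"GET /index 200\n")
    # Structured records use Hive-style partition keys (dt=...), a layout that
    # engines such as Hive, Spark SQL, and Presto can prune by partition.
    daily = {"2024-01-01": [{"id": 1}], "2024-01-02": [{"id": 2}, {"id": 3}]}
    for day, rows in daily.items():
        body = "\n".join(json.dumps(r) for r in rows).encode()
        store.put(f"curated/events/dt={day}/part-0000.json", body)

    event_keys = store.list_keys("curated/events/")
    day2 = [json.loads(line) for line in
            store.get("curated/events/dt=2024-01-02/part-0000.json").splitlines()]

print(event_keys)
print(day2)  # [{'id': 2}, {'id': 3}]
```

Listing by prefix is how a query engine can read only one day's partition without scanning the whole lake.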
Streams/Queues - Adding “real-time” processing into the mix.
Streams / Queues in Data Engineering
Tools
1. RabbitMQ - Widely used in business; works well until it hits its limits.
2. Apache Kafka - A full ecosystem, plus other systems that support the Kafka protocol.
3. Amazon SQS - Easy to get started with and use on AWS. Similar services exist in other clouds.
Factors
1. Popularity - Popular because of the rise of real-time use cases in business platforms.
2. Function - Used to record “everything” that is happening, as well as focused “events” that trigger processes.
3. Staying Power - Market demand keeps growing, and current users keep finding new use cases.
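The two roles described above (recording everything, and reacting to focused events) can be sketched with a toy append-only log in which each consumer group tracks its own read offset. This loosely mirrors how Kafka consumers track their position, but `EventLog`, `fulfil`, `HANDLERS`, and `dispatch` are hypothetical names and none of this is Kafka's, RabbitMQ's, or SQS's API.

```python
from collections import defaultdict
from typing import Callable


class EventLog:
    """Hypothetical toy append-only log (not Kafka's API).

    Records are only ever appended; each consumer group keeps its own read
    offset, so several independent consumers can read the same history.
    """

    def __init__(self) -> None:
        self.records: list[dict] = []
        self.offsets: defaultdict[str, int] = defaultdict(int)

    def append(self, event: dict) -> int:
        self.records.append(event)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 100) -> list[dict]:
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # remember how far this group has read
        return batch


# Hypothetical downstream process, triggered only by "order_placed" events.
def fulfil(event: dict) -> str:
    return f"fulfil order for {event['user']}"


HANDLERS: dict[str, Callable[[dict], str]] = {"order_placed": fulfil}


def dispatch(batch: list[dict]) -> list[str]:
    """Run the handler for each event type that has one; ignore the rest."""
    return [HANDLERS[e["type"]](e) for e in batch if e["type"] in HANDLERS]


log = EventLog()
for event in [
    {"type": "page_view", "user": "a"},
    {"type": "order_placed", "user": "a"},
    {"type": "page_view", "user": "b"},
]:
    log.append(event)

archived = log.poll("archive")                 # this group keeps "everything"
first_actions = dispatch(log.poll("alerts"))   # this group reacts to focused events
log.append({"type": "order_placed", "user": "b"})
next_actions = dispatch(log.poll("alerts"))    # only the record added since the last poll
print(len(archived), first_actions, next_actions)
```

Because offsets are per group, the archive consumer and the alerting consumer never interfere with each other; a classic work queue, by contrast, usually removes a message once one consumer has taken it.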
Data Engineering - The actual work.
Popular Data Engineering Tools
Tools
1. Apache Spark - The most popular big data engineering toolkit, with APIs for Python, Scala, Java, R, and C#.
2. dbt - A new but very powerful tool. Expresses data transformation work as SQL.
3. Fivetran - Commercial tool for visually managing data flows.
4. Stitch - Similar to Fivetran, with many connectors built on the open Singer framework.
Factors
1. Popularity - Popular for different reasons. Commercial tools save a lot of time.
2. Function - Lets teams consolidate and standardize all data flows in a single system.
3. Staying Power - Apache Spark is a core part of cloud offerings. Stitch and Fivetran are popular at large companies. dbt is new but growing well.
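The core idea attributed to dbt above, expressing transformation work as SQL, can be imitated with plain `sqlite3`: each “model” is just a SELECT, and a tiny runner materializes it as a view. `MODELS`, `run_models`, and the `raw_orders` table are hypothetical; this is not dbt's API (real dbt projects use Jinja templating, `ref()` to infer dependencies, and warehouse-specific materializations).

```python
import sqlite3

# Hypothetical models: each is a SELECT, keyed by the view it becomes.
# Listed in dependency order by hand; real dbt infers the order from ref() calls.
MODELS = {
    "stg_orders": """
        SELECT id, lower(trim(customer)) AS customer, amount_cents / 100.0 AS amount
        FROM raw_orders
        WHERE amount_cents IS NOT NULL
    """,
    "customer_totals": """
        SELECT customer, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM stg_orders
        GROUP BY customer
    """,
}


def run_models(conn: sqlite3.Connection, models: dict[str, str]) -> None:
    """(Re)build each model as a view; names come from the fixed dict above."""
    for name, select_sql in models.items():
        conn.execute(f"DROP VIEW IF EXISTS {name}")
        conn.execute(f"CREATE VIEW {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " Alice ", 1250), (2, "alice", 750), (3, "BOB", 500), (4, "bob", None)],
)
run_models(conn, MODELS)
rows = conn.execute(
    "SELECT customer, orders, revenue FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 2, 20.0), ('bob', 1, 5.0)]
```

The cleaning rules (trimming names, converting cents, dropping incomplete rows) live entirely in SQL, which is what makes this style approachable for analysts as well as engineers.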
Data Operations - Managing In / Out / Around
Data Operations in Data Engineering
Tools
1. Airflow - Many connectors to manage complex data flows.
2. Jenkins - Used for CI/CD; can run linear pipelines.
3. Prefect - A new but powerful Python tool.
4. Argo - Does CI/CD, but its workflow engine is also useful on its own; Kubeflow Pipelines runs on it.
Factors
1. Popularity - Traction in both big and small companies.
2. Function - Orchestrates complex workflows of tasks, modeled as a DAG (directed acyclic graph).
3. Staying Power - Airflow is future-proof on Kubernetes, and Argo is the new kid on Kubernetes. Jenkins is in use at many companies.
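The DAG idea behind Airflow, Prefect, and Argo Workflows can be illustrated without any of them. Below is a minimal sketch using the standard-library `graphlib` module (Python 3.9+); the task names are hypothetical, and `plan()` only records the order in which an orchestrator could run the tasks rather than executing anything.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join"},
    "refresh_dashboard": {"load_warehouse"},
}


def plan(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        waves.append(ready)
        for task in ready:
            # A real orchestrator would run the task here, possibly in parallel,
            # and mark it done only after it succeeded.
            sorter.done(task)
    return waves


waves = plan(PIPELINE)
for i, wave in enumerate(waves, start=1):
    print(i, wave)

try:
    plan({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

The two extract tasks land in the same wave because neither depends on the other, which is exactly the parallelism an orchestrator exploits; a cycle is rejected up front because it could never finish.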
Data Warehouse - Running SQL at large scale.
Data Warehouse - Analytics Across Data
Tools
1. Redshift - Widely used because it is part of AWS.
2. BigQuery - Well-integrated query engine in Google Cloud.
3. Snowflake - Does a bit of data engineering as well as acting as a query engine.
4. Microsoft SQL Server / Oracle - These commercial databases offer a data warehouse configuration.
Factors
1. Popularity - Warehousing conventions (dimensions, facts) have been around for a while.
2. Function - After bringing data together and relating it, you can run massive SQL queries.
3. Staying Power - The theory isn’t going anywhere. Technologies may change, but the core concept is solid.
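The dimensions-and-facts convention mentioned above is usually called a star schema. Here is a minimal sketch with Python's built-in `sqlite3` standing in for Redshift, BigQuery, or Snowflake; all table and column names are hypothetical, and a real warehouse would run the same shape of query over far more rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: one row per measurable event, with keys pointing at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        quantity   INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 2, 30.0), (2, 20240101, 1, 60.0),
        (1, 20240201, 1, 15.0), (2, 20240201, 3, 180.0);
""")

# Typical warehouse query: aggregate the facts, sliced by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.quantity) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for row in rows:
    print(row)
```

Keeping measures in one narrow fact table and descriptions in small dimension tables is what lets the same data be sliced by any combination of attributes with one pattern of joins.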
Query - Virtualizing data through standard query engines.
Query Engines - Analytics Across Data Sources
Tools
1. Apache Hive - Available in the Hadoop ecosystem, or as variants from cloud vendors.
2. Spark SQL / Hive - Like Hive, but on Spark.
3. PrestoDB - Open data virtualization; can run on Spark and works with Hive.
4. Denodo - Commercial data virtualization; can run on Spark.
Factors
1. Popularity - Hive is a standard that works across systems such as Spark and Hadoop. Presto is popular. Denodo is on the rise.
2. Function - Separates storage from query. “Virtualizes” queries.
3. Staying Power - The idea of separating storage from query is now implemented in Snowflake and Redshift. These will stick.
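Separating storage from query means the data stays where it is and the engine reads it at query time. SQLite's `ATTACH DATABASE` offers a small, self-contained analogy under that assumption: two independent database files play the “storage” systems, and a separate, empty in-memory connection plays the query engine that joins across them. The file, table, and helper names are hypothetical. Hive, Presto, and Denodo do this across very different systems through connectors and add much more optimization, such as pushing filters down to each source.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_table(path: Path, ddl: str, insert_sql: str, rows: list[tuple]) -> None:
    """Create one table in its own database file (a separate 'storage' system)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commit the writes
            conn.execute(ddl)
            conn.executemany(insert_sql, rows)
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    crm_path, billing_path = Path(tmp, "crm.db"), Path(tmp, "billing.db")
    write_table(crm_path, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
                "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    write_table(billing_path, "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
                "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])

    # "Query engine": an in-memory connection that owns no data of its own.
    engine = sqlite3.connect(":memory:")
    try:
        engine.execute("ATTACH DATABASE ? AS crm", (str(crm_path),))
        engine.execute("ATTACH DATABASE ? AS billing", (str(billing_path),))
        totals = engine.execute("""
            SELECT c.name, SUM(i.amount) AS total
            FROM crm.customers AS c
            JOIN billing.invoices AS i ON i.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
        """).fetchall()
    finally:
        engine.close()  # close before the temporary files are deleted

print(totals)  # [('Acme', 150.0), ('Globex', 75.0)]
```

The engine could be thrown away and recreated without touching either file, which is the property that lets storage and compute scale independently.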
Business Intelligence - Visualizing and dashboarding data for consumers.
Business Intelligence tools for Data Engineers
Tools
1. Tableau - Very popular since it gives people community access.
2. Looker - Commercial-grade tool; expect a good UI.
3. Redash - Powerful open-source tool for data professionals to make reports and dashboards.
4. Metabase - Easy-to-use tool for people who are not admins or DBAs.
Factors
1. Popularity - BI is HUGE. Learning it is not just about the tool; tools are always coming and going.
2. Function - Lets non-programmers discover and analyze data, and create visualizations and reports that other non-technical people can consume.
3. Staying Power - Tableau will stick around. Open-source Redash is now supported by Databricks.
DevOps - Infrastructure / Software Configuration / Large-Scale Admin
DevOps Tools for Data Engineering
Tools
1. Terraform - Manage different clouds with one language.
2. Prometheus / Grafana - The original go-to for visualizing time-series system metrics.
3. Ansible - Better organizes the commands that need to be run: setup, configuration, and ad-hoc commands.
More Tools
1. Docker - Customize your image.
2. Kubernetes - Run your cluster.
3. Argo - CI/CD for containers on Kubernetes.
4. Jenkins - General-purpose CI/CD; can be used to run other tools.
Any Questions?
Create and manage global data platforms.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
Knowledge
Playbook.anant.us
Blog.anant.us
Cassandra.link
Cassandra.tools
Let’s talk.
Service Catalog
Cassandra
Spark
Kafka
Airflow
DevOps
DataOps
Training
Data Engineering
DevOps
DataOps
(Apprentice)


Data Engineer's Lunch #55: Get Started in Data Engineering

  • 1. Data.Engineers.Toolkit Tools for Cloud Data Engineering The Most Popular Tools in Use Today across 10 Skillsets in Data Engineering
  • 2. Business Platform Success We help our clients build their global platforms on scalable data platforms with our Playbook, Framework, and Knowledge base.
  • 3. Who we help Succeed
  • 4. ARCHITECT noun: architect; chief builder verb: architect; design or make (COMPUTING) “We create and manage global platforms that run on Cassandra and related technologies.”
  • 5. 5 Things We Love : Scalable Fast Data Without Datastax With Datastax
  • 6. The Landscape of Cloud Data Engineering Query BI Data Warehouse DataOps DevOps Data Engineering SQL NoSQL Queues Data Lake
  • 7. SQL - The foundation of data engineering. Still very relevant.
  • 8. SQL / Relational Databases in Data Engineering. Tools: 1. MySQL - The most popular SQL database in use, along with its fork MariaDB. 2. PostgreSQL - Used more and more to replace Oracle. 3. Microsoft SQL Server - Still relevant. Not going anywhere. 4. Oracle - Big companies use it. Still relevant. Factors: 1. Popularity - Very popular because most software, commercial or open source, runs on relational databases. 2. Function - SQL databases' support for ACID transactions is currently hard to beat in NoSQL. 3. Staying Power - Open-source, commercial, and cloud options. No reason to see it disappearing.
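To make slide 8's ACID point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL, PostgreSQL, and the other engines, and the accounts table and transfer function are hypothetical. Both updates of a transfer either commit together or roll back together:

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from one (hypothetical) account to another atomically."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # fails, so neither update is kept
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 30}
```

The second call fails its balance check, so the debit it had already applied is rolled back and Alice's balance stays at 70.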
  • 9. NoSQL - The foundation for big data applications. Lots of variants.
  • 10. NoSQL / Non-relational DBs in Data Engineering. Tools: 1. MongoDB - In use everywhere due to its popularity in the Node.js world. 2. Redis - Needed not only for apps but also within data engineering processes. 3. DynamoDB - Easy to get started. Lots of AWS apps run on DynamoDB. 4. Cassandra - In use by the largest companies for critical operations. Factors: 1. Popularity - Popular because they are easy to get started with. 2. Function - Each has its own specific strengths. 3. Staying Power - The many variants, implementations, and managed services built around these databases show there is enough demand to sustain these additional service markets.
  • 11. Data Lakes on HDFS - The standard for storage and retrieval of files - structured, unstructured, semi-structured, or binary.
  • 12. Data Lakes on HDFS / S3: Distributed File Storage. Tools: 1. HDFS - Universal protocol for distributed file system access. 2. Amazon S3 - Can be accessed by HDFS-compatible tools, and its S3 object API is now also a standard. 3. Google Cloud Storage - Does what S3 does, on Google Cloud. 4. Azure Blob Storage - Does what S3 does, on Azure. Factors: 1. Popularity - Popularized because big data and the clouds each needed their own distributed file storage. 2. Function - Used as object storage (key:value), to store raw files, or to store structured data for later use in query engines. 3. Staying Power - Responsible for the massive storage of all “cold” data that doesn't need to be in a database. HDFS/S3 standards are now universal.
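Slide 12 describes using S3-style storage as a key:value object store. The sketch below is a hypothetical in-memory stand-in, not a real S3, GCS, or Azure client, that shows the basic model: whole objects addressed by flat string keys, with "folders" being nothing more than shared key prefixes.

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Hypothetical stand-in for one object store bucket (S3, GCS, Azure Blob)."""

    def __init__(self) -> None:
        # Flat namespace: key -> opaque bytes. There are no real directories.
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # whole-object write; overwrites any existing object

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the key does not exist

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Listing a folder" is just a prefix scan over keys, sorted here.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("raw/events/2024-01-01.json", b'{"event": "click"}')  # raw files
store.put("raw/events/2024-01-02.json", b'{"event": "view"}')
store.put("curated/users.csv", b"id,name\n1,Ada\n")             # structured data

print(store.list_keys("raw/events/"))
# ['raw/events/2024-01-01.json', 'raw/events/2024-01-02.json']
print(store.get("curated/users.csv").decode())
```

Real clients such as boto3 expose comparable operations (put_object, get_object, list_objects_v2) against an actual bucket.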
  • 13. Streams/Queues - Adding “real-time” processing into the mix.
  • 14. Streams / Queues in Data Engineering. Tools: 1. RabbitMQ - Lots of use in business; works well until it doesn't. 2. Apache Kafka - Full ecosystem, plus variants that support the Kafka protocol. 3. Amazon SQS - Easy to get started with and use on Amazon. Similar services exist in other clouds. Factors: 1. Popularity - Popular because of the rise of real-time use cases in business platforms. 2. Function - Used to store “everything” that's happening, as well as focused “events” that trigger processes. 3. Staying Power - Market demand keeps growing, and current users keep adding use cases.
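Slide 14's "store everything that's happening" idea is easiest to see as an append-only log that each consumer reads at its own offset, which is roughly how Kafka works. This is a simplified, single-partition, in-memory sketch with hypothetical class and method names; it is not the Kafka client API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventLog:
    """Hypothetical single-partition, in-memory sketch of a Kafka-style log."""
    events: List[Any] = field(default_factory=list)
    offsets: Dict[str, int] = field(default_factory=dict)  # consumer group -> next offset

    def append(self, event: Any) -> int:
        self.events.append(event)  # events are never removed by reading
        return len(self.events) - 1  # offset of the new event

    def poll(self, group: str, max_events: int = 10) -> List[Any]:
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)  # "commit" this group's new position
        return batch


log = EventLog()
for e in ["signup:ada", "click:ada", "purchase:ada"]:
    log.append(e)

# Two independent consumer groups read the same history at their own pace.
print(log.poll("audit"))                  # ['signup:ada', 'click:ada', 'purchase:ada']
print(log.poll("billing", max_events=1))  # ['signup:ada']
print(log.poll("audit"))                  # []
```

In a classic work queue such as SQS, a message is normally deleted once a consumer finishes with it; keeping the log lets the "audit" and "billing" consumers replay the same events independently.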
  • 15. Data Engineering - The actual work.
  • 16. Popular Data Engineering Tools. Tools: 1. Apache Spark - The most popular big data engineering toolkit, with APIs in Python, Scala, Java, R, and C#. 2. dbt - A new but very powerful tool that abstracts database engineering work into SQL. 3. Fivetran - Commercial tool for visually managing data flows. 4. Stitch - Similar to Fivetran, with many connectors and the open Singer framework. Factors: 1. Popularity - Popular for different reasons; commercial tools save a lot of time. 2. Function - Lets teams consolidate and standardize all data flows in a single system. 3. Staying Power - Apache Spark is a core part of cloud offerings. Stitch and Fivetran are popular at large companies. dbt is new but growing well.
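The tools on slide 16 automate variations of one pattern: extract data from several sources, transform it into a standard shape, and load it into a single system. Here is a minimal plain-Python sketch of that pattern, assuming two hypothetical sources and using SQLite as the destination. It is not Spark, dbt, Fivetran, or Stitch code:

```python
import csv
import io
import sqlite3

# Two hypothetical sources describing the same entity with different conventions.
CRM_CSV = "customer_id,full_name,country\n1,Ada Lovelace,uk\n2,Alan Turing,UK\n"
SHOP_ROWS = [{"id": 3, "name": "Grace Hopper", "country_code": "us"}]


def extract():
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))  # CSV values arrive as strings
    return crm, SHOP_ROWS


def transform(crm, shop):
    # Standardize both sources into one schema: (id, name, country).
    rows = [(int(r["customer_id"]), r["full_name"], r["country"].upper()) for r in crm]
    rows += [(r["id"], r["name"], r["country_code"].upper()) for r in shop]
    return rows


def load(conn, rows):
    with conn:  # commit the whole load, or roll it back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
by_country = conn.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
print(by_country)  # [('UK', 2), ('US', 1)]
```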
  • 17. Data Operations - Managing In / Out / Around
  • 18. Data Operations in Data Engineering. Tools: 1. Airflow - Many connectors for managing complex data flows. 2. Jenkins - Used for CI/CD; can run linear pipelines. 3. Prefect - A new but powerful tool in Python. 4. Argo - Does CI/CD, but its workflow engine is also useful; runs Kubeflow. Factors: 1. Popularity - Traction in big and small companies. 2. Function - Lets teams orchestrate complex workflows of tasks as a DAG (directed acyclic graph). 3. Staying Power - Airflow is future-proof on Kubernetes, and Argo is the new kid on Kubernetes. Jenkins is in use at many companies.
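Slide 18's orchestrators model a workflow as a DAG of tasks. Python's standard-library graphlib (3.9+) can show the core scheduling idea: a task becomes ready only once all of its upstream tasks are done. The pipeline and task names below are hypothetical, and run() only prints where a real orchestrator such as Airflow would launch work:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "publish_report": {"join"},
}


def run(task: str) -> None:
    print(f"running {task}")  # placeholder for launching a real job


sorter = TopologicalSorter(dag)
sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
order = []
while sorter.is_active():
    ready = sorter.get_ready()  # tasks whose dependencies have all finished
    for task in sorted(ready):  # sorted only to make the output deterministic
        run(task)
        order.append(task)
        sorter.done(task)  # unblocks any downstream tasks

print(order)
# ['extract_customers', 'extract_orders', 'clean_orders', 'join', 'publish_report']
```

Tasks returned by the same get_ready() call have no dependencies on each other, which is where an orchestrator can run work in parallel.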
  • 19. Data Warehouse - Running SQL at large scale.
  • 20. Data Warehouse - Analytics Across Data. Tools: 1. Redshift - Widely used because it comes from Amazon. 2. BigQuery - Well-integrated query engine on Google Cloud. 3. Snowflake - Does a bit of data engineering as well as being a query engine. 4. MS SQL/Oracle - Commercial databases offer a data warehouse configuration. Factors: 1. Popularity - Warehousing conventions (dimensions, facts) have been around for a while. 2. Function - After bringing data together and relating it, you can run massive SQL queries. 3. Staying Power - The theory isn't going anywhere. Technologies may change, but the core concept is solid.
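Slide 20 mentions the dimension and fact conventions of warehousing. Here is a minimal star-schema sketch in SQLite with hypothetical table names and data: the fact table holds measures (sale amounts), the dimension table holds descriptive attributes (product category), and the typical query joins and aggregates them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes, one row per product.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- Fact: one row per sale, holding a measure plus a foreign key to the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (10, 1, 12.5), (11, 2, 60.0), (12, 1, 7.5);
""")

# Typical warehouse query: join facts to a dimension and aggregate the measure.
totals = conn.execute("""
    SELECT d.category, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(totals)  # [('books', 20.0), ('games', 60.0)]
```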
  • 21. Query - Virtualizing data through standard query engines.
  • 22. Query Engines - Analytics Across Data Sources. Tools: 1. Apache Hive - Available in the Hadoop ecosystem, or as variants from cloud vendors. 2. Spark SQL / Hive - Like Hive, but on Spark. 3. PrestoDB - Open data virtualization; can run on Spark and works with Hive. 4. Denodo - Commercial data virtualization; can run on Spark. Factors: 1. Popularity - Hive is a standard and works across systems such as Spark and Hadoop. Presto is popular, and Denodo is gaining ground. 2. Function - Separates storage from query; “virtualizes” queries. 3. Staying Power - The idea of separating storage from query is now implemented in Snowflake and Redshift. These will stick.
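Slide 22's "separate storage from query" idea relies on the engine knowing how files are laid out. Hive-style partitioning encodes partition values in directory names (key=value), so engines such as Hive, Spark SQL, or Presto can skip files that a filter rules out. A hypothetical sketch of that partition-pruning step over a plain list of paths:

```python
from typing import Dict, List

# Hypothetical data lake listing with Hive-style "key=value" partition directories.
FILES = [
    "sales/dt=2024-01-01/region=eu/part-0000.parquet",
    "sales/dt=2024-01-01/region=us/part-0000.parquet",
    "sales/dt=2024-01-02/region=eu/part-0000.parquet",
]


def partition_values(path: str) -> Dict[str, str]:
    # Pull key=value pairs out of the directory components (not the file name).
    return dict(p.split("=", 1) for p in path.split("/")[:-1] if "=" in p)


def prune(files: List[str], **filters: str) -> List[str]:
    # Keep only files whose partition values match every filter,
    # so the engine never has to open the others.
    return [
        f for f in files
        if all(partition_values(f).get(k) == v for k, v in filters.items())
    ]


print(prune(FILES, dt="2024-01-01"))               # first two files
print(prune(FILES, dt="2024-01-01", region="eu"))  # first file only
```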
  • 23. Business Intelligence - Visualization and dashboarding data for consumers.
  • 24. Business Intelligence Tools for Data Engineers. Tools: 1. Tableau - Very popular because it gives people community access. 2. Looker - Commercial-grade tool; expect a good UI. 3. Redash - Powerful open-source tool for data professionals to build reports and dashboards. 4. Metabase - Easy-to-use tool for people who aren't admins or DBAs. Factors: 1. Popularity - BI is HUGE. Learning it is not just about the tool; tools are always coming and going. 2. Function - Lets non-programmers discover and analyze data and create visualizations and reports that other non-technical people can consume. 3. Staying Power - Tableau will stick around. The open-source Redash is now supported by Databricks.
  • 26. DevOps Tools for Data Engineering. Tools: 1. Terraform - Manage different clouds with one language. 2. Prometheus / Grafana - The O.G. of time-series system data visualization. 3. Ansible - Better organizes the commands you need to run: setup, configuration, and ad-hoc commands. More Tools: 1. Docker - Customize your image. 2. Kubernetes - Run your cluster. 3. Argo - CI/CD for containers in Kubernetes land. 4. Jenkins - General-purpose CI/CD; can be used to run other tools.
  • 28. Create and manage global data platforms. www.anant.us | solutions@anant.us | (855) 262-6826 3 Washington Circle, NW | Suite 301 | Washington, DC 20037