The term “Lambda Architecture” stands for a generic, scalable, and fault-tolerant data processing architecture. As hyperscale cloud providers now offer a variety of PaaS services for data ingestion, storage, and processing, the need for a revised, cloud-native implementation of the Lambda Architecture is arising.
In this talk we demonstrate a blueprint for such an implementation in Microsoft Azure, with Azure Databricks (a PaaS Spark offering) as a key component. We go back to some core principles of functional programming and link them to the capabilities of Apache Spark for various end-to-end big data analytics scenarios.
We also illustrate the “Lambda Architecture in use” and the associated trade-offs through a real customer scenario: the Rijksmuseum in Amsterdam, where a terabyte-scale Azure-based data platform handles data from 2,500,000 visitors per year.
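The core idea of the Lambda Architecture referenced above can be sketched without any framework: a batch view is recomputed over the immutable master dataset, a speed view covers only the most recent events, and the serving layer merges both at query time. This is a minimal, illustrative sketch; the event shape and `visitor_id` field are assumptions, not the actual Rijksmuseum data model.

```python
# Minimal sketch of the Lambda Architecture's batch/speed/serving split.
# All names (visitor_id, the event dicts) are hypothetical.
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over all historical events."""
    return Counter(event["visitor_id"] for event in master_dataset)

def speed_view(recent_events):
    """Speed layer: cover only events not yet absorbed by the batch layer."""
    return Counter(event["visitor_id"] for event in recent_events)

def query(batch, speed):
    """Serving layer: merge batch and speed views at query time."""
    return batch + speed

master = [{"visitor_id": "a"}, {"visitor_id": "b"}, {"visitor_id": "a"}]
recent = [{"visitor_id": "b"}]
print(query(batch_view(master), speed_view(recent)))  # → Counter({'a': 2, 'b': 2})
```

In a Spark implementation, `batch_view` corresponds to a periodic batch job and `speed_view` to a streaming job; the same merge-at-query-time principle applies.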
2. Selfie
Data & AI Lead
@DrGigabit
andrei.varanovich
andrei.varanovich@inspark.nl
Interests: Data · Programmability · Cloud · High-performance teams · Neural Networks
##SAISDev6
7. IN NEED OF A PLATFORM
START
• Begin small and focused
• Prove value
GROW
• Grow organically as more use cases arise
SCALE
• Go to production and scale to the level required
We need a truly elastic data platform, to avoid any upfront planning, deployment, and operations expenses, and to put business-value discovery first. The platform should support [big] data projects at any stage, without the need to re-engineer the whole solution.
8. Lambda Architecture on Azure
[Architecture diagram]
• Data sources: Social, LOB, Graph, IoT, Image, CRM
• INGEST (BATCH): Azure Data Factory
• INGEST (STREAM): Event Hubs
• STORE: Azure Data Lake Store
• ANALYZE: Azure Databricks (managed Spark; batch & streaming)
• AI models / APIs: Cognitive Services, Azure Container Service & Registry
• Insight sharing: Power BI / other tools
• SECURITY & MANAGEMENT: Azure Active Directory, Azure Log Analytics, Azure Graph API, cost monitoring
9. Azure Databricks
[Architecture diagram]
• Optimized Databricks Runtime Engine: Databricks I/O, Serverless
• Collaborative workspace shared by data engineers, data scientists, and business analysts
• Inputs: cloud storage, data warehouses, Hadoop storage, IoT / streaming data, REST APIs
• Outputs: machine learning models, BI tools, data exports, data warehouses
• Production jobs & workflows: Apache Spark multi-stage pipelines, job scheduler, notifications & logs
17. Conclusions
… with proper design, the features come cheaply. This
approach is arduous, but continues to succeed.
—Dennis Ritchie
• Standardization on Apache Spark allows us to move forward without introducing extra complexity.
• A 100% PaaS offering is important: there is no infrastructure to maintain. All components we use are offered as PaaS on Azure.
• Modeling data pipelines as function composition lets us ensure end-to-end consistency and spot errors quickly.
• Saving intermediate states lets us quickly inspect the data sets.
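The last two conclusions can be sketched together: a pipeline built by composing stage functions, where every intermediate state is retained so it can be inspected when something goes wrong. This is a hedged, framework-free sketch; the stage names are hypothetical, not the actual Rijksmuseum pipeline.

```python
# Sketch: a data pipeline as function composition, keeping each
# intermediate state for inspection. Stage names are illustrative.
def pipeline(*stages):
    """Compose stages left to right; return the result plus all intermediate states."""
    def run(data):
        states = [data]
        for stage in stages:
            states.append(stage(states[-1]))
        return states[-1], states
    return run

clean = lambda rows: [r.strip() for r in rows if r.strip()]  # drop blanks, trim
parse = lambda rows: [int(r) for r in rows]                  # strings -> ints
total = lambda xs: sum(xs)                                   # aggregate

run = pipeline(clean, parse, total)
result, states = run([" 1 ", "2", "", "3 "])
print(result)     # → 6
print(states[1])  # → ['1', '2', '3'] (inspectable intermediate state)
```

In Spark terms, each stage would be a transformation over a DataFrame, and the saved intermediate states would be persisted data sets (e.g. in Azure Data Lake Store), which is what makes errors quick to localize.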