Streaming Data Integration
with Apache Kafka
About Gwen
Gwen Shapira – System Architect @Confluent
PMC @ Apache Kafka
Moving data around since 2000
Previously:
• Software Engineer @ Cloudera
• Oracle Database Consultant
Find me:
• gwen@confluent.io
• @gwenshap
The Plan
1. What is Data Integration About?
2. How have things changed?
3. What is difficult and important?
4. How do we solve these things with Kafka?
Data Integration
Making sure the right data
gets to the right places
10 years ago…
Informatica
DataStage
Manual Optimizations
5 years ago…
Today…
• Everything streaming
• Everything real-time
• Everything in-memory
• Everything containers
• Everything clouds
These Things Matter
• Reliability – Losing data is (usually) not OK.
• Exactly Once vs At Least Once
• Timeliness
• Push vs Pull
• High throughput, Varying throughput
• Compression, Parallelism, Back Pressure
• Data Formats
• Flexibility, Structure
• Security
• Error Handling
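Many of the concerns above map directly onto client configuration. A minimal sketch of a producer tuned for reliability and throughput, assuming the standard Java client (broker and topic names are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Reliability: wait for all in-sync replicas; idempotence keeps retries from creating duplicates.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // Throughput: compress and batch.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        props.put(ProducerConfig.LINGER_MS_CONFIG, "20");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clicks", "user-42", "page-view"));
            producer.flush();
        }
    }
}
```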
After: Stream Data Platform with Kafka
(diagram) Kafka at the center – distributed, fault tolerant, stores messages, processes streams – connecting user tracking, operational logs, operational metrics, Espresso, Cassandra, Oracle, Hadoop, log search, monitoring, the data warehouse, and applications such as search, security, and fraud detection.
Introducing
Kafka Connect
Large-scale streaming data import/export for Kafka
Overview of Connect
1. Install a cluster of Workers
2. Download / Build and install Connector Plugins
3. Use the REST API to start and configure connectors (see the example request below)
4. Connectors start Tasks. Tasks run inside Workers and copy data.
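For step 3, the worker's REST API (port 8083 by default) accepts a JSON connector definition. A sketch using Java's built-in HTTP client and the FileStreamSource connector that ships with Kafka; host, file path, and topic are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnector {
    public static void main(String[] args) throws Exception {
        // Connector definition: a name plus connector-specific configuration.
        String body = """
            {
              "name": "local-file-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/var/log/app.log",
                "topic": "app-logs"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The same API also exposes endpoints to list connectors, check their status, and update their configuration.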
Questions?
Editor's Notes
  1. More types of data stores with specialized functionality – e.g. rise of NoSQL systems handling document-oriented and columnar stores. A lot more sources of data. Rise of secondary data stores and indexes – e.g. Elasticsearch for efficient text-based queries, graph DBs for graph-oriented queries, time series databases. A lot more destinations for data, and a lot of transformations along the way to those destinations. Real-time: data needs to be moved between these systems continuously and at low latency.
  2. Unfortunately, as you build up large, complex data pipelines in an ad hoc fashion by connecting different data systems that need copies of the same data with one-off connectors for those systems, or build out custom connectors for stream processing frameworks to handle different sources and sinks of streaming data, we end up with a giant, unmaintainable mess. This mess has a huge impact on productivity and agility once you get past just a few systems. Adding any new data storage system or stream processing job requires carefully tracking down all the downstream systems that might be affected, which may require coordinating with dozens of teams and code spread across many repositories. Trying to change one data source’s data format can impact many downstream systems, yet there’s no simple way to discover how these jobs are related. This is a real problem that we’re seeing across a variety of companies today. We need to do something to simplify this picture. While Confluent is working to build out a number of tools to help with these challenges, today I want to focus on how we can standardize and simplify constructing these data pipelines so that, at a minimum, we reduce operational complexity and make it easier to discover and understand the full data pipeline and dependencies.
  3. One step towards getting to a separation of concerns is being able to decouple the E, T, and L steps. Kafka, when used as shown here, can help us do that. The vision of Kafka when originally built at LinkedIn was for it to act as a common hub for real-time data. When streaming data from data stores like RDBMS or K/V store, we produce data into Kafka, making it available to as many downstream consumers as want it. Save data to other systems like secondary indexes and batch storage systems, which are implemented with consumers. Stream processing frameworks and custom consumer apps fit in by being both consumers and producers – reading data from Kafka transforming it, and then possibly publishing derived data back into Kafka. Using this model can simplify the problem as we’re now always interacting with Kafka.
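As an illustration of the consume-transform-publish-back loop described above, here is a minimal sketch using Kafka Streams (one of several ways to do this; topic names and the transformation are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw events from Kafka, transform them, and publish the derived stream back to Kafka.
        KStream<String, String> raw = builder.stream("page-views");
        raw.mapValues(value -> value.toUpperCase())
           .to("page-views-enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```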
  4. To set some context, I want to just quickly list a few of the features that make it possible for Kafka to handle data at this scale. We’ll come back to many of these properties when looking at Kafka Connect. At its core, pub/sub messaging system rethought as distributed commit log. Based on an append-only and sequentially accessed log, which results in very high performance reading and writing data. Extends the model to a *partitioned stream* model for a single logical topic of data, which allows for distribution of data on the brokers and parallelism in both writes and reads. In order to still provide organization and ordering within a single partition, it guarantees ordering within each partition and uses keys to determine which partition to put data in. As part of its append-only approach, it decouples data consumption from data retention policy, e.g. retaining data for 7 days or until we have 1TB in a topic. This both gets rid of individual message acking and allows multiple consumption of the same data, i.e. pub/sub, by simply tracking offsets in the stream. Because data is split across partitions, we can also parallelize consumption and make it elastically scalable with Kafka’s unique automatically balanced consumer groups.
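To make the partitioning and offset model concrete: records that share a key land in the same partition, and each record gets a monotonically increasing offset within that partition. A small sketch with the standard Java producer (broker, topic, and key are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Same key => same partition => strictly ordered by offset within that partition.
                RecordMetadata meta = producer
                        .send(new ProducerRecord<>("user-events", "user-42", "event-" + i))
                        .get();
                System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
            }
        }
    }
}
```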
  5. But what exactly is Kafka? At a high level, it's "just" another pub/sub message queue, but a few key features make it scale to handle the requirements of a stream data platform. Multiple consumers can read the same data, and can be at different offsets in the log. Consuming data doesn't delete it from the log. Instead, Kafka uses time- or size-based retention: your data will stick around for, e.g., 7 days or until you have 100GB. This retention policy is simple and avoids having to keep accounting info for individual messages.
  6. Topics are partitioned so they can scale across multiple servers. Partitions are also replicated for fault tolerance.
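Partition count and replication factor are chosen when a topic is created, and retention is just a topic-level config. A sketch using the Java AdminClient, with illustrative sizing:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions for parallelism, 3 replicas for fault tolerance.
            NewTopic topic = new NewTopic("page-views", 12, (short) 3)
                    // Retention is a policy, not a consequence of consumption: keep data for 7 days.
                    .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```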
  7. As I mentioned before, Kafka is multi-subscriber: the same topic can be consumed by multiple consumer groups, and each group reads its own full copy of the data. Furthermore, every consumer group can have multiple consumer processes distributed over several machines, and Kafka takes care of assigning the partitions of the subscribed topics evenly among the consumer processes in a group, so that at all times every partition of a subscribed topic is being consumed by some consumer process within the group. In addition to being easy to scale, consumption is also fault tolerant: if one consumer fails, the others automatically rebalance to pick up the load of the failed instance. So it is operationally cheap to consume large amounts of data.
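The consumer-group behaviour described here is driven entirely by group.id: every consumer that subscribes with the same group id shares the topic's partitions and participates in rebalancing. A minimal sketch (group and topic names are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fraud-detector"); // all members of this group share the partitions
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```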
  8. Today, I want to introduce you to Kafka Connect, Kafka’s new large-scale, streaming data import/export tool that drastically simplifies the construction, maintenance, and monitoring of these data pipelines. Kafka Connect is part of the Apache Kafka project, open source under the Apache license, and ships with Kafka. It’s a framework for building connectors between other data systems and Kafka, and the associated runtime to run these connectors in a distributed, fault tolerant manner at scale.
  9. Goals:
Focus – copying only.
Batteries included – the framework does all the common stuff so connector developers can focus specifically on the details that need to be customized for their system. This covers a lot more than many connector developers realize: beyond managing the producer or consumer, it includes challenges like scalability, recovery from faults and reasoning about delivery guarantees, serialization, connector control, monitoring for ops, and more.
Standardize – configuration, status and connector control, monitoring, etc. Parallelism, scalability, and fault tolerance built in, without a lot of effort from connector developers or users.
Scale – in two ways. First, scale individual connectors to copy as much data as possible – ingest an entire database rather than one table at a time. Second, scale up to organization-wide data pipelines or down to development, testing, or just copying a single log file into Kafka.
With these goals in mind, let's explore the design of Kafka Connect to see how it fulfills them.
  10. At its core, Kafka Connect is pretty simple. It has source connectors which copy data from another system into Kafka, and sink connectors that copy data from Kafka into a destination system. Here I've shown a couple of examples. The source and sink systems don't necessarily have to naturally match Kafka's data model exactly. However, we do need to be able to translate data between the two. For example, we might load data from a database in a source connector. By using a timestamp column associated with each row, we can effectively generate an ordered stream of events that are then produced into Kafka. To store data into HDFS, we might load data from one or more topics in Kafka and then write it in sequence to files in an HDFS directory, rotating files periodically. Although Kafka Connect is designed around streaming data, because Kafka acts as a good buffer between streaming and batch systems, we can use it here to load data into HDFS. Neither of these systems maps directly to Kafka's model, but both can be adapted to the concepts of streams with offsets. More about this in a minute. The most important design point for Kafka Connect is that one half of a connection is always Kafka – the destination for sources, or the source of data for sink connectors. This allows the framework to handle the common functionality of connectors while maintaining the ability to automatically provide scalability, fault tolerance, and delivery guarantees without requiring a lot of effort from connector developers. This key assumption is what makes it possible for Kafka Connect to get a better set of tradeoffs than the systems I mentioned earlier.
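As a concrete illustration of the database-source example, a JDBC source connector is typically pointed at a timestamp (and/or incrementing ID) column so each table becomes an ordered stream. The keys below follow the community (Confluent) kafka-connect-jdbc connector and the values are placeholders; treat this as a sketch, not a tested configuration:

```java
import java.util.Map;

public class JdbcSourceConfigSketch {
    // Submitted as the "config" object of a connector definition via the Connect REST API.
    static final Map<String, String> CONFIG = Map.of(
            "connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url", "jdbc:postgresql://db-host:5432/shop", // placeholder database
            "mode", "timestamp",                    // use a timestamp column as the stream offset
            "timestamp.column.name", "updated_at",  // placeholder column
            "topic.prefix", "shop-",                // one topic per table: shop-<table>
            "tasks.max", "4"                        // upper bound on copy parallelism
    );
}
```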
  11. So now, coming back to the model that connectors need to map to. Just as Kafka's data model enables certain features around scalability, Kafka Connect's data model can as well. Kafka Connect requires every connector to map to a "partitioned stream" model. The basic idea is a generalization of Kafka's data model of topics and partitions. This mapping is defined by the input system for the connector – the source system for source connectors, and Kafka topics for sink connectors – and has the following:
A set of partitions which divide the whole set of data logically. Unlike Kafka, the number of partitions can potentially be very large and may be more dynamic than we would expect with Kafka.
Each partition contains an ordered sequence of events/messages. Under the hood these are key/value pairs with byte[], but Kafka Connect requires that they can be converted into a generic data API.
Each event/message has a unique offset representing its position in the partition. Since the mapping is determined by the input system, these offsets must be meaningful to that system – these may be quite different from the Kafka offsets you're used to.
  12. To give a more concrete example, we can revisit the database example from earlier. Previously I only showed a single table, but if we consider the database as a whole, we can apply this model to copy the entire database. We partition by table, delivering each into its own Kafka topic. Each event represents a row that we’ve inserted into the database. The offsets are IDs or timestamps, or even more complex representations like a combination of ID and timestamp. Although there isn’t *actually* a stream for each table, we can effectively construct one by querying the database and ordering results according to specific rules. As a result of this model, we can see a few properties emerging: First, we have a built-in concept of parallelism, a requirement for automatically providing scalable data copying. We’re going to be able to distribute processing of partitions across multiple hosts. Second, this model encourages making copying broad by default – partitioned streams should cover the largest logical collection of data. Finally, offsets provide an easy way to track which data has been processed and which still needs to be copied. In some cases, mapping from the native data model to streams may not be simple; however, a bit of effort in creating this mapping pays off by providing a common framework and implementation for tracking which data has been copied. Again, we’ll revisit this a bit later, but this allows the framework to handle a lot of the heavy lifting with regards to delivery semantics.
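In the Connect source API this mapping is explicit: every record a task returns from poll() carries a source partition (here, the table) and a source offset (here, a timestamp) that the framework checkpoints on the connector's behalf. A sketch with illustrative values:

```java
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

public class RowToRecordSketch {
    static List<SourceRecord> example() {
        Map<String, String> sourcePartition = Map.of("table", "orders");     // logical partition
        Map<String, Long> sourceOffset = Map.of("timestamp", 1690000000000L); // position within it
        // Topic per table; a real connector would build a Struct schema from the row.
        return List.of(new SourceRecord(
                sourcePartition, sourceOffset,
                "shop-orders",
                Schema.STRING_SCHEMA, "order 1234, total 99.90"));
    }
}
```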
  13. Partitioned streams are the logical data model, but they don’t directly map to physical parallelism, or threads, in Kafka Connect. In the case of the database connector, a direct mapping might seem reasonable. However, some connectors will have a much larger number of partitions that are much finer-grained. For example, consider a connector for collecting metrics data – each metric might be considered its own partition, resulting in tens of thousands of partitions for even a small set of application servers. However, we do want to exploit the parallelism provided by partitions. Connectors do this by assigning partitions to tasks. Tasks are, simply, threads of control given to the connector code which perform the actual copying of data. Each connector is given a thread it can use to monitor the input system for the active set of partitions. Remember that this set can be dynamic, so continuous monitoring is sometimes needed to detect changes to the set of partitions. When there are changes, the connector notifies the framework so it can reconfigure the current set of tasks. Then, each task is given a dedicated thread for processing. The connector assigns a subset of partitions to each task and the task is the one that actually copies the data for that partition. Given the assignment, the connector implementer handles the reading or writing data from that set of partitions. And how do we decide how many tasks to generate? That’s up to the user, and it’s the primary way to control the total resources used by the connector. Since each task corresponds to a thread, the user can choose to dynamically increase or decrease the maximum number of tasks the connector may create in order to scale resource usage up or down. So now we have some set of threads, but where do they actually execute? Kafka Connect has two modes of execution.
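The partition-to-task assignment described here is what a connector computes in its taskConfigs(maxTasks) method: one configuration map per task, each naming the subset of partitions (tables, in the database example) that task should copy. A simplified sketch; the "tables" config key is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TableAssignmentSketch {
    // Roughly what a database source connector's taskConfigs(maxTasks) computes:
    // spread the monitored tables round-robin over at most maxTasks task configurations.
    static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        int numTasks = Math.min(maxTasks, tables.size());
        List<StringBuilder> groups = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) groups.add(new StringBuilder());
        for (int i = 0; i < tables.size(); i++) {
            StringBuilder group = groups.get(i % numTasks);
            if (group.length() > 0) group.append(',');
            group.append(tables.get(i));
        }
        List<Map<String, String>> configs = new ArrayList<>();
        for (StringBuilder group : groups) {
            configs.add(Map.of("tables", group.toString())); // "tables" is a hypothetical key
        }
        return configs;
    }
}
```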
  14. In contrast, distributed mode can scale up while providing distribution and fault tolerance. Recall that each connector or task is a thread, and we're considering each to be approximately equal in terms of resource usage. Connectors and tasks are auto-balanced across workers. Failures are automatically handled by redistributing work, and you can easily scale the cluster up or down by adding or removing workers. Cool implementation note: this reuses the group membership functionality of consumer groups. Note how if you replace "worker" with "consumer" and "task" with "topic partition", the things it is doing look largely the same: assigning tasks to workers, detecting when a worker is added or fails, and rebalancing the work. Kafka already provides support for doing a lot of this, so by leveraging the existing implementation and coordinating through Kafka's group functionality (with internal data stored in Kafka topics), Kafka Connect can provide this functionality in a relatively small code footprint. All of this functionality can be accessed via the REST API – submit connectors, see their status, update configs, and so on. Finally, note that Kafka Connect does not own process management at all. We don't want to make assumptions about using Mesos, YARN, or any other tool because that would unnecessarily limit Kafka Connect's usage. Kafka Connect will work out of the box in any of these cluster management systems, or with orchestration tools, or if you just manage your processes with your own tooling.
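For example, in addition to submitting connectors, recent Connect versions let you ask a worker for a connector's status over the same REST API (connector name and host are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorStatus {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/local-file-source/status"))
                .GET()
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // JSON describing the connector's state and the state and worker of each of its tasks.
        System.out.println(response.body());
    }
}
```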
  17. I want to mention two important features that also simplify both connector developers' and users' lives. The first feature is offset management, which provides for standardized data delivery guarantees. Delivery guarantees are actually rarely provided in many other systems. They generally offer some sort of best effort, but unreliable, delivery. Ironically, stream processing frameworks often do a better job than tools specifically designed for data copying. Kafka Connect handles offset checkpointing for connectors, and this fits in as a natural extension to Kafka's offset commit functionality. For sources this works with offsets that have complex structure (e.g. timestamps + autoincrementing IDs in a database) and requires no implementation support from the connector beyond defining the offsets and being able to start reading from a saved offset. For sinks, we can leverage Kafka's existing offset functionality, but in order to ensure data is completely written, sinks must also support a flush operation. Commits are automatically processed periodically. By default, this mode of managing offsets will provide at-least-once delivery; internally both sources and sinks are simply flushing all data to the output and then committing offsets. Note that some connectors will opt out of this functionality in order to provide even stronger guarantees. For example, the HDFS connector manages its own offsets because (carefully) tracking them in HDFS along with the data allows for exactly-once delivery.
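For sinks, the flush hook looks roughly like this: the framework delivers records through put(), then periodically calls flush() with the offsets it is about to commit, and the task must make everything up to those offsets durable first. A skeletal sketch of a sink task:

```java
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class LoggingSinkTask extends SinkTask {
    @Override public String version() { return "0.1-sketch"; }

    @Override public void start(Map<String, String> config) { /* open a connection to the target system */ }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Buffer or write records to the destination system.
        for (SinkRecord record : records) {
            System.out.printf("%s-%d@%d: %s%n",
                    record.topic(), record.kafkaPartition(), record.kafkaOffset(), record.value());
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // Make everything delivered so far durable before Connect commits these offsets:
        // this is what gives the framework its at-least-once guarantee for sinks.
    }

    @Override public void stop() { /* close the connection */ }
}
```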
  18. The second feature I want to mention is converters. Serialization formats may seem like a minor detail, but not separating the details of data serialization in Kafka from the details of source or sink systems results in a lot of inefficiency: a lot of code for doing simple data conversions is duplicated across a large number of ad hoc connector implementations, and each connector ultimately contains its own set of serialization options as it is used in more environments – JSON, Avro, Thrift, protobufs, and more. Much like the serializers in Kafka's producer and consumer, Converters abstract away the details of serialization. Converters are different because they guarantee data is transformed to a common data API defined by Kafka Connect. This API supports both schema and schemaless data, common primitive data types, complex types like structs, and logical type extensions. By sharing this API, connectors write one set of translation code and Converters handle format-specific details. For example, the JDBC connector can easily be used to produce either JSON or Avro to Kafka, without any format-specific code in the connector.
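Converters are chosen in the worker (or per-connector) configuration rather than in connector code. A sketch of typical settings, shown as a map for illustration; the Avro converter mentioned in the comment is part of Confluent's distribution rather than Apache Kafka:

```java
import java.util.Map;

public class ConverterConfigSketch {
    // Typical worker-level settings; every connector on the worker inherits them unless overridden.
    static final Map<String, String> CONVERTERS = Map.of(
            "key.converter", "org.apache.kafka.connect.json.JsonConverter",
            "value.converter", "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable", "false"
            // Alternative (Confluent Platform): value.converter = io.confluent.connect.avro.AvroConverter,
            // plus value.converter.schema.registry.url = http://schema-registry:8081
    );
}
```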
  19. Kafka Connect provides the framework, but I want to spend a few minutes describing the current state of the connector ecosystem. While the framework ships with Apache Kafka, connectors use a federated approach to development. Confluent helped kick off connector development with a few key open source connectors – JDBC, for importing data from any relational database, and HDFS, for exactly-once delivery of data into HDFS and Hive. Confluent will be continuing to add more open source connectors. We've also started tracking connectors that the community has been developing on a page we're calling the Connector Hub. We've already got a dozen or so connectors, and more are popping up every week. We'll be working to make this index as useful to users as possible, offering information about the current state of the connector implementations and feature sets.
  20. With all these pieces you can see how we can tie together Kafka and Kafka Connect with stream processing frameworks and applications to not only simplify building these data pipelines and solve data integration challenges, but also transform how your company manages its data pipelines. Kafka provides the central hub for real-time data and Kafka Connect simplifies operationalization: one service to maintain, common metrics, common monitoring, and agnostic to your choice of process and cluster management. You can run a centrally managed Kafka Connect cluster in distributed mode, accessed via the REST API, allowing your ops team to provide data integration as a service to your entire organization. Developers who want to build a complex data pipeline can submit jobs to copy data into and out of Kafka – it's zero coding (assuming a connector is available). Then they can easily leverage either the traditional clients or stream processing frameworks to transform that data. The output is stored back into another Kafka topic or served up directly. As a side benefit, standardizing on Kafka encourages reuse of existing data (both raw and transformed). Providing this service not only makes it easy to build your *own* complex data pipeline, it encourages other people in the org to build on top of your existing work. Confluent Platform also provides additional tools that make this setup even more powerful. For example, the schema registry controls the format of data in each topic, and besides ensuring data quality and compatibility, it also encourages decoupling of teams by allowing anyone to discover what data is in a topic, grab its schema, and immediately start utilizing that data without ever adding coordination overhead with another team. A stream data platform built around Kafka and Kafka Connect allows you to scale to handle your entire organization's real-time data, while maintaining simple management and easy operationalization of your data pipeline.