shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014


This document discusses SQL-on-Hadoop tools like Shark and SparkSQL. Shark sits on top of Apache Spark and is tightly coupled with Hive, using Hive statements and metadata. It provides faster performance than Hive due to in-memory processing. SparkSQL is a new tool that is not dependent on Hive and uses a new SchemaRDD. The document recommends using columnar file formats like Parquet for better performance and disk usage and provides a hands-on demonstration comparing file formats and query execution times in Hive, Impala and Shark.


Shark attack on SQL-on-Hadoop
Gerd König
May 27th, 2014
(Big) Data Engineer

Agenda
● SQL-on-Hadoop
○ Why that hype?
○ Tool overview and comparison
○ File format matters
● Shark
○ Facts & figures
○ What makes the difference?
○ SparkSQL enters the playground
● Hands-On (quick ‘n dirty)
○ File formats & disk usage
○ Execution times (at a rough estimate) / Benchmarking
● Summary

SQL-on-Hadoop - Why that hype?
● Hadoop is widely accepted as “new technology”
● Hadoop is becoming more and more enterprise-ready
● SQL has been a well-established language for many years and is used by DB developers as well as business analysts
=> Huge demand for SQL(-like) access to Hadoop

SQL-on-Hadoop
A whole bunch of tools (just an excerpt)

Shark - Facts & figures
● sits on top of Apache Spark
● is tightly coupled with Hive and uses a slightly modified version of it
● uses Hive statements, UDFs and the Hive metastore (HCatalog)
● can be run in the Shark shell as well as via Shark Server (connect e.g. via the beeline JDBC client)

Shark / SparkSQL
● What makes the difference?
○ Performance increase due to in-memory processing (‘low-latency M/R’)
○ Interaction with other “plugins” of the Spark stack, like the ML library, e.g. call ML functions directly on your SQL result set:
    // sql2rdd runs the query and returns the result set as an RDD
    val youngUsers = sql2rdd("SELECT * FROM users WHERE age < 20")
    println(youngUsers.count)
    // extractFeatures and kmeans stand for user-supplied functions
    val featureMatrix = youngUsers.map(extractFeatures(_))
    kmeans(featureMatrix)
● SparkSQL - A new star is born?
○ no dependency on Hive, new type of RDD: “SchemaRDD”
○ runs SQL against RDDs, Parquet files and Hive (via wrappers)
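
The "SQL result set feeds ML code" pattern from the Shark snippet above can be tried without a cluster. The following is only a rough sketch under stated assumptions: Python's built-in sqlite3 module stands in for Shark, the `users` table and its rows are made up, and `extract_features` and `kmeans` are hypothetical helpers written for this illustration, not Shark or Spark APIs.

```python
import sqlite3


def extract_features(row):
    """Hypothetical feature extractor: (name, age, visits) -> numeric vector."""
    _, age, visits = row
    return (float(age), float(visits))


def kmeans(points, k=2, iterations=10):
    """Tiny k-means for illustration; seeds centroids with the k smallest points."""
    centroids = sorted(points)[:k]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, visits INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("ann", 15, 2), ("bob", 16, 3), ("cem", 19, 40), ("dea", 18, 42), ("eva", 35, 7)],
)

# Same shape as the slide: query, count, map to features, cluster.
young_users = conn.execute("SELECT * FROM users WHERE age < 20").fetchall()
print(len(young_users))
feature_matrix = [extract_features(row) for row in young_users]
centroids = kmeans(feature_matrix)
print(centroids)
```

The point of the slide is that in Shark the query result is already a distributed, in-memory RDD, so the hand-over to the ML step needs no export or reload; the sketch only mirrors the shape of that workflow on a single machine.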

File format matters
● The choice of file format influences
○ performance, and
○ disk space used
● Use a columnar storage format for columnar data(bases)
○ RCFile, ORC, Parquet
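
Why a columnar layout helps analytic queries can be shown with a toy example. This is a minimal sketch, not how RCFile, ORC or Parquet are actually implemented: the table, the fixed-width binary encoding and all function names are assumptions made for illustration. It stores the same rows once record by record and once column by column, then counts how many bytes a query over a single column has to read.

```python
import struct

# Made-up table: (user_id, age, country) with fixed-width fields.
ROWS = [(i, 18 + i % 50, "CH" if i % 2 else "DE") for i in range(1000)]
RECORD = struct.Struct("<ii2s")  # two 32-bit ints and a 2-byte country code


def encode_row_oriented(rows):
    """Store complete records one after another, like a flat file."""
    return b"".join(RECORD.pack(uid, age, country.encode()) for uid, age, country in rows)


def encode_column_oriented(rows):
    """Store each column contiguously, the basic idea behind columnar formats."""
    n = len(rows)
    return {
        "user_id": struct.pack(f"<{n}i", *(r[0] for r in rows)),
        "age": struct.pack(f"<{n}i", *(r[1] for r in rows)),
        "country": b"".join(r[2].encode() for r in rows),
    }


def avg_age_row_oriented(blob):
    """Every record has to be read in full, even though only 'age' is needed."""
    ages = [age for _, age, _ in RECORD.iter_unpack(blob)]
    return sum(ages) / len(ages), len(blob)


def avg_age_column_oriented(columns):
    """Only the 'age' column is touched; the other columns are never read."""
    data = columns["age"]
    ages = struct.unpack(f"<{len(data) // 4}i", data)
    return sum(ages) / len(ages), len(data)


row_avg, row_bytes = avg_age_row_oriented(encode_row_oriented(ROWS))
col_avg, col_bytes = avg_age_column_oriented(encode_column_oriented(ROWS))
print(f"row-oriented:    avg age {row_avg}, {row_bytes} bytes read")
print(f"column-oriented: avg age {col_avg}, {col_bytes} bytes read")
```

Both layouts return the same answer, but the column-oriented one reads only the bytes of the column the query needs. Real columnar formats add per-column encoding and compression on top of this, which is where the disk-space savings come from.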

Hands-On
● Part I
○ compare Parquet based table vs. flat file
● Part II
○ execute one query in Hive, Impala and Shark
○ get a feeling for the runtimes...

Further information
● Detailed benchmarks by Berkeley AMPLab:
https://amplab.cs.berkeley.edu/benchmark/
● Shark
http://shark.cs.berkeley.edu/
● SparkSQL
https://github.com/apache/spark/tree/master/sql
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html

THANKS for your attention !
Gerd König
gerd.koenig@ymc.ch
Tel. +41 (0)71 508 24 74
@gerd_koenig
ch.linkedin.com/in/gerdkoenig
Q&A


Corporate and higher education May webinar.pptx

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Strategies for Landing an Oracle DBA Job as a Fresher

Why Teams call analytics are critical to your entire business

FWD Group - Insurer Innovation Award 2024

Exploring the Future Potential of AI-Enabled Smartphone Processors

Boost Fertility New Invention Ups Success Rates.pdf

Ransomware_Q4_2023. The report. [EN].pdf

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf

shark attack on sql-on-hadoop Talk at BerlinBuzzwords 2014

1. Shark attack on SQL-on-Hadoop
Gerd König
May 27th, 2014
(Big) Data Engineer

2. Agenda
● SQL-on-Hadoop
○ Why that hype?
○ Tool overview and comparison
○ File formats matter
● Shark
○ Facts & figures
○ What makes the difference?
○ SparkSQL enters the playground
● Hands-On (quick ‘n dirty)
○ File formats & disk usage
○ Execution times (at a rough estimate) / Benchmarking
● Summary

3. SQL-on-Hadoop - Why that hype?
● Hadoop is widely accepted as “new technology”
● Hadoop gets more and more enterprise ready
● SQL has been a well-established language for many years and is used by DB developers as well as Business Analysts
=> Huge demand for SQL(-like) access to Hadoop

4. SQL-on-Hadoop A whole bunch of tools (just an excerpt)

5. SQL-on-Hadoop Clustering some tools

6. Shark - Facts & figures
● sits on top of Apache Spark
● is tightly coupled with Hive, uses a slightly modified version
● uses Hive statements, UDFs and the Hive metastore (HCatalog)
● can be run in the Shark shell as well as in Shark Server (connect e.g. via the beeline JDBC client)

7. Shark / SparkSQL
● What makes the difference?
○ Performance increase due to in-memory processing (‘low-latency M/R’)
○ Interaction with other “Plugins” of the Spark stack, like ML-library, e.g. call ML functions directly with your SQL resultset:
val youngUsers = sql2rdd("SELECT * FROM users WHERE age < 20")
println(youngUsers.count)
val featureMatrix = youngUsers.map(extractFeatures(_))
kmeans(featureMatrix)
● SparkSQL - A new star is born?
○ no dependencies on Hive, new type of RDD “SchemaRDD”
○ fires SQL against RDDs, Parquet files, Hive (via Wrappers)

8. File format matters
● An appropriate file format influences
○ performance, and
○ used disk space
● Use a columnar storage format for columnar data(bases)
○ RCFile, ORC, Parquet
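A minimal sketch of why a columnar layout can help, assuming a tiny in-memory "users" table like the one queried on the previous slide. All names here (ColumnarSketch, User, UserColumns, the byte estimates) are hypothetical and chosen for illustration; this is not how RCFile, ORC or Parquet are implemented, and it ignores compression and encoding entirely. It only shows that a query filtering on one column needs to touch far less data when each column is stored separately.

```scala
object ColumnarSketch {
  // Hypothetical record type; the fields mirror a simple "users" table.
  final case class User(id: Int, name: String, age: Int)

  // Column-oriented layout: one array per column instead of one record per row.
  final case class UserColumns(ids: Array[Int], names: Array[String], ages: Array[Int])

  def toColumns(rows: Seq[User]): UserColumns =
    UserColumns(rows.map(_.id).toArray, rows.map(_.name).toArray, rows.map(_.age).toArray)

  // Assumed value sizes: 4 bytes per Int, 1 byte per character of the name.
  private val IntBytes = 4
  def rowBytes(u: User): Int = IntBytes + u.name.length + IntBytes

  // Row layout: every complete record is read just to evaluate "age < 20".
  def countYoungRows(rows: Seq[User]): (Int, Long) = {
    val matches = rows.count(_.age < 20)
    val bytesRead = rows.map(u => rowBytes(u).toLong).sum
    (matches, bytesRead)
  }

  // Column layout: only the "age" column is read for the same predicate.
  def countYoungColumns(cols: UserColumns): (Int, Long) = {
    val matches = cols.ages.count(_ < 20)
    val bytesRead = cols.ages.length.toLong * IntBytes
    (matches, bytesRead)
  }

  def main(args: Array[String]): Unit = {
    val users = Seq(User(1, "alice", 17), User(2, "bob", 34), User(3, "carol", 19))
    val (rowMatches, rowBytesRead) = countYoungRows(users)
    val (colMatches, colBytesRead) = countYoungColumns(toColumns(users))
    println(s"row layout:    $rowMatches matches, $rowBytesRead bytes scanned")
    println(s"column layout: $colMatches matches, $colBytesRead bytes scanned")
  }
}
```

Both layouts return the same answer; the difference is the amount of data scanned. Real columnar formats typically add per-column compression on top, which is where the disk space savings come from.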

9. Hands-On
● Part I
○ compare Parquet based table vs. flat file
● Part II
○ execute 1 query in Hive, Impala and Shark
○ get a feeling about runtime...
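To get a rough feeling for runtimes, a small timing helper is often enough. The sketch below is an assumption about how one might measure, not the setup used in the talk: TimingSketch, timed and bestOf are hypothetical names, and the "query" is a stand-in computation rather than a real Hive, Impala or Shark call.

```scala
object TimingSketch {
  // Runs `query` once; returns its result and the elapsed wall-clock time in milliseconds.
  def timed[A](query: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = query
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  // Runs `query` several times and returns the fastest run in milliseconds,
  // which dampens warm-up and caching effects somewhat.
  def bestOf[A](runs: Int)(query: => A): Double = {
    require(runs > 0, "runs must be positive")
    (1 to runs).map(_ => timed(query)._2).min
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for submitting one SQL statement to an engine.
    def fakeQuery(): Int = (1 to 1000000).count(_ % 7 == 0)

    val (rows, firstMs) = timed(fakeQuery())
    println(f"first run: $rows rows in $firstMs%.1f ms")
    println(f"best of 5: ${bestOf(5)(fakeQuery())}%.1f ms")
  }
}
```

Comparing the first run with the best of several runs matters for in-memory engines, since a cached table can answer much faster than a cold one.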

10. Further information
● Detailed Benchmarks by Berkeley AmpLab: https://amplab.cs.berkeley.edu/benchmark/
● Shark http://shark.cs.berkeley.edu/
● SparkSQL https://github.com/apache/spark/tree/master/sql http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html

11. THANKS for your attention!
Gerd König
gerd.koenig@ymc.ch
Tel. +41 (0)71 508 24 74
@gerd_koenig
ch.linkedin.com/in/gerdkoenig
Q&A
