Sergey Kovalev (Altoros): Practical Steps to Improve Apache Hive Performance


Sergey Kovalev: Solutions Architect, Big Data/High-performance Computation Expert at Altoros, Minsk. Talk: "Practical Steps to Improve Apache Hive Performance"


Practical Steps to Improve Hive Query Performance
Sergey Kovalev
Software Engineer at Altoros

1. Use partitions whenever possible
/folder1/video_data/file1
id, title, channelId, description, uploadYear
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
3, title3, channelId3, description3, 2013
4, title4, channelId4, description4, 2013
/folder1/video_data/2012/file1
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
/folder1/video_data/2013/file1
3, title3, channelId3, description3, 2013
4, title4, channelId4, description4, 2013
SELECT * FROM video WHERE uploadYear = '2013-04-08';

1. Use partitions whenever possible
CREATE TABLE video (
  id STRING,
  title STRING,
  description STRING,
  viewCount BIGINT
) PARTITIONED BY (uploadYear DATE)
STORED AS ORC;
-- Dynamic-partition insert: the partition column must come last in the SELECT list.
INSERT INTO TABLE video PARTITION (uploadYear) SELECT * FROM video_external;
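The dynamic-partition insert above typically runs only when dynamic partitioning is allowed for the session. The following is a minimal sketch, assuming `video_external` exposes the columns `id`, `title`, `description`, `viewCount`, and `uploadYear` (the slides do not show its schema); the date literal is just the example value used on the previous slide.

```sql
-- Allow fully dynamic partition inserts (no static partition value given).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumed column list; the partition column goes last.
INSERT INTO TABLE video PARTITION (uploadYear)
SELECT id, title, description, viewCount, uploadYear
FROM video_external;

-- List the partitions that were created.
SHOW PARTITIONS video;

-- A filter on the partition column lets Hive read only the matching directory.
SELECT title FROM video WHERE uploadYear = '2013-04-08';
```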

2. Use bucketing
CREATE TABLE video (
  id STRING,
  channelId STRING,
  title STRING,
  description STRING
) CLUSTERED BY (channelId)
INTO 2 BUCKETS
STORED AS ORC;
CREATE TABLE channel (
  id STRING,
  title STRING,
  description STRING,
  viewCount BIGINT
) CLUSTERED BY (id)
INTO 2 BUCKETS
STORED AS ORC;
-- Both tables are bucketed on the join key, so matching rows fall into buckets with the same number.
SELECT v.title
FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;
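Bucketing only helps if the data was actually written into buckets and the join is allowed to use them. A minimal sketch of the session settings involved: `channel_external` is a hypothetical staging table, both staging tables are assumed to have matching columns, and `hive.enforce.bucketing` applies only to Hive releases before 2.0 (later releases always enforce bucketing, so the line should be omitted there).

```sql
SET hive.enforce.bucketing=true;       -- Hive < 2.0 only; omit on later releases
SET hive.optimize.bucketmapjoin=true;  -- let map joins work bucket by bucket

-- Load through Hive so rows are distributed into the declared buckets.
INSERT OVERWRITE TABLE video
SELECT id, channelId, title, description FROM video_external;

INSERT OVERWRITE TABLE channel
SELECT id, title, description, viewCount FROM channel_external;
```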

2. Use bucketing
/folder1/video_data/file1
id, title, channelId, description, uploadYear
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
3, title3, channelId3, description3, 2012
4, title4, channelId4, description4, 2012
5, title5, channelId5, description5, 2013
6, title6, channelId6, description6, 2013
7, title7, channelId7, description7, 2013
8, title8, channelId8, description8, 2013
/folder1/video_data/file1
2, title2, channelId2, description2, 2012
4, title4, channelId4, description4, 2012
6, title6, channelId6, description6, 2013
8, title8, channelId8, description8, 2013
/folder1/video_data/file2
1, title1, channelId1, description1, 2012
3, title3, channelId3, description3, 2012
5, title5, channelId5, description5, 2013
7, title7, channelId7, description7, 2013

2. Use bucketing
/folder1/channel_data/file1
id, title, description, viewCount
channelId1, title1, description1, viewCount1
channelId2, title2, description2, viewCount2
channelId3, title3, description3, viewCount3
channelId4, title4, description4, viewCount4
channelId5, title5, description5, viewCount5
channelId6, title6, description6, viewCount6
channelId7, title7, description7, viewCount7
channelId8, title8, description8, viewCount8
/folder1/channel_data/file1
channelId2, title2, description2, viewCount2
channelId4, title4, description4, viewCount4
channelId6, title6, description6, viewCount6
channelId8, title8, description8, viewCount8
/folder1/channel_data/file2
channelId1, title1, description1, viewCount1
channelId3, title3, description3, viewCount3
channelId5, title5, description5, viewCount5
channelId7, title7, description7, viewCount7
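The layouts above follow from how rows are assigned to buckets: roughly, the hash of the clustering column modulo the number of buckets. One rough way to preview the assignment uses Hive's built-in `hash` and `pmod` functions. This is an approximation for illustration only, since the exact hash function Hive applies when writing buckets depends on the Hive version and column type, and the even/odd split in the slides is itself schematic.

```sql
-- Approximate bucket number (0 or 1) for each row of the 2-bucket video table.
SELECT channelId,
       pmod(hash(channelId), 2) AS approx_bucket
FROM video;
```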

3. Partitions + bucketing
CREATE TABLE video (
  id STRING,
  channelId STRING,
  title STRING,
  description STRING,
  viewCount BIGINT
) PARTITIONED BY (uploadYear DATE)
CLUSTERED BY (channelId)
INTO 2 BUCKETS
STORED AS ORC;

3. Partitions + bucketing
/folder1/video_data/file1
id, title, channelId, viewCount, uploadYear
1, title1, channelId1, viewCount1, 2012
2, title2, channelId2, viewCount2, 2012
3, title3, channelId3, viewCount3, 2012
4, title4, channelId4, viewCount4, 2012
5, title5, channelId5, viewCount5, 2013
6, title6, channelId6, viewCount6, 2013
7, title7, channelId7, viewCount7, 2013
8, title8, channelId8, viewCount8, 2013
/folder1/video_data/2012/file1
2, title2, channelId2, viewCount2, 2012
4, title4, channelId4, viewCount4, 2012
/folder1/video_data/2012/file2
1, title1, channelId1, viewCount1, 2012
3, title3, channelId3, viewCount3, 2012
/folder1/video_data/2013/file1
6, title6, channelId6, viewCount6, 2013
8, title8, channelId8, viewCount8, 2013
/folder1/video_data/2013/file2
5, title5, channelId5, viewCount5, 2013
7, title7, channelId7, viewCount7, 2013
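With this layout, a query can combine both mechanisms: the partition filter limits the scan to one directory, and bucket sampling on the clustering column can narrow it further to one bucket file. A small sketch against the partitioned and bucketed `video` table defined above; the date literal is only an example value.

```sql
-- Read bucket 1 of 2 within a single partition.
SELECT v.id, v.title
FROM video TABLESAMPLE(BUCKET 1 OUT OF 2 ON channelId) v
WHERE v.uploadYear = '2013-04-08';
```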

4. Use join optimizations
Shuffle join/Common join:
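In a shuffle (common) join, both tables are read by mappers, rows are shuffled across the network by join key, and the reducers perform the join. It works for tables of any size but is usually the slowest option. As a sketch, automatic map-join conversion can be switched off to see this plan for the join query from the bucketing section:

```sql
SET hive.auto.convert.join=false;  -- keep the join on the reduce side

EXPLAIN
SELECT v.title
FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;
```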

4. Use join optimizations
Map-side join:
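A map-side join loads the smaller table into memory on every mapper, so the larger table is joined as it is scanned and the join needs no reduce phase. A minimal sketch, assuming `channel` is small enough to fit in memory; the size threshold shown is the commonly documented default and should be treated as a tunable, not a recommendation.

```sql
-- Let Hive convert joins to map joins when one side is small enough.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- bytes

SELECT v.title
FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;

-- Older style: request the map join with a hint.
-- The hint is honored only when hive.ignore.mapjoin.hint is false.
SELECT /*+ MAPJOIN(ch) */ v.title
FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;
```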

4. Use join optimizations
Sort-merge-bucket (SMB) join:
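A sort-merge-bucket join applies when both tables are bucketed and sorted on the join key with compatible bucket counts: corresponding buckets are merged in a single pass without loading either side fully into memory. A sketch under those assumptions; the `_smb` table names are hypothetical, and the `SORTED BY` clauses are not part of the table definitions shown earlier in the slides.

```sql
CREATE TABLE video_smb (
  id STRING,
  channelId STRING,
  title STRING
) CLUSTERED BY (channelId) SORTED BY (channelId) INTO 2 BUCKETS
STORED AS ORC;

CREATE TABLE channel_smb (
  id STRING,
  title STRING,
  viewCount BIGINT
) CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS
STORED AS ORC;

-- Allow the optimizer to pick a sort-merge-bucket join.
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

SELECT v.title
FROM video_smb v JOIN channel_smb ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;
```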

5. Choose the right input format
Row data vs. column store
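A columnar format such as ORC stores each column's values together, so a query that touches only a few columns can skip the rest of the file. A minimal sketch that copies the `video_external` staging table (its storage format is not shown in the slides) into ORC; the target name `video_orc` and the choice of Snappy compression are assumptions for illustration.

```sql
CREATE TABLE video_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM video_external;

-- Only the title column needs to be read from the ORC files.
SELECT title FROM video_orc;
```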

6. Other optimizations
Avoid highly normalized table structures
Compress map/reduce output
For map output compression, execute SET mapred.compress.map.output=true;
For job output compression, execute SET mapred.output.compress=true;
Use parallel execution
SET hive.exec.parallel=true;
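These settings are usually combined with a codec choice and a limit on how many stages run at once. A sketch with example values: the Snappy codec and the thread count of 8 are assumptions to adjust per cluster, and the `mapred.*` names are the legacy property names used in the slides.

```sql
-- Compress intermediate (map) output.
SET hive.exec.compress.intermediate=true;
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress final job output.
SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Run independent stages of a query in parallel.
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;
```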

7. Use the 'explain' keyword to improve the query execution plan
EXPLAIN query...
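`EXPLAIN` prints the plan Hive would run, without executing the query, as a set of stages and the operators inside each one. A short sketch using the join and the partition filter from the earlier slides; `EXPLAIN DEPENDENCY` is a variant that lists the input tables and partitions a query would read.

```sql
-- Show the stage graph and operators for the join.
EXPLAIN
SELECT v.title
FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;

-- Check which partitions the filtered query would read.
EXPLAIN DEPENDENCY
SELECT title FROM video WHERE uploadYear = '2013-04-08';
```

In the first plan, the operator names indicate which join strategy was chosen, for example whether the join runs as a Map Join Operator or as a reduce-side Join Operator.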

8. Stinger Initiative
Use cost-based optimization
Use vectorization
Transactions with ACID semantics
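Each of these features is switched on through session or table settings. The following is a sketch, not a tuned configuration: the `channel_acid` table is hypothetical, and ACID tables additionally require transaction and compaction support to be configured on the metastore side, which is not shown here.

```sql
-- Cost-based optimization relies on table and column statistics.
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
ANALYZE TABLE channel COMPUTE STATISTICS;
ANALYZE TABLE channel COMPUTE STATISTICS FOR COLUMNS;

-- Vectorized execution processes rows in batches instead of one at a time.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- ACID transactions: a transactional, bucketed ORC table.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
CREATE TABLE channel_acid (
  id STRING,
  title STRING,
  viewCount BIGINT
) CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE channel_acid SET viewCount = viewCount + 1 WHERE id = 'channelId1';
```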

8. Sub-Second Queries with Hive LLAP
A newer approach: a hybrid engine that combines Tez with LLAP (Live Long and Process), a layer of long-running daemons that cache data and execute query fragments.
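On clusters where LLAP daemons are already deployed, a session can be pointed at them with a few settings. A sketch, assuming Hive 2.0 or later with LLAP running; whether these properties can be changed per session depends on how the cluster is configured.

```sql
SET hive.execution.engine=tez;
SET hive.execution.mode=llap;
SET hive.llap.execution.mode=all;  -- run all eligible work inside LLAP daemons
```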

Weitere ähnliche Inhalte

Ähnlich wie Сергей Ковалёв (Altoros): Practical Steps to Improve Apache Hive Performance

These slides were presented on the Streaming Media West conference in 2016. This talk is also a reference for the blog post "Using Microservices to Encode and Publish Videos at The New York Times" at The New York Times Open blog. - Streaming Media West 2016: http://streamingmedia.com/Conferences/West2016/ - Open Blog: http://open.blogs.nytimes.com/2016/11/01/using-microservices-to-encode-and-publish-videos-at-the-new-york-times/

Building a Video Encoding Pipeline at The New York Times

Flávio Ribeiro

初心者Scala in f@n　第五回 sbt+giter8

gak2223

These slides were presented at the Streaming Media West conference in 2016. This talk is also a reference to the blog post "Using Microservices to Encode and Publish Videos at The New York Times" at The New York Times Open blog. - Streaming Media West 2016: http://streamingmedia.com/Conferences/West2016/ - Open Blog: http://open.blogs.nytimes.com/2016/11/01/using-microservices-to-encode-and-publish-videos-at-the-new-york-times/

Building a Video Encoding Pipeline at The New York Times

Maxwell Dayvson Da Silva

This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, ground breaking features that you’ll want to use. Indexing changes that will make your applications faster and spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made it’s arrival. There is more, but you’ll just have to some see for yourself. Get your front row seat and don’t miss it!

Cassandra 3.0 advanced preview

Patrick McFadin

Building Killr Applications with DataStax Enterprise

DataStax

Building Killr Applications with DSE

DataStax

FileWrite.javaFileWrite.java/* * To change this license header, choose License Headers in Project Properties. * To change this template file, choose Tools | Templates * and open the template in the editor. */ package filewrite; import java.io.BufferedWriter; import java.io.FileWriter; import java.io.IOException; /** * @description This program will write text to a file and save the file in the * project's root directory. * @author Eric */ publicclassFileWrite{ /** * @param args the command line arguments */ publicstaticvoid main(String[] args){ // declaring variables of text and initializing the buffered writer String txt ="Hello World."; BufferedWriter writer =null; // write the text variable using the bufferedwriter to testing.txt try{ writer =newBufferedWriter(newFileWriter("testing.txt")); writer.write(txt); } // print error message if there is one catch(IOException io){ System.out.println("File IO Exception"+ io.getMessage()); } //close the file finally{ try{ if(writer !=null){ writer.close(); } } //print error message if there is one catch(IOException io){ System.out.println("Issue closing the File."+ io.getMessage()); } } } } JavaMail.javaJavaMail.java/* * To change this license header, choose License Headers in Project Properties. * To change this template file, choose Tools | Templates * and open the template in the editor. */ package javamail; import java.util.Properties; import javax.mail.Message; import javax.mail.MessagingException; import javax.mail.PasswordAuthentication; import javax.mail.Session; import javax.mail.Transport; import javax.mail.internet.InternetAddress; import javax.mail.internet.MimeMessage; /** * @description This program uses Java to send emails over the SSL protocol. 
* @author Eric */ publicclassJavaMail{ /** * @param args the command line arguments */ publicstaticvoid main(String[] args){ Properties props =newProperties(); props.put("mail.smtp.host","smtp.gmail.com"); props.put("mail.smtp.socketFactory.port","465"); props.put("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory"); props.put("mail.smtp.auth","true"); props.put("mail.smtp.port","465"); Session session =Session.getDefaultInstance(props, new javax.mail.Authenticator(){ protectedPasswordAuthentication getPasswordAuthentication(){ returnnewPasswordAuthentication("username","password"); } }); try{ Message message =newMimeMessage(session); message.setFrom(newInternetAddress("[email protected]")); message.setRecipients(Message.RecipientType.TO, InternetAddress.parse("[email protected]")); message.setSubject("Testing Subject"); message.setText("Dear Mail Crawler,"+ "\n\n No spam to my email, please!"); Transport.send(message); System.out.println("Done"); }catch(MessagingException e){ thrownewRuntimeException(e); } } } loginApp.javaloginApp.java ...

FileWrite.javaFileWrite.java To change this license header.docx

ssuser454af01

Darknet yolo

Bang Tsui Liou

SQL server Backup Restore Revealed

Antonios Chatzipavlis

BOXEE apps API

idancohen

The distributed in-memory caching capabilities of Windows Server AppFabric will change how you think about scaling your Microsoft .NET-connected applications. Come learn how the distributed nature of the AppFabric cache allows large amounts of data to be stored in-memory for extremely fast access, how AppFabric's integration with Microsoft ASP.NET makes it easy to add low-latency data caching across the web farm, and discover the unique high availability features of AppFabric which will bring new degrees of scale and resilience to your data tier and your web applications.

Scale Your Data Tier With Windows Server App Fabric

Chris Dufour

EDI Training Module 11: Publishing Data in the EDI Repository

Environmental Data Initiative

Neo4j Bloom: What’s New with Neo4j's Data Visualization Tool

Neo4j

This tutorial is designed for anyone who needs to work with data stored in HDF5 files. The tutorial will cover functionality and useful features of the HDF5 utilities h5dump, h5diff, h5repack, h5stat, h5copy, h5check and h5repart. We will also introduce a prototype of the new h52jpeg conversion tool and recently released h5perf_serial tool used for performance studies. We will briefly introduce HDFView. Details of the HDFView and HDF-Java will be discussed in a separate tutorial.

HDF5 Tools

The HDF-EOS Tools and Information Center

TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9

Nuno Godinho

Skyfire log files100411

navaidkhan

short_intro_to_CMake_(inria_REVES_team)

Jérôme Esnault

Standards For Java Coding

Rahul Bhutkar

You've made a good career developing applications using a relational database. You know learning how to be a Cassandra developer is going to be a great skill to add. Now it's time to bridge those two things into reality. I was in your shoes and I can help. How do you work without ACID transactions? The data model looks similar but is so different! What are some of the bad things I should avoid? What are some of the traps I can fall into moving from a relational database? I hear these questions all the time. Let's spend some time to walk through each one and get you on track. Before you know it, you'll be going crazy on your next Cassandra based application! About the Speaker Patrick McFadin Chief Evangelist, DataStax Patrick McFadin is one of the leading experts of Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and consultant for DataStax, he has helped build some of the largest and exciting deployments in production. Previous to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.

Hey Relational Developer, Let's Go Crazy (Patrick McFadin, DataStax) | Cassan...

DataStax

IP3/build.xml Builds, tests, and runs the project IP3. IP3/build/classes/.netbeans_automatic_build IP3/build/classes/.netbeans_update_resources IP3/build/classes/ip3/IP3.classpackage ip3; publicsynchronizedclass IP3 extends Sub { public void IP3(); publicstatic void main(String[]); } IP3/build/classes/ip3/Sub.classpackage ip3; publicsynchronizedclass Sub { private String name; private String address; private String beverage; private String bread; private String type; private String size; public void Sub(); public void Sub(String, String); public String getName(); public void setName(String); public String getAddress(); public void setAddress(String); public String getBeverage(); public void setBeverage(String); public String getBread(); public void setBread(String); public String getType(); public void setType(String); public String getSize(); public void setSize(String); } IP3/manifest.mf Manifest-Version: 1.0 X-COMMENT: Main-Class will be added automatically by build IP3/nbproject/build-impl.xml .

IP3build.xml Builds, tests, and runs the project IP3..docx

christiandean12115

Ähnlich wie Сергей Ковалёв (Altoros): Practical Steps to Improve Apache Hive Performance (20)

Building a Video Encoding Pipeline at The New York Times

初心者Scala in f@n　第五回 sbt+giter8

Building a Video Encoding Pipeline at The New York Times

Cassandra 3.0 advanced preview

Building Killr Applications with DataStax Enterprise

Building Killr Applications with DSE

FileWrite.javaFileWrite.java To change this license header.docx

Darknet yolo

SQL server Backup Restore Revealed

BOXEE apps API

Scale Your Data Tier With Windows Server App Fabric

EDI Training Module 11: Publishing Data in the EDI Repository

Neo4j Bloom: What’s New with Neo4j's Data Visualization Tool

HDF5 Tools

TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9

Skyfire log files100411

short_intro_to_CMake_(inria_REVES_team)

Standards For Java Coding

Hey Relational Developer, Let's Go Crazy (Patrick McFadin, DataStax) | Cassan...

IP3build.xml Builds, tests, and runs the project IP3..docx

Mehr von Olga Lavrentieva

15 10-22 altoros-fact_sheet_st_v4

Olga Lavrentieva

Андрей Козлов (Altoros): Оптимизация производительности Cassandra

Olga Lavrentieva

Владимир Иванов: Software Engineer / Principal Member of Technical Staff в Oracle; г.Санкт-Петербург Ведущий инженер Oracle, работает в группе разработки виртуальной Java-машиныHotSpot. Специализируется на JIT-компиляции и поддержке альтернативных языков на платформе Java. Доклад: «Java: прошлое и будущее».

Владимир Иванов (Oracle): Java: прошлое и будущее

Olga Lavrentieva

Brug - Web push notification

Olga Lavrentieva

Александр Ломов: "Reactjs + Haskell + Cloud Foundry = Love"

Olga Lavrentieva

Максим Жилинский: "Контейнеры: под капотом"

Olga Lavrentieva

Александр Протасеня: "PayPal. Различные способы интеграции"

Olga Lavrentieva

Сергей Черничков (.Net Developer в Altoros): "Интеграция платежных систем в .Net приложения" - Выбор платежной системы (Payment Gateway) - Обзор типовых решений интеграции платежных систем - Рекомендации по разработке, тестированию интеграции платежной системы.

Сергей Черничков: "Интеграция платежных систем в .Net приложения"

Olga Lavrentieva

Антон Шемерей (Senior Developer в Sphere Consulting, г.Минск) Доклад: «Single Responsibility Principle в Руби или почему instance/class variables это ОЧЕНЬ плохо» Всем приходится работать с унаследованным кодом и часами тратить время на поиск устранения ошибок, которых в большинстве случаев можно было бы легко избежать. Одним из краеугольных камней является нарушение принципа единственной ответственности. В докладе пойдет речь о том, как провести анализ кода, как его можно исправить и как избегать таких ошибок в будущем.

Антон Шемерей «Single responsibility principle в руби или почему instanceclas...

Olga Lavrentieva

Егор Воробьёв (Web Developer в Datarockets) Доклад: «Ruby internals» Юкихиро Мацумото и его команда потратили уйму времени, чтобы реализовать те вещи, которыми мы пользуемся каждый день. В своем докладе Егор расскажет, что скрывается за обычными строчками, которые каждый из нас использует, и объяснит, почему важно знать то, что находятся по ту сторону экрана.

Егор Воробьёв: «Ruby internals»

Olga Lavrentieva

Андрей Колешко (Team Lead проекта Mezuka) Доклад: «Что не так с Rails?» Андрей расскажет, как и почему он и его команда решили отказаться от многих возможностей Rails и чем их заменили на своем проекте. В целом рассказ Андрея - это рассуждение о том, к чему приводит неправильное использование Rails, почему Rails не годится для всех Web-проектов в том виде, в котором представляет его сообщество разработчиков, авторы книг и best practices.

Андрей Колешко «Что не так с Rails»

Olga Lavrentieva

Дмитрий Савицкий (Senior Software Engineer в Altoros) Доклад: «Ruby Anti-Magic Shield» Не упустите шанс попасть на сеанс практической магии с разоблачением от Дмитрия Савицкого. Способов помешать кому-то, кто пытается повлиять на ваш код со злым умыслом или по незнанию, не так уж и много. Дмитрий расскажет о тех немногочисленных возможностях, которые позволяют избежать запутанной и опасной "метамагии" в приложениях. Будет магически интересно.

Дмитрий Савицкий «Ruby Anti Magic Shield»

Olga Lavrentieva

Сергей Алексеев (Ruby Developer в Pinshape) Доклад: «Парное программирование. Удаленно» «Устали объяснять как это работает? Парное программирование – вместо тысячи слов. Потратили полдня на решение задачи и безрезультатно? Не тормозите – программируйте с напарником. Следуете трендам, следите за тенденциями – новое поколение выбирает парное программирование. Когда программировать одному уже не ice... Просто добавьте напарника. Несколько полезных инструментов и техник – мы отбираем только самое лучшее. Вы еще программируете в одиночку? Тогда мы идем к вам!»

Сергей Алексеев «Парное программирование. Удаленно»

Olga Lavrentieva

Алексей Дёмин (Java Developer в InData Labs) Доклад: «Почему Spark отнюдь не так хорош» О чём: Сейчас по всем каналам идёт обсуждение новой революционной технологии обработки данных Spark. Алексей предлагает взглянуть чуть глубже и узнать, действительно ли Spark так хорош, как нам рассказывает об этом маркетинг.

«Почему Spark отнюдь не так хорош»

Olga Lavrentieva

«Cassandra data modeling – моделирование данных для NoSQL СУБД Cassandra»

Olga Lavrentieva

Сергей Сверчков (Solution Architect в Altoros) Доклад: «Практика построения высокодоступного решения на базе Cloud Foundry PaaS ». О чём: В докладе Сергей продемонстрирует архитектуру решения, базирующуюся на OpenStack, Cassandra и Cloud Foundry (PaaS), расскажет об интересных особенностях Cloud Foundry. Он также опишет опыт в области обработки данных с медицинских приборов, опыт разработки решения с высокими требованиями по доступности, безопасности в этой области. В своей презентации Сергей раскроет нюансы работы над различными уровнями решения и их интеграцией.

«Практика построения высокодоступного решения на базе Cloud Foundry Paas»

Olga Lavrentieva

Виктор Смирнов (Java Tech Lead в Klika Technologies) Доклад: «Дизайн продвинутых нереляционных схем для Big Data» О чём: Виктор познакомит всех с примерами продвинутых нереляционных схем данных и тем, как они могут использоваться для решения задач, связанных с хранением и обработкой больших данных.

«Дизайн продвинутых нереляционных схем для Big Data»

Olga Lavrentieva

«Обзор возможностей Open cv»

Olga Lavrentieva

«Нужно больше шин! Eventbus based framework vertx.io»

Olga Lavrentieva

Сергей Нартымов (Software Engineer в Transinet GmbH, г.Минск) Доклад: «Работа с базами данных с использованием Sequel» О чём: Ruby библиотека для работы с базами данных Sequel представляет собой легковесную альтернативу более популярной Active Record. Sequel лежит в основе работы с SQL базами данных в ROM (Ruby Object Mapper) - развивающемся ORM для Ruby, реализующим паттерн Data Mapper. В докладе будут рассмотрены различные аспекты использования Sequel, в том числе показаны примеры использования некоторых возможностей PostgreSQL с помощью Sequel.

«Работа с базами данных с использованием Sequel»

Olga Lavrentieva

Mehr von Olga Lavrentieva (20)

15 10-22 altoros-fact_sheet_st_v4

Андрей Козлов (Altoros): Оптимизация производительности Cassandra

Владимир Иванов (Oracle): Java: прошлое и будущее

Brug - Web push notification

Александр Ломов: "Reactjs + Haskell + Cloud Foundry = Love"

Максим Жилинский: "Контейнеры: под капотом"

Александр Протасеня: "PayPal. Различные способы интеграции"

Сергей Черничков: "Интеграция платежных систем в .Net приложения"

Антон Шемерей «Single responsibility principle в руби или почему instanceclas...

Егор Воробьёв: «Ruby internals»

Андрей Колешко «Что не так с Rails»

Дмитрий Савицкий «Ruby Anti Magic Shield»

Сергей Алексеев «Парное программирование. Удаленно»

«Почему Spark отнюдь не так хорош»

«Cassandra data modeling – моделирование данных для NoSQL СУБД Cassandra»

«Практика построения высокодоступного решения на базе Cloud Foundry Paas»

«Дизайн продвинутых нереляционных схем для Big Data»

«Обзор возможностей Open cv»

«Нужно больше шин! Eventbus based framework vertx.io»

«Работа с базами данных с использованием Sequel»

Kürzlich hochgeladen

In this session, we will delve into strategic approaches for optimizing knowledge management within Microsoft 365, amidst the evolving landscape of Copilot. From leveraging automatic metadata classification and permission governance with SharePoint Premium, to unlocking Viva Engage for the cultivation of knowledge and communities, you will gain actionable insights to bolster your organization's knowledge-sharing initiatives. In this session, we will also explore how to facilitate solutions to enable your employees to find answers and expertise within Microsoft 365. You will leave equipped with practical techniques and a deeper understanding of how there is more to effective knowledge management than just enabling Copilot, but building actual solutions to prepare the knowledge that Copilot and your employees can use.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Drew Madelung

AWS Community Day CPH - Three problems of Terraform

Andrey Devyatkin

GenCyber Cyber Security Day Presentation

Michael W. Hawkins

A Domino Admins Adventures (Engage 2024)

Gabriella Davis

The presentation explores the development and application of artificial intelligence (AI) from its inception to its current status in the modern world. The term "artificial intelligence" was first coined by John McCarthy in 1956 to describe efforts to develop computer programs capable of performing tasks that typically require human intelligence. This concept was first introduced at a conference held at Dartmouth College, where programs demonstrated capabilities such as playing chess, proving theorems, and interpreting texts. In the early stages, Alan Turing contributed to the field by defining intelligence as the ability of a being to respond to certain questions intelligently, proposing what is now known as the Turing Test to evaluate the presence of intelligent behavior in machines. As the decades progressed, AI evolved significantly. The 1980s focused on machine learning, teaching computers to learn from data, leading to the development of models that could improve their performance based on their experiences. The 1990s and 2000s saw further advances in algorithms and computational power, which allowed for more sophisticated data analysis techniques, including data mining. By the 2010s, the proliferation of big data and the refinement of deep learning techniques enabled AI to become mainstream. Notable milestones included the success of Google's AlphaGo and advancements in autonomous vehicles by companies like Tesla and Waymo. A major theme of the presentation is the application of generative AI, which has been used for tasks such as natural language text generation, translation, and question answering. Generative AI uses large datasets to train models that can then produce new, coherent pieces of text or other media. The presentation also discusses the ethical implications and the need for regulation in AI, highlighting issues such as privacy, bias, and the potential for misuse. 
These concerns have prompted calls for comprehensive regulations to ensure the safe and equitable use of AI technologies. Artificial intelligence has also played a significant role in healthcare, particularly highlighted during the COVID-19 pandemic, where it was used in drug discovery, vaccine development, and analyzing the spread of the virus. The capabilities of AI in healthcare are vast, ranging from medical diagnostics to personalized medicine, demonstrating the technology's potential to revolutionize fields beyond just technical or consumer applications. In conclusion, AI continues to be a rapidly evolving field with significant implications for various aspects of society. The development from theoretical concepts to real-world applications illustrates both the potential benefits and the challenges that come with integrating advanced technologies into everyday life. The ongoing discussion about AI ethics and regulation underscores the importance of managing these technologies responsibly to maximize their their benefits while minimizing potential harms.

Artificial Intelligence: Facts and Myths

Joaquim Jorge

MySQL Webinar, presented on the 25th of April, 2024. Summary: MySQL solutions enable the deployment of diverse Database Architectures tailored to specific needs, including High Availability, Disaster Recovery, and Read Scale-Out. With MySQL Shell's AdminAPI, administrators can seamlessly set up, manage, and monitor these solutions, ensuring efficiency and ease of use in their administration. MySQL Router, on the other hand, provides transparent routing from the application traffic to the backend servers in the architectures, requiring minimal configuration. Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises of all sizes for their business-critical applications. In this presentation, we'll delve into various database architecture solutions to help you choose the right one based on your business requirements. Focusing on technical details and the latest features to maximize the potential of these solutions.

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Miguel Araújo

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Rafal Los

Discord is a free app offering voice, video, and text chat functionalities, primarily catering to the gaming community. It serves as a hub for users to create and join servers tailored to their interests. Discord’s ecosystem comprises servers, each functioning as a distinct online community with its own channels dedicated to specific topics or activities. Users can engage in text-based discussions, voice calls, or video chats within these channels. Understanding Discord Servers Discord servers are virtual spaces where users congregate to interact, share content, and build communities. Servers may revolve around gaming, hobbies, interests, or fandoms, providing a platform for like-minded individuals to connect. Communication Features Discord offers a range of communication tools, including text channels for messaging, voice channels for real-time audio conversations, and video channels for face-to-face interactions. These features facilitate seamless communication and collaboration. What Does NSFW Mean? The acronym NSFW stands for “Not Safe For Work,” indicating content that may be inappropriate for professional or public settings. NSFW Content NSFW content encompasses material that is sexually explicit, violent, or otherwise graphic in nature. It often includes nudity, profanity, or depictions of sensitive topics.

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

UK Journal

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

The value of a flexible API Management solution for Open Banking Steve Melan, Manager for IT Innovation and Architecture - State's and Saving's Bank of Luxembourg Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - The value of a flexible API Management solution for O...

apidays

Histor y of HAM Radio presentation slide

vu2urc


Сергей Ковалёв (Altoros): Practical Steps to Improve Apache Hive Performance

1. Practical Steps to Improve Hive Queries Performance
Sergey Kovalev
Software Engineer at Altoros

2. How Hive works

3. 1. Use partitions whenever possible
/folder1/video_data/file1
id, title, channelId, description, uploadYear
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
3, title3, channelId3, description3, 2013
4, title4, channelId4, description4, 2013
/folder1/video_data/2012/file1
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
/folder1/video_data/2013/file1
3, title3, channelId3, description3, 2013
4, title4, channelId4, description4, 2013
SELECT * from video WHERE uploadYear='2013-04-08'

4. 1. Use partitions whenever possible
create table video (
id STRING,
title STRING,
description STRING,
viewCount BIGINT
) PARTITIONED BY (uploadYear date)
STORED AS ORC;
insert into table video PARTITION (uploadYear) select * from video_external;
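
To make the effect of partitioning concrete, here is a minimal Python sketch (not Hive code) of the idea behind the two partitioning slides: a dynamic-partition insert routes each row to a directory chosen by its partition value, and a query that filters on the partition column only has to open the matching directory. The dictionary standing in for HDFS and the helper names `write_partitioned` and `select_by_year` are hypothetical. The sketch keeps the simplified paths from the slide; Hive itself typically names partition directories `column=value`.

```python
from collections import defaultdict

# Sample rows from the slide: (id, title, channelId, description, uploadYear)
ROWS = [
    (1, "title1", "channelId1", "description1", 2012),
    (2, "title2", "channelId2", "description2", 2012),
    (3, "title3", "channelId3", "description3", 2013),
    (4, "title4", "channelId4", "description4", 2013),
]


def write_partitioned(rows, base="/folder1/video_data"):
    """Group rows into one directory per uploadYear (toy stand-in for HDFS)."""
    storage = defaultdict(list)
    for row in rows:
        upload_year = row[-1]  # the partition column comes last, as in the INSERT
        storage[f"{base}/{upload_year}/file1"].append(row)
    return dict(storage)


def select_by_year(storage, year, base="/folder1/video_data"):
    """Read only the directory that belongs to the requested partition."""
    return storage.get(f"{base}/{year}/file1", [])


storage = write_partitioned(ROWS)
result = select_by_year(storage, 2013)
print(sorted(storage))
print([row[0] for row in result])
```

The point of the sketch is that `select_by_year` never touches the 2012 directory, which is what partition pruning buys for filters on the partition column.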

5. 2. Use bucketing
create table video (
id STRING,
channelId STRING,
title STRING,
description STRING
) CLUSTERED BY(channelId)
INTO 2 BUCKETS
STORED AS ORC;
create table channel (
id STRING,
title STRING,
description STRING,
viewCount BIGINT
) CLUSTERED BY(id)
INTO 2 BUCKETS
STORED AS ORC;
SELECT v.title FROM video v JOIN channel ch ON v.channelId = ch.id WHERE
ch.viewCount>1000

6. 2. Use bucketing
/folder1/video_data/file1
id, title, channelId, description, uploadYear
1, title1, channelId1, description1, 2012
2, title2, channelId2, description2, 2012
3, title3, channelId3, description3, 2012
4, title4, channelId4, description4, 2012
5, title5, channelId5, description5, 2013
6, title6, channelId6, description6, 2013
7, title7, channelId7, description7, 2013
8, title8, channelId8, description8, 2013
/folder1/video_data/file1
2, title2, channelId2, description2, 2012
4, title4, channelId4, description4, 2012
6, title6, channelId6, description6, 2013
8, title8, channelId8, description8, 2013
/folder1/video_data/file2
1, title1, channelId1, description1, 2012
3, title3, channelId3, description3, 2012
5, title5, channelId5, description5, 2013
7, title7, channelId7, description7, 2013

7. 2. Use bucketing
/folder1/channel_data/file1
id, title, description, viewCount
channelId1, title1, description1, viewCount1
channelId2, title2, description2, viewCount2
channelId3, title3, description3, viewCount3
channelId4, title4, description4, viewCount4
channelId5, title5, description5, viewCount5
channelId6, title6, description6, viewCount6
channelId7, title7, description7, viewCount7
channelId8, title8, description8, viewCount8
/folder1/channel_data/file1
channelId2, title2, description2, viewCount2
channelId4, title4, description4, viewCount4
channelId6, title6, description6, viewCount6
channelId8, title8, description8, viewCount8
/folder1/channel_data/file2
channelId1, title1, description1, viewCount1
channelId3, title3, description3, viewCount3
channelId5, title5, description5, viewCount5
channelId7, title7, description7, viewCount7
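
The bucketed layouts on the slides can be reproduced with a small Python sketch of hash bucketing. It assumes a Java-style string hash taken modulo the bucket count as a stand-in; the exact hash Hive applies may differ by version and column type, and the helper names are hypothetical. Because both tables are bucketed on the join key with the same bucket count, matching keys land in the same bucket number, so the join only needs to pair bucket i of `video` with bucket i of `channel`.

```python
def bucket_for(key: str, num_buckets: int) -> int:
    """Toy bucket assignment: Java-style string hash, non-negative, modulo bucket count."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return (h & 0x7FFFFFFF) % num_buckets


def split_into_buckets(rows, key_index, num_buckets=2):
    """Distribute rows into bucket lists by hashing the column at key_index."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key_index], num_buckets)].append(row)
    return buckets


# (id, title, channelId) and (id, title), mirroring the sample data on the slides
VIDEOS = [(i, f"title{i}", f"channelId{i}") for i in range(1, 9)]
CHANNELS = [(f"channelId{i}", f"title{i}") for i in range(1, 9)]

video_buckets = split_into_buckets(VIDEOS, key_index=2)
channel_buckets = split_into_buckets(CHANNELS, key_index=0)

# Join bucket i of video only against bucket i of channel.
joined = []
for v_bucket, c_bucket in zip(video_buckets, channel_buckets):
    channels_by_id = {c[0]: c for c in c_bucket}
    for v in v_bucket:
        if v[2] in channels_by_id:
            joined.append((v[0], v[2]))

print([v[0] for v in video_buckets[0]])
print(len(joined))
```

With this stand-in hash the even-numbered channel IDs fall into the first bucket and the odd-numbered ones into the second, which matches the file1/file2 split shown on the slides.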

8. 3. Partitions + bucketing
create table video (
id STRING,
channelId STRING,
title STRING,
description STRING,
viewCount BIGINT
) PARTITIONED BY (uploadYear date)
CLUSTERED BY(channelId)
INTO 2 BUCKETS
STORED AS ORC;

9. 3. Partitions + bucketing
/folder1/video_data/file1
id, title, channelId, viewCount, uploadYear
1, title1, channelId1, viewCount1, 2012
2, title2, channelId2, viewCount2, 2012
3, title3, channelId3, viewCount3, 2012
4, title4, channelId4, viewCount4, 2012
5, title5, channelId5, viewCount5, 2013
6, title6, channelId6, viewCount6, 2013
7, title7, channelId7, viewCount7, 2013
8, title8, channelId8, viewCount8, 2013
/folder1/video_data/2012/file1
2, title2, channelId2, viewCount2, 2012
4, title4, channelId4, viewCount4, 2012
/folder1/video_data/2012/file2
1, title1, channelId1, viewCount1, 2012
3, title3, channelId3, viewCount3, 2012
/folder1/video_data/2013/file1
6, title6, channelId6, viewCount6, 2013
8, title8, channelId8, viewCount8, 2013
/folder1/video_data/2013/file2
5, title5, channelId5, viewCount5, 2013
7, title7, channelId7, viewCount7, 2013
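
When partitioning and bucketing are combined, the partition value picks the directory and the bucket hash picks the file inside it. The Python sketch below illustrates that two-level placement for the sample rows on this slide. The Java-style string hash is an assumed stand-in for Hive's bucketing hash, and `target_path` is a hypothetical helper that reuses the slide's simplified path scheme.

```python
from collections import defaultdict


def bucket_for(key: str, num_buckets: int) -> int:
    """Toy bucket assignment: Java-style string hash, non-negative, modulo bucket count."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return (h & 0x7FFFFFFF) % num_buckets


# (id, title, channelId, viewCount, uploadYear), as on the slide
ROWS = [
    (i, f"title{i}", f"channelId{i}", f"viewCount{i}", 2012 if i <= 4 else 2013)
    for i in range(1, 9)
]


def target_path(row, num_buckets=2, base="/folder1/video_data"):
    """Partition column selects the directory, bucket hash selects the file."""
    _id, _title, channel_id, _views, upload_year = row
    file_no = bucket_for(channel_id, num_buckets) + 1  # the slide numbers files from 1
    return f"{base}/{upload_year}/file{file_no}"


layout = defaultdict(list)
for row in ROWS:
    layout[target_path(row)].append(row[0])

for path in sorted(layout):
    print(path, layout[path])
```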

10. 4. Use joins optimization Shuffle join/Common join:
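
The diagram for this slide did not survive, so here is a rough Python sketch of how a shuffle (common) join proceeds, under the usual MapReduce model: mappers tag each row with its source table and emit it under the join key, the shuffle groups all values with the same key, and reducers combine the two sides. The function names and sample rows are hypothetical, not Hive internals.

```python
from collections import defaultdict

VIDEOS = [("v1", "channelId1"), ("v2", "channelId2"), ("v3", "channelId1")]
CHANNELS = [("channelId1", 5000), ("channelId2", 10)]


def map_phase(videos, channels):
    """Emit (join_key, (table_tag, payload)) pairs for both inputs."""
    for video_id, channel_id in videos:
        yield channel_id, ("video", video_id)
    for channel_id, view_count in channels:
        yield channel_id, ("channel", view_count)


def shuffle(pairs):
    """Group all emitted values by join key, as the shuffle stage would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """For each key, pair every video row with every channel row."""
    for key, values in grouped.items():
        video_ids = [v for tag, v in values if tag == "video"]
        view_counts = [v for tag, v in values if tag == "channel"]
        for video_id in video_ids:
            for view_count in view_counts:
                yield video_id, key, view_count


joined = sorted(reduce_phase(shuffle(map_phase(VIDEOS, CHANNELS))))
print(joined)
```

Every row of both tables passes through the shuffle, which is why this is the most general but also the most expensive of the three join strategies.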

11. 4. Use joins optimization Map-side join:
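
A map-side join avoids the shuffle by loading the smaller table into memory on every mapper and probing it while streaming the larger table. The Python sketch below illustrates the idea with the `viewCount > 1000` filter from the earlier query; the function name, the `min_views` parameter, and the sample rows are hypothetical.

```python
VIDEOS = [("v1", "channelId1"), ("v2", "channelId2"), ("v3", "channelId1")]
CHANNELS = [("channelId1", 5000), ("channelId2", 10)]


def map_side_join(large_rows, small_rows, min_views=1000):
    """Build a hash table from the small side, then stream the large side through it."""
    lookup = {
        channel_id: view_count
        for channel_id, view_count in small_rows
        if view_count > min_views  # filter the small side before it is loaded
    }
    for video_id, channel_id in large_rows:
        if channel_id in lookup:
            yield video_id, channel_id, lookup[channel_id]


joined = list(map_side_join(VIDEOS, CHANNELS))
print(joined)
```

This only works when the small table fits in memory, which is the condition under which a map-side join is applicable.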

12. 4. Use joins optimization Sort-merge-bucket (SMB) join:
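
A sort-merge-bucket join relies on both tables being bucketed and sorted on the join key, so matching buckets can be merged in a single pass without building a hash table. The Python sketch below shows the merge step for one pair of buckets. It assumes the channel side has unique keys; the function name and sample rows are hypothetical.

```python
def merge_join(videos_sorted, channels_sorted):
    """Merge two inputs sorted by channel ID.

    videos_sorted:   (video_id, channel_id) tuples sorted by channel_id
    channels_sorted: (channel_id, view_count) tuples sorted by channel_id, unique keys
    """
    out = []
    i = j = 0
    while i < len(videos_sorted) and j < len(channels_sorted):
        left_key = videos_sorted[i][1]
        right_key = channels_sorted[j][0]
        if left_key < right_key:
            i += 1  # video without a matching channel in this bucket
        elif left_key > right_key:
            j += 1  # channel without further videos
        else:
            out.append((videos_sorted[i][0], left_key, channels_sorted[j][1]))
            i += 1  # advance only the video side: several videos can share a channel
    return out


video_bucket = [("v1", "channelId2"), ("v2", "channelId2"), ("v3", "channelId4"), ("v4", "channelId6")]
channel_bucket = [("channelId2", 100), ("channelId4", 200), ("channelId8", 300)]

joined = merge_join(video_bucket, channel_bucket)
print(joined)
```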

13. 5. Choose the right input format
Row Data
Column Store
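
The difference between row-oriented data and a column store (such as the ORC format used in the earlier table definitions) can be illustrated with a toy Python sketch. It counts how many values have to be touched to sum a single column in each layout. The sample rows and function names are hypothetical, and real columnar formats add compression and indexes on top of this basic idea.

```python
ROWS = [
    {"id": "1", "title": "title1", "description": "description1", "viewCount": 10},
    {"id": "2", "title": "title2", "description": "description2", "viewCount": 20},
    {"id": "3", "title": "title3", "description": "description3", "viewCount": 30},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_views_row_store(rows):
    """Row layout: every field of every row is read to reach viewCount."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)
        total += row["viewCount"]
    return total, values_read


def sum_views_column_store(columns):
    """Column layout: only the viewCount column is read."""
    column = columns["viewCount"]
    return sum(column), len(column)


row_result = sum_views_row_store(ROWS)
column_result = sum_views_column_store(to_columns(ROWS))
print(row_result, column_result)
```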

14. 6. Other optimizations
Avoid highly normalized table structures
Compress map/reduce output
For map output compression, execute set mapred.compress.map.output = true.
For job output compression, execute set mapred.output.compress = true.
Use parallel execution
SET hive.exec.parallel=true;

15. 7. Use the 'explain' keyword to improve the query execution plan
EXPLAIN query...

16. 7. Use the 'explain' keyword to improve the query execution plan

17. 8. Stinger Initiative
Use cost-based optimization
Use vectorization
Transactions with ACID semantics

18. 8. Hive on Tez

19. 8. Sub-Second Queries with Hive LLAP
New approach using a hybrid engine that leverages Tez and something new called LLAP (Live Long and Process)

20. Questions?
