The document discusses the tradeoff between throughput and latency in parallel systems. It provides examples of how different algorithms for shared counters can impact latency and throughput. Specifically, it shows that increasing throughput, such as through more parallelism or replication, often leads to worse latency due to the increased communication between cores. The document concludes that there is generally a tradeoff between throughput and latency based on the number of readers, writers, and contention level in a parallel system.
1. Does Better Throughput Require Worse Latency?
David Ungar, Doug Kimelman, Sam Adams, Mark Wegman
IBM T. J. Watson Research Center
Monday 7 January 2013
2. Example: On-Line Transaction Processing
✦ Large “database” (100 GB) of information
✦ Constant stream of incoming updates & queries
✦ Need many cores to handle the work
✦ Cores need to communicate updates
✦ Roll-ups sum over many variables
✦ Tricks:
✦ Caching - updates must sync with invalidates
✦ Replication - updates must propagate
3. Assumptions
✦ Too much computation for one core
✦ Not trivially scalable; needs communication
✦ Inputs constantly changing
✦ No sub-space radio: communication finite and limiting
6. Latency
✦ Inter-core
✦ Data structure/algorithm level
✦ Time needed for cause (input, computation result) on one core to affect another (Δt)
What is best possible latency (on a given platform)?
7. Measure w/ Ring Counter
Core 1: while (1) A = D;
Core 2: while (1) B = A;
Core 3: while (1) C = B;
Core 4: while (1) D = C + 1;

Latency Baseline ≣ Time / Count / Number-of-Cores
8. Ring Counter Latency Baselines
[Two plots: min/max latency (ns, 0 to 100) vs. # threads (1 to 8; 4 cores, 2-way SMT) - one for normal loads & stores, one for normal loads & stores + memory barrier]
Other platforms? Signals? Atomics?
9. The Intuition
✦ After you have optimized:
✦ Suppose relative latency is 10
✦ Relative throughput is 1/4
✦ If you then raise throughput to 1/2
✦ Latency will increase to 20
Space of best algorithms exhibits this trade-off
12. Shared Counter
From McKenney’s PerfBook

✦ Serial
✦ write: C += delta; read: C
✦ latency: tiny; throughput: single-core
✦ Mutex
✦ write: lock, C += delta, unlock; read: C
✦ latency: small, unless writers convoy; throughput: higher, but writers have locking & contention overhead
✦ Lock-Free
✦ write: C +=atomic delta; read: C
✦ latency: writers can starve under contention; throughput: higher for low-contention writers
✦ Per-thread
✦ write: per-thread-C += delta; read: sum(all C’s)
✦ latency: high if many cores; throughput: higher for writers, lower for readers
✦ Per-thread + cache
✦ write: per-thread-C += delta; read: read sum (another thread maintains the sum)
✦ latency: higher (summing thread may be idle); throughput: high for both readers and writers
✦ Race & Repair
✦ write: C += delta; read: C
✦ latency: higher under contention (lost counts); throughput: high for both readers and writers
19. Conclusions
✦ Throughput: how well parallelism gets work done
✦ Latency: how fast one core responds to another
✦ Lots of dimensions: # readers, # writers, contention
✦ Throughput vs Latency:
✦ throughput -> parallel -> distributed/replicated -> more latency