Writing concurrent code is increasingly important for exploiting the parallelism of multicore architectures. The C++11 library introduced futures and promises as a first step towards task-based programming. However, C++ support for concurrency is still very limited. Other languages, like C# and Python, provide some form of resumable functions or coroutines, and in C# the async/await pattern makes it possible to write functions that suspend their execution while waiting for a computation or I/O to complete. This talk will describe a proposal for the addition of resumable functions and async/await in C++17. We will focus on the implementation of resumable functions on Windows, and we'll play with a first prototype of their implementation in the Visual Studio 2015 Preview. Finally, we will see how resumable functions can also be used to implement (lazy) generators, similar to the ones provided by "yield" statements in C#.
2. Concurrency: why?
Herb Sutter quotes:
« Welcome to the Parallel Jungle »
“The network is just another bus to more compute cores.”
« C++ and Beyond 2012 »
DON’T STOP!
Blocking a thread is bad for scalability.
• Client side: we want responsive apps (ex: iOS, Win8)
• UIs must be fluid: never block the GUI thread!
• In WinRT, all APIs that can take more than 50 ms are asynchronous only.
• Server side: we want to serve the max # of requests/sec
• Allocating too many threads is expensive
• We don’t want to have threads blocked on I/O.
3. Concurrency: how?
Concurrency is a huge topic.
• Anthony Williams,
C++ Concurrency in Action
• Joe Duffy,
Concurrent Programming on
Windows
• Scott Meyers,
Effective Modern C++
• Bjarne Stroustrup,
The C++ Programming
Language (4th Edition)
4. Concurrency: how?
Concurrency is not a new problem: there have been solutions for many
years (OS threads, coroutines, synchronization objects, …)
Now in C++11/14:
• std::thread
• std::mutex, std::lock_guard<>, std::unique_lock<>
• std::condition_variable
• std::atomic_xxx, std::atomic<>, std::atomic_thread_fence()
• std::future<>, std::shared_future<>
• std::promise<>,
• std::packaged_task<>
• …
• Work for C++17:
• Concurrency TS (Nonblocking futures (.then), executors, await)
6. Threads
It’s now possible to write portable multithreaded code in C++.
Example:
#include <thread>
std::thread t([=]{
writeFile(readFile(inFile), outFile);
});
...
t.join();
Note: no direct way to transfer back a value.
8. Callbacks and task continuations
We set up processing (task) pipelines to handle events:
• With callbacks
• Example from Win32:
BOOL WriteFileEx(HANDLE hFile, LPCVOID lpBuffer,
DWORD nNumberOfBytesToWrite, LPOVERLAPPED lpOverlapped,
LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine);
Problem: “Spaghetti code” difficult to read and to maintain – callbacks are the
modern goto!
• With tasks and continuations
• Using libraries like TPL or PPL.
[Diagram: a request arrives and flows through a pipeline of tasks (do A, do B, do C, do D) before a response is sent.]
9. Tasks, futures and promises
In C++11:
• std::future<T>: the result of an async computation (task). It’s a proxy for a
value that will become available.
• std::promise<T>: provides a means of setting a value which can later be
read through an associated std::future<T>.
The act of invocation is separated from the act of retrieving the result.
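[Diagram: a promise and a future share a value between task 1 and task 2. get_future() links the two; the promise side calls set_value()/set_exception(), the future side calls get()/wait().]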
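The promise/future pairing described above can be sketched as follows. This is a minimal illustration, not code from the talk: computeLength and lengthViaPromise are hypothetical names, and only standard C++11 library calls are used.

```cpp
#include <cstddef>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical worker: computes a value on another thread and hands it back
// through the promise. Errors travel the same way, as stored exceptions.
void computeLength(std::promise<std::size_t> result, std::string text)
{
    try {
        if (text.empty()) {
            throw std::invalid_argument("empty input");
        }
        result.set_value(text.size());
    } catch (...) {
        result.set_exception(std::current_exception());
    }
}

// Hypothetical caller: runs computeLength on a new thread and reads the
// result through the associated future.
std::size_t lengthViaPromise(const std::string& text)
{
    std::promise<std::size_t> promise;
    std::future<std::size_t> future = promise.get_future();
    std::thread worker(computeLength, std::move(promise), text);
    // Wait and join before get(), so that a rethrown exception never
    // leaves a joinable thread behind.
    future.wait();
    worker.join();
    return future.get(); // returns the value or rethrows the stored exception
}
```

Calling lengthViaPromise("hello") yields the length computed on the worker thread, while an empty string makes get() rethrow the std::invalid_argument that was stored with set_exception().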
10. std::async
std::async(): starts an asynchronous task and returns a future
which will eventually hold the task result.
Example:
size_t async_copyFile(const string& inFile, const string& outFile)
{
auto fut1 = std::async(readFile, inFile);
auto fut2 = std::async([&fut1](const string& path){
return writeFile(fut1.get(), path);
},
outFile);
...
// do more work
...
return fut2.get();
}
11. std::async
When and where is the async function executed?
• std::async behaves as a task scheduler. The behavior can be
configured with an argument. There are two launch modes:
• std::launch::async
greedy: starts a new thread to run the task.
• std::launch::deferred
lazy: runs the task synchronously, only when the future is waited for.
Concurrency != parallelism
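One way to observe the difference between the two launch modes is to compare thread ids. The helper below is a hypothetical sketch (runsOnCallingThread is not a library function); it relies only on std::async, std::launch and std::this_thread::get_id.

```cpp
#include <future>
#include <thread>

// Returns true if a task launched with the given policy ran on the thread
// that called this function.
bool runsOnCallingThread(std::launch policy)
{
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> worker =
        std::async(policy, [] { return std::this_thread::get_id(); });
    // With std::launch::deferred the lambda runs here, inside get().
    return worker.get() == caller;
}
```

With std::launch::deferred the task runs inside get() on the caller's thread, so the helper returns true; with std::launch::async it runs on a different thread and the helper returns false.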
12. The problem with async…
• The async feature was not designed very well!
• Will this function block?
void func()
{
future<int> f = start_async_task();
//... more code that doesn’t call f.get() or f.wait()
} // future destructed here... Does it block waiting for the task to complete?
It depends!!!
• It blocks if the future was created by std::async with the std::launch::async policy
• It does not block otherwise (for example, a future obtained from a std::promise)
They need to fix this mess…
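The two cases can be reproduced with a small sketch. The function names are hypothetical and the 50 ms delay is an arbitrary choice for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Returns whether the task had finished by the time the future returned by
// std::async went out of scope.
bool asyncFutureBlocksOnDestruction()
{
    std::atomic<bool> done{false};
    {
        std::future<void> f = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            done = true;
        });
        // no f.get() or f.wait() here
    } // the destructor of f waits for the task to complete
    return done.load();
}

// A future obtained from a promise does not wait in its destructor, even
// though no value was ever set.
bool promiseFutureDestructsImmediately()
{
    std::promise<int> p;
    {
        std::future<int> f = p.get_future();
    } // returns at once; if it waited, this function would never return
    return true;
}
```

The first function returns true only because the destructor of a future created by std::async(std::launch::async, ...) waits for the task; the second returns at all only because a promise-based future does not.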
13. Broken promises
C++11 Futures and promises are still very limited. They are not easily
composable.
• We can only wait on one future at a time. No support for:
• wait_any()
• wait_all()
• It is not possible to attach continuations. No support for:
• then()
Composable tasks would allow making the whole architecture non-
blocking and event-driven.
For more details, see Bartosz Milewski’s article: “Broken promises”.
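To make the missing then() concrete, here is a naive sketch of what a continuation helper could look like on top of C++11. The free function then and the helper chainedAnswer are hypothetical, not part of the standard library or of any proposal; note that this version parks a thread in get() while it waits, which is precisely what a real non-blocking continuation would avoid.

```cpp
#include <future>
#include <utility>

// Hypothetical helper: runs `next` on the result of `previous` once it is
// ready. It only imitates the shape of a continuation, because a thread
// stays blocked in get() until the first task completes.
template <typename T, typename F>
auto then(std::future<T> previous, F next)
    -> std::future<decltype(next(std::declval<T>()))>
{
    return std::async(
        std::launch::async,
        [](std::future<T> prev, F fn) { return fn(prev.get()); },
        std::move(previous), std::move(next));
}

// Usage sketch: chain two steps and wait only at the very end.
int chainedAnswer()
{
    std::future<int> first = std::async(std::launch::async, [] { return 20; });
    std::future<int> second = then(std::move(first), [](int x) { return x + 1; });
    return second.get() * 2;
}
```

The future and the callable are passed to std::async as arguments rather than captured, because C++11 lambdas cannot capture move-only objects by move.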
14. .NET Tasks
In .NET we have the TPL, Task Parallel Library
• Task<T> is the equivalent of std::future<T>.
• Supports composition with Task.WaitAny() and Task.WaitAll().
• Allows chaining sequences of tasks as continuations, with
Task.ContinueWith().
Example:
static Task<int> CopyFile(string inFile, string outFile)
{
var tRead = new Task<byte[]>(() => ReadFile(inFile));
var tWrite = tRead.ContinueWith((t) =>
WriteFile(outFile, t.Result));
tRead.Start();
return tWrite;
}
[Diagram: tRead → tWrite]
15. Microsoft PPL
PPL (Parallel Patterns Library) is Microsoft's implementation of C++ tasks.
• PPL Concurrency::task<T> ~= what a std::future<T> should be:
• Concurrency::when_any()
• Concurrency::when_all()
• task::then()
• Windows RT provides many async APIs that return task<T> and uses PPL as its
model of asynchrony in C++.
• C++ REST SDK (Casablanca)
• Library for asynchronous C++ code that connects with cloud-based services.
• APIs for HTTP, Websocket, JSON, asynchronous streams, …
• Designed to be completely asynchronous – uses PPL tasks everywhere.
• Supports Windows, Linux, OS X, iOS, and Android!
• Also provides a portable implementation of PPL tasks.
17. C++17: Concurrency TS
Next major release
• 8 Technical Specifications (TS):
• Library fundamentals (optional<>, string_view)
• Arrays (runtime-sized arrays, dynarray<>)
• Parallelism (Parallel STL library, parallel algorithms)
• Concurrency (Nonblocking futures (.then), executors, await)
• File System (Portable file system access: Boost.Filesystem)
• Networking (IP addresses, URIs, byte ordering)
• Concepts Lite (Extensions for template type checking)
• Transactional Memory
18. Towards a better std::future
Proposal (N3857) for an extended version of futures, with the same
semantics as PPL tasks:
• when_any()
• when_all()
• future::then()
• future::unwrap()
Example:
std::future<size_t> cpp17_copyFile(const string& inFile, const string& outFile)
{
    return std::async([inFile]() {
        return readFile(inFile);
    }).then([outFile](std::future<vector<char>> buffer) {
        // In N3857 the continuation receives the completed future, not the bare value.
        return writeFile(buffer.get(), outFile);
    });
}
19. The problem with tasks
Tasks do not work well with iterations or branches.
• Let’s say that we want to copy a file asynchronously chunk by chunk.
There is no easy way to construct a loop of continuations with tasks.
We need to dynamically attach “recursive” task continuations:
task<void> repeat()
{
    return create_task(readFileChunkAsync())
        .then(writeFileChunkAsync)
        .then([]() -> task<void> {
            if (not_completed()) {
                return repeat(); // "recursive" continuation: schedule the next chunk
            } else {
                return create_task([]{}); // empty task
            }
        });
}
22. PPL: a Casablanca sample (1/3)
From sample ‘SearchFile’:
// A convenient helper function to loop asynchronously until a condition is met.
pplx::task<bool> _do_while_iteration(std::function<pplx::task<bool>(void)> func)
{
pplx::task_completion_event<bool> ev;
func().then([=](bool guard)
{
ev.set(guard);
});
return pplx::create_task(ev);
}
pplx::task<bool> _do_while_impl(std::function<pplx::task<bool>(void)> func)
{
return _do_while_iteration(func).then([=](bool guard) -> pplx::task<bool>
{
if (guard) {
return ::_do_while_impl(func);
}
else {
return pplx::task_from_result(false);
}
});
}
pplx::task<void> do_while(std::function<pplx::task<bool>(void)> func)
{
return _do_while_impl(func).then([](bool){});
}
23. PPL: a Casablanca sample (2/3)
// Function to read in data from a file and search for a given string, writing all lines
// containing the string to memory_buffer.
static pplx::task<void> find_matches_in_file(const string_t &fileName,
                                             const std::string &searchString,
                                             basic_ostream<char> results)
{
    return file_stream<char>::open_istream(fileName).then([=](basic_istream<char> inFile)
    {
        auto lineNumber = std::make_shared<int>(1);
        return ::do_while([=]()
        {
            container_buffer<std::string> inLine;
            return inFile.read_line(inLine).then([=](size_t bytesRead)
            {
                if (bytesRead == 0 && inFile.is_eof()) {
                    return pplx::task_from_result(false);
                }
                else if (inLine.collection().find(searchString) != std::string::npos)
                {
                    results.print("line ");
                    results.print((*lineNumber)++);
(continues…)
25. C# async-await
C#5.0 solution: async/await make the code “look” synchronous!
Example: copy a file chunk by chunk
static private async Task CopyChunk(Stream input, Stream output)
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = await input.ReadAsync(buffer, 0, buffer.Length)) != 0) {
await output.WriteAsync(buffer, 0, bytesRead);
}
}
static public async Task CopyFile(string inFile, string outFile)
{
using (StreamReader sr = new StreamReader(inFile))
{
using (StreamWriter sw = new StreamWriter(outFile)) {
await CopyChunk(sr.BaseStream, sw.BaseStream);
}
}
}
26. async/await in C#
What actually happens when we await?
• The functions pause and resume
• The compiler transforms an ‘async’ function into a class that
implements a state machine
• All local variables become data members of the class
• On ‘await’ the code attaches a continuation to the invoked task
• When the invoked task completes, the continuation is called and the
state machine resumes
• In which thread? It depends on the current SynchronizationContext (either
the same thread or any thread from the pool).
All this gives the impression that the function pauses and resumes.
27. C# iterators (generators)
• Async/await is not the only example of resumable functions in C#
• C# iterator blocks contain the yield statement and lazily return a
sequence of values:
IEnumerable<int> Fib(int max)
{
int a = 0;
int b = 1;
while (a <= max)
{
yield return a;
int next = a + b;
a = b; b = next;
}
}
• They are implemented with a state machine: the function Fib() is
compiled as a class that implements the enumerator interfaces.
• On MoveNext() the state machine resumes from the last suspension point,
and executes until the next yield statement.
All this gives the impression that the function pauses and resumes.
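To show what such a state machine looks like, here is a hand-written C++ sketch of roughly what a compiler could generate for Fib(max). It is an illustration only: FibStateMachine, moveNext, current and collectFib are hypothetical names, and the actual C# code generation differs in its details.

```cpp
#include <vector>

// Locals of Fib() become data members; state_ records the suspension point.
class FibStateMachine {
public:
    explicit FibStateMachine(int max) : max_(max) {}

    // Resumes the "function" until the next yield; returns false at the end.
    bool moveNext()
    {
        switch (state_) {
        case Start:
            a_ = 0;
            b_ = 1;
            break;
        case AfterYield: {
            const int next = a_ + b_; // code that follows `yield return a`
            a_ = b_;
            b_ = next;
            break;
        }
        case Done:
            return false;
        }
        if (a_ <= max_) {        // loop condition
            current_ = a_;       // yield return a;
            state_ = AfterYield; // remember where to resume
            return true;
        }
        state_ = Done;
        return false;
    }

    int current() const { return current_; }

private:
    enum State { Start, AfterYield, Done };
    State state_ = Start;
    int max_;
    int a_ = 0, b_ = 0, current_ = 0;
};

// Drains the sequence, as a foreach loop would.
std::vector<int> collectFib(int max)
{
    std::vector<int> values;
    FibStateMachine fib(max);
    while (fib.moveNext()) {
        values.push_back(fib.current());
    }
    return values;
}
```

Each call to moveNext() does a bounded amount of work and returns, so nothing is ever really suspended: the saved state only makes it look that way.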
28. LINQ (to objects)
• Based on C# generators
• Declarative “query language” that operates on lazily generated
sequences
Example: a (slow!) generator of prime numbers:
var primes = Enumerable.Range(2, max)
.Where(i => Enumerable.Range(2, i - 2)
.All(j => i%j != 0));
29. Resumable functions (Coroutines)
• Coroutines: generalization of routines, allow suspending and
resuming execution at certain “suspension points”, preserving the
execution context.
• Boost libraries: Boost.Coroutine, Boost.Context
• Use Posix API on Linux
• Use Win32 Fibers on Windows
• Quite fast (coroutine switch 50-80 CPU cycles)
• Fibers: lightweight threads of execution. OS-managed coroutines.
• Added to Windows NT 4.0 to support cooperative multitasking.
• SwitchToFiber() yields the execution to another fiber.
30. Resumable functions in C++17 ?
• Lots of proposals for resumable functions in C++17!!!
• Current proposal: N4134 (Nishanov, Radigan: MSFT)
• Two new keywords: await, yield
• First prototype available in Visual Studio 2015 Preview
Examples:
future<size_t> copyFile(const string& inFile, const string& outFile)
{
string s = await readFileAsync(inFile);
return await writeFileAsync(s, outFile);
}
generator<int> fib(int max)
{
int a = 0;
int b = 1;
while (a <= max) {
yield a;
int next = a + b;
a = b;
b = next;
}
}
32. Copy a file in chunks with resumable functions
task<void> copyFile(string inFilePath, string outFilePath)
{
    auto inFile = make_shared<ifstream>(inFilePath, ios::binary | ios::ate);
    inFile->seekg(0, inFile->beg);
    auto outFile = make_shared<ofstream>(outFilePath, ios::binary);
    while (true) {
        string s = await readFileChunk(inFile, 4096);
        if (s.empty()) {
            break;
        }
        await writeFileChunk(outFile, s);
    }
}
33. C++ resumable functions: how?
Several possible implementations:
• with a state machine (like C#)
• with stackful coroutines (Fibers)
• with stackless coroutines
• Resumable functions require changes both to:
• the language
• New keywords (await, yield)
• Compiler “transforms” resumable functions to support suspension and
resumption
• the library
• Requires “improved futures” (like PPL tasks)
• Need more advanced schedulers (executors).
34. Stackful coroutines
Visual Studio 2014 CTP had a first implementation, based on Win32
Fibers
• An async function has its own resumable side stack, separate from the
“real thread” stack.
• The side stack lives beyond the suspension points until logical
completion.
• Problem: not really scalable!
• Each fiber reserves a stack (1MB by default), so a few thousand
coroutines will exhaust the virtual address space of a 32-bit process.
• Context switches are relatively expensive.
36. Stackless coroutines (N4134)
task<int> bar() { ... }
task<R> foo(T1 a, T2 b) {
// function body
await bar(); // suspension point (suspends foo, resumes bar)
}
compiled as:
task<R> foo(T1 a, T2 b) {
auto rh = new resumable_handle<R, T1, T2>(a, b);
(*rh)();
}
The resumable_handle is a compiler-generated class that contains
the state of a coroutine and implements the call operator ():
void operator()() {
// function body
task<int> t = bar();
// suspension point
if (!await_ready(t)) {
await_suspend(t);
}
_result = await_resume(t);
}
[Diagram: the resumable_handle for foo stores _promise, the arguments a and b, and the local variables.]
37. Stackless coroutines (N4134)
• Awaitable types are types for which a library provides the support
for await statements, by implementing:
bool await_ready(T& t)
void await_suspend(T& t, resumable_handle rh)
auto await_resume(T& t)
• For example, for PPL tasks, the awaitable functions are:
bool await_ready(task& t) {
    return t.is_done();
}
void await_suspend(task& t, resumable_handle rh) {
    t.then([rh](task&){ rh(); }); // resume the coroutine when the task completes
}
auto await_resume(task& t) {
    return t.get(); // the value produced by the await expression
}
38. Stackless coroutines (N4134)
• The coroutine_promise class is a library class that implements the
semantics of a particular type of coroutine (ex: await, generators, …).
It must implement functions like:
void set_result(T val)
T get_return_object(resumable_handle rh)
void set_exception(E e)
AwaitType yield_value(T val)
...
• Functions with await statements => compiler generates code that
allocates a resumable_handle and uses library code for coroutine promises
and awaitable types to implement the logic of a
suspension/resumption point.
Note: this is still just a proposal! (unlikely to be standardized in C++17).
39. Generators (N4134)
Generator: resumable function that provides a sequence of values (“yielded” lazily,
only when the next element is requested)
Example: Fibonacci sequence
generator<int> fib(int max) {
int a = 0;
int b = 1;
while (a <= max) {
yield a;
int next = a + b;
a = b;
b = next;
}
}
for (auto n : fib(1000)) {
std::cout << n << std::endl;
}
• The library defines a generator<T> class, with a special kind of coroutine
promise designed to support the yield statement.
• Generators behave as ranges (provide input iterators).
• Ideally, should be composable with LINQ-like operators and interoperable with
Niebler’s ranges.
[Diagram: the generator owns the resumable_handle for fib (with its _promise) and exposes begin() and end(); generator::iterator provides operator ++ () and operator == ().]
40. Generators (N4134)
template<typename T>
class generator
{
class iterator
{
resumable_handle _coro;
iterator(resumable_handle rh) : _coro(rh) {}
iterator(nullptr_t) {} // represents the end of the sequence
iterator operator ++ () {
_coro(); // resume execution
return *this;
}
bool operator == (const iterator& rhs) {
return _coro == rhs._coro;
}
const T& operator * () const {
return _coro.promise()._CurrentValue;
}
};
resumable_handle _coro;
generator(resumable_handle rh) : _coro(rh) {}
iterator begin() {
_coro(); // starts the coroutine and executes it until it terminates or yields.
return iterator(_coro);
}
iterator end() { return iterator(nullptr); }
};
41. In conclusion…
• Don’t block: write asynchronous code!
• C++11 futures are a good start, but there is still work to do.
• Meanwhile, use libraries (like PPL, not only on Windows).
• Continuation-style code is complicated! We need help from the
compiler and the libraries (ex: async/await).
• Let’s hope the await proposal will be approved!
• We will soon be able to write simple, elegant asynchronous code in
C++!
Note that even though copyFile takes a std::string as the second parameter, the string literal is passed as a char const* and converted to a std::string only in the context of the new thread.
This is particularly important when the argument supplied is a pointer to an automatic variable, as above.
Also, of course sharing data between threads can cause race conditions.
The new standard provides classes like std::mutex, std::lock_guard, a library for atomic functions, and so on.
(These are outside the scope of this quick presentation)
Support for synchronizing operations between threads:
Condition variables (they work like Win32 Events or .NET Monitors): notify_one(), wait().
Open file
Start a loop:
Read a line
If it is EOF -> break,
Else if the line contains searchString
write the line number and content into the output stream
and go on to the next line
Else -> just increment lineNumber (which is in a shared_ptr)
Close the file and the output stream.
The code is VERY convoluted!
And we are not even dealing with error handling.
This IS NOT a good way to write asynchronous code. It works well, but the code is too complicated.
NOTE that this is not a fake example, this is the actual code from the Casablanca sample.
It is practically the state of the art to write really asynchronous code with C++.
It is also clearly not acceptable, IMHO.