Take advantage of C++ from Python
Yung-Yu Chen
PyCon Kyushu
30th June 2018
Why C++
❖ Python is slow
❖ Everything is on the heap
❖ Always dynamic types
❖ Hard to access assembly
❖ Convoluted algorithms with ndarray
❖ Access external code written in any language
❖ Detailed control and abstraction
Hard problems take time
• Supersonic jet in cross flow; density contour
• 53 hours on 264 cores for 1.3 B variables (66 M elements) over 12,000 time steps
• At OSC, 2011 (10 Gbps InfiniBand)
HPC (high-performance computing) is hard. Physics is harder. Don’t mingle.
Best of both worlds
❖ C++: fast runtime, strong static type checking, industrial grade
❖ Slow to code
❖ Python: fast prototyping, batteries included, easy to use
❖ Slow to run
❖ Hybrid systems are everywhere.
❖ TensorFlow, Blender, OpenCV, etc.
❖ C++ crunches numbers. Python controls the flow.
❖ Applications work like libraries, libraries like applications.
pybind11
❖ https://github.com/pybind/pybind11: C++11
❖ Expose C++ entities to Python
❖ Use Python from C++
❖ list, tuple, dict, and str
❖ handle, object, and none
C++11(/14/17/20)
New language features: auto and decltype, defaulted and deleted
functions, final and override, trailing return type, rvalue references,
move constructors/move assignment, scoped enums, constexpr and
literal types, list initialization, delegating and inherited constructors,
brace-or-equal initializers, nullptr, long long, char16_t and char32_t,
type aliases, variadic templates, generalized unions, generalized
PODs, Unicode string literals, user-defined literals, attributes,
lambda expressions, noexcept, alignof and alignas, multithreaded
memory model, thread-local storage, GC interface, range for (based
on a Boost library), static assertions (based on a Boost library)
http://en.cppreference.com/w/cpp/language/history
Python’s friends
❖ Shared pointer: manage resource ownership between
C++ and Python
❖ Move semantics: speed
❖ Lambda expression: ease the wrapping code
Ownership
❖ All Python objects are dynamically allocated on the
heap. Python uses reference counting to know who
should deallocate the object when it is no longer used.
❖ An owner of a reference to an object is responsible for
deallocating the object. With multiple owners, the last
owner (at that point the reference count is 1) calls the
destructor and deallocates the object. The other owners simply
decrement the count by 1.
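The counting described above can be observed from Python itself. The following is a minimal sketch that assumes CPython and uses only `sys.getrefcount` and `weakref` from the standard library; `Payload`, `refcount_delta`, and `last_owner_deallocates` are hypothetical names for illustration. Absolute counts vary between interpreter versions, so only differences are compared.

```python
import sys
import weakref


class Payload:
    """Hypothetical object whose ownership we track (illustration only)."""


def refcount_delta(n_owners):
    """Return (growth with n_owners extra owners, offset after they leave)."""
    obj = Payload()
    before = sys.getrefcount(obj)
    holders = [obj] * n_owners      # each list slot owns one reference
    during = sys.getrefcount(obj)
    holders.clear()                 # each owner decrements the count by 1
    after = sys.getrefcount(obj)
    return during - before, after - before


def last_owner_deallocates():
    """Show that the object lives until its last owner goes away."""
    obj = Payload()
    probe = weakref.ref(obj)        # a weak reference does not own the object
    second_owner = obj
    del obj                         # one owner left, the object stays alive
    alive_with_one_owner = probe() is not None
    del second_owner                # last owner gone, CPython deallocates
    return alive_with_one_owner, probe() is None


print(refcount_delta(3))
print(last_owner_deallocates())
```

The second function relies on CPython freeing an object as soon as its count reaches zero; other Python implementations may delay that step.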
Shared pointer
#include <memory>
#include <vector>
#include <numeric>   // std::accumulate is declared here
#include <iostream>
class Series {
    std::vector<int> m_data;
public:
    int sum() const {
        const int ret = std::accumulate(
            m_data.begin(), m_data.end(), 0);
        std::cout << "Series::sum() = " << ret << std::endl;
        return ret;
    }
    static size_t count;
    Series(size_t size, int lead) : m_data(size) {
        for (size_t it=0; it<size; it++) { m_data[it] = lead+it; }
        count++;
    }
    ~Series() { count--; }
};
size_t Series::count = 0;
void use_raw_pointer() {
    Series * series_ptr = new Series(10, 2);
    series_ptr->sum(); // call member function
    // OUT: Series::sum() = 65
    // remember to delete the object or we leak memory
    std::cout << "before explicit deletion, Series::count = "
              << Series::count << std::endl;
    // OUT: before explicit deletion, Series::count = 1
    delete series_ptr;
    std::cout << "after the resource is manually freed, Series::count = "
              << Series::count << std::endl;
    // OUT: after the resource is manually freed, Series::count = 0
}
void use_shared_pointer() {
    std::shared_ptr<Series> series_sptr(new Series(10, 3));
    series_sptr->sum(); // call member function
    // OUT: Series::sum() = 75
    // note shared_ptr handles deletion for series_sptr
}
int main(int argc, char ** argv) {
    // the common raw pointer
    use_raw_pointer();
    // now, shared_ptr
    use_shared_pointer();
    std::cout << "no memory leak: Series::count = "
              << Series::count << std::endl;
    // OUT: no memory leak: Series::count = 0
    return 0;
}
Move semantics
❖ Number-crunching code needs large arrays as memory buffers.
They aren’t supposed to be copied frequently.
❖ A 50,000 × 50,000 array of 8-byte values takes 20 GB.
❖ Shared pointers should manage large chunks of memory.
❖ New reference to an object: copy constructor of shared pointer
❖ Borrowed reference to an object: const reference to the shared
pointer
❖ Stolen reference to an object: move constructor of shared
pointer
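The 20 GB figure can be checked with simple arithmetic. This sketch assumes 8-byte elements (for example double-precision floats), which the slide does not state explicitly; `array_bytes` is a hypothetical helper.

```python
def array_bytes(rows, cols, itemsize=8):
    """Bytes needed by a dense rows x cols array; itemsize=8 is an assumption."""
    return rows * cols * itemsize


nbytes = array_bytes(50_000, 50_000)
print(nbytes / 10**9, "GB for one array")          # 20.0 GB for one array
print(2 * nbytes / 10**9, "GB if it is copied once")  # 40.0 GB if it is copied once
```

A single accidental copy doubles the footprint, which is why the buffer is shared or moved rather than copied.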
Lambda
❖ Put the code at the place it should be shown
namespace py = pybind11;
cls = py::class_< wrapped_type, holder_type >(mod, pyname, clsdoc);
cls
.def(
py::init([](block_type & block, index_type icl, bool init_sentinel) {
return wrapped_type(block, icl, init_sentinel);
}),
py::arg("block"), py::arg("icl"), py::arg("init_sentinel")=true
)
.def("repr", &wrapped_type::repr, py::arg("indent")=0, py::arg("precision")=0)
.def("__repr__", [](wrapped_type & self){ return self.repr(); })
.def("init_sentinel", &wrapped_type::init_sentinel)
.def_readwrite("cnd", &wrapped_type::cnd)
.def_readwrite("vol", &wrapped_type::vol)
.def_property_readonly(
"nbce",
[](wrapped_type & self) { return self.bces.size(); }
)
.def(
"get_bce",
[](wrapped_type & self, index_type ibce) { return self.bces.at(ibce); }
)
;
Lambda, cont’d
❖ Code as free as Python, as fast as C
#include <unordered_map>
#include <functional>
#include <cstdio>
int main(int argc, char ** argv) {
    // Python: fmap = dict()
    std::unordered_map<int, std::function<void(int)>> fmap;
    // Python: fmap[1] = lambda v: print("v = %d" % v)
    fmap.insert({
        1, [](int v) -> void { std::printf("v = %d\n", v); }
    });
    // Python: fmap[5] = lambda v: print("v*5 = %d" % (v*5))
    fmap.insert({
        5, [](int v) -> void { std::printf("v*5 = %d\n", v*5); }
    });
    std::unordered_map<int, std::function<void(int)>>::iterator search;
    // Python: fmap[1](100)
    search = fmap.find(1);
    search->second(100);
    // OUT: v = 100
    // Python: fmap[5](500)
    search = fmap.find(5);
    search->second(500);
    // OUT: v*5 = 2500
    return 0;
}
Manipulate Python
❖ Don’t mingle Python with C++
❖ Python has GIL
❖ Don’t include Python.h if you don’t intend to run
Python
❖ Once it enters your core, it’s hard to get it out
#include <Python.h>
class Core {
private:
int m_value;
PyObject * m_pyobject;
};
Do it in the wrapping layer
cls
.def(
py::init([](py::object pyblock) {
block_type * block = py::cast<block_type *>(pyblock.attr("_ustblk"));
std::shared_ptr<wrapped_type> svr = wrapped_type::construct(block->shared_from_this());
for (auto bc : py::list(pyblock.attr("bclist"))) {
std::string name = py::str(bc.attr("__class__").attr("__name__").attr("lstrip")("GasPlus"));
BoundaryData * data = py::cast<BoundaryData *>(bc.attr("_data"));
std::unique_ptr<gas::TrimBase<NDIM>> trim;
if ("Interface" == name) {
trim = make_unique<gas::TrimInterface<NDIM>>(*svr, *data);
} else if ("NoOp" == name) {
trim = make_unique<gas::TrimNoOp<NDIM>>(*svr, *data);
} else if ("NonRefl" == name) {
trim = make_unique<gas::TrimNonRefl<NDIM>>(*svr, *data);
} else if ("SlipWall" == name) {
trim = make_unique<gas::TrimSlipWall<NDIM>>(*svr, *data);
} else if ("Inlet" == name) {
trim = make_unique<gas::TrimInlet<NDIM>>(*svr, *data);
} else {
/* do nothing for now */ // throw std::runtime_error("BC type unknown");
}
svr->trims().push_back(std::move(trim));
}
if (report_interval) { svr->make_qty(); }
return svr;
}),
py::arg("block")
);
pybind11::list
❖ Read a list and cast contents:
❖ Populate:
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <iostream>
namespace py = pybind11;
PYBIND11_MODULE(_pylist, mod) {
mod.def(
"do",
[](py::list & l) {
// convert contents to std::string and send to cout
std::cout << "std::cout:" << std::endl;
for (py::handle o : l) {
std::string s = py::cast<std::string>(o);
std::cout << s << std::endl;
}
}
);
mod.def(
"do2",
[](py::list & l) {
// create a new list
std::cout << "py::print:" << std::endl;
py::list l2;
for (py::handle o : l) {
std::string s = py::cast<std::string>(o);
s = "elm:" + s;
py::str s2(s);
l2.append(s2); // populate contents
}
py::print(l2);
}
);
} /* end PYBIND11_MODULE(_pylist) */
>>> import _pylist
>>> # print the input list
>>> _pylist.do(["a", "b", "c"])
std::cout:
a
b
c
>>> _pylist.do2(["d", "e", "f"])
py::print:
['elm:d', 'elm:e', 'elm:f']
pybind11::tuple
❖ A tuple is immutable, so it behaves as read-only. It is
constructed from another iterable object.
❖ Read the contents of a tuple:
#include <pybind11/pybind11.h> // must be first
#include <vector>
namespace py = pybind11;
PYBIND11_MODULE(_pytuple, mod) {
mod.def(
"do",
[](py::args & args) {
// build a list using py::list::append
py::list l;
for (py::handle h : args) {
l.append(h);
}
// convert it to a tuple
py::tuple t(l);
// print it out
py::print(py::str("{} len={}").format(t, t.size()));
// print the element one by one
for (size_t it=0; it<t.size(); ++it) {
py::print(py::str("{}").format(t[it]));
}
}
);
} /* end PYBIND11_MODULE(_pytuple) */
>>> import _pytuple
>>> _pytuple.do("a", 7, 5.6)
('a', 7, 5.6) len=3
a
7
5.6
pybind11::dict
❖ The dictionary is one of the most useful containers in
Python.
❖ Populate a dictionary:
❖ Manipulate it:
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <stdexcept>
#include <iostream>
namespace py = pybind11;
PYBIND11_MODULE(_pydict, mod) {
mod.def(
"do",
[](py::args & args) {
if (args.size() % 2 != 0) {
throw std::runtime_error("argument number must be even");
}
// create a dict from the input tuple
py::dict d;
for (size_t it=0; it<args.size(); it+=2) {
d[args[it]] = args[it+1];
}
return d;
}
);
mod.def(
"do2",
[](py::dict d, py::args & args) {
for (py::handle h : args) {
if (d.contains(h)) {
std::cout << py::cast<std::string>(h)
<< " is in the input dictionary" << std::endl;
} else {
std::cout << py::cast<std::string>(h)
<< " is not found in the input dictionary" << std::endl;
}
}
std::cout << "remove everything in the input dictionary!" << std::endl;
d.clear();
return d;
}
);
} /* end PYBIND11_MODULE(_pydict) */
>>> import _pydict
>>> d = _pydict.do("a", 7, "b", "name", 10, 4.2)
>>> print(d)
{'a': 7, 'b': 'name', 10: 4.2}
>>> d2 = _pydict.do2(d, "b", "d")
b is in the input dictionary
d is not found in the input dictionary
remove everything in the input dictionary!
>>> print("The returned dictionary is empty:", d2)
The returned dictionary is empty: {}
>>> print("The first dictionary becomes empty too:", d)
The first dictionary becomes empty too: {}
>>> print("Are the two dictionaries the same?", d2 is d)
Are the two dictionaries the same? True
pybind11::str
❖ One more trick with Python strings in pybind11: the
user-defined literal `_s`:
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pystr, mod) {
mod.def(
"do",
[]() {
py::str s("python string {}"_s.format("formatting"));
py::print(s);
}
);
} /* end PYBIND11_MODULE(_pystr) */
>>> import _pystr
>>> _pystr.do()
python string formatting
Generic Python objects
❖ Pybind11 defines two generic types for representing
Python objects:
❖ “handle”: base class of all pybind11 classes for Python
types
❖ “object” derives from handle and adds automatic
reference counting
pybind11::handle and object
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pyho, mod) {
    mod.def(
        "do",
        [](py::object const & o) {
            std::cout << "refcount in the beginning: "
                      << o.ptr()->ob_refcnt << std::endl;
            py::handle h(o);
            std::cout << "no increase of refcount with a new pybind11::handle: "
                      << h.ptr()->ob_refcnt << std::endl;
            {
                py::object o2(o);
                std::cout << "increased refcount with a new pybind11::object: "
                          << o2.ptr()->ob_refcnt << std::endl;
            }
            std::cout << "decreased refcount after the new pybind11::object destructed: "
                      << o.ptr()->ob_refcnt << std::endl;
            h.inc_ref();
            std::cout << "manually increases refcount after h.inc_ref(): "
                      << h.ptr()->ob_refcnt << std::endl;
            h.dec_ref();
            std::cout << "manually decreases refcount after h.dec_ref(): "
                      << h.ptr()->ob_refcnt << std::endl;
        }
    );
} /* end PYBIND11_MODULE(_pyho) */
>>> import _pyho
>>> _pyho.do(["name"])
refcount in the beginning: 3
no increase of refcount with a new pybind11::handle: 3
increased refcount with a new pybind11::object: 4
decreased refcount after the new pybind11::object destructed: 3
manually increases refcount after h.inc_ref(): 4
manually decreases refcount after h.dec_ref(): 3
pybind11::none
❖ It’s worth noting that pybind11 has a “none” type. In
Python, None is a singleton, accessible as Py_None in
the C API.
❖ Access the None singleton from C++:
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pynone, mod) {
mod.def(
"do",
[](py::object const & o) {
if (o.is(py::none())) {
std::cout << "it is None" << std::endl;
} else {
std::cout << "it is not None" << std::endl;
}
}
);
} /* end PYBIND11_MODULE(_pynone) */
>>> import _pynone
>>> _pynone.do(None)
it is None
>>> _pynone.do(False)
it is not None
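For comparison, the check `o.is(py::none())` in the C++ listing corresponds to Python's identity test `is None`, not to a truthiness test. A minimal sketch in plain Python, where `describe` is a hypothetical stand-in for the wrapped `do` function:

```python
def describe(obj):
    """Identity test against the None singleton, like `o.is(py::none())`."""
    return "it is None" if obj is None else "it is not None"


print(describe(None))    # it is None
print(describe(False))   # it is not None (falsy, but a different object)
print(describe(0))       # it is not None
```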
Fast Code with C++
Never loop in Python
❖ Sum 100,000,000 integers
$ python -m timeit -s 'data = range(100000000)' 'sum(data)'
10 loops, best of 3: 2.36 sec per loop
❖ The C++ version:
#include <cstdio>
int main(int argc, char ** argv) {
    long value = 0;
    for (long it=0; it<100000000; ++it) { value += it; }
    return 0;
}
$ time ./run
real 0m0.010s
user 0m0.002s
sys 0m0.004s
❖ Numpy is better, but not enough
$ python -m timeit -s 'import numpy as np ; data = np.arange(100000000, dtype="int64")' 'data.sum()'
10 loops, best of 3: 74.9 msec per loop
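The same effect can be reproduced at a smaller scale without leaving Python. The sketch below uses the standard `timeit` module to compare an explicit Python-level loop with the built-in `sum`, whose loop runs in compiled code inside the interpreter. The size `N` and the function names are assumptions chosen so the script finishes quickly; timings depend on the machine, so none are quoted here.

```python
import timeit

N = 200_000  # assumed size, far smaller than the 100,000,000 on the slide


def loop_sum(n):
    """Sum 0..n-1 with an explicit Python-level loop."""
    total = 0
    for value in range(n):
        total += value
    return total


def builtin_sum(n):
    """Same result, but the iteration happens inside the built-in sum."""
    return sum(range(n))


t_loop = timeit.timeit(lambda: loop_sum(N), number=3)
t_builtin = timeit.timeit(lambda: builtin_sum(N), number=3)
print(f"python loop : {t_loop:.4f} s")
print(f"builtin sum : {t_builtin:.4f} s")
```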
Wisely use arrays
❖ Python calls are expensive. Data need to be transferred
from Python to C++ in batch. Use arrays.
❖ C++ code may use arrays as internal representation. For
example, matrices are arrays having a 2-D view.
❖ Arrays are used as both
❖ interface between Python and C++, and
❖ internal storage in the C++ engine
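The idea of one contiguous buffer serving as both interface and storage can be sketched with the standard library alone, without numpy. Here `array` owns the memory and `memoryview` provides zero-copy views of it, including a 2-D view of the same bytes; the variable names are illustrative.

```python
from array import array

# One contiguous buffer of C doubles: the unit handed over in a single call.
data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Zero-copy views of the same memory: flat, and seen as a 2 x 3 matrix.
flat = memoryview(data)
matrix = flat.cast("B").cast("d", (2, 3))

data[4] = 40.0                  # write through the owning array
print(matrix.tolist())          # the 2-D view sees the change, nothing was copied
print(flat.nbytes, "bytes shared")
```

A compiled extension receiving `data` through the buffer protocol would see the same bytes, so the whole batch crosses the language boundary in one call.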
Arrays in Python
❖ What we really mean is numpy(.ndarray)
❖ 12 lines to create vertices for zig-zagging mesh
❖ They get things done, although they sometimes look convoluted
# create nodes.
nodes = []
for iy, yloc in enumerate(np.arange(y0, y1+dy/4, dy/2)):
    if iy % 2 == 0:
        meshx = np.arange(x0, x1+dx/4, dx, dtype='float64')
    else:
        meshx = np.arange(x0+dx/2, x1-dx/4, dx, dtype='float64')
    nodes.append(np.vstack([meshx, np.full_like(meshx, yloc)]).T)
nodes = np.vstack(nodes)
assert nodes.shape[0] == nnode
blk.ndcrd[:,:] = nodes
assert (blk.ndcrd == nodes).all()
Expose memory buffer
class Buffer: public std::enable_shared_from_this<Buffer> {
private:
size_t m_length = 0;
char * m_data = nullptr;
struct ctor_passkey {};
public:
Buffer(size_t length, const ctor_passkey &)
: m_length(length) { m_data = new char[length](); }
static std::shared_ptr<Buffer> construct(size_t length) {
return std::make_shared<Buffer>(length, ctor_passkey());
}
~Buffer() {
if (nullptr != m_data) {
delete[] m_data;
m_data = nullptr;
}
}
/** Backdoor */
template< typename T >
T * data() const { return reinterpret_cast<T*>(m_data); }
};
py::array from(array_flavor flavor) {
// ndarray shape and stride
npy_intp shape[m_table.ndim()];
std::copy(m_table.dims().begin(),
m_table.dims().end(),
shape);
npy_intp strides[m_table.ndim()];
strides[m_table.ndim()-1] = m_table.elsize();
for (ssize_t it = m_table.ndim()-2; it >= 0; --it) {
strides[it] = shape[it+1] * strides[it+1];
}
// create ndarray
void * data = m_table.data();
py::object tmp = py::reinterpret_steal<py::object>(
PyArray_NewFromDescr(
&PyArray_Type,
PyArray_DescrFromType(m_table.datatypeid()),
m_table.ndim(),
shape,
strides,
data,
NPY_ARRAY_WRITEABLE,
nullptr));
// link lifecycle to the underneath buffer
py::object buffer = py::cast(m_table.buffer());
py::array ret;
if (PyArray_SetBaseObject((PyArrayObject *)tmp.ptr(),
buffer.inc_ref().ptr()) == 0) {
ret = tmp;
}
return ret;
}
(First listing: internal buffer. Second listing: expose the buffer as ndarray.)
❖ Numpy arrays provide the most common construct: a
contiguous memory buffer, and tons of code
❖ N-dimensional arrays (ndarray)
❖ There are variants, but less useful in C++: masked
array, sparse matrices, etc.
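The stride loop in the `from` listing above fills the byte strides from the last axis backwards. A minimal Python sketch of the same computation, cross-checked against the strides that `memoryview` reports for an identically shaped C-contiguous buffer; `row_major_strides` is a hypothetical helper and assumes at least one dimension.

```python
def row_major_strides(shape, elsize):
    """Byte strides of a C-contiguous array with the given shape.

    The last axis moves by one element; every earlier axis moves by the
    full extent of the axis after it. Assumes len(shape) >= 1.
    """
    strides = [0] * len(shape)
    strides[-1] = elsize
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = shape[axis + 1] * strides[axis + 1]
    return tuple(strides)


shape = (3, 4, 5)
print(row_major_strides(shape, 8))            # (160, 40, 8)

# Cross-check with the layout Python itself uses for 8-byte doubles.
reference = memoryview(bytes(3 * 4 * 5 * 8)).cast("d", shape)
print(reference.strides)                      # (160, 40, 8)
```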
Define your meta data
❖ Free to define how the memory is used
class LookupTableCore {
private:
std::shared_ptr<Buffer> m_buffer;
std::vector<index_type> m_dims;
index_type m_nghost = 0;
index_type m_nbody = 0;
index_type m_ncolumn = 0;
index_type m_elsize = 1; ///< Element size in bytes.
DataTypeId m_datatypeid = MH_INT8;
public:
index_type ndim() const { return m_dims.size(); }
index_type nghost() const { return m_nghost; }
index_type nbody() const { return m_nbody; }
index_type nfull() const { return m_nghost + m_nbody; }
index_type ncolumn() const { return m_ncolumn; }
index_type nelem() const { return nfull() * ncolumn(); }
index_type elsize() const { return m_elsize; }
DataTypeId datatypeid() const { return m_datatypeid; }
size_t nbyte() const { return buffer()->nbyte(); }
};
(Figure: a lookup table divided into a ghost part and a body part, with index 0 marked.)
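One way such metadata can map a row index to a position in the buffer is sketched below. This is an assumption for illustration, not the author's implementation: it supposes that ghost rows take the negative indices -nghost..-1 and body rows take 0..nbody-1, all stored in one contiguous buffer. `GhostBodyTable` and `offset` are hypothetical names; `nghost`, `nbody`, `ncolumn`, and `nfull` follow the C++ listing.

```python
from array import array


class GhostBodyTable:
    """Hypothetical Python analogue of a table with ghost and body rows.

    Assumed layout (not stated in the source): ghost rows first, then body
    rows, so that row index 0 is the first body row.
    """

    def __init__(self, nghost, nbody, ncolumn):
        self.nghost = nghost
        self.nbody = nbody
        self.ncolumn = ncolumn
        self.buffer = array("d", [0.0] * (self.nfull * ncolumn))

    @property
    def nfull(self):
        return self.nghost + self.nbody

    def offset(self, irow, icol=0):
        """Flat element offset of (irow, icol); irow is negative for ghosts."""
        if not -self.nghost <= irow < self.nbody:
            raise IndexError("row index out of range")
        if not 0 <= icol < self.ncolumn:
            raise IndexError("column index out of range")
        return (irow + self.nghost) * self.ncolumn + icol


table = GhostBodyTable(nghost=2, nbody=3, ncolumn=4)
print(table.nfull, len(table.buffer))                          # 5 20
print(table.offset(-2), table.offset(0), table.offset(2, 3))   # 0 8 19
```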
Organize arrays
❖ LookupTable is a class
template providing static
information for the dynamic
array core
❖ Now we can put together a
class that keeps track of all
data for computation
template< size_t NDIM >
class UnstructuredBlock {
private:
// geometry arrays.
LookupTable<real_type, NDIM> m_ndcrd;
LookupTable<real_type, NDIM> m_fccnd;
LookupTable<real_type, NDIM> m_fcnml;
LookupTable<real_type, 0> m_fcara;
LookupTable<real_type, NDIM> m_clcnd;
LookupTable<real_type, 0> m_clvol;
// meta arrays.
LookupTable<shape_type, 0> m_fctpn;
LookupTable<shape_type, 0> m_cltpn;
LookupTable<index_type, 0> m_clgrp;
// connectivity arrays.
LookupTable<index_type, FCMND+1> m_fcnds;
LookupTable<index_type, FCNCL > m_fccls;
LookupTable<index_type, CLMND+1> m_clnds;
LookupTable<index_type, CLMFC+1> m_clfcs;
// boundary information.
LookupTable<index_type, 2> m_bndfcs;
std::vector<BoundaryData> m_bndvec;
};
(This case is for unstructured meshes of mixed elements in 2-/3-dimensional Euclidean space)
Fast and hideous
❖ In theory we can write
beautiful and fast code in
C++, and we should.
❖ In practice, as long as it’s
fast, it’s not too hard to
compromise on elegance.
❖ Testability is the bottom
line.
const index_type *
pclfcs = reinterpret_cast<const index_type *>(clfcs().row(0));
prcells = reinterpret_cast<index_type *>(rcells.row(0));
for (icl=0; icl<ncell(); icl++) {
for (ifl=1; ifl<=pclfcs[0]; ifl++) {
ifl1 = ifl-1;
ifc = pclfcs[ifl];
const index_type *
pfccls = reinterpret_cast<const index_type *>(fccls().row(0))
+ ifc*FCREL;
if (ifc == -1) { // NOT A FACE!? SHOULDN'T HAPPEN.
prcells[ifl1] = -1;
continue;
} else if (pfccls[0] == icl) {
if (pfccls[2] != -1) { // has neighboring block.
prcells[ifl1] = -1;
} else { // is interior.
prcells[ifl1] = pfccls[1];
};
} else if (pfccls[1] == icl) { // I am the neighboring cell.
prcells[ifl1] = pfccls[0];
};
// count rcell number.
if (prcells[ifl1] >= 0) {
rcellno[icl] += 1;
} else {
prcells[ifl1] = -1;
};
};
// advance pointers.
pclfcs += CLMFC+1;
prcells += CLMFC;
};
(This looks like C since it really was C.)
Final notes
❖ Avoid Python when you need speed; use it as a shell to
your high-performance library from day one
❖ Resource management is at the core of the hybrid
architecture; do it in C++
❖ Use arrays (look-up tables) to hold large data
❖ Don’t access PyObject from your core
❖ Always keep in mind the differences between the type systems

Weitere ähnliche Inhalte

Was ist angesagt?

좌충우돌 ORM 개발기 | Devon 2012
좌충우돌 ORM 개발기 | Devon 2012좌충우돌 ORM 개발기 | Devon 2012
좌충우돌 ORM 개발기 | Devon 2012
Daum DNA
 

Was ist angesagt? (20)

Stack using Array
Stack using ArrayStack using Array
Stack using Array
 
Classes and objects
Classes and objectsClasses and objects
Classes and objects
 
Java package
Java packageJava package
Java package
 
Function in c program
Function in c programFunction in c program
Function in c program
 
Constructors and Destructor in C++
Constructors and Destructor in C++Constructors and Destructor in C++
Constructors and Destructor in C++
 
Stacks
StacksStacks
Stacks
 
STORAGE CLASSES
STORAGE CLASSESSTORAGE CLASSES
STORAGE CLASSES
 
Java thread life cycle
Java thread life cycleJava thread life cycle
Java thread life cycle
 
Tail recursion
Tail recursionTail recursion
Tail recursion
 
File in c
File in cFile in c
File in c
 
Power point presentation on access specifier in OOPs
Power point presentation on access specifier in OOPsPower point presentation on access specifier in OOPs
Power point presentation on access specifier in OOPs
 
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
 
Classes and objects
Classes and objectsClasses and objects
Classes and objects
 
Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())
 
Object Oriented Programming in Python
Object Oriented Programming in PythonObject Oriented Programming in Python
Object Oriented Programming in Python
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
 
Array operations
Array operationsArray operations
Array operations
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
좌충우돌 ORM 개발기 | Devon 2012
좌충우돌 ORM 개발기 | Devon 2012좌충우돌 ORM 개발기 | Devon 2012
좌충우돌 ORM 개발기 | Devon 2012
 
C Programming Storage classes, Recursion
C Programming Storage classes, RecursionC Programming Storage classes, Recursion
C Programming Storage classes, Recursion
 

Ähnlich wie Take advantage of C++ from Python

C++totural file
C++totural fileC++totural file
C++totural file
halaisumit
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
gabriellekuruvilla
 
CS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2ndCS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2nd
Edward Chen
 

Ähnlich wie Take advantage of C++ from Python (20)

Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
C++ tutorial
C++ tutorialC++ tutorial
C++ tutorial
 
C++totural file
C++totural fileC++totural file
C++totural file
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
 
Intro To C++ - Class #17: Pointers!, Objects Talking To Each Other
Intro To C++ - Class #17: Pointers!, Objects Talking To Each OtherIntro To C++ - Class #17: Pointers!, Objects Talking To Each Other
Intro To C++ - Class #17: Pointers!, Objects Talking To Each Other
 
tokyotalk
tokyotalktokyotalk
tokyotalk
 
PHP 8: Process & Fixing Insanity
PHP 8: Process & Fixing InsanityPHP 8: Process & Fixing Insanity
PHP 8: Process & Fixing Insanity
 
Return of c++
Return of c++Return of c++
Return of c++
 
Apache Thrift
Apache ThriftApache Thrift
Apache Thrift
 
CS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2ndCS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2nd
 
C++primer
C++primerC++primer
C++primer
 
Why learn Internals?
Why learn Internals?Why learn Internals?
Why learn Internals?
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020
 
C language introduction
C language introduction C language introduction
C language introduction
 
SRAVANByCPP
SRAVANByCPPSRAVANByCPP
SRAVANByCPP
 
Introduction Of C++
Introduction Of C++Introduction Of C++
Introduction Of C++
 
C++ theory
C++ theoryC++ theory
C++ theory
 

Mehr von Yung-Yu Chen

Mehr von Yung-Yu Chen (8)

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Kürzlich hochgeladen

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Kürzlich hochgeladen (20)

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 

Take advantage of C++ from Python

  • 1. Take advantage of C++ from Python Yung-Yu Chen PyCon Kyushu 30th June 2018
  • 2. Why C++ ❖ Python is slow ❖ Everything is on heap ❖ Always dynamic types ❖ Hard to access assembly ❖ Convoluted algorithms with ndarray ❖ Access external code written in any language ❖ Detail control and abstraction
  • 3. Hard problems take time • Supersonic jet in cross flow; density contour • 264 cores with 53 hours for 1.3 B variables (66 M elements) by 12,000 time steps • At OSC, 2011 (10 Gbps InfiniBand) HPC (high-performance computing) is hard. Physics is harder. Don’t mingle.
  • 4. Best of both worlds ❖ C++: fast runtime, strong static type checking, industrial grade ❖ Slow to code ❖ Python: fast prototyping, batteries included, easy to use ❖ Slow to run ❖ Hybrid system is everywhere. ❖ TensorFlow, Blender, OpenCV, etc. ❖ C++ crunches numbers. Python controls the flow. ❖ Applications work like libraries, libraries like applications.
  • 5. pybind11 ❖ https://github.com/pybind/pybind11: C++11 ❖ Expose C++ entities to Python ❖ Use Python from C++ ❖ list, tuple, dict, and str ❖ handle, object, and none
  • 6. C++11(/14/17/20) New language features: auto and decltype, defaulted and deleted functions, final and override, trailing return type, rvalue references, move constructors/move assignment, scoped enums, constexpr and literal types, list initialization, delegating and inherited constructors, brace-or-equal initializers, nullptr, long long, char16_t and char32_t, type aliases, variadic templates, generalized unions, generalized PODs, Unicode string literals, user-defined literals, attributes, lambda expressions, noexcept, alignof and alignas, multithreaded memory model, thread-local storage, GC interface, range for (based on a Boost library), static assertions (based on a Boost library) http://en.cppreference.com/w/cpp/language/history
  • 7. Python’s friends ❖ Shared pointer: manage resource ownership between C++ and Python ❖ Move semantics: speed ❖ Lambda expression: ease the wrapping code
  • 8. Ownership ❖ All Python objects are dynamically allocated on the heap. Python uses reference counting to know who should deallocate the object when it is no longer used. ❖ An owner of a reference to an object is responsible for deallocating the object. With multiple owners, the last owner (at that point the reference count is 1) calls the destructor and deallocates the object. The other owners simply decrement the count by 1.
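The same rule can be observed on the C++ side with `std::shared_ptr`. The following is a minimal sketch, not code from the talk: `Resource` and `owners_with_two_references` are hypothetical names introduced only for illustration. It shows the count rising when a second owner appears and the object being destroyed only when the last owner goes away.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource that records whether it is still alive.
struct Resource {
    explicit Resource(bool & alive) : m_alive(alive) { m_alive = true; }
    ~Resource() { m_alive = false; }
    bool & m_alive;
};

// Returns the owner count observed while two owners exist.
long owners_with_two_references(bool & alive) {
    std::shared_ptr<Resource> first = std::make_shared<Resource>(alive);
    assert(first.use_count() == 1);
    long seen = 0;
    {
        std::shared_ptr<Resource> second = first; // new owner: count becomes 2
        seen = second.use_count();
    } // second goes away: count drops back to 1, nothing is deallocated
    assert(first.use_count() == 1 && alive);
    return seen;
} // first is the last owner; the Resource is destroyed here
```

After the function returns, the flag passed in is false again, which confirms that only the last owner triggered the destructor.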
  • 9. Shared pointer #include <memory> #include <vector> #include <algorithm> #include <numeric> #include <iostream> class Series { std::vector<int> m_data; public: int sum() const { const int ret = std::accumulate( m_data.begin(), m_data.end(), 0); std::cout << "Series::sum() = " << ret << std::endl; return ret; } static size_t count; Series(size_t size, int lead) : m_data(size) { for (size_t it=0; it<size; it++) { m_data[it] = lead+it; } count++; } ~Series() { count--; } }; size_t Series::count = 0; void use_raw_pointer() { Series * series_ptr = new Series(10, 2); series_ptr->sum(); // call member function // OUT: Series::sum() = 65 // remember to delete the object or we leak memory std::cout << "before explicit deletion, Series::count = " << Series::count << std::endl; // OUT: before explicit deletion, Series::count = 1 delete series_ptr; std::cout << "after the resource is manually freed, Series::count = " << Series::count << std::endl; // OUT: after the resource is manually freed, Series::count = 0 } void use_shared_pointer() { std::shared_ptr<Series> series_sptr(new Series(10, 3)); series_sptr->sum(); // call member function // OUT: Series::sum() = 75 // note shared_ptr handles deletion for series_sptr } int main(int argc, char ** argv) { // the common raw pointer use_raw_pointer(); // now, shared_ptr use_shared_pointer(); std::cout << "no memory leak: Series::count = " << Series::count << std::endl; // OUT: no memory leak: Series::count = 0 return 0; }
  • 10. Move semantics ❖ Number-crunching code needs large arrays as memory buffers. They aren’t supposed to be copied frequently. ❖ A 50,000 × 50,000 array of 8-byte values takes 20 GB. ❖ Shared pointers should manage large chunks of memory. ❖ New reference to an object: copy constructor of shared pointer ❖ Borrowed reference to an object: const reference to the shared pointer ❖ Stolen reference to an object: move constructor of shared pointer
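The three kinds of reference listed above can be sketched with plain `std::shared_ptr`. This is an illustrative sketch under stated assumptions: `Buffer` is a stand-in alias for a large array, and `take_new`, `take_borrowed`, and `take_stolen` are hypothetical helper names, not part of the talk's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<double>; // stand-in for a large array (assumption)

// New reference: the by-value parameter is copy-constructed, adding an owner.
long take_new(std::shared_ptr<Buffer> sp) { return sp.use_count(); }

// Borrowed reference: a const reference creates no additional owner.
long take_borrowed(std::shared_ptr<Buffer> const & sp) { return sp.use_count(); }

// Stolen reference: the move constructor transfers ownership and leaves
// the source pointer empty; the owner count does not change.
std::shared_ptr<Buffer> take_stolen(std::shared_ptr<Buffer> && sp) {
    assert(sp != nullptr); // precondition: there is something to steal
    return std::shared_ptr<Buffer>(std::move(sp));
}
```

In none of the three cases is the buffer itself copied. Only the small control block of the shared pointer is touched, which is why shared pointers suit large memory chunks.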
  • 11. Lambda ❖ Put the code at the place it should be shown namespace py = pybind11; cls = py::class_< wrapped_type, holder_type >(mod, pyname, clsdoc); cls .def( py::init([](block_type & block, index_type icl, bool init_sentinel) { return wrapped_type(block, icl, init_sentinel); }), py::arg("block"), py::arg("icl"), py::arg("init_sentinel")=true ) .def("repr", &wrapped_type::repr, py::arg("indent")=0, py::arg("precision")=0) .def("__repr__", [](wrapped_type & self){ return self.repr(); }) .def("init_sentinel", &wrapped_type::init_sentinel) .def_readwrite("cnd", &wrapped_type::cnd) .def_readwrite("vol", &wrapped_type::vol) .def_property_readonly( "nbce", [](wrapped_type & self) { return self.bces.size(); } ) .def( "get_bce", [](wrapped_type & self, index_type ibce) { return self.bces.at(ibce); } ) ;
  • 12. Lambda, cont’d ❖ Code as free as Python, as fast as C #include <unordered_map> #include <functional> #include <cstdio> int main(int argc, char ** argv) { // Python: fmap = dict() std::unordered_map<int, std::function<void(int)>> fmap; // Python: fmap[1] = lambda v: print("v = %d" % v) fmap.insert({ 1, [](int v) -> void { std::printf("v = %d\n", v); } }); // Python: fmap[5] = lambda v: print("v*5 = %d" % (v*5)) fmap.insert({ 5, [](int v) -> void { std::printf("v*5 = %d\n", v*5); } }); std::unordered_map<int, std::function<void(int)>>::iterator search; // Python: fmap[1](100) search = fmap.find(1); search->second(100); // OUT: v = 100 // Python: fmap[5](500) search = fmap.find(5); search->second(500); // OUT: v*5 = 2500 return 0; }
  • 13. Manipulate Python ❖ Don’t mingle Python with C++ ❖ Python has GIL ❖ Don’t include Python.h if you don’t intend to run Python ❖ Once it enters your core, it’s hard to get it off #include <Python.h> class Core { private: int m_value; PyObject * m_pyobject; };
  • 14. Do it in the wrapping layer cls .def( py::init([](py::object pyblock) { block_type * block = py::cast<block_type *>(pyblock.attr("_ustblk")); std::shared_ptr<wrapped_type> svr = wrapped_type::construct(block->shared_from_this()); for (auto bc : py::list(pyblock.attr("bclist"))) { std::string name = py::str(bc.attr("__class__").attr("__name__").attr("lstrip")("GasPlus")); BoundaryData * data = py::cast<BoundaryData *>(bc.attr("_data")); std::unique_ptr<gas::TrimBase<NDIM>> trim; if ("Interface" == name) { trim = make_unique<gas::TrimInterface<NDIM>>(*svr, *data); } else if ("NoOp" == name) { trim = make_unique<gas::TrimNoOp<NDIM>>(*svr, *data); } else if ("NonRefl" == name) { trim = make_unique<gas::TrimNonRefl<NDIM>>(*svr, *data); } else if ("SlipWall" == name) { trim = make_unique<gas::TrimSlipWall<NDIM>>(*svr, *data); } else if ("Inlet" == name) { trim = make_unique<gas::TrimInlet<NDIM>>(*svr, *data); } else { /* do nothing for now */ // throw std::runtime_error("BC type unknown"); } svr->trims().push_back(std::move(trim)); } if (report_interval) { svr->make_qty(); } return svr; }), py::arg("block") );
  • 15. pybind11::list ❖ Read a list and cast contents: ❖ Populate: #include <pybind11/pybind11.h> // must be first #include <string> #include <iostream> namespace py = pybind11; PYBIND11_MODULE(_pylist, mod) { mod.def( "do", [](py::list & l) { // convert contents to std::string and send to cout std::cout << "std::cout:" << std::endl; for (py::handle o : l) { std::string s = py::cast<std::string>(o); std::cout << s << std::endl; } } ); mod.def( "do2", [](py::list & l) { // create a new list std::cout << "py::print:" << std::endl; py::list l2; for (py::handle o : l) { std::string s = py::cast<std::string>(o); s = "elm:" + s; py::str s2(s); l2.append(s2); // populate contents } py::print(l2); } ); } /* end PYBIND11_PLUGIN(_pylist) */ >>> import _pylist >>> # print the input list >>> _pylist.do(["a", "b", "c"]) std::cout: a b c >>> _pylist.do2(["d", "e", "f"]) py::print: ['elm:d', 'elm:e', 'elm:f']
  • 16. pybind11::tuple ❖ Tuple is immutable, thus behaves like read-only. The construction is through another iterable object. ❖ Read the contents of a tuple: #include <pybind11/pybind11.h> // must be first #include <vector> namespace py = pybind11; PYBIND11_MODULE(_pytuple, mod) { mod.def( "do", [](py::args & args) { // build a list using py::list::append py::list l; for (py::handle h : args) { l.append(h); } // convert it to a tuple py::tuple t(l); // print it out py::print(py::str("{} len={}").format(t, t.size())); // print the element one by one for (size_t it=0; it<t.size(); ++it) { py::print(py::str("{}").format(t[it])); } } ); } /* end PYBIND11_PLUGIN(_pytuple) */ >>> import _pytuple >>> _pytuple.do("a", 7, 5.6) ('a', 7, 5.6) len=3 a 7 5.6
  • 17. pybind11::dict ❖ Dictionary is one of the most useful containers in Python. ❖ Populate a dictionary: ❖ Manipulate it: #include <pybind11/pybind11.h> // must be first #include <string> #include <stdexcept> #include <iostream> namespace py = pybind11; PYBIND11_MODULE(_pydict, mod) { mod.def( "do", [](py::args & args) { if (args.size() % 2 != 0) { throw std::runtime_error("argument number must be even"); } // create a dict from the input tuple py::dict d; for (size_t it=0; it<args.size(); it+=2) { d[args[it]] = args[it+1]; } return d; } ); mod.def( "do2", [](py::dict d, py::args & args) { for (py::handle h : args) { if (d.contains(h)) { std::cout << py::cast<std::string>(h) << " is in the input dictionary" << std::endl; } else { std::cout << py::cast<std::string>(h) << " is not found in the input dictionary" << std::endl; } } std::cout << "remove everything in the input dictionary!" << std::endl; d.clear(); return d; } ); } /* end PYBIND11_MODULE(_pydict) */ >>> import _pydict >>> d = _pydict.do("a", 7, "b", "name", 10, 4.2) >>> print(d) {'a': 7, 'b': 'name', 10: 4.2} >>> d2 = _pydict.do2(d, "b", "d") b is in the input dictionary d is not found in the input dictionary remove everything in the input dictionary! >>> print("The returned dictionary is empty:", d2) The returned dictionary is empty: {} >>> print("The first dictionary becomes empty too:", d) The first dictionary becomes empty too: {} >>> print("Are the two dictionaries the same?", d2 is d) Are the two dictionaries the same? True
  • 18. pybind11::str ❖ One more trick with Python strings in pybind11; user-defined literal: #include <pybind11/pybind11.h> // must be first #include <iostream> namespace py = pybind11; using namespace py::literals; // to bring in the `_s` literal PYBIND11_MODULE(_pystr, mod) { mod.def( "do", []() { py::str s("python string {}"_s.format("formatting")); py::print(s); } ); } /* end PYBIND11_PLUGIN(_pystr) */ >>> import _pystr >>> _pystr.do() python string formatting
  • 19. Generic Python objects ❖ pybind11 defines two generic types for representing Python objects: ❖ “handle”: base class of all pybind11 classes for Python types ❖ “object”: derives from handle and adds automatic reference counting
  • 20. pybind11::handle and object #include <pybind11/pybind11.h> // must be first #include <iostream> namespace py = pybind11; using namespace py::literals; // to bring in the `_s` literal PYBIND11_MODULE(_pyho, mod) { mod.def( "do", [](py::object const & o) { std::cout << "refcount in the beginning: " << o.ptr()->ob_refcnt << std::endl; py::handle h(o); std::cout << "no increase of refcount with a new pybind11::handle: " << h.ptr()->ob_refcnt << std::endl; { py::object o2(o); std::cout << "increased refcount with a new pybind11::object: " << o2.ptr()->ob_refcnt << std::endl; } std::cout << "decreased refcount after the new pybind11::object destructed: " << o.ptr()->ob_refcnt << std::endl; h.inc_ref(); std::cout << "manually increases refcount after h.inc_ref(): " << h.ptr()->ob_refcnt << std::endl; h.dec_ref(); std::cout << "manually decreases refcount after h.dec_ref(): " << h.ptr()->ob_refcnt << std::endl; } ); } /* end PYBIND11_PLUGIN(_pyho) */ >>> import _pyho >>> _pyho.do(["name"]) refcount in the beginning: 3 no increase of refcount with a new pybind11::handle: 3 increased refcount with a new pybind11::object: 4 decreased refcount after the new pybind11::object destructed: 3 manually increases refcount after h.inc_ref(): 4 manually decreases refcount after h.dec_ref(): 3
  • 21. pybind11::none ❖ It’s worth noting that pybind11 has a “none” type. In Python, None is a singleton, and accessible as Py_None in the C API. ❖ Access the None singleton from C++: #include <pybind11/pybind11.h> // must be first #include <iostream> namespace py = pybind11; using namespace py::literals; // to bring in the `_s` literal PYBIND11_MODULE(_pynone, mod) { mod.def( "do", [](py::object const & o) { if (o.is(py::none())) { std::cout << "it is None" << std::endl; } else { std::cout << "it is not None" << std::endl; } } ); } /* end PYBIND11_PLUGIN(_pynone) */ >>> import _pynone >>> _pynone.do(None) it is None >>> _pynone.do(False) it is not None
  • 23. Never loop in Python ❖ Sum 100,000,000 integers ❖ The C++ version: ❖ Numpy is better, but not enough $ python -m timeit -s 'data = range(100000000)' 'sum(data)' 10 loops, best of 3: 2.36 sec per loop $ time ./run real 0m0.010s user 0m0.002s sys 0m0.004s #include <cstdio> int main(int argc, char ** argv) { long value = 0; for (long it=0; it<100000000; ++it) { value += it; } return 0; } $ python -m timeit -s 'import numpy as np ; data = np.arange(100000000, dtype="int64")' 'data.sum()' 10 loops, best of 3: 74.9 msec per loop
  • 24. Wisely use arrays ❖ Python calls are expensive. Data need to be transferred from Python to C++ in batches. Use arrays. ❖ C++ code may use arrays as internal representation. For example, matrices are arrays having a 2-D view. ❖ Arrays are used as both ❖ interface between Python and C++, and ❖ internal storage in the C++ engine
  • 25. Arrays in Python ❖ What we really mean is numpy(.ndarray) ❖ 12 lines to create vertices for zig-zagging mesh ❖ They get things done, although sometimes look convoluted # create nodes. nodes = [] for iy, yloc in enumerate(np.arange(y0, y1+dy/4, dy/2)): if iy % 2 == 0: meshx = np.arange(x0, x1+dx/4, dx, dtype='float64') else: meshx = np.arange(x0+dx/2, x1-dx/4, dx, dtype='float64') nodes.append(np.vstack([meshx, np.full_like(meshx, yloc)]).T) nodes = np.vstack(nodes) assert nodes.shape[0] == nnode blk.ndcrd[:,:] = nodes assert (blk.ndcrd == nodes).all()
  • 26. Expose memory buffer class Buffer: public std::enable_shared_from_this<Buffer> { private: size_t m_length = 0; char * m_data = nullptr; struct ctor_passkey {}; public: Buffer(size_t length, const ctor_passkey &) : m_length(length) { m_data = new char[length](); } static std::shared_ptr<Buffer> construct(size_t length) { return std::make_shared<Buffer>(length, ctor_passkey()); } ~Buffer() { if (nullptr != m_data) { delete[] m_data; m_data = nullptr; } } /** Backdoor */ template< typename T > T * data() const { return reinterpret_cast<T*>(m_data); } }; py::array from(array_flavor flavor) { // ndarray shape and stride npy_intp shape[m_table.ndim()]; std::copy(m_table.dims().begin(), m_table.dims().end(), shape); npy_intp strides[m_table.ndim()]; strides[m_table.ndim()-1] = m_table.elsize(); for (ssize_t it = m_table.ndim()-2; it >= 0; --it) { strides[it] = shape[it+1] * strides[it+1]; } // create ndarray void * data = m_table.data(); py::object tmp = py::reinterpret_steal<py::object>( PyArray_NewFromDescr( &PyArray_Type, PyArray_DescrFromType(m_table.datatypeid()), m_table.ndim(), shape, strides, data, NPY_ARRAY_WRITEABLE, nullptr)); // link lifecycle to the underneath buffer py::object buffer = py::cast(m_table.buffer()); py::array ret; if (PyArray_SetBaseObject((PyArrayObject *)tmp.ptr(), buffer.inc_ref().ptr()) == 0) { ret = tmp; } return ret; } Internal buffer Expose the buffer as ndarray ❖ Numpy arrays provide the most common construct: a contiguous memory buffer, and tons of code ❖ N-dimensional arrays (ndarray) ❖ There are variants, but less useful in C++: masked array, sparse matrices, etc.
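The stride loop in the code above is the usual rule for a contiguous row-major (C-order) buffer: the last axis steps by one element, and each earlier axis steps by the full extent of the axes after it. A standalone sketch of that rule follows. The function name `contiguous_strides` is hypothetical, and `std::vector` is used here in place of variable-length arrays, which are a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides of a C-contiguous (row-major) array with the given shape
// and element size in bytes.
std::vector<std::ptrdiff_t> contiguous_strides(
    std::vector<std::ptrdiff_t> const & shape, std::ptrdiff_t elsize)
{
    assert(elsize > 0);
    std::vector<std::ptrdiff_t> strides(shape.size());
    std::ptrdiff_t step = elsize; // the last axis advances by one element
    for (std::size_t it = shape.size(); it-- > 0; ) {
        strides[it] = step;
        step *= shape[it]; // the next outer axis skips this whole axis
    }
    return strides;
}
```

For a 3 × 4 × 5 array of 8-byte elements this gives strides of 160, 40, and 8 bytes. Unlike the loop in the slide, the sketch also tolerates a zero-dimensional shape.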
  • 27. Define your metadata ❖ Free to define how the memory is used class LookupTableCore { private: std::shared_ptr<Buffer> m_buffer; std::vector<index_type> m_dims; index_type m_nghost = 0; index_type m_nbody = 0; index_type m_ncolumn = 0; index_type m_elsize = 1; ///< Element size in bytes. DataTypeId m_datatypeid = MH_INT8; public: index_type ndim() const { return m_dims.size(); } index_type nghost() const { return m_nghost; } index_type nbody() const { return m_nbody; } index_type nfull() const { return m_nghost + m_nbody; } index_type ncolumn() const { return m_ncolumn; } index_type nelem() const { return nfull() * ncolumn(); } index_type elsize() const { return m_elsize; } DataTypeId datatypeid() const { return m_datatypeid; } size_t nbyte() const { return buffer()->nbyte(); } }; [Figure: table layout showing the ghost and body parts, with index 0 marked]
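One way to read the ghost/body split is as an index offset. The sketch below is an assumption about the layout, not the talk's implementation: it supposes that ghost entries are stored before the body, so index 0 is the first body entry and ghost entries take negative indices. `GhostTable` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D table; ghost entries are ASSUMED to sit before the body.
class GhostTable {
public:
    GhostTable(std::ptrdiff_t nghost, std::ptrdiff_t nbody)
      : m_nghost(nghost), m_nbody(nbody)
      , m_data(static_cast<std::size_t>(nghost + nbody), 0)
    {}
    std::ptrdiff_t nghost() const { return m_nghost; }
    std::ptrdiff_t nbody() const { return m_nbody; }
    std::ptrdiff_t nfull() const { return m_nghost + m_nbody; }
    // Storage offset of a logical index in [-nghost, nbody).
    std::size_t offset(std::ptrdiff_t index) const {
        assert(index >= -m_nghost && index < m_nbody);
        return static_cast<std::size_t>(index + m_nghost);
    }
    int & at(std::ptrdiff_t index) { return m_data[offset(index)]; }
private:
    std::ptrdiff_t m_nghost;
    std::ptrdiff_t m_nbody;
    std::vector<int> m_data;
};
```

With two ghost entries and three body entries, `at(-2)` addresses the first stored element and `at(0)` the third, while the whole table still lives in one contiguous buffer.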
  • 28. Organize arrays ❖ LookupTable is a class template providing static information for the dynamic array core ❖ Now we can put together a class that keeps track of all data for computation template< size_t NDIM > class UnstructuredBlock { private: // geometry arrays. LookupTable<real_type, NDIM> m_ndcrd; LookupTable<real_type, NDIM> m_fccnd; LookupTable<real_type, NDIM> m_fcnml; LookupTable<real_type, 0> m_fcara; LookupTable<real_type, NDIM> m_clcnd; LookupTable<real_type, 0> m_clvol; // meta arrays. LookupTable<shape_type, 0> m_fctpn; LookupTable<shape_type, 0> m_cltpn; LookupTable<index_type, 0> m_clgrp; // connectivity arrays. LookupTable<index_type, FCMND+1> m_fcnds; LookupTable<index_type, FCNCL > m_fccls; LookupTable<index_type, CLMND+1> m_clnds; LookupTable<index_type, CLMFC+1> m_clfcs; // boundary information. LookupTable<index_type, 2> m_bndfcs; std::vector<BoundaryData> m_bndvec; }; (This case is for unstructured meshes of mixed elements in 2-/3-dimensional Euclidean space)
  • 29. Fast and hideous ❖ In theory we can write beautiful and fast code in C++, and we should. ❖ In practice, as long as it’s fast, it’s not too hard to compromise on elegance. ❖ Testability is the bottom line. const index_type * pclfcs = reinterpret_cast<const index_type *>(clfcs().row(0)); prcells = reinterpret_cast<index_type *>(rcells.row(0)); for (icl=0; icl<ncell(); icl++) { for (ifl=1; ifl<=pclfcs[0]; ifl++) { ifl1 = ifl-1; ifc = pclfcs[ifl]; const index_type * pfccls = reinterpret_cast<const index_type *>(fccls().row(0)) + ifc*FCREL; if (ifc == -1) { // NOT A FACE!? SHOULDN'T HAPPEN. prcells[ifl1] = -1; continue; } else if (pfccls[0] == icl) { if (pfccls[2] != -1) { // has neighboring block. prcells[ifl1] = -1; } else { // is interior. prcells[ifl1] = pfccls[1]; }; } else if (pfccls[1] == icl) { // I am the neighboring cell. prcells[ifl1] = pfccls[0]; }; // count rcell number. if (prcells[ifl1] >= 0) { rcellno[icl] += 1; } else { prcells[ifl1] = -1; }; }; // advance pointers. pclfcs += CLMFC+1; prcells += CLMFC; }; (This looks like C since it really was C.)
  • 30. Final notes ❖ Avoid Python when you need speed; use it as a shell to your high-performance library from day one ❖ Resource management is at the core of the hybrid architecture; do it in C++ ❖ Use arrays (look-up tables) to keep large data ❖ Don’t access PyObject from your core ❖ Always keep in mind the differences between the type systems