SlideShare ist ein Scribd-Unternehmen logo
1 von 79
Downloaden Sie, um offline zu lesen
SPU gameplay
Joe Valenzuela
joe@insomniacgames.com
GDC 2009
glossary
• mobys
– class
– instances
• update classes
• AsyncMobyUpdate
– Guppys
– Async
• aggregateupdate
spu gameplay difficulties
• multiprocessor
• NUMA
• different ISA
• it’s different
– takes time and effort to retrofit code
– unfamiliarity with the necessary upfront design
your virtual functions don’t work
SPU
PPU
preupdate
update
draw
vtable
0x0128020
0x012C050
0x011F070
?
?
?
vtable
0x0128020
0x012C050
0x011F070
your pointers don’t work
4.0
struct foo_t
{
float m_t;
float m_scale;
u32 m_flags;
u16* m_points;
};
foo_t
m_t
m_scale
m_flags
m_points
1.0
0x80100013
0x4050c700
your code doesn’t compile
x:/core/code/users/jvalenzu/shared/igCore/igsys/igDebug.h(19,19): error:
libsn.h: No such file or directory
x:/core/code/users/jvalenzu/shared/igCore/igTime/igTimer.h(15,27): error:
sys/time_util.h: No such file or directory
pickup/pickupbase_preupdate_raw.inc(160): error: 'DEFAULT_FLAGS' is not a
member of 'COLL'
pickup/pickupbase_preupdate_raw.inc(160): error: 'EXCLUDE_HERO_ONLY' is
not a member of 'COLL'
x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293):
error: expected unqualified-id before '*' token
x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293):
error: expected ',' or '...' before '*' token
x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293):
error: ISO C++ forbids declaration of 'parameter' with no type
x:/core/code/users/jvalenzu/shared/igCore/igg/igShaderStructs.h: At global
scope:
x:/core/code/users/jvalenzu/shared/igCore/igg/igShaderStructs.h(22):
error: redefinition of 'struct VtxVec4'
x:/core/code/users/jvalenzu/shared/igCore/igsys/igTypes.h(118): error:
previous definition of 'struct VtxVec4'
object driven update
for(i = 0; i < num_entities; ++i) {
entity* e = &g_entity_base[i];
e->collect_info();
e->update();
e->move();
e->animate();
e->etc();
}
• can’t amortize setup costs
• can’t hide much deferred work
more modular update
for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e) {
e->collect_info();
e->issue_anim_request();
}
for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e)
e->update();
finalize_animation();
for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e)
e->postupdate();
1
2
aggregate updating
• group instances by type
– further sort each group to minimize state
change
• one aggregate updater per type, with
multiple code fragments
• combined ppu & spu update
• more opportunity to amortize cost of
expensive setup
aggregate example (pickup)
Pickup Instances
PickupBolt
PickupBolt
PickupHealth
PickupHealth
PickupHealth
pickupbolt_preupdate
pickupbolt_update
pickupheatlh_preupdate
pickuphealth_update
pickuphealth_postupdate
aggregate example cont…
a trivial optimization
void TruckUpdate::Update()
{
if(m_wait_frame > TIME::GetCurrentFrame())
{
return;
}
// … more work
}
a trivial optimization
void TruckUpdate::Update()
{
if(m_wait_frame > TIME::GetCurrentFrame())
{
return;
}
// … more work
}
a trivial optimization (cont)
void Aggregate_TruckUpdate_Update()
{
u32 current_frame = TIME::GetCurrentFrame();
for(u32 i = 0; i < m_count; ++i)
{
TruckUpdate* self = &m_updates[i];
if(self->m_wait_frame > current_frame)
{
continue;
}
// … more work
}
}
SPU gameplay systems
SPU gameplay intro
• systems built around applying shaders to lots of
homogenous data
– AsyncMobyUpdate
– Guppys
– AsyncEffect
• small, simple code overlays
– user-supplied
– compiled offline
– debuggable
– analogous to graphics shaders
async moby update overview
overview
• AsyncMobyUpdate
– base framework, meant to work with update
classes
– retains MobyInstance rendering pipeline
• Guppys
– “light” MobyInstance replacement
– 100% SPU update, no update class
– 90% MobyInstance rendering pipeline
• AsyncEffect
– very easy fire & forget SPU “effects”
– user-selectable, not user-written, shaders
async moby update
• designed to move update classes to SPU
• user supplied update routine in code
fragment
• multiple code fragments per update class
–one per AI state, for example
• user-defined instance data format
• user-defined common data
• extern code provided through function
pointer tables
async moby update (cont...)
InstancesUpdate Groups
0x0128020
0x012C050
0x011F070
jet_follow_path
0x0128020
0x012C050
0x011F070
jet_circle
jet_crash
• update group per code fragment
• instances bound to update group each frame
instance vs common
• instance data
– data transformed by your update routine
– e.g. a Jet, or zombie limb
• common data
– data common to all instances of the same
type
– e.g. class-static variables, current frame
function pointer table interface
struct global_funcs_t
{
void (*print)(const char *fmt, ...);
// …
f32 (*get_current_time)();
u32 (*read_decrementer)();
u32 (*coll_swept_sphere)(qword p0, qword p1, u32 flags);
void (*coll_get_result) (COLL::Result *dest, u32 id, u32 tag);
};
• debug, print functions
• access to common data, timestep
• collision & FX routines
simple API
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
allocate tag
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
register fragment
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
set common block info
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
set instances info
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
add instances per frame
u32 tag = AsyncMobyUpdate::AllocTag();
AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size);
// common setup
AsyncMobyUpdate::SetNumCommonBlocks(tag, 1);
AsyncMobyUpdate::SetCommonBlock (tag, 0,
&g_TruckCommon, sizeof g_TruckCommon);
// instance setup
AsyncMobyUpdate::SetNumInstanceStreams(tag, 1);
AsyncMobyUpdate::SetOffset (tag, 0, 0);
AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass));
setup:
use:
AsyncMobyUpdate::AddInstances (tag, instance_block, count);
our gameplay shaders
• 32k relocatable programs
• makefile driven process combines code,
data into fragment
• instance types
– user defined (Async Moby Update)
– predefined (Guppys, AsyncEffect)
more shader talk
• What do our code fragments do?
– dma up instances
– transform instance state, position
– maybe set some global state
– dma down instances
• typical gameplay stuff
– preupdate, update, postupdate
more about instance data
• what is our instance data?
– not an object
– generally, a subset of an update class
– different visibility across PU/SPU
• where does instance data live?
– could be copied into a separate array
– could read directly from the update classes
– we support and use both forms
packed instance array
• advantages
–simplicity
–lifetime guarantees
–compression
• disadvantages
–explicit
fragmentation
–pack each frame
pack instance
SPUPPU
update instances
pack instance
pack instance
unpack instance
unpack instance
unpack instance
data inside update class
• advantages
–pay memory cost as you go
–don’t need to know about every detail of
an update class
• disadvantages
–no longer control “lifetime” of objects
• specify interesting data with stride/offset
instance prefetch problem
Instance pipe[2];
dma_get(&pipe[0], ea_base, sizeof(Instance), tag);
for(int i = 0; i < num_instances; ++i) {
Instance* cur_inst = &pipe[i&1];
Instance* next_inst = &pipe[(i+1)&1];
dma_sync(tag);
dma_get(next_inst, ea_base + (i+1) * sizeof(Instance), tag);
// ... do work
}
• ea_base = starting address of our instances
• num_instances = number of instances
instance prefetch problem (cont)
Instance pipe[2];
dma_get(&pipe[0], ea_base, sizeof(Instance), tag);
for(int i = 0; i < num_instances; ++i) {
Instance* cur_inst = &pipe[i&1];
Instance* next_inst = &pipe[(i+1)&1];
dma_sync(tag);
dma_get(next_inst, ea_base + (i+1) * sizeof(Instance), tag);
MobyInstance cur_moby;
dma_get(&cur_moby, cur_inst->m_moby_ea, sizeof(MobyInstance), tag);
dma_sync(tag);
// do work
}
• ... we almost always need to fetch an associated data
member out of our instances immediately
instance “streams”
• instances are available as “streams”
–each has its own base, count, offset,
stride, and addressing mode
• allows one to prefetch multiple associated
elements without stalling
• also useful for getting slices of interesting
data out of an untyped blob
TruckUpdate (PPU) instance
TruckInfo
(interesting data)
offset
stride
memory addressing
• direct
–contiguous block of instances
• direct indexed
–indices used to deference an array of
instances
• indirect indexed
–indices used to source an array of
instance pointers
More Memory Addressing
• common blocks are preloaded
• shaders must DMA up their own instances
• meta-information is preloaded
–indices
–EA base pointer (direct)
–EA pointer array (indirect)
• buffering logic is very context sensitive
indirect indexed example
struct DroneInfo
{
u32 m_stuff;
};
class DroneUpdate : public AI::Component
{
// …
DroneInfo m_info;
MobyInstance* m_moby;
};
indirect indexed example (cont)…
// once
AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2);
AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof DroneUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(DroneUpdate,
m_info));
AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof MobyInstance);
AsyncMobyUpdate::SetOffset(amu_tag, 1, 0);
// per frame
AsyncMobyUpdate::BeginAddInstances(amu_tag);
AsyncMobyUpdate::SetStreamIndirect(0,
/* base */ m_updates,
/* indices */ m_truck_update_indices,
/* count */ m_num_trucks,
/* max_index */ m_num_trucks);
AsyncMobyUpdate::SetStream(1, IGG::g_MobyInsts.m_array,
m_moby_indices, m_num_trucks);
AsyncMobyUpdate::EndAddInstance ();
indirect indexed example (cont)…
// once
AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2);
AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof DroneUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(DroneUpdate,
m_info));
AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof MobyInstance);
AsyncMobyUpdate::SetOffset(amu_tag, 1, 0);
// per frame
AsyncMobyUpdate::BeginAddInstances(amu_tag);
AsyncMobyUpdate::SetStreamIndirect(0,
/* base */ m_updates,
/* indices */ m_truck_update_indices,
/* count */ m_num_trucks,
/* max_index */ m_num_trucks);
AsyncMobyUpdate::SetStream(1, IGG::g_MobyInsts.m_array,
m_moby_indices, m_num_trucks);
AsyncMobyUpdate::EndAddInstance ();
typical usage: two streams
void async_foo_update(global_funcs_t* gt, update_set_info_t* info,
common_blocks_t* common_blocks,
instance_streams_t* instance_streams,
u8* work_buffer, u32 buf_size, u32 tags[4]) {
u32* drone_array = instance_streams[0].m_ea_array;
u16* drone_indices = instance_streams[0].m_indices;
u32 drone_offset = instance_streams[0].m_offset;
u32* moby_array = instance_streams[1].m_ea_array;
u16* moby_indices = instance_streams[1].m_indices;
u32 moby_offset = instance_streams[1].m_offset;
DroneInfo inst;
MobyInstance moby;
for(int i = 0; i < instance_streams[0].m_count; ++i) {
gt->dma_get(&inst, drone_array[drone_indices[i]] + drone_offset,
sizeof inst, tags[0]);
gt->dma_get(&moby, moby_array[moby_indices[i]] + moby_offset,
sizeof moby, tags[0]);
gt->dma_wait(tags[0]);
// …
}
}
1
2
indirect indexed example #2
struct TruckInfo
{
u32 m_capacity;
u32 m_type_info;
f32 m_hp;
u32 m_flags;
};
class TruckUpdate : public AI::Component
{
protected:
…;
public:
u32 m_ai_state;
TruckInfo m_info __attribute__((aligned(16)));
};
indirect indexed example 2 (cont)…
// once
AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2);
AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof TruckUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(TruckUpdate, m_info));
AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof TruckUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 1, OFFSETOF(TruckUpdate, m_info));
// per frame
AsyncMobyUpdate::BeginAddInstances(amu_tag);
AsyncMobyUpdate::SetStreamIndirect(0,
/* base */ m_updates,
/* indices */ m_truck_update_indices,
/* count */ m_num_trucks,
/* max_index */ m_num_trucks);
AsyncMobyUpdate::SetStreamIndirect(1, m_updates,
m_truck_update_indices,
m_num_trucks, m_num_trucks);
AsyncMobyUpdate::EndAddInstance ();
indirect indexed example 2 (cont)…
// once
AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2);
AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof TruckUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(TruckUpdate, m_info));
AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof TruckUpdate);
AsyncMobyUpdate::SetOffset(amu_tag, 1, OFFSETOF(TruckUpdate, m_info));
// per frame
AsyncMobyUpdate::BeginAddInstances(amu_tag);
AsyncMobyUpdate::SetStreamIndirect(0,
/* base */ m_updates,
/* indices */ m_truck_update_indices,
/* count */ m_num_trucks,
/* max_index */ m_num_trucks);
AsyncMobyUpdate::SetStreamIndirect(1, m_updates,
m_truck_update_indices,
m_num_trucks, m_num_trucks);
AsyncMobyUpdate::EndAddInstance ();
dma up slice of update class
void async_foo_update(global_funcs_t* gt, update_set_info_t* info,
common_blocks_t* common_blocks,
instance_streams_t* instance_streams,
u8* work_buffer, u32 buf_size, u32 tags[4])
{
u32* earray = instance_streams[0].m_ea_array;
u16* indices = instance_streams[0].m_indices;
u32 offset = instance_streams[0].m_offset;
TruckInfo inst;
for(int i = 0; i < instance_streams[0].m_count; ++i)
{
gt->dma_get(&inst, earray[ indices[i] ] + offset,
sizeof inst, tags[0]);
gt->dma_wait(tags[0]);
// update
}
}
1
2
dma full update class, slice info out
u32* earray = instance_streams[0].m_ea_array;
u16* indices = instance_streams[0].m_indices;
u32 offset = instance_streams[0].m_offset;
u32 stride = instance_streams[0].m_stride;
u32* ai_earray = instance_streams[1].m_ea_array;
u16* ai_indices = instance_streams[1].m_indices;
u32 ai_offset = instance_streams[1].m_offset;
u8* blob = (u8*) alloc(instance_streams[1].m_stride);
for(int i = 0; i < instance_streams[0].m_count; ++i)
{
gt->dma_get(&blob, earray[ indices[i] ], stride, tags[0]);
gt->dma_wait(tags[0]);
TruckInfo *inst = (TruckInfo*) (blob + instance_streams[0].m_offset);
u32 *ai_state = (u32*) (blob + instance_streams[1].m_offset);
// …
}
1
3
2
extern "C" void
async_foo_update(global_funcs_t* gt,
update_set_info_t* info,
common_blocks_t* common_blocks,
instance_streams_t* instance_streams,
u8* work_buffer, u32 buf_size, u32 tags[4])
• global_funcs_t - global function pointer table
• update_set_info_t - meta info
• common blocks_t - stream array for common blocks
• instance_streams_t - stream array for instances
• work_buffer & buf_size - access to LS
• dma_tags - 4 preallocated dma tags
code fragment signature
guppys
• lightweight alternative to MobyInstance
• update runs entirely on SPU
• one 128byte instance type
• common data contained in “schools”
• simplified rendering
guppys
• to cleave a mesh, we previously
required an entire new
MobyInstance to cleave a mesh
– turn off arm mesh segment on
main instance
– turn off all other mesh segments
on spawned instance
• spawn a guppy now instead
• common use case:
“bangles”
the guppy instance
• position/orientation EA
• 1 word flags
• block of “misc” float/int union data
• animation joints EA
• joint remap table
• Insomniac “special sauce”
Async Effect
• simplified API for launching SPU-updating
effects
• no code fragment writing necessary
• specialized at initialization
– linear/angular movement
– rigid body physics
– rendering parameters
Async Effect API
u16 moby_iclass = LookupMobyClassIndex( ART_CLASS_BOT_HYBRID_2_ORGANS );
MobyClass* mclass = &IGG::g_MobyCon.m_classes[ moby_iclass ];
u32 slot = AsyncEffect::BeginEffectInstance(AsyncEffect::EFFECT_STATIONARY,
mclass);
AsyncEffect::SetEffectInstancePo (slot, moby->m_mat3, moby->m_pos);
AsyncEffect::SetEffectInstanceF32(slot, AsyncEffect::EFFECT_PARAM_LIFESPAN,
20.0f);
u32 name = AsyncEffect::Spawn(slot);
• stationary effect with 20 second life
• name can be used to kill the effect
SPU invoked code
• immediate
– via global function table
• deferred
– command buffer
• adhoc
– PPU shims
– direct data injection
different mechanisms
deferred
• PPU shims
– flags set in SPU update, querys/events
triggered subsequently on PPU
• command buffer
– small buffer in LS filled with command specific
byte-code, flushed to PPU
• atomic allocators
– relevant data structures packed on SPU,
atomically inserted
vec4
command buffer: swept sphere
CMD_SWEPT_SPHERE
vec4
point0
pad
IGNORE_TRIS
<job#><id>
u32
u32
u32
point1
command buffer: results
frame n
frame n+1
handle_base = ((stage << 5) | (job_number & 0x1f)) << 24;
// …
handle = handle_base + request++;
frame n, stage 0, job 1
frame n, stage 1, job 1
frame n, stage 2, job 1
frame n, stage 0, job 2
offset table
result
direct data
• patch into atomically allocated, double
buffered data structures
• instance allocates fresh instance each
frame, forwards state between old and
new
• deallocation == stop code fragment
• used for rigid body physics
direct data
SPU main memory
new_ea = rigid body #0
rigid body #0
rigid body #1
rigid body #2
unallocated
get(&ls_rigidbody, old_ea)
update ls_rigidbody
old_ea = new_ea
put(&ls_rigidbody, new_ea)
1
2
3
SPU API
• SPU-API
– mechanism to expose new functionality
through to the AsyncMobyUpdate system
– library of code fragment code and common
data (“fixup”)
– function pointer table (“interface”) oriented
– hides immediate or deferred commands
SPU API
struct rcf_api_interface
{
u32 (*derived_from)(u32 update_class, u32 base_class);
u32 (*add_bolts) (u32 update_class, u32 value);
};
rcf_api_interface* rcf_api = gt->get_spu_api(“rcf2”);
if(rcf_api->derived_from(inst->m_update_class, HERO_Ratchet_CLASS))
{
rci_api->add_bolts(inst->m_update_class, 25);
}
cpp
h
example #1
• Jets
– uses AsyncMobyUpdate along with an Update Class
– packed instance array
– code fragment per AI state
– little initialization setup, state changes rare
– events triggered using adhoc method
• flags checked at pack/unpack time
• inputs:
– position/orientation
– state info
• output:
– position/orientation
example #2
• zombie limbs
– guppy
– not much update logic
– direct data interface to rigid body physics
• input:
– position/orientation
– animation joints
– collision info
• output:
– packed rigid bodies
porting code
porting code sucks
• difficult to retrofit code
• different set of constraints
• expensive use of time
• can result in over-abstracted systems
– like, software cache for transparent pointers
couple of tips
struct foo_t : bar_t
{
// stuff
u32 m_first;
u32 m_second;
u32 m_third;
u32 m_fourth;
// more stuff
};
Separate interesting information from non-necessary data structures
struct foo_info_t
{
u32 m_first;
u32 m_second;
u32 m_third;
u32 m_fourth;
};
struct foo_t : bar_t
{
// stuff
foo_info_t m_info;
// more stuff
};
couple of tips (cont)…
struct baz_t
{
u32 m_foo;
#ifdef PPU
MobyInstance* m_moby;
#else
u32 m_moby_ea;
#endif
f32 m_scale;
};
avoid pointer fixup, define pointer types to unsigned ints
living with polymorphism
• the described mechanisms have problems
with virtual functions
– would need to port and patch up all possible
vtable destinations
– would end up duplicating/shadowing virtual
class hierarchy on SPU
• could work, but we don’t do that
compile-time polymorphism
• do a trace of the class hierarchy: one code
fragment per leaf class
• separate base functions into .inc files
• virtual functions selected through sequential
macro define/undef pairs
• not described:
– deferred resolution of base function calls to
derived function via preprocessor pass
living with polymorphism (cont)
#include "code_fragment_pickup.inl"
#include "code_fragment_pickup_bolt.inl"
pickupbolt_preupdate.cpp
void base_on_pickup(CommonInfo* common, InstanceInfo* inst) {
inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP;
}
#define ON_PICKUP(c,i) base_on_pickup(c, i)
code_fragment_pickup.inl
void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) {
common->m_active_bolt_delta--;
}
#undef ON_PICKUP
#define ON_PICKUP(c,i) bolt_on_pickup(c, i)
code_fragment_pickup_bolt.inl
living with polymorphism (cont)
#include "code_fragment_pickup.inl"
#include "code_fragment_pickup_bolt.inl"
pickupbolt_preupdate.cpp
void base_on_pickup(CommonInfo* common, InstanceInfo* inst) {
inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP;
}
#define ON_PICKUP(c,i) base_on_pickup(c, i)
code_fragment_pickup.inl
void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) {
common->m_active_bolt_delta--;
}
#undef ON_PICKUP
#define ON_PICKUP(c,i) bolt_on_pickup(c, i)
code_fragment_pickup_bolt.inl
living with polymorphism (cont)
#include "code_fragment_pickup.inl"
#include "code_fragment_pickup_bolt.inl"
pickupbolt_preupdate.cpp
void base_on_pickup(CommonInfo* common, InstanceInfo* inst) {
inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP;
}
#define ON_PICKUP(c,i) base_on_pickup(c, i)
code_fragment_pickup.inl
void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) {
common->m_active_bolt_delta--;
}
#undef ON_PICKUP
#define ON_PICKUP(c,i) bolt_on_pickup(c, i)
code_fragment_pickup_bolt.inl
design from scratch
• parameterized systems
– not specialized by code
• use atomic allocators as programming
interfaces
• no virtual functions in update phase
• separate perception, cognition, action
• plan to interleave ppu/spu
In conclusion
• design up front for deferred, SPU-friendly
systems.
• don’t worry too much about writing optimized
code, just make it difficult to write unoptimizable
code
• remember, this is supposed to be fun. SPU
programming is fun.
all pictures U.S. Fish & Wildlife Service except
The Whale Fishery - NOAA National Marine Fisheries Service
jurvetson@flickr “Swarm Intelligence” photo
Thanks
joe@insomniacgames.com

Weitere ähnliche Inhalte

Ähnlich wie SPU gameplay

Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source codePVS-Studio
 
Gdc09 Minigames
Gdc09 MinigamesGdc09 Minigames
Gdc09 MinigamesSusan Gold
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentOOO "Program Verification Systems"
 
2012 04-19 theory-of_operation
2012 04-19 theory-of_operation2012 04-19 theory-of_operation
2012 04-19 theory-of_operationbobwolff68
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
GDG-MLOps using Protobuf in Unity
GDG-MLOps using Protobuf in UnityGDG-MLOps using Protobuf in Unity
GDG-MLOps using Protobuf in UnityIvan Chiou
 
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...Andrey Karpov
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Priyanka Aash
 
Advanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien PouliotAdvanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien PouliotXamarin
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiTakuya ASADA
 
Scene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesScene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesBryan Duggan
 
Implementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresImplementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresGowtham Reddy
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzerDmitry Vyukov
 
Tiramisu をちょっと、味見してみました。
Tiramisu をちょっと、味見してみました。Tiramisu をちょっと、味見してみました。
Tiramisu をちょっと、味見してみました。Mr. Vengineer
 
Когда предрелизный не только софт
Когда предрелизный не только софтКогда предрелизный не только софт
Когда предрелизный не только софтCEE-SEC(R)
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory AnalysisMoabi.com
 

Ähnlich wie SPU gameplay (20)

Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Gdc09 Minigames
Gdc09 MinigamesGdc09 Minigames
Gdc09 Minigames
 
HPC Examples
HPC ExamplesHPC Examples
HPC Examples
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
 
2012 04-19 theory-of_operation
2012 04-19 theory-of_operation2012 04-19 theory-of_operation
2012 04-19 theory-of_operation
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
GDG-MLOps using Protobuf in Unity
GDG-MLOps using Protobuf in UnityGDG-MLOps using Protobuf in Unity
GDG-MLOps using Protobuf in Unity
 
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...
PVS-Studio and Continuous Integration: TeamCity. Analysis of the Open RollerC...
 
20131212
2013121220131212
20131212
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...
 
Advanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien PouliotAdvanced iOS Build Mechanics, Sebastien Pouliot
Advanced iOS Build Mechanics, Sebastien Pouliot
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
Scene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesScene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game Engines
 
Implementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresImplementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphores
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzer
 
Tiramisu をちょっと、味見してみました。
Tiramisu をちょっと、味見してみました。Tiramisu をちょっと、味見してみました。
Tiramisu をちょっと、味見してみました。
 
Когда предрелизный не только софт
Когда предрелизный не только софтКогда предрелизный не только софт
Когда предрелизный не только софт
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 

Mehr von Slide_N

SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...Slide_N
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfSlide_N
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive EntertainmentSlide_N
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3Slide_N
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingSlide_N
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfSlide_N
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupSlide_N
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiSlide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSlide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60 Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationSlide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignSlide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: TheorySlide_N
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMSlide_N
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionSlide_N
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerSlide_N
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesSlide_N
 

Mehr von Slide_N (20)

SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution Continues
 

Kürzlich hochgeladen

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

SPU gameplay

  • 2. glossary • mobys – class – instances • update classes • AsyncMobyUpdate – Guppys – Async • aggregateupdate
  • 3. spu gameplay difficulties • multiprocessor • NUMA • different ISA • it’s different – takes time and effort to retrofit code – unfamiliarity with the necessary upfront design
  • 4. your virtual functions don’t work SPU PPU preupdate update draw vtable 0x0128020 0x012C050 0x011F070 ? ? ? vtable 0x0128020 0x012C050 0x011F070
  • 5. your pointers don’t work 4.0 struct foo_t { float m_t; float m_scale; u32 m_flags; u16* m_points; }; foo_t m_t m_scale m_flags m_points 1.0 0x80100013 0x4050c700
  • 6. your code doesn’t compile x:/core/code/users/jvalenzu/shared/igCore/igsys/igDebug.h(19,19): error: libsn.h: No such file or directory x:/core/code/users/jvalenzu/shared/igCore/igTime/igTimer.h(15,27): error: sys/time_util.h: No such file or directory pickup/pickupbase_preupdate_raw.inc(160): error: 'DEFAULT_FLAGS' is not a member of 'COLL' pickup/pickupbase_preupdate_raw.inc(160): error: 'EXCLUDE_HERO_ONLY' is not a member of 'COLL' x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293): error: expected unqualified-id before '*' token x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293): error: expected ',' or '...' before '*' token x:/core/code/users/jvalenzu/shared/igCore/igPhysics/ppu/igPhysics.h(293): error: ISO C++ forbids declaration of 'parameter' with no type x:/core/code/users/jvalenzu/shared/igCore/igg/igShaderStructs.h: At global scope: x:/core/code/users/jvalenzu/shared/igCore/igg/igShaderStructs.h(22): error: redefinition of 'struct VtxVec4' x:/core/code/users/jvalenzu/shared/igCore/igsys/igTypes.h(118): error: previous definition of 'struct VtxVec4'
  • 7. object driven update for(i = 0; i < num_entities; ++i) { entity* e = &g_entity_base[i]; e->collect_info(); e->update(); e->move(); e->animate(); e->etc(); } • can’t amortize setup costs • can’t hide much deferred work
  • 8. more modular update for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e) { e->collect_info(); e->issue_anim_request(); } for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e) e->update(); finalize_animation(); for(i = 0, e = &g_entity_base[0]; i < num_ent; ++i, ++e) e->postupdate(); 1 2
  • 9. aggregate updating • group instances by type – further sort each group to minimize state change • one aggregate updater per type, with multiple code fragments • combined ppu & spu update • more opportunity to amortize cost of expensive setup
  • 12. a trivial optimization void TruckUpdate::Update() { if(m_wait_frame > TIME::GetCurrentFrame()) { return; } // … more work }
  • 13. a trivial optimization void TruckUpdate::Update() { if(m_wait_frame > TIME::GetCurrentFrame()) { return; } // … more work }
  • 14. a trivial optimization (cont) void Aggregate_TruckUpdate_Update() { u32 current_frame = TIME::GetCurrentFrame(); for(u32 i = 0; i < m_count; ++i) { TruckUpdate* self = &m_updates[i]; if(self->m_wait_frame > current_frame) { continue; } // … more work } }
  • 16. SPU gameplay intro • systems built around applying shaders to lots of homogenous data – AsyncMobyUpdate – Guppys – AsyncEffect • small, simple code overlays – user-supplied – compiled offline – debuggable – analogous to graphics shaders
  • 17. async moby update overview
  • 18. overview • AsyncMobyUpdate – base framework, meant to work with update classes – retains MobyInstance rendering pipeline • Guppys – “light” MobyInstance replacement – 100% SPU update, no update class – 90% MobyInstance rendering pipeline • AsyncEffect – very easy fire & forget SPU “effects” – user-selectable, not user-written, shaders
  • 19. async moby update • designed to move update classes to SPU • user supplied update routine in code fragment • multiple code fragments per update class –one per AI state, for example • user-defined instance data format • user-defined common data • extern code provided through function pointer tables
  • 20. async moby update (cont...) InstancesUpdate Groups 0x0128020 0x012C050 0x011F070 jet_follow_path 0x0128020 0x012C050 0x011F070 jet_circle jet_crash • update group per code fragment • instances bound to update group each frame
  • 21. instance vs common • instance data – data transformed by your update routine – e.g. a Jet, or zombie limb • common data – data common to all instances of the same type – e.g. class-static variables, current frame
  • 22. function pointer table interface struct global_funcs_t { void (*print)(const char *fmt, ...); // … f32 (*get_current_time)(); u32 (*read_decrementer)(); u32 (*coll_swept_sphere)(qword p0, qword p1, u32 flags); void (*coll_get_result) (COLL::Result *dest, u32 id, u32 tag); }; • debug, print functions • access to common data, timestep • collision & FX routines
  • 23. simple API u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 24. allocate tag u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 25. register fragment u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 26. set common block info u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 27. set instances info u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 28. add instances per frame u32 tag = AsyncMobyUpdate::AllocTag(); AsyncMobyUpdate::RegisterType (tag, truck_frag_start, truck_frag_size); // common setup AsyncMobyUpdate::SetNumCommonBlocks(tag, 1); AsyncMobyUpdate::SetCommonBlock (tag, 0, &g_TruckCommon, sizeof g_TruckCommon); // instance setup AsyncMobyUpdate::SetNumInstanceStreams(tag, 1); AsyncMobyUpdate::SetOffset (tag, 0, 0); AsyncMobyUpdate::SetStride (tag, 0, sizeof(TruckClass)); setup: use: AsyncMobyUpdate::AddInstances (tag, instance_block, count);
  • 29. our gameplay shaders • 32k relocatable programs • makefile driven process combines code, data into fragment • instance types – user defined (Async Moby Update) – predefined (Guppys, AsyncEffect)
  • 30. more shader talk • What do our code fragments do? – dma up instances – transform instance state, position – maybe set some global state – dma down instances • typical gameplay stuff – preupdate, update, postupdate
  • 31. more about instance data • what is our instance data? – not an object – generally, a subset of an update class – different visibility across PU/SPU • where does instance data live? – could be copied into a separate array – could read directly from the update classes – we support and use both forms
  • 32. packed instance array • advantages –simplicity –lifetime guarantees –compression • disadvantages –explicit fragmentation –pack each frame pack instance SPUPPU update instances pack instance pack instance unpack instance unpack instance unpack instance
  • 33. data inside update class • advantages –pay memory cost as you go –don’t need to know about every detail of an update class • disadvantages –no longer control “lifetime” of objects • specify interesting data with stride/offset
  • 34. instance prefetch problem Instance pipe[2]; dma_get(&pipe[0], ea_base, sizeof(Instance), tag); for(int i = 0; i < num_instances; ++i) { Instance* cur_inst = &pipe[i&1]; Instance* next_inst = &pipe[(i+1)&1]; dma_sync(tag); dma_get(next_inst, ea_base + (i+1) * sizeof(Instance), tag); // ... do work } • ea_base = starting address of our instances • num_instances = number of instances
  • 35. instance prefetch problem (cont) Instance pipe[2]; dma_get(&pipe[0], ea_base, sizeof(Instance), tag); for(int i = 0; i < num_instances; ++i) { Instance* cur_inst = &pipe[i&1]; Instance* next_inst = &pipe[(i+1)&1]; dma_sync(tag); dma_get(next_inst, ea_base + (i+1) * sizeof(Instance), tag); MobyInstance cur_moby; dma_get(&cur_moby, cur_inst->m_moby_ea, sizeof(MobyInstance), tag); dma_sync(tag); // do work } • ... we almost always need to fetch an associated data member out of our instances immediately
  • 36. instance “streams” • instances are available as “streams” –each has its own base, count, offset, stride, and addressing mode • allows one to prefetch multiple associated elements without stalling • also useful for getting slices of interesting data out of an untyped blob
  • 40. memory addressing • direct –contiguous block of instances • direct indexed –indices used to deference an array of instances • indirect indexed –indices used to source an array of instance pointers
  • 41. More Memory Addressing • common blocks are preloaded • shaders must DMA up their own instances • meta-information is preloaded –indices –EA base pointer (direct) –EA pointer array (indirect) • buffering logic is very context sensitive
  • 42. indirect indexed example struct DroneInfo { u32 m_stuff; }; class DroneUpdate : public AI::Component { // … DroneInfo m_info; MobyInstance* m_moby; };
  • 43. indirect indexed example (cont)… // once AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2); AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof DroneUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(DroneUpdate, m_info)); AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof MobyInstance); AsyncMobyUpdate::SetOffset(amu_tag, 1, 0); // per frame AsyncMobyUpdate::BeginAddInstances(amu_tag); AsyncMobyUpdate::SetStreamIndirect(0, /* base */ m_updates, /* indices */ m_truck_update_indices, /* count */ m_num_trucks, /* max_index */ m_num_trucks); AsyncMobyUpdate::SetStream(1, IGG::g_MobyInsts.m_array, m_moby_indices, m_num_trucks); AsyncMobyUpdate::EndAddInstance ();
  • 44. indirect indexed example (cont)… // once AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2); AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof DroneUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(DroneUpdate, m_info)); AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof MobyInstance); AsyncMobyUpdate::SetOffset(amu_tag, 1, 0); // per frame AsyncMobyUpdate::BeginAddInstances(amu_tag); AsyncMobyUpdate::SetStreamIndirect(0, /* base */ m_updates, /* indices */ m_truck_update_indices, /* count */ m_num_trucks, /* max_index */ m_num_trucks); AsyncMobyUpdate::SetStream(1, IGG::g_MobyInsts.m_array, m_moby_indices, m_num_trucks); AsyncMobyUpdate::EndAddInstance ();
  • 45. typical usage: two streams void async_foo_update(global_funcs_t* gt, update_set_info_t* info, common_blocks_t* common_blocks, instance_streams_t* instance_streams, u8* work_buffer, u32 buf_size, u32 tags[4]) { u32* drone_array = instance_streams[0].m_ea_array; u16* drone_indices = instance_streams[0].m_indices; u32 drone_offset = instance_streams[0].m_offset; u32* moby_array = instance_streams[1].m_ea_array; u16* moby_indices = instance_streams[1].m_indices; u32 moby_offset = instance_streams[1].m_offset; DroneInfo inst; MobyInstance moby; for(int i = 0; i < instance_streams[0].m_count; ++i) { gt->dma_get(&inst, drone_array[drone_indices[i]] + drone_offset, sizeof inst, tags[0]); gt->dma_get(&moby, moby_array[moby_indices[i]] + moby_offset, sizeof moby, tags[0]); gt->dma_wait(tags[0]); // … } } 1 2
  • 46. indirect indexed example #2 struct TruckInfo { u32 m_capacity; u32 m_type_info; f32 m_hp; u32 m_flags; }; class TruckUpdate : public AI::Component { protected: …; public: u32 m_ai_state; TruckInfo m_info __attribute__((aligned(16))); };
  • 47. indirect indexed example 2 (cont)… // once AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2); AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof TruckUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(TruckUpdate, m_info)); AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof TruckUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 1, OFFSETOF(TruckUpdate, m_info)); // per frame AsyncMobyUpdate::BeginAddInstances(amu_tag); AsyncMobyUpdate::SetStreamIndirect(0, /* base */ m_updates, /* indices */ m_truck_update_indices, /* count */ m_num_trucks, /* max_index */ m_num_trucks); AsyncMobyUpdate::SetStreamIndirect(1, m_updates, m_truck_update_indices, m_num_trucks, m_num_trucks); AsyncMobyUpdate::EndAddInstance ();
  • 48. indirect indexed example 2 (cont)… // once AsyncMobyUpdate::SetNumInstanceStreams(amu_tag, 2); AsyncMobyUpdate::SetStride(amu_tag, 0, sizeof TruckUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 0, OFFSETOF(TruckUpdate, m_info)); AsyncMobyUpdate::SetStride(amu_tag, 1, sizeof TruckUpdate); AsyncMobyUpdate::SetOffset(amu_tag, 1, OFFSETOF(TruckUpdate, m_info)); // per frame AsyncMobyUpdate::BeginAddInstances(amu_tag); AsyncMobyUpdate::SetStreamIndirect(0, /* base */ m_updates, /* indices */ m_truck_update_indices, /* count */ m_num_trucks, /* max_index */ m_num_trucks); AsyncMobyUpdate::SetStreamIndirect(1, m_updates, m_truck_update_indices, m_num_trucks, m_num_trucks); AsyncMobyUpdate::EndAddInstance ();
  • 49. dma up slice of update class void async_foo_update(global_funcs_t* gt, update_set_info_t* info, common_blocks_t* common_blocks, instance_streams_t* instance_streams, u8* work_buffer, u32 buf_size, u32 tags[4]) { u32* earray = instance_streams[0].m_ea_array; u16* indices = instance_streams[0].m_indices; u32 offset = instance_streams[0].m_offset; TruckInfo inst; for(int i = 0; i < instance_streams[0].m_count; ++i) { gt->dma_get(&inst, earray[ indices[i] ] + offset, sizeof inst, tags[0]); gt->dma_wait(tags[0]); // update } } 1 2
  • 50. dma full update class, slice info out u32* earray = instance_streams[0].m_ea_array; u16* indices = instance_streams[0].m_indices; u32 offset = instance_streams[0].m_offset; u32 stride = instance_streams[0].m_stride; u32* ai_earray = instance_streams[1].m_ea_array; u16* ai_indices = instance_streams[1].m_indices; u32 ai_offset = instance_streams[1].m_offset; u8* blob = (u8*) alloc(instance_streams[1].m_stride); for(int i = 0; i < instance_streams[0].m_count; ++i) { gt->dma_get(&blob, earray[ indices[i] ], stride, tags[0]); gt->dma_wait(tags[0]); TruckInfo *inst = (TruckInfo*) (blob + instance_streams[0].m_offset); u32 *ai_state = (u32*) (blob + instance_streams[1].m_offset); // … } 1 3 2
  • 51. extern "C" void async_foo_update(global_funcs_t* gt, update_set_info_t* info, common_blocks_t* common_blocks, instance_streams_t* instance_streams, u8* work_buffer, u32 buf_size, u32 tags[4]) • global_funcs_t - global function pointer table • update_set_info_t - meta info • common blocks_t - stream array for common blocks • instance_streams_t - stream array for instances • work_buffer & buf_size - access to LS • dma_tags - 4 preallocated dma tags code fragment signature
  • 52. guppys • lightweight alternative to MobyInstance • update runs entirely on SPU • one 128byte instance type • common data contained in “schools” • simplified rendering
  • 53. guppys • to cleave a mesh, we previously required an entire new MobyInstance to cleave a mesh – turn off arm mesh segment on main instance – turn off all other mesh segments on spawned instance • spawn a guppy now instead • common use case: “bangles”
  • 54. the guppy instance • position/orientation EA • 1 word flags • block of “misc” float/int union data • animation joints EA • joint remap table • Insomniac “special sauce”
  • 55. Async Effect • simplified API for launching SPU-updating effects • no code fragment writing necessary • specialized at initialization – linear/angular movement – rigid body physics – rendering parameters
  • 56. Async Effect API u16 moby_iclass = LookupMobyClassIndex( ART_CLASS_BOT_HYBRID_2_ORGANS ); MobyClass* mclass = &IGG::g_MobyCon.m_classes[ moby_iclass ]; u32 slot = AsyncEffect::BeginEffectInstance(AsyncEffect::EFFECT_STATIONARY, mclass); AsyncEffect::SetEffectInstancePo (slot, moby->m_mat3, moby->m_pos); AsyncEffect::SetEffectInstanceF32(slot, AsyncEffect::EFFECT_PARAM_LIFESPAN, 20.0f); u32 name = AsyncEffect::Spawn(slot); • stationary effect with 20 second life • name can be used to kill the effect
  • 58. • immediate – via global function table • deferred – command buffer • adhoc – PPU shims – direct data injection different mechanisms
  • 59. deferred • PPU shims – flags set in SPU update, querys/events triggered subsequently on PPU • command buffer – small buffer in LS filled with command specific byte-code, flushed to PPU • atomic allocators – relevant data structures packed on SPU, atomically inserted
  • 60. vec4 command buffer: swept sphere CMD_SWEPT_SPHERE vec4 point0 pad IGNORE_TRIS <job#><id> u32 u32 u32 point1
  • 61. command buffer: results frame n frame n+1 handle_base = ((stage << 5) | (job_number & 0x1f)) << 24; // … handle = handle_base + request++; frame n, stage 0, job 1 frame n, stage 1, job 1 frame n, stage 2, job 1 frame n, stage 0, job 2 offset table result
  • 62. direct data • patch into atomically allocated, double buffered data structures • instance allocates fresh instance each frame, forwards state between old and new • deallocation == stop code fragment • used for rigid body physics
  • 63. direct data SPU main memory new_ea = rigid body #0 rigid body #0 rigid body #1 rigid body #2 unallocated get(&ls_rigidbody, old_ea) update ls_rigidbody old_ea = new_ea put(&ls_rigidbody, new_ea) 1 2 3
  • 64. SPU API • SPU-API – mechanism to expose new functionality through to the AsyncMobyUpdate system – library of code fragment code and common data (“fixup”) – function pointer table (“interface”) oriented – hides immediate or deferred commands
  • 65. SPU API struct rcf_api_interface { u32 (*derived_from)(u32 update_class, u32 base_class); u32 (*add_bolts) (u32 update_class, u32 value); }; rcf_api_interface* rcf_api = gt->get_spu_api(“rcf2”); if(rcf_api->derived_from(inst->m_update_class, HERO_Ratchet_CLASS)) { rci_api->add_bolts(inst->m_update_class, 25); } cpp h
  • 66. example #1 • Jets – uses AsyncMobyUpdate along with an Update Class – packed instance array – code fragment per AI state – little initialization setup, state changes rare – events triggered using adhoc method • flags checked at pack/unpack time • inputs: – position/orientation – state info • output: – position/orientation
  • 67. example #2 • zombie limbs – guppy – not much update logic – direct data interface to rigid body physics • input: – position/orientation – animation joints – collision info • output: – packed rigid bodies
  • 69. porting code sucks • difficult to retrofit code • different set of constraints • expensive use of time • can result in over-abstracted systems – like, software cache for transparent pointers
  • 70. couple of tips struct foo_t : bar_t { // stuff u32 m_first; u32 m_second; u32 m_third; u32 m_fourth; // more stuff }; Separate interesting information from non-necessary data structures struct foo_info_t { u32 m_first; u32 m_second; u32 m_third; u32 m_fourth; }; struct foo_t : bar_t { // stuff foo_info_t m_info; // more stuff };
  • 71. couple of tips (cont)… struct baz_t { u32 m_foo; #ifdef PPU MobyInstance* m_moby; #else u32 m_moby_ea; #endif f32 m_scale; }; avoid pointer fixup, define pointer types to unsigned ints
  • 72. living with polymorphism • the described mechanisms have problems with virtual functions – would need to port and patch up all possible vtable destinations – would end up duplicating/shadowing virtual class hierarchy on SPU • could work, but we don’t do that
  • 73. compile-time polymorphism • do a trace of the class hierarchy: one code fragment per leaf class • separate base functions into .inc files • virtual functions selected through sequential macro define/undef pairs • not described: – deferred resolution of base function calls to derived function via preprocessor pass
  • 74. living with polymorphism (cont) #include "code_fragment_pickup.inl" #include "code_fragment_pickup_bolt.inl" pickupbolt_preupdate.cpp void base_on_pickup(CommonInfo* common, InstanceInfo* inst) { inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP; } #define ON_PICKUP(c,i) base_on_pickup(c, i) code_fragment_pickup.inl void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) { common->m_active_bolt_delta--; } #undef ON_PICKUP #define ON_PICKUP(c,i) bolt_on_pickup(c, i) code_fragment_pickup_bolt.inl
  • 75. living with polymorphism (cont) #include "code_fragment_pickup.inl" #include "code_fragment_pickup_bolt.inl" pickupbolt_preupdate.cpp void base_on_pickup(CommonInfo* common, InstanceInfo* inst) { inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP; } #define ON_PICKUP(c,i) base_on_pickup(c, i) code_fragment_pickup.inl void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) { common->m_active_bolt_delta--; } #undef ON_PICKUP #define ON_PICKUP(c,i) bolt_on_pickup(c, i) code_fragment_pickup_bolt.inl
  • 76. living with polymorphism (cont) #include "code_fragment_pickup.inl" #include "code_fragment_pickup_bolt.inl" pickupbolt_preupdate.cpp void base_on_pickup(CommonInfo* common, InstanceInfo* inst) { inst->m_base_info->m_spu_flags |= PICKUP_SPU_FLAGS_PICKED_UP; } #define ON_PICKUP(c,i) base_on_pickup(c, i) code_fragment_pickup.inl void bolt_on_pickup(CommonInfo* common, InstanceInfo* inst) { common->m_active_bolt_delta--; } #undef ON_PICKUP #define ON_PICKUP(c,i) bolt_on_pickup(c, i) code_fragment_pickup_bolt.inl
  • 77. design from scratch • parameterized systems – not specialized by code • use atomic allocators as programming interfaces • no virtual functions in update phase • separate perception, cognition, action • plan to interleave ppu/spu
  • 78. In conclusion • design up front for deferred, SPU-friendly systems. • don’t worry too much about writing optimized code, just make it difficult to write unoptimizable code • remember, this is supposed to be fun. SPU programming is fun.
  • 79. all pictures U.S. Fish & Wildlife Service except The Whale Fishery - NOAA National Marine Fisheries Service jurvetson@flickr “Swarm Intelligence” photo Thanks joe@insomniacgames.com