Synchronization for Collectives #3315

Answered by elstehle
osayamenja asked this question in CUB

Rather than synchronizing to reuse the shared temp storage, I allocate 64 temp storage objects and use one per inclusive scan. Do I still need to synchronize before invoking each successive scan?

You do not need to __syncthreads() if the shared memory allocations passed to consecutive Block* algorithm invocations do not overlap. Keep in mind that excessive shared memory allocations may reduce occupancy, though. In most cases reducing occupancy will result in worse performance than invoking a __syncthreads().
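
To make the two options concrete, here is a minimal sketch, not code from the original discussion. It assumes a single block of 128 threads and two scans rather than 64; the kernel names `scan_separate` and `scan_reused` and the host helper `run_scans_demo` are hypothetical names chosen for illustration. The first kernel gives each scan its own `TempStorage`, so no barrier is placed between them. The second reuses one `TempStorage` and therefore separates the scans with `__syncthreads()`.

```cuda
#include <cub/block/block_scan.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;  // assumed block size for this sketch
using BlockScan = cub::BlockScan<int, kThreads>;

// Option 1: one TempStorage per scan. The two allocations do not overlap,
// so no __syncthreads() is placed between the consecutive scans.
// This costs twice the shared memory of option 2.
__global__ void scan_separate(const int* in, int* out_a, int* out_b) {
    __shared__ BlockScan::TempStorage temp_a;
    __shared__ BlockScan::TempStorage temp_b;
    int x = in[threadIdx.x];
    int a, b;
    BlockScan(temp_a).InclusiveSum(x, a);
    BlockScan(temp_b).InclusiveSum(2 * x, b);
    out_a[threadIdx.x] = a;
    out_b[threadIdx.x] = b;
}

// Option 2: a single TempStorage reused by both scans. A barrier is
// required before the second scan touches the same shared memory.
__global__ void scan_reused(const int* in, int* out_a, int* out_b) {
    __shared__ BlockScan::TempStorage temp;
    int x = in[threadIdx.x];
    int a, b;
    BlockScan(temp).InclusiveSum(x, a);
    __syncthreads();  // all threads must be done with temp before reuse
    BlockScan(temp).InclusiveSum(2 * x, b);
    out_a[threadIdx.x] = a;
    out_b[threadIdx.x] = b;
}

// Hypothetical host helper: runs both kernels on an all-ones input and
// checks that both produce the same inclusive prefix sums.
bool run_scans_demo() {
    const size_t bytes = kThreads * sizeof(int);
    std::vector<int> h_in(kThreads, 1), h_a(kThreads), h_b(kThreads);
    int *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    bool ok = true;
    for (int variant = 0; variant < 2; ++variant) {
        if (variant == 0) {
            scan_separate<<<1, kThreads>>>(d_in, d_a, d_b);
        } else {
            scan_reused<<<1, kThreads>>>(d_in, d_a, d_b);
        }
        // cudaMemcpy on the default stream waits for the kernel to finish.
        cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
        ok = ok && (cudaGetLastError() == cudaSuccess);
        for (int i = 0; i < kThreads; ++i) {
            ok = ok && (h_a[i] == i + 1) && (h_b[i] == 2 * (i + 1));
        }
    }

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Calling `run_scans_demo()` from a `main` function exercises both kernels. With 64 scans instead of two, option 1 would need 64 `TempStorage` objects in shared memory, which is where the occupancy concern above comes in.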

What if one thread does some intermediate work before invoking the collective?

It should not be a problem as long as the data that a thread passes to a Block* algorithm is avail…

Answer selected by osayamenja