CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Compacted thread map in which the 4D region is contiguous.
#include <output_tile_thread_map.h>
Public Types
| using | Shape = Shape_ |
| using | Iterations = OutputTileShape< Detail::RowArrangement::kIterationsColumn, Detail::RowArrangement::kIterationsRow, Detail::kIterationsGroup, Detail::kIterationsCluster, 1 > |
| using | Delta = OutputTileShape< Detail::RowArrangement::kDeltaColumn, Detail::RowArrangement::kDeltaRow, Detail::kCompactedDeltaGroup, Detail::kCompactedDeltaCluster, 1 > |
Static Public Member Functions
| static CUTLASS_HOST_DEVICE MatrixCoord | initial_offset (int thread_idx) |
| Function to compute each thread's initial offset. | |
Static Public Attributes
| static int const | kElementsPerAccess = ElementsPerAccess |
| Number of elements within each vector access. | |
| static int const | kThreads = Threads |
| Number of threads. | |
| using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::CompactedThreadMap::Delta = OutputTileShape< Detail::RowArrangement::kDeltaColumn, Detail::RowArrangement::kDeltaRow, Detail::kCompactedDeltaGroup, Detail::kCompactedDeltaCluster, 1> |
| using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::CompactedThreadMap::Iterations = OutputTileShape< Detail::RowArrangement::kIterationsColumn, Detail::RowArrangement::kIterationsRow, Detail::kIterationsGroup, Detail::kIterationsCluster, 1> |
| using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::CompactedThreadMap::Shape = Shape_ |
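The members above describe a common pattern: each thread starts at `initial_offset(thread_idx)` and then advances by `Delta` for `Iterations` steps, touching `kElementsPerAccess` contiguous elements per access. The following is a minimal standalone sketch of that pattern reduced to two dimensions. It is not the CUTLASS implementation: `Coord`, `ToyThreadMap`, `kRowsPerPass`, and `covers_tile_exactly_once` are hypothetical names introduced for illustration, and the row-major thread arrangement is an assumption.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for cutlass::MatrixCoord: a (row, column) pair.
struct Coord {
  int row;
  int column;
};

// Toy 2D thread map (illustrative only, NOT the CUTLASS implementation).
// kThreads threads cover a kRows x kColumns tile; each access touches
// kElementsPerAccess contiguous columns.
template <int kRows, int kColumns, int kThreads, int kElementsPerAccess>
struct ToyThreadMap {
  static_assert(kColumns % kElementsPerAccess == 0,
                "row width must be a multiple of the vector width");

  // Number of vector accesses needed to span one row.
  static constexpr int kAccessesPerRow = kColumns / kElementsPerAccess;
  static_assert(kThreads % kAccessesPerRow == 0,
                "threads must fill whole rows");

  // Rows covered by all threads in a single pass.
  static constexpr int kRowsPerPass = kThreads / kAccessesPerRow;
  static_assert(kRows % kRowsPerPass == 0,
                "tile height must be a multiple of rows per pass");

  // Row-direction analogues of the Iterations and Delta typedefs.
  static constexpr int kIterationsRow = kRows / kRowsPerPass;
  static constexpr int kDeltaRow = kRowsPerPass;

  // Assumed arrangement: consecutive threads sit side by side along a row.
  static Coord initial_offset(int thread_idx) {
    assert(thread_idx >= 0 && thread_idx < kThreads);
    return Coord{thread_idx / kAccessesPerRow,
                 (thread_idx % kAccessesPerRow) * kElementsPerAccess};
  }

  // Walks every thread through all of its iterations and checks that each
  // element of the tile is visited exactly once.
  static bool covers_tile_exactly_once() {
    std::array<int, kRows * kColumns> visits{};
    for (int t = 0; t < kThreads; ++t) {
      Coord start = initial_offset(t);
      for (int it = 0; it < kIterationsRow; ++it) {
        int row = start.row + it * kDeltaRow;
        for (int e = 0; e < kElementsPerAccess; ++e) {
          ++visits[row * kColumns + start.column + e];
        }
      }
    }
    for (int v : visits) {
      if (v != 1) return false;
    }
    return true;
  }
};
```

With `ToyThreadMap<8, 32, 16, 4>`, for example, eight accesses span a row, the 16 threads cover two rows per pass, and each thread performs four iterations with a row delta of two. The actual `CompactedThreadMap` extends the same idea to the column, row, group, and cluster dimensions named in its `Iterations` and `Delta` typedefs.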