CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers

Files | |
| file | default_gemv_core.h [code] |
| Defines basic properties needed by CTA-level batched GEMV assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma.h [code] |
| Template for a pipelined GEMM kernel. Does not support batching or split-K. | |
| file | default_mma_core.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma_core_simt.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma_core_sm50.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma_core_sm70.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma_core_sm75.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | default_mma_core_wmma.h [code] |
| Defines basic properties needed by CTA-level GEMMs assuming expectations about data layout of the global memory fragments, data types, and internal tile sizes. | |
| file | gemv.h [code] |
| Template for a threadblock-scoped GEMV kernel. | |
| file | mma_base.h [code] |
| Template for a double-buffered threadblock-scoped GEMM kernel. | |
| file | mma_pipelined.h [code] |
| Template for a double-buffered threadblock-scoped GEMM kernel. | |
| file | mma_singlestage.h [code] |
| Template for a double-buffered threadblock-scoped GEMM kernel. | |
| file | gemm/threadblock/threadblock_swizzle.h [code] |
| Implements several possible threadblock-swizzling functions mapping blockIdx to GEMM problems. | |
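
The idea of a threadblock swizzle, mapping a launched block index to the output tile it computes, can be sketched in plain host-side C++. This is an illustration only: `TileCoord`, `GridShape`, `identity_swizzle`, `grouped_swizzle`, and `coverage` are hypothetical names made up for this sketch and are not the functions defined in `threadblock_swizzle.h`. The grouped variant is one possible remapping, chosen here for simplicity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration; not the CUTLASS API.
struct TileCoord { int m; int n; };   // tile position in the output matrix
struct GridShape { int x; int y; };   // launch grid dimensions

// Ceiling division: number of tiles needed to cover an extent.
inline int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Identity mapping: block (x, y) computes tile (m = x, n = y).
inline GridShape identity_grid(int tiles_m, int tiles_n) {
    return {tiles_m, tiles_n};
}
inline TileCoord identity_swizzle(int bx, int by) {
    return {bx, by};
}

// Grouped mapping (an assumed example scheme): `group` consecutive values of
// blockIdx.x share one tile row and step through neighbouring tile columns,
// so blocks with adjacent indices touch nearby tiles.
inline GridShape grouped_grid(int tiles_m, int tiles_n, int group) {
    return {tiles_m * group, ceil_div(tiles_n, group)};
}
inline TileCoord grouped_swizzle(int bx, int by, int group) {
    return {bx / group, by * group + bx % group};
}

// Counts how many launched blocks map to each in-range tile. Blocks that map
// past the last tile column are skipped, as a kernel would exit early for them.
inline std::vector<int> coverage(int tiles_m, int tiles_n, int group) {
    GridShape g = grouped_grid(tiles_m, tiles_n, group);
    std::vector<int> hits(tiles_m * tiles_n, 0);
    for (int by = 0; by < g.y; ++by) {
        for (int bx = 0; bx < g.x; ++bx) {
            TileCoord t = grouped_swizzle(bx, by, group);
            if (t.m < tiles_m && t.n < tiles_n) {
                ++hits[t.m * tiles_n + t.n];
            }
        }
    }
    return hits;
}
```

In a kernel, `bx` and `by` would come from `blockIdx.x` and `blockIdx.y`. Whichever mapping is used, each output tile should be assigned to exactly one block, which is what `coverage` checks.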