Title | Applications, Tools and Techniques on the Road to Exascale Computing |
Publication Type | Book Chapter |
Year of Publication | 2012 |
Authors | Karwacki M., Stpiczyński P. |
Editor | De Bosschere K., D'Hollander E.H., Joubert G.R., Padua D., Peters F., Sawyer M. |
Book Title | Advances in Parallel Computing |
Volume | 22 |
Chapter | Improving performance of triangular Matrix-Vector BLAS routines on GPUs. |
Publisher | IOS Press |
ISBN Number | 978-1-61499-040-6 |
Abstract | CUBLAS is a widely used implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA CUDA Graphics Processing Units (GPUs). The aim of this paper is to show that the performance of selected Level 2 BLAS routines for triangular matrices can be improved with optimization techniques suited to GPUs, such as the use of shared memory and coalesced memory access. We present new implementations of the routines xTRMV and xTRSV. Experiments carried out on two GPU architectures, Tesla M2050 and GeForce GTX 260, show that these new implementations are up to 500% faster than the corresponding routines from the CUBLAS library. |
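
For orientation, the two routine families named in the abstract compute a triangular matrix-vector product (xTRMV, x := A·x) and solve a triangular system (xTRSV, A·x = b). The following is a minimal sequential sketch in C of what these operations compute. It is not the chapter's GPU implementation and does not show the shared-memory or coalescing techniques. It assumes a lower triangular matrix with a non-unit diagonal stored in column-major order, as in reference BLAS; the function names `trmv_lower` and `trsv_lower` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: x := A * x for a lower triangular n x n matrix A.
 * Assumes column-major storage with leading dimension lda and a
 * non-unit diagonal. The update is done in place. */
void trmv_lower(size_t n, const double *a, size_t lda, double *x)
{
    /* Process rows from the bottom up: row i only reads x[0..i],
     * which have not been overwritten yet. */
    for (size_t i = n; i-- > 0; ) {
        double sum = 0.0;
        for (size_t j = 0; j <= i; ++j)
            sum += a[i + j * lda] * x[j];
        x[i] = sum;
    }
}

/* Hypothetical helper: solve A * x = b by forward substitution for a
 * lower triangular A (same storage assumptions as above).
 * On entry x holds b; on exit it holds the solution. */
void trsv_lower(size_t n, const double *a, size_t lda, double *x)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = x[i];
        for (size_t j = 0; j < i; ++j)
            sum -= a[i + j * lda] * x[j];
        x[i] = sum / a[i + i * lda];  /* divide by the diagonal entry */
    }
}
```

Applying `trmv_lower` and then `trsv_lower` with the same matrix returns the original vector, which is a convenient sanity check when comparing an optimized implementation against a reference.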