Blocked all-pairs shortest paths algorithm on Intel Xeon Phi KNL processor : a case study

Bibliographic Details
Main Author: Rucci, Enzo
Other Authors or Contributors: De Giusti, Armando Eduardo; Naiouf, Ricardo Marcelo
Format: Book chapter
Language: Spanish
Subjects:
Online Access: View in the catalog
Abstract: Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.
Notes: PDF file format. -- This document is an intellectual production of the Facultad de Informática - UNLP (BIPA/Biblioteca Collection)
Physical Description: 1 file (1.4 MB)

MARC

LEADER 00000naa a2200000 a 4500
003 AR-LpUFIB
005 20250311170453.0
008 230201s2017 xx r 000 0 spa d
024 8 |a DIF-M7770  |b 7988  |z DIF007100 
040 |a AR-LpUFIB  |b spa  |c AR-LpUFIB 
100 1 |a Rucci, Enzo 
245 1 0 |a Blocked all-pairs shortest paths algorithm on Intel Xeon Phi KNL processor :  |b a case study 
300 |a 1 archivo (1,4 MB) 
500 |a Formato de archivo PDF. -- Este documento es producción intelectual de la Facultad de Informática - UNLP (Colección BIPA/Biblioteca) 
520 |a Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS. 
534 |a Congreso Argentino de Ciencias de la Computación (23ro : 2017 : La Plata, Argentina) 
650 4 |a PROCESADORES GRÁFICOS (GPUs) 
653 |a Xeon Phi 
700 1 |a De Giusti, Armando Eduardo 
700 1 |a Naiouf, Ricardo Marcelo 
942 |c CP 
952 |0 0  |1 0  |4 0  |6 A0926  |7 3  |8 BD  |9 82346  |a DIF  |b DIF  |d 2025-03-11  |l 0  |o A0926   |r 2025-03-11 17:04:53  |u http://catalogo.info.unlp.edu.ar/meran/getDocument.pl?id=1671  |w 2025-03-11  |y CP 
999 |c 56875  |d 56875