Conference paper, Year: 2024

Multi-level Analysis of GPU Utilization in ML Training Workloads

Paul Delestrac
Debjyoti Bhattacharjee
Simei Yang
Diksha Moolchandani
Francky Catthoor
David Novo

Abstract

Training time has become a critical bottleneck due to the recent proliferation of large-parameter ML models. GPUs continue to be the prevailing architecture for training ML models. However, the complex execution flow of ML frameworks makes it difficult to understand GPU computing resource utilization. Our main goal is to provide a better understanding of how efficiently ML training workloads use the computing resources of modern GPUs. To this end, we first describe an ideal reference execution of a GPU-accelerated ML training loop and identify relevant metrics that can be measured using existing profiling tools. Second, we produce a coherent integration of the traces obtained from each profiling tool. Third, we leverage the metrics within our integrated trace to analyze the impact of different software optimizations (e.g., mixed precision, various ML frameworks, and execution modes) on the throughput and the associated utilization at multiple levels of hardware abstraction (i.e., whole GPU, SM subpartitions, issue slots, and tensor cores). In our results on two modern GPUs, we present seven takeaways and show that although close to 100% utilization is generally achieved at the GPU level, average utilization of the issue slots and tensor cores always remains below 50% and 5.2%, respectively.
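To illustrate the kind of workload and framework-level trace the abstract refers to, here is a minimal sketch (not the authors' code): one mixed-precision PyTorch training step wrapped in torch.profiler, which produces a Chrome/Kineto trace that could then be correlated by timestamp with lower-level GPU counters (SM subpartition activity, issue-slot utilization, tensor-core usage) collected separately with a device profiler such as NVIDIA Nsight Compute. The model, batch shapes, and output file name are illustrative assumptions.

```python
# Sketch: profile one mixed-precision training step with torch.profiler.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()      # loss scaling for mixed precision
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(256, 1024, device=device)           # synthetic batch
targets = torch.randint(0, 10, (256,), device=device)    # synthetic labels

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():       # FP16 compute, eligible for tensor cores
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

# Framework-level view of kernel and operator time; device-level counters
# would come from a separate profiling pass and be merged with this trace.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("train_step_trace.json")
```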
Main file
Delestrac 2024 Multilevel.pdf (908.04 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04523554, version 1 (30-03-2024)

Identifiers

  • HAL Id: hal-04523554, version 1

Cite

Paul Delestrac, Debjyoti Bhattacharjee, Simei Yang, Diksha Moolchandani, Francky Catthoor, et al. Multi-level Analysis of GPU Utilization in ML Training Workloads. 2024 Design, Automation & Test in Europe Conference (DATE 2024), Mar 2024, Valencia, Spain. ⟨hal-04523554⟩
