DiMO-Sparse: Differentiable Modeling and Optimization of Sparse CNN Dataflow and Hardware Architecture
Published in DATE, 2024
Many real-world CNNs exhibit sparsity, a characteristic that has primarily been exploited in manual design processes and has received little attention in existing automatic optimization techniques. To the best of our knowledge, this paper presents the first systematic investigation of automatic dataflow and hardware optimization for sparse CNN computation. A differentiable PPA (Power Performance Area) model incorporating stochastic modeling of sparse CNN workloads is developed to enable fast nonlinear optimization and massively parallel local search-based discretization. Experimental results on public domain testcases demonstrate the efficacy of the proposed approach, achieving on average 5× and 10× better PPA than previous work for two different sparsity patterns.
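The abstract does not give the model's equations, so the following is only a minimal, hypothetical Python sketch of the general recipe it describes: take an expectation over random sparsity so that cost becomes a smooth function of a design variable, optimize a continuous relaxation of that variable with gradients, then discretize with local search. Everything here is an illustrative assumption, not the paper's PPA model: the single design variable is a tile size `t`, elements are nonzero independently with probability `density`, each tile pays a fixed `meta_cost` for metadata, and a tile's data is fetched only if it holds at least one nonzero. The function names `cost_per_elem`, `grad_cost_per_elem`, `relax`, and `local_search` are hypothetical, and one sequential hill-climb stands in for the massively parallel search.

```python
import math


def cost_per_elem(t: float, density: float, meta_cost: float) -> float:
    """Expected cost per tensor element for tile size t (hypothetical toy model).

    Assumes each element is nonzero independently with probability `density`.
    Every tile pays `meta_cost` for metadata; its data is fetched only if it
    contains at least one nonzero, which happens with prob 1 - (1 - density)^t.
    """
    return meta_cost / t + 1.0 - (1.0 - density) ** t


def grad_cost_per_elem(t: float, density: float, meta_cost: float) -> float:
    """Analytic d(cost_per_elem)/dt; defined because the expectation is smooth in t."""
    return -meta_cost / t**2 - (1.0 - density) ** t * math.log(1.0 - density)


def relax(density, meta_cost, t0=8.0, lr=20.0, steps=200, t_min=1.0, t_max=64.0):
    """Continuous stage: projected gradient descent on the relaxed tile size."""
    t = t0
    for _ in range(steps):
        t -= lr * grad_cost_per_elem(t, density, meta_cost)
        t = min(max(t, t_min), t_max)  # project back into the feasible range
    return t


def local_search(start, density, meta_cost, t_min=1, t_max=64):
    """Discrete stage: hill-climb over integer tile sizes from a rounded start."""
    t = min(max(round(start), t_min), t_max)
    while True:
        neighbors = [n for n in (t - 1, t + 1) if t_min <= n <= t_max]
        best = min(neighbors, key=lambda n: cost_per_elem(n, density, meta_cost))
        if cost_per_elem(best, density, meta_cost) >= cost_per_elem(t, density, meta_cost):
            return t  # no neighbor improves: local optimum
        t = best


if __name__ == "__main__":
    density, meta_cost = 0.05, 0.25  # illustrative values only
    t_cont = relax(density, meta_cost)
    t_int = local_search(t_cont, density, meta_cost)
    print(f"relaxed t = {t_cont:.3f}, discrete t = {t_int}")
```

With these illustrative values the relaxed optimum lies between 2 and 3, and the discrete search settles on a tile size of 2. Making the cost an expectation, rather than simulating specific sparse tensors, is what keeps it differentiable; the discrete stage is still needed because real tile sizes must be integers.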