ETTE: Efficient tensor-train-based computing engine for deep neural networks

Published in ISCA, 2023

Tensor-train (TT) decomposition enables ultra-high compression ratios, making deep neural network (DNN) accelerators based on this method very attractive. TIE, the state-of-the-art TT-based DNN accelerator, achieved high performance by leveraging a compact inference scheme to remove unnecessary computations and memory accesses. However, TIE incurs increased memory costs for stage-wise intermediate results and additional intra-layer data transfer, leading to limited speedups even when the models are highly compressed. To unleash the full potential of TT decomposition, this paper proposes ETTE, an algorithm and hardware co-optimization framework for an Efficient Tensor-Train Engine. At the algorithm level, ETTE proposes a new tensor core construction and computation ordering mechanism to simultaneously reduce stage-wise computation and storage costs. At the hardware level, ETTE proposes a lookahead-style across-stage processing scheme to eliminate unnecessary stage-wise data movement. By fully leveraging the decoupled input and output dimension factors, ETTE develops an efficient, low-cost, partition-free memory access scheme to support the desired matrix transformations. We demonstrate the effectiveness of ETTE by implementing a 16-PE hardware prototype in 28nm CMOS technology. Compared with a GPU on various workloads, ETTE achieves 6.5×–253.1× higher throughput and 189.2×–9750.5× higher energy efficiency. Compared with state-of-the-art DNN accelerators, ETTE brings 1.1×–58.3×, 2.6×–1170.4×, and 1.8×–2098.2× improvements in throughput, energy efficiency, and area efficiency, respectively.
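
To make the stage-wise computation concrete, the sketch below shows, in plain NumPy, how a fully-connected layer y = Wx can be evaluated when W is stored in TT format: the input and output dimensions are factored into modes, and each TT core is contracted in turn, producing the stage-wise intermediate tensors whose storage and movement the accelerators discussed above must manage. This is a generic illustration of TT-format inference, not ETTE's or TIE's actual dataflow; the function name tt_matvec and the toy shapes are hypothetical.

```python
import numpy as np

def tt_matvec(cores, x, in_modes, out_modes):
    """Compute y = W @ x where W is stored in tensor-train (TT) format.

    cores[k] has shape (r_k, m_k, n_k, r_{k+1}) with r_0 = r_d = 1;
    in_modes = (n_0, ..., n_{d-1}) factorizes the input dimension and
    out_modes = (m_0, ..., m_{d-1}) factorizes the output dimension.
    """
    d = len(cores)
    # Working tensor axes: (rank, unconsumed input modes..., produced output modes...)
    t = x.reshape((1,) + tuple(in_modes))                    # leading axis is r_0 = 1
    for k in range(d):
        # One "stage": contract the rank axis and the current input mode n_k
        # against core k, producing the next rank r_{k+1} and output mode m_k.
        # The intermediate tensor t is the stage-wise result whose size,
        # ordering, and movement a TT accelerator must manage.
        t = np.tensordot(cores[k], t, axes=([0, 2], [0, 1]))  # (m_k, r_{k+1}, rest...)
        t = np.moveaxis(t, 1, 0)                               # rank axis back to front
        t = np.moveaxis(t, 1, -1)                              # append m_k at the end
    return t.reshape(-1)                                       # length = prod(out_modes)

# Toy usage: a 64x64 layer factored as 4x4x4 -> 4x4x4 with TT-ranks (1, 3, 3, 1).
# Full W would hold 64*64 = 4096 weights; the three cores hold only 240.
in_modes, out_modes, ranks = (4, 4, 4), (4, 4, 4), (1, 3, 3, 1)
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
         for k in range(3)]
x = rng.standard_normal(int(np.prod(in_modes)))
y = tt_matvec(cores, x, in_modes, out_modes)
print(y.shape)  # (64,)
```

Each loop iteration above corresponds to one processing stage; the per-stage intermediates are what TIE buffers and moves between stages, and what ETTE's core construction, computation ordering, and lookahead-style across-stage processing aim to shrink and keep on the fly.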