Researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT-IBM Watson AI Lab recently detailed Hardware-Aware Transformers (HAT), an AI model training technique that builds on Google’s Transformer architecture. They claim that HAT can achieve a 3-times inference speedup on devices like the Raspberry Pi 4 while reducing model size by 3.7 times compared with a baseline.
Google’s Transformer is widely used in natural language processing tasks because of its cutting-edge performance. Nevertheless, Transformers remain challenging to deploy on edge devices because of their computational cost; on a Raspberry Pi, translating a sentence of only 30 words requires 13 gigaflops (13 billion floating-point operations) and takes 20 seconds. This limits the architecture’s usefulness for developers and companies.
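To put those figures in perspective, a quick back-of-envelope calculation (a sketch, using only the numbers quoted above) shows the effective floating-point throughput this workload implies for the Raspberry Pi:

```python
# Back-of-envelope check of the Raspberry Pi numbers quoted in the article.
# 13 gigaflops of work completed in 20 seconds implies the throughput below.
work_gflop = 13.0  # total floating-point operations, in billions (from the article)
latency_s = 20.0   # observed time to translate a 30-word sentence, in seconds

throughput_gflops = work_gflop / latency_s
print(f"Effective throughput: {throughput_gflops:.2f} GFLOP/s")
# → Effective throughput: 0.65 GFLOP/s
```

At well under 1 GFLOP/s of effective throughput, a claimed 3-times inference speedup translates directly into the difference between a 20-second and a roughly 7-second response.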