Two major techniques are commonly used to meet real-time inference constraints when distributing models across resource-constrained IoT devices: 1) model parallelism (MP) and 2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than the input) between devices imposes a huge communication overhead. Although CP solves this problem, it limits the number of usable submodels. In addition, neither solution is fault tolerant, which is an issue when deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method in which different variants of a main model are generated and can be deployed on separate machines. We design a family of lighter models around the original model and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models have 5.8×–7.1× fewer parameters, 4.3×–31× fewer multiply-accumulate operations (MACs), and 2.5×–13.2× lower response time on atomic inputs compared with MobileNetV2, while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs, where 1 ≤ k ≤ #classes/2, representing the top-k classes, instead of the large volume of floating-point values required in MP. Since each variant provides a full-class prediction, our approach maintains higher availability than MP and CP in the presence of failures.
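As a rough illustration of the idea (the paper's exact ensembling rule may differ), the sketch below assumes each variant ships only its top-k (class, score) pairs, i.e., the 2k values mentioned above, and a hypothetical aggregator sums scores over whichever variants respond. Because every variant covers the full class set, a missing or failed variant merely shrinks the ensemble rather than breaking the prediction:

```python
import numpy as np

def variant_topk(logits: np.ndarray, k: int):
    """Reduce one variant's full class scores to its top-k (class_id, score)
    pairs -- the 2k values a variant transmits instead of bulky feature maps."""
    top_idx = np.argsort(logits)[-k:]  # indices of the k highest scores
    return list(zip(top_idx.tolist(), logits[top_idx].tolist()))

def aggregate(variant_outputs):
    """Combine top-k outputs from the variants that responded; variants that
    failed or timed out are simply absent, so a prediction is still produced."""
    scores = {}
    for pairs in variant_outputs:  # one list of (class, score) per variant
        for cls, score in pairs:
            scores[cls] = scores.get(cls, 0.0) + score
    return max(scores, key=scores.get)  # class with the highest summed score

# Example: 10 classes, k = 2, three variants deployed but one has failed,
# so only two top-k outputs arrive at the aggregator (random stand-in logits).
rng = np.random.default_rng(0)
outputs = [variant_topk(rng.normal(size=10), k=2) for _ in range(2)]
print(aggregate(outputs))
```

Score summation is only one plausible aggregation choice; majority voting over the returned class IDs would work with the same 2k-value interface.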