Experiments and observations
Data type:
Texts; databases
Other data type:
Network traffic traces
Data collection method:
The traffic was collected on a four-worker testbed. The workers were interconnected by a 10 Gigabit Ethernet network through a single packet switch. Each worker was equipped with an Nvidia Tesla T4 GPU. Traffic traces were captured directly on the worker nodes. The models were trained for 20 epochs on the CIFAR-10 image dataset.
Description:
Network traffic collection (PCAP) of three widely used, state-of-the-art distributed machine learning (DML) frameworks (TensorFlow, Horovod, KungFu). The collection contains distributed training runs of four models (MobileNetV2, ResNet50, ResNet101, DenseNet201) under varying framework configurations. The varied parameters are the communication topology and backend, the distributed optimizer, the batch size, and the packet loss in the network.
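
As a rough starting point for working with the traces, the following sketch counts the packets and on-the-wire bytes in a single capture file using only the Python standard library. It assumes the files are in the classic libpcap format (not pcapng), which this record does not state; the function name summarize_pcap and the file name in the usage note are hypothetical.

```python
import io
import struct

# Magic numbers of the classic libpcap file format
# (microsecond and nanosecond timestamp variants).
MAGIC_US = 0xA1B2C3D4
MAGIC_NS = 0xA1B23C4D


def summarize_pcap(stream):
    """Return (packet_count, total_original_bytes) for a classic PCAP stream.

    Assumption: the stream is a classic libpcap capture opened in binary mode.
    pcapng files are rejected with a ValueError.
    """
    header = stream.read(24)  # the global file header is 24 bytes long
    if len(header) < 24:
        raise ValueError("truncated PCAP global header")

    # The writer's byte order is detected from the magic number.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (MAGIC_US, MAGIC_NS):
            break
    else:
        raise ValueError("not a classic PCAP file (pcapng is not handled)")

    packets = 0
    total_bytes = 0
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            break  # end of file, or a truncated trailing record
        _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", record
        )
        payload = stream.read(incl_len)  # captured bytes, possibly snapped
        if len(payload) < incl_len:
            break  # truncated final packet; do not count it
        packets += 1
        total_bytes += orig_len  # original length on the wire
    return packets, total_bytes
```

A capture would be opened in binary mode, for example with open("worker0.pcap", "rb") as f: print(summarize_pcap(f)), where worker0.pcap stands in for one of the files in the collection. For per-flow or per-protocol analysis, a full packet parser would be the more practical choice; this sketch only reads the record headers.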