A Minimal Model for Compositional Generalization on gSCAN

Hein, Alice; Diepold, Klaus

Title:: A Minimal Model for Compositional Generalization on gSCAN
Document type:: Conference paper
Author(s):: Hein, Alice; Diepold, Klaus
Pages contribution:: 1-15
Abstract:: Whether neural networks are capable of compositional generalization has been a topic of much debate. Most previous studies on this subject investigate the generalization capabilities of state-of-the-art deep learning architectures. We here take a more bottom-up approach and design a minimal model that displays generalization on a compositional benchmark, namely, the gSCAN dataset. The model is a hybrid architecture that combines layers trained with gradient descent and a selective attention mechanism optimized with an evolutionary strategy. The architecture has around 60 times fewer trainable parameters than models previously tested on gSCAN, and achieves comparable accuracies on most test splits, even when trained only on a fraction of the dataset. On adverb to verb generalization accuracy, it outperforms previous approaches by 65 to 86%. Through ablation studies, neuron pruning, and error analyses, we show that weight decay and attention mechanisms facilitate compositional generalization by encouraging sparse representations divorced from irrelevant context. We find that the model’s sample efficiency can mainly be attributed to its selective attention mechanism.
Publisher:: Association for Computational Linguistics
Book / Conference title:: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Conference (additional information):: Abu Dhabi, United Arab Emirates (Hybrid)
Year:: 2022
Year / month:: 2022-12
Month:: Dec
Pages:: 15
Reviewed:: yes
Language:: en
Publication format:: WWW
WWW:: https://preview.aclanthology.org/emnlp-22-ingestion/2022.blackboxnlp-1.1.pdf

Occurrences:

mediaTUM > Complete holdings > University Bibliography > 2022 > Schools and Faculties > Electrical and Computer Engineering > Data Processing (Prof. Diepold)

mediaTUM > Complete holdings > Institutions > Schools > TUM School of Computation, Information and Technology > Departments > Computer Engineering > Data Processing (Prof. Diepold)