Relationformer: A Unified Framework for Image-to-Graph Generation

Shit, Suprosanna; Koner, Rajat; Wittmann, Bastian; Paetzold, Johannes; Ezhov, Ivan; Li, Hongwei; Pan, Jiazhen; Sharifzadeh, Sahand; Kaissis, Georgios; Tresp, Volker; Menze, Bjoern

doi:10.1007/978-3-031-19836-6_24

journal article

Title:: Relationformer: A Unified Framework for Image-to-Graph Generation
Document type:: Proceedings Paper
Author(s):: Shit, Suprosanna; Koner, Rajat; Wittmann, Bastian; Paetzold, Johannes; Ezhov, Ivan; Li, Hongwei; Pan, Jiazhen; Sharifzadeh, Sahand; Kaissis, Georgios; Tresp, Volker; Menze, Bjoern
Abstract:: A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach's effectiveness and generalizability.
Journal title:: Med Image Comput Comput Assist Interv Int Conf Med Image Comput Comput Assist Interv
Year:: 2022
Volume:: 13697
Pages:: 422-439
Full text / DOI:: doi:10.1007/978-3-031-19836-6_24
Print-ISSN:: 0302-9743
TUM institution:: Institut für KI und Informatik in der Medizin (Prof. Rückert)
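
Note on the abstract: the pairing of [obj]-tokens with a single shared [rln]-token can be illustrated with a minimal Python sketch. The abstract does not specify the relation head, so everything concrete here is an assumption made for illustration: the concatenation of the two object tokens with the [rln]-token, the single linear layer, the toy dimensions, and the function names `pairwise_relation_inputs` and `linear_head` are all hypothetical and are not taken from the authors' implementation.

```python
import itertools
import random


def pairwise_relation_inputs(obj_tokens, rln_token):
    """Build one feature vector per ordered object pair (i, j), i != j.

    Assumption: a pair is represented by concatenating obj_i, obj_j and
    the single shared rln token (list concatenation with +).
    """
    features = {}
    for i, j in itertools.permutations(range(len(obj_tokens)), 2):
        features[(i, j)] = obj_tokens[i] + obj_tokens[j] + rln_token
    return features


def linear_head(feature, weights, bias):
    """Hypothetical relation head: one linear layer, one score per class."""
    return [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]


# Toy sizes, chosen only for illustration.
random.seed(0)
dim, num_objects, num_classes = 4, 3, 2

# Stand-ins for the token embeddings a transformer decoder would output.
obj_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_objects)]
rln_token = [random.gauss(0, 1) for _ in range(dim)]

# Randomly initialised head parameters (untrained).
weights = [[random.gauss(0, 1) for _ in range(3 * dim)] for _ in range(num_classes)]
bias = [0.0] * num_classes

pair_features = pairwise_relation_inputs(obj_tokens, rln_token)
relation_scores = {
    pair: linear_head(feature, weights, bias)
    for pair, feature in pair_features.items()
}
```

In this sketch the [rln]-token is computed once per image and reused for every object pair, so the per-pair cost is a concatenation plus one small head evaluation. That is one plausible reading of the "computationally efficient relation prediction" mentioned in the abstract.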