Self-supervised pre-training on unlabeled images has shown promising results in the medical domain. Recently, methods using text supervision from companion text, such as radiological reports, have improved upon these results even further. However, most works in the medical domain focus on image classification downstream tasks and do not study more localized tasks such as semantic segmentation or object detection. We therefore propose a novel evaluation framework consisting of 18 localized tasks, including semantic segmentation and object detection, on five public chest radiography datasets. Using this framework, we study the effectiveness of existing text-supervised methods and compare them with image-only self-supervised methods and transfer from classification in more than 1200 evaluation runs. Our experiments show that text-supervised methods outperform all other methods on 13 out of 18 tasks, making them the preferred approach. In conclusion, image-only contrastive methods provide a strong baseline if no reports are available, while transfer from classification, even in-domain, does not perform well as pre-training for localized tasks.