This work describes a novel, attention-based view of documents within a sender-receiver model (author-reader model). According to this view, the author of a document marks relevant information using conspicuous layout features. These layout features attract a reader's attention at first glance and enable the reader to extract relevant information from the document efficiently. This human mechanism is transferred into a new technical method that localizes relevant information using exclusively the image of a scanned paper document and that evaluates the relevance of each piece of information relative to the others on the document. In connection with this transfer, three topics are of major interest: 1. As the reader's attention is drawn towards conspicuous, i.e. visually attractive, document parts, a measure of visual attractiveness is calculated for each object in the document image. According to the basic idea, this measure of attractiveness is an implicit measure of relevance for the underlying content. 2. The psychological texton theory explains human visual attention by texture perception, so the technical method determines conspicuous layout features by texture analysis of the document. For use in the technical context, this work describes the texton theory's notion of texture and its implications in a formal mathematical way. 3. When defining the measure of attractiveness, the subjective perception of attractiveness for different layout features is accounted for by using different weights. Initial values of these subjective weights are estimated from a survey among test persons. Test persons also take part in the final test of the method, in which the perceived attractiveness of the parts of all test documents is compared to the calculated measure of attractiveness. The final test indicates a strong correlation between the calculated measure of attractiveness and the human perception of attractiveness.
Additionally, a key-sentence extraction algorithm based on the measure of attractiveness produces quite plausible summaries of the analysed document.