Text Indexing with Errors

Moritz G. Maass; Johannes Nowak

2005

Title: Text Indexing with Errors
Document type: Technical Report
Author(s): Moritz G. Maass; Johannes Nowak
Abstract: In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the look-up time was either not linear or depended upon the size of the document corpus. Our data structure has size O(n log^d n) on average and with high probability for input size n and queries with up to d errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear look-up time on average.
Keywords: Algorithms; Data Structures; Text Indexing; Dictionary Indexing; Hamming Distance; Edit Distance; Approximate Pattern Matching; Information Retrieval
Year: 2005
Year / Month: 2005-03
Pages: 29
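
To make the query in the abstract concrete, the following is a minimal sketch of what "occurrences of a pattern with up to d errors" means when the errors are mismatches (Hamming distance). It is a naive scan over the documents, not the index described in the report: it takes time proportional to the text size times the pattern length, whereas the report's index answers in time linear in the pattern length and the number of results. The function names and the sample documents are assumptions made for illustration, and edit distance is not covered.

```python
from typing import List, Tuple


def hamming_within(a: str, b: str, d: int) -> bool:
    """Return True if equal-length strings a and b differ in at most d positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > d:
                return False
    return True


def find_occurrences(docs: List[str], pattern: str, d: int) -> List[Tuple[int, int]]:
    """Naive baseline (hypothetical helper, not the report's data structure).

    Returns every (document index, start position) pair at which `pattern`
    matches a substring of the document with at most d mismatches.
    """
    m = len(pattern)
    hits = []
    for doc_id, text in enumerate(docs):
        for start in range(len(text) - m + 1):
            if hamming_within(text[start:start + m], pattern, d):
                hits.append((doc_id, start))
    return hits


def matching_documents(docs: List[str], pattern: str, d: int) -> List[int]:
    """Return the sorted indices of documents containing at least one match."""
    return sorted({doc_id for doc_id, _ in find_occurrences(docs, pattern, d)})


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana"]  # illustrative sample corpus
    print(find_occurrences(docs, "bana", 0))    # [(0, 0), (2, 2)]
    print(find_occurrences(docs, "bana", 1))    # [(0, 0), (0, 2), (1, 0), (1, 3), (2, 2)]
    print(matching_documents(docs, "bana", 1))  # [0, 1, 2]
```

The two functions mirror two of the reporting modes named in the abstract: all positions (`find_occurrences`) and all documents (`matching_documents`). An index such as the one in the report would return the same answers without scanning the whole text for each query.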

Collection: mediaTUM complete holdings > Institutions > Schools > TUM School of Computation, Information and Technology > Technical Reports > 2005