GraphWrap - Graph-Based Wrapping from PDF Documents
01.05.2008 - 28.02.2010
Funded research project
This project investigates a new method of supervised information extraction from unstructured documents such as PDF files. It builds on the results of the NextWrap project, in which we devised a basic graph structure to represent the physical objects on the page. In GraphWrap we plan to investigate graph matching techniques that wrap data directly from this graph structure instead of from an intermediate representation. This brings several concrete benefits. First, both the purely geometric and the logical structure can be used to locate instances of the data to be wrapped. Second, the approach is far more intuitive to interact with, giving the user the impression of "wrapping directly on the document". Third, it is less rigid, allowing the document understanding process to be partly influenced during wrapping. In this way, we plan to overcome the main limitations of current PDF wrapping approaches, which rely on an intermediate representation.
The main contributions of this work are as follows:
* A suitable graph representation of a PDF file, extended from our current representation to include logical as well as geometric relations between nodes
* A suitable error-tolerant graph matching algorithm that can locate the desired instances on the document in acceptable time and is invariant to common changes in document structure
* An intuitive prototype user interface, offering both rendered and graph views of the page, in which the user can select example instances, fine-tune wrapper parameters and receive immediate feedback on the result, so that user interaction with the system can itself be studied
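The project description does not specify the graph model or the matching algorithm, so the following Python sketch is only a rough illustration under stated assumptions. It assumes a page graph whose nodes are text blocks with bounding boxes and whose directed edges carry one of two hypothetical geometric labels (`right`, `below`). Matching is a naive, brute-force form of error-tolerant graph matching: every injective mapping of pattern nodes onto page nodes is scored by counting node texts that fail their regular expression and pattern edges that are missing or differently labelled, and mappings within a cost threshold are kept. The names `Block`, `relation`, `build_graph`, `match` and the `gap` threshold are illustrative, and the exhaustive search is exponential in the pattern size, unlike the efficient algorithm the project aims for.

```python
import re
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """A text block on the page; bounding box in page units, y grows downward."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def relation(a: Block, b: Block, gap: float = 20.0) -> Optional[str]:
    """Hypothetical geometric relation from a to b: 'right' if b starts just to the
    right of a on an overlapping line, 'below' if b starts just below a with
    overlapping columns, otherwise None."""
    v_overlap = min(a.y1, b.y1) > max(a.y0, b.y0)
    h_overlap = min(a.x1, b.x1) > max(a.x0, b.x0)
    if v_overlap and 0 <= b.x0 - a.x1 <= gap:
        return "right"
    if h_overlap and 0 <= b.y0 - a.y1 <= gap:
        return "below"
    return None


def build_graph(blocks: List[Block]) -> Dict[Tuple[int, int], str]:
    """Page graph as a dict of directed, labelled edges (i, j) -> relation."""
    edges = {}
    for i, a in enumerate(blocks):
        for j, b in enumerate(blocks):
            if i != j:
                label = relation(a, b)
                if label is not None:
                    edges[(i, j)] = label
    return edges


def match(blocks, edges, pattern_nodes, pattern_edges, max_cost=1):
    """Brute-force error-tolerant matching: try every injective mapping of pattern
    nodes onto page nodes; cost +1 per node whose text fails its regex and +1 per
    pattern edge that is missing or differently labelled; keep cost <= max_cost."""
    results = []
    for mapping in permutations(range(len(blocks)), len(pattern_nodes)):
        cost = sum(re.fullmatch(rx, blocks[m].text) is None
                   for rx, m in zip(pattern_nodes, mapping))
        cost += sum(edges.get((mapping[u], mapping[v])) != label
                    for (u, v), label in pattern_edges.items())
        if cost <= max_cost:
            results.append((cost, mapping))
    return sorted(results)


# Example wrapper (hypothetical): a "Date" label with an ISO date to its right.
PATTERN_NODES = [r"Date:?", r"\d{4}-\d{2}-\d{2}"]
PATTERN_EDGES = {(0, 1): "right"}

page1 = [Block("Invoice no.", 10, 10, 80, 20), Block("4711", 90, 10, 120, 20),
         Block("Date", 10, 30, 40, 40), Block("2009-03-01", 50, 30, 110, 40),
         Block("Total", 10, 50, 40, 60)]
# Same data, but the value has moved below its label.
page2 = [Block("Date", 10, 10, 40, 20), Block("2009-03-01", 10, 30, 70, 40)]

print(match(page1, build_graph(page1), PATTERN_NODES, PATTERN_EDGES, max_cost=0))  # [(0, (2, 3))]
print(match(page2, build_graph(page2), PATTERN_NODES, PATTERN_EDGES, max_cost=0))  # []
print(match(page2, build_graph(page2), PATTERN_NODES, PATTERN_EDGES, max_cost=1))  # [(1, (0, 1))]
```

With `max_cost=0` the pattern only finds a date placed to the right of its label; allowing a cost of 1 also accepts the second page, where the value has moved below the label. This is the kind of tolerance to small layout changes that the second contribution refers to, here obtained by a simple cost threshold rather than any particular published algorithm.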
People
Project leader
Reinhard Pichler
(E184)
Project staff
Tamir Hassan
(E184)
Institute
E184 - Institute of Information Systems
Funding
FFG - Österreichische Forschungsförderungsgesellschaft mbH (Austrian Research Promotion Agency), national
Research focus areas
Information and Communication Technology
Keywords
graph matching
information extraction