Information Extraction: approaches, tools and methods for wrapper generation, Web querying, data integration and XML.
- Information Extraction: Setting, History, IE vs. IR
- Structured Data Extraction and Wrapping
- XML Transformation and Query Languages
- Web Wrapper Languages
- Wrapper Generation Tools
- Web Wrappers for Mashups, SOA and BI
- Inductive Wrapper Generation
- Automatic Data Extraction / Web Data Mining
- Supervised Wrapper Generation
- Deep Web Navigation Approaches
- Data Extraction from PDF documents
- Mediation and Integration Approaches
- Web Data Cleaning
- Lixto Visual Wrapper and Transformation Server
The course comprises a lecture part and an exercise part. The lecture part primarily teaches methodologies and illustrates concepts from practice, including live system demonstrations. The exercises aim to strengthen the participants' knowledge, especially through practical use of tools for web data extraction. At the end of the course, talks given by student groups will cover further aspects in more detail. One meeting is devoted to an overview of current (applied) research projects at DBAI, offering a short glimpse of novel research in this area.
Selected Fridays, 15:30-19:00 (Exercise Unit 1 15:30-16:20, Lecture 16:30-17:30, Exercise Unit 2 17:40-18:30); preliminary meeting on 1 October, 15:30 to 16:30 (A and B); nine sessions in total. Details are on the lecture Web page http://www.dbai.tuwien.ac.at/staff/baumgart/exin1011/.
Course Language: English
Supervised labs with a tutor.