181.130 Web Data Extraction and Integration

2011W, VU, 2.0h, 3.0EC

Properties

  • Semester hours: 2.0
  • Credits: 3.0
  • Type: VU Lecture and Exercise

Aim of course

Information Extraction, Approaches, tools and methods for Wrapper Generation, Web Querying, Data Integration, XML.

Subject of course

- Information Extraction: Setting, History, IE vs. IR
- Structured Data Extraction and Wrapping
- XML Transformation and Query Languages, DOM, jQuery
- Web Wrapper Languages
- Wrapper Generation Tools
- Web Wrappers for Mashups, SOA and BI
- Inductive Wrapper Generation
- Automatic Data Extraction / Web Data Mining
- Supervised Wrapper Generation
- Deep Web Navigation Approaches
- Mediation and Integration Approaches
- Web Data Cleaning
- Lixto Visual Wrapper and Transformation Server

The course comprises a lecture part and an exercise part. The lecture part primarily teaches methodologies and illustrates concepts from practice, including live system demonstrations. The goal of the exercises is to strengthen the participants' knowledge, in particular through practical use of tools in the area of web data extraction. At the end of the course, student group talks will cover further aspects in more detail. One meeting will be devoted to an overview of current (applied) research projects at DBAI, giving a short glimpse of novel research in this area.

Additional information

ECTS-Breakdown:
lectures: 14 hours
discussion of the exercises: 8 hours
exercises: 26 hours
final project/exam: 27 hours
total: 75 hours (3 ECTS)

 

Selected Fridays, 16:00-19:00 (Exercise Unit 1 16:00-17:00, Lecture 17:00-18:00, Exercise Unit 2 18:00-19:00). The first meeting is on 7 October, 16:00 to 17:30 (groups A and B). There are nine sessions in total. Details are on the lecture web page: http://www.dbai.tuwien.ac.at/staff/baumgart/exin1112/
Course Language: English
Supervised labs with tutor.

Currently planned days: 7/10, 21/10, 4/11, 11/11, 18/11, 2/12, 16/12, 13/1, 20/1 (27/1)

Lecturers

  • Baumgartner, Robert

Institute

Course dates

Day  Time           Date                     Location                    Description
Fri  16:00 - 19:00  07.10.2011 - 27.01.2012  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Web Data Extraction and Integration - Single appointments
Day  Date        Time           Location                    Description
Fri  07.10.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  14.10.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  21.10.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  28.10.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  04.11.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  11.11.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  18.11.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  25.11.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  02.12.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  09.12.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  16.12.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  30.12.2011  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  13.01.2012  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  20.01.2012  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration
Fri  27.01.2012  16:00 - 19:00  EI 2 Pichelmayer HS - ETIT  Web Data Extraction and Integration

Examination modalities

Course assessment has two components. The first is the individual exercises, the group exercises and their presentation during the semester. The second is a final project at the end of the semester, in which a particular topic is elaborated as a paper, presented and discussed.

Course registration

Registration modalities

Register for the lecture via the group registration for the exercises in TISS. If you are an ECML student who cannot yet officially register via the system, please register for the exercises by email instead. A groups: 16:00 to 17:00; B groups: 18:00 to 19:00.

Group Registration

Group      Registration From  To
Group A 1  16.08.2011 00:00   06.10.2011 00:00
Group A 2  16.08.2011 00:00   06.10.2011 00:00
Group A 3  01.10.2011 00:00   07.10.2011 00:00
Group A 4  16.08.2011 00:00   06.10.2011 00:00
Group A 5  16.08.2011 00:00   06.10.2011 00:00
Group A 6  01.09.2011 00:00   07.10.2011 00:00
Group B 1  16.08.2011 00:00   06.10.2011 00:00
Group B 2  16.08.2011 00:00   06.10.2011 00:00
Group B 3  16.08.2011 00:00   06.10.2011 00:00
Group B 4  16.08.2011 00:00   06.10.2011 00:00
Group B 5  16.08.2011 00:00   06.10.2011 00:00

Curricula

Literature

Please refer to lecture slides.

Previous knowledge

Basic knowledge of HTML and XML

Continuative courses

Miscellaneous

  • Attendance Required!

Language

English