After successful completion of the course, students are able to:
(1) Write domain specifications in RDF-S, SHACL, and in the OWL profiles.
(2) Given such a specification and a knowledge graph, choose a suitable validation or inference algorithm, and name an existing engine for the task at hand. For example inputs of moderate size, students will be able to perform the validation by hand.
(3) Compare the semantics of partially closed formalisms like SHACL with that of open-world formalisms like OWL, and explain the consequences of these different semantic assumptions, both in terms of the resulting inference regimes and in terms of the cost of testing entailment and evaluation.
(4) Explain the basic algorithms for evaluating SPARQL queries (in well-formed monotone SPARQL fragments) over knowledge graphs.
(5) Given an inconsistent knowledge graph, list its repairs and compute the answers to a given query under standard variants of the repair semantics.
(6) Given a description of a domain and of some possibly heterogeneous and incomplete data sources, write an OBDA specification to construct a virtual knowledge graph.
(7) Given a query and an OBDA specification, explain how to compute the answers over the represented virtual knowledge graphs.
The course studies several semantic technologies and the way they can be used for integrating and accessing data, especially data that cannot be easily handled with legacy techniques because it may be incomplete, inconsistent, or heterogeneous, and expensive to integrate and maintain.
We will study specification languages like RDF-S, SHACL, and the OWL profiles, as well as the SPARQL query language. These formalisms are studied in some detail, comparing their abstract syntax, their semantic assumptions, and their core algorithms for validation and query evaluation. We will see how RDF-S, SHACL, and OWL can be used to validate graph data, and to obtain useful knowledge graphs from data that may be incomplete and heterogeneous. We will study how these graphs can be queried in (fragments of) the SPARQL query language, some of the algorithmic and computational challenges that result from different choices of formalisms, and solutions for querying both virtual and inconsistent knowledge graphs.
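As a rough illustration of how inference and validation interact, the following Python sketch saturates a tiny graph with two RDFS-style rules and then runs a closed-world check in the spirit of a SHACL `sh:minCount 1` constraint. The triple encoding, the function names `rdfs_saturate` and `min_count_violations`, and the example data are assumptions made for this sketch; it is not the algorithm of any particular engine.

```python
def rdfs_saturate(triples):
    """Forward-chain two RDFS-style rules until a fixpoint is reached:
    subClassOf transitivity and inheritance of type along subClassOf."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for (s, p, o) in closure if p == "subClassOf"}
        new = set()
        # (a subClassOf b), (b subClassOf d)  =>  (a subClassOf d)
        for (a, b) in sub:
            for (c, d) in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # (x type a), (a subClassOf b)  =>  (x type b)
        for (x, p, c) in closure:
            if p == "type":
                for (a, b) in sub:
                    if a == c:
                        new.add((x, "type", b))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


def min_count_violations(triples, target_class, prop):
    """Closed-world check: instances of target_class that have no
    outgoing prop edge explicitly present in the given triples."""
    instances = {s for (s, p, o) in triples if p == "type" and o == target_class}
    have_prop = {s for (s, p, o) in triples if p == prop}
    return sorted(instances - have_prop)


# Hypothetical example data.
graph = {
    ("Professor", "subClassOf", "Teacher"),
    ("Teacher", "subClassOf", "Person"),
    ("ann", "type", "Professor"),
    ("bob", "type", "Teacher"),
    ("bob", "teaches", "kg101"),
}
saturated = rdfs_saturate(graph)

violations_raw = min_count_violations(graph, "Teacher", "teaches")
violations_saturated = min_count_violations(saturated, "Teacher", "teaches")
```

On the raw graph the check reports no violation, because ann is not explicitly typed as a Teacher; after saturation ann is reported, since she is inferred to be a Teacher but has no `teaches` edge. Under an open-world reading of the same requirement (an OWL axiom stating that every Teacher teaches something), the missing edge would not be an error: it would only entail that some unnamed course exists.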
Detailed contents:
(1) Specification languages for graph and semi-structured data: RDF-S, SHACL, OWL profiles
- Abstract syntax of the formalisms
- Semantics, with emphasis on assumptions about data (in)completeness
(2) Validation and inference in knowledge graphs
- Validation and inference tasks
- Algorithms for validation and inference
- Detecting and repairing inconsistencies in data
- The computational cost of validation and inference in the different formalisms
(3) Querying knowledge graphs
- Foundations of SPARQL and fragments
- Query evaluation in graphs with knowledge and inference
(4) Querying inconsistent knowledge graphs
- Repairs, inconsistency-tolerant query semantics
- Algorithms and complexity
(5) Ontology-based data integration and access
- The OBDA paradigm and virtual knowledge graphs
- Query rewriting
- The OBDA pipeline
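
To give a flavour of item (4), the sketch below enumerates the repairs of a small inconsistent graph by brute force and evaluates queries under three commonly studied inconsistency-tolerant semantics (AR, IAR, and brave). It assumes that the only constraints are class disjointness axioms and that a query is an arbitrary Python predicate over a set of facts; all names and data are hypothetical, and the enumeration is exponential, so it is only meant for paper-sized examples.

```python
from itertools import combinations


def is_consistent(facts, disjoint_pairs):
    """Consistent iff no individual is typed with two disjoint classes."""
    for (a, b) in disjoint_pairs:
        in_a = {s for (s, p, o) in facts if p == "type" and o == a}
        in_b = {s for (s, p, o) in facts if p == "type" and o == b}
        if in_a & in_b:
            return False
    return True


def repairs(facts, disjoint_pairs):
    """All subset-maximal consistent subsets of facts (brute force)."""
    found = []
    ordered = sorted(facts)
    # Visit larger subsets first, so that any non-maximal consistent
    # subset is a proper subset of a repair that was already found.
    for size in range(len(ordered), -1, -1):
        for subset in combinations(ordered, size):
            candidate = frozenset(subset)
            if is_consistent(candidate, disjoint_pairs) and not any(
                candidate < r for r in found
            ):
                found.append(candidate)
    return found


def holds_ar(query, reps):
    """AR semantics: the query holds in every repair."""
    return all(query(r) for r in reps)


def holds_iar(query, reps):
    """IAR semantics: the query holds in the intersection of all repairs."""
    return query(frozenset.intersection(*reps))


def holds_brave(query, reps):
    """Brave semantics: the query holds in at least one repair."""
    return any(query(r) for r in reps)


# Hypothetical example: ann is asserted to be both Professor and Student.
facts = {
    ("ann", "type", "Professor"),
    ("ann", "type", "Student"),
    ("bob", "type", "Student"),
}
disjoint = [("Professor", "Student")]
reps = repairs(facts, disjoint)


def ann_is_professor(fs):
    return ("ann", "type", "Professor") in fs


def ann_is_professor_or_student(fs):
    return ("ann", "type", "Professor") in fs or ("ann", "type", "Student") in fs


def bob_is_student(fs):
    return ("bob", "type", "Student") in fs
```

Here there are two repairs, each dropping one of ann's conflicting types. The query `ann_is_professor` holds only under brave semantics, `ann_is_professor_or_student` holds under AR and brave but not under IAR, and `bob_is_student` holds under all three, since bob's fact is not involved in any conflict.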