Web Structure

ILLC Course: Measuring information in very large data sets 

6 ECT, january 2017
Coordinator: Pieter Adriaans  
Requirements: we like the course to be self-contained, some background in information theory and/ore semantic web theory would help  (but if you don't like mathematics you'll probably have a hard time).  

Workforms: Lectures, guest lecturers, hands-on experiments on real data sets, workgroups, presentations, essay
Contributors: Frank van HarmelenPeter BloemLeen Torenvliet, Pieter Adriaans 

Frank van HarmelenPeter Bloem Leen TorenvlietPieter Adriaans

Background: the results of the Commit P23 subproject have shown that bringing together researchers from various disciplines like information theory, semantic web, and logic can lead to promising synergetic results. This course aims at students and researchers from the UvA, VU and ILLC to further explore these possibilities.

In this course we explore the interplay between recent developments in the field of information theory in the light of research issues generated by the study of extremely large data sets. Main themes: 1) Expanding the power of information theoretical measures that currently mainly deal with strings of messages or bits to cover higher dimensional data, sparse data, classes of languages, sets and graphs.  2) Analyze the behavior of various measures for complexity and meaningful information. 3) Practical application of these insights for efficient model selection in extremely large datasets.

Fundamental Research Questions:

Theoretical  Empirical
What is the interaction between computation
and information? 
 What are the principles that govern the emergent
structure of knowledge graphs?
 Is there a notion of information that unifies
current entropy based approaches
(Shannon, Kolmogorov)?
 How is the meaning of a knowledge graph
affected by its structure?
 What is meaningful information?   How do we measure information in
knowlegde graphs? 

Time & Place
We have two locations:
https://rooster.uva.nl/locationinfo/630GII213
https://rooster.uva.nl/locationinfo/630GII305

 The schedule for the classes is:

Week 1 (9-13 January):
Mon: 11:00-13:00 SP G3.05
Tue-Fri 15:00-17:00 SP G3.05 

Week 2 (16-20 January):
Mon: 11:00-13:00 SP G3.05
Tue & Wed 15:00-17:00 SP G2.13
Thu 15:00-17:00 SP G3.05
Fri 15:00-17:00 SP G2.13 

Classes

  • Short overview (both historical and systematical) of the concept of Information (See material in the Stanford Encyclopedia Lemma on Information (http://plato.stanford.edu/entries/information/). This overview culminates in a list of open problems in philosophy of information: http://plato.stanford.edu/entries/information/supplement.html
  • Current research issues in the study of the semantic web, with as main theme: Our theory is falling behind our technology. (See: https://sssw.org/2016/?page_id=386).  The aim is to show the connection between some of the open problems in philosophy of information and the current research issues.
  • Aspects of information theory: optimal code, bayes law, relation with entropy, randomness deficiency, incompressibility of the set of natural numbers. Two-part code optimization, structure functions. Normalized  Compression Distance. The aim is to show that the theory really works in practice, but that there are flaws
  • Presentation of some new ideas on information measurement and meaningful information, insofar as they clarify some of the problems: https://arxiv.org/abs/1611.07829.
  • Applications of the theory. See e.g. Thesis P.Bloem Chapter V.

 Week 3 and 4. 
Projects either empirical, working with real life datasets, or more theoretical, writing a paper about a selected subject. 

 Presentations:
Frank van Harmelen  
 Pieter Adriaans 

Standard reference texts: 

de la Higuera, Grammatical Inference, Cambridge UP. 2010.
Hopcroft, Motwani & Ullman, Introduction to Automata Languages, and Compuatation, Addison Wesley, snd ed. 2001. 
Cover end Thomas, Elements of Information Theory, Wiley, 2nd ed. 2006.
Li & Vitányi, An Introduction to Kolmogorov Complexity and its Applications, 3rd ed. Springer 2008. 
Grünwald, The Minimum Description Length Principle, MIT Press, 2007. 
Adriaans & Van Benthem, Philosophy of Information, North Holland, 2008. 
 
 References and Background materials