Tutorial - Accessing the Semantic Web via Statistical Machine Learning

A half-day tutorial at ESWC2012, Heraklion, Greece. To attend, register through the ESWC2012 registration.


Monday, May 28th, 2012 | 09:30 – 12:30


Room Clio


The traditional means of extracting information from the Web is keyword search. The Semantic Web adds linked structured information, which supports search over annotated documents and also lets the user query abstracted information, supported by reasoning. In this tutorial we suggest that a third means of accessing Web information is statistical machine learning, which abstracts information by exploiting statistical regularities. Interest in statistical machine learning techniques has grown largely because of the open, distributed and inherently incomplete nature of the Semantic Web. We will describe our work on statistical machine learning for the Semantic Web, pursued in the German THESEUS project and in the EU FP7 project LarKC. In those projects, a scalable machine learning approach has been developed that is appropriate for the high-dimensional, sparse, and noisy data one encounters on the Semantic Web. The approach is based on matrix and tensor factorization and has shown good performance on a number of Semantic Web data sets. Extensions have been developed that model temporal effects and sequences and that can exploit ontological background knowledge and textual information. The approach was employed in the winning entry of the ISWC 2011 Semantic Web Challenge.
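As a rough illustration of the factorization idea mentioned above, the following Python sketch scores unobserved statements with a truncated singular value decomposition. It is not the method developed in THESEUS or LarKC. The toy data, the function names `low_rank_scores` and `rank_unobserved`, and the chosen rank are all assumptions made for this example. Rows stand for entities, columns for hypothetical (predicate, object) pairs, and a 1 marks a statement that is known to be true.

```python
import numpy as np

def low_rank_scores(X, rank):
    """Return the best rank-`rank` approximation of X (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def rank_unobserved(X, scores, row):
    """List the columns with no known statement for `row`, best score first."""
    candidates = [j for j in range(X.shape[1]) if X[row, j] == 0]
    return sorted(candidates, key=lambda j: scores[row, j], reverse=True)

# Hypothetical toy data: 5 entities x 4 (predicate, object) pairs.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

scores = low_rank_scores(X, rank=2)   # rank 2 is an arbitrary choice here
print(np.round(scores, 2))
print(rank_unobserved(X, scores, row=0))
```

Entries that are 0 in `X` but receive a comparatively high value in `scores` are candidates for statements that are likely true yet missing. On realistic data, the rank would be selected by validation, and a sparse solver would replace the dense SVD.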


The audience will learn about the statistical perspective on machine learning for the Semantic Web, with an emphasis on scalability and performance. Statistical machine learning complements ontology learning, which is a more formal approach to learning on the Semantic Web. The audience should be familiar with the basic concepts of the Semantic Web. Some basic background in machine learning would be helpful.



Here are the current slides (2 MB).


Yi Huang is a staff scientist at Siemens Corporate Research and Technology in Munich and is finishing his Ph.D. at the Ludwig Maximilian University of Munich, Germany. His research interests focus on statistical machine learning, text mining, information retrieval, and the Semantic Web. He received a Diploma in computer science from the Ludwig Maximilian University of Munich. He is involved in the EU FP7 LarKC project and in the THESEUS program, the Internet of Services, funded by the Federal Ministry of Economics and Technology of Germany. His main contributions to these projects are the development of learning approaches for the Semantic Web and their application to large data sets.

Volker Tresp received a Diploma degree from the University of Göttingen, Germany, in 1984 and the M.Sc. and Ph.D. degrees from Yale University, New Haven, CT, in 1986 and 1989, respectively. Since 1989 he has headed a research team in machine learning at Siemens Corporate Research and Technology. In 1994 he was a visiting scientist at the Massachusetts Institute of Technology's Center for Biological and Computational Learning. He has filed more than 54 patent applications and was Siemens Inventor of the Year in 1996. He has published more than 100 scientific articles and supervised more than 10 Ph.D. theses. The company Panoratio is a spin-off from his team. He has served on the programme committees of the leading machine learning conferences. He is coordinating the Siemens effort in the nationally funded project THESEUS for the development of the potential of semantic, multimedia and learning technologies, and he has led the machine learning efforts in the EU FP7 project LarKC (2008-2011) on scalable reasoning and machine learning for the Semantic Web. In 2011 he became a Professor at the Ludwig Maximilian University of Munich. He has presented tutorials at the Machine Learning Summer School 2006 in Canberra, Australia, at ICML 2009, and at ILP 2010.


Statistical Machine Learning
[1] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Ed., Springer (2009)
[2] Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley (2000)
[3] Bishop, C.: Pattern Recognition and Machine Learning. Springer (2007)
[4] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. 2nd Ed., Morgan Kaufmann (2005)
[5] Mitchell, T.: Machine Learning. McGraw-Hill
Overviews on Machine Learning for the Semantic Web
[6] Balduini, M., Celino, I., Dell’Aglio, D., Valle, E.D., Huang, Y., Lee, T., Kim, S.H., Tresp, V.: Reality mining on micro-post streams: Deductive and inductive reasoning for personalized and location-based recommendations. Submitted (2012)
[7] Tresp, V., Bundschus, M., Rettinger, A., Huang, Y.: Towards machine learning on the semantic web. In: Uncertainty Reasoning for the Semantic Web I. Lecture Notes in AI, Springer (2008)
[8] D’Amato, C., Fanizzi, N., Esposito, F.: Inductive learning for the semantic web: What does it buy? Semantic Web, 1 (2010)
[9] Berendt, B., Hotho, A., Stumme, G.: Towards semantic web mining. ISWC (2002)
SUNS and Extensions
[10] Tresp, V., Huang, Y., Bundschus, M., Rettinger, A.: Materializing and querying learned knowledge. In: Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (2009)
[11] Fensel, D., van Harmelen, F., Andersson, B., Brennan, P., Cunningham, H., Valle, E.D., Fischer, F., Huang, Z., Kiryakov, A., il Lee, T.K., Schooler, L., Tresp, V., Wesner, S., Witbrock, M., Zhong, N.: Towards LarKC: A platform for web-scale reasoning. In: ICSC (2008)
[12] Huang, Y., Bundschus, M., Tresp, V., Rettinger, A., Kriegel, H.P.: Multivariate structured prediction for learning on the semantic web. In: Proceedings of the 20th International Conference on Inductive Logic Programming (ILP) (2010)
[13] Huang, Y., Nickel, M., Tresp, V., Kriegel, H.P.: A scalable kernel approach to learning in semantic graphs with applications to linked data. In: Proceedings of the 1st Workshop on Mining the Future Internet (2011)
[14] Huang, Y., Tresp, V., Nickel, M., Rettinger, A., Kriegel, H.P.: A scalable approach for statistical learning in semantic graphs. Submitted (2012)
[15] Jiang, X., Huang, Y., Nickel, M., Tresp, V.: Exploiting information extraction, reasoning and machine learning for relation prediction. Submitted (2012)
[16] Tresp, V., Huang, Y., Jiang, X., Rettinger, A.: Graphical models for relations - modeling relational context. In: International Conference on Knowledge Discovery and Information Retrieval (2011)
Tensor Approaches
[17] Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multirelational data. In: Proceedings of the 28th International Conference on Machine Learning (2011)
[18] Franz, T., Schultz, A., Sizov, S., Staab, S.: TripleRank: Ranking semantic web data by tensor decomposition. In: The Semantic Web - ISWC (2009)
Other approaches to learning on the Semantic Web (not covered in the tutorial)
[19] Kiefer, C., Bernstein, A., Locher, A.: Adding data mining support to SPARQL via statistical relational learning methods. In: ESWC 2008. Springer-Verlag (2008)
[20] Lehmann, J.: DL-learner: Learning concepts in description logics. JMLR