All posts by: jogracia

Semantic Web: science or engineering?

“There is little doubt that the Semantic Web is an engineering success. […] However, as a scientific field, have we discovered any general principles? Have we uncovered any universal patterns that give us insights into the structure of data, information and knowledge?” These and similar questions were posed by Frank van Harmelen (Vrije Universiteit Amsterdam) in his awesome keynote speech at ISWC 2011 in Bonn (Germany).

In his presentation, van Harmelen played with the idea of treating Semantic Web research as a proper science and imagined which laws about the information universe could be derived from our study, just like the laws of the physical universe.

I chose this topic to kick off a series of sessions that we are starting at the Ontology Engineering Group, called “Reading Club Over Coffee meetings”. The aim of these sessions is to discuss informally any interesting paper, blog entry, or videocast related either to our particular research topics or to science and research in general. These short meetings take place monthly, on Fridays right after lunch, with no fixed duration (well, the duration of one or two coffees).

So, around a table, with warm coffee in front of us, Filip Radulovic, Almudena Ruiz-Iniesta, Alejandro Llaves, Idafen Santana, Pablo Calleja, Freddy Priyatna, Lupe Aguado, María Poveda, and I met for one hour and informally debated van Harmelen’s ideas. As chair of the session, my only duty was to select some questions and to structure the debate around them:

1. What is the MOST interesting, appealing, motivating idea contained in the talk?
2. Do you DISAGREE with something in the talk?
3. What can be the IMPACT of this view on your own research?

I have to say that the debate was exciting, with inspired contributions from everyone. In fact, it was still alive some days later in other coffee talks! In the rest of this post I will try to summarize the main arguments, although (sorry about that) omitting many details.

In general, we all agreed that one of the most interesting aspects of van Harmelen’s proposal is the view of SW research as a science, and his mental exercise of taking its first 10 years of history as an “experiment”. What could be inferred if the experiment could be repeated again and again? María and Lupe particularly liked Frank’s consideration of vocabularies as a “pillar” of the SW.

In the talk, a few laws were proposed (e.g., factual knowledge is a graph, conceptual knowledge is a hierarchy, and the former is much bigger than the latter). Some of us thought, however, that these are not proper “laws”: this is what the SW looks like here and now, and it may not always be repeatable. Notice, though, that not even van Harmelen considered them laws strictly speaking; it’s just a mental exercise! In fact, “to derive a law one experiment is not enough”, as Filip stated. Alejandro disagreed in particular with the law of “publication distributed/computation centralized”, as nowadays computation is moving towards being highly decentralized as well.

The biggest debate was about the notion of engineering vs. science. At the extremes were Filip (“Computer Science is not a science”) and Idafen (“Computer Science in general and SW in particular are science, since you can apply the scientific method to them”), with the rest of us spread along this spectrum. From María: “not everything in Computer Science is science, e.g. programming”. Freddy: “people created SPARQL first as engineering (to solve a problem), and science came later (studying query complexity and so on)”.

In general we all agreed that “THERE IS science in Computer Science and Semantic Web” (e.g., information theory). But we did not reach an agreement on whether the Semantic Web IS a science or not. The debate still continues… However, we all have the perception that most of our work in the area has been about technical issues rather than about inferring laws, thus confirming van Harmelen’s initial intuition.

Finally, we asked ourselves about the impact of van Harmelen’s view on our future work. There was no clear answer in general, although the talk invited us to be more conscious of the division between theory and practice, which could have an impact on the way we communicate our research results. In addition, I miss more work in our area that treats the SW as an object of study in itself, as an ecosystem that evolves and is subject to certain laws (or regularities, to put it better). A good analogy would be the work done by Barabási when studying the Internet and the Web as “scale-free networks”.

The next session of our reading club (Filip will chair) will be titled “What do you think about that, Siri?” and will deal with the work of Douglas Hofstadter. Looking forward to it!

Multilingual Linked Data: the discussion continues

If you are familiar with the idea of the Semantic Web, you are probably also aware of the difficulties of its practical realisation. A relevant one is the presence of language barriers between semantic information expressed in different languages. Indeed, multilingualism and the Semantic Web is one of the research topics in which we are currently involved at OEG. Of course, linguistic barriers are not exclusive to the Semantic Web; they affect the Web in general. To address this topic (Multilingualism on the Web), the Multilingual Web initiative has been running a series of workshops for a few years.

A few weeks ago I attended the W3C Multilingual Web Workshop in Rome, at FAO. It was the first time I went to one of these MW workshops and it was a very nice experience, I have to say. There I had the opportunity to co-chair a session on Best Practices for Multilingual Linked Open Data jointly with Dominic Jones (Trinity College Dublin) and José Labra (University of Oviedo). I will try to summarise the experience in a few lines.

From previous events, such as the MLODE workshop in Leipzig (Sep 2012), the community's interest in multilingualism in Linked Data generation was clear. But, as seen in the discussions that followed José Labra’s talk in Leipzig, the lack of consensus was also evident in many aspects, such as the use of URIs vs IRIs, opaque URIs vs descriptive URIs, the scope of language tags, the role of content negotiation, etc. Still with the feeling of an unfinished discussion after Leipzig, I proposed to the organisers of the Multilingual Web Workshop that we hold a sort of panel to continue this discussion in Rome, and this took the form of the breakout session that Dom, José and I coordinated.

We started the session in Rome with a set of really interesting lightning talks (Ivan Herman, Gordon Dunsire, Daniel Vila, Dave Lewis, Charles McCathie Nevile, Roberto Navigli, Haofen Wang). They told us about their particular experiences, ranging from bibliographic standards to Chinese LOD generation, and pointed out common issues when dealing with multilingualism in Linked Data.

Then the discussion session followed, with a lot of interaction between the speakers and the audience. It was mostly focused on three topics: naming (URIs), labelling, and linking of multilingual content in LD. There was a general feeling that IRIs (internationalized resource identifiers) are cool but that their use is hampered by the lack of support in current tools. Regarding language tags, it was agreed that they should always be used; sadly, though, this best practice is rarely followed by semantic data providers. The participants also commented on the need for, and the difficulties of, linking vocabularies in different languages, and on the fact that links other than owl:sameAs have to be further explored. Finally, the need to define suitable use cases for multilingual Linked Data, in order to guide our future discussions on best practices, was pointed out.
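To make the naming and labelling points a bit more concrete, here is a minimal illustrative sketch in Python (standard library only). It is not taken from the session: the `example.org` identifiers, the sample labels, and the helper names `pick_label` and `iri_to_uri` are all hypothetical. It shows why language tags matter (a client can only choose the right label if the tags are there) and one common workaround for tools that only accept ASCII URIs.

```python
from urllib.parse import quote

# Hypothetical data: (subject, predicate, (lexical form, language tag or None)).
ROME = "http://example.org/resource/Rome"
LABELS = [
    (ROME, "rdfs:label", ("Rome", "en")),
    (ROME, "rdfs:label", ("Roma", "it")),
    (ROME, "rdfs:label", ("罗马", "zh")),
    (ROME, "rdfs:label", ("Roma", None)),  # untagged label: language unknown
]


def pick_label(triples, subject, preferred_langs):
    """Return the first label whose tag matches a preferred language, in order.

    Matching is exact and case-insensitive (no "en-GB" -> "en" range matching).
    Falls back to an untagged label, then to None.
    """
    labels = [obj for s, p, obj in triples if s == subject and p == "rdfs:label"]
    for lang in preferred_langs:
        for text, tag in labels:
            if tag is not None and tag.lower() == lang.lower():
                return text
    for text, tag in labels:
        if tag is None:
            return text
    return None


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters (as UTF-8) so ASCII-only tools accept it.

    URI delimiters are left untouched. Internationalised host names would need
    separate IDNA handling, which this sketch does not cover.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~-._")


if __name__ == "__main__":
    print(pick_label(LABELS, ROME, ["it", "en"]))
    print(iri_to_uri("http://example.org/resource/Málaga"))
```

Without the tags, the client in this sketch has no way to tell the Italian “Roma” from a label in any other language, which is exactly the situation the participants complained about.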

Although our initial intention was to write a kind of white paper with the conclusions of the session, this turned out to be too optimistic an idea: after the session there were too many open issues remaining and too few agreements made. Nevertheless, following Felix Sasaki’s suggestion, we agreed that our discussions would continue in the context of a new W3C Community Group.

And here we are, launching a new W3C group, “Best Practices for Multilingual Linked Open Data”, and hoping that many interested people contribute to it! Let me paste here the group description:

The target for this group is to crowd-source ideas from the community regarding best practises for producing multilingual linked open data. The topics for discussion are mainly focused on naming, labelling, interlinking, and quality of multilingual linked data, among others. Use cases will be identified to motivate discussions. Participation both from academia and industry is expected. The main outcome of the group will be the documentation of patterns and best practices for the creation, linking, and use of multilingual linked data.

So, if you have research or practical interests in the matter, feel free to join and enjoy it!

Starting

So, this is my first blog entry! Just to say HELLO, and to announce my intention to write something in these pages periodically. This will be about computers in general and the Semantic Web in particular, although any topic will be possible! I will write either in (my broken) English or in Spanish, depending on the mood and target audience, I guess.

See you!