Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

Category: Ontology Writing

Barriers to ontology reuse

confusion © 2009 Tim Hodson

Recent work on the British Library’s bibliographic data model has given me some examples of identifying appropriate usage of properties in the Dublin Core and ISBD vocabularies. These were the vocabularies I was using, but you can generalise these examples too.

I should point out that this post is written from the point of view of a developer wanting to work out whether the use of a property or class found in a vocabulary is pertinent to a particular use case. I am assuming no prior knowledge of the debate surrounding the creation of these vocabularies, and realise that there will be many of you who can cite good reasons for decisions taken. I have also tried to keep my general dislike of record centric data models out of this post (I am not even going to mention that there may be some issues with the existence of ISBD properties in the first place).

As already eloquently demonstrated on the FOAF wiki, the dcterms:creator definition is somewhat inconsistent, leaving room for misinterpretation. The documentation appears to contradict itself, and suggests that both literals and resources are suitable as values for the property.

This leads to a mixture of implementations. In the British Library’s model, a decision was taken to treat the value as a resource. This makes the most sense for lots of reasons, as it allows our data to continue growing, rather than be stuck at a ‘literal’ dead end. A literal cannot be the subject of a triple, so we cannot say anything more about it.
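To make the ‘dead end’ concrete, here is a minimal sketch in plain Python with triples modelled as tuples. The example.org identifiers, the author name and the `describe` helper are invented for illustration and are not taken from the British Library’s model; only the dcterms and foaf property URIs are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """Marks a value as a resource; bare strings stand for literals."""
    value: str


DCT_CREATOR = URI("http://purl.org/dc/terms/creator")
FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")

# Hypothetical identifiers, for illustration only.
book_a = URI("http://example.org/book/a")
book_b = URI("http://example.org/book/b")
author = URI("http://example.org/person/a-n-author")

triples = {
    (book_a, DCT_CREATOR, "A. N. Author"),  # literal value: a dead end
    (book_b, DCT_CREATOR, author),          # resource value: can be described
    (author, FOAF_NAME, "A. N. Author"),
}


def describe(node, graph):
    """Return every (predicate, object) pair whose subject is `node`."""
    if not isinstance(node, URI):
        return set()  # a literal cannot be the subject of a triple
    return {(p, o) for (s, p, o) in graph if s == node}


print(describe("A. N. Author", triples))  # empty: nothing more can be said
print(describe(author, triples))          # the name, plus anything added later
```

The literal creator can only ever be a string, whereas the resource creator can keep gaining statements (dates, other works, links to other datasets) without the book description changing.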

From the Dublin Core Terms example, we learn that when defining a property (or class), we should use clear definitions that use unambiguous wording.

The ISBD element sets have committed the following ‘sins’, which make it very hard to choose to work with such a vocabulary.

  • Fundamental problem: the HTTP URIs for the properties and classes do not resolve to anything. This means that I as a developer cannot look up any useful definition of the property. This may change in the future, but doesn’t help me now…
  • Fundamental problem: the names of the properties are alphanumeric codes. This is compounded by the first problem, which means I cannot look up any definition of the property behind a given code.
  • What information I can find via a Google search leads me to a confusing metadata registry.

I am told that the ISBD property names are named with codes so as to render them language neutral. In the context of the semantic web, where language plays an important part in defining what a term means, it seems rather obtuse to hide that meaning behind a code.

Even if the property name itself is named in English, multiple labels can be given to that term in as many languages as necessary. Then, by making sure that the URI for the property dereferences to some useful data, anyone can easily find out what the term means by looking at whichever language they are comfortable with. There is also a possibility that a term in another language may have a subtly different meaning which should be expressed in that language. Again, use of multilingual labels can be used to make it clear what the differences are and why.
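As a rough sketch of how multilingual labels serve readers in different languages, the snippet below picks a label by language preference. The property URI, the label texts and the `label_for` helper are all invented for illustration.

```python
# Labels for one property, keyed by language tag.
# The URI and the label texts are hypothetical.
TITLE_PROPERTY = "http://example.org/vocab/title"

labels = {
    TITLE_PROPERTY: {
        "en": "title",
        "fr": "titre",
        "de": "Titel",
    }
}


def label_for(term, preferred_languages, default_language="en"):
    """Return the first label matching the reader's language preferences."""
    available = labels.get(term, {})
    for language in preferred_languages:
        if language in available:
            return available[language]
    # Fall back to the default language, or None if the term has no labels.
    return available.get(default_language)


print(label_for(TITLE_PROPERTY, ["fr", "en"]))  # French reader
print(label_for(TITLE_PROPERTY, ["es"]))        # no Spanish label: falls back
```

The property keeps one readable name in its URI, and each reader still sees the term in a language they are comfortable with.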

My basic message is: don’t make it so hard for people to find out how to use your vocabulary or ontology. And if it is hard, you might ask yourself whether it makes sense to model the vocabulary or ontology in the way that you have.

Specific lessons to learn:

  • Make sure that your properties and classes are clearly defined and use unambiguous wording.
  • Use sensible and descriptive names for your properties and classes.
  • Use additional labels with appropriate language types to make your ontology multilingual.
  • Make sure that the URIs for your properties and classes resolve to the definitions of those properties and classes (use RDF).
  • Don’t hide your vocabularies in complex registries, just publish them as a document, or series of documents.
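Some of the lessons above can be checked mechanically. The following is only a sketch under stated assumptions: the vocabulary entries are invented, the rule for spotting an ‘opaque code’ (one letter followed by digits) is my own heuristic, and the check that a URI actually resolves is left out because it needs a network request.

```python
import re

# A hypothetical in-memory description of two vocabulary terms.
vocabulary = [
    {
        "uri": "http://example.org/vocab/hasEdition",
        "definition": "Relates a work to one of its published editions.",
        "labels": {"en": "has edition", "fr": "a pour édition"},
    },
    {
        "uri": "http://example.org/vocab/P1234",
        "definition": "",
        "labels": {},
    },
]

# Assumption: a local name made of one letter followed only by digits
# is treated as an opaque code rather than a descriptive name.
CODE_LIKE = re.compile(r"^[A-Za-z]\d+$")


def local_name(uri):
    """Return the part of the URI after the last '/' or '#'."""
    return re.split(r"[/#]", uri)[-1]


def lint_term(term):
    """Return a list of problems found for one property or class."""
    problems = []
    if not term["definition"].strip():
        problems.append("missing definition")
    if CODE_LIKE.match(local_name(term["uri"])):
        problems.append("opaque code used as name")
    if len(term["labels"]) < 2:
        problems.append("fewer than two language labels")
    return problems


for term in vocabulary:
    print(term["uri"], lint_term(term))
```

A check like this cannot tell you whether a definition is unambiguous; that still needs a human reader.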

I am aware that I haven’t mentioned domains and ranges. These are used to add extra descriptive information about the types of thing to be found on the left and right of properties. Sometimes this is a good thing, and sometimes it can feel like adding too many ‘restrictions’. They are not truly restrictions, though: if you use a property on something outside its domain or range, you may be asserting a fact that doesn’t make sense, but in RDF it is still a valid fact. I might deal with this issue in another post.
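The point that domains and ranges describe rather than restrict can be sketched in a few lines. This is a simplified, hand-rolled version of the RDFS domain and range rules, not a real reasoner, and the ex: terms are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: ex:author has domain ex:Book and range ex:Person.
domains = {"ex:author": "ex:Book"}
ranges = {"ex:author": "ex:Person"}


def infer_types(triples):
    """Apply the domain and range rules once and return the new type triples."""
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred - set(triples)


# A nonsensical statement: nothing rejects it.
data = {("ex:teapot", "ex:author", "ex:teaspoon")}

for triple in sorted(infer_types(data)):
    print(triple)
```

Nothing stops the odd triple from being asserted. Instead, the teapot is inferred to be a book and the teaspoon a person, which is valid RDF but clearly not sensible.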

If you want to know more, then why not try one of Talis Consulting’s training courses? We have an open course coming up in November, or we can run bespoke training tailored to your organisation’s needs.

Ontologies won’t make you rich: or will they?

This post sets out some discussion points that arose in response to a conversation with +Aaron Bradley on Google+. The conversation was prompted by Kendall Clark’s post which started by suggesting “an OWL ontology is like a public API for your data”. Aaron suggested that his OWL ontology may need to remain private in order to retain competitive advantage.

There is no value in writing ontologies that are not shared. If you describe your own data in your own way without sharing that ontology, how will you ever find other data that you could mix into yours at a later date?

The counter argument is that the data within your organisation is disparate and needs to be organised, but you don’t want to give away your secrets as to how you have organised your data. I am not about to claim that Linked Open Data is the only way to do Linked Data. Linked Data within an organisation will allow data integration across departments to happen more easily.

But the ontology is not core to this. It is the way you can combine data with shared URIs that use open ontologies that is the killer feature. So if you want to protect anything, then you may want to protect those URIs. Now that we are talking about URIs we have already moved the discussion into the data layer rather than the ontology layer, and you’re still able to protect your data even if people know what ontologies you’re using.

An ontology is not going to give you a competitive advantage. Your advantage will be what you do with the data, not how the data is described. No-one to my knowledge has made a business out of trading database schema; but when they trade well curated data, there is money to be made.

If more than one organisation uses the same ontologies to describe two different datasets, then that ontology has started to create a data market where those two organisations can trade their data without prohibitive data integration overheads. Sharing your ontology helps you to grow your market.
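A small sketch of why a shared ontology lowers integration overheads: when two datasets use the same property URI, combining them is a set union, while a dataset described with a private property needs a mapping first. All URIs, values and helper names here are invented for illustration.

```python
PRICE = "http://example.org/shared/price"  # hypothetical shared property

# Two organisations describing different things with the same property.
dataset_a = {("http://a.example/item/1", PRICE, 10)}
dataset_b = {("http://b.example/item/9", PRICE, 12)}

# A third organisation using its own private property for the same idea.
PRIVATE_COST = "http://c.example/private/cost"
dataset_c = {("http://c.example/item/5", PRIVATE_COST, 8)}


def values_of(prop, graph):
    """Map each subject to its value for `prop`."""
    return {s: o for s, p, o in graph if p == prop}


def translate(graph, mapping):
    """Rewrite predicates using a hand-written mapping between vocabularies."""
    return {(s, mapping.get(p, p), o) for s, p, o in graph}


# Shared ontology: integration is just a union.
merged = dataset_a | dataset_b
print(values_of(PRICE, merged))

# Private ontology: invisible to the shared query until someone writes a mapping.
print(values_of(PRICE, merged | dataset_c))
print(values_of(PRICE, merged | translate(dataset_c, {PRIVATE_COST: PRICE})))
```

The mapping step is exactly the integration overhead that a shared ontology avoids.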

If you are interested in having your data easily available via a public API, you will find that publishing your data as Linked Data transforms your website into your API, because the same data can be published with both a human-friendly HTML face and a machine-friendly RDF face. There are standard techniques that can then be applied to monetize your data streams, and this may even include a paywall.
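One common way to give a single URI both faces is HTTP content negotiation. The function below is a deliberately simplified sketch: the name `choose_representation` is hypothetical, media types are checked in the order the client lists them, and q-values are ignored, which a production server should not do.

```python
RDF_TYPES = {"text/turtle", "application/rdf+xml", "application/ld+json"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def choose_representation(accept_header):
    """Pick 'rdf' or 'html' for a resource from an HTTP Accept header.

    Simplification: the first recognised media type wins and q-values
    are ignored. Anything unrecognised falls back to HTML.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in RDF_TYPES:
            return "rdf"
        if media_type in HTML_TYPES:
            return "html"
    return "html"


print(choose_representation("text/html,application/xhtml+xml;q=0.9"))  # a browser
print(choose_representation("text/turtle"))                            # a data client
```

A browser and a data client can then request the same URI and each receive the form it can use.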

Of course you might use OWL, or some part of OWL, to describe how your data is structured. But if you need APIs built on top, then a Linked Data approach is proving to be a simple way to achieve both those aims in one go. Surely that is more cost effective?

In summary: The data layer is where your competitive advantage sits. The ontology layer is the bit of the Linked Data ecosystem that is going to add value to your data through ontology re-use making your data easier to integrate, both internally and externally, and growing your market. Your API (either internal or external) can be built easily using a Linked Data approach.

If you want to know more about how re-usable ontologies can grow your market, then talk to us.

Ontology Writing: Sense and Sensibility

Linked Data - Photo credit © 2010 Tim Hodson

One of the things we do for our customers is write vocabularies and descriptive schema, which are both types of ontology to some extent. In this post I am exploring the difference between descriptive schema and conceptual schema, and the ways in which a descriptive schema can be validated through good old common sense.

I think many ontologies fall into the camp of conceptual schema, i.e. they try to overlay on reality a set of concepts that the schema writer chooses to express. Often these concepts will be drawn from a set of competency questions. These questions may be enough to generate an ontology, but is that ontology a good ontology? That is a difficult question to answer, and depends on what your criteria are.

Many ontologies are written to provide a framework for some application to consume or validate data, but a descriptive schema is more than that. Although it applies the conceptual framework to the things being described, it does this in a way that creates a picture of what the thing is.

I have been asked on several occasions to explain how to evaluate a schema. The question is often phrased as:

“How do I know that my schema is right?”

When modelling data for a specific domain, the test I use the most is a test for sensibility, i.e. an indication that the assertion made is likely to be feasible and something which is not nonsensical. This is not a mathematical test, but a common sense test with elements of reasoning (by a human).

The data I am creating contains triples. The triple I am writing is saying something. The triple is asserting some fact. Is that assertion sensible?

How to test sensibility? Some basic questions will help us do this.

  • Is the scenario in the assertion always the case?
  • What to do with the outliers?
  • Is the assertion the simplest way to say something?
  • Is the assertion the only way to say something?
  • If we say something using a particular property, what do the domains and ranges of the property imply?

If the answers to questions like these are sensible answers, then I would conclude that my schema was sensible. The answers to these and other domain specific questions test our understanding of how the model we are using describes the data.

This post is intended to demonstrate a pragmatic way of verifying that a descriptive schema makes sense. The answers to these questions may also lead to the addition of some additional properties, classes or subclasses to better describe the domain.

Such questioning could also be used to identify issues around the completeness and accuracy of the data being described. Looking for outliers will identify any inconsistencies in your data, and looking for the best way to describe something may identify gaps in your data.
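Looking for outliers can be partly automated. As a sketch, suppose the model assumes that every book has exactly one ISBN; the data, the ex: terms and the `outliers` helper below are all made up to show the idea.

```python
# Hypothetical data: the model assumes every book has exactly one ISBN.
triples = [
    ("ex:book1", "ex:title", "First"),
    ("ex:book1", "ex:isbn", "111"),
    ("ex:book2", "ex:title", "Second"),
    ("ex:book3", "ex:title", "Third"),
    ("ex:book3", "ex:isbn", "333"),
    ("ex:book3", "ex:isbn", "334"),
]


def outliers(graph, prop):
    """Return subjects that do not have exactly one value for `prop`."""
    counts = {s: 0 for s, _, _ in graph}
    for s, p, _ in graph:
        if p == prop:
            counts[s] += 1
    return {s: n for s, n in counts.items() if n != 1}


print(outliers(triples, "ex:isbn"))
```

Here ex:book2 has no ISBN (a gap in the data) and ex:book3 has two (either an inconsistency, or a sign that ‘exactly one’ is not always the case and the model needs revisiting).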

But what about formally describing the logic that governs how things interrelate? It would be perfectly possible to write a schema using OWL which can include all the rules and restrictions that you like. But you still have to apply the ‘sensible test’ to any formal logically constructed ontology. In this case the sensibility test is executed using a reasoner, and then reading the results to see if they make sense.

I’m sure this post will raise further questions. If you think that this approach to writing ontologies is something you want to know more about, why not leave a comment or drop me an email? tim.hodson@talis.com