Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

Archive for 2011

Åke Nygren Previews a Session on the Mobile Revolution

Åke Nygren from Stockholm Public Library talks with Richard Wallis about the session at the Online Information Conference 2011, which he is moderating – The Mobile Revolution: Opening Up A New World Of Possibilities.

Their discussion ranges from Åke’s background in Swedish libraries, including the creation of a new library, through to his engagement with several digital initiatives, before exploring the themes of his session.

Library of Congress To Boldly Voyage To Linked Data Worlds

The Library of Congress made an announcement earlier this week that has left some usually vocal library pundits speechless.

Roy Tennant (rtennant) on Twitter

MARC is Dead! RDA made irrelevant! – cries that can be heard rattling around the bibliographic blogo-twittersphere. My opinion is that this is an inevitable move based upon serious consideration, and has been building on several initiatives that have been brewing for many months.

Bold though – very bold.  I am sure that there are many in the library community, who have invested much of their careers in MARC and its slightly more hip cousin RDA, who are now suffering from vertigo as they feel the floor being pulled from beneath their feet.

The Working Group of the Future of Bibliographic Control, as it examined technology for the future, wrote that the Library community’s data carrier, MARC, is “based on forty-year-old techniques for data management and is out of step with programming styles of today.”

Many of the libraries taking part in the test [of RDA] indicated that they had little confidence RDA changes would yield significant benefits…

 

And on a more positive note:

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data….
….The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model.

There is still a bit of confusion there between a data carrier and a framework for describing resources.  Linked Data is about linking descriptions of things, not necessarily transporting silos of data from place to place.  But maybe I quibble a little too much at this early stage.

So now what:

The Library of Congress will be developing a grant application over the next few months to support this initiative.  The two-year grant will provide funding for the Library of Congress to organize consultative groups (national and international) and to support development and prototyping activities.  Some of the supported activities will be those described above:  developing models and scenarios for interaction within the information community, assembling and reviewing ontologies currently used or under development, developing domain ontologies for the description of resources and related data in scope, organizing prototypes and reference implementations.

I know that this is the way that LoC and the library community do things, but I do hope that this doesn’t mean that they will disappear into an insular huddle for a couple of years to re-emerge with something that is almost right yet missing some of the evolution that is going on around them over that period.

As with other recent announcements, such as the vote to openly share European Libraries’ data, the report from the W3C’s Library Linked Data Incubator Group, and now the report from the Stanford Linked Data Workshop, I welcome these developments. However, I warn those involved that these are great opportunities [to enable the valuable resources catalogued and curated by libraries over decades to become foundational assets of the future web] that can be easily squandered by not applying the open thinking that characterises successes in the web of data.

One very relevant example of the success of applying open thinking and approach to the bibliographic world using Linked Data is the open publishing of the British National Bibliography (BnB). Readers of this blog will know that we at Talis have worked closely with the team at the BL in their groundbreaking work. The data model they produced is an example of one of those things that may induce that feeling of vertigo that I mentioned. It doesn’t look much like a MARC record! I can assure the sceptical that although it may be very different to what you are used to, it is easy to get your head around. (Drop us a line if you want some guidance.)

As we host the BnB Linked Data for the BL, I can testify to the success of this work – only launched in mid July. Its use is growing rapidly, receiving just short of 2 million hits in the last month alone.

With the British Library, along with the National Libraries of Canada and Germany, being quoted as partners with the LoC in this initiative, plus their work being referenced as an exemplar in the other reports I mention, I hold out a great hope that things are headed in the right direction.

As comments to some of my previous posts attest, there is concern from some in the community of domain experts that this RDF stuff is too simple and light-weight and will not enable them to capture the rich detail that they need. They are missing a few points. Firstly, it is this simplicity that will help non-domain experts to understand, reference and link to their rich resources. Secondly, RDF is more than capable of describing the rich detail that they require – using several emerging ontologies including the RDA ontology, FRBR, etc. Finally and most importantly, it is not a binary choice between widely comprehended simplicity and domain specific detailed description. The RDF for a resource can, and probably should, contain both.

So Library of Congress, I welcome your announcement and offer a friendly reminder that you not only need to draw expertise from the forward thinking library community, but also from the wider Linked Data world.  I am sure your partners from the British Library will reinforce this message.

Craig Newmark talks about Effective Social Media: Past, Present and Future


In this podcast Craig Newmark, the opening keynote speaker for the Online Information Conference 2011, previews his presentation.

After sharing an overview of his background, Craig talks about the inception and growth of what became a poster-child of the web, craigslist, and the ambitions of his more recent venture to connect the world for the common good, craigconnects.org.

We discuss how today’s social media has echoes from the past, from the times of Gutenberg and St Paul, before moving on to speculate on the future impacts of today’s emerging influences.

Making Open Data and A Public Data Corporation Real

In August of 2011 the Cabinet Office and Department for Business, Innovation and Skills issued two public consultation papers, one entitled Making Open Data Real: A Public Consultation and the second entitled A Consultation on Data Policy for a Public Data Corporation.

Talis has responded to both and we wanted to share our responses with you and welcome discussion on the comments here.

The Making Open Data Real consultation questions provide a good framework for structuring the conversation around how best to make Open Data real for the UK. We have provided specific answers to the questions in the attached PDF of our response and felt a summary of the recurring themes would be useful here.

We believe there is a great deal of opportunity presented by HM Government publishing data for re-use by individuals and companies alike. These opportunities fall into several key categories:

• Transparency

• Informed Choice

• Efficiencies

• Innovation

All of these agenda for open data are important and all have similar requirements in order to make them successful.

1 — Publish Data

Data that is published openly is far more usable than data that has to be requested. Often people won’t know what to request or what might be available and often the time delay between requesting and receiving data is off-putting.

2 — License Openly

An ecosystem based on data requires certainty of licensing in order to make use of the data without fear. Provide clear and unambiguous licensing of all published data to support experimentation. This licensing must allow commercial exploitation of the data if we are to see investment made in new businesses.

3 — Remove Barriers

Use of data is often experimental; it is often an exploration to find an answer. That journey can happen much faster if there are fewer hurdles in the way. Any process that prevents direct and immediate access to the raw data should be avoided.

These criteria are common to all of the agenda that people pursue around Open Data and can be summarised as:

Give people unfettered access to the raw data to do with as they please.

Some folks have been more concerned about the second consultation, that dealing with the creation of a Public Data Corporation or PDC.

We believe that the creation of a PDC has the potential to significantly simplify access to government data and make it possible for many more individuals and companies to make use of it. In that, however, there is risk. By making the PDC and its parts accountable for establishing “sustainable business models” we risk continuing the status quo in which licensing fees, restrictive licenses and lengthy processes make it impossible to innovate on this data.

It is possible with the use of simple technologies and techniques to publish data at little cost and we would like to see options explored for a PDC that is accountable for a low-cost model in which data is available as cheaply as possible and without restriction. Such a model would promote innovation and make the UK a leader in data exploitation.

1 — Charging for PDC Data

We believe that there are sufficient cost-savings and increases in productivity that would come from freely releasing government data that it would be possible to afford a no-charge model for PDC data.

This may take time to achieve and requires changes to the way many parts of the PDC would operate but is in line with the stated objective of delivering more data for free year-on-year.

It is important that any charging model for PDC data is built to support the increasing release of data for free, not to work against it.

2 — PDC commercially exploiting data itself

This presents a very real conflict of interest in which those inside the PDC have a much better opportunity to build businesses on top of government data than those outside. It also presents a concern for those wanting to innovate as there is a conflict of interest within the PDC when hearing new ideas for the commercial use of PDC data. This should be avoided.

3 — Licensing

All but one of the options discussed for licensing present significant complexity for consumers of the data. If we wish to stimulate innovation then licenses that require a defined use up-front will prove limiting and any licensing regime that requires consumers to seek legal advice will present a substantial barrier to use.

If the PDC and its parts are charged with developing commercial supply of this data then this is likely to include terms that prevent the re-distribution of this data. These again will severely limit the ways in which the data can be used to create new and innovative businesses.

We hope that these consultations will continue a discussion about the potential benefits of opening up government data and encourage you to comment below and to link to your own responses if you responded to the consultations also.

We’ve also been contributing to a response from the Linked Data community. I’ll update with a link to that once it’s published.

Here are the Talis Group responses in full:

Talis Group Response to Making Open Data Real A Public Consultation (PDF)

Talis Group Response to A Consultation on Data Policy for a Public Data Corporation (PDF)

 

W3C Library Linked Data Final Report Published

The W3C Library Linked Data Incubator Group has published its Final Report after a year of deliberation.

The mission of the Library Linked Data Incubator Group was to help
increase the global interoperability of library data on the Web by
focusing on the potential role of Linked Data technologies.

This report contains several messages that are not just interesting and relevant for the Linked Data enthusiast in the library community. It contains some home truths for those in libraries who think that a slight tweak to the status quo, such as adopting RDA, will be sufficient to keep libraries [data] relevant in the rapidly evolving world of the web.

On the NGC4LIB mailing list, Eric Lease Morgan picked out some useful quotes from the report:

  • Linked Data is not about creating a different Web, but rather about enhancing the Web through the addition of structured data.
  • By promoting a bottom-up approach to publishing data, Linked Data creates an opportunity for libraries to improve the value proposition of describing their assets.
  • Linked Data may be a first step toward a “cloud-based” approach to managing cultural information, which could be more cost-effective than stand-alone systems in institutions.
  • With Linked Open Data, libraries can increase their presence on the Web, where most information seekers can be found.
  • The use of the Web and Web-based identifiers will make up-to-date resource descriptions directly citable by catalogers.
  • History shows that all technologies are transitory, and the history of information technology suggests that specific data formats are especially short-lived.
  • Library developers and vendors will directly benefit from not being tied to library-specific data formats.
  • Most information in library data is encoded as display-oriented, natural-language text.
  • Work on library Linked Data can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community.
  • Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data.
  • A major advantage of Linked Data technology is realized with the establishment of connections between and across datasets.
  • Libraries should embrace the web of information, both by making their data available for use as Linked Data and by using the web of data in library services. Ideally, library data should integrate fully with other resources on the Web, creating greater visibility for libraries and bringing library services to information seekers.

Also, from the report:

Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data — information which could be put to effective use in environments where data is integrated seamlessly across contexts. Pioneering initiatives such as the release of the British National Bibliography reveal the effort required to address challenges such as licensing, data modeling, the handling of legacy data, and collaboration with multiple user communities. However, these also demonstrate the considerable benefits of releasing bibliographic databases as Linked Data. As the community’s experience increases, the number of datasets released as Linked Data is growing rapidly.

Talis Consulting has been closely and actively involved in the modelling, data transformation, publishing, and hosting of the British National Bibliography (BnB) as Linked Data. A great overview of the approach taken to modelling bibliographic data in a way that makes it easily compatible with the wider Web of Data is provided by Tim Hodson in his post – British Library Data Model: Overview. As can be seen from their work, the modelling used for the BnB differs from the approach taken by many attempting to publish bibliographic data as Linked Data – it describes the resources (the books, authors, publishers, etc.) as people, places, events, and things, as against attempting to represent the records that libraries keep about their stock of resources.

With intentions to release open library data specifically mentioning Linked Data, the sentiments from this report are already influencing the wider forward thinking library community.  I will leave the last word to the report’s final paragraph which some, in the traditional record-based cataloguing community, may have difficulty in getting their head around.  I encourage them to look at libraries from the point of view of the wider [non-library] web consumers, and read it again.

One final caveat: data consumers should bear in mind that, in contrast to traditional, closed IT systems, Linked Data follows an open-world assumption: the assumption that data cannot generally be assumed to be complete and that, in principle, more data may become available for any given entity. We hope that more “data linking” will happen in the library domain in line with the projects mentioned here.
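To make the open-world caveat concrete, here is a minimal Python sketch. It is not taken from the report: the identifiers (`book:1`, `person:2`) and both function names are invented for illustration, and real Linked Data tooling is considerably richer. The only point it tries to show is that a closed system treats an unrecorded statement as false, whereas an open-world consumer treats it as unknown until more data arrives.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers, invented for this sketch.
known: Set[Triple] = {("book:1", "creator", "person:1")}
question: Triple = ("book:1", "creator", "person:2")


def closed_world(facts: Set[Triple], triple: Triple) -> bool:
    """Closed system: anything not recorded is treated as false."""
    return triple in facts


def open_world(facts: Set[Triple], triple: Triple) -> Optional[bool]:
    """Open world: anything not recorded is merely unknown (None)."""
    return True if triple in facts else None


before = open_world(known, question)   # nothing recorded yet, so unknown
merged = known | {question}            # another dataset contributes a statement
after = open_world(merged, question)   # the same question now has an answer
```

Under these assumptions the closed-world check answers "no" straight away, while the open-world check only answers once a second dataset has been merged in.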

The Tyranny of Time

A guest post by Lawrence Serewicz, Principal Information Management Officer, Durham County Council

I came across the following reference to time within the retail sector and it made me consider how my world of local government, or any business for that matter, thinks about time.

An old saying in the retail industry is that: ‘If information is available monthly, then decisions taken will take 6 months to have an effect. If it is available weekly, then decisions take a month to influence outcomes; if daily, it takes a week; and if hourly, the decisions can have an impact the next day’ (p.13)

(Source : Valuing Information as an Asset http://www.sas.com/reg/gen/uk/valuing-information )

How often do we collect data? In many organisations, there are quarterly returns, but is that enough for today’s services? In some cases, councils collect real-time data, but are their reporting systems ready for it? For example, management or cabinet committee meetings may be held once a month, but is that enough to have a strategic view of what is happening within an organisation?

At one level, the timeframe for Council Members is different because their work is strategic: they are trying to shape the organisation’s future and where it will be over the long term, not determining whether the recent refuse collection achieved 99% or 98% effectiveness. Even if we discount members’ need to have real time data (at least at a strategic level) and focus on the manager/officer’s role, we still see the tyranny of time.

How often do we see, use, or for that matter, analyse, real-time data? Do our performance management systems display a disconnect between the timescale on which data is collected and the timescale on which it is reported? We may have refuse bin collection rates measured every day, but if our performance reporting within the organisation is quarterly, how well does that serve the organisation? At the same time, is that performance information available to the services, such as customer service desks?

In this example, if the real-time performance is being reported to the customer service desk, they can see that the bin collection rate on a snowy day (for example) is lagging in some areas, but is still robust in other areas. Thus, a call from an area with good collection (say 99%) is going to be a different issue than a missed collection in an area with an 80% collection rate because of the snow conditions. Yet, how many performance management systems or performance information systems are designed to capture and analyse real time data? Even weekly data can be considered real time data depending on the service, which brings us back to the point made at the start. If the information is only available quarterly, what is the impact rate? If you collect each quarter, is the final impact seen yearly or in two years? If that is the timescale, is it going to be effective?

What does this have to do with open data? If data is being collected and made available to customers and the public, are they getting real time data or is there an organisation influenced lag effect on the data? One of the main themes within the UK  government’s open data collection consultation (http://data.gov.uk/opendataconsultation) as well as its overall transparency agenda is to open service performance information to the public.

The service performance information will inform the public’s choice about services and also help them to hold the provider to account. Yet, if there is a lag between when service information is collected and when it is published, can the public hold a local authority to account effectively? How much information is released, and when, can have a large influence on whether an organisation is accountable. If it only has to report once a year, how much accountability can be achieved? If a change in performance is required, how will it be demonstrated in such a long reporting cycle?

If, however, real time data is released, will that have a destabilizing effect on the political process? If the political process is relying on quarterly performance reporting and the public are getting the information in real time, how will elected members be able to respond? Moreover, if members, as residents, are consuming the information in real time as well, what is the role of a quarterly performance reporting system?  To be sure there will be different reports for different issues, but the underlying question is how to make open data respond to real time demand.  Do I need to know the car park was full last week if I am trying to get parked now?

The issue of time is also about how and where information is released. If an organisation releases its performance statistics in a paper report, and not as a spreadsheet, can external scrutiny be achieved?  In that sense, the format for publication will show the timescales. Such reporting has an immediate and direct effect on the ability of the public, and members, to hold the organisation to account.

At the same time, there is the question of whether real time reporting fits your strategy. If one company is working on day-to-day reporting and another is taking a ten year strategy to grow, they will have different understandings of time. Moreover, their reporting mechanisms will be different. Yet, can the 10 year plan work without taking care of the day to day? In that sense, can anyone escape the tyranny of time? The more your competitors harness, the more you will need to adapt or adopt.

From an accountability perspective, the issue may simply be finding a way to reconcile monthly or quarterly performance reporting with the real time data.

What effect will this have on the way we operate in the public and private sectors? Only time will tell.

Talis Consulting +1

I arrived in Birmingham and joined the Talis Consulting Team about three weeks ago now (time flies!), so it’s about time I introduced myself to the wider world. I have been closely involved with the Semantic Web since early 2004, when I started my PhD at the Digital Enterprise Research Institute (DERI) in Galway, Ireland. Back then this idea of semantics and structured data on the Web was still new, very much academic, and considered to be mostly arcane and irrelevant by the majority of the Web community. Terms like “Linked Data” or “Open Data” hadn’t even been coined yet, and wide-spread adoption was still far beyond the horizon. It was a great time to work in academia and start a PhD in this emerging field, and DERI was a fantastic place to do it.

Now, almost eight years later, the idea that semantics and structured data matter has started to stick – just have a look at last week’s blog post about schema.org, or think about the way Open Data has not only established itself as a hot topic in many countries, but has in fact become a wide-spread policy. So, after spending many years on the academic side of things, getting a PhD along the way, and watching the web slowly embrace these “weird” new ideas and technologies, I felt that maybe this was the perfect time to switch over to the industry side of things. I enjoy building things, seeing something take shape, and now that more and more serious Linked Data and semantics-related projects are being started, I can apply my know-how and help to bring them to life. Or maybe people have an idea for a project, but don’t yet know which direction to take, or are unsure which approach might work for them? I should be able to use my experience to help them figure out which way to go. Maybe an organisation just wants to learn about Linked Data, and how they can benefit from it? Again, my time in academia has helped me to communicate things like that.

Talis has been very active and visible in the Linked Data community for several years now, and has in fact established itself as one of the leading players in the field. I had already co-operated with several people from Talis over the years, writing papers, organising community events or running projects. Some time last year I then heard about the plans for a new consulting team, which sounded like the perfect environment to do the things outlined above – teach, design and create solutions with a wide range of different clients in the Linked Data and semantics space. Talis looked more and more like the ideal place for me to go. Fast forward to the present day, and here I am with my first, but certainly not my last post!

Schema.org Déjà vu

The Web has been around for getting on for a couple of decades now, and massive industries have grown up around the magic of making it work for you and your organisation. Some of it, it has to be said, can be considered snake-oil. Much of it is the output of some of the best brains on the planet. Where, on the hit parade of technological revolutions to influence mankind, the Web is placed is oft disputed, but it is definitely up there with fire, steam, electricity, computing, and of course the wheel. Similar debates are, and will be, virtually raging around the hit parade of web features that will in retrospect have been most influential – pick your favourites, http, XML, REST, Flash, RSS, SVG, the URL, the href, CSS, RDF – the list is a long one.

I have observed a pattern as each of the successful new enhancements to the web has been introduced, and then generally adopted. Firstly there is a disconnect between the proponents of the new approach/technology/feature and the rest of us. The former split their passions between focusing on the detailed application, rules, and syntax of its use, and broadcasting its worth to the world, not quite understanding why the web masses do not ‘get it’ and adopt it immediately. This phase is then followed by one of post-hype disillusionment from the creators, especially when others start suggesting simplifications to their baby. Also at this time back-room adoption by those who find it interesting, but are not evangelistic about it, starts to occur. The real kick for the web comes from those back-room folks who just use this next thing to deliver stuff and solve problems in a better way. It is the results of their work that the wider world starts to emulate, so that they can keep up with the pack and remain competitive. Soon this new feature is adopted by the majority, because all the big boys are using it, and it becomes just part of the tool kit.

A great example of this was RSS. Not a technological leap but a pragmatic mix of current techniques and technologies mixed in with some lateral thinking and a group of people agreeing to do it in ‘this way’ then sharing it with the world. As you will see from the Wikipedia page on RSS, the syntax wars raged in the early days – I remember it well: 0.9, 0.91, 1.0, 1.1, 2.0, 2.01, etc. I also remember trying, not always with success, to convince people around me to use it, because it was so simple. Looking back it is difficult to say exactly when it became mainstream, but this line from Wikipedia gives me a clue: “In December 2005, the Microsoft Internet Explorer team and Microsoft Outlook team announced on their blogs that they were adopting the feed icon first used in the Mozilla Firefox browser. In February 2006, Opera Software followed suit.” From then on, the majority of consumers of RSS were not aware of what they were using and it became just one of the web technologies you use to get stuff done.

I am now seeing the pattern starting to repeat itself again, with structured and linked data. Many, including me, have been evangelising the benefits of web friendly, structured, linked data for some time now – preaching to a crowd that has been slow in growing, but growing it is. Serious benefit is now being gained by organisations adopting these techniques and technologies, as our selection of case studies demonstrates. They are getting on with it, often with our help, using it to deliver stuff. We haven’t hit the mainstream yet. For instance, the SEO folks still need to get their head around the difference between content and data.

Something is stirring around the edge of the Semantic Web/Linked Data community that has the potential to give structured web enabled data the kick towards mainstream that RSS got when Microsoft adopted the RSS logo and all that came with it. That something is schema.org, an initiative backed by the heavyweights of the search engine world, Google, Yahoo, and Bing. For the SEO and web developer folks, schema.org offers a simple attractive proposition – embed some structured data in your html and, via things like Google’s Rich Snippets, we will give you a value added display in our search results. Result, happy web developers with their sites getting improved listing display. Result, lots of structured data starting to be published by people that you would have had an impossible task in convincing that it would be a good idea to publish structured data on the web.
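As a rough illustration of what "embed some structured data in your html" can look like, the Python sketch below holds a small HTML fragment marked up with microdata-style attributes (`itemscope`, `itemtype`, `itemprop`) and reads the properties back out using only the standard library. The book and author are invented, the `ItempropCollector` class is a hypothetical helper written for this example, and it handles only flat, text-valued properties. It says nothing about how any search engine actually processes such markup.

```python
from html.parser import HTMLParser

# A made-up page fragment; the book title and author are invented.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">An Example Book</span>
  by <span itemprop="author">Jane Example</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects the item type and flat itemprop text values from a fragment."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # itemprop name awaiting its text content

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemtype" in attr_map:
            self.item_type = attr_map["itemtype"]
        if "itemprop" in attr_map:
            self._current = attr_map["itemprop"]

    def handle_data(self, data):
        # Attach the first non-blank text after an itemprop start tag.
        if self._current is not None and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


collector = ItempropCollector()
collector.feed(SAMPLE_HTML)
```

After feeding the fragment, `collector.item_type` holds the declared type and `collector.properties` holds the name and author, which is the sort of data a consumer could use for a richer listing.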

I was at Semtech in San Francisco in June, just after schema.org was launched and caused a bit of a stir. The complaints were along these lines: they’ve oversimplified the standards that we have been working on for years, dumbing down RDF, diluting the capability, with too small a set of attributes, etc., etc. Yet when you get under the skin of schema.org, you see that, with support for RDFa and RDFa 1.1 Lite, they are not that far from the RDF/Linked Data community.

Schema.org should be welcomed as an enabler for getting loads more structured and linked data on the web. Is their approach now perfect? No. Will it influence the development of Linked Data? Yes. Will the introduction be messy? Yes. Is it about more than just rich snippets? Oh yes. Do the webmasters care at the moment? No.

If you want a friendly insight into what schema.org is about, I suggest a listen to this month’s Semantic Link podcast, with their guest from Google/schema.org, Ramanathan V. Guha.

Now where have I seen that name before? – Oh yes, back on the Wikipedia RSS page: “The basic idea of restructuring information about websites goes back to as early as 1995, when Ramanathan V. Guha and others in Apple Computer’s Advanced Technology Group developed the Meta Content Framework.” So it probably isn’t just me who is getting a feeling of Déjà vu.

Steve Dale looks forward to Online Information 2011

In this podcast Steve Dale, Chair of the Online Information Conference 2011, looks forward to the event.

For a conference that is in its 35th year, the five major themes of Going mobile; Social Media; Building a framework for the future of the information professional; New frontiers in information management; and Search and Information Discovery are all highly relevant for those interested in or concerned about their impact.

Steve walks us through some of the sessions he is looking forward to whilst indicating the worth he sees in still attending a traditional face-to-face conference even in the current economic climate.

Barriers to ontology reuse

confusion © 2009 Tim Hodson

Recent work on the British Library’s bibliographic data model has given me some examples of identifying appropriate usage of properties in the Dublin Core and ISBD vocabularies. These were the vocabularies I was using, but you can generalise these examples too.

I should point out that this post is written from the point of view of a developer wanting to work out whether the use of a property or class found in a vocabulary is pertinent to a particular use case. I am assuming no prior knowledge of the debate surrounding the creation of these vocabularies, and realise that there will be many of you who can cite good reasons for decisions taken. I have also tried to keep my general dislike of record-centric data models out of this post (I am not even going to mention that there may be some issues with the existence of ISBD properties in the first place).

As already eloquently demonstrated on the FOAF wiki, the dcterms:creator definition is somewhat inconsistent, leaving room for misinterpretation. The documentation appears to contradict itself, and suggests that both literals and resources are suitable as values for the property.

This leads to a mixture of implementations. In the British Library’s model, a decision was taken to treat the value as a resource. This makes the most sense for lots of reasons, as it allows our data to continue growing, rather than be stuck at a ‘literal’ dead end. We cannot say more things about a literal.
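The ‘literal dead end’ can be shown with a small sketch. This is plain Python standing in for a triple store, not the British Library’s actual model; the example.org URIs and the name are hypothetical, and blank nodes are left out for simplicity.

```python
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def uri(value):
    """A node identified by a URI (a resource)."""
    return ("uri", value)


def literal(value):
    """A plain string value."""
    return ("literal", value)


def add(graph, subject, predicate, obj):
    """Add a triple; as in RDF, a literal may not be the subject."""
    if subject[0] != "uri":
        raise ValueError("a literal cannot be the subject of a triple")
    graph.add((subject, predicate, obj))


def statements_about(graph, node):
    """Return every triple whose subject is the given node."""
    return {triple for triple in graph if triple[0] == node}


book = uri("http://example.org/book/1")            # hypothetical
author = uri("http://example.org/person/jane")     # hypothetical

# Creator as a literal: nothing more can ever be said about the creator.
literal_model = set()
add(literal_model, book, DCT_CREATOR, literal("Jane Example"))

# Creator as a resource: the description can keep growing.
resource_model = set()
add(resource_model, book, DCT_CREATOR, author)
add(resource_model, author, FOAF_NAME, literal("Jane Example"))

print(statements_about(literal_model, literal("Jane Example")))
print(statements_about(resource_model, author))
```

In the resource version, further statements about the author (other names, identifiers, links to other datasets) can be added later without touching the statement about the book.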

From the Dublin Core Terms example, we learn that when defining a property (or class), we should use clear definitions that use unambiguous wording.

The ISBD element sets have committed these ‘sins’ which make it very hard to choose to work with such a vocabulary.

  • Fundamental problem: the HTTP URIs for the properties and classes do not resolve to anything. This means that I as a developer cannot look up any useful definition of the property. This may change in the future, but doesn’t help me now…
  • Fundamental problem: the names of the properties are alphanumeric codes. This is compounded by the first problem, which means I cannot look up any definition of the property.
  • What information I can find via a Google search leads me to a confusing metadata registry.

I am told that the ISBD property names are named with codes so as to render them language neutral. In the context of the semantic web, where language plays an important part in defining what a term means, it seems rather obtuse to hide that meaning behind a code.

Even if the property name itself is in English, multiple labels can be given to that term in as many languages as necessary. Then, by making sure that the URI for the property dereferences to some useful data, anyone can easily find out what the term means by looking at whichever language they are comfortable with. There is also a possibility that a term in another language may have a subtly different meaning which should be expressed in that language. Again, multilingual labels can be used to make it clear what the differences are and why.
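As a rough sketch of how multilingual labels serve readers, the Python below picks a label according to a reader’s language preferences. The property URI and its labels are hypothetical, standing in for the language-tagged labels a published vocabulary would carry.

```python
# Hypothetical property with one label per language tag.
LABELS = {
    "http://example.org/vocab/title": {
        "en": "title",
        "fr": "titre",
        "de": "Titel",
    }
}


def label_for(term, preferred_languages, labels=LABELS):
    """Return the first label matching the reader's preferences, or None."""
    by_language = labels.get(term, {})
    for language in preferred_languages:
        if language in by_language:
            return by_language[language]
    return None


# A reader who prefers Swedish, then French, is shown the French label.
print(label_for("http://example.org/vocab/title", ["sv", "fr"]))
```

The property keeps one readable name in its URI, and every other language is served by adding a label rather than by hiding the meaning behind a code.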

My basic message is: don’t make it so hard for people to find out how to use your vocabulary or ontology. And if it is hard, you might ask yourself whether it makes sense to model the vocabulary or ontology in the way that you have.

Specific lessons to learn:

  • Make sure that your properties and classes are clearly defined and use unambiguous wording.
  • Use sensible and descriptive names for your properties and classes.
  • Use additional labels with appropriate language types to make your ontology multilingual.
  • Make sure that the URIs for your properties and classes resolve to the definitions of those properties and classes (use RDF).
  • Don’t hide your vocabularies in complex registries, just publish them as a document, or series of documents.
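The point about resolvable URIs can be sketched as the request a developer’s tool might make when looking up a term. This is only an illustration using Python’s standard library; the choice of Accept header values is an assumption, and whether a given vocabulary server honours them is up to its publisher.

```python
from urllib.request import Request


def build_definition_request(term_uri):
    """Build (but do not send) a GET request asking for RDF describing a term."""
    # Prefer Turtle, fall back to RDF/XML (an assumed preference order).
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(term_uri, headers={"Accept": accept})


request = build_definition_request("http://purl.org/dc/terms/creator")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually fetch the description; a vocabulary whose URIs do not resolve gives the developer nothing at this step.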

I am aware that I haven’t mentioned domains and ranges. These add extra descriptive information about the types of thing to be found on the left and right of a property. Sometimes this is a good thing, and sometimes it can add too many ‘restrictions’. They are not truly restrictions, though: using a property with the ‘wrong’ type of thing asserts a fact that doesn’t make sense, but in RDF it is still a valid fact. I might deal with this issue in another post.
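A small sketch may help show why domains and ranges are not truly restrictions. The Python below applies the two RDFS rules (a property’s domain types the subject, its range types the object) to a deliberately nonsensical statement. All of the ex: names are hypothetical, and objects are assumed to be resources rather than literals.

```python
RDF_TYPE = "rdf:type"

# Hypothetical vocabulary: ex:author links an ex:Book to an ex:Person.
DOMAINS = {"ex:author": "ex:Book"}
RANGES = {"ex:author": "ex:Person"}


def infer_types(triples, domains, ranges):
    """Infer rdf:type statements from property domains and ranges."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domains:
            inferred.add((subject, RDF_TYPE, domains[predicate]))
        if predicate in ranges:
            inferred.add((obj, RDF_TYPE, ranges[predicate]))
    return inferred


# A statement that makes no sense: a cat whose author is a dog.
data = {("ex:myCat", "ex:author", "ex:myDog")}

for triple in sorted(infer_types(data, DOMAINS, RANGES)):
    print(triple)
```

Nothing is rejected. The statement is accepted and the cat is simply inferred to be a book and the dog a person, which is why an over-eager domain or range can quietly produce facts nobody intended.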

If you want to know more, then why not try one of Talis Consulting’s training courses. We have an open course coming up in November, or we can run bespoke training tailored to your organisation’s needs.