Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

A letter from the Middle Ages

Well actually, not just one, but over a thousand letters from the Middle Ages.

Last weekend, the National Archives held a Hackathon in the reading room at Kew. Around 40 developers and interested people took data from the National Archives and played with it. There were new mobile interfaces for the NRA discovery API; collections of tweets mined for the data and PDFs they contained; stats on historical participation in the Olympics pulled from the archives and shown on interactive maps. In all it was a fun weekend with lots of smart people in the room and very quiet but rapid typing on keyboards to get something finished by the 4pm Sunday deadline.

Prizes were:

  • 1st – Jonathan Tweed and Kai En Ong (ably assisted by Michael Smethurst, Faith Mowbray and Paul Rissen). A hack that pulls out data surrounding people & places in documents tweeted by @ukwarcabinet (and which – for a hack – is beautifully presented!).
  • 2nd – Jamie Mahoney - Debtors & creditors dataset hack maps the most popular lenders & shows who’s borrowing from where – Show me the money.
  • 2nd – Tim Hodson – A hack showing who wrote to whom in the middle ages.
  • 3rd – Crystal and Steven Hirschorn – A hack showing participation in the Olympics on an interactive map.

You can read more about these entries on the National Archives blog.

I hope you’ll forgive my showing off of my joint second prize winning contribution to the pizza and jelly baby fuelled hack fest.

I took a suggestion from Paul Rissen as a personal challenge, and started pulling the data that I wanted into a new CSV file. I then converted that CSV file to a rudimentary RDF-based model of the letters and people that the data described. I now had a graph dataset which captured, in the way only a graph can, the network of relationships between people who are corresponding. It was then a case of finding a suitable JavaScript library to render my graph visually and to allow people to find out who wrote to whom without cluttering up the graph diagram.
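As a rough illustration of the CSV-to-RDF step, here is a minimal Python sketch. It is not the code from the hack: the column names (`sender`, `recipient`), the `example.org` namespace, the `wroteTo` property and the sample rows are all invented for illustration, and it only handles the simplest possible case of one triple per letter.

```python
import csv
import io

# Hypothetical namespace and property, chosen only for illustration.
BASE = "http://example.org/letters/"
WROTE_TO = BASE + "terms/wroteTo"

def slug(name):
    """Turn a person's name into a simple lowercase, hyphenated slug."""
    return "-".join(name.lower().split())

def letters_to_triples(csv_text):
    """Yield one (subject, predicate, object) URI triple per letter row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        sender = BASE + "people/" + slug(row["sender"])
        recipient = BASE + "people/" + slug(row["recipient"])
        yield (sender, WROTE_TO, recipient)

def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

# Invented sample rows; the real data came from the National Archives.
SAMPLE = "sender,recipient\nPerson A,Person B\nPerson B,Person A\n"
triples = list(letters_to_triples(SAMPLE))
print(to_ntriples(triples))
```

Each row becomes an edge in the graph, so the resulting triples can be handed to whatever graph-rendering library you choose.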

A guide to achievable data publishing

Opening your data sounds like a big scary sort of project that you wouldn’t want to have land in your lap.  It sounds like it ought to open up a minefield of legal, technical and practical issues that are maybe too big to tackle.

Our recent webinars sought to dispel any such myths, and provide you with a project outline that would work for your teams.  We know it works because this is how we run our projects to help organisations manage the transition to publishing their data in a new way.

Talis have the tools and experience to help you get up and running in months rather than years. Now you can watch this recording of the webinar to find out more.

Åke Nygren Previews a Session on the Mobile Revolution

Åke Nygren from Stockholm Public Library talks with Richard Wallis about the session at the Online Information Conference 2011, which he is moderating – The Mobile Revolution: Opening Up A New World Of Possibilities.

Their discussion ranges from Åke’s background in Swedish libraries, including the creation of a new library, through to his engagement with several digital initiatives, before exploring the themes of his session.

Keynote Themes at Semantic Tech & Business, London, 2011

We knew that Semantic Tech & Business in London this week was going to be a great conference with some real business messages, but we couldn’t have predicted how excellent the keynotes were going to be.

Straight from the recent announcement that Volkswagen are using extensive semantics for their product data we have Martin Hepp presenting the way that structured data enhances the web. Martin gave great and essential messages, describing how rich product data is destroyed by the web today. He describes the web of documents (quite rightly) as a data shredder.

Martin Hepp of Hepp Research at Semantic Tech and Business 2011, London

Of his several major points, the other one that hit me between the eyes is how much effort is spent optimising the experience of a web page once a visitor has landed there, yet the web has evolved (and is evolving) to show users key information without visiting the page. That means we have to invest far more in optimising for the way your data displays before a user arrives. Richard has been blogging about the use of Linked Data and Semantics in SEO and SERP for a little while now and if you want to discuss how to make the data on your site work harder to get visitors to come to you then we’d like to talk :)

Steve Harris of Garlik at Semantic Tech & Business, London, 2011

Steve Harris of Garlik talked about the way they’ve used semantic technologies internally at Garlik. Their customers and partners, on the whole, don’t know that they use technologies like this; they’re just impressed by what Garlik can do with the data. He raised some great points: hiring expertise in this area is hard, so they look for good software engineers and then train them in Linked Data and SPARQL. Their experience, like ours, is that developers who have built systems this way for a few months do not want to go back to SQL.

If you have a team of software engineers, developers, data owners, DBAs and project managers who you want to understand this technology then we have a proven two-day training course that teaches Linked Data from the basics.

Steve’s other key message is that this stuff is ready and possible for companies and it has allowed Garlik to do stuff they couldn’t have done with relational technologies.

John O'Donovan keynote at Semantic Tech & Business, London, 2011

John O’Donovan entertained us with a seemingly endless stream of the most wonderful (badly phrased) headlines. For him these demonstrate the need for comprehensive and well-managed metadata. He talked about the BBC’s World Cup 2010 project which built its site atop a triple store. Talis Consulting have trained many of the developers and information architects at the BBC in semantic technologies.

John mirrors the message from Martin and Steve that this technology is ready, capable of delivering large production systems and has real benefits in terms of power, flexibility and cutting implementation costs.

We’ve been seeing this market mature year on year for some time now and it’s great to see three high profile keynotes all saying the same thing — Semantic Technologies are ready for you to use.

If you want to start using them, come and chat with us :)

Linked Data at the Kabuki

A couple of days ago, Alison Kershaw and I ran our first Linked Data Open Day in the US. The location we chose was the Hotel Kabuki in the Japan Town area of San Francisco. It seemed that everywhere you went in the hotel over the last few days, you could not avoid overhearing the phrase Linked Data. That was not only because of our open day; it was immediately followed by the LOD-LAM Summit, where some 100 people from the Libraries, Archives and Museums community came together to discuss the issues around and potential for Linked Data. More of that in another post.

Located where it is, the Hotel Kabuki has an appropriate oriental feel, complete with a Japanese style Summer garden in its centre, and was a great location for our event.

Although it was based upon the normal style of event that we have run previously in the UK, we changed a few of the presentations a little, providing a greater insight into who Talis and Talis Consulting are, and updating our approach to demystifying Linked Data, its usage and benefits. From the excellent questions and debate throughout the day it seems that we got it right.

We also introduced two excellent external speakers. Charles Greer provided an interesting insight into how O’Reilly Media evolved their way towards the significant use of Linked Data today. We also had Jon Voss who gave an excellent overview of the Civil War Data 150 project and how it is to use Linked Data.

All-round an excellent day – slides from the day are available to view.

Profiting from the New Web

Does this describe you?

  • You remember what it was like not having the Internet.
  • You remember the first websites with pixelated images and flashing headings.
  • You remember thinking that putting a webpage for your business online would be a really great idea.
  • You remember thinking that accepting payments online would be amazing…

Or maybe this describes you?

  • You don’t remember a time before the Internet.
  • You can’t imagine not being on your laptop or smartphone and finding out whatever you want whenever you want.
  • You can’t imagine why you wouldn’t want to be a part of the twitterverse.

Whichever description fits, you’ll recognise the innovation that took a new technology and allowed new business models to emerge.

Profiting from the New Web, an event at the Royal Society, hosted by Webscience Trust in partnership with IntellectUk, showed that businesses are starting to look at the opportunities inherent in this always connected, ubiquitous computing enabled, social world we live in. A video summary of key messages from the day is now available.

Talis’ Ian Davis participated in a panel discussing the value of open data, which focused on the practicalities of why and how you can go about a Linked Data project.

Being on the cusp of new opportunities presented by the opening up and linking together of the data that underpins the New Web is where innovative businesses are.  That’s where Talis are.  Kasabi is opening up a data marketplace where you can share and access useful data.  Talis Aspire is changing the way academics manage their course materials.

And Talis Consulting? We’re here to help your business take advantage of the New Web.

Archives Hub and Locah release Linked Data

The LOCAH project has been working hard to take Archives Hub data and publish it as re-useable Linked Data at http://data.archiveshub.ac.uk.

Initially using only a subset of the data available from Archives Hub, the project has regularly documented the process of turning it into Linked Data through a series of blog posts.  These posts highlight the involvement of domain experts at the modelling stages and throughout the conversion to make sure that the final dataset is not just an attempt at triplification, but an attempt to make the data useful.

Talis, as the Technology Partner to the JiscEXPO funded LOCAH project, are pleased to have been able to provide hosting of their Creative Commons CC0 1.0 licensed data, under our Connected Commons scheme.

Read the Locah announcement.

Value-IT Semantic Technologies for Enterprise

If you haven’t seen it already, you may be interested in a report on the value of Semantic Technologies in your Enterprise (what the report calls STE).

From the report:

Semantic Technologies deal with rich data relationships and provide more intelligent access to resources in order to better mediate between the wants of the users and the available information. The challenge of these technologies is to create, encode and resolve meanings, and offer a structured organisation of knowledge to better manipulate, reuse and address information.

If you need ideas on how to sell the adoption of these technologies to the decision makers in your organisation, then have a look at the report’s accompanying use case scenarios.

From the Use case introduction:

STE can have an enormous impact on business performance and awareness, increasing operational intelligence and knowledge sharing, enabling better decision-making, and supporting more efficient and effective processes. This study provides a framework for understanding how semantic enabling is achieved in the Enterprise.

The business cases include the BBC’s Wildlife Finder, a project in which Talis provided the semantic technology expertise to enable the BBC’s developers to create an interesting new showcase of their wildlife content in a thoroughly engaging way.

Talis Consulting are actively involved in helping people within organisations like the BBC and AstraZeneca take their first steps to understanding how semantic technologies can positively impact their business.

You can download the final report and the business use case document from the value-it.eu website.

Talis Linked Data Open Day – USA

Whenever we publicise one of the Linked Data events we regularly hold in the UK, I always get a handful of responses wishing that we would run such an event on the American side of the planet.

So it is a great pleasure to announce our first Talis Linked Data Open Day in San Francisco on Wednesday 1st June.

Register via the button below, come along to meet with some Talis folks, and explore the possibilities of Linked Data.

We will include an introduction to Linked Data; a look at where it is being used; a bit of history to put it into context with the Web and other technologies; and a non-scary look under the hood at how it is implemented.

We will also share some of the experiences and lessons learnt by the Talis team, when working with many organisations, in government, the media, and commercial worlds.

These are informal participative days where you get the opportunity to question those presenting and discuss practicalities, and benefits of Linked Data.

Register for Talis Linked Data Open Day - USA in San Francisco  on Eventbrite

Photo Creative Commons licensed from Rob Styles Flickr Photostream

Choosing URIs, not a five minute task.

Chris Keene at Sussex is having a tough time making a decision on his URIs so I thought I’d wade in and muddy the waters a little.

He’s following advice from the venerable Designing URI Sets for the UK Public Sector. An eleven page document from the heady days of October 2009.

Chris discusses the choice between data.lib.sussex.ac.uk and www.sussex.ac.uk/library/ in terms of elegance, data merging and running infrastructure. He’s leaning toward data.lib.sussex.ac.uk on the basis that data.organisation.tld is the prevailing wind.

There are many more aspects worth considering, and while data.organisation.tld may be a way to get up and running quickly you might get longer term benefit from more consideration; after all we don’t want these URIs to change.

The key requirements are outlined well in ‘Designing URI Sets’ as follows

3. In particular, the domain will:

  • Expect to be maintained in perpetuity
  • Not contain the name of the department or agency currently defining and naming a concept, as that may be re-assigned
  • Support a direct response, or redirect to department/agency servers
  • Ensure that concepts do not collide
  • Require the minimum of central administration and infrastructure costs
  • Be scalable for throughput, performance, resilience

These are all key points, but one in particular stands out for me in terms of choosing the hostname part of a URI

  • Not contain the name of the department or agency currently defining and naming a concept, as that may be re-assigned

That simple sentence contains a lot more than at first reading and suggests that any or all of the concepts defined in the data may become someone else’s responsibility in time. I think over time we will see this becoming key to the longevity of URIs, along with much better redirect maintenance.

The approach data.gov.uk has taken is to break the data into broad subject areas within which many different types of data might sit – education.data.gov.uk, transport.data.gov.uk, crime.data.gov.uk, health.data.gov.uk and so on. This is one example of breaking up the hosts and while right now they all point to one cluster of web servers they can be moved around to allow hosting in different places.

This is good, yet I can’t help thinking that those subject matter areas are really rather broad. Then there are others that seem to work on a different axis, statistics.data.gov.uk and research.data.gov.uk. Leaving me confused at first glance as to where the responsibility for publishing crime research would lie. Then there is patents.data.gov.uk, not “innovation” or “invention” but “patents”, the things listed.

Data.gov.uk has done a great job trailblazing, making and publishing their decisions and allowing others to learn from them, develop on them and contribute back. I think we can push their thinking on hostnames still further. If we consider Linked Data to be descriptions of things, rather than publishing data, then directories of those things would be useful.

For example, we could give somebody the responsibility of publishing a list of all schools in the UK at schools.example.gov and that would be one part of the puzzle. A different group may have the responsibility of publishing the list of all universities and yet another the list of all companies at companies.example.gov.

Of course, we would expect all of these to interlink, patents.example.gov would have links to companies.example.gov and universities.example.gov to document the ownership of patents. We’d expect to see links in schools.example.gov to inspections.schools.example.gov and so on.

Notice that I’ve dropped the word data from those examples, as much of this is about making machine (and human) readable descriptions of things. It’s only because we describe lots of things at the same time and describe them uniformly we call it data.

I’d still expect health.example.gov to appear as well, but the responsibility would be one of aggregating what could be considered health data in order to support querying; it would aggregate doctors.example.gov, hospitals.example.gov and more. I would expect as many of these aggregates to pop up as are useful.

Of course, in this approach, as in the current data.gov.uk approach, everyone who wants to say something about a particular doctor, school or patent has to be able to get access to that host to say it and, perhaps, conflicting things said by different people get mixed up.

At this point you’re probably thinking, well, we might as well just use data.organisation.tld and be done with it then. Unfortunately that simply moves the same design decisions from the hostname to the resource part of the URI, the bit after the hostname. You still have to make decisions, and with only one hostname your hosting options are drastically reduced.

Data.gov.uk places the type of thing in the resource part of the URI using what they call concept/reference pairs:

2. Examples of concept/reference pairs:
• road/M5
• school/123
3. The concept/reference construct may be repeated as necessary, for example:
• road/M5/junction/24
• school/123/class/5

I tend to do this slightly differently, using container/reference pairs so I would use “roads” rather than “road” as this lends itself better to then putting listings at those URIs.
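To show why plural containers lend themselves to listings, here is a small Python sketch. It is only an illustration: the function names are made up, and the plural form "junctions" is my own extension of the "roads" example above.

```python
def path_parts(path):
    """Split a URI path into its non-empty segments."""
    return [p for p in path.strip("/").split("/") if p]

def parse_pairs(path):
    """Split '/roads/M5/junctions/24' into (container, reference) pairs."""
    parts = path_parts(path)
    if len(parts) % 2:
        raise ValueError("path must be made of container/reference pairs")
    return list(zip(parts[0::2], parts[1::2]))

def listing_paths(path):
    """Return the container paths at which a listing could be served."""
    parts = path_parts(path)
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts), 2)]

print(parse_pairs("/roads/M5/junctions/24"))
print(listing_paths("/roads/M5/junctions/24"))
```

With this shape, `/roads` reads naturally as "the list of roads" and `/roads/M5/junctions` as "the junctions on the M5", which is harder to say of `/road` or `/road/M5/junction`.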

The antithesis

We can often learn something by turning an approach on its head. In this case I wonder what would happen if we embraced the idea that many people will have different world-views about the same thing, their own two-penneth so to speak. None of them necessarily authoritative.

In that case we end up with me publishing data on data.my.domain and you publishing data about the same things on data.your.domain. Just as happens all over the web today. If I choose my domains carefully then maybe I can hand bits on as I find someone else to run them better, as above, but always there is more than one world view.

There are two common ways to make this work and be interconnected. One approach is to use owl:sameAs to indicate that data.my.domain/Winston_Churchill and data.your.domain/Winston_Churchill are describing the very same thing. The OWL community is not entirely supportive of that use.

The other approach is to use the annotation pattern and rdfs:seeAlso; in which case documents describing a resource live in many places, but they agree on a single, canonical, URI.
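The difference between the two patterns is easier to see as triples. The Python sketch below just prints N-Triples for each; the owl:sameAs and rdfs:seeAlso property URIs are the standard ones, but the canonical URI and the document URIs in the second pattern are placeholders I have invented for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

def same_as(mine, yours):
    """Pattern 1: two URIs, linked to say they identify the same thing."""
    return f"<{mine}> <{OWL_SAME_AS}> <{yours}> ."

def see_also(canonical, documents):
    """Pattern 2: one agreed URI, pointing at documents that describe it."""
    return [f"<{canonical}> <{RDFS_SEE_ALSO}> <{doc}> ." for doc in documents]

print(same_as(
    "http://data.my.domain/Winston_Churchill",
    "http://data.your.domain/Winston_Churchill",
))

# Hypothetical canonical URI and document locations.
for line in see_also(
    "http://people.example.org/Winston_Churchill",
    [
        "http://data.my.domain/docs/Winston_Churchill.rdf",
        "http://data.your.domain/docs/Winston_Churchill.rdf",
    ],
):
    print(line)
```

In the first pattern each publisher mints their own identifier; in the second everyone shares one identifier and only the descriptions are spread around.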

So what would that mean for Sussex?

Well, I’m not sure.

Fortunately, Chris has a limited decision to make right now, choosing a URI, or URIs, for the Mass Observation Archive. It is for this he is considering data.lib.sussex.ac.uk and www.sussex.ac.uk/library/.

Thinking about changing responsibilities over time, I have to say I would choose neither. It is perfectly conceivable that the Mass Observation Archive may at some time move and not be under the remit of the University of Sussex Library, or even the university.

I would choose a hostname that can travel with the archive wherever it may live. Fortunately it already has one, http://www.massobs.org.uk/. Ideally the catalogue would live at something like catalogue.massobs.org.uk or maybe massobs.org.uk/archive or something like that.

My leaning on this is really because this web of data isn’t something separate from the web of documents, it’s “as well as” and “part of” the web as one whole thing. data.anything makes it somehow different; which in essence it’s not.

Postscript

Oh, and just one more thing…

URI type, for example one of:
• id – Identifier URI
• doc – Document URI, Representation URI
• def – Ontology URI
• set – Set URI

Personally, I really dislike this URI pattern. It leaves the distinguishing piece early in the URI, making it harder to spot the change as the server redirects and harder to select or change when working with the URIs.

I much prefer the pattern

/container/reference to mean the resource
/container/reference.rdf for the rdf/xml
/container/reference.html for the html

and expanding to

/container/reference.json, /container/reference.nt, /container/reference.xml and on and on.

My reasoning is simple: I can copy the document URI from the address bar, paste it to curl on the command line and simply backspace a few characters to trim off the extension. Also, in the browser or wget, this pattern gives us files named something.html and something.rdf by default. Much easier to work with in most tools.
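The same trimming is trivial to do in code, which is part of the appeal. Here is a minimal Python sketch of the pattern; the function names and the example.org URIs are invented for illustration, and the extension list simply mirrors the ones mentioned above.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Extensions taken from the pattern described above.
EXTENSIONS = ("rdf", "html", "json", "nt", "xml")

def resource_uri(document_uri):
    """Trim a known format extension to get back to the resource URI."""
    parts = urlsplit(document_uri)
    root, ext = posixpath.splitext(parts.path)
    if ext.lstrip(".") in EXTENSIONS:
        parts = parts._replace(path=root)
    return urlunsplit(parts)

def document_uris(resource):
    """List the document URI for each format of a resource."""
    return {ext: f"{resource}.{ext}" for ext in EXTENSIONS}

print(resource_uri("http://example.org/roads/M5.rdf"))
print(document_uris("http://example.org/roads/M5")["html"])
```

Because the distinguishing piece sits at the very end of the URI, moving between the resource and any of its documents is a single string operation.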