Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

Category: Open Data

Tell us about your Open Data Experiences!

Open Data has become an important issue on the agenda of many organisations and companies. There are many reasons why you might decide to make your data available. On the one hand, there is legislation that requires public sector organisations in particular to make their data available (such as the Freedom of Information Act in both the UK and US). On the other hand, many owners of data are starting to see a potential benefit in sharing their data with the wider world, even without a direct legal requirement. Such reasons can range from wanting to provide better services to your customers or citizens, through improvements in SEO, to the expectation that opening your data will lead to cross-fertilisation within your industry (or even just within your own organisation), with an eventual benefit for all.

Open Data Problems

Labyrinth

If you or your organisation have any experience in providing Open Data (or if you’re thinking about it), then you will have come across the 5-Star scheme for Open Data (for the original 5-star data proposal see here, a nice write-up is here) – the more stars, the more useful and connected the data is. Publishing 1-star data (just put it online) to 3-star data (use a non-proprietary format) is relatively simple and straightforward. However, when it comes to 4- and 5-star data, things can quickly become a bit more complex: if you’re new to it, the world of Linked Data and URIs can seem daunting and difficult to understand. Even with some experience, it can be hard to figure out and master the right approach, such as deciding how to model your data, how to structure your URIs, which other data to link to and how, which vocabularies to use (or maybe you need to develop your own), etc. Other issues that can arise are more technical in nature, such as deciding which hosting platform to choose, which software to use for modelling, conversion and data maintenance, whether to set up your own infrastructure or use an external service, etc.
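As a rough illustration of how the stars build on each other, here is a minimal Python sketch. The `Dataset` class and its field names are hypothetical, invented for this example; the five checks paraphrase the 5-star scheme, where each star assumes all the earlier ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    online_open_licence: bool     # 1 star: on the web under an open licence
    machine_readable: bool        # 2 stars: structured, machine-readable data
    non_proprietary_format: bool  # 3 stars: e.g. CSV rather than a proprietary format
    uses_uris: bool               # 4 stars: things are identified with URIs
    links_to_other_data: bool     # 5 stars: links out to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    checks = [
        ds.online_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for passed in checks:
        if not passed:
            break
        stars += 1
    return stars


# A CSV download with an open licence, but no URIs or links yet.
csv_download = Dataset(True, True, True, False, False)
print(star_rating(csv_download))
```

The cumulative check reflects why the jump from 3 to 4 stars is the hard part: nothing in the first three criteria forces you to think about URIs, modelling or vocabularies.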

What are your Experiences?

If you or your organisation have encountered any of these or any other problems in the process of publishing data, we’re interested in hearing from you! We would love to learn about your data publishing experiences (both good and bad), and the reasons you embarked on doing this. Also, Talis are currently offering a half-day review of data that has been published. If you would like me (or one of my colleagues) to review the data you have published, feel free to contact me at knud.moeller@talis.com.


Labyrinth picture by rosmary on Flickr, licensed under CC BY 2.0.

Open data cities demonstrator

Photo credit: Tim Hodson

At the Open Data Cities Conference in Brighton, Talis unveiled their demonstrator app (link below) which shows how a city might begin to engage with its citizens and promote digital economy innovation.

The demo is designed to highlight the ways in which a city and its citizens might be brought together in an information marketplace. The demo is designed to trigger questions around how cities might use an interactive information marketplace to measure social impact. The demo is the software equivalent of a thought piece, allowing us to talk about the things that might change the way people in your city think about engaging with each other in social enterprise.

Talis have been exploring ways in which a data marketplace might add value to individual datasets, and have built Kasabi which allows anyone to publish their data easily, and then harness the power of multiple data access channels.

Key demonstrator themes:

  • citizens can request data about their local area
  • citizens can use data, from the city and local businesses, to build apps
  • the city might fund the building of apps that are in demand
  • citizens can share apps they have built
  • businesses can use the marketplace to publish the data that will power other applications
  • cities can easily publish data about anything
  • citizens are able to add data to existing datasets
  • developers have several tools for accessing indexed and structured data
  • all data added to the site is indexed as it arrives and becomes available to applications within a very short time
  • the information marketplace is a data hub providing a revenue share opportunity

Behind the lightweight demonstrator sits a technology stack that provides data hosting and integration. The simple datasets used as examples in the demo can be explored by both developers who understand working with data, and citizens with no programming background.

I could throw the names of some technologies at you, such as graph databases, geo indexes, full-text indexes and application programming interfaces using a variety of protocols, but it is the self-service nature of Kasabi combined with the interactive and social aspects of our demonstrator that we think will make the difference to your city.

As a city we think you probably know your citizens quite well; however, I am sure that there are ways that they can surprise you. Maybe it is a loosely organised not-for-profit company that sets itself the mission of providing the best quality data about where to park in your city. Maybe they take data that you provide about where the parking spaces are and how often they are used and combine it with a calendar of city-wide events sourced from several other data providers. Maybe they build an indispensable app that helps people to choose the best parking site in the city. Maybe it even integrates with an existing drive-sharing scheme to provide parking booking services for commuters and tourists alike.

An idea like that is only possible if the people wanting to build a data driven application have easy access to data.

Of course there is no reason why that access should be free. A car parking app might charge a small fee for the provision of the service, and that fee might be shared with the data providers and the city playing host to the data in a marketplace. Everyone gets to have a share in the success of the idea.

For cities that might have a perceived poor parking experience, an app like this might improve the image of the city and reposition it as an easy place to find parking. It might even change people’s parking behaviour for the better, a social impact that becomes easier to measure.

At Talis, we are keen to work with you to explore how your city data and your citizens’ data might be brought together in a marketplace that allows new businesses to start and thrive.

See the demo >>

Talk to us

Making open data achievable

Government organisations have a remit for publishing some of their core data as open data.  This remit sometimes seems too difficult to achieve.

Tim Berners-Lee took a pragmatic approach by simplifying the problem into bite-sized chunks with an inbuilt mark of quality. The famous 5 Stars are not a new thing: they have been around since 2010, and we have asked if your data is 5 star before.

The 5 stars of open data publishing are clear, simple steps that you can take to get your data published openly. Talis have helped the likes of the Ordnance Survey, the Office for National Statistics, the British Library, Data.Gov.Uk and the Department for Business, Innovation and Skills to publish their data openly.

But how did they do it?

Our experience, gained by working with the likes of the Ordnance Survey, has shown us that providing a hosted platform for publishing data means that organisations can concentrate on data quality and utility without having to find funding for infrastructure and maintenance of specialist hardware. As John Goodwin, Research Scientist at the Ordnance Survey said:

“We decided to let Talis take care of the hosting and serving, so all we had to do was worry about making the data available.”

By pushing the infrastructure costs outside the organisation, data publishers can get on with making their local data link to a wider network of global data and gain their 5 star rating.

If you need help making open data achievable in your organisation, talk to us.

Library of Congress To Boldly Voyage To Linked Data Worlds

The Library of Congress made an announcement earlier this week that has left some usually vocal library pundits speechless.

Roy Tennant (rtennant) on Twitter

 

 


MARC is Dead! RDA made irrelevant! – cries that can be heard rattling around the bibliographic blogo-twittersphere.   My opinion is that this is an inevitable move based upon serious consideration, and has been building on several initiatives that have been brewing for many months.

Bold though – very bold.  I am sure that there are many in the library community, who have invested much of their careers in MARC and its slightly more hip cousin RDA, who are now suffering from vertigo as they feel the floor being pulled from beneath their feet.

The Working Group of the Future of Bibliographic Control, as it examined technology for the future, wrote that the Library community’s data carrier, MARC, is “based on forty-year-old techniques for data management and is out of step with programming styles of today.”

Many of the libraries taking part in the test [of RDA] indicated that they had little confidence RDA changes would yield significant benefits…

 

And on a more positive note:

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data….
….The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model.

There is still a bit of confusion there between a data carrier and a framework for describing resources.  Linked Data is about linking descriptions of things, not necessarily transporting silos of data from place to place.  But maybe I quibble a little too much at this early stage.

So now what:

The Library of Congress will be developing a grant application over the next few months to support this initiative.  The two-year grant will provide funding for the Library of Congress to organize consultative groups (national and international) and to support development and prototyping activities.  Some of the supported activities will be those described above:  developing models and scenarios for interaction within the information community, assembling and reviewing ontologies currently used or under development, developing domain ontologies for the description of resources and related data in scope, organizing prototypes and reference implementations.

I know that this is the way that LoC and the library community do things, but I do hope that this doesn’t mean that they will disappear into an insular huddle for a couple of years to re-emerge with something that is almost right yet missing some of the evolution that is going on around them over that period.

As with other recent announcements, such as the vote to openly share European Libraries’ data, the report from the W3C’s Library Linked Data Incubator Group, and now the report from the Stanford Linked Data Workshop, I welcome these developments. However, I warn those involved that these are great opportunities [to enable the valuable resources catalogued and curated by libraries over decades to become foundational assets of the future web] that can easily be squandered by not applying the open thinking that characterises successes in the web of data.

British Library Data Model

One very relevant example of the success of applying open thinking and approach to the bibliographic world using Linked Data is the open publishing of the British National Bibliography (BnB).  Readers of this blog will know that we at Talis have worked closely with the team at the BL in their ground-breaking work.   The data model they produced is an example of one of those things that may induce that feeling of vertigo that I mentioned.  It doesn’t look much like a MARC record!  I can assure the sceptical that although it may be very different to what you are used to, it is easy to get your head around.  (Drop us a line if you want some guidance).

As we host the BnB Linked Data for the BL, I can testify to the success of this work – only launched in mid July.  Its use is growing rapidly, receiving just short of 2 million hits in the last month alone.

With the British Library, along with the National Libraries of Canada and Germany, being quoted as partners with the LoC in this initiative, plus their work being referenced as an exemplar in the other reports I mention, I hold out a great hope that things are headed in the right direction.

As comments to some of my previous posts attest, there is concern from some in the community of domain experts that this RDF stuff is too simple and light-weight and will not enable them to capture the rich detail that they need.  They are missing a few points.  Firstly, it is this simplicity that will help non-domain experts to understand, reference and link to their rich resources.  Secondly, RDF is more than capable of describing the rich detail that they require – using several emerging ontologies including the RDA ontology, FRBR, etc.  Finally and most importantly, it is not a binary choice between widely comprehended simplicity and domain specific detailed description.   The RDF for a resource can, and probably should, contain both.
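To make that last point concrete, here is a small Python sketch of a single resource description that carries both a simple, widely understood layer and a more detailed, domain-specific one. It is only an illustration: `http://purl.org/dc/terms/` is the Dublin Core terms namespace, but the book URI, the `http://example.org/detail/` vocabulary and all of the values are made up for this example and do not come from any real ontology or dataset.

```python
# DCTERMS is the Dublin Core terms namespace.
# BOOK and DETAIL are hypothetical namespaces invented for this sketch.
DCTERMS = "http://purl.org/dc/terms/"
BOOK = "http://example.org/book/"
DETAIL = "http://example.org/detail/"

book = BOOK + "123"

# Each triple is (subject URI, predicate URI, literal value).
# Simple layer: terms a non-specialist can understand and link to.
simple = [
    (book, DCTERMS + "title", "An Example Title"),
    (book, DCTERMS + "creator", "A. N. Author"),
]
# Detailed layer: the kind of rich description a cataloguer might want.
detailed = [
    (book, DETAIL + "extent", "xii, 240 pages"),
    (book, DETAIL + "statementOfResponsibility", "by A. N. Author ; edited by B. Editor"),
]


def to_ntriples(triples):
    """Serialise (subject, predicate, literal) triples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes and double quotes inside the literal.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


# One description of one resource simply contains both sets of statements.
description = simple + detailed
print(to_ntriples(description))
```

Because both layers are just statements about the same URI, a consumer who only understands the simple terms can ignore the rest, while a specialist application can use everything.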

So Library of Congress, I welcome your announcement and offer a friendly reminder that you not only need to draw expertise from the forward thinking library community, but also from the wider Linked Data world.  I am sure your partners from the British Library will reinforce this message.

Making Open Data and A Public Data Corporation Real

In August of 2011 the Cabinet Office and Department for Business, Innovation and Skills issued two public consultation papers, one entitled Making Open Data Real: A Public Consultation and the second entitled A Consultation on Data Policy for a Public Data Corporation.

Talis has responded to both and we wanted to share our responses with you and welcome discussion on the comments here.

The Making Open Data Real consultation questions provide a good framework for structuring the conversation around how best to make Open Data real for the UK. We have provided specific answers to the questions in the attached PDF of our response and felt a summary of the recurring themes would be useful here.

We believe there is a great deal of opportunity presented by HM Government publishing data for re-use by individuals and companies alike. These opportunities fall into several key categories:

• Transparency

• Informed Choice

• Efficiencies

• Innovation

All of these agendas for open data are important and all have similar requirements in order to make them successful.

1 — Publish Data

Data that is published openly is far more usable than data that has to be requested. Often people won’t know what to request or what might be available and often the time delay between requesting and receiving data is off-putting.

2 — License Openly

An ecosystem based on data requires certainty of licensing in order to make use of the data without fear. Provide clear and unambiguous licensing of all published data to support experimentation. This licensing must allow commercial exploitation of the data if we are to see investment made in new businesses.

3 — Remove Barriers

Use of data is often experimental; it is often an exploration to find an answer. That journey can happen much faster if there are fewer hurdles in the way. Any process that prevents direct and immediate access to the raw data should be avoided.

These criteria are common to all of the agendas that people pursue around Open Data and can be summarised as:

Give people unfettered access to the raw data to do with as they please.

Some folks have been more concerned about the second consultation, that dealing with the creation of a Public Data Corporation or PDC.

We believe that the creation of a PDC has the potential to significantly simplify access to government data and make it possible for many more individuals and companies to make use of it. In that, however, there is risk. By making the PDC and its parts accountable for establishing “sustainable business models” we risk continuing the status quo in which licensing fees, restrictive licenses and lengthy processes make it impossible to innovate on this data.

It is possible with the use of simple technologies and techniques to publish data at little cost and we would like to see options explored for a PDC that is accountable for a low-cost model in which data is available as cheaply as possible and without restriction. Such a model would promote innovation and make the UK a leader in data exploitation.

1 — Charging for PDC Data

We believe that there are sufficient cost-savings and increases in productivity that would come from freely releasing government data that it would be possible to afford a no-charge model for PDC data.

This may take time to achieve and requires changes to the way many parts of the PDC would operate but is in line with the stated objective of delivering more data for free year-on-year.

It is important that any charging model for PDC data is built to support the increasing release of data for free, not work against it.

2 — PDC commercially exploiting data itself

This presents a very real conflict of interest in which those inside the PDC have a much better opportunity to build businesses on top of government data than those outside. It also presents a concern for those wanting to innovate as there is a conflict of interest within the PDC when hearing new ideas for the commercial use of PDC data. This should be avoided.

3 — Licensing

All but one of the options discussed for licensing present significant complexity for consumers of the data. If we wish to stimulate innovation then licenses that require a defined use up-front will prove limiting and any licensing regime that requires consumers to seek legal advice will present a substantial barrier to use.

If the PDC and its parts are charged with developing commercial supply of this data then this is likely to include terms that prevent the re-distribution of this data. These again will severely limit the ways in which the data can be used to create new and innovative businesses.

We hope that these consultations will continue a discussion about the potential benefits of opening up government data and encourage you to comment below and to link to your own responses if you responded to the consultations also.

We’ve also been contributing to a response from the Linked Data community. I’ll update with a link to that once it’s published.

Here are the Talis Group responses in full:

Talis Group Response to Making Open Data Real A Public Consultation (PDF)

Talis Group Response to A Consultation on Data Policy for a Public Data Corporation (PDF)

 

The Tyranny of Time

A guest post by Lawrence Serewicz, Principal Information Management Officer, Durham County Council

I came across the following reference to time within the retail sector and it made me consider how my world of local government, or any business for that matter, thinks about time.

An old saying in the retail industry is that: ‘If information is available monthly, then decisions taken will take 6 months to have an effect. If it is available weekly, then decisions take a month to influence outcomes; if daily, it takes a week; and if hourly, the decisions can have an impact the next day’ (p.13)

(Source : Valuing Information as an Asset http://www.sas.com/reg/gen/uk/valuing-information )

How often do we collect data? In many organisations, there are quarterly returns, but is that enough for today’s services? In some cases, councils collect real-time data, but are their reporting systems ready for it? For example, management or cabinet committee meetings may be once a month, but is that enough to have a strategic view of what is happening within an organisation?

At one level, the timeframe for Council Members is different because their work is strategic: they are trying to shape the organisation’s future and where it will be over the long term, not determining whether the recent refuse collection achieved 99% or 98% effectiveness. Even if we discount the members’ need to have real-time data (at least at a strategic level) and focus on the manager/officer’s role, we still see the tyranny of time.

How often do we see, use, or for that matter, analyse, real-time data? Do our performance management systems display a disconnect between the timescale on which data is collected and the timescale on which it is reported? We may have refuse bin collection rates measured every day, but if our performance reporting within the organisation is quarterly, how well does that serve the organisation? At the same time, is that performance information available to the services, such as customer service desks?

In this example, if the real-time performance is being reported to the customer service desk, they can see that the bin collection rate on a snowy day (for example) is lagging in some areas, but is still robust in other areas. Thus, a call from an area with good collection (say 99%) is going to be a different issue than a missed collection in an area with an 80% collection rate because of the snow conditions. Yet how many performance management systems or performance information systems are designed to capture and analyse real-time data? Even weekly data can be considered real-time data depending on the service, which brings us back to the point made at the start. If the information is only available quarterly, what is the impact rate? If you collect each quarter, is the final impact seen yearly or in two years? If that is the timescale, is it going to be effective?
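The customer service desk scenario can be sketched in a few lines of Python. This is purely illustrative: the area names, the rates and the 90% threshold are assumptions made up for the example, not figures from any council system.

```python
# Hypothetical real-time collection rates per area (percent of bins collected today).
collection_rates = {"Area A": 99.0, "Area B": 80.0}

# Assumed threshold: below this, a missed bin is treated as part of a wider disruption.
DISRUPTION_THRESHOLD = 90.0


def classify_missed_collection(area: str, rates: dict[str, float]) -> str:
    """Classify a missed-bin call using the current collection rate for the caller's area."""
    rate = rates.get(area)
    if rate is None:
        return "no data"
    if rate < DISRUPTION_THRESHOLD:
        # Many bins were missed in this area, e.g. because of snow.
        return "known disruption"
    # Collection is otherwise running well, so this miss needs individual follow-up.
    return "isolated miss"


print(classify_missed_collection("Area A", collection_rates))
print(classify_missed_collection("Area B", collection_rates))
```

The same call gets a different response depending on what the data shows at that moment, which is only possible if the desk sees the rates as they are collected rather than in a quarterly report.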

What does this have to do with open data? If data is being collected and made available to customers and the public, are they getting real-time data or is there an organisation-influenced lag effect on the data? One of the main themes within the UK government’s open data collection consultation (http://data.gov.uk/opendataconsultation) as well as its overall transparency agenda is to open service performance information to the public.

The service performance information will inform the public’s choice about services and also allow them to hold the provider to account. Yet, if there is a lag between when service information is collected and when it is published, can the public hold a local authority to account effectively? How much information is released, and when, can have a large influence on whether an organisation is accountable. If it only has to report once a year, how much accountability can be achieved? If a change in performance is required, how will it be demonstrated in such a long reporting cycle?

If, however, real time data is released, will that have a destabilizing effect on the political process? If the political process is relying on quarterly performance reporting and the public are getting the information in real time, how will elected members be able to respond? Moreover, if members, as residents, are consuming the information in real time as well, what is the role of a quarterly performance reporting system?  To be sure there will be different reports for different issues, but the underlying question is how to make open data respond to real time demand.  Do I need to know the car park was full last week if I am trying to get parked now?

The issue of time is also about how and where information is released. If an organisation releases its performance statistics in a paper report, and not as a spreadsheet, can external scrutiny be achieved?  In that sense, the format for publication will show the timescales. Such reporting has an immediate and direct effect on the ability of the public, and members, to hold the organisation to account.

At the same time, there is the question of whether real-time reporting fits your strategy. If one company is working on day-to-day reporting and another is taking a ten-year strategy to grow, they will have different understandings of time.  Moreover, their reporting mechanisms will be different.  Yet, can the ten-year plan work without taking care of the day-to-day? In that sense, can anyone escape the tyranny of time?  The more your competitors harness real-time data, the more you will need to adapt or adopt.

From an accountability perspective, the issue may simply be finding a way to reconcile monthly or quarterly performance reporting with the real-time data.

What effect will this have on the way we operate in the public and private sectors?  Only time will tell.

Talis Consulting +1

I arrived in Birmingham and joined the Talis Consulting Team about three weeks ago now (time flies!), so it’s about time I introduced myself to the wider world. I have been closely involved with the Semantic Web since early 2004, when I started my PhD at the Digital Enterprise Research Institute (DERI) in Galway, Ireland. Back then this idea of semantics and structured data on the Web was still new, very much academic, and considered to be mostly arcane and irrelevant by the majority of the Web community. Terms like “Linked Data” or “Open Data” hadn’t even been coined yet, and wide-spread adoption was still far beyond the horizon. It was a great time to work in academia and start a PhD in this emerging field, and DERI was a fantastic place to do it.

Now, almost eight years later, the idea that semantics and structured data matter has started to stick – just have a look at last week’s blog post about schema.org, or think about the way Open Data has not only established itself as a hot topic in many countries, but has in fact become a wide-spread policy. So, after spending many years on the academic side of things, getting a PhD along the way, and watching the web slowly embrace these “weird” new ideas and technologies, I felt that maybe this was the perfect time to switch over to the industry side of things. I enjoy building things, seeing something take shape, and now that more and more serious Linked Data and semantics-related projects are being started, I can apply my know-how and help to bring them to life. Or maybe people have an idea for a project, but don’t yet know which direction to take, or are unsure which approach might work for them? I should be able to use my experience to help them figure out which way to go. Maybe an organisation just wants to learn about Linked Data, and how they can benefit from it? Again, my time in academia has helped me to communicate things like that.

Talis has been very active and visible in the Linked Data community for several years now, and has in fact established itself as one of the leading players in the field. I had already co-operated with several people from Talis over the years, writing papers, organising community events or running projects. Then, some time last year, I heard about the plans for a new consulting team, which sounded like the perfect environment to do the things outlined above – teach, design and create solutions with a wide range of different clients in the Linked Data and semantics space. Talis looked more and more like the ideal place for me to go. Fast forward to the present day, and here I am with my first, but certainly not my last post!

Establishing the Connection

A guest post by Neil Wilson, Head of Metadata Services, British Library

I have been doing a number of presentations recently concerning the British Library’s new range of free data services and particularly the ‘Linked Open BNB’ that we launched via a Talis platform in July, which has just appeared on the latest LOD Cloud diagram. The most recent of these was one for the Semantic Technology & Business Conference (London : Sept 26/27). However, before I come to some of the points I was trying to get across, a few words about the conference…

One of the objectives of the BL’s new open metadata strategy is to engage with the wider community in order to try and move beyond the traditional ‘library silo’. Semtech certainly provided an opportunity to do this since it was chiefly attended by a wide range of business technology implementers in addition to public sector organisations. Most interesting to me however was the fact that such a wide cross section of organisations: from the US Dept. of Defence, to the Amsterdam Fire Service by way of the BBC and a variety of hard nosed commercial companies were not just experimenting with triple stores etc or building them into future plans but actually using them in critical applications right now. It was therefore highly encouraging to see that the possibilities and value presented by semantic technologies were being made real by such a diverse variety of companies and public sector institutions. Undoubtedly there is still a way to go until the Semantic Web becomes a reality but the fact that applications are moving from the experimental to the everyday should help to accelerate its development further and convince others to take the plunge.

Back to my own presentation:


I really wanted to address some queries I had received from other organisations covering three main areas:
• Why is the BL experimenting with linked open data?
• What choices might you encounter when creating a linked data service?
• What lessons have you learned?

I was interested to see that some of the latter (e.g. develop incrementally & ‘grow as you go’ or exploit existing tools, standards & expertise to get an early result) were echoed in several other presentations. From some of the feedback received after the talk, at least some of the points made may have helped those contemplating a similar ‘linked data journey’.

So has it all been worth it? Statistics (i.e. 250K transactions in Month 1) suggest the BNB service is certainly being used. While coming to terms with linked data was a steep learning curve, colleagues found the project a highly positive experience. The BL also benefited from being one of the first library sector movers in the area; with the system assisting both visibility and relevance while also suggesting new options for future development. In a wider context, I would argue the results also show: libraries can rise to the challenges of a rapidly changing environment; have valuable resources to offer and an important contribution to make.

Keynote Themes at Semantic Tech & Business, London, 2011


We knew that Semantic Tech & Business in London this week was going to be a great conference with some real business messages, but we couldn’t have predicted how excellent the keynotes were going to be.

Straight from the recent announcement that Volkswagen are using extensive semantics for their product data we have Martin Hepp presenting the way that structured data enhances the web. Martin gave great and essential messages, describing how rich product data is destroyed by the web today. He describes the web of documents (quite rightly) as a data shredder.

Martin Hepp of Hepp Research at Semantic Tech and Business 2011, London

Of his several major points, the other one that hit me between the eyes is how so much effort is spent optimising the experience of a web page once a visitor has landed there, yet the web has evolved (and is evolving) to show users key information without visiting the page. That means we have to invest far more in optimising for the way your data displays before a user arrives. Richard has been blogging about the use of Linked Data and Semantics in SEO and SERP for a little while now and if you want to discuss how to make the data on your site work harder to get visitors to come to you then we’d like to talk :)

Steve Harris of Garlik at Semantic Tech & Business, London, 2011

Steve Harris of Garlik talked about the way they’ve used semantic technologies internally at Garlik. Their customers and partners, on the whole, don’t know that they use technologies like this; they’re just impressed by what Garlik can do with the data. He raised some great points: hiring expertise in this area is hard, so they look for good software engineers and then train them in Linked Data and SPARQL. Their experience, like ours, is that developers who have built systems this way for a few months do not want to go back to SQL.

If you have a team of software engineers, developers, data owners, DBAs and project managers who you want to understand this technology then we have a proven two-day training course that teaches Linked Data from the basics.

Steve’s other key message is that this stuff is ready and possible for companies and it has allowed Garlik to do stuff they couldn’t have done with relational technologies.

John O'Donovan keynote at Semantic Tech & Business, London, 2011

John O’Donovan entertained us with a seemingly endless stream of the most wonderful (badly phrased) headlines. For him these demonstrate the need for comprehensive and well-managed metadata. He talked about the BBC’s World Cup 2010 project which built its site atop a triple store. Talis Consulting have trained many of the developers and information architects at the BBC in semantic technologies.

John mirrors the message from Martin and Steve that this technology is ready, capable of delivering large production systems and has real benefits in terms of power, flexibility and cutting implementation costs.

We’ve been seeing this market mature year on year for some time now and it’s great to see three high profile keynotes all saying the same thing — Semantic Technologies are ready for you to use.

If you want to start using them, come and chat with us :)

Even more Linked Open Library data…

British Library

The British Library has refreshed their recently announced preview of the British National Bibliography as Linked Data. Your feedback to the British Library’s metadata services team has resulted in a few changes, and they have also now included explicit links from authors back to the works they have either contributed to or created. Talis in their role of providing hosting and consultancy have increased the amount of data visible from the main resource view. See this record as an example: http://bnb.data.bl.uk/id/resource/014696887

The dataset now comprises approximately 2.6 million monographs published since 1950. That’s about 80 million triples after de-duplication.

Find out more at: http://bnb.data.bl.uk

Cambridge Open METadata Project

COMET have published their JISC funded work to convert a subset of their bibliographic records to Linked Open Data.  As well as tools and workflow documentation, one of the interesting final outcomes is a blog post which clearly explains the challenges faced by anyone setting out to convert and host data as Linked Data.

Find out more at: http://data.lib.cam.ac.uk/