Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

Blog

Continuing Semantic Evolution

Photo by UggBoy♥UggGirl, Some rights reserved

Hal Hodson of Information Age has picked up on Talis’ history and outlined how it has shaped the Talis we are today.

Talis recognises that easier access to computing power, and to technologies that can deliver insight into larger sets of data, is becoming increasingly important for creating competitive advantage.

Big Data offers big opportunities.

Talis can support your aspirations through every stage of project delivery. By that we mean we don’t do the work for you; we do the work with you.

Talis have worked with the BBC, the Ordnance Survey and others. In fact, we have helped over 30 organisations deliver projects that shorten the time it takes to innovate and get a product to market, whether that is your existing market or a new tranche of potential customers who access your data-driven product through a data marketplace.

Talis have been exploring the information marketplace arena through Kasabi and are now applying that learning to individual organisations that want their own white-label Kasabi as a hub for data, either hosted as a service or deployed internally.

We’ll be revealing more about these projects in the coming weeks, but our digital city demonstrator gives an idea of how a ‘data hub’ could evolve around a single organisation.

If you’d like to keep abreast of our forthcoming announcements, subscribe to our newsletter, or contact us.

Welcome to the Knowledge Graph

Yesterday’s announcement of Google’s new Knowledge Graph feature has already been well covered in a number of different reports. See for example ReadWriteWeb and TechCrunch. If you’ve not yet seen it then the product announcement gives some additional useful background and includes a video demonstration. I thought I’d share some early thoughts of my own on the news.

Firstly, the reporting on what Google are actually doing is a little confused. For example, ReadWriteWeb say that Google “looks at the words of your query and identifies the things in it.” But from reading this bit of background from Google, it’s clear that they’re actually doing something more interesting and potentially more powerful.

There has been lots of previous work on creating semantic search engines that include natural language parsing to “understand” more about a user’s query and its contents. But what Google are doing is looking at the search results to identify the things that are frequently referenced, and then surface useful summaries of those things from their Knowledge Graph. As a user you can refine the results by identifying which thing you were actually interested in.

This seems like a more powerful approach, as it’s driven by the content and data that’s on the web, not just the snippet of text we type into our searches. It’s a much more emergent approach, and it will let them quickly surface information from their Knowledge Graph about whole new classes of things that people are talking about on the web.
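To make the difference concrete, here is a toy sketch of a results-driven approach. It never parses the query. Instead, it counts which known entities are mentioned across a set of result snippets and ranks them by frequency. The entity table, the snippets and the `rank_entities` function are all hypothetical. This illustrates the idea only and is not how Google implements the Knowledge Graph.

```python
from collections import Counter

# Hypothetical entity table mapping surface labels to entity identifiers.
ENTITY_LABELS = {
    "taj mahal": "ent:TajMahal_monument",
    "agra": "ent:Agra_city",
    "shah jahan": "ent:ShahJahan_person",
}


def rank_entities(snippets, labels=ENTITY_LABELS):
    """Rank known entities by how many result snippets mention them."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        # Count each entity at most once per snippet so one page cannot dominate.
        mentioned = {entity for label, entity in labels.items() if label in text}
        counts.update(mentioned)
    return counts.most_common()


# Stand-in result snippets for some query; the query text itself is never parsed.
results = [
    "The Taj Mahal is a mausoleum in Agra.",
    "Shah Jahan commissioned the Taj Mahal.",
    "Visiting the Taj Mahal: opening times and tickets.",
]
print(rank_entities(results))
```

Plain substring matching is deliberately naive here. A real system would need proper entity linking to cope with ambiguous labels.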

And Google have already given us the means to help improve the mapping between the content we publish and the Knowledge Graph: Schema.org.

Schema.org

Schema.org has already been driving improvements in the amount of structured data on the web. Initially the goal seemed to be to drive some smaller refinements to search engine behaviour, e.g. by allowing better rich snippets on search results. But now we can see that there is a bigger vision.

The core schema proposed by Google and others is busily being extended by a community of people who are tailoring it to the needs of their particular domains. There’s also a recent move to provide more linking from schema.org markup to authoritative sources. With increased adoption of embedded metadata, and continual refinements to both the types of things that are described and the available detail about them, we can see that Schema.org is going to help drive improvements to their Knowledge Graph.

Firstly, it’s going to let them feed and grow the Knowledge Graph more directly from their web crawls. They’re already extracting lots of data from pages, but having more structure at source will be much more reliable. Secondly, Schema.org markup is going to help them more readily identify the entities referenced or described in a page and associate that content with entries in their Knowledge Graph.

That will also mean that we can predict a more fundamental change in how sites will be prioritised and ranked in search results: how authoritative is their content about a particular person, place, or thing? Good content will remain a must, but clear identification of the “things, not strings” that it references will be vital.
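As an illustration of how Schema.org markup makes the referenced “things” explicit, here is a minimal sketch that pulls the entity type, name and `sameAs` links out of a page. Schema.org can be embedded as microdata, RDFa or JSON-LD. This sketch assumes JSON-LD purely for brevity, and the sample page is invented. It shows the general idea and says nothing about how any search engine's crawler actually works.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def entities_in_page(html):
    """Return (type, name, sameAs links) for each top-level item in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            same_as = item.get("sameAs", [])
            if isinstance(same_as, str):
                same_as = [same_as]
            found.append((item.get("@type"), item.get("name"), same_as))
    return found


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Tim Berners-Lee",
 "sameAs": "http://dbpedia.org/resource/Tim_Berners-Lee"}
</script></head><body>...</body></html>"""

print(entities_in_page(page))
```

The `sameAs` link is what ties a page's content to an authoritative identifier, rather than leaving it as a string to be guessed at.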

Build Your Own Knowledge Graph

The early reporting on the origins of the Knowledge Graph expresses a lot of interest in the sources of the data. Freebase clearly provides a strong backbone for the effort, but there is a collection of other sources being drawn upon. I wonder how many of the commercial sources that Google have used initially might end up getting supplanted by data drawn directly from web crawls in the future as schema.org markup becomes more ubiquitous?

While Google can clearly still operate at a scale that most organisations can only dream of, the Knowledge Graph is something that is actually within the reach of most organisations already. The approach is a new and exciting addition to their search engine, but the technology and capability aren’t a radical leap forward. You could build a Knowledge Graph of your own.

Google haven’t created a whole new dataset, they’ve collated existing sources. The data already available in the Linked Data cloud — which includes Freebase — is there for anyone to reuse. It is perfectly feasible for an organisation to create its own “knowledge graph” to serve a particular product or domain by selecting from the available sources.

Indeed this is what the BBC has been doing for some time now. As they’ve described in numerous talks and interviews: by drawing on data from the web, and using it as their content management system, they’ve been able to create graphs of data to power their own innovative applications.

These product graphs weave together open data sources with the BBC’s own unique content. The fundamental technologies and the scale of operations may differ, but both Google and the BBC are deriving real value from focusing on “things, not strings”.

Graphs aren’t just cool, they’re a necessary component of innovative data-driven products.
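To show what collating existing sources into your own knowledge graph might involve, here is a small sketch. It merges triples from an open dataset with an organisation's own content. Where the two sources are connected by an `owl:sameAs` link, it collapses both identifiers onto one canonical URI, a technique often called “smushing”. The sample triples and the `ex:` predicate are hypothetical, and a production system would normally use a proper triple store rather than Python sets.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def merge_sources(*sources):
    """Merge (subject, predicate, object) triples from several sources,
    collapsing identifiers linked by owl:sameAs onto one canonical URI."""
    triples = [t for source in sources for t in source]
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            a, b = sorted((find(s), find(o)))
            if a != b:
                parent[b] = a  # the smallest URI in each group becomes canonical

    graph = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            graph[find(s)].add((p, find(o)))
    return graph


# Hypothetical sources: an open dataset and an organisation's own content.
open_data = [
    ("http://dbpedia.org/resource/Brighton", "rdfs:label", "Brighton"),
]
own_content = [
    ("http://example.org/place/brighton", OWL_SAME_AS, "http://dbpedia.org/resource/Brighton"),
    ("http://example.org/event/odc", "ex:heldIn", "http://example.org/place/brighton"),
    ("http://example.org/event/odc", "rdfs:label", "Open Data Cities conference"),
]
graph = merge_sources(open_data, own_content)
for predicate, value in sorted(graph["http://example.org/event/odc"]):
    print(predicate, value)
```

After merging, the organisation's event points straight at the open-data description of the place, so anything else known about that place becomes reachable from the event.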

Shared Vision

It would be easy to get distracted by comparing what Google have done with the details of the Semantic Web and Linked Data vision, but this would be a mistake. Personally I see this as a massive validation of the overall approach: Google have built a large graph database populated with a rich domain model describing things in the world. To quote from Amit Singhal’s announcement: “It’s the intelligence between these different entities that’s the key”.

The Knowledge Graph, and the product it’s built on, is clearly going to improve as more structured data is added to the web; data that will also be available for anyone else to crawl, mine and process. The basis for network effects is already in place. It’s irrelevant whether the Knowledge Graph itself is published as Open Data or via an API, although that may happen in the future. The underlying technologies themselves are an implementation detail.

What’s important is that the approach, particularly the reliance on the web as a data source and exploiting the value of relationships between things, is Semantic Web through and through.

If you’re interested in exploring these ideas further, especially how your organisation could contribute to or build its own “knowledge graph” then get in touch.

Tell us about your Open Data Experiences!

Open Data has become an important issue on the agenda of many organisations and companies. There are many reasons why you might decide to make your data available. On the one hand, there is legislation that requires public sector organisations in particular to make their data available (such as the Freedom of Information Act in both the UK and US). On the other hand, many data owners are starting to see a benefit in sharing their data with the wider world, even without a direct legal requirement. Their reasons range from wanting to provide better services to customers or citizens, through improvements in SEO, to the expectation that opening their data will lead to cross-fertilisation within their industry (or even just within their own organisation), with an eventual benefit for all.

Open Data Problems


If you or your organisation have any experience in providing Open Data (or if you’re thinking about it), then you will have come across the 5-Star scheme for Open Data (for the original 5-star data proposal see here, a nice write-up is here): the more stars, the more useful and connected the data is. Going from 1-star data (just put it online) to 3-star data (use a non-proprietary format) is relatively simple and straightforward. However, when it comes to 4- and 5-star data, things can quickly become a bit more complex.

If you’re new to it, the world of Linked Data and URIs can seem daunting and difficult to understand. Even with some experience, there are often issues in figuring out and mastering the right approach. You have to decide how to model your data, how to structure your URIs, which other data to link to and how, and which vocabularies to use (or whether you need to develop your own). Other issues are more technical in nature. Which hosting platform should you choose? Which software should you use for modelling, conversion and data maintenance? Should you set up your own infrastructure or use an external service?
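As a rough illustration of the step from 3 to 5 stars, the sketch below takes rows from a CSV file and does two things. It gives each row its own URI (the 4-star step, “use URIs to denote things”). Where the row names an external resource, it links to that resource (the 5-star step, “link your data to other data”). The output is N-Triples. The URI base, the `locatedIn` predicate and the sample rows are hypothetical placeholders. The sketch also assumes the `id` column already contains URI-safe values.

```python
import csv
import io

# Hypothetical URI base and predicate; substitute your own once you have chosen them.
BASE = "http://data.example.org/id/carpark/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCATED_IN = "http://data.example.org/def/locatedIn"


def literal(text):
    """Quote a string as an N-Triples literal, escaping the special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(csv_text):
    """Give each CSV row its own URI (4 stars) and link it to external data (5 stars)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        lines.append(f"{subject} <{LABEL}> {literal(row['name'])} .")
        if row.get("town_uri"):  # only link when the row names an external resource
            lines.append(f"{subject} <{LOCATED_IN}> <{row['town_uri']}> .")
    return "\n".join(lines)


csv_text = """id,name,town_uri
cp1,Town Centre Car Park,http://dbpedia.org/resource/Brighton
cp2,Station Car Park,
"""
print(to_ntriples(csv_text))
```

Most of the real work lies in the decisions this sketch hard-codes: the URI pattern, the vocabulary and which external datasets to link to.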

What are your Experiences?

If you or your organisation have encountered any of these or any other problems in the process of publishing data, we’re interested in hearing from you! We would love to learn about your data publishing experiences (both the good and the bad), and the reasons you embarked on doing this. Also, Talis are currently offering a half-day review of published data. If you would like me (or one of my colleagues) to review the data you have published, feel free to contact me at knud.moeller@talis.com.


Labyrinth picture by rosmary on Flickr, licensed under CC BY 2.0.

Data Foundations for Digital Cities (Video)

The Open Data Cities conference in Brighton was well attended by around 150 people interested in how cities can grow their use of open data. The thought-provoking speakers touched on subjects ranging from why cities should invest in making their data open to how they can make that achievable.

Talis unveiled a software “thought piece” that encourages cities to engage with communities to explore which data would be of most interest.

Leigh Dodds’ presentation is now available to watch:

Open data cities demonstrator

Photo credit: Tim Hodson

At the Open Data Cities Conference in Brighton, Talis unveiled their demonstrator app (link below), which shows how a city might begin to engage with its citizens and promote digital economy innovation.

The demo is designed to highlight the ways in which a city and its citizens might be brought together in an information marketplace. It is also meant to trigger questions about how cities might use an interactive information marketplace to measure social impact. It is the software equivalent of a thought piece, allowing us to talk about the things that might change the way people in your city think about engaging with each other in social enterprise.

Talis have been exploring ways in which a data marketplace might add value to individual datasets, and have built Kasabi, which allows anyone to publish their data easily and then harness the power of multiple data access channels.

Key demonstrator themes:

  • citizens can request data about their local area
  • citizens can use data, from the city and local businesses, to build apps
  • the city might fund the building of apps that are in demand
  • citizens can share apps they have built
  • businesses can use the marketplace to publish the data that will power other applications
  • cities can easily publish data about anything
  • citizens can add data to existing datasets
  • developers have several tools for accessing indexed and structured data
  • all data added to the site is indexed as it arrives and becomes available to applications within a very short time
  • the information marketplace is a data hub providing a revenue share opportunity

Behind the lightweight demonstrator sits a technology stack that provides data hosting and integration. The simple datasets used as examples in the demo can be explored by both developers who understand working with data, and citizens with no programming background.

I could throw the names of some technologies at you, such as graph databases, geo indexes, full-text indexes and application programming interfaces using a variety of protocols. But we think it is the self-service nature of Kasabi, combined with the interactive and social aspects of our demonstrator, that will make the difference to your city.

As a city, you probably know your citizens quite well, but I am sure there are ways that they can surprise you. Maybe it is a loosely organised not-for-profit company that sets itself the mission of providing the best quality data about where to park in your city. Maybe they take data that you provide about where the parking spaces are and how often they are used, and combine it with a calendar of city-wide events sourced from several other data providers. Maybe they build an indispensable app that helps people to choose the best place to park in the city. Maybe it even integrates with an existing drive-sharing scheme to provide parking booking services for commuters and tourists alike.

An idea like that is only possible if the people wanting to build a data driven application have easy access to data.

Of course there is no reason why that access should be free. A car parking app might charge a small fee for the provision of the service, and that fee might be shared with the data providers and the city playing host to the data in a marketplace. Everyone gets to have a share in the success of the idea.

For cities with a perceived poor parking experience, an app like this might improve the image of the city and reposition it as an easy place to find parking. It might even change people’s parking behaviour for the better, a social impact that becomes easier to measure.
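The parking idea is essentially a join between two datasets: the city’s car parks and their usual occupancy, and an events calendar from another provider. Here is a toy sketch of that join. The field names, the sample car parks and the assumption that a nearby event fills an extra 25% of spaces are all invented for illustration. A real app would estimate that effect from actual occupancy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarPark:
    name: str
    spaces: int
    typical_occupancy: float  # fraction of spaces usually taken (0.0 to 1.0)
    nearby_venues: frozenset  # venues within walking distance


def expected_free_spaces(car_park, venues_with_events, event_uplift=0.25):
    """Estimate free spaces, assuming an event nearby fills an extra share of spaces."""
    occupancy = car_park.typical_occupancy
    if car_park.nearby_venues & venues_with_events:
        occupancy = min(1.0, occupancy + event_uplift)
    return round(car_park.spaces * (1.0 - occupancy))


def rank_car_parks(car_parks, venues_with_events):
    """Order car parks by estimated free spaces, most first."""
    return sorted(car_parks,
                  key=lambda cp: expected_free_spaces(cp, venues_with_events),
                  reverse=True)


# Invented sample data: one dataset from the city, one from an events calendar.
car_parks = [
    CarPark("North Street", 400, 0.50, frozenset({"stadium"})),
    CarPark("South Road", 300, 0.40, frozenset({"theatre"})),
]
events_today = {"stadium"}
for cp in rank_car_parks(car_parks, events_today):
    print(cp.name, expected_free_spaces(cp, events_today))
```

Neither dataset alone would tell a driver to avoid North Street on a match day. The value comes from combining them.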

At Talis, we are keen to work with you to explore how your city’s data and your citizens’ data might be brought together in a marketplace that allows new businesses to start and thrive.

See the demo >>

Talk to us

App Fund

Talis has been working with data for many years, and helping others make the most of their information. An area that we’ve been focusing on over the past few months is applications which make rich use of data.

We found three applications which excited us, and offered them financial backing to get off the ground. For the last few months, the teams have been building up and testing out their applications, and we are watching as they get them going!

I’ll post an original roundup of the app projects below, and there will be follow-up posts here and on the Kasabi blog.

If you work on a similar project, or would like to share your ideas with us, please get in touch!

Exploring Botanical Gardens from a Smart Phone

StrongSteam: Ian Ozsvald and Kyran Dale

The StrongSteam team are working on an iPhone app that opens up new levels of exploration for visitors to botanical gardens. The app will let people access tons of information about the plants they find by taking a photo of the label. The app uses advanced character recognition to read the Latin name from descriptive labels, and pulls in data from a variety of sources to tell the user far more about the plant than could be available on signs.

They’re using the StrongSteam datamining API to match plant labels and IDs, then using datasets in Kasabi (GeoSpecies, DBpedia and BBC Wildlife, for example) to extract detailed information about plant species. The user is then given facts, figures and other pieces of information, letting them learn far more about the plants they find interesting.
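One practical wrinkle in reading labels is that character recognition is rarely perfect, so the recognised text has to be matched against the names actually present in the species data. Here is a sketch of that matching step using fuzzy string matching from Python’s standard `difflib` module. This is a stand-in technique and not the StrongSteam API, and the species list is a hypothetical sample.

```python
import difflib
import re

# Hypothetical list of accepted names, e.g. loaded from a species dataset.
KNOWN_SPECIES = ["Quercus robur", "Fagus sylvatica", "Betula pendula", "Acer campestre"]


def match_label(ocr_text, species=KNOWN_SPECIES, cutoff=0.8):
    """Return the known species name closest to the OCR output, or None."""
    # Replace anything that is not a letter with a space, then collapse runs of spaces.
    cleaned = " ".join(re.sub(r"[^A-Za-z]+", " ", ocr_text).split()).lower()
    lookup = {name.lower(): name for name in species}
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(match_label("FAGUS SYLVAT1CA"))  # OCR misread "i" as "1"
```

The `cutoff` trades recall for precision. Set it too low and text that isn't a species name will still come back with a match.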

FixMyStreet

My Society: Paul Lenz and Myfanwy Nixon

Through the popular app, FixMyStreet, My Society has been giving people the ability to report damaged infrastructure to their local authorities for a few years. Using smartphones, people have been highlighting things like potholes and broken streetlights across the UK since 2008. The app is now going through a complete overhaul, upgrading to a more sophisticated, HTML5-based service. The new FixMyStreet is a more powerful, responsively designed mobile-web version of the older native apps, and uses Kasabi to store a continuously-updated list of new problem reports. The new dataset includes information about councils, kinds of damage, timestamp and status of repairs along with detailed lat/long locations.
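To give a feel for how an app might use a dataset like this, here is a sketch of a problem-report record with the fields described above. It includes a query for open reports inside a latitude/longitude bounding box, newest first. The class, field names, status values and sample reports are hypothetical and do not reflect FixMyStreet’s or Kasabi’s actual schema. The simple box test also ignores edge cases such as boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProblemReport:
    council: str
    category: str          # kind of damage, e.g. "Pothole"
    reported_at: datetime
    status: str            # e.g. "open" or "fixed" (hypothetical values)
    lat: float
    lon: float


def open_reports_in_box(reports, south, west, north, east):
    """Open reports inside a lat/long bounding box, newest first."""
    hits = [r for r in reports
            if r.status == "open"
            and south <= r.lat <= north
            and west <= r.lon <= east]
    return sorted(hits, key=lambda r: r.reported_at, reverse=True)


# Invented sample reports.
reports = [
    ProblemReport("Brighton & Hove", "Pothole", datetime(2012, 5, 1, 9, 30), "open", 50.8225, -0.1372),
    ProblemReport("Brighton & Hove", "Street light", datetime(2012, 5, 3, 18, 0), "open", 50.8290, -0.1410),
    ProblemReport("Brighton & Hove", "Pothole", datetime(2012, 4, 20, 8, 0), "fixed", 50.8230, -0.1380),
]
for r in open_reports_in_box(reports, 50.80, -0.16, 50.85, -0.12):
    print(r.reported_at, r.category)
```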

John Peel Time Machine

Storm: Dave Kelly, Mike Ellis, and Paul Leader

Developers from Storm are putting together a time machine travelling back through some of the greatest musical events of the 20th century under the watchful eyes of the legendary BBC radio DJ, the late John Peel. Building on the dataset of John Peel Sessions, the web app will guide users’ journeys on their search for artists who appeared on the live recordings of John Peel’s long-running show.

The Time Machine will work on a timeline, giving a high-level view of the Peel sessions by year, and highlight some of the relationships amongst musical artists. Where it can, it will link to recordings of the live sessions, and provide biographical information about the artists. The time machine will also provide information about the albums and tracks featured, and point users towards playlists of sessions, which they can purchase or listen to via the likes of iTunes or Spotify.
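The year-by-year view at the heart of the timeline is essentially a grouping step over the session records. Here is a minimal sketch of it. The `(artist, recorded_on)` record shape and the placeholder artists are assumptions for illustration, not Storm’s code or the real Peel Sessions dataset.

```python
from collections import defaultdict
from datetime import date


def timeline(sessions):
    """Group (artist, recorded_on) session records by year, oldest year first."""
    by_year = defaultdict(list)
    for artist, recorded_on in sessions:
        by_year[recorded_on.year].append(artist)
    # Build the result in year order so iteration follows the timeline.
    return {year: sorted(artists) for year, artists in sorted(by_year.items())}


# Placeholder records, not real session data.
sessions = [
    ("Artist B", date(1979, 3, 2)),
    ("Artist A", date(1977, 6, 1)),
    ("Artist C", date(1979, 11, 20)),
]
for year, artists in timeline(sessions).items():
    print(year, ", ".join(artists))
```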

GCloud

Following on from work with UK governmental agencies (such as the BIS Research Funding Explorer and Ordnance Survey), Talis has joined the UK Government’s GCloud supplier community. We have been awarded an agreement within the framework to provide the UK government with Software as a Service products and services.

This means we’re part of the network of suppliers created to make finding cloud-based services for public-sector work a lot easier. Francis Maude, the Minister for the Cabinet Office, summed up the setup:

“Simply stated, purchasing services from CloudStore will be quicker, easier, cheaper and more transparent for the public sector and suppliers alike.”

GCloud is a list of selected suppliers. It has been built to work like a shop front (a “CloudStore”) where government bodies can search for solutions to problems, or for ideas that enhance their public service. Suppliers in the GCloud store still work through transparent tendering, but the processes have been sped up to make it quicker to find a provider. It’s also aimed at helping governmental bodies get the best out of small and medium-sized businesses (like Talis).

For Talis, this means it is now easier for us to work on exciting data projects with public-sector datasets for local and national government. It should also be quicker for any projects that would benefit from us hosting data to get off the ground, and into the cloud (sorry, had to). So, for your public-sector project, it is now simpler to work with us: email Alison to learn more.

If you are curious, you can read more about GCloud on the Civil Service site.

A letter from the Middle Ages

Well, actually, not just one, but over a thousand letters from the Middle Ages.

Last weekend, the National Archives held a Hackathon in the reading room at Kew. Around 40 developers and other interested people took data from the National Archives and played with it. There were new mobile interfaces for the NRA discovery API; collections of tweets mined for the data and PDFs they contained; and stats on historical participation in the Olympics, pulled from the archives and shown on interactive maps. In all it was a fun weekend, with lots of smart people in the room and very quiet but rapid typing on keyboards to get something finished by the 4pm Sunday deadline.

Prizes were:

  • 1st – Jonathan Tweed and Kai En Ong (ably assisted by Michael Smethurst, Faith Mowbray and Paul Rissen). A hack that pulls out data surrounding people & places in documents tweeted by @ukwarcabinet (and which – for a hack – is beautifully presented!).
  • 2nd – Jamie Mahoney - Debtors & creditors dataset hack maps the most popular lenders & shows who’s borrowing from where – Show me the money.
  • 2nd – Tim Hodson – A hack showing who wrote to whom in the middle ages.
  • 3rd – Crystal and Steven Hirschorn – A hack showing participation in the Olympics on an interactive map.

You can read more about these entries on the National Archives blog.

I hope you’ll forgive me for showing off my joint-second-prize-winning contribution to the pizza- and jelly-baby-fuelled hack fest.

I took a suggestion from Paul Rissen as a personal challenge, and started pulling the data that I wanted into a new CSV file. I then converted that CSV file to a rudimentary RDF-based model of the letters and people that the data described. I now had a graph dataset which captured, in the way only a graph can, the network of relationships between the people who were corresponding. It was then a case of finding a suitable JavaScript library to render my graph visually and to allow people to find out who wrote to whom without cluttering up the graph diagram.
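For anyone curious about the shape of that pipeline, here is a sketch of the CSV-to-graph step. It turns rows of letters into a set of RDF-style triples, then walks from each letter node to its sender and recipient to derive a weighted “who wrote to whom” network, ready to hand to a visualisation library. The column names, namespace and properties are hypothetical and are not the model used in the actual hack. The sketch also assumes each letter has exactly one recipient.

```python
import csv
import io
from collections import Counter

# Hypothetical namespace and property names, not the model used in the actual hack.
EX = "http://example.org/letters/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def slug(name):
    """Make a URI-safe local name from a person's name or a letter id."""
    return "".join(ch if ch.isalnum() else "_" for ch in name.strip())


def letters_to_graph(csv_text):
    """Turn CSV rows of (letter_id, sender, recipient) into a set of triples."""
    graph = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        letter = EX + "letter/" + slug(row["letter_id"])
        sender = EX + "person/" + slug(row["sender"])
        recipient = EX + "person/" + slug(row["recipient"])
        graph.update({
            (letter, EX + "sender", sender),
            (letter, EX + "recipient", recipient),
            (sender, RDFS_LABEL, row["sender"].strip()),
            (recipient, RDFS_LABEL, row["recipient"].strip()),
        })
    return graph


def who_wrote_to_whom(graph):
    """Count sender -> recipient pairs by following each letter's two links."""
    senders = {s: o for s, p, o in graph if p == EX + "sender"}
    recipients = {s: o for s, p, o in graph if p == EX + "recipient"}
    return Counter((senders[letter], recipients[letter])
                   for letter in senders if letter in recipients)


# Invented sample rows.
sample = """letter_id,sender,recipient
L1,Agnes,Walter
L2,Agnes,Walter
L3,Walter,Agnes
"""
graph = letters_to_graph(sample)
for (sender, recipient), n in who_wrote_to_whom(graph).items():
    print(sender.rsplit("/", 1)[-1], "->", recipient.rsplit("/", 1)[-1], n)
```

Storing the triples as a set mirrors the fact that an RDF graph is a set, so repeated people collapse into a single node automatically.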

A guide to achievable data publishing

Opening your data sounds like a big, scary sort of project that you wouldn’t want to land in your lap. It sounds like it ought to open up a minefield of legal, technical and practical issues that may be too big to tackle.

Our recent webinars sought to dispel any such myths and to provide you with a project outline that would work for your teams. We know it works because this is how we run our own projects to help organisations manage the transition to publishing their data in a new way.

Talis have the tools and experience to help you get up and running in months rather than years. Now you can watch this recording of the webinar to find out more.

Making open data achievable

Government organisations have a remit to publish some of their core data as open data. This remit sometimes seems too difficult to achieve.

Tim Berners-Lee took a pragmatic approach by breaking the problem into bite-size chunks with a built-in mark of quality. The famous 5 stars are not a new thing: they have been around since 2010, and we have asked whether your data is 5-star before.

The 5 stars of open data publishing are clear, simple steps that you can take to get your data published openly. Talis have helped the likes of the Ordnance Survey, the Office for National Statistics, the British Library, data.gov.uk and the Department for Business, Innovation and Skills to publish their data openly.

But how did they do it?

Our experience, gained by working with the likes of the Ordnance Survey, has shown us that providing a hosted platform for publishing data means that organisations can concentrate on data quality and utility without having to find funding for infrastructure and maintenance of specialist hardware. As John Goodwin, Research Scientist at the Ordnance Survey said:

“We decided to let Talis take care of the hosting and serving, so all we had to do was worry about making the data available.”

By pushing the infrastructure costs outside the organisation, data publishers can get on with making their local data link to a wider network of global data and gain their 5 star rating.

If you need help making open data achievable in your organisation, talk to us.