Talis Consultancy
World leading expertise in Linked Data and the Semantic Web

Category: Data Publishing

Tell us about your Open Data Experiences!

Open Data has become an important issue on the agenda of many organisations and companies. There are many reasons why you might decide to make your data available. On the one hand, there is legislation that requires public sector organisations in particular to make their data available (such as the Freedom of Information Act in both the UK and US). On the other hand, many owners of data are starting to see a potential benefit in sharing their data with the wider world, even without a direct legal requirement. Such reasons can range from wanting to provide better services to your customers or citizens, through improvements in SEO, to the expectation that opening your data will lead to cross-fertilisation within your industry (or even just within your own organisation), with an eventual benefit for all.

Open Data Problems

Labyrinth

If you or your organisation have any experience in providing Open Data (or if you’re thinking about it), then you will have come across the 5-Star scheme for Open Data (for the original 5-star data proposal see here, a nice write-up is here): the more stars, the more useful and connected the data is. Publishing 1-star data (just put it online) to 3-star data (use a non-proprietary format) is relatively simple and straightforward. However, when it comes to 4- and 5-star data, things can quickly become a bit more complex. If you’re new to it, the world of Linked Data and URIs can seem daunting and difficult to understand. Even with some experience, there can often be issues in figuring out and mastering the right approach, such as deciding how to model your data, how to structure your URIs, which other data to link to and how, and which vocabularies to use (or whether you need to develop your own). Other issues that can arise are more technical in nature, such as deciding which hosting platform to choose, which software to use for modelling, conversion and data maintenance, and whether to set up your own infrastructure or use an external service.

What are your Experiences?

If you or your organisation have encountered any of these or any other problems in the process of publishing data, we’re interested in hearing from you! We would love to learn about your data publishing experiences (both good and bad), and the reasons you embarked on doing this. Also, Talis are currently offering a half-day review of data that has been published. If you would like me (or one of my colleagues) to review the data you have published, feel free to contact me at knud.moeller@talis.com.


Labyrinth picture by rosmary on Flickr, licensed under CC BY 2.0.

Data Foundations for Digital Cities (Video)

The Open Data Cities conference in Brighton was well attended by around 150 people interested in how cities can grow their use of open data. The thought-provoking speakers touched on subjects ranging from why cities should invest in making their data open, to how they can make that achievable.

Talis unveiled their software thought piece encouraging cities to engage with communities to explore which data would be of most interest.

Leigh Dodds’ presentation is now available to watch:

Making open data achievable

Government organisations have a remit for publishing some of their core data as open data.  This remit sometimes seems too difficult to achieve.

Tim Berners-Lee took a pragmatic approach by simplifying the problem into bite-sized chunks with an inbuilt mark of quality. The famous 5 Stars are not a new thing: they have been around since 2010, and we have asked if your data is 5 star before.

The 5 stars of open data publishing are clear, simple steps that you can take to get your data published openly. Talis have helped the likes of the Ordnance Survey, the Office for National Statistics, the British Library, data.gov.uk and the Department for Business, Innovation and Skills to publish their data openly.

But how did they do it?

Our experience, gained by working with the likes of the Ordnance Survey, has shown us that providing a hosted platform for publishing data means that organisations can concentrate on data quality and utility without having to find funding for infrastructure and maintenance of specialist hardware. As John Goodwin, Research Scientist at the Ordnance Survey, said:

“We decided to let Talis take care of the hosting and serving, so all we had to do was worry about making the data available.”

By pushing the infrastructure costs outside the organisation, data publishers can get on with making their local data link to a wider network of global data and gain their 5 star rating.

If you need help making open data achievable in your organisation, talk to us.

Library of Congress To Boldly Voyage To Linked Data Worlds

The Library of Congress made an announcement earlier this week that has left some usually vocal library pundits speechless.

Roy Tennant (rtennant) on Twitter

“MARC is Dead!” “RDA made irrelevant!” These are the cries that can be heard rattling around the bibliographic blogo-twittersphere. My opinion is that this is an inevitable move based upon serious consideration, and that it builds on several initiatives that have been brewing for many months.

Bold though – very bold.  I am sure that there are many in the library community, who have invested much of their careers in MARC and its slightly more hip cousin RDA, who are now suffering from vertigo as they feel the floor being pulled from beneath their feet.

The Working Group of the Future of Bibliographic Control, as it examined technology for the future, wrote that the Library community’s data carrier, MARC, is “based on forty-year-old techniques for data management and is out of step with programming styles of today.”

Many of the libraries taking part in the test [of RDA] indicated that they had little confidence RDA changes would yield significant benefits…

 

And on a more positive note:

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data….
….The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model.

There is still a bit of confusion there between a data carrier and a framework for describing resources.  Linked Data is about linking descriptions of things, not necessarily transporting silos of data from place to place.  But maybe I quibble a little too much at this early stage.

So now what:

The Library of Congress will be developing a grant application over the next few months to support this initiative.  The two-year grant will provide funding for the Library of Congress to organize consultative groups (national and international) and to support development and prototyping activities.  Some of the supported activities will be those described above:  developing models and scenarios for interaction within the information community, assembling and reviewing ontologies currently used or under development, developing domain ontologies for the description of resources and related data in scope, organizing prototypes and reference implementations.

I know that this is the way that LoC and the library community do things, but I do hope that this doesn’t mean that they will disappear into an insular huddle for a couple of years, only to re-emerge with something that is almost right yet missing some of the evolution that has gone on around them over that period.

As with other recent announcements, such as the vote to openly share European Libraries’ data, the report from the W3C’s Library Linked Data Incubator Group, and now the report from the Stanford Linked Data Workshop, I welcome these developments. However, I warn those involved that these are great opportunities [to enable the valuable resources catalogued and curated by libraries over decades to become foundational assets of the future web] that can be easily squandered by not applying the open thinking that characterises successes in the web of data.

One very relevant example of the success of applying open thinking to the bibliographic world using Linked Data is the open publishing of the British National Bibliography (BnB). Readers of this blog will know that we at Talis have worked closely with the team at the BL in their groundbreaking work. The data model they produced is an example of one of those things that may induce that feeling of vertigo that I mentioned. It doesn’t look much like a MARC record! I can assure the sceptical that although it may be very different to what you are used to, it is easy to get your head around. (Drop us a line if you want some guidance.)

As we host the BnB Linked Data for the BL, I can testify to the success of this work, which only launched in mid July. Its use is growing rapidly, with just short of 2 million hits in the last month alone.

With the British Library, along with the National Libraries of Canada and Germany, being quoted as partners with the LoC in this initiative, plus their work being referenced as an exemplar in the other reports I mention, I hold out a great hope that things are headed in the right direction.

As comments to some of my previous posts attest, there is concern from some in the community of domain experts that this RDF stuff is too simple and lightweight and will not enable them to capture the rich detail that they need. They are missing a few points. Firstly, it is this simplicity that will help non-domain experts to understand, reference and link to their rich resources. Secondly, RDF is more than capable of describing the rich detail that they require, using several emerging ontologies including the RDA ontology, FRBR, etc. Finally, and most importantly, it is not a binary choice between widely comprehended simplicity and domain-specific detailed description. The RDF for a resource can, and probably should, contain both.

So Library of Congress, I welcome your announcement and offer a friendly reminder that you not only need to draw expertise from the forward thinking library community, but also from the wider Linked Data world.  I am sure your partners from the British Library will reinforce this message.

W3C Library Linked Data Final Report Published

The W3C Library Linked Data Incubator Group has published its Final Report after a year of deliberation.

The mission of the Library Linked Data Incubator Group was to help
increase the global interoperability of library data on the Web by
focusing on the potential role of Linked Data technologies.

This report contains several messages that are not just interesting and relevant for the Linked Data enthusiast in the library community. It contains some home truths for those in libraries who think that a slight tweak to the status quo, such as adopting RDA, will be sufficient to keep libraries [data] relevant in the rapidly evolving world of the web.

On the NGC4LIB mailing list, Eric Lease Morgan picked out some useful quotes from the report:

  • Linked Data is not about creating a different Web, but rather about enhancing the Web through the addition of structured data.
  • By promoting a bottom-up approach to publishing data, Linked Data creates an opportunity for libraries to improve the value proposition of describing their assets.
  • Linked Data may be a first step toward a “cloud-based” approach to managing cultural information, which could be more cost-effective than stand-alone systems in institutions.
  • With Linked Open Data, libraries can increase their presence on the Web, where most information seekers can be found.
  • The use of the Web and Web-based identifiers will make up-to-date resource descriptions directly citable by catalogers.
  • History shows that all technologies are transitory, and the history of information technology suggests that specific data formats are especially short-lived.
  • Library developers and vendors will directly benefit from not being tied to library-specific data formats.
  • Most information in library data is encoded as display-oriented, natural-language text.
  • Work on library Linked Data can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community.
  • Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data.
  • A major advantage of Linked Data technology is realized with the establishment of connections between and across datasets.
  • Libraries should embrace the web of information, both by making their data available for use as Linked Data and by using the web of data in library services. Ideally, library data should integrate fully with other resources on the Web, creating greater visibility for libraries and bringing library services to information seekers.

Also, from the report:

Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data — information which could be put to effective use in environments where data is integrated seamlessly across contexts. Pioneering initiatives such as the release of the British National Bibliography reveal the effort required to address challenges such as licensing, data modeling, the handling of legacy data, and collaboration with multiple user communities. However, these also demonstrate the considerable benefits of releasing bibliographic databases as Linked Data. As the community’s experience increases, the number of datasets released as Linked Data is growing rapidly.

Talis Consulting has been closely and actively involved in the modelling, data transformation, publishing, and hosting of the British National Bibliography (BnB) as Linked Data. A great overview of the approach taken to modelling bibliographic data in a way that makes it easily compatible with the wider Web of Data is provided by Tim Hodson in his post, British Library Data Model: Overview. As can be seen from their work, the modelling used for the BnB differs from the approach taken by many attempting to publish bibliographic data as Linked Data: it describes the resources (the books, authors, publishers, etc.) as people, places, events, and things, as opposed to attempting to represent the records that libraries keep about their stock of resources.

With intentions to release open library data specifically mentioning Linked Data, the sentiments from this report are already influencing the wider forward thinking library community.  I will leave the last word to the report’s final paragraph which some, in the traditional record-based cataloguing community, may have difficulty in getting their head around.  I encourage them to look at libraries from the point of view of the wider [non-library] web consumers, and read it again.

One final caveat: data consumers should bear in mind that, in contrast to traditional, closed IT systems, Linked Data follows an open-world assumption: the assumption that data cannot generally be assumed to be complete and that, in principle, more data may become available for any given entity. We hope that more “data linking” will happen in the library domain in line with the projects mentioned here.
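To make the open-world assumption in that caveat concrete, here is a minimal Python sketch. The triples, the `ex:` identifiers and both helper functions are illustrative assumptions, not taken from the report or from any real dataset. The point is only that a closed system answers "no" when a statement is missing, whereas a Linked Data consumer can at best answer "unknown".

```python
# Hypothetical triples; the ex: identifiers are made up for illustration.
KNOWN_TRIPLES = {
    ("ex:book1", "dct:creator", "ex:authorA"),
    ("ex:book1", "dct:title", "An Example Title"),
}


def ask_closed_world(triple, triples=KNOWN_TRIPLES):
    """Closed-world reading: anything not recorded is treated as false."""
    return triple in triples


def ask_open_world(triple, triples=KNOWN_TRIPLES):
    """Open-world reading: a missing statement is unknown, not false."""
    return True if triple in triples else None  # None stands for "unknown"


question = ("ex:book1", "dct:creator", "ex:authorB")
print(ask_closed_world(question))  # closed world: the statement is taken as false
print(ask_open_world(question))    # open world: more data may still turn up
```

Under the open-world reading, the absence of a second author in today's data says nothing about whether another dataset will later supply one.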

The Tyranny of Time

A guest post by Lawrence Serewicz, Principal Information Management Officer, Durham County Council

I came across the following reference to time within the retail sector and it made me consider how my world of local government, or any business for that matter, thinks about time.

An old saying in the retail industry is that: ‘If information is available monthly, then decisions taken will take 6 months to have an effect. If it is available weekly, then decisions take a month to influence outcomes; if daily, it takes a week; and if hourly, the decisions can have an impact the next day’ (p.13)

(Source : Valuing Information as an Asset http://www.sas.com/reg/gen/uk/valuing-information )

How often do we collect data? In many organisations, there are quarterly returns, but is that enough for today’s services? In some cases, councils collect real-time data, but are their reporting systems ready for it? For example, management or cabinet committee meetings may be held once a month, but is that enough to have a strategic view of what is happening within an organisation?

At one level, the timeframe for Council Members is different because their work is strategic: they are trying to shape the organisation’s future and where it will be over the long term, not determining whether the recent refuse collection achieved 99% or 98% effectiveness. Even if we discount members’ need for real-time data (at least at a strategic level) and focus on the manager’s or officer’s role, we still see the tyranny of time.

How often do we see, use, or for that matter analyse, real-time data? Do our performance management systems display a disconnect between the timescale on which data is collected and the timescale on which it is reported? We may have refuse bin collection rates measured every day, but if our performance reporting within the organisation is quarterly, how well does that serve the organisation? At the same time, is that performance information available to the services, such as customer service desks?

In this example, if the real-time performance is being reported to the customer service desk, they can see that the bin collection rate on a snowy day (for example) is lagging in some areas, but is still robust in others. Thus, a call from an area with good collection (say 99%) is going to be a different issue from a missed collection in an area with an 80% collection rate because of the snow conditions. Yet how many performance management systems or performance information systems are designed to capture and analyse real-time data? Even weekly data can be considered real-time data depending on the service, which brings us back to the point made at the start. If the information is only available quarterly, what is the impact rate? If you collect each quarter, is the final impact seen yearly or in two years? If that is the timescale, is it going to be effective?

What does this have to do with open data? If data is being collected and made available to customers and the public, are they getting real-time data or is there an organisation-influenced lag effect on the data? One of the main themes within the UK government’s open data consultation (http://data.gov.uk/opendataconsultation), as well as its overall transparency agenda, is to open service performance information to the public.

Service performance information will inform the public’s choice of services, and will also help them hold the organisation to account. Yet if there is a lag between when service information is collected and when it is published, can the public hold a local authority to account effectively? How much information is released, and when, can have a large influence on whether an organisation is accountable. If it only has to report once a year, how much accountability can be achieved? If a change in performance is required, how will it be demonstrated in such a long reporting cycle?

If, however, real time data is released, will that have a destabilizing effect on the political process? If the political process is relying on quarterly performance reporting and the public are getting the information in real time, how will elected members be able to respond? Moreover, if members, as residents, are consuming the information in real time as well, what is the role of a quarterly performance reporting system?  To be sure there will be different reports for different issues, but the underlying question is how to make open data respond to real time demand.  Do I need to know the car park was full last week if I am trying to get parked now?

The issue of time is also about how and where information is released. If an organisation releases its performance statistics in a paper report, and not as a spreadsheet, can external scrutiny be achieved?  In that sense, the format for publication will show the timescales. Such reporting has an immediate and direct effect on the ability of the public, and members, to hold the organisation to account.

At the same time, there is the question of whether real-time reporting fits your strategy. If one company is working on day-to-day reporting and another is pursuing a ten-year growth strategy, they will have different understandings of time. Moreover, their reporting mechanisms will be different. Yet can the ten-year plan work without taking care of the day to day? In that sense, can anyone escape the tyranny of time? The more your competitors harness real-time data, the more you will need to adapt or adopt.

From an accountability perspective, the issue may simply be finding a way to reconcile monthly or quarterly performance reporting with real-time data.

What effect will this have on the way we operate in the public and private sectors? Only time will tell.

Schema.org Déjà vu

The Web has been around for getting on for a couple of decades now, and massive industries have grown up around the magic of making it work for you and your organisation. Some of it, it has to be said, can be considered snake oil. Much of it is the output of some of the best brains on the planet. Where the Web is placed on the hit parade of technological revolutions to influence mankind is oft disputed, but it is definitely up there with fire, steam, electricity, computing, and of course the wheel. Similar debates rage, and will continue to rage virtually, around the hit parade of web features that will in retrospect have been most influential. Pick your favourites: HTTP, XML, REST, Flash, RSS, SVG, the URL, the href, CSS, RDF. The list is a long one.

I have observed a pattern as each of the successful new enhancements to the web has been introduced, and then generally adopted. Firstly, there is a disconnect between the proponents of the new approach/technology/feature and the rest of us. The former split their passions between focusing on the detailed application, rules, and syntax of its use, and broadcasting its worth to the world, not quite understanding why the web masses do not ‘get it’ and adopt it immediately. This phase is then followed by one of post-hype disillusionment from the creators, especially when others start suggesting simplifications to their baby. Also at this time, back-room adoption starts to occur among those who find it interesting but are not evangelistic about it. The real kick for the web comes from those back-room folks who just use this next thing to deliver stuff and solve problems in a better way. It is the results of their work that the wider world starts to emulate, so that they can keep up with the pack and remain competitive. Soon this new feature is adopted by the majority, because all the big boys are using it, and it becomes just part of the tool kit.

A great example of this was RSS. It was not a technological leap but a pragmatic mix of current techniques and technologies, combined with some lateral thinking and a group of people agreeing to do it in ‘this way’ and then sharing it with the world. As you will see from the Wikipedia page on RSS, the syntax wars raged in the early days. I remember it well: 0.9, 0.91, 1.0, 1.1, 2.0, 2.01, etc. I also remember trying, not always with success, to convince people around me to use it, because it was so simple. Looking back it is difficult to say exactly when it became mainstream, but this line from Wikipedia gives me a clue: “In December 2005, the Microsoft Internet Explorer team and Microsoft Outlook team announced on their blogs that they were adopting the feed icon first used in the Mozilla Firefox browser. In February 2006, Opera Software followed suit.” From then on, the majority of consumers of RSS were not aware of what they were using, and it became just one of the web technologies you use to get stuff done.

I am now seeing the pattern starting to repeat itself, with structured and linked data. Many, including me, have been evangelising the benefits of web-friendly, structured, linked data for some time now, preaching to a crowd that has been slow in growing, but growing it is. Serious benefit is now being gained by organisations adopting these techniques and technologies, as our selection of case studies demonstrates. They are getting on with it, often with our help, using it to deliver stuff. We haven’t hit the mainstream yet. For instance, the SEO folks still need to get their heads around the difference between content and data.

Something is stirring around the edge of the Semantic Web/Linked Data community that has the potential to give structured, web-enabled data the kick towards mainstream that RSS got when Microsoft adopted the RSS logo and all that came with it. That something is schema.org, an initiative backed by the heavyweights of the search engine world: Google, Yahoo, and Bing. For the SEO and web developer folks, schema.org offers a simple, attractive proposition: embed some structured data in your HTML and, via things like Google’s Rich Snippets, we will give you a value-added display in our search results. Result: happy web developers with their sites getting an improved listing display. Result: lots of structured data starting to be published by people you would have found it impossible to convince that publishing structured data on the web was a good idea.
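As a rough illustration of what "embed some structured data in your HTML" means in practice, the following Python sketch parses a made-up page fragment marked up with the microdata attributes (`itemscope`, `itemtype`, `itemprop`) that schema.org builds on. The page content and the `MicrodataCollector` class are assumptions for illustration only; a search engine's real extraction is far more thorough, and this sketch handles only flat, non-nested markup.

```python
from html.parser import HTMLParser

# Made-up page fragment marked up with schema.org microdata attributes.
PAGE = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">An Example Book</span> by
  <span itemprop="author">A. N. Author</span>
</div>
"""


class MicrodataCollector(HTMLParser):
    """Collects itemprop name/value pairs from simple, non-nested markup."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            # Remember which property the next piece of text belongs to.
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


collector = MicrodataCollector()
collector.feed(PAGE)
print(collector.item_type)
print(collector.properties)
```

The same text a human reads on the page now also yields a typed item with named properties, which is the raw material for an enhanced search listing.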

I was at Semtech in San Francisco in June, just after schema.org was launched, and it caused a bit of a stir. The complaints ran along these lines: they’ve oversimplified the standards that we have been working on for years, dumbing down RDF and diluting the capability, with too small a set of attributes, etc., etc. Yet when you get under the skin of schema.org, you see that, with their support for RDFa and RDFa 1.1 Lite, they are not that far from the RDF/Linked Data community.

Schema.org should be welcomed as an enabler for getting loads more structured and linked data on the web. Is their approach perfect right now? No. Will it influence the development of Linked Data? Yes. Will the introduction be messy? Yes. Is it about more than just rich snippets? Oh yes. Do the webmasters care at the moment? No.

If you want a friendly insight in to what schema.org is about, I suggest a listen to this month’s Semantic Link podcast, with their guest from Google/schema.org Ramanathan V. Guha. 

Now where have I seen that name before? Oh yes, back on the Wikipedia RSS page: “The basic idea of restructuring information about websites goes back to as early as 1995, when Ramanathan V. Guha and others in Apple Computer’s Advanced Technology Group developed the Meta Content Framework.” So it probably isn’t just me who is getting a feeling of déjà vu.

Putting the Links into Linked Data

Everything is, as Ted Nelson put it, deeply intertwingled. The relationships between things are beautifully multifarious. But the intertwingularity of digital information is still in its swaddling clothes.

Different (and sometimes proprietary) identifier systems, access mechanisms, schema, and data formats, as well as poor metadata, all cause massive friction when integrating data.

Linked Data eases the pains of data integration by standardising on an open identifier system and access mechanism (HTTP URIs) and a common, well-understood set of data formats, and through the best practice of reusing vocabulary terms used by other datasets.

But still, linking between data sets is relatively basic. There are some reasons for this:

  • It’s not always obvious which other datasets you could link your data to.
  • Unless the same ‘natural’ keys are used between datasets (for example, ISBNs in bibliographic data), it is hard to generate reliable, accurate links between entities in different datasets.
  • Generating links by doing ‘lookups’ (e.g. SPARQL queries against the target dataset) as part of the linked data creation process is prohibitively time consuming.
  • The accuracy of generated links is often not quantified, and may be insufficient for some use cases.
  • The links between two datasets can go stale when either dataset changes.
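The ‘natural key’ case in the list above is the easy one, and a small Python sketch shows why. Everything here is assumed for illustration: the two datasets, their URIs, the ISBN values (which are made up and not checksum-valid) and the choice of `owl:sameAs` as the linking predicate. When both sides carry the same key, linking reduces to normalising it and joining on it.

```python
def normalise_isbn(raw):
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return raw.replace("-", "").replace(" ", "").upper()


# Hypothetical URI -> ISBN mappings for two datasets.
DATASET_A = {
    "http://example.org/a/book/1": "978-0-00-000000-2",
    "http://example.org/a/book/2": "978-0-00-000001-9",
}
DATASET_B = {
    "http://example.com/b/item/x": "9780000000002",
    "http://example.com/b/item/y": "9780000000101",
}


def link_by_isbn(left, right):
    """Return (left_uri, 'owl:sameAs', right_uri) for entities sharing an ISBN."""
    # Index the right-hand dataset once, so each left-hand lookup is cheap.
    index = {normalise_isbn(isbn): uri for uri, isbn in right.items()}
    links = []
    for uri, isbn in left.items():
        match = index.get(normalise_isbn(isbn))
        if match is not None:
            links.append((uri, "owl:sameAs", match))
    return links


for link in link_by_isbn(DATASET_A, DATASET_B):
    print(link)
```

Without a shared key, the join has to fall back on fuzzy comparisons of names, dates and the like, which is where accuracy and cost become real concerns.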

The ‘LOD Around The Clock’ (LATC) Project (which we at Talis Consulting are a part of) is working to make it easier for dataset publishers to interlink their data with other datasets by developing a Linking Platform that will take care of the heavy lifting.

The platform will include a Dataset Inventory which will let you search and browse for datasets that you might want to interlink.

Once you have chosen which datasets you want to link, you will be able to go to the LATC Platform’s hosted version of Silk Workbench. Silk Workbench is a web application that lets you design quite sophisticated “link specifications”, which describe the conditions under which a link can be generated between entities from two different datasets.

Once you have created the link specification, the LATC platform will run it, perform all the necessary SPARQL query lookups, generate the new links, and also provide an evaluation of their accuracy (based on sample links you provide when creating the link specification). If either dataset changes after the links have been created, the links will be regenerated. You (and anyone else) can subscribe to a feed that will let you know when the links have been updated, and you can pull the latest version of the links back into your dataset.
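One plausible way to evaluate generated links against sample links is to compute precision and recall, sketched below in Python. This is an assumption about the general technique, not a description of how the LATC platform or Silk Workbench actually scores links; the `evaluate_links` function and all identifiers are made up.

```python
def evaluate_links(generated, reference):
    """Return (precision, recall) of generated links against reference links."""
    generated, reference = set(generated), set(reference)
    correct = generated & reference
    # Precision: what share of the generated links are known to be right.
    precision = len(correct) / len(generated) if generated else 0.0
    # Recall: what share of the known-good links were actually found.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (source, target) pairs; all identifiers are made up.
reference_links = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:3"), ("a:4", "b:4")}
generated_links = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}

precision, recall = evaluate_links(generated_links, reference_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Quantifying links in this way lets a consumer decide whether a given link set is accurate enough for their own use case.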

All this, we hope, will substantially reduce the technological barriers to deeper intertwingularity in the web of data. Another barrier remains though. Linked Data is a paradigm shift in Information Technology. We are so used to a world where data integration is hard and expensive, and where connections between datasets are made by applications in one-off efforts (‘mashups’) that must be duplicated by each application wanting to use the connection, that connections are only made when the pay-off is high enough to justify the effort. Even for experienced Linked Data practitioners, an effort of imagination is required to picture the possibilities of a vast web of data with a substantially higher level and variety of connectivity, and of applications that can simply follow those connections, instead of needing to make them on demand with expensive lookups and calculations.

So now we need to imagine a future of Really Linked Data, and make it happen. What connections would you like to see manifested in the web of data? Let us know and LATC will try to create them.

And if you would like to explore how Linked Data can help you with data-integration and open new opportunities for your organisation, please get in touch with us at Talis Consulting.


Ontologies won’t make you rich: or will they?

This post sets out some discussion points that arose in response to a conversation with +Aaron Bradley on Google+. The conversation was prompted by Kendall Clark’s post which started by suggesting “an OWL ontology is like a public API for your data”. Aaron suggested that his OWL ontology may need to remain private in order to retain competitive advantage.

There is no value in writing ontologies that are not shared. If you describe your own data in your own way without sharing that ontology, how will you ever find other data that you could mix into yours at a later date?

The counter-argument is that the data within your organisation is disparate and needs to be organised, but you don’t want to give away your secrets as to how you have organised it. I am not about to claim that Linked Open Data is the only way to do Linked Data. Linked Data within an organisation will allow data integration across departments to happen more easily.

But the ontology is not core to this. It is the way you can combine data with shared URIs that use open ontologies that is the killer feature. So if you want to protect anything, then you may want to protect those URIs. Now that we are talking about URIs we have already moved the discussion into the data layer rather than the ontology layer, and you’re still able to protect your data even if people know what ontologies you’re using.

An ontology is not going to give you a competitive advantage. Your advantage will be what you do with the data, not how the data is described. No-one to my knowledge has made a business out of trading database schemas; but when they trade well-curated data, there is money to be made.

If more than one organisation uses the same ontologies to describe two different datasets, then that ontology has started to create a data market where those two organisations can trade their data without prohibitive data integration overheads. Sharing your ontology helps you to grow your market.
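To make that concrete, here is a rough sketch in Python. It assumes two hypothetical organisations that each describe their own products but use the same ontology term for price. Statements are modelled as plain (subject, predicate, object) tuples rather than with a real RDF library, and every URI and figure is invented for illustration.

```python
# A shared, hypothetical ontology term that both organisations agree to use.
PRICE = "http://example.org/ontology/price"

# Dataset published by organisation A (invented data).
org_a = {
    ("http://a.example.org/id/widget", PRICE, 10),
}

# Dataset published by organisation B (invented data).
org_b = {
    ("http://b.example.org/id/gadget", PRICE, 25),
}

# Because both datasets use the same term, integration is a set union:
# no schema mapping step is needed before the data can be combined.
combined = org_a | org_b


def prices(graph):
    """Map each subject to its price using the shared PRICE term."""
    return {s: o for s, p, o in graph if p == PRICE}


all_prices = prices(combined)
```

If each organisation had used its own private term for price, the `prices` function would need a mapping for every dataset it consumed, which is the integration overhead the shared ontology avoids.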

If you are interested in making your data easily available via a public API, you will find that publishing it as Linked Data transforms your website into your API, because the same data can be published with both a human-friendly HTML face and a machine-friendly RDF face. There are standard techniques that can then be applied to monetize your data streams, and this may even include a paywall.
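One common way to serve both faces from the same URI is HTTP content negotiation, where the server inspects the client’s Accept header. The Python below is a simplified sketch of that idea, not a full implementation of the HTTP rules: it ignores partial wildcards such as `text/*`, and the list of supported formats and the function names are assumptions made for this example.

```python
# Representations a hypothetical site can serve for each resource.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        choices.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(choices, key=lambda c: c[1], reverse=True)


def choose_representation(accept_header, default="text/html"):
    """Pick the media type to serve for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

A browser sending `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` would get the HTML page, while a data client sending `text/turtle` would get RDF from the same address.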

Of course you might use OWL, or some part of OWL, to describe how your data is structured. But if you also need APIs built on top, a Linked Data approach is proving to be a simple way to achieve both aims in one go. Surely that is more cost-effective?

In summary: the data layer is where your competitive advantage sits. The ontology layer is the part of the Linked Data ecosystem that adds value to your data: ontology re-use makes your data easier to integrate, both internally and externally, and grows your market. Your API (internal or external) can be built easily using a Linked Data approach.

If you want to know more about how re-usable ontologies can grow your market, then talk to us.

This post is not about Linked Library Data

You have some data in a format that was heavily influenced by the technological constraints of the 1960s. You have heard discussion of new formats for your data and you are aware of emerging communities who might be interested in using your data. You also have a real feeling that you and your data are being left behind.

You may have heard of something called Linked Data which its proponents claim is going to make your data more widely reusable. If you think that simply mapping your existing record structures into some new format is going to make this sharing of data happen, then you are wrong and missing the point of Linked Data.

Your data – built up over years, and on which you base your livelihood – is data that describes stuff that exists in the real world in some form or other. Hold that thought.

Linked Data uses a data model (an organised way of structuring data) called the Resource Description Framework (RDF). This framework is very flexible and powerful because it essentially allows you to do two things.

  1. Give identifiers to real world things
  2. Describe how those real world things interrelate
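
As a minimal sketch of those two steps, the Python below represents RDF-style statements as plain (subject, predicate, object) tuples. The example.org URIs, the author, the book, and the `wrote` and `name` terms are all invented for illustration; a real project would more likely use an RDF library and an established vocabulary.

```python
# Step 1: give identifiers to real-world things (hypothetical URIs).
AUTHOR = "http://example.org/id/person/jane-doe"
BOOK = "http://example.org/id/book/a-sample-novel"

# Hypothetical vocabulary terms used to describe relationships and qualities.
WROTE = "http://example.org/terms/wrote"
NAME = "http://example.org/terms/name"

# Step 2: describe how those things interrelate.
# Each statement is a (subject, predicate, object) triple.
triples = {
    (AUTHOR, NAME, "Jane Doe"),
    (BOOK, NAME, "A Sample Novel"),
    (AUTHOR, WROTE, BOOK),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def describe(graph, subject):
    """Return all (predicate, object) pairs known about `subject`."""
    return {(p, o) for s, p, o in graph if s == subject}


# Follow the relationship from the author to the things they wrote.
books_by_author = objects(triples, AUTHOR, WROTE)
```

Note that nothing here resembles a record layout: the statements are about the author and the book themselves, not about fields in a file.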

This means your existing record structures (born out of a need to efficiently store and access data on computer systems that make your smartphone look like a supercomputer) are not actually telling you anything useful about your data.

Linked Data is about first exploring the real world that your data references, then modelling how those real-world things relate to each other and describing their important qualities. It is not about crosswalking data formats from one structured storage mechanism to another.

Don’t waste time trying to re-map your data from one comforting set of terms to the same set of terms couched in a different language. Instead talk to us and we will help you see just how much more potential your data can have when your data is representative of the real world that you do business in.

Post Script: This really isn’t a post about Linked Library Data, or misguided efforts to keep MARC 21 alive in an RDF format. But if you need an example of something to avoid, you have it there.

Image Credit: Primitive Memories by jurvetson, some rights reserved