Give 'em good data (or they'll just use the bad stuff)

I've been looking at the Course Data Programme interim reports recently, as part of my role in the XCRI Support Project. One question that has arisen is an old chestnut: what happens if the organisations using the course marketing data from our XCRI-CAP feeds don't use it properly? Without some guarantee on this point, it's difficult to get stakeholders to sign up.

Everyone wants services to make 'proper' use of their XCRI-CAP data. The reality is that people are already using the data, by scraping it off institutional websites. For example, how many institutions know of the Course Detective service, which is basically a hack by UKOLN on behalf of the JISC/HEA UKOER programme (http://www.coursedetective.co.uk)? While this hack-service is a benign (and experimental?) one, others may not be. Carrying out such a hack is relatively trivial, and it happens all the time: taking material from institutional websites is very easy. But it's unlikely that the unscrupulous hacker will update the data, so the hack-service will quickly display out-of-date information, thereby misrepresenting the institution's provision.

The wider question is the implied request for some form of policing of the usage of the data. That policing will happen using the same methods that organisations already use to police the usage of other data taken from their websites.

With major "course search service" organisations, some form of service level agreement (SLA) or fair usage agreement might be worth investigating, and it's possible that JISC (for example) might be open to that suggestion. However, we all want our open and public course marketing data to be used as widely as possible. In this spirit, restrictions on usage should, in my view, be minimal, and with the recommended licensing arrangements from the Course Data Programme there can't be conditions placed on that usage anyway.

While an SLA would be worth having for the major aggregators and services that institutions recruit from, for example UCAS, Hotcourses, Careers Service, Graduate Prospects and so on, we're not going to get an SLA from 'mash-up boy'! Although we might like "someone" to vet and approve usage, this is not going to happen. Personally I don't think it's a problem, because the situation is conceptually no different from before we had XCRI-CAP feeds. If there's a major issue, we try to get the offending data corrected or use traditional legal forms of redress. I believe it's incumbent upon us to make it as easy as possible for everyone, including large national organisations and mash-up boy, to get our good-quality data, so they aren't tempted to use the bad old stuff.