This post is a summary of the 'consuming XCRI-CAP' work I did in relation to the Trainagain service, a fairly straightforward SQL Server based system holding records for a few thousand short courses, used to drive a searchable website.

The purpose of the Trainagain database was to provide searchable data about short training events. It was not optimised for courses in the XCRI-CAP sense, though it could hold such data. One difference was the meaning of 'event': in Trainagain an event is a short course offered by a provider (usually an FE college or private training provider) and held at a single venue, so each XCRI-CAP presentation maps to a Trainagain 'event'.

Most of the course data in the Trainagain database is held in a single table called Event. There are links to tables for Country (in the world), Region (EU region), Area in the UK (primarily counties), Category (very limited reference data about the subject of study) and EventType (four values: 'Course', 'Network', 'Apprenticeship' and 'Consultancy'). There are other tables not relevant to this work. The Event table has a good number of fields that match the XCRI-CAP requirements, including conventional items such as start and end dates, duration, venue information and title, as well as some larger descriptive fields such as a general summary, eligibility and outcome.
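To make the shape of the target concrete, a cut-down T-SQL sketch of these tables is given below. The table names and the fields mentioned in this post come from Trainagain; the remaining column names, types and constraints are my assumptions for illustration only, not the actual schema.

    -- Assumed, cut-down sketch of the relevant Trainagain tables (not the real schema)
    CREATE TABLE Country   (CountryID   INT IDENTITY PRIMARY KEY, CountryName  NVARCHAR(100) NOT NULL);
    CREATE TABLE Region    (RegionID    INT IDENTITY PRIMARY KEY, RegionName   NVARCHAR(100) NOT NULL);
    CREATE TABLE Area      (AreaID      INT IDENTITY PRIMARY KEY, AreaName     NVARCHAR(100) NOT NULL);
    CREATE TABLE Category  (CategoryID  INT IDENTITY PRIMARY KEY, CategoryName NVARCHAR(100) NOT NULL);
    CREATE TABLE EventType (EventTypeID INT IDENTITY PRIMARY KEY, TypeName     NVARCHAR(50)  NOT NULL);

    CREATE TABLE Event (
        EventID         INT IDENTITY PRIMARY KEY,   -- auto-number, not the XCRI-CAP identifier
        Title           NVARCHAR(255) NOT NULL,
        StartDate       DATE,
        EndDate         DATE,
        Duration        NVARCHAR(100),
        Venue           NVARCHAR(255),
        Description     NVARCHAR(1024),             -- 1,024-character limit (see issue 4 below)
        Eligibility     NVARCHAR(1024),
        Outcome         NVARCHAR(1024),
        FullPrice       INT NOT NULL,               -- must hold an integer (see issue 2 below)
        PlacesAvailable INT NOT NULL,               -- must hold an integer (see issue 3 below)
        CountryID       INT NOT NULL REFERENCES Country(CountryID),
        RegionID        INT NOT NULL REFERENCES Region(RegionID),
        AreaID          INT NOT NULL REFERENCES Area(AreaID),
        CategoryID      INT NOT NULL REFERENCES Category(CategoryID),
        EventTypeID     INT NOT NULL REFERENCES EventType(EventTypeID)
    );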

Trainagain has some mandatory requirements for data that must be included in records in the Event table. These items are reference data, supplied through the relationships described above for Country, Area, Region, Category and EventType.

The overall approach I took was iterative, in order to find out what the practical difficulties were and their solutions - by no means production-level! Starting with one institution's data, I mapped XCRI-CAP to the Trainagain structure using Altova MapForce. This tool generates SQL statements that can be used to import data into the database, reporting error messages on failure. The error messages were used to formulate successive new mappings and sub-processes until the first institution's data loaded successfully. Having recorded the mapping and process, I then used them to import the other institutions' data, which revealed a small number of additional difficulties owing to differences in the XCRI-CAP content. For the last two institutions, relatively few difficulties were encountered.
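The generated statements are essentially row-by-row INSERTs into Event. A hand-written equivalent for a single XCRI-CAP presentation might look something like the following (values invented, column names as in the assumed sketch above); if the mandatory reference rows are missing, SQL Server rejects the row with a foreign-key error, which is exactly the kind of message that drove the next iteration of the mapping.

    -- Illustrative single-row INSERT of the kind MapForce generates (invented values)
    INSERT INTO Event (Title, StartDate, EndDate, Duration, Venue, Description,
                       FullPrice, PlacesAvailable,
                       CountryID, RegionID, AreaID, CategoryID, EventTypeID)
    VALUES (N'Introduction to Food Hygiene', '2012-09-10', '2012-09-12', N'3 days',
            N'Main Campus', N'A short course covering basic food hygiene practice.',
            95, 12,
            1, 1, 1, 1, 1);   -- fails with a foreign-key error if the reference rows are not loaded first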

One difficulty was the requirement for reference data to be pre-populated on records; without it, SQL Server would refuse to load them. The reference data included Trainagain's existing esoteric (but short) subject list. As this had only very limited utility, I replaced it with the LDCS classification, and each course record was then classified with LDCS codes. Other reference data imported covered the geography of the venues, namely Country (UK), Region (EU region) and Area (county). For EventType, default reference data values were selected on import (for most XCRI-CAP data this was, unsurprisingly, 'Course'). For simplicity the geographical reference data was loaded only for the relevant areas, using a manual lookup of appropriate values; the reference data tables could easily be populated with whole-UK data sets.
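As a hedged sketch (using the assumed tables above; the region, county and subject values are placeholders, not real LDCS entries), the pre-population step amounts to a handful of inserts into the reference tables before any Event rows are attempted:

    -- Assumed pre-population of reference data before the main load
    INSERT INTO Country   (CountryName)  VALUES (N'United Kingdom');
    INSERT INTO Region    (RegionName)   VALUES (N'South East');          -- only the relevant regions
    INSERT INTO Area      (AreaName)     VALUES (N'Kent'), (N'Surrey');   -- only the relevant counties
    INSERT INTO EventType (TypeName)     VALUES (N'Course'), (N'Network'),
                                                (N'Apprenticeship'), (N'Consultancy');
    -- LDCS classification replacing the original subject list (placeholder labels, not real codes)
    INSERT INTO Category  (CategoryName) VALUES (N'LDCS placeholder A'), (N'LDCS placeholder B');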

The main data was loaded into the database using MapForce's SQL routines. Trainagain uses auto-numbered identifiers for Region and Area in the reference data tables, so these were mapped across to the main XCRI-CAP data using the reference data tables as lookups; Country defaults to the UK identifier.
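The effect of that lookup mapping is roughly equivalent to an INSERT ... SELECT that resolves names in the feed to the auto-numbered identifiers, with Country defaulting to the UK row. The sketch below assumes a hypothetical StagingEvent table standing in for the parsed XCRI-CAP data; MapForce resolves the lookups internally rather than via a staging table.

    -- Rough equivalent of the lookup mapping (StagingEvent is hypothetical)
    INSERT INTO Event (Title, StartDate, EndDate, FullPrice, PlacesAvailable,
                       CountryID, RegionID, AreaID, CategoryID, EventTypeID)
    SELECT s.Title,
           s.StartDate,
           s.EndDate,
           s.FullPrice,
           s.PlacesAvailable,
           (SELECT CountryID FROM Country WHERE CountryName = N'United Kingdom'),  -- default to UK
           r.RegionID,
           a.AreaID,
           c.CategoryID,
           et.EventTypeID
    FROM StagingEvent AS s
    JOIN Region    AS r  ON r.RegionName   = s.RegionName
    JOIN Area      AS a  ON a.AreaName     = s.AreaName
    JOIN Category  AS c  ON c.CategoryName = s.LdcsLabel
    JOIN EventType AS et ON et.TypeName    = N'Course';          -- default EventType for course data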

No insuperable problems were encountered, though I did find the following issues:

1. EventIDs were integers, so XCRI-CAP internalID values could not be used, as they are strings and may contain letters; an autonumber was used instead. This would be a stumbling block for any updating mechanism using partial files, as the XCRI-CAP identifiers could not be used to uniquely identify changed records.

2. The FullPrice field has to be an integer, so, where no value was available, the value '0' had to be used, which is incorrect.

3. Similarly, PlacesAvailable required an integer value, so if not available, the value '0' was used – potentially very misleading, as the course might be shown as having no places – perhaps a better default could be implemented.

4. Description, Eligibility and Outcome fields have character limits of 1,024, so the data might be truncated (the Course Data Programme data definitions allow 4,000 characters, by contrast); the sketch after this list shows how these defaults and the truncation were applied.
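A hedged illustration of how points 2 to 4 were worked around in practice (again using the hypothetical StagingEvent table, and collapsing the reference lookups to fixed identifiers for brevity):

    -- Sketch of the defaults and truncation described in points 2-4
    INSERT INTO Event (Title, Description, Eligibility, Outcome,
                       FullPrice, PlacesAvailable,
                       CountryID, RegionID, AreaID, CategoryID, EventTypeID)
    SELECT s.Title,
           LEFT(s.Description, 1024),       -- truncate to the 1,024-character limit (point 4)
           LEFT(s.Eligibility, 1024),
           LEFT(s.Outcome, 1024),
           ISNULL(s.FullPrice, 0),          -- '0' where no price supplied: incorrect (point 2)
           ISNULL(s.PlacesAvailable, 0),    -- '0' where unknown: potentially misleading (point 3)
           1, 1, 1, 1, 1                    -- reference identifiers resolved as in the earlier sketch
    FROM StagingEvent AS s;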

This work did not represent production-level aggregation, but was a series of trials to investigate the problems of aggregating data into an existing system.

Likely generic difficulties, compared to a web-based data entry system, that could be extrapolated from the work were:

Decisions about the processes required will depend on specific issues:

It's still my view that with robust data definitions, good quality feeds and well-designed processes, it should be straightforward to consume XCRI-CAP data. What is needed is attention to the detail of the data requirements, and to how the incoming data is mapped and transformed to meet them. It is also worth bearing in mind that course marketing data is not particularly volatile, so minute-by-minute real-time data exchange is not a requirement; in many cases regular monthly or quarterly updates are sufficient.

XCRI-CAP: turn 12 days of keying into 3 hours of checking.