Category Archives: Data Integration

Spaghetti Grows in System Architectures – not an April Fools’ Day joke

A replay on breakfast TV this morning of the well-known Panorama hoax (1st April 1957) reminded me of the mission we’re on at Bristol to “turn spaghetti into lasagne”. This mission is number 7 on the JISC 10 pointer list for improving organisational efficiency: spaghetti refers to the proliferation of point-to-point (tightly coupled) integrations between our University’s many IT systems, and lasagne refers to the nicely layered systems and data architecture we’d like to achieve (see elsewhere in this blog).

However, transforming our data architecture overnight is not achievable. Instead, we’ve developed a roadmap spanning several years in which reform of our data architecture fits into the wider contexts of both Master Data Management and Service Oriented Architecture.

In November last year our senior project approval group (now known as the Systems and Process Investment Board) agreed to resource a one-year Master Data Integration Project. We will return to the same board early in 2015 with a follow-on business case, but this year’s project is concerned with delivering the following foundation work:

The establishment of Master Data governance and process at the University (the creation of a Master Data Governance Board and the appointment of Data Managers and Data Stewards as part of existing roles throughout the University – responsible for data quality in their domains and for following newly defined Change of Data processes).
Completion of documentation of all the spaghetti (aka the integrations between our IT systems) in our Interface Catalogue, and also the documentation of our Master Data Entities (and their attributes and relationships) in an online Enterprise Data Dictionary (developed in-house).
Development of a SOA blueprint for the University, including our target data architecture. We will do this with the help of a SOA consultant, and the blueprint will inform the follow-on business case for SOA at Bristol, which we hope the University will fund from 2015.

We are undertaking this work with the following resources: Enterprise Architect (me) at 0.3FTE for a year, a Business Analyst (trained in Enterprise and Solutions Architecture) at 0.5FTE, a Project Manager at 0.3FTE, IT development time (both for developing the Enterprise Data Dictionary and for helping to populate the Interface Catalogue with information) and approximately £60K of consultancy.

We had some very useful consultancy earlier this year from Intelligent Business Strategies: several insightful conversations with MD, Mike Ferguson, and a week with Henrik Soerensen. From this we were able to draw up a Master Data Governance structure tailored to our organisation, which we are now trialling.
This work also helped us to consider key issues around governance processes and how to capture key information – such as including business rules around data – in the online data dictionary.
Later this year we will be working for an extended period with an independent SOA consultant based in the South West, Ben Wilcock of SOA growers. We have already worked with Ben in small amounts this year and I am very much looking forward to collaborating with him further to develop our target data architecture (most likely a set of master data services, supporting basic CRUD operations) within the context of a SOA blueprint for our enterprise architecture.
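To make the idea of a master data service supporting basic CRUD operations a little more concrete, here is a minimal in-memory sketch. It is purely illustrative: the `Department` entity, its `code` and `name` fields, and the `DepartmentService` class are assumptions made up for this example, not the target architecture itself, and a real service would sit behind a network interface and a governed data store.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Department:
    """Hypothetical master data entity; field names are illustrative only."""
    code: str
    name: str


class DepartmentService:
    """In-memory stand-in for a master data service with basic CRUD operations."""

    def __init__(self) -> None:
        self._records: Dict[str, Department] = {}

    def create(self, dept: Department) -> Department:
        # Refuse duplicates so that each code identifies exactly one record.
        if dept.code in self._records:
            raise ValueError(f"Department {dept.code} already exists")
        self._records[dept.code] = dept
        return dept

    def read(self, code: str) -> Optional[Department]:
        # Returns None when the code is unknown.
        return self._records.get(code)

    def update(self, code: str, name: str) -> Department:
        if code not in self._records:
            raise KeyError(f"Department {code} not found")
        updated = Department(code=code, name=name)
        self._records[code] = updated
        return updated

    def delete(self, code: str) -> bool:
        # True if a record was removed, False if there was nothing to remove.
        return self._records.pop(code, None) is not None
```

The point of the sketch is that every consuming system would go through the same four operations rather than reaching into another system's tables directly.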

The value of an Interface Catalogue

Part of Enterprise Architecture activity involves examining the “As Is” in terms of the organisation’s systems architecture and developing a vision and specification of the “To Be” systems architecture. Many systems are integrated with each other in terms of data – data is ideally stored once and reused many times. In an HE context this could involve reusing student records information in a Virtual Learning Environment or surfacing research publications stored in a research information system on the University’s public website, and so on. So, the enterprise integration architecture is a key puzzle to unravel and to improve over time.

In your organization you may be lucky enough to have had a central, searchable database in which all systems interfaces have been documented to a useful level of detail since the beginning of time. If so, I envy you, because at the University of Bristol we are not so fortunate – yet. An important part of my exploration of the “As Is” for my organization this last year has been to attempt to map out our complex systems architecture and to understand how mature our integration architecture is. Do we need to design and develop a Service Oriented Architecture, and if so are we ‘ready’ for it in terms of the maturity of our existing integration architecture and our clarity regarding future requirements? The problem has been a lack of documentation to date: integrations between systems have often been built according to the particular developer’s preference at the time, and documented neither in terms of how the integration was implemented (perhaps via ETL, using AJAX, using a Web Service, or merely via some Perl scripts) nor with respect to the rationale for that method of integration (for a brief, useful analysis of options for integration that I believe is still relevant several years on, please see MIT’s chart at http://tinyurl.com/MITIntegrationOptions). Information about these integrations remains in people’s heads. And often these people leave (well, actually, a lot of people stay, as the University is a popular place to work, but you see my point).

At our University there are a lot of point-to-point integrations between systems, far more than we would like. And so, to understand the extent of the problem I have introduced an Interface Catalog which developers across teams are now undertaking to fill with information. There were some exchanges of information about using interface catalogs on the ITANA email list earlier this year. I used this and other resources to develop a proforma format for our University of Bristol interface catalog. I started this off in an internal wiki as I’ve found this a good forum for relatively informal, collaborative development of standards in the first instance. My idea was not to impose a format, but to build consensus around what we really need to record and how consistent developers need to be in constructing the information in each record. In the early days it looked something like this (this doesn’t show the full set of column headings):

After several iterations and use by developers, consensus around the terminology we wished to use and the level of detail required was reached. The team has now implemented the catalog in an Oracle database, mainly so that we can easily control vocabulary (avoiding different developers describing the same type of interface or class of data object etc in different ways) and also so that we can more easily search the catalog.
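The controlled-vocabulary idea can be sketched in a few lines. The example below is an assumption-laden illustration rather than the Oracle implementation: the `InterfaceType` values are borrowed from the integration methods mentioned earlier, and the function name `normalise_interface_type` is hypothetical. It shows how free-text entries can be mapped onto one agreed term, with anything outside the vocabulary rejected.

```python
from enum import Enum


class InterfaceType(Enum):
    """Hypothetical controlled vocabulary for the 'type of interface' column."""
    ETL = "ETL"
    WEB_SERVICE = "Web Service"
    AJAX = "AJAX"
    SCRIPT = "Script"


def normalise_interface_type(raw: str) -> InterfaceType:
    """Map a free-text entry onto the agreed term, ignoring case and padding."""
    cleaned = raw.strip().casefold()
    for member in InterfaceType:
        if member.value.casefold() == cleaned:
            return member
    # Anything outside the vocabulary is rejected rather than silently stored.
    raise ValueError(f"Unknown interface type: {raw!r}")
```

In a relational database the same effect is usually achieved with a lookup table and a foreign key, so that two developers cannot describe the same kind of interface in two different ways.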

There is a good level of buy-in to using this catalog which I am very pleased with. It is time-consuming to fill in information retrospectively, but developers report that it is quick and easy to record information about interfaces as they go along.

Some were initially unsure about the value of the catalog, as it is clear that the database will run to many hundreds of rows, if not thousands, pretty quickly. However, this catalog is not for browsing; it is a dataset for analysis and querying, and there are several expected benefits that we hope to reap when we reach a critical mass of data:

When a system is up for replacement we will be able to query the catalog to see how many interfaces there are to that system and thus assess the work involved in integrating similarly (or indeed in new ways) with the incoming system.
When developers leave, they won’t have taken essential knowledge with them in their heads – centrally-held documentation is key!
The make-up of our current integration architecture will become clearer and we will be able to produce a coherent analysis of the extent to which we are depending on point-to-point integrations between systems (which are hard to sustain over time and reduce the agility of our overall architecture), and also where different integrations are repeating similar tasks in different ways (for example, transferring/transforming the same data objects). In the former case we are able to make the case more clearly for a future, more mature integration architecture, and in the latter case we can look to offer core APIs offering commonly required functionality in a standard way (the reuse advantage).
The catalog will help us to introduce a more formal, standardised and thus consistent approach to the way we integrate new technical systems – if we choose ETL, say, then the developer should record the justification for that solution; if ESB was discounted, we can record why, and so on.
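The kinds of question listed above translate quite directly into queries. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the Oracle database, and everything about the schema (the `interface` table, its column names, and the sample rows) is invented for illustration; the real catalog captures more detail per interface.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE interface (
    id INTEGER PRIMARY KEY,
    source_system TEXT NOT NULL,
    target_system TEXT NOT NULL,
    method TEXT NOT NULL,
    data_object TEXT NOT NULL,
    justification TEXT
)
"""

# Made-up sample rows, not real catalog entries.
SAMPLE_ROWS = [
    ("Student Records", "VLE", "ETL", "Student", "Nightly batch is sufficient"),
    ("Student Records", "Library System", "Perl script", "Student", None),
    ("Research Information System", "Public Website", "Web Service",
     "Research Output", "Needs near real-time updates"),
    ("HR System", "Research Information System", "ETL", "Researcher", None),
]


def build_sample_catalog():
    """Create an in-memory catalog populated with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO interface "
        "(source_system, target_system, method, data_object, justification) "
        "VALUES (?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def interfaces_touching(conn, system):
    """List every interface into or out of a system that is up for replacement."""
    cur = conn.execute(
        "SELECT source_system, target_system, method FROM interface "
        "WHERE source_system = ? OR target_system = ? ORDER BY id",
        (system, system),
    )
    return cur.fetchall()


def objects_moved_in_several_ways(conn):
    """Find data objects transferred by more than one integration method."""
    cur = conn.execute(
        "SELECT data_object, COUNT(DISTINCT method) FROM interface "
        "GROUP BY data_object HAVING COUNT(DISTINCT method) > 1 "
        "ORDER BY data_object"
    )
    return cur.fetchall()
```

The first query supports the replacement scenario (how many interfaces would need rebuilding), while the second highlights candidates for a shared API, because the same data object is being moved in several different ways.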

In the interests of standardisation, we are linking the names used to describe systems to our corporate services catalog and we are starting to join and interlink a Data Dictionary with the interface catalog (i.e. to help specify more precisely the data objects that are transferred and transformed between systems for every integration developed).
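One practical consequence of this linking is that catalog entries can be checked against the reference lists. The following sketch assumes, purely for illustration, that interface records are available as dictionaries with `id`, `source_system`, `target_system` and `data_object` keys, and that the services catalog and Data Dictionary can each be reduced to a set of agreed names; none of these names come from the real systems.

```python
def unknown_references(interfaces, known_systems, known_data_objects):
    """Return (interface id, field, value) for each name missing from the reference lists."""
    problems = []
    for record in interfaces:
        # System names should match the corporate services catalog.
        for field in ("source_system", "target_system"):
            if record[field] not in known_systems:
                problems.append((record["id"], field, record[field]))
        # Data objects should match entities in the Data Dictionary.
        if record["data_object"] not in known_data_objects:
            problems.append((record["id"], "data_object", record["data_object"]))
    return problems
```

An empty result would mean every interface record uses agreed system names and data object names; anything returned is a candidate for correction or for a new entry in the relevant reference list.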

I am able to view and search the database using SQL Developer as my client software, and a simple SQL query reveals data like this:

If anyone wishes to know more about the full set of detail we’re capturing per interface, feel free to get in touch. I would also be interested in others’ experiences with interface catalogs.

Master Data, Data Integration and a JISC Project

We’re very pleased to have been awarded a project under the JISC Transformations Programme.

During the project we plan to use JISC resources such as the ICT Strategic Toolkit along with the support of the JISC Transformations programme and continued development of our institutional Enterprise Architecture approach to tackle the problem of achieving full integration of our various learning, teaching and research systems. We are in the process of documenting our core “master” data model and mapping the interrelationship of the data models implemented in our wide-ranging systems. This is because we need to consider how we may improve the sustainability of data exchange between systems without an on-going reliance on multiple point-to-point systems integrations – integrations that are resource-intensive and complex to maintain.

By core data model I mean the data model that is core to the business of the University and that is relatively unchanging over time. We are modelling entities such as Student, Programme, Unit, Researcher, Department, Research Output etc. and the relationships between them. We are also working on the classification schemes we use, such as the one that defines the University structure for faculties, schools and departments (this is currently undergoing a standardisation process internally). Documenting this data model – and maintaining a version-controlled history of it over time – will mean that our developers will be able to make reference to the core data model when developing new system solutions (thus avoiding potential ambiguity in the way information is shared between systems and with end-users), and we will be able to be clear about how new, external systems will need to be integrated to fit with our core data model. Finally, implementing integration support at the middleware layer will take us further on the road to ICT Maturity. We are currently somewhere between the stages of “technology standardisation” and “optimised core” as illustrated in the diagram.

Requirements are being driven by several large-scale projects at the University of Bristol including the Managing Information Project (with its emphasis on business intelligence), the Performance Enhancement Project (which seeks to provide better quality data to support our staff review and progression processes) and the University Web Project (which focuses on providing new and improved public Web content following the recent purchase of our new Content Management System).

Please see the blog for the project (http://coredataintegration.isys.bris.ac.uk/about/), where more info will appear over time.