British Library - Developing Enhancements to the Web Curator Tool
The Web Curator Tool (WCT) is a tool for managing the selective web harvesting process. It is typically used by national libraries and other collecting institutions to preserve online documentary heritage. The WCT is enterprise-class software written in Java, and is designed for use by non-technical users such as librarians. The software was developed jointly by the National Library of New Zealand and the British Library, and has been released as free (open source) software for the benefit of the international collecting community. The British Library commissioned Oakleigh Consulting’s Technology Division to implement major enhancements to the application during periods of accelerated development spanning the next four years.
The British Library needs to actively archive the changing content of certain parts of the internet to prevent the loss of significant information for current and future generations. This need to archive sections of the internet is internationally recognised and the British Library is involved in various activities and consortia connected to this challenge, including the International Internet Preservation Consortium (IIPC), and the UK Web Archiving Consortium (UKWAC). Currently, UKWAC uses a tool called PANDAS to manage the UK Web archive. To better fit future needs and to lower operational costs, UKWAC has decided to move to a toolset based on the IIPC tools: the Heritrix web archiving crawler, the NutchWAX web archive full-text search engine and the Wayback machine to display archived sites.
The Web Curator Tool (WCT) has been designed to provide the workflow capabilities of PANDAS while also integrating with the IIPC toolset.
The Web Curator Tool has been developed using established Java technologies: JDK 1.6, Tomcat 6.0, the Spring Framework and Hibernate 3.2. The tool is now maintained as an open source application under the Apache License and is hosted on SourceForge. One of the major challenges of the assignment was to ensure that all interested parties were kept fully informed of the implications of the phases of accelerated development that were being commissioned.
The British Library needed to establish a relationship with a supplier to maintain and further develop the capabilities of the WCT to reflect the changing needs of its Web Archiving Team over a four-year period. They chose Oakleigh Consulting to fulfil this role.
Oakleigh is using a three-phase approach to deliver this project:
- Project Initiation – this phase was concerned with project setup and the initial agreement of processes and service level agreements between Oakleigh and the British Library. This phase also developed an initial view of the business value of each requirement and an estimate for its delivery, for inclusion in the requirements register (or ‘product backlog’ in Scrum terminology) used during the development and support phase.
- Development and Support – Oakleigh is using a Scrum-based agile approach to deliver the requirements. This phase is made up of a number of ‘sprints’: 20-day iterations that each deliver an agreed set of functionality. A meeting is held between the British Library and Oakleigh at the beginning of each sprint to define the functionality to be delivered, based on business value and effort required. At the end of each sprint, a period of system testing is carried out by British Library staff to ensure that the agreed functionality has been delivered and that the system remains fit for purpose. Support issues are prioritised by severity by the British Library. The most severe issues are addressed immediately, which takes development effort away from the current sprint. Less severe issues are recorded and flagged for potential inclusion in the next or subsequent sprints.
- Annual Review – this phase is conducted annually at the end of the Library’s financial year. It comprises an audit of the current system’s maintainability and extensibility and a report that identifies the work carried out over the previous year, presents the audit’s findings and identifies opportunities for improvement.
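The sprint-planning step described above, choosing work by business value and effort, can be illustrated with a small sketch. This is not Oakleigh's actual planning method, which the paper does not detail. It assumes a simple greedy rule (highest value per day of effort first) and hypothetical names (`BacklogItem`, `SprintPlanner`, `valuePerDay`), and it is written for a current JDK rather than the JDK 1.6 named in this paper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical backlog entry; the fields are illustrative, not taken from the WCT project. */
final class BacklogItem {
    final String title;
    final int businessValue; // relative score agreed with the customer
    final int effortDays;    // estimated effort in working days

    BacklogItem(String title, int businessValue, int effortDays) {
        if (effortDays <= 0) {
            throw new IllegalArgumentException("effortDays must be positive");
        }
        this.title = title;
        this.businessValue = businessValue;
        this.effortDays = effortDays;
    }

    double valuePerDay() {
        return (double) businessValue / effortDays;
    }
}

final class SprintPlanner {
    /** Greedy selection (an assumption): best value per day first, skipping items that no longer fit. */
    static List<BacklogItem> plan(List<BacklogItem> backlog, int capacityDays) {
        List<BacklogItem> ranked = new ArrayList<>(backlog);
        // Sort descending by value per day; List.sort is stable, so ties keep backlog order.
        ranked.sort((a, b) -> Double.compare(b.valuePerDay(), a.valuePerDay()));

        List<BacklogItem> sprint = new ArrayList<>();
        int remaining = capacityDays;
        for (BacklogItem item : ranked) {
            if (item.effortDays <= remaining) {
                sprint.add(item);
                remaining -= item.effortDays;
            }
        }
        return sprint;
    }

    public static void main(String[] args) {
        // Made-up requirements, used only to exercise the planner.
        List<BacklogItem> backlog = new ArrayList<>();
        backlog.add(new BacklogItem("Requirement A", 8, 10));
        backlog.add(new BacklogItem("Requirement B", 9, 15));
        backlog.add(new BacklogItem("Requirement C", 3, 5));
        for (BacklogItem item : plan(backlog, 20)) {
            System.out.println(item.title + " (" + item.effortDays + " days)");
        }
    }
}
```

Passing a smaller `capacityDays` is one way to model a sprint that has already lost effort to severe support issues. A real planning meeting would also weigh dependencies and risk, which this sketch ignores.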
Some examples of the tasks performed during this development are as follows:
- Creation of nightly build server environment.
This included the generation of scripts to:
* Check all code out of SourceForge CVS at the current state of development.
* Build all code.
* Deploy the application to a local instance of Apache Tomcat.
* Build deployable (WAR) files and store them by date and build number.
* Run all unit tests.
* Email interested parties as to the success of the checkout, build and test processes.
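The overall shape of such a nightly build can be sketched in Java, the project's own language. The original scripts are not reproduced in this paper, so everything here is an assumption made for illustration: the class name `NightlyBuild`, the rule that later steps are skipped after the first failure, and the WAR naming scheme. The real checkout, build and deploy commands are replaced by placeholder steps.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical nightly build runner: ordered steps, one report line per step. */
final class NightlyBuild {
    // LinkedHashMap keeps the steps in the order they were registered.
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();

    /** Registers a step; the action returns true on success. */
    NightlyBuild step(String name, BooleanSupplier action) {
        steps.put(name, action);
        return this;
    }

    /** Runs the steps in order; after the first failure the rest are marked SKIPPED (an assumption). */
    List<String> run() {
        List<String> report = new ArrayList<>();
        boolean failed = false;
        for (Map.Entry<String, BooleanSupplier> entry : steps.entrySet()) {
            if (failed) {
                report.add(entry.getKey() + ": SKIPPED");
                continue;
            }
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // a step that throws counts as a failure
            }
            report.add(entry.getKey() + ": " + (ok ? "OK" : "FAILED"));
            failed = !ok;
        }
        return report;
    }

    /** Hypothetical naming scheme for stored WAR files: date plus build number. */
    static String archiveName(LocalDate date, int buildNumber) {
        return "wct-" + date + "-build" + buildNumber + ".war";
    }

    public static void main(String[] args) {
        // Placeholder actions stand in for the real checkout, build, deploy and test commands.
        List<String> report = new NightlyBuild()
                .step("checkout", () -> true)
                .step("build", () -> true)
                .step("deploy to Tomcat", () -> true)
                .step("package " + archiveName(LocalDate.now(), 1), () -> true)
                .step("unit tests", () -> true)
                .run();
        // The report lines are what a notification email to interested parties would contain.
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```

Keeping the report separate from the steps means the notification can be sent whether the build succeeds or fails, which matches the last script in the list above.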
- Creation of a local development environment.
This included setting up Eclipse and TortoiseCVS so that developers can check code out and back in; develop, build and debug in Eclipse; and deploy the application to a locally running instance of Tomcat backed by a locally deployed PostgreSQL database.
- Implementation of a comprehensive unit testing strategy using JUnit 4.4.
A collection of unit tests and supporting mock objects has been developed to allow unit and regression testing of developed code, mainly at the page controller and validation class levels.
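To show what testing a validation class against a mock object looks like, here is a minimal sketch. None of these classes come from the WCT code base: `TargetValidator`, `TargetLookup`, `MockTargetLookup` and the error codes are hypothetical. The sketch also uses a plain `main` method so that it runs without JUnit on the classpath; in a JUnit 4 suite each check would be a method annotated with `@Test` and would call `Assert.assertEquals`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical collaborator; in production this would query the database. */
interface TargetLookup {
    boolean nameExists(String name);
}

/** Hypothetical validation class that collects error codes instead of throwing. */
final class TargetValidator {
    private final TargetLookup lookup;

    TargetValidator(TargetLookup lookup) {
        this.lookup = lookup;
    }

    List<String> validate(String name, String seedUrl) {
        List<String> errors = new ArrayList<>();
        if (name == null || name.trim().isEmpty()) {
            errors.add("name.required");
        } else if (lookup.nameExists(name)) {
            errors.add("name.duplicate");
        }
        if (seedUrl == null
                || !(seedUrl.startsWith("http://") || seedUrl.startsWith("https://"))) {
            errors.add("seed.invalid");
        }
        return errors;
    }
}

/** Hand-written mock: canned answers plus a record of the calls it received. */
final class MockTargetLookup implements TargetLookup {
    final Set<String> existing = new HashSet<>();
    final List<String> queried = new ArrayList<>();

    @Override
    public boolean nameExists(String name) {
        queried.add(name);
        return existing.contains(name);
    }
}

final class TargetValidatorCheck {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        MockTargetLookup mock = new MockTargetLookup();
        mock.existing.add("Example Target");
        TargetValidator validator = new TargetValidator(mock);

        check(validator.validate("Example Target", "http://www.example.org/")
                .equals(Arrays.asList("name.duplicate")), "duplicate name is reported");
        check(validator.validate("", "ftp://example.org/")
                .equals(Arrays.asList("name.required", "seed.invalid")), "both errors are reported");
        check(validator.validate("New Target", "https://www.example.org/").isEmpty(),
                "valid input produces no errors");
        // The mock also lets the test confirm that an empty name never reaches the lookup.
        check(mock.queried.equals(Arrays.asList("Example Target", "New Target")),
                "lookup is queried only for non-empty names");
        System.out.println("All checks passed");
    }
}
```

Because the mock replaces the database-backed lookup, the validator can be regression-tested on the nightly build server without a running PostgreSQL instance.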
At the end of the first period of the contract, Oakleigh’s developers had delivered all of the British Library’s requirements for that period on time and to budget. A new version of the Web Curator Tool (1.3) was deployed onto the British Library’s production servers and was then released on SourceForge for the benefit of the entire archiving community. Oakleigh Consulting’s support and development of the Web Curator Tool continues in the same manner into the next period of the project, when further important developments will be made.
If you have any questions about the subjects covered in this white paper or you would like to find out more about how Oakleigh Consulting could help your organisation, please contact us on 0161 835 4100 or email us.
You may publish, quote or reproduce any white papers on this website on the condition that Oakleigh Consulting Ltd is notified, properly credited (and linked to) as the source, including our URL: www.oakleigh.co.uk.