Sunday, 26 February 2012


The project plan

Aims, Objectives and Final Output(s) of the project 

Trenches to Triples (T3) will provide Linked Data markup to 200 collection level descriptions and 6,000 item level catalogue entries relating to the First World War from the Liddell Hart Centre for Military Archives, an archive holding Designated Status from the MLA for the national significance of its collections. In so doing, the project will improve access to information at King’s in anticipation of the centenary commemorations of the War and meet the already sustained demand for original material about the First World War. The project will also add value to JISC investment in the Open Metadata Pathway project and the more recently awarded Step Change project (see http://openmetadatapathway.blogspot.com for more information about Open Metadata Pathway and Step Change).

The main outputs of this project will be:

  • Development of a tool for the creation and validation of Linked Data markup of catalogue data
  • RDFa data created and validated for 200 collection level descriptions and 6,000 item level catalogue entries from the Liddell Hart Centre for Military Archives
  • Development of a web service to publish the RDFa data
  • Toolkit providing guidance on archival workload, cost and technical requirements for RDFa data creation and processing
  • Tested approach to updating and editing legacy catalogues, and to Linked Data analysis and re-export of A2A catalogue data


Wider Benefits to Sector & Achievements for Host Institution 

The benefits this project will bring to the host institution and the wider HE sector are as follows:

  • Increased accessibility of a body of World War One research material for academic and popular research
  • New level of granularity for World War One subject (concepts and complex subjects), corporate, personal and place names added to processed data and to UKAT service for export and publication
  • Exemplar of linking summary and detailed archive catalogue descriptions with image metadata
  • Development of a tool for the creation and validation of Linked Data, which will be available to the wider research community
  • Toolkit providing guidance on archival workload, cost and technical requirements for RDFa data creation and processing for use by other institutions considering the use of Linked Data


Risk Analysis and Success Plan 

| Risk | Probability (1-5) | Severity (1-5) | Score (P x S) | Actions to prevent / manage risk |
|---|---|---|---|---|
| Difficulty in recruiting and retaining staff | 1 | 3 | 3 | The archivist post will be widely advertised in the professional press and lists. In the event of difficulty, a secondment will be offered which is likely to prove attractive to young professionals seeking to develop their knowledge. The Departments of War Studies, Defence Studies, English and Psychiatry can draw upon a large pool of postgraduates with an interest in the First World War. We have a very open view of disciplinary requirements. Technical staff are already in place and knowledge sharing during the course of the project will provide some resilience. In the event of staff losses, agency staff will be brought in or secondments from the Department of Digital Humanities or similar sought. |
| Technical development sub-contractor (Imagiz) goes into receivership | 2 | 2 | 4 | Member of development company to act as independent consultant. |
| Terms to be ‘linked’ insufficiently scoped | 2 | 1 | 2 | Pilot test phase to refine methodology for selecting concepts, subjects, corporations and dates, with outcomes reviewed and prioritised by an academic focus group to ensure effective use of resource. Employment of postgraduate researchers familiar with World War One to ensure terms include a reflection of new research trends. Detailed project planning. |
| Failure to meet project milestones | 2 | 3 | 6 | Produce project plan with clear objectives. Continuous project assessment and close communication between project manager, technical leads, and JISC programme manager to ensure targets are realistic, achievable and focus on project goals. |


IPR
IPR in all reports and other documents produced by the project will be retained jointly by King’s College London and ULCC but made freely available on a non-exclusive licence as required/advised by JISC. All software and data created during the project will be made available to the community on an open licence. We will respect the licence model of all third party software used during the project, most of which is made available under open source licences.

Project Team Relationships and End User Engagement 
The project will be overseen by a board comprising: Patricia Methven, Director of Archives and Information Management (Chair); Geoffrey Browell, Senior Archivist, King’s College Archives Services; Lianne Smith, Archives Services Manager, King’s College Archives Services; Rory McNicholl, Senior Developer, ULCC; Pete Vox, Developer, Imagiz; Richard Davis, Development Manager and Technical Coordinator, ULCC; and the archivist and postgraduate researchers recruited for the project. Representation will also be sought from relevant projects funded in the programme call and from JISC itself. Geoffrey Browell will serve as the Technical Coordinator for the project. Lianne Smith will manage the project, including timetabling, oversight of staff employed, budget management and liaison with JISC. Day to day work will be undertaken by the recruited archivist, who will be responsible for coordinating the work of the postgraduates recruited to ensure the robustness of terminology chosen across a range of disciplines.

Tools produced will be widely available to the research community, including Mimas, in line with the JISC Discovery taskforce mission. The project will establish a project blog and Twitter feed to record progress and invite comment. Information concerning the project will be circulated on academic and popular history lists and relevant websites. The project team will work proactively with other Discovery activities and projects to identify synergistic goals and approaches. It will also work with the Open/Linked Data and Semantic Web communities to ensure the maximum dissemination opportunities for outputs. Services such as LinkedData.org and PTWS.com will be used to publicise the availability of the data. Project outputs will be made available on the project website. Dissemination to the wider archival, museum and library communities will be offered through the professional conferences and press of ARA, CILIP, RLUK, SCONUL and the Museums Association. Websites such as Culture24 and those provided by the Collections Trust will also be notified.

Projected Timeline, Workplan & Overall Project Methodology 

Work package 1: Project management
WP1 will assemble the project team, including the recruitment of the Project Archivist via advertising in the professional press and mailing lists, and the recruitment of postgraduate researchers from King’s College London to provide expertise on research trends and terminology; establish the project board; ensure liaison between parties and the JISC programme manager; deliver the detailed project plan; provide progress and risk assessment reports; organise project meetings; and deliver the exit and sustainability plan.

Work package 2: Creation of RDFa catalogue infrastructure and development of web service: This package will build the infrastructure to allow catalogue data to be exchanged and dynamically updated.
The LHCMA military collection level descriptions and file level catalogues are held in a Content Management System (ModX) visible in a public website. As an extension of this work, a web service (W/S) will be created to supply this data in a structured format that can be requested and consumed (in real-time) by remote websites/applications. This data feed will adhere to a documented RDF and will supply data in a format based on an EAD schema, including image links and metadata. A number of technical and data improvements need to be made to enable data to be output as a web service. These include: storage and display of master catalogue data (summary level in ISAD(G), with further detail pages/images), requiring the cleaning and optimisation of detailed documents, and ‘fragmentation’ into subsets of related XML data ‘nodes’; these XML nodes will require a logical filing structure, and a management module that will allow administrators to combine nodes into related groups/documents, and allocate internal system reference IDs. The W/S will also receive updated data from the workflow tool (see WP3) and update catalogue records accordingly, ensuring data relevance and consistency. The W/S design will include an authentication process to validate requests for data, and any updates received. Subsets of images will also be viewable/available as part of this data via a management module that will enable operators to build image galleries and PDF documents from high-resolution source images held in the Celum Digital Asset Management System at King’s College London created for the Serving Soldier project.
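As a rough illustration of the ‘fragmentation’ and grouping described above, the following minimal Python sketch splits a catalogue document into nodes with internal reference IDs, recombines selected nodes into a group, and signs the payload so that a request or update can be validated. It is not the project's implementation: the element names, the ID scheme, the sample records and the shared-secret HMAC check are all assumptions made for the example, since the plan does not specify how the web service will authenticate requests.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical shared secret; the plan does not say how requests are authenticated.
SHARED_SECRET = b"example-secret"

# Illustrative catalogue fragment; reference codes and titles are made up.
SAMPLE = """<dsc>
  <c level="file"><unitid>EXAMPLE/1</unitid><unittitle>Letters from the Western Front</unittitle></c>
  <c level="file"><unitid>EXAMPLE/2</unitid><unittitle>Trench maps</unittitle></c>
</dsc>"""


def fragment(catalogue_xml, prefix="node"):
    """Split a catalogue document into component nodes keyed by an internal ID."""
    root = ET.fromstring(catalogue_xml)
    nodes = {}
    for index, component in enumerate(root.iter("c"), start=1):
        nodes[f"{prefix}-{index:04d}"] = component
    return nodes


def combine(nodes, node_ids, group_id):
    """Assemble the selected nodes into one grouped document."""
    group = ET.Element("group", {"id": group_id})
    for node_id in node_ids:
        wrapper = ET.SubElement(group, "node", {"id": node_id})
        wrapper.append(nodes[node_id])
    return group


def sign(payload):
    """Return a hex HMAC-SHA256 signature for a request or update payload."""
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()


def is_authentic(payload, signature):
    """Check a signature in constant time before accepting the payload."""
    return hmac.compare_digest(sign(payload), signature)


nodes = fragment(SAMPLE)
group = combine(nodes, ["node-0002"], "group-1")
payload = ET.tostring(group)
```

A receiving application would call is_authentic(payload, signature) before consuming or applying the data; any scheme that lets the service validate both requests and updates would serve the same purpose.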

Work package 3: Adaptation of workflow tool: This package will allow the OMP tool for summary guides to be adapted for detailed catalogues and batch analysis.
The workflow tool that was developed for the OMP project, and that is currently being refined for Step Change, will allow archivists to process and refine catalogue data as linked data as part of the normal cataloguing process. Some further refinements to the tool will be necessary to enable data held in ModX to be communicated to the workflow module and vice versa. These include: the ability to accept input data from an external CMS (currently input is accepted only from AIM25 EAD/ISAD(G)); adaptation of the tool, which is currently designed to process only ISAD(G) summary descriptions, to process full catalogues; the ability to browse for and display additional structured data; and added capacity in the interface to allow the selection and analysis of such data. The infrastructure will return results of analysis to source via ModX, and the results of analysis will be stored in AIM25-UKAT. To avoid inaccuracies between the two sources, AIM25-UKAT will store a reference to the ModX instance.
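The idea of storing a reference rather than a copy can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a description of the actual ModX or AIM25-UKAT data models: the class names, fields and sample record are hypothetical. The term index holds only record IDs, so a title corrected in the source record is picked up on the next lookup and the two stores cannot drift apart.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """A catalogue entry as held in the source CMS (hypothetical fields)."""
    record_id: str
    title: str
    terms: list = field(default_factory=list)


@dataclass
class TermIndex:
    """Stand-in for the term store: maps each term to source record IDs only."""
    references: dict = field(default_factory=dict)

    def link(self, term, record_id):
        self.references.setdefault(term, set()).add(record_id)


def return_analysis(source, index, record_id, terms):
    """Write analysis results back to the source record and index them by reference."""
    record = source[record_id]
    for term in terms:
        if term not in record.terms:
            record.terms.append(term)
        index.link(term, record_id)


def lookup(source, index, term):
    """Resolve references at read time so titles always come from the source."""
    return [source[rid].title for rid in sorted(index.references.get(term, ()))]


# Illustrative data only.
source = {"rec-1": SourceRecord("rec-1", "Diary, 1916")}
index = TermIndex()
return_analysis(source, index, "rec-1", ["Somme"])
before_edit = lookup(source, index, "Somme")

# A later correction in the source is visible through the index without re-syncing.
source["rec-1"].title = "Diary, July 1916"
after_edit = lookup(source, index, "Somme")
```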

Work package 4: Pilot testing of military catalogue processing: Testing the import of data, processing and export and identifying and prioritising World War One terms to be marked up
The technical team (Imagiz (commercial sub-contractor), ULCC and Geoff Browell), the Project Archivist and the Project Manager, Lianne Smith, will work together to batch process test samples of LHCMA military catalogues relating to World War One using the workflow tool, which will interrogate samples in real time. The results will inform the main detailed analysis of World War One related catalogue entries in WP5. Two postgraduate researchers and an academic focus group (WP8) will identify relevant themes (including concepts and complex subjects), places, dates and other entities relating to the War that will be used to improve the semantic analysis and UKAT refinement outlined in WP6. This WP will also be informed by the conclusions of the JISC ITT call on First World War sources.

Work package 5: Detailed semantic analysis of WW1 related catalogue entries by archivist: Bulk processing of catalogues, analysis by Project Archivist and availability via a web service.
Catalogue data held in ModX and sample Serving Soldier image metadata, both relating to the First World War, will be processed using the RDFa workflow tool developed by ULCC for the OMP project. This is currently being refined for the speedier processing of bulk metadata for the Step Change project, work which will be completed by February 2012. Processing will take place against AIM25-UKAT, Open Calais and other linked data services where appropriate. This processing will identify subject, personal, corporate and place name entities and triples where appropriate. Following refinement in WP4, the archivist will analyse the main body of material and validate and add new entities, including subject terms, which the analysis has failed to highlight.

These terms will be added to AIM25-UKAT where applicable. The final selection of entities will be stored alongside the catalogue image metadata in ModX and made available to other repositories via a new web service. This work package will also identify data discrepancies, for example variations of name and place, and will demonstrate how the workflow tool can be used to improve existing catalogues where this data is inconsistent. Some key LHCMA military catalogues relating to the First World War are held exclusively on The National Archives’ A2A website. The quality of this older data is very uneven and is representative of the variable quality of A2A data more generally across all repositories. In this instance, the data will be re-acquired from the TNA and the archivist will use the workflow tool to add RDFa metadata to enhance and improve accessibility to this legacy material, and in so doing, to provide an exemplar of the way in which this may be done across all content held in A2A in the future.
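One simple way to surface the kind of name and place variations mentioned above is to reduce each form to a comparison key and group forms that share a key. The Python sketch below is only an illustration of that idea and is not the method used by the workflow tool: the normalisation rules and the sample names are assumptions, and the groups it produces are candidates for the archivist to review rather than confirmed matches.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(name):
    """Reduce a name to a comparison key, ignoring accents, case, punctuation and word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", unaccented.lower())
    return " ".join(sorted(words))


def find_variants(names):
    """Group distinct forms sharing a key; only groups with more than one form are returned."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise(name)].add(name)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Illustrative catalogue values only.
names = ["Ypres, Belgium", "Belgium, Ypres", "YPRES (Belgium)", "Arras, France"]
variants = find_variants(names)
```

Because word order is ignored, the key is deliberately loose; a stricter key would reduce false matches at the cost of missing some genuine variants.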

Work package 6: Provide detailed First World War terminology in UKAT: UKAT currently lacks detailed terminology relating to the First World War. Terms and concepts identified in the processing will be added to UKAT and made available via the Step Change web service.
The Project Archivist will add granularity within AIM25-UKAT for terms relating to the First World War drawn from the semantic analysis of the catalogues, for example individual battles, personnel, and geographical locations on the Western Front and in other theatres of the war. These terms will be identified during the processing phase, updated dynamically in UKAT and output via the new UKAT API being developed for the Step Change project. The extra level of granularity will provide an agreed controlled vocabulary of terms in linked data format that will be readily exportable and can be used by JISC and other projects as part of the suite of World War One commemoration websites and aggregation tools.
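To make "a controlled vocabulary of terms in linked data format" more concrete, the sketch below expresses one narrower term as SKOS triples and serialises them as N-Triples using plain Python. It is an assumption-laden illustration: the plan does not state that UKAT terms will be modelled in SKOS, and the base namespace, slugs and helper names here are hypothetical rather than the real UKAT identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vocab/"  # hypothetical namespace, not the real UKAT one


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_triples(slug, label, broader=None):
    """Describe one vocabulary term as a SKOS concept with an optional broader term."""
    subject = f"<{BASE}{slug}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (subject, f"<{SKOS}prefLabel>", f'"{escape(label)}"@en'),
    ]
    if broader:
        triples.append((subject, f"<{SKOS}broader>", f"<{BASE}{broader}>"))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = concept_triples("battle-of-the-somme", "Battle of the Somme", broader="first-world-war")
output = to_ntriples(triples)
```

Linking each new term to a broader one is what would give the added granularity its place in the existing hierarchy, so that consumers of the vocabulary can still navigate from the general subject to an individual battle.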

Work package 7: Toolkit development
This covers the creation of guidance for RDFa data definition through the batch processing of detailed catalogue data and the detailed semantic analysis of the processed data, and the dissemination of this data through the construction of web tools. It will collate information about archival workload, the cost and technical aspects from the members of the project group, which will provide a set of practical guidelines to be used to assist other institutions wishing to enrich their resource discovery tools through the use of Linked Data.

Work package 8: Evaluation
Two focus groups will meet. One of academic users will meet at the beginning to advise on terminology to add to UKAT and to ensure that the marked up data will reflect current and emerging trends of research. The other group, made up of archivists, will test the revised workflow tool to ensure usability.

Work package 9: Dissemination
Tools produced will be widely available to the research community, including Mimas, in line with the JISC Discovery taskforce mission. The project will establish a project blog and Twitter feed to record progress and invite comment. Information concerning the project will be circulated on academic and popular history lists and relevant websites. The project team will work proactively with other Discovery activities and projects to identify synergistic goals and approaches. It will also work with the Open/Linked Data and Semantic Web communities to ensure the maximum dissemination opportunities for outputs. Services such as LinkedData.org and PTWS.com will be used to publicise the availability of the data. Project outputs will be made available on the project website. Dissemination to the wider archival, museum and library communities will be offered through the professional conferences and press of ARA, CILIP, RLUK, SCONUL and the Museums Association.

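| 2012 | Feb | Mar | Apr | May | Jun | Jul |
|---|---|---|---|---|---|---|
| WP1 | X | X | X | X | X | X |
| WP2 | X | X |  |  |  |  |
| WP3 | X |  |  |  |  |  |
| WP4 |  | X |  |  |  |  |
| WP5 |  | X | X | X | X |  |
| WP6 |  |  | X | X | X | X |
| WP7 |  |  |  |  | X | X |
| WP8 | X |  |  |  | X |  |
| WP9 |  | X | X | X | X | X |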

Budget