The project plan
Aims, Objectives and Final Output(s) of the project
Trenches to Triples (T3) will provide Linked Data markup to 200
collection level descriptions and 6,000 item level catalogue entries relating
to the First World War from the Liddell Hart Centre for Military Archives, an
archive holding Designated Status from the MLA for the national significance of
its collections. In so doing, the project will improve access to information at
King’s in anticipation of the centenary celebrations of the War and meet
already sustained demand for original material about the First World War. The
project will also add value to JISC investment in the Open Metadata Pathway
project and the recently awarded Step Change project (see http://openmetadatapathway.blogspot.com
for more information about Open Metadata Pathway and Step Change).
The main outputs of this project will be:
- Development of a tool for the creation and validation of Linked Data markup of catalogue data
- RDFa data created and validated for 200 collection level descriptions and 6,000 item level catalogue entries from the Liddell Hart Centre for Military Archives
- Development of a web service to publish the RDFa data
- Toolkit providing guidance on archival workload, cost and technical requirements for RDFa data creation and processing
- A tested approach to updating and editing legacy catalogues, and to the Linked Data analysis and re-export of A2A catalogue data
Wider Benefits to Sector & Achievements for Host Institution
The benefits
this project will bring to the host institution and the wider HE sector are as
follows:
- Increased accessibility of a body of World War One research material for academic and popular research
- New level of granularity for World War One subjects (concepts and complex subjects) and for corporate, personal and place names, added to the processed data and to the UKAT service for export and publication
- Exemplar of linking summary and detailed archive catalogue descriptions with image metadata
- Development of a tool for the creation and validation of Linked Data which will be available to the wider research community
- Toolkit providing guidance on archival workload, cost and technical requirements for RDFa data creation and processing for use by other institutions considering the use of Linked Data
Risk Analysis and Success Plan
| Risk | Probability (1-5) | Severity (1-5) | Score (P x S) | Action to prevent / manage risk |
| --- | --- | --- | --- | --- |
| Difficulty in recruiting and retaining staff | 1 | 3 | 3 | The archivist post will be widely advertised in the professional press and lists. In the event of difficulty, a secondment will be offered, which is likely to prove attractive to young professionals seeking to develop their knowledge. The Departments of War Studies, Defence Studies, English and Psychiatry can draw upon a large pool of postgraduates with an interest in the First World War. We have a very open view of disciplinary requirements. Technical staff are already in place and knowledge sharing during the course of the project will provide some resilience. In the event of staff losses, agency staff will be brought in or secondments sought from the Department of Digital Humanities or similar. |
| Technical development sub-contractor (Imagiz) goes into receivership | 2 | 2 | 4 | Member of development company to act as independent consultant. |
| Terms to be ‘linked’ insufficiently scoped | 2 | 1 | 2 | Pilot test phase to refine the methodology for selecting concepts, subjects, corporations and dates, with outcomes reviewed and prioritised by an academic focus group to ensure effective use of resource. Employment of postgraduate researchers familiar with World War One to ensure terms reflect new research trends. Detailed project planning. |
| Failure to meet project milestones | 2 | 3 | 6 | Produce project plan with clear objectives. Continuous project assessment and close communication between project manager, technical leads and JISC programme manager to ensure targets are realistic, achievable and focused on project goals. |
IPR
IPR in all reports and other documents produced by the project will be retained jointly by King’s College London and ULCC but made freely available under a non-exclusive licence as required/advised by JISC. All software and data created during the project will be made available to the community under an open licence. We will respect the licence models of all third-party software used during the project, most of which is made available under open source licences.
Project Team Relationships and End User Engagement
The project
will be overseen by a board comprising: Patricia Methven, Director of Archives
and Information Management (Chair); Geoffrey Browell, Senior Archivist, King’s
College Archives Services; Lianne Smith, Archives Services Manager, King’s
College Archives Services; Rory McNicholl, Senior Developer, ULCC; Pete Vox,
Developer, Imagiz; Richard Davis, Development Manager and Technical
Coordinator, ULCC; and the archivist and postgraduate researchers recruited for
the project. Representation will also be sought from relevant projects funded
in the programme call and from JISC itself. Geoffrey Browell will serve as
the Technical Coordinator for the project. Lianne Smith will manage the project,
including timetabling, oversight of staff employed, budget management and liaison
with JISC. Day-to-day work will be undertaken by the recruited archivist, who will
be responsible for coordinating the work of the postgraduates recruited to ensure
the robustness of the terminology chosen across a range of disciplines.
Tools
produced will be widely available to the research community in line with the
JISC Discovery taskforce mission including Mimas. The project will establish a
project blog and Twitter feed to record progress and invite comment.
Information concerning the project will be circulated on academic and popular
history lists and relevant websites. The project team will work proactively
with other Discovery activities and projects, to identify synergistic goals and
approaches. It will also work with the Open/Linked Data and Semantic Web
communities to ensure the maximum dissemination opportunities for outputs.
Services such as LinkedData.org and PTWS.com will be used to publicise the
availability of the data. Project outputs will be made available on the project
website. Dissemination to the wider archival, museum and library communities will be
offered through professional conferences and the press of ARA, CILIP, RLUK, SCONUL
and the Museums Association. Websites such as Culture24 and those provided by the
Collections Trust will also be notified.
Projected Timeline, Workplan & Overall Project Methodology
Work package 1: Project management
WP1 will assemble the project team, including the recruitment of the Project Archivist
via advertising in professional press and mailing lists and the recruitment of postgraduate
researchers from King’s College London to provide expertise on research
trends and terminology; establish the project board; ensure liaison between
parties and the JISC programme manager; deliver the detailed project plan; provide
progress and risk assessment reports; organise project meetings; and deliver
the exit and sustainability plan.
Work package 2: Creation of RDFa catalogue infrastructure and development of web service: This
package will build the infrastructure to allow catalogue data to be exchanged
and dynamically updated.
The LHCMA
military collection level descriptions and file level catalogues are held in a
Content Management System (ModX) visible in a public website. As an extension
of this work, a web service (W/S) will be created to supply this data in a
structured format that can be requested and consumed (in real-time) by remote
websites/applications. This data feed will adhere to a documented RDF and will
supply data in a format based on an EAD schema, including image links and
metadata. A number of technical and data improvements need to be made to enable
data to be output as a web service. These include the storage and display of
master catalogue data (summary level in ISAD(G), with further detail
pages/images), which requires the cleaning and optimisation of detailed documents
and their ‘fragmentation’ into subsets of related XML data ‘nodes’. These XML nodes
will require a logical filing structure and a management module that will
allow administrators to combine nodes into related groups/documents and
allocate internal system reference IDs. The W/S will also receive updated data
from the workflow tool (see WP3) and update catalogue records accordingly,
ensuring data relevance and consistency. The W/S design will include an
authentication process to validate requests for data, and any updates received.
Subsets of images will also be viewable/available as part of this data via a
management module that will enable operators to build image galleries and PDF
documents from high-resolution source images held in the Celum Digital Asset
Management System at King’s College London created for the Serving Soldier
project.
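As a rough illustration of the ‘fragmentation’ step described above, the following Python sketch splits a small EAD-like XML document into separate nodes and allocates an internal reference ID to each. The sample XML, the `fragment` function and the `T3-` ID scheme are all assumptions made for illustration; they do not represent the project's actual data model or the ModX implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD-like catalogue (not real LHCMA data).
SAMPLE = """
<archdesc level="collection">
  <did><unittitle>Example collection</unittitle></did>
  <dsc>
    <c level="file"><did><unittitle>Example file one</unittitle></did></c>
    <c level="file"><did><unittitle>Example file two</unittitle></did></c>
  </dsc>
</archdesc>
"""


def fragment(xml_text, prefix="T3"):
    """Split a catalogue into per-component nodes keyed by an internal ID."""
    root = ET.fromstring(xml_text)
    nodes = {}
    # The collection-level description is the first node; each <c>
    # component below it becomes a node of its own, in document order.
    components = [root] + list(root.iter("c"))
    for number, element in enumerate(components, start=1):
        ref_id = f"{prefix}-{number:04d}"  # assumed ID scheme
        nodes[ref_id] = {
            "level": element.get("level"),
            "title": element.findtext("did/unittitle"),
        }
    return nodes


nodes = fragment(SAMPLE)
for ref_id, node in nodes.items():
    print(ref_id, node["level"], node["title"])
```

A management module of the kind described could then group these IDs into related documents; that grouping is not shown here.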
Work package 3: Adaptation of workflow tool: This package will allow the OMP tool for
summary guides to be adapted for detailed catalogues and batch analysis.
The workflow tool that was developed for the OMP project, and that is currently being
refined for Step Change, will allow archivists to process and refine catalogue
data as linked data as part of the normal cataloguing process. Some further
refinements to the tool will be necessary to enable data held in ModX to be
communicated to the workflow module and vice versa. These include the ability
to accept input data from an external CMS (currently input is only accepted from
AIM25 EAD/ISAD(G)); adaptation of the tool, which is currently designed to process only
ISAD(G) summary descriptions, to process full catalogues; the ability to
browse for and display additional structured data; and additional capacity in the
interface to allow the selection and analysis of such data. The infrastructure
will return the results of analysis to source via ModX, and the results
will also be stored in AIM25-UKAT. To avoid inconsistencies between the two sources,
AIM25-UKAT will store a reference to the ModX instance.
Work package 4: Pilot testing of military catalogue processing: Testing the import of
data, processing and export, and identifying and prioritising World War One
terms to be marked up.
The
technical team (Imagiz (commercial sub-contractor), ULCC and Geoff Browell),
Project Archivist and Project Manager, Lianne Smith, will work together to
batch process test samples of LHCMA military catalogues relating to World War
One using the workflow tool which will interrogate samples in real time. The
results will inform the main detailed analysis of World War One related
catalogue entries in WP5. Two postgraduate researchers and an academic focus
group (WP8) will identify relevant themes (including concepts and complex
subjects), places, dates and other entities relating to the War that will be
used to improve the semantic analysis and UKAT refinement outlined in WP6. This
WP will also be informed by the conclusions of the JISC ITT call on First World
War sources.
Work package 5: Detailed semantic analysis of WW1 related catalogue entries by archivist: Bulk
processing of catalogues, analysis by Project Archivist and availability via a
web service.
Catalogue
data held in ModX and sample Serving Soldier image metadata, both relating to
the First World War, will be processed using the RDFa workflow tool developed
by ULCC for the OMP project. This is currently being refined for the speedier
processing of bulk metadata for the Step Change project, work which will be
completed by February 2012. Processing will take place against AIM25-UKAT, Open
Calais and other linked data services where appropriate. This processing will
identify subject, personal, corporate and place name entities and triples where
appropriate. Following refinement in WP4, the archivist will analyse the main body
of material and validate and add new entities, including subject terms, which
the analysis has failed to highlight.
These terms
will be added to AIM25-UKAT where applicable. The final selection of entities
will be stored alongside the catalogue image metadata in ModX and made
available to other repositories via a new web service. This work package will
also identify data discrepancies, for example variations of name and place, and
will demonstrate how the workflow tool can be used to improve existing
catalogues where this data is inconsistent. Some key LHCMA military catalogues
relating to the First World War are held exclusively on The National Archives’
A2A website. The quality of this older data is very uneven and is
representative of the variable quality of A2A data more generally across all
repositories. In this instance, the data will be re-acquired from TNA and
the archivist will use the workflow tool to add RDFa metadata to enhance and
improve accessibility to this legacy material, and in so doing, to provide an
exemplar of the way in which this may be done across all content held in A2A in
the future.
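The matching of catalogue text against a controlled vocabulary can be sketched as below. This is a minimal illustration only, not the ULCC workflow tool: the `VOCABULARY` entries, the example.org URIs and the choice of `schema:mentions` with schema.org types are assumptions, whereas `property`, `typeof` and `resource` are standard RDFa attributes. Matching here is exact and case-sensitive, which a real tool would need to relax in order to catch the name and place variations mentioned above.

```python
import re
from html import escape

# Hypothetical controlled vocabulary: term -> (type, identifier).
# The URIs are placeholders, not AIM25-UKAT identifiers.
VOCABULARY = {
    "Western Front": ("schema:Place", "http://example.org/term/western-front"),
    "Battle of the Somme": ("schema:Event", "http://example.org/term/battle-of-the-somme"),
}


def mark_up(text, vocabulary):
    """Wrap each known term in an RDFa span; return the markup and the terms found."""
    # Try longer terms first so a long term wins over a shorter one it contains.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    pieces, found, position = [], [], 0
    for match in pattern.finditer(text):
        term = match.group(0)
        rdf_type, uri = vocabulary[term]
        # Escape the untouched text between matches, then add the marked-up term.
        pieces.append(escape(text[position:match.start()]))
        pieces.append(
            f'<span property="schema:mentions" typeof="{rdf_type}" '
            f'resource="{uri}">{escape(term)}</span>'
        )
        found.append(term)
        position = match.end()
    pieces.append(escape(text[position:]))
    return "".join(pieces), found


markup, found = mark_up(
    "Diary kept on the Western Front during the Battle of the Somme.", VOCABULARY
)
print(found)
print(markup)
```

Terms that the automatic pass misses would still need to be validated and added by the archivist, as described above.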
Work package 6: Provide detailed First World War terminology in UKAT: UKAT currently
lacks detailed terminology relating to the First World War. Terms and concepts
identified in the processing will be added to UKAT and made available via the
Step Change web service.
The Project
Archivist will update metadata relating to the First World War, by adding
granularity within AIM25-UKAT for terms relating to the First World War drawn
from the semantic analysis of the catalogues – for example individual battles,
personnel, geographical locations on the Western Front and other theatres of
the war. These terms will be identified during the processing phase, updated
dynamically in UKAT and output via the new UKAT API being developed for the
Step Change project. The extra level of granularity will provide an agreed
controlled vocabulary of terms in linked data format that will be readily
exportable and can be used by JISC and other projects as part of the suite of
World War One commemoration websites and aggregation tools.
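To illustrate what adding granularity to a thesaurus might look like in linked data form, the sketch below registers a narrower term under an existing broader term and serialises the result as N-Triples using the W3C SKOS properties `skos:prefLabel` and `skos:broader`. The use of SKOS for this purpose, the example.org namespace and the `add_term` and `to_ntriples` helpers are assumptions for illustration; they do not describe the actual AIM25-UKAT data model or the Step Change API.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # placeholder namespace, not real UKAT URIs


def slug(label):
    """Turn a label into a simple URI fragment, e.g. 'World War One' -> 'world-war-one'."""
    return "-".join(label.lower().split())


def add_term(thesaurus, label, broader=None):
    """Register a term, optionally beneath an existing broader term."""
    if broader is not None and broader not in thesaurus:
        raise KeyError(f"unknown broader term: {broader}")
    thesaurus[label] = broader


def to_ntriples(thesaurus):
    """Serialise the thesaurus as N-Triples lines.

    Labels are assumed to contain no quotation marks or backslashes.
    """
    lines = []
    for label, broader in thesaurus.items():
        subject = f"<{BASE}{slug(label)}>"
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}"@en .')
        if broader is not None:
            lines.append(f"{subject} <{SKOS}broader> <{BASE}{slug(broader)}> .")
    return lines


thesaurus = {}
add_term(thesaurus, "World War One")
add_term(thesaurus, "Battle of the Somme", broader="World War One")
for line in to_ntriples(thesaurus):
    print(line)
```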
Work package 7: Toolkit development
This covers
the creation of guidance for RDFa data definition through the batch processing
of detailed catalogue data and the detailed semantic analysis of the processed
data, and the dissemination of this data through the construction of web tools.
It will collate information about archival workload, the cost and technical
aspects from the members of the project group, which will provide a set of
practical guidelines to be used to assist other institutions wishing to enrich
their resource discovery tools through the use of Linked Data.
Work package 8: Evaluation
Two focus
groups will meet. One of academic users will meet at the beginning to advise on
terminology to add to UKAT and to ensure that the marked up data will reflect
current and emerging trends of research. The other group, made up of
archivists, will test the revised workflow tool to ensure usability.
Work package 9: Dissemination
Tools produced will be widely available to the research community in
line with the JISC Discovery taskforce mission including Mimas. The project
will establish a project blog and Twitter feed to record progress and invite
comment. Information concerning the project will be circulated on academic
and popular history lists and relevant websites. The project team will work
proactively with other Discovery activities and projects, to identify
synergistic goals and approaches. It will also work with the Open/Linked Data
and Semantic Web communities to ensure the maximum dissemination opportunities
for outputs. Services such as LinkedData.org and PTWS.com will be used to
publicise the availability of the data. Project outputs will be made available
on the project website. Dissemination to the wider archival, museum and library
communities will be offered through professional conferences and the press of ARA, CILIP, RLUK,
SCONUL and the Museums Association.
| 2012 | Feb | Mar | Apr | May | Jun | July |
| --- | --- | --- | --- | --- | --- | --- |
| WP1 | X | X | X | X | X | X |
| WP2 | X | X | | | | |
| WP3 | X | | | | | |
| WP4 | X | | | | | |
| WP5 | X | X | X | X | | |
| WP6 | X | X | X | X | | |
| WP7 | X | X | | | | |
| WP8 | X | X | | | | |
| WP9 | X | X | X | X | X | |
Budget