Author Archives: rireland

Contained components

Post by Richard Wincewicz, Software Engineer for SafeNet at EDINA.

In the previous post we saw an overview of the whole SafeNet project. In this post I wanted to dig a little deeper into the technical side of the project.

Key Components

We are currently developing the SafeNet Service Interface component by extending the LOCKSS software (http://www.lockss.org/), a software platform which allows libraries to store and provide access to locally managed copies of electronic content such as e-journals. The LOCKSS software was originally designed to work from within an institution’s network and provide access only to users that are part of that network. A key component of the SafeNet service is to introduce a centrally-managed Private LOCKSS Network that can be used by UK HE institutions to provide assurances of continuing access to their subscribed content, without having to run a server locally. The SafeNet model will allow institutions to participate in a shared service offering but there are a number of challenges that need to be addressed for this to work at scale.

The first challenge is determining who can have access to what content. In standard LOCKSS deployments, access is restricted by IP ranges (e.g. the university network) and so all users can access the same content. With a centrally-managed service this is no longer the case and we need a mechanism to ensure that a user is entitled to access content that they request.  For this purpose, we are designing and deploying an Entitlement Registry which holds information about the subscriptions that institutions have for specific journals. The Entitlement Registry provides a REST API that allows a user or application to query its database. Some of this data may be made openly available, such as lists of publishers and titles, and some of it will be restricted, such as the journals that an institution is subscribed to. We are extending the LOCKSS software to include a query to the Entitlement Registry whenever a user requests some journal content. As a pre-requisite to this, a user will be required to identify themselves by logging in and providing the LOCKSS software with identifying information about their institution. Using this information, we can then determine whether a user is entitled to access the requested content.
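The entitlement check described above can be sketched in a few lines. This is a minimal illustration only: the record fields, the sample data and the `is_entitled` function are all hypothetical, and an in-memory list stands in for the Entitlement Registry's database and REST API, whose actual design is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    """One hypothetical registry record: an institution's rights to a journal."""
    institution: str   # identifier supplied when the user logs in
    issn: str          # journal identifier
    first_year: int    # first publication year covered by the subscription
    last_year: int     # last publication year covered (inclusive)


# Invented sample data standing in for the registry's database.
REGISTRY = [
    Entitlement("uni-a", "1234-5678", 2005, 2012),
    Entitlement("uni-b", "1234-5678", 2010, 2014),
]


def is_entitled(institution: str, issn: str, year: int, registry=REGISTRY) -> bool:
    """Return True if the institution holds rights to the journal for that year."""
    return any(
        e.institution == institution
        and e.issn == issn
        and e.first_year <= year <= e.last_year
        for e in registry
    )
```

In this sketch two institutions requesting the same journal issue can receive different answers, which is the behaviour that IP-range restrictions alone cannot provide.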

The Entitlement Registry has broader potential value as a reference tool used by both libraries and publishers. To this end, we are also designing a user interface on top of the Entitlement Registry to allow users to interact with the entitlement data. Users will be able to view general information about titles and publishers as well as entitlement information specific to their institution. In addition to this, we are assessing use cases for external access to the Entitlement Registry API, so that other applications can make use of the data without having to collect and host it themselves.

Deployment Infrastructure

If a service is going to be successful it first needs to be reliable and responsive. These are two distinct aspects of a service, but both can be addressed with similar approaches. Having redundant copies of a service in different locations allows one site to fail while still allowing users to access the service from the second site. This also helps when dealing with heavy traffic because there are now two servers able to handle requests. This approach works well up to a point, but if different parts of a service all start to require large amounts of resource then creating more copies of the service doesn’t help.

At this stage the architecture of the service becomes important. If the service consists of a single application then the only way to deal with increased load is to run the application on a more powerful server. If the service is composed of many small applications that communicate with each other then copies of these components can be created independently of each other. This leads to much greater flexibility and allows the service to handle hardware and software failures as well as heavy traffic.

With the different components we wanted to make sure that each could run efficiently, scale well and be updated without disruption to the service. In order to do this we have created each component separately and given them their own environment to run in. Using Docker (https://www.docker.com/) each component runs in a container that isolates it from the rest of the processes that are running on the server. This means that we can have a lot of different components running in the same place without worrying about how they will affect each other.

Use of Docker also gives us a portable object meaning we can create as many identical versions of the component as needed to provide resilience or to deal with load. These portable objects can be started and stopped very quickly allowing us to deal with failed components or manage updates without affecting the running service.
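The idea of updating identical copies without interrupting the service can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of how the containers are actually managed: `rolling_update` is a hypothetical helper, each replica is represented only by a version string, and no Docker commands are involved.

```python
def rolling_update(replicas, new_version):
    """Replace each replica in turn; return (final replicas, fewest copies serving)."""
    live = list(replicas)
    fewest_serving = len(live)
    for i in range(len(live)):
        # While copy i is being swapped, only the other copies serve requests.
        fewest_serving = min(fewest_serving, len(live) - 1)
        live[i] = new_version
    return live, fewest_serving


# Three identical copies are upgraded one at a time.
updated, fewest = rolling_update(["v1", "v1", "v1"], "v2")
```

Because only one copy is replaced at a time, at least two of the three copies in this example are always available to handle requests.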

For this model to be successful we had to put some thought into the design of the components so that they work under these conditions. In particular, all of the information that the application uses is stored in an external database. In fact, a minimum amount of data is stored with the component so that it can be shut down or restarted without having to worry about what will happen to the data. At EDINA we are lucky enough to have access to two datacentres, meaning each of our services is spread across two sites. A load balancer deals with each request and passes it on to an available server in one of the datacentres. If one of the servers is down then all requests are passed to the other available servers, ensuring that the service remains accessible.
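The failover behaviour of the load balancer can be sketched as follows. The `LoadBalancer` class, the server names and the round-robin policy are assumptions made for illustration; the post does not say which load balancer or routing policy is used.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin over servers, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._order = cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def route(self):
        """Return the next available server, or raise if none is available."""
        # Try each server at most once per request.
        for _ in range(len(self.servers)):
            server = next(self._order)
            if server not in self.down:
                return server
        raise RuntimeError("no servers available")


# Hypothetical servers, one in each datacentre.
lb = LoadBalancer(["site1-a", "site2-a"])
before_failure = [lb.route() for _ in range(4)]
lb.mark_down("site1-a")
after_failure = [lb.route() for _ in range(3)]
```

Requests alternate between the two sites while both are up, and all go to the remaining site once one is marked as down. Because the components keep their data in an external database, it does not matter which copy handles a given request.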

Now that we have the basic structure of the service set up it is important that we continue to develop the service in a way that maintains the reliability and resilience. Docker makes it easier to rapidly deploy multiple copies of an application in different locations but it brings its own complexities. The goal now is to use Docker to make our lives easier rather than more complicated.

What we talk about when we talk about SafeNet

roads

July 2015 marks the halfway point for the SafeNet project. A lot of progress has been made towards developing a service that will provide value to the HE community. As we look forward towards the next reporting phase, one which involves significant outreach and negotiation efforts, our attention has focused on the need to produce a clear model of the service proposition by way of infrastructural components and stakeholders, with a demonstration of how those aspects will function and inter-relate.

In our last blog post, we introduced project personas that emerged via discussions with UK HE librarians. Those discussions regularly explored the issues around post cancellation access in close detail; however, interviewees found it harder to identify the shape of the tool or service that would address these problems. It was clear from these discussions that a more visual approach would be beneficial in explaining how the SafeNet service will provide content and how the components will work together to create a cohesive whole.

A recent work package has focused on the legal agreements required by the emergent SafeNet service, specifically in defining the publisher participation agreement that would underpin the supply and deposit of publisher content. The participation agreement outlines the commitments and responsibilities of those involved in supplying material and those operating the service. Along with these responsibilities, the agreement outlines the individual elements of the proposed service and the relationships of the main actors to the final product.

To this end the project team have spent time defining and illustrating who will do what and why they will do it as participants in the service. The diagram below visualises the service components and, at a high level, clarifies the responsibilities of the stakeholders involved in the project:

SafeNet blog diagram

This is a simplified diagram to show the high level relationships and interactions.  We can see, for example, the project responsibilities of EDINA and Jisc and their anticipated responsibilities once in service mode. We will be refining this model and adding further details where relevant to assist with production of a tool kit that will be used to aid negotiation and promotion, describing how the service works in practice.

That said, some of the above components are well defined at this stage and some require further work and investigation. For example, while the responsibility for the service components and operation lies with EDINA, Jisc Collections will deal with publisher negotiations building on their considerable experience in this area. Publishers will provide the e-journal content archived nationally using a private LOCKSS network (PLN). The publisher will always remain the preferred supplier of access, and in the event that content from SafeNet is accessed the service will provide usage information back to the publisher.

The diagram also shows, in red, those components EDINA will manage, including two of the PLN nodes which are complemented by four co-located nodes. Establishing this national infrastructure and formalising the agreements to support this is something that will be progressed in the coming months.

Methods for gathering entitlement information are being closely examined at the moment. We hope to convene a second community meeting in the coming months to discuss approaches and consider challenges. The focus in developing the entitlement registry is currently centred on considering data sources and assessing the quality of information available. The KB+ team — Magaly Bascones in particular — have been instrumental in assisting our progress with this. The SafeNet project are also grateful to KB+ users at the universities of Huddersfield, Newcastle, East Anglia and Cambridge for access to their KB+ test profiles as we investigate the possibility of reusing information held there.

As we reach the halfway point the roadmap above shows where the project is headed. Upcoming landmarks include drafting service level definitions, testing data ingest and integrating components into the broader service architecture as shown above. There’s another year to navigate through with plenty of challenging diversions along the way.

Interviews, personas & perpetual access pain points. Oh my!

To help the SafeNet project team gain a better understanding of user needs, exploratory interviews were carried out with 19 serials librarians between the 19th of January and the 6th of March 2015.

As reported in an earlier post, Jisc Collections carried out a survey of its membership in order to understand the post cancellation access needs of the UK HE community. Part of the drive to do this came from earlier consultations with selected NESLi2 publishers. During these discussions it was clear that, for there to be buy-in on behalf of the publishing community, demand for a service based on the SafeNet project from the UK HE library community would need to be demonstrated.

The interviews also provided the basis for the identification of 9 distinct personas. The personas do not represent real individuals; they are composites of common themes identified during the interviews. These will be used to assist the project team in being mindful of the audience for the final service and identifying their needs in relation to perpetual access.

The interviews provided an opportunity to find out what concerns and particular pain points librarians experience in relation to perpetual access. The earlier PECAN project identified that, despite existing external service providers offering long term digital preservation services, there are still concerns from the community about continuing access that require improvement and investment. The SafeNet project team specifically wanted to explore user needs in relation to these concerns and how the potential service components of SafeNet could address these issues.

Key Findings

All interviewees noted that their main goal was for users to experience seamless access to content and, should access be lost, to rectify it as quickly as possible. The quantity of journal content available electronically to libraries means that pro-actively checking access to all subscribed material is not practical. Library staff reported that they have to be reactive to access issues when notified by users and this can give a poor impression of their service. Many interviewees indicated that users don’t always see the distinction between the library catalogue and the content provider which is often reflected in NSS and LibQUAL surveys.

Institutional engagement with the issue of post cancellation access (PCA) varied. In some cases a ‘belt and braces’ approach had been taken with institutions participating in both LOCKSS and Portico. In others there were no library-side arrangements and PCA was left to publisher provision.

Record keeping in relation to entitlements also varied. There were similarities in terms of storing physical and digital copies of licences but strategies for making this information usefully available ranged from using the library management system to spreadsheets to nothing at all.

A common theme throughout the interviews was the time constraints library staff face. It was not uncommon for interviewees to report that correspondence with publishers was often protracted and required significant investment of time to provide evidence to support assertions. Again, record keeping was an issue here. In one specific case it was reported that entitlement claims were not pursued because the library was unlikely to have the evidence to hand and the staff time spent investigating the loss of access would outweigh the cost of an inter-library loan.

Several interviewees reported that assurance of PCA was most pressing when moving from a print and electronic subscription to e-only. The SafeNet service, offering a level of national resilience for content, was viewed positively in this scenario as it was common for interviewees to report that they continued to receive print journals in conjunction with the e-version to act as an archive should the subscription be cancelled and electronic access lost. Many stated that these print copies were never made available to users. The proposed SafeNet archive was also welcomed by librarians who had experience of PCA clauses being fulfilled on CD-ROMs or hard drives but who lacked the local infrastructure to provide access to this content for their users.

Overall it was clear from the discussions that there was enthusiasm from librarians about the SafeNet project. The prospect that it would save time and provide a centralised, authoritative source of entitlements, should access (either current or post cancellation) become a problem, was viewed positively. The national infrastructure was seen as an extremely useful step on the road to providing more robust perpetual access to content which had been paid for.

The persona document is available for download.

If you have any comments or feedback please contact us at edina@ed.ac.uk

SafeNet: Nine months on

The SafeNet project has been officially underway for around 9 months. As SafeNet begins to take shape so too has Project Manager Adam Rusbridge’s son who emerged into the world three weeks ago. The first project baby but, with another 15 months to go, there’s no guarantee he’ll be the last. Congratulations to you all, the gauntlet has been thrown down to the rest of the team.

The team have been productive in other ways since our last post on project activity. In January the SafeNet project group met at the Jisc offices in London for a face-to-face meeting that included colleagues from Jisc and EDINA as well as contributors to the project from RLUK and Stanford University.

The group converged to discuss work carried out and planning for the future. The team reviewed project activity that included, at that stage, consultations with publishers and the beginning of consultations with libraries around the pain points of post cancellation access. Consideration was also given to access triggers, content scope, community development and the eventual negotiations with publishers regarding the intended local load agreement.

Aims for the six month period following the meeting up to our next face to face in July 2015 included drafting and testing a publisher participation agreement for the service, planning the service infrastructure, and developing community engagement. These elements would be addressed in tandem with the practicalities of building a service platform.

The publisher participation agreement is in the final stages of revision and should be ready by July 2015 as planned.  Setting up the service infrastructure is progressing and we are investigating options for hosting and co-location.  In terms of community outreach the first meeting of the advisory group took place in York and we aim to take advantage of the input the group have to offer to ensure the resulting service meets the needs of the community.

Development of the Entitlement Registry has progressed. The Entitlement Registry now has a user interface which will be tested and refined over the coming months. Publisher and library test data has been kindly supplied for testing and Magaly Bascones of the KB+ service has been very helpful, providing insights into data held for NESLi2 deals. This data will form the basis for initial testing.

Finally SafeNet has attracted international attention and resulted in conversations with both German and Italian colleagues who are also exploring the national hosting problem space. More information on these and similar initiatives will feature in a future post.

All Aboard: SafeNet Workshop, York, 25/3/15

The inaugural meeting for prospective members of the SafeNet Community Advisory Group took place at the National Railway Museum in York at the end of March. The CAG will provide guidance on community priorities and workflows as the project progresses to assist in the design of a valuable service.

John McColl (RLUK Chair and University Librarian, St Andrews) introduced attendees by outlining the changes that have occurred as the shift from print to electronic journal content has become more prevalent. John spoke of the need for SafeNet within the higher education community as libraries increasingly find that they no longer retain the kind of archival access physical material traditionally gave to readers.

Members of the SafeNet team provided overviews on the origins of SafeNet, project activity and current thinking about several issues in the problem space. Lorraine Estelle (CEO Jisc Collections) gave an insight into the involvement of Jisc Collections and the approaches they will take when negotiating with publishers to create a national archive of content.

In and around these presentations the group engaged in discussions about the project in relation to the community and their experience of the issues. Some of the key talking points are summarised below. The contributions from members of the group will prove valuable in meeting the needs of the community as the project moves forward.

If you would like more information about SafeNet or have an interest in contributing to this group please contact the project team.

A challenging project but an essential one!

Guest post by Lorraine Estelle, CEO of Jisc Collections. Lorraine is Executive Director of Jisc Digital Resources and Divisional CEO of Jisc Collections, overseeing all of Jisc’s digital content and discovery related people, organisations, strategy, services and operations. Among her many successes at Jisc Collections, Lorraine was instrumental in setting up NESLi2 and devising a national consortium with an opt-in model. Lorraine sits on the EDINA management board and has been a member of the SafeNet project team since inception.

I can think of no other asset which an academic institution buys, but to which it has neither physical possession nor a recognised certificate of ownership. Electronic journals are unique in this respect. Academic libraries subscribe, at the cost of millions of pounds each year, to electronic journals under licences that grant them perpetual rights. This system works well providing that the publisher remains in business and the library continues to renew its journal subscription every year.

If an academic library is forced (usually through lack of funds) to cancel a subscription, the problem arises of how its users continue to have online access to the previously acquired journals, given that the content is generally only accessible behind paywalls on publishers’ websites.

Some publishers provide explicit information about such an occurrence and, for example, state that they will make a per-download charge for access to journals post cancellation. These proposed charges are equivalent to around 1/10th of the current subscription charge. Other publishers are silent on this issue, meaning that a library cancelling a subscription would be required to enter into a negotiation with the publisher to agree an affordable access fee.

This situation is further complicated because an institution will typically only have perpetual rights to some of the journal titles in each publisher’s collection. In order to gain access, the library must claim its rights to the issues of journal titles to which it historically subscribed. The Entitlement Registry project, run by Jisc Collections in 2011, demonstrated how complex and time consuming these claims can be. Very often, library records and publishers’ records of entitlement do not agree. This is exacerbated when the publication of a journal title has transferred from one publisher to another, or when one publisher has acquired another and entitlement records are kept on different and often out-of-date legacy systems.

It is this messy landscape which the SafeNet project seeks to address, by building a nationally managed digital archive of journal content and a registry of entitlement. It will provide access to those UK academic institutions which have bought perpetual rights, following a number of trigger events, one of which is post-cancellation access.

Some may question why a national solution is required when global digital and archival solutions already exist. There are indeed some excellent technical solutions, but none quite meets the needs of UK academic institutions in the way that SafeNet will.  One such solution requires payment of annual fees (which may be unaffordable in an economic environment which forces the need to cancel journal subscriptions). CLOCKSS is a successful global solution, but one which does not allow for post-cancellation access. LOCKSS is another excellent solution, but one which is arduous for libraries to maintain. None of these solutions provides a registry of entitlement.

Our vision for SafeNet is that it will be a highly dependable and robust part of the national academic infrastructure. It will be a challenging project, not only from the technical perspective, but because publishers will be required to agree that SafeNet can load and preserve their content. The project team will need to advocate that a national academic archival solution is necessary to safeguard continued access to the journal content purchased by UK libraries. We will need to demonstrate to publishers that there is customer demand for such a service; and that the technical and governance structures of SafeNet will ensure access to each issue of a journal is only ever given to users in institutions that paid for it.

A challenging project but an essential one! The financial future is difficult to predict and a safety net is required in the event of severe economic pressures that would force UK academic libraries to cancel journal subscriptions. Jisc and EDINA, working together as trusted, non-commercial organisations, are well placed to safeguard the scholarly content in which academic libraries have so heavily invested.