Name:
Persistent Identifiers – not just a termin[al/us] Recording
Description:
Persistent Identifiers – not just a termin[al/us] Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/afa0fb2b-402f-4e2f-bff0-768dc5bdca25/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H44M03S
Embed URL:
https://stream.cadmore.media/player/afa0fb2b-402f-4e2f-bff0-768dc5bdca25
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/afa0fb2b-402f-4e2f-bff0-768dc5bdca25/Persistent Identifiers %e2%80%93 not just a terminal-NISO Plus.mp4?sv=2019-02-02&sr=c&sig=7auThF3Usr%2Bw5PL78Zw4Zze4tVxKf5rtsnoaAUIcjl4%3D&st=2024-12-10T07%3A25%3A41Z&se=2024-12-10T09%3A30%3A41Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
OK, thank you very much. So I am indeed Dr. Adam Vials Moore, and with me today is John Kaye.
We are both from Jisc, and joining us in the discussion later, hopefully, will be our colleagues Tamsin, Tim and Christopher. And indeed we are talking about persistent identifiers: no longer just a terminal or terminus. And what does that actually mean? Well, I've got a couple of things in the overview here. Persistent identifiers up to now have mostly been used to represent the research outputs that are produced at the end of the research process.
So we're going to talk about identifiers, first of all looking at where they are in that kind of terminus, that final stage, and then looking at them, the surrounding metadata and the kind of infrastructure that could support rich capture of the whole research process. And to do that, we're going to look at a couple of systems.
We're going to look at the sectional, component-driven chaining of the Octopus publishing system, then at the narrative, sectional chronology of the RAiD persistent identifier, the research activity identifier, and then at the infrastructure and policy implications that arise from these more fine-grained, more chronological ways of breaking down and identifying the research process.
And then after that, we're going to have a bit of discussion about the interconnected information fabric that results from using these persistent identifiers, the ability to more finely interconnect the research information fabric, and whether that's now actually becoming closer to the global system of connected information that was invented by the early hypertext pioneers.
So Bush, Nelson, Engelbart. Now, as you submit the proposals for these sessions, you're asked what kind of questions the sessions are set to address. So these are the questions that we thought would be relevant to the session. It would be great if you could bear them in mind, because I think they'll be good to frame the discussion that we have afterwards.
So: persistent identifiers, PIDs, could now be used to enable a more nuanced capture of process and practice. How do we bring about, or influence, the culture change to embed this? You'll see why we think that's an important question as we go through this. How should that be reflected in the policy landscape? Probably also important. And, as I just talked about: are we now creating more real hypertext? OK, things to think about.
And let's start by asking: what do we mean by terminal? Well, when I was a first year, many decades ago (my children like to suggest it was in the Stone Age, when I went to school in a cave and chipped away at a rock or slate), we had a terminal exam at the end of my first year, which kind of shocked me. But terminal: the end of something.
And yeah, they're kind of the final resting place for outputs. And that DOI, that persistent identifier, is often used as the identifier for the version of record. You know, it's not dynamic, though I'm sure we'll hear elsewhere here about the issues around versions. But it's also a result of culture as well as infrastructure.
You know, it's seen as an important thing to do, to produce a record of the output of your research. So it's part of the culture of the research process as well as the infrastructure itself. At the end of something, you produce something that shows what you have done, and that is kind of the final thing that you do: you write your stuff up. We're kind of starting to break down some of that.
You know, we've had preprints, and we've had some work on filing the start of trials, and things like that. And one of the things that, certainly here in the UK, we've been looking at is deposit on acceptance: the author accepted manuscript. We've moved to capture the process. Again, though, it's a process that starts to create that terminal again.
But part of the issue with that is that these author accepted manuscripts don't necessarily even have a persistent identifier, and so they're not always captured within that global information fabric. So there are even issues around this really important part of the process not necessarily being integrated into that information fabric. So how do we move away from this culture and infrastructure of terminal, end-of-process reporting?
So we're going to look at two things to do that. We're going to look at the component-based Octopus system, which gives you a more fine-grained approach, with ORCID at the very heart of the system. And looking at these persistent identifiers, you've got the DOI, which allows you to reuse the various separate components of that research process and allows you to link and chain all those components together.
And we'll hear about that next. And then we're going to look at the narrative persistent identifier system that is RAiD: to capture chronology, to draw other persistent identifiers into this kind of narrative, to surface impact, to allow you to reflect on your work so far and on the various components of it, and to bring about this idea of a scribe at the side, which I'll talk about later.
And now I'm going to hand over to John Kaye to talk about Octopus.
Excellent, brilliant. Hi, everyone. My name is John Kaye. I'm head of product at Jisc, and I'm here to talk about octopus.ac. I'm going to be referring to it as octopus dot ac because, if you Google octopus, you get a lot of things coming up above it, and in the UK we get an energy company as well as the sea creature. So, just to make life easier for folks that want to know more, you can look at the website that I'm talking about at octopus.ac.
So octopus.ac is a new platform for sharing primary research. It's a place where the academic community can freely read, review and register their work.
Crucially, Octopus publications are not intended to be papers. Instead, they're intended to be a new primary research record, a kind of patent office for scientific research, where the emphasis is on recording work in full as it happens and providing a place to share research outputs that have traditionally struggled to find a home. These types of outputs could be small-scale data sets, hypotheses and things like negative results.
Key to this is the idea of breaking up the concept of the paper or article as the main unit of publication. Instead, Octopus has eight smaller output types, hence the name Octopus. These are designed to match the stages of the research process, and you can see the eight types on the right-hand side of this slide. They go from research problem down to real-world applications.
And I'll talk about those in a bit. But yeah, the platform is entirely open and free to use. Researchers can instantly share their work in detail. The publications are linked in branching chains, and I'll go into those linkages in a second. We have open post-publication peer review. We integrate with existing research systems, and I'll go through all of our PID integrations, as that's the topic of this talk.
And we're also aligned to open research policies and best practice. But mainly, in doing this work, hopefully we are still enabling the record that Adam mentioned, but not right at the end, not at the terminus. We want to be able to use the Octopus outputs to represent the stages of the scientific process, and to allow researchers to record as they go.
So Adam, the second slide. Thank you. So the scientific process naturally comes in a series of steps. Each step requires different skills and resources, and there may be different individuals specializing at each stage, working together to produce a whole. Forcing people to get right to the end of the process before sharing any of it, as we have in the traditional publishing model, we believe is causing some issues, and has led to demands for things like preprint repositories and to things like questionable research practices. So p-hacking and HARKing happen because researchers are trying to make their work more publishable and make their results look more positive and more impactful.
In order to remove that pressure, Octopus no longer values the research on the basis of the findings, but on the intrinsic quality of how it was done. And it also removes the narrative element from publications by breaking the paper up into smaller pieces. As I said, Octopus authors can publish each stage as it happens. This fits with models that are coming out of many disciplines, things like pre-registration and registered reports, and this is very, very important in some disciplines.
Different authors can link from the same publications. So several hypotheses may derive from a single problem, or several authors might consider a real-world application from the same findings. It doesn't have to be a single author or collaboration writing each point in the chain. Folks can see what's been done before, adapt that, carry on the work in that chain and create new chains.
So every publication has to be linked from an existing one at the previous stage. So, for example, a hypothesis has to be linked from a problem, data has to be linked from a protocol, and on through analysis and interpretation, and then you can have a real-world application.
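The linking rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Octopus's actual code: the stage names are the ones mentioned in the talk, while their exact ordering, the treatment of peer review and the `can_link` helper are hypothetical.

```python
# Stage names as mentioned in the talk; the exact ordering is an assumption.
STAGES = [
    "research problem",
    "hypothesis",
    "protocol",
    "data",
    "analysis",
    "interpretation",
    "real world application",
]


def can_link(new_type: str, parent_type: str) -> bool:
    """Return True if a publication of new_type may be linked from parent_type.

    Hypothetical rule: a peer review may attach to any publication type;
    every other publication must hang off one at the previous stage.
    """
    if parent_type not in STAGES:
        return False
    if new_type == "peer review":
        return True
    if new_type not in STAGES:
        return False
    return STAGES.index(new_type) == STAGES.index(parent_type) + 1
```

For example, `can_link("hypothesis", "research problem")` is accepted, while `can_link("data", "research problem")` is rejected because it skips stages.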
Having these links does reflect the way of working, and reflects the steps within the scientific process. It also helps with navigation and discovery on the site, and showcases the inherent interconnectedness of academic research. We're keen that no single publication can ever appear in isolation on Octopus.
So we'll be showcasing these links on each publication page and looking at ways that users might want to be able to browse these links. We have an additional type of output, peer review: reviews can be published and attached to any kind of publication. They're treated and valued in the same way as other publications, including being assigned their own DOIs. They arise because reviewing is a scientific skill just like any other.
And by incentivizing it, we hope to encourage constructive peer feedback. If you go on to the next slide, I'll start to talk about how we've worked with our development partners to launch Octopus within the last six months or so. So, we're working on the project in collaboration with its founder, Dr. Alexandra Freeman, who looks after Octopus CIC.
We're funded by UK Research and Innovation, the main government funding agency in the UK. We have a key project partner in the UK Reproducibility Network, who are going to advise us on strategic and practical issues and feed into our development. And we've got two PID partners on that. These are the first integrations we've done at Octopus. We thought we'd build PIDs in straight from the start, and we're currently working with both ORCID and DataCite, and have the integrations that I'll show you shortly.
But yeah, our key principle is that we want a platform that's developed by the community. We have an active user community and a critical friends panel, and we work with key stakeholders and specialists. At the end of the presentation, I'll also talk about how you might be able to get involved in that. So if we have the next slide, thank you.
So the first PID we thought we'd leverage within Octopus is ORCID. This is just a screenshot of Octopus; if you want to look at it live, you can go to octopus.ac at your leisure. But yeah, one of the first things you see is a button: "Sign in with ORCID". On Octopus, you can read and browse without signing in, but if you sign in with ORCID, you can publish, you can interact with articles, you can write reviews, and you can red flag for plagiarism or misinformation or something like that.
But yeah, to do anything but read Octopus, you need to sign in with ORCID. So next slide, please. But it's not just the login on Octopus. We decided that we'd try and leverage the PID infrastructure as much as possible.
So most of the information we get about our users comes from ORCID. Our profile pages on Octopus are derived from the ORCID works and other details, such as employment and education, that come directly from ORCID, and we add the Octopus works to the end of that. We set out on the Octopus mission to see whether we could set up Octopus without having to take in any information about the users ourselves, getting it from other sources instead.
At the moment, we do have to get an email address from users, and that's because there are a lot of blanks in ORCID: we couldn't rely on that field within ORCID. But every other piece of information we use or collect about a user is from the ORCID system. So yeah, we try to leverage our integration as much as possible. And could I have the next slide?
Thank you. So, as I mentioned, every Octopus output gets its own DOI. This is a screenshot of the top of one of our outputs, which is a real-world application. You can actually see the contents of the output, but we have a nice branching graph of the chain structure at the top, so you can see where the output has been derived from. When you publish on Octopus, the output gets assigned a DataCite DOI straight away, and we send the necessary information to DataCite. So we say what the output type is (research problem, hypothesis, method and so on) in the subtype there.
And in the DataCite metadata we also create the link to the previous output type. So this real-world application will have an "is related to" link to the interpretation, and so on. So as well as creating these links within Octopus, we try to broadcast them where we can as well. So yeah, DataCite metadata is really, really important to that process for us.
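To make the idea of carrying the chain link in the DOI metadata concrete, here is a minimal sketch of what such a record might look like. The field names (`types`, `resourceType`, `relatedIdentifiers`, `relationType`) follow the DataCite metadata schema as I understand it; the helper name, the example DOIs, and the choice of `"Other"` and `"IsDerivedFrom"` are assumptions for illustration, not what Octopus is confirmed to send.

```python
import json


def build_datacite_attributes(doi, title, output_type, previous_doi):
    """Build a DataCite-style attributes dict for one output.

    Hypothetical helper: the values chosen for resourceTypeGeneral and
    relationType are assumptions, not Octopus's confirmed mapping.
    """
    return {
        "doi": doi,
        "titles": [{"title": title}],
        "types": {
            "resourceTypeGeneral": "Other",  # assumption
            "resourceType": output_type,     # free-text subtype, e.g. the output type
        },
        "relatedIdentifiers": [
            {
                # Link back to the output at the previous stage of the chain.
                "relatedIdentifier": previous_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsDerivedFrom",  # assumption
            }
        ],
    }


# Made-up DOIs, for illustration only.
record = build_datacite_attributes(
    doi="10.1234/example-application",
    title="An example real world application",
    output_type="Real World Application",
    previous_doi="10.1234/example-interpretation",
)
print(json.dumps(record, indent=2))
```

Because the link lives in the DOI metadata rather than only in the platform's own database, other services that harvest that metadata can follow the chain without talking to the platform directly.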
So, the next slide. Thank you. The other PID we use within our publication workflow is ROR: we try to get our users to use the Research Organization Registry. At the moment, it's a little bit dumb, really. We have a link there which basically says, go and search for your organization's ROR. That sends folks off to the ROR registry, where they can look it up, and then they can put the ROR ID in there.
But when a researcher is filling in the form to publish their outputs, they don't really want to do that for their affiliation or their funding on the output. So one of the things we want to do within the next few months is bring ROR within Octopus and allow folks to search the ROR data there.
So they won't have to go off to the website, copy the URL and bring it back. We'll make that a lot easier. But yeah, we are trying to encourage people from the start to use that. We think it's really important, and it will really help some of the future integrations that I'll talk about in a second.
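An in-platform ROR search of the kind described could start from something like the sketch below. It assumes the public ROR REST API accepts a `query` parameter on its organizations endpoint and returns matches in an `items` list with an `id` field; the base URL and both helper names are assumptions, so check the current ROR API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the ROR API is versioned, so verify against its docs.
ROR_API = "https://api.ror.org/organizations"


def ror_search_url(name: str) -> str:
    """Build a search URL for an organization name (hypothetical helper)."""
    return f"{ROR_API}?{urlencode({'query': name})}"


def ror_ids(response: dict) -> list:
    """Pull the ROR IDs out of an already-decoded search response."""
    return [item["id"] for item in response.get("items", [])]


print(ror_search_url("University of Cambridge"))
```

The point of the design is simply that the lookup happens inside the publishing form, so the author picks from results instead of copying an identifier by hand.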
Next slide, please. Another thing we have used PIDs for is to help us organize our data. I mentioned that we didn't want any Octopus output to stand alone, and that means that the first outputs within Octopus needed to be attached to something. So we've got some data in Octopus now, about 7,000 records, that we're calling seed data.
There are some high-level classifications, which are derived from the Library of Congress. But then we did a project in a number of disciplines, including psychology and neuroscience, and some medical science as well, where we used machine learning over a text corpus that we derived from PubMed and the CORE aggregation. That algorithm was basically looking for research problems within papers, and it brought them out.
There's an example on screen now: the COVID-19 pandemic has had a massive impact on health care systems. What happens there? The algorithm looks for these problems within the corpus, and then it basically spits out a bunch of DOIs that we can use as references for that problem. For those references, we didn't want to just give people a DOI string, so we use the Crosscite service, which is a collaboration between Crossref and DataCite, and so we can just bring back the references in a nice format.
So if you go on octopus.ac, you'll see there are thousands of records there by a user called Science Octopus. That's the seed data. But yeah, hopefully we've made it as easy as possible to see the links there, to see where this information has been derived from and where we've derived the problems that we've pre-populated Octopus with.
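Turning a bare DOI into a readable reference is typically done with DOI content negotiation, which is the mechanism behind Crosscite-style formatted citations. The sketch below only builds the request; the helper name and the example DOI are made up, and whether Octopus calls the service this way is an assumption.

```python
from urllib.request import Request


def citation_request(doi: str, style: str = "apa") -> Request:
    """Build a content-negotiation request asking for a formatted citation.

    Hypothetical helper: the resolver is asked for text/x-bibliography
    in the given citation style instead of the usual landing page.
    """
    return Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


# Made-up DOI, for illustration only.
req = citation_request("10.1234/example")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would fetch the formatted reference as plain text for a DOI that really exists.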
Adam, the next slide. Thanks. And finally, for me: that's not the end of our journey with PIDs. We got set up initially with ORCID at the heart of the system, everything gets a DOI, and we're encouraging usage of ROR, but there are improvements to be made. Currently, to publish a new version of anything on Octopus, it's a new publication, and you have to do your own linking between the two.
One of the next tasks we've got is creating a reversioning system, and that also includes creating DOI versions for those outputs, making it as easy as possible for the end user, so they don't have to be switching between publications.
We're going to be leveraging what we have done with the PIDs for institutional integrations. We're working with the University of Cambridge on an integration project where we're going to be building our OAI-PMH feed. We're looking at implementing ResourceSync. But for UK institutional repositories, we will be using Jisc's Publications Router service, which uses the PID infrastructure.
It looks at the ROR, it looks at the ORCID, and it routes Octopus outputs, or any publisher outputs, through the Publications Router to the university repository where they need to land. So having this high-quality PID infrastructure, having this data attached to our publications, allows us to do stuff like that. So Octopus can push what it thinks is institutional.
So we can push researchers' papers to the right institutions. That's exciting work that's going to be coming up. We have been asked by a number of UK institutions how we might work with their systems, and this is the first way we're going to be doing that. I've mentioned the Research Organization Registry search within Octopus.
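The routing idea, sending each output to the repository of the institution it is affiliated with, can be sketched as a simple lookup keyed on ROR IDs. Everything here is hypothetical: the function, the record shape and the identifiers are made up for illustration and say nothing about how Publications Router is actually implemented.

```python
from collections import defaultdict


def route_outputs(outputs, repositories):
    """Group output DOIs by destination repository.

    outputs: list of dicts with a "doi" and a list of "affiliation_rors".
    repositories: dict mapping a ROR ID to a repository name.
    Affiliations with no registered repository are skipped.
    """
    routed = defaultdict(list)
    for output in outputs:
        for ror in output["affiliation_rors"]:
            repo = repositories.get(ror)
            if repo is not None:
                routed[repo].append(output["doi"])
    return dict(routed)


# Made-up identifiers, for illustration only.
repositories = {"https://ror.org/00example": "Example University Repository"}
outputs = [
    {"doi": "10.1234/a", "affiliation_rors": ["https://ror.org/00example"]},
    {"doi": "10.1234/b", "affiliation_rors": ["https://ror.org/00unknown"]},
]
print(route_outputs(outputs, repositories))
```

The sketch shows why the quality of the attached identifiers matters: an output whose affiliation has no ROR, or an unrecognized one, simply cannot be routed.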
We have one more piece of work to do with ORCID, and that is to create a bespoke write integration, so that ORCID profiles can be updated by Octopus if folks opt in. Currently, profiles can get updated with Octopus outputs by using the DataCite import tool, through which you can get updates for all DataCite outputs. But we're going to create a standalone Octopus one, where we can probably get things like our Octopus output types into ORCID profiles.
And then finally, we'll be looking at how we might be able to use the event data that we get from DataCite: where our outputs have been republished or used. That's mainly going to be for creating links to things outside of Octopus, rather than for metrics. One thing about Octopus is that when you look around it, you're not going to see any numbers about usage. We've also taken some of the bias out of the system by not using folks' first names, not showing their institutions prominently, and things like that.
So Octopus, as it stands, doesn't want to create any sort of metrics-driven approach. As I say, it really wants to focus on showing the process, showing the record, and then making that available. So what we do with the event data will probably be more along the lines of discovery and linking into the wider scholarly communications graph. Adam, next slide, please.
I think that's the end. OK. So yeah, that's Octopus in a nutshell. As I said, there are lots of screenshots there. If you want to look at it for yourself, there's octopus.ac. We also have a user group, so please go onto Octopus and have a look to see how you could join if you'd like.
And yeah, my email's at the end of this presentation as well. So thank you very much.
So that's Octopus and components. What we want to think about now is RAiD and narrative. So as well as breaking the process down into different steps, we have RAiD, which can capture the narrative of a research process. And it does that by... well, there are a number of different metaphors that are about. My favorite, as you can see from the picture, is the washing line: I like to think of setting things along it as you create this kind of image. You can also hear people talk about it as a canoe.
I'm not so sure about that. Or, you know, there's the more common idea of an envelope. None of us are going back to SOAP as a protocol, God, I hope. RAiD: the research activity identifier, to give it its full title. I think there are two main things on my top slide there: it allows for narrative and it delivers impact.
So, without going on to any of the further slides, let's just think about this for a minute. What it allows you to do is to create one identifier for a project, an activity, a theme, and to attach other things to this identifier, but with timestamps. So this allows you to say: I have something happening, and here are some pieces of funding, some people, some places, some outputs. No longer just some articles and stuff but, as we've just been hearing from John, some hypotheses, some small pieces of data, a research problem and so on. And then all of these pieces can be connected together. As they're connected together with this one overall identifier, it allows you to look at them joined together over time, to see the impact, to look again reflexively and reflectively at them, and to carry that narrative of the work through.
OK, I should probably show some pretty pictures. So here's one nice picture with big, easy-to-see things. RAiD gathers together this idea of institutions: where might this work have been done, where might it be centred?
Again, from the work that I've been doing for the last few years with the practice research community, with arts and humanities, there's not just the idea of where work is done or where work is funded from, but also the place where an event might take place, or the place where a performance might happen. What kinds of tools and services are there?
What collaborators might be involved? So again, the ORCID records of the various people can all be attached at the various points at which people have been involved in this particular process, project or theme. So as people contribute to something, their input, their association, can be acknowledged and credited.
And there will be other presentations, I'm sure, on how important it is to credit the various contributions, not just of researchers, but of all the people that are part of a research process. Then the data and the funding. And again, for some process- and practice-based research, that funding isn't necessarily one large grant, but some amount of money that funds some part of the work and some other amount of money that funds some other part of the work.
And some of that work is done at one place, and some of it elsewhere. So this idea of being able to capture all of these things, to bring them together, and to allow that process to sit underneath one single thing is incredibly important. Now, the big diagram in blue, produced with the assistance of the MoreBrains Cooperative under the auspices of some Research England funding at Jisc, shows you just how many different places there are along the lifecycle.
There you go. I'll see if I can circulate a more detailed version of that during or after the talk. That will give you an idea of how we've segmented out some of the life cycle, and we've basically got a huge number of different touch points where things could be attached to a RAiD. It just gives you some idea of all the different kinds of things that we capture and process.
And all of these things can be brought together so that you can see some of the narrative that we've brought along. OK, so what is a RAiD? Well, for those of you who are interested in these things, it's a handle which allows PIDs to be associated with temporal data.
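The core idea, one identifier with other PIDs attached to it over time, can be sketched as a small data structure. This is a toy model under stated assumptions, not the RAiD metadata schema: the class names, fields and example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Attachment:
    """One PID attached to the activity for a span of time (hypothetical model)."""
    pid: str
    kind: str  # e.g. "contributor", "funding", "output"
    start: date
    end: Optional[date] = None  # None means still attached

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


@dataclass
class ResearchActivity:
    """A single activity identifier with timestamped attachments."""
    raid: str
    attachments: List[Attachment] = field(default_factory=list)

    def attach(self, pid: str, kind: str, start: date, end: Optional[date] = None) -> None:
        self.attachments.append(Attachment(pid, kind, start, end))

    def timeline(self) -> List[Attachment]:
        """Attachments in chronological order: the 'washing line' view."""
        return sorted(self.attachments, key=lambda a: a.start)

    def active_on(self, day: date) -> List[Attachment]:
        """Everything attached to the activity on a given day."""
        return [a for a in self.attachments if a.active_on(day)]


# Made-up identifiers, for illustration only.
activity = ResearchActivity("https://raid.example/10.0000/abc")
activity.attach("https://doi.org/10.1234/output", "output", date(2023, 6, 1))
activity.attach("https://orcid.org/0000-0000-0000-0000", "contributor",
                date(2022, 1, 1), date(2022, 12, 31))
print([a.kind for a in activity.timeline()])
```

Sorting by start date gives the chronological narrative, while `active_on` answers the question of who and what was involved at a particular moment.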
There's an ISO standard for you guys to go and read. Very exciting, always. And, you know, there's the number and the URL. This is a not-quite-up-to-date metadata envelope type technical diagram for you. And again, this will be available afterwards, so I don't want to dwell on it too much.
For those of you who are familiar with it, that is a court transcriber's keyboard. And again, I thought I'd come back to this whole idea of the scribe on the side. So, yeah, the idea is that these component- and narrative-based things allow us to now start capturing the narrative of the research process. They allow us to do more while we are creating and completing our research.
So instead of going through an entire research process and then at the end writing stuff up, capturing stuff in disparate notebooks, creating bits of data, doing all those various bits of the process and at the end, creating an article. They allow us to do those various bits at various times, capture them and give us a point where we have reflection, where we have a narrative, where we have those various components that we can link together.
They interfere with, though don't necessarily stop, these ideas of HARKing, of p-hacking. They lend themselves to better research practices. And from the PR Voices scoping project that was funded by the AHRC, I've just got this one quick quote from Helen Bailey at King's.
We were interviewing her as part of a researcher's perspective on all of this technology and process. And I think this is really helpful in illustrating why it's important to start using these and changing these cultures: that the portfolio sits alongside the process as you're undertaking your research, and isn't necessarily retrospective anymore.
It's a way to develop a portfolio. It's helpful in terms of understanding the insights. And it goes on: this is part of a wider thing that's available from the PR Voices website, and it talks about the importance of reflection and the importance of capturing insights on how you gain lessons.
OK, moving on to the next bit. I've got one slide on this, because I want it to be thought-provoking rather than going into lots of detail. So: open access and FAIR and this kind of progressive, sectional output piece. The author accepted manuscripts I talked about right at the start are a move towards this idea of capturing the process. Open access is this way of opening up a view and an insight onto the author, rather than onto something that's being produced by the publisher. It's really important, for things like this, that there is a persistent identifier attached. Without that shift towards giving important resources and outputs like this a persistent identifier, there won't actually be a way to access that information in that global information fabric.
And the same goes for all these pieces, these components, as John pointed out: for each individual output from Octopus, and for those other infrastructures that are coming that do the same kind of job in different disciplines and different roles. Without persistent identifiers, without the schema and taxonomy that underlie and underpin those things, we won't be able to access, share, discover and reuse them. Without that linking and chaining between them, without understanding the relationships, and without being able to then discover, reuse and build on top of those things, the production and creation of those things is in itself only part of a process of changing the culture, without making any progress in the wider information fabric. RAiD allows for the connection of these things together.
But how does that fit into the wider open access and impact discovery picture? If you're connecting things together to build that narrative, how does that then give you something that fits more widely into the policy landscape, and the same with improving discovery in that landscape? How do you build the policy that reflects all those vital pieces? So if a policy and a culture change are required both for the process of building all these individual components and for linking them together and creating a persistent identifier, how do you build policy that encourages that culture without creating too much of a burden in actually having all of those individual pieces of creation, while allowing for that reflection and creating that better piece of culture?
Now, on to the final piece: what's real hypertext? Well, OK. So Bush, in 1945, at the end of the war, wrote "As We May Think". In that top image, you've got a picture of the memory extender, the memex desk. Doug Engelbart, who was a lovely man, did the Mother of All Demos in 1968.
His wife is still running his foundation, the Bootstrap Foundation. Ted Nelson: many amazing things. He wrote Computer Lib / Dream Machines in 1974. "Everything is deeply intertwingled." He did Xanadu / Udanax and ZigZag / Gzz, amongst other things. I think that is indeed my hand, from the YouTube video of the ZigZag demo, for that one.
Watch Doug instead? Hypertext, rather than the World Wide Web, which is not hypertext, or the internet, which, as we all know, is the road on which those things run. It's not even pretending to be that. Real hypertext is bidirectional. It has micro-transactions. It has provenance. It has governance. It has structure. It has a deep set of reliable, robust connections. It's entirely the kind of thing that you see from the persistent identifier structures that we're building: that information fabric where things are reliable and they're governed. They're not necessarily free, but the cost is built into the infrastructure.
All of those things are seemingly part of those early versions of connectivity: those attributions, that linking together, and the metadata that supports it. One way of thinking about that now is to look at the research graphs that are being built; there have been several different versions of those that are starting to come together and be collated. They are post hoc in general, so they're mining the relationships in the PIDs and surfacing those things: the outputs and funding in DOIs, the researchers in ORCID and the institutions in ROR, and linking all those connections together.
But you're starting to see that secondary fabric on top of the individual connections that are built from each of the declarative nodes within the persistent identifiers. So certainly, as we start to build on that secondary fabric, we can see that kind of hypertext start to emerge. We are very interested in discussing people's views on that.
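The step from one-way declarations in PID metadata to a bidirectional fabric can be illustrated with a reverse index. This is a minimal sketch with made-up identifiers and relation names; real research graphs are built from far richer data, and the `build_backlinks` helper is hypothetical.

```python
from collections import defaultdict


def build_backlinks(links):
    """Invert one-way (source, relation, target) statements.

    Returns a dict mapping each target PID to the (relation, source)
    pairs that point at it, so links can be followed in both directions.
    """
    backlinks = defaultdict(list)
    for source, relation, target in links:
        backlinks[target].append((relation, source))
    return dict(backlinks)


# Made-up identifiers and relation names, for illustration only.
links = [
    ("doi:10.1234/hypothesis", "IsDerivedFrom", "doi:10.1234/problem"),
    ("doi:10.1234/review", "Reviews", "doi:10.1234/hypothesis"),
    ("doi:10.1234/hypothesis", "HasAuthor", "orcid:0000-0000-0000-0000"),
]
print(build_backlinks(links)["doi:10.1234/problem"])
```

Each record only declares what it points at; the index is the post hoc "secondary fabric" that lets you ask what points back.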
So in summary: PIDs now enable a more nuanced capture of process and practice. But we need to look at the infrastructure and see how we evolve it to allow us to take advantage of them. We also need to bring with us a change in culture to embed this. And with that need for evolving our infrastructure and changing the culture, we need to think about how we reflect that in the policy landscape.
And all of that thinking means that we bring good practice. Best practice would be good, but obviously we'd need to have some way of evaluating that, he said. OK, and again, a reminder of those questions. And thank you. And this is probably some of the information that John was looking for.
That's me at the top. I don't really do email, so get me on Twitter. That's John's email. Oops, that one back there. And then Tamsin, Christopher and Tim should be joining us. There's the Jisc @ukpids Twitter, and @raid_pid. And, as John said, octopus.ac, and you can get them on Twitter at @Science_octopus. John, was there anything you want to say finally about the Octopus community or anything like that?
No, just head over to octopus.ac and you'll be able to find information there, and a "learn more" section about how to join the community. We've got a mailing list and a Teams community as well.
Fabulous! So thank you, everyone, for your attention, and we look forward to the discussion. Thank you very much. Thank you.