Name:
Standards for data management plans
Description:
Standards for data management plans
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/568bbba8-8cab-43df-b24f-fc93e51d79c9/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H34M46S
Embed URL:
https://stream.cadmore.media/player/568bbba8-8cab-43df-b24f-fc93e51d79c9
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/568bbba8-8cab-43df-b24f-fc93e51d79c9/Standards for data management plans.mp4?sv=2019-02-02&sr=c&sig=TJbfYK%2FxlDKLXb35%2ByvIuXsLwrhyGOaIXEagH7i83JI%3D&st=2025-01-15T05%3A09%3A15Z&se=2025-01-15T07%3A14%3A15Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
It looks like almost everyone has followed the instructions properly and arrived in the proper place. Can I leap in and say how smoothly I think that went, with the intros and the outros? The little jazzy music was very welcoming, and then we were very smoothly transitioned here.
It's great. Well, great. Thank you. Fortunately, we got all of our technical foibles out of the way with David. Hopefully from here on out it will be smooth sailing. George, over to you. Well, thank you, Todd.
And again, I want to acknowledge our speakers, Michael, Maria and Jennifer, who joined us. I think this has been a great sample of the type of stakeholders who are supporting researchers in this space, each coming from slightly different but very clearly overlapping, intertwined visions of where to go. So the idea behind these sessions, for those who didn't receive the introduction earlier, and Todd will correct me if I misspeak here, but my understanding is that we were really looking to come out of the conference with projects, with ideas for things that we can take forward.
So I think we've all been in meetings where we come together, we have lots of great discussions and we leave. We don't know what we're going to do next. I think the idea here is we really want to generate some ideas that we can explore a little bit further. So I'll be taking some notes during the session, but I do encourage active discussion and it's OK if we don't have the answers today.
I think trying to come up with the questions is as much a part of it. Before we hand over to some more general discussion, I did want to go to the chat. There was a question that Todd actually had raised, Michael, during your session, saying that he'd be interested to hear what identifier is being used for identification of software in the guidance. I'm not sure if you're able to quickly respond to that one.
Sure, I'll give a very quick response. So we use digital object identifiers that get recorded in DataCite for our code. And when people submit metadata about code produced through DOE funding, through our portal, OSTI.gov, they can opt to ask for one of those DOIs to be assigned, and then we register it with DataCite. That's one of a number of tools that we offer, and all of that's available on a new web page discussing our persistent identifier services.
So look for PIDs at OSTI.gov for a lot more information there. George, we can't hear you. I don't know. You don't look like you're muted, but I don't hear anything. I see you're talking.
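Michael's workflow above, minting a DOI for a code record and registering it with DataCite, can be sketched roughly in code. The payload below loosely imitates the shape of DataCite-style DOI metadata (a "data" object carrying "attributes" such as titles, creators and a resource type), but the prefix, the field subset, and the helper function are illustrative assumptions for this discussion, not OSTI's actual submission pipeline or the complete DataCite schema.

```python
# Hypothetical sketch: compose a DataCite-style metadata payload for a
# software DOI. Field names loosely follow public DataCite conventions;
# the prefix and values are invented for illustration.
import json


def build_doi_payload(prefix: str, title: str, creators: list[str],
                      publisher: str, year: int) -> dict:
    """Assemble a minimal DataCite-style payload for a code record."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Software"},
            },
        }
    }


payload = build_doi_payload("10.99999", "Example analysis code",
                            ["Doe, Jane"], "Example Lab", 2024)
print(json.dumps(payload, indent=2))
```

In a real registration, a payload like this would be sent to a DOI registration service, which returns the assigned identifier; here it simply shows how the descriptive metadata travels with the request.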
No, not you. It looks like you might have your mic muted. There we go. Sorry, Zoom and I do not get along very well. It appears so. My brilliant segue you'll just have to imagine. But my basic point was, I think I want to start by returning to questions asked early on in the presentations.
It was actually one of Michael's slides talking about the future of open science. And Jennifer, I think, did a good job in her session of talking about how data management plan discussions tie into open research and how, in her words, which I think she was paraphrasing from Maria, that unlocks the potential there. So part of the question, part of our maybe starting place, is around how these community standards enable this picture and where we can go.
So certainly I'll throw it open to the floor, but if any of our speakers had any thoughts they want to expand on a little bit there, about how setting these standards can really be helpful, where the gaps are, and what we're missing. Can I ask a follow-on question, George? Sure. And I need my colleagues to help guide me, because I'm still quite new to the data space.
But the framing of the conversation reminded me of Ted Habermann's work on metadata in the context of institutional repositories, and I bet that some of our colleagues on the phone will be familiar with this. But it sounded like not all of the technology driving institutional repository support handles the PIDs that are becoming more common with other systems.
So is that a standard that's being discussed? Is it possible that the technologies can be built to support standardized PIDs across systems, so that those, again, can help the material travel to other networks? I'll just jump in. I think, from my perspective, working with the DMP Tool.
What I was really heartened by in Michael's talk was how much DOE is leading the way insofar as really sophisticated use of PIDs. Because as federal agencies are increasing data sharing requirements and DMSPs, you know, the use of persistent identifiers within the data management plan is also really important for tracking all of the information that they're hoping happens as a result of these new policies.
So I think there's only so much we can do as providers to push these practices if they're not also embedded in federal systems, and in systems like what the DOE are building, particularly in thinking about the use of grant IDs, persistent identifiers for DMPs, and the use of ORCIDs. I think some of that is happening at the federal level. But also, just structuring the DMP: there's no need for us to still have that narrative document.
We have the mechanisms and the means to structure that information so that we can use it going forward. So that's really my hope: as these agencies are increasing requirements for data sharing, that alongside that they'll also build robust systems for generating good, structured data management plans, and take up some of the work that the community has been building for many years now.
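The structured, machine-actionable DMP Maria describes, as opposed to a narrative document, can be pictured as a small JSON object. The sketch below loosely mirrors the JSON serialization of the RDA DMP Common Standard mentioned later in this session (a "dmp" root with a PID, a contact, and datasets pointing at intended repositories), but it is an illustrative subset, not a schema-valid document, and every identifier and name in it is made up.

```python
# Minimal machine-actionable DMP sketch. Field names loosely follow the
# RDA DMP Common Standard's JSON shape; values are invented examples.
madmp = {
    "dmp": {
        "title": "Soil microbiome sequencing DMP",
        "dmp_id": {"identifier": "https://doi.org/10.9999/example-dmp",
                   "type": "doi"},
        "contact": {"name": "Jane Doe",
                    "contact_id": {"identifier": "0000-0000-0000-0000",
                                   "type": "orcid"}},
        "dataset": [
            {"title": "16S rRNA reads",
             "dataset_id": {"identifier": "10.9999/example-data",
                            "type": "doi"},
             "distribution": [{"host": {"title": "Dryad"},
                               "license": "CC0-1.0"}]}
        ],
    }
}


def planned_repositories(doc: dict) -> list[str]:
    """List every repository named in the plan's dataset distributions."""
    return [dist["host"]["title"]
            for ds in doc["dmp"]["dataset"]
            for dist in ds.get("distribution", [])]


print(planned_repositories(madmp))  # ['Dryad']
```

Because the plan is data rather than prose, a funder or institution can query it directly, which is exactly the "use it going forward" point being made above.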
And I'd be happy to add that, from our perspective, the agency point of view, it's very helpful to have the persistent identifiers to link all of those products that are the outcome of the research we support. Because not only does that help us monitor compliance with what's being reported to us, but it helps us understand the impact that our federal funding has on the broader ecosystem.
But I think there's an advantage to the community as well. We would like members of the community who share their data, and have that data used in other ways to enable new research, to be recognized for their contributions to that broader ecosystem. So I think making sure that we can assign persistent identifiers, and that they do get linked between all of the various products, from data, to the models they produce, to the publications based on them,
and perhaps the code and software that was used to perform the analysis, not only helps evolve the open science ecosystem, but enables people to get recognition for their contributions to all those different elements, which are all very important to making progress in science. Well, I think you touched on the two parts of it, too.
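The linking Michael describes, award to data to code to publication, is essentially a small typed graph of PIDs. The sketch below shows how such links could be stored and traversed; the identifiers and relation names here are invented for illustration, not a real link registry or an official relation vocabulary.

```python
# Illustrative PID link graph: typed links connect an award to the
# products it enabled, so contributions can be traced and credited.
from collections import defaultdict

# (source PID, relation, target PID); all identifiers are made up.
links = [
    ("award:EX-0001", "funded", "doi:10.9999/dataset"),
    ("award:EX-0001", "funded", "doi:10.9999/software"),
    ("doi:10.9999/paper", "isSupplementedBy", "doi:10.9999/dataset"),
    ("doi:10.9999/paper", "cites", "doi:10.9999/software"),
]

graph = defaultdict(list)
for src, rel, dst in links:
    graph[src].append(dst)


def reachable(start: str) -> set[str]:
    """Every product reachable from a starting PID by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(sorted(reachable("award:EX-0001")))
```

Starting from an award and walking the links yields everything that funding touched, which is the compliance and impact view; starting from a paper yields the data and code behind it, which is the credit view.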
There we talked about the top-down part, the requirements and the mandates, but we also talked about meeting what the community itself is already doing, and how we find that place in the middle appropriately. I'd be curious for your sense, coming from different perspectives there: how well do you think we're doing at matching up, or are we coming together in the middle now?
Is there a disconnect, or where can we go from there? I'll try and answer. Oh, I see. Jen, do you want to go first? Nope. Yeah, yeah. Let me pave the way and then you can say something smart again. So, I mean, that's the fun part, isn't it?
Because I think what we have here together are three organizations working at the frontier. So an agency that's taking a pioneering role with PIDs, and two projects pioneering with DMPs and with data sharing. So the challenge is, as you say, George, how to help the middle to settle. So I kind of wonder what our colleagues on the call are thinking.
How far away are they in their systems and their practices and faculty behaviors in supporting each of these things? And how can we help? If we're at the frontier, how can we extend a branch to others to come along? Over to Maria. No, I think you said it perfectly. So, yeah, I'm interested to hear from folks on where your institutions kind of lie within that.
If anybody's feeling brave. As I said at the intro, I want to encourage participation in all of these sessions. So this is a conversation. It's not simply a panel conversation.
Maybe to stoke the fires of conversation, I'll point out that the new OSTP public access memo sets a new vision that agencies are working on responding to, around earlier data sharing and being more open about the process of open science in general. And so I think it would be particularly helpful to hear from the community: what do you see as necessary to help enable that?
Do you think there are going to be some pain points within the community in taking steps in that direction, toward this vision? Of course, we're very busy from the agency perspective working on what we owe back to OMB and OSTP as a response. But clearly they've set a new vision for the community, and I would like to hear what the community is thinking about how you'll be able to work that into your processes in the future.
I'll jump in and just start the conversation from the DMP Tool community that I work closely with, which comprises lots of different professionals, but primarily data librarians and folks in research administration. And what I'm hearing from that community is anxiety around compliance and who is responsible, particularly with the new NIH policy.
So a lot of folks are feeling unsure: is this sort of a new role at their campus? Where does it fit? Where does that money come from? What does compliance look like? And I think some of that is just that this is a new policy, and some of it is unfolding and people are still kind of figuring out the mechanics. But that seems to be the question that comes up the most, and the most anxiety-inducing, among the folks that I talk with.
And to some extent, that's part of what we're working on building: automated systems that link existing metadata so that we can create records that can track these things, impact being one kind of outcome, but compliance being another, so that we can ideally create systems that don't put additional burden on administrators, but really build on existing infrastructure and utilize good metadata.
I just saw a comment saying, what I hear from the community on DMPs or DMSPs is that federal mandates don't have teeth; we need money from Congress. Yeah, interesting. I was in a meeting last week. And on your slide, Maria.
Actually, Jen, I think it was your slide that had the image of the GREI structure. And one element of that was assessment. And the idea that came up in the conversation last week with a group of open research funders had to do with data exchange: how do we get assessment data back and forth?
And could we develop something COUNTER-like for that? These are the data that need to be collected, and here is an API structure to transfer that data back and forth, much like COUNTER and SUSHI, as a model for this compliance that Maria was talking about. What data do we need? How do we capture that data, and how do we exchange it between the interested parties? Those could be government agencies.
They could be private philanthropic organizations. There is grants management middleware software in here as well that could play a role in terms of gathering the data and exchanging it back and forth, you know? A funder might be interested in, well, what is the usage? What are the downloads of the paper we just paid for open access? Or, you know, has this research data set been deposited into such and such a repository?
What is the license? Those sorts of questions. And I don't know what compliance would look like. But I can envision an ecosystem where you explain to people, systematically, what data you want and how you want it communicated, so you don't have manual processes, you know, people going in saying, oh, I've got to get this data from here and this data from there and this data from there to do our annual compliance exercise.
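The COUNTER/SUSHI-style exchange Todd suggests, where a funder's script ingests a standard machine-readable usage report instead of staff collecting numbers by hand, can be sketched as below. The report shape here very loosely imitates a COUNTER-style item report (items with per-metric counts), but it is a simplified, invented structure for illustration, not the actual COUNTER 5 or SUSHI schema.

```python
# Sketch of consuming a SUSHI-style JSON usage report. The report shape
# and identifiers are invented; a real report is richer and standardized.
report = {
    "Report_Items": [
        {"Item": "doi:10.9999/paper",
         "Performance": [
             {"Metric_Type": "Total_Item_Requests", "Count": 120},
             {"Metric_Type": "Unique_Item_Requests", "Count": 80}]},
        {"Item": "doi:10.9999/dataset",
         "Performance": [
             {"Metric_Type": "Total_Item_Requests", "Count": 45}]},
    ]
}


def totals_by_item(rep: dict, metric: str = "Total_Item_Requests") -> dict:
    """Sum the chosen metric per item across the whole report."""
    totals = {}
    for item in rep["Report_Items"]:
        totals[item["Item"]] = sum(p["Count"]
                                   for p in item["Performance"]
                                   if p["Metric_Type"] == metric)
    return totals


print(totals_by_item(report))
```

The point of the analogy is the last line: once everyone agrees on the report structure, "what are the downloads of the paper we paid for" becomes a one-line query rather than a manual compliance exercise.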
Be interested to hear what thoughts people have in that regard. I'll weigh in again, in the absence of anyone trying to stop me. So the GREI project from the NIH is a really interesting kind of Petri dish for all of these questions, because the idea is that multiple systems develop common standards for supporting policy compliance, both in the act of sharing and then in the evaluation.
So these seven platforms all adopt Make Data Count as the mechanism for evaluating data as a first step. And then I can imagine that the group in the next couple of years will begin advocating for data citation principles and promulgating those as a practice. And then again, within this Petri dish, we're working just with NIH funding, and, as I showed in my slide, we're being very careful about collecting funding information associated with each NIH Institute.
You get a complete picture there, right? You get the funding, you get the products, you get the objects, and then you get the outcomes. So it's all being built in this small area. So I agree with you, Todd. What can we borrow from that? And how could that evaluation side be built out? I mean, that's to some extent what we're working towards building right now in the DMP Tool.
So one of the sources that we will be checking for associations with data management plans is Dryad, and other repositories, also just pinging DataCite and the other DOI providers for connections to projects. And so by linking to this larger ecosystem, you don't have to use the DMP Tool to access this information. It's all shared through this larger open DOI infrastructure that we're working with.
And we do have the ability to include things that aren't DOIs, recognizing that there are other identifiers utilized as well, so those are included in the application currently. What we're working on building out right now is kind of a dashboard for administrators that will list plans, so that they can structure and register their plan, get a DOI, and then list all of the associated outputs. And that's not just data sets; that's also manuscripts.
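The administrator dashboard described here, a registered plan DOI with its associated outputs of many types grouped for display, reduces to a simple aggregation once the links exist. The records, identifiers and field names below are invented for illustration; a real implementation would pull this from repository and DOI-provider APIs rather than an in-memory list.

```python
# Sketch of grouping a plan's associated outputs by type for a dashboard.
# All identifiers are made up; note non-DOI identifiers are allowed too.
from collections import defaultdict

outputs = [
    {"dmp": "doi:10.9999/plan-1", "id": "doi:10.9999/dataset",
     "type": "dataset"},
    {"dmp": "doi:10.9999/plan-1", "id": "doi:10.9999/paper",
     "type": "manuscript"},
    {"dmp": "doi:10.9999/plan-1", "id": "https://protocols.example/123",
     "type": "protocol"},  # a non-DOI identifier
]


def outputs_by_type(records: list[dict], dmp_id: str) -> dict:
    """Group one plan's associated outputs by output type."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["dmp"] == dmp_id:
            grouped[rec["type"]].append(rec["id"])
    return dict(grouped)


print(outputs_by_type(outputs, "doi:10.9999/plan-1"))
```

Grouping by type rather than by source is a deliberate choice here: the dashboard's question is "what did this plan produce," whether the answer came from Dryad, DataCite, or a manually added identifier.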
It could be samples, it could be protocols, it could be software, it could be anything. So the idea is that we're bringing all that information together. You can view it through the DMP Tool, but that's not the only source of access. But I was interested to hear from Michael, because I don't know what it looks like from the DOE side as far as tracking outputs connected to projects and plans at DOE.
So a lot of the information and connections are available through our OSTI.gov site. So it does have connections, when available, between different products. And we have a few different viewers. So there's a portal, DOE PAGES, to look at submitted manuscripts or publications, there's DOE CODE for software that's submitted, and there's also a way to look up data.
And one thing we try to do is show the connections between all those elements as well. And that's enabled by the fact that we can help provide persistent identifiers through services like connecting with DataCite, for instance, for code. We also register DOIs for data metadata, I guess, metadata about publicly available data sets that are submitted to us as a research product as well. So we do have a portal that can let you start exploring all those linkages.
Although the idea of being able to see everything that has come out of a particular award is also pretty interesting. And awards are the next big issue that our OSTI colleagues are working on tackling right now. We have that at a few levels. Not only do we provide funding as awards, but our user facilities also award time on these amazing devices that enable science.
So it could be that you could see that an award of beam time at one of our user facilities enabled additional work, and track that through its own persistent identifier. So that's another thing we're working with our laboratories to try to offer, so that we've got portals into the outcomes of a particular way we supported science.
I'm going to jump in with another question for you, Michael. Each agency is supposed to be creating a plan for OSTP and OMB. Are the agencies aligning their policies around consistent identifier policies and metadata behind the scenes? I know DOE has been doing terrific work and has been a real leader here, along with NIH.
Is there a sense that most of the others are following along? That they're going to jump on, you know, some of the services that OSTI provides? I know you provide some of the OSTI data services for other agencies. Do you see the agencies headed in a kind of collaborative, single, focused direction, hopefully? Well, I'll try to answer that question in a few parts. So first, the memo actually had a different timeline for the persistent identifier piece of the agency responses than for other elements.
So whereas for publications and data access we need to update our plans, and they're due later this month, in about a week, back to OMB and OSTP, for the part about persistent identifiers we have some more time to generate a response. And part of that is to enable some more discussion across agencies. Discussion across agencies, to be more coordinated in our responses in general, was also a part of the memo, so we're certainly continuing to do that.
We've already had many discussions that started through the NSTC, the National Science and Technology Council's Subcommittee on Open Science. Persistent identifiers have been a topic among those interagency discussions for a while, and so the aim would be to continue discussions there and understand better approaches. But we do have a longer time scale to develop that particular response.
So I don't think there is a unified response at this point, but discussions are certainly underway, and part of the memo was also to continue those interagency discussions and coordination around all of these elements of public access. So we'll continue those discussions through the NSTC as well.
Are there any questions from the participants? And George, you might have tried to interject something, but I don't think your microphone was working. There we go. I appear to have been triple muted. That's another thing I didn't know I could do. I was just going to say, I think that some of this, too, I want to turn again to that comment from the chat about the resourcing question.
And I think it's not just about dollars. I think there are lots of other parts to that resourcing question: time, bandwidth, so many other considerations that I think go into it. And without knowing who is represented amongst the attendees here, I certainly encourage you to put forth any questions or concerns that you have. Are there any areas where you're really facing those challenges, resourcing-wise?
Because that's where something like a NISO project can probably get involved. Obviously, we're not going to be able to open up checkbooks or anything like that, but there are certainly other things we can do to address resourcing concerns. Can I offer also, George, that it would be great to leverage what we've learned from open access?
You know, the first requirement from a public funder for there to be public access to the outcomes from their research was, what, Todd, 2005? When you and I were across the office from one another. So we have a lot of learnings with respect to policy implementation, behavior change and funding flows. We've made some mistakes around article processing charges and block grants
and such. So how do we find new pathways? Yeah, I mean, I feel like I have a list of things that we could use additional funding for. But two really jump to mind when I think of these increasing federal requirements for data sharing. One, of course, is more clarification on compliance, and what that process is going to look like at the institution level.
Is that a new branch of a grants office? What does that look like, and what are the mechanics of tracking it? And that will probably involve more resourcing. The other thing I think about comes more from my time working with Jennifer, with Dryad, and thinking about, OK, we've got these mandates for data sharing.
We don't want to just throw over a bunch of like unusable data that nobody can reuse and build off of. We need good curation, we need good stewardship, we need good data management plans, and we need professionals who can really go in and take the time to make sure that what we're producing as a result of this work invested in data sharing is actually useful and usable, and that takes tremendous resources.
And so scaling that, and creating systems that allow us to curate large numbers of data sets at scale, is definitely going to require resources, and also new technologies to optimize workflows and make the process faster. It's just inevitable. Otherwise, a lot of our work is for naught, right? If at the end of the day you aren't curating it and it's not reusable, why bother?
That's my opinion anyway. I think we're kind of caught in a trap, too, where you almost need to be able to demonstrate the value, but in order to demonstrate the value, you need to invest in it going forward. And so that's where I think starting with these agencies, and being able to report on those outputs, comes in. To come back to Jennifer's point about what we've learned from open access:
I think we started out with a premise about open access that's been tested and proven, revalidated and questioned over the years, and I think something similar is appropriate here. We only have a few minutes left. I just wanted to see, via a comment in the Google Doc, or a comment in chat, or raising your hands and speaking up,
are there any points that we haven't addressed that anyone wanted to make sure we did talk about or consider? Is there still ongoing work within RDA on the data model for DMPs? Is that still an ongoing project? Yes, they still meet.
I don't think it's considered a working group anymore, because there's a limit on how many months you can be a working group, and whatever that number is, it's over. But it's still being maintained and has kind of folded into the active interest group. So it's very much an active community. So yeah, technically not a working group anymore, but more in ongoing maintenance mode.
RDA has its own level of bureaucracy. It's strange. It does, but I don't know, I consider this to be one of the real big successes, because it underlies everything that we've built in the DMP Tool, and the fact that it was an actual output that came out of the community through RDA, through the hard work of a community, is pretty remarkable.
Does that community need additional, you know, participation? I know, being the head of an organization that runs a lot of these committees, oftentimes it's like, oh, that thing, it still exists, and there are like four people who maintain it going forward. I'm wondering how, over time, we make sure that that group remains active and vibrant in its maintenance of this structure.
Yeah, I mean, I'm sure they could use additional resources. It really is a small group of probably about four who maintain it. I am a co-chair of the active interest group, but I am not a core member of the common standard working group. So yeah, if that was something that NISO was interested in pursuing, I can put you in touch with the right folks. There are all sorts of ways that we could expand on the common standard to include more structure for the narrative components of a data management plan.
That's a piece that hasn't been fully developed out, so there's potential there, if that was of interest. Yeah, I'm just not thinking that, you know, NISO directly would get involved, but we can provide vehicles for you to promote, slash, engage and reach out to the community, to let people know, hey, these issues exist and you're going to have to deal with them.
And wouldn't you rather be part of a conversation about how these things are formed, rather than, you know, implement it after the fact and then complain that it wasn't done right? Definitely, yeah. It's a very active group of folks, and they've been very responsive. Whenever I've asked for changes or modifications, it's been a good group to work with.
And they always meet at RDA. It's the same group of people. So if any of you are interested in attending RDA, in person or virtually, I suggest those sessions. Right, I think we've almost reached time, and there don't seem to be any more forthcoming questions or comments from the audience.
Todd? Well, yeah, just before we break, I want to remind people that on the schedule we are set for our mid-afternoon break, but we do have a kind of lively and fun social event at 6:00. It's a team Jeopardy with a kind of standards focus, but standards, technology, you know, all of these fun topics that we've been talking about. That starts at 6 PM.
You will join that the same way you join all of the other sessions, by clicking in. So thank you so much. Thank you to our speakers. Thank you to George for being the moderator for today's session. Really appreciate it. The recording of this session will be shared, and I'll copy the chat, as well as the chat from the other conversation, into this Google Doc.
And we will post everything into Cadmore as soon as we get around to processing it. Thank you all for joining us, and we'll see you all at 6:00, hopefully. Thank you. Thank you. Thank you.