Name:
Ask the Experts about Open Data
Description:
Ask the Experts about Open Data
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/47f3b0e1-d74a-4b0f-9fcd-1c403228f1d8/videoscrubberimages/Scrubber_89.jpg
Duration:
T01H00M31S
Embed URL:
https://stream.cadmore.media/player/47f3b0e1-d74a-4b0f-9fcd-1c403228f1d8
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/47f3b0e1-d74a-4b0f-9fcd-1c403228f1d8/GMT20230920-150034_Recording_gallery_1920x1080.mp4?sv=2019-02-02&sr=c&sig=U%2B61IQlsokL7ngqFXiNC5zgQPJfT81DF29cxusWnyjE%3D&st=2024-11-26T09%3A23%3A16Z&se=2024-11-26T11%3A28%3A16Z&sp=r
Upload Date:
2024-07-22T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Thank you and welcome to today's Ask the Experts panel. We are pleased you can join our discussion on open data. I'm David Myers, SSP Education Committee member and a senior lead publisher at Wolters Kluwer. Before we start, I want to thank our 2023 education sponsors: Silverchair, 67 Bricks, Taylor & Francis F1000, and Morressier.
We are grateful for your support. A few housekeeping items. Attendee microphones have been muted automatically. Please use the Q&A panel to enter questions for the panelists. Our agenda is to cover whatever questions you have, so don't be shy about participating. You can also use the chat feature to communicate with other participants or organizers.
Closed captions have been enabled. You can view captions by selecting the More option on your screen and choosing show captions. This one hour session will be recorded and available to view in a few days. A quick note on code of conduct in today's meeting. We are committed to diversity, equity and providing an inclusive meeting environment, fostering open dialogue, free of harassment, discrimination and hostile conduct.
We ask all participants, whether speaking or in chat, to consider and debate relevant viewpoints in an orderly, respectful and fair manner. It is now my pleasure to introduce today's moderator, Anita de Waard, vice president, research collaborations at Elsevier. As VP, Anita focuses on working with academic and industry partners on projects pertaining to progressing modes and frameworks for scholarly communication.
Since 1997, she has worked on bridging the gap between science publishing and computational information technologies. Efforts include working on a semantic model for research papers, co-founding FORCE11, supporting the development of standards and models for research data management, and a series of workshops on scholarly document processing. Anita has a degree in low temperature physics from Leiden University and worked in Moscow
before joining Elsevier as physics publisher in 1988. And now over to you, Anita. Thank you so much. Thank you very much. And welcome to everybody. Thanks very much for joining this webinar. It's a great pleasure and honor to introduce our three speakers today. I think open data is such a pertinent theme, and the three people we have with us today are really key participants in making this open data future a reality.
As David mentioned, feel free to pose any questions and put them in the chat or put them in the Q&A. The chat is more for if you can't hear us or if you have a clarifying question. Q&A is for any real questions to the panelists. Towards the end of the presentation we'll be taking questions directly from the audience. We'll start by asking each of the panelists to introduce themselves.
Then I have a couple of general questions to the panelists and hopefully some questions from the panelists to each other. And at the end, we will get to your questions. But if you want clarification on any of the topics that we're discussing, feel free to jump in and post to the chat and that would be really great. So without further ado, I'm really excited to see all three of you here today.
As I said, I think we have a great representation from the very complex web of participants and stakeholders that enable open data to start existing. So I'm going to ask each of our three panelists two questions. Can they briefly introduce themselves, and can they tell me how they define open data and why it matters to them? So I want to start by asking Andrea. Andrea Medina-Smith, who's a data librarian with the NIST Research Library, works with the Office of Data and Informatics, and importantly is one of the co-authors and co-creators of the Research Data Framework.
So Andrea, if you could say a few words, introducing yourself and then if you could say how do you define open data and why does it matter to you? Thank you, Anita. Really appreciate getting the chance to be here today. So I am, as Anita said, a data librarian at the National Institute of Standards and Technology. And my work has two main roles. One, I help make our data publications public and publicly accessible in response to the public access memos and things like the Evidence Act.
And then I also have another hat where I am on the team that created the research data framework, which is a guide or a map to this really complex world of research data. I would say that to me, open data is a corollary to open science. It is the platonic ideal of how science can work where we have our research and it is open and transparent so that others can work with us to either better our science or refute it or better it through refuting it.
And that is how I hope to see science moving forward. And one of the main ways to do that, now that we have the technology to do that, is to open up our data as much as possible when it is ethically the correct thing to do, when it's the right thing to do. And a lot of times, especially with the data that NIST creates, it is the right thing to do. So we have a mandate that says basically that we are open by default. And my role is to help people to understand what that means and how they can do it.
That's really great. Thank you. Thank you so much for that introduction. I think there's a lot there, but I love the platonic ideal of open science. Maybe that's something we can get back to. Thanks so much. So welcome to our next speaker, Max Petzold, who is with the Swedish National Data Service.
He's the deputy vice chancellor at the University of Gothenburg. Göteborg, am I saying that right? And involved in the committee of ethics in research, also a professor of biostatistics who has worked with the Swedish National Data Service. Max, it would be great if you could say a few words about how you define open data and why it matters to you. Thank you.
Thank you for inviting me to this meeting. I'm very happy to be here. And good afternoon from Sweden. It's a little bit later here. So with my background as a professor in biostatistics, I've been working with data for many, many years, of course, and to me, actually, I very much believe in the FAIR principles. I mean, that's kind of defining open data: that it should be well documented, findable, accessible research data that is also well curated.
So open data is much more than just open. It's also useful and well documented and all these essential things that come with the open science question. And I've been working eight years as the director of the Swedish National Data Service, which is a joint effort between the Swedish universities. So all of them in the country, having a main office, but a very well functioning, distributed network of offices in the country.
So each university established its support unit for the researchers. And we work together and collect information and build a very good environment for open data. So we managed to create something that is a little bit unique actually, and get input from so many people instead of just having one office. And now I'm very happy. I just left the office as the director and became the deputy vice chancellor of the university.
So now I'm kind of leaving the national level and the European Union level down to the local level. So suddenly I have the tools to implement all that we have been saying during the eight years, which is a very nice situation, a very interesting situation, that you really can do what you tell people to do. So thank you. Thank you so much.
Very interesting also that you're having this shift from planning to actually doing. And I'd also be very interested to hear more about your view as a biostatistician on what it really means for data to be useful. So those are some things I'd love to come back to. And thirdly, it is also a great honor and a pleasure to say hi to Iratxe Puebla, who's the director of Make Data Count at DataCite.
She has a very deep and long background, having worked at ASAPbio and PLOS ONE, and she's also one of the leaders of the Research Data Publishing Ethics Working Group of FORCE11. So wonderful to see you, Iratxe. Same question to you. Could you briefly introduce yourself? How do you define open data and why does it matter to you? Thanks. Yes, sure.
Thank you, Anita. And thank you for the opportunity to participate here. So as Anita was saying, I currently work at DataCite with a focus on the Make Data Count initiative. And for those of you who may not be familiar, the goal of this initiative is to develop and advance responsible data metrics so that we have the tools to enable us to meaningfully evaluate and reward data usage and impact. In terms of defining open data,
it's always difficult to provide a definition. I guess I'd mostly highlight the characteristics or the values that I attach to open data. So for me, open data is data that you can access, you can share and you can reuse. And obviously all of this without having to go and ask the initial data producer. And I say this because I spent a lot of time when I was an editor at PLOS ONE asking these questions of authors.
So essentially, to me, open data is data that can be found, that is discoverable, that researchers and others, maybe for policymaking, can find for their relevant purposes and incorporate into their research, into their work; data that can be cited, as support for transparency and obviously also to enable that reward for the producer of the data set. It is important also for open data to have all the necessary information that makes it reusable.
Essentially, the more information that you provide, the more likely it is that it will be reused, so essentially advancing this goal of generating new knowledge. And it is important also, obviously, for it to be licensed in a way that enables sharing. Why does this matter to me? I think one of the important aspects is that I've worked in different areas of open science, so I've worked for open science journals and in preprints and now more closely with data, and I think data is the common thread for two of my passions.
Again, this element of supporting open science, and not that it's open just for the sake of being open, but because I believe open science practices are necessary really to catalyze and accelerate knowledge and discovery. And also because another of my passions that I've done some work on is ethics in publication, reproducibility and integrity. And essentially I think that open data is the core element that is the pillar for both of those aspects, both enabling open science more broadly and also integrity and reproducibility.
Thanks. That makes a ton of sense. That's really good. So it's interesting to see that there are overlaps and differences a little bit between your views on open data and what the ideal open data future could look like. I was just wondering if I could get back to Andrea.
I'd love to ask all three of you the same question, but I'll start with Andrea, because you had this, like, platonic ideal of open data. I really like Iratxe's saying that you have a catalyst for more science to happen. I'm wondering, Andrea, from your experience at least, have you seen cases or areas where this platonic ideal of open data is actually happening?
And can you say something about what that looks like? How does that indeed change science and change the way that science is done? Or is it too far away? And in that case, can you just paint a picture of what it should look like? So I think there are a few projects here at NIST that are really embodying that ideal. One of them is the net zero house.
And so, almost a decade ago now, NIST built a home. And the goal for it is to be a net zero user of fossil fuels and the like. So basically they're looking at how do you create a home with the right materials and the right technologies to be a net zero producer of greenhouse gases? And they have released data over several years.
And along with the data sets themselves, they have an extremely detailed data dictionary. They have a user group, so that you can actually interact with other people using the data and the researchers who created the data in the first place. And it was also one of the first things that we made available via our repository. So that means that the data is preserved.
It is well described, so that if you find it through our site, you get the same sort of description as finding it through their website. So I think that's one project that we really should highlight more when it comes to this ideal of open data. And then when it comes to a lot of our other data sets, I'm not sure that researchers have been able to see the benefit to them, to their own career, yet of opening up their data and really spending those precious resources of time and energy on creating reusable, really FAIR data sets.
And a lot of the technical side my team takes care of. But the descriptions to make it findable, that's up to the researcher. Making sure that you're using non-proprietary file formats, that's up to the researcher, all those sorts of things. I don't know that they've been able to see the benefit of it yet, and that's another thing that NIST has to work on. You know, we want to make it known that
you're going to have a boost to your reputation. You're going to have a boost to your citation count, if that's important to you, which is part of reputation. But I'm honestly not sure that those things really matter to our researchers. You know, they're important, yes, when they go up for promotion and the like. But they're not the be all and end all. And until we can convince the researchers and convince the supervisors and unit directors that this is where science is going, and that we are on board as NIST,
I'm not sure that we're going to get there. So not every one of our data sets is going to fall into that ideal. Not that they ever all will, but the percentage that falls below the ideal will be larger than I would like to see. Oh, interesting. So just for my clarification. So this is an actual physical house.
Yes. OK. On the NIST campus. Nobody lives there, but they have stuff set on timers and the like to pretend like someone's living there, though I thought that would have been a fabulous benefit for someone. You move your family in there for six months, you treat it like a real house. They can see what a family can do to it.
It's supposed to be like a family of four is living there. Oh, and so they monitor everything that goes in and out of the house, and that's all openly available. Yep. I can give everyone the link to that if they are interested. That's super interesting. Thank you. I fully appreciate that, and I wanted to go to Iratxe, because you were smiling and nodding when we were talking about rewards for making this data open.
The question to Iratxe, and then after that also to Max: what is your experience and what are your thoughts on how we could enable such rewards, or a vision of that? Or, first of all, the question: do you agree with Andrea that the researchers perhaps do not yet see why they would do this, why they would invest in this? And any thoughts on how that might change moving forward?
Yeah. So, I mean, I was nodding because just last week we hosted an event talking about evaluation of data use. And a lot of the discussions came back to, you know, what's the benefit for researchers? What happens as part of tenure and promotion processes? Is it going to give me brownie points in my grant application? So I certainly agree that the benefits are less tangible as it is now.
I think there's been a lot of focus on making sure that we brought different communities along in sharing data, and maybe this part of evaluating what has been shared, and which value we assign to it, is something that we're still kind of grappling with, and we are at earlier stages of the discussion among different stakeholders. But I think there is a lot of momentum here. Again, at the event that we had last week, we had a couple of panels with representatives of institutions who are already thinking about what it means if we want to introduce data as part of tenure and promotion processes. What do we consider? What are the metrics that we need, and should it be a metric?
Because that's also part of the evaluation. How do we account for different disciplinary cultures? Again, different disciplines produce different data sets. They get deposited in different ways, reused in different ways. So I think that the conversations are starting and there is quite a bit of momentum. I mean, my suggestion to everyone would be to keep raising this point as part of any discussions on research assessment.
There is a lot of movement as well. I mean, you may be aware of this, but at European level, CoARA is a group that is trying to align, et cetera, but really trying to bring tangible steps in reforming evaluation at institutional level. It is open to anybody who wants to join; it currently has a pretty much European focus. But the interesting thing there is that those institutions who sign up to CoARA have to produce a plan as to how they are going to update their assessment processes in the short term.
So I think we should push for data to be part of that. Essentially, we want a more holistic view as to what the researcher contributions are. Thank you. If there are any links to the outcomes that you had in last week's workshop, that'd probably be very interesting to our audience and everyone else. Yes, I'm preparing the slides that were presented, et cetera, so I'll be happy to share.
Thank you. I'd be very interested. I saw the event. Max, I was wondering, what does this look like from your end? So thinking about tenure and recognition in your new role, perhaps, how do you look at this from the inside of a European institution? I mean, definitely. Like in many other areas, it's a matter of generations. I would say that, at least in Sweden, all the PhD students are getting exposed to open science and open data.
So when we get more and more of the young researchers into the system, the why question more or less will disappear, I think, because they know why it's important. But clearly there is a gradient over the ages. And it's also quite interesting. I normally try to highlight archaeology and metrology and also climate research, which are very, very used to sharing data. And you asked earlier, have you seen any results from sharing data?
And definitely within archaeology and metrology. You can only excavate one time, so it has been very obvious for a long, long time in archaeology to document and share your data. And the interesting thing is that in these areas there is also a clear merit of sharing data, which is, of course, lacking in many other areas. But we also need to remember that within, like, medicine, where I'm doing most of my research, in epidemiology and in these areas,
of course people share data, but what happens is that you find the paper, some researchers doing something, and then you ask them and then you become co-author or something. I mean, that's another way to share it, but not organized in the way that we would like. And so, of course, people are sharing in all areas, but more or less organized. And something that I also try to highlight is the usefulness of sharing data to prepare new studies.
I mean, a lot of people downloading data from the Swedish National Data Service never cite the data. I mean, in a way that's fine, because what they use the data for is to understand what it looks like when they are preparing a new study. So they would like to do data collection in some other country, and then they look into what it looked like in Sweden.
And so it's also useful from that perspective. And also in areas where you have very, very expensive infrastructures like synchrotrons and particle accelerators, the technology is moving very, very quickly. So old data is in a way useless. But at the same time, there's such a long queuing time for the next experiments. So you tend to reuse old data to get a view of what to expect when you come first in line.
So reuse for preparing new studies is something that we definitely should highlight a little bit more. So it's not just pure research, it's also preparing research. Just briefly on Max's point about the many different uses, et cetera. I think this is something very important to bear in mind, because obviously we should bring the incentives for the individuals to do this.
The intrinsic incentives and motivations are very powerful, but I think one of the benefits of open data that we should not disregard is to look at it as the collective, essentially all this collective knowledge that is available. And a great example of this is the Protein Data Bank, which has been running for decades, and the economic impact of the data sets that are deposited there is estimated in the range of billions, with a B. And obviously this supports new research but also supports development in the private sector, et cetera.
So I mean, we are contributing to something that is larger, and there may be uses of the data set that the individual producer cannot envisage that may become possible years later. As Max was saying, with new technology and new analytical tools, they may be useful in different ways in the future. Yeah, that's a really good point. And thank you for raising that, and thank you also for those examples, Max.
Also, further to your point, Iratxe, of course there may be only certain situations in which certain people can access a certain type of data. Of course, in archaeology, only one person can do a dig once and then that's done. And maybe it's an area that's hard to access or that's going to get built on or what have you. Similarly with astronomy, you know, the stars only align one way once.
So I think that unique aspect is also a great element to introduce. I want to, for a moment, shift slightly and focus in a little bit with all of this. So we've spoken about what open data is, why it matters, and what it can achieve. We've seen some examples of where it's very helpful and useful and talked a little bit about metrics.
Now we are here in an SSP webinar, so I'd love to hear from all of you how you think that publishers can really contribute to enabling this world of open data, and what are concrete steps that publishers can take? Maybe if you have good examples. I don't think any publisher would like to be shamed at this moment in particular. But if you have examples that are not helpful, that's also completely your right.
Iratxe, I'm starting with you again because you have the most immediate response. What can publishers do to help make this world happen? Sure. So I guess a few things come to mind. And also I come at this from the perspective of having worked at PLOS, which was a pioneer in having a data policy. But if you are working at a publisher or journal that does not have a data policy, please create one, and create one that requires the accessibility of the data sets.
I realize that there will be different communities. And I don't say that you have to kind of, you know, burn everything to the ground and create this mandate overnight. But I would suggest that you do it in a way that involves a lot of community discussion and consultation, but really going in the direction of travel that the data should be open, again unless there are any ethical or privacy concerns.
And there are also ways of addressing that. So essentially be very clear on that, because that really supports, again, also researcher awareness. As part of your data policy, please recommend the use of data repositories. I'm guilty of having allowed authors to put PDFs in supplemental files, which I very much regret now, looking back. One of the lessons learned: please, no more PDFs in supplemental files that are not discoverable, not reusable.
Essentially, what we want with this data set is not only to reward the author, but to make it usable for others. So please, as part of your policy, recommend the use of repositories. There are many, many out there for different disciplines that have their own standards and frameworks. If there is not one for your communities, there are also very many good generalist repositories available. And enable data citation.
We are talking about these rewards. There are a lot of issues with the publisher workflows at the minute, where essentially, apart from the cultural aspect of whether researchers will cite the data sets, there are some journals that don't allow citations to things that are not peer reviewed, which I don't think supports transparency. And then whenever there are citations, sometimes these don't get deposited into Crossref.
So essentially look into your workflows, because I think that if we get the researchers to do this extra bit of work, we need to make it worthwhile and support them in getting that credit. And then the other thing I would mention, which came across in a couple of conversations at the event last week but also before, is that at the end of the day, what we want with these open data sets is for them to be reused and re-analyzed.
So please do have policies that allow submissions of re-analyzed data sets, because I've heard of journals that say, if it's not new data, I don't want to look at the paper, which sounds a bit paradoxical if what we want is to make these data sets as useful as possible. Really interesting, and a whole bunch of points there that I'd love to follow up on a little later.
But I wanted to go to Andrea. Andrea, you were one of the co-authors of the Research Data Framework. And I know that you've spoken with publishers, because I've been involved in some of those conversations. Can you say a little bit, either from your perspective at NIST but also, and in particular, as co-author of the Research Data Framework, what are your thoughts on the roles that publishers can play to make this open data world happen?
With my broader public access hat on, at least, I would say everything that Iratxe brought up is very important. Getting those publisher workflows in line so that we can feed the PIDs, especially from citations, to the different agencies or organizations that can work with them to help get that credit. That's really important. The other thing I would say, and this is a little controversial, is I have just gotten on board with the data availability statement in addition to the citation.
And the reason I want that is because we have to remember that there are still going to be people reading these, and the data availability statement puts it right there. Here's where you can get the data. You don't have to look in the citations and try to figure out which one in the list of 150 is the data they worked with. Here's the data I worked with.
Here is where you can access it. Or: the data I worked with has privacy issues or ethical issues. We cannot release it. Please contact us to discuss. You know, something along those lines. I find with a lot of the machine readability and machine actionability statements that we start to leave people behind. And that's really frustrating for me, partially because I don't have the skills to do all the machine readability, all that ML and AI stuff, and I still read papers.
So while there's a lot there, we still need to remember the people. And then, with my RDaF hat on, I would say that I would like publishers to step up and be a full participant in the open data movement. I would like them to take the opportunity to say, all right, this is how we can do it and this is how we're going to feed into the PID graph, and really take it as an opportunity. Right now, I feel like we often are trying to work around the publishers to get done what we all feel needs to get done, and that's really frustrating when the publishers are such a big portion of the scientific enterprise.
Strong words, I think. I hope we'll have some. Those are my words. Not nice words. No, no, no. Excellent. No, no. This is great. I'm not at all saying that.
No, this is really good. Oh, you're getting a lot of claps, I think. So, brilliant. No, fantastic. Thank you so much, Andrea. That was super helpful. Max, your thoughts on what publishers could or should do? To me, publishers and journals are super important.
I mean, they have, in a way, much more access to the researchers than the universities and organizations like the Swedish National Data Service. So, I mean, the approach from the publishers is crucial, to really work with open data and open science and engage people. And also, I mean, like Andrea said, we need to remember the people. I mean, it needs to be understandable and doable.
I mean, these requirements that come up should, of course, be kind of coordinated between the different journals and the different publishers, so the researchers meet approximately the same kind of requirements, but they should also be clear. And we are actually working, the Swedish National Data Service and Elsevier, in a kind of small project to try to improve these messages and requirements to the researchers,
to make them easier to understand and explain why, and all these kinds of questions. But then the publishers also have the power to really require good citing, so avoiding free text and that kind of stuff, because we need to come up with metrics and to make it visible on the organizational level, but also on the researcher level. We need to be able to show how much data is reused, I mean cited, and all these kinds of things.
And today it's really problematic to get these numbers because of different ways to cite. Free text is problematic. And then another thing. I mean, that's really the power, in a positive way, that the publishers can help in many ways and structure it in a good way. But it's also, you know, researchers are always a little bit hesitant about leaving the data.
So, of course, these repositories at universities or other places should be accepted. So we shouldn't come to a situation where you need to upload to some certain repository. Of course, the repositories should fulfill the FAIR requirements, and that's fine. But it's really important that we don't scare the researchers, so that they think the data will be taken over by the publishers in some way.
And this is also important for the metrics. We need to share the metrics, because the publishers are sitting on a unique source of metrics that really can't be accessed by the universities and other research bodies. So we need to help each other in different ways, and not lock this information into some kind of ownership either.
Thank you. So thanks. A lot of points there. I see there's also a question, and it has come up both in what Max and Iratxe were saying. I think it may be helpful for a second to say what is current. Maybe, Iratxe, you can take the lead here. But Andrea, also feel free to jump in from your perspective.
Can you say very simply, what should publishers do in terms of mentioning and citing data? So I remember the FORCE11 data citation principles. And what I recall was we encourage authors to deposit their data in a repository and we ask them to cite it as if it were a reference, with a persistent identifier, a PID. That was another question in chat. And so that the references are kept.
But there is this issue with the data accessibility statement. So so maybe can you kind of say what, what, what is currently the state of the art? Is that still the state of the art and have there been an evolution? And then maybe for a second we can talk about whether or not publishers should really focus on this data accessibility statement. Yet it starts.
Spence, I understand the ecosystem. There are still issues with the workflows from the publishers in terms of getting this information to propagate. I would say the best way to get these links, if we want to call it that way, between the data sets and the articles, because at the end of the day, all of these elements of reward is we want to be able to say this data set was used in some way by somebody.
Was it to advance science or refute it, whatever it may be, to add it to a meta-analysis, et cetera. So we need to be able to make those connections, and this is where it is breaking down. The best way to make these connections is through the metadata that publishers will put in Crossref, because Crossref has the workflows to link that to DataCite, which provides the DOIs for data. So essentially, I guess my one simple thing, which is not so simple, I understand, technically, but if there was one thing that publishers can do to really increase, again, the information that we have about data usage, it is improve the metadata deposit at Crossref to make sure that those references are there.
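As an illustration of the article-to-dataset connections described above, here is a minimal sketch in Python of what a link keyed on persistent identifiers might look like. It does not reproduce the Crossref or DataCite deposit formats; the `DataLink` record, the `build_links` helper, the `"references"` label, and the example DOIs are all hypothetical and only show why consistent identifiers make the links easy to assemble.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLink:
    """One article-to-dataset link, keyed by persistent identifiers (hypothetical record)."""
    article_doi: str
    dataset_doi: str
    relation: str  # hypothetical label for the kind of link


def normalize_doi(doi: str) -> str:
    """Strip common prefixes and lowercase, so the same DOI always compares equal."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def build_links(article_doi: str, cited_dataset_dois: list[str]) -> list[DataLink]:
    """Create one link per distinct dataset DOI cited by the article."""
    article = normalize_doi(article_doi)
    seen = set()
    links = []
    for raw in cited_dataset_dois:
        dataset = normalize_doi(raw)
        if dataset not in seen:  # the same dataset cited twice yields one link
            seen.add(dataset)
            links.append(DataLink(article, dataset, "references"))
    return links


# Made-up identifiers, for illustration only.
links = build_links(
    "https://doi.org/10.1234/ARTICLE.1",
    ["doi:10.5555/Data.A", "10.5555/data.a", "https://doi.org/10.5555/data.b"],
)
```

The same dataset written three different ways collapses to a single link once it is normalized, which is the practical advantage of identifiers over free-text mentions.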
And again, use the persistent identifiers, as Andrea was referring to, because the persistent identifier not only allows the citation for the researcher, it brings the metadata that will allow us to, again, make those connections, link the data set to the author, to the institution, to their funder, to their article, et cetera. So I would say improve the metadata deposit with Crossref. Right, and just a quick follow-up.
How do you feel about data accessibility statements and publishers sort of insisting on those for publications? Is that a nice-to-have? Is it essential? I mean, I understand Andrea's point about the human. You know, I'm also coming from that perspective. I used to be an editor and just read papers day in and day out. And I think that's important. But the part that I would emphasize is making sure that they are machine readable, which I don't think is the standard right now.
So essentially make them machine readable and include the persistent identifiers, because again, through the persistent identifier information, we can start enabling those links. Yeah, thanks so much. Andrea, further additions to that? So in particular, this issue of citing data, of adding the data in the citation, is that something that the RDF also recommends, or does it not go into that level of detail? The RDF does not go into that level of detail.
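To make the point about machine-readable statements concrete, here is a minimal sketch, assuming the data availability statement is available as plain text. It pulls DOI-shaped strings out of the statement with a regular expression; the pattern is a commonly used approximation rather than a full DOI grammar, and the helper name `extract_dois` and the example DOIs are made up for illustration.

```python
import re

# Approximate pattern for modern DOIs; not a complete grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(statement: str) -> list[str]:
    """Return the distinct DOI-like strings in a statement, in order of appearance."""
    found = []
    for match in DOI_PATTERN.findall(statement):
        # Drop sentence punctuation that the pattern picks up at the end.
        # Caveat: a DOI that genuinely ends in ")" would be truncated here.
        doi = match.rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


with_pids = (
    "The data are available at https://doi.org/10.5061/dryad.abc123. "
    "Code is archived at doi:10.5281/zenodo.1234567."
)
without_pids = "Data are available on request from the corresponding author."

dois_found = extract_dois(with_pids)
none_found = extract_dois(without_pids)
```

A statement that says only "available on request" yields nothing a machine can link, while one that carries identifiers can be connected to the dataset record automatically.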
It does not. So I wouldn't really have anything to add to Iratxe's comments. Getting that into the flow so that it can be part of the graph is just extremely important. Great. Um, yeah. Max, any other points from your end? Things that would be critical for publishers to do, and thoughts on this citation, for instance, how did that work with the Swedish National Data Service?
Do you have PIDs? Do you look at citations to them, et cetera? Yes, for sure. I mean, and the thing is that there are so many good repositories that really provide PIDs, or I mean, that's what we do the device. Um, but we do have a little bit of a problem that the journals a little bit too often, I would say, accept that you don't upload your data to a repository, because, I mean, uploading is not the same as sharing.
I mean, uploading is to make it searchable, and you need to document it, you need to curate it in a good way. But sharing is another matter. I mean, when someone finds the data, of course they might need ethical permission to get it or they need some kind of contract, whatever. But the important thing is that data is taken care of in a good repository and you get your DOI and all these kind of things.
And I think the publishers a little bit too easily accept, because it's easy for a researcher to claim that this data can't be shared openly. And that's very often true. But you can definitely still use the repositories, which lock the data in and make it available when you have the appropriate papers, so ethical permission and that kind of things. So that's what we do in Sweden, that we actually take care of also sensitive personal information, everything like that, and make it searchable.
And of course, if someone would like to use it, then there is a signal going back to the owner of the data, and they need to assess if they can give it out to the person that requests the data. And here it's really based on the authorities, on the universities, that have to go through some certain process. But all data that is collected using governmental money in Sweden should be made available.
So it's very clear in the law, but it's not really done all the time. It's very often not done. So a little bit more help from the publishers to push people to use appropriate repositories instead of just having a data availability statement, because people get retired, computers are lost, whatever. That's very interesting. So the data availability statement is a little bit like a stopgap.
They're like, check that box, that's that. But you're saying they should even upload and cite correctly, even if they're not yet or not sharing it. Did I understand you correctly? I think that's a really interesting point. Sorry, go ahead. So whatever you publish, I mean, normally you have data behind it, either qualitative or quantitative data.
And I mean, you should upload this to an independent repository. It could be your university, it could be your whatever unit, but they should have a well-organized repository. And there, of course, it should be well documented and taken care of following the FAIR principles. But then, of course, it might be that the person that asks to get the data doesn't fulfill the requirements. So you could be denied getting the data, of course, but it's still following the FAIR principles.
I mean, and now we come back to what I said about the project that we are doing together with Elsevier. In the instructions, we need to be very clear in relation to the authors that the journal doesn't require the authors to share the data freely, that's another thing. But to follow the FAIR principles is required. And that's a little bit of a problem.
If you remember, I don't remember the filmmaker, but this Super Size Me. Yeah, I'm a little bit doing that myself as a statistician, because as a statistician, you're not normally the owner of the data. So you're engaged in a number of projects, but someone else is running the project. You can say the PI is someone else normally, and within my role, I never ask people to share the data because I'm so keen on understanding how people are thinking about sharing data.
So I'm just sitting there and, yeah, listening to the arguments that, oh no, we can't, our data is this and that. And it's a very, very good way to learn how to improve the communication and also the way we handle the data to make researchers willing to share their data. So if you search for me, you will not find me very much in repositories.
I mean sharing data, because I really try to understand all the obstacles by not really arguing myself, but listening to the research groups, how they argue about sharing. So that's my Super Size Me. But yeah, very interesting point. Andrea, you came off mute and I was wondering if you wanted to comment on this. That must have been an accident.
Not at all. But go ahead, if you had specific thoughts. No, not really. I mean, it's just this all comes together in the... Well, we've got a bunch of threads. One, we need newer researchers in the pipeline who understand it and want to be sharing their data. Two, when they share their data and they cite their data,
the publisher pipelines need to make that a useful thing. And three, we need to recognize that they're doing this and have some sort of reward for it. So it's a big project we're all working on. It is a very big project. And I think in one of the preparatory calls, we were all saying, it's like there are all these switches, and if any of the switches, you know, goes down, then the whole system doesn't flow.
And it's so easy for some of the switches, for some of the reasons, not to go to plus, and then the whole stream stops. In a way, I think that's something we're seeing. I'm very conscious of time. I wanted to for a moment jump to the questions from the audience. Um, because I see we have a number of questions. I still have a couple of points I wanted to get back to, but we'll see if we have time or not.
And I was wondering, Caroline, you had a question. Um, I'll just read it out because I think the unmuting of folks who are on the call can be a little tricky. Um, so I just wanted to read out the question from Caroline. Max mentioned the broad inconsistencies around citing data sets in the literature, a frustration shared by data repository managers who must report usage citations to their stakeholders.
The labor required to mine these citations year after year is inefficient and expensive. Is there a global standard for citing data in a manuscript, or is this still determined by the publisher? So I just wanted to put that out there because I think that may be a slightly different question than saying it needs to be cited in the citation. I was wondering if any of you have any good ideas about. Yeah, go ahead, Iratxe.
Yeah, sorry. Briefly on Caroline's question, another pointer for publishers is certainly provide guidance as to how you would like the data set cited. Repositories and other organizations provide the template. So essentially I don't think that should be necessarily a barrier. I guess on the topic of data citations and what you were saying also, Anita, about the switches, essentially it becomes very complex because you need all the switches on, and the moment that only one is off, everything can collapse really quickly.
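As a small illustration of what a citation template can look like, the sketch below assembles a dataset citation from structured fields. The ordering (creators, year, title, publisher, identifier) loosely follows the pattern that repositories commonly recommend, but it is an assumption here, not any particular publisher's house style; the function name and the example values are hypothetical.

```python
def format_dataset_citation(creators: list[str], year: int, title: str,
                            publisher: str, doi: str) -> str:
    """Build a citation string that ends in a resolvable persistent identifier."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Made-up dataset, for illustration only.
citation = format_dataset_citation(
    ["Doe, J.", "Roe, R."],
    2023,
    "Example survey data",
    "Example Repository",
    "10.5555/example.1",
)
```

Because the identifier always lands in the same place and the same form, a citation built this way can be matched by software rather than mined by hand from free text.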
So I think one of the things that I wanted to mention that we are working on at DataCite is that we recognize that this requires many switches. And we thought, OK, maybe the answer is that we now have the tools to actually bypass some of that and have less steps and less switches along the way. So for data citations specifically, after waiting for a number of years for everyone to get on board with their workflows,
et cetera, what we are working on now is actually developing a project that we spoke about in the last week. What we want is to develop a resource for the community that will collect data citations from different sources. So essentially we already have some data citations through the workflows that we were mentioning, through persistent identifiers, Crossref and DataCite.
But we also know that many people are looking at these connections between data sets and the papers where they are mentioned, because we've talked a lot about data availability statements, but many authors put the accession numbers in the methods section as well. So, you know, they can come up in different places. So maybe one of the things that we want to make easier is to actually find those mentions of data sets in the full text of articles and start collecting all of that information together so that we can make this available to the community.
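The idea of finding dataset mentions wherever they occur in an article can be sketched roughly as follows. This is not the project described above; it is a toy example that assumes the article text is already split into named sections and that accession numbers look like a few capital letters followed by digits. The pattern, the function name `collect_mentions`, and the sample text are all hypothetical, and real accession formats vary by repository.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately rough shape for accession-style identifiers.
ACCESSION_PATTERN = re.compile(r"\b[A-Z]{2,4}\d{5,8}\b")


def collect_mentions(sections: dict[str, str]) -> dict[str, set[str]]:
    """Map each accession-like identifier to the sections where it is mentioned."""
    mentions = defaultdict(set)
    for section_name, text in sections.items():
        for identifier in ACCESSION_PATTERN.findall(text):
            mentions[identifier].add(section_name)
    return dict(mentions)


# Made-up article text, for illustration only.
article = {
    "methods": "Reads were deposited under accession ABC123456.",
    "data_availability": "Sequence data: accession ABC123456. Images: accession XY98765.",
    "discussion": "No datasets are mentioned in this section.",
}

mentions = collect_mentions(article)
```

Scanning every section, not only the data availability statement, is what catches the identifiers that authors place in the methods.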
And as Max said, in a community-available resource, not proprietary. So we get this holistic view while everybody else, you know, fine-tunes their processes and their culture and their incentives. Maybe we can get started with the infrastructure that we have and start getting that view, and then fine-tune it, obviously, as we build more. Yep. Um, I'm reminded of two projects, one by Julia Lane, I believe, to also develop open source software.
I know some of my colleagues have contributed to that, to indeed find data sets through the mentions in data, and the other one that no one has yet mentioned is Scholix. So having an infrastructure that allows links between data sets and papers, that's of course also an interesting one. And you need to talk to Caroline. So that's good. Um, for a moment I wanted to get back, because this was a point, Max, that you raised, which I find fascinating.
You talked about the ownership of that data and that you, for instance, as a statistician, didn't feel that you were the owner of the data, so you couldn't actually share it, rightfully so. A quick question to all of you. I'll start with Andrea, because I'm not sure what your thoughts are, but do you think that researchers always know whether they can share data?
Is that something that is obvious, that is clear, or is that something perhaps where we as a community can still help them? Do they know if they have ownership and if they have the rights to share data? I would say that that is absolutely an open question for a lot of researchers. Um, the benefit of working for a US federal agency is any work they complete here is public domain and they don't own it.
Most researchers understand that. There are still a few that really feel like it's theirs and they need to protect it somehow. But that's generally not the case here at NIST. And even with the other agencies we work with and universities we work with, yeah, there's a lot of questions about what the sort of IP and license ramifications are of their contracts.
What does that mean? Who owns it? But I would say probably a good 80% of NISTers understand where the data sits legally, who it belongs to. But I think that number would drop significantly outside of a federal agency. That's great. I don't know.
I'm not sure, Max, if you have one or two additions to that. Otherwise, I do see two new questions in the chat, so I'd like to get to those. For me, we can continue with the questions, the other ones. OK. Because I thought your point was very interesting.
And then good to know that for me it's at least very clear. A question from Anka pushka, if I pronounce your name correctly. If I didn't, my apologies. I publish politics books and work within a larger humanities and social sciences team. Our authors still don't really think about opening their data the way that scientists do. How do we move in this direction?
By widening the concept of data to include things like interviews, personal notes, maps, et cetera. Do any of you have a thought or experience with these types of humanities datasets? I'm sorry. Go ahead. So my background is actually in humanities. I avoided physics the entirety of my school career, and now look where I am.
It's really helping me. So I would say that, yeah, there's a shift that really needs to happen in the mindset of humanists and to some extent social scientists as well, that, you know, what you're working with is data and it's valuable to people outside of your working group. I would say that digital humanists probably have a little bit closer perspective to hard scientists than a medievalist who is working with a single primary source and the like.
But yes. We need to work with them. The research data framework is non disciplinary. It talks about data as anything that you use to back up your research. So that can be interviews, that can be the primary texts, that can be anything that is relevant to your research. So if there's anybody here who leads a group of humanists or even social scientists, I would check out the RDF to help you sort of get a grasp of research data generally for your community.
Thanks so much. And if you could actually pop the link to the chat, that would be super helpful. Max, you had thoughts on this as well. We have been working so much from a technical perspective to make it possible to share also qualitative data, and that includes I mean, it includes maps, personal notes, interviews, extracts from whatever, from courts, journalists, data from different newspapers.
And we really think that this is booming. And it's a lot of people working with qualitative data that really appreciate that now the repositories are also thinking about this kind of data. So of course there are differences within different disciplines, and you're aware to different levels and you talk about data in very different ways, but there is a fairly large number of researchers also within humanities that are interested in sharing the data, but they just lack the tools to some extent.
And so we have a very positive development in Sweden, but we are still struggling with especially artists that still claim that they don't have data. But yeah, they do have it's just too much too indeed. And I briefly wanted to get to the last question. Thank you for the link to the RDF. Andrea, um, how can we. Oh, sorry. There's another question that came in.
We probably don't have time for all the questions, but how do we advise authors to choose the best repository? I wanted to make a brief pitch. I know that NIH has a whole story around selecting repositories in the life sciences, and they actually have a whole framework for generalist repositories as well. So those of you interested in that, if you look at the generalist, I don't know what R stands for, but generalist repositories, um, that's something that NIH does.
But any thoughts about how we advise authors, and also are all repositories working with Crossref? Yeah, so on the first question, my suggestion is that some publishers have, at least when I was at PLOS, they had a recommended list of repositories. So depending on where, for example, you're thinking of submitting the paper, check if they have any recommendations. But some of the things to bear in mind, apart from obviously the use of persistent identifiers, is making sure that the repository makes the data available at no cost and without requirements for registration, essentially fully open, with licenses that again are open.
So CC BY or CC0. And then it can also be relevant to check whether the repository has some longer-term sustainability plan, so to run as a platform and also to make sure the data sets they host will be available longer term. But I'll post the recommendations from PLOS there.
I mean, I know some other publishers also have them. In terms of the repositories, many data repositories actually work with DataCite, which provides DOIs for a number of research objects, but we have a lot of DOIs for data sets. Crossref generally is the place where the publishers will get the DOIs and the metadata for the article publications.
But Crossref and DataCite work very closely to, again, enable those links from articles to data sets and vice versa. So again, if you are thinking about DOIs for data sets and how to handle that at your repository, what I would say is that you check the DataCite website, and I'll provide the link. Thank you so much. And kiya, we did see your question on making data open access.
I guess I will provide a very brief answer. The idea being that the data will be at the data repository. The publisher still holds the paper, but happy to have further follow-up conversations around that. And I'm sorry about Juliet's question. We actually are really out of time. This has been a really fantastic conversation. And I just want to take a moment to thank you. Thank all three of you so much.
Honestly, the time has suddenly forged ahead. I wish we could talk about this longer, and I hope we will. I will make an attempt to kind of summarize this conversation as a blog post in the Scholarly Kitchen, and any points or questions people have for us, feel free to reach out to us. Let's see if we can answer any further questions offline. I want to thank you so much, Andrea and Max, for giving us the time.
Um, any thoughts? I think back over to you, David. Yes, thank you, Anita. And thanks, everybody on the panel, for a great discussion. Looking forward to reading, Anita, a summary of that in an upcoming Scholarly Kitchen. So again, we just want to thank all of our sponsors: Morressier, 67 Bricks, Taylor & Francis F1000, and Silverchair.
We're grateful for everybody's support. There will be an evaluation email sent out in about a day, so please fill that out and please visit the website for information on upcoming programs. This is the final ask the experts of 2023 and we'll begin planning 2024 shortly. So please keep an eye out for what exciting webinars we're going to have next year. So this will conclude this session.
Thank you again so much. Goodbye.