Name:
Three Ethical Challenges in Scholarly Communication
Description:
Three Ethical Challenges in Scholarly Communication
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/1c3cba0f-845b-4a33-8974-dd6b0d74abe3/thumbnails/1c3cba0f-845b-4a33-8974-dd6b0d74abe3.jpg
Duration:
T00H59M29S
Embed URL:
https://stream.cadmore.media/player/1c3cba0f-845b-4a33-8974-dd6b0d74abe3
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/1c3cba0f-845b-4a33-8974-dd6b0d74abe3/GMT20221012-150353_Recording_gallery_1920x1080.mp4?sv=2019-02-02&sr=c&sig=p4xLdv1RZrpQpkk7yIQ%2BgkIzqE5kW5RNzDN%2FNZzOmOc%3D&st=2024-11-19T19%3A25%3A44Z&se=2024-11-19T21%3A30%3A44Z&sp=r
Upload Date:
2024-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello. Thank you for joining us for today's discussion, Three Ethical Challenges in Scholarly Communication, the sixth event in SSP's 2022 webinar series. I'm Jason Pointe, lead of the SSP Education Committee's webinars working group. Before we get started, I want to thank SSP's 2022 education program sponsors: ARPHA, J&J Editorial, OpenAthens, Silverchair, 67 Bricks, and Taylor & Francis F1000.
We're grateful for their support. I also have a few housekeeping items to review. Attendee microphones have been muted automatically. Please use the Q&A feature in Zoom to enter questions for the moderator and panelists. You can also use the chat feature to communicate directly with the speakers and other participants. This one hour session will be recorded and made available to registrants following today's event.
A quick note on SSP's code of conduct in today's meeting. We are committed to diversity, equity, and providing an inclusive meeting environment that fosters open dialogue and free expression of ideas, free of harassment, discrimination, and hostile conduct. We ask all participants, whether speaking or in chat, to consider and debate relevant viewpoints in an orderly, respectful, and fair manner.
At the conclusion of today's discussion, you will receive a post-event evaluation via email. We encourage you to provide feedback. It helps us shape future SSP programming. It's now my pleasure to introduce the moderator for today's discussion, Dr. Muhammad Subhani. Muhammad is Professor and Dean of the business school at ILMA University in Pakistan and also an ambassador for the Directory of Open Access Journals.
Muhammad, thank you. Thank you, Jason. Thank you very much. This is Dr. Subhani here from SSP's education committee. I'm an ambassador and a professor in the business school at ILMA University, Pakistan. It's an honor for me to moderate today's SSP webinar on three ethical challenges in scholarly publishing.
We have with us our three panelists to address the ethical challenges in publishing today. The three topics that we cover in today's talk are citation leakage, text recycling patterns, and ethical issues around data publication. To talk on these three challenges, we have Marie McVeigh from Mary Ann Liebert, Gareth Dyke from Research Square, and Matt Hodgkinson from COPE and UKRIO.
The first ethical challenge that we are going to talk about today is citation leakage, and Marie McVeigh is here to talk on this topic. She is the lead of peer review operations and publication integrity at Mary Ann Liebert. Marie has recently joined Mary Ann Liebert. She came from Clarivate.
She led the development of subject approach, later moving to editorial as head of editorial integrity. She participated in the first launch of Web of Science at the Institute for Scientific Information, ISI. She studied under Dr. Eugene Garfield and continues research in metrics, ethics, and publishing citation metrics. Her topic today is citation leakage.
If you want to give more info about yourself, please go ahead. Marie, you're muted.
Are you able to see my screen now and are you able to hear me now? Yeah, yeah, we are hearing you. So I have given you a little introduction. So if you want to give more info to the audience, please go ahead. No, I'll just say that I have recently joined Mary Ann Liebert, but I have had a long career in various aspects of scholarly communication, and particularly in citation science, with roughly 25 years, with some interruptions in there, working at ISI, Thomson Reuters, and then finishing up at Clarivate, and, as I said, recently joining Mary Ann Liebert, literally less than three weeks ago.
So let me begin. I want to address the idea of citation leakage more as a question of citation contamination, that which is undermining the integrity and the purpose of scholarly citation in the literature more broadly defined. I'm going to have to guess. OK I hope that's OK.
Slide two. I need to start by saying what, first, is the proper use of scholarly citations. So let's set a groundwork for understanding what happens in citation leakage or in a compromised or contaminated citation flow, as well as what that puts at risk in the scholarly communications network. Properly used, scholarly references anchor a current work in the context of existing knowledge.
The current work can confirm the prior works, support them, expand upon them, or dispute the cited works and overturn the existing models. But citation provides a context into which the current work is establishing itself. It also accomplishes the purpose of acknowledging the contribution and the influence of other scholars, the people who provided foundational work or critical ideas and concepts that are forming and informing the work at hand.
This also lets you avoid falsely taking credit for other people's work. It properly acknowledges the debt that you owe to their thinking, the intellectual debt. It also aids in the understanding of the current work by making the necessary prior works findable. In the natural sciences, this part of citation supports reproducibility. If you don't know the foundational techniques and the descriptions and the basis upon which the work is done, it would be that much harder to understand and reproduce the work in the current sense.
In the social sciences, it has that purpose as well as a kind of unusual characteristic in the social sciences of bridging divergent terminologies in related phenomena. So across different scholarly communities there may be individual scholars studying a phenomenon, but they are more isolated in their terminology. And that makes related work much harder to find based on keyword search or term search or even full text search.
The scholarly network says, what is this work based on? Oh, it is based on the same thing as those other works. In the arts and humanities, there's an additional layer where you can accurately identify the topic and the context of a study. You don't have to worry about finding Dante's Inferno. You can find it. But if you know what version, what publication, and the page, then you're looking at the individual place in that work that is necessary to understand the commentary in an arts and humanities paper.
Citation, in that sense, provides an intellectual and conceptual infrastructure for research itself. And I have to bring in this quote, because I think it's one of the most important ways to understand what this network of citation achieves. In 1975, Dr. Garfield introduced the publication of the Journal Citation Reports by saying, "A citation index is based on the principle that there is some meaningful relationship between one paper and some other that it cites or that cites it, and thus between the work of the two authors or two groups of authors who publish the papers."
So what happens when that is not what citation is used for, when it's not about creating a meaningful relationship and it's not about creating a sound infrastructure that can support the ongoing research endeavor? I pulled up a specific example. It was written about in Retraction Watch, as well as featured, in an unusual move, in the Journal Citation Reports by issuing an editorial expression of concern.
This is an individual paper in a journal that you may or may not be able to read. That's immaterial. It's a paper on nanobeads-based mixed agglutination, et cetera. The article itself contains 99 cited references. 37 of them are spread through the body of the work in the introduction and the methodology, but 62 are in the final paragraph, and they occur as block references of 10 or more without individual distinction.
For example, "nanoscience is an important field of study" (references 38 through 48), "about which many, many subjects have considered it" (references 49 through 59). So these are large blocks of non-distinct references without any individual reference point or inclusion of that work.
Also in those 99 cited references, there are 42 citations to the work of an individual author. Whether or not this helps anything, he's also not a coauthor or author on that paper. He is getting these 42 additional citations from one paper in this one journal. 70 of the 99 are citations to a journal impact factor numerator. So this paper was published in 2017 and it contained 70 out of 99 citations to the years 2016 or 2015.
In that group, 22 are journal self-citations, so increasing the journal impact factor of the publishing journal; 20 are to other related journals from the same publisher, in a group of three; and 14 are to a journal from a different publisher. So what you're looking at here is indeterminate references creating possibly spurious links between works, the inflation of individual citation metrics, whether that be h-index, total citation count, or any other citation-based assessment of the individual researcher, as well as 70 citations that affect multiple journal citation metrics and distort the nature of the metrics around those journals.
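As a rough illustration of the arithmetic in this example, the counts quoted above can be turned into shares of the 99-item reference list. This is only a sketch in Python: the dictionary keys and the `share` helper are names made up here for illustration, and the numbers are simply the ones stated in the talk.

```python
# Counts for the single example paper, as stated in the talk.
TOTAL_REFS = 99
REFS_IN_BODY = 37

counts = {
    "in_final_paragraph": 62,
    "to_one_author_not_on_the_paper": 42,
    "in_impact_factor_window_2015_2016": 70,
    "journal_self_citations": 22,
}


def share(count, total=TOTAL_REFS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The body references and the final-paragraph block account for the whole list.
assert REFS_IN_BODY + counts["in_final_paragraph"] == TOTAL_REFS

shares = {name: share(n) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}% of {TOTAL_REFS} references")
```

The categories overlap (a reference can be both in the final paragraph and inside the impact factor window), so the percentages are not expected to add up to 100.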
Any manipulation of citation or non scholarly use of citation creates this distortion of the fundamental network itself. Now I pulled an individual example, but there were a whole set of papers in this journal and the related journals that were engaged in the same activity. And it added up to a very significant volume of citations and a very significant distortion of impact factors. But there are other forms of these non scholarly references.
There is author self citation. And I include a reference because it is not inherently problematic that an author cite their own work. But when that author citation comprises 50% or 60% of the reference list of a paper, you're no longer engaging with the literature more broadly or creating that foundational structure. There are also author to journal citations where the author, for example, in that prior paper, is enhancing the impact factor of a second, third and fourth journal without necessarily creating a meaningful connection among the concepts of those works.
We'll all be familiar with the editorial coercion of citations, where an editor may suggest or even insist that references to the journal or to his or her own works are part of any submitted and published paper. Reviewers have also been found to coerce citations and offer to accept a paper on the condition that you cite some of their prior works. This could either be for their personal citation advantage or for a journal-level citation advantage on a journal where they published other works.
We've also recently experienced and observed citations for hire, where on Facebook or WhatsApp (those are sites that I have seen individually) they will offer to create a citation ring where, if you cite their paper and five others, then this turns around with a small remittance offered to you, or even just the exchange of citations at scale. Now, these may not be irrelevant citations, in that they might be in the same field and they might be nominally relevant works, but they're not based on an actual use of that work or the deliberate intellectual engagement with that work for the purpose of explaining the current publication.
I have personally seen, at other publishers, not Mary Ann Liebert, instructions for authors that encourage references to the prior content of a journal. They sometimes specify the time window in which they would like these citations. And I will admit, oftentimes it is the previous two years.
I mean, these are many small observable features, but taken at scale across many journals and many authors and many papers and also taking place in a largely unconsidered and often unrevealed part of the scholarly literature, they can start to significantly compromise the foundation of your house. How much is too much?
Right now, it just looks like a little water damage to the base of the wall. How much of a problem could that be? Now I want to take another step into a different sort of citation contamination. This is not a form of non-scholarly citation, but it compromises the citation network more broadly, and that is citation to works that were later retracted.
Obviously, Retraction Watch tracks the occurrence, but it also provides a database that shows the growth and effect of that compromised research more broadly. NISO has recently formed a working group called CREC, the Communication of Retractions, Removals, and Expressions of Concern, where the task at hand is to identify how every later phase of scholarly publication can begin to propagate information on the retracted or compromised status of a work that has been referenced, even when it was referenced before the work was retracted.
So if I am observing a paper and citing it in 2022 and it's retracted later in 2023 or 2024, my paper, which is legitimately depending on that earlier work, needs to surface the fact that that earlier work has itself been compromised. We're all also familiar with the social impacts of misinformation and the undermining of trust that comes when compromised or even fraudulent research becomes a part of the discussion around research itself, whether it is taken in the odd and horrible direction of the Wakefield immunization paper, or it is merely discrediting the idea that scientific research can be not perfect but also long-term self-correcting and mutually explanatory.
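The idea of surfacing retraction status on citing papers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real system or the working group's recommendations operate: the function name, the paper identifiers, and the in-memory dictionaries are all hypothetical, standing in for a real citation index and a real retraction database.

```python
def flag_citing_papers(reference_lists, retracted_ids):
    """Map each citing paper to the retracted works found in its reference list.

    reference_lists: dict of citing paper id -> list of cited work ids
    retracted_ids:   set of work ids currently known to be retracted
    Papers that cite no retracted work are left out of the result.
    """
    flagged = {}
    for paper_id, cited_ids in reference_lists.items():
        hits = sorted(set(cited_ids) & retracted_ids)
        if hits:
            flagged[paper_id] = hits
    return flagged


# Hypothetical data: the citing papers were published before the retractions.
reference_lists = {
    "paper-A-2022": ["work-1", "work-2"],
    "paper-B-2022": ["work-3"],
    "paper-C-2023": ["work-2", "work-4"],
}
retracted_ids = {"work-2", "work-4"}

flagged = flag_citing_papers(reference_lists, retracted_ids)
print(flagged)
```

Because the check runs against the current retraction list, re-running it after a new retraction flags papers that cited the work in good faith years earlier, which is the situation described above.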
I did a quick look at the number of retracted works, and I used the Web of Science Core Collection with a span of 2012 to 2021. This background gray curve shows the percentage of the total population of retracted works between 2012 and 2021. The percentage of the total population for each of these years would provide sort of a background level of noise.
So between 2012 and 2019, roughly 10% of the total population is occurring in each of those years. So they're all roughly the same size. Obviously, more recent works may not have been retracted yet. There will be a lag time before publication, and then between publication and later retraction. But what I found interesting was taking the top five Web of Science categories where retraction has occurred in this time interval.
You see that within a field there are waves of retraction, and I think I would interpret that largely to be due to the identification of material patterns in the literature of those fields where a large number of retractions or of problematic papers are surfaced. When I looked at a more fine-tuned categorization of these retracted works, I found that in the citation topics feature in Web of Science, the two areas with the largest number of citations of retracted works in this interval were long non-coding RNA and microRNA,
where huge numbers of papers were being produced by paper mills and repurposed and republished, and also molecular and cell biology: cancer, autophagy, and apoptosis. And those two are fields that were affected by that same sort of tranche of papers that were tying miRNAs to specific phenomena in the cell or to disease states in the cell. I took a look at this same population of roughly 4,300 retracted items, and in this 10-year period, over 68,000 cited references have accumulated to works that have been retracted.
Obviously some of those references were made before the identification of the work as retracted, but that does not remove the linkage or the history of this paper. So the links within and around the paper are compromised. So maybe it's not just a little piece of the foundation wall. Maybe there's a greater compromise to the structure of scholarly communication that's taking place in this context.
I also was asked specifically to address the idea of questionable journals, and I will push back against the term predatory journal, because when I think of predators, I think of ecologically apex species like lions and tigers and orcas and praying mantises. Those are really cool organisms. And this is not a question of predator and prey, but a question of journals that pretend or purport to deliver peer review and editorial scrutiny of a paper but are not actually performing that service.
The fact that there are so many and the fact that they are so extensive means that these journals are actually meeting a need, or they wouldn't exist. It's not true that research published in a large online platform with no peer review is de facto bad or false research.
Each paper finally has to stand on its own. No matter where it's published. But in these cases, this is research that arrives in a suspect packaging under what may be the upfront assumption that there has been any pre-publication scrutiny. The problem there is that every paper in that journal or in that platform becomes tarred with the same brush. And that is no more valid than every paper in a journal being lauded for the same impact factor.
What I wanted to do was give a sense of the scale of this problem, since this idea that I'm trying to bring forward is that as this problem accumulates and as it spreads, what we're dealing with is not a small one by one problem, but a significant effect that compromises the structure of the literature itself. And all of those things that we depend on in the structure of that literature, whether it's performance assessment or the identification of necessary works and important influential scientists, it all depends on that primary citation structure.
Estimates on the size of questionable or predatory journals are difficult. But I wanted to try to give a sense of this scale. Shen and Björk in 2014 estimated that there were 4,000 journals with 53,000 articles published in questionable journals in 2010. And four years later, there were 8,000 journals with roughly 420,000 articles in the year 2014. Now, that's not cumulative 420,000 articles.
That's 420,000 articles published in 2014. A more recent entry in the Cabells blog from September 2021 identified that the Predatory Reports database contained over 15,000 journal titles identified as having problematic content or, at the very least, deceptive peer review practices. So how many papers would be in those 15,000 journals? I did a variety of back-of-the-envelope calculations, none of which are truly valid.
I ended up with estimates of 500,000, 750,000, or even up to 1.2 million based on a continuous growth according to that same pattern between 2010 and 2014. The upshot is, it's a lot of papers, and those papers themselves become enmeshed in the literature. I have literally looked at dozens and dozens of journals that fit this questionable profile.
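To give a feel for what such a back-of-the-envelope calculation can look like, here is a minimal Python sketch built only on the figures quoted above. It is not the speaker's own calculation: the variable names are invented here, and scaling the 2014 articles-per-journal rate to 15,000 titles is just one simple assumption among many possible ones.

```python
# Figures quoted in the talk (journals and articles in questionable journals).
journals_2010, articles_2010 = 4_000, 53_000
journals_2014, articles_2014 = 8_000, 420_000
journals_2021 = 15_000  # titles listed in the September 2021 blog entry

# Compound annual growth in article output between 2010 and 2014.
years = 2014 - 2010
annual_growth = (articles_2014 / articles_2010) ** (1 / years) - 1

# Assumption: each listed journal publishes at the 2014 average rate.
articles_per_journal_2014 = articles_2014 / journals_2014
naive_estimate = journals_2021 * articles_per_journal_2014

print(f"annual growth 2010-2014: {annual_growth:.1%}")
print(f"articles per journal in 2014: {articles_per_journal_2014}")
print(f"naive yearly estimate for {journals_2021} journals: {naive_estimate:,.0f}")
```

Under that single assumption the estimate lands inside the 500,000 to 1.2 million range mentioned above, which mainly shows how sensitive the result is to the assumption chosen.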
And every one of them has at least a small handful of references to those journals from journals that are screened and covered in the Web of Science literature. So they are, in fact, contaminating the literature at a much larger scale. Now, that may be a scholarly citation of a paper that the author, the citing author, has identified as valid.
But it may not. And the problem is we can't tell the difference in that context. So why is this a problem? It's because the cited works are often difficult to separate from established journals due to similar names, and it starts to undermine the fundamentals of this structure. I think that what we have is the obligation to begin to identify these behaviors and to include a review of the reference literature as a necessary screening step in peer review, taking seriously the effects that this can have more broadly.
And to scrutinize those records as well as to make sure that we are sensitive to what may have been a later suspect work or suspect publication. I spoke really fast and I hope I didn't use up too much time and I'm going to say Thank you and hand off. And stop sharing if I can find it.
I think you're muted, Muhammad. Yeah, all right. Yes, Marie. So you covered everything in your talk. So what do you think about citation contamination and citation leakage?
These two terminologies are used, in fact, sometimes in place of each other. So do you think that we can have any fine-line difference between these two, or are these really two different terminologies for the same purpose? I think that in some sense, every citation is to its individual purpose.
And as a purist, I would hope that that is the purpose of creating this meaningful network and indicating necessary prior works as well as even controversial prior works. But there are as many reasons to create non scholarly references as there are to create scholarly references. And what we're doing is piece by piece identifying what the other targets of referencing may be and then trying to accommodate and adjust for that as we analyze these things across larger populations.
All right. OK, what is your opinion on the short-term and long-term impacts of such leakage on publication? I think the short-term impacts are small but growing. They are minor to major distortions in a very localized citation environment. Longer term, I think it undermines the fundamental structure.
I think that it's very common to point to citation metrics or publication metrics and say that when the measure becomes the target, the measure ceases to have validity. And I think that we lose something terribly important when citations are used for purposes other than indicating these scholarly networks and leading an individual reader to prior works that are necessary to understand. So I think that there is an undermining of the structure, and as it grows, it becomes more dangerous.
All right. All right. Thank you very much. Thank you, Marie, for a wonderful insight on such an important topic. Now, the second talk of the day is on text recycling. Dr. Gareth Dyke will be taking care of this topic as the panelist. Dr. Gareth Dyke is Director of Global Content at Research Square. He is a prolific author,
with more than 300 articles in 30 journals over the last 20 years, including Nature, Science, Proceedings of the National Academy of Sciences, et cetera. His research has been widely featured in the media, and he is invited to present talks and workshops around the world. Workshops in China mentioned here, run by Gareth in 2019, reached more than 5,000 people.
If you have something more to say about yourself, please go ahead. Thank you very much. Yeah, great to have the chance to do this, and thanks for the invitation. Gareth is my name. I'll talk briefly about text recycling and the Research Square preprint platform and how we manage some of these ethical issues, how our platform has been set up to manage some of these ethical issues.
Just to get started by introducing the preprint platform: of course, Research Square and its sister organization, AJE, are part of Springer Nature. As you'll see in a minute, our preprint platform is integrated with the Springer Nature In Review system. Lots of journals that are not Springer Nature journals are also integrated within the system. And researchers preprint with us in one of two main ways.
They either put a preprint onto the platform because they want to preprint their work, or they put their research onto our platform because it's coming in through the In Review system. So lots of research is being shared rapidly, very quickly, open for people to make comments on, open for peer review, open for sharing. But of course, there are ethical implications associated with that as well, as we move to a much more open way of sharing information and data globally.
You can see on these quick introductory slides how much our volume has changed over the last few years and the fact that people from all over the world are preprinting with Research Square, open and accessible. Of course, that also brings with it a number of problems. As you can see here, we do experience lots of issues. This is the case with any submitted literature, any submitted articles: English problems, text recycling, plagiarism. How that's identified in the traditional way by publishers, and how we're able to contribute to that on the preprint platform, are, I think, very interesting topics for further discussion, as you'll see in a moment.
Of course, talking to authors and talking to researchers around the world has kind of led us to settle on three main kinds of plagiarism, three main kinds of text recycling. The first is mosaic, where data from various sources is mixed together to make the text seem original; that is the hardest to identify. The second is paraphrasing. This happens a lot, especially for authors for whom English is not a native language but a third or fourth language. So words may differ, but the original idea remains the same, and recycling can happen without citation.
And then, most interestingly, for me at least, and most interesting for us overall, is self-plagiarism. And this is a big issue in academic publishing. Of course, as you're probably well aware, authors are often unaware that they can't do this.
They can't take information, they can't take text from their own previous publications and recycle it into subsequent publications. So it's a common problem. And I just provided like a few definitions of it here on the slide and we put a lot of effort and energy at Research Square training and providing events for authors so that they can understand what they should and shouldn't be doing with in particular, self plagiarism, text recycling of their own article content.
And everybody watching this is very well aware, of course, like what a plagiarism check looks like. And as I'm sure others will touch on in other presentations in our event today, what we're looking for at the journal is a 10% or less score. And so authors take advantage of these kinds of services in order to make sure that their work is not going to be above that bar when it goes into a journal. But what happens at the pre printing stage?
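To make the idea of a percentage score concrete, here is a toy Python sketch that measures how much of a manuscript's wording also appears in one earlier text, using overlapping three-word sequences. It is an illustration only and makes several assumptions: commercial similarity checkers compare against large databases and work differently, the function names are invented here, and the 10% bar is simply the figure mentioned above.

```python
def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_percent(manuscript, source, n=3):
    """Percentage of the manuscript's n-word sequences also found in source."""
    manuscript_shingles = shingles(manuscript, n)
    if not manuscript_shingles:
        return 0.0
    shared = manuscript_shingles & shingles(source, n)
    return 100 * len(shared) / len(manuscript_shingles)


def exceeds_bar(manuscript, source, bar_percent=10.0):
    """True when the overlap score is above the chosen bar."""
    return similarity_percent(manuscript, source) > bar_percent


# Hypothetical example: an author reuses a sentence from an earlier paper.
earlier = "samples were collected weekly and stored at minus eighty degrees"
new = "samples were collected weekly and analysed by two independent readers"
print(round(similarity_percent(new, earlier), 1), exceeds_bar(new, earlier))
```

A check like this only catches verbatim reuse; paraphrased or mosaic recycling, as described above, changes the wording and would largely escape it.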
How is that dealt with by preprints and preprint servers, specifically the Research Square preprint server? Well, as you can see here, one of the main advantages of pre printing research, of course, is that you can speed up dramatically the path to publication. You can get your work out there, you can get people aware of your work, people looking at your work much, much faster.
But we have to therefore work much, much faster as well in order to stop any of these potential ethical violations as work moves onto the preprint server. And so that's what I'm going to just spend a little bit of time discussing in the second half of this presentation. This is the In Review platform. You can see how it works and you can see how the Research Square platform is integrated into that workflow.
And as I mentioned at the beginning of this presentation, authors either preprint directly on Research Square because they want to share their work with the wider world, or the work comes in through the In Review platform. And of course early sharing means early primacy: that DOI number that is stamped onto the preprint. But how do we work with authors, with the community, to ensure that we're not seeing these ethical violations early in the process?
So I wanted to show you, if I can, a few aspects of the preprint platform. Sorry, I've just jumped ahead a little bit, but here you can see how it works and note the information that's shared on the platform with people looking at preprints. This is a preprint that's not being peer reviewed. People can see the process as the paper is moving through the platform.
So this is a new innovation for us on the Research Square preprint page, and people are able to comment. So in direct submissions that come into the platform, we have an editorial team that looks at submissions and checks submissions for any potential text recycling, and then that would be flagged either on the preprint itself or if the preprint needs to get retracted later on in the process. So authors are able to see, and other colleagues, other researchers are able to see, that open workflow.
And you can see it here in an example from BMC Radiation Oncology: the dates and times and changes that were made to the original version of record as the preprint moves through that workflow. And this provides us with an opportunity to identify early in the process any potential ethical violations that authors may have made. And that's a big difference from traditional journal publishing, where of course that may happen much later in the workflow.
So here comments come in, people put comments onto the preprints, as you can see, and that provides us with that mechanism to give early feedback both to authors and, in the In Review platform, to potential journal editors. Editors can see that there might be something happening with this particular paper, and we get lots of cases.
Figure recycling and text recycling in particular would be the number, the top two of these kinds of violations that we see in the author workflow process. And here again, you can see the author dashboard. This submission is under review. There are no new comments on your preprint, what the editorial decision is going to be, and the score in there on the language quality for the edited manuscript also.
And authors, of course, are given opportunities on our platform to get their work edited, to get their work checked for plagiarism, and to also pass it through a digital editing tool, an AI editing tool. I'll put the link to that in the chat after I've finished talking, in just a moment's time. So, professional assessments and preprint validation: you can see here some other examples.
This one passed through the pre-screening evaluation. All the author information has been provided, all the appropriate declarations are present, and so on and so forth. Very open, very transparent. Authors, editors, and other members of the scholarly community can see everything that's happening with the preprint as it passes through the In Review system.
And this is a summary then, of that editorial oversight. Preprint submissions come in, they get checked, they go through these various stages to identify any violations, any text recycling, and then they get posted. Very, very rarely do we then have to retract something further on down the line. Things like methodologies may be flawed, specious interpretations, weak reporting.
So preprinting helps a lot with these kinds of issues that we see. Automated checks are present in the platform. We can identify duplicated submissions, and journals that are participating within the In Review platform, of course, are then performing all of the usual plagiarism checks that we'd see, as I showed in an earlier slide. So we have the opportunity to police via community checks. And as I mentioned, we see figures, we see text, we see plagiarized articles on our platform.
But at the same time, we're providing faster, open peer review, with editorial notes added to preprints. And there are those small number of retracted cases as well. So I just wanted to give you some examples of that editorial oversight, comments that appear in this particular case. Of course, COVID-related preprints: lots of them on the platform, and lots of community comments and editorial oversight were needed in a number of these cases.
And here you can see the kind of flagging that happens on the preprint platform. Again here, there's an editorial note that's been added to the top of the paper. People are able to see further information that provides important context regarding the topic of this preprint in this particular case. And so with that, I will pass back to Muhammad and thank everybody again for the opportunity to do this presentation.
I'll put some links for everybody into the chat box when I've finished presenting. So thank you so much, Mohammed. I think you're muted again. Yeah, thank you. Thank you very much for the thoughtful work on the second ethical challenge.
Now our next panelist is Matt Hodgkinson, to talk on the subject of publishing ethics. Matt is a research integrity manager at the UK Research Integrity Office, UKRIO. He's also a council member of the Committee on Publication Ethics, COPE. He has a BA in biological sciences from Oxford and an MSc in genetics from Cambridge.
He worked in open access science publishing for 18 years, developing expertise in peer review processes and publication ethics as an editor at BMC and PLOS, and then heading the research integrity unit at Hindawi. Matt, thank you very much for joining us to talk on the subject of publishing ethics.
So over to you. Thank you for the invite to speak. So I'll be talking about research data publication ethics. So I'm now at UKRIO, the UK Research Integrity Office. We're a charity that offers guidance and education on research integrity to researchers and institutions.
And one part of that is an advisory service, so anyone can get free advice about research connected to the UK by contacting us, and that's impartial and confidential. So I've moved over from the kind of gatekeeping side of research integrity, because I was heading the research integrity team at Hindawi, and now I'm very much on the helping side, as a council member at COPE and at UKRIO.
So about three years ago, I think it was, Iratxe, who's on screen here, and Daniella, both of whom I've worked with before at PLOS, realized that although there was a lot of work done on data sharing and there was lots of work on publication ethics, there was this gap on research data publication ethics. So not the issues with collecting the data or the kind of research ethics approval, and not the publication of articles, but actually the publication of the data sets.
And so it was a gap, and there was an increased number of issues that were getting into the press, getting into Retraction Watch: issues of people using data from dating sites like OkCupid without permission, retraction of data sets. And so they realized that. And actually, a very recent issue as well is that the OSTP in the US just in August announced the memo on public access.
Lots of attention is being paid to the open access requirements, but less attention has been paid to the data access requirements. So the scientific data underlying peer-reviewed publications from federally funded research will have to be freely and publicly accessible unless there's a good reason why not. Legal, privacy, ethics, IP, or security and dual-use concerns, that kind of thing, could have it remain non-public, but that needs to be justified.
And so that's going to be in place in just over three years' time. So all of this really focuses people's minds on the ethics of publishing the data, because often these are data sets that maybe would have been passed from research group to research group, and now, if they're going to be public, a lot of these issues become rather more acute. And so Iratxe and Daniella pulled together this working group initially.
It was an informal group including me and a handful of other people, and it's grown a lot and become a formal collaboration between FORCE11 and COPE. And the aim is to come up with best practices and guidance for how to handle these issues, and also how to liaise between publishers, institutions and the repositories so that we're all talking to each other.
So the working group has representatives from publishers, people who work with data, researchers, and repositories, to try to get all the stakeholders working together. So there's now nearly 100 members, 94 members as of today, and the group's available there on that link on FORCE11, and the guidance, which I'll just go on to in a second, has had about 4,000 views in a year.
So it was September of last year when that first output of the working group was published. So here it is. It's on Zenodo, free to download. And after a lot of running through what the major issues affecting data publication were, it came down to four broad categories. So there's a big one, which also massively affects articles as well, which is authorship.
So there's all sorts of issues with authorship disputes. There's also issues where the authorship of the data might be different to the article. There's also less set up for how repositories will change things. So often they might have a change log, but they don't necessarily have formal ways of flagging things up. So repositories need to think about formal corrections, or flagging up where there have been changes made, and how they technically make those changes to the metadata.
They need to think about whether they want to use ORCID. You can see on the screen there that Iratxe and Daniella have ORCID iDs used on Zenodo. And CRediT, maybe the CRediT taxonomy, which assigns author roles, which are particularly important for data. And then another issue which has been highlighted in the last few years is name changes happening retrospectively, for instance after marriage or when a marriage ends, or people who are trans and have had a name change. How do you deal with that?
So there are some issues that you can read through; I won't have time to read through those. But there's even issues like authors who are deceased or out of contact. So there's lots of thorny issues that we may not have considered in detail before.
Another big one is legal and regulatory restrictions. So there's lots of intellectual property that can be affected. There might be national laws about whether certain types of data need to be restricted, and there might be license restrictions that mean that there's a clash between funder and journal expectations and the expectations of the data source, which people might not realise. Commercial restrictions especially can be a real minefield for researchers, where one part of a company might not realize what research their company was doing in collaboration with academics.
So there can be copyright breaches. That doesn't normally affect data, but there may be images or text within a data set that are copyrightable. There might be breaches of licensing. People might be using licenses in ways that aren't compatible. So lots of thought has to be put into the legal framework. Risk is a real big one.
And this is both personal risks, so this is human subjects. So there are things like people might have changed their mind about consenting. There might not have been adequate consent. There might have been a breach of ethics during data collection, or there could be a privacy breach, so that the paper itself might be fine, but in the data set somebody might have included personal identifiers, even names, in maybe a spreadsheet, and so that might need to be redacted.
There's also wider issues, such as affecting endangered species. So you might give away where an endangered species is located, and plant collectors might descend on it. There could be social risks. One that came out in the news a few years ago was that Strava running data at US military bases was helping map those military bases inadvertently.
So that wasn't a research use of data, but it is the kind of inadvertent breach that can happen when you're releasing data without thinking about the possible consequences. And then finally, we've got the issue of rigor. So there might be errors happening, there might be misconduct happening. So do we need to retract the paper, the data set, or both? Do we need to correct them?
Is the data sound but the interpretation by the article wasn't, in which case the data set might be able to stand? Do you need to correct the metadata on the database? So these can be quite technical and difficult issues. So to help with this, the working group is coming up with flowcharts.
These are very well established with COPE. COPE's designer is creating these. There will be some more out this month or next month, and these are to do with authorship and whether the data issue is spotted pre-publication or post-publication. And there's also policy templates for journals and repositories to be able to state what their policies are for working with research data.
So you'll be able to show that you understand the issues and be able to adopt those policies really easily. There's lots more issues to be discussed. I don't really have time to go through this, but there's lots of gaps in the resources for curation and peer review of data. This is often something that is left aside, and we have to think: if we want data peer reviewed, then how does that happen?
During peer review, the data has to be available for reviewers. There has to be communication between journals and repositories. We have to think about the terminology. Are we OK talking about data sets being retracted and having expressions of concern, or do we need different language? And how do we deal with the complex legal cases that happen, which can drag on for literally years, whereas we want rapid responses to act ethically?
So that's just a whistle-stop tour through the working group and some of the issues that have been dealt with, and the contact details for me and Iratxe and Daniella and the links to the working group and COPE are there for you as well. Thanks very much. Well, thank you very much. Thank you.
Thank you very much, Matt, for covering the major aspects of research data publishing ethics. Now it's time for the Q&A session. So let me see what the chat box and Q&A session say. All right. So it. Restaurants those have been patiently answered by the mighty mercury. And it.
All right. So I have a few questions in my mind for all of the panelists. The first question, the one that was basically coming to my mind after your talk, was: what can we do to discourage and stop cyber leakage, to ensure the best ethical practices?
Yeah, what can we do? I think that there is the question of individual as well as communal responsibility. Research integrity is no individual's job, no individual organization's job. Everyone will have touch points on it. I think as well that what we need to do is consider a structure of, basically, consequences, in that transparency is lovely.
The citation contribution of journal self-citations was part of the JCR from 1975 forward, but distortion by journal self-citation kept happening until we began removing impact factors from the product due to significant distortion of rank. That does get very difficult, but I think that that also touches on the kinds of issues that Matt was raising with the consequences of data misuse and of controversies on authorship.
So these are all the same question, which is: how do we responsibly conduct ourselves as publishers, as aggregators, and as individuals contributing to or advising researchers? Garrett, I have nothing to raise. Any questions after we talk about this again? You know, I always bug you whenever we have a talk.
My question is, do we have any innovative solutions, any smart, innovative solutions, to identify text recycling? I mean, there are many. Yes, I mean, we have AI solutions, for example, that can be used to identify it. But yeah, building upon databases, as Matt was talking about as well, I mean, that's the fundamentals of it. But above all, we rely on people flagging these kinds of issues to us.
All right. All right. Because I know the Pac is working on this innovative solution, and we have given a very smart, innovative solution. I actually use that. That's basically a fantastic, fantastic tool to identify the reproducibility and the toxicity and so of the art.
And there are actually many other forms. These are basically the smart tools, in fact, for this purpose. Yeah. Oh, that too. I was just listening to the talk on the work that you did with Daniella and Iratxe. And so, COPE actually has pretty clear guidelines on publishing ethics.
And last year, as you mentioned, COPE worked with FORCE11 to frame some guidelines on research data publishing ethics. So will you please elaborate for me and for my audience: what are those guidelines? What are the main outcomes of those?
Well, I mean, the main output is the publication of the guidelines, which was published in September last year. So that's the main output. And then there's the flowcharts as well, and the policy templates. And part of COPE's involvement is to provide the publication ethics advice on top of what's happened within the working group.
And then also to provide some of the kind of infrastructure support, things like designing the publications and the flowcharts. So it's the kind of thing that the designer can do; lesser mortals can't produce such fantastically laid out flowcharts.
So that's really the kind of basis of the COPE involvement. And the working group's actually open; anyone can sign up on the FORCE11 website and join the working group. So it's not a closed group or invite only. Anyone who's interested can join. So let's get it over 100. All right. All right.
Thank you. Thank you very much. Does anybody from the audience have a question for our panelists? If you have a question, you're welcome to ask. All right. I think.
It's time to close today's session. The Education Committee is thankful. Yes, thank you very much to our panel. Thank you for attending today's webinar. And of course, thanks to our 2022 education sponsors: Arpha, J&J Editorial, OpenAthens, Silverchair, 67 Bricks, and Taylor & Francis F1000. Attendees, you will again receive a post-event evaluation via email.
We encourage you to provide feedback. It helps us pick topics for future events. And please check the SSP website for information on upcoming SSP events. Today's discussion was recorded, and all registrants will receive a link to the recording when it is posted on the SSP website. This concludes our session today.
Thank you very.