Name:
The role of the information community in ensuring that information is authoritative
Description:
The role of the information community in ensuring that information is authoritative
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/6b00443b-aad3-430a-897b-3dfdd11a00a5/thumbnails/6b00443b-aad3-430a-897b-3dfdd11a00a5.png?sv=2019-02-02&sr=c&sig=wVgKjoEfZ5%2FO9ZvU%2FLWwyH0J%2BoMEaFDw9WO%2BsDQrlO0%3D&st=2025-01-22T08%3A01%3A40Z&se=2025-01-22T12%3A06%3A40Z&sp=r
Duration:
T00H53M12S
Embed URL:
https://stream.cadmore.media/player/6b00443b-aad3-430a-897b-3dfdd11a00a5
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/6b00443b-aad3-430a-897b-3dfdd11a00a5/The role of the information community in ensuring that infor.mp4?sv=2019-02-02&sr=c&sig=xmFavoftj9S446oeL8tz6Ix6mN6FubGxiTJQ5lQuwCE%3D&st=2025-01-22T08%3A01%3A41Z&se=2025-01-22T10%3A06%3A41Z&sp=r
Upload Date:
2022-08-26T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Welcome to this NISO Plus session on the role of the information community in ensuring that information is authoritative. This has been a challenge for researchers, publishers, librarians, students, and the lay public since the very first scholarly publications and one that is far from solved. Rapid sharing of results during the research process, from data sets to preprints, with multiple versions and copies on the internet, has certainly not made it easier to ascertain what information is authoritative.
Peer review, which has been held up as the gold standard for decades, is increasingly under pressure just due to the sheer speed and volume of research. And retracted articles can continue a vibrant zombie existence on the web. So, in the midst of a pandemic and climate crisis, we need authoritative information more than ever before.
With this session, we are therefore looking for the input of the entire audience and hope to come up with some action points by the end that address the question of what can and should we be doing to safeguard the integrity of the content being created, disseminated, and used. Luckily, we have some fantastic experts to guide us. We will start our journey with data generation in a talk entitled "Design as a Tool for Producing Authoritative Information in Online Crowdsourcing" from Samantha Blickhan of Zooniverse.
Then, we will move to the first publication of the preprint with a talk on "Open Science and Signals of Trust" by Nici Pfeiffer of the Center for Open Science. From the preprint to the publication, Dr. Bahar Mehmani of Elsevier will walk us through the role of the publisher in a talk, "Providing Information and Building Trust." Finally, if all safeguards have failed, we turn to the retraction.
Jodi Schneider of the University of Illinois will give a talk, "Safeguarding the Integrity of the Literature-- the Role of Retraction." During the shared screening of the recorded talks, you can add your questions and ideas to the Chat. We will then reconvene on Zoom for an in-depth discussion of challenges and possible solutions and action points. So, before we get started, let me introduce our speakers. Samantha Blickhan--
SAMANTHA BLICKHAN: Hello.
SPEAKER: --is the humanities lead of Zooniverse and co-director of the Zooniverse team at Chicago's Adler Planetarium. Her duties include guiding the strategic vision for Zooniverse humanities efforts, managing development of new tools and resources, and leading research efforts, which have been supported by a wide range of funders. She's co-investigator of the Collective Wisdom Project, which produced the Collective Wisdom Handbook, an authoritative book on the state of the art in cultural heritage crowdsourcing, in 2021.
SPEAKER: Super interesting. Then, I would like to introduce Nici Pfeiffer.
NICI PFEIFFER: Hi.
SPEAKER: She's the chief product officer at the Center for Open Science, where she focuses on the infrastructure researchers and stakeholders need to adopt open practices. Her team has built the Open Science Framework, an open-source, free tool for research collaboration and management, with workflows for the entire research lifecycle, from planning and pre-registering studies and reporting outcomes and outputs to posting preprints and discovery and re-use.
SPEAKER: So, next in our lineup is Dr. Bahar Mehmani, reviewer experience lead in the Global STM Journals team at Elsevier.
BAHAR MEHMANI: Hello. Thank you.
SPEAKER: She leads Elsevier's peer review strategy and oversees projects related to researchers' and academics' pain points throughout the peer review process. Bahar is a member of the NISO Peer Review Taxonomy Working Group and the chair of the Peer Review Committee and council member of the European Association of Science Editors. She received her PhD in Theoretical Physics from the University of Amsterdam in 2010. And before joining Elsevier, she was a post-doc researcher at the Max Planck Institute for the Science of Light.
SPEAKER: So exciting to have you also as part of our panel. And to wrap it up, I'm happy to introduce Jodi Schneider. Jodi is assistant professor at the School of Information Sciences at the University of Illinois, where she runs the Information Quality Lab. She studies scholarly communication and the science of science through the lens of arguments, evidence, and persuasion, with a special interest in controversies in science.
SPEAKER: Her recent work has focused on topics such as systematic review automation, semantic publication, and the citation of retracted papers. She's held research positions across the US as well as in Ireland, England, France, and Chile, and she leads the Alfred P. Sloan-funded project "Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda." So I think we've got a really exciting group of experts here.
SPEAKER: And I'm happy to now turn over to Samantha for our first talk.
SAMANTHA BLICKHAN: Thank you so much. All right. I am just going to kick things off by sharing my perspective on where questions of authority arise in the world of online crowdsourcing and the responsibilities that those of us who are creating spaces for this work to take place have to help ensure that the data being produced is at a quality level that meets the expectations of the people who are using it.
SAMANTHA BLICKHAN: Full disclosure, I am not a member of the information standards community, but I do think there are some valuable lessons from the field of crowdsourcing and public data creation that are extremely relevant to this question of responsibility and how to ensure that we're producing authoritative content. So, just some very quick background for those who aren't familiar.
SAMANTHA BLICKHAN: Zooniverse is the world's largest platform for online crowdsourced research. We refer to ourselves as a platform for people-powered research, and we do this by providing a space for researchers to build and run projects, which invite volunteers to help teams process data to aid in their research efforts. And we work with hundreds of research teams around the world.
SAMANTHA BLICKHAN: And since the platform launched in 2009, over 2.4 million registered volunteers have collectively produced almost 650 million classifications across more than 300 projects. And the main tool that supports this work is called the Project Builder, which is a browser-based tool that allows anyone to create and run a Zooniverse project for free. I've included the URL to the Project Builder at the bottom of this slide.
SAMANTHA BLICKHAN: It's fairly straightforward to use, as you can see from the screenshot. You move through the blue tabs on the left, and they allow your team to do things like add content, build workflows, set up message boards, upload data, and export results. More than 250 peer-reviewed publications have been produced using data from Zooniverse projects, and I have also included the link to our Publications page on this slide.
SAMANTHA BLICKHAN: So the main part of my talk will be about the ways that authority can be a barrier in crowdsourcing and some of our responsibilities as practitioners and platform maintainers to break those barriers down. This is a quote from the Collective Wisdom Handbook, which was written last year by a group of leaders in the field of cultural heritage crowdsourcing, myself included. And I think this quote really sums up the most common barrier that we see, which is authority as a barrier to trusting the data produced through crowdsourcing, generally because it's not coming from a typically authoritative source.
SAMANTHA BLICKHAN: But I think the question of whether or not crowdsourced data is trustworthy isn't that straightforward for many projects. Context is extremely important here. I think there's a broad range of data we're working with, and each tends to have its own method for determining quality. So if it isn't clear how a team is planning to use or evaluate their data from the onset of the project, that's when they end up with results that they aren't able to use or that might be considered to be low quality.
SAMANTHA BLICKHAN: In the Collective Wisdom Handbook, we defined the three main dimensions of data quality for cultural heritage crowdsourcing projects as fidelity, completeness, and accuracy. And I want to dig in a little bit to fidelity here, which refers to the digital representation of an object following project guidelines, which is a crucial point of that definition.
SAMANTHA BLICKHAN: So for example, a text transcription project might ask volunteers to type out the letters in a historical document as written, including preserving spelling mistakes. So a corrected transcription would have increased accuracy but would thereby reduce fidelity. So, if a desire for fidelity is part of a team's goals, that data is going to require an adjustment of the methods that they would typically use to determine accuracy of the results.
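The distinction between fidelity and accuracy described above can be made concrete with a small sketch. This is an illustration only, not a method described in the talk: the sample sentences, the function names, and the choice of character-level edit distance as a similarity measure are all assumptions made for the example. It scores the same volunteer transcription against two references, the text as written and a corrected version.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def similarity(candidate: str, reference: str) -> float:
    """1.0 means identical; lower values mean more edits are needed."""
    longest = max(len(candidate), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(candidate, reference) / longest


# Hypothetical example data, invented for illustration.
as_written = "I recieved your leter"   # the document, spelling mistakes included
corrected = "I received your letter"   # a modernised, corrected reading

# A volunteer who follows a "type what you see" guideline.
volunteer_transcription = "I recieved your leter"

fidelity_score = similarity(volunteer_transcription, as_written)
accuracy_score = similarity(volunteer_transcription, corrected)
```

Under these assumptions the transcription matches the as-written reference exactly but scores lower against the corrected one, which is why a team that wants fidelity would need to choose its reference text, and its quality measure, accordingly.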
SAMANTHA BLICKHAN: So alongside these three dimensions, we also talk about auditability, which is a particular consideration with crowdsourced data as well. We can lend authority to crowdsourced data by ensuring that we're also providing information about how it was produced. And just as an example, this is the data model for Zooniverse data exports, and it includes all of the fields shown here, among others, which are largely project-specific.
SAMANTHA BLICKHAN: But this type of information can give users or reviewers of the data the opportunity to audit the origin of the above information in relation to the project through which the data was created. So, we're giving some context and history to this data through these practices of auditability. As I noted earlier, there are many publications which show the quality of data produced through crowdsourcing.
SAMANTHA BLICKHAN: And I'll just use one example today, which is the most recent of this type of publication for Zooniverse projects, and it's from the team behind the Gravity Spy project. So their project works with data from LIGO, the Laser Interferometer Gravitational-wave Observatory, which detects gravitational waves in space. And in order to be able to do this, it has to be incredibly sensitive and is therefore susceptible to a lot of noise known as glitches.
SAMANTHA BLICKHAN: And the Gravity Spy team is trying to identify what causes these glitches so that they can be removed from the data. The size of their data sets makes it extremely difficult for experts to examine the entirety of the data, so the project is aiming to train machine-learning algorithms to help with this work. However, we know that machine-learning algorithms don't always do a great job with unexpected features, and volunteers on a crowdsourcing project might not recognize new features without guidance, or if they do, they're not as sure how to interpret them.
SAMANTHA BLICKHAN: So, the team is using both approaches, and this paper compares instances of volunteer-discovered and computer-discovered glitches and ultimately shows that Zooniverse volunteers can identify new glitches at a level similar to experts. That is an extremely simplistic overview of the paper, which you can read via the link I have included on the slide. But I have also included a link to a really good Twitter thread written by the team, which is a great explanation of the paper results, for those who might not feel comfortable taking on an academic astrophysics paper.
SAMANTHA BLICKHAN: And I did pull out a quote that I really appreciate, which is that, "with proper frameworks, volunteers can do more than classifying." And I think that really echoes the sentiment of the Collective Wisdom quote that I shared a few slides back: that the success of a project like this is really contingent upon intentional and well-thought-out project design. So, authority can also be a barrier to participants in online crowdsourcing projects, often in the form of hesitation or low confidence.
SAMANTHA BLICKHAN: And on Zooniverse project message boards across disciplines, the most common concern from volunteers is, what if I get it wrong? And it's the responsibility of practitioners to help mitigate these feelings of uncertainty by clearly signposting any requirements of prerequisite knowledge needed to take part in a project, or even signposting when no background knowledge is required to participate. On the Zooniverse platform, we use a multi-track approach to data collection as a quality control method.
SAMANTHA BLICKHAN: And basically, that just means multiple people will engage in the exact same task to produce many versions of the same data. And project teams must then aggregate the data together to determine a consensus response for a given task. This can be a really effective method. But if participants don't have an awareness or understanding of any quality control methods that are built into a platform or into the design of a project, we can't expect them to be reassured by them.
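As an illustration of the consensus step described above, the following is a minimal sketch that assumes the simplest possible aggregation rule: a majority vote over the labels that several volunteers gave to the same item. The function name, the threshold parameter, and the sample labels are hypothetical; the talk does not say how any particular project team aggregates its data, and real projects may use more sophisticated methods.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def consensus(responses: List[str],
              min_agreement: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (answer, agreement) for one task.

    The answer is the most common response, but only if the share of
    volunteers who gave it is greater than min_agreement; otherwise the
    answer is None so the item can be flagged for further review.
    """
    if not responses:
        return None, 0.0
    answer, votes = Counter(responses).most_common(1)[0]
    agreement = votes / len(responses)
    if agreement > min_agreement:
        return answer, agreement
    return None, agreement


# Hypothetical classifications: several volunteers label the same subject.
classifications: Dict[str, List[str]] = {
    "subject-1": ["A", "A", "B", "A"],
    "subject-2": ["A", "B", "C", "D"],
}

results = {subject: consensus(labels)
           for subject, labels in classifications.items()}
```

Reporting the agreement level alongside each consensus answer is one way a quality-control step like this could be made visible to participants and to later users of the data.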
SAMANTHA BLICKHAN: And so this communication is a key responsibility of the people who are running the projects. So, now that I've shared a couple of ways that authority can act as a barrier in these situations, I want to talk about how we can break those barriers down using design solutions. So, first, rather than waiting until data is produced and implementing quality control or validation steps on the results, we can actually design projects based on our own experiences and the experiences of others.
SAMANTHA BLICKHAN: The first paper I have linked to here is an internal approach to raising quality standards across transcription projects on the Zooniverse platform. It's a study that I and several of my colleagues ran a few years back to test the quality of transcription data produced through individual and collaborative methods. So we did an A/B experiment with two different transcription tools and compared the results against gold standard data. We were able to show through this work that the collaborative methods produce significantly higher quality transcriptions, and so we chose to add those collaborative tools to our Project Builder toolkit.
SAMANTHA BLICKHAN: The second paper I linked here is another example of this type of internal approach but using early results from a project to iterate on the project design. And it was written by the team that runs the Transcribe Bentham project. And their paper describes how, after running a project for some time, they used an analysis of their results to implement changes to the transcription workflow that resulted again in higher quality results.
SAMANTHA BLICKHAN: And then the final point on this slide here is about testing. On Zooniverse, we require all teams to go through a process of internal review by our team, as well as beta review, by the Zooniverse community before their project can launch publicly. And teams are required to take this feedback very seriously, including demonstrating to us how they edited their project based on the beta feedback they received.
SAMANTHA BLICKHAN: And we made this choice as platform maintainers to require teams to put more effort in upfront to ensure the usability and reliability of the data being created through their project. So the final thing I'll note here is that if you want a large number of participants, a crowd, if you will, you have to design for a general audience, not for specialists in your field. And the paper that I've linked here describes the design methods we used for creating the Scribes of the Cairo Geniza project, which invited participants to transcribe manuscript fragments written in Hebrew and Arabic script.
SAMANTHA BLICKHAN: I should say invites, because the project is still running. So in this paper, we shared not only the technical code for creating the clickable keyboard shown here in this image, but also the adaptations we made to the original project design concept, which ultimately allowed us to welcome a broader audience to the project. I'm going to stop there for time, but I really look forward to the discussion.
SAMANTHA BLICKHAN: And I'll now pass it along to Nici. Thanks so much.
NICI PFEIFFER: So, thank you, Sam. It's really interesting what your team is doing and the support for crowdsourced data. I just really liked seeing that. So I'm going to try to pick up from there and-- sorry, I'm getting the Zoom stuff in my way, of course-- pick up from there and talk a little bit about "Open Science and Signals of Trust." So just from my perspective, thinking about the open science landscape, bringing that down into the research lifecycle where transparency and openness are happening across all the stages, and thinking about the topic for this entire session and the different perspectives we're bringing in, to think about the trust, verification, review, and evaluation of what is produced by researchers, really trying to take on those practices of open science.
NICI PFEIFFER: And, I guess, just one thing to kind of ground a little bit of what I'm going to bring forward is talking about this as a culture change for research or science in and of itself. And that it's not something that starts in one corner, it's happening across lots of disciplines, lots of geographic regions. And there is adoption at certain stages, and just thinking about beyond that, the ecosystem that has to support that adoption.
NICI PFEIFFER: And so this is one of the things that we talk a lot about for our organization, the Center for Open Science and how we approach open science adoption. And there's different components to that that really help push that forward. So one is the infrastructure tools like Zooniverse and the OSF and lots of other tools that exist out there. It's the communities and establishing new norms and standards around open science practices and data sharing.
NICI PFEIFFER: And then having that aligned with the policies for publishers and journals and funders and then the incentives within sort of the research ecosystem. So just kind of bringing that back into perspective and just to call out a little bit of the incentives, the incentives for individual success are focused sometimes on getting it published, not necessarily getting it right. And I think that's really called out in what you brought forward, Sam, on the crowdsourcing and the quality controls.
NICI PFEIFFER: And I think that's a lot of what we're trying to get at today. So I think that puts that into focus. And then one of the other things that kind of comes up when you talk about research data and open science is reproducibility. And one of the things we really try to support the shift here is around thinking about what's the end goal. If the end goal is to publish as a preprint or as an article and to share what you found in your study, that then becomes questionable when it gets into the reproducibility of that.
NICI PFEIFFER: So if you start with that in mind and you think about all the things that you curate and develop as artifacts, as part of that research study, making sure that is also evidence that you can provide with that end result, that final publication or that preprint. So, I really want to talk a little bit about the scholarly publication system and open science and sort of how does that all fit together.
NICI PFEIFFER: And there are new models that have been developing and are still happening to advance rigor, transparency, and rapid dissemination. So a couple of those just kind of popped on the slide just to think through a few of these. The first that you see, with stage one peer review and stage two peer review, is talking about registered reports. So this is a publication model, but it starts with the study design and development going in for peer review at this stage, before data is collected, and getting that review and that peer feedback before then.
NICI PFEIFFER: And at that same time, getting in-principle acceptance into a journal so that when you do collect that data, you do finish your study and come out with reporting on your conclusions and your findings, regardless of whether you found a null result or something novel, that still gets published, it still becomes part of the scholarly record. And this is really critical for us as a community to continue to have shared knowledge, to build that knowledge base about what is happening in the research landscape and then be able to reuse and build on that work.
NICI PFEIFFER: There are a couple of great groups out there that are really working on this. I just called out one, which is Peer Community In, which sort of supports community crowdsourcing, if you will, of registered reports and reviews and recommendations. Another model that we'll talk a little bit more about today is also posting preprints. So I'm going to spend some time on that, and that's where I'm going to go directly, but also to call out the badges.
NICI PFEIFFER: So this is something that other repositories and journals have been implementing to help call out where the steps in the process for having open science practices, open data, pre-registration, or open materials are, when they're available, to really, one, provide some incentive and reward for that work, but to really make it clear that that has taken place in the process. So again, just a little bit more on the registered report model because I think there's a lot of value in it.
NICI PFEIFFER: It helps to improve research quality. It puts peer review prior to data collection. There are many journals at this point that are starting to adopt these formats. And, again, the real true benefit to us as a community is, null findings are getting published. So I just want to put that out there. I'm not going to spend too much more time on it. But where I would like to go is to talk about preprints.
NICI PFEIFFER: And one of the things, just in the simple process of conducting the research, is a simple upload of what your findings are. And that can be revised over time, which is-- really, what's exciting is the transparency that preprints offer, as feedback is generated from your community, through crowdsourcing, through open review or even community review like Peer Community In. And so sharing that and then continuing to revise that as it makes its way towards being published in a journal, in formal peer review, and the discovery and reuse.
NICI PFEIFFER: What's really interesting is, there were some studies done on this, and some of the data is saying that-- and I dropped a link to the full preprint on bioRxiv-- that journal articles that were initially preprints have higher citation rates. So this is really exciting to see. I think there's some alignment there with this model and the incentives that researchers do care a lot about, those citations.
NICI PFEIFFER: And so this is a really good thing to see. Where I'd like to go next is just to kind of talk about something that we were really interested in early on when we started offering infrastructure for preprint communities, for communities to come together and launch preprint servers around their discipline or specific topical areas, which was to really understand the signals of trust on preprints.
NICI PFEIFFER: So this is really what the session is getting at. When these preprints come out, how do you know that you can trust what is there? And we did a survey of over 3,700 members of the community-- and that's researchers to stakeholders and everywhere in between-- and to find out what is that signal of trust. When you land on a preprint, how do you know that you can trust the credibility of what you're seeing?
NICI PFEIFFER: And what we found was really interesting, that it would be extremely important for the consumer of that preprint to have information regarding the materials that went into the study, a link to the study data, and the analysis scripts used to analyze that data. Information about independent group reproduction. So was the study reproducible? Or, were there attempts at reproducing it?
NICI PFEIFFER: And if so, were they successful? What did they find? Independent robustness checks that are done separate from the research team itself. Any conflict of interest disclosures, linking to the pre-registration, and any other information about independent groups using that linked information or that linked data in some way. So this was a survey we did and a study funded through the Sloan Foundation, and so there's open data and the survey materials there.
NICI PFEIFFER: And I think this is really getting at some of the things we're calling out in this session around what is really important to help support the authority of the content that's being posted. And it's really to say that-- I think the researcher and the research team can help support that need, that platforms can help facilitate a lot of that workflow when these materials are being posted, when this content is being there.
NICI PFEIFFER: And just to kind of take that one step further, we actually took that survey data and implemented that across the preprints that are hosted on the OSF preprint infrastructure. And so you can see, when an author uploads a preprint, they're asked to assert some specific things around their data and pre-registration and conflicts of interest. That's on the left side, the researcher workflow that is uploading things.
NICI PFEIFFER: But then on the right-hand side, you're seeing a published preprint with those assertions at the top. So you can see the conflict of interest. You can see if there's public data available and actually follow that. And it would be really great to take that all the way to the badges and the other things that we talked about. So as a community, we can start to move into that direction.
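To make the idea of author assertions shown alongside a preprint more concrete, here is a minimal sketch of how such statements might be stored and turned into reader-facing lines. It is not the OSF data model or API: the class, field names, wording of the output, and example URL are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PreprintAssertions:
    """Statements an author makes at upload time (hypothetical field names)."""
    public_data_url: Optional[str] = None
    preregistration_url: Optional[str] = None
    # None means the author gave no statement; an empty string means the
    # author asserted that there is no conflict of interest.
    conflict_of_interest: Optional[str] = None


def trust_signals(assertions: PreprintAssertions) -> List[str]:
    """Turn the stored assertions into the lines a reader would see."""
    lines = [
        f"Public data: {assertions.public_data_url or 'not available'}",
        f"Pre-registration: {assertions.preregistration_url or 'not available'}",
    ]
    if assertions.conflict_of_interest is None:
        lines.append("Conflict of interest: no statement provided")
    elif assertions.conflict_of_interest == "":
        lines.append("Conflict of interest: none declared")
    else:
        lines.append(f"Conflict of interest: {assertions.conflict_of_interest}")
    return lines


# Hypothetical preprint: data is shared, no pre-registration, no conflicts.
example = PreprintAssertions(
    public_data_url="https://example.org/data",
    conflict_of_interest="",
)
signals = trust_signals(example)
```

The sketch distinguishes "no statement" from "none declared" because, for a reader judging credibility, an unanswered question and an explicit assertion are different signals.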
NICI PFEIFFER: That's all I have to share, and I'll pass it now to Bahar to talk about it from a publisher perspective. Bahar.
BAHAR MEHMANI: Thank you, Nici. Just sharing my slides.
BAHAR MEHMANI: Well, I'm going to just build upon what Sam and Nici already alluded to, which is building trust. And, well, I'm going to just state that publishing is information-providing, regardless of the platform or journals that are providing it. And the role of scholarly publishing and platforms is to provide trustworthy scientific information to the community for building on top of the existing knowledge and the progress of science.
BAHAR MEHMANI: And that's why journal publishers and also platform owners, such as preprint servers, all try to ensure that the content that they are providing is trustworthy. And that's the way that, for example, Nici and Sam provided [INAUDIBLE] mostly through communication channels, workflow tools, best practices, information on the platform, collecting data, and analyzing it, and providing more infrastructure to the users and the community.
BAHAR MEHMANI: Now focusing on the article publishing within the journals, I'm going to just go through the steps and see how journals are actually controlling the content trustworthiness through each step. So from journal information providing to submission systems to the peer review process, and, eventually, article publication.
BAHAR MEHMANI: In 2019, we ran a survey to get a [? sense ?] about science, interestingly, similar to what Nici also mentioned during the presentation, asking academics from different disciplines and different locations what they think about the trustworthiness of the peer review process and the published content. And interestingly, we found that one-third of the participants in the survey, more than 3,000 people, find the content that they read to be, to some extent, not very trustworthy.
BAHAR MEHMANI: And when we asked them what they do in order to ensure the quality and trustworthiness of their content, we got similar answers to what Nici presented. Namely, the most important ones were to check supplementary material and data, to check the link to the peer review process, that is, whether the content has been published in a peer-reviewed journal, and also to reach out to their peers to get some recommendations about the content that they want to read.
BAHAR MEHMANI: So to make it quite clear, similar to OSF, I think most journals try to ensure that the submitted manuscript has data, code, methods, quite clear and available.
BAHAR MEHMANI: And the good journals, in the sense of good quality, make sure that they have transparent peer review process information on their journal home pages and through the submission system. An example is the NISO-- or, previously, STM-- peer review taxonomy implementation. Many journals are now using that terminology to inform their community about the peer review model of the journal and the type of interaction and the level of transparency between the different parties.
BAHAR MEHMANI: Connection of the content to the journal, to the previous articles, to the bigger body of information, and also to the cited articles and relevant articles around it. And last but not least, also, a good, clear retraction note and retraction policy that ensures the integrity of research.
BAHAR MEHMANI: Now, just to show you some examples, this is an article page on PLOS ONE. It clearly shows that, for example, the content is peer reviewed. It also mentions the peer review information quite clearly on the article page. Another example is Nature Communications, which clearly links all the authors to their ORCID iD and also makes clear links to data availability, conflicts of interest, all sorts of other relevant information related to the survey results I mentioned.
BAHAR MEHMANI: And this is an example from Elsevier journals, some of the Elsevier journals that are participating in transparent peer review, where on the article page, you not only see all the information about the authors but you also can find the peer review comments next to the article. On the journal itself, well, the good journal homepage is a journal homepage where it clearly mentions information on the impact, the speed of the publication and the peer review process, aims and scope, editorial names and contacts, and the peer review model.
BAHAR MEHMANI: So, again, some examples. This is an example of a society journal-- clearly lists the editors, links all of them to their [INAUDIBLE] their email addresses. Another example is where a journal clearly mentions items around the peer review speed. Or, quite recently, we also started to show the gender composition of the journal editorial boards on the [INAUDIBLE] journal home pages and benchmark it against the gender diversity within the subject area that the journal is covering.
BAHAR MEHMANI: Through the submission system also, there are lots of checks and checkpoints to ensure trustworthiness of the contents and best practices to provide clear guidelines to authors and reviewers to ensure rigor by inclusion and diversity, have a thorough peer review process in place, and also a clear guideline in terms of research integrity.
BAHAR MEHMANI: So I'm going to just give you some examples in terms of guidelines for authors. So declaration of interest. Again, related to the survey results that both Nikki mentioned and we see also in other surveys that it is quite important to see whether authors of the paper had any conflict of interest. And the declaration of interests is something that we introduced to all of our journals now that our authors [INAUDIBLE] submission can clearly mention any conflict of interest.
BAHAR MEHMANI: Many, many journals are making this mandatory as well. Ensuring reproducibility-- again, another item that was also mentioned by Nici. Good journals are the ones that are taking this quite seriously. So they require all kinds of guidelines to ensure that the content provided at the end is reproducible, correct, complete, and available. The guidelines that those journals usually use are included in STAR Methods.
BAHAR MEHMANI: They also use CONSORT for transparent reporting of trials. Also, for systematic reviews and meta-research, PRISMA is used. And there are loads of other guidelines out there. Many journals are using these guidelines and many more. I'd like to also highlight another guideline related to sex and gender reporting, which is an important topic not necessarily only for health and medical journals but, in general, for engineering and computer sciences.
BAHAR MEHMANI: Think about AI. It's quite important to ensure that the sex and gender disaggregation of the data points is included, too. We are now introducing that into the guidelines for authors and reviewers across journals. After submission and checking for compliance with all these guidelines, there comes the peer review process. And the other speakers also mentioned-- I think it was Tiffany at the beginning-- that peer review is quite under pressure, and rightly so: the sheer number of submissions doesn't allow for the peer review process to be as slow as it used to be.
BAHAR MEHMANI: And also, there is the urgency of the topics that we are facing and dealing with. There are many, many journals and publishers out there who are using all kinds of AI-based tools and machines to accelerate the process, for example, finding the right type of reviewer. And what we need is to ensure that those AI tools are not reintroducing the existing bias in the scientific community, and also to ensure that the AI that is being used is responsible.
BAHAR MEHMANI: And last but not least, the peer review process also needs to ensure integrity. So there was quite a nice explanation of registered reports. I'm not going to mention that further. But some of the ways that the peer review process is trying to ensure that bias is reduced, namely confirmation bias, let's say, are those kinds of article types: registered reports.
BAHAR MEHMANI: There are also results-masked types of research articles, in the sense that the results are already available but the reviewers are not able to see them. They are only requested to review the methods and study design. And the number of journals that are offering these results-masked studies has been increasing in recent years.
BAHAR MEHMANI: We also hear of journals, or even an entire society or publisher, switching from one peer review model to another to ensure that the bias against marginalized people is being addressed. And talking about biases, as I said, tools are also part of the story. So, we have to make sure that we are not reiterating the existing bias in the research ecosystem that is reflected in the data.
BAHAR MEHMANI: For Elsevier, for example, the tools that we are offering to our editors to match the best type of reviewer to a manuscript are based on Scopus data, which we know, based on loads of studies, is not necessarily an unbiased, good representation in terms of, for example, citation metrics. Because studies show that marginalized groups receive fewer citations for the same content that they publish.
BAHAR MEHMANI: So, we have a group of data scientists who have been running projects for quite some time to make sure that fairness is embedded in the reviewer recommendation tool that we provide to our journal editors. And more than that, because there is a certain limit to building fairness into the tool, we also need to ensure that editors are quite aware of the limitations of the tools that they are using.
BAHAR MEHMANI: So being transparent about the limitations is also part of it, in a similar way that a good research article is an article that also states its limitations. And then the last one is also to engage with the journal editors, to support them with other tools that enhance the diversity in the reviewer pool that they are looking at. Historically, we have never asked academics in the submission system-- by we, I mean journal publishers-- to identify, for example, the gender that they feel closest to.
BAHAR MEHMANI: But quite recently, we started to collect that information, while providing the option of "I prefer not to disclose," so that we can see what the composition of journal authors is versus journal reviewers. Because there are loads of statements out there, and isolated studies, that show that reviewers and authors of a journal are not necessarily from the same geolocation, for example, or from similar groups.
BAHAR MEHMANI: So we started with gender, and we are going to expand this to also include race and ethnicity, which is part of a bigger effort organized through the joint commitment of many, many journals that is led by the Royal Society of Chemistry. So then, hopefully, all journals will have the same schema to collect race and ethnicity in a responsible manner, to understand what the composition is and how we can ensure that bias is not occurring.
BAHAR MEHMANI: And I'm going to finalize my presentation by also mentioning something about research integrity and, in that big realm of research integrity, zooming into retraction. Because, again, we heard at the beginning that there is a problem of retracted articles still getting cited and reviving every now and then. How can we ensure that the retracted articles are not cited if they are not relevant to the study?
BAHAR MEHMANI: What we are now testing, and we are hoping that we can expand it beyond our own journals, is to flag the retracted papers that are cited in a manuscript at the submission level to the authors and, eventually, to editors and reviewers, so that they can see that, oh, the article, for example, number one in the table is retracted, and then the author needs to validate whether this citation really was necessary or not.
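[Editorial illustration, not part of the talk.] The submission-time check described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the publisher's actual system: the `Reference` record, the `normalize_doi` and `flag_retracted` helpers, and the example DOIs are all hypothetical, and the list of retracted DOIs is assumed to come from some retraction data source the journal already has access to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One entry in a manuscript's reference list (hypothetical record)."""
    number: int  # position in the reference list
    doi: str     # DOI as typed by the author
    title: str


# Common ways a DOI is written; stripped so that comparisons match.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and remove a leading resolver URL or 'doi:' label."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi


def flag_retracted(references, retracted_dois):
    """Return the references whose DOI appears in the retracted set."""
    retracted = {normalize_doi(d) for d in retracted_dois}
    return [ref for ref in references if normalize_doi(ref.doi) in retracted]


if __name__ == "__main__":
    # Made-up DOIs, for illustration only.
    manuscript_refs = [
        Reference(1, "https://doi.org/10.1234/EXAMPLE.001", "A retracted trial"),
        Reference(2, "10.1234/example.002", "An unaffected study"),
    ]
    known_retractions = ["doi:10.1234/example.001"]
    for ref in flag_retracted(manuscript_refs, known_retractions):
        print(f"Reference {ref.number} is retracted: {ref.title}")
```

The sketch only flags matches. As in the talk, deciding whether a flagged citation is still necessary is left to the author, and later to editors and reviewers.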
BAHAR MEHMANI: Hopefully, this will reduce the number of citations of the retracted paper. And with that, I will hand over to Jodi. Thank you.
JODI SCHNEIDER: What a wonderful transition. Thank you, Bahar. It was amazing. So I'm going to talk today about safeguarding the integrity of the literature and the role of retraction in this, particularly. And you may know that retraction is intended to minimize the number of researchers who cite something. And that's because retraction is meant to correct the literature when there are serious flaws.
JODI SCHNEIDER: Unfortunately, it's not really working as we intend. There's an investigative journalist, Charles Piller, who wrote about the continued citation of two COVID-19 articles. These were the [INAUDIBLE] retractions that came out in May 2020. They were retracted in June. They currently have, I think, 1,100, 1,300 citations together. So in January 2021, the piece that came out said that over half of these inappropriately cited the retracted articles.
JODI SCHNEIDER: Over half sounds really bad. In fact, this is, I would say, a shining star of retraction working much better than average in practice. And I hope that that will be something that the research community is going to work to address. And I just want to give you some sort of concrete [? miss ?] here. So here is one of those articles published in the New England Journal of Medicine.
JODI SCHNEIDER: You see, this article has been retracted. Great. It's fantastic to have that at the publisher's site. This sometimes is a challenge. But unfortunately, there are other places where a person might search for the article and not notice that it's retracted. So if I look in Google Scholar for this article, the default that Google Scholar gives, at least on my computer, is just the best result for this search.
JODI SCHNEIDER: So I see it's about 1,174 citations, and there's no sign of the retraction until you click through to the New England Journal of Medicine. I thought, well, let me check and see, what if I show all the results? The retraction does show up. It's been cited a little and [? the ?] expression of concern.
JODI SCHNEIDER: So it's there, but it's something that someone has to really dig for. And I think this is possibly part of the problem. We've been studying other situations like this as well. My team studied a human trial article published in 2005 in Chest, and it was retracted in October 2008 because an author falsified data. And it's now, at this point, been cited more after retraction than before.
JODI SCHNEIDER: 96% of those citations, in the ones that we looked at up to 2019, were inappropriate. I learned one of the complications in this case was journal transfer, and the Chest page for a long time, while I was studying it, didn't make any indication of the retraction. That, thankfully, is fixed. But this is unfortunately a common problem. And so what happens is that the information can spread not just to those articles that directly cited it but beyond.
JODI SCHNEIDER: And lately, I've even seen the retraction being indicated in the bibliography of one of the 2021 citations. But if you go back and look at the text, the author did not seem to have any awareness, because it's just cited as normal. So there, it looks like the publisher may have taken action. So there are lots of these challenges. In this case, when we were looking at this human trial article in Chest, linking errors were a real problem, and [INAUDIBLE] could get us from their database to the retraction notice.
JODI SCHNEIDER: But otherwise, the retraction notice was almost impossible to get to. So all of these sort of information system things. The other thing I really want to emphasize is, the continued citation is not rare. My team also studied a set of 7,000 retracted papers in PubMed and looked at the citations post retraction in PubMed Central. There were 13,000 of these that we found, and 94% inappropriately cited the retracted article.
JODI SCHNEIDER: You can see on the right, these just look like normal citations. We looked at where the citations happened in the article. There was no difference. Before retraction or after retraction, they were just being treated as normal science. And the exceptions were, in many cases, people like me who are studying retracted articles. There are certainly reasons that we may want to cite something that's retracted: to talk about it historically, to say that we're studying it, to point out that we specifically excluded it in a systematic review.
JODI SCHNEIDER: But that's appropriate citation. That's citation with awareness. Nothing wrong with that. I do it all the time. That, in 94% of the cases, at least in biomedicine, is not what's happening right now. And I think we need to fix that. So, I have been working for about two years now with a group of stakeholders to try to understand, how do we reduce the inadvertent spread of retracted science?
JODI SCHNEIDER: Here's the advisory board that came along to help me in this process. We presented a little bit about this last year at NISO Plus just as we were coming to the end of the stakeholder consultation. And the recommendations that we presented then have now been finalized into their final form, which is basically developing a systematic cross-industry approach to ensure that we get consistent, standardized, interoperable, timely information.
JODI SCHNEIDER: So it's not enough if the publisher has it and Google Scholar doesn't. It's not enough if, you know, it's OK in one database but not in some others. And fundamentally, this information needs to be everywhere. The situation is continuing to improve, but it needs a lot more work. We think that there needs to be a taxonomy of retraction categories.
JODI SCHNEIDER: And classifications, not about honorable and dishonorable retraction, that's really challenging, but just the very basics. We don't agree in scholarly communication about what withdrawal is and when it should be used, for instance. I think that is a solvable problem in the near term. We need best practices for coordinating the retraction process, and the CLUE report came out shortly after we completed ours.
JODI SCHNEIDER: And they have fantastic ideas about making these connections between institutions on better supporting the process in these multiple different ways. And there's also a huge need for education, thinking about the whole of this stewardship process right from preprints through peer review, through publication, through perhaps post-publication events like retraction. So, thanks for all the vivid discussion last year.
JODI SCHNEIDER: I hope this year, we'll have even more. As a result of those conversations last February, there is a group that is now launching-- CREC-- to look at the communication of retractions, removals, and expressions of concern, which is meant to focus on the dissemination, the metadata, and the display sorts of information. Not what a retraction is, but once something is retracted, how do we get that information around in the community?
JODI SCHNEIDER: And I would encourage you to help us figure out, what are the priorities in this space? So, looking forward to discussing more.
SPEAKER: Thanks so much, Jodi and everyone. It's just been a fascinating round of talks, and I am really excited to start the discussion. So without further ado, we're all going to rush to Zoom and we will be open for your questions. So thank you very much for listening.