Flagging Predatory Journals to Fight "Citation Contamination"
SYLVIA IZZO HUNTER: Hello, and welcome to Flagging Predatory Journals to Fight "Citation Contamination." My name is Sylvia Izzo Hunter, and I'll be introducing our speakers and providing a bit of context for their presentations. Our first speaker is Kathleen Berryman. Kathleen is the Director of Business Relations at Cabells, where she's responsible for developing projects and relationships to benefit the academic community. She has spearheaded Cabells' initiatives to identify and combat deceptive publishing practices in academia, and has assembled a team to protect the integrity of academic publishing by systematically identifying fraudulent publications with reliability and objectivity.
SYLVIA IZZO HUNTER: We'll also hear from my colleague, Liz Blake. Liz is Director of Business Development at Inera, where she oversees both sales and marketing activities, and works with the solutions team to ensure that eXtyles is configured and deployed to meet each customer's unique requirements. Before joining Inera in 2002, Liz worked as a scientific manuscript editor and was one of the original beta testers of eXtyles, so her relationship with our reference processing technology goes back to the beginning.
SYLVIA IZZO HUNTER: So where did this webinar come from? If you read the Scholarly Kitchen blog, you've probably read at least one of these posts over the past couple of years. Fighting Citation Pollution-- The Challenge of Detecting Fraudulent Journals in Works Cited was the second most read Scholarly Kitchen post of 2019, and it's an issue we at Inera have been hearing about from our customers, too.
SYLVIA IZZO HUNTER: Clearly, a lot of us are thinking about this issue and wondering what, if anything, we can do about it. When we say citation pollution, or citation contamination, this is what we're talking about. Citations to articles published in journals with various bad practices, such as no peer review, or invented editorial boards, hiding in plain sight in authors' bibliographies.
SYLVIA IZZO HUNTER: Predatory, or fraudulent journals of various kinds, are not a new problem. Launching a real journal requires a lot of hard work, but setting up a fake one is easy. Identifying and avoiding these journals isn't so easy. For a long time, Beall's list was a de facto standard resource, but Beall's list was suspended in 2017. You can still find free archived copies, but since these copies aren't up to date, they get less and less useful as time goes on.
SYLVIA IZZO HUNTER: Beall's list also wasn't without controversy over the choice of which journals and publishers to include. What's different about the Cabells predatory reports database? Well, we'll hear a lot more detail from Kathleen in a moment, but for now I want to highlight two factors. First, Cabells uses a transparent methodology to identify fraudulent or predatory journals, so you don't have to guess what the inclusion criteria are, or how much weight is given to each one.
SYLVIA IZZO HUNTER: Second, they provide detailed violation reports, rather than just a yes or no judgment. Earlier, I mentioned the second most read Scholarly Kitchen post of 2019. In case you were wondering, the top post was about the Cabells predatory reports database. In a few minutes, we'll hear from my colleague, Liz, about how Inera's software solutions can integrate the Cabells database to flag problematic references in your content.
SYLVIA IZZO HUNTER: But first, here's Kathleen to explain how Cabells' predatory reports works.
KATHLEEN BERRYMAN: Hi, I'm Kathleen Berryman, Director of Business Relations at Cabells. Cabells evaluates academic journals. We help people make better decisions about journal quality. We specialize in identifying and evaluating predatory journals. Predatory journals undermine the integrity of scholarly publishing. They misrepresent themselves. They misrepresent the nature of their business.
KATHLEEN BERRYMAN: And, most importantly, they misrepresent their peer review process. They try to sell us this promise that doesn't really exist. Researchers that buy into these promises are robbed of the one thing that's at the heart of academic research-- peer review. If the community doesn't distinguish between real peer review and this peer review theater, we're allowing unverified and potentially flawed research to take root.
KATHLEEN BERRYMAN: This uncertainty extends beyond just the original articles, as further research cites and builds upon their findings. Cabells' predatory reports selection criteria are made up of 74 behaviors that indicate deception in categories like integrity, peer review, website, publication practices, indexing, fees, access and copyright, and business practices.
KATHLEEN BERRYMAN: The behaviors we find in predatory journals range from very serious-- like including a fake ISSN, DOI, or editorial board members-- to very minor-- like making big promises, or having no editorial policies on their website. Each behavioral indicator, or violation as we call them, in our criteria is weighted based on how closely it relates to deception. Behaviors that directly indicate deception are weighted very heavily, while behaviors that are just commonly seen in deceptive journals are weighted lightly.
KATHLEEN BERRYMAN: We use these violations to create a rubric with which we evaluate suspected predatory journals. Any journal that meets our threshold for inclusion is added to predatory reports. Our rubric and weighting were specifically designed to avoid flagging journals that are simply new and inexperienced, or just low quality.
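The weighted rubric described above can be pictured with a minimal Python sketch. The weights, the threshold, and the sample journals here are hypothetical and chosen only for illustration; the talk does not give Cabells' actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    name: str
    category: str
    weight: float  # hypothetical weight, not a real Cabells value


# Assumed values: behaviors that directly indicate deception weigh heavily,
# behaviors merely common in deceptive journals weigh lightly.
HEAVY, LIGHT = 10.0, 1.0
THRESHOLD = 10.0  # assumed inclusion threshold


def score(violations):
    """Sum the weights of every violation found for one journal."""
    return sum(v.weight for v in violations)


def meets_threshold(violations, threshold=THRESHOLD):
    """Return True if the journal's total weight reaches the threshold."""
    return score(violations) >= threshold


# A new, inexperienced journal: only minor violations.
new_journal = [
    Violation("No editorial policies on website", "Website", LIGHT),
    Violation("Makes big promises", "Business practices", LIGHT),
]

# A deceptive journal: one serious violation is enough in this sketch.
deceptive_journal = [
    Violation("Fake ISSN", "Integrity", HEAVY),
    Violation("Makes big promises", "Business practices", LIGHT),
]

print(meets_threshold(new_journal))        # False
print(meets_threshold(deceptive_journal))  # True
```

With weights like these, a journal that shows only minor violations stays below the threshold, which is the behavior the rubric is described as aiming for.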
KATHLEEN BERRYMAN: This is an example of a journal card in predatory reports. Each journal card contains the basic identifying information for the journal, such as title, publisher, ISSN, and the date the journal was evaluated. Below this summary information is the journal's report. We categorize and detail each violation found in the journal to provide a transparent view of the reasons for inclusion. Our criteria and methodology were created to be as objective as possible.
KATHLEEN BERRYMAN: We continuously monitor trends in academic publishing and in deceptive practices in order to improve our ability to identify predatory journals. Verifying citations is difficult, and it's time consuming. And it falls on the backs of editors who are already struggling to meet the demands of research volume and speed. Cabells and Inera have collaborated to find a solution.
KATHLEEN BERRYMAN: We've integrated Inera's eXtyles reference tool with Cabells' predatory reports to flag predatory journals in the reference lists of submissions. With this integration, we're hoping to take some of the burden off of journals and publishers, and to help make the publication process more efficient.
ELIZABETH BLAKE: Thank you, Kathleen. And thank you, Sylvia. I want to begin my portion of the session by giving you all a little bit of background about Inera and our work with bibliographic references. So our two primary tools are called eXtyles and Edifix, and both of them include tools for automated processing of references. And eXtyles is actually celebrating its 20th anniversary this year.
ELIZABETH BLAKE: So those tools for working with bibliographies have been in development for 20 years now. And yes, we are still refining and expanding them, even now. So what we mean when we talk about automated processing of references with eXtyles and Edifix really falls into four categories-- parsing, editing, linking, and validating. And so, just to give you a little bit more background on those terms, by parsing, we're referring to-- our technology has the ability to identify the components of a free text reference.
ELIZABETH BLAKE: So for example, if we look at the reference at the top of the screenshot on this slide, that's just a reference that's been typed or pasted into a Word document. It doesn't have any tagging, identifying the elements of the reference, or any fielding. And what our technology is able to do is take that reference and identify all of the bits and pieces of it-- the reference number, author surname, author initials, year, title, journal name, et cetera.
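As a rough illustration of what parsing means here, the following Python sketch splits one numbered, AMA-like reference into named components with a regular expression. It is not how eXtyles or Edifix work internally; the pattern, the function name `parse_reference`, and the sample reference are all assumptions, and a single regex like this handles only one tidy reference shape.

```python
import re

# Assumed shape: "1. Authors. Title. Journal. Year;Volume(Issue):Pages."
REFERENCE_PATTERN = re.compile(
    r"^(?P<number>\d+)\.\s+"
    r"(?P<authors>.+?)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\.\s+"
    r"(?P<year>\d{4});"
    r"(?P<volume>\d+)"
    r"(?:\((?P<issue>\d+)\))?"
    r":(?P<pages>[\d-]+)\.?$"
)


def parse_reference(text):
    """Return a dict of reference components, or None if the text does not fit."""
    match = REFERENCE_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    authors = []
    for author in fields["authors"].split(", "):
        surname, _, initials = author.rpartition(" ")
        if not surname:  # a single token: treat it as a surname with no initials
            surname, initials = initials, ""
        authors.append({"surname": surname, "initials": initials})
    fields["authors"] = authors
    return fields


sample = "1. Smith JA, Jones B. A made-up article title. J Psychol. 2019;12(3):45-67."
parsed = parse_reference(sample)
print(parsed["journal"], parsed["year"], parsed["authors"][0])
```

Once the components are tagged like this, the later steps (restyling, linking, validation) can work on fields rather than on raw text.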
ELIZABETH BLAKE: And once you've done that, you can do a lot of other very cool stuff automatically. So next we have editing, which is pretty straightforward. We're able to then rearrange and reformat the reference components to conform to either a standard editorial style, like AMA or APA or Chicago, or a custom style. We're also able to take advantage of now having, sort of, semantically tagged references to go out onto the web and try to link those references to online resources.
ELIZABETH BLAKE: And the primary online databases we use are Crossref, to retrieve DOI links for references, and PubMed, to retrieve PMID or Medline links. And we can take that part of the process a step further and do validation, and even correction, of the references. So validation means we're checking the data in the reference, so not just editorial style in terms of order or punctuation of elements, but actually making sure that, for instance, the authors' names are spelled correctly, or all of the information that is required is included.
ELIZABETH BLAKE: And if there's a conflict, or if there's missing information, we can pull in the correct information from Crossref or PubMed. So for example, the reference that you see here has been automatically copyedited in this case to AMA style. And we've actually also pulled in the issue number, which was missing, and the authors' initials, which are required in AMA style.
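The validation and correction step can be sketched as a field-by-field comparison between the parsed reference and a trusted record. In this Python sketch the `authoritative` dict is a hand-written stand-in for a record retrieved from Crossref or PubMed; no real API is called, and the function name and field list are assumptions.

```python
# Fields this sketch treats as required (an assumption, loosely modeled on AMA style).
REQUIRED_FIELDS = ("authors", "title", "journal", "year", "volume", "issue", "pages")


def validate_and_correct(parsed, authoritative):
    """Fill missing fields and fix conflicting ones from a trusted record.

    Returns the corrected reference and a list describing each change.
    """
    corrected = dict(parsed)
    changes = []
    for field in REQUIRED_FIELDS:
        current = parsed.get(field)
        trusted = authoritative.get(field)
        if trusted is None:
            continue  # nothing to check this field against
        if not current:
            corrected[field] = trusted
            changes.append(f"added missing {field}")
        elif current != trusted:
            corrected[field] = trusted
            changes.append(f"corrected {field}")
    return corrected, changes


# Hypothetical parsed reference: misspelled author, missing issue number.
parsed = {
    "authors": "Smyth JA", "title": "A made-up article title",
    "journal": "J Psychol", "year": "2019", "volume": "12",
    "issue": None, "pages": "45-67",
}
# Hypothetical trusted record standing in for a Crossref or PubMed result.
authoritative = {
    "authors": "Smith JA", "title": "A made-up article title",
    "journal": "J Psychol", "year": "2019", "volume": "12",
    "issue": "3", "pages": "45-67",
}

corrected, changes = validate_and_correct(parsed, authoritative)
print(changes)  # ['corrected authors', 'added missing issue']
```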
ELIZABETH BLAKE: Plus, I mentioned we have two products, and I'm going to be talking a little bit about both of them. So I just want to make clear the distinction between them. So eXtyles is a desktop application, and it's a plug-in to Microsoft Word. And it processes entire documents in Word, including the references. And you can see in the screenshot a handful of the tools that are specific to bibliographic references within eXtyles.
ELIZABETH BLAKE: Edifix is a more recent product, and it's a sort of sister product, which is just the reference tools that are available as a web service that you can subscribe to. So this is working on reference lists only, and it's something that's just available with a web subscription. But both of these tools use the same underlying reference technology.
ELIZABETH BLAKE: So I want to say it was a couple of years ago that someone first reached out to us to suggest that we might want to think about expanding our tools to automatically flag references to predatory journals, and a number of people have actually brought this up to us in the past few years. And it's really obvious why this would be a useful tool. Even if editors, authors, publishers have access to a reliable list of predatory journals, and that is a big if, as we discussed earlier, what are the chances that they're going to, when working on an article, go through the reference list and painstakingly look up every single journal title in that reference list to determine if it might be a problematic publication?
ELIZABETH BLAKE: So having something that can then automatically go through the reference list and just flag references to predatory journals, it's kind of a no-brainer in terms of what a time saving tool that would be. And it does make sense for us to be interested in developing that because we already have sophisticated data and processes within eXtyles and Edifix for identifying journal titles in references, which is a key element of that parsing step of, you know, trying to parse a reference into its individual components.
ELIZABETH BLAKE: So journal titles and references are obviously the key for this particular issue because it's the actual journal itself that we're trying to target. And it's also the major challenge of trying to develop tools to automate flagging predatory journals because predatory publishers play a lot of games with names. It's their goal to slip one past you, or as Sylvia said, to hide in plain sight.
ELIZABETH BLAKE: And they do that by naming their journals with titles similar to reputable journals, or even hijacked from reputable journals. But the goal is to have it sound official, to have it sound familiar, and therefore to just sort of slip past you when you're reviewing a reference list. And, of course, there's a lot of different ways that people writing articles and creating reference lists within articles can refer to journals, and abbreviated journal titles and references can make identifying predatory journals even more difficult because with predatory journals, the devil is in the details.
ELIZABETH BLAKE: It's often in the little in-between words, your prepositions and articles, that they make the slight change to be similar to the name of a reputable journal, but not quite the same. And then, of course, those words are what get dropped from abbreviated journal titles. And this is one reason why, actually, some people have suggested that publishers should stop using abbreviated journal titles at all in references, but that's unlikely to happen anytime soon because it's a pretty established part of a lot of editorial styles.
ELIZABETH BLAKE: But it does make this an even more difficult problem. So one of the things that we want to do today is kind of give you an idea of the challenges of implementing something that's going to be accurate, and that's going to correctly flag a reference to a journal that you want to be aware may be a predatory journal and that you may want to look into, and not incorrectly flag something that's legitimate. And so I actually called up my colleague, Bruce Rosenblum, and asked him to speak on this in a little bit more depth, just to give you guys an idea of the challenge here when we were developing this tool.
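A small Python sketch shows why dropped prepositions and articles make lookalike titles collide once abbreviated. The stop-word list and the word-abbreviation table are tiny hypothetical stand-ins, and "The Journal for Psychology" is an invented lookalike title used only for illustration.

```python
# Words that abbreviated titles typically drop (assumed, incomplete list).
STOP_WORDS = {"of", "the", "and", "for", "in", "on", "a", "an"}

# Tiny hypothetical abbreviation table; real styles use much larger lists.
WORD_ABBREVIATIONS = {"journal": "J", "psychology": "Psychol"}


def abbreviate(title):
    """Drop stop words, then abbreviate each remaining word if we know how."""
    words = [w for w in title.split() if w.lower() not in STOP_WORDS]
    return " ".join(WORD_ABBREVIATIONS.get(w.lower(), w) for w in words)


real_title = "Journal of Psychology"
lookalike_title = "The Journal for Psychology"  # invented for this example

print(abbreviate(real_title))       # J Psychol
print(abbreviate(lookalike_title))  # J Psychol
```

The two full titles differ, but their abbreviations are identical, so the abbreviated form alone cannot tell them apart.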
ELIZABETH BLAKE: And actually, it's a challenge that we've been dealing with for the 20 years that eXtyles has been around in terms of always having to refine how the automated reference processing works. So I'm going to switch over to Bruce for a minute, and let him give you some background on this. So with the real issue being two-fold, which is one, people who start fraudulent or predatory journals like to name them in misleading ways.
ELIZABETH BLAKE: They like to either use names that are very similar to those of reputable journals, or in some cases, even just outright hijack the names of reputable journals. So that's the first issue. And then the second issue is there's lots of different ways in references that the people who have created the reference list can refer to journals. So those are the two related issues, I think.
ELIZABETH BLAKE: So why don't you talk about that a little bit. And how--
BRUCE ROSENBLUM: Let me start with the second one. So in bibliographic references, authors can refer correctly to journal names in two different ways. They can refer to a fully spelled out journal name, where all the words-- so Journal of Psychology is completely spelled out, with the words journal and psychology both spelled out. Or they can refer to journal names in abbreviated form. So J Psych instead of Journal of Psychology, or I think that might actually be J Psychol.
BRUCE ROSENBLUM: That creates an interesting set of challenges even for the most basic use of eXtyles because, if you have an author who's used what we call full spelled names, but the editorial style is abbreviated names, we have to be able to not only peel apart the reference, recognize the journal name as a piece of that correctly, but then be able to correctly convert it from full spelled to abbreviated, or vice versa.
BRUCE ROSENBLUM: And to support that over, now, 20 years, we've built up a database of more than 50,000 serial titles. It has not only the full journal name and the abbreviated journal name, but then also ISSN. And so, ultimately, we can try to resolve any given journal name to an ISSN key that we can then use to correctly insert the full or abbreviated name as appropriate. Now, that's the simple form because then we get into the more complex form where, what if the author has used an incorrect abbreviation or an incorrect full spelled name?
BRUCE ROSENBLUM: So part of what we do with eXtyles is we maintain a database of known bad alternatives, or incorrect alternatives, to journal names that we can then alias over to the correct form. So we can pick up names that are incorrect. And that works on sort of a one-off basis, but then there is a more global problem that showed up about 15 years ago when we had a customer who was processing a lot of bibliographic references that had been run through OCR.
BRUCE ROSENBLUM: And what made this particularly challenging is that in the editorial style that they were most commonly using, which was APA style, journal names are italic. And OCR, optical character recognition, and italic don't like each other. And we were seeing, as a result, a lot of one character typos. So we'd see neural with an a, instead of neurol with an o, for neurology. And so that was when we put in yet another layer where we started using what's known as an edit distance algorithm to try to compensate for small typos where we thought we could handle them in a completely accurate way, but handle them when they were typographical errors.
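The talk does not say which edit distance algorithm is used, so the Python sketch below assumes the common Levenshtein distance. The list of known abbreviations is a hypothetical two-entry stand-in for a real journal database, and the correction is deliberately conservative: it accepts a fix only when exactly one known name is within one edit.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for a database of known abbreviated journal names.
KNOWN_ABBREVIATIONS = ["J Neurol", "J Psychol"]


def correct_typo(name, known=KNOWN_ABBREVIATIONS, max_distance=1):
    """Return the single known name within max_distance edits, else None."""
    matches = [k for k in known
               if edit_distance(name.lower(), k.lower()) <= max_distance]
    return matches[0] if len(matches) == 1 else None


print(edit_distance("neural", "neurol"))  # 1
print(correct_typo("J Neural"))           # J Neurol
print(correct_typo("J Chem"))             # None
```

Refusing to guess when zero or several candidates are close is one way to keep the correction accurate rather than aggressive.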
BRUCE ROSENBLUM: So with all of this, it allows us to uniquely identify what each journal is. This is critically important because when we then start trying to match to a database of predatory journal names, we have to deal with the fact that, for example, Cabells data is only available in a full spelled journal name. So if the reference used an abbreviated journal name, we have to take that, convert it to that unique key, the ISSN, and then convert it to the full spelled name.
BRUCE ROSENBLUM: But beyond that, we're also sensitive about not over converting with our fuzzy matching, where if we encounter a name that we don't recognize, we recognize, hey, this is unknown to us, but then we can still try to do a rough conversion from, for example, abbreviated to full spelled, and still do the look up in the Cabells database. And the reason why we have to be able to do this, even if it's an unknown journal, or a journal that may not have an ISSN, is because a fair number of predatory journals, in fact, never register for ISSNs.
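The route from a cited name to the ISSN key, then to the full spelled name, then to a lookup can be sketched in Python as follows. Every journal title, ISSN, and alias below is a made-up placeholder, and `FLAGGED_FULL_NAMES` is a hypothetical stand-in for a list of full spelled names such as the one Cabells provides.

```python
# Hypothetical serials table keyed by placeholder ISSNs.
SERIALS = {
    "0000-0001": {"full": "Journal of Example Studies", "abbrev": "J Example Stud"},
    "0000-0002": {"full": "International Annals of Placeholder Research",
                  "abbrev": "Int Ann Placeholder Res"},
}

# Known incorrect variants aliased to the right ISSN (hypothetical).
ALIASES = {"J Exmpl Studies": "0000-0001"}


def normalize(name):
    """Lowercase, strip periods, and collapse whitespace for comparison."""
    return " ".join(name.replace(".", "").lower().split())


# Stand-in for a flagged list that holds full spelled names only.
FLAGGED_FULL_NAMES = {normalize("International Annals of Placeholder Research")}

# Index every known form of each journal name to its ISSN key.
NAME_TO_ISSN = {}
for issn, record in SERIALS.items():
    NAME_TO_ISSN[normalize(record["full"])] = issn
    NAME_TO_ISSN[normalize(record["abbrev"])] = issn
for alias, issn in ALIASES.items():
    NAME_TO_ISSN[normalize(alias)] = issn


def resolve_issn(cited_name):
    """Map a cited journal name (full, abbreviated, or aliased) to its ISSN."""
    return NAME_TO_ISSN.get(normalize(cited_name))


def is_flagged(cited_name):
    """True/False for recognized journals, None when the name is unknown."""
    issn = resolve_issn(cited_name)
    if issn is None:
        return None
    full_name = SERIALS[issn]["full"]
    return normalize(full_name) in FLAGGED_FULL_NAMES


print(is_flagged("Int. Ann. Placeholder Res."))  # True
print(is_flagged("J Exmpl Studies"))             # False
print(is_flagged("Unknown J"))                   # None
```

Returning None for an unrecognized name keeps "not flagged" separate from "could not be checked", which matters because some journals have no ISSN to resolve to.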
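For a journal that has no ISSN and is not in the local database, the rough conversion from abbreviated to full spelled could look something like this Python sketch. The expansion table, the stop-word list, and the flagged title are hypothetical. Because abbreviations drop words like "of", the sketch compares titles on their remaining content words, which is an assumption of this example rather than a description of the eXtyles implementation.

```python
STOP_WORDS = {"of", "the", "and", "for", "in", "on", "a", "an"}

# Tiny hypothetical table for expanding abbreviated title words.
WORD_EXPANSIONS = {
    "j": "journal", "int": "international", "res": "research", "sci": "science",
}


def content_key(name):
    """Expand known abbreviations, drop stop words, return a comparable key."""
    words = name.replace(".", "").lower().split()
    expanded = [WORD_EXPANSIONS.get(word, word) for word in words]
    return tuple(word for word in expanded if word not in STOP_WORDS)


# Stand-in for a flagged list that holds full spelled names only.
FLAGGED_FULL_NAMES = ["International Journal of Placeholder Science"]
FLAGGED_INDEX = {content_key(name): name for name in FLAGGED_FULL_NAMES}


def lookup_unrecognized(cited_name):
    """Return the flagged full name matching a cited name, or None."""
    return FLAGGED_INDEX.get(content_key(cited_name))


print(lookup_unrecognized("Int. J. Placeholder Sci."))
# International Journal of Placeholder Science
print(lookup_unrecognized("J Placeholder Sci"))  # None
```

A match found this way ignores the small in-between words, so it is best treated as a prompt for a person to check rather than as a verdict.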
BRUCE ROSENBLUM: And so we have to be able to be incredibly flexible in how we handle all of this. So the bottom line is, with looking up predatory journals, it's not just a matter of having a journal database. It's having the deep understanding of how journal names can appear in references. How do you then normalize those to a standard form, handling it with some degree of fuzzy matching, but not too much that you then take a real journal name and match it to a predatory journal, or a predatory journal name and match it to a real one incorrectly.
BRUCE ROSENBLUM: And then be able to do a look up into a database that only has full spelled journal names, and reliably report when you've got a citation to a predatory journal. So that's why we think Inera is uniquely positioned to work with Cabells on this technology of identifying citations to predatory references.
ELIZABETH BLAKE: All right. Thank you, Bruce. And so with that in mind, we did build an integration with eXtyles tools, which we're calling eXtyles Cabells reference checking. And so this is a screenshot that shows you the results on a sample reference list. And what you can see is that the flags come through as Word comments. So for instance, this first one lets us know that the journal titled J Gastrointest Dig Cyst is found in the Cabells list of predatory journals.
ELIZABETH BLAKE: Please check Cabells.com to see the violation information. And so you would go to the Cabells website to find out the specific violations of this particular publication. And, relating to what Bruce was talking about, we're even able to do this for references to journals that we don't have in our own eXtyles database. So this second flagged reference, you can see, we got a comment saying eXtyles does not recognize this journal title.
ELIZABETH BLAKE: But we're still able to flag it as in the Cabells database, which is important because the eXtyles database, which is an essential part of our reference processing, is probably not going to be as up to date on predatory journals as Cabells is. It is a living document, and we do update it regularly, but this way we have assurance that we're going to be able to flag titles that are in Cabells' database, even if they aren't in our own.
ELIZABETH BLAKE: So that solution for flagging references is available to current eXtyles customers if they have a subscription to the Cabells database. But we want to also make this technology more widely available, and the best way to do that is to integrate it with our Edifix reference processing web service. We have not implemented this yet.
ELIZABETH BLAKE: In fact, we're just getting started on doing that. But it is something that we plan to do very soon. And one of the really nice things about Edifix is that you can sign up and start using it right away. So eXtyles is a custom workflow solution, but Edifix is just something that anyone-- authors, freelancers, editors-- can sign up for. So one of the things that we want to find out from you is if you'd be interested in testing this solution.
ELIZABETH BLAKE: Please do let us know. Reach out to us. Our contact information is in the presentation at the end. And we really do want to hear from people who are interested in potentially deploying this, or at least checking it out. So this is a screenshot from the Edifix website, and you can see here this is just a list of references that have been pasted into the web form.
ELIZABETH BLAKE: And you can see the options on the side include linking and data correction, PubMed correction and PMID linking, Crossref correction, and DOI linking are currently available. And ideally, Cabells checking will be in there quite soon as we've sort of mocked up here. So I hope this is of interest to people, and like I said, please do reach out.
ELIZABETH BLAKE: But I also want to end on food for thought for considering how you would deploy this kind of technology in your own publishing workflows because developing an automated reference flagging solution is really only the first step in addressing the problem of citation contamination because then comes the question, what will you do with the results? And when we showed the eXtyles integration at our most recent user group meeting, it opened up a really interesting discussion about exactly what the implications are of flagging citations to potentially problematic publications.
ELIZABETH BLAKE: So here are some questions that I'd like you all to consider. First, at what point or points in your workflow would you run this step? Would you want to run it at submission for a very early warning on a possibly problematic reference list? Or would you want to run it during the copyediting stage, which, frankly, is when our tools are most often used currently?
ELIZABETH BLAKE: Or at both, potentially at multiple stages in the process? Whose job would it be to review and follow up on the results? So you get a flag. That alerts you to a problem, but it doesn't solve your problem for you. You still have to decide what to do with that reference. Is it the author's job to go and verify that this is legitimate research? Is it the editor's job?
ELIZABETH BLAKE: If it's the editor's job, which editor are we talking about? Are we talking about a subject matter expert, or are we talking about a copy editor or a manuscript editor? Whose responsibility is it to follow up? A reference to a predatory journal isn't necessarily a reference to bad research. So if you get flagged references, would you always remove the reference, or in some cases would you keep it?
ELIZABETH BLAKE: If you decide to keep it because you believe that it is a legitimate piece of research, would you alert the reader to the journal's inclusion on the Cabells list? These are questions that came up in our discussions with our customers at our meeting last year. Cabells uses various criteria to identify journals as problematic, as Kathleen talked about earlier. Would different types of best practice violations lead to different editorial decisions?
ELIZABETH BLAKE: Would you feel like you had to set up a different workflow for a peer review violation versus a website violation, and so on? Ultimately, who decides if a reference to a predatory journal is a problem in an article? Is it the author of the paper? Is it the publisher of the paper? Or is it the reader of the paper?
ELIZABETH BLAKE: And so these are all questions that I think everyone who wants to address this problem and come to some solution for it is going to have to think about. Whether they deploy our technology or something else, these are all questions that are not that easy to answer, and we hope they open up some interesting discussion. So thank you very much for your time. As I said, please do reach out if you're interested in learning more or even trying out this technology collaboration, and we look forward to your questions.
ELIZABETH BLAKE: Thank you.