Name:
NISO Update 1 Recording
Description:
NISO Update 1 Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/befc96e5-50bf-44aa-b610-c87168c622ca/videoscrubberimages/Scrubber_10.jpg
Duration:
T01H17M05S
Embed URL:
https://stream.cadmore.media/player/befc96e5-50bf-44aa-b610-c87168c622ca
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/befc96e5-50bf-44aa-b610-c87168c622ca/NISO Update 1 .mp4?sv=2019-02-02&sr=c&sig=Sub3vm7QMBdby1rPyFmdCylZY484gkTYCY9KmtkNkgw%3D&st=2024-11-23T19%3A53%3A10Z&se=2024-11-23T21%3A58%3A10Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
This is the first and I'll be saying this again, because people are joining. So I will continue to welcome people as they join. This is not a recorded session. This is an entirely live session. We're happy that you're here. Happy that all of you are making it into Zoom. OK.
A nice big crowd.
And let's see when I see the participants have sort of stabilized, a few more people joining. And now I'm making sure that our speakers are all here. It looks like we do have all the speakers and. The participants have stabilized.
So I think everyone, just about everyone who's coming is in here. So thanks, everyone. Welcome. My name is Nettie Lagace. I'm the Associate Executive Director at NISO. And together with my colleague Kendra Bailey, we manage all of the NISO working groups that create the recommended practices and standards and white papers that are the NISO outputs, hopefully things that are helpful to you in your work.
So here at NISO Plus, we are hosting two NISO Update sessions, which are short updates on different projects. And I'm really, really pleased that we have five projects to discuss in this session this morning, this afternoon, wherever you are. I'm very pleased to be joined by Caitlin, Mohammad, Tommie, Jeff, Noah and Robert, who will be talking to you about the things that they're doing together with NISO.
So the way that we're doing this, it's just alphabetical by project. So Caitlin is going to speak about CREC, and that is alphabetically first. I think it's a different format for each presenter, but short presentations, each about 11 to 13 minutes. I can provide breakout rooms at the end of the session, so we'll hold all the questions to the end so that everyone has time to talk about their thing.
But if you've got questions for any presenter, we'll set up breakout rooms. That's worked pretty well in previous NISO Pluses, depending on the time for that. So Caitlin Bakker, I would like to introduce you as the co-chair of CREC, the Communication of Retractions, Removals and Expressions of Concern working group. And please, the floor is yours. It'll be wonderful.
Thank you, Nettie. All right. So I'm just going to go ahead and share my screen. So hopefully you can all see this. If you can't, please let me know. I see it now. OK, perfect. Well, hello, everyone. As I mentioned, my name is Caitlin Bakker.
I'm the discovery technologies librarian here at the University of Regina. And I'm going to be speaking on behalf of the Communication of Retractions, Removals and Expressions of Concern working group, which I co-chair along with my wonderful colleague Rachael Lammey from Crossref. Now, our working group was established in May 2022, following the approval of the work item by the NISO Information Discovery and Interchange Topic Committee back in August of 2021.
We are a relatively large group, as evidenced by the very small font that was necessary to get all of us on the slide. So we have 27 members currently. We also have six former members. And cumulatively these individuals represented 30 different organizations. And this was really a range of roles and perspectives that individuals had.
So it was everything from publishers of various sizes in different disciplines, aggregators, researchers, libraries, and others. So we really did want to get a broad cross-section of folks who would be interested in this topic and would be essentially engaging with this issue of retractions, expressions of concern, and removals; removals, rather, not corrections.
So together, our aim is to address the question that you can see on the slide here. So once a decision has been made to retract, to withdraw, or to publish an expression of concern by an appropriately authorized organization, how do the scholarly communications ecosystem and other information consumers become aware of and share information about the status of the original object? So essentially, looking at what metadata standards and associated workflows help to facilitate the clear, timely communication of retractions, expressions of concern, and removals so that the scholarly record can be effectively corrected.
So many publishers and aggregators do already have some internal processes and practices in place to determine how retractions are issued, how things are communicated, and how those are received. However, despite those internal practices, whether those are for retractions, EoCs, or removals, the use of the publications that should be impacted by these actions does continue, and that use without indication of their status (by which I mean the fact that, you know, a retracted publication has been retracted) remains relatively pervasive.
There have been quite a number of studies that have looked at how people are still citing retracted publications following the retraction of those publications. And generally, they found that a pretty high proportion of the citations, anywhere from 50% to 95% of the citations to retracted research, don't indicate that a retraction has occurred.
Furthermore, when we consider how retracted publications are represented across different platforms and different journals, and sometimes even within the same platform and the same journal, there is a lack of consistency. Retractions may be noted differently, and sometimes they're not noted at all. And so the pervasiveness of the problem and then that variability of the representation, the display of that metadata, and then the just different range of internal workflows led to that initial development of the, the approved work item and then subsequently to the work of, of our group.
So how did we go about tackling these issues as previously mentioned. The working group was appointed in May 2022. In July we finalized the charge and the initial work plan, and at the moment we're in the middle of what we're calling our phase two, in which we're working on the initial draft of the recommended practices document.
So this work was developed following a six month information gathering phase, which was our phase one, and that ran from July to December 2022. And during the information gathering phase, we formed two subgroups. So we had a publisher subgroup and an aggregator and user subgroup, and those two subgroups were considering the different workflows and processes both that were currently in place, as well as those that would be ideal from each perspective.
So that of the publisher, you know, broadly speaking, a data producer and then that of the aggregator end user or broadly speaking, a data receiver. And this structure was really helpful because it provided an opportunity for similarly in-depth conversations about the unique needs and the challenges and the practices, both of the groups of the subgroups together, as well as the different individuals and their respective organizations represented within each of those subgroups.
In addition to the subgroup meetings and activities, we continued meeting as the larger working group to facilitate information sharing between the different subgroups as well as with the larger body, and to see where we had points of overlap or potentially disconnect where similar or different questions were arising, as well as to kind of seek feedback and clarification about what the different groups were doing or how certain practices did or did not manifest for us.
So we're now in the process of synthesizing this work, including developing some mandatory and optional metadata fields and some guidance on how to use those fields, as well as the associated workflows that would help to facilitate that metadata creation, transfer and display. We're aiming to have the initial draft completed by the end of March and then to have a public comment period between April and June of this year, after which we would be revising the document, responding to those comments with the aim of releasing the final ISO recommended practice in September 2023.
Now, given that the conversations and the initial drafting are still ongoing, it would be a little bit premature of me to share specifics regarding, you know, metadata elements or workflows, because those are still being refined and being synthesized at this point. But I did want to provide more information about the scope and the content of what will be covered in the recommended practice document.
And in broad strokes, some of the themes that emerged from the work of the two subgroups. So our publisher subgroup was investigating the existing workflows and practices of publishers, as you might guess from the name. So looking at things like how, and if, publishers update PDFs, titles, and other associated metadata following a retraction, an EoC, or a removal, and how statements of retraction are issued.
There was a really broad range of practices that emerged which were somewhat reflective of the different organizational structures and the different organizations that were undertaking this work. So a broad range of activities, although certainly some points of similarity and overlap. The aggregator and the end user group, in contrast, was kind of looking at the opposite end of the workflow in that it was looking at the workflow from the point of receiving the metadata and really diving into the range of the metadata that are being received by different aggregators, looking at things like the different metadata fields that were included and what was kind of essential versus what was perhaps just included by way of the workflows and the processes of the publisher and also investigating the ways in which the data were transmitted and how those varied between publishers.
So looking at not only the content but also at the structure and the format of the work, and, very similar to the publisher group, noting the range of activities that were taking place and how that could vary pretty significantly, both between the publishers who were providing the data, but also for the aggregators who were receiving that data, how it was that they were then subsequently operating based on that information.
So broadly speaking, the findings of both groups were really reinforcing the need to have a widely applicable set of recommendations, while at the same time emphasizing the need for some practical strategies and some guidance on the implementation of these different recommendations for a range of different contexts. So in outlining the scope of the forthcoming recommended practice document, I do just want to start by noting what's not going to be covered.
So namely, that we're not addressing why things would be retracted or removed, or why an EoC would be issued, or when it would be appropriate to do those things. Best practices about those activities and that decision-making process have already been established by organizations like COPE. And so our focus was really instead on how we can facilitate the communication following those activities and following those decisions having been made.
And so I'll just note that, for expediency here, I'm just going to refer to retractions, but in this context please take that to mean not only retractions but also EoCs and removals as well. So in this focus on metadata, we are considering both the metadata associated with the retracted publication and the notice of retraction, but also related scholarly objects.
So things like data sets, preprints, supplemental materials, conference presentations which may be housed in repositories or disseminated through channels other than the channel through which the retracted publication or the notice was disseminated. And one of the reasons we chose to include these sorts of scholarly objects, rather than limiting to purely the version of record article, is because one of the underlying aims of this group is really to ensure that the recommended practices are as broadly applicable as possible and also account for as many possible variations as is reasonable.
And to that end, we've been considering both the ideal state, you know, essentially the perfect record, as well as the bare minimum necessary to facilitate effective data transfer and display. And we really wanted to ensure that these practices would be applicable to publishers and to aggregators of any size and with different levels of resourcing, rather than being inaccessible to smaller or less resourced operations. And we're hoping within the document not only to articulate the metadata fields, but also to offer guidance on how to implement the recommendations in different contexts and how to interpret and use the metadata fields in order to operationalize those recommendations.
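None of this metadata has been finalized, so the field names below are purely an illustration of my own, not the working group's draft. But the shape of the problem the group describes, linking an original object to its status, its notice, and any related scholarly objects, can be sketched roughly like this:

```python
# Illustrative sketch only: the CREC recommended practice has not been
# published, so these field names are assumptions, not the standard.
# The idea: pair the original object with a status flag, the notice that
# amended it, and related scholarly objects sharing that status.

from dataclasses import dataclass, field

@dataclass
class AmendmentRecord:
    original_doi: str
    status: str            # e.g. "retracted", "removed", "expression-of-concern"
    notice_doi: str        # identifier of the retraction/EoC/removal notice
    date: str              # date the status took effect (ISO 8601)
    related_objects: list = field(default_factory=list)  # preprints, datasets, etc.

    def is_retracted(self) -> bool:
        # A consumer (aggregator, citation tool) can branch on status
        return self.status == "retracted"

# A hypothetical record linking a retracted article to its notice and preprint
record = AmendmentRecord(
    original_doi="10.1234/example.2021.001",
    status="retracted",
    notice_doi="10.1234/example.2023.099",
    date="2023-01-15",
    related_objects=["10.5555/preprint.2020.777"],
)
print(record.is_retracted())  # True
```

The point of the sketch is the linkage, not the field names: a display system that receives such a record can flag both the retracted article and the related preprint, which is exactly the "related scholarly objects" scope Caitlin describes.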
So as previously mentioned, we aim to have the draft of these recommendations available for public comment in April. And I hope that when this document becomes available, you'll take a few moments to review what we've prepared and to provide your feedback, so that we can really ensure that these recommendations are as meaningful as possible to the broadest possible audience. So on that note, I'll just thank you for your time and attention, and you can see a link to some of our materials here.
And I welcome your questions at the end of the session. Thank you. Thank you so much, Caitlin. CREC is one of our newer groups, and it's very interesting to see it moving so well. So next, our next speaker and next project will be Mohammad Hosseini, who will be speaking about CRediT, the Contributor Roles Taxonomy. Mohammad, thank you for joining us today.
I assume you can share your slides, but I cannot start my camera. Oh, that's odd, because I thought I saw you a few minutes ago. Yeah, but it now says unable. Oh Yeah. Here you are. Yes thank you. The live version?
No, thanks. Thanks, Nettie, and welcome, everyone. I'll start sharing my screen. Excellent. So my name is Mohammad Hosseini. I'm a post-doc at the Department of Preventive Medicine here at Northwestern University in Chicago.
I've been working on the ethical issues related to scholarly attributions and the ethics and integrity of research in the past couple of years. And at the moment, I'm a member of the CRediT Standing Committee. CRediT is the acronym that I'll use a lot in this presentation; it stands for the Contributor Roles Taxonomy. So today I am going to start with an introduction and explain why authorship matters, because that's basically the crux of what the CRediT Standing Committee and CRediT do: improving authorship standards and attribution.
And then I'll introduce the CRediT taxonomy, I'll talk about some of the current and forthcoming activities, and then I will introduce the members of the Standing Committee. So to start with, there is a consensus in the scholarly community that contributions to published research should be transparent, meaning that they should be unambiguous, and they should be accurate, meaning that they should correctly reflect what happened during the research process.
They should be evident to readers, meaning that readers should be able to find them and see them, and they should be machine readable so that we can create résumés, we can tally different people's contributions, and we can allow people to use these contributions for all kinds of evaluations and promotions. That said, for the most part, the scholarly community uses authorship as a way to clarify contributions.
And although these are central to the academic reward system, in the past couple of decades there's been a major increase in the average number of authors on scientific papers. In a paper that was published in 2022, I clarified some of the factors that have contributed to this increase in the average number of authors on publications.
I think the six most important factors are these; there is no order to them, it's just the way they're presented. So first, methodological complexity: research is becoming increasingly more complicated, and thereby we need more people. We tend to recognize more technical roles. Research is becoming more internationalized; there are a lot more collaborations between universities that are located in different countries or continents.
There's an institutional pressure to publish more. And furthermore, we as researchers, we all have our own egos and we want to be successful and famous and all that. So that also plays a role. And there's also various forms of undeserved authorship. So one of the challenges of having many authors is that using authorship definitions becomes increasingly more complicated.
So if you remember, at the start I said we want contributions to be transparent, we want them to be machine readable, we want them to be accurate and all that. But when the average number of authors per publication increases, we face various challenges, one of which is that we cannot always apply definitions of authorship.
For instance, several researchers have explored the most recent version of authorship guidelines, for instance those provided by the International Committee of Medical Journal Editors (ICMJE), and argued that many members of research projects cannot really meet all of these criteria. Just as a refresher for those who don't know these guidelines: the ICMJE authorship guidelines suggest that in order for one to be an author, one has to have made a substantial contribution to various aspects of the project.
You also have to have been involved in drafting the work or revising it. You also have to approve the final version of the manuscript, and you also have to agree to be accountable for all aspects of the work. In response to those critics who said it is not always possible for all members of a research project to meet all four criteria in order to become an author, the ICMJE guidelines suggest that whether you are an author or not, you should mention contributions in the acknowledgments section. In fact, it is good practice to mention the contributions of all authors and non-authors in the acknowledgments section. So this has created another problem, which is that when we want to use acknowledgments sections, we do that using free text.
And when we use free text to describe contributions, a range of issues happens. For example, we may use various synonyms and describe contributions with great variety. So let's have a look at three descriptions that I have developed for this purpose. The first one is, let's say: A.B. collected data and was involved in data curation. The second one: A.B. investigated the samples and cleaned the data.
The third one: A.B. compiled the data set and ensured satisfactory data quality. These are all descriptions that could potentially appear in the acknowledgments section or in the contributors section, and to be honest, they could all relate to the same thing or they could relate to completely different things. But it seems like A.B. in all three instances has done something that has a lot in common with the other two descriptions.
And this is one of the major challenges of using free text: it reduces reusability and machine readability, and it also reduces transparency. In response to this, the CRediT taxonomy was developed to standardize descriptions of contributions. So as I will explain, it currently has 14 roles, but in standardizing contributions it also helps us address some other issues, such as transparency and accessibility of contributions and re-use of contributions.
These roles were developed with the help of researchers and editors, and they have been used since 2014. This is the list of roles we have at the moment: we have 14 roles, and each role comes with a very succinct definition. So, for instance, the role of Formal analysis comes with this definition: application of statistical, mathematical, computational, or other formal techniques to analyze or synthesize study data.
So if one of the contributors has made a contribution that can be described in this way, they will be tagged with the Formal analysis contributor role so that the whole world knows what their contribution was. And as you can imagine, this helps us capture work in a much more granular way. In 2022, CRediT became an ANSI/NISO standard, and since then NISO is its custodian.
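For reference, the 14 role names are published at credit.niso.org. The sketch below hard-codes them (using plain hyphens where the official labels use en dashes) and adds a small illustrative helper, not part of the standard, that tags a contributor only with recognized roles:

```python
# The 14 CRediT roles, per credit.niso.org. Tagging contributors with
# controlled role names (rather than free text) is what makes the
# contributions machine readable. The helper below is our own sketch.

CREDIT_ROLES = {
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing - original draft",
    "Writing - review & editing",
}

def tag_contributor(name: str, roles: list[str]) -> dict:
    """Attach CRediT roles to a contributor, rejecting unknown role names."""
    unknown = [r for r in roles if r not in CREDIT_ROLES]
    if unknown:
        raise ValueError(f"Not CRediT roles: {unknown}")
    return {"name": name, "credit_roles": sorted(roles)}

# The free-text descriptions of A.B. above all collapse to the same tags
contrib = tag_contributor("A.B.", ["Data curation", "Investigation"])
print(contrib["credit_roles"])  # ['Data curation', 'Investigation']
```

Note how the three free-text variants about A.B. in the earlier example would all collapse to the same two controlled terms, which is exactly the reusability gain the taxonomy is after.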
And since then we have also been able to have the Standing Committee, and we are hoping to have further developments, and I will explain some of these. But what is very interesting, I think, is that CRediT has been integrated into various scholarly platforms. For instance, since its start it's been integrated into various submission systems, such as
Editorial Manager and ScholarOne; I'm only mentioning a couple of examples here. It's also been used as a sort of platform for another application called tenzing, which uses CRediT to help researchers track their contributions as they go. In terms of its integration in Editorial Manager, I'm going to use PLOS ONE as an example.
And they adopted CRediT in June 2016, and the CRediT taxonomy replaced their five-term contribution list. And the way it works is that the corresponding author who is publishing the work clarifies individuals' contributions, and these are both human and machine readable, as I will show. So this is what the corresponding author would see.
This is the sort of interface that the corresponding author would see when they're submitting the paper. They can name different authors, and they can tick these boxes and say this person did this role, that role, and that role, or this person is the corresponding author or not. And once the paper is published, these contributions appear on both the website and the PDF version.
So on the website, once you click on one of the authors' names, you see the roles they conducted, and these are the CRediT roles: Formal analysis, Investigation, Methodology. Once you click on the second tab, the authors tab, you can see all authors with all their contributions according to the CRediT standard.
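Under the hood, what makes those displays machine readable is role tagging in the article XML. JATS provides vocab attributes on the role element that can carry CRediT terms; the snippet below is a rough sketch built with Python's ElementTree under that assumption, not PLOS's actual production markup, and the exact attribute usage should be checked against the JATS Tag Library:

```python
# A sketch of how CRediT terms can become machine readable in JATS-style
# XML, using the vocab attributes on <role>. Illustration, not a template.

import xml.etree.ElementTree as ET

contrib = ET.Element("contrib", {"contrib-type": "author"})
name = ET.SubElement(contrib, "string-name")
name.text = "A.B."

# One <role> per CRediT term the contributor holds
for term in ["Formal analysis", "Investigation"]:
    role = ET.SubElement(contrib, "role", {
        "vocab": "credit",
        "vocab-identifier": "https://credit.niso.org/",
        "vocab-term": term,
    })
    role.text = term

xml = ET.tostring(contrib, encoding="unicode")
print(xml)
```

Because the term lives in an attribute drawn from a controlled vocabulary, a downstream system can filter or tally roles without parsing prose, which is what enabled the PLOS ONE analysis mentioned next.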
And when you open the PDF version, you see it in a different way, but still they are all visible and evident to all readers: these are the contributions of different authors. And one advantage of using CRediT has been that it has allowed us to do more research on research. For instance, one of the members of the CRediT Standing Committee, together with two other contributors, analyzed the PLOS ONE data and published a paper about it in 2021.
This paper showed us that, in fact, women are less involved in some prestigious and highly rewarding roles, such as supervision and funding acquisition, but more involved in time-consuming tasks such as data curation and investigation. If it wasn't for standards such as the CRediT taxonomy, we wouldn't be able to capture these nuances, because all of these contributors would have been named simply as authors, as first author, last author, or whatever.
And we might have made various different interpretations about what first or last or middle author meant, but we would never have the granularity that we see right now. And this is one of the major improvements: I think this kind of data collection allows us to look at current practices in a more critical way and think about how they can be improved. tenzing is another application.
As I said, it allows collecting contributors' contributions as the project proceeds. So for instance, you can list your contributors and say this person did Conceptualization, that role, and that role. And this can be something that researchers and research projects can go back to as the research is going forward. They can improve it, they can add notes to it, they can do all kinds of stuff that will help researchers and the corresponding authors or supervisors in the end to determine who should be where in terms of authorship, and who should be listed for what contribution or not.
That being said, it is advised repeatedly by those who have developed the CRediT taxonomy that CRediT roles should not be used to define authorship. This is a more granular way; this is a whole different paradigm. It is only meant to explain what the contributions have been, not who has been an author. I have about 90 seconds. All right.
What's next? So we want to create more awareness about CRediT. We have some plans for outreach activities. We have developed some personas that we will hopefully be able to publish this year. We are working on a couple of short videos for researchers of all disciplines to be able to see what CRediT is and learn about it in a more interactive way. We are also developing more resources: using a recent review and some community feedback, we are hoping to be able to develop specific instructions on how to use CRediT, and we are also hoping to launch a community of interest for CRediT.
These are the members of the CRediT Standing Committee. We have a very interesting mix, from researchers like myself to former and current university administrators, publishers, editors, members of Crossref and ORCID, and librarians. So we have a really good mix, and that, I think, allows for moving the CRediT taxonomy forward in ways that satisfy the interests of various user groups.
If you have any questions, you can always reach out to the NISO email or you can follow the CRediT handle on Twitter. And if there are any more questions, I'm more than happy to answer in a breakout session. Thank you so much. Thanks, Mohammad, that was really nice. And I appreciated your enhancing the NISO roster listing with the photos of all the standing committee members.
So nice. Thank you. So next, we have a talk about JATS. And to start the conversation, we've got Tommie Usdin of Mulberry Technologies and Jeff Beck of NCBI. They are the co-chairs of the JATS Standing Committee. So because we're in a Zoom meeting, I don't have a way to, I don't know, move you together.
Tommie and Jeff, you're going to have to stay in your Zoom boxes. But we're interested in what you have to tell us today. We'll stay in our lanes. So we did an update like this last year, and I think it was very successful. I know that both Tommie and I enjoyed doing it this way. So instead of doing a presentation on what is JATS and what's to come, we're just going to have a conversation where we ask each other questions about things and hopefully answer all of your questions that way.
So I guess I'll start off, Tommie, asking the first question, which is: how did you get into the XML business? How did I get into the XML business? Well, I got into the XML business because I was in the SGML business. And for those of you who aren't antique, XML is what many of us thought of at the time as the easy-to-implement piece of SGML.
In fact, that's kind of ridiculous. All of the hard things are as hard as they ever were. The things that can be automated are now easier. I've been working with text documents and making text tractable since 1980-mumble, and that's how I got here. I'll turn it around. Jeff, how did you get into this game?
Well, I started off as a copy editor on medical journals, and when I realized that my company was wanting to stick with ink on paper, I went out looking for something new. I wound up at the National Library of Medicine when PubMed Central got started in 2000. And at that point, they needed somebody to do some document modeling, and no one else was willing to do it.
So I started doing it. And that's where I met you, and started working with Mulberry on the DTDs and then now JATS. So probably we should ask the basic question that we just assume everybody knows, but that is: what is JATS? I skipped over that one. OK, well, let me take a quick cut at it.
JATS is a tag set that was invented for interchange of journal articles and has grown from there to being used for the creation and management of journal articles and other materials that are very similar to journal articles and are created, managed, or served in the same environments as journal articles.
And let me turn that around to you, Jeff, and say: so where is JATS being used, and who's using it? Oh, that's a good question. Well, we use it in PubMed Central. We convert all content into JATS to load it into PubMed Central, which is the full-text archive for journal articles that we have at the National Library of Medicine. And we know from people who send content to us that about 80% to 85% of our submissions are already in JATS or in an NLM DTD version.
So we know that it's being pretty widely used in the journal publishing world. I know that it's being used at Portico to support their archives. And it's being used internationally; I know that it's used in Japan to support publishing in Japanese journals.
So it's become pretty ubiquitous in journal publishing. We have an extension called BITS, which takes the JATS objects and allows you to build books out of them. And that's become pretty successful for people who already have experience with JATS and who also have to tag books, whether they're textbooks or other types of books.
So let's see. What is the status of JATS 2.0? I've heard a lot about it. And what does JATS 2.0 mean? Well, let me start with the easy part. What does JATS 2.0 mean? JATS 2.0 is a conversation about the first non-backwards-compatible, what I think of as clean-up, version of JATS. Like all things created by human beings, after a while of using it you realize that there are things you could have done better. And some of the things that we could have done better when we created JATS in the first place, we can fix or ameliorate in a way that is completely backwards compatible.
And that's what we've been doing with JATS 1.1, 1.2, 1.3. And there are some things where, you know, it really would have been better if we had done it a completely different way. If we had known then, 15 years ago, something that we know now, we would have done it differently, and we would have done it differently in ways that we can't just sort of slide in in a backwards-compatible way. So we're having conversations about how we can, as they say in the jargon-filled world, pay back our technical debt.
And those conversations are ongoing. In my opinion, the most valuable thing about those conversations is that they are deep conversations about what this tag set is and how it should be. And in the short term, many of the things that are being brought up in the conversations about how we could restructure JATS to make it cleaner are turning into things that can also be done in the short term in backwards-compatible ways, and that will eventually lead to a better, cleaner, tidier, non-backwards-compatible JATS at some point.
Well, I have a follow-up question. Should I be terrified of this? Of course, but also excited about it. You should be terrified because you are one of the many people who work in an environment with an existing system that is using JATS. Many, many existing systems and huge investments in infrastructure use the existing tag set. Any time anybody's talking about changes to that, that's a big deal. That's expensive. It's frightening. It's a huge investment. And the JATS Standing Committee knows that and is not casually saying, OK, fine, you're all going to have to come up with millions of dollars and completely change all of your business processes right now, in a week.
Nobody's saying that, because that would in fact be quite frightening and seriously stupid, and the community would do the only appropriate thing, which is completely ignore us if we told them to do that. On the other hand, you should be excited, because there are things that you cannot do now with your JATS documents that you probably want to do, and certainly there are things that you will want to do in the future.
And the JATS Standing Committee is working on ways to make that happen. So this is encouraging and exciting. Excellent. Now I have a question for you. Anybody who has read the program for NISO Plus this year, which means everybody who's listening to us, is aware that metadata is not only a big deal, it is becoming a bigger deal. And more metadata requirements are popping up all over the place.
New identifiers. You can't even keep track of all of the people who are coming up with important and interesting identifiers. Many of them call them requirements; I'm going to call them guidelines. I don't care how many people sit in a Zoom meeting every Tuesday forever and write requirements. They're not writing requirements.
They're writing guidelines. But there are lots of metadata guidelines and lots of identifiers coming up. How is JATS going to deal with all of these various identifiers and all this assorted metadata? Pulling out the big guns. So the one thing we try to do as the Standing Committee is to build our document models so that we can represent
journal articles or documents that exist in the real world. We also try not to make our models, our element names, or our attribute names specific to individual things. So one thing we want to do is allow identifiers on whatever objects, in an article's metadata or in the text of a document, need to be identified with an identifier.
And we did this for contributor identifiers. There's a very popular contributor identifier out there called the ORCID iD, and we got an early request, when ORCID was becoming popular, to add ORCID iDs into JATS. What we did was, instead of adding an ORCID ID element or an ORCID ID attribute, we added a contributor ID element to the contributor.
And then you're able to record both the name of the identifier and the value of the identifier, so that when some other contributor identifier comes along, if contributors start getting assigned DOIs, you can use the same element and identify the type of identifier along with the value. So that's really the way that we're able to handle that.
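As a rough illustration of the pattern described here, not taken from the talk itself: JATS expresses this with a `<contrib-id>` element whose `contrib-id-type` attribute names the scheme. The fragment below is a sketch, not a complete JATS document, and the ORCID value is the well-known example identifier; it uses only the Python standard library to show how a typed identifier is read back out.

```python
import xml.etree.ElementTree as ET

# A JATS <contrib> carrying a typed identifier: the <contrib-id> element's
# contrib-id-type attribute names the scheme, and the element text holds
# the value. The ORCID below is the standard example identifier.
jats = """
<contrib contrib-type="author">
  <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
  <name><surname>Doe</surname><given-names>Jane</given-names></name>
</contrib>
"""

contrib = ET.fromstring(jats)
for cid in contrib.findall("contrib-id"):
    # Prints the scheme name and the identifier value
    print(cid.get("contrib-id-type"), cid.text)
```

The point of the design is visible here: a second scheme needs no new element, only a different `contrib-id-type` value.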
And that works both in metadata, for affiliation IDs like ROR, where we just have a general affiliation identifier, and for assigning identifiers on objects within the flow of the text, if you have to identify some product or something like that with an identifier.
You've got time for about one more question. Let's see, I'll skip over "Please explain XML namespaces." Are there any good markup-related conferences I could go to this year?
Well, let's see. The ones that I have in mind going to: I'm going to go to Markup UK, I think. Let's see, Declarative Amsterdam is worth going to. I'm a big fan of Balisage: The Markup Conference. All coming up.
Most important, and most directly relevant to this talk, is JATS-Con, which is coming this spring. Excellent, great. Oh, sorry. I was going to say, Nettie, I think we're out of time. We could talk about this for... I understand. Yeah, I think it's great. And now I've discovered the spotlight feature in Zoom. Thanks, Kendra.
So, but I will now remove your spotlights. Tommie and Jeff, thank you so much for your chat. I still have a lot of questions. We've got a Zoom room for that in a few minutes. So our next talk in alphabetical order is on KBART, and we're lucky to have Noah Levin. Noah has been a co-chair of the KBART Standing Committee for, oh, it's been a while. So you are very knowledgeable in all things KBART and a good describer of what's the most up-to-date information on knowledge bases and related tools.
So thanks, Noah, for joining us. Thanks, Nettie. I'm going to share my screen and bring up my presentation, and after Jeff and Tommie's presentation, I'm sorry, guys, you're just going to stare at some slides. I think next year... I can spotlight you. Oh, sure, why not? But yeah, I think next year I'm going to do, I'm thinking, a metadata "Who's on First" kind of routine.
It'll be both informative and you'll also all find it hilarious. So let me share my screen. Two seconds. OK, so hopefully you all can see my presentation. You bet. Oh, great. And let me start this.
It has taken me my entire career to finally figure out how to make a slide full screen. I think I've finally done it. There you go. All right. So as Nettie said, my name is Noah Levin. I am the Standing Committee co-chair. I also work at Springer Publishing as their manager of metadata and digital asset management.
I've been working in publishing for something like 20-plus years, working at various different publishers and always looking at their metadata. So starting off: who is our Standing Committee? Here you can see a plethora of names. This is our current roster, myself and my co-chair Andre, and you also see people from the various different knowledge bases, discovery services, content providers, libraries, and so on.
So really a nice cross-section of the industry. I kept in my "What is KBART?" slide. I think originally I took this out, and truthfully, you know, I think a lot of you probably know what KBART is, but I don't want to assume everybody does. It stands for Knowledge Bases and Related Tools. It started off as a joint working group back in 2007 and has since been taken over solely by NISO.
So, starting in 2014. This slide, I actually don't know where this slide originally came from. One of my colleagues made this years ago and I've been reusing it: Christine Stohn at Ex Libris. Well, thank you, Christine, if you're here, because I've stolen your slide for years now, mostly to add color to my presentations, because mine tend to be voice and text heavy.
What I'm really getting at here, in the landscape, is what KBART is trying to do. So within the landscape, you have content providers who are selling content. They have title package lists. They're selling that, of course, to libraries and institutions. Libraries and institutions also utilize discovery services, and then they need to know what's in these packages.
The discovery services and knowledge bases need to know: how do we link to this content? What is it exactly that these libraries and institutions hold? How do you get to this? And that's really what KBART is about. It is a human-readable format showing: what is this content? What are these sellable packages that content providers are offering?
And, very importantly, how do you link to this content? So when it's done properly, it's something that is virtually invisible from the user standpoint, like most metadata standards. Instead they'll just know: hey, we went into our discovery service, we searched for content, and behold, it came up. And KBART is a big part of that.
So a little bit of our history. KBART Phase I was originally published back in 2010. It was very much looking to improve on OpenURL for linking, and it was very specifically for journals. In 2014, Phase II came out, and that added e-books and conference proceedings, open access, and how to deal with consortia.
In 2019 we had the KBART Automation Working Group, and we put out guidelines for how to automate your KBART. So with KBART, you have these publicly accessible lists saying, oh, this is the title package that a content provider has put out. However, from the library and institution standpoint, that is very manual.
You all are dealing with a lot of us; I am not the lone content provider you purchase from. So the idea was: how do we automate this? And also, those package lists, although they work well for describing the sellable content, KBART automation is very much librarian- and institution-specific. Perhaps you have grandfathered titles that are not in those packages.
So you've been given access to a couple of different things here and there, and automation takes that very specific holdings picture into account. You do a one-time setup, and after that your KBART automatically goes to your knowledge base. That was published in 2019. As of 2020, we've been working on KBART Phase III. So here's kind of the timeline, you can see here.
We started in March 2020. Anybody who's knowledgeable of history also realizes that the world also went through a lot of changes then; that is not Phase III's doing, that's just a coincidence that happened probably a week or two after we started our work. So we've been largely in this period of working through the different ideas that we had presented for our Phase III development.
We've since broken into subgroups and we've done really a lot of work, and I think we've gotten through a lot of it. Phase III was a very ambitious plan. I'm proud of it. But as a result, it's definitely been a good deal of work going through things. So looking at Phase III, what I've done is I've broken this into what we've completed versus what we're currently working on.
Let's look first at what we've completed. Initially, I think I had these in the order that we created this stuff in. I left my little sunglasses emojis in here because I'm very proud of the fact that, after staring at a screen for several hours, I figured out how to put an emoji in here. So I've left them in for you to look at. The first part we've looked at is clarifying our current recommendations.
The way I like to think about that, at least in my head: I remember learning years ago, at my first publishing job, that dictionaries tend to copy all the different definitions over the years. I was very surprised to learn that. So you have something like 100 years' worth of a definition not fully changing, and as a result it's a little bit like a game of telephone.
So sometimes maybe there were initially mistakes, sometimes things just kind of crept in, or sometimes the scope changed. So at that publisher, they did a complete refresh where they looked over every single definition. And that's really what we did. I think initially we put clarifying current recommendations down as an easy item, and I think since then it got moved up to incredibly hard.
But we did complete it, and we really opened the hood of the car and looked at absolutely everything that was in there, some of it going back to Phase I. We really looked at each piece, held it up to the sun before putting it back in, trying to make sure everything really made sense, making sure there were better examples in there, simplifying recommendations.
Also, we revised a lot of our guidance, like how do you handle withdrawn titles? I think that wasn't really in previous versions. We're improving things that caused content providers problems, like gap coverage. You know, like: so we're offering this journal, but the journal itself didn't exist for, like, two years. How do you show that in your KBART? So we're tightening that up a little bit.
And one thing that I think will help is access types. In the past, with Phase II, when we were looking at open access, we offered basically two codes, F and P. We said if it's 100% free, put an F there; otherwise put P for paid. And that really didn't... I think we knew at the time that wasn't going to entirely work, but we didn't want to get into the weeds of looking at each article
and chapter, since KBART is very title-level. So in this version we've added a new MX code, which is for mixed, paid-and-free content. That covers hybrid access. It's also to show when, say, over a journal's lifespan, maybe a year of it is free content and the rest of it is paid. We're also giving better recommendations in terms of how you show your open access content; maybe that needs its own package, and that works in conjunction with that access type.
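To make the access-type codes concrete, here is a minimal sketch, not part of the talk: KBART title lists are tab-delimited text, and this toy excerpt uses just two of the real KBART column names (`publication_title`, `access_type`); an actual file has many more columns, and the title here is invented. The MX value is the new mixed-access code described above, alongside the Phase II F (free) and P (paid) codes.

```python
import csv
import io

# A two-column toy excerpt of a KBART tab-delimited title list.
# Real KBART files carry many more fields; this shows only the
# access_type mechanics. The journal title is made up.
kbart_data = "publication_title\taccess_type\nJournal of Examples\tMX\n"

# F = free, P = paid, MX = mixed paid-and-free (the new code).
VALID_ACCESS_TYPES = {"F", "P", "MX"}

reader = csv.DictReader(io.StringIO(kbart_data), delimiter="\t")
for row in reader:
    ok = row["access_type"] in VALID_ACCESS_TYPES
    print(row["publication_title"], row["access_type"],
          "valid" if ok else "invalid")
```

A simple membership check like this is roughly what a validator would do for this one field before a knowledge base ingests the list.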
Another part of that is our file guide group, sometimes called a file manifest. That is just a quick overview of what files a content provider is offering. It's almost like a table of contents, if you will, showing all the different package lists; also showing maybe there's stuff that's no longer sold but is available on a platform; also showing open access; also taking into account packages that are sold just at an article and chapter level and don't receive KBART.
I'll get to that in a second. We've also completed putting some licensing language in there, and updating our mission statement. I think the mission statement was kind of important, although I don't remember if we used the words "mission statement," but it's basically the goal: what is the goal of KBART? And the reason for that is, going back to 2010, the scope of KBART in 2010 was very simple. Important, but simple.
But it's actually had really widespread usage since then. It's used by libraries and institutions to figure out what they have purchased; content providers, even their sales teams, look at that stuff trying to figure out what they're selling. KBART automation starts looking at, like I said before, institution- and library-specific holdings info.
So it's really about looking at everything that KBART tries to do today and is relied on for today, and accounting for that. Two minutes. And I will talk quickly. The other one is allowing CSV files. We are now presently working on content types, so that's expanding to audio, video, anything that you can think of, not just books and journals, and also looking at translations and multiple languages.
So these are projects we're actively working on in a subgroup. We are finishing up article- and chapter-level data; that's just a roadmap. I think with that, we're at this point just looking at having proper instructions in the KBART Recommended Practice in terms of how to go to your discovery service, who can best handle that. And we're just starting work on a roadmap for new formats that would be in addition to the current tab-delimited text,
so JSON, XML, and seeing if that's needed in the future. And then finally, we're relooking at the endorsement process. That is to allow content providers, you know, don't get scared off. We realize there are certain things that content providers might not be able to provide. Aiming for 100% before you try to get endorsed is probably not the best thing to ask of people. So we're looking at perhaps different levels, so that people can get endorsed while we recognize certain things are trickier, and also just making it clear to people.
And then finally we have a validator tool coming out. That's something you can just run on your own; it's not part of Phase III, but it will be something where you can easily do a quick check, perhaps before you come to us looking for endorsement, and see where some problem areas in your KBART are. And finally, if you want to be involved, you can email us. And when we do put our draft up, we would love to hear your feedback.
Great, thanks, Noah. That was a good update. I know the group's been really busy for a while, and it's good to hear that it's coming together. Thank you. So our next speaker, our last speaker, is Robert Wheeler, who is co-chair of the Standards-Specific Ontology Standard, which I think is going to win the prize for the best NISO acronym for a while.
"Sauce." So, Robert, it's all yours for the next 13 minutes. I won't need the whole time. OK, OK. So today we will be talking about NISO SOS, or "sauce," as you said. Let me provide some context.
Standards publishing is often lumped in with STM, where scientific, technical, and medical publishing revenue is roughly 4% to 5% of all publishing; standards publishing is a fraction of that. But that number doesn't account for the cost of developing standards or of implementing them, both of which are substantial, nor the value of standards. Codes and standards help improve safety, interoperability (including terminology), efficiency, and opportunity; they create and expand an ecosystem, including tools for creating things to the standard and for utilizing those standards. I'm probably preaching to the choir here.
But what really prompted us to do this work was the need to move beyond paper. Although standards development is generally consistent across standards development organizations, it's funny, the publishing ecosystem is so varied: each standards organization seems to have its own unique mannerisms and has evolved in its own way.
And when everything was paper and users were experts on the standards they regularly needed, idiosyncrasies were OK. But one of the things changing is our audience. We still have our experts, who mostly understand this, but we also have new engineers looking for new answers, and, more notably, machines that can only understand with training and guidance, and consistency would really help a lot.
Print and PDF are roadblocks that inhibit the use of our standards, things that haven't changed much since the inception of standards. But we're working to change the way they're disseminated, beyond PDF. PDF is more user friendly than print, generally, but it has been around since 1993. We developed NISO STS, the Standards Tag Suite, an XML standard built on the shoulders of giants you've seen earlier in this session:
the NLM journal article DTD back in 2002, which evolved into JATS and NISO STS. And there were definitely a lot of jokes about standardizing standards during STS's development. As we move beyond paper and PDF to extraction and more granular bits of standards, exposing the data in standards, one fundamental problem remains: there are hundreds of standards organizations, and no
two in the world use identical standards lifecycle states to describe their standards. Our goal was to develop a high-level ontology describing standards lifecycle states with a limited set of core concepts and relationships, so that you can actually say something with it, and not to define subject- or discipline-specific aspects of standards.
NISO SOS will allow uniform standards lifecycle description and interpretation, increasing transparency, discovery, navigation, and interoperability, and setting a foundation for further advances. It is designed so that it can later be built upon and extended, either by organizations or by another round of going deeper into standards, or both. We defined the standards lifecycle as the sequence of events from the first idea to the ultimate archiving and/or withdrawal of the standard.
We defined over 70 terms directly related to standards and the ontology, and this was a big part of the work. It was a learning experience for many of us, or at least for me, and seems quite valuable. And it was fun. We developed top-level states (In Development, Action, Active, and Inactive) and a number of second-level states. These standardized states will allow standards organizations to map their proprietary lifecycle states, and their deeper subtasks and processes, to NISO SOS in a manner that will be meaningful to standards users.
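The mapping idea described here can be sketched very simply. In this illustration, not drawn from the talk, the left-hand, in-house state labels are invented; only the four top-level state names come from the description above, and a real organization would of course map to the full set of second-level states as well.

```python
# The four NISO SOS top-level lifecycle states named above.
SOS_TOP_LEVEL = {"In Development", "Action", "Active", "Inactive"}

# A hypothetical mapping from one organization's in-house lifecycle
# labels (invented for illustration) to NISO SOS top-level states.
house_to_sos = {
    "draft": "In Development",
    "ballot": "Action",
    "published": "Active",
    "withdrawn": "Inactive",
}

# Every in-house label must land on a recognized SOS state.
assert set(house_to_sos.values()) <= SOS_TOP_LEVEL

for house, sos in house_to_sos.items():
    print(f"{house} -> {sos}")
```

Once every organization publishes a mapping like this, a machine can compare lifecycle status across publishers without knowing each one's in-house vocabulary.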
We created a representation of the ontology in OWL and a few other technical formats, and described several informative use cases. This is a sunburst illustration of the proposed top- and second-level lifecycle states. These are a couple of screen captures of the OWL object and data properties in the Protégé application, which exhibit the core defining concepts we mentioned earlier.
This is the first part of our timeline with the associated NISO SOS lifecycle states. And here's part two of the timeline, including the NISO ballot that actually just closed last Friday, February 10th. We were approved, and here's the tentative timeline for our last few steps leading to publication.
We'd like to thank and acknowledge our working group members, representing 16 different organizations. We'd also like to thank the organizations who, beyond the time spent on the project, helped fund it, and of course the support of NISO, its members, and Access Innovations, who acted as secretariat.
And that's it. Thank you. Great, thank you, Robert. So we do have about 9 minutes for some discussion. I created breakout rooms; I don't know if we should use those, or if people want to ask any questions in the open forum. We could continue the potpourri and just throw questions at all of the presenters. That might be easier and more fun and help us get to know each other, rather than splitting ourselves off.
Any questions for any of the presenters? Jeff: I have a question for Caitlin about CREC. I'm wondering if CREC applies to preprint withdrawals, or if you've given any thought to those, how they would be similar to journal article retractions and removals, and what the effect should be on the preprint if the corresponding published article is retracted.
Yes, we are considering that, both in terms of the preprints but also potentially things like data sets that have been made open and are associated with a retracted publication. So we are considering how we can describe that linkage, or describe that connection, within the metadata itself. We are also hoping that the metadata fields and the recommended best practices that we come up with could potentially be applicable to preprints themselves, as opposed to preprints in association with a retracted publication.
So, something that's going to be broadly applicable. We're hoping to address both of those potential use cases. But yeah, we are looking at those preprints that are out there that are maybe not the retracted publication but are, in a sense, a version of that retracted publication, and probably, well, not probably, should be corrected in association with that retraction action.
Excellent. Thank you. I look forward to it. Thank you. Bill, you had your hand raised, and then you took it down. Oh, Bill Kasdorf. I actually didn't deliberately take it down. Oh, OK. All right.
Well, I also have a question for Caitlin. I noticed that there was no mention of Crossmark in your talk. And I've come to think of Crossmark as extremely useful and hardly at all used. So what are your comments about Crossmark? I would say I absolutely agree that Crossmark is an incredibly valuable resource. I think we have not specifically addressed it in our conversations at this point, largely because we were looking at the metadata standards outside of one particular tool or one particular publisher or information provider.
But hopefully the work that we're doing in providing some of these recommendations, and more of a standardized approach, could also be of benefit to Crossref and Crossmark as well. And the co-chair of CREC is actually from Crossref, so she'll be able to take the information that we're gathering and generating and apply it to their local context as well. But I completely agree.
I think it's a really useful tool, and I would love to see folks taking more advantage of it, particularly end users who maybe are arriving at a page and don't necessarily have the knowledge and understanding of folks in this room in terms of tracking down all the different versions and all the information associated with a particular publication. Good, so I will not abandon hope.
No, please don't do that. Thanks, Caitlin and Bill. There is a question in the Q&A for Noah, for KBART, from Brady at IEEE: have persistent identifiers like DOI been considered in KBART? Yeah, so, a two-part answer. That comes up a lot with the URL portion in KBART, where people go, oh, you know, it's always the direct URL that you're requesting.
Why is that? Why don't you use DOI? And the reason for that is because the DOI can be used by more than one publisher. With those KBART packages, it needs to be able to link always directly to where the content is; you always need the very direct URL to lead the person there. Whereas with the DOI, if the content is in more than one package, it's going to lead the user to a choice:
which direction do you want to go? So we don't want to do that kind of choose-your-own-adventure discovery version. The more positive answer I have, because I started off on a negative, is that when it comes to audio and video and stuff like that, we really had to have a lot of discussions about, OK, well, what identifier do we use? I know there's a talk a little later from the video
and audio metadata guidelines group. Thank you. Never, never trust me to say a name out loud; I just have the idea in my head and nothing more. But they'll probably talk a little bit more about what the recommendations are and why. We certainly got a little bit of guidance from them, and the answer was a bit of a potpourri, including DOI.
So we do actually put that in there. At the same time, KBART has always kind of allowed you to tack on additional information; we're not against people adding in a line about DOI. But that certainly, I think, will come up a lot once we get to the world of trying to give an identifier for video and audio and stuff like that. So there's another question for you, Noah, from Uni
R. Rk: Is KBART mainly sold by publishers at this point, or are most publishers giving it away for free? Who's paying for this? Knowledge base vendors? It's free. It's free for everybody. Ah, the economics of KBART. So, the economics of KBART:
yeah, I'm a big believer that your metadata should be free, whether that's KBART or MARC records or ONIX feeds; all of that should be free. KBART by definition actually is free. What content providers are expected to do is put up a public portal, like a web page or a publicly available FTP, and then the knowledge bases can go in and pull that data, and they also receive it for free.
It's in the content providers' best interest that people can access their content. So if anybody wanted to put that data behind a paywall, I would argue with them fiercely and to my last breath. Yeah, there is a KBART registry, and I'm going to put that into the chat. It's a website maintained by the Standing Committee that lists the publishers who have endorsed KBART and are supporting it by making their metadata available.
So thanks for that. Any other questions or comments that anyone wants to ask or share with our presenters or each other? Well, all right then, I guess I will move to thanking the presenters for their presentations, and also for all of the work you've put into the work you do.
That is part of the NISO portfolio. It's really something to mention to everyone that everyone who works on a NISO project is a volunteer. Of course, often the work dovetails with their daily work, their job, so hopefully it's a supportive circle, but it really is in addition to all of the things that they have to do for their job, for their family, for their lives.
So we are really indebted for all of the time that people put into the work and the outputs. Thank you all so much. And I hope to see you all at the next NISO Plus session, which starts in 15 minutes. See you later. Thanks. Thank you all.