Name:
Solving problems with standards
Description:
Solving problems with standards
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c3fcc540-ea36-46a7-99b9-7301456a6792/videoscrubberimages/Scrubber_1.jpg?sv=2019-02-02&sr=c&sig=l%2FZVCjPNWEY9AvjGA3GJTdBtC9bsqVkdjAMIeB6pjDY%3D&st=2024-12-21T14%3A34%3A28Z&se=2024-12-21T18%3A39%3A28Z&sp=r
Duration:
T00H49M26S
Embed URL:
https://stream.cadmore.media/player/c3fcc540-ea36-46a7-99b9-7301456a6792
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/c3fcc540-ea36-46a7-99b9-7301456a6792/39 - Solving Problems with Standards-HD 1080p.mov?sv=2019-02-02&sr=c&sig=93QYhLWqgWzm18dP3qTZKo1j8WYdyBB5%2Bse1F8kbAGo%3D&st=2024-12-21T14%3A34%3A29Z&se=2024-12-21T16%3A39%3A29Z&sp=r
Upload Date:
2021-08-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[MUSIC PLAYING]
GREG GRAZEVICH: Hello and welcome to the NISO Plus session, solving problems with standards. I'm Greg Grazevich, associate director of bibliographic information services at the Modern Language Association, and editor of the MLA International Bibliography. And I'm very pleased to be the moderator for a session that will present insights into three distinct projects, each of which involves the development or refinement of standards to address the needs of a diverse community of users.
GREG GRAZEVICH: Our first presentation is on the NISO recommended practices for video and audio metadata. We'll hear from Violaine Iglesias, CEO and co-founder of Cadmore Media, Barbara Chen, an independent consultant to the information industry, and Bill Kasdorf, principal of Kasdorf and Associates. Violaine, please take it away.
VIOLAINE IGLESIAS: Hello, and thank you everyone for viewing the session of NISO Plus 2021, and hopefully, for joining us for a live discussion afterwards. So I have the honor of introducing this NISO working group, which has undertaken the task of producing guidelines for the production of metadata for video and audio content. So the ultimate objective of the guidelines is to help with the interchange of video and audio assets between the different NISO constituency groups.
VIOLAINE IGLESIAS: So that would be publishers, libraries, and vendors. We kicked off in 2019. We survived 2020, barely. We're now hoping to publish a recommended practice by the end of this year. So fingers crossed that we actually get to that. So the first thing we're going to do is introduce ourselves. We are the three co-chairs of the group, which does include over 20 people.
VIOLAINE IGLESIAS: Each of us is going to state what our initial interest was in founding this group. So I'll start. My name is Violaine Iglesias. I am the CEO and co-founder of Cadmore Media. Cadmore is a newish video and audio hosting and streaming service that is dedicated to the publication of academic video and audio content. So our goal is to help scholarly publishers and societies by giving them the technology and the services they need to properly publish video content, rather than just post it on the internet.
VIOLAINE IGLESIAS: So what that means is paying attention to things like usability, discoverability, and accessibility, all of which require metadata, which we want publishers to use so they can treat their video and audio with the same care that they give their journals and their books. So, the story of this group: I think it started like a lot of NISO groups.
VIOLAINE IGLESIAS: When Cadmore was just getting started, one question that arose quickly was around the metadata. So we needed to know what metadata standard we should recommend to our publisher clients and also to the technology partners that we were working with, like the publishing platforms or the discovery services. So we quickly found out that nobody really knew what metadata standard to use in the context of scholarly publishing.
VIOLAINE IGLESIAS: So we went to NISO for advice. NISO said, it would be great if somebody started a group to find out what metadata standard should be used in these contexts. So the rest is history. Well, not yet, but it will be history soon, hopefully. So that was my bit. And I will hand it over now to Barbara, who is going to tell us more about the use cases.
BARBARA CHEN: Thanks, Violaine. I come to you as a semi-retired member of the information industry, where I worked for 42 years. We can never get away. It's in our blood. When I was the director of the Modern Language Association and editor of the MLA International Bibliography, I knew that we needed to add indexing for relevant videos into the database.
BARBARA CHEN: When I started my quest, Todd Carpenter just happened to pop by the offices, and we, Greg, Todd, and I were having a lively conversation. I discussed the video goal and he suggested that I contact Violaine, because she was doing something with NISO. And I was intrigued. Greg certainly insisted that we adhere to standards with our metadata, and we were going to do that.
BARBARA CHEN: I was eager to join the group because I wanted to make certain that the needs of indexing and abstracting services were going to be considered. At that first meeting, in an effort to make certain that we would incorporate the needs of as many beneficiaries as possible, we asked ourselves the simple question of what did people want? What was the purpose for this material?
BARBARA CHEN: The whole group decided, as we were discussing users, that we needed to first identify what relevant media types we were going to be thinking about. I knew what kind of videos an online literature, language, film, and folklore index would want, and I had a vision of the properties we needed. But it's not just about me or us.
BARBARA CHEN: There's a world out there. In any case, I thought about documentaries, interviews with authors, directors, and scholars, oral histories, dance and theatrical performances. The group considered so much more: scientific methods videos, medical procedures, multimedia textbooks, conference recordings, lectures, podcasts, concerts, you name it.
BARBARA CHEN: So once the list was done, we worked on the properties needed to accurately describe this stuff. And Bill will touch upon that in a moment. Then we went back to the question of whether we got everything. So we rethought use cases. At this time, we broke into subgroups because we had several tasks at hand. Dasia, Jeffrey, Sarah, and I took a more philosophical approach since we had already gotten specific with the media types.
BARBARA CHEN: So the question was, who are these people? What were the needs of content producers, creators, publishers, indexing services, librarians, preservation professionals, identifier registries, institutional repositories, researchers and scholars from every field, and end users? I'll give you some examples of what we wrote.
BARBARA CHEN: But first, just to mention, we realized that the needs were overlapping. There were a lot of similarities between the groups. Here's one. As a publisher, producer, society, indexing service, aggregator, or hosting service, I want metadata to be supplied to me by the content creator or author in a way that can be standardized. The more metadata is standardized when it gets to me, the less time, resources, and cost I have to spend manipulating that metadata into existing standards.
BARBARA CHEN: To achieve that, I want a standardized metadata model that I can supply to my clients. Another example, more specifically: as an indexing, aggregator, or hosting service, I want metadata that can be easily and automatically integrated with my existing records, so we can provide consistent access to many record types from a diverse group of international publishers or producers.
BARBARA CHEN: I need to be able to add my own subject and abstract metadata to each record as needed. So I'm not going to bore you with the whole list. It's time for Bill to talk. On to Bill. Thank you.
BILL KASDORF: Thank you, Barbara. I'm Bill Kasdorf. I'm an independent technology consultant, and my work is mostly in modeling and metadata, editorial and production workflows, standards and best practices, accessibility, et cetera. So I encounter lots of different kinds of things in my work. And in fact, I had worked with Violaine and Cadmore on this very topic as a consultant earlier on, which is why she recruited me to join this group, and I was happy to join.
BILL KASDORF: So as Barbara said, we recruited a deliberately diverse group of people because we wanted input from lots of different parts of the information ecosystem. And one thing that became very clear early on is that there are lots of standards out there. It's just that those participants tend to use different standards.
BILL KASDORF: So in the first phase of the work in the working group, we deliberately didn't want to replace these existing standards that you're seeing on the screen. What we wanted to say is what properties does a recipient need from a provider of video or audio content? And how should that be expressed so that they can basically both speak the same language?
BILL KASDORF: So in effect, we came up with what is actually quite an elaborate, structured set of properties and sub-properties and values that is really designed to serve as a sort of Rosetta Stone, hence the slide behind me. So for example, you might find that a broadcaster is using PBCore for their metadata, and a librarian is getting something from that broadcaster.
BILL KASDORF: Oops, I see we're missing one, because MARC Record isn't on that list. So anyway, MARC Record or MODS would be what the librarian needs. The librarian doesn't speak PBCore and the broadcaster doesn't speak MARC or MODS. So our goal is that our vocabulary gives them a common language to get between these things. So we completed what is, in effect, the draft proposed vocabulary.
BILL KASDORF: And then we recruited another subgroup, working in parallel with Barbara's use case subgroup, as a standards subgroup. I was the leader of that subgroup, and I deliberately recruited people who were expert in each of these existing standards so that we could basically analyze our set of properties and determine, have we accounted for everything that somebody using that metadata vocabulary would want and would need in the interchange of audio and video assets?
BILL KASDORF: That was really quite a project. We are not trying to develop crosswalks, we're basically just trying to say, what properties does this librarian need, or does this repository need, or does this archivist need, and who's providing these assets to them, and how do they convey to that provider, here's what I need you to tell me.
BILL KASDORF: I'm going to put it in the model that I use, but I need these pieces of information, this metadata, from you about these assets. So that's where we are. We completed that work toward the end of 2020. And our goal there was twofold. One is to assess our list of properties to make sure we weren't missing something that one of these other models would consider essential.
BILL KASDORF: But also to see if we have properties that there was no evidence anyone was actually using. So that's where we are right now. We've got that work done. And what we think is the penultimate phase of our work is to get extremely granular use cases now that are basically sub-use cases of the broader use cases that Barbara's group came up with. Along the lines of, I'm an academic librarian, and a faculty member is giving me a video of her course, what do I need to tell her that she needs to tell me about that video so that I can get my metadata, which I'm probably going to express in MODS or MARC, to work?
BILL KASDORF: Or I'm a broadcaster and I'm getting interview content from a scholar, what do I need that video producer to tell me about that asset so that I can express it in what I use, which is PBCore? So that's kind of where we are. What we intend to have at the end of the year is a NISO recommended practice that will basically spell this out and give concrete examples: if you are this sort of provider and you're providing video to this sort of recipient, here's an example set of VAMD metadata that you provide.
BILL KASDORF: So that's where we are. And I think we're about at the end of our session. So we will be taking questions at the end of the session as a whole. So thanks for listening, and glad to talk with you in the Q&A session. Thanks.
GREG GRAZEVICH: Thanks, everyone. Our next presentation is introducing software citation, giving credit where credit is due. We'll hear from Dan Katz, chief scientist at the National Center for Supercomputing Applications, and research associate professor in computer science, electrical and computer engineering, and the School of Information Sciences at the University of Illinois, Urbana-Champaign. Erika Pastrana, editorial director at Springer Nature.
GREG GRAZEVICH: And Melissa Harrison, head of production operations at eLife Sciences Organization. Dan, take it away.
DAN KATZ: OK. Thanks very much, Greg. So we're going to talk about three different subjects. I'm going to talk about guidance for authors, developers, and journals in terms of software citation. Erika is going to give us a view from the journals about how software citation works, and then Melissa is going to talk about how software citation is related to JATS and JATS4R.
DAN KATZ: So starting off with overall guidance. This is all in the context of the FORCE11 software citation working group, which initially started in 2015 with about 55 members of a variety of different backgrounds. And what we did was to review existing community practices and to develop use cases around software citation. And the next year, in 2016, we published a document about software citation principles.
DAN KATZ: And we did that by looking at the data citation principles that had previously been published and updating them based on the software use cases and related work, working group discussions, and community feedback. So the software citation principles are importance, credit and attribution, unique identification, persistence, accessibility, and specificity.
DAN KATZ: And this is published in this paper that's in PeerJ Computer Science, as well as on the FORCE11 website. So once we had done this, we finished our working group. And we started another working group that was called the software citation implementation working group. And the idea of that working group was that we thought at that point that there was just going to be a little bit of extra work that we would have to do to turn these principles into implemented practices.
DAN KATZ: Of course, it turns out that that's completely wrong, and there's a huge amount of work that we need to do. And so what we've really been focusing on has been developing sets of guidelines for implementing the principles. And so over the last couple of years, we've published, in a guidance task force, checklists for paper authors and checklists for software developers.
DAN KATZ: We have a task force that's been looking at a metadata schema called CodeMeta, which is basically providing a way of recording, with the software, all the metadata that's needed in order to cite it. And that task force is in the midst, now, of trying to recommend changes to schema.org, and then we would basically go to using schema.org as a way of capturing metadata for software.
DAN KATZ: We have a repositories task force, which has published a set of best practices for how repositories that deal with software should handle it. And finally, and the subject of this talk, we have a journals task force, which has recently published a document on recognizing the value of software: a guide for journals, authors, and editors.
DAN KATZ: And what we hope is going to happen out of this is that this journal task force is basically starting an adoption process with journals and with publishers. And then there will be some extra work that will be needed after that, which is, basically, how do the citations actually move through the system after they've been submitted into the journals? So just to dive into this last piece a little bit, we published this paper in January.
DAN KATZ: The authors include representatives and editors from a fairly wide variety of journals and publishers. And it has recommendations for how software should be cited. Specifically, that those citations should include the creators, the title of the software, the publication venue, the publication date, and an identifier. And preferably, that identifier resolves to a landing page. Some desirable properties for the citation are the version of the software that's actually been used and, in many cases, a type of reference.
DAN KATZ: And this depends a little bit on what the citation style is. We also say that if an article exists that describes the software, it should be cited as an additional reference, but it's secondary to citing the software itself. That's the primary function that we're encouraging. We provide examples in APA style for a variety of different kinds of software: software that's in an archival repository, like Zenodo, for example, software that's on GitHub, software that's in the Software Heritage archive, commercial software, and then a few other things in between.
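The required and desirable citation fields listed above can be sketched as a small data structure. This is only an illustration, not the working group's schema: the class name, the field names, the `[Computer software]` type marker, and the loosely APA-like output format are assumptions, and the example values (including the DOI) are placeholders rather than a real record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareCitation:
    """Hypothetical container for the fields named in the talk."""
    creators: List[str]             # required
    title: str                      # required
    venue: str                      # required: where the software is published
    year: int                       # required: publication date, reduced to a year
    identifier: str                 # required: ideally resolves to a landing page
    version: Optional[str] = None   # desirable: the version actually used

    def missing_required(self) -> List[str]:
        """Return the names of required fields that are empty."""
        required = {
            "creators": self.creators,
            "title": self.title,
            "venue": self.venue,
            "year": self.year,
            "identifier": self.identifier,
        }
        return [name for name, value in required.items() if not value]

    def format(self) -> str:
        """Render a loosely APA-like string (illustrative, not checked against APA)."""
        authors = ", ".join(self.creators)
        version = f" (Version {self.version})" if self.version else ""
        return (
            f"{authors} ({self.year}). {self.title}{version} "
            f"[Computer software]. {self.venue}. {self.identifier}"
        )


if __name__ == "__main__":
    # Placeholder values only; this is not a real software record.
    citation = SoftwareCitation(
        creators=["Doe, J."],
        title="ExampleTool",
        venue="Example Repository",
        year=2021,
        identifier="https://doi.org/10.0000/example",
        version="1.2.0",
    )
    print(citation.missing_required())
    print(citation.format())
```

A journal adapting the guidance to its own citation style would swap out `format` and keep the same underlying fields.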
DAN KATZ: And what we want to happen at this point is that we've published this paper that provides very generic guidance and has examples in APA style. But we think the different journals have different communities, and those communities will want versions that have appropriate software examples for them, picking examples of software that's common in those communities, and that use the citation style that's appropriate for those publishers and those journals.
DAN KATZ: So that's where we're trying to go at this point. I will turn it over now to Erika to talk about a view from the journal side of things.
ERIKA PASTRANA: Great. Thank you. Thanks Dan for the introduction. Thanks to the organizers for allowing us to present about software citation today. So next slide please, Dan. I'll, as Dan mentioned, cover the perspective from the publishers. I work for Springer Nature, but I hope to represent with this a lot of the guiding principles and practices that many publishers are implementing.
ERIKA PASTRANA: As Dan mentioned, many of us participated in the working group to develop the guidance for journals, editors, and authors about code citation or software citation. And what I will cover very briefly is what the motivation is on our side, and how we are contributing to ensure software citation is implemented and reflected in the published paper.
ERIKA PASTRANA: Just very quickly. These are the sort of principles that I personally feel are guiding a lot of the motivation for software citation amongst journals and editors, as well as authors. And it's obviously clear to this audience that code is a central part of research nowadays, and that it should be as well a central part of the published paper. It should be recognized as such.
ERIKA PASTRANA: And that we would ideally want the code to be not only found and shared as part of the publication, but also cited to ensure permanent accessibility and proper recognition. So next slide. Thanks. So to guide us in how to do that, it's clear that we had a number of objectives. We needed proper documentation of the code, and the code needed to be available throughout the life cycle of the paper.
ERIKA PASTRANA: So in a way, it is submitted as part of the paper, with documentation for the code, the dependencies, the operating systems. We typically go through a process, as editors, to evaluate if all that documentation is accurate and if the code is actually even executable. But most importantly to this point, at the point of publication, editors and authors should make sure the final paper details, in a certain way, how the code can be permanently accessed and how it can facilitate others using it.
ERIKA PASTRANA: So as a sort of next step to those guidelines that Dan mentioned that we participated in, I'm posting an example of our guidance that is external and given to all the authors as they navigate the process of submission and publication with us. There are similar examples in other journals that I'm pointing to there, but I'll just walk very briefly through the Nature Research policies, which, as you can see here, strongly encourage the use of a DOI-minting repository to deposit the code.
ERIKA PASTRANA: We do mandate that if the code is central, it is made available and cited in the reference list. We're pointing to the guidelines that we developed recently and also giving some details about the license and any other restrictions that should apply. So that's an example of how journals are guiding authors to take that step of using the software citation guidelines to actually implement them in the paper.
ERIKA PASTRANA: And just lastly, the last slide, I wanted to just mention what this looks like in the final paper, in a given example. As Dan was referencing, every journal has a different citation style, and we then apply our own. But in the end, the key features, let's say, of the guidance are that we are pointing to a clear citation to the software in the reference list, and that that gives a reader direct access to the DOI version, in this case, of the code.
ERIKA PASTRANA: And just finally, for further reading for those who want it, in the last slide, I've just included a note to read some Scholarly Kitchen blog posts that Dan authored. That also gives a bit more context on how publishers should be using these guidelines to improve reproducibility, reuse, and credit to code. Thank you.
MELISSA HARRISON: Thanks, Dan and Erika. So as these guys have explained, there's a whole process and a lot of work that's going on behind bringing software citations up to an important level. So I'm going to talk about JATS and JATS4R. As a publisher, if you're publishing full text, generally speaking, it probably means that you're using JATS as the markup for your content. And these are both NISO things.
MELISSA HARRISON: So JATS is a NISO standard, and it defines a set of XML elements and attributes. You might notice, if you were at last year's conference, that I've basically lifted this slide from that conference, just to explain the difference. For JATS4R, I'm the chair of that organization. And our role is trying to optimize the reusability of scholarly content, because there is a variety of ways that you can tag your content, and we're trying to define standards, or rather recommendations, for tagging content.
MELISSA HARRISON: So next slide, please. So JATS4R has a number of recommendations. I've just pulled up here an example from data citations. So a number of years ago, we published a recommendation about the publication-type attribute. So I'm not getting too technical, but basically, JATS says you can use many different terms in this attribute, and there is a recommended list. But this is a way to distinguish different types of references.
MELISSA HARRISON: So if you have one that says data, it's easier to pull it out in a machine-readable way and to find it. And this allows interchange with other parties. With all the work that's been done by FORCE11 and editorial communities to ask authors to cite software, using JATS to carry that information to various places means that the reuse potential is a lot higher, because the citations and things like that will happen more easily.
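Pulling out references by their publication-type value, as described above, can be sketched with Python's standard XML parser. The reference list below is invented for illustration, and the value `software` is an assumption about what a software citation might be tagged with; the talk itself only mentions `data`.

```python
import xml.etree.ElementTree as ET

# Invented JATS-style reference list; the ids and titles are placeholders.
JATS_SNIPPET = """
<ref-list>
  <ref id="r1">
    <element-citation publication-type="journal">
      <article-title>An example article</article-title>
    </element-citation>
  </ref>
  <ref id="r2">
    <element-citation publication-type="data">
      <data-title>An example dataset</data-title>
    </element-citation>
  </ref>
  <ref id="r3">
    <mixed-citation publication-type="software">Doe J. <source>ExampleTool</source>. 2021.</mixed-citation>
  </ref>
</ref-list>
"""

CITATION_TAGS = ("element-citation", "mixed-citation")


def refs_of_type(xml_text, publication_type):
    """Return the ids of <ref> elements whose citation has the given publication-type."""
    root = ET.fromstring(xml_text.strip())
    ids = []
    for ref in root.iter("ref"):
        for citation in ref:
            if (citation.tag in CITATION_TAGS
                    and citation.get("publication-type") == publication_type):
                ids.append(ref.get("id"))
    return ids


if __name__ == "__main__":
    print(refs_of_type(JATS_SNIPPET, "data"))
    print(refs_of_type(JATS_SNIPPET, "software"))
```

The lookup only works this simply when everyone uses the same attribute value, which is the case for agreeing on a recommended list.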
MELISSA HARRISON: So next slide, please. So I know that we're prerecording this. So I'm hoping that by the time the conference happens, this group will actually be up and running. But there is a new JATS4R working group. It's a subgroup on software citations. As Dan mentioned earlier, when you start setting up something, you think it's going to be simple and easy, and I have great plans that this group will work really quickly and things will be done pretty imminently, but I suspect there'll be lots of back and forth and discussion from all the different parties.
MELISSA HARRISON: It's great, we've got 15 people who are already willing to be part of this group. The working group will commence, we'll have calls regularly, and we will be gathering samples and documentation. So thank you very much to the FORCE11 groups for all the documentation that's already out there. And what we'll be working on is a recommendation that can go on the JATS4R site. And this complements all the work that's already been done, because it means that when people are thinking, well, that's great, we're meant to put software citations in the reference list, how do we tag that?
MELISSA HARRISON: What do we do? We'd like to get out there and have a recommendation ready so that it's clear and everyone does the same thing, so that the interchange and reuse of these software citations is of use. Thank you.
GREG GRAZEVICH: Thanks, everyone. Our final presentation is subsetting the JATS DTD, so what? And the presenter for this topic is Charles O'Connor, business systems analyst at Aries Systems. Take it away, Charles.
CHARLES O'CONNOR: Today I'm going to be talking about subsetting the JATS DTD. Now, I can't imagine that there are too many people who don't know what JATS is among this group. But it is the XML format for articles in scholarly publishing. It was developed by the National Library of Medicine. But it has grown beyond biomedical information to be used for STM articles, social science articles.
CHARLES O'CONNOR: And it's even being used in the humanities now. It was made a NISO standard in 2012, which is one reason we're here. Now, the topic of this session is solving problems with standards. And I wasn't exactly sure if this was solving problems with standards or solving "problems with standards," because I do have a problem with JATS, and that is that JATS is too loose.
CHARLES O'CONNOR: It is by design a descriptive DTD, and it is not prescriptive. So it allows many different ways to associate authors and affiliations. It has different bibliographic reference models. The publication history can be captured in very different ways. And JATS does come in more or less restrictive flavors. The archiving and interchange set is rather loose, it's meant as a target for existing content.
CHARLES O'CONNOR: Publishing is a little bit less loose, and it's used for new articles and hosting and archiving. And then the most strict of the tag sets is authoring, and nobody uses it. I'm certainly not the first person in the world to figure out that JATS is a little bit too loose. And the JATS for Reuse group, as Melissa was just talking about, comes up with recommendations for various parts of the XML.
CHARLES O'CONNOR: And so it's a narrowing of what you might do in JATS. And these recommendations are helpful in making JATS machine readable. Now, machine readable JATS, there are certain considerations that you want to have in designing machine readable JATS. Now, where is the information coming from? Where am I pulling it from?
CHARLES O'CONNOR: And a couple of different places is maybe OK, because you can have an "or" and say, it could be here or there. But you really do want to control what the information contains. I worked on the authors and affiliations subgroup, and I think the very best thing that we did in that group was make it so that whatever you tag as an affiliation should contain one and only one affiliation.
CHARLES O'CONNOR: And that affiliation should be complete. So you don't pull an affiliation and get "Department of Biology, comma," and then have to figure out where the rest of the information for that affiliation is. And also, you need to know how that information is formatted so that you can pull it into the system that's under your control.
CHARLES O'CONNOR: Now, I come at this from the point of view of a toolmaker. I'm making tools for XML through workflows. And there are a lot of good reasons to bring JATS XML in as early in the process as possible. There are ones that you can sell to your manager, like publication times and reducing errors and saving money. But it's very clear that as we move from a print and PDF based world to an XML and full text world, metadata is more and more important.
CHARLES O'CONNOR: And having your production staff have that in their control is very important. I'd like to think that allowing authors and staff to change the content directly themselves will improve their experience in a production workflow. And having the article in a structured format, as opposed to Word or a PDF, makes it more amenable to automated analysis.
CHARLES O'CONNOR: So AI analysis for, is this article even appropriate for your journal? Or in a production system, being able to use Schematron to examine what's in your XML. So lots of good reasons to have XML very early in a production workflow. But it does present some challenges as well. One is defining what would be machine readable JATS.
CHARLES O'CONNOR: And so here you have to decide, where does this information go? You're pushing it. And you can't push a string, so you have to be rather rigid about that. And so hopefully, you have only one place to put that information. Also, you need to think about how that information interacts with the rest of the article. When I add this, what else changes in the article?
CHARLES O'CONNOR: And also key is how you're entering the information. How you enter it may have implications for how you want to structure it. And so the solution here is to subset the JATS DTD. And to be very clear, when I worked on subsetting the DTD, I was very much informed by JATS4R.
CHARLES O'CONNOR: So I certainly did not want to get outside of the lines that JATS4R has created, but simply draw those lines even a little bit closer. So here is an example of something that we did with subsetting the DTD. For affiliations, we allow them in only one place, because somebody is going to push a button that says, Add an Affiliation, and you need to know exactly where that's going to be.
CHARLES O'CONNOR: And also, we enforce that the affiliations and authors have to be associated with each other through a cross reference, an explicit cross reference in JATS, what's called an xref element. Another example is in bibliographic references.
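The affiliation rule just described, where every author points to an affiliation through an explicit xref, is the kind of constraint that can be checked mechanically. A minimal sketch follows, assuming affiliations sit inside the contributor group (the talk does not say which single location was chosen) and that each `rid` holds one id; the names and the affiliation text are invented.

```python
import xml.etree.ElementTree as ET

# Invented JATS-style contributor group; placing <aff> here is an assumption.
CONTRIB_SNIPPET = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Doe</surname><given-names>Jane</given-names></name>
    <xref ref-type="aff" rid="aff1"/>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Roe</surname><given-names>Richard</given-names></name>
  </contrib>
  <aff id="aff1">Department of Biology, Example University</aff>
</contrib-group>
"""


def authors_without_affiliation(xml_text):
    """Return surnames of authors with no xref pointing at an existing <aff>."""
    root = ET.fromstring(xml_text.strip())
    aff_ids = {aff.get("id") for aff in root.iter("aff")}
    problems = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        rids = [
            xref.get("rid")
            for xref in contrib.iter("xref")
            if xref.get("ref-type") == "aff"
        ]
        # An author passes only if at least one rid resolves to a real affiliation.
        if not any(rid in aff_ids for rid in rids):
            problems.append(contrib.findtext("name/surname"))
    return problems


if __name__ == "__main__":
    print(authors_without_affiliation(CONTRIB_SNIPPET))
```

In a real workflow a check like this would more likely live in the DTD subset or a Schematron rule than in application code; Python is used here only to make the rule concrete.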
CHARLES O'CONNOR: For bibliographic references, what we did was remove the label from the content model for the reference. And one reason for that is that labels are used to number each of the references. So one, two, three, four in all the references. But when you're editing, you may be adding citations to the text somewhere, and you want those citations to be presented in document order.
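Numbering references from the order in which they are first cited, rather than storing a label on each reference, might look like the sketch below. It is an illustration only, not the Aries implementation; the article fragment is invented, and putting uncited references last is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented article fragment: references carry no <label>; r2 is cited first.
ARTICLE_SNIPPET = """
<article>
  <body>
    <p>First claim <xref ref-type="bibr" rid="r2"/> and second claim
    <xref ref-type="bibr" rid="r1"/>, then again <xref ref-type="bibr" rid="r2"/>.</p>
  </body>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>Roe R. Example B.</mixed-citation></ref>
      <ref id="r2"><mixed-citation>Doe J. Example A.</mixed-citation></ref>
      <ref id="r3"><mixed-citation>Poe P. Example C.</mixed-citation></ref>
    </ref-list>
  </back>
</article>
"""


def number_references(xml_text):
    """Map each reference id to a number based on first citation in document order."""
    root = ET.fromstring(xml_text.strip())
    numbers = {}
    # iter() walks the tree in document order, so first citations come first.
    for xref in root.iter("xref"):
        if xref.get("ref-type") == "bibr":
            rid = xref.get("rid")
            if rid not in numbers:
                numbers[rid] = len(numbers) + 1
    # Assumption: references never cited in the text are numbered last.
    for ref in root.iter("ref"):
        numbers.setdefault(ref.get("id"), len(numbers) + 1)
    return numbers


if __name__ == "__main__":
    print(number_references(ARTICLE_SNIPPET))
```

Because the numbers are computed from the text, adding or moving a citation never requires editing the reference list itself.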
CHARLES O'CONNOR: So it's rather difficult to reorder and renumber things in Word or a PDF, but it's easier in XML, and it's even easier if you don't have to update all the labels as you go along. So the label becomes generated text. And at some point, if you need labels, you can transform out to get those labels back. And at some point, I thought, oh, all these things, anybody building a production system or something similar will want to make the same kinds of changes that I'm making to the JATS DTD.
CHARLES O'CONNOR: But that's not really true, because there's that third consideration of how you're entering the information. Again, in references, JATS has two ways of encoding references. One is element citation. And element citation doesn't contain any punctuation, it doesn't contain spaces or any untagged text, everything has to be within an element in that model.
CHARLES O'CONNOR: So if you have a form-based method of adding a citation here, then you would certainly want to subset to only allow element citation. That way, there's a one-to-one correspondence between the fields and the elements, and you don't have to worry about whether an author-- is that followed by a comma and a space, or parentheses around here?
CHARLES O'CONNOR: None of that matters when you're entering it within the form. In our own system, we allow text-based entry. You can just free-form the text as you go along. And therefore, we've narrowed the content model, so it only allows mixed citation, which allows punctuation, spacing, and also untagged text. So for example, if an author is adding a citation, they can just write in whatever text they would normally type in Word and later on, the tagging can be called out.
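The difference between the two citation models can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names citation_from_form and has_untagged_text and the sample field values are hypothetical. The element names (element-citation, mixed-citation, article-title, source, year) are from JATS.

```python
import xml.etree.ElementTree as ET


def citation_from_form(fields):
    """Build an element-citation with exactly one child element per form field."""
    citation = ET.Element("element-citation")
    for tag, value in fields.items():
        child = ET.SubElement(citation, tag)
        child.text = value
    return citation


def has_untagged_text(citation):
    """True if any text sits directly in the citation, outside a child element."""
    if (citation.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in citation)


if __name__ == "__main__":
    # Form-based entry: fields map one-to-one onto elements, no punctuation stored.
    form = citation_from_form(
        {"article-title": "A title", "source": "A Journal", "year": "2020"}
    )
    print(ET.tostring(form, encoding="unicode"), has_untagged_text(form))

    # Free-text entry: punctuation and untagged text sit between the elements.
    mixed = ET.fromstring(
        "<mixed-citation>Lee A. <article-title>A title</article-title>. "
        "<source>A Journal</source> (<year>2020</year>).</mixed-citation>"
    )
    print(has_untagged_text(mixed))
```

A form-driven editor only ever produces the first shape, while a free-text editor needs the second, which is why the two systems would subset the content model in opposite directions.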
CHARLES O'CONNOR: They wouldn't be prevented from doing so, as they would be if you had used element citation. So really here, we are talking about solving problems with the standard, because JATS is made to be customized. JATS is very well put together, it's modular. The definitions of how elements are put together are spread across many different files.
CHARLES O'CONNOR: But really, you don't need to touch those common files at all. All you need to do is create a set of override files. So whatever you want to change, you can change in those files without needing to go into each and every file and find the content model and change it there.
CHARLES O'CONNOR: I would recommend, for anyone who is doing any customization of JATS, that they read the JATS Compatibility Meta-Model. It gives a lot of the philosophy of how JATS is put together. And so it's meant for people who are supersetting JATS so that whatever they add to the JATS model conforms with how JATS is put together.
CHARLES O'CONNOR: But I find it useful just to have a very good overview of the philosophy behind JATS and the philosophy about how different elements and attributes work together. So that's a high recommendation for this particular document. Now, not everything can be expressed in a DTD.
CHARLES O'CONNOR: So if you are coming up with strict tagging guidelines, I would do again like JATS4R does and use Schematron. Schematron is a rule-based validation language. And the cool thing about it is that you get to write your own rules. So if you hit an error, the error messages are things that you wrote and mean something to you, as opposed to parser errors that can be somewhat cryptic.
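The idea of rules that carry their own messages can be imitated in a few lines of Python. To be clear, this is not Schematron and not the JATS4R rule set. It is a hypothetical stand-in showing the shape of the idea: each rule pairs a context path, a test, and a message the rule author wrote. The two rules mirror the subsetting choices described earlier in the talk (no label in a reference, mixed citation only).

```python
import xml.etree.ElementTree as ET

# Each rule: (path selecting the context elements, test, human-written message).
# These rules are illustrative assumptions, not an official rule set.
RULES = [
    (
        ".//ref",
        lambda el: el.find("label") is None,
        "References must not carry a label; labels are generated on output.",
    ),
    (
        ".//ref",
        lambda el: el.find("mixed-citation") is not None,
        "Each reference must contain a mixed-citation.",
    ),
]


def validate(xml_text, rules=RULES):
    """Run every rule over its context elements and collect readable messages."""
    root = ET.fromstring(xml_text)
    messages = []
    for path, test, message in rules:
        for el in root.findall(path):
            if not test(el):
                messages.append(f"{el.get('id', '?')}: {message}")
    return messages


if __name__ == "__main__":
    doc = (
        '<ref-list>'
        '<ref id="r1"><label>1</label><mixed-citation>x</mixed-citation></ref>'
        '<ref id="r2"><element-citation/></ref>'
        '</ref-list>'
    )
    for line in validate(doc):
        print(line)
```

In a real pipeline, the same checks would more naturally live in a Schematron file run alongside DTD validation.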
CHARLES O'CONNOR: So what does subsetting get you? If you're building tools for creating and editing XML, you don't have to develop for every single possibility that exists in JATS. So it makes it easier to build in the first place, but also easier to maintain as you go along, and more robust.
CHARLES O'CONNOR: And the editing environment is not the only tool that will be using your JATS content. So it does also help to have a predictable format for rendering, for transforming, and other uses of your XML. Subsetting the DTD helps you establish clear expectations for your suppliers and other partners.
CHARLES O'CONNOR: And it's a good means of enforcing those expectations. So you can give your DTD to your suppliers and ensure that they won't send you anything that doesn't conform to it. There are some things to avoid when subsetting the DTD. Certainly, don't get rid of anything that is mandatory in the parent DTD.
CHARLES O'CONNOR: Otherwise, your subset won't be valid to the parent DTD. You don't want to have to re-add something at the end of the process just to make it valid to the parent DTD. And also, don't get rid of anything unless you get some sort of technical win out of it. When I first went to subset the DTD, I was very, very enthusiastic.
CHARLES O'CONNOR: And I thought, oh, nobody is going to use this thing. And oh, that's just silly. And the best laid plans, you make contact with the enemy, you have a customer, and that's something that they use, that they need, and I did need to put some things back as well. And the last bit is, certainly, don't forget to document everything.
CHARLES O'CONNOR: You need clear documentation to support your subset of the DTD. Thank you.
GREG GRAZEVICH: Thanks, Charles. And a big thank you to all our presenters. We've reached the point in our session where we're about to go live. So our presenters and attendees can interact in three separate breakout sessions, one for each topic. [MUSIC PLAYING]