Name:
Not Your Mother’s Migration: Lessons from migrating more than 47gb of technical XML
Description:
Not Your Mother’s Migration: Lessons from migrating more than 47gb of technical XML
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c398e6de-fec3-429e-be4c-e8df5cc26e08/videoscrubberimages/Scrubber_0.jpg
Duration:
T00H56M27S
Embed URL:
https://stream.cadmore.media/player/c398e6de-fec3-429e-be4c-e8df5cc26e08
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/c398e6de-fec3-429e-be4c-e8df5cc26e08/PS23 June recording.mp4?sv=2019-02-02&sr=c&sig=Rkq0LSBlxpr16k8%2B%2FQvzWEAbPXj%2FaFS8XACcRWhz1W4%3D&st=2025-01-22T11%3A45%3A04Z&se=2025-01-22T13%3A50%3A04Z&sp=r
Upload Date:
2023-06-27T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
BETSY DONOHUE: So just to kick things off, just a little bit of an introduction here. Anyone who has ever performed a platform migration can tell you that the state of your archive is one of the biggest determinations of the level of complexity of the entire project. However, it's also one of the biggest areas of opportunity. And that's because migrations are a great time to review, to modernize, and to upgrade your entire corpus of content.
BETSY DONOHUE: So just over a year ago, Silverchair, AIP Publishing, and DCL undertook exactly that task. And that task was moving more than 90 years of content to its new home on the Silverchair platform. I'm joined today by Melissa Jones, who is the content architect at Silverchair, as well as Tracy Denien, who is the head of product development and operations at AIP Publishing, Richard O'Keeffe, who is the manager of digital asset management at AIP Publishing, as well as David Turner, who is the digital transformation consultant at DCL.
BETSY DONOHUE: And we're all here today to talk about that process-- that process, not only how it worked, but what publishers can learn from it. But before we get started, we decided to launch a poll. And our aim here is to get a sense of our audience's experience with this topic. So Steph has popped this poll up. If you could take a minute to quickly answer yes, no, or no, but we have one coming up, that would be wonderful.
BETSY DONOHUE: So thanks for participating in that poll. Awesome. Thanks a lot. OK, we got immediate results. Oh, pretty cool-- 70% and 30% "no." OK. All right. That's a good mix.
BETSY DONOHUE: So let's hand it over to today's speakers. We're going to take an approach where the speakers introduce themselves and really set the stage, sharing with us their roles in their migration project, what their goals were, what their experience was in this project, and afterwards, then, we'll engage in really an open discussion. And you'll have an opportunity to ask questions, and we'll provide the answers.
BETSY DONOHUE: So, yeah, we're going to start off with Tracy from AIP.
TRACY DENIEN: Great. Thank you. I'll just share a couple of quick slides. So as Betsy said, my name is Tracy Denien. I'm head of product development and operations at AIP Publishing. And my team drives the technology related areas for the platform, including integrations with other systems, our front-end services, and, most importantly, the content.
TRACY DENIEN: AIP Publishing itself was formed in 2013 as a wholly owned subsidiary of the American Institute of Physics. And we're charged with publishing over 30 journals and conference proceedings. The American Institute of Physics has, for nearly a century, worked to advance, promote, and serve the physical sciences. Both organizations share in this mission and continue to work together to achieve their goals.
TRACY DENIEN: So now, if we're talking numbers, AIP Publishing publishes the American Institute of Physics' flagship magazine, Physics Today, 10 open access titles and growing, 12 subscription titles for our publishing partners, along with 16 of our own titles. We have over 2,400 conference proceedings published. In total, that's over 1.1 million articles currently on the Silverchair platform, which is why we're here today.
TRACY DENIEN: So why did we move? And let me just stop sharing because I just had those two slides as a visual. Why did we migrate? As you know, a platform move is not for the faint of heart. In 2021 AIP Publishing reached a fork in the road, where we needed to do a major and costly upgrade with our then platform vendor. The platform had an outdated look and feel.
TRACY DENIEN: And it was the perfect opportunity to explore another option, which was to migrate to Silverchair. Even prior to being a Silverchair customer, we attended their platform strategy days, we spoke to other Silverchair customers, and had exposure to many of the Silverchair staff. So Silverchair itself was not something that was unfamiliar to us. We identified many areas as we worked on the discovery with Silverchair that I can get into if time allows.
TRACY DENIEN: But right now, I'll pass it off to Rich O'Keeffe on my team to introduce himself.
RICHARD O'KEEFFE: Hello. My name is Rich O'Keeffe. I am, as Tracy mentioned, the manager of the digital assets group here. And our primary role at AIP is to support the production workflow, the maintenance of the archive, and the integrity of the archive, as well as our XML implementation for current, past, and future implementations of the content that we publish. Our role in this particular migration was, we were the primary contact with Data Conversion Laboratory and with the Silverchair content architects-- responsible for the delivery of the material and the answering of any kind of questions and any kind of anomalies that had been found during the conversion process.
DAVID TURNER: All right. I guess I take it from here? So as they mentioned, they worked with us at DCL to move their content. And as Tracy hinted at before, it was a massive amount of content. I think it's the largest set of content ever loaded on Silverchair, if I'm not mistaken. This slide gives you a sense of some of the size here-- 47 gigabytes of journal content, 1.7 terabytes of journal assets-- and not just loaded but really, really optimized.
DAVID TURNER: And that's the role that we played in this process. So, again, as I mentioned before, I'm David Turner. And I do a little consulting with new customers. I do a little business development. And I also manage the relationships with partners like Silverchair. That's my role at DCL. And DCL stands for Data Conversion Laboratory. We're based in Queens.
DAVID TURNER: We've been solving content challenges for more than 40 years. And we really shine in situations like this where quality is mission critical and where the content is complex. We've got a really solid team-- Beth, Devorah, Robert, David, some of those people. And we've been doing a lot of these projects. I don't even know the exact number that we've done with Silverchair now.
DAVID TURNER: But it's been over several years. Just quickly, more just about DCL, some of the things that we do-- so I mentioned the content conversion, not just conversion but optimization. We didn't want to just get their content onto Silverchair but make sure that it was optimized for the new platform. We also handle the migration piece. We have some other services around accessibility and scanning and entity extraction.
DAVID TURNER: But the other really big part of where we were involved here had to do with identifying problems with metadata. As you're moving to one of these new platforms, you want to make sure you take advantage of the opportunity to be able to get in and fix those kinds of things because they're going to affect your experience. They're going to affect discoverability. And so what we did is we employed a tool that we have-- and we do this with all of our Silverchair projects-- that's called content clarity.
DAVID TURNER: And you'll hear us talk about this a little bit more today. So just quickly about that-- content clarity is effectively an analysis that we do of all of a publisher's XML content. And it's really designed to do three things-- first of all, provide content metrics on things like, as you see here, number of files, number of bytes, how much full text you have versus header-only XML, what DTDs you've used historically.
DAVID TURNER: It also identifies where information may be missing. If you've got missing DOCTYPES, or missing titles, or missing issue numbers, or things like that, we can identify those and start working on getting those fixed. And then, finally, it also surfaces errors and duplications and things like that. So when we're talking about content clarity that's what we're talking about. So with that, I will stop sharing that.
DAVID TURNER: And we'll pass this over. I think Melissa's next.
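The three kinds of output David lists (content metrics, missing information, and duplications) can be illustrated with a small audit script. This is only a sketch under stated assumptions, not DCL's content clarity tool: the `audit` function, the report keys, the sample files, and the placeholder DOCTYPE identifier are all hypothetical, and the element names (`body`, `article-title`, `issue`) assume JATS-style tagging.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Captures the public identifier from a DOCTYPE declaration, if there is one.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"')


def has_text(element):
    """True when the element exists and contains non-whitespace text."""
    return element is not None and "".join(element.itertext()).strip() != ""


def audit(files):
    """Summarize a corpus given as {filename: XML string}."""
    report = {
        "files": len(files),
        "bytes": 0,
        "full_text": 0,
        "header_only": 0,
        "dtds": Counter(),
        "missing_doctype": [],
        "missing_title": [],
        "missing_issue": [],
        "duplicates": [],
    }
    seen = {}  # content hash -> first filename seen with that content
    for name in sorted(files):
        text = files[name]
        data = text.encode("utf-8")
        report["bytes"] += len(data)

        # Metric: which DTDs were used, and which files declare none.
        match = DOCTYPE_RE.search(text)
        if match:
            report["dtds"][match.group(1)] += 1
        else:
            report["missing_doctype"].append(name)

        # Metric: full text versus header-only (assumes a JATS-style <body>).
        root = ET.fromstring(text)
        if root.find("body") is not None:
            report["full_text"] += 1
        else:
            report["header_only"] += 1

        # Missing information: titles and issue numbers.
        if not has_text(root.find(".//article-title")):
            report["missing_title"].append(name)
        if not has_text(root.find(".//article-meta/issue")):
            report["missing_issue"].append(name)

        # Duplications: byte-identical files.
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            report["duplicates"].append((name, seen[digest]))
        else:
            seen[digest] = name
    return report


# Placeholder DOCTYPE and sample files, invented for illustration only.
PLACEHOLDER_DOCTYPE = (
    '<!DOCTYPE article PUBLIC "-//EXAMPLE//DTD Sample Article v1//EN" "sample.dtd">'
)
HEADER_ONLY = "<article><front><article-meta><issue>4</issue></article-meta></front></article>"
SAMPLE = {
    "a.xml": PLACEHOLDER_DOCTYPE
    + "<article><front><article-meta><title-group>"
    + "<article-title>On waves</article-title></title-group>"
    + "<issue>3</issue></article-meta></front><body><p>Text</p></body></article>",
    "b.xml": HEADER_ONLY,
    "c.xml": HEADER_ONLY,
}

if __name__ == "__main__":
    print(audit(SAMPLE))
```

On the sample, `a.xml` counts as full text with a known DTD, while `b.xml` and `c.xml` are reported as header-only, missing a DOCTYPE and a title, and as duplicates of each other. A real audit would also need to handle malformed files and undeclared entities, which this sketch does not.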
MELISSA JONES: Yes. So I'm Melissa Jones. I'm a content architect here at Silverchair. A content architect wears many hats here. But our role in the migration process is to act as a facilitator for the migration and also to provide support for the content conversion vendor and the client to make sure that they have the answers that they need to make informed decisions about the content and the migration process.
MELISSA JONES: I co-managed the AIPP migration with my colleague Brooke Begin, who did a lot of the day-to-day work on this. So I have a slide I want to share just to give some context on where this project falls in Silverchair's migration journey. And it has been a long one. So let me just share my slides here. OK, so this timeline starts with the OUP migration in 2015, 2016.
MELISSA JONES: And I think that the OUP migration is a really good mirror for the AIPP migration because of the size. The OUP migration was approximately 2 million journal articles. And AIPP, which is the largest we've done since OUP, is well over a million with book content as well as proceedings content. So these two are very similar in terms of scope, in terms of the level of publisher engagement and involvement and expertise.
MELISSA JONES: But the context for us was much different. So I want to talk a little bit about that. So OUP was actually our charter Zipline 3 and SCJATS client. So we were building Zipline 3 and creating the SCJATS specifications while we were migrating. And for that reason, we decided to do an internal conversion. And that provided us with the flexibility to adjust as we were building the tools and the specs. And it also gave us a significant opportunity to learn all the ins and outs of the migration process.
MELISSA JONES: I mean, what better way to learn than with millions of articles and doing everything on the fly. So it was a challenging project for those reasons. But it was also a significant turning point for us as a company. And a lot of the lessons that we learned in partnership with OUP informed the direction and the trajectory that we took for the years after that. So the DUP migration-- Duke University Press-- is on this timeline.
MELISSA JONES: Not because it's similar in size-- it was a much smaller migration. But this was actually the first full vendor conversion that we did on Zipline 3 and SCJATS and SCBITS. And it was also our first Zipline 3 partnership with DCL. DCL was no stranger to Silverchair at this point. But it was their first encounter with the SCJATS and SCBITS specifications and the Zipline 3 tool. And this is where we started to form the beginnings of the content project plan that we still use today.
MELISSA JONES: We came up with some internal ways of working and also some ways of working with content vendors and with clients to maximize outcomes for the migration process. And we also discovered during this project that DCL was probably going to be a really great partner for us, that could grow with us and that could provide a strategic option for customers who were looking to de-risk the migration process.
MELISSA JONES: And so DCL became the unofficial preferred vendor after the Duke project and eventually became a Universe partner. And, as David said, they've done many, many migrations with us. I would say probably 90% if not more of the migrations we've done since 2017 have been in partnership with DCL. And our relationship with them has really grown and matured. And the outcomes of that have been things like content clarity and additional innovations from them that have really made the migration process go even smoother.
MELISSA JONES: So now that brings us to the AIPP migration, which was the largest since 2016. And AIPP brought a lot to the table. But from our perspective, this was also an opportunity for us to leverage very mature tools, specs, processes, and also benefit from the long-standing relationship that we had with DCL.
BETSY DONOHUE: Awesome. Well, thanks, everybody. That was a fantastic start to get the intro and kind of lay the groundwork for what's coming next, which is discussion points. So we have a couple discussion questions I'm going to kick off. So starting with the first one, "You each have a great deal of experience with migrations across the board and from all different perspectives.
BETSY DONOHUE: What made this one unique and what lessons were you able to apply to this project?" Who wants to go first?
TRACY DENIEN: I can speak, just from an overall-- I think for me, what made this project unique was the speed at which we went from decision to migrate to launch. It was just over a year from the ink being dry on the, I think, on the contract. So that, to me, was at light speed as compared to past migrations that we've been through. And I think that what DCL and Silverchair teams brought to the table played a large part in that.
TRACY DENIEN: Also, from past experience, AIP Publishing ensured that the business owners were always available working across Silverchair, and with DCL as needed, to make sure that we met our deadlines. So to me, just the speed at which we did it was very unique. You had to pass.
BETSY DONOHUE: Right. Great perspective. Anybody want to add anything additional to that one?
RICHARD O'KEEFFE: I would say from-- what we were excited about was content clarity, as David and Melissa had spoke about, because everyone thinks their content is good. But you know there's some landmines out there in it. And we were really excited to have this tool run through our corpus of content and identify those particular discrepancies and disparities because when you have this much content, nobody really knows that in 1942, maybe they did this.
RICHARD O'KEEFFE: And it helps bring everything into alignment so you can have a more consistent presentation on the platform. So we were really excited to have that tool applied to our content and learn from it.
BETSY DONOHUE: Yup. All right. That's a great point. And that's kind of intermingled with the next discussion point.
DAVID TURNER: Well, here, can I add one other thing to that first part?
BETSY DONOHUE: Absolutely. Absolutely.
DAVID TURNER: I thought one of the things that made this unique was really the knowledge that the AIP Publishing people brought to the table. A lot of times in these engagements, you know, we're working with people, it's their first time to do this. They're not really sure where all their assets are. They're trying to get things together. And one of the things we quickly discovered about working with Rich was that he had his ducks in a row.
DAVID TURNER: And he knew where things were. He'd done this before. He had a lot of answers to questions ready to go and really made the process a lot easier by bringing that level of preparedness.
BETSY DONOHUE: Mm-hmm. Very nice. Great point. The next kind of add-on to this area and this topic that we're covering, and this is specifically for David and Melissa, what gains have you seen other publishers achieve through migrations like this?
MELISSA JONES: So one area that is typically addressed the most is metadata-- so enhancing metadata, adding DOIs, normalizing article types, normalizing casing and article titles, adding article titles if you don't have them. Our platform actually requires them, so it's kind of a forced improvement. Taking the opportunity to make sure that your metadata is clean and up to date tends to be the area where most of the focus is.
MELISSA JONES: The brave among us tackle their references from eons ago and try to make them more granular and machine readable. That is a pretty challenging thing to do. But we have had some clients recently who have taken that on.
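Normalizing article types, one of the metadata cleanups Melissa mentions, can be sketched as a lookup against a controlled vocabulary. This is a minimal sketch, not Silverchair's or DCL's process: the mapping table, the legacy labels, and both function names are hypothetical, and it assumes the type is stored in a JATS-style `article-type` attribute on the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy labels to one controlled vocabulary.
ARTICLE_TYPE_MAP = {
    "research article": "research-article",
    "article": "research-article",
    "review": "review-article",
    "review article": "review-article",
    "letter to the editor": "letter",
}


def normalize_label(raw):
    """Return the controlled value for a legacy label, or None if unmapped."""
    # Ignore case, hyphens, underscores, and extra spaces when matching.
    key = " ".join(raw.replace("-", " ").replace("_", " ").lower().split())
    return ARTICLE_TYPE_MAP.get(key)


def normalize_article(xml_text):
    """Rewrite the root article-type attribute when a mapping exists.

    Returns (xml_string, changed). Unmapped values are left untouched so
    that a person can review them instead of the script guessing.
    """
    root = ET.fromstring(xml_text)
    target = normalize_label(root.get("article-type", ""))
    if target is not None:
        root.set("article-type", target)
    return ET.tostring(root, encoding="unicode"), target is not None


if __name__ == "__main__":
    legacy = '<article article-type="Letter to the Editor"><front/></article>'
    print(normalize_article(legacy))
```

Leaving unmapped labels alone is a deliberate choice here: across decades of content, the unknown values are usually the ones that need an editorial decision.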
DAVID TURNER: Yeah, I think there's a whole host of gains that we've seen. I mean, obviously, there's a lot in terms of new functionality when you move to a platform like Silverchair that Hannah and her team put together great use cases. And I think all of that probably factored into why they were chosen here. For other publishers, sometimes you just see, it's just the ability to get all their content in one place. Sometimes people have their journal content here, their book content here, their conference proceedings over there.
DAVID TURNER: And so I think that's a big gain for people. Also, it's a chance sometimes for people just to establish an archive. I mentioned that Rich and Tracy had done this before. And they had an archive. But there are a lot of publishers out there who, they haven't really kept things in house. It's always been someplace else.
DAVID TURNER: And doing a migration like this really helps with that. Just a couple of other things, just off the top of my head, here-- well, Melissa already mentioned the chance to fix those metadata problems, those nagging problems that are there. But I think there's also-- I saw on the chat here, there was a question about digitizing content.
DAVID TURNER: And some publishers, they really take this as an opportunity to make gains around getting content from a paper format into a digital format or from taking things that maybe were PDF into XML, or things that were header-only XML upgraded to full-text XML. Should I go ahead and answer that question about the digitization? I can't remember where I saw that-- somewhere.
BETSY DONOHUE: Sure. Yeah.
DAVID TURNER: Is it in questions, or is it in a chat?
BETSY DONOHUE: Yeah, it's in the chat. I think from Guy Jackson. Is that the one you're referencing?
DAVID TURNER: Yep. Yep. So, Guy, thanks for that question. We absolutely can do nondestructive digitization of fragile hardcopy and return that. We do have that hardware in house. I will say, it depends a little bit. There are-- occasionally we'll come across some really special things where we'll use some out-of-house vendors. We might subcontract someplace else.
DAVID TURNER: Or if they're just-- if the cost of shipping it to our facility is just incredibly cumbersome. But anyway, if you want to contact me afterwards, I'd be glad to talk about that with you.
MELISSA JONES: And I do just want to add something that I think is important to call out about this project in particular. MathML remediation was pretty significant for the AIPP project. So any client who has very math heavy content, DCL is very good at pulling out math errors and helping to resolve those. So that's another area that's often addressed.
BETSY DONOHUE: Nice. Great point. And taking the discussion questions and bringing it back to Rich and Tracy for a minute, in this process with the project, did you have any moments that really surprised you, and/or really got you excited to see things addressed that you didn't expect?
TRACY DENIEN: Rich, you may be better equipped to speak to that, as far as the work you've been doing on the conversion.
RICHARD O'KEEFFE: Yeah, I mean, there were a couple of different aspects that the Silverchair platform had than on some of our previous platforms, one of which was it gave us an opportunity to consolidate and pair up our supplementary material with the rest of the assets of articles because we had a legacy process that had it in a separate repository. And we had the opportunity to merge it in so we could deliver everything consistently to DCL and then have it converted for the platform.
RICHARD O'KEEFFE: I guess from a surprise standpoint, and this builds on what Melissa's comment was, and [INAUDIBLE], is that that was probably the largest area of where we had encountered content that needed to be updated. We had older processes where we had the composition process integrated with the XML generation process. And while the MathML would have been valid if you opened it up, in many cases, or if you turned it on strict or were using the latest version of MathML, it would have been invalid almost always because of attributes that were put in that just weren't permitted because someone decided to use Roman instead of normal or they used a font face [INAUDIBLE] name instead of that.
RICHARD O'KEEFFE: It worked fine on the PDF and everything back in the day in the context in which it was done, but it wasn't working here. So we had a lot of corrections for that. When you compare it to the overall number that David had flashed, it's a very, very small percentage, which was nice. But still, it was a number of files that we had to update, and we were able to get that fixed, updated, and redelivered to DCL.
RICHARD O'KEEFFE: So it gave us that one opportunity to clean that up in the archive. So that was probably the biggest surprise. We knew it was out there, but not to the extent that it was.
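The kind of MathML attribute problem Rich describes can be illustrated with a small scan. This is a sketch under assumptions, not the check AIP Publishing or DCL actually ran: it flags `mathvariant` values outside the list defined in MathML 3, plus the font attributes that MathML 3 deprecates on token elements. Reading "Roman instead of normal" as a `mathvariant` value and the font-name issue as a `fontfamily` attribute is a guess, and the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The mathvariant values defined by MathML 3.
ALLOWED_MATHVARIANT = {
    "normal", "bold", "italic", "bold-italic", "double-struck",
    "bold-fraktur", "script", "bold-script", "fraktur", "sans-serif",
    "bold-sans-serif", "sans-serif-italic", "sans-serif-bold-italic",
    "monospace", "initial", "tailed", "looped", "stretched",
}

# Font attributes that MathML 3 deprecates on token elements.
DEPRECATED_FONT_ATTRS = {"fontfamily", "fontsize", "fontstyle", "fontweight"}


def find_mathml_problems(xml_text):
    """Return (element, attribute, value) tuples for suspect MathML attributes."""
    problems = []
    root = ET.fromstring(xml_text)
    prefix = "{" + MATHML_NS + "}"
    for el in root.iter():
        if not el.tag.startswith(prefix):
            continue  # only inspect elements in the MathML namespace
        local = el.tag[len(prefix):]
        variant = el.get("mathvariant")
        if variant is not None and variant not in ALLOWED_MATHVARIANT:
            problems.append((local, "mathvariant", variant))
        for attr in sorted(DEPRECATED_FONT_ATTRS & set(el.attrib)):
            problems.append((local, attr, el.get(attr)))
    return problems


# Invented sample: one bad mathvariant value and one deprecated font attribute.
SAMPLE_MATH = (
    '<p xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:math>'
    '<mml:mi mathvariant="roman">d</mml:mi>'
    '<mml:mi mathvariant="normal">x</mml:mi>'
    '<mml:mn fontfamily="Times">2</mml:mn>'
    '</mml:math></p>'
)

if __name__ == "__main__":
    for problem in find_mathml_problems(SAMPLE_MATH):
        print(problem)
```

A scan like this only locates candidates. Deciding the right replacement (for example, whether an upright letter should become `normal`) remains an editorial call.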
BETSY DONOHUE: Wonderful. Thanks, Rich. That's a great answer. The next discussion question is a little bit of a different flavor. And I'd like to start with Tracy and add on or ask for a little bit more detail. One of the points you made, when you first shared your initial slide, you shared a little bit about the situation when the decision was made to undertake the migration and where AIPP was in crossroads.
BETSY DONOHUE: If you could give a little bit more detail-- what were the motivating and compelling reasons, and organizationally, what that was like for you all to come to that realization and work together and kind of start to make the decisions to do that migration and work with Silverchair.
TRACY DENIEN: I think at first it was how to get people within the organization motivated as opposed to, here we go again. But we did. And I think because of the situation we found ourselves in, the motivating factor was to really bring our platform into a modern day. And with Silverchair, we were able to achieve this researcher-friendly web design. We have a new professional look.
TRACY DENIEN: We have new features that we didn't have in the past, like a split-screen article view, which we have our associated content and the data in the same view, with the research and the article. So that's huge for us. Getting access to the Silverchair community and workshops, it was really a positive. Another thing for us too was we didn't have a lot of control over our old platform.
TRACY DENIEN: And we were at the mercy of waiting for the vendor to make front-end changes. And what Silverchair provides us is a robust tool, allowing us to manage the front-end displays, configurations, adding publications or publishers much easier. So that allows us to respond to things a lot quicker. So I think a lot of that, and really working together with our stakeholders on all the benefits-- and we brought them into the process more this time than we had in the past, which has pros and cons.
TRACY DENIEN: The more people involved, the more voices there are. So a lot of it was managing the voices-- working along with Silverchair, working along with DCL. But I do think, in the end, it really brought everyone together. And having everyone involved was an important aspect of it.
BETSY DONOHUE: Tracy, that's awesome. Thanks for that extra detail. That's great because it all comes down to the people, right?
TRACY DENIEN: Yes. Yes.
BETSY DONOHUE: For sure. The next discussion question is a little bit more precise. We want to focus on newer content for this next question. How did you manage the switchover? And what we mean for that is specifically about the content actively in production-- so setting up a parallel publishing process-- how did that get done? How was that achieved?
RICHARD O'KEEFFE: Well, that's always the most challenging part of any migration. I mean, you see 1.1 million articles. And while that's a large number, the first million's the easy part. We copy off 1929 to 2021 on a drive. And conveniently, the DCL dropoff location was on the way home from work. So I could just take the drive and drop it off. So that was the easy part.
RICHARD O'KEEFFE: So it's those subsequent delta deliveries that are the most challenging because there were specific times in the schedule where we would drop off the 2022 content. And then it was the first couple months of 2023. And the target is always shifting because as anyone who works in publishing knows, the publishing engine just never stops. It just keeps going and going.
RICHARD O'KEEFFE: And you're supporting that while you're trying to do the migration simultaneously, which is always a challenge. And it's coordinating internally on a couple different matters, one of which is establishing with your production groups a cutoff time, which in retrospect, I wish we had allowed a little more time for ourselves. But moving so much content it was difficult to get a solid date.
RICHARD O'KEEFFE: We actually were publishing up to a day before we cut over. The other part is having a mechanism to identify changes in your content. You may have delivered everything through 2022. But what has changed since then? Fortunately, we have a nice content-management system where we could run reports and identify everything that had changed since the last time we collected it.
RICHARD O'KEEFFE: So we were able to run those reports and include those in the delta along with the new material. So those are really the two facets-- all the new material as well as everything that has changed, and being able to identify those precisely and send it over. So that was a key benefit for us as well. But the other, probably most important facet of the parallel publishing, is determining, well, what's your output channel going to be?
RICHARD O'KEEFFE: For AIP, we used the JATS archive article 1.3 version as our archival content. And that's what we deliver out to customers and everything else like that. And it would have been too large a project to modify that. So we were exporting out to SCJATS. And we had the good fortune-- I work with my talented colleague, Jennifer McAndrews. She wrote the XSLT. And while we were preparing the final deltas earlier, she was writing the conversion.
RICHARD O'KEEFFE: And we were testing and doing that to get that output. And you just work at that. And we went from 60% failures with articles we were uploading into the test area down to 2%, 3% by the time we were at launch. And there's still some cleanup to do for the outliers that you never catch during the process. But the overall bulk of material goes through. Since launch, we've been publishing anywhere from 100 to 400 articles a day.
RICHARD O'KEEFFE: And we have very few failures with it. So making sure you understand the spec, you read the documentation that's there, and you rely on the advice of the content architects and the information that DCL had provided during the content clarity, where we identified how you could tag things. You could use their material for samples. Pulling all that together to build your export, that was the critical path for us because without that export, nothing's getting to the platform at the quality that we want it to.
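The delta deliveries Rich describes depend on knowing exactly what is new and what has changed since the last hand-off. AIP Publishing got that from reports in its content-management system. For an archive without such reporting, a checksum manifest is one possible stand-in, sketched below. The function names and sample files are hypothetical, and this is not the mechanism used in the project.

```python
import hashlib


def build_manifest(files):
    """Map each filename to a SHA-256 checksum of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def plan_delta(previous, current):
    """Compare the manifest from the last delivery with the current one."""
    return {
        # Files that did not exist at the last delivery.
        "new": sorted(n for n in current if n not in previous),
        # Files delivered before whose content has since changed.
        "changed": sorted(
            n for n in current if n in previous and current[n] != previous[n]
        ),
        # Files delivered before that no longer exist in the archive.
        "removed": sorted(n for n in previous if n not in current),
    }


if __name__ == "__main__":
    # Invented sample: one corrected article and one newly published article.
    delivered = build_manifest({"a.xml": b"<a/>", "b.xml": b"<b/>"})
    today = build_manifest(
        {"a.xml": b"<a/>", "b.xml": b"<b>fixed</b>", "c.xml": b"<c/>"}
    )
    print(plan_delta(delivered, today))
```

The "new" and "changed" lists together make up the next delta. Saving the current manifest at each hand-off gives the baseline for the following one.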
BETSY DONOHUE: That's great, Rich. That was a really detailed, thorough answer to that question. Anybody want to add on to that?
DAVID TURNER: Well, I think it's important that at the beginning of these projects-- you know, Brooke and Melissa, and Beth and Devorah, they work really closely with the clients to outline the schedule and the process and exactly how this is all going to work. Does it change along the way? Sometimes, but fortunately, this is not our first rodeo, as we say here in Texas.
DAVID TURNER: And it's something that we feel pretty confident that we can get through, even with a large amount of content. And sometimes it's just a matter of we have to push through, and we have to make it work. And we've got to work some extra hours to make it happen. But it's all worth it in the end.
BETSY DONOHUE: Nice. And related to that notion, worth it in the end, literally, this was part of the question that we got in our Q&A here. Let me read it and, David, let me know if you want to take this. And we probably have a couple of different answers, since it dovetails beautifully onto what you just said. Are there examples from folks in this group of organizations righting past wrongs in content migration to inject best practices, like accessibility, et cetera?
BETSY DONOHUE: What other rewrites, seen as onerous, are worth it in the end?
DAVID TURNER: Yeah, I think we've seen a lot of that. And part of that is because publishers don't always-- how do you-- how do I put this nicely? Budgets tend not to be unlimited for publishers, right? I mean, unless, I mean, AIP, you guys probably have unlimited budgets, right Tracy? But you always discover that little shortcuts were taken along the way. Yeah, we're going to convert this content.
DAVID TURNER: But we're not going to-- it would cost too much to do this data. Or we're going to capture all of our math as images. We're going to tag our references in this way. And, you know, so we have seen things come along in terms of connecting with supplemental material. We've seen wrongs righted around, well, like the math that was just mentioned; cross-references.
DAVID TURNER: The way we handle affiliations-- not exactly the same today as it was 25 years ago. And being able to address some of those things, I think, has made a big difference. Cleaning up DOIs-- as discoverability becomes more and more important, all these little things just make such a big difference. Accessibility-- I'm trying to think if we've done any purely accessible type of things.
DAVID TURNER: But there has been, just a lot of times when you're converting-- just getting that extra measure of tagging around certain elements makes it that much more able to be read by a screen reader and things like that. And we have seen like eBook productions and things like that from these projects as well.
BETSY DONOHUE: Thanks. Anyone else want to add on to that?
TRACY DENIEN: I'll just say that [INAUDIBLE]-- I'll just say along with what you said, David, you're always managing budgets. But certainly looking at the costs to do it during a migration versus backtracking once you're already on the platform, you really have to think about it because in a lot of cases it's worth the investment while you're doing a migration rather than trying to fix something once you're already on the platform.
BETSY DONOHUE: All right. And, Melissa, it looks like you had something to add as well.
MELISSA JONES: Yeah, so, just wanted to say a couple of things. The first thing is that a big part of our mission with SCJATS and SCBITS was to nudge people in the direction of better practices as much as possible. We really started that project so that content could be more reusable across multiple platforms, deposit services like Crossref and PubMed. So that is one big thing that I think helps in the migration is that we do have a much more restrictive spec than just everything that's available in JATS.
MELISSA JONES: Because you can do a lot of things in JATS, but not everybody agrees on the right way. So we try, as much as possible, to put the right way or as close as the right way as we know at the time into those specs to nudge people in the right direction during the conversion process. The second thing-- accessibility. I think I would like to see more. But I think that money does become an issue because it is a significant undertaking, especially when you're talking about hundreds of years of content and adding something that is really pretty editorial.
MELISSA JONES: Alt text that makes sense is not something that's easy yet for a machine to do. So we have some initiatives going on in the industry. JATS4R is planning to release a recommendation around accessibility, I think probably later this year. So I'm hoping that there will be some momentum around more publishers taking advantage of the migration process to implement some of those practices.
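As an illustration of the accessibility point, a migration is a natural moment to measure how much alt text is missing before deciding what to budget for. The following is a minimal sketch, not anything the panel described: it assumes JATS-style `<graphic>` and `<inline-graphic>` elements with an optional `<alt-text>` child, and the sample document and the function name `graphics_missing_alt_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute used on JATS graphics.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample; real archive content will be far messier.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="f1">
      <graphic xlink:href="fig1.tif"><alt-text>Line plot of signal versus time.</alt-text></graphic>
    </fig>
    <fig id="f2">
      <graphic xlink:href="fig2.tif"/>
    </fig>
    <p>Inline symbol <inline-graphic xlink:href="sym1.gif"><alt-text> </alt-text></inline-graphic></p>
  </body>
</article>"""


def graphics_missing_alt_text(xml_text):
    """Return the href of every graphic with no alt-text or only whitespace."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        if elem.tag not in ("graphic", "inline-graphic"):
            continue
        alt = elem.findtext("alt-text")  # None when the child is absent
        if alt is None or not alt.strip():
            missing.append(elem.get(XLINK_HREF, "(no href)"))
    return missing


if __name__ == "__main__":
    for href in graphics_missing_alt_text(SAMPLE):
        print("missing alt text:", href)
```

A report like this only counts gaps. Writing alt text that makes sense remains the editorial work described above.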
BETSY DONOHUE: Nice. Great points, Melissa. Thank you. And now, to take a little bit of a pivot. And we've been talking about technical stuff. What about communication, so broadly, non-technical strategies? The three groups participating on today's webinar-- can you summarize for us your strategy and your approach to communication and collaboration during the project?
MELISSA JONES: So I'll start with that because the hub of communications is actually a Silverchair ticketing system. So we have a migration ticketing system that both DCL and the publisher have access to. And they're able to talk to each other and talk to us. And we can all see what's going on. And we can chime in as necessary. And it enables us to communicate and see the history of the conversation, instead of getting pulled into an email thread that's branched off from another email thread.
MELISSA JONES: We can see the original problem and everything that's happened since then. And I think that created a lot of cohesion around communication.
RICHARD O'KEEFFE: I would agree with that because going back to see the history, what you mentioned, is really key. Because how many times, four months down the road, oh, what did we decide? And then you can go back and you can look at it. And in addition, there was also always the opportunity to speak with either a Silverchair staff member or with DCL staff in a separate call, if you had something very specific that a give and take through an electronic format might not have been sufficient to get the answer fast enough or to get the details that you wanted to.
RICHARD O'KEEFFE: You would always go back and want to document it in the thread as what was discussed, but you could, in order to maintain schedules have that opportunity to speak with one of the partners and get the proper answer as fast as you could, which was very helpful because sometimes, you just need to bounce off those repetitive follow-up questions so you can move on instead of waiting for people's schedules in a ticketing system.
DAVID TURNER: Yeah, and I'll add to that, that we also-- we try to spend a lot of time in communication on the front end of these projects. We probably wore Rich and Tracy out a little bit with all the questions that we asked at the beginning. But we do try to make sure that we get off on the right foot and that we've got all the details and that we're all in alignment. We do try to have regular communication along the way.
DAVID TURNER: And then, just between DCL and Silverchair, we're in constant communication with their teams at multiple levels. So we do think that's all absolutely critical in this.
TRACY DENIEN: Yep, I agree, David. A lot of people have meeting fatigue, Zoom fatigue. But in a migration like this, having constant meetings-- even with the ticketing system, which I agree, Melissa, was really key-- being able to have the in-person communication, where you could address certain questions, issues, flesh things out, that really is needed as well.
BETSY DONOHUE: Mm-hmm. All really good points. And then kind of a follow-on from that general communication question, and then it looks like we're getting some questions from the audience that I can switch to-- "What are some of the most important non-technical things that partners can do in projects like this?" Is it that face-to-face time, either Zoom or in person?
BETSY DONOHUE: Tracy, what is it?
TRACY DENIEN: That's definitely one area. I think also, whether you've done a migration before or you haven't, is trying to pull together as much of your information up front as possible. I mean, we did have a lot of information that we were able to share with both DCL and Silverchair up front. We had-- a lot of our requirements were already outlined. Having conversations with the vendor.
TRACY DENIEN: I know Silverchair, there are things that we asked for just because that was the way we always did it. Being open minded and listening to your vendor because they're working with multiple organizations as opposed to just us. And there are best practices that we may not really be thinking about. So it really does pay to listen to what your vendor is saying.
BETSY DONOHUE: Great point. Great point. Anyone else want to add to that?
DAVID TURNER: Yeah, just touching back on the content, I think it's been said a couple of times, but when it comes to doing one of these migrations, they are big, and they're time consuming, and they can be expensive. You should take advantage of that opportunity and be really strategic about how you do it. You're going to have more chance to be able to fix things, and enhance things, and make them the way that you want during an initiative like this, as opposed to, like Tracy said, doing it later and trying to backtrack.
DAVID TURNER: There are some platforms out there that when you migrate to them, they just simply want to move it over-- lift and shift. And one of the things I love about working with Silverchair is that they don't take that approach. They realize that if you're going to take the time and the money to move, you ought to do whatever you can to really take advantage of that.
DAVID TURNER: Be strategic. Get your content upgraded. And I think that then, the follow up to that is really plan for the time that it takes. Whatever you can do to not rush it-- I know occasionally Silverchair will get a deal where somebody says, oh, hey, my contract with my other vendor is ending in September. And it's June.
DAVID TURNER: You know, hey, I want to see if we can get this converted over really quickly. And that's probably a recipe for trouble, right there. So plan for the time that it takes. Don't try to rush it. And then I'd also say, it's a good idea to have a consultant. We do a lot of these with like a third-party consultant being involved, just sort of helping to manage the process. And there are some good consultants out there.
BETSY DONOHUE: Great. Awesome. Great tips. So let's shift for a bit to the questions coming in on the chat. So let me know who wants to grab this one, or everybody. "What are the panel's views on header-only versus full-text XML archives? Is the industry trending towards the latter?
BETSY DONOHUE: What are the cost trends?" Who wants to grab that one first?
DAVID TURNER: Well, I could talk about the cost. The cost tends to be a lot more expensive to do full-text XML. So when you're converting content to full-text XML, if you don't already have full-text XML-- much of what we did for AIPP was moving XML to XML. But if you've got PDF that you're trying to move to full-text XML, that's typically charged on a per-page basis, whereas when you're doing header-only, that's typically charged on like a per-article basis, or per chapter if you're doing books.
DAVID TURNER: And so it does tend to be more expensive to do the full-text. At the same time, full-text really gives you all the flexibility. It enhances your search. It gives you the ability to create multiple types of outputs. It allows your content to be mobile friendly, so instead of reviewing a PDF on your mobile phone, you can have reflowable content, if it's in full-text XML.
DAVID TURNER: I think that the industry is trending towards full-text XML. And I think most publishers are trying to create it as they go now. The bigger question is, what do you want to do on the backfile. And of course, at DCL, we want you to move all of your old things from the PDF to full-text XML. We're happy to help. [CHUCKLES]
MELISSA JONES: Yeah, I've seen this most often with smaller projects, where you convert from a PDF. You have one book, and it's only ever been available as a PDF, it really only makes sense to make it full-text. Otherwise, you kind of just get what you already had. Or maybe there's five or six books, and the cost implication is not as high as if you're talking about someone with thousands of book assets, millions of articles. Those tend to be more hybrid.
MELISSA JONES: Like we'll select part of our backfile to make that investment. But the rest we'll leave as full-text.
DAVID TURNER: We have seen a couple of publishers do things where, you know, if you look at their site, they have 10 years of full-text XML and then the next 10 years are PDF with XML headers. And then what they'll do is afterwards, over time they'll come to DCL and they'll say, hey, we want to maybe do another 10 years. And let's either take that to header-only, or they'll say, we're going to go back, and we're going to take the 10 years that are header XML.
DAVID TURNER: Let's get those to full text. And then let's expand the ones that we have that are not on Silverchair because they're PDF only. Let's add XML headers so that we can get them up as well.
BETSY DONOHUE: Right. Thanks for that detail. A little bit of a different area coming in on the Q&A. Generally looking back I think this is one for AIP folks. "Looking back"-- well, actually for everybody-- "looking back, what would you do differently? What are the key lessons learned in this project for future migrations?"
TRACY DENIEN: You go first, Rich. Then I can go.
RICHARD O'KEEFFE: Well, I would say, one, it goes back to the parallel publishing. I just don't think we allowed enough time for that, based on scheduling, resourcing that we had at the time. We got it done, but I would have liked to have a little more time to have our internal staff get familiar with the publishing tools here. We were working in a staging area. And it was coinciding with DCL still loading content in as we were in that mad rush to get everything ready for launch date.
RICHARD O'KEEFFE: And I wish we had extended a little bit more time for that. The other is, coordinate with your people responsible for the mockups and things like that, because on a mockup it's pretty straightforward. Oh, this is the author. This is the main title. This is the abstract. All that stuff is pretty straightforward. It's all the surrounding metadata to make sure you understand exactly where that's coming from-- there's a date.
RICHARD O'KEEFFE: OK, is it coming from a pub date tag? Is it coming from a history tag? Is it system-generated? Understanding where that is, because it's all the peripheral metadata, whether it's the full-text record or just the header record, that is so key and very noticeable by your end users, whether it's a table of contents setting, your tagging for licensing, copyrights, permissions, subject information that you might have in there, as well as how are your author footnotes?
RICHARD O'KEEFFE: What's the labeling style? What's all that? Those little details are what people tend to notice and find because the full-text is pretty much a dump of text. You have paragraph tags with a few stylistic tags, maybe math tags in there. It's not as critical for search and other functions on the platform as your other metadata. So I wish I had been a little bit more in tune with that early on.
RICHARD O'KEEFFE: And it sometimes can be difficult because the platform isn't built yet, and yet you're needing to make decisions on it. So understanding the mockups and engaging in a more in-depth conversation to understand how that data is applied on the platform will go a long way to helping perfect what your rendered output will be, as well as searching and indexing.
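The question of where a displayed date actually comes from can be checked against the XML before sign-off on a mockup. The following is a minimal sketch under stated assumptions: it assumes JATS-style `<pub-date>` and `<history>`/`<date>` elements, and the sample record and the function name `date_sources` are hypothetical. It cannot detect system-generated dates, which by definition are not in the source file.

```python
import xml.etree.ElementTree as ET

# Hypothetical header record; element and attribute names follow JATS.
SAMPLE = """<article>
  <front>
    <article-meta>
      <pub-date pub-type="epub"><day>27</day><month>06</month><year>2023</year></pub-date>
      <history>
        <date date-type="received"><day>02</day><month>01</month><year>2023</year></date>
        <date date-type="accepted"><day>15</day><month>05</month><year>2023</year></date>
      </history>
    </article-meta>
  </front>
</article>"""


def _iso(elem):
    """Build YYYY[-MM[-DD]] from child elements, stopping at the first gap."""
    parts = []
    for tag in ("year", "month", "day"):
        value = elem.findtext(tag)
        if not value:
            break
        parts.append(value.strip().zfill(2))
    return "-".join(parts)


def date_sources(xml_text):
    """List every date in the record as (source element, type label, value)."""
    root = ET.fromstring(xml_text)
    found = []
    for pub in root.iter("pub-date"):
        label = pub.get("pub-type") or pub.get("date-type") or "unspecified"
        found.append(("pub-date", label, _iso(pub)))
    for history in root.iter("history"):
        for date in history.findall("date"):
            label = date.get("date-type") or "unspecified"
            found.append(("history", label, _iso(date)))
    return found


if __name__ == "__main__":
    for source, label, value in date_sources(SAMPLE):
        print(f"{source:9} {label:12} {value}")
```

Comparing a listing like this with the date shown on a mockup makes it clear which tag drives the rendered value, or that none does.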
BETSY DONOHUE: Really great answer. Thanks, Rich. Anybody else?
TRACY DENIEN: To add to what Rich is saying, I mentioned earlier about customizations, is really, really thinking about what customizations are needed and trying to keep them at a minimum. Because the customizations add cost, time, and scope to your migration. So being thoughtful about what you really need. And, again, listening to what the vendors are recommending based on what other publishers are doing on the platform, that will definitely help with your timeline.
BETSY DONOHUE: For sure. And to add on to that-- I don't want to cut you off, but to add on to that, because I think it's right in line with where you're going-- we have another question coming in that said, "How do you make sure that you make the most of the opportunities of the migration, while sticking to a tight deadline? It's great to focus on quality improvements, but the reality is, there's a lot of pressure to complete a migration." And that's probably where you're heading.
TRACY DENIEN: Yes.
BETSY DONOHUE: Yeah.
TRACY DENIEN: And another thing, as far as if you do need customizations, it's really also understanding when you need them. Are they something that you need at launch, drop dead, you can't launch without it? Or is it something that you can live without and circle back post-migration? So I think that helps control-- also controls that time. Also having clear accountability.
TRACY DENIEN: I mean, I think we went from one migration where we had one person making decisions, and whether they were for good or for bad, it definitely streamlined things. With this migration, we did open it up to other stakeholders to contribute information. And I think that was very good in making sure that we met everybody's needs. But in hindsight I think, too, just having clearly accountable decision makers because otherwise some decisions tend to swirl a little longer than they should.
TRACY DENIEN: So those are some things that I would recommend.
BETSY DONOHUE: Those are excellent.
RICHARD O'KEEFFE: The circle-back point is very important because working with DCL, our primary responsibility was, they would highlight a discrepancy or disparity in the data and ask us, what do you want us to do? We would be able to tell them what to do. And they would go ahead and they would do it, which would mean that the content would be fine on the platform. But we had the benefit of them providing us with a listing of everything that was problematic.
RICHARD O'KEEFFE: We know the decision, based on a ticketing system history. And we could go back, and we can update our archive. And in parallel, we can update our transform so anything new coming out will abide by that new compliant rule. But yet, maybe for 40 years' worth of data that might need to be changed, we can go back and do that on our own schedule and not disrupt the schedule for the launch.
BETSY DONOHUE: Great. Excellent add-on, Rich. So we have one more follow-- actually, in the chat, a question going back very quickly to the full-text and header-only question. And the question is, "Are there any case studies comparing the monetization and profitability in that previous point?" And I think if there are, the unspoken part of this question is, can we direct folks to those case studies?
DAVID TURNER: I'll have to think about that. I bet I could find some information on that.
BETSY DONOHUE: OK. Great.
DAVID TURNER: That's also Guy. OK, Guy, contact me afterwards, and let me see what I can find for you.
BETSY DONOHUE: Yeah. Wonderful. Thanks, David. And then finally brief question, but we could probably talk about it for hours. There's lots of excellent answers and specific technical detail that was shared on this webinar. But if we could really boil it down organizationally, starting with AIPP, what have you gained from this migration?
BETSY DONOHUE: The features and the benefits covered, it's the next step-- so what does this allow AIP Publishing to do that they weren't able to do before? Tracy, do you want to start that one?
TRACY DENIEN: Sure. As I said, we have more control, which is a big thing for us to be able to continue to move the platform ahead. I think also for us, it's getting more access to our customer data. And what we're looking forward to taking advantage of is some of the robust tools and data capabilities and analytics capabilities that the Silverchair platform offers. So our first challenge was getting migrated and having a very professional looking site that was researcher friendly.
TRACY DENIEN: Now it's going to be about us taking advantage of more of what Silverchair has to offer moving forward, to continue to grow the platform.
BETSY DONOHUE: Wonderful. Great. And, Rich, your perspective-- anything to add there?
RICHARD O'KEEFFE: No, I would just add on to Tracy's comments, it gives us a foundation on which to grow and evolve with our material. One of the offerings that Silverchair has is that they have a tool known as Radiate that does content delivery. We only engaged with it for launch for industry standard delivery, such as Crossref and PubMed because it would have been too big a lift to deal with three dozen other customers and changing all the formats simultaneously with the launch and all the correspondence that's required for that.
RICHARD O'KEEFFE: But it gives us the opportunity to maybe explore and use that tool going forward as we evolve as an organization. So having that palette of tools available to us certainly increases our potential capability moving forward.
BETSY DONOHUE: Wonderful. Excellent. So we're coming close to the end. We have two minutes to go. But I know that Stephanie wants to close up and spend a little time talking about some closing comments. So I want to hand things back over to Stephanie. Thanks, everybody, for joining today. This was great.
STEPHANIE: Yeah, thank you so much. It was so hard to stay in the background and not want to jump in. This is-- you guys really nailed it. Great discussion. Thank you so much, all, for joining. Thank you to all the attendees for participating in the Q&A and chat. And I'm just here again to remind you that the recording of this will be available on the website later this week, along with the transcript.
STEPHANIE: I'll also be emailing it to all the attendees and registrants. And as well, one more plug. We'd love to see you in DC in September. The link for the registration and discount code are in the chat and will also be sent to you in a follow-up email. So otherwise, thank you so much to our speakers and to all the attendees who joined us for these virtual sessions.
STEPHANIE: It's been really nice to connect with people from all over the globe throughout the year at multiple points throughout the year, which has kind of been the benefit of having more virtual events in our lives in the last few years. So thank you again, everyone, and have a great rest of your day and week.
TRACY DENIEN: Thank you.
RICHARD O'KEEFFE: Thanks for having us.
MELISSA JONES: Thanks.