Name:
Data in the Driver’s Seat: Using Data to Steer Your OA Business Strategy
Description:
Data in the Driver’s Seat: Using Data to Steer Your OA Business Strategy
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/729ef50b-d44b-455a-a590-7d97607d0b7c/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H24M18S
Embed URL:
https://stream.cadmore.media/player/729ef50b-d44b-455a-a590-7d97607d0b7c
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/729ef50b-d44b-455a-a590-7d97607d0b7c/copyright_clearance_center___data_in_the_driver%e2%80%99s_seat__usin.mp4?sv=2019-02-02&sr=c&sig=6BmOsEMgvCugJirTBcyQdjAm7%2B1qCAD7p1dcXtqrFys%3D&st=2025-04-29T20%3A37%3A21Z&se=2025-04-29T22%3A42%3A21Z&sp=r
Upload Date:
2024-12-03T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
So excited that you've decided to kick off SSP with us. So thank you for joining today. I'm Jamie Carmichael. I'm Senior Director of Information and Content Solutions at Copyright Clearance Center. And I'm joined by my very close colleague, Shannon Revell, who is Senior Product Manager at CCC. A quick word on our code of conduct:
Today we welcome open and respectful dialogue. We do plan to have a question and comment period toward the end, so please share your thoughts and perspectives with us. OK, so before Shannon gets into the meat and potatoes of our presentation, I wanted to first share a little story. Over the weekend, my family and I visited New York City, and I've been to New York dozens and dozens of times, for business and for fun.
But this was my first time touring the USS Intrepid. This is a massive aircraft carrier that served in World War II and Vietnam, and I could not get over the size of it. This was really pretty special to my family, as both of my grandfathers and my husband's grandfather served in the Navy during World War II. While my seven-year-old kid was looking for artillery around every corner, I stumbled upon a data distribution room, which I found really intriguing. You can see how it worked in the picture.
On the left, radar dishes on top of the ship gathered signals about the location of enemy and allied aircraft nearby, and sailors would use the panels in this room, which you can see on the left-hand side, basically as a switchboard of sorts to route this information to different areas of the ship. One of those places was the combat information center, where sailors would then analyze the data, looking for threats and opportunities to inform their position and their military strategy.
And I know that in scholarly publishing the stakes aren't as high as this, but I can't think of a time when it has been more important to leverage data to inform how you're steering the direction of your business. This really resonated with me. More and more, we're seeing publishers invest in ways to collect, analyze, and distribute their data to take their businesses forward.
It's really becoming core to the business transformation that we're seeing across the industry. In the case of transitioning to open access, we see publishers considering whether the model that got them to where they are today is enough to transition at a pace their customers want, and on top of that, asking whether it's sustainable for their businesses in the long run.
Do they need a single model? Do they need to adopt a mixed-model approach? These are all good questions that the publishers we work with are asking. And of course, these are really big, heavy decisions. Publishers are not taking them lightly and are working really diligently to bring data into their strategic approaches. But what we're seeing is that the challenge lies in the quality of the data that should be used to inform this strategy.
More often than not, data are not structured or disambiguated enough to provide accurate, trustworthy direction, and certainly not at scale. So here's where we talk a little bit about our solutions in the scholarly publishing space. We have a newer product called Open Access Intelligence, which automates data enrichment, modeling, and analysis to help accelerate strategic objectives.
And we also have RightsLink for Scientific Communications. This is a core open access management platform which operationalizes both agreements and traditional author workflows to help transition OA programs at scale. The combination of these two solutions enables publishers to optimize their programs to meet the needs of their customers and authors, to trust the accuracy of their data through automated affiliation and funder disambiguation, and to gain strategic insights into their historical open access program.
This applies both to publication data for open access articles and to closed articles. And so with that, I'm going to turn it over to Shannon to dig into some use cases that we've been working through recently. Great, thank you, Jamie. So we have three use case examples that I'd love to walk you through today.
All related, but quite different. So in this first example of a publisher that is leveraging Intelligence, we have a large publisher. About 40% of their publication output is open access, so by today's standards, I'd say that's fairly significant. They've engaged in several transformative agreements over the last couple of years to drive that growth. But they are at a bit of a crossroads with their different models and how much further they can take them.
So they're interested in possibly some more transformative agreements, really scaling that program. But at the same time, they're also considering other business models like Subscribe to Open. Either way you cut it, whatever way you're going or think you might want to go, understanding publication output by institution is a critical component.
But if you're sitting in this room and you look at any sort of article-level data, you probably understand that the affiliation information on publication records and articles can be quite sparse. This particular publisher was facing a number of issues as they were analyzing their publication data, trying to see it by institution. These are things that seem really simple, but when you're dealing with them at scale across thousands of records, they take a lot of time to rectify.
A great example: this publisher was not a client of any sort of persistent identifier. They did not use Ringgold IDs or anything like that, and so a lot of their affiliation information for each article was entered as free text by the author at submission. So you might have something that says Massachusetts Institute of Technology.
On the next record you have MIT, and on the third record you have MIT Department of Engineering. With the human eye, you might be able to go through and work out that all of those are MIT, but across thousands of articles, that's a big, time-consuming effort. On top of that, the human eye really can't leverage other pieces of information that might validate that affiliation further, things like the author's email domain, or, if they are an APC or formerly APC-model publisher, information about who actually paid that APC invoice.
That gets really hard to rectify with the human eye. So with Intelligence, we were able to double the records that they had disambiguated by eye, and we were able to do that in about an hour. So a lot faster, with a lot more results. Each one of those records also leveraged additional pieces of information that their manual disambiguation effort had not, things like email domains, subdomains, and APC invoice information, like I said.
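[Editor's note] The matching idea described here can be sketched in a few lines. This is a minimal illustration, not CCC's actual engine: the alias table, email-domain table, and confidence labels are all invented, and the real product draws on the Ringgold database and many more signals.

```python
# Illustrative sketch: normalize free-text affiliations against known aliases,
# then use the author's email domain as a second, independent signal.
# All data below is made up for the example.

ALIASES = {
    "massachusetts institute of technology": "MIT",
    "mit": "MIT",
    "mit department of engineering": "MIT",
}
EMAIL_DOMAINS = {"mit.edu": "MIT"}

def disambiguate(affiliation_text, author_email):
    """Return (institution, confidence) for one publication record."""
    name_match = ALIASES.get(affiliation_text.strip().lower())
    domain = author_email.rsplit("@", 1)[-1].lower()
    root = ".".join(domain.split(".")[-2:])   # eecs.mit.edu -> mit.edu
    domain_match = EMAIL_DOMAINS.get(root)
    if name_match and name_match == domain_match:
        return name_match, "high"             # two independent signals agree
    if name_match or domain_match:
        return name_match or domain_match, "medium"
    return None, "none"

print(disambiguate("MIT Department of Engineering", "alice@eecs.mit.edu"))
# → ('MIT', 'high')
```

The point of the second signal is exactly what Shannon notes: a human reviewer can eyeball name variants, but systematically cross-checking email subdomains or APC invoice data across thousands of records is only practical in an automated pipeline.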
Furthermore, I think that this particular publisher had a kind of special outcome from this exercise. We got a lot of different people from different areas of the organization in the room to talk about data. In RightsLink for Scientific Communications, when we're helping publishers operationalize deals, we see a lot of operations folks and sales operations folks doing the best that they can with the data that they have when a salesperson comes to them and says, hey, how much publishing are we doing with MIT?
So they're doing their best. And by bringing decision makers into the conversation about standardizing their data so we could disambiguate it, we were able to gain access to other experts within their building and see that there were other pieces of data that we could use. So in our next iteration on this particular project, we are expecting even more results, because they have found that they have some other information in little alleyways internally.
So that's our first project. Now, our second project is similar, but the publisher was trying to ask a very different question. This is a small society publisher with very little open access output today. Their question was: should we engage in any sort of agreements? Do we need to? And if so, where do we even start? Their angle was to take a look at their author information.
And this, for a small society publisher, was a huge undertaking. If they looked at the past couple of years, they had almost 100,000 records; it was actually closer to 150,000. So if you think about a manual exercise, with similar challenges around free-text entry, it was just not something that they were going to be able to do. In addition, they were really interested in getting their sales team out of Excel spreadsheets if they're going to go through this exercise and have normalized data. So with Intelligence, again, this was a trial run on a test set of their data; we're actively engaging on this project, but we have some really exciting early results. On their test data set, we were able to match 90%. This is actually a Ringgold licensee customer, so they tend to have, especially on more recent records, better efficacy in the collection of those normalized names.
And so we were able to really connect the dots with their historical publication records and create a better output. What's possibly more important for this publisher: once they saw these results and realized that we're going to be able to get them to a much better data set, I think the realm of opportunity really opened up. So now we're looking at our visualization tools within Intelligence and getting their team out of Excel spreadsheets. They'll be able to actually visualize their institutional publications by journal, so they can, with just a couple of clicks, figure out which institutions are producing the most publications with them and perhaps target those institutions for some type of deal. OK, now our third example really switches gears, out of analytics and into operations. So let's say you've made the decision that you're going to engage in an agreement.
It could be a transformative agreement, some other open access agreement, or Subscribe to Open. But along the way, you still need to understand where your publications are coming from and what the institutional affiliation is. As Jamie mentioned, RightsLink for Scientific Communications is a product of ours that has been helping publishers manage and operationalize really their whole open access program, but particularly their transformative and other agreements, for about five years now, with a little over a dozen publishers, I think. And we have always supported whatever identifier the publisher has adopted, if they've adopted an identifier.
We love that, no matter which one it is. But we can also lean back on email domains as a sort of matching criterion to make sure that, in real time as articles are accepted, they are going down the right path based on the funding options that are available to them for their open access. And in managing all of this, we realized early on that the relationships within a university system are of critical importance.
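[Editor's note] The real-time routing described here, identifier first with email domain as a fallback, might look roughly like this. The agreement table, IDs, and workflow names are illustrative assumptions, not the actual RightsLink data model.

```python
# Illustrative sketch: route an accepted article to the right funding path
# using whatever identifier is on the manuscript record, falling back to
# the corresponding author's email domain. All entries are invented.

AGREEMENTS = {
    # keyed by (signal type, value)
    ("ringgold", "2094"): "University A read-and-publish deal",
    ("domain", "univ-a.edu"): "University A read-and-publish deal",
}

def route_article(ringgold_id=None, author_email=None):
    """Return the agreement an article falls under, or the default path."""
    if ringgold_id and ("ringgold", ringgold_id) in AGREEMENTS:
        return AGREEMENTS[("ringgold", ringgold_id)]
    if author_email:
        domain = author_email.rsplit("@", 1)[-1].lower()
        return AGREEMENTS.get(("domain", domain), "standard APC workflow")
    return "standard APC workflow"
```

The ordering matters: an explicit persistent identifier is a stronger signal than an email domain, so it is checked first, and anything unmatched falls through to the standard author-pays workflow.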
If you are engaging in any sort of negotiations around this, you might know that as publishers and institutions are, I think, getting better at sharing data, they are also able to get more intricate with the terms of their deals. And so we're seeing that within a university system, not all schools, departments, et cetera are eligible for coverage under a deal. It could be something like a university hospital.
Oftentimes research that comes out of a university hospital can be privately funded, and we're at a time where we really need to be looking at these other funding sources within this ecosystem. So we will be working within RightsLink for Scientific Communications on a new feature that's going to enable our publishers to actually automate around these very intricate deal terms.
We've always supported Ringgold IDs on the platform, of course, well before CCC acquired Ringgold. But now that we are all part of the same family, we are able to do a lot more with this ID. So, coming probably in the fall, we will be able to expose the whole hierarchy to our publishers when they are setting up the terms of their agreement, so that we can then automate that throughout the year. They'll be able to actually carve out those exception cases and ensure that, although a university is eligible for unlimited open access publishing, we can carve out a university hospital if research is coming from there, and ensure that we gain access to potential private funding.
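[Editor's note] The carve-out idea is essentially a walk up an organizational hierarchy in which an explicit exclusion beats parent-level eligibility. The sketch below uses an invented hierarchy and deal; the real feature would operate on Ringgold's hierarchy data.

```python
# Illustrative sketch: a parent institution is eligible under a deal, but a
# specific child in its hierarchy (e.g. a university hospital) is excluded.
# Hierarchy and deal terms below are invented for the example.

HIERARCHY = {
    "University B Hospital": "University B",       # child -> parent
    "University B School of Arts": "University B",
}
DEAL = {"eligible": {"University B"}, "excluded": {"University B Hospital"}}

def covered_by_deal(org):
    """Walk up the hierarchy; an explicit exclusion beats parent eligibility."""
    node = org
    while node is not None:
        if node in DEAL["excluded"]:
            return False
        if node in DEAL["eligible"]:
            return True
        node = HIERARCHY.get(node)
    return False
```

Because the exclusion check happens before the eligibility check at every level, research from the hospital falls out of the unlimited-publishing deal and can be routed toward its own (possibly private) funding source, while the rest of the university system stays covered.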
OK, so the other thing about institution levels and the university system: it comes back to the agreement modeling piece. This is an additional area that we will be working on within Intelligence, to also enable modeling at these different levels. So if you do need to start carving out university hospitals, or thinking about what that might do to a given deal, what impact that might have.
The modeling side of the coin will have similar functionality very soon. OK, so before we get to our closing remarks, I just wanted to pause and see if there are any questions in the room. We'd love to have a conversation. One in the back.
Hi, yes. My question about Intelligence was whether you have challenges with geographic differences in affiliations. Do you find that your system is better at matching things in, say, North America than in Europe or Asia? That's a great question. So we leverage the Ringgold database, of course, as part of our whole matching algorithm.
The ultimate output of our disambiguation is a parent-level organization with a Ringgold ID and a confidence level in our match. So that's what we can achieve. And, you know, geographically, I think it really depends on the Ringgold database and what's available in there for various levels of an organization. Certainly within Europe and America, that database is really, really healthy, and it continues to grow all the time.
And so that is extending east, for sure. Any other questions? Come on, give me one. OK, please. I have two questions. So first, do you also support ROR? ROR matching?
Yeah, yeah. So within our tool that actually operationalizes agreements, no matter what identifier a publisher wants to use in their manuscript metadata, we can support automation around that. So that includes a ROR ID. In the past it's been GRID IDs; we have some publishers using those, or they used to before they merged.
So yes, we can support both on the operational side. Great, so you can support that automatically, without further training the model? Yes, that's right. Within the actual management of an existing deal, and in modeling a new deal, we would be mapping information through our disambiguation engine to an ultimate parent-level Ringgold organization. So that's consistent on every record. I see.
Thanks. So, last question: do you have any statistics or results about the accuracy of your disambiguation engine? Yeah, so it really depends on the publisher and what their data collection looks like. But we have run this across over 500,000 manuscript records, and we were around 95% matching on all of those. Now, this is working with clients that we've worked with for years, that we've been helping think through these data cleanup exercises over the years.
So, I mean, the match rate will depend on where you start and what you implement, for sure. Thank you very much. Hi, hi. Is the disambiguation output delivered just by Excel, or is it an online dashboard? How is it delivered? Yeah, it's a really good question. Thank you for asking it.
So Intelligence is really two key parts. There's the disambiguation engine: when we get data into the tool and we are actually trying to normalize the institutional relationships, we actually also do currency normalization as well, maybe a little bit less critical at this point. But if you need to be able to look at publication payments that have been made in the past, it's nice to have that normalized.
And then the big piece of this tool, if I kind of go back here... here you go. Here is a screenshot of what is actually shown once we have your data in the tool. I know it's teeny tiny, but there are really important things in here. There are a number of different filters. Once we have actually disambiguated your data, you can go in and filter your data by institution, using the name of the institution.
If you don't use Ringgold, or searching just by the Ringgold ID if that's something that you have access to. You can filter it by journal, by different date ranges, all sorts of different things, to ultimately get a visualization of, you know, a breakdown by institution, by journal, by author country, by the CC license that it was published under. So the more information that we can get, the better. But the key piece around the disambiguation is the institution.
Yeah, but visualization is a key component too. Hi, I was just wondering, is there any way, or are there any plans, to include opportunities to put pricing in here and then be able to model that going forward? Yeah, absolutely. These are great questions. I couldn't have planted you myself.
So, you kind of can't really see it, but up along the top here, you can see there are three dots along the way. Once your data is in the tool, Intelligence breaks down modeling an agreement with a new institution or a consortium, whatever it is, into three steps. The first is called prepare, and that's really all about looking at the historical publication data that we have now normalized.
And so that's what this piece looks like. Once you are happy that the historical data you're looking at is reflective of the relationship you have with that institution, and maybe of the journals that you want to include in a deal, then you move on to the next step, which is build. Within that step there's a bunch of projection tools, different parameters that you can enter. So things like payments: if it's a read-and-publish deal, there are areas to enter subscription payments as well as APC changes, discounts on APCs, and publication growth.
If you're going to start flipping journals, flipping from pure to hybrid, there's a bunch of tools for that. Yeah, absolutely. Anything else? Goodness gracious, one more. And the third step is analyze, which is to put the history and the future side by side and be able to compare what is going to change over the course of a potential new deal.
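[Editor's note] In the spirit of the build step's projection parameters, a toy calculation might look like this. All parameter names and figures are invented for illustration; the actual tool models many more variables.

```python
# Illustrative sketch of a build-step projection: take last year's output and
# apply assumed growth, APC, discount, and subscription parameters to estimate
# one year of a potential read-and-publish deal. Numbers are made up.

def project_year(articles_last_year, growth_rate, apc, apc_discount,
                 subscription_fee):
    """Estimate projected article count and total deal cost for one year."""
    projected_articles = round(articles_last_year * (1 + growth_rate))
    publish_cost = projected_articles * apc * (1 - apc_discount)
    return {"articles": projected_articles,
            "total_cost": publish_cost + subscription_fee}

print(project_year(120, growth_rate=0.10, apc=2000, apc_discount=0.15,
                   subscription_fee=50_000))
```

The analyze step then amounts to laying this projection alongside the normalized historical data from the prepare step, so both parties can see what actually changes under the proposed terms.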
And then, coming back to the question from three questions ago, that's also where you can actually pull out an Excel export of the data that was used to create that model. What we imagine, and what we see some publishers doing, is using that as what they share with an institution for negotiation. We have followed the article list template as our output in Excel.
We had a bunch of different conversations with mainly consortia administrators across the globe, and if there is a market standard, I don't think there really is one quite yet; the market's working on it. But that seemed like a really good place to start. It had a lot of information that those administrators needed. We have added to it a little bit, so we call it ESAC-plus; it has a number of additional fields that, in our conversations with those admins, came to light that we should add.
Thank you for asking that question and letting me finish the three-step process. Awesome, OK. Any other questions? All right. If you take nothing else from this session, please let it be this: if you're looking at your historical publication data and you're thinking, I can't do anything with this, this is going to take us way too much time.
Not all hope is lost. There are enrichment capabilities out there, and we would love to talk to you about them. The second piece: if you are in any sort of position of power and decision making at your organization, the earlier you start, the better. We've been helping publishers operationalize these changes over the years, and it can be a mess downstream if you're not thinking about data first.
We're happy to help you untangle that mess, but we would all prefer to start off with the right data from the beginning. So think about that, talk to your teams, and you may be able to find some information that you didn't realize you had. All right, thank you so much for coming to our session.
And enjoy the conference.