Name:
Are You Thinking Enough About Your Platform Data?
Description:
Are You Thinking Enough About Your Platform Data?
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/6c9d0cb0-a92b-4dfc-8281-15093ed8472a/thumbnails/6c9d0cb0-a92b-4dfc-8281-15093ed8472a.jpg
Duration:
T00H47M52S
Embed URL:
https://stream.cadmore.media/player/6c9d0cb0-a92b-4dfc-8281-15093ed8472a
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/6c9d0cb0-a92b-4dfc-8281-15093ed8472a/7 - Are you thinking enough about your Platform Data.mp4?sv=2019-02-02&sr=c&sig=JPW0XMs1ya4h9vCMjEx8eFm1yPqScdFTotzpXAv5XZk%3D&st=2024-11-21T08%3A57%3A29Z&se=2024-11-21T11%3A02%3A29Z&sp=r
Upload Date:
2020-11-18T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
WILL SCHWEITZER: So we're going to start with some just very basic questions kind of building up into a more nuanced discussion of data and what it means in our industry. So, Ned, what do we mean when we talk about data?
NED MAY: Oof-- [LAUGHTER] Well, I think to start, we've been talking about data all day long. So it's interesting to have a discussion at this point that is pulling it out and trying to put some confines around it. But I think in this context for what we're talking about, we're really talking about data-on-data or data-on-content. So facts, information that is not information.
NED MAY: But it's unstructured. It's raw. It's unorganized. And we're looking at it, and we're trying to discern what the value, if there is value in it, and what to do. So we heard a lot today about structuring it. So we're taking it back. And we're saying, this is the pile of facts that we can see and discern on information, whether that's a data set, or content, or even a marketing initiative.
NED MAY: So we're looking at that aspect of data, I believe, in this discussion.
WILL SCHWEITZER: Right.
NED MAY: So if I'm on the wrong track--
WILL SCHWEITZER: No, I think you're on the right track. And if not, we're going to go with it anyway. And one of the reasons we asked Ned to join us on stage is he actually brings a lot of perspective from broader industries and what makes for a successful data product. So are there generalizable rules about the quality of data or the scale of data really to be entering into this space?
NED MAY: No. So as background, you know, at Outsell, we look across what we call the data, information, and analytics industry. It used to be easier because we could call it publishing and media. It spans from newspapers to STM to legal, tax, and regulatory. We have been watching this emergence of data and the business of data for quite some time. And one thing we've learned is that it really starts with the individual and particular need of the enterprise and what they're trying to accomplish and how they're trying to serve a particular and a very narrow client set.
NED MAY: So to say, are there some rules in general is tough. There are. We have a structure and a format that we call the data-value pyramid. But I would have to take every example and walk through and say, where are you in your own continuum on this? Where does the value lie in that data? Is it raw and unstructured, or is it highly predictive?
NED MAY: And that will dictate really some of the processes you want to put in place to monetize that. So a lot to unpack with that question.
WILL SCHWEITZER: Right, so and for those of you who haven't seen Outsell's data-value pyramid, it looks like a food pyramid. But I think it might be helpful if you talk a little bit about in the existing information market today, what's at the top, and what kind of data creates the most, say, financial value or return, just so we have a sense for scale of the universe.
NED MAY: Well, at the top it's-- you know, we heard a lot about AI today and some of the limitations. So it's still a bit dreamy at the top in terms of prescriptive analytics where not only can it tell you what you're going to do, but it can actually influence you to do it. So at the bottom of that pyramid is this raw unstructured content. Then there's smarter content that is tagged as many have been talking about already today.
NED MAY: And it's able to be served in a much more meaningful way to meet the needs. But then as you move up that, there are ways to determine greater insight out of that and really understand what's about to happen. And then, again, that holy grail top is to influence you to get you to do something.
WILL SCHWEITZER: Right, and could you just name a company or product that is at the top of the pyramid just as a way of helping us. Is there someone like Dun & Bradstreet? Is that a--
NED MAY: I mean, in the financial services space is where we're seeing a tremendous amount of experimentation with some of this. But that's a great question, not one I came prepared to answer.
WILL SCHWEITZER: That's all right. [LAUGHTER]
NED MAY: Who is at the top of the pyramid today? I'll have to think about that.
WILL SCHWEITZER: OK, this is the danger of agreeing to do sessions like this with me is that I don't stick to the script so--
NED MAY: Yes.
WILL SCHWEITZER: --apologies. So, Lauren, bringing this into the scholarly publishing context, some of the bigger players in this room think a lot about their data. They may have data products. They may have even a CIO. But for an independent or society publisher, what should they be thinking about?
LAUREN KANE: Well, I think everyone irrespective of size needs to be cognizant about what's going on with data. You know, as we see, as you say, roles are changing. Elsevier, that we all know as a publisher, instead now brands itself an information analytics company. So what does that tell us about the value chain, and the way it's changing, and the directions that money is going, and the way products are being monetized? So for societies that don't have the scale or the resources of Elsevier, there are still things that can be done.
LAUREN KANE: And even though there are so many more questions than answers right now for most, there are still practical starting points. And I think the first thing to do is really think about what do we mean when we talk about value? Well, it's internal value of data meaning to a specific society or organization. What can the data tell them about the ways that they might innovate, about the products that they might create, about insights that they can have into their customers, into their users?
LAUREN KANE: And certainly there is internal value there. But then also, where is there external value? How are those same insights, how is that same information potentially valuable downstream? What sort of information right now in the current value chain is being provided to third parties perhaps not with any sort of financial return? In the case of, say, abstracting and indexing vendors, in the past content was provided to that third party, and the return to the society was discoverability.
LAUREN KANE: And there certainly is a value in that discoverability. But in this changing value chain when perhaps that third party is then monetizing an aggregated product, is discoverability then enough? Or should that society be getting more in return? There's not an easy answer to that question. But I think it's a question that everyone should be asking themselves. I think the biggest thing going forward is that you may not as an organization be prepared at this point in time.
LAUREN KANE: You're busy with so much. You're dealing with open access still. So don't talk to me about data. And so you might think we're not in a position to build product around data. We don't know what we want to do with this yet. And yet, just because that is the case, you don't want to shut the door to the future. You don't want to give up rights now.
LAUREN KANE: You don't want to give up monetization ability. And so just asking those questions now and getting to a point where you're thinking about what you might do later I think is a good starting point for societies.
NED MAY: Yeah, I would add to that. And we saw this in some of the other spaces is experiment and experiment carefully. So put contractual terms in place. But make sure they're limited. Make sure you're protected, right? Recognize that the rules are likely to emerge, and you'll understand those ways to make value near term. So don't overcommit. But don't hesitate to get involved too.
LAUREN KANE: Good advice.
WILL SCHWEITZER: So, Ned, do you have any other practical advice for somebody in this room who may be thinking about a data product or, say, brokering their data?
NED MAY: Where do we start? So I mean, there's very tactical concerns around push-versus-pull, the technical setup of the data. We haven't heard much about Cloud and where to put it, how to position it. We could go into that. But on the business side, I would say, again, experiment. Understand that for you that piece of data is extremely valuable.
NED MAY: It's the most important thing you've just discovered. It's the future of your business for some where we heard earlier on the pressures on Direct Publishing. And that may be true. So make sure you understand the commercial value you're applying to it as you share it with the A&I. But don't overreach too. Understand that it might not be as valuable, right? It may be just a piece that you need to play.
NED MAY: A theme we talk about for this year is going to be the iceberg economy. And we heard about personalization earlier. And there's tremendous need to deliver to individuals the information they want. But also underneath that is this massive effort. And at times, you're just a participant in that smaller ecosystem. So be flexible in your mindset.
WILL SCHWEITZER: Right, so I'm going to jump ahead a couple of questions because we brought up this notion of value, which I think is closely related to how folks in this room may think about assigning a price to their data. So we're really good at figuring out how to price a journal, or how to price a book, or, perhaps, how to approach third party licensing. And it seems like a lot of data products-- or when you're aggregating data, you're really talking about derivative value.
WILL SCHWEITZER: And the value is created one step away from providing information from the content you publish. Any suggestions on how to think about price?
NED MAY: A couple-- [LAUGHTER]
LAUREN KANE: --carefully-- [LAUGHTER]
NED MAY: --you know, we had a great session before. I think it was Coleen and the marketing. And there's examples in the space of marketing where we're seeing these data marketplaces emerge where-- the acronyms aside-- where people are collecting data about consumers so that we can track you every moment of the day and serve up that ad that follows you across every moment of your day and gets really annoying over your shoulder all the time.
NED MAY: But it's really effective. And they're really good. And the tech startups in Silicon Valley have created these companies that just do that. Along come the big platforms-- Oracle, SAP, Adobe-- and they're saying, that's nice. But we're just going to do that for free. So there's a lot of effort that's gone into creating a marketplace for this data.
NED MAY: People think they're going to get rich off this data and then large giants come along and say, no, we just want to keep you on our platform. And we're going to do that just as a cost of doing business. So there's no real price to be associated with the marketplace itself. That's kind of the extreme. And the other extreme, Will, is Alt Data in financial services where there is no price going in because the-- and some of these buildings here are filled with servers, not people, processing this data.
NED MAY: That's what's nuts about Wall Street today is that server space is more valuable than people space. Because they're crunching this data. And they're going out to companies. And it's the notion of Alt Data. And they're saying, we don't know if there's any value-- and this is the hedge fund traders-- but give us all your data. Let us process it.
NED MAY: And if we reach alpha, if we determine that there's actually some signal in there that we can profit from, then we'll talk about what we're going to pay you for that data. And so there's this massive leap of faith involved. But if you don't give them the data-- and it's odd data sets too. It's satellite images of cars in mall parking lots. It's weather patterns. It's not things you think of.
NED MAY: But they're ingesting all this to try to determine whether there's any value. And then they're setting up this notion that if there is, we'll come back and talk terms.
WILL SCHWEITZER: Right, so a key point there was this leap of faith. And publishers usually aren't really good at leaps of faith particularly in handing away our IP to somebody else and then letting them come back to us with a price. Lauren, where does that take you?
LAUREN KANE: Well, I mean, I find it fascinating. I mean, I'm certainly a learner in all of this. So listening to you, Ned, and in the previous conversations we had preparing for this discussion, I think that we are all new to this, to data, in this industry. Not all of us, but let's say most of us. And I think there's a lot of lessons that can be learned from other industries and from what you're saying.
LAUREN KANE: And what struck me with what you just said is this idea-- not that we should be going out to people and saying, what do you want to pay me for this-- but is the idea that there is going to be a community approach to assigning value. I think that as we collectively learn what we have, who the customers might be, what the market segments might be that there is going to need to be some kind of transparent discussions among organizations and players about what is the proper price or value for this?
LAUREN KANE: How is this going to fit into the economic mix of the future? And I wish I had the answers to these things. But again, I think these are questions that not only should individuals, societies, and organizations be asking, but these are things we should be asking in community, again, to try to find that mystical, sustainable ecosystem that we all keep talking about.
WILL SCHWEITZER: So we've touched on essentially wanting the right business model framework, potentially the right contractual framework, the right partnership model for data products or data strategies. And, Lauren, I think in our preparation discussions, you've raised that publishers are sitting on a lot of data. Societies are potentially sitting on a lot of data. But we don't actually, say, have a clear ethical or legal understanding of what we can do with it.
LAUREN KANE: Yeah, and I mean, the contractual piece is really the tough one here. I mean, we're living in a post GDPR world. This is all something that we are rightfully very cognizant of. And I think if we were sitting here, you know, from a corporate perspective, if we took this from just a legal burden, we might say as long as findings are anonymized, then we're fine.
LAUREN KANE: This is not something that we should worry about. But I think coming from the nonprofit perspective, from the society perspective, when our customers and our users are also members of our community, it's not just a legal burden. It's a moral, ethical burden about what is appropriate and fair to do with these customer and user insights and not just to amass that data. But what does it mean when I then take those insights and derive revenue from them?
LAUREN KANE: Is that appropriate? So again, these are questions to consider. There is not a right or wrong. But I will say from a licensing perspective, most societies are strapped and are using older versions of subscriber licenses, of user licenses. And most of them do not account for the way that we, again, amass data let alone monetize data.
LAUREN KANE: And so again, not wanting to shut doors on future products or future pathways, I would just encourage everyone as much as possible when you are updating your licenses to make sure that you put in a framework to allow for you to go in those directions in the future. Maybe you don't end up actually doing those things, but it'd be nice if you had the rights to.
WILL SCHWEITZER: Ned, you look like you--
NED MAY: Yeah, and I'd just add to that maybe throw some more water on the fire here. One thing I haven't heard discussed at all today is security and privacy and to your point around GDPR. And if you go down that path of monetizing that secondary information, you're going to open yourself up to more and more scrutiny around that. And the financial burden of maintaining an effective and secure information site is quite high.
NED MAY: And it's continually getting harder and harder. So there is some sort of goodwill that's acknowledged around membership organizations that may not be acknowledged if you're suddenly trying to monetize that membership in a different way.
LAUREN KANE: And I think that's an excellent point too just in terms of as part of these baseline discussions and discovery periods in your organization. A lot of this is a cost calculus. This is not something that happens overnight. And this is not something that happens without cost, the costs of looking into these licensing issues, looking into legal limitations to figuring out from an infrastructure perspective how is data being stored?
LAUREN KANE: What formats is it in? Are there ways to package this? There's a lot of cost in here. This takes a lot of resources. So if you are not an organization in a particular market or market segment where you're going to be able to readily recoup that investment, then perhaps this is not the path for you. But again, I think having at least a baseline audit of what do we have, what is the value, what would the cost be, that's a discussion worth having.
WILL SCHWEITZER: So, Ned, hearing that our industry-- and I think it's reasonably generalizable-- may not have what you could consider usable, end user licenses or subscriber licenses to have access to the data, is that something other industries have seen? Is it signs of, essentially, a maturing market or a maturing data structure?
NED MAY: Well, certainly on the consumer tech front-- I mean, we know the 30 page licenses that we have to click through to use our phone every day. So yes, but I'm not sure it's any better off. I think what it reflects is that there's not a strategy around data in place yet. So we heard a lot about unstructured data and how to gather this. We know there's insight.
NED MAY: But the value will come once it's a little more thoughtful in what we're capturing. So we heard today about you don't have the right data. And that's because you haven't been getting the right data, right? So you need to start with the questions, work backwards, and slowly build up on the data set that you need rather than trying to structure, rather than look at the license, see what you can peel out of your user engagement today, and make a case for data.
NED MAY: It needs to be much more thoughtful in the approach. So the license will be a piece of that.
WILL SCHWEITZER: Right, and I think this is a dangerous question that wasn't on the script. But it sounds like a lot of companies are essentially operating in a gray area. Is there any guidance for essentially thinking about what is the legal risk?
LAUREN KANE: How do we operate in a gray area? Tell us. [LAUGHTER]
NED MAY: Well, this [? stay ?] beyond this industry. There are companies that have built themselves operating in that gray area, I mean from regulation and license. This strikes me as a rather pragmatic and conservative piece of the data and information industry, the scholarly publishing by nature. So there are a few examples-- you probably know them better than I-- of those that are pushing the limits that have set up licensing agreements that don't openly acknowledge their intent for the data but are looking to monetize that.
NED MAY: I don't know if we would consider that gray or just opaque, right? [LAUGHTER]
WILL SCHWEITZER: Excellent. Lauren, you mentioned earlier, essentially, a publisher or society thinking about, say, the content, the license where they contribute to an A&I service. And I think we've talked about the value of, essentially, well-structured data. And we know that well-structured data, well-structured content corpuses have a lot of value, say, as training sets for machine learning applications. And I guess it isn't clear that a lot of publishers could essentially get a financial return for that.
WILL SCHWEITZER: But I guess the question for both of you-- Ned, are there examples outside publishing where companies are getting a non-financial benefit from contributing their data to a larger universe or to another organization? And I guess after hearing that, Lauren, I'd be interested to know if you think there are some examples or some ways to apply that here in STM.
NED MAY: Yeah, if you want to start--
LAUREN KANE: Sure.
NED MAY: --I'll work on mine.
LAUREN KANE: So I think it's a good starting point because any organization however small-- you hear this all the time, well, we don't have data. Well, you all do because you have metadata for your content. And you spend a lot of money on it. It is in, as Will said, highly structured formats that cost a lot to prepare. Much of it has meticulous author information, article information, things that, as Will said also, will have a value to machine learning, to third parties in a variety of different ways.
LAUREN KANE: And so again, in the past, that has often been provided freely to different discoverability engines thinking that the benefit to the society in providing that was that users would be driven back to the full text content. And it would drive sales. And that's 100% true. But again, in thinking that now in this new kind of value chain that those third parties, the Scopuses, the Digital Sciences, the Clarivate Analytics-- just listing my future enemies from saying this-- [LAUGHTER]
WILL SCHWEITZER: Just don't look at Daniel, it's fine. [LAUGHTER]
LAUREN KANE: --you know that they are deriving value. They are monetizing this in the aggregate with other products. And so there is a question of what should be received in return? And I am not suggesting that there should be a payment for this, a financial payment. But there are some that think that perhaps there should be some sort of quid pro quo that the value received is more than discoverability. And for some, it might be a discount on a product that this is being used for.
LAUREN KANE: So or it could be specialized findings, the example we talked about, or surveys. So we all the time in this industry will answer responses or calls for surveys. And perhaps I just did one the other day for executive compensation. And in exchange for me providing my information, what I got in return was an aggregated summary of peer information and executive compensation benchmarking.
LAUREN KANE: And I got that in return as a value for my providing information where others that didn't provide information would be charged for that product. So for me, that's a great analogy that what we want to think about, what specialized benefit can we get for the information that we provide if that information is going into an aggregated product that is then being sold?
LAUREN KANE: So it is obviously going to be highly dependent depending on the size of your information, what market segment you're in, who you're dealing with. But again, I think it's worthwhile to have these discussions.
NED MAY: So I'd like to add to that because a couple examples did come to mind in the interim. One, it's been nice to hear everyone talk about Google today because 10 years ago, it was not mentioned. But what Google did to the newspaper industry was pretty similar in terms of that exposure. The difference was all the newspapers came and put all their content up for free, in readership. So it wasn't just traffic. But they thought they were going to monetize those little digital ads and that would supplant the indirect or the lack of direct traffic that they got.
NED MAY: That piece of content was a much higher value to their own business than the secondary information. So they've pulled back from that. They still love Google because it drives traffic. But now they have paywalls up. They limit-- you can see one or two articles. But they have worked out the business rules around monetizing that traffic. So I think that balance will come.
NED MAY: I would be cautious with looking at an A&I database and saying, you need to pay us more. And I don't think you were saying that. Waze on the consumer front is another great example, the traffic apps. We passively give our data to it for an immediate payback of a better route to work or a shorter commute. So there's--
LAUREN KANE: Absolutely, it's about trade offs. It's not necessarily about direct payment. It's what sort of special advantage can I get from giving you an advantage to help enhance your product?
NED MAY: And that can be an insight stream back.
LAUREN KANE: Yeah.
WILL SCHWEITZER: So I think that if we're equating data with content, we actually have a talk tomorrow on content aggregation and syndication. And we talked a little bit earlier about digital strategies and publishers needing to have or find partnerships where their content, their services, can be found essentially where their readers or their authors expect them to be. Ned, is there a similar construct outside of publishing that a company may want to aggregate their data in, I guess, licensing their data or contributing their data so it can be found by their core audience?
NED MAY: I think that's a core of what we're talking about. So it goes back to this pyramid and the different values. So one person's critical information set is another person's interesting distraction. And so the challenge is getting the business rules around that and the licensing restrictions around it so that your data just doesn't get blown out to every perceivable use case. So it's that atomizing.
NED MAY: It's that iceberg that I was talking about. It's critical because you can experiment, but you need to be careful. And I think this is a careful audience in terms of freeing up your data for other use cases. So in terms of examples and partnerships, I'm not quite sure where you're going there with.
WILL SCHWEITZER: That's OK.
NED MAY: But--
WILL SCHWEITZER: I'm not sure I was either. [LAUGHTER] All right, so Ned and Lauren survived my somewhat horrible questions. But we would be happy to take questions from the room or from the app. And my colleague [? Sara ?] has the microphone.
NED MAY: I would just add--
WILL SCHWEITZER: Yeah.
NED MAY: --one thing that didn't come up was we have this content set that is maybe or maybe not data. We have data around it. If you're looking to create products or explore products, you need to be thinking of that third layer of data. So it's not the metadata that we have. But it's the data that you're going to need to report back around that for those use cases to make sure you can track the usage of that.
WILL SCHWEITZER: Right.
NED MAY: So it's a layered strategy that needs to take place.
WILL SCHWEITZER: OK. I think that's [? Rod ?] over here. Sara is on her way.
NED MAY: [? I better ?] [? turn. ?] [LAUGHTER]
SPEAKER 3: Hi, I'm [INAUDIBLE] in [INAUDIBLE]. That was really interesting, thanks. We've got one member of staff [INAUDIBLE] he just works on data insights for us. And he's pretty overwhelmed. And I was quite intrigued with [? John ?] [? Shaw ?] and [? Michael ?] [INAUDIBLE] were both saying similar things about [? Sage ?] and [INAUDIBLE]. One of the questions we have-- we get data from several dozen sources and then the challenge becomes how do you [INAUDIBLE] of data together?
SPEAKER 3: So all the things you're saying aren't-- it's not the question. Just getting good data in [INAUDIBLE] form is a significant challenge. Is there a way that we, as publishers and platform providers and various other players, can work together to make that integration task simpler? Because I think we're all having the same difficulty that it's just lots of different shaped pipes coming in from lots of different directions.
SPEAKER 3: And when we're doing the work around this to get things to actually combine, that's pretty hard work. I mean, all of us--
LAUREN KANE: I think you're absolutely right. I think normalization, and cleanup, and standardization is a key challenge for every organization. And I think that-- this came up earlier in the day-- that one of the things that we as a community are doing well but we can continue to improve on is having standards and enforceable standards. I think actually [? Ann ?] brought up earlier the idea about open access at the article level in [? JATS. ?] It's there.
LAUREN KANE: It's a possibility. But it's not a requirement. And so if people don't collect that, if they don't tag for that, then it doesn't then show up. So then when the data goes downstream, it's not there. So if there are ways to encourage this-- I mean, I think [? ORCID ?] has been a positive example where the adoption of that has been very widespread. So if we can do that with more fields, with more information, and standardize it throughout the industry, what you're going to have is better cost containment, which, obviously, I think is the goal for us all.
NED MAY: And that means more people in process too, so dedicating your own staff to participate in those standards decisions, getting that collective energy behind it.
WILL SCHWEITZER: Daniel has a question up front.
DANIEL HOOK: So I think it's interesting that we've focused a lot on things beyond metadata and user data. I think it's very clear that in the next few years the data around the object will actually exceed the [? organization ?] [? of ?] data in the object. And I think that's actually something everybody in the room should be aware of and something they should be thinking about. But to actually think more about the object itself, do you have a sense of when we will return to the object that has been published and actually imposing better structure on the data within the object?
DANIEL HOOK: Because I think there's a significant amount of value in that. And it's not really where anyone's looking right now.
LAUREN KANE: I mean, yes-- [LAUGHTER] --please. No, I think that goes hand-in-hand with Rod's comment. This is something we need. If we can capture, in a normalized way, from the very beginning, you know, have this native-- and I don't want to even say article-- but object that encapsulates all the data we might want, I mean think about what that would mean for us all, for the industry?
LAUREN KANE: So yes, I think we all need to think about collectively how we can achieve that.
WILL SCHWEITZER: And it-- sorry, go ahead.
NED MAY: Well, I would just poke on one word there, which was impose and impose the structure on rather than extract the insight or the information or the context from. So and I'd love to have a discussion because-- [LAUGHTER] --I know in the prior speaker it came up how can I contribute information on what I want back out of this content? And then we had heard earlier about the limitations of advanced search.
NED MAY: And only 2% of the people actually carry out that activity while lots of us put our thumbs up on pieces of content. So I'm just opening it up a little here.
DANIEL HOOK: The only comment I have is it, obviously, with [INAUDIBLE] science as a background, one of the things that we perennially have difficulty with is thousands of different approaches that everybody has decided to take. In some sense, this is the weakness of the repository [? method. ?]
WILL SCHWEITZER: Yes.
NED MAY: Mmm.
DANIEL HOOK: And possibly the point of the Dublin [INAUDIBLE] was to ensure that no one could actually profit from Dublin because everybody interpreted and implemented it differently. So I do actually use the loaded word impose quite specifically.
NED MAY: Mmm.
DANIEL HOOK: Because I think we as a group of people need to agree and impose a standard for doing some of these things so that we can keep the cost contained.
LAUREN KANE: Mmhmm.
DANIEL HOOK: I think if we don't, then we will be seen by our colleagues in academia to be wasting the money that they're giving us for our products.
NED MAY: Mmm.
DANIEL HOOK: And I think that that's a very dangerous place for us to be as an industry, which is so closely tied with public funding and public good.
SPEAKER 2: What we a remark to come after. Maybe I'm opening up Pandora's box here, but isn't this what Elsevier is doing? I mean not many of us [INAUDIBLE] business--
NED MAY: (aside) You can take this.
LAUREN KANE: No. [LAUGHTER]
NED MAY: Lauren? [LAUGHTER]
DANIEL HOOK: It's [? twisting. ?] [LAUGHTER]
WILL SCHWEITZER: Again, it's a room of friends, folks. [LAUGHTER] But silence may imply yes. Was there another question? Oh there are two questions over here.
SPEAKER 3: [? Scott ?] [? Henry ?] from ASM International. Lauren, you made an off-hand comment earlier about our license agreements still would touch on this area. And I'm wondering if there is magic language that's been developed that both gives us the ability to use the data but also it doesn't scare off libraries and consortia who might be licensing or using our platforms?
LAUREN KANE: It's a great question. And I'm not 100% sure of the answer. I think there are good public examples that were developed for text and data mining allowances that I think may be applicable here. But like anything, it's probably going to need to be tweaked for your particular organization. But I think it's a great question. And I think, again, with this community approach mentality, it would be good to hear from those that are actively doing this, if they have a model that could be shared and transparently used.
LAUREN KANE: I don't off the top of my head know of anyone that's doing this that wouldn't be in a position to share it. But if others have examples, I'd love to hear it.
NED MAY: I'd just add to that we've been looking across data. And we produce a mega-theme each year. And last year our mega-theme was trust is the new algorithm. And I would say, whether you get the words right in the license, being clear and open with your stakeholders in terms of what you're doing, how they'll get value back, how you're not looking to extract something unnecessary from them, it needs to be a critical piece of that.
WILL SCHWEITZER: I think before Roger, that brings up a related question in the app, which is a bit of an easier one for us, which is, what are our thoughts on personal data brokers aggregating select personal data on behalf of the consumer and compensating those consumers?
NED MAY: A great concept--
WILL SCHWEITZER: Yeah.
NED MAY: --right? And I think we've seen it and said, well, this is your data. This is Lauren's data. This is my data. I should be paid for it. I haven't seen it work.
WILL SCHWEITZER: Yeah.
NED MAY: And I haven't seen it pulled off. It's just very complex once you start going down that path of consumer data. So it's more I'll pay you for a survey--
LAUREN KANE: Mmhmm.
NED MAY: --that direct quid pro quo. But in terms of you have a data value around your data cloud, I just I don't think we'll get there.
WILL SCHWEITZER: Darn.
NED MAY: I'm sorry.
WILL SCHWEITZER: Roger.
SPEAKER 3: I'm Roger [? Schonfield ?] from S+R. I've really enjoyed this discussion. One of the challenges it seems to me in valuing one's data to share with the other party is that there's often an information asymmetry at work. So I think about to take an analog example, when Google went out and digitized all those library books, it didn't talk about all the AIs it was going to be training with those data. And we don't even really, as far as I know, know the full story there publicly.
SPEAKER 3: So can you talk a little bit about how a publisher can go about the process of valuing its data in an environment where you may not actually know all of the uses to which in the aggregate those data can be or might be put? What are the boundaries? How do you think about that?
LAUREN KANE: Do you have any insights there?
NED MAY: Well, there's some parallels. So in the realm of financial services, we have a model called the Six Ds. And it's six different elements of data. And there are different aspects that you can pinpoint and say, this is critical. So provenance-- is it your data? Can you clearly demonstrate that it's your data? What's the frequency of it? What's the veracity of it?
NED MAY: How accurate is it? So there's these different elements. In this case, though, I think you don't know, right? So you have your opinions on the value of the data. If Google can go off and create some extra value, is that value they're taking from you, or value they're creating? If you're worried about that, put short term licensing arrangements in place.
NED MAY: I would go on to say, if they extract everything they need once, then it's going to be hard to construct something from an ongoing sort of data stream. So if they can train the algorithms, understand something, and then move on, you're probably unlikely to extract value from your data. And it's just a hard fact to swallow at some point, right? But I think if you can contribute in a way that's adding on an ongoing basis, then you can get to terms with a license where you'll benefit.
LAUREN KANE: And I think we've talked a lot today from the revenue side of the equation. What can I get for my data in terms of payment or even some sort of monetization? And also, for the record, I'm a woman, so I'm counting this as the female question. [LAUGHTER] So but I think we also need to examine this from the cost side. And one of the things that we didn't touch on that we talked about in our earlier discussion was what could this mean in terms of working together in aggregate reducing costs, especially things like we all as societies, or as publishers, work with platforms.
LAUREN KANE: That's why we're here today. We work with EMS systems. We work with other technological infrastructure. And we are often not the only society or organization working with those different vendors. And so in the aggregate through those technologies, data is being amassed. And if there is a value in that data, is there a way to basically get a reduction in cost for participating in that aggregate?
LAUREN KANE: And is that a way to get costs down for the technology in the industry?
WILL SCHWEITZER: Very interesting point.
LAUREN KANE: So I would love to flip it to you from the-- just to totally put you on the spot as you have done for us today-- [LAUGHTER] --from a platform perspective, is that a direction that you all are thinking?
WILL SCHWEITZER: Not today. I mean you-- [LAUGHTER] --this is dangerous because a third of this room are Silverchair clients. And actually maybe this takes us to a first principle. I was a publisher for a really long time. And anytime we formed a partnership, we asked ourselves just this really basic question of who owns that user relationship and the data that comes from that user?
WILL SCHWEITZER: We clearly own the content and have the intellectual property. But do I lose that ownership, or am I taking on some risk in contributing that content to some other place? And then, essentially, what is the exhaust that comes off the interplay of the content and the users or even data farther up, say, in the manuscript submission and peer review system? And I think you have to think really carefully about those things.
WILL SCHWEITZER: So publishers sit on these rich assets of their content and their data. I think you have to think about who you're sharing that data with. What is the nature of that partnership? Do you trust those people? What does that ecosystem look like? And know that that data may be as valuable tomorrow as your content is today.
WILL SCHWEITZER: But as a platform, it's your content, your data, your users. And that is at least Silverchair's operating principle.
LAUREN KANE: Thanks.
WILL SCHWEITZER: Ned?
NED MAY: So are you looking at that third tier and adding in that third tier-- and perhaps you do it already today-- but the tracking of the data on the data, if you will?
WILL SCHWEITZER: Not today.
NED MAY: OK.
WILL SCHWEITZER: So [? Ann ?] has a question.
SPEAKER 3: So just adding on to that, as opposed to saying in some way looking at data and analyzing it across customers and doing some kind of a cost benefit, what about doing some kind of de-identified benefit of helping people to understand best practice that works on your platform, which I bet every one of your customers would probably love as long as they give you permission to use their data in that fashion and it is anonymized?
SPEAKER 3: It doesn't seem like there'd be a lot of harm in that.
WILL SCHWEITZER: It's a really good point. And I think a related thing that we're thinking about as a technology provider in this space is very similar to comments that Daniel made. What are tools and services that we can provide that draw more intelligence out of the content, that can essentially synthesize data on the use of the content or users on platforms in a meaningful way? There are people in this room who have been in this business for a really long time of extracting information from content, whether that's an [? ND ?] extraction to match back to GenBank, whether that's to surface the protocols or apparatus that are used in literature.
WILL SCHWEITZER: And you could even take that the next step to say, here's the link to the product catalog or back to the provider. And this isn't too dissimilar to what we've done in the advertising space for a very long time. So this is a direction of travel. And I think it actually starts, as Lauren said, back to kind of a community conversation of what we're comfortable with and what we're trying to achieve.
WILL SCHWEITZER: But we've got to get the infrastructure right first.
LAUREN KANE: And that was by no means meant to put you on the spot or Silverchair.
WILL SCHWEITZER: That's all right.
LAUREN KANE: It really was more that there is a benefit to aggregating content, to aggregating data. Obviously these findings are more useful the more scale that we have. And so there are not a huge amount of organizations, the connectors of the industry, that are in a position to provide that scale and to then provide that insight that can help us all. And so taking this back to where we began, yes, so we're talking about ways to produce new revenue.
LAUREN KANE: We're talking about ways to save on the cost side. But we're also talking about what are ways that we can, working together, working in collaboration, how can we make this ecosystem function better? How can we all have the insights and findings that are going to push us to the next stage, help us collectively serve our customers better, serve our users better?
WILL SCHWEITZER: So this is a question that came in through the app. And the question is, in your experience considering how varied markup and formatting are over time, how have you seen publishers harness the sometimes massive and valuable volume of journal archives? There are challenges when archives consist of limited metadata and scanned PDFs but no full text XML and no rich markup. I think that's getting back to the earlier questions about data quality and hygiene.
NED MAY: Yeah, it sounds like a data conversion question and markup. There's a fair amount of work that needs to go into that cleanup. It's a worthy task. And there's some companies, some of which are in the room, that are good at helping with that.
LAUREN KANE: And that's why it's a cost calculus. I think we're saying, yes, yes, having normalized, cleaned-up data, there's a great value in that in terms of the products and the insights that might come from it. But it does come at a cost. And so before anyone would go into a project like that, I think it's important to look at what is the eventual benefit going to be, and does that outweigh the cost of the project?
WILL SCHWEITZER: [INAUDIBLE] Any other questions from the room?
NED MAY: Well, and it goes back to the other point of hopefully we'll get it right going forward and put more control over that environment.
LAUREN KANE: Yeah.
NED MAY: So we don't have to spend backward.
LAUREN KANE: Yeah.
NED MAY: Yeah.
WILL SCHWEITZER: Last call.
NED MAY: First call.
LAUREN KANE: Yeah.
WILL SCHWEITZER: Or first call. [LAUGHTER] This is a meaty topic for the end of the day. And given that it is really new to kind of our industry, please help me give a big thanks to Lauren and Ned for going through this conversation with us. [APPLAUSE]
LAUREN KANE: Thanks.