Name:
Search and Discovery
Description:
Search and Discovery
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ac0b5634-b9e0-45f2-96bd-107d623646a3/videoscrubberimages/Scrubber_3535.jpg
Duration:
T00H59M49S
Embed URL:
https://stream.cadmore.media/player/ac0b5634-b9e0-45f2-96bd-107d623646a3
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ac0b5634-b9e0-45f2-96bd-107d623646a3/Platform Strategies 2018 - Search and Discovery.mp4?sv=2019-02-02&sr=c&sig=xz32JsbrCZXGngRb%2Fd%2FISALZ01FqObVTvDb%2BnKACqW0%3D&st=2024-11-23T15%3A29%3A23Z&se=2024-11-23T17%3A34%3A23Z&sp=r
Upload Date:
2020-11-18T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
NIALL LITTLE: I came to Silverchair with a background in software development from a customer experience firm. So much of what I'm really passionate about in my job is really making the experiences on our platform better for the people who are using it-- the readers, the admin tools, those sorts of things. But for the researchers, I want them to be able to spend their time reading articles, not navigating around a website, waiting for these things to load.
NIALL LITTLE: I like to think about the moments that matter. And when it comes to our readers, there's arguably no more important moment than when they decide to read an article-- when they click on that article, however they found it. And a lot of our talks earlier touched on search within the site. People were talking about how people navigate within their site.
NIALL LITTLE: What this panel does really well is make it a cohesive unit across the entire industry. So what they really do-- they pave the way for users to not only find what they're looking for but help them find the things they should be looking for and where to find it. So joining me on stage today are Anurag Acharya from Google Scholar; Ruth Pickering, co-founder of Yewno; and Jan Reichelt from Kopernio Web of Science.
NIALL LITTLE: And so we're here to hear them talk, so I'm going to let them go ahead and get started. Here's the number, in case you want to text questions. But I will also be promoting the live question asking as much as possible.
ANURAG ACHARYA: I would like to thank Niall and [INAUDIBLE] for giving me this opportunity. They told me I had 15 minutes. And I said, I have so much to talk about. He says, you take your pick. So I said, what might be of interest to this community? Probably, growth-- that seemed like a safe thing to talk about. So I'll talk about two different ways that we have been exploring opportunities for growth.
ANURAG ACHARYA: The first and the most obvious is the device that every one of you has in their pockets and very few of you today use-- whether you do research or not, very few of your users use it today to do research, even though they use it for every other frickin' thing in their life. [LAUGHTER] This is obviously some of the examples in there.
ANURAG ACHARYA: Scholarship as a whole is way, way behind on mobile. And the graphs are diverging. It's not that we can just wait and it will come to us. So the question is, first, why does it matter? Mobile is a small fraction of my usage. Do I care? And that's what I thought for a while. It's 15% of my usage. Can I possibly put any resources into it?
ANURAG ACHARYA: The problem is, by thinking that way, we are missing out on the biggest driver that has been for web usage in the last five to 10 years. That's where growth has come from. Desktop has seen no significant growth in comparison. We are also training an entire generation of users to skip this modality, which they have trained themselves to use otherwise.
ANURAG ACHARYA: So the question is, why is it slow? What is it that they're doing that makes them a [INAUDIBLE] disk, compared to everything else. The common operation everybody does-- at least, keep in mind that "common operation," for me, is what I see. But nevertheless, the common operation for people, in terms of literature search, is trying to find things to read.
ANURAG ACHARYA: Unlike normal web search, there is not one answer to a question. There is not, what time does flight leave? Or who was the King of Serbia in 1400? That has one answer-- one place. What you get normally is things to read that tell you, here is the state of affairs in this particular area. And then you figure out which of them will actually be useful to you in your work.
ANURAG ACHARYA: There is a lot of scanning. And there is some reading. When I was faculty, the way we used to do it was you scanned journals back then-- lists of journals. You went through abstracts. You figured out which papers you wanted. You photocopied them, got a cup of coffee, got three papers, read them.
ANURAG ACHARYA: The world has not fundamentally changed. The process of getting to those papers has changed. But reading papers takes time. Finding also takes time. Question is-- reading on these devices will be hard. Finding need not be. OK, now why is this thing deciding to-- [LAUGHTER] You see, there are many components.
ANURAG ACHARYA: All right, we can do this. So what holds it back? The problem is speed. And I'll talk about why speed becomes an issue. You take your phone, turn it on, and wait for 3 to 5 seconds. It will blank itself out. It does this because it's trying to save battery. It does this for everything. It tries to conserve battery.
ANURAG ACHARYA: This is the primary difference between this as a computing device and your laptop and your desktop as a computing device. It will try to save battery. In particular-- part that is relevant here-- it will shut the network down. Your cell radio will shut down if you don't use it for 5 to 10 seconds. Then when you click on a link, the radio will wake up.
ANURAG ACHARYA: It will connect to the cell tower, establish a connection, then do a DNS, do a TCP, then ask for request-- everything while the user is waiting. And you do that not once. You do it for multiple abstracts-- one at a time. What used to take a few seconds now takes 30 seconds to a minute or more. No wonder people avoid this. They wait this.
ANURAG ACHARYA: They delay the operation. So what can we do? Take the entire workflow as a unit and coordinate the interaction with the cell radio, rather than taking every click as a completely independent action. Avoid long setup times.
ANURAG ACHARYA: While the user's reading, can we use that time to actually fetch something that they are very likely to be reading next? What we came up with was to integrate abstracts-- just the abstracts-- for mobile devices, within the search interface. You click on the sources. Oh, you can take this out.
ANURAG ACHARYA: This is live. You can take out your phone. Play with it. Go to Scholar. Do any query. Click on any result. Then swipe left. You will see no perceptible gap. You will not see the 3 to 5 second wait. Everything will be as you might expect it to just flip through as if it was local.
ANURAG ACHARYA: It goes from-- I like to think of it as an order of magnitude speed-up in the interaction with literature on the mobile devices. The click for reading full text to get all the other features-- you click through to the platform. There is a prominent link right below the abstract, saying, here you go there. But just for the abstract and just for the scanning operation-- that's integrated directly.
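The latency argument in this part of the talk can be put into a rough model. The sketch below is illustrative only: the class name, the function names, and every timing value are assumptions, not figures from the talk and not a description of how Google Scholar is implemented. It compares the user-visible wait when each click is an independent request against the wait when the next abstract is fetched in the background while the current one is being read.

```python
from dataclasses import dataclass


@dataclass
class RadioModel:
    # All values are assumed for illustration, not measured.
    wake_cost_s: float = 3.0     # radio wake-up plus connection setup
    fetch_cost_s: float = 0.3    # transfer time for one abstract once connected
    idle_timeout_s: float = 5.0  # idle time after which the radio sleeps


def wait_independent_clicks(n_abstracts: int, read_time_s: float, radio: RadioModel) -> float:
    """User-visible wait when every click is handled as a separate request."""
    if n_abstracts <= 0:
        return 0.0
    # The first click always finds the radio asleep.
    wakes = 1
    # Later clicks find it asleep again if reading outlasted the idle timeout.
    if read_time_s >= radio.idle_timeout_s:
        wakes = n_abstracts
    return wakes * radio.wake_cost_s + n_abstracts * radio.fetch_cost_s


def wait_with_prefetch(n_abstracts: int, read_time_s: float, radio: RadioModel) -> float:
    """User-visible wait when the next abstract is fetched during reading."""
    if n_abstracts <= 0:
        return 0.0
    # The first abstract still pays the full wake-up and fetch cost.
    first = radio.wake_cost_s + radio.fetch_cost_s
    # Background fetches pay a wake-up only if the radio had time to sleep.
    needs_wake = read_time_s >= radio.idle_timeout_s
    background = radio.fetch_cost_s + (radio.wake_cost_s if needs_wake else 0.0)
    # The user waits only for the part of the fetch that outlasts reading.
    per_swipe = max(0.0, background - read_time_s)
    return first + (n_abstracts - 1) * per_swipe
```

Under these assumed numbers, scanning ten abstracts at 20 seconds of reading each costs roughly 33 seconds of waiting with independent clicks and about 3.3 seconds with prefetching. Real figures depend on the network, the device, and the page weight.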
ANURAG ACHARYA: And the usage-- all the impressions are then logged at the publisher's server or Google analytics-- whichever is preferable to the publisher. So where are we? We launched this March 21. Many of our partners have seen the benefit of it, the value of the direction, and also the fact that we are starting from such a low place.
ANURAG ACHARYA: There is no place to go but up. [LAUGHTER] Seriously! In some sense, there is not much downside. And there is only upside to be gotten. This is telling you who's participating. This is not telling you what the impact is. Impact will be of two kinds.
ANURAG ACHARYA: One is the immediate impact of people who are already using the service to use more of it, because it's faster. And the second is, the people who were not using these devices-- to begin to unlearn the habits that they have learned and to begin using this device, like they use this device for everything else. So we have some evidence.
ANURAG ACHARYA: Since it's been a few months, we have some evidence of both. OK-- that was not intended. It looked different than I displayed it. On launch-- this is the first part-- a factor-of-two immediate growth in the number of articles a user has interacted with. That is step one.
ANURAG ACHARYA: And let me see if I can actually get to any part of this. If not, I know what it is. So I can talk about it. What I'm displaying here is this number of abstracts viewed for one single publisher since the launch. Normally, we have, in scholarship, two peaks-- everybody knows this-- the spring peak and the fall peak.
ANURAG ACHARYA: At this point, we are three weeks into the fall. The usage is already 13% above the spring peak. The rest of the fall is yet to come. They did, like, two red bubbles. Effectively, the one is the spring peak and the other one is where we are currently. The slope is positive. Expectation is, this growth will continue.
ANURAG ACHARYA: This semester-- not even a question. The next semester-- I expect there's room there. You're talking between 15% and 50%, 60%. How far we will get, I don't know. But there is such a large room to grow. Mobile usage has been the largest online growth driver. And it is beginning to happen. The second part is, once we are able to get this to work-- thank you!-- it will enable many other things that we currently even don't think of.
ANURAG ACHARYA: This is just taking what we already know about, what we already do, and doing it faster. Just like when we put mobile maps online-- nobody thought of ride-hailing services as a thing that was going to take off like crazy. The opportunity, once it's available, opens many new doors. And we'll switch gears and talk about a different aspect. This is scholarly publishing.
ANURAG ACHARYA: We think of experts as our authors, experts as our readers-- the primary audience. But there is another audience that all of us have. And we know we all have, because we have that spring peak and that fall peak. Where does that come from? It comes from all those people who come onto campus and then go away in the summer-- the undergraduates-- step 1.
ANURAG ACHARYA: You also have the non-undergraduates, the experts who are looking into their related field. Since you have a single place to search, you can do it. And since you have a relevance ranked search, you don't need to know every term to be able to get useful results out of it. So you have most of the thing in place to draw this user base in.
ANURAG ACHARYA: There's still a [INAUDIBLE] problem. The problem is, scholarship is, linguistically, a lot of little villages, not a single big city-- every village having its own little different language with a little bit of overlap here and there. And unfamiliar users can begin to poke into it, get some idea. But you don't have an easy glide path to be able to figure out, what should I be looking for? So what we did was to say, OK, if you come in with a query that indicates that you know less about the field, can I help you go deeper into the field?
ANURAG ACHARYA: Can I suggest to you what would be spaces to explore? And if you know a lot about the field, can I give you spaces around it that lets you get an idea of what other things, aside from what you're already exploring, that you would like to? So what we do is to build a multi-dimensional topic map-- complicated words for not much helping. I'll show you examples and then it will be better.
ANURAG ACHARYA: It goes from broad fields to specific fields and from specific fields to other related specific fields. We launched it last year. It's available in two different places. It's available if you click back in the search box. Or it's also available at the bottom of the search results or in the middle of search results. I'm giving you some examples-- carcinoma.
ANURAG ACHARYA: Mm-- broad-ish query in the space of medicine. What do you get? Different kinds of carcinomas. Let's be more specific-- renal carcinoma. What do you get? Many different-- not all-- many different things that might be of interest to a researcher or a student who's needing to understand about renal carcinomas.
ANURAG ACHARYA: We go more specific. By this time, if you are able to say renal carcinoma nephrectomy, you know what you're doing. Then trying to get you more deeper in is not nearly as useful. It's far more useful to say what the space around it is. So that's basically what you're seeing. You're no longer trying to guide people deeper. You're helping them go sideways.
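The behaviour described here, suggesting narrower topics for a broad query and neighbouring topics for an already specific one, can be sketched with a toy lookup. This is a minimal illustration under stated assumptions: the two dictionaries, their entries, and the `suggest` function are hypothetical placeholders, not Google Scholar's topic map or its data.

```python
from typing import Dict, List, Tuple

# Toy topic map. Every entry is an illustrative placeholder.
NARROWER: Dict[str, List[str]] = {
    "carcinoma": ["renal carcinoma", "basal cell carcinoma"],
    "renal carcinoma": ["renal carcinoma nephrectomy"],
}
RELATED: Dict[str, List[str]] = {
    "renal carcinoma nephrectomy": ["partial nephrectomy", "renal carcinoma immunotherapy"],
}


def suggest(query: str) -> Tuple[str, List[str]]:
    """Return a direction ("deeper" or "sideways") and suggested topics."""
    q = query.strip().lower()
    narrower = NARROWER.get(q, [])
    if narrower:
        # A broad query still has narrower topics, so guide the user deeper.
        return ("deeper", narrower)
    # A specific (or unknown) query gets neighbouring topics, if any exist.
    return ("sideways", RELATED.get(q, []))
```

In this sketch the only signal of how much the user knows is whether the map still holds narrower topics for the query. A real system would presumably use richer signals.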
ANURAG ACHARYA: The result of this is that people who would stop after one or two queries go deeper. The number of queries increases, which is useful-- step 1. But how happy they are with the results of those queries also increases, which means we are drawing these people in. These people who will be experts tomorrow can begin to learn faster.
ANURAG ACHARYA: Thank you. [APPLAUSE]
RUTH PICKERING: Thank you very much for inviting me to join this panel. It's really lovely to be with you all this afternoon. So what I'd like to do first of all is take a broader look at the search and discovery environment, in terms of market value and what people are doing. Then I'd like to make the distinction between the search and discovery spaces, in terms of how we see things, and really look at a little bit of AI technology and what it can enable in this space.
RUTH PICKERING: So you know, we see next generation technology providing something very different, something additive, to the existing ecosystem and really complementing what is available today in the learning space, but not replacing search. And finally, I want to tell you a story. It's a story about an intern called Miriam, a high school student called Hannah, and a teacher called Freedom. But first of all, looking at the market-- so this information comes from the 2016 Davos panel on artificial intelligence.
RUTH PICKERING: And they defined the existing search market as a trillion dollar market in 2016, in terms of value. So the people you would expect to see in there are all of the existing traditional search players, whether they are paid-for products or whether they're free products. And it's a pretty big market. It's a pretty big space. So they go on to define what they call as "next generation search" as a 10 trillion pound market by 2026.
RUTH PICKERING: And the explanation for that is, if computers can read books or if machines can understand content, if they can understand context and meaning, they can do so many more things for so many more people and provide something so much more valuable. So we see ourselves in that space. And I think, probably, you'll see all of the existing search players moving to that space. And sometimes people say to me, oh, you use graph theory.
RUTH PICKERING: We do. And they'll say, oh, I've heard of this other company. There are hundreds of companies using graph theory, AI-- all the different components of artificial intelligence. But as you can see from this slide, it's an absolutely enormous and growing market. And at the moment, it's still in its very, very embryonic, early growth stages. So looking at search-- and I want to make the distinction between search and discovery.
RUTH PICKERING: So for me, I still think back of being in high school and being at university and literally going to a library to read a book or find a book or paying for an item of content and having to have that physically in my possession. And digitization, internet access, proper search has really changed the world. And it's hard to almost remember what it was like before. But when you have a specific question-- if someone says to you, look for x, find a book, find an article, you go to a search query.
RUTH PICKERING: You go to the right place. You put in a good query, you'll get a brilliant result. And you'll get it really, really fast. But what if you don't know exactly what you're looking for? What if you know a field that you're interested in or a topic that you're interested in, but you don't know the exact terms? And one of the things I find slightly amusing is-- students will start a completely new course of study.
RUTH PICKERING: There's no way, in your first week of lectures, that you can possibly formulate a good search query. And we hear this all the time from the librarians. Students can't formulate good search queries. And they can't find what they're looking for. So there is this irony in trying to find something but, until you're an expert, you potentially aren't formulating the best query. And so you won't find what you're looking for.
RUTH PICKERING: So the next generation discovery space is very different, because it's based largely on full text. And it can offer completely different things to people. So we don't see this as an either-or. We don't think you're either in search or in discovery. We think people will use both types of service to do different things. So in terms of how everything works-- I don't want to go through a really, really laborious technical presentation-- but I think the key things for this audience are starting on the left-hand side.
RUTH PICKERING: What you see is full text being analyzed-- all different formats, all different types and, in the future, multiple languages. So if I ask for people to put their hands up in the audience, I don't know what you think the quality is of your metadata. So I said, is your categorization data good? I'm not seeing any hands. I don't know if that's because you're not answering-- oh, good.
RUTH PICKERING: And then, is the quality of your keywords good? And then let's say you'd all said it was brilliant. Is it consistent? Have you got brilliant categorization data and brilliant keywords across every single thing that you publish, going back historically? At this point, I do not think anyone would be putting their hand up. So the great thing about future search is that, because people will be ingesting full text and because the algorithms are understanding context and meaning, it actually doesn't matter.
RUTH PICKERING: It doesn't matter if you've categorized an item as one thing and it's, in fact, about five subjects. It doesn't matter if you've only got three keywords, because it will be reading every single line, every single sentence, identifying meaning in the form of concepts, promoting them, and making them discoverable in a completely different way. So I actually think that's really, really good. The second thing is, if algorithms are reading the full text, they can actually take the researcher or take the student, take the person who's looking for information, to the exact section of that document.
RUTH PICKERING: When I was in Frankfurt last year, I'd spoken to a publisher. And they'd just done some research. And they said, how many people do you think get more than 50% of the way through a journal article? Anybody? Anybody? OK.
AUDIENCE: [INAUDIBLE] percent.
RUTH PICKERING: Sorry?
AUDIENCE: 10%.
AUDIENCE: 2.
RUTH PICKERING: 10%-- 2%-- actually, 5%. It's less than 5% get 50% of the way through. And I'm about to go to Frankfurt again in two weeks. I'm still shocked. A year later, I'm still shocked that only 5% read all the way through, because I used to read things end-to-end. So going back to what Anurag was saying, people are expecting things immediately.
RUTH PICKERING: They're expecting things on a mobile handset. People want to go to the exact section. We've actually been asked-- we don't even want a paragraph of text. Can you give us those 12 words, so we can push them onto a mobile handset? So there's that side of things. And then I think, the third thing really is, if machines can understand context and meaning and they can do something algorithmically, then they actually can find things, find concepts, find connections to scale that none of us could ever do.
RUTH PICKERING: So I think, from a technology perspective, those are some of the advantages we can be looking forward to. And now I want to talk about serendipity, engagement, and fun. So I hear these words a lot. And I think, one of the great things about moving forward with this kind of AI technology is that you can reintroduce some of the things that potentially got lost as search moved forward.
RUTH PICKERING: So before I move on to my story about Miriam-- when we first developed Yewno Discover, we were targeting a higher education market. And we thought, if we can build this product for some of the cleverest people with the most complex questions at the big research institutions across an interdisciplinary content set-- so all domains of information-- then we thought, after that, we can repurpose it to any vertical application and to any other group-- for example, high schools. So we started off there.
RUTH PICKERING: And soon after the product was launched, we started getting questions from high schools. And we thought, mm, interesting. So we built this content set, which is clearly targeted towards higher education. And we built this interface, which we've tested on higher education. What would a high schooler do with it? So let me introduce you to Miriam.
RUTH PICKERING: So Miriam was 18 when she came to us, summer, 2017, initially for a 3-month internship. But she stayed a year. She took a year off before she went to university. And she had just graduated. She was very excited about the product. She had this teacher who was her mentor. She got him pretty excited about the product. And he said, can I use your product in my class?
RUTH PICKERING: And we thought, well, we would love to know how high schoolers react to this higher education content set and to this completely different interface. So we said, yeah, great. Let's give it a go. So off we go to this class. And Miriam is doing this introduction to this group of kids. And she gives them all a log-on.
RUTH PICKERING: And about two minutes later, I'm at the back. And I'm watching them. And with basically no instruction, I can see them all building these graphs-- pretty complex graphs. So the teacher sets them this assignment. And off they go. They all start working. Some of them are in groups.
RUTH PICKERING: Some of them are working individually. And the first thing I noticed was there was this really happy buzz in the classroom. And I went to school in England. And maybe that's because it's a small island and it rains a lot, but I do not remember a happy buzz in my math class. [LAUGHTER] So anyway, that for me was one of the first things that struck me.
RUTH PICKERING: And then what they were actually studying-- and when I'd agreed that we could test it with this class, I didn't actually realize the teacher was a math teacher. And I possibly would have said, maybe not maths. But anyway, off we go. And he's actually set them this work on the quadratic function. So I do remember learning about quadratic functions. It is a distant memory. I definitely do not remember fun.
RUTH PICKERING: I definitely don't remember joy or anything that would inspire me to write a 9-page paper. But this is what Hannah did. And this is how her paper starts. She says, Yewno first helped me understand the quadratic function. And then it helped me find connections between parabolas and other topics. And then she goes on in her introduction to say, I was fascinated once I made the connection between parabolas and orbit.
RUTH PICKERING: So my mother was a teacher. And she said, in your class, however much preparation and structure you've done, you need to capture the student's imagination. And she said, the minute somebody is engaged, restructure your class and go in that direction. And this quote really made me think of that-- that very, very early on, through the visualization, the students were seeing something.
RUTH PICKERING: They were seeing something that captured their imagination, that engaged them, because it wasn't passive search. It wasn't looking at a list. With looking at the visualization, all of them were making different choices and taking their research in different directions. So this is one of the pictures that Hannah actually put into her research paper.
RUTH PICKERING: And she put three in. And you can kind of see here, she's drawn this incredible graph. And so I think, although digitization has delivered enormous benefits, which I mentioned at the beginning, in terms of access to information, et cetera, one of the things that search has done is-- the better the search query you put in, the narrower your results will be.
RUTH PICKERING: And odds on, at that point, you're unlikely to find anything unexpected. That kind of serendipity has probably a bit gone out the window. And finding unexpected things is fun. And being able to interact with the product is fun. And both of these things create much higher levels of engagement. And people will use your product for much, much longer time, because they're enjoying it.
RUTH PICKERING: So Hannah then writes this 9-page paper-- you can actually find it-- it's been published-- "Design Thinking Applied Mathematics, parabolas, hyperbolas, and black holes." So [INAUDIBLE] that was completely phenomenal. And then she actually wrote in her conclusion, I learned from creating this paper to not try and control the output of my research, which I thought was really an incredible insight for a 15-year-old.
RUTH PICKERING: So one of the reasons I wanted to talk about this story is that these kids found content they would never ever have been given, because it was higher education content. And the reason it was accessible to them and the reason they could understand what was going on is because they were so engaged, because it was an interactive environment, a little bit more like gamification. And it led them on to read things.
RUTH PICKERING: And then the other thing is, the way in which the information was presented to them was not a 400-page book, not a journal article. It was what I would call a snippet, which is a paragraph of information. And again, if someone is interested, they can step up to a higher reading level for a short amount of information, where they couldn't possibly read an entire book or an entire journal article.
RUTH PICKERING: So that's one of the things I thought was really important about it. And I wanted to leave you with a different picture. So I've talked a lot about high schools, because I think everyone's quite familiar with the higher education space. But we get a lot of questions about AI and the dangers of AI and the perils of AI and things like bias. And actually, one of the key things-- and if you look at this picture, you're looking at a visualization of the MMR vaccine and the autism controversy.
RUTH PICKERING: And the machine isn't giving you an answer. It's giving you a visualization, which explains to you how things connect. You then have to go in, read through, make the decisions, and decide if it's relevant and what your opinion is. So the machine isn't giving you an answer. And it isn't telling you what to think. So I think, from my perspective, the next generation of discovery tools can introduce visualization and show connections in a way that hasn't been possible before.
RUTH PICKERING: They can be interactive. They can speed up the research process and make you more productive and bring back serendipity, and remove from researchers, from students, from anyone trying to learn, some of the frustrations that they may have had associated with it. Thank you very much. [APPLAUSE]
JAN REICHELT: OK, so hello everyone. My name is Jan. I'm the last one before we start a Q&A. I am coming here, representing Kopernio. I have a broader role-- also, [INAUDIBLE]-- representing Web of Science. But I'm really talking to you today with the idea of Kopernio. And I'm coming back to some of the things that both of you mentioned, which is about speed for end users.
JAN REICHELT: It's about-- how do we engage end users?-- and creating these opportunities. Now, my personal history-- and that is important to this story-- is that I was a PhD student, but a very unsuccessful one. So I figured out, during my PhD, that I was not going to be an academic, in fact. And I found a fairly big problem that I wanted to solve, which was organizing research documents for researchers.
JAN REICHELT: And I started at Mendeley. And Mendeley became quite big. And it was fairly successful, because we actually really meaningfully addressed the problem that researchers had. And I think that was a very good story and a very good training, I would say, for me. One thing that we did not solve during the journey of Mendeley was, in fact, the problem of access and convenience when accessing research papers.
JAN REICHELT: When we started Mendeley, we always thought around, well, wouldn't it be cool if we could help researchers get easier access to research papers that they have access to through the institutional subscriptions? And then you get into path dependencies. And in the end, you build a reference management software. So after we did the Mendeley thing, I then said, well, this problem is eight years in. This problem still exists, if it's not even bigger than before.
JAN REICHELT: Because we see things popping up, like Sci-Hub and so on, where people get easy access to research papers. So we said, it's worthwhile addressing this problem. And that's the history of Kopernio. So the idea is, how can we give one-click article access to high-quality published content by integrating with institutional subscriptions? And it's really just what researchers want.
JAN REICHELT: So put yourself in the shoes of me, being a PhD student at university. So let's go PDF hunting. Because everything that the researcher wants, at the end of discovery and search, is-- give me that damn PDF! So let's try. We start, as probably the majority of the researchers or at least a large chunk, at Google or Google Scholar.
JAN REICHELT: You type something in there. And the first link takes you to PubMed. By the way, already in Google Scholar, you know, great. That's the article I want. So you click. You get to PubMed. On PubMed, you basically see, more or less, the same information. So you say, yeah, right.
JAN REICHELT: That is the article I want, so-- please. So PubMed does a fairly good job linking you to the publisher, where you see, the third time, exactly the same information. You say, right, exactly! That's the article. Great! Now I get it. So the reality, unfortunately, is now-- it's three times context switching.
JAN REICHELT: And in this case, I now say, OK, it's the final destination. Where's the PDF? So I need to reorient myself-- so coming back to convenience, speed-- and that's mental brain capacity that's being taken away. I need to find the PDF button. So where's the PDF button? It's somewhere in the page. Some publishers do a good job.
JAN REICHELT: Some publishers do maybe a slightly less good job. But anyway, there is a PDF button eventually-- in most of the cases. We've actually spotted cases where that's not the case. You click on the PDF button. And then you're being redirected to an authentication page. What the heck is my access management federation? Who is this Mr. Shibboleth? [LAUGHTER] What's your customer ID for publisher platform ABC?
JAN REICHELT: Anyone know? Anyone care, if I may ask, from an end-user perspective? Probably not so much. So what would the end user do? [RELIEVED SIGH] Salvation! Go back to Google Scholar. Try the second link. Second link gets me to ResearchGate, because they do a fairly good job in indexing their article pages.
JAN REICHELT: They are very good in search engine optimization. So Google Scholar-- blue button-- request full text. Click. Sign me up-- yet another cul-de-sac. So that journey-- I don't know-- actually, I think no one knows how often that journey happens. But I'm pretty sure it happens more often than any one of us would want.
JAN REICHELT: It's four platforms, 12 clicks, 60 seconds, and still no PDF. Now there are cases where that journey does not happen. There are cases where it goes smoothly, so we don't need to solve any problem. And that's OK. Kopernio adds to that by trying to solve exactly that problem, when that journey is about to happen, to stop that from happening. That's the idea.
JAN REICHELT: Because the result of that is frustration. And that is recognized by librarians. That's recognized by researchers. And in fact, we have evidence that this doesn't need to happen. So there is a librarian at the University of Utrecht who did an analysis into Sci-Hub downloads from researchers at the University of Utrecht. And she found out that 75% of the research documents that have been downloaded on Sci-Hub could have been served via the institutional subscription.
JAN REICHELT: So this stuff is there. It's legally available to that user. Yet, they go somewhere else. I think it's untenable really. And the reason why that is-- you call this, in digital product development, desire lines. So there's a road design, right? We design a road for the user.
JAN REICHELT: And the gate is open. It's not closed. It's just really inconvenient. You need to get off your bike. And as we heard from Anurag, people are starting to get on their motorbikes. You need to get off your bike, turn around. And if someone is coming your way, then it takes even longer. Maybe a website is down-- all this kind of stuff.
JAN REICHELT: So what do people do? Well, they say, fuck it. I'll go right around it. [LAUGHTER] Because that's what the internet offers to them, for better or worse. Because there are people on the internet who don't care. Now, from an entrepreneur's perspective, with the experience of Mendeley, I'm saying, well, is this a problem?
JAN REICHELT: No, actually, it's an opportunity. So when you then think about, is it worthwhile trying to solve this problem or address this opportunity?-- is this meaningful enough?-- and then you say, there are 2.5 billion times that this thing might happen. And how do I get to that number? That number is-- we have a 2015 STM report-- 250 PDF documents that each researcher wants to access per year-- 250.
JAN REICHELT: And we have about 10 million researchers in our community, at the core. And we haven't even thought around the wider community, the undergrads and all this kind of stuff. So just as it's core, we have 2.5 billion times per year-- that potential problem. I'm not saying that it's always like this. But potentially, having that problem is quite significant. And we also have to recognize, if we don't solve that problem, researchers will solve that for themselves.
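The 2.5 billion figure follows directly from the two numbers quoted above. A minimal sketch of that arithmetic, where the variable names are illustrative and not from the talk:

```python
# Figures as quoted in the talk; the variable names are hypothetical.
pdfs_per_researcher_per_year = 250   # per the 2015 report cited by the speaker
core_researchers = 10_000_000        # "about 10 million researchers ... at the core"

# Upper bound on how often the access journey could occur in a year.
potential_access_events = pdfs_per_researcher_per_year * core_researchers

print(f"{potential_access_events:,} potential PDF requests per year")
```

This is a ceiling on how often the problem could arise, not a count of failed journeys; as the speaker notes, many of those requests go smoothly.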
JAN REICHELT: Because there's always a way to get to the PDF. So either we start to engage or researchers solve that for themselves. And the other part of that is because access, legally, is actually there. I think, we need to define this not so much as an access problem. It's a convenience problem. Because I have subscription access, if I'm at the University of Utrecht.
JAN REICHELT: I just don't use it, because it's inconvenient. And that becomes quite a big problem. So as a researcher, I can get to the PDF through different platforms. I can go through the primary publishing platforms. I can go through aggregators. I can go through the university pages. And so I have different ways that can take me to my final destination, which is the final published paper.
JAN REICHELT: Now imagine 10 million times-- these different connections and figuring out which way I need to take and trying to automate that at scale. That's what we're trying to do with Kopernio-- trying to bring a researcher from where they enter into this world-- be it Google Scholar or PubMed or ResearchGate-- and always direct them back to the high-quality published journal document that sits on the publisher platform.
JAN REICHELT: And that works as a browser plugin. And it's traveling with the researcher. We're not trying to force the researcher into a new workflow. We're not trying to create a new destination site. We're recognizing, some researchers prefer Google Scholar. Some prefer PubMed. Some prefer a publisher platform. Some prefer Web of Science.
JAN REICHELT: Whatever they like, we try to help them wherever they want to be by giving them what they want and always directing them back to the publisher version, if they have legal subscription access through their institutional subscriptions. The way that looks like is-- take an example-- arXiv, a very popular preprint repository. Now if I'm an end-user and I come to arXiv, arXiv does not know, at this point in time in most of the cases, whether, for that pre-print there is already a published journal article available.
JAN REICHELT: And if there is, they also don't know if I, as the end-user, have subscription access. What we do with Kopernio-- because Kopernio sits in the browser, it pops up at that point, has figured out if you have subscription access-- one button. Click that button. It gets you to the journal article that we pull from the publisher and gives it to the researcher at that point.
JAN REICHELT: That will be usage that would be lost to the publisher. It would be a missed opportunity to give the researcher the better versions, instead of the pre-print. And so I think that's a win-win situation. Same would be true if you take the example of ResearchGate. You get to ResearchGate. Well, sometimes ResearchGate allows you to download stuff there.
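The decision described here (is there a published version, and does this reader have subscription access to it?) can be sketched roughly as below. This is only an illustration of the logic as stated in the talk, not Kopernio's actual implementation; the `Preprint` class, the `resolve_version` function, and the example DOIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Preprint:
    title: str
    # DOI of the published journal article, or None if none is known.
    published_doi: Optional[str]


def resolve_version(preprint: Preprint, subscribed_dois: Set[str]) -> str:
    """Decide which version to offer the reader (hypothetical helper)."""
    if preprint.published_doi is None:
        # No journal version is known, so the preprint is all there is.
        return "preprint"
    if preprint.published_doi in subscribed_dois:
        # The reader's institution subscribes: serve the publisher version.
        return "publisher"
    # A published version exists, but the reader has no access to it.
    return "preprint"


# Made-up DOIs, for illustration only.
entitlements = {"10.1000/example.1"}
print(resolve_version(Preprint("Paper A", "10.1000/example.1"), entitlements))
print(resolve_version(Preprint("Paper B", None), entitlements))
```

A real browser plugin would have to look up the published DOI and the reader's entitlements at runtime; both are passed in directly here to keep the sketch self-contained.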
JAN REICHELT: Sometimes, they try to sign you up. But again, it doesn't matter. If somebody wants to use ResearchGate for discovery purposes-- fine-- or because they hang out there with their colleagues. That's also OK. But what we want to make sure is that they use the published journal article. So we go there and we pull the article from the publisher.
JAN REICHELT: It's a COUNTER-compliant download. And we give it to the researcher at this point. And of course, we can let the publisher know that this has happened. Take the example from an institutional repository. And here in that screenshot, you can see, via this little download button, there is a version available, that probably the librarian will have uploaded.
JAN REICHELT: Now, if I've got Kopernio installed, I go to that web page. Kopernio pops up. Despite there being some version available-- it might be a pre-print-- we try to make available the published journal article at that point-- so try to engage the researcher wherever they are. And that happens for both subscription content as well as open access.
JAN REICHELT: Because the situation with open access is that people put up open access papers all over the place, because maybe they don't understand all the different licensing requirements. So even if that was an open access paper, we'd still pull it from the publisher, so the publisher knows that there's usage. And that leads to quite some astounding data.
JAN REICHELT: So what we can see is, because all this engagement happens all over the place-- it's the internet after all-- we deliver quite a huge number of additional downloads that otherwise would not have happened back to the publisher. Because we say, well, there is someone on ResearchGate or there is someone on that-and-that platform who wanted to get access to your document. That person was authenticated.
JAN REICHELT: We pulled that document. And we gave it to them. And that is the platform where they chose to interact with their document. At the same time, we also know when somebody wanted to get access to that document but did not have access. So in the last month we know, for example, that almost 400,000 times somebody wanted to get access to a research paper and we didn't give it to them.
JAN REICHELT: So what do they do? What do these people do? Think about it. What can we do, if we can get them engaged. And the other interesting thing is, from just a business development perspective, I think that 40% of that is coming from emerging markets. But also, 35% of that is coming from developing countries. So I think-- I'm coming back here to what my other speakers said-- if we take into account speed, convenience, and get people engaged, we can actually uncover a lot of opportunities that before we wouldn't have thought possible.
JAN REICHELT: And the great thing about Kopernio in this case-- and that's something that makes me very happy-- is that people just love it. They just like the product. Thank you very much. [APPLAUSE]
NIALL LITTLE: All right, thank you very much. Those were excellent talks. I maybe should go to a conference where my bosses aren't here, so I can be the first one to drop an F-bomb. That was pretty awesome. [LAUGHTER] So we're opening up the floor to questions. Do we have any questions from the audience?
NIALL LITTLE: We have a microphone back there? Yeah.
AUDIENCE: I thank you all. That was really interesting. Is this on? Yeah-- thank you. Google question-- what do I need to do to ensure, if I type in, word-for-word, an article title, with or without quotation marks, that my native article appears at the top and not PubMed and not ResearchGate and not news media coverage?
ANURAG ACHARYA: You're talking about Scholar or about web search?
AUDIENCE: The latter.
ANURAG ACHARYA: If you're talking about Scholar, if it is not working, please come talk to me. We'll make sure it happens.
SPEAKER 1: Sara--
ANURAG ACHARYA: Web search is a different beast altogether.
RUTH PICKERING: Web search--
ANURAG ACHARYA: Web search looks at many, many metrics to decide what version, what things should go where. And there isn't a similar mechanism to be able to say, this version should go there or that is not. They take into account-- at this point, I don't want to say hundreds, because I have no idea how many metrics to decide how the ranking should work. The ranking isn't designed for scholarship in any way.
RUTH PICKERING: Right.
ANURAG ACHARYA: It is designed for everything. Scholar, on the other hand, is designed with what you have in mind.
AUDIENCE: OK, thank you.
ANURAG ACHARYA: And it should do exactly as you described. And if it doesn't, tell me.
NIALL LITTLE: All right, we've got another down here.
AUDIENCE: A number of publishers are adding three-bullet summaries, graphical abstracts, or plain-language summaries to try to help reach the broader audience, the non-subject specialists. Do those need to be included in the same field as the abstract in order to display on a mobile device?-- particularly if that's the device someone will be using when they make a decision about whether to go further?
ANURAG ACHARYA: So, yes and no. If it is included in some field, I can make it work. If it is not marked out at all, I may still be able to make it work. But then it is less certain. Many times, the editorial summary is marked out as a separate type of abstract. And that's fine, too. All the versions will get displayed.
AUDIENCE: CASA plus Kopernio-- talk. [LAUGHTER]
ANURAG ACHARYA: Say again? [LAUGHTER]
AUDIENCE: Is there any integration at all or planned integration between CASA and Kopernio?
JAN REICHELT: There is no, let's say, planned integration going on right now. But Anurag and I have indeed started to discuss about these topics. Just consider-- Kopernio is really, really young as an idea and product. So all hands full-- but I do think a combination of that technology would be very powerful to get people better access-- yes.
NIALL LITTLE: I've got one.
AUDIENCE: Oh, good. [INAUDIBLE] As it relates to search and discovery, what role do you think ontologies will play going forward? And is there some real, maybe, medical examples where people have really found a lot of value there?
RUTH PICKERING: So when we founded Yewno, I was not from a publishing background. I didn't even know what metadata was. And then as I became more and more familiar with metadata, I have come to the conclusion that-- I think I'm going to be thrown out if I say what I'm thinking. I am not a great metadata fan. I see it as a necessary evil. And I think ontologies could be replaced and should be replaced by something that is much more dynamic and changes.
RUTH PICKERING: And the fact that there are so many, to me, is just pointing to the fact that they're wrong and that they don't work. And I think they were necessary historically, before you had things like AI capabilities reading the full text. But now that you can read full text and extract concepts at such a granular level, I think they'll all become less and less relevant.
ANURAG ACHARYA: Wait-- [INAUDIBLE].
NIALL LITTLE: All right.
ANURAG ACHARYA: So yes, I completely agree with her at one level. But there is a different aspect. For things where you're not looking for something, how do I let you indicate interest in a broader field? I need some mechanism for you to be able to specify a broader field. Now whether that network is automatically computed or whether that network is manually computed, the network has to be stable so that the user can actually recognize and say, yes, this is the component that I'm interested in.
ANURAG ACHARYA: So that would be one place where I would think some kind of an organization would be useful to the researcher.
AUDIENCE: Ruth, the context with which I'm familiar is-- you're working with libraries to put this tool in front of libraries' users. What are some other applications that you're working with? And then specifically, is it possible to put this tool on my site for my readers and researchers?
RUTH PICKERING: Yeah, that's a really interesting question. And we've been asked recently-- a couple of people have asked me if their editors could use it as a tool. And I've had a VC ask me if they could use it as a tool. And their particular use case was-- we get sent all of these proposals from all these startups of all this new stuff. We don't know anything about it. And we somehow have to decide whether or not to give them money.
RUTH PICKERING: And so one of the great use cases is, of course, because it's so visual and because it makes all these connections-- and actually, even just looking at who's published on something gives you a sense of where that is. And we recently included patents, which is super useful. So I think there are use cases, end-to-end, in education. I talked about high schools. I think people are familiar with higher education.
RUTH PICKERING: I think, in different library types, it's also super relevant. We see applications in finance. We actually have some financial products. They're completely different. They do not look like that. But if you think about what the technology is doing-- say, reading text, identifying meaning in the form of concepts, creating a graph that changes-- so our graph for education is updated every 24 hours.
RUTH PICKERING: Our finance graph is updated every minute. It ingests real-time news. And if you think about concepts and how they relate and how they cluster and then how they move, you can imagine how you can use that to create things like predictive analytics for indexes. And that's, in fact, what we do. We launched the AI index in January. And we've got a few other indexes.
RUTH PICKERING: So there are lots of other types of applications. But yes, I agree with you. I think it's a brilliant tool for people like editors and authors, in terms of being able to explore and understand around a topic in a completely different way.
NIALL LITTLE: Any other live questions?
AUDIENCE: This question is for Jan. I'm wondering what you think about the RA21 initiative.
JAN REICHELT: Yeah, so we already work with the-- let's say-- established access mechanisms. So say, Shibboleth or EZproxy-- we integrate with these types of technologies already. And the way I would see it is, depending on how RA21 will develop, we're more than happy to work with RA21 as well. So I think it adds to the coverage, overall, to give people more access to research papers. So that's my view on that.
JAN REICHELT: I think it would be great if it succeeded. It would match with our mission of making that content more widely available. And everything that helps, I think, would be great.
NIALL LITTLE: Would you also answer that question? [INAUDIBLE]
ANURAG ACHARYA: I didn't have an opportun-- what was the question?
AUDIENCE: Oh, just wondering your thought--
ANURAG ACHARYA: That's an open question about RA21? What would you like me to answer about?
AUDIENCE: Just wondering, what are your thoughts about it? Do you think it's a worthwhile initiative? Do you think it will succeed?
ANURAG ACHARYA: That's a question that I would not have the ability to answer. What I can say is, one of the challenges is the logistics of getting so many people on board. They have approaches that they're considering that will let them scale. But it still remains a challenge. Again, I have not solved those kinds of problems. The problems I solve are more often related to what I can build and what I can build along with people in this room.
ANURAG ACHARYA: But for RA21, you have to go work with another group of people who are not in this room as well-- the libraries. It's a different challenge, the challenge that I'm not familiar with.
AUDIENCE: So what are the boundaries of each of your services? And how are those boundaries changing?-- in terms of what's in the scope of discovery and what's outside the scope of discovery.
JAN REICHELT: Yeah, as you can tell, for Kopernio's particular case, it's more like, OK, what is after discovery?-- after I found what I found and I actually want to have it. So in that sense, there is nothing in scope for Kopernio to expand into search and discovery. Because I think there are quite a lot of attempts out there to further improve on that. So we want to be very focused. To us, it's more like, OK, how can we fill those gaps where, also, Kopernio can't deliver anything, be it open access or subscription content?
JAN REICHELT: And then figuring out how we can work, for example, with publishers to share the data and then say, OK, well, in those cases maybe sometimes there was just a website that was down, or it was a technical glitch or something like that-- and then figure out how we can close those gaps. So for access and discovery-- for the Kopernio piece, it's not so much in scope.
RUTH PICKERING: So I think, from our point-of-view, machine learning will just get better and better. So the more content we get, the better everything will get. And the more different things people will find. I think some of the things that we've been really excited about this year are introducing a multilingual environment. So you can now look at the graph in English or German or Chinese or two or three of those in combination. So that's super interesting, layering on the different languages.
RUTH PICKERING: And I think that's quite a unique capability. And I think those are probably two of the most interesting things that we see.
ANURAG ACHARYA: So I'll actually borrow something that Ruth said earlier today. I believe, over the years, we have done a fairly reasonable job of helping you find things, if you are able to articulate what it is that you're looking for. That presupposes two things-- that you know how to articulate what you're looking for and you know you need to be looking for something.
ANURAG ACHARYA: So those are the two things that are beyond the boundary today. And that would be the place where I would like to expand the boundary. Some of what I showed today was trying to address part of one of them-- how to help you get kinds of terminology within a field. We have some levels of recommendations for people who have profiles and are authors.
ANURAG ACHARYA: That is an important subset of our user base. But that's not the entire user base. There is an additional opportunity over there. So those would be the ones I would [INAUDIBLE].
AUDIENCE: All right, so one of the things, especially for our platform strategies thing-- if we're just going to be repositories of PDFs that get served out and nobody visits the websites to see all of the curated collections and taxonomies and other things that we're building around that, I think that's-- well, that's a publisher right. We're always sad to hear that kind of stuff. So I'm just wondering-- and we talk about maybe the PDF going away eventually or better mobile experience and things like that.
AUDIENCE: So one of the things Google Scholar does is you can share your subscriber information. And then they give you the choice of going to the abstract page or get the PDF directly and will verify access and things like that. That works for IP authentication at least. So I guess this is more for Jan. Does Kopernio have any desire to also link back and give the user a choice of where they go and not just leave it to whatever links happen to be the page?-- things like that.
JAN REICHELT: Yeah, so at the moment, when you look at a PDF, we do already link back to the journal. So we do that at the moment. But we've had, already, a couple of conversations with publishers and got that feedback. So the thought is now, what else-- so I think the most important step that was very important for us to address first is-- how can we recreate this engagement with an end-user in the first place?
JAN REICHELT: How can we engage an end-user? Because if we don't even manage to do the first step, then everything else further down the road has less impact. And so then we think around, OK, first of all, how can we link back to the publisher page? How can we display additional supplementary material? We've, of course, also heard from the librarians that they would like to somehow see something like, brought to you by.
JAN REICHELT: Because they spend loads of money on content subscriptions. And they want to make sure that they, next year, have that budget as well. So if we overall can increase that impact, once we have engaged the end-user, I think that's a good outcome. And for us, at the moment-- so we can also follow up afterwards-- we are now in the phase of exploring, how can we now satisfy the additional interests that we generate?-- which makes complete sense.
JAN REICHELT: And that includes all the publishers as well. So that would be something like further promotion or link outs or supplementary materials or whatever it might be. So we have a couple of ideas. But we can't just do everything at once. I think we ought to be careful how much we want to do-- and step by step.
NIALL LITTLE: All right, I think we have time for one more question. This question is-- oop, where did it go? So we talked a little bit about Kopernio and CASA working together. But this question is about Scholar, CASA, and Kopernio working together-- how could the power of Scholar Search and Yewno Discovery be paired together as a research workflow?
RUTH PICKERING: So I think I said at the beginning, we do not think that people are just going to go to one place. We see this as an ecosystem today. People will go to multiple places when they do their research. And there'll be personal preferences. And there'll be different places you go, because they're relevant to the different type of research you're doing. So I think the people who use Discover do use Google Scholar today.
RUTH PICKERING: They're always talking about it. I know that Hannah did in her research paper as well. So I'll hand over to Anurag. I don't know what you think.
ANURAG ACHARYA: I don't know. [LAUGHTER] Honestly, I don't know. I would like to approach these one step at a time. I'll share some background about visual approaches. At Google we have tried visual approaches in the past-- not in Scholar, but in the main web search. Visual approaches, at least in the way that we have explored them, have not been successful for users, largely because it requires more work from the user to be able to navigate the visualization.
ANURAG ACHARYA: They would prefer, from the point-of-view of just getting their work done, to do what you think is the most likely part of what I would do and present it to me. Now this is not to say that there aren't alternative pathways that would be way more successful than we were in the past. But that is a challenge visual presentations usually have. They require more engagement.
ANURAG ACHARYA: They require more work for you to do what might be relatively straightforwardly achieved by the machine pre-analyzing it for you.
NIALL LITTLE: All right-- ooh! Whoa! Yeah, we'll close there. Thank you to our panel-- very much. I believe we have a networking break now. [APPLAUSE] But thank you all so much. [APPLAUSE]