Name:
SSP Innovation Showcase (Winter 2024)
Description:
SSP Innovation Showcase (Winter 2024)
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/1262d9f8-9163-433a-978e-b1e7b91386dc/thumbnails/1262d9f8-9163-433a-978e-b1e7b91386dc.png
Duration:
T01H00M37S
Embed URL:
https://stream.cadmore.media/player/1262d9f8-9163-433a-978e-b1e7b91386dc
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/1262d9f8-9163-433a-978e-b1e7b91386dc/GMT20240222-160123_Recording_gallery_1920x1080.mp4?sv=2019-02-02&sr=c&sig=YZ2itZeBtsJ%2BWfdq%2B%2FLY7mNBwl1MfBrwQMtVr6KYhKE%3D&st=2025-01-02T14%3A10%3A12Z&se=2025-01-02T16%3A15%3A12Z&sp=r
Upload Date:
2024-07-22T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Recording in progress. Just give it a minute or two.
It looks like participants are streaming in. That's great. That's great. That's great. Somebody has an echo going on.
Welcome, everybody. We'll be getting started in just a moment. Just giving people the time to join in.
While we're waiting, check the chat. Susan from SSP has asked everybody to let us know where you're joining from. So we'd love to see the breadth and scope of SSP on today's webinar. Well, maybe even some sort of shout-out for the person coming from the farthest distance from, let's say, where I'm from, which is right outside of Washington, D.C., in Potomac, Maryland.
We got three UCS coming in. Sorry Robbie. From Canada. Not far enough. Uh, Costco. Great All right.
I think it's time to get started. We have close to 40 participants already. So thank you and welcome to today's SSP Innovation Showcase. I'm Dave Myers, the CEO of Data Licensing Alliance and a member of the SSP Education Committee, and I will be moderating today. Before we get started, I have just a few housekeeping items to review.
Attendees' microphones have been muted automatically. Please use the Q&A feature in Zoom to enter questions for the moderators and panelists. It's located at the bottom of the Zoom interface. You can also use the chat feature, also located there, to communicate directly with other participants and organizers or myself. I'll be monitoring both of those. Closed captions have been enabled, and you can view captions by selecting the More option on your screen and choosing Show Captions in the drop-down menu.
This is a one-hour session and will be recorded and available following today's event. Registered attendees will be sent an email when the recording is available, and folks that didn't register will still be able to access it free of charge in SSP's on-demand library. Again, before we get started, a quick note on SSP's code of conduct and today's meeting. We are committed to diversity, equity and providing an inclusive meeting environment that fosters open dialogue and a free expression of ideas, free of harassment, discrimination and hostile conduct.
We ask all participants, whether speaking or in the chat, to consider and debate relevant viewpoints in an orderly, respectful and fair manner. So enough about the housekeeping. Today's webinar will showcase three companies who will each present for 10 to 15 minutes. The presenting companies include Kudos, OpenAthens and Gramener, a Straive company. After all presentations are complete, participants can ask questions.
Again, use the Q&A box or the chat, and I will direct that to the appropriate panelist. You will also be able to connect with them directly via a QR code that you can scan for their contact information. So now, without further ado, I'm pleased to introduce our first panelist, David Sommer, co-founder and chief product officer at Kudos.
Thank you very much indeed, Dave, and great to have everybody here.
Thank you so much for joining. So yeah, I'm David Sommer, co-founder and chief product officer at Kudos, or Kudos, as we say here. And today I'm going to be talking about how we're using AI to really revolutionize the way in which you can tell the story of your research. So just to give a bit of background, Kudos is a platform for showcasing research. So we are all about telling the story of research, and we do that through stories.
So stories are plain language pages, so story pages contain plain language summaries, plus we then link together all of the research assets that relate to that bit of research. That could be videos, posters, infographics, data sets, images and so on. So we bring all those together as stories and then we roll those stories up as beautiful showcases. So magazine style pages that can be searched and browsed and these can be rolled up in all sorts of different ways.
So we've got some examples here: the Royal Society of Chemistry's publisher research showcase. We have subject showcases, but we also collaborate across publishers on these grand challenges. So climate change, coronavirus, sustainable development, and so on. We have 500,000 registered researchers across the world from 10,000 top institutions using the platform every day. So that's just a little bit of background.
Let me tell you a bit about how we create these stories, how we get the summaries onto Kudos. So traditionally the way we've done this is by being integrated into the publisher workflows. So when somebody publishes an article, we send an invitation on behalf of that publisher, co-branded, inviting them to come and use the platform and create their summary. We have hundreds of thousands of people who've done that, but we also integrate further upstream in the process.
So integrating with the manuscript submission systems such as ScholarOne, Editorial Manager and eJPress, where people can actually create their summary at the point of submission, and then that automatically gets flowed into Kudos, ready to be made available when the article is published. We also have our own team of expert writers that are available to handcraft and create bespoke summaries of articles, clinical trials, article extenders, projects, whatever the research relates to.
So we've been doing that for some time. The new bit, the really innovative bit I'm going to talk about today is AI generated summaries. So when we look at sort of the anatomy of one of our story pages, there's various elements to it. We have a plain language title, which is like a headline, a nice sort of short, simple summary. Sometimes the actual official title can be very complex to understand.
We have the "what's it about" and the "why it's important" fields. So "what's it about" is a simple non-technical summary of what the article or the research is about, and then "why it's important": why should somebody spend time reading this? Why does it matter? Why is it timely? Why is it unique?
What are the key takeaways from that? And then we have other elements such as images, links to all the related elements, quotes and perspectives from the different researchers involved. But we decided to start with the three most important elements when we're thinking about AI. So that's the plain language title, the "what's it about" and the "why it's important". So we started running a pilot back at the end of last year.
We've been running this with a number of publishers, and here's a really simplified overview of the process. So we take the full text PDF from the publisher with their agreement. We then summarize that and create the plain language title, the "what's it about", the "why it's important" and the key takeaways.
And we do that in conjunction with our partners at Cactus Labs. They've generated their own large language model. And what's really important about that is that our process does not train the model. So unlike something like ChatGPT, where if you're submitting full text you lose control of it, that content could end up anywhere. And the legislation around that is not there yet.
So you really lose control of that. But with our model, we only use it for the purposes of summarizing and then we delete it after it's created. So it's a safe model that you remain in control of your content, which is obviously really important. We then take those summaries and we add them to the platform. So we create the story pages automatically.
We then email the authors, they can come and review them, they can edit them, they can tweak them if they want to, and then they're automatically then rolled up onto these showcases, thematic showcases, subject showcases, publisher showcases, and so on to help get the maximum readership from those. And we have, you know, significant readership. I'll show you some stats later on, but we've had something like 90 million views of content since we've started this, and the results really are quite impressive.
But I'll share more of that with you in a moment. So as I said, we did this pilot and we have contacted all of the people whose work we summarized and asked for their feedback, and we got really, really good feedback, both in terms of the accuracy of the summaries and in terms of the readability. Both are really important. So the really exciting news is 100% of people said both the accuracy and the readability were either excellent or good.
So "excellent": no issues at all, no errors. "Good": you know, almost there. A couple of minor tweaks, but it's pretty much there. So 100% saying good or excellent, which we were delighted with. And we've gone into more detail and looked at the feedback, which has also been excellent.
And I'm going to share one very specific example. So meet Professor Bernie Carter. She is at Edge Hill University in the UK. So one of her articles was one that we summarized as part of this pilot. And there you can see the "what's it about" and the "why it's important" that were created by our AI process. Hopefully you can read that, but if not, you can look at the slides later.
It's very readable. The English is very good and it's a really nice, succinct summary of what it's about, why it's important, and what the key points are that you need to be aware of. And Bernie also said that she would love to have this for all of her work. If she could have these auto summaries generated for all her work, that would be fantastic.
That would really save her time. And the point is, it's much easier to go from 1 to 2 than it is from 0 to 1. So if you're starting with something, it's very easy to add to it. But the big blank canvas, the big scary, empty blank screen can be a bit daunting to start with. So having something to start with, which you can then build on, she really valued that.
And in fact she's gone in and added a nice image and a little perspective and some links to her particular bit of research there as well. And this has generated significant usage since this was launched. So what does all this mean for publishers and pharma and other people on this call? Well, it means that you can automatically create really good quality summaries of your research, and you can do that without the authors needing to be involved.
They can be involved, but they don't need to be involved. So it removes a barrier. People are busy. They've done their publication. The publisher's job is to get it out there. So this is a really good way, post-publication, of helping them understand how they can get the most readership, the most citations, the most utilization for their work.
So you can create the summaries and then you can roll them up into topic clusters, thematic showcases around whatever areas of research matter to you. And I'm going to show you a few examples of those. So we have a whole range of thematic showcases we're creating. You can see some of them there: cancer, agriculture, neuroscience, aging, and so on.
So you can join one of our thematic showcases which are all timed to coincide with various events and congresses around the world. You can see some of the sort of content calendar we have there, or you can do what some of our publishers have done and create your own topics as well. And that can either be just for one publisher or it can be across a range of publishers. We did an AI showcase, for example, across a number of publishers, so lots of flexibility on how these can be used.
Here's a couple more examples. So Royal Society of Chemistry, one of our publisher clients, created a showcase for Sensors and Diagnostics, one of their open access journals, and they wanted to find a way to bring together the best content on that, to really inspire people to submit to that particular journal. Similarly, RSC, again with one of their other journals, have used this to really attract early career researchers.
So to show the really good results that early career researchers can have. So it's very, very easy to spin these showcases up. And PNAS, they have a prize called the Cozzarelli Prize, named after one of their editors-in-chief. And every year they have a competition, and they're using a Kudos showcase to actually promote and raise awareness of the finalists in the Cozzarelli Prize each year. So again, a really nice way to kind of reward authors and give something back to them as well.
And the authors absolutely love this. We can also create these showcases around therapy areas. So whether it's diabetes or obesity or oncology or whatever it happens to be, we can have cross publisher showcases around those areas, whether it's big grand challenges such as climate change, covid, sustainable development, or if there are specific events and forum and symposia as the I have done here as well.
So lots of flexibility on how these might be done. And I just want to share a couple of numbers with you. So let's go back to the Climate Change Knowledge Cooperative, which we launched in time for the COP26 meeting. So this is how the showcase looks. We've had over 200,000 views of this showcase already and the content on it; it's getting closer to a quarter of a million views now. Our most viewed article had over 12,000 views, which is staggering.
It's about 400 views typically for an average article. And over 50,000 people have accessed these pages since we launched it, and they're viewing lots of pages. The industry standard is to generally view about two pages on a site like this. We're getting over 10 different articles viewed in a session there as well, and there's much higher usage, higher Altmetric scores and higher citations for articles that are included on these showcases as well.
So some really good concrete, tangible results there. And if you'd like a bit more information about that, I'm happy to share more granular data with you as well. So just two more slides then. So I just want to talk a little bit about metrics. So we are gathering a lot of metrics behind the scenes here. So we have, for example, 22 different categories of user type. So five different researcher roles, pharmaceutical roles, patients, students.
There's a whole range of different areas there, HCPs, health professionals, and we're capturing that data alongside the subject areas they're interested in, the universities that they're at and the countries they're in as well, and the metrics. So we can provide all of that data for stories and showcases and then roll that all up on beautiful dashboards as well. So you've got all the stats there and all of the raw data available to you as well.
And this is a unique data set, being able to categorize readers by that sort of level of granularity. Our publishers find that really, really useful. So my final slide, and there is a QR code there if you want to scan that and find out more information about anything I've talked about so far. But the summary is that we are up and running with summaries, we're innovating, and this lets you really scale up very quickly without the barrier of having to get authors to do something.
So you're able to very quickly take a corpus of content, whether it's a journal, a subject, some backfile, whatever it is that works for you, create summaries with that and create showcases very, very quickly. The quality is really good. As you saw, authors are really happy with this. They like it. They don't have the sort of concerns about what's going to come out of the other end of the machine, because it's a language model that's been generated based on academic research.
It knows from the prompt engineering and so on exactly the right way to do that. And it's not training up the model. So you stay in control. Your content is not being used in ways that you're not aware of. And as you've seen, you can create those topic clusters, those thematic showcases you can create your own, you can join others, really does grow awareness of that.
And when those sort of summaries are created, it significantly increases citations, metrics and usage. And finally, you've got all of the unique and valuable data around that, the metrics which help guide you and understand what's effective, what's less effective, how you can get the best return given your limited time you've got there. So that's where I'll wrap it up. We have a Q&A session, I think, later on in the session, but I'll hand it back over to Dave.
So thank you very much.
Thank you, David. It was great. And AI clearly is a theme that we're going to be talking about. As David said, we'll be holding off on our questions until the end. I hope that's OK, because we want to make sure we give appropriate time for our presenters. Speaking of that, our second presenter is Kieran Prince, excuse me, business development manager for OpenAthens.
After you. Thanks, Dave. Just waiting for my slides. Probably can advance it yourself.
There we go. So slight delay there. Yeah hello, everybody. Yeah appreciate you being with us and appreciate SSP for inviting me. As well. Just make sure I can move these slides. There we go.
We're in. Yeah, thanks, everybody, for being here. So, yeah, OpenAthens has been building single sign-on solutions for libraries and publishers for about 25 years. So, yeah, my name's Kieran. I'm a business development manager. So we'll be looking at the new OpenAthens reporting tool, and I appreciate Susan changing the date.
My slides did say 2022. So like I said, we've been building single sign-on solutions for libraries and publishers for a long time. Many of you who have been in the industry for a long time may well have known us as Athens. But yeah, OpenAthens has been supporting connections between libraries and publishers, and ultimately the users who are accessing those publishing resources, for a long time.
So, as you might expect, we handle lots of data. We work with publishers and libraries all over the world, and we collect lots of data. In fact, we have customers now in 87 countries. Although we're based in the UK, the US is our biggest market by far, accounting for about 50% of our customers. But for a small team, we are truly global. We resell through partners across the world.
We have a support office in Singapore to give us almost around-the-clock support. But yeah, customers all over the world, and that number is growing every year. And that accounts for, again, over 3,000 customers as it stands. A customer for us is a library: an academic library, a public library, a school or a corporate library.
And publishers, like many of you on here today. So vendors, e-book platforms, academic journals. Essentially, yeah, connecting users with content via the library. On average, we have over 2.5 million monthly users. So those are students, patrons, medical practitioners, researchers who are going through the OpenAthens service to get access to the subscriptions that their librarians very carefully curate for them.
And then logging into lots of resources. So we work with over 350 vendors and publishers. Again, yeah, all of that traffic is going through our service, and many of you on here today will be familiar with how that transaction works. But yeah, the point is, lots of data going through our service. In fact, over 200 million transfers last year, and again, as customer numbers grow, those transfers will increase year on year.
So lots of data. So of course it makes sense for us to make use of that data and give it back to the publishers. So why didn't we think about this sooner? Build a reporting tool. Libraries have had access to this for a long time. They love it. They use it often in their negotiations with their vendors.
So we are again using this data and giving something back to our publisher customers. So here's our dashboards. This is our summary section. So our publisher clients have access to this dashboard and ultimately, like I said, it's a summary page giving you a snapshot of usage. So for this publisher in particular, we can see they've had over 90,000 transfers from over 2000 unique organizations.
It defaults to a 30 day period. You can change this to seven days if you like. But yeah, like I said, giving you highlights over the last period of how your customers are interacting with your platform and you can see down the bottom there total transfers per customer with that daily average as well. So you can kind of spot trends and get an insight into that period and how your customers are logging into your platform.
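As a rough illustration of the summary figures described above (total transfers per customer, a daily average, and the count of unique organizations), here is a minimal Python sketch. The `transfers` records, the organization names and the `summarize` helper are all hypothetical and made up for this example; they are not the OpenAthens data model or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transfer log: one (day, organization) pair per login transfer.
transfers = [
    (date(2024, 2, 1), "Example University"),
    (date(2024, 2, 1), "Example University"),
    (date(2024, 2, 2), "Example University"),
    (date(2024, 2, 2), "Example Public Library"),
]

def summarize(transfers, period_days=30):
    """Return {organization: (total_transfers, daily_average)} for the period."""
    totals = defaultdict(int)
    for _day, org in transfers:
        totals[org] += 1
    return {org: (n, n / period_days) for org, n in totals.items()}

summary = summarize(transfers, period_days=30)
unique_organizations = len(summary)
for org, (total, daily_avg) in sorted(summary.items()):
    print(f"{org}: {total} transfers, {daily_avg:.2f} per day")
```

Changing `period_days` to 7 would mirror switching the dashboard from the default 30-day view to a seven-day view.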
Under the dropdown, you can see we have a default view of the top parent organizations. Many of you will sell to consortiums, library consortiums, so the default view will give you access to the consortium view, with top organizations being individual libraries within the consortium, or just a university or a public library. So you can filter down by organization.
We also have countries, again self-explanatory. You can see where your users are and where they're logging into your platform from. And we have applications. So an application in this scenario: you might have multiple products, you might have an e-book platform, a journal platform, and you can filter down by application, by product, to gain a view of usage per product.
So we look at countries. Again, many of you will sell internationally, so you can have a look at how regions are performing. Again, particularly if you have regional sales teams or regional sales managers, you can have a look at usage per country, and this will automatically sort by the most popular country.
And again, the total transfers with the daily average. And yeah, like I said, many of you will go and sell your subscriptions internationally, so you can see how each region is performing. But again, ultimately giving you a highlight of that period. So that's the summary. But we also have custom reports giving you a little bit more control over the view you have.
Um, so yeah, a few more filters at the top here. So again, you can filter down by application if you want to see usage for a customer in particular, you can filter down by individual organizations or multiple organizations at the same time. Again, if you want to see how a particular country is performing, maybe you're a regional sales manager. You can filter down by country and you can extend the date range within here as well.
So you can extend it to 12 months. So if you're going into those renewal conversations, you can gather 12 months' worth of data and take that into those conversations with your client. But ultimately, again, this is a highlight. This is a spotlight to give you a quick view of how your products are performing in different markets and with different customers.
But we give you the ability to download custom reports. You can do this on a regular basis, or indeed download the full transfer report to see exactly how often your customers are logging into your platform. So yeah, what we're recording: again, we're recording transfer data per organization.
Pretty self-explanatory. And transfer data per application. If you have multiple products, we can filter down by your products. We also have filters by country. Again, for those international products you're selling to different markets, we can filter down by those countries. If released, we can also see student, staff or alumni status.
So again, many of you will sell archive content to alumni, so you can see data for that. For those of you familiar, we have federation traffic. So if you're part of InCommon or the federation, we expose that data to your products as well. And lots of our customers will do custom SAML connections through OpenAthens. So you may have a corporate client you're connecting with directly.
You can see data for those, so they don't necessarily have to be openathens customers. And we also record turnaway data. So turnaway data, the user might attempt to log into your platform and there isn't necessarily a subscription in place. We still record that data as well. So that can be quite valuable for sales teams. If you've got, let's say, a particularly tricky prospect, you can go in and look at the data to see if there's any demonstration of demand from the users.
And we've seen examples of, you know, hundreds of users attempting to log in to different platforms without a subscription in place. So that can kind of spark up an interesting conversation. But ultimately, all of this data is in one place, saving you time managing multiple CSV files and giving you a little bit more granularity over whether they're a student or a staff member.
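The turnaway idea described above, counting login attempts from organizations that have no subscription in place, can be sketched in a few lines of Python. Everything in this example is an assumption for illustration: the event tuples, the `has_subscription` flag, the user identifiers and the `turnaway_demand` helper are hypothetical, not fields or functions from the OpenAthens reporting tool.

```python
from collections import defaultdict

# Hypothetical access events: (organization, user_id, has_subscription).
events = [
    ("Prospect College", "u1", False),
    ("Prospect College", "u2", False),
    ("Prospect College", "u1", False),
    ("Subscribed University", "u9", True),
]

def turnaway_demand(events):
    """Map each organization to (turnaway attempts, distinct users turned away)."""
    attempts = defaultdict(int)
    users = defaultdict(set)
    for org, user_id, has_subscription in events:
        if not has_subscription:
            attempts[org] += 1
            users[org].add(user_id)
    return {org: (attempts[org], len(users[org])) for org in attempts}

demand = turnaway_demand(events)
for org, (n_attempts, n_users) in demand.items():
    print(f"{org}: {n_attempts} turnaways from {n_users} users")
```

Reporting distinct users alongside raw attempts helps separate broad demand from one person retrying many times.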
But yeah, all the data is accessed through our dashboard for our customers. So I think we're doing questions at the end. So I'll hand back to you, Dave, but appreciate it. Thank you.
Thank you, Kieran. Really appreciate it. OK, great.
Our last presenter is the CEO of Gramener, Anand.
Thank you, Dave. I hope I'm audible. I'll wait for the slides to come on and we shall dive right in. Thank you, Susan.
I'll begin by introducing who we are as an organization. So I'm part of Gramener, which is a data and analytics company that was acquired by Straive recently. And to give you a very quick overview of Straive: 20% of the world's research passes through us. That includes the likes of Elsevier, LexisNexis, Springer Nature. We are also quite actively into edtech. Roughly two out of five students in the Uc effectively benefit from our content.
The third area that we are into is data and analytics, and that's the part that I focus on. I run the data and analytics division of Straive. And what we do here is primarily automate insights from data. Effectively we take data, we convert that into something that's a bit beyond just the numbers and then narrate those as stories. But in order to do that, one of the first things that we need to do is know what constitutes an insight.
And this is something that our clients often ask us, what do you mean by an insight? What I'm going to do is in this session, teach you, explain to you how you can go about identifying an insight from data and give you a feel for how we apply this as an organization. Firstly, what constitutes an insight? We have a fairly specific definition. An insight is something that's big, useful, and surprising.
Big in the sense that we want to make a substantial change with it. Don't give me a 0.01% improvement on the top line. Who cares? It's got to be useful. I've got to be able to act on it. So if you were to tell a farmer that, let's say, rainfall increases his crop yield, that's not useful, because it's not likely that a farmer is going to be able to increase their level of rainfall.
Of course, this is contextual, because you could have this information passed on to an organization like Monsanto, who might be able to seed the clouds and actually increase rainfall. But it's got to be actionable, and that's what useful means. And up to this point it's reasonably straightforward. But the third thing that an insight must qualify for is that it's surprising.
Tell me something I didn't know before. Is it non-obvious? And this is usually the tricky part. How do we go about systematically identifying what is not obvious? This is what we offer as a service through technology by having a systematic process and a systematic set of libraries that enable the creation of insights that are surprising.
And what I'm going to do is, like I said, teach you a portion of the process so that you'll be able to apply this yourself. I'm going to begin with a piece of work that we were doing for an education ministry in the government. They had data which had the marks for every single student for every single year over the last decade.
And they asked, what can you tell us that is surprising from the data? And we began our process, which is simply to establish what is apparently obvious to them and see if there are deviations. Are there outliers? That's one of the easiest ways of detecting surprising insights. One of the things we were asking them was, what do you believe is the distribution of marks?
For example, if you took English as a subject, do you believe that it would represent a normal curve? Very few students having low marks, very few students having high marks. And it would be a smooth curve in between. They said, yes, absolutely. We believe that's true for every single subject. And as one of the hypotheses, we said, let's test that out. And this is what we got.
For English, the marks are, yeah, kind of normally distributed. But there is a spike, and the spike happens to be at 35 marks, which is odd because that's exactly the mark at which students pass. So it appears as if the teachers look at the students who score 34 and, out of sheer kindness, let's assume, they just grant them a few grace marks, and that pushes those students up to 35.
But you'll notice something. The number of students who score 34 is not zero. So there are some English teachers who say, no, you shall not pass, at least not on my watch. And the students had better hope and pray that they don't get corrected by those teachers. There certainly is a lot of room for prayer, at least when it comes to English. But when we look at the social sciences, you don't need to pray.
For whatever reason, nobody fails between 30 and 35. The teachers seem to have gotten their act together, and everyone who gets between 30 and 34 just gets moved up to 35. And the education department said, look, we swear we have nothing to do with this. There is no official policy on moderation. We have no idea why the English teachers haven't figured this out.
But the social science teachers seem to have gotten together and are doing this by themselves. The mathematics teachers are doing that as well. Except that in this case, it's a little more stark, because that number is chopped off at 40,000 students. In reality, it goes on to well over 100,000 students who scored exactly 35 marks. So you can get a sense of the degree of generosity, so to speak, that the mathematics teachers have.
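The check described above, comparing the actual distribution of marks against the smooth curve people expect and looking for a spike at the pass mark, can be approximated with a very simple neighbour comparison. This is a minimal Python sketch, not the speaker's actual method or library: the `find_spikes` function, its `ratio` threshold and the data are all made up for illustration.

```python
from collections import Counter

def find_spikes(marks, ratio=3.0):
    """Return marks whose count exceeds `ratio` times the mean count of the
    two adjacent marks. Only interior marks are checked, so a spike at the
    very top of the scale (for example at 100) needs a separate check."""
    counts = Counter(marks)
    spikes = []
    for m in range(min(marks) + 1, max(marks)):
        neighbours = (counts[m - 1] + counts[m + 1]) / 2
        if neighbours > 0 and counts[m] > ratio * neighbours:
            spikes.append(m)
    return spikes

# Synthetic data: ten students at each mark from 30 to 40,
# with everyone who scored 34 moved up to the pass mark of 35.
marks = [m for m in range(30, 41) for _ in range(10)]
marks = [35 if m == 34 else m for m in marks]

spikes = find_spikes(marks)
print(spikes)
```

On this synthetic data the only mark flagged is 35, because it has twice the usual count while its lower neighbour, 34, has been emptied out.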
But the other anomaly, which the department spotted, was: we did not expect that there would be an upward spike at 100. Why is that? There was a big debate, and there were two possible answers that emerged. The first is that mathematics is objective, so the teachers feel justified and confident that they can award 100 to a student who has, in fact, scored everything correct, which sounded plausible.
Another explanation that was proposed was that mathematics is bimodal. That is, there are students who get it and there are students that don't, and these actually form two normal curves. It's almost like a grade 12 student taking a grade 10 exam on the right side. And because they can't score more than 100, it just gets squeezed to the right.
This sort of pattern is seen in computer science as well. We don't know which of these has the larger impact, but mathematics is one of those truly unusual subjects where you see this kind of bimodal pattern. So simply ask what people think is obvious and see if it's true. It's not always something that leads to an insight, but it is a systematic process that can be automated.
Let's take another one. We were curious what influences Marx, what are the factors that drive performance and does gender have an impact? We asked them, what's your hypothesis? Without exception, every single person in the room said, we believe girls score more than boys. We tested that hypothesis. And it is true.
There is almost no subject, in fact there is no subject, where boys score higher than girls. Vantage point advantage there, but it very much validates their hypothesis and therefore was not an insight. But the power of collecting hypotheses is in not dropping something because we believe it's probably going to be true or it's probably going to be false.
To give you an example of that, someone raised their hand and said, do you think sun signs make a difference? And we did a quick poll across the room. Not a single person believed that the sun sign had an influence on the marks, which makes perfect sense. It was a fairly scientific audience. We believe that as well. And this was one of those that we actually just decided to drop, except that there was one gentleman who was quite persistent.
And he said, you know, just test it out, will you? So we did, and found, to our surprise, that it actually makes a pretty large difference. June-borns tended to score the lowest. September-borns tended to score the highest, and the difference was a whopping 10 percentage points. On an aggregate of 1,200, the difference was about 120, and this was the second largest factor that influenced marks.
It was not a one-off. We did this year after year, same pattern. We broke it up by district. We broke it up by gender. We broke it up by subject. We broke it up by grade. Any which way we sliced it, the pattern remained the same. Now, some of you may have read about a phenomenon like this in Malcolm Gladwell's Outliers, where he talks about the Canadian junior hockey team having almost exclusively students, children who were born in January, February or March.
And it's simply attributable to the age cutoff. See, the schools open around July or August. Somebody who is born in July or, more likely, June just makes the cutoff and tends to be the youngest in the class. Somebody who's born in August or, more likely, September invariably misses the cutoff and ends up being the oldest in the class. And at that age, a one-year or an 11-month difference is massive.
It has an impact on their physical, mental and emotional maturity, to the point where 10 years later it leads to a 10 percentage point difference, which is massive. And since births were so interesting, we said, look, we want to go a little deeper into this. And we just want to test another hypothesis: whether birthdays are random. Are they, in fact? And the hypothesis was, yes, they should be.
Why would birthdays not be random? Now, we plotted 15 years' worth of birth data in the US, and it turns out that it's far from random. January to December, top to bottom. From the first of the month to the 31st of the month, these are the columns. The darker the cell, the more children are born on that day of the year.
And you can see that there are several children born in June, July, August, September, kind of in October, but not so much in January, February, March. Let me point out a few things. You'll notice that the second half of September is where the majority of the births are. Many of you are probably aware of this phenomenon. Most of the conceptions happened during the winter holiday seasons, and nine months later is when the kids are born.
These are the winter babies. But what's interesting is that there are surprisingly few births during the Christmas holidays, during the Thanksgiving holidays, during the New Year holidays. In fact, July 4th, Independence Day, is one of the days with the least number of births. This almost reads like a holiday calendar, not a birth calendar. You start wondering why.
But that can be explained, given the percentage of C-section births (this is data from 1975 to 1990). You say, OK, it kind of makes sense: if the hospital is going to be short-staffed, if the doctors are not going to be available, if the consultants are busy, then you do have the flexibility to move the dates a little bit this way or that way. But let's not blame just the health system.
You'll notice that the column for the 13th of every month is actually lighter than any other column. In other words, it's the parents who are saying, I don't want my child to be born on the 13th. No, that's not a lucky day. Let's instead go for the 14th of February, Valentine's Day, which is a very popular day for kids to be born on.
Yeah, sure. That works great. So this is something that is such a powerful, powerful social behavior. We thought, let's apply it to other countries. We said we have data for India. Let's apply it to India. And what does that look like? Our hypothesis was that there would be anomalies, but we did not expect this.
This was weird. A significant number of births in May, in June, and a few vertical stripes. But when we started looking closer, the most striking thing we found was how few births there were in August. We then went back to why this might be. We sourced this data from the very same school data set that we were working on, and this was admission data.
And that's when we realized, hold on: if the schools are going to close their cutoff dates in July, and the parents, who don't necessarily have to submit a certificate of birth to prove the age, have the option of filling in a date, what are they going to do if the child is August-born? Wait for a year, or fake the dates? Now, we thought that was a little weak.
How could we just guess that that was the reason? Until we looked a little closer and found that there were far more births on the 5th, the 10th, the 15th, the 20th, the 25th. You see the vertical stripes there? You see, when people cook up dates, they cook up round numbers. It's far easier to say the 5th or the 10th of a month than the 4th or the 17th or whatever.
And that seems to have played a part. But the strongest evidence we got for that was when we overlaid the marks that children scored with these birthdays. So what you see here is a pattern where the reds indicate low marks, the greens indicate higher marks, and you can ignore a few on the right extreme; those are nonexistent dates. But it appears that children who say they're born on the 1st, the 5th, the 10th, the 15th, the 20th and the 25th, especially in the first half of the year or close to the admission dates, tend to score lower.
And that's because these are kids who were born later, have been pushed into admission early, and are suffering from the age disadvantage. In fact, the 1st of June is the single most common birthday in India by a huge margin. The second most common birthday is the 1st of January. Now, the 1st of January has a legislative reason. The law says if the child does not have a known date of birth, then you fill in the 1st of January for whatever the year is.
But the 1st of June is purely an urban phenomenon driven by parents' intense desire to have their kids come into school early. And it also happens to be the date on which, on average, children scored the lowest. At least, if a kid says they're born on the 1st of June, on average, their score tends to be the lowest. Now, I did tell this to a manager of a fairly large bank in India, and he said, look, hold on there.
I'm going to introduce you to someone. And he pointed me to a lady. He said, she's the chartered accountancy gold medallist in the country last year, and she's born on the 1st of June. Let us see. Absolutely, congratulations. First of all, I have nothing against people who are born on the 1st of June.
It's just that if somebody says their official birthday is the 1st of June, there's a slightly higher chance that the birthday is faked. So what we're doing here is nothing more than taking a set of hypotheses, filtering it for whether that hypothesis is true, in which case it's not a surprise or it is untrue, in which case there is a potential surprise and using that to determine what constitutes an insight.
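As a rough illustration of the process just described (collect hypotheses, record what people believe, test each one, and keep the ones where the data contradicts the belief), here is a minimal Python sketch. It is not the speaker's actual tooling: the `Hypothesis` class, the two test functions, the 5-mark threshold and the four toy records are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    statement: str
    believed_true: bool            # what the room expected before testing
    test: Callable[[list], bool]   # returns True if the data supports it


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def find_surprises(hypotheses, records):
    """Return the statements whose test result contradicts the prior belief."""
    return [h.statement for h in hypotheses
            if h.test(records) != h.believed_true]


# Toy, made-up records purely for illustration (not the real marks data).
records = [
    {"gender": "F", "birth_month": 9, "marks": 82},
    {"gender": "F", "birth_month": 6, "marks": 70},
    {"gender": "M", "birth_month": 9, "marks": 78},
    {"gender": "M", "birth_month": 6, "marks": 64},
]


def girls_outscore_boys(rows):
    girls = mean(r["marks"] for r in rows if r["gender"] == "F")
    boys = mean(r["marks"] for r in rows if r["gender"] == "M")
    return girls > boys


def birth_month_matters(rows, threshold=5):
    # Assumed rule of thumb: a gap of at least `threshold` marks counts.
    june = mean(r["marks"] for r in rows if r["birth_month"] == 6)
    sept = mean(r["marks"] for r in rows if r["birth_month"] == 9)
    return abs(sept - june) >= threshold


hypotheses = [
    Hypothesis("Girls score more than boys", True, girls_outscore_boys),
    Hypothesis("Birth month influences marks", False, birth_month_matters),
]

surprises = find_surprises(hypotheses, records)
print(surprises)
```

In this toy data the gender hypothesis matches the belief, so it is filtered out, while the birth-month hypothesis contradicts the belief and is kept as a candidate insight. A real analysis would replace the simple mean comparison with a proper statistical test.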
My objective here was to give you a feel for the process that we apply, and I hope you learned something from this. But this is what we do as an organization. We take our clients' data and we tell them something that hopefully surprises them, even from their own data. I'd really love for you to explore insights from your data, play around with it. It's a straightforward process.
Just write down your guesses, see if it's true, see if it's false. Somewhere you'll find something interesting. I'd love to take up any questions in the Q&A on how you might apply this. Thank you. Excellent. Well, thank you. Um, it's now time for our Q&A, and so if you have any questions for the panelists, please put them in the Q&A section at the bottom of the screen or in the chat.
We have both, and we will get to them in a second. Certainly, and I have to challenge: I am a June birthday and, you know, we're going to have an offline conversation about how wrong your data is. But that's a different story. Um, so anyway, our first question, from Lisa Braverman to Kudos: are you currently partnering with Elsevier?
And do you have a list of publishers with whom you are partnering? Thanks, Lisa, for the question. Yes, we do partner with Elsevier. Elsevier are actually one of the sponsors of our climate change knowledge cooperative, along with a number of other publishers. So yeah, we do work with Elsevier plus a number of others.
I can message you directly about that. I think you're with the Society of Radiology and Oncology, so I'll have a chat with you about that. But one thing to mention is that because we integrate with ORCID, people can claim publications from lots of different journals, lots of different publishers. And with our 500,000 registered users, we already have a lot of people on the platform creating summaries themselves as well, which we can then enrich.
So yeah, I'm happy to have a conversation with you on that, but thanks for the question. Thank you. Um, we have another one for David. Um, how are topic clusters created? That's a great question. So there are essentially two methods. There's a sort of dynamic and a sort of curated approach. So the dynamic one is if we have some sort of tag, some sort of metadata that lets us group together stories.
So it might be everything from one publisher, everything from one journal, all research from an institution where at least one of the co-authors of that paper is from Harvard, for example, or from a subject area, then we can automatically create those and roll those up. And then we have the curated ones where we essentially have a list of DOIs. We can sort of create that based on what has been hand selected and we've got tools to help hand select those.
So we've got both options there and both have different uses. Thank you. Yeah, thank you. Thanks, Richard. Um, looks like, David, you're the popular one at the moment. We have another. I'm a May birthday, so I don't know what that means. I forget what color that was on the chart, but, um, David from Kudos, could an open access journal join individually?
And how much does it cost for them? Yes, open access. It's really important because availability does not equal accessibility. So just because something is open and can be accessed by anybody, it doesn't mean it will be understood or necessarily cited if people aren't understanding it. So, yes, open access is a really big part of what we do.
We work with open access publishers, open access journals, as well as subscription journals. And in terms of costs, yeah, we'll have a chat. But yeah, happy to explore more about that. Great, thanks. Thank you. Um, one for Anand. Um, how has AI enabled or hampered data analysis and hypothesis testing?
We're seeing a couple of interesting trends, especially with large language models coming in, like ChatGPT and Gemini. One thing that I found that was perhaps most disruptive is the generation of hypotheses. So when people look at a data set and, you know, ask themselves, what are our beliefs or our assumptions about this data? What do we think is true about business?
They come up with, say, half a dozen thoughts and ideas, but they're usually limited in what can be applied. Whereas once we start using the power of large language models, the set of ideas that come up in terms of what can be done seems to be significantly larger. And that's one way in which AI is disrupting this. The other is automated analysis. That is, you ask a question, and the model just looks at all possible ways of answering it and comes up with an output.
In fact, to take something that David talked about, which is topic clustering: what we see is that when we use the power of embeddings in the large language models, which tell us if a piece of text is similar to another piece of text, they can start linking a word like, let's say, country to region to continent without explicitly having to mention those, and potentially in any language. What that means is that the way in which we start creating clusters becomes meaning-based rather than word-based.
And that opens up a whole new set of possibilities, which means that now we can do certain kinds of analysis that just were not possible before. So in short, at least with large language models in the last year, we are able to come up with more questions, come up with answers to questions that we didn't think were possible before. And the interplay of this is particularly powerful because it's automated.
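To make the idea of meaning-based clustering a little more concrete, here is a small Python sketch. It assumes each term has already been turned into an embedding vector by some language model; the three-dimensional vectors below are hand-made stand-ins, not output from any real model, and the greedy grouping with a 0.9 cosine threshold is just one simple way to cluster, not necessarily what either speaker uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy "embeddings". A real model would return vectors with
# hundreds or thousands of dimensions.
toy_embeddings = {
    "country":   [0.9, 0.1, 0.0],
    "region":    [0.8, 0.2, 0.1],
    "continent": [0.85, 0.15, 0.05],
    "enzyme":    [0.0, 0.1, 0.9],
    "protein":   [0.1, 0.0, 0.95],
}


def cluster(embeddings, threshold=0.9):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of terms
    for term, vec in embeddings.items():
        for group in clusters:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(term)
                break
        else:
            clusters.append([term])  # no similar cluster found, start a new one
    return clusters


clusters = cluster(toy_embeddings)
print(clusters)
```

Note that "country", "region" and "continent" share no words, yet they end up in the same group because their vectors point in nearly the same direction. That is the shift from word-based to meaning-based grouping described above.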
So you just let the system run, asking the questions and answering them. And the only limiting factor is our ability to understand and act on it. Thank you. Are there any other questions? I have a few, but I'd like to open that up to the attendees to put them in the Q&A. And while we're waiting for that, first, one quick one for Kieran.
Um, so tell me a little bit about the integration. So are you integrated with all the major platform vendors or is there more to go? Oh yeah, it'd be great if we were integrated with all of them. Um, our core market is academic journals. Um, so, yeah, our first customers all those years ago were the core academic journals. So, you know, Elsevier, Wiley. Um, but new markets again internationally.
So regional publishers, non-native, non-English-speaking publishers have been joining in recent years. And again, outside of the academic space. So, um, current affairs platforms like The Economist, for example. Um, so yeah, it's open to everybody. Again, essentially if you're selling content, institutional licensing into organizations, then we have a role to play in that for sure.
Um, but yeah, because of our roots in academia and universities, our roots are definitely in academic journals. And a quick follow-up: is it just journals or is it all types of content? All types, yeah. E-book platforms, applications, databases. Um, yeah, you name it.
Um, so yeah, journals are our core market, but for sure we have lots of vendors who are outside of that traditional academic space. Sounds good. Thank you. David, over to Kudos, a question. So you talked about the summaries and creating summaries.
Um, but a lot of people, especially with gen AI, are looking just for answers. And so how do you think that the industry and technology will leapfrog summaries and just go straight to answers, wanting answers versus still having to read something? You're on mute. It was bound to happen.
Yeah, it's a great question. And yeah, there are great answering-type platforms out there. Something like scite is great at doing that. I think it depends on the audience as well. So if you just want a quick answer to a question, then an answer-type platform is really good. If you want to get into it in a little bit more detail and actually get the human side of it, the perspectives of the authors, the research team around that, the related links and so on, which may not be accessible just through a quick couple-of-sentence answer, then I think, yeah, Kudos helps with that and others do as well.
But yeah, things are moving very fast, which is why it's such an exciting area to be in at the moment. Yeah, absolutely. I mean, you know, my practice is deeply involved in AI and so I appreciate that. The last question I have is back over to Anand again. June, I still have an issue with. But, um, so the question that arises: it's fascinating, your analysis, but that was humans looking deeper into the reasons for what the data was showing.
And that sparked, you know, the concept of trust and trusting the data. And so AI systems are just taking data and ingesting it and spitting out answers. But clearly in the case of the faked birthdays, that's incorrect data and that could lead to incorrect answers. Um, tell me a little bit about, um, what your feelings are with that and also how Gramener can help with that.
What we've seen is that the majority of business users have this concern that the quality of the analysis is only as good as the quality of the data. And therefore, unless the data is perfect, the analysis and the actions that they can take on it cannot be perfect. In theory, that's true. In practice, we need nothing close to perfection. All we need is a directional indicator, and with a reasonable amount of statistical analysis, you're able to get to some fairly confident results.
I'll put it another way. If you believe that your data is more than half wrong, then yeah, that could lead to some problems. But 10% of the data having issues is not a big deal. And not just that: it's possible to remove, in an automated way, any of this data that's incorrect. It may be possible to automatically correct this kind of error as well.
For instance, one of the things that we are actively working on is researcher integrity. Can we identify a paper that's, well, let's just say a fake paper to begin with, that's being published in a journal? Now, one of the ways in which we can identify this, and we've found this to be a fairly effective technique, is to take a collection of known authors who have published papers that have been rejected from an integrity perspective.
Identify who their co-authors were. Identify who they've been publishing through, identify the topics they've been publishing in, and start connecting these as a network. So if I have an author who has co-authored with, let's say, two people who've been flagged off for integrity, then there's a fraud rank, so to speak, that we can associate with such an individual. And that raises a red flag.
We may want to investigate that further. In this case, what is happening is that the data itself, which contains some known inaccuracies, is able to spot more issues and problems, allowing us to filter and potentially correct. So, yeah, I generally believe that unless the majority of the data is bad, we have an opportunity both to identify flaws and to correct them. Thank you.
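The network idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's actual "fraud rank": the author names, the papers, the flagged set, the scoring rule (share of an author's co-authors who are flagged) and the 0.5 review threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical co-authorship data: each paper is a list of author names.
papers = [
    ["A", "B"],
    ["A", "C"],
    ["C", "D"],
    ["B", "C"],
    ["E", "F"],
]

# Made-up set of authors with papers rejected on integrity grounds.
flagged = {"A", "B"}


def build_coauthor_graph(papers):
    """Map each author to the set of people they have published with."""
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph


def risk_scores(graph, flagged):
    """Score each unflagged author by the share of their co-authors who are flagged."""
    scores = {}
    for author, coauthors in graph.items():
        if author in flagged:
            continue
        scores[author] = len(coauthors & flagged) / len(coauthors)
    return scores


graph = build_coauthor_graph(papers)
scores = risk_scores(graph, flagged)

# Authors at or above the assumed threshold are queued for human review.
to_review = sorted(a for a, s in scores.items() if s >= 0.5)
print(to_review)
```

Here author C has co-authored with both flagged authors, so C is raised for review, while D, E and F are not. A score like this is only a red flag for further investigation, as described above, not a verdict. A fuller version might also connect journals and topics, as the speaker mentions.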
Well, I want to thank you and all the rest of the panelists and, of course, all of you attending the webinar for your participation in today's innovation showcase. Please use the QR codes in front of you to connect with them directly. And more importantly, we also want your feedback. Please consider completing a session evaluation form with the QR link in front of you.
As a reminder, a recording of today's session will be posted in the on-demand library in a few days. And I also want to note that the annual SSP meeting registration is now open, and Kristen would really be upset with me if I didn't tell you that sponsorship opportunities are available for the annual meeting. Again, thank you all for being here with us today. And this concludes our session.
Have a great day.