Name:
SSP Innovation Showcase (Summer 2024)
Description:
SSP Innovation Showcase (Summer 2024)
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/b9dfbb9d-04f7-480c-a591-f210f9212ecb/thumbnails/b9dfbb9d-04f7-480c-a591-f210f9212ecb.jpg
Duration:
T01H00M34S
Embed URL:
https://stream.cadmore.media/player/b9dfbb9d-04f7-480c-a591-f210f9212ecb
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/b9dfbb9d-04f7-480c-a591-f210f9212ecb/GMT20240718-145958_Recording_1920x1080.mp4?sv=2019-02-02&sr=c&sig=566ByQtd8pBLDNyOICZO1Cp2epQD3NcbKb7IWFUnHxo%3D&st=2024-11-20T01%3A24%3A36Z&se=2024-11-20T03%3A29%3A36Z&sp=r
Upload Date:
2024-07-22T00:00:00.0000000
Transcript:
Language: EN.
Segment: 0.
We'll give another 15 seconds for everybody to file in and then we'll get started.
All right, why don't we get started. So thank you and welcome to today's Innovation Showcase. I'm Dave Myers, the CEO of Data Licensing Alliance and a member of the SSP Education Committee. Before we get started, I have a few housekeeping items to review. Attendee microphones have been muted automatically. Please use the Q&A feature in Zoom to enter questions for moderators and panelists.
You'll find it at the bottom of the screen under Q&A. You can also use the chat feature to communicate directly with other participants and organizers if you so choose. Closed captioning has been enabled; to turn on captions, please select the CC button on the Zoom toolbar. This is a one-hour session. It will be recorded and available to everyone following today's event. Registered attendees will be sent an email when the recording is available.
Folks that didn't register will still be able to access it free of charge in SSP's on-demand library, so you should really check it out. A quick note on SSP's code of conduct and today's meeting. We are committed to diversity and equity and to providing an inclusive meeting environment that fosters open dialogue and the free expression of ideas, free of harassment, discrimination, and hostile conduct.
We ask all participants, whether speaking or in the chat, to consider and debate relevant viewpoints in an orderly, respectful, and fair manner. So, about today's webinar. Today's showcase will feature four companies who will each present for approximately 10 minutes. The presenting companies, in order, are CloudSource, Hum, Silverchair, and SiteFusion proconsult. After all the presentations are complete, participants can ask questions.
Again, use the Q&A box and I will direct questions to the appropriate panelists. You can also ask questions in the chat, and panelists' colleagues may answer in real time. We will also provide QR codes that you can scan for more information. So now, without further ado, I'm pleased to introduce our first panelist: Mariska Connelly, Director of Partnerships and Communications at CloudSource from SirsiDynix.
Thank you for watching this brief preview of CloudSource. It's a new comprehensive e-resources platform from SirsiDynix that leverages the vast and expanding body of open access content. CloudSource is part of the cloud library services platform. It delivers comprehensive aggregation, enrichment, management, and discovery of all e-resources for all library types.
CloudSource works with all ILS and discovery platforms, so you don't have to be a SirsiDynix customer to benefit from CloudSource. The foundation of the CloudSource index is the collection, which currently contains more than 60 million open access resources, including articles, proceedings and reports, ebooks and textbooks, and open educational resources, which include course materials, lesson plans, videos, activities, et cetera.
CloudSource+ expands that index by about a billion closed and licensed resources. This includes scholarly content, magazines and trade journals, e-books, multimedia resources, and news sources. CloudSource doesn't just aggregate metadata; it enhances it with tools that add subject headings, access points, and enriched discovery.
The CloudSource OA experience is as easy to use as Google. You have the basic search fields that you're used to, plus, for those who are not proficient with more advanced searching and Boolean operators, we have an advanced search builder. Here you can see that we have all of the filters you would typically see within a scholarly database, but also a lot of specific filters and ways to limit a search.
I have limited here to peer-reviewed content and journal articles. And on this particular search for artificial intelligence, I'm able to search within the search results to limit further; here I'm looking for hallucinations. You'll see, even on the brief display, you've got a lot of information, including peer-reviewed status, any open access licenses that apply, citations, and social media buzz, and everything your patrons and users need is in this access actions menu here.
Clicking View Resources takes you to the full text with no login and no authentication required, which also allows you to share this with your alumni, with your community users who don't have logins, and with your visitors. Detailed views give you a lot more information, and this is where a lot of that enhanced metadata comes in. If any of the citations or references within an article or resource are also open access, you can open those from here and view the full text.
And we have a robust citation tool which allows for annotation and formatting before you export to your document or your citation manager. CloudSource+ has a lot of these same features, including the search. Within here, I'm looking for artificial intelligence and limiting within that search to accuracy. You can see that this article is published under a commercial license, not an OA one, and that it is part of the holdings for this library.
So again, that View Resources button will take me to the full text, since we do provide support for proxy services. The mobile experience isn't much different from desktop. You're going to get exactly the same search tools and all the same information on your brief results. You're able to limit using all of the same filters, and all of those quick action buttons are available, including your citation manager and all that enhanced metadata, so you lose nothing on the mobile site.
Collection management is available to authorized users that you select for the site. You have options to edit your profiles, including deciding what types of resources you'd like to be available. You can also use one of our custom collections or build your own by eliminating any articles, titles rather, that you don't want to be available within those search results.
So it gives you a lot of control. We also provide usage statistics: a variety of COUNTER-compliant reports that let you do things like limit to the types of metrics you want and the date ranges you're looking at, and produce COUNTER-compliant reports like this platform report here. Holdings management for CloudSource+ is even more robust. We have, obviously, the ability to look at your particular holdings and get a bit of information about those, and this is a managed service.
So you may have noticed on that one that this is a managed collection. You can see your online source details, your access points, and also the ability to search by title. In this case, I'm looking at Nature, and I can see things like my item details, my access points, any subject headings that are associated with this title, past names of the title, and also a quick glance at stats if you're making collection development decisions on the fly. This is, of course, a development server, so there's not a whole lot of usage here.
One of the most interesting tools in the CloudSource+ collection management is the overlap analysis. Here I'm going to look by content source: I want to see what other database titles have overlaps with EBSCO MasterFILE Complete. And I can see right here that I've got 36.77% overlap with EBSCO Academic Search Complete.
I can dig into this a little bit more and look at which ones are one-to-one overlaps and which ones have unique coverage, so again, this is another great tool to aid in your collection development; a rough sketch of the idea follows.
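As a loose illustration of the overlap analysis just demonstrated, here is a minimal Python sketch that computes the overlap percentage and unique coverage between two holdings lists. The collection names and titles are made up, and this is not CloudSource's actual implementation.

```python
# Toy sketch of a collection overlap analysis, in the spirit of the
# CloudSource+ feature described above. Titles and numbers are invented.

def overlap_report(collection_a: set[str], collection_b: set[str]) -> dict:
    """Compare two holdings lists: overlap percentage and unique coverage."""
    shared = collection_a & collection_b
    return {
        "overlap_pct": round(100 * len(shared) / len(collection_a), 2),
        "one_to_one_overlaps": sorted(shared),
        "unique_to_a": sorted(collection_a - collection_b),
        "unique_to_b": sorted(collection_b - collection_a),
    }

masterfile = {"Nature", "Science", "The Lancet", "Time"}
academic_search = {"Nature", "Science", "The Lancet", "Cell"}
print(overlap_report(masterfile, academic_search))
# {'overlap_pct': 75.0, 'one_to_one_overlaps': ['Nature', 'Science', ...], ...}
```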
Some key takeaways: with CloudSource, because you have an expanded collection of resources, it can help replace some of your low-use or no-use subscription databases. It expands your library collection in a cost-effective way, including decreasing a lot of your ILL and document delivery requests, which we've seen with many of our customers, because suddenly they have access to a lot of those materials. And it eliminates your typical barriers to accessing articles, databases, and resources, because all of those resources offer full-text, no-login-required access.
With CloudSource+, you can provide libraries with tools to evaluate their collection overlaps. You can aggregate all of that content into one index, making it easier for patrons to find everything you have. And you can increase your usage by offering relevant content to your users. So CloudSource allows you to save money, keep your users happy, and add a more robust collection. And it keeps expanding on its own.
So I'm going to stop there. Thank you so much. I'm happy to answer any questions you have. All right, well, thank you, Mariska. Our next presenter is Richard Bennett, Chief Growth Officer at Hum.
There we go, so we get the right slides. It's running a bit slow. So today we're going to talk about taxonomies, and I'm going to share a little bit about the work that Hum has been doing in the area of taxonomy development. Specifically, we're going to look at a couple of different areas.
One, we're looking at the utilization of AI to build custom taxonomies based on a specific content corpus. And second, we're going to look at how we can use that taxonomic structure to create new use cases and new ways of uncovering content. So, taxonomies have been around for a very long time, but as the content explosion within the digital sphere has grown, we're seeing greater and greater requirements for taxonomic structures to be applied.
This is especially true as we start to see the utilization of AI. One of the simplest ways of underlining this: when combined with AI and natural language processing, taxonomies make it easier for machines, and ultimately their users, to find assets in the form of language. And this is the real key aspect of what we're seeing with the development of AI and the use of taxonomies.
What this allows us to do is create a far greater number of topic associations with specific pieces of content, and those topic associations let us drive superior outcomes. There are a number of ways we can use this, and we'll go into some of them as we go through this presentation. One way to apply this is audience segmentation.
You can look at specific areas of interest across your audience to pull out areas of deep subject interest. You can also use it for things like content recommendations: a deep understanding of the content, applied at a personal level, can create personalized content recommendations. You can then start to work on things like advertising, where personalized understanding opens up new opportunities and new subject areas to be targeted. Then there's lead generation: mapping demographics against subject interests lets us specifically pick areas of development for things like journal editors, reviewers, and potential authors. And understanding content at a very deep subject level enables content development.
So, understanding the engagement with specific areas of content allows publishers to develop their content and their content strategies. Now I'll introduce Alchemist. Alchemist is essentially the brain that sits within Hum's customer data platform.
It is essentially the part that is driving and building these taxonomies and then enabling the tagging of all the content across the publisher's corpus. We're using AI in a number of different ways. First, it uses interpretive AI to extract key terms from a publisher's corpus. It then uses generative AI to create an understanding of that corpus and create hierarchies within the taxonomy itself. And then we move to predictive AI to apply that taxonomy, applying all of those taxonomic terms in an intelligent way back to any corpus.
In this case it's a publisher's corpus, but it could be any corpus of materials, to create a deep understanding and a deep level of tagging. The way we do this is on two levels. The first level is the traditional way that we've started to operate: you have a content corpus, you extract a whole set of known terms, subject inferences, and terminology attached within the corpus, and you create essentially a bag of words.
These are just a whole series of terms associated with the content. What follows is synonym attachment, like-term matching, culling, and refinement to really create a series of flat keywords. What we're starting to see now is the development of the next, generative aspect: you can take those flat keywords, start aggregating them, and start building out your taxonomy.
The first level is aggregating around specific subjects, and that gives you the top level of the taxonomy. There is human intervention in this. You then start to aggregate like terms around specific nodes, and those nodes become the hierarchical pieces as you build out the levels of your hierarchy. We're then able to activate internet-scale knowledge.
So not only looking at a corpus of material, but actually going out onto the internet and grabbing subject knowledge to enhance and develop that taxonomy and fill in the gaps. That creates essentially a final taxonomy that we can use. And that taxonomy isn't the only thing that can be run across a corpus.
We will create an Alchemist-generated taxonomy that can be run across any corpus of material, but it can also be augmented in a number of different ways. We have B2B clients, so there are things like IAB terms, and open concepts can also be introduced to the taxonomy. A lot of different parties have their own taxonomies too, so we can also augment the core taxonomy with various in-house taxonomies provided by clients.
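Hum's actual Alchemist pipeline is proprietary, but as a loose sketch of the general idea, extracting weighted terms from a corpus and aggregating like terms into top-level subject nodes, here is a toy example using scikit-learn. The corpus, cluster count, and the clustering approach are all illustrative assumptions, not Hum's implementation.

```python
# Toy sketch of taxonomy building: extract weighted terms from a corpus,
# then aggregate documents around subject nodes whose top terms become
# taxonomy node labels. Not Hum's actual Alchemist implementation.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "gene editing and CRISPR in crop science",
    "CRISPR applications for plant genomes",
    "transformer models for machine translation",
    "neural machine translation with attention",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(corpus)                  # weighted "bag of words"
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for node in range(km.n_clusters):
    # The top-weighted terms in each cluster centroid label a taxonomy node.
    top = km.cluster_centers_[node].argsort()[::-1][:3]
    print(f"node {node}:", [terms[i] for i in top])
```

In a real pipeline the culling, synonym attachment, and human review described above would sit between the keyword extraction and the node aggregation.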
So ultimately that's the building of the taxonomy. But if you're going to derive benefit and value from the taxonomy itself, it's really in the way it's applied across all of the content. What we're doing is taking the taxonomic terms that were built in this process and applying them to every piece of content, and that's not limited to published pieces; it can be any piece of content across any content site we may encounter.
So all of those topics are assigned to each piece of content. The taxonomy is also a living entity: as the corpus changes, the taxonomy can be revised and updated, so it can automatically stay relevant, and all of those new terms can be applied to the content. Once you've applied all the content terms and the content tagging, what you really need to understand is how the audience interacts with this content.
This is where the behavioral association comes in. As researchers are interacting, reading, downloading, sharing, and engaging across that content, all of those taxonomic terms can be attached to individual profiles. So what you have is a set of content with a series of consistent taxonomic terms, and you now have a set of individuals that you are, through their engagement with the content,
able to associate with the same taxonomic terms. And that lets you do some really interesting things. First, you can look at it from a content perspective: you can analyze your content in a deep manner and create custom collections of content based on topics and keywords, and you can overlay that with engagement.
So you may want a certain field, and want only the most engaged pieces of content within that content segment to be surfaced and shown; essentially, engagement filtering over the top. If you take it from a person perspective, you can use that same taxonomic idea applied to people: you can segment and understand your audience so as to focus on the folks that are engaged with very specific topical associations.
You can either use direct engagement, or now we're also able to utilize AI to show predictive interest, further predicting off of their current engagement interests. And the final part is that you can bring those two pieces together: you can use the content understanding to create a swathe of content on a very specific topic set,
and you can then use the audience segmentation to create exactly the same taxonomic pull for individuals who are interested in that area, and match them up for a truly engaged experience. In our experience so far, we're able to surface around 50 times more related content compared to traditional tagging methods.
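To make the behavioral-association idea concrete, here is a minimal, hypothetical sketch: content items carry taxonomy tags, a user profile accumulates tags through engagement, and related content is surfaced by tag overlap. The tags and items are invented, and none of this is Hum's actual code.

```python
# Hypothetical sketch: tag-overlap matching between content and a profile.
from collections import Counter

content_tags = {
    "article-1": {"machine learning", "nlp", "taxonomies"},
    "article-2": {"crispr", "plant biology"},
    "article-3": {"nlp", "information retrieval"},
}

profile = Counter()  # taxonomic terms accumulated through engagement
for viewed in ("article-1", "article-3"):
    profile.update(content_tags[viewed])

def score(item_tags: set[str], profile: Counter) -> float:
    """Weight shared tags by how often the user engaged with them."""
    return sum(profile[t] for t in item_tags)

ranked = sorted(content_tags, key=lambda a: score(content_tags[a], profile),
                reverse=True)
print(ranked)  # content sharing the most engaged-with terms comes first
```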
That opens up an enormous amount of content. Being able to understand your content, and to segment that content for things like greater content licensing opportunities, allows you to truly get value from all of the pieces of your content, and also to deliver, in a very specific way, the pieces of content that are most relevant to individual users. Looking at the final part, the benefits aggregate around three different areas.
The first: compared to manual development of taxonomies, the speed is significantly different. You're creating custom taxonomies in around two days, which is significantly faster. The second aspect: being faster, with lower manual intervention, allows it to be done at a vastly reduced cost, so you get significant cost savings on top of speed. And the last part is being able to leverage this wider, ever-increasing internet-scale knowledge to further enhance and generate a significantly richer set of taxonomic terms and apply those.
And that is me. Great, well, thank you, Richard. Our next presenter is Hannah Heckner Swain. She's Vice President of Strategic Partnerships at Silverchair. Hi, everyone.
I'm really happy to be on this panel today and to perhaps provide a refresher or offer some updates on Sensus Impact. Sensus is a product that Silverchair developed in partnership with Oxford University Press in pursuit of demonstrating the impact of funded journal articles on a funder-by-funder level with key bibliometric and attention data. We released Sensus to the market on Valentine's Day, February 14th, and it is a site that is publicly available to anyone who wishes to peruse it.
So why would we focus our innovation funds on a publisher-agnostic product, one focusing on activities that are happening on and off of our platform? As you may have read or sensed across the last months and years, there's a lot of scrutiny on the currency that's flowing within our scholarly communications ecosystem, with publishers often being seen as a drain on these funds. APCs and subscriptions are certainly a large expenditure when considering all of the costs in our system, those costs of disseminating research. But from where Silverchair sits,
we see that publishers hold a lot of value in this exchange. They're doing lots of work in the background to increase the reach of research. So it seemed like a really ripe time to shine a light on a lot of those activities. I just wanted to include this quote because James Butcher's recent Journalology newsletter provided some really great context for Sensus Impact.
There we go. So, as with all good products, we like to think, at least, the idea for Sensus Impact came out of a problem: how do we better connect the efforts of publishers and funders in this landscape we find ourselves in, this landscape of shifting mandates and increased scrutiny on where dollars are going? Funders are kind of siloed in our ecosystem.
They're an incredibly important stakeholder, but they are working independently in a lot of ways and have closer ties to the researchers that publishers and their vendors are working with. So it's all a little bit disjointed. Publishers, for their part, are finding themselves under pressure: how do they demonstrate their value in a way that meets funder KPIs, and communicate that in a situation where they might not be able to communicate directly?
Both of these parties are looking for a return on investment. We have shared goals, but we find that we're really speaking different languages. However, as mentioned before, there is a shared goal here. At the end of the day, any scholarly publisher worth their salt has the same goal as any funder, and that is to disseminate high-quality research that advances science, improves health outcomes, and benefits the public.
Of course, doing this is easier said than done. How do we define the value of research? How do we define the value of different disciplines of research? What exactly are we measuring? Do publishers and funders have different key metrics? How do we connect those? Where does this data live? Is it in a place that is actionable?
We're moving towards a future, and a present, of granular data, but a lot of these metadata elements are still hidden in things like acknowledgments and author notes. So we have a complex issue to solve here, but Sensus is trying to work on solving it. Once again, thanks to James Butcher for his coverage in his recent newsletter.
So how does this work? Well, Silverchair is at the helm of this project, but it has really been community-led since its conception. The product came out of conversations with Oxford University Press as a partner, and we have amassed a community of practice that is an incredibly loud voice in the future development of the product. We really look to this community of practice to inform new developments and new data streams, and to help us round out the impact narratives that we're trying to tell.
Telling a good impact narrative really hinges on the consolidation of data, and that data has historically lived in very different places, many of them hidden or hard to find; see my earlier notes on metadata. So by combining aggregate platform usage, citations, and alternative metrics at funder-specific dashboard sites, we're hoping to facilitate funder-publisher engagement, showcase the efforts of publishers, and offer funders a dynamic, at-a-glance telling of the stories their grants hold.
We really want to demonstrate alignment between the goals of these key stakeholders. So what does this actually look like on each of Sensus Impact's funder microsites? We have some beautiful visualizations thanks to our partners at Hum, and we realize there's still some room to grow; we want to increase these data streams.
Right now on the site you will find, on a platform level, views and downloads of the content that has been loaded to the Sensus Impact platform, which at the moment is all of the content published by Oxford University Press that has grants attached to it. What you see here is from one of our funder microsites, communicating views and downloads from the platform across the last 12 months.
We also have this information, in addition to citation information, organized into tables on these microsites. You'll see from these dropdowns that these are sortable tables, and these links will bring users to the content landing pages. These are articles that have funding from the funder on the microsite, and these links to content pages will take you to a page that provides top-level metadata, as well as a very prominent link to the version of record.
We also have wonderful and valuable attention data thanks to our partners at Altmetric. This communicates outside of that bibliometric narrative: you're able to see how many news articles feature citations to the content that received funding from the funder, how many tweets, blog posts, Facebook posts, and policy mentions. I'll also note that in the coming days you'll be able to see patent mentions with this attention data.
Users of Sensus Impact are also able to benefit from site search. We've leveraged tailored facets to allow users to find specific content pages according to funder or grant award information, as well as date of publication, if you wanted to find a specific article that received funding.
So it's really our hope that by answering questions like the ones on this slide, and many others, through the dashboards I just showed you, and by building onto those dashboards, expanding the data sets feeding them, we can facilitate better relationships within the scholarly communications ecosystem and create a deeper understanding of the role of publishers. This is some meaty content, so we know we can't go at this alone as a platform provider with, you know, a singular partner.
As mentioned before, we have a community of practice that is now 64 members strong. These folks represent funders, publishers, consultants, and technology providers in our space. We currently have 18 funder microsites up and running, and this list will grow to include funders from Canada, China, and Europe in the coming months. We also have an engaged audience of 2,900 people, so almost 3,000 folks.
I really look forward to days and years in the future where I'm presenting more updates on Sensus Impact: a growing community of practice, a list of funder microsites equal to the number of ROR IDs tied to funders, and an engaged audience that's maybe at the likes of a PubMed. And I know that will be due to the engagement of the community and to the narrative and storytelling that Sensus Impact can provide.
Thank you all for your time today. Please visit Sensus Impact. Reach out to me if you'd like to learn more, join our community of practice, or talk to me about becoming a partnering publisher or a data partner. Please also visit our site, and when you do, join our newsletter; we have our first installment coming out today, and we'll also soon have a piece of thought leadership on the current ecosystem of research impact that promises to be a thrilling read.
So thanks, everyone, for your time, and please don't hesitate to reach out with any questions, thoughts, or concerns. Well, thank you, Hannah. Now we're on to our final presenter: Colin O'Neil, Senior Content Solutions Specialist at SiteFusion proconsult. I think I got that right.
Yep, you did. Thank you, Dave. My name is Colin O'Neil, and I'm here to talk about using the SiteFusion CMS as a publishing hub. I want to thank everyone for coming, and I look forward to, hopefully, some engaging conversation.
Today I'll be discussing how we use this CMS as a publishing hub. First, a little bit about me. Like Dave said, my name is Colin O'Neil; I'm a content solutions specialist at SiteFusion proconsult. One of the reasons for that peculiar title is that in a lot of ways I have to wear a different hat based on what I'm doing. My background is in development.
I've also worked as a business analyst. I've worked on warehouse management systems, and I've worked in traditional publishing scenarios. So I have a broad range of ideas about how to handle content conceptually. I've been in this game for 20, 25 years, and I've focused the past 10 or 15 years on more traditional publishing platforms, scholarly journals, but also education and things like that.
And really, when it comes down to it, every conversation is going to be a little bit about AI right now. These waves are going in and out, and we keep asking, what are the strengths, what are the weaknesses? Someone's always going to say, well, it's messing up here, it's really strong here. What we're trying to do is work with publishers on coming up with the best way to use AI.
But it's also about not changing much in the way a publisher works, from submissions all the way through to creating the content and then going onwards. A little bit about SiteFusion proconsult: we are about 700 employees strong, most of us spread across Europe and North America, and we also serve customers on a couple of other continents.
We have a lot of education and legal background, and more and more we are working with academic publishers. We're also working with standards developers. We have strong partnerships with editorial tools and data tools. What we really focus on is creating a full solution for everyone; we're not just a CMS company.
SiteFusion is a CMS company with, you know, 20 years of experience in Germany, and EBCONT is an Austrian IT consultancy company that teamed up with SiteFusion. What we do is take digital assets and content, apply workflows using the Camunda workflow engine, and give you semantics in a secure environment.
And then we publish out to multiple formats. We have traditional ones, where we're just sending things out to PDF, and we also have solutions where we're sending things to VR solutions for training and things like that. We're willing to work with anyone who has questions; I'm willing to talk to anyone for hours, actually, about what you're trying to do, and hopefully we can come up with a solution.
So, AI has been the thing we have been talking about for about 18 months now. At every conversation, every conference, every SSP event I've been to over the past couple of years, we are talking about AI in some way, shape, or form. A lot of the things we were worried about have started to come to a head lately. So when we talk about creating an AI hub, using your content and using a CMS, we are talking about creating data access at scale.
We want to create a seamless orchestration of processes. This is ultimately where a lot of the ROI solutions of AI come into play, right? We're trying to automate the stuff that we don't need personal human interaction for, and we are going to focus on that. We also work with data integration and harmonization, which involves an effective data integration schedule.
We use machine learning operations, and integration patterns for flexibly integrating with existing systems but also creating new systems and new solutions based on whatever a customer may need. We're always going to talk about ROI, but a lot of these things are also about value-add at the end of a solution, too. We can always create a better, more productive way for workers to work.
But it's also about creating a better product at the end of the day, right? We want something that customers or users or academics want; and within scholarly publishing, your main users are often just other scholarly publishers, and we want to make a better product for them as well. So, when we talk about process and automation, the use of AI applications can be automated in a low-code fashion using workflows.
Like I said, we use the Camunda workflow engine, and we do this to create these low-code solutions. Quickly, what a low-code solution means in this case: we create a piece of software, sometimes it's one piece of code, sometimes it's 15 different iterations of code, and we send it out, and you can plug it in at the beginning of a workflow and say, send this to an AI platform, translate this, and then give it back to a copy editor to check.
And then that's a reusable piece of code. What our low-code solution does is say: yes, you can do this at the beginning of a workflow, but you might want to do it at the end of a workflow too, after the content has been through everything else. Nothing else needs to change; it's a reuse of code. That's what we're trying to do: build something we don't need to rebuild all the time, as the sketch below illustrates.
We want to do this with a customer, but if something is working, we don't need to put it in a silo: we can share it with other customers, and we can also use it within the same solution further downstream.
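SiteFusion's actual workflows run on Camunda and BPMN; purely as an illustration of the reuse idea, the same step plugged in at the beginning or the end of a pipeline, here is a hypothetical Python sketch. The step names and the `ai_translate` function are invented, not part of any real product.

```python
# Hypothetical sketch of the low-code reuse idea: a step is written once
# and plugged into a workflow at any position. Not Camunda/BPMN itself.
from typing import Callable

Step = Callable[[dict], dict]

def ai_translate(doc: dict) -> dict:        # invented example step
    doc["body"] = f"[translated] {doc['body']}"
    return doc

def copyedit_review(doc: dict) -> dict:     # invented example step
    doc["reviewed"] = True
    return doc

def run_workflow(doc: dict, steps: list[Step]) -> dict:
    for step in steps:                       # each step is a reusable piece
        doc = step(doc)
    return doc

# The same translate step can run first in one workflow...
run_workflow({"body": "Hallo Welt"}, [ai_translate, copyedit_review])
# ...or last in another, with no new code written.
run_workflow({"body": "Hallo Welt"}, [copyedit_review, ai_translate])
```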
A lot of our background with AI solutions is not just from a publishing standpoint, right? EBCONT, our IT consultancy company based in Europe: our data teams have been working within the machine learning realm for a long time. A lot of this has to do with, you know, basic integrations, but it also covers fraud detection in banking and fintech, and even some advertising technologies where we use these things.
And it's about taking that knowledge and moving it into a publishing platform. The problems that a fintech company has and the problems that a publishing company has might not always be the same, but sometimes 70% of that code can be reused in some way. And that's what we're really trying to do when we're building our publishing platform with SiteFusion proconsult, or with the SiteFusion CMS.
Sorry, so, we have started to see some flaws in using just large language models and generative AI. While large language models have shown remarkable capabilities in generating human-like text, there are several limitations and challenges associated with their use: lack of contextual understanding, data sensitivity, bias and fairness, scalability issues, and limited interoperability.
These are all things that we see, and, you know, the jokes are out there about putting glue on your pizza to make sure the cheese doesn't fall off. What we've learned as a company is that the best way to combat these hallucinations, and hallucination is a strong term, but it's the common one right now, is to treat this as a graph-shaped problem.
Graphs are essential to modeling relationships between data, enabling advanced querying and data analysis. In a graph database, relationships are first-class citizens. It's no longer the email address or the person or the title or the paragraph; it's the relationship of that paragraph to another paragraph, or the relationship between an author and what they wrote, and it's identified with unique keys.
This allows for efficient storage, retrieval, and traversal of data. There's a whole part of this presentation where I go into the benefits of using a MarkLogic database or another NoSQL database; we could also work with MongoDB if we really wanted to. But we're going to skip all that, because it's not really something I can cover in 10 minutes. What we want to create, though, is highly interconnected data, and graph databases can handle highly interconnected data at scale with high performance.
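As a small illustration of relationships being first-class citizens, here is a sketch using the open-source networkx library rather than MarkLogic or any production graph database; the nodes, edge types, and the query are invented examples.

```python
# Sketch of relationship-first modeling with networkx (not MarkLogic).
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("dr_smith", "paragraph_42", rel="WROTE")
g.add_edge("paragraph_42", "paragraph_7", rel="CITES")
g.add_edge("dr_jones", "paragraph_7", rel="WROTE")

def related(node: str, rel: str) -> list[str]:
    """Traverse only the edges of a given relationship type."""
    return [v for _, v, d in g.out_edges(node, data=True) if d["rel"] == rel]

# "What did dr_smith write, and whose work does it cite?"
for para in related("dr_smith", "WROTE"):
    for cited in related(para, "CITES"):
        authors = [u for u, _, d in g.in_edges(cited, data=True)
                   if d["rel"] == "WROTE"]
        print(para, "cites", cited, "written by", authors)
```

The point of the design is that the query walks typed relationships directly, rather than joining rows on foreign keys the way a relational store would.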
Conventional databases struggle with modeling or storing relationship data without added complexity. Graph databases, however, maintain performance even as the number of relationships and the database size grow. Traditionally, you know, no one's buying the claim that relational databases can handle all of your content, but relational databases have their use.
We just don't think that relational and key-value databases are really storing the pertinent information we want while we're feeding these solutions. So we are developing these graph databases and graph analytics solutions. Graph analytics involves applying algorithms to a graph database to explore the relationships between entities, such as organizations, people, and resources.
Since graph databases explicitly store these relationships, queries and algorithms utilizing this connectivity can be executed in sub-seconds rather than hours or days. With the complex indexing system, and again I'll just mention MarkLogic because that's the NoSQL solution we tend to work with, the indexing that goes on behind the scenes using a MarkLogic database and their ONNX solution creates an index where you're getting an answer within seconds.
It's a little chatty on the back end, but it's the best way to get the most relevant information to the people who need that information. And I say people, but obviously that can be another machine as well; that's a completely different solution we could talk about, too. When I was putting this together, again, this is a fragment of a larger presentation I've been working on for a couple of months.
I was thinking about different use cases; my company has a lot of background with supply chain management, fraud detection, reusable shared documentation, and money-laundering management systems. But I think the easiest one to explain is search engine optimization. Everyone knows about this: you have content, and hopefully what you're trying to do is make sure that content reaches the people who want it, right?
So the challenges we're looking at with an SEO backend are link analysis and backlink management, user behavior analysis, entity relationships, and regular algorithm updates; search engines frequently update their algorithms, which can significantly impact your content's visibility.
I don't know if anyone here works on the marketing end, but there was just a big Google Analytics overhaul over the past six months or so, and a lot of people have been affected by that. So we need to build a solution that can shift quickly, so that you're not losing customers just because something like Google or Bing changes the way their algorithms work.
And the academic publishing field is highly competitive, with many institutions vying for top positions in search results. So these SEO strategies are really important to get your content towards the top and hopefully into a more interactive system. When we're building our SEO solution, we want to build internal, high-quality backlinks: by cross-referencing data from your CMS, in this case SiteFusion, with web pages and SEO analytics,
we can identify opportunities to create internal backlinks. This ensures that important pages receive appropriate link juice, enhancing their visibility and credibility. Then there's entity keyword mapping and entity-centric content optimization: creating content that covers relevant topics, answers user queries, and satisfies search engine algorithms is crucial. So these taxonomies that we've been storing our content with, let's start using them in a better way to make content more discoverable; a rough sketch follows.
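As one hedged illustration of the internal-backlink idea, the sketch below ranks pages by PageRank over a site's internal link graph and flags topically related pages that a high-authority page does not yet link to. The page names and taxonomy tags are invented; this is not SiteFusion's implementation.

```python
# Hypothetical sketch: suggest internal backlinks from high-authority
# pages to topically related pages they don't yet link to.
import networkx as nx

links = [("home", "topics"), ("topics", "welding-guide"),
         ("home", "welding-guide"), ("topics", "welding-standards")]
tags = {"welding-guide": {"welding"}, "welding-standards": {"welding"},
        "home": set(), "topics": set()}

g = nx.DiGraph(links)
rank = nx.pagerank(g)  # authority proxy over the internal link graph

for hub, _ in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
    for page in g.nodes:
        shared = tags[hub] & tags[page]
        if page != hub and shared and not g.has_edge(hub, page):
            print(f"suggest link {hub} -> {page} (shared topics: {shared})")
```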
We want to focus on contextual relevance, capturing the semantic context of entities to enable a better understanding of their relationships, topics, concepts, and user intent. Because at the end of the day, like I said, a lot of our end users are just other academic publishers. So that's the solution we're building. I would love to hear from anyone who has any further questions.
Like I said, this is a small piece of a larger presentation I've been working on for a few months, so we can get deeper into the graph piece, deeper into the MarkLogic piece, deeper into a lot of these things. I just wanted to make something that was, you know, a top-down view of everything. If anyone wants to reach out, the QR code here will lead you to the contact page.
But you can also reach out to me directly: Colin O'Neil at SiteFusion proconsult, I think. I think we'll be publishing this, and, yeah, I would love to hear from anyone. All right, thank you very much. All right, well, thank you, Colin. Now, as mentioned at the beginning of this webinar, we've got time for a few questions and answers.
So please, I encourage you to hit the Q&A at the bottom of your screen and type in any question you may have. There are no dumb questions, so please go ahead. We do have our first one, which is: which publishers does SiteFusion provide the AI CMS to work with? I don't think we're allowed to name specific publishers directly in a platform like this.
What we can do, if you want, is you can reach out directly, and we can share solutions we've built without talking about the publisher specifically. Can you talk a little bit about the general industry these publishers are in, or something more generic? So, within publishing, yeah, we can talk about educational. Educational publishers have a lot of information.
They have a lot of ways to use that information, and traditional learning resources have been, you know, you read something, you answer a bunch of questions about it. A lot of people are starting to realize, and I come from a family of teachers and educators, that this isn't the best way to do it. So what our educational publishers are trying to do is figure out
a better way to prove that a student is learning. And what they're doing is coupling in all these different solutions, recognizing the patterns in the way students are answering questions, and trying to come up with something a little more suitable. Because when you start to look at it, you notice there are different kinds of students and different ways of learning these things.
So that's the kind of thing we're trying to find with educational publishers. When you talk about scholarly publishing, you're talking about this other thing. We've also been developing this solution with some standards organizations; they're a little more protective of their IP, and they're more who I'm thinking about when I say I don't want to mention anyone. But standards organizations are the best example, right?
One of the most important things about a standards organization is that they publish a 10,000-page document, and for an end user, maybe one paragraph is all the relevant information they need. It's just a welder, you know, on a machine in a specific country; it's not someone who needs 10,000 pages. So what we're trying to do is figure out what the relevance score is.
So if you are a welder in, I'll just say, Portland, Maine: this is the most important information for you, and you don't need the rest of it. So thank you. We have another question; it looks like it came through the chat. This is for Hannah at Silverchair: could you please discuss Sensus Impact's metadata sources? How do you collaborate with other data partners?
Yeah, this is a great question. So for metadata sources right now, our sole partner is Oxford University Press. As their platform provider, we obviously have first-hand access to their article-level XML, so we are able to filter the articles they're publishing, filter the funded articles from those, and bring that metadata over to the Sensus environment. We have, of course, also developed a framework to get content onto the platform from other platforms.
We are going to be leveraging Crossref workflows for this; that's going to contain all of the metadata we'll need to populate these content pages and also bring together the other data sources. As for the data sources beyond that article-level metadata, a lot of those are just leveraging the DOI. So we have access to Altmetric data, and as I noted, we are bringing that data over using the article-level DOI.
By using that DOI, it's the magic key to say, OK, there are this many news references to this article, this many blog posts, et cetera. So that answers the attention data bit. We also, as the platform provider, have access to the citation, page view, and download data from the platform for Oxford University Press and other Silverchair publishers, so we're able to just transport that over, but we're also creating frameworks to bring off-platform usage data like that onto the platform.
We'll probably start with a more straightforward framework, just so that we can more easily bring that data over, but we're also looking at leveraging the Global Item Report from COUNTER to inform the platform-level usage, and hopefully also syndicated platform usage in the future. But that will be more of a 2025 goal. I hope that answers your question; feel free to reach out for more specific information.
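To ground the DOI-as-magic-key point, here is a minimal sketch against the public Crossref REST API, which is a real, documented endpoint; the DOI shown is just an example, and this is not Sensus Impact's internal ingestion pipeline.

```python
# Minimal sketch: fetch article metadata by DOI from the public Crossref
# REST API. Illustrative only; not the Sensus Impact pipeline.
import requests

def crossref_metadata(doi: str) -> dict:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    resp.raise_for_status()
    msg = resp.json()["message"]
    return {
        "title": msg.get("title", [""])[0],
        "funders": [f.get("name") for f in msg.get("funder", [])],
        "references": msg.get("reference-count"),
    }

print(crossref_metadata("10.1093/nar/gkaa1100"))  # example DOI
```

An attention-data provider such as Altmetric exposes a similar DOI-keyed lookup, so the same key joins usage, citation, and attention records for one article.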
Thank you, Hannah. We have time for maybe one or two more questions. In the meantime, I encourage you to scan the QR codes on your screen if you want to connect with or learn more about any of the people or companies that presented today. So please put your questions in the chat or the Q&A, and while you're thinking of them, actually, Richard, I have one for you.
You know, everybody is really keen on AI these days, but maybe you can explain to the audience how taxonomies are used with AI to train the data or AI systems; that's the first part. And second, you talked about opportunities in content licensing, which, you know, is what I do for a living, so maybe you can expand a little bit on that.
Yeah, so I think, in the first instance, the deep understanding comes from the platforms that Hum is utilizing, so this is Alchemist. Ultimately, the actual training and the learning itself is built into the deep understanding of the system. For every piece of content flowing through, there's an extraction of terms that allows us to create conceptual associations with different pieces of content, and that ultimately creates all of that subject-level understanding.
It's done for every single piece of content for an individual publisher, so that deep understanding is already built within the model itself. The extra part of the learning, the extra leverage of the internet-scale model understanding, is when we take a taxonomy that's already partially developed for a specific publisher and then have the system go out and analyze the known universe of subjects associated with a specific topic.
Then we're able to leverage that, bring it back, and enhance and develop the taxonomy itself. So ultimately we're understanding the concept taxonomy, and then the people and how they relate to each other, within the Alchemist brain, and effectively bringing those levels of knowledge together. Now, as far as content licensing goes, this is where it really becomes quite interesting.
If you're able to associate a wide range of taxonomic terms with a wide range of content, essentially creating taxonomic associations across the entire portfolio, you're able to uncover areas of research that potentially haven't been investigated before. We did this for one client, and what we found was a significant opportunity.
They had certain content in certain areas that could now be tagged and selected but wasn't in their original taxonomy; essentially, an additional set of research that had developed over a period of time, and it was able to be surfaced because the taxonomy was updated. So there's that side, but there's also just having more taxonomic terms attached to more pieces of content: not just published content, but any pieces of content that could potentially be licensed, across any sources.
So that's both published and ancillary material; you know, conference proceedings and things like that come out as ancillary pieces of content. Those are now available and visible, so you can slice your content on a subject level across every piece of content you have within your corpus. So again, there are a couple of angles: the depth of the actual subject tagging itself, but also the breadth of the types of materials those tags are associated with.
It just allows for a deeper way of understanding the content and surfacing opportunities for things like content licensing. Outstanding, thank you very much. Yeah, I completely agree; a lot of it is about the discoverability of content and the associations that allow you to create these unique data sets that people really need. Anyway, well, we're running up against the top of the hour, so I want to thank all our panelists and all of you for joining us today.
We want your feedback, so please consider completing a session evaluation form, just by scanning the QR code. And as a reminder, a recording of today's session will be posted in the on-demand library in a few days. Lastly, save the date for SSP's New Directions Seminar on October 1 and 2, 2024, in Washington, DC, and online.
And with that, this concludes today's session. Thank you all for joining. Have a great day.