Name:
Building our infrastructure to expand the research lifecycle Recording
Description:
Building our infrastructure to expand the research lifecycle Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/9254489e-ba16-4f89-93b3-8533f048738a/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H41M00S
Embed URL:
https://stream.cadmore.media/player/9254489e-ba16-4f89-93b3-8533f048738a
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/9254489e-ba16-4f89-93b3-8533f048738a/Building our infrastructure to expand the research lifecycle.mp4?sv=2019-02-02&sr=c&sig=UiPoCFXXcVOlS%2F4iEJzFIwE3EBE%2BYLTaTfTQFIHkR%2Fs%3D&st=2024-10-25T01%3A27%3A59Z&se=2024-10-25T03%3A32%3A59Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Welcome to the NISO Plus 2023 session on building our infrastructure to expand the research lifecycle.
This is about a 30 to 45 minute panel discussion and will be followed by another 30 minutes of live Q&A. This panel discusses how sharing research early and often throughout the scientific lifecycle and process has the potential to rapidly accelerate the scientific enterprise and provide unique insights into the evolution and direction of scientific thought. However, without any established infrastructure for early-stage research in many of our fields, this segment of the market could be lost. A focus on sharing and integrating research objects from early in the lifecycle presents a more holistic view of how a researcher's professional output could allow them to advance, connect and accelerate the impact of their work. In this session, our speakers will discuss how technology can enhance this transformation and the role of the various stakeholders across the industry in this process.
Our speakers for today are as follows. We have Usama Dar, pronouns he/him, who is the Chief Technology Officer at Morressier, with a background in product development, software engineering and design. His focus is to harness the power of cutting-edge technology to revolutionize scholarly publishing. He previously led teams in the sciences and at Elsevier to leverage their platforms for machine learning, and more.
Our next speaker, Jennifer Goodrich, pronouns she/her, is the VP of Product at Morressier. She has worked in the scholarly publishing industry for decades, most recently at the Copyright Clearance Center. Her focus on creating innovative solutions to complex problems has led to the development of the industry's leading open access platform. Our other speaker, Sebastian Rose, pronouns he/him, is the Director of Data at Morressier.
His experience as a data engineer and a software engineer gives him a unique perspective on the building blocks of the publishing infrastructure. And finally, Samantha Green, who will be moderating this panel, is the Head of Content Marketing at Morressier. She has worked in scholarly communication for 10 years, focusing on creating compelling stories for complex brands and helping researchers extend the impact of their work. And with this, I'll now pass the virtual mic to Samantha.
Thank you so much. We wanted to start by sharing a couple of slides before we dive into our discussion today. So thank you for joining us for our session. We thought that it would be good to start with a very brief overview of why infrastructure matters and how we contextualize this at Morressier, and for early-stage research in general.
In case you haven't heard of Morressier, we are the platform transforming early-stage research: those content pieces that are historically hidden and not shared more broadly. We specialize in solutions for publishing workflows, conference hosting, and really expanding peer review into this new segment of the market. So we're talking about infrastructure because our work in early-stage research has proven how the right workflows can truly accelerate scientific breakthroughs.
And we think that there's huge potential in an adaptable infrastructure that's future-proofed against whatever tech gets thrown at us over the years. Central to this type of infrastructure is an expansion of scientific publishing. That means expanding upstream to earlier outputs that have historically been hidden. It also means diversifying formats to videos, presentations, etc., and embedding those into the publishing process.
So they're trackable, measurable and citable. And of course, as our panel will explore, expanding publishing will require embracing and leveraging technologies. So without further ado, let's get into our conversation. I have a series of questions for the group here to help us explore this intersection, and I'll stop sharing so that we can all see one another. So let's start at the beginning, everyone, long before that journal article.
Let's dive a little deeper into the value of expanding scientific publishing further upstream, before that article or book. Jen, I thought we could start with you for this one. Thanks, Sam. Great question. We at Morressier, as you mentioned earlier, are all about accelerating scientific breakthroughs.
And we talk about scientific breakthroughs as really starting with a spark, an idea that is nourished and challenged by sharing, debating and publishing output early and often throughout the research lifecycle: the positive results, the negative results, the early hypotheses, and so on. Scientific conferences are where this first and ongoing debate usually occurs, and it's often months upstream of publishing in journal articles, books and other repositories.
So really for us, historically, when we looked at this space, a lot of this early conversation, early debate and early artifacts were lost because things were not digitized. The posters, the presentations, they'd end up in the garbage can, literally. So we think probably up to 70% of some of this early research output was lost. And that's exactly why Morressier was formed.
Usama, what's your perspective on this question? You're muted. That's the most spoken sentence on Zoom: you're muted. Yeah, I definitely agree with Jen, and I've spent my career inside and outside publishing just thinking about how technology can make a difference.
And especially in publishing and the topic that we're discussing, which is bringing publishing upstream to the early stage. I definitely see a lot of opportunities in connecting early-stage research with the actual publishing record of journals and articles. And not only that, but I also feel that if we can publish early, we capture a lot of data and information that helps us build a better publishing corpus.
Right, and that has a huge impact on many things. It improves the reproducibility and transparency of research. If you're publishing raw data, research protocols and early-stage or intermediate results, researchers can actually provide much more detail, transparency and context for their findings, which makes it easier for others to not only replicate but also follow and verify their work.
Right, and I feel there's huge potential for cross-disciplinary exchange when you make research available early. It can help facilitate collaboration across different disciplines of research, exchange of information and insights, and obviously makes discoveries possible much faster. And I also think this is in line with open science principles, by making research more accessible and empowering different stakeholders, whether they are funding bodies or the general public, to assess the impact and the quality of research.
Absolutely. As you mentioned, Usama, I think this new set of outputs would be immensely valuable in terms of the insights it could provide to the publishing industry, and the analytics. So let's talk a little bit about that publishing data. Why has it at times been hard to find, or to gather from multiple sources? And how would transforming our infrastructure in this way help to create this repository of data?
Yeah, that's a great question. I think the data that is associated with scientific publishing is not hard to find. It's just that everybody looks at that data from their own perspective and cares about it in their own way, whether it's funders, publishers, researchers, or even the institutions consuming the research. They all look at that data in their own way, and that data actually exists in different repositories and in different places.
I think the main challenge has been making that data more visible, connecting it together, and then using it to create a much more complete picture. We need to do that to be able to look at research in a more holistic manner. Right, and if you think about it, the research data really falls into two buckets. There is workflow data, which is data that is generated, analyzed and used by the researchers for the purpose of their research.
And then there is usage and readership data, which is mostly data consumed by other researchers, practitioners, institutions or the general public. We need to look at both sets of data to be able to complete the picture and tell the whole story. And today, all these different pieces of data live in different repositories and different silos. Right, and I feel that if we can move the capturing and publishing of that data much further upstream, then we get many pieces of information that allow us to tell a complete story, to make sense of emerging trends, and also to make editorial decisions.
You can spot things like somebody using a research method that may have a flaw in it, something that you can capture and identify much earlier, along with many other things. You can also apply various integrity checks, which allow you to make research more reliable, or to make the probability of retraction downstream much lower. So I think the data exists.
It just exists in different places. So connecting it all together, and capturing it early, allows us to realize a lot of these benefits early on. Absolutely. Sebastian, data is really your bread and butter, so I'd love to hear your take on this. It is. Thank you, Sam. So I want to point towards the importance of data.
As Usama already mentioned, capturing the data, especially early on in the research lifecycle, provides us with large opportunities to create visibility and transparency and to foster collaboration, and that across the entire research lifecycle. And the way to do this is through some kind of interoperability or standardization, because data that's not standardized, or at least not interoperable, is of limited usefulness, as everybody that works with data knows. But if we have interoperable infrastructure, it will provide us with sort of a 360-degree view of all the data that is collected during the research lifecycle, especially the publishing data. It can be challenging to make sense of all this data, especially if we're talking about semi-structured or unstructured data.
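As an editorial aside, the interoperability point can be illustrated with a toy sketch: records from two hypothetical silos are mapped onto one shared schema and linked by DOI. All field names, schemas and source formats below are invented for the example; real systems would map established metadata standards such as Crossref or DataCite.

```python
# Sketch: merging records from two hypothetical silos into a common schema.
# Every field name here is invented for illustration purposes only.

def normalize_conference(rec):
    """Map a conference-system record onto the shared schema."""
    return {"doi": rec["doi"], "title": rec["poster_title"],
            "stage": "early", "source": "conference"}

def normalize_journal(rec):
    """Map a journal-system record onto the shared schema."""
    return {"doi": rec["DOI"], "title": rec["article_title"],
            "stage": "published", "source": "journal"}

def merge_by_doi(records):
    """Index normalized records by DOI so one output links all its stages."""
    merged = {}
    for rec in records:
        merged.setdefault(rec["doi"], []).append(rec)
    return merged
```

Once both silos speak the same schema, a single identifier ties the early-stage poster to the later journal article, which is the "360-degree view" described above.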
And there, we've seen that advancements in AI and machine learning have contributed a lot in other industries, for sure. But I also think there is a large opportunity here in the publishing industry to really drive forward the understanding that we all share about the research lifecycle and what that means for the research, for the researchers and their contributions.
And this data will be immensely valuable to all parties involved, be that funders, publishers or the researchers themselves. For example, it could also enable more personalized products or software for researchers, and for the institutions themselves, of course. Yes, I'm always struck by how complex this ecosystem is, with so many different stakeholders.
So I think we mentioned a couple of times when we were talking about data the importance of research integrity, and how increasingly important that's becoming, especially in today's world of scholarly publishing, where we're seeing an increased number of post-publication retractions across the board and new variables entering the research integrity space, like AI and machine learning.
So I'd like to ask all of you how we can approach finding the balance between some of the potential benefits of a tool like AI or machine learning and the risks to research integrity. And what type of tools do we need in place in our infrastructure to account for these additional complexities? Jen, I think I'd like to start with you for this one. Yeah, I mean, before Sebastian and Usama add to this, I think you've hit the nail right on the head.
We are getting questions every day from publishers and societies about how we can help them fight fraud and plagiarism, how we can flag fraud as the research moves through the lifecycle. Because it's not just at the time of publication downstream in a journal, but upstream, as the science is first being shared, as it moves from an abstract into a poster presentation at a conference, as it moves into proceedings and elsewhere.
So, yes, this problem, of course, is in the news, and we definitely are thinking a lot and developing tools and techniques to attack this and find the right balance. Because, like you said, it's both a gift and a major challenge in the peer review workflows. You can use these technologies to help speed up many, many of the manual tasks that still happen in submission workflows.
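As a rough editorial illustration of the kind of automated screening mentioned here, the toy check below flags submissions whose text overlaps suspiciously with prior submissions, using word-shingle Jaccard similarity. The threshold and function names are invented; production plagiarism tools are far more sophisticated.

```python
# Sketch: flagging near-duplicate submissions with word-shingle overlap.
# A toy stand-in for real integrity-screening tools.

def shingles(text, n=3):
    """Return the set of n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def flag_duplicates(new_abstract, corpus, threshold=0.5):
    """Return ids of prior submissions suspiciously similar to the new one."""
    return [doc_id for doc_id, text in corpus.items()
            if similarity(new_abstract, text) >= threshold]
```

The same check applied at the abstract or poster stage, rather than only at journal submission, is exactly the "upstream" screening the panel describes.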
But the other side is this huge problem with retractions, and the costs involved in that. And there are stories every day in the news about this. Yeah, you can't go one day without hearing about ChatGPT these days. That's true. That's right. Usama, what would you add to this?
Yeah, I would say it's definitely interesting to see the increasing application of AI in publishing, and at the same time the calls for regulations or control rules around AI and how it's used. Obviously, AI, like any technology, is a tool, and it can provide great benefit and great efficiency, and it can reduce time. And especially when it comes to workflows, I think it really can streamline the whole process and shorten the time between an early idea and a published or finished work.
And we can all imagine the use of AI in things like detecting manipulation of articles, or vandalism, and so on. I think there's been a lot of work done in different places and different spaces on how to use and apply AI to solve some of these problems. And I think we also need to bring all of it together and have some sort of strong ethical and quality controls around it.
And I think the only way really to do that is for all the stakeholders to work together, whether it's the publishing community, the AI experts, or the other stakeholders that are involved in this community. When it comes to actually building the AI technology itself, we need to evolve in terms of the best practices and quality controls we apply to it, which includes very extensive testing and validation to ensure that the output is reliable and accurate, especially when working with scientific publications, where the results can have profound impacts down the road in shaping society itself.
Especially in scholarly publication, we have to think about the fact that, with the rise of ChatGPT, as you said, or other generative AI technology, it's becoming more and more common to be able to generate content that can be part of a scholarly publication. So we have to think about at what level that is acceptable and what the guidelines around that are, so they are clear, and also how it will be used, or will aid, in the peer review process.
Right, and you need clarity and standards around that. We also need to make sure that the AI algorithms or technology being built or worked on is actually open and transparent. I'm a big proponent of open source software and open source technology, so that other people can audit and see what's been built and can detect risks or biases and other such things in those algorithms. I think the AI experts and professionals need to work very closely with the publishing community, and vice versa.
I think that collaboration is key to ensure that AI has a reasonable and responsible role in scholarly publication. And also, in terms of the general infrastructure, we need the computational resources and access to quality data, but also people who are actually trained in the responsible use of AI. We all know that algorithms can intentionally or unintentionally be used in ways that are less than ideal, or can have results that are biased in a certain way.
So having training and education in how to responsibly and ethically use AI is important, and I think all the stakeholders will need to work together to achieve that. Oh, absolutely. Sebastian, what would you add to this? So I would underline this point that AI is a very powerful tool, and it is the nature of powerful tools that they come with large opportunities but also large risks.
And as an infrastructure provider that is active in the publishing space, we need to think very carefully about how we are applying AI, and about what kind of problems or realities we are going to be facing with the advent of these generative AI models. I think the infrastructure needs to follow the research, or the researchers, and not the other way around.
It needs to be adaptable to the needs of the researchers, because they are the ones doing the research; therefore they are the most important people in that conversation, of course. And I think AI can actually help us do this right. For example, what I've seen is a large need for content to be categorized or classified in certain ways.
The entire industry relies on taxonomies to do this, for example. And I believe there is a large opportunity here to use the advancements in AI, especially natural language processing, to create more flexible, more adaptable taxonomies, what I would like to call a living taxonomy, instead of relying on, or maybe supplemental to, already existing taxonomies. Those are created by experts in a particular field of science, of course, but also with knowledge engineering, semantic web technologies, all the things that are currently employed to create taxonomies, and that tends to be somewhat rigid in places.
And the use of AI can very nicely enable infrastructure that adapts, that is more flexible to the needs of the research and the researchers. It will also bring a large increase in the speed with which this can be applied, because it does not require so much manual or semi-manual work to adapt to changes.
I think we will see a large increase in the amount of content being created, for sure. That can be a good thing; that can be a bad thing. And part of the new infrastructure that we're building here is to give not just the content creators the tools they need to create better content, but also the people who are reviewing, who are looking at the research and deciding if, and how, they want to publish it, and so on.
And I think it's still early days for that second part of the conversation, but I feel it can be a really powerful addition to the publishing infrastructure. Sam, may I ask a follow-up question to Sebastian on that? Of course, please. So, Sebastian, when we talk about taxonomies and infrastructure, you talk about there needing to be a high volume of input from multiple sources.
So as we talk about taxonomies and living taxonomies and infrastructure and collaboration across the industry, I think you're also saying that each individual society or organization, or even discipline, doesn't necessarily want to have a siloed taxonomy, that there should be a lot of collaboration. And often the researchers who are doing the science and collaborating across organizations and across teams are some of the best people to add to that taxonomy as the science is emerging.
Is that right? Absolutely, yes. Research is becoming more interdisciplinary, so there is a need for taxonomies that span multiple fields of science. Getting this together can be very challenging, at least in the way that taxonomies have been created so far.
And this is where I see AI being a large and powerful tool that could help us build this. Of course, as Usama already alluded to earlier, any kind of AI that gets employed for this purpose needs validation. And that ties very nicely to what you just said, Jen, about the researchers. Of course, we need to do this in lockstep with the researchers, right?
They are the subject matter experts in whatever field of science they are active in, and any result that gets generated by an AI to enable a taxonomy needs validation by the researchers, because they are the experts. Right, fantastic. That was so helpful. It strikes me that it's really all about these connections between tools like AI and researchers in the community, and the ability of technology to create new connections between different silos in the research community, but also collectively.
Usama, I'd love to come to you for this idea of a living taxonomy. What's your take on this? You are muted. Yes, it's a double-edged sword, muting yourself. Yeah, so I think the way I think about living taxonomies is as something very dynamic, that can evolve the categorization of topics, subtopics and keywords.
And so on, and connect different concepts and research together, as opposed to something that is very top-down and static, and that doesn't categorize the intersections between different disciplines and different areas of research. Obviously that requires a different type of approach to how you create dynamic, evolving taxonomies.
Speed is obviously a factor, and you have to transition from a more top-down to a more democratic type of approach to create those kinds of taxonomies. And the benefits, as you can imagine, are the often unforeseen connections between topics, as we said, between different disciplines, that these kinds of taxonomies, which evolve with the content, can actually surface.
And especially doing that at the early stage, updating them in real time as early-stage content becomes available and we capture it, I think will provide us with a more accurate representation of the ideas and the concepts and the findings. And that helps everybody; it helps the peer review process, and we are able to identify and assess the relevant research much, much faster.
So I think creating taxonomies is incredibly complex, as we've alluded to; it requires very expert knowledge from the subject matter experts. Even creating static, or less dynamic, taxonomies is an incredibly hard, time-consuming process. And then making them more dynamic, or living, is also incredibly hard, right?
You have to employ a combination of techniques, including manual curation, automated text analysis and machine learning techniques. You might have to do some crowdsourcing, but at the same time also integrate with the databases and ontologies that already exist. So I think the traditional taxonomies are very use-case specific, and introducing more flexibility into this process will definitely broaden the horizons, the impact and the usefulness of taxonomies, and the speed at which we are able to do it. Using technology, I think, can especially help us get there. Yeah, I completely agree with that. Especially bringing it into that early-stage research space is going to exponentially accelerate that process over time. Sebastian, would you add anything?
Final thoughts on living taxonomies? Yeah, just that I think AI or machine learning can enable creating these connections very nicely. Natural language processing has advanced by leaps and bounds, especially over the last couple of years, and there is no reason to think it's going to stop anytime soon.
So I'm actually incredibly optimistic about the ability of machines to understand and make sense of content, including scientific content. There is a lot of research being done on text understanding, especially in the research and academic space, because everybody knows that this is highly valuable. And I also wanted to highlight again that this will need to be done in conjunction with the researchers, right?
Validating it to make sure that it actually makes sense; especially something that is supposed to change a lot, instead of being static, will need more validation, or continuous validation, to make sure that it still makes sense compared to a more static version of a taxonomy. But I think the benefits outweigh the risks. Yeah, definitely.
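As an editorial aside, the "living taxonomy" idea the panel describes can be sketched in a few lines: new documents attach to the closest existing topic, the matched topic's vocabulary evolves with each addition, and a new topic node is grown when nothing fits. The threshold, tokenization and labels are invented for illustration; real systems would use modern NLP models rather than raw term counts.

```python
# Sketch of a "living taxonomy": a toy stand-in for NLP-based approaches.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class LivingTaxonomy:
    def __init__(self, threshold=0.3):
        self.topics = {}              # label -> aggregated term counts
        self.threshold = threshold

    def classify(self, label_hint, text):
        """Attach text to the best-matching topic, or grow a new node."""
        terms = Counter(text.lower().split())
        best, score = None, 0.0
        for label, vec in self.topics.items():
            s = cosine(terms, vec)
            if s > score:
                best, score = label, s
        if best is not None and score >= self.threshold:
            self.topics[best].update(terms)   # topic evolves with content
            return best
        self.topics[label_hint] = terms       # taxonomy grows a new node
        return label_hint
```

The key property is the one discussed above: the classification scheme is bottom-up and changes as early-stage content arrives, and, as the panelists stress, any such automatically grown node would still need validation by subject matter experts.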
Jen, I wonder if we could turn to you. We've talked a lot about how this type of integrated infrastructure might be built and the way we'll use our tools. Can you talk a little bit about some of the use cases for this in the industry? You are muted now, Jen. Yes, sure.
I think a lot of the most common use cases have to do with peer review. So again, as more and more science is interdisciplinary, and everybody's struggling to find the right peer reviewers, not overuse peer reviewers, and ensure that there aren't conflicts of interest and other issues, this technology can really help speed up some of that matching of talent to submissions across disciplinary fields.
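As a toy editorial illustration of this reviewer-matching use case, the sketch below ranks candidates by shared keywords while screening out co-authors (a conflict of interest) and over-loaded reviewers. The record fields, names and scoring are invented assumptions, not any actual system's API.

```python
# Sketch: matching a submission to candidate peer reviewers by keyword
# overlap, excluding conflicts of interest and capping reviewer load.

def match_reviewers(submission, reviewers, max_load=3):
    """Rank reviewers by keyword overlap with the submission."""
    scored = []
    for r in reviewers:
        if r["name"] in submission.get("coauthors", []):
            continue                      # conflict of interest
        if r.get("current_load", 0) >= max_load:
            continue                      # avoid over-using reviewers
        overlap = len(set(r["keywords"]) & set(submission["keywords"]))
        if overlap:
            scored.append((overlap, r["name"]))
    return [name for _, name in sorted(scored, reverse=True)]
```

A production system would replace raw keyword overlap with semantic matching across disciplines, which is where the cross-disciplinary benefit the panel mentions comes in.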
So we see that as a really nice application that should help decrease time to publication, make it more efficient, and make it more rewarding to do peer review. There's also identifying research trends: whether you are a funder looking to make sure your funding is having the right impact, whether you are organizing a conference and trying to make sure that you've got the most dynamic conference tracks, or whether you're in publishing and trying to make sure that your publishing program is actually addressing those new emerging areas, trends and gaps.
These technologies can help with all of this, by linking up the research history across the infrastructure: really making it easier and faster to collaborate among researchers, research teams, institutions, funders and publishers throughout the ecosystem. Absolutely. I think that brings us directly into our last question for everyone.
We've alluded several times throughout this conversation to how collaborative this process needs to be, and how many different stakeholders, even stakeholders outside the traditional scholarly publishing industry, need to be involved when it comes to the ethics of AI and things of that nature. So I'd love to ask a little bit about how we need to approach partnership going forward, how we need to approach building this infrastructure collaboratively, and the cultural changes that need to happen in order to make it possible.
Jen, I think we'll maybe start with you for this one. That's such a good question, because there is a huge need to come together. And I think organizations like NISO are great at bringing some of these partnerships together. I also think we need working groups; we need to be sharing and building on each other's learnings.
Some of that will feel competitive, but it will also give us a lot of opportunities. So I think we need to work within organizations like NISO, and then outside, in our partnerships, working super closely with all of the stakeholders. What about you, Usama? Would you add anything? Yeah, when it comes to cultural changes, I think we've touched upon this a couple of times.
I think it is a lot about getting everybody to work together, and I think we need to encourage open collaboration between AI experts, publishers and researchers to find the best possible outcomes. There's so much potential to build predictive models on publishing data. We can use technology to help publishers and publication organizations publish the best work, and publish it faster than they can today, by optimizing and speeding up all these workflows and checks, and by making sure that the research flowing through the system is of high quality.
But we need to foster a culture of openness and transparency. We need to promote open access to data and results, and a lot of that is possible through early capture of that information and data as ideas start to form. We have to develop a common, shared understanding of the potential risks and benefits of technology and AI in scientific publishing.
And we have to work together to make sure that the risks are minimized and the benefits are maximized. The work doesn't stop there: we then have to constantly work together on continuous improvement. We have to regularly evaluate the output of technology like AI, and then update and adapt as necessary to maintain the accuracy and reliability of the work.
And all of this is possible if we encourage innovation by providing incentives to researchers and publishers to experiment with new approaches, find new ways to make scientific breakthroughs faster, and find new ways to leverage technology. So there's a lot of work to do, but there is also huge potential for partnership for all stakeholders.
Absolutely. Thank you so much, everyone, for such an interesting and exciting conversation. I wanted to close quickly with a couple of key points that I think appeared again and again in this conversation. That's really the value of building this integrated, adaptable infrastructure, whether it's through responsive workflows that are especially valuable when applied to early-stage research, like we were talking about; the ability we will have to harness greater data and insights when we look at these emerging trends and build this type of infrastructure; and the possibilities of these living, dynamic pieces of infrastructure that can help us to scale and grow.
I think those are our key points, and I'm really excited to hear more about them. So thank you to everyone, and I'll pass back to you. Great, thank you so much. And once again, thank you to all our panelists for this informative and thought-provoking session. A big round of applause. And I would like to remind our members in the audience that the Twitter hashtag, if you're tweeting about this, is #NISOPlus2023.
And with this, we will now move on to the Zoom room for a live Q&A with all our panelists. Thank you once again.