Name:
FAIR data principles and why they matter
Description:
FAIR data principles and why they matter
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/6f2e2f57-dc59-4aad-bfe0-f51a26b59de0/videoscrubberimages/Scrubber_1.jpg?sv=2019-02-02&sr=c&sig=vOdAkWp5vF1u5BUtsU9cutlMbqB9gjSjczHjoj%2FO7h8%3D&st=2024-10-16T02%3A11%3A39Z&se=2024-10-16T06%3A16%3A39Z&sp=r
Duration:
T00H35M30S
Embed URL:
https://stream.cadmore.media/player/6f2e2f57-dc59-4aad-bfe0-f51a26b59de0
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/6f2e2f57-dc59-4aad-bfe0-f51a26b59de0/23 - FAIR Data Principles and Why They Matter-HD 1080p.mov?sv=2019-02-02&sr=c&sig=ticesuFbD797We6snwDxudWJ8rGbB1LHZ7wq6JvMejk%3D&st=2024-10-16T02%3A11%3A39Z&se=2024-10-16T04%3A16%3A39Z&sp=r
Upload Date:
2021-08-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
STEPHANIE DAWSON: Welcome to the NISO Plus session, FAIR Data Principles and Why They Matter. I'm Stephanie Dawson, head of the discovery platform ScienceOpen, and I'll be moderating this session today. On a daily basis, for example every time we accept cookies for a new website, we are reminded that someone, somewhere wants our data. Today we will explore what makes data FAIR, why it matters, and how we can make it FAIRer. Brian Cody--
BRIAN CODY: Hello.
STEPHANIE DAWSON: --will kick off this session with an introduction, FAIR metadata, low-hanging fruit for scholarly publishers. Brian is CEO and co-founder of Scholastica, a web-based software platform with professional peer review, production, and open access publishing solutions for journal programs of any size. Before starting Scholastica, Brian did doctoral work in sociology at the University of Chicago, and he's a self-taught Ruby on Rails programmer.
STEPHANIE DAWSON: In the second talk, Steven Howe of Copyright Clearance Center--
STEVEN HOWE: Hello, everyone.
STEPHANIE DAWSON: --will provide an example of what's possible with FAIR data, entitled Leveraging FAIR Data Principles to Construct the CCC COVID Author Graph. Steven has spent his career working at the intersection of publishing, education, and technology, holding positions in sales, production, digital publishing, digital editorial, and product management. Steven currently works as senior product manager of analytics at Copyright Clearance Center, and is responsible for building internal data processing systems and data pipelines.
STEPHANIE DAWSON: Finally, Paul Stokes--
PAUL STOKES: Hi there.
STEPHANIE DAWSON: --of Jisc will round out the session with an outlook on FAIR and FAIRer data, and his talk entitled Fairest of Them All. Paul has had a varied career in both the commercial sector and academia, and all points in between. At present, he leads on preservation for Jisc's preservation service, with the current title of product manager. He is a director of the Digital Preservation Coalition and a director of the Open Preservation Foundation.
STEPHANIE DAWSON: He's been passionate about preservation for many decades and currently also has a number of bees in his bonnet regarding costs, value, sustainability, and storage. So welcome Brian, we're looking forward to your talk.
BRIAN CODY: Great, thank you Stephanie. So as the first of three panelists today, it seemed fitting for me to begin with a brief overview of what FAIR is. But I actually want to start with what FAIR is not. Let's start today with un-FAIR. I'm going to go through some examples. For this first one, this is Samantha, a researcher looking for an existing data set related to her current chemistry research program.
BRIAN CODY: So in an un-FAIR world, she can't find the data set even though it exists. So it's not findable; that's the F in FAIR. Another experience in the un-FAIR world could be that she finds the metadata for a data set, she finds a reference to it, for example. But maybe the link goes to a professor's website. They're not at that university anymore, so the data aren't posted anymore.
BRIAN CODY: So in this case, it's again not findable. In an un-FAIR world, she might also find a good data set, but the variable names are unfamiliar. They seem random to her, they're not conventional. So in this case, it's not interoperable; that's the I in FAIR. Let's look at another example. This is Everett. This is a researcher who's trying to do a meta-analysis around COVID-19 research.
BRIAN CODY: In an un-FAIR world, he finds lots of articles but few data sets, because they're not linked. They might even be out there, but because they're not linked, the machine he's using to scrape them doesn't find them. Another scenario: he finds lots of data, but they're in, say, a Unix-specific data processing format, so he can't read or access the data. That's access, the A in FAIR.
BRIAN CODY: Another example he might experience is that he wants to look at who's funding COVID-19 research. So he looks through millions of articles with a machine looking for machine-readable metadata, and his script finds lots of articles, but very few of them have structured data about the funders. So what he ends up with is his computer finding "Bill & Melinda Gates Foundation" with an ampersand, then "Bill and Melinda Gates Foundation" with the word "and," and some of them might have the acronym at the end.
BRIAN CODY: And the machine, the script, thinks all those are different. And when you think across all the funders and all the variations, it would be extremely onerous to try and clean those data up. So this project either ends up being not feasible or requires quite a bit of extra work. Another experience Everett might have in an un-FAIR world is trying to look at who the authors publishing this kind of research are.
BRIAN CODY: Well, the challenge he has is grouping them by author: very few of these publications or data sets have ORCID iDs. So again, his computer scripts might find Sam Johnson, UIC, and Sam Johnson, UCSF, and there's no way to know whether these are different researchers or the same researcher. And across tens of thousands of possible matches, that's not something he's able to clean up, so again he doesn't do this project.
BRIAN CODY: So the punch line here is that in an un-FAIR world, the pace of research is slowed, and certain kinds of research which might be really interesting are hard to do. Working with these large data sets, or trying to comb through millions of articles without structured data, is very difficult or not possible. OK, so that's what un-FAIR is. Let's look at FAIR, what we're talking about today.
BRIAN CODY: As I mentioned, it's an acronym: it's when data are findable, accessible, interoperable, and reusable. And these principles apply to data, such as data sets; metadata, which describes the data; and infrastructure. One of the punch lines from my experience working with publishers is that FAIR isn't easy. I have a three-year-old-- this is not a picture of my three-year-old, but it does capture a lot of my recent experiences-- and taking turns, sharing, learning to be fair is not easy.
BRIAN CODY: Trying to explain what fair is isn't always as clear as we'd like; it takes time and patience. That's also true with the capital letters, FAIR. How to have FAIR data isn't always as clear as we'd like; it takes time and patience to do well. And to give people a sense of scale of how much effort is expected here: in 2016, Science Europe released a briefing paper recommending that 5% of the total research budget should be spent on a FAIR data management plan.
BRIAN CODY: So that's budget, not time. My guess is that for some people this would end up being more than 5% of the time they spend, formatting this and learning how to do this well. So doing FAIR correctly is not a tiny task; it's measurable and material. For background on my experience with FAIR: on the Scholastica platform, we collect metadata during the manuscript submission process.
BRIAN CODY: We also generate metadata for web crawlers, and we generate JATS XML for discoverability services for open access articles, which includes structuring and normalizing citations. So we work with publishers and hit a lot of the challenges around making data FAIR. Last year we conducted a survey with society and university journal publishers, and this helped give me more insight into the kinds of challenges they're facing with achieving FAIR practices.
BRIAN CODY: From that survey, a key challenge we heard from many publishers-- and I'd say this is especially true for small and medium-sized publishers-- is that when it comes to implementing the FAIR principles around their data, it's not super clear how to do it, and sometimes they don't know where to start. To put that in broader context, they're already struggling to meet all the industry's professional standards like the ones you can see here: ROR, JATS, ORCID, DOIs, LOCKSS and CLOCKSS, JATS4R, FundRef; the list goes on.
BRIAN CODY: Now this is NISO, so I know many of the NISO standards die-hards watching this might think, well, that's an OK list, but there are way more important ones you should be worrying about-- insert your favorite standard here. But for many small and medium journal publishers, this is a daunting list. And of this list, I dare say that FAIR is in the top 10% for difficulty to understand and do well.
BRIAN CODY: Many of the publishers and editors-in-chief that we work with still have the print-first mindset. And historically, that meant that the results of their research were published in a journal, but the data were not part of the journal's purview. That's changed, of course, but with many publishers being behind the times on that, the idea of making the data and the metadata accessible is not something they're always thinking about.
BRIAN CODY: So we're dealing with historical and institutional inertia here. And that means that FAIR data might not seem like a journal publishing priority the way some of these other standards might. My experience working with publishers is that they do really care about improving the scientific enterprise, and they care about improving the quality of what they publish, and meeting the FAIR principles is part of that.
BRIAN CODY: This can help accelerate the pace of science and lead to new kinds of research that have never been done before, thanks to the emphasis on machine-readable data sets and metadata. And spoiler alert, we're going to hear more about an example of this from Steven later. So given these challenges, I wanted to offer a few ways to start your trip to the FAIR, based on my experience working with journal publishers.
BRIAN CODY: So first we'll talk about data sets. Many authors don't know anything about FAIR, and that's a challenge right there. So one recommendation I would have is to include information about FAIR in your journal submission guidelines. This can be a really great educational tool. More scholars can know that, first off, this is a thing that exists, a set of standards or principles; it's a thing they should care about, and a thing that they can do.
BRIAN CODY: So one of the ways of doing this I recommend is to find an appropriate FAIR data repository and recommend that to the authors based on what would be a good fit for your field. If you're not sure where to start, DataCite and the American Geophysical Union maintain a repository finder. I think they have something like 1,800 repositories, so you can look for something that's appropriate to your field.
BRIAN CODY: Another option is to begin publishing a data availability statement with each article. That can start as just text that you task the author with submitting as part of the manuscript submission process. And this can start to normalize among your author base that stating where the data are is something that they should expect to do. And eventually, you could professionalize that to include links and rich metadata about the data sets, which again is great for machines.
BRIAN CODY: There's also a standard JATS section for this, which you can see here. It's a section type, data-availability. So that's something that you could support. Down the road, something you can do is require authors to deposit their data into a FAIR repository. One thing I want to note is that there are repositories that are curated, meaning they'll help make sure your authors' data meet these principles.
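[For illustration, the standard JATS section type mentioned a moment ago, as a minimal sketch: the wording and DOI are hypothetical, and the ext-link element assumes the xlink namespace is declared on the article element:

    <sec sec-type="data-availability">
      <title>Data Availability</title>
      <p>The data supporting this study are openly available at
        <ext-link ext-link-type="uri"
          xlink:href="https://doi.org/10.5061/dryad.example">https://doi.org/10.5061/dryad.example</ext-link>.</p>
    </sec>
]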
BRIAN CODY: Dryad is one curated repository that I've had a good experience with personally. And for smaller staffs it can really be a lifesaver to have a partner help answer some of the most technical questions from your authors, rather than all of that falling in-house. Another step you could take: there are GO FAIR groups around different conventions and best practices, organized by discipline.
BRIAN CODY: So these groups can help you understand what's going on around FAIR data within your field. They often have implementation recommendations, which make it easier for you to know what to do. So for example, if you're collecting personal health data sets-- or I saw there's a nanofabrication group, which I don't know anything about but seemed very exciting-- there are GO FAIR groups around those.
BRIAN CODY: So now let's talk about metadata. One of the low-hanging fruit areas is to make sure you're citing data correctly in your JATS. This, again, is really about the machines. We often don't think about this, but even if you list it as text, having the JATS structured this way will help machines find and access those data sets. So there are three key fields to support, and I've put them here. One is that on your mixed-citation or your element-citation, you have the publication-type attribute set to "data".
BRIAN CODY: That's really an important signal that tells machines we are citing a data set, versus just something on the web or a conventional journal or book or something like that. Related is the pub-id; usually that's a DOI or some other kind of unique identifier for the data set itself. And the last one is the data-title element. As you can guess, that's the title of the data.
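[For illustration, a minimal sketch of those three fields together in a JATS reference; the author, title, and DOI here are hypothetical:

    <element-citation publication-type="data">
      <person-group person-group-type="author">
        <name><surname>Johnson</surname> <given-names>Sam</given-names></name>
      </person-group>
      <data-title>Hypothetical assay results for coronavirus spike proteins</data-title>
      <source>Dryad</source>
      <year>2020</year>
      <!-- publication-type="data" flags a data set; pub-id carries its identifier -->
      <pub-id pub-id-type="doi">10.5061/dryad.example</pub-id>
    </element-citation>
]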
BRIAN CODY: And so those three fields can help machines access these data sets, and that really helps meet those FAIR principles. Another area that helps with large data sets is to collect ORCID iDs for all your authors-- and ideally, verify them. The difference there is that if you have a text field, authors can type in their ORCID iD, but there might be errors. If you verify them, you can usually integrate with ORCID's API, and that way you'll know that it's actually linked to the correct author, which reduces errors.
BRIAN CODY: Ideally you'd have authors do this when submitting, when they're highly incentivized. You can also require them to get an ORCID iD if they don't have one, which helps more authors have ORCID iDs and know about them. That also pushes the data collection challenge to the moment of ingestion-- getting it when the manuscript comes in, before it goes through the process and becomes an article, rather than trying to enhance the metadata on the production side, which adds time and can delay publication.
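[For illustration, how a verified ORCID iD is typically recorded in JATS contributor metadata, as a minimal sketch with a hypothetical author; the authenticated="true" attribute records that the iD was confirmed through ORCID's API rather than typed in by hand:

    <contrib contrib-type="author">
      <contrib-id contrib-id-type="orcid"
        authenticated="true">https://orcid.org/0000-0002-1825-0097</contrib-id>
      <name><surname>Johnson</surname> <given-names>Sam</given-names></name>
    </contrib>
]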
BRIAN CODY: So my last low-hanging fruit recommendation is to look at the JATS4R validation tool, which can look at your XML and make sure you're following emerging conventions. I like this because it's a way for machines, for the script, to tell you when something is wrong, versus you having to stay up to date with what's changing. Again, if you're in personal health data and they make a new recommendation, once it makes its way into JATS4R, that would actually bubble up and you'd have a way to stay ahead of it without having to, for example, wait until someone complains that their data weren't found, or some other discoverability issue.
BRIAN CODY: I included some of the links that I mentioned here. I'll make sure that the slides are available to the participants watching this, but I thought some of these would be a nice starting point for people. That's what I wanted to go over. I'll say that for us, we do a lot with the journal metadata, but also helping people link to their data. It really is a challenge, but when it's done well, it leads to these possibilities, which are really exciting.
BRIAN CODY: And on that note, I'll pass the baton. Or if we stick with the county fair analogy, I'll pass the cotton candy stick to Steven.
STEVEN HOWE: Thank you Brian, and cotton candy would be nice. Let me set up my screen here for you. All right, so I'm Steven. I'm going to continue this conversation about the role of FAIR by talking about how, when you have FAIR data, you can build sophisticated analytics products.
STEVEN HOWE: I'm going to cover three main topics, beginning with the CCC COVID Author Graph. This specific knowledge graph is an example of the type of analytics you can do when you have high-quality, FAIR data. Next I'm going to talk about how FAIR data enables the building of quality knowledge products. And I will finish by describing some of the data challenges that we encountered when building this graph.
STEVEN HOWE: In fact, they'll resonate with a lot of what Brian just talked about. Let's begin by talking about the problem we were trying to solve and CCC's response to it. Throughout 2020 as the COVID pandemic has spread, the global research community has gone into high gear to study this disease and to share their research in hopes of finding a solution.
STEVEN HOWE: This increase in research output has impacted CCC's customers, the scientific publishers, as they now have an accelerated need to quickly identify experts who can review all of these new COVID manuscripts. I'm sharing on my screen a chart produced by the National Library of Medicine, showing the weekly count of new COVID publications. Since early 2020, the National Library of Medicine has produced LitCovid, a curated list of publications about the 2019 novel coronavirus.
STEVEN HOWE: As you can see from this chart, since the beginning of May there have been an average of 2,000 new publications a week. From the perspective of a publisher overseeing peer-reviewed journals, that is a tremendous number of new manuscripts to vet, edit, and publish. In response to this need, CCC created the COVID Author Graph, a free tool that we offered to our publisher partners.
STEVEN HOWE: Starting with a curated data set of published scientific articles in virology, with special attention to coronaviruses, we used bibliographic citation metadata to extract authors, articles, and journals, along with their relationships. On top of that, we built a visualization tool that allows a user to explore these entities and relationships. The idea here is that the best solution for quickly identifying qualified experts was to provide users a tool where they can explore the set of COVID researchers and their interconnections.
STEVEN HOWE: Graphs like this, knowledge graphs, provide a very natural way to interact with data that describes entities and their relationships. Transitioning to my second topic of FAIR data, let's talk a bit more about what's going on here. When I am talking about a knowledge graph, I am actually talking about the product or output of a knowledge system, or a knowledge supply chain.
STEVEN HOWE: A knowledge supply chain takes data as its input and transforms it into information, and then into knowledge. Knowledge here is understood as actionable information. With data as its primary input, the quality and value of that output, knowledge, is highly dependent on the entropy of the source data. That's really a fancy way of saying garbage in, garbage out. Moreover, you reduce entropy by iteratively identifying, measuring, managing, and improving data quality.
STEVEN HOWE: Importantly for this panel, these systems and processes highlight the role and value of FAIR data. The better, higher quality, and more FAIR our data, the more value we can derive from it. Think about the problem that CCC was trying to solve with the Author Graph. We are trying to help publishers identify qualified experts in COVID. We believe that a graph that allows a user to explore the publishing relationships of COVID researchers would be one such solution.
STEVEN HOWE: This solution actually reveals an additional problem. How do we extract these authors and their relationships from the available data reliably and with confidence? And confidence is key here. Information is actionable only if a user actually has confidence in it. When we built the Author Graph, we were relying on data that was FAIR to varying degrees. We used data from a variety of sources, and much of these data and their formats were certainly findable, accessible, interoperable, and reusable in many ways.
STEVEN HOWE: The degree to which data is FAIR directly determines how accurate our output graph is, how much work and complexity we have to build into the pipeline to make sure it is accurate, and finally the level of confidence that the user has in the end result. Let's look at some examples. The table here describes the four dimensions of FAIR data along with some metrics, or maturity indicators.
STEVEN HOWE: On the right-hand side here are some examples from the data that we used that allowed us to conduct more sophisticated analytics. For example, F1: data and metadata are assigned globally unique and persistent identifiers. Because of the strong assurance of identifiers such as the PMID, we can accurately disambiguate data about journal articles. A1.1: the resolution protocol is open and universally implementable.
STEVEN HOWE: Much of the data that we used was readily available for download by known protocols like FTP and with public access. And finally, much of the data that we used utilized well-described XML structures and ontologies, making interpretation and representation easier. Let me conclude today by raising one more topic: the realities of data.
STEVEN HOWE: These are really the lessons learned. I said that our data sources were FAIR to varying degrees, and that is true. And it is also true that data entropy-- data quality issues-- runs deep. Data is not FAIR once and then static. And data is not just FAIR at one level of the data. Data quality, data findability, data accessibility, et cetera: these are iterative processes that must be maintained.
STEVEN HOWE: The reality of data is that issues are common, and they present obstacles to extracting knowledge from it. They are certainly not insurmountable, but the issues demand our attention. And what were some of those issues that we encountered? We found that some standard identifiers were very well used. I gave you the example earlier of PMID, but others were sparse or virtually not present.
STEVEN HOWE: For example, less than 10% of the authors in the raw data-- and I'm talking about over 800,000 author instances-- had an ORCID iD attributed to them. And ISNI and GRID were even less prevalent; sometimes maybe just 10 records had one of those identifiers. A standard identifier is only good if it is used. There are also a number of character and string issues that are very common across all publishing data: unescaped characters, embedded MathML and other sorts of markup, leading and trailing spaces, line breaks right in the text.
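[For illustration, the kind of escaping issue described here, reusing Brian's earlier funder example in a hypothetical JATS element: a raw ampersand is not well-formed XML and will break parsers, while the escaped form is fine:

    <funding-source>Bill & Melinda Gates Foundation</funding-source>       <!-- not well-formed -->
    <funding-source>Bill &amp; Melinda Gates Foundation</funding-source>   <!-- well-formed -->
]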
STEVEN HOWE: There are a number of different scenarios describing issues around values that really should be unique. For instance, duplicate records describing the same entity. Or the same unique identifier being used to describe entities that should be distinct. We saw this in both ORCIDs and DOIs. Sometimes the data that we have is actually just wrong at the source.
STEVEN HOWE: We traced a typo in an author's name all the way back to the printed article, and that typo carried through all the different data representations of that person's name. And finally, authors and their affiliations can present their own unique challenges. Authors who are also part of working groups ended up being duplicated in the overall list when those working groups were listed as authors.
STEVEN HOWE: And author affiliations are often duplicated across all the authors in the list, making attribution between an author and a specific affiliation impossible. The other thing we've seen is a lot more of these mistakes in the 2020 data. So this rush to publish, especially in an area like COVID, is introducing even more data quality errors.
STEVEN HOWE: These are some examples of the data quality issues we encountered when building the Author Graph. The main lesson learned is that the more FAIR and higher quality your data is at source, the more value you can derive from it in a knowledge system. And now for a broader look at FAIR, I will turn it over to Paul, who will ask the question of whether FAIR is enough.
PAUL STOKES: Thank you very much, Steven. Let me just share my screen. OK. So this is me. I'm a senior co-design manager, or product manager, at Jisc, which is the UK's national research and education network. I lead on preservation in the open research services team, and I'm also a director of the Digital Preservation Coalition and a director of the Open Preservation Foundation.
PAUL STOKES: And for this session, I'm the devil's advocate, or [INAUDIBLE] depending on your point of view. So I think FAIR is not good, or rather, not good enough. But let's take a step back. Most of us here will think FAIR is a good thing, won't we? Given this audience, that's almost a given. Who could be against something that, once stated, is so blindingly obvious? If data is not findable and usable, then why are we spending time, money, and resources keeping it?
PAUL STOKES: And FAIR does what it says on the tin; it's a great acronym. I always think if you've got a good acronym, or a relevant rhyme, or great alliteration, you're halfway there when it comes to hearts and minds. But-- and you knew there was a but, didn't you? It's a big but, and here comes the heresy. Just because something has a simple, snappy acronym doesn't make it right.
PAUL STOKES: OK, that's a bit over the top. Obviously there's a great deal right with the concept of FAIR. But there are also-- in my personal opinion, not that of my employer-- some fundamental flaws in the whole concept of FAIR. Flaws that, unless addressed, will ultimately doom the whole concept of FAIR. Has a flawed concept been disseminated without addressing the gaping holes, for the sake of the aforementioned simple, snappy acronym?
PAUL STOKES: Quite possibly. I never let an incomplete concept get in the way of a good acronym. So let's look at the things that are wrong with FAIR. Cost: there is a cost involved in making data FAIR. Quite apart from the cost associated with the creation of data in the first place, money is required to deposit data, to plug it into discovery systems, to keep it usable, and so on. And what's more, it's not a simple one-off cost.
PAUL STOKES: Costs are cumulative and ongoing. The more data you add, the more money is needed to keep the disks spinning, to keep the systems up to date, and for ongoing curation of the data. It is inevitable that costs will continue to rise, and we'll need to do more with fewer resources. I know some of you will say that the cost of storage is coming down, so it's getting cheaper, and so on-- but, Jevons paradox.
PAUL STOKES: Cheaper storage leads to greater use, which ultimately leads to greater spend-- the Jevons paradox. And it's worth also noting that the rate of reduction in storage costs is starting to level off. We are actually running out of space for data. So what about value? Well, in purely fiscal terms, to make a case for keeping data we need a perceived economic value for the data to balance the cost of keeping it.
PAUL STOKES: And if there isn't value to be realized in that data, then it will never be sustainable to keep it in the long term. FAIR is all well and good, but if there is no business case to keep the data, it's going to go. So do you know what your data is worth? Well, this maturity scale might help you think about it. And as Steven mentioned, quality is important. Metadata is important.
PAUL STOKES: But there's more to value than quality. Most institutions don't know what data they have, let alone its quality and value. And is the data reliable? If no consideration is given to the provenance of the data, the trustworthiness of the data, the veracity of the data-- if there's no indication of those qualities-- then the nuggets of truth will be increasingly swamped by the deluge of dross.
PAUL STOKES: It wouldn't really be fair of me to mention recent elections, so I won't. So how about this for an acronym? FAIRER: findable, accessible, interoperable, reusable, economically viable, reliable. Well, it's not quite there. Not quite, close but no cigar; there's more to consider. Carbon. When I talk about the economics of data and curation, I'm not just talking about monetary cost, which as we know is not insignificant in and of itself.
PAUL STOKES: There's also the power cost to consider, the carbon cost. Putting aside the embedded carbon costs from the manufacturing, operations, and building of the infrastructure, do you know how much power is consumed by data centers worldwide? This is a video presentation, so I don't really expect you to answer. Is it 100 terawatt hours?
PAUL STOKES: Well, a brief search on the web turns up many, many different discussions and many, many different figures, but one thing they all have in common is that they are huge. 416 terawatt hours is a figure that is often quoted. That's comparable to the power the aviation industry uses. By the way, on the little screen here, Ireland's usage is due to its popularity as a data center location in Europe.
PAUL STOKES: Well, I found an online calculator in the gov domain-- and if it's gov, it must be legit, right? It says that 416 terawatt hours is the equivalent of just under 295 million metric tons-- that's big-- of carbon dioxide. I don't expect you to fully comprehend that figure. The numbers are mind-bogglingly large. Suffice it to say, it's bad and going to get worse. And this 416 terawatt hours figure is for data centers, which account for approximately 45% of the IT industry's emissions.
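[As a rough check on those figures: the conversion the calculator implies is about $7.09 \times 10^{-4}$ metric tons of CO2 per kWh, roughly a US average grid emission factor:

$$416\ \text{TWh} = 4.16 \times 10^{11}\ \text{kWh}, \qquad 4.16 \times 10^{11}\ \text{kWh} \times 7.09 \times 10^{-4}\ \text{t CO}_2/\text{kWh} \approx 2.95 \times 10^{8}\ \text{t CO}_2,$$

or just under 295 million metric tons. This is also consistent with the 45% data center share of the IT industry total quoted next: $0.45 \times 653 \approx 294$ million metric tons.]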
PAUL STOKES: IT generates about 653 million metric tons per year. So how much of this is due to making data findable, accessible, interoperable, and reusable? Well, the short answer is we don't know. There is a longer answer; the longer answer is we don't know a lot. We have an inkling, we know where to look, but we haven't really started. So let's move on to another thing: impact.
PAUL STOKES: Researchers like impact, don't they? Well, is it a good thing? Not if it's impact in the form of tons of CO2, or reputational damage because you didn't consider the environment when you were squirreling away all those petabytes of data. So taking all that into account, I'd like to propose a new extension to the FAIR concept. FAIREST: findable, accessible, interoperable, reusable, environmentally friendly, sustainable, and trustworthy.
PAUL STOKES: It's not perfect, far from it. But I think it's better than just FAIR. It encourages people to consider a bit more than just making their data available. So how are we going to ensure that what we produce is FAIREST data? Well to be blunt, I don't know. That's a great picture, isn't it? So over to you, what can we do about it?
PAUL STOKES: You tell me. Answers on a recycled postcard, please. And perhaps we'll come across answers in our discussion. So speaking of which, over to Stephanie to tell us what's happening next.
STEPHANIE DAWSON: Thank you Paul, Brian, Steven. Paul has certainly left us with a lot of food for thought on the broader economic and ecological context of data collection and preservation. Brian reminded us that in the academic publishing sector, we are both consumers and producers of data, and that there is a big advantage, as well as a responsibility, to making it as FAIR as possible. And Steven showed us the magical things that can happen to that data when it is FAIR, and the decidedly [INAUDIBLE] problems with incomplete, inaccessible, non-interoperable, un-FAIR data.
STEPHANIE DAWSON: So we are looking forward to a lively discussion on what is the FAIREST data and why it matters. So I hope you've been jotting down your questions, and we would be happy to start the discussion now. [MUSIC PLAYING]