Name:
Data Privacy, Ethics, and Governance for Usage Analytics: An Emerging Open Access Dialogue
Description:
Data Privacy, Ethics, and Governance for Usage Analytics: An Emerging Open Access Dialogue
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/7339a18f-2b21-475d-a088-052539bd8214/thumbnails/7339a18f-2b21-475d-a088-052539bd8214.png
Duration:
T00H59M55S
Embed URL:
https://stream.cadmore.media/player/7339a18f-2b21-475d-a088-052539bd8214
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/7339a18f-2b21-475d-a088-052539bd8214/AM21 Session 6B - Data Privacy%2c Ethics%2c and Governance for U.mp4?sv=2019-02-02&sr=c&sig=GxNJTfYZbS0tcoAvcj25pwLrMgi48pVCM7v29vAUPWQ%3D&st=2024-11-22T10%3A14%3A19Z&se=2024-11-22T12%3A19%3A19Z&sp=r
Upload Date:
2024-02-02T00:00:00.0000000
Transcript:
Language: EN.
LISA JANICKE HINCHLIFFE: OK. We're live now. Everybody should be streaming in. Yup. Here they come.
LISA JANICKE HINCHLIFFE: Welcome, everyone. We're just going to wait a few seconds while everyone gets logged in, given the Pathable format keeping everyone out until just the moment before we start. So we'll start in one more minute.
LISA JANICKE HINCHLIFFE: OK. We're going to go ahead and get started, even though we can see that people are still logging in. We do have a busy session today, and I'm super excited about this session and my collaborators in putting it together. So hello, everyone. My name is Lisa Janicke Hinchliffe. I'm a Professor and the Coordinator for Information Literacy Services and Instruction at the University of Illinois at Urbana-Champaign.
LISA JANICKE HINCHLIFFE: I'm joined today by Joe Karaganis from Open Syllabus, and Christina Drummond from Educopia. We will let each of them introduce themselves in a little bit more detail when they give their parts of this session, so I won't try to do justice to their biographies-- plus you can probably read them online. Our focus today is on data privacy, ethics, and governance for usage analytics.
LISA JANICKE HINCHLIFFE: Particularly where it plays out in the open access arena, or the open content arena. This comes out of conversations that Christina and I had after hearing each other present at conferences over the past couple of years-- particularly work I am doing on a Mellon-funded project called Licensing Privacy, where we're looking at the ways that libraries are using their licensing terms to ensure user privacy and good data governance around user data. That brought out for me the notion that, oh, wait.
LISA JANICKE HINCHLIFFE: What are we going to do in an open access world around these issues? Because in an open access world, we're not contracting for reading, and so we don't have the same mechanisms for reader privacy. In December of 2020, I gave a presentation at the JROST conference, somewhat provocatively titled Are Readers the Product?, asking ourselves: how do we understand usage analytics in the open access environment?
LISA JANICKE HINCHLIFFE: So I'm really pleased that both Joe and Christina were able to join me, and to put together this session today for us to really look at this issue, and to invite a much broader conversation than just the three of us on our Zoom. Now we have a lot more people on our Zoom. I do want to remind everyone as part of my responsibilities as the moderator of the session, that SSP does have a code of conduct for this conference, which you have probably seen before.
LISA JANICKE HINCHLIFFE: But in case you're just logging in for the first time, we do want to make sure that everyone is aware of it, because of SSP's and our own personal commitments to this being an inclusive, equitable, and harassment-free environment. I'll also remind you that SSP has a hashtag on Twitter for this conference. The recording will be available on demand after the broadcast today.
LISA JANICKE HINCHLIFFE: And we very much invite you to type your questions in the chat. And if you haven't already, there is a poll question for this session on the Pathable site. So click over to that poll tab, and take that poll for us. The motivation for today's session is this question: "If data about scholarship or other intellectual work will be productized, by whom and how will that happen, and what ethical principles and guardrails should be in place?
LISA JANICKE HINCHLIFFE: How do we align our ethical and legal frameworks as well, and not leave this only to the legal environment?" And with that, I want to invite Christina to begin sharing her presentation, and guiding us through an interactive exploration of these issues.
CHRISTINA DRUMMOND: Thank you so much, Lisa. Hi, everyone. I'm Christina Drummond, and I'm the Program Officer for the Open Access E-book Usage Data Trust. And I'll explain a little bit about what that is later in this session. But I also happen to be a Certified Information Privacy Professional, and I've been working in data governance and privacy, and data ethics, for almost two decades now.
CHRISTINA DRUMMOND: And I wanted to take the time to frame this session in terms of its importance, and how scholarly communications, much like other industry sectors, is in a position to chart its own course. Thinking of the publishing industry, we're moving towards this age of data analytics and big data-- whether we like it or not. And with that, we're moving beyond raw counts of views and downloads, towards linked data-- which allows us to flip and repurpose data.
CHRISTINA DRUMMOND: So publication data on access, usage, and funding can be linked and transformed into information about scholars, and aggregated into scholar-specific profiles. We know a lot of this is already happening. And while this can enable positive impacts-- an author could understand the reach of their work around the world, you could describe the impacts of your own scholarship-- the question that we're here to think about today is what happens if that data, or the resulting analytics and reports, can be repurposed for uses that have negative impacts.
CHRISTINA DRUMMOND: Could the same information be recombined to identify readers? If we had IP addresses, that could be possible. Could it be used to target scholars? Like all technological tools, analytics-- in this case publishing analytics-- can be used to benefit or harm society. And the challenge we have is to think about how to design guardrails in scholarly publishing to ensure that the way these analytics are used really abides by shared values and principles.
CHRISTINA DRUMMOND: To put that another way: right now, our technical capabilities have outpaced what regulation allows. And this, of course, varies across the globe-- it's a very different reality in Europe than it is in the United States, for example. And when that happens, the question becomes, must we consider ethical guidelines in the middle to protect the principles that we hold dear, such as scholarly freedom?
CHRISTINA DRUMMOND: So often in cybersecurity and privacy, we think about worst-case scenarios, and you have to plan for those worst-case scenarios to be responsible. That plan of action is there so that, worst case, you can put it into effect. But ideally, by going through this planning process, you can design processes and guidelines that reduce the risk of those harmful issues ever occurring in the first place.
CHRISTINA DRUMMOND: And so, this is where we're thinking today. I wanted to highlight this because we have data analytics and reports out there that are individual-specific-- information about a specific scholar, for example-- and that information could be used in an unrestricted fashion. What happens if that data can be recombined, reengineered, brokered, put to imaginative uses?
CHRISTINA DRUMMOND: Could there be negative consequences? What you see here on the screen reflects the very real impacts that authors reported back in 2017, in the PEN America survey about online harassment. These are scholars that were being targeted. And what you see here is that there are very real dangers out there in this world of linked data and publishing data analytics.
CHRISTINA DRUMMOND: And so the question is, what guardrails do we need to put in place to reduce the risk of any irresponsible use? And the bigger question is, how do we work together to build those guardrails in a way that really makes sense for our community? So this leads us to our first interactive activity, which is to ask all of you to join us in GroupMap, a virtual whiteboard, to add your thoughts-- and we have it set up to be anonymous.
CHRISTINA DRUMMOND: On where you currently turn if you find yourself in a situation where you have an ethical quandary-- you're not quite sure how to handle a data combination or linked data, or perhaps it's a data sharing agreement or a term that makes you uncomfortable. And so what we'd like you to do is go into this GroupMap-- it's shared in the chat, so you should be able to click on that link and add your thoughts.
CHRISTINA DRUMMOND: And go ahead. I'll give folks 30 seconds. We want to keep this a little bit interactive. I want to see what we think as a community. What resources do you turn to? You're welcome to put your [INAUDIBLE] in there. And if you double click any of those ideas you can also like them, if you see your thought already up on the board.
CHRISTINA DRUMMOND: We'll give it 30 more seconds here. A lot of peers and colleagues coming up. COPE has been cited here a couple of times. Internal Policies and Governance cited as well, or Internal Guidance.
CHRISTINA DRUMMOND: And what I'm going to do really quickly-- OK. Now I've shifted everyone to the Results tab, so your view probably felt like it magically jumped, and hopefully you can see what's here, sorted by likes. Colleagues and COPE are at the top of this list, but there are a lot of different ways that people look to gather this information.
CHRISTINA DRUMMOND: And so, one of the things I want to highlight here are two really quick case studies we can keep in the back of our minds, of other industries and how they thought about ethical guidelines. The first is a story I always like to tell. It comes out of the privacy sector-- think of cell phones and the cell phone industry. We all have them and we all use them. And conversations in that industry really circled around names and phone numbers, and what companies could do with personal information related to an individual.
CHRISTINA DRUMMOND: But ironically, it's the way phones operate-- the pinging of phones to cell towers-- that ultimately went unregulated and resulted in the location tracking industry. The New York Times Privacy Project has an exceptional report-- if you haven't looked at it-- that shows how the data from that pinging could be repurposed by actors, not only to identify individual routines and preferences-- which, again, don't have a name attached-- but, because of those patterns, to reidentify the individuals themselves.
CHRISTINA DRUMMOND: With national security implications, actually, as a result. And at this point, I think a lot of us are like, oh, gosh! This is really scary. Maybe we should just avoid data collection, or avoid the data linking-- the risks are too high. But I'm here to urge us to take a step back. We don't want to risk losing the positive social gains we get by having access to this data.
CHRISTINA DRUMMOND: And this is where I bring up the second example, this one based in the US: the National COVID Cohort Collaborative, N3C. It's working on ways to link, share, and restrict use of private health data in support of pharmaceutical research for COVID. Here they're not giving up, but they're looking at how they can share protected information, and negotiate ethics and principles.
CHRISTINA DRUMMOND: So with that, we wanted to go to our next question, which is: what ethical questions or privacy concerns have you encountered in your work in scholarly publishing? And so again, we'll have another chat. I can actually move folks over-- if you haven't clicked the link, this screen should magically change. So go ahead, take 30 seconds, and feel free to add to this anonymous board the questions that you've faced in your day-to-day.
CHRISTINA DRUMMOND: Like what we're seeing here: in my work across industries, people often want to know the basics-- notice and consent, as we all do when we use things online. Who has access? How are they going to use this information? And what I see in these responses are those same things coming out for scholarly publishing, and the analytics and usage data we create. So feel free to keep adding to this board, but I actually want to give the mic back over to Lisa to introduce Joe.
LISA JANICKE HINCHLIFFE: Actually, I think we can make this pretty simple. Joe, from your perspective running the Open Syllabus Project-- this isn't scholarly publishing that you're engaged in, but you're gathering up a lot of open data and aggregating it in ways that create new information. And I know that you've had to navigate some interesting ethical issues.
LISA JANICKE HINCHLIFFE: So you've had to navigate ethical questions-- many of which are very complex-- and you've agreed to share with us today some of the things that you've encountered. So we'll turn it over to you.
JOE KARAGANIS: Yeah, I'm happy to. There we go. OK. I'm assuming that everybody can hear me and that my video's coming through. Yeah. I'm here in the capacity of, sort of, a case study. I've done quite a bit of work in the past on data policy issues, primarily around the question of what kinds of guidelines you need in place to make public policy with privately collected data sets.
JOE KARAGANIS: So that's an important issue with respect to policy making, but there's less on the question of library and publisher use. In more recent years, I've moved into a role where I'm responsible for collecting, curating, and really developing data stewardship principles around a large data set. That is the work associated with the Open Syllabus Project. I don't know if you're all familiar with that. But if you'll indulge me just for a second, I've found it's often much easier to show than to explain.
JOE KARAGANIS: Can I share my screen? All right.
LISA JANICKE HINCHLIFFE: You should be able to, Joe. Let's double-check.
JOE KARAGANIS: OK. And that's-- I don't know whether-- I'm sorry. I don't want to waste our time on this, so I won't bother. I will encourage you to take a look at the Open Syllabus work-- it's opensyllabus.org. You can see pretty quickly what the scope of the project is. We've collected around 10 million syllabi from entirely public sources-- public-facing sources, which is not the same as open sources.
JOE KARAGANIS: So syllabi end up online for all sorts of reasons. Sometimes they're published by the universities through centralized archives, sometimes by individual faculty members, and everything in between-- departments and schools often run their own archives, and may have their own syllabus policies. There are also third-party aggregators, and labor-of-love, topic-centered syllabus archives that are a pretty common form of academic and collective organizing.
JOE KARAGANIS: And this was, in many respects, a new document class that had very few strong operative norms around it-- often very few policies at the institutional level to govern them. And so, you see everything. You see every possible policy, every possible flavor of faculty investment in the document. For some faculty, it's a kind of secret sauce to their teaching that they guard like a trade secret.
JOE KARAGANIS: In other cases, people are perfectly happy to put everything online-- there's a view that contributing to the discipline or to the profession implies sharing that knowledge. And we've been trying to tease out operative norms that can guide our work, while also understanding that we are helping to shape those norms too.
JOE KARAGANIS: Because we are one of the first projects to actually do something with this category of documents that begins to feed back into both institutional policies and faculty understandings of what these documents are and what they mean. So we've struggled with that on lots of levels, just because in large part there's just been so little pre-existing guidance about what you can do with these documents that is respectful not just of copyright, which is fairly flexible terrain.
JOE KARAGANIS: But also of the range of real and imagined faculty investments in these documents, and how that relates to the wide range of institutional policies that are in place. So for example, Texas requires that all curriculum materials be posted online. So by default, Texas public schools publish all their syllabi, and we've collected them also.
JOE KARAGANIS: We have a really strong Texas collection. And we've been sensitive to the question of what it means to re-contextualize a public document. The fact that somebody puts a syllabus on their website, or that a school publishes it in their public-facing database, implies a set of judgments about the publicness of that action.
JOE KARAGANIS: That may not carry over to the context in which we would use these documents. So we've been sensitive to the risks of re-contextualizing open materials, or materials that are nominally in the open. And more or less where we've come down-- again, in a context that is evolving as this dialogue between our work and faculty opinion and faculty practices continues-- is that we've opted for some fairly strong principles of non-identification of individual faculty in the collection.
JOE KARAGANIS: So we can show aggregates, we can show rankings, we can show the rankings of an individual title, or drill down into a department to see what the most frequently assigned titles in that department are. But we make it difficult to identify individual faculty within those aggregated rankings. We've established a country blacklist where we think even aggregated rankings could pose risks to faculty, or to departments.
JOE KARAGANIS: So for example, the Communist Manifesto is a very highly ranked title in the overall collection. And that occasioned some controversy-- it's gotten picked up as a right-wing talking point in the US. But if you were in a Turkish political science department, and the Communist Manifesto turned up at the top of your department's assigned texts, would that expose the department to harassment or some kind of investigation that would threaten faculty?
JOE KARAGANIS: We had no resources to guide this process at all. I've literally been making it up as I go along, because there's no infrastructure for thinking about academic freedom at the level of the curriculum as a whole. We've provided the first capacity to look at it at that level, and so the academic freedom groups haven't really been equipped to think in these terms.
JOE KARAGANIS: And we've built up-- with some advice from some of those groups, but largely independently-- country blacklists, where we don't show any data. And that, as you can imagine, includes a growing portion of the world at this point, unfortunately. And it's evolving, so things change. The most dangerous scenarios from our perspective are situations in which regimes have changed in ways that quickly impact academic freedom.
JOE KARAGANIS: And that might mean that choices a faculty member made three or four years ago look very different from the choices they would make today. So for example, in a country like China, the faculty have understood for some time that there's no academic freedom with respect to teaching materials that may be politically sensitive. There's nothing new that would impact the publication of those materials.
JOE KARAGANIS: In Turkey, situations change very quickly. In Hungary, the situation has changed very quickly. Arguably, it's changing quickly in countries like Poland or India. Those questions become much more sensitive. So we have a kind of international policy around what openness means in relation to our data, and that has been pretty important to how we think about responsibly managing the data sets.
JOE KARAGANIS: At the same time, we're trying to sustain the project through commercialization of some aspects of the data. So there's a whole other dimension to this work that involves thinking through what responsible commercialization looks like. We have both university and publisher markets that we are actively working to develop in that context, and there are different sets of requirements that apply to both.
JOE KARAGANIS: Publishers, for example, are interested in this data-- I think not solely, but with a very strong focus on direct marketing to faculty. And we've just decided that's a line we're not going to cross. We're not going to become a direct marketing service. But as I've said, we're in a kind of evolving space where we don't begin with a strong sense of what the norms are around faculty use or institutional use, and we recognize that we're also changing those norms in the course of our work.
JOE KARAGANIS: Faculty may have developed certain perceptions of syllabi in the course of their own work, and we're showing syllabi in a much, much larger context that potentially challenges some of those assumptions. First and foremost, the assumption that syllabi are scarce resources-- clearly they aren't; we have 10 million of them. The idea that, either at an individual faculty level or an institutional level, the content or quality of an education is dependent on the secrecy of the curriculum, I think, is something that our work is slowly chipping away at.
JOE KARAGANIS: That's still an operative norm in some contexts, but we're pushing on it. It's a function of the way we are opening certain kinds of information about the curriculum to faculty and institutions. We routinely get concern from faculty that at some level this work would lead to targeting of faculty, if you could begin to identify faculty choices within these kinds of larger aggregates.
JOE KARAGANIS: We're open to that concern. I think we've been very sensitive to it, and we haven't seen it operative at all in the course of now four-plus years of this work. That's an area where our general conclusion is that looking up syllabi to see what faculty teach is a very inefficient way to target faculty in the context of right-wing pressure on faculty.
JOE KARAGANIS: It just passes through other channels that have proved much more efficacious. We don't want to become one of those channels, but based on what we've done so far, we're pretty confident that we can't. And then, just to sum up some system-level thinking about this: we've wondered what the right structure is for making decisions about governance of this category of documents.
JOE KARAGANIS: We've toyed with the idea of a social model in which universities could participate in, support, and also govern the project. That's been a very heavy lift for us, in part because-- well, I should say the first people who understand what we're doing are the librarians and publishers. University administration at other levels is pretty far down the line.
JOE KARAGANIS: So it's been hard for us to make the case to provost-level staff that there's good reason for universities to reorganize their syllabus policies in order to facilitate the kind of work we're doing. Libraries and publishers get it immediately, but it's been a longer road with respect to other units of the university. So for now we have a loose but largely self-invented set of principles-- what I continue to think of as principles of data stewardship, rather than principles of accountability to stakeholders to whom we are responsible.
JOE KARAGANIS: I think that will at some point need to change, because we want to be an active partner in university self-governance of this important category of documents. The other parts of this conversation feed into questions of faculty independence, of privacy boundaries with respect to both students and faculty, and of a variety of other areas that syllabi touch-- syllabi are really in the middle of this conversation in lots of respects.
JOE KARAGANIS: I'll leave it at that.
LISA JANICKE HINCHLIFFE: Great. Thank you, Joe. I think this is such a great lead. The question I ask every time I think about this-- because I'm pretty sure I have syllabi in your collection-- is, huh, should I get to find out who's looking at my syllabus and my documents, in exchange for my data being in the public like that? And for me, that's the question that Christina's work is now helping us think about, which is: what should we get to know about readers?
LISA JANICKE HINCHLIFFE: And who should get to know it? So Christina can I turn it over to you?
CHRISTINA DRUMMOND: Absolutely. And to get to that question, what I'm going to try to do is talk a little bit about the project that we're working on, so that individuals know what a data trust is. Because I think there are two pieces to that answer. One is, do we have the right technology and the right infrastructure we need to even put in place the controls and ethical guidelines to enable those guardrails we envision? That question I think is in addition to, do we even know what those principles and guidelines are?
CHRISTINA DRUMMOND: And I would argue we don't yet. So with that, let me give a quick overview of what it is we're doing, and talk a bit about what trust looks like in this data exchange framework. As I mentioned earlier, I am a program officer on a project that actually started in 2015, and we're now in our second phase of funding through the Andrew W. Mellon Foundation.
CHRISTINA DRUMMOND: And what we're trying to do is work with the open access e-book publishing ecosystem, all of the stakeholders, to understand how publishers, libraries, publishing platforms and services, and even authors want to leverage usage data. So what do they want to do with it, because each of those parties will have a different perspective. And in addition, we needed to understand the data flows. So how does it go between parties and stakeholders involved, with an eye towards interoperability and existing metadata standards.
CHRISTINA DRUMMOND: And so our effort's been doing this through interviews and design thinking workshops, and partnering with consultants and four university presses, a commercial publisher, and the OAEBU network to develop and pilot some of that open source infrastructure, but also identify the issues where we need to explore principle development of recommended practices and standards. And so with that, I'll just note that all of this does come back to this question of ethical guidelines.
CHRISTINA DRUMMOND: And as you see here in the last bullet, one of the things that we're doing in our current project is shifting to modeling what governance, sustainability, and policy look like. And that is not only organizational governance and sustainability-- how you would sustain an entity that acts as a data trust-- but a lot of it is around data governance and data stewardship, which is where ethics and principles really fit in.
CHRISTINA DRUMMOND: We, of course, can't do this alone. I'm very honored to be representing a very large brain trust. We have over 20-- over 25 advisors now between our advisory board and technical advisory group representing organizations that you see here, and really representing the diverse types of stakeholders that are involved in open-access book publishing.
CHRISTINA DRUMMOND: With that, what is it that we're trying to do when we talk about a usage data trust? Well, at the core of this problem is people wanting to understand how a book-- their book, their institution's book, or as a publisher the book you published-- is used across the internet, because open-access books are disseminated across platforms.
CHRISTINA DRUMMOND: So what this means is we have to find ways to bring together all of those streams of information about impact and usage, and link them in a cross-platform fashion. And this helps us answer questions. You could envision linking to data such as the Open Syllabus Project's-- which we haven't done yet-- that could potentially tell you whose book is having an impact on students. And so to answer these questions and to link this data, we have to do a couple of things.
CHRISTINA DRUMMOND: We have to think about not only who has to be at the table and holds pieces of that information, but how they want to use it. And Michael Clarke and Laura Ritchie of Clarke & Esposito actually documented the complexity of that ecosystem for a project. The visual you see here on this slide illustrates the variety of roles and stakeholders that are involved in the OA e-book usage data exchange.
CHRISTINA DRUMMOND: It also illustrates the centrality of publishing platforms and services, of which there are many, in lots of different varieties. Circled on the slide, you'll see where multiple data flows are aggregated and curated. People at those points have to make decisions about how to combine this information, and those places are another area where the ethics and linked data pieces come up.
CHRISTINA DRUMMOND: And I think the question relevant today in this session is within this complex ecosystem, how best can we steward usage data and put in those guardrails that we're thinking about, given all of this complexity? So this is where the data trust pilot enters the picture. And I want to introduce a concept here because this is a really novel space. And folks, look this up. There's something called a data space.
CHRISTINA DRUMMOND: And the concept is envisioned in something the European Union is bringing up-- I would say that as GDPR is to privacy, data spaces are to data stewardship. There is a Data Governance Act being proposed in Europe that really envisions this neutral data exchange type of organization, which exists to act as an independent third party to foster the exchange and processing of data among diverse institutions.
CHRISTINA DRUMMOND: And so this type of data exchange for a given industry-- in our case, e-book publishing-- is what we're trying to create with, as we call it, the OAEBU, the Open Access E-book Usage data trust. A data trust would be a variety of a data space. And this is very similar to what the National COVID Cohort Collaborative, N3C, is doing for COVID primary research. What's nice is that at this time-- with pros and cons-- data spaces are growing in lots of different industries globally.
CHRISTINA DRUMMOND: And so there are lots of efforts we can learn from and work alongside. The challenge is that it's very new as a concept, so there is a lot of learning going on. But to share what's going on so far, I wanted to bring up this slide. This is an illustration of the data spectrum from the Open Data Institute. And something that really illustrates the complexity here, too, is the types of data that we're looking to bring together.
CHRISTINA DRUMMOND: This was in Joe's remarks as well: some of the data is sensitive and some isn't. One size doesn't fit all when we're thinking about privacy, ethics, and data stewardship. As the data spectrum graphic here shows, any one of these data feeds could be closed due to privacy concerns-- data that identifies readers-- or due to proprietary concerns, if it's sales data.
CHRISTINA DRUMMOND: It could be open data that is coming through any of the open APIs in scholarly communications. But what's important is that we also have all of that shared data in the middle that's being stewarded. Thinking through how data is shared for specific reasons under specific terms is something that can be done. And for efforts like ours, we have to navigate all of these different streams and think about how access may need to be restricted or managed in accordance with how that data needs to be governed.
CHRISTINA DRUMMOND: So that really depends on your use cases. For us, one of the things that we're looking at is whether you can alleviate some of the economic and resource pressures on the data curation side for a lot of the usage data-- what that means, and how you do that given data sensitivity. And of course, there are multiple approaches. There is technological infrastructure and there are technical controls you can use to ensure privacy and ethics.
CHRISTINA DRUMMOND: There are policy mechanisms you can put into place, like data sharing agreements. And of course, there's governance as well. But I would argue that to do all of that, you need a neutral, trusted party in the middle that both the data providers and data consumers can trust to broker the data exchange and use according to those community standards-- to make sure the data is stewarded and exchanged by a middleman that is trusted. And that requires trust not only in the organization, but in the principles the organization abides by.
CHRISTINA DRUMMOND: And so this is where our data trust comes in. As we've been seeing, we're really trying to take this data space concept and apply it to open-access book usage data in particular. And we're trying to see if this can also foster usage data benchmarking across platforms, and economies of scale for data curation. But the key here is we have to find ways to enable trust-- to allow competitors, different publishers, different platforms, different services, to trust that they can share their data and have it kept securely.
CHRISTINA DRUMMOND: But we also need trust from those who are noted in the data, the scholars, and from those who use the data-- trust that the data is being processed according to accepted, agreed-upon norms, and that it's being done by a trusted independent party. And so for our project, we actually turned to our communities. We have a number of open communities, grouped by stakeholder, to clarify how they rely on and leverage usage data.
CHRISTINA DRUMMOND: We did this through a number of design thinking workshops and conversations. And we found that publishers, libraries, and book publishing platforms and services had many shared interests when it came to the usage data. And while they all curate usage data, what's more germane here is that there is already interest in linking that data with other data domains, from marketing to funding and grant data, and obviously author IDs.
CHRISTINA DRUMMOND: And if you dig into some of these use cases, you see where potential tensions can emerge, and indeed the need for ethical guidelines. Let me give you an example. We heard that research offices and funding agencies want reporting that ties usage and impact to specific grants, for their research office and institutional reports.
CHRISTINA DRUMMOND: And even specific scholars want that data. When we turned to the individual scholars we talked to-- we had a workshop-- we said, OK, we understand you have this usage data. And they definitely saw the value in using that information to illustrate the reach and impact of their scholarship. They wanted to talk about how well their book did.
CHRISTINA DRUMMOND: But when asked how they felt about others accessing this data, they strongly indicated that they wanted to maintain control over what data was or was not visible in their P&T files-- they wanted the ability to say yes, this should be seen, or no, it should not. And so if such scholar-specific analytics are made available without scholar consent to all those other parties, what does that mean for scholar trust?
CHRISTINA DRUMMOND: Will we get increased distrust among scholars in these data analytics systems, and then see advocacy and opposition grow from that? In my own research, we've seen this occur in other industries. You can have damaged brands, distrust of algorithms, tech aversion overall. And I think at this moment in time in scholarly communications, we have a window of opportunity to get ahead of these issues and develop ethical guidelines and policies, so that we can maintain both scholarly freedom and scholar trust.
CHRISTINA DRUMMOND: I did want to quickly note that the data trust, as I mentioned before, is relying on multiple operational mechanisms. These are things we're piloting right now-- we're very much a research project looking at how you could do this. And these mechanisms would steward and govern ethical data use according to those community norms, which, I'll note, don't exist yet. But of all of these mechanisms, the technology is easy.
CHRISTINA DRUMMOND: It's not novel; we can leverage it from other industries. The contractual terms are core, but facilitating legal interoperability needs to be streamlined, and this is the more challenging stuff. There isn't necessarily standard contractual language out there to govern usage data-- to govern not only the data transfer but the handling, what we need to mask in terms of personal information, and any downstream use of sensitive information.
CHRISTINA DRUMMOND: And I will note that for our project we have our data management plan. In accordance with that, we're not collecting any personal information, and we're actually removing the last digits of IP addresses to ensure that PII is removed. But this gets complicated when we start linking things with ORCID-- which we haven't done yet-- which reintroduces questions of GDPR.
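To make that truncation step concrete, here is a minimal sketch in Python of the kind of IP coarsening Christina describes-- dropping the host portion of an address before usage events are stored. The function name and the specific prefix lengths (/24 for IPv4, /48 for IPv6) are illustrative assumptions, not details taken from the project's actual data management plan.

```python
import ipaddress

def truncate_ip(raw_ip: str) -> str:
    """Coarsen an IP address so it no longer points at a single host."""
    ip = ipaddress.ip_address(raw_ip)
    if ip.version == 4:
        # Keep the /24 network and drop the host octet: 192.0.2.57 -> 192.0.2.0
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
    else:
        # For IPv6, keep only a /48 routing prefix (an assumed choice).
        net = ipaddress.ip_network(f"{ip}/48", strict=False)
    return str(net.network_address)

# Usage events retain enough geography for trend reporting, but not the host.
print(truncate_ip("192.0.2.57"))                    # 192.0.2.0
print(truncate_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8:85a3::
```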
CHRISTINA DRUMMOND: And so I will note, if parties are interested in that contract language, please let me know, because I'm actively bringing folks together at the Research Data Alliance to potentially build a working group to explore some of that language-- to see if we can come up with some model language around the transfer and use of usage data.
CHRISTINA DRUMMOND: So with that, I want to just underscore how so much of this really does ride on ethical guidelines. You can't build those contracts unless you know how you want the data to be used or where those guardrails really are. And so this is something that needs to be done. And this is another area where we need to build that out as a scholarly communications community.
CHRISTINA DRUMMOND: And this conversation, as Lisa mentioned at the beginning, is one step on that path. I wanted to end my talk with one more slide, which points to another resource from the Open Data Institute. This one is exceptional as folks are thinking about ethics. It's called Designing Trustworthy Institutions-- that's the report.
CHRISTINA DRUMMOND: The Designing Trustworthy Institutions report highlights a number of expectations that data providers, data users, and the subjects of that data-- in this case, scholars and authors-- hold in order to retain trust in an organization. And you'll note here the data providers: those who share data with an institution such as our data trust, which in our case would be publishers or aggregator services.
CHRISTINA DRUMMOND: They need to have faith that the processing is according to agreed-upon terms, that it creates value and impact, and that their participation in a trust will not harm their reputation-- so again, going back to brand and reputation. The data users-- those who want access to all of this normalized and processed data: the decision makers, the libraries and publishers, and again the platforms and services, even downstream authors or research offices-- need to trust that the data access is fair and equitable, that the data is timely and accurate, that limitations of the data are transparent, and that the data is appropriate for the intended use.
CHRISTINA DRUMMOND: We don't want overreach, if you will. And so this is where the transparency of the underlying processing algorithms is so vital-- not only to these ethical conversations, but to understanding whether things are actually working as the community expects, according to those norms. I'll finally note, in this last column you see here, that it's really important to remember that those who are impacted by the data use-- in our case scholars, publishers, and publishing platforms-- need to trust that the data sharing and usage is beneficial and will not cause them harm, that the data sharing is tied to a mission, and that they, the subjects of the data, have the ability to engage with the staff of the data institution in the middle-- in our case, the trust-- and that it will be responsive to feedback.
CHRISTINA DRUMMOND: And so I will note that we are in the middle of trying to figure out how to do this. And all of these aspirations you see here from the ODI are really a future goal. And I look forward to that future where we have this figured out, but that day is not today. And we have a lot of work to do to develop the standards and community norms that would make this possible.
CHRISTINA DRUMMOND: Which leads me to yet another ask. As Lisa noted, we are trying to tee things up so that in the coming year or two, we can actually progress on principles. Are there ethical guidelines that need to be in place for scholar-specific analytics? What are those? So if that's something of interest, keep that in mind and be in touch with us.
CHRISTINA DRUMMOND: So with that, I'm going to transition us to moderated Q&A. So Lisa?
LISA JANICKE HINCHLIFFE: Great, and I think Jackie will pin us into a gallery here and spotlight the three of us. So I have had the pleasure of being over in our chat area as well, engaging in the dialogue there. And one of the things coming up there-- I'd invite maybe a distinction between information about, if you will, the authors of a work, where that authorship is made public in a publication, as it typically is.
LISA JANICKE HINCHLIFFE: And Joe, I think you said you are aggregating the data and not disclosing certain data points, because you're not really publishing the syllabi per se. So there's a difference between, if you will, author analytics-- this person published this piece and it had this many citations, all in the public-- versus reader analytics. And Christina, I think that's a question: why do we even care about reader analytics, in the OA world in particular?
CHRISTINA DRUMMOND: So I'm happy to take the first stab at that. One of the use cases that we've heard in our work about why people care about usage data-- and this came primarily from publishers, but also from libraries-- is how it can inform collections development and the creation of editions and materials, if you know where readers, as a general aggregate, are located. So it's not about knowing this specific person who lives in this specific place with these specific demographics-- it's not about the individual reader.
CHRISTINA DRUMMOND: It really is about the trends. And so with our project, one of the things we are looking at is which of those trends are important. Is it by discipline? Is it by region? How specific do we have to be by region? And how aggregate do we need to be so that we can protect privacy? And those are the kinds of conversations we're having.
LISA JANICKE HINCHLIFFE: And I suppose also how detailed you have to collect in order to do the kind of analysis that you might share out in the aggregate.
CHRISTINA DRUMMOND: I'll note that I think the two areas for us that are perhaps the trickiest are IP addresses and ORCID iDs. If you collect IP addresses, of course, you have very specific data. So we actually remove those-- that's a step we take to cleanse the data, if you will, to make sure it's not identifying. The other piece, and I think this is really tricky, is ORCID iDs.
CHRISTINA DRUMMOND: They are unique identifiers of individuals. And even if an individual scholar makes their ORCID public, that doesn't necessarily mean they are authorizing the downstream aggregation, reporting, and profiling of them. Unintended consequences, perhaps, but I think we have to think through what it means when we use an ORCID.
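One hypothetical mitigation for the ORCID concern raised here-- a sketch, not something the panel prescribes-- is for the trusted intermediary to replace raw ORCID iDs with a keyed hash before records are linked, so analysts can count distinct authors without being handed an identifier they could profile downstream. The key, the function name, and the sample iD below are all illustrative:

```python
import hashlib
import hmac

# Illustrative placeholder: in practice this key would be managed and rotated
# by the trusted intermediary and never shared with data consumers.
SECRET_KEY = b"held-by-the-data-trust"

def pseudonymize_orcid(orcid: str) -> str:
    """Map an ORCID iD to a stable pseudonym via HMAC-SHA256."""
    return hmac.new(SECRET_KEY, orcid.encode(), hashlib.sha256).hexdigest()[:16]

# The same iD always yields the same pseudonym, so usage can be aggregated
# per author without exposing the ORCID iD itself.
print(pseudonymize_orcid("0000-0002-1825-0097"))
```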
LISA JANICKE HINCHLIFFE: And so Joe, you're actually making a decision when you share the information out. You're sharing out analysis that you're doing on the syllabi, if I understand correctly-- you're not literally saying, here are Lisa Hinchliffe's three syllabi if you want to look at her work. Can you talk about the decisions you've made around that? Publishers in the chat, by the way, are like, well, wait.
LISA JANICKE HINCHLIFFE: I might like to know who is using my materials, the specific faculty member and institution. Can't you tell me that, Joe?
JOE KARAGANIS: So in the first instance, we decided that for public-facing services-- everything we do so far, the Syllabus Explorer, the Galaxy-- you can get pretty far down the tree to what's happening within a department. But we cut off the aggregation at basically large departments. So if we've accumulated more than 250 syllabi from a department, you can drill down into that list.
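Joe's two suppression rules-- a minimum department size and a country blacklist-- can be pictured as a small gate in front of any public ranking. A minimal sketch, under stated assumptions: the 250 threshold is Joe's stated figure, while the country codes, field names, and function name are illustrative, not Open Syllabus internals.

```python
from collections import Counter
from typing import Optional

MIN_SYLLABI = 250                 # Joe's stated cutoff for drill-down
COUNTRY_BLACKLIST = {"TR", "HU"}  # illustrative ISO codes, not the real list

def department_rankings(syllabi: list[dict], country: str) -> Optional[list[tuple[str, int]]]:
    """Return a department's most-assigned titles, or None if unsafe to publish."""
    if country in COUNTRY_BLACKLIST:
        return None  # at-risk jurisdiction: show no data at all
    if len(syllabi) < MIN_SYLLABI:
        return None  # too few records: individuals could be singled out
    counts = Counter(title for s in syllabi for title in s["assigned_titles"])
    return counts.most_common(50)
```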
JOE KARAGANIS: We don't allow drill-down into individual syllabi, although we are building an open access collection-- the product of faculty donations-- that we will display in the next version of the tool. And we are preparing a publisher version of the tool that does allow drill-down into something more like a fleshed-out catalog description of the class. So that begins to provide some disclosure.
JOE KARAGANIS: It moves a step closer to the thing itself, but it's not going to permit re-identification of faculty. And that's going to be a private product that will be limited to academic and publisher uses. So that's how we're trying to navigate this. That's a step further than we've taken to date. But some of it is a function of having learned from the reception and the uses so far of the data we have presented, and feeling more comfortable with taking that step than we would have three or four years ago.
JOE KARAGANIS: Just because some of the scare scenarios that we were worried about just haven't emerged. Use of the tool to identify and harass faculty hasn't emerged. We have been picked up in generalized attacks on the university system, because you can look at the top rankings of assigned texts and you see that Marx is up there. So we do feed into that mostly right-wing commentary on the university.
JOE KARAGANIS: But generally speaking, we've received very little feedback-- very little negative feedback from faculty and no documented cases where the data, at the level we've made public, has been misused. [INTERPOSING VOICES]
JOE KARAGANIS: Just one other point. We do make the full data set-- anonymized, but textually complete-- available to academics for research purposes. Part of what we mean by being the Open Syllabus Project is that we contribute to, or try to serve, the research needs of the community that's contributing the raw materials. So that's been important to us. And that raises a whole other set of questions we probably don't have time for.
JOE KARAGANIS: So that's a short answer to your question.
LISA JANICKE HINCHLIFFE: Yeah, and what I hear in what you're saying is that you've articulated your own, I guess I would say, operating principles: we had a concern about this, so we put out what we could and held back the rest. And then we said, OK, that didn't happen, so we feel comfortable going a little further. And I'm inferring from that, should you suddenly discover that the point to which you've gone is causing harm, you would re-evaluate that decision.
LISA JANICKE HINCHLIFFE: It might be hard to put the genie back in the bottle, as they say, but you would at least-- you'd need to revisit that decision.
JOE KARAGANIS: Yeah, I think we're certainly prepared to do that. And it comes up for me all the time around questions of academic freedom in countries whose political situations are unstable. I'm genuinely torn about whether we should add Poland to our blacklist, or whether we should add India. There are faculty under threat in both places, and, I think, systematic threats to academic freedom. It's not clear to me where the threshold is for thinking about curricular choices in that context-- whether we should pull the plug on India because of threats that are operative at other levels.
JOE KARAGANIS: This is the day-to-day one.
LISA JANICKE HINCHLIFFE: But I think what strikes me is, if this was easy, we wouldn't have a panel. So there's a really good discussion going on in the chat-- primarily around textbooks, actually. But Christina, I'm thinking about open access books, and we're also looking at how these things get funded. My understanding is funders would like to know, essentially, the return on their investment-- whether it's a library or a philanthropy or, I mean, even a non-profit that puts out its own materials, they're asking those questions.
LISA JANICKE HINCHLIFFE: Are we reaching the audiences that we want to reach? And so I'm wondering, are you starting to see people say, OK, we want to know-- but also point to the importance of this data trust, where, to provide that protection, a third party serves as a fail-safe? Or what's the ethical framing we can put around that?
CHRISTINA DRUMMOND: So yes, we're seeing those use cases, and they really vary by the different stakeholders. And the challenge is that there has to be some kind of ethical review process, and really principle development, to guide those conversations. Because what a trust, or any data space, really exists to facilitate is the stewardship of the data according to those principles. And the challenge is that right now we don't really have those documents for how these analytics should be used.
CHRISTINA DRUMMOND: I think Joe has done a great job explaining it. It's a little bit of, hey, we're doing our best; we'll see what works. And to be fair-- I noted this in the chat-- that's what a lot of companies are doing right now with respect to data analytics. I was part of a research team that surveyed corporate America to ask, OK, how are you handling these issues?
CHRISTINA DRUMMOND: Sometimes it's ethical review boards. Sometimes it's by gut. Sometimes it's, hey, we're going to do our best and watch for problems to crop up. There are a lot of different approaches right now. But I think as an industry you can always say, here's what we want as our ideal. Right now the trust doesn't have that, because it's really not for the trust to define.
CHRISTINA DRUMMOND: It's for the community to define. And so I think we have to go back to the scholars and each of these parties to have this question answered.
LISA JANICKE HINCHLIFFE: So believe it or not-- and I guess we can believe it, because we're familiar with our challenges here-- we are reaching close to the end. And we had one last brainstorm for people to do. So Christina, a request was made: maybe you could actually share the screen with the GroupMap, for people who for some reason-- perhaps they're on their phone-- are unable to get to it and participate in the same way.
LISA JANICKE HINCHLIFFE: While she is doing that, I will go ahead and thank everyone for participating. We will go through the chat and see if we can respond to any additional questions that we weren't able to get to here. We will also put up the slides. And as I promised in the chat-- Christina, you've probably seen it-- there's some desire, if we can figure out a way, to export the results of the brainstorming so that people can have that as a record, since I do know there's a limited number of logins that we can actually have on this GroupMap space.
LISA JANICKE HINCHLIFFE: So: ideas needed, resources, relationships, standards. And with that, I am thrilled that we were able to have this conversation at SSP. I think the intersection with the community that SSP represents is so critical for these issues. And I can't wait to hear more about what people might like. Please feel free to get in touch with me, Christina, or Joe at any point.
LISA JANICKE HINCHLIFFE: And we will be happy to continue this conversation-- Christina especially; she's a very inclusive person. There are a lot of opportunities to participate in the work that she's leading, which I think is going to become even more important as we move to different business models, particularly with open access publishing and open content. So with that, we will call this session to a close right here at the top of the hour.
LISA JANICKE HINCHLIFFE: Thank you so much, as well, to Jackie from SSP, who has been our technical support for this session. So much happens behind the scenes at a conference, and I think there deserves to be recognition of those people. Oh, she's even maybe going to put herself on video. She is. Thank you, Jackie, so much. And thank you to everyone who was able to join us today. Please have a great rest of your day.
LISA JANICKE HINCHLIFFE: And don't forget to join the rest of the conference.