Name:
NISO Two-Part Webinar, Discovery and Online Search, Part Two: Personalized Content, Personal Data
Description:
NISO Two-Part Webinar, Discovery and Online Search, Part Two: Personalized Content, Personal Data
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ee4daff5-155c-4ab2-9d84-b246fac415f4/videoscrubberimages/Scrubber_3.jpg
Duration:
T01H35M21S
Embed URL:
https://stream.cadmore.media/player/ee4daff5-155c-4ab2-9d84-b246fac415f4
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ee4daff5-155c-4ab2-9d84-b246fac415f4/NISO Two-Part Webinar%2c Discovery and Online Search%2c Part Two.mp4?sv=2019-02-02&sr=c&sig=4m0E7m%2Fbd4m5cGnQfrPusq1DulJbjXRDr3t%2FsRhk7ck%3D&st=2025-01-15T05%3A24%3A27Z&se=2025-01-15T07%3A29%3A27Z&sp=r
Upload Date:
2023-08-09T00:00:00.0000000
Transcript:
Language: EN.
All right. Good afternoon and welcome to the NISO monthly webinar for June 19th, 2019. My name is Todd Carpenter. I'm NISO's executive director and I'll be the moderator for today's program. The session today is part two of this month's two-part program focused on discovery and online search, which will address personalized content and personal data.
If you weren't able to attend last week's session on drivers of change in online search, it was a great conversation and I encourage you all to listen to that recording, which is available to all members of NISO as well as registrants to this two-part series. So you should see on your screen a welcome screen, and you should be hearing my voice either through the audio voice-over-IP network or, alternatively, dialed in using the dial-in instructions that were sent out.
If you need technical assistance at any time, you can get that by contacting Zoom, our service provider, using the live chat at support.zoom.us. You'll need today's webinar ID, which is 680971180. We will post that into the chat so that you have access to it.
Some frequently asked questions. We will be making the slides for all of the speakers' presentations today available; we'll be posting those on the website for this event. We are also recording it, so that you can view and listen at any later time. It usually takes us about 12 to 24 hours to get those files back from our service provider, and we will email you all a link with information on how to access that recording.
So we're coming to the end of our summer programs. We take a bit of a hiatus in July and go on a little fishing trip, so to speak. But before we do, we will be at the American Library Association meeting later this week, starting on Friday. Friday afternoon, from 12 noon to 4 p.m., we will be hosting the 12th annual NISO/BISG Forum on the Changing Standards Landscape. This is a free event at the convention center, room 148.
On Saturday, we will be hosting our NISO annual members meeting and standards update at the Marriott Marquis from 1:00 to 2:30 p.m., and then finally on Sunday at 2:30, we will be talking about the RA21 project and the Coalition for SeamlessAccess. That will be in room 204 at the convention center. So if you need more information about all of the NISO events and standards activities taking place at ALA, you can click on the link at the homepage of the website.
More information about the fall programs for NISO events, which will kick off in August, is available under the Events tab on the website. Before we begin, I'd like to just draw your attention to a new project that's being spun up within NISO. We are launching a revision process for a fairly old standard, NISO Z39.4, which was actually withdrawn in 1997.
But we're spinning up a new group to reprise that standard. That standard focuses on indexes and the creation of indexes. As you can imagine, since it was withdrawn in 1997, there was a technical report that replaced it. But that report is somewhat out of date, since it doesn't address, as one might imagine, electronic publishing and all of the things that have happened over the last 22 years in digital publishing. If you'd like more information about this project, you can get it by clicking on that link.
I know you can't do it during the Zoom session, but you can get it with the slides, or the link is on the NISO website about a new NISO project to bring these indexing standards up to date. We are looking for volunteers to participate in that working group. So if you're interested, please do let our director of programs, Nettie Lagace, know.
So we've become accustomed to personalization in nearly every aspect of our daily lives. Long gone are the days when Henry Ford famously said any customer can have a car painted in any color he wants, so long as it's black. Just this morning I was in a cafe listening to the various ways people wanted their morning caffeine fix. I personally have purchased bespoke running sneakers.
We all like to "have it our way," to quote the 40-year-old Burger King advertising slogan. And we've come to expect a world that will provide us those options. And in many ways, the world has complied with our desires. And when we come across a situation when it doesn't, we often get aggravated and move along to another service that does.
And much of today's digital experience is customized to match our preferences. Your purchase or browsing history or what a system might know about your previous activities, your news feeds, the movies offered up on your Netflix or Amazon accounts, the songs you listen to on various streaming services, the advertisements you see on most websites and the topic of today's session.
Your search results are all driven by personalization algorithms. Google began personalization of its search results, based on a user's prior search behavior with information stored in browser-based cookies, way back in 2004. Using these data, Google could then provide more relevant search results than simply relying on the relevance ranking of a generalized search term.
With an ever-increasing pool of data on users' behavior, Google has become exceptionally good at providing search results, and it is providing these results now in a variety of ways through a variety of devices, not simply just words typed into a search box. The library world, with its long-standing and deeply embedded ethics of privacy and intellectual freedom, has been reluctant to dive deeply into these services in many respects.
Libraries are one of the few remaining places online that have stood firmly against the automatic inclusion of many of these personalization services. Now, patrons, of course, may choose to go that extra step and register for personalized services, but on the whole, libraries have limited personalization features by default, out of deference to patron privacy. Now, one might ask to what extent libraries have hamstrung their own services by limiting personalization or contextual information about the patron.
Are there ways in which we can navigate this world where users come to expect personalization and yet libraries are adhering to their ideals of privacy and anonymous online activity? Can we, as a community, provide that personalization in a way that protects users' privacy and shields their information discovery behavior from monitoring? These are some of the interwoven threads that should lead us to a very fascinating conversation today.
So kicking us off on these topics is Bob Kasenchak, who is a taxonomist and director of business development at Access Innovations. Bob is going to be talking about semantic search and context and how those topics apply to discovery services. So, Bob, I am going to stop sharing my screen and then pass the presentation over to you.
And now you can see your slides; you're all set to go. Although, no. There we go. How about that? Is that better? OK, I can hear you now, but you need to go into presentation mode.
I am. OK, is that working? There you go. That's perfect. And how about that? Are you still seeing just the slides? Just the slides. Fantastic. Thank you so much, Todd. And thanks for having me.
Good morning or afternoon, everyone. My name is Bob Kasenchak and I'm a taxonomist and director of business development at Access Innovations, here in sunny Albuquerque, New Mexico. Today, I'm going to talk in an introductory way about semantic search: why it's important, ways to implement it, and some other related topics. Some of them are based on personalization and some of them are based on other things.
This talk is a version of a talk that I gave at NFAIS, which merged with NISO this year, and they asked me to reprise a version of it here today. At NFAIS it made a lot of sense that my talk was grounded in the context of scholarly publishing. Perhaps it makes oblique sense here, but some of my examples are Google and other search engine based, and some are very specific to scholarly publishing.
So the goal of my talk is to outline the topic and introduce the concepts around semantic search, and I'll do a little framing of it in the context of searching in very large repositories of specialized content. So when I started researching this topic, it seems that the term semantic search started bubbling up in the academic published journal literature sometime in 2002.
But it's only in the last couple of years that it's really come into the more popular information services consciousness, and there's a lot of confusion about it because lots of people say semantic search when they mean very different things. So here's a brief outline of my talk. My goal is that by the end of the talk, you have some idea of what semantic search is, or at least about some of the sundry approaches that people mean when they say the string semantic search.
There should be some time at the end of the block for questions and discussion. So, ironically, since we're talking about semantics and I'm a taxonomist: people use the term semantic search to refer to a variety of things, but they all have something in common, which is trying to extend or amplify or improve search relevance beyond matching keyword strings, using some method or methods to determine the context of the search.
This takes a number of forms, about which more shortly. Google has popularized the tagline "things, not strings" to explain its semantic search, which involves a knowledge graph, about which more later, and some other stuff. But first, I think it will be helpful to investigate the problem that we're trying to solve. So what problem are we trying to solve? What's the problem with regular or basic standard search? And again, some of my examples come from the world of scholarly publishing.
But this applies to any specialized repository of content. Most of the time, search is limited to whatever default search is available on the platform or engine used by the publisher or library or repository. There are almost always options to tune the search behind the scenes, and they're almost never used. Further, many of these platforms have limitations. Most are based on the assumption of a regular string based search.
When I say regular or basic search, what I mean is that you enter a keyword or a keyword string in a box, and the search application using an inverted index tries to match that string with documents that contain that word string. Sometimes fuzzy matching is used to catch misspellings and whatnot. But in essence, what regular search is doing is literally looking for words in documents.
That is to say, it's matching text strings. The problem for most owners of specialized content repositories is that in very, very large specialized content sets, basic search fails because, one, it's simply looking for text strings and does not have the detailed kinds of indexing that Google does on a constant basis; two, because language is ambiguous; and three, because specialized repositories tend to have very, very large content sets with extraordinarily detailed and specialized vocabularies used in the content that change over time.
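To make the distinction concrete, here is a minimal, purely illustrative sketch of the kind of literal inverted-index lookup that basic search performs; the documents are invented and no real engine is quite this naive.

```python
from collections import defaultdict

# Toy corpus; note "horses" and "red-horse" are different strings.
docs = {
    1: "equine nutrition in horses",
    2: "the horse in medieval agriculture",
    3: "salmon genetics by red-horse",
}

# Build an inverted index: token -> set of document IDs.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def basic_search(query):
    # Literal matching: every query token must appear verbatim.
    results = None
    for token in query.lower().split():
        hits = index.get(token, set())
        results = hits if results is None else results & hits
    return results or set()

print(basic_search("horse"))   # {2}: only the exact string "horse"
print(basic_search("horses"))  # {1}: a different string, a different set
```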
In essence, simple search just looks for the words that you put in the query. Here's an example from Google Scholar. I have searched for the word horse. Two things are noteworthy here. One, I got over 3 million results. Now, it's very commonly known that most people never get past page two of the search results and even more people never get past the first half of the first page.
So it's extremely important that relevant results are included in the top set of search ranking results. So, number two, the logic of the algorithm here must prioritize the author name, because as you can see, I hope, in the red boxes, the first paper listed is not about horse or horses. The author's name is Red hyphen Horse. This is, in my analysis as an information professional, not optimal.
Why not have a place to search for author instead of including author names in the keyword search box, the universal box at the top of the application, or omit author names from the search and provide a separate way to do it? So the first results that you see here are not relevant to horse. Even more terrifying, without using any fancy synonyms or semantic trickery:
If I search instead for the string horses, I get 1.7 million results, which is slightly over half of what I got for the string horse. This seems to be triggered by the form of the word in the title. In other words, Google Scholar doesn't recognize simple English plurals as the same string. There is zero natural language processing going on; it does not normalize or stem anything.
It literally searches for the word horse or horses, depending on what you put in the box. So it's literally, and in my analysis, merely looking for the instances of the text string that you typed in the box. So the search simply tries to match the word or words in the box in some place in a very large set of documents with some priority seemingly given to which field, title or author or abstract in which the word appears.
Google search, which is different than Google Scholar, seems to have figured this out long ago, but for some reason the Google Scholar search has not. So some simple natural language processing, or dare I suggest taxonomy, would go a long way here, such as recognizing plurals and other types of synonyms instead of just literally the text string words that are typed into the box.
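As a rough illustration of what even minimal normalization buys you, here is a toy plural-folding sketch; a real system would use a proper stemmer such as Porter or Snowball rather than this one-rule version.

```python
# Naive plural folding applied at both index time and query time,
# so "horse" and "horses" land on the same index entry. Real
# stemmers (Porter, Snowball) handle far more morphology than this.
def normalize(token):
    t = token.lower()
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]  # "horses" -> "horse"
    return t

def index_tokens(text):
    # Run the same normalization over document text before indexing.
    return [normalize(tok) for tok in text.split()]

print(normalize("horses") == normalize("horse"))  # True: one entry now
```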
Here's another example, this time from a specialized repository from a mid-sized scholarly publisher. The same concept can be expressed in multiple ways besides morphological variants like plurals and adjectives. The problem I'd like to highlight here is acronyms. While acronyms can have a great number of meanings, within some specialized fields acronyms are so ubiquitous that they should be accounted for in search.
So this publisher is an aeronautics and astronautics sort of publisher, so things about flight and flight vehicles. I searched for unmanned aerial vehicles, which we call drones, and I got 170 search results. The ubiquitous acronym for this concept, UAV, actually returns three to four times as many results. So those result sets are separate. And if you were a researcher doing a survey of the literature, you would have to know, first of all, that this acronym is ubiquitous, and you would also have to think about searching on both the text and the acronym to come up with a complete search result set.
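One common fix is a synonym ring that folds acronyms and lexical variants onto a preferred term before the query runs; the sketch below is a made-up, minimal version of that idea.

```python
# A tiny synonym ring: every lexical variant maps onto one preferred
# term, so any of them retrieves the merged result set. The entries
# here are invented for illustration.
SYNONYM_RING = {
    "uav": "unmanned aerial vehicle",
    "uavs": "unmanned aerial vehicle",
    "drone": "unmanned aerial vehicle",
    "drones": "unmanned aerial vehicle",
    "unmanned aerial vehicles": "unmanned aerial vehicle",
}

def expand_query(query):
    q = query.lower().strip()
    preferred = SYNONYM_RING.get(q, q)
    variants = {k for k, v in SYNONYM_RING.items() if v == preferred}
    return variants | {preferred}  # search on all of these at once

print(expand_query("UAV"))  # the acronym now covers every variant
```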
I would like to note that since I took these screenshots, this publisher has been undertaking steps to improve their search and website, but that's just an aside. So it's obvious from the results listing of these respective queries that the way the concept is expressed in the title and in the search string has a major bearing on the content returned. So let me say that one more time: the way the author chooses to represent the main topic of an article in the title has a major bearing on the search results.
This, again, seems to me to be not optimal. So why does search fail? Search fails because simple string matching is not adequate for large specialized repositories of content, especially those with technical or specialized language that evolves over time. Also, as I always like to say, language is ambiguous. So the basic idea is that semantic search goes beyond simple keyword string matching to try and provide some context around the search to drive relevant results to the user.
This takes a variety of forms, the common denominator of which is that semantic search does not merely try to find keywords, but examines the semantic context of the search query to drive relevance. This can include synonyms and lexical variants, natural language processing and things that Todd was describing, like the location of the user or previous searches by the user. It can also involve graph databases, ontologies to drive results for related concepts.
And a host of other methods. Some of these are quite simple and simple to implement. And some are quite complex and require a lot of algorithms. So I'm going to go through some examples and try and give a survey of the various kinds of things that are going on in semantic or contextual search. So one basic kind of semantic search is designed to help people find what they're looking for without knowing the exact or specific language or terminology used in the content.
The idea is to allow the search engine to match near matches instead of just exact strings. We call this fuzzy matching and other similar terms like that. You hear fuzzy matching a lot. This can cause noise because it might match things that look similar but actually are not. So use caution. But the upside is that someone might not know how to spell gastrointestinal stromal neoplasms, but if they can get close, they'll still get a match.
You see this a lot in Google results. Levenshtein distance is a term you'll hear thrown around in fuzzy matching, and Levenshtein distance is a way to measure the similarity of words. Essentially, if I have two words, let's say Rob and Bob, and I change one letter to transform one into the other, that's a very close Levenshtein distance match, while Bob and antidisestablishmentarianism are far, far, far apart.
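For the curious, here is the textbook dynamic-programming version of Levenshtein distance; production fuzzy matchers typically use optimized or approximate variants, but the idea is the same.

```python
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions to turn a into b.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("rob", "bob"))  # 1: one substitution apart
print(levenshtein("bob", "antidisestablishmentarianism"))  # far apart
```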
So it's a way to measure the distance between word A and word B, and it's used in some kinds of fuzzy matching algorithms to drive results instead of exact string matches. Another way to drive contextual search is to parse queries. So if I go to Google and I Google Harrison Ford, I get a search result. And on the left-hand side you see the Wikipedia article, the IMDb page and some news stories.
And on the right are the results from the Google Knowledge Graph. But oftentimes when we're searching for things, we type in a natural language question and not just a keyword. So if I type when is Harrison Ford's birthday, Google parses this query. It is not looking for the text string, quote, when is Harrison Ford's birthday, close quote.
In documents. What it does is say, oh, "when is" a type of query. So I'm going to push that over to the side here and save it and eliminate it from the search string. Harrison Ford, it recognizes that as a string. And it makes a Boolean AND between Harrison Ford and birthday. So the way Google reads this query is something very much like, and I'm not an algorithm expert, but something like: when is, in brackets, Harrison Ford AND birthday.
And lo and behold, the first result on the left comes up with the exact information I was looking for. Even though it's not looking for the text string I searched for in quotes in the Google box, it parsed the query to give me the information that I need. Then you'll see over on the side we're still getting results from the Google Knowledge Graph, which I'll talk a little bit more about in a second. So the query is parsed, and then it references the graph to derive relevant results that are not based on a literal text string search.
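A toy version of that query-parsing step might look like the sketch below; real engines use trained language models rather than regexes, so treat this only as a mirror of the idea Bob describes, with made-up patterns.

```python
import re

# Peel the question pattern off the front of the query, classify it,
# and AND together whatever terms remain.
QUESTION_PATTERNS = [
    (r"^when (is|was)\b", "DATE_QUERY"),
    (r"^where (is|was)\b", "PLACE_QUERY"),
    (r"^who (is|was)\b", "PERSON_QUERY"),
]

def parse_query(raw):
    q = raw.lower().strip().rstrip("?")
    query_type = "KEYWORD_QUERY"
    for pattern, qtype in QUESTION_PATTERNS:
        if re.match(pattern, q):
            query_type = qtype
            q = re.sub(pattern, "", q).strip()
            break
    terms = q.split()
    return query_type, " AND ".join(terms)  # implicit Boolean AND

print(parse_query("When is Harrison Ford's birthday?"))
# -> ('DATE_QUERY', "harrison AND ford's AND birthday")
```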
Another similar set of methods, which goes back to personalization, are in a category called contextual search. This moniker applies to a variety of techniques to use information gathered about the user or location or recent searches and other stored information, whether that's cookies or your IP address. So some applications, notably Google, but other map based applications use your location to derive relevant results.
This is usually done either using your IP address, which reveals your approximate location, or the GPS data from your phone or some other device. So if I Google pizza, the first results aren't the Wikipedia page on pizza and definitions of pizza. Rather, I get suggestions for pizza restaurants near me, including a map. In fact, the entire first page of Google results on the left, which I was not able to screenshot in full for display, comprised pizza restaurants in my area.
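A simple way to picture location-aware ranking is blending textual relevance with proximity to the user's inferred location; the sketch below uses invented restaurants, coordinates, and weights.

```python
import math

# Rough equirectangular distance; fine at city scale.
def distance_km(a, b):
    (lat1, lon1), (lat2, lon2) = a, b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y) * 6371

def rank(results, user_loc):
    # Blend relevance with proximity; the 0.05-per-km penalty is arbitrary.
    return sorted(results,
                  key=lambda r: r["relevance"] - 0.05 * distance_km(r["loc"], user_loc),
                  reverse=True)

results = [
    {"name": "Famous NYC Pizzeria", "relevance": 0.9, "loc": (40.71, -74.01)},
    {"name": "Neighborhood Pizzeria", "relevance": 0.6, "loc": (35.08, -106.65)},
]
user = (35.09, -106.61)  # an Albuquerque user, say
print([r["name"] for r in rank(results, user)])  # the nearby shop wins
```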
Now note that on the sidebar on the right-hand side, the Google Knowledge Graph is delivering me some basic information on pizza, including a link to the Wikipedia page and, which I find very interesting, pizza stocks in which I might like to invest, which I was not expecting. Also from Google is something that I call user activity, which is based on stored cookies and other things, including your most recent other searches.
And there are a few caveats here to be made; some companies promote their content to be raised to the top when certain search strings are triggered. But I was recently looking for cars, so if I Google Jaguar, am I going to get cats or cars? Well, I get cars. I can't quite say why, but I assume it's because I was browsing a car site shortly before I did this query.
Someone who regularly searches for animals might get results for large cats instead of cars at the top. And that's based on your most recent searches. In fact, if you Google Harrison Ford and then your next search is when is his birthday, and you don't even put the string Harrison Ford in, chances are very high that it will return the same results that I showed on the previous slide.
So this is something that can be useful for publishers and libraries and other societies with repositories, if and only if your users log in when they come to search your site, or they stay persistently logged in because the browser remembers them or whatever. So if you're a cancer research organization and your members have some kind of indication about whether they're doctors or researchers or patients or pharma reps or students, they can get relevant content delivered based on their member profile.
Now, as Todd mentioned, this has security and privacy implications, so those things are best taken care of behind firewalls. And naturally there's some work to do to set it up, but it can be quite powerful. Incidentally, I also Googled Jaguars, plural, expecting to get the cats, and instead I got the football team, probably because I regularly read NFL content. I would like to note that I use the same Google profile across my devices, so this does not mean I'm reading NFL news at work.
Now I'd like to talk a little bit about the Google Knowledge Graph, which people are becoming more and more familiar with. They rolled it out quite a while ago, but that right-hand box with the information is becoming ubiquitous. I'll show you a screen grab in a second. Briefly, a query can strike some node in the graph, which is, and we can quibble about terminology, but essentially a large ontological structure.
In addition to providing search results of the web pages, a sidebar appears with other information related to the search. This works particularly well with entities and less well with concepts. But here's a good example. If I search for Empire State building, predictably I get the website for the Empire State Building. That's great.
I get its Twitter feed, the Wikipedia page and so forth. But over on the right-hand side is a bunch of information that's parsed from the Google Knowledge Graph. So I'm going to zoom in and take a closer look at what those results look like. Here I've cut and pasted the right-hand vertical bar so you can see the whole thing; I grabbed the stuff from further down the screen. So we get some pictures of the Empire State Building with a link to more pictures and links to their website.
Google Maps for directions, addresses, statistics about the building, questions and answers, reviews, popular times people go to the Empire State Building, stuff about the movie of the same name, links to social media and links to what other people search for, which is all pretty cool and a much richer experience than just the web page results, and it's pretty intuitively presented.
So how does this work? Now, this is a mock-up, obviously, because I don't have access to the whole thing, but it works something like this. There's a ginormous ontological structure, specifically a knowledge graph. And if your query hits a node, which is the pink node at the beginning there, it returns a bunch of other information associated with that node using, essentially, linked data and the semantic web.
I'm pointing at my screen as if you could see me; I always do this. Once you hit the node, it pulls out a bunch of closely related stuff and delivers it to you in the knowledge graph box on the right-hand side. So this is a totally made-up example, but it's a very plausible representation of the information that we saw displayed on the last screen. It's a lot of work to build these kinds of knowledge graphs, but they are extraordinarily powerful and I think the industry is moving in this direction.
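In the same spirit as Bob's made-up diagram, here is a minimal sketch of that lookup: a query that hits a node returns the node's typed edges so a sidebar can be assembled. All nodes, edges, and facts below are invented for illustration.

```python
# A toy knowledge graph: nodes with typed edges to related entities.
GRAPH = {
    "Empire State Building": {
        "type": "Building",
        "located_in": "New York City",
        "floors": 102,
        "depicted_in": ["Empire State (film)"],
    },
    "New York City": {"type": "City"},
}

def sidebar_for(query):
    node = GRAPH.get(query)
    if node is None:
        return None  # no node hit: fall back to plain web results
    facts = {k: v for k, v in node.items() if k != "type"}
    return {"entity": query, "entity_type": node["type"], "facts": facts}

print(sidebar_for("Empire State Building"))
```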
Now, as a taxonomist, obviously, I think that a good way to approach semantic search is to basically try to control the semantics. Taxonomies can help in a number of ways: tuning the search to prioritize tags over free text, if you have your documents indexed; allowing users to browse a taxonomy of subjects; using taxonomy terms to drive typeahead or did-you-mean style redirection; and using synonymy to drive the same relevant results for a number of string inputs.
This allows for improvements in search and improvements in interfaces, which is how users interact with the data. So the irony of document categorization, I like to say, is that we're not interested in the words, we're interested in the concepts. But the only window into the concepts we have are the words in the documents, which is why we use subject metadata to describe things using a taxonomy or other controlled vocabulary and make that metadata available to the search engine.
I realize this is not new news, of course, but it illustrates why subject categorization with a controlled vocabulary helps organize the data to make it more efficient for search. So all my previous examples showing failed searches using abbreviations, synonyms, acronyms, and other lexical variants can be solved with a robust, well-formed taxonomy and document tagging program. Obviously this is not new to the information industry, but in the context of a discussion about semantic search, I thought it merited some mention.
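To show what "prioritize tags over free text" might mean mechanically, here is a small sketch in which the query is mapped to a controlled-vocabulary concept and tagged documents rank ahead of plain free-text matches; the documents and vocabulary are invented.

```python
# Map query strings onto preferred concepts, then rank documents
# tagged with the concept above documents that merely contain the
# query string somewhere in their free text.
VOCAB = {"horse": "equidae", "horses": "equidae", "equine": "equidae"}

docs = [
    {"id": 1, "tags": {"equidae"}, "text": "nutrition studies in foals"},
    {"id": 2, "tags": set(), "text": "salmon genetics, by k. red-horse"},
]

def search(query):
    q = query.lower()
    concept = VOCAB.get(q, q)
    tag_hits = [d["id"] for d in docs if concept in d["tags"]]
    text_hits = [d["id"] for d in docs
                 if q in d["text"] and d["id"] not in tag_hits]
    return tag_hits + text_hits  # tagged matches outrank free text

print(search("horse"))  # [1, 2]: the tagged equine paper beats Red-Horse
```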
So what can good document categorization achieve and look like? As an example, I'm going to show you the PLOS ONE platform. Full disclosure: PLOS is a client of my shop, and we work with their taxonomy team. PLOS uses a very large thesaurus, about 10,000 preferred terms, if I recall correctly, and automatically tags each article with up to eight subject terms from their vocabulary for search.
They also expose the full hierarchy for browsing. Here's the browse interface at the top of the screen here. You can see, if you scroll over, you can keep going down further into the hierarchy, and you can see how many articles are attached to each term. And if you click on a term, it launches a search. And here's a screenshot of a sample article from PLOS that I clicked on when I was taking screenshots and doing the browse.
You can see in the lower right hand corner, those yellow bars, those are the indexing terms that have been applied. So that's exposing the subject metadata to the user. This is great. You don't have to guess how they phrase the term you're interested in. You can click the yellow bar to launch a new search and those little target looking buttons next to it, they actually crowdsource metadata.
So if you think a term has been applied in error, you can flag it, and it sends an email to their taxonomists. So these are pretty standard applications of metadata, but I think innovative ways to expose them to the user to make search, and contextual search about topics, easier to do. And the same principle applies to tagging content for machine learning. I'm running out of time and I just have a couple more slides.
So I'm going to go quickly through the end here. JSTOR has a robust taxonomy program behind the scenes, and they have a pretty cool thing called JSTOR Text Analyzer, where the document becomes the search query. I think semantic search is heading in this direction. So JSTOR Labs built this application, and I'll show you a screenshot in a second; it relies on a combination of taxonomy-based tagging and naive topic modeling to create a new search experience.
The idea is that any document that you can upload, or take a picture of with your phone, becomes the search string. So it OCRs it if it has to, analyzes it with naive topic modeling and a taxonomy, and then brings you up a results page of other documents that are similar to the paper that you're looking at or wrote or whatever. It's a really cool way to discover content, generate bibliographies and sets of research citations and other things like that.
And so it recommends content from the 7 million article JSTOR corpus, and then you can curate the results using the sliders and boxes that you see on the screen here to make it more accurate. I totally recommend you go play with JSTOR Text Analyzer, because it's a very cool beta project that takes a new perspective on search using traditional metadata as well as machine learning and taxonomies.
So, the last couple of slides. We've seen a bunch of things about semantic search, all the things that people are calling semantic search. Some of them are easy to implement and some of them take a little more work. So, practically speaking, how do we get this done? If you want to implement semantic search, where do you start? Any existing search platform, whatever the back end of it is, whether it's Solr or Elasticsearch or whatever, has options that you can tweak about which fields to prioritize, how to weight things, whether to look at metadata tags before free text.
You may need a developer to access and pull these levers, but they can be configured to take advantage of some of these built-in features. Oftentimes, almost always, this means things like including fuzzy matching or not, changing other settings like query parsing, putting automatic Booleans in the query strings, using dates to rank relevancy, and prioritizing certain fields or keywords or something like that.
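As one concrete example of those levers, here is what such a configuration might look like expressed in Elasticsearch's query DSL; the index and field names are assumptions about a hypothetical repository, not any particular platform's schema.

```python
# Field boosting, an automatic Boolean AND, and fuzzy matching,
# expressed as an Elasticsearch multi_match query. "subject_tags",
# "title", and "abstract" are hypothetical field names.
query = {
    "query": {
        "multi_match": {
            "query": "unmanned aerial vehicles",
            "fields": ["subject_tags^5", "title^3", "abstract"],  # boost metadata over free text
            "operator": "and",    # every term must match (automatic Boolean AND)
            "fuzziness": "AUTO",  # tolerate small misspellings
        }
    }
}

# With the official Python client this might run as something like:
# from elasticsearch import Elasticsearch
# hits = Elasticsearch("http://localhost:9200").search(index="articles", body=query)
```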
More complicated, next-level implementations, for really good semantic context, go beyond configuring search to improve results: consider taxonomies and tagging, both for retrieval and for interface options like typeahead and did-you-mean. And the next level after that is something like a knowledge graph, which is a considerable effort to construct and deploy. But it's a very powerful tool.
And I would be remiss if I didn't put one final plug at the end saying that understanding your users, whether that's through interviews and user modeling or considering their personalized data, like previous searches and location, is another avenue to pursue. With that, I'm going to wrap it up and say thank you. All right. Thank you, Bob.
That's a great introduction to some of these issues. And how they interplay. Want to remind people that they can use the Q&A functionality. So if you're in a minimized window, that's the little q at the bottom or if you're in full screen mode, that's the little green menu bar up at the top. If you click on that, you can get a menu bar and click on the question mark, type in any questions.
We'll moderate those as we go through the presentations. So um, in terms of the knowledge graph that many of these systems implement, where would you say that contextual information sits compared to other information in the graph, say taxonomies or some of the natural language processing that you talked about?
So where in the context of your search system would you say contextual information falls? Is that near the top? Is it somewhere down the list? Is it where? I think most of the time the contextual information gets taken into account in one way or another before in the system workflow, before it tries to find relevant nodes in the knowledge graph to deliver information.
So if you put in, let's say, Jaguar or Jaguars or something, based on your contextual information it might try and decide whether you want football teams or cats or cars, and then at that point drive you to the appropriate node in the knowledge graph. So my understanding of how it works, which is largely research based and non-technical, is that those contextual determinations are made before trying to drive to a node in the knowledge graph to deliver results.
Does that make sense? No, absolutely. So you're using the contextual information to narrow down the elements of the graph that you're looking at, right? Because at the end of the day, you're going to pick a node and its related things in the knowledge graph to display and you have to select which node you're going to pick based on cookies, location, previous searches, whatever that thing, or you're logged in and have your preferences set or you know, the topics of interest selected before it drives you to someplace in the graph to try and deliver it.
And you touched a little bit on natural language processing. How does that work? Is there a difference, from your perspective, in terms of voice versus text typing? Is there an interaction in those systems? Is there context that can be drawn from the ways in which people are interacting with the systems?
That's interesting. I'm certainly not an expert on voice-driven systems, but I've also certainly been to a lot of presentations and listened to lots of talks about them over the past year or two. It seems to me that voice assistants are behind. It also seems to me that they do more stemming than they do fuzzy matching. That is to say, they can do fuzzy matching,
but they'd rather not, because the voice assistant can only return you one result. It can't deliver you a list of things to parse through. So it's probably doing more things like looking in a dictionary to find if it's the adjectival version or some kind of synonym, rather than parsing it and doing a fuzzy match on something. But that's just an educated guess on my part.
OK, well, I want to remind people if they have any questions as we're moving forward, you can certainly ask them. We can circle back to Bob either if we have some time at the end or, alternatively, if you have a question, we can get in touch with Bob and provide a written answer if we don't have time to do it on the live call. Absolutely, thank you. Thank you so much, Bob. All right.
So now we'll move on to our next speakers, who are Amanda Wheatley, who is a liaison librarian for the management, business and entrepreneurship program at McGill University, as well as, and apologies, Sandy, if my French isn't up to snuff, Sandy Hervieux, who is liaison librarian for the political science, religious studies and philosophy departments, again at McGill University.
Amanda and Sandy will be discussing the role of voice interaction in discovery and some of the implications of those services for libraries. Amanda, I think you're going to kick us off, so if you want to bring up your slides, we'll get you ready to go. Thank you. All right.
Can everyone see the screen there? No, not yet. Oh, is that working now? No, you have to click Share. I did click Share, and now it's not taking me back to the webinar. Yeah, so it's brought me to a survey for the... Amanda, this is Jill.
Look for the Zoom meeting window and use the Share icon in that Zoom meeting window. There we go. OK, now you're there. Let's get on with the test. All right. There we are. And go into the slideshow there. Perfect, thank you.
Now you're all set to go, Amanda. Perfect, thank you guys so much. Um, yeah, so I'm here with Sandy. Hi. Hi. We're going to be talking a little bit about voice assistants and where we see the role of the library playing in terms of how we communicate these to our students and our faculty, and specifically the implications for information literacy.
So as previously mentioned, I am the management business and Entrepreneurship Librarian here at McGill and my colleague Sandy. So I'm the librarian for political science, philosophy and School of religious studies. So the two of us are heavily involved in our virtual reference committee. And through that, it sort of stemmed this interest in artificial intelligence and how we might use voice assistants to facilitate a reference process.
So today we are going to do a little bit of an introduction to voice assistants for those who aren't familiar. We're going to talk about the role of AI in libraries, and more heavily focus on the future of information literacy and a few of the next steps in the research process. So to start things off, a little intro to voice assistants. If you're not already familiar with how these work, essentially we have a voice-driven agent who then sends vocal recordings to a server.
This is where I think the semantic search that Bob was talking about earlier is really going to play a crucial role. The recording is then interpreted and turned into a command. The command is sent back to the voice assistant, and the assistant will then relay that data, play any media or complete tasks as assigned. And that's kind of the quick little lifecycle of a voice assistant and its role in retrieving information.
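Schematically, that lifecycle can be written out as a tiny pipeline; every function below is a simplified stand-in for illustration, not any vendor's real API.

```python
# The voice-assistant round trip Amanda describes, as stub functions.
def speech_to_text(audio):
    return "what is the capital of france"  # pretend server transcription

def interpret(text):
    return {"intent": "factual_query", "query": text}  # pretend NLU step

def execute(command):
    return "Paris"  # pretend knowledge lookup

def text_to_speech(answer):
    return f"[spoken] {answer}"  # pretend speech synthesis

def round_trip(audio):
    # recording -> server -> command -> result -> spoken reply
    return text_to_speech(execute(interpret(speech_to_text(audio))))

print(round_trip(b"...audio bytes..."))
```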
Now, the major players in the commercial game, as we all know: Apple's Siri; a little bit less, Microsoft's Cortana; and then Amazon Alexa, which came out huge in 2014; and Google, whose Google Assistant and Google Home have also been dominating the market since they were released in 2016. So you can see within the past 10 years that this field has really started to grow.
Um, now, again, if you're not familiar, you might be asking, what can these assistants do? Matthew Hoy wrote an article just last year about voice assistants for Medical Reference Services Quarterly and highlighted just a few of the capabilities, including texting, making phone calls, composing emails and reading information back to you. They do, as we know, those basic informational queries. They can set timers and reminders.
They can work as a calculator. They control our media. I can ask Google to play something on Netflix if I choose. If I have a connected device, I might ask Google to turn on the lights or lower the thermostat. So it's all those sort of comfort things that we think about. And of course, I think Siri has been very good at telling jokes and stories.
Those make their way through the news quite often. The aspect here that I really want to focus on is that basic informational query. So each of these devices is capable of answering questions within their known item limits. One of the things that we were talking about previously that Bob had mentioned about Google really understanding how you search and prioritizing results that are relevant to you.
So the more sort of complex our questions get, if Google understands how I search, I'm going to start retrieving very relevant and accurate answers, even if it is just a basic voice assistant, even if it's Siri or Google, it's going to get to know how to pull results for my style of searching. And I think that's something that we need to be really cognizant of.
And then, of course, these voice assistants can be updated independently and by third parties to enhance or grow features. So if you didn't already know this, you can actually add skills to Alexa and Google to grow what they're already doing in terms of those basic capabilities that we just looked at. The number of Amazon Alexa skills that have been added as of 2019 is over 80,000 globally.
The US is dominating this with 56,000 skills. So if you can imagine those basic capabilities of an Alexa device, and then add on each of these skill frameworks, which are often open source so anyone can download these things. And even creating one on your own is not all that hard to do. So you can upgrade Alexa to do things that it wasn't originally programmed for, and that could have huge implications on the research process, on how we teach information literacy, on how our students and our faculty go about their research.
And of course, another thing that we have to keep in mind is the growth of these types of devices. Like I said, they emerged on the market about 10 years ago and they've shown no signs of stopping. Smart home devices are projected to grow to 66.3 million households in the United States by 2022. And I think this is something we need to pay attention to: we are growing a generation of device users who rely on this type of technology for their everyday use.
And the research shows that these personal habits are indicative of future research habits. So if we have students in elementary schools who have Alexa in the classroom or using Google homes to kind of navigate their everyday life, why wouldn't they start using these types of devices for research? So I'm going to hand it over to Sandy to talk a little bit more about what we're doing in terms of artificial intelligence in the library already and where we make room for these kind of voice assistants going forward.
All right. So I'm going to talk about an environmental scan that we did to look at what libraries in Canada and the US are actually doing with AI. In terms of our environmental scan, here's the methodology that we used. We evaluated university and university library websites of 25 research-intensive institutions, again in Canada and the US, and we searched for keywords like artificial intelligence, machine learning, deep learning, AI hub. And on the library website side, we specifically focused on strategic plans, missions and visions, to see if AI was something that libraries thought about.
And then we looked at topic, research or subject guides to see if there was any information there about it; whether they were engaging in any kind of programming about AI or any partnerships with researchers or external hubs. And then for the university websites, we looked at hubs, courses that were taught on artificial intelligence, and also major researchers in the field.
So if you want to look at our sample: we looked at the U15 in Canada, so the 15 major research universities in Canada, and then we looked at the US top 10 according to Times Higher Education. Our rationale for using this sample is that we figured, well, we want to start with a smaller sample and then grow outwards.
But we wanted to see what the major research universities, which get funding and grants and have the researchers in the field, were doing about AI. So, maybe not surprisingly, 100% of the university libraries do not mention artificial intelligence in their strategic plans, so it does not seem to be on their minds at all. And then if we look more in depth at the results, well, all universities have an AI presence of some sort, whether it be a hub or course offerings.
There is artificial intelligence happening in the institutions. Only one academic library has a subject guide on AI, the University of Calgary in Canada. Few libraries offer programming and activities related to AI, I think it's only three, and most of them tend to be talks given by external researchers. 68% of universities have significant researchers in the field, and although most or some libraries actually have digital scholarship hubs or engage in digital scholarship, there is no involvement with AI whatsoever.
And then more specifically, of the 25 academic libraries that we sampled, only two are collaborating with AI hubs in significant ways, and those would be Stanford and MIT. They're doing really interesting work, but for our purposes, for this webinar, we decided to focus on a case study of Waterloo. So the University of Waterloo in Ontario, Canada: they are not partnering with the library, but we chose this case for a specific reason.
So their artificial Intelligence Institute is doing really interesting projects and they have a lot of potential impact on the research process. So first of all, they're doing speech transcription, so making sure that speech decoder engines are able to transcribe audio into understandable sequences. So taking the natural language that people use and transcribing it into understandable queries and they're building a search and semantic engine to retrieve those results, which sounds a lot like what we call a catalog in a way.
And then they're building a learning object repository network, which will focus on knowledge extraction and learning object mining, and it will address some problems such as representation and extraction, sorry, of learning object repository contents. But we decided to pick this one because we thought it was interesting that, although these are all things that librarians could say they have expertise in, metadata, search queries, cataloging, this project is in no way affiliated with any librarian or library resources.
All right. And essentially, if we look at what's already being done with AI in the library, well, we see agent technology to streamline digital searching and suggest articles. So if you've heard of, for example, Yewno, which is essentially a discovery layer, it presents information in an entirely new way and uses AI to search it and recommend articles that you might not have found using a more traditional catalog.
There are also conversational agents or chatbots that are using natural language processing. I'm sure we've probably all had an experience using a chatbot on the internet for virtual reference, maybe not in a library, although some libraries are using it. A lot of us sometimes go on commercial websites and get hit with that "Can I help you?" chat box.
And they're very often AI driven. It has implications for digital libraries and information retrieval, and also RFID tags in circulation. For higher education, we're beginning to see a lot of digital tutors and online immersive learning environments, and also programs and majors dedicated to the study of AI across disciplines. So we're starting to see AI in law courses, and in medicine and finance, for example.
And we see a lot of involvement from students who do research in AI hubs. OK, so now that we've taken a look at how voice assistants are being used, and whether or not AI is sort of being approached in the library, the next thing I want to do is bridge these two, because where I think the library has a good chance of getting involved, especially on the ground for our users, is with information literacy.
You may already be familiar with the ACRL Framework for Information Literacy and its threshold concepts. These are concepts that, as liaison librarians, we often teach to our students in order to better their habits as researchers, to make sure that they're producing credible work, that they're being involved in scholarship and the process, and that they have an understanding of how it works. Where we think these voice assistants have the potential to disrupt this process is, more specifically, with research as inquiry and searching as strategic exploration, though we think it definitely has impacts on all six frames.
But these two in particular stood out to us. When we look at research as inquiry, there are identified practices and dispositions that each researcher should possess. And so we think of AI as being able to formulate questions based on information gaps; it's able to determine a scope. So, like Bob had mentioned before, the Google algorithm was able to identify Harrison Ford's birthday.
So we would assume that a voice assistant is able to do the same based off the speech transcription. It can identify various methods. So the AI is probably capable of doing this, but one of the dispositions where I think we do have to start asking more questions is demonstrating intellectual humility: seeking multiple perspectives during information gathering.
If my voice assistant is going to be trained to understand me as a researcher, it might only seek sources that I've constantly referred to in my searching history, so I might not be seeing a wide perspective or the whole picture of what's going on there. And is the AI even capable of admitting when it has found something that is not accurate or that it does not quite know how to interpret something? Aside from telling me there are no search results, which I know is technically not accurate.
But can the AI admit that in its process? And then again with searching as strategic exploration. So we want researchers to be able to identify who might produce certain types of information. We want them to be able to utilize divergent and convergent thinking, which is something that, you know, we expect AI to be able to do, because it is artificial intelligence. But I think there's this human element where we're not quite ready to give up both of those thinking styles to a machine.
So there are a lot of questions to be asked of how AI is going to disrupt this process as well. There are dispositions here about seeking guidance from experts. It doesn't just have to be librarians, but other experts in the field. Are we expecting the AI to do these steps for us, or is this something that we're going to kind of coexist and do together?
So I think when we have these larger conversations about what happens to information literacy, we need to really focus on aspects like this, where it could definitely change the way that people conduct their research if they are integrating these devices into the process. You might be thinking, well, why would I ever use Siri or Alexa or Google to do this kind of research? Well, we did a bit of background searching on this, and what we found is that,
in general, your personal habits, your personal search habits, are indicative of future research habits. So things that you like, the processes that you use to do things, whether it's searching through a book by hand or typing something out on a typewriter or writing it out in pen: these are things that we learn very early on and we take with us as we go throughout the research process.
So like I mentioned before, if we're raising a generation of future researchers who are already interacting with these devices at an early age, and if we're raising a society that is expected to interact with these devices, well, then I think it's pretty clear that the next step is to then bring this into the workforce or the research process into academia itself. So saying I'm going to ask Siri to help me conduct some research is not that far off.
And I think one of the good examples of this, I mentioned before that you can actually upload new skills to Alexa, and one of the ones that was done was a project called ArXiv ML. What this new skill does is actually read out the 50 most recent machine learning papers from the arXiv repository, which is an open access repository of machine learning papers, among other fields.
So Alexa will start by reading out the titles to you. You can then tell Alexa that you want to go forward with the abstract, or you can skip on to the next item. And where this goes next is asking Alexa to file something in Zotero or EndNote or Mendeley, to download a PDF, to mark up certain things, to have Alexa almost read the entire paper for you, if that's what you want to do.
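Behind the scenes, a skill like that plausibly leans on arXiv's public query API; the sketch below shows only the fetching half of the idea (the Alexa intent-handling and speech plumbing are omitted, and the category choice is an assumption).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Pull the newest machine-learning papers from arXiv's public API.
def recent_ml_titles(count=5):
    params = urllib.parse.urlencode({
        "search_query": "cat:cs.LG",  # one ML category; the skill may use others
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": count,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    return [entry.findtext(f"{ATOM}title") for entry in feed.iter(f"{ATOM}entry")]

for title in recent_ml_titles():
    print(title)  # a skill would hand these to the text-to-speech layer
```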
And I think that's a very assistant-based use of the technology. But we saw 80,000 skills uploaded to Alexa globally; I'm sure there's something out there where we could be using it for this type of semantic searching, to cover all aspects of this research process. We can already use existing tools for augmented literature reviews, where the AI is actually conducting the lit review for you.
So I have two important questions that I want to leave for everyone here to think forward on and to bring with them in the work that they do. The first is: is AI prepared to allow researchers to continue their information literacy process? Are we expected to continue teaching these concepts and continue developing these standards if AI isn't going to make room for that? And then the second is: is AI even capable of being information literate?
Because if we're not teaching students and future researchers how to be information literate, are we assuming the AI is going to do all of these steps for them? Are we assuming that the AI is going to take care of the framework for them, and they'll just be expected to take it in the direction they want to go? So definitely things we should all be considering. All right.
So I'll be talking about the next steps of our research, because this is only a small part of what we're looking at and we're hoping to keep thinking about those questions that Amanda just asked and reflecting on them and looking into what the future of artificial intelligence means for libraries. So if we look at the entire plan, our first phase was really seeing what's out there. What are libraries doing with regards to artificial intelligence?
Our second phase is to send out a survey, and this will be done both in Canada and the US. So we'll send a survey to gather librarian perceptions with regard to AI. How do librarians feel about it? What do they think of it? And we aim to survey quite a lot of people, so please take a minute to do it when it becomes available. And then we've started engaging with phase three, which is really device testing.
So our first phase of device testing is to see, well, OK, we've talked about the implications of Alexa and Siri and Google voice assistants in the research process, but how accurate are they and how reliable? And then we'll do a similar thing with students. So we'll poll students to see how they feel about AI. Is it something they would use in their research process or in the creation of scholarship? And then we'll do another round of device testing with students, both undergrads and graduate students, from a variety of different fields.
And then we hope to bring all this together in an AI experience at our institution. So currently we are engaged in phase two. We are almost done with the research ethics board approval, and then we'll be able to send our survey, which will be distributed across the US and Canada and to all types of libraries, so public, academic, and special libraries. We'll send it to a lot of listservs, so keep an eye out.
Then we hope to compile the results and present them at a conference and subsequently publish them in an information science journal. And then, as I mentioned, we're already engaged with device testing. So we have compiled a list of reference questions, which we've pulled either from our personal interactions and consults with students or from colleagues' virtual reference questions that have come in, and we've used them to calibrate the devices.
So for now we're working with Apple's Siri and Google Home. We're going to expand that to Alexa at some point, and also expand our pool of questions. And we have built an evaluation matrix that's based on the relevance, accuracy and authority of the responses, and also whether the librarian would either use this resource or recommend it to a student in good conscience.
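As a rough picture of what one row of such a matrix might look like in code, here is a small sketch; the criteria come from the talk, while the 0-3 scoring scale and the sample question are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    question: str
    device: str            # e.g. "Siri", "Google Home", later "Alexa"
    relevance: int         # 0-3 (assumed scale)
    accuracy: int          # 0-3
    authority: int         # 0-3
    would_recommend: bool  # would a librarian recommend this in good conscience?

row = Evaluation(
    question="How do I find peer-reviewed articles on climate policy?",
    device="Google Home",
    relevance=2, accuracy=2, authority=1,
    would_recommend=False,
)
print(row)
```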
And we are actually presenting a poster at ALA on Sunday about our preliminary findings on this. So if you're at ALA, please come say hello. We'd love to chat with you, and we will publish our results in an information science journal, like I mentioned, once we've expanded the pool of questions and also tested them on all three voice assistants. So thank you very much for listening to us today. Here are some concluding thoughts about AI in libraries. More and more, libraries and librarians need to be aware of the impact of artificial intelligence.
As we're seeing, it's not necessarily at the forefront of the conversations, but we need to begin planning for this next wave of technology and the implications it's going to have. And we also need to have conversations about how AI and voice assistants will alter our understanding of information literacy. So what is it going to mean for what we said, the framework, for example, and how will we teach those concepts?
But also this leads into creating maybe new standards that will better integrate data and digital literacy with information literacy skills and teach people how these devices work and what they should look out for if they're going to use them. All right. Yeah that is our presentation for you today. Thank you all for listening. And please let us know if you have any questions, we're happy to answer.
And of course, the slides will be shared, and you're welcome to email us at any time after the webinar as well. Thank you. Great, thank you, Amanda. Thank you, Sandy. Very quickly, where and when will you be presenting your poster? It will be next week, Sunday. Or no, this Sunday. This Sunday.
Wow, it's creeping up fast. At ALA we'll be presenting the poster. Um, what time?
11:30, yes. We are in the outreach group, so we'll be there Sunday, early afternoon, around noonish. Um, and yeah, so please feel free to stop by; we will have candy. And that's in. Is that part of the exhibit hall, or is there another room? I believe it is part of the exhibit hall, according to the information I was sent. Yeah, I think that's where it is. First-time attendees.
So we're very excited. Well, I hope to see you. I just wanted to direct people to you; it's easier if we know where and when. Um, good luck with the poster. Thank you. I want to remind people to use the Q&A functionality to ask any questions if you have some.
One question, and this kind of tees up the conversation with Scott and Sarah: have you given any thought to the privacy questions of running all of this through either Amazon or Google, both of which are really well known for gathering up as much personal information as possible? In terms of using those services in particular, are there any issues or concerns you might have with regard to privacy?
Yeah, privacy is definitely something that is on our minds at all times. Within the scope of our project right now, we're definitely more concerned with whether or not the devices can answer these questions. But I think where we play a huge role is in communicating these privacy and ethics issues to our users. So if they want to participate in using devices like these, we want to make sure that they understand all of those privacy and surveillance implications and what might be collected about them.
I think there's definitely a wave in this new generation of wanting to know more about what is happening with their data, and of unplugging from some of this stuff. So we think that we're going to have some good conversations going forward about the ethics of using these devices and what's being collected. And we're also being mindful and thinking of ways to mitigate that, especially when we get to the point where we do student testing, because it's definitely something that our research ethics board is going to ask about.
So we'll definitely be thinking about having devices that are provided for the students, as opposed to using their own. Yeah, we don't want any students who contribute to our research to have to put their own profiles at risk if they're helping us gather this type of data. We definitely want to be talking to them about that at all times.
OK, all right. Fascinating project, and a really interesting conversation. Just in the interest of time, I want to move on to our next speaker. But Amanda and Sandy, I really appreciate your talk. Again, if anyone has any questions, feel free to type them in, and we'll circle back to all of our speakers if there are other questions.
So I want to move on to our final two speakers today. Scott Young is user experience and assessment librarian at Montana State University, along with Sara Mannheimer, who is assistant professor and data librarian, also at Montana State. They're going to be talking about the topic of privacy and analytics, especially as it relates to library services. So, Scott, I think you're kicking us off.
OK, great. Can you see the slides and hear my voice? Yes, we can. Excellent, thanks, Todd. And thanks to the other panelists, to Bob, Amanda, and Sandy. It was really great to hear about the projects you're working on; I'm excited to see more.
So I'm Scott, user experience and assessment librarian at Montana State, and Sara is here presenting with me today too. Our presentation and our project are co-authored by Jason Clark, who is one of our colleagues at Montana State. He's not presenting with us today, but we want to recognize him here. So, our project: achieving privacy in the age of analytics through skills, strategies, and ethical approaches.
So, where we're going today: we'll talk a little bit about the background of our project, the creative process that we followed to produce some of our ideas and directions, what we actually produced, and where we're headed. And then finally, we've got some prompts for discussion. So, our project background: the title of our project is A National Forum on Web Privacy and Web Analytics. The project is funded by the IMLS and our institution, Montana State University.
And so this was a national forum grant through the IMLS. We convened 40 librarians, technologists, and privacy researchers on our campus at Montana State University last fall. Um, here is the project team: me, Sara, and Jason, Lisa Hinchliffe, and then two staff members at Montana State, Jacqueline Frank and David Sweetman. They are awesome.
Oh, here we are. This is a press release, just to show you what we look like. And here we are at work; it's just kind of cool to have a press release, so there you go. Um, so these are the questions that we asked. We wanted to critically address web analytics practices; that was the focus of our project.
And we wanted, with these 40 participants, to develop a roadmap towards building a more privacy-aware, values-driven analytics practice in libraries. We have a project site that we invite you to visit: lib.montana.edu/privacy-forum. We'll show this a couple more times throughout the presentation. This is sort of the front door to our project; we've got lots of information and resources there.
OK. So what did we do? Um, well, the participants pretty much drove this project. So here they all are; these are the 40 people that we brought to our campus. It was a great two and a half days. Um, I'll move past this slide, but take a look at it again; all the participants were really great. We tried to get a cross-section across libraries and archives, but also to pull a little from outside our discipline.
So we had, um, a UX designer from Silicon Valley there, which was very interesting, to hear his perspective, um, and various other people. So, um, leading up to our forum, we distributed a survey that asked participants to share their thoughts on privacy, analytics, and the ethics of librarianship. We have shared the survey through Zenodo, so there's a link there.
It'll be available through the SlideShare after the presentation, but go ahead and check that out. The major themes that developed from the survey, the things that the participants were interested in and that we think are representative of our profession: building partnerships and collaborations to support privacy and analytics; thinking about privacy, equity, and justice, especially around vulnerable communities; and developing policies and statements in support of privacy.
Also: having more practical guidelines to help librarians actually implement privacy-focused analytics, and then outreach and education models to help not only our user communities but also us librarians ourselves understand how analytics work and some of the privacy incursions that occur as a result of using some of the leading analytics software, like Google Analytics. And that leads to analytics tools, just to try to raise a better understanding of what is available out there, because there is a whole world beyond Google Analytics, as we discovered.
So with this as background, we brought everyone to our campus, and we followed a participatory design approach in which participants worked in small groups to conduct a series of activities. We did about ten activities across three phases. We had an inquiry phase, where we just sort of established the landscape of privacy and analytics and some of the problem areas that we wanted to address.
Then we conducted another series of exercises to generate ideas to help support a better analytics practice. And then ultimately we identified the most feasible and implementable ideas. Some of what helped guide our practice were these three resources, which we can totally recommend in terms of building more participation and idea generation: Gamestorming, 75 Tools for Creative Thinking, and the Design Method Toolkit.
So I'm just going to outline what some of these exercises looked like. This one was called Float Your Boat. A lot of the exercises use metaphor to help participants generate creative dialogue around questions. And so in this one, Float Your Boat, the boat represents privacy in libraries.
The sails represent things that help us, things that guide us, and the anchors represent things that hold us back, the barriers. And so this is a great way to generate ideas, get evidence down, and get participants talking about a topic. So participants drew a boat; this boat represents privacy education and engagement specifically. So we have the anchors and the sails.
Um, zooming in here. So, some of the things that hold us back a little bit, from our user communities: sometimes users come into privacy with what our participants phrased as learned helplessness, or even disengagement. And of course that's conditioned by, you know, Google and Amazon and others, who just sort of create the conditions of our technological reality.
And then those companies, of course, prioritize convenience and time, which then, you know, leads us to make certain decisions. And so sometimes users have low technology literacy, or an overreliance on the tools, or lack a holistic approach to thinking about privacy. Um, sometimes our library administrators can also be kind of anchors and can prevent privacy action, focusing on low costs over other benefits, or sometimes there's a lack of stakeholder knowledge.
Sometimes our library or campus leaders don't really appreciate the privacy values that we hold. Um, and then, of course, there are competing priorities; privacy isn't always at the top of the list. Um, but to buoy us up, we have these great sails. Libraries are trusted; that's something for us to leverage. Um, we have really strong statements in support of privacy and ethics: the ALA Code of Ethics, the ACM Code of Ethics.
We also have pretty strong awareness across our own profession. Um, and there is some increasing money for collective action, as identified by our participants. So in the interest of time, I'm just going to skip past that really quickly. So, project outcomes: we did these exercises, and then what? What came of it?
Sorry, thanks, Scott. So the outcomes of our forum were threefold. We had a white paper that the project team wrote; an action handbook that provides some hands-on activities and actions forward; and then eight "pathways to action," as we're calling them. Those are based on ideas developed by participants at the forum.
And you can access all of these outcomes on our website, lib.montana.edu/privacy-forum. So I'll go through the outcomes one by one. The white paper is basically a detailed overview of the forum and its outcomes. This presentation gives the information of the white paper, but you're welcome to access it and read more. The action handbook: this is a handbook providing practical recommendations for implementing privacy-oriented analytics practices.
So the handbook, sorry, it also provides technical and social action items: things that you can implement technologically, but then also ways you can train your staff. I'll go into that in a little more detail. So the first thing we talk about is implementing Google Analytics in a way that is a little more privacy-oriented.
So many libraries have installed Google Analytics with the default configuration, which reminded us of this quote from danah boyd, that many online tools are "public by default, private through effort." Unfortunately, this is one of those times: Google Analytics is public by default, but through a bit of effort, libraries can create an implementation that is a bit more privacy-focused.
So the three things we identified: you can implement forceSSL; you can anonymize the IP addresses of your users; and you can also forgo a full Google Analytics implementation and instead use a snippet, like this minimal Google Analytics snippet, that sends page views directly to Google Analytics, so you don't have to use the Google Tag Manager library, which links out to a network of other trackers.
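As a sketch of that minimal-snippet idea (an illustration under assumptions, not the handbook's exact code; the tracking ID is a placeholder, and it uses the Universal Analytics Measurement Protocol that Google documented at the time):

    # Sketch: send a single pageview hit directly to Google Analytics
    # via the Universal Analytics Measurement Protocol, bypassing the
    # Tag Manager library and its network of third-party trackers.
    import uuid
    import requests

    payload = {
        "v": "1",                  # Measurement Protocol version
        "tid": "UA-XXXXX-Y",       # placeholder tracking ID
        "cid": str(uuid.uuid4()),  # random client ID, not tied to a person
        "t": "pageview",
        "dp": "/search",           # page path being recorded
        "aip": "1",                # ask Google to anonymize the IP address
    }
    requests.post("https://www.google-analytics.com/collect", data=payload, timeout=5)

The "aip" parameter corresponds to the IP-anonymization setting just mentioned; in a client-side analytics.js setup, the equivalents are ga('set', 'anonymizeIp', true) and ga('set', 'forceSSL', true).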
There's more information on the specifics of these ideas in the action handbook; I encourage you to take a look at it. The next thing we talk about is alternatives to Google Analytics altogether. There are several good alternatives: Matomo, Countly, Simple Analytics, Open Web Analytics, and then, of course, your own server logs. There's more detail in the action handbook on that as well.
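For the server-log option, here is a minimal sketch of privacy-aware counting, assuming a standard Apache/Nginx combined-format access log; only the requested path is retained, never the visitor's IP:

    # Sketch: page-view counts from a combined-format access log.
    # The IP at the start of each line is matched but never stored.
    import re
    from collections import Counter

    LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

    views = Counter()
    with open("access.log") as log:
        for line in log:
            match = LOG_LINE.match(line)
            if match:
                views[match.group(1)] += 1  # keep only the page path

    for path, count in views.most_common(10):
        print(count, path)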
And then we talk about privacy, sorry, staff skills and competencies. This is the idea that there are things we can train our local staff on to help them better understand privacy and its implications: core concepts, understanding privacy vulnerabilities, auditing data, and then preparing data, for example by pseudonymizing personally identifiable information, that is, creating fake names for real people.
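To illustrate that "fake names for real people" idea, here is one common approach, sketched under assumptions: a keyed hash gives each real identifier a stable pseudonym that can't be reversed without the secret key.

    # Sketch: pseudonymize personally identifiable information with a
    # keyed hash (HMAC). The secret key is a placeholder and should be
    # stored separately from the dataset it protects.
    import hashlib
    import hmac

    SECRET_KEY = b"keep-this-out-of-the-dataset"

    def pseudonymize(identifier: str) -> str:
        digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
        return "user-" + digest.hexdigest()[:12]

    print(pseudonymize("jane.doe@example.edu"))  # same input, same pseudonym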
And next, privacy indicators. These are our five main guidelines in the handbook: collect only the data that you need for your use case; support analytics tools that allow retention and downloading of your own data in open formats; support analytics tools that allow you to set a data retention strategy and enable the complete removal of data if necessary;
implement tools that allow for pseudonymization, anonymization, sorry, and the removal of personally identifiable information; and implement analytics tools that have support for emerging international privacy standards like the GDPR.
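As a sketch of what the retention guideline can mean in practice, assuming a local analytics store with a hypothetical page_hits table and visited_at column:

    # Sketch: enforce a data-retention window by deleting analytics
    # rows older than N days. Table and column names are hypothetical.
    import sqlite3
    from datetime import datetime, timedelta

    RETENTION_DAYS = 180

    def purge_old_hits(db_path: str) -> int:
        cutoff = (datetime.utcnow() - timedelta(days=RETENTION_DAYS)).isoformat()
        with sqlite3.connect(db_path) as conn:
            cur = conn.execute(
                "DELETE FROM page_hits WHERE visited_at < ?", (cutoff,)
            )
            return cur.rowcount  # rows removed

Run on a schedule, something like this keeps the store within the retention window and makes complete removal the default rather than an afterthought.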
Um, next we'll talk about the pathways to action. Our participants at the forum basically came up with a research agenda for the future to support privacy and analytics in libraries. And so we came up with eight different ideas: a privacy certification; an analytics dashboard; leadership training modules; privacy for tribal organizations; a model license; a research institute related to privacy; privacy policy workshops; and then a privacy-focused assessment toolkit. I'll very quickly run through all of these, and there's more information on our website again, where you can read a one-page overview of each and see who the point person is for each of these projects.
So our idea is that anybody in the community will be able to reach out to either us or to these point people, and if you want to take on these projects or move forward with them, these projects belong to our community. So let's start with the privacy certification. This is the idea of a certification system to establish a stratified data privacy standard for libraries and for our vendors.
It's similar to LEED certification, where you can be certified silver, gold, or platinum, and this would be a badge that you could add to your website to show that you value privacy and that you have a commitment to it in these specific ways. The second pathway is the privacy-focused analytics dashboard. The idea is that we could create a simple, lightweight analytics framework and dashboard that would show only the necessary data points but would still communicate our value and tell our story, in the ways that analytics are meant to do.
The third idea is leadership training. This would be a module that could be implemented in bigger leadership training institutes, like the Leading Change Institute that just happened recently. Yasmeen Shorish, Mark Matienzo, and Sandy Thompson are the leads on this, and we're going to be talking about it later this week. So this is one of the pathways that is beginning to move forward.
The next pathway is privacy and tribal organizations. We had several participants at the forum who work at tribal colleges and universities, and so this pathway is thinking about how privacy and surveillance specifically affect tribal communities and tribal members, and how tribal organizations can implement culturally appropriate web analytics and web privacy practices.
Uh, next, the model license. Lisa Hinchliffe has been working on this a little more, so this is another idea that's actively moving forward. The idea is to equip libraries with model licensing language that they can use to promote patron privacy when they're negotiating with vendors.
The privacy research institute is sort of a think-tank idea: we would bring researchers together from various institutions to come up with an evidence-based advocacy plan, especially thinking about how to redefine metrics in a way that redefines success. Our last two pathways: first is a set of privacy policy workshops to help libraries develop privacy policies that are easy for users to understand and straightforward to implement locally.
And then last, the privacy-focused assessment toolkit, which would provide tools and best practices for implementing privacy-aware and user-conscious assessment, so that we as libraries can show our value and understand our users, but in a way that's privacy-focused. OK, so where are we headed from here?
So our project team wants to facilitate the realization of one or more of the pathways that Sarah just outlined, and we want to frame this as a community effort to achieve our community goal of privacy. The point of the forum was to develop some practical steps that librarians can use to implement a better analytics practice; that's the action handbook.
Then the white paper and the pathways to action are the bigger-vision part of the project, where we're looking for wider participation. So again, we'll ask you to visit our project site, lib.montana.edu/privacy-forum; more information about all of this work is available there. We also have a project site on OSF, so please go ahead and check that out:
osf.io/nfpa. Uh, here's what our project site looks like; this is how you'll be greeted. We have links into the white paper and the action handbook, and then we have overviews of each of the pathways. So please visit our website, take a look at the pathways and the action handbook, and consider what it would look like to take action in your context. And if you're feeling motivated, or if you're available to work with us to move any of these ideas forward, please contact us.
We want to keep it going. Um, also, Sarah will be presenting this Saturday, June 22nd, from 1 to 2 PM in the convention center, in room 143B. Sarah will be talking about this project and networking with the participants and attendees at ALA. So if you're there and want to talk more with us, please consider going to that session as well. Um, so we'll say thank you.
Thank you for joining us today. And we have some discussion prompts to help facilitate some thinking around these questions. Some of the prompts include: What pathways look most promising? Is there anything you saw that you really feel motivated towards, where you think, oh yeah, that's a good one? Um, but then also, what are the barriers to privacy action in your local context, and what are some of the wider forces that put pressure on us, either from our parent institutions or even just wider national cultures?
Um, and then, how could you see yourself or your organization getting involved further? So yeah, those are the question prompts we have, and I think we have a few minutes. So thanks. Thanks, Scott. Sarah. First of all, a great effort. Nettie Lagace, NISO's associate director, was a participant in the forum.
So thank you for including her and for including NISO in this process. We can use the, there's a raise-hand functionality: if you go under the participants tab, I can pull this up. Can you back up to the pathways?
So let's see. If you are interested in the first one, privacy certification, just raise your hand, and I can count how many people have raised their hands. Looks like three. Maybe not everyone can raise their hand, but at least three.
We're now up to four. The analytics dashboard; I'll ask again, about the analytics dashboard. OK, so for the analytics dashboard, I think we needed to clear off the people who had previously raised their hands. Exactly, exactly.
A couple for that one. On to the next one, leadership training, once I clear that off. It's clear. Four. OK, next one.
So, tribal organizations and tribal issues. Only one. On to the next one, the model licensing. Uh, this one's getting a little bit more; this is up to six.
Seven. And then the next one, clearing that one off: the research institute. No, that's two. OK, well, that gives you some sense; this is just very impromptu. Oh, sorry, the privacy policy workshops.
One more after this one, actually. Sorry. Uh, we got, for this one, five, six. And then the last one is the assessment toolkit.
And it looks like that's also six. So a fair mix across the table, but that gives you a little bit of impromptu data, Scott and Sarah. Yeah, that's great to see the people speak. The assessment toolkit and the model license, those are ideas we're very excited about too. Um, one question that I've had, and this has come up in the context of other projects.
Um, how have you seen getting buy-in on library values outside of the library? So for example, in the IT departments, or in the administration, who might have, maybe, let's think of this in a positive way, more openness to sharing information, who see benefits in opening and sharing information, maybe about student assessment or other information across the institution.
What kind of conversations do you think libraries need to have outside of the library, within their institutions, about some of these issues? I think that's a really great question. Um, that question is partly what motivated the idea for the research institute. Um, because it is a challenge for libraries to bring our values forward, because we can be dismissed as sort of just looking inward, essentially.
Um, and so with other disciplines or other organizations that don't share our values, it can be difficult to get those other people to act from our perspective. So I think a path forward is translating our own professional values into advantages for our common goals, which, I would say, center on the user experience. Um, you know, when we talk about access or privacy or intellectual freedom, those can be a little abstract on their own.
And a little inward-focused, which is good for us, but then we need to do the additional step of translating them into wins, showing people, you know, library administrators, that users actually do care about these things, and that there's a better user experience as a result of some of these measures. There's a great funded project going on right now involving Kyle Jones and Lisa Hinchliffe and others.
The Data Doubles project, um, where they're doing interviews and surveys with students to find out what student perspectives are on privacy and on data collection. And so this will give us evidence from the user perspective to help make arguments in support of privacy. I do think that in some ways values aren't enough; they're totally essential, but they're not
the full picture. So combining our own values with user experience evidence, I think, really can be effective. All right. Well, thank you so much, Scott, Sarah. I really appreciate your talk. Good luck with the conversation this weekend.
I know that we are a little bit over time, so we'll draw today's session to a close. I think this has been a fantastic mini-series, a two-part series. Thank you all for joining us. When you close out the session today, you'll be presented with a short survey; please let us know how we did today. We always appreciate the feedback.
We are in the process of putting together ideas for our 2020 programs, so if you have other ideas or suggestions for future programs, there's a place where you can fill that in on that little survey. So with that, I want to thank everyone again, and remind people, if you are going to be in Washington, to join us at one of the sessions where one of our speakers today will be, or at one of the NISO sessions.
Hopefully you can put that on your plan and agenda for ALA. And with that, I want to remind people, we do take July off, but we will be back with you in August for our fall webinar series. So I hope everyone has a great summer. Thank you very much for joining us, and have a great afternoon. Thanks, everyone. Bye-bye.