Name:
Unanticipated Metadata in the Age of the Net & the Age of AI
Description:
Unanticipated Metadata in the Age of the Net & the Age of AI
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/605154da-8f8d-4b27-8c34-6cd90e5fa6a6/videoscrubberimages/Scrubber_6.jpg
Duration:
T01H01M16S
Embed URL:
https://stream.cadmore.media/player/605154da-8f8d-4b27-8c34-6cd90e5fa6a6
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/605154da-8f8d-4b27-8c34-6cd90e5fa6a6/Unanticipated Metadata in the Age of the Net the Age of AI-.mp4?sv=2019-02-02&sr=c&sig=Q8N%2FCVmh52mIpPQww9fLzb82N9eJKa3bOytdzY4NjZM%3D&st=2024-11-22T04%3A41%3A23Z&se=2024-11-22T06%3A46%3A23Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
ROBERT WHEELER: We're very happy to once again be sponsoring the NISO Plus conference.
ROBERT WHEELER: It is a great pleasure to introduce the opening keynote speaker, Dr. David Weinberger. Dr. Weinberger is an American author and technologist with a PhD in philosophy. In his five books and countless posts and articles in Wired, Scientific American, Harvard Business Review, CNN and many more, he's explored the effect of the internet and AI on knowledge, on how we organize ideas, on the disruptive architecture of the web, and on the core concepts with which we think about our world.
ROBERT WHEELER: Dr. Weinberger has been deeply affiliated with Harvard's Berkman Klein Center since the early 2000s. He was co-director of the Harvard Library Innovation Lab from 2010 to 2014, where his personal project was LibraryCloud, an open API for Harvard libraries' non-private data and functionality. He has also been a journalism fellow at Harvard's Shorenstein Center, an advisor to high tech companies and to presidential campaigns, and a Franklin Fellow at the US State Department.
ROBERT WHEELER: He also currently edits an open access book series for MIT Press about the effects of digital tech. By way of disclosure, David asked me to note that he is currently an independent, part-time contributor to Google's Moral Imagination group, but obviously does not speak for Google in any regard. Dr. Weinberger's talk is entitled Unanticipated Metadata in the Age of the Net and the Age of AI.
Segment:1 David Weinberger.
DAVID WEINBERGER: Thanks for having me. My topic is going to be unanticipated metadata, and I'm going to talk about how that applies both to the net and to the age of AI. If there's one thing nobody here needs any convincing of, it's that in the age of the internet we've had an abundance of abundances of abundances in terms of information available. So our little murder board that you just saw, with its clues and scraps of information: just multiply that by gazillions.
DAVID WEINBERGER: And that's what we face every day, all of us. Thankfully, we have metadata. In this case, it's the pins in these items: they serve as anchors for the sense-making thread that ties these events together in ways that we think are appropriate. And that's essential. That's great. So I ran this presentation past Jason Griffey, of course, and he said, I wonder if the string is actually metadata too.
DAVID WEINBERGER: And at first I said, no, no, no, of course not. It's the pins that, as metadata, enable the story to be connected. But since he suggested it, I've thought about it, and I think he's actually on to something, which we will get to towards the end. So everybody here knows why metadata is important. So I'll read through this.
DAVID WEINBERGER: It makes things findable. It makes things interoperable, which is increasingly important in the highly connected digital age. It lets us make sense of things, and that's the string in this case. The aim of metadata is not to show us what matters to us, but by looking at metadata you can see what matters to us in various ways. And so a very traditional sort of example is the way in which we made what at the time seemed like simple and obvious decisions about what metadata to collect on lots of different sorts of forms, et cetera.
DAVID WEINBERGER: Ever since forms were invented, we have frequently thought it would be useful to know the gender of the applicant or person filling it out. But we assumed that that meant two choices, and that tells us about what we care about: first, that in these cases gender matters, for a good or bad reason, but more interestingly, that it mattered to us that there are two and only two genders.
DAVID WEINBERGER: And over the past few years, after a lot of struggle, decades and more, we finally recognized that something else mattered as well, that other categories of people matter as well. And that tells us something important just as metadata. One thing it tells us is that we've had a lot of bias. But another thing it tells us is that,
DAVID WEINBERGER: I want to be careful here, but there's a sense in which culture is bias. It doesn't have to be damaging, pernicious bias that hurts people. But culture consists of the things that matter to us, and that's a type of bias. It can be elevating, it can be diminishing. But there is a sense in which culture consists of these sorts of.
DAVID WEINBERGER: deeply held and often unthought ideas about how the world goes together, what matters to us. So one of the consequences of this, and of some other factors, is that designing metadata has been a matter of anticipating what it's going to be used for and what matters to those applications. It has all been about anticipating. And people who deal with metadata get very, very good at anticipating what's needed.
DAVID WEINBERGER: But this has been part of our species' oldest strategy, actually, a strategy of strategies, in which we try to anticipate and prepare for what's going to happen. It's our way of dealing with the uncertainty of the future, and it has been at least since the Paleolithic era, which is when these flint points used for arrows or axes were created. Our ancestors long ago thought that we might be going hunting, maybe for animals,
DAVID WEINBERGER: tomorrow; we'd better make some more spears. We anticipate and we prepare. And this is deeply baked into our culture. We pay a tremendous price for this strategy, because you can misprepare or underprepare or overprepare, and there's a price for each of those. But it is a fundamental strategy that we're never going to give up. We're going to look both ways before crossing, no matter what.
DAVID WEINBERGER: You can see a case of how well this works in the modern era, relatively modern, in the case of Henry Ford and the Model T. In 1908 he took aside a handful of engineers and locked them in a room with him, and Ford anticipated what the market was going to want in a car. As a result, they launched the Model T, and 19 years later they had sold 15 million copies.
DAVID WEINBERGER: Or I guess they're not copies, because they're not internet thingies, but you know what I mean. They sold 15 million of them basically without changing a single thing, because Henry Ford was a genius at anticipating the needs and desires of markets. He was also a pro-Nazi anti-Semite, but he was really good as a marketer, as a product designer. So this is just one really clear example of how anticipating and preparing works in a relatively complex environment.
DAVID WEINBERGER: But we can see how quickly new possibilities have emerged beyond that strategy. You can see it, for example, in the widespread practice these days on the internet of launching products as minimum viable products. This idea was conceived in the early 2000s, and I'll just give you one example. Dropbox launched that way, which meant that it launched a product with a minimal set of features.
DAVID WEINBERGER: The company thought the market would buy the core feature, which for Dropbox is seamless cloud upload and download. They had that nailed, and they launched pretty much simply with that. Then they watched what users did with it, what users wanted from it, and what other changes in the environment they needed to adapt to. And they ended up, these days, with quite a fully featured product.
DAVID WEINBERGER: Launching with an MVP minimizes risk: rather than you anticipating, and possibly not thinking of, what the market would like, you let the market tell you. Another important sort of example, I think, is the iPhone, which when it launched had basically no features that had not been in prior phones. The thing that really made it distinctive was the App Store.
DAVID WEINBERGER: And the App Store is Apple saying, perhaps not out loud, that there are so many things you can do with this product, we can't possibly imagine them all. And if we could, we couldn't possibly build them all. Some are going to be really niche products, which may be hugely important for the niche they serve, and others we just can't do; we're just one company. So they created the App Store, and the environment in which developers can create things for the App Store and run them on the iPhone, and before you know it, there are two million apps and more.
DAVID WEINBERGER: And so every time somebody writes an app for it, this tool gains more value. Slack did very much the same sort of thing, but through an API that allowed anybody to add features to the tool, to integrate it with other apps and, in particular, to integrate it into particular workflows that use those apps, which makes it way more valuable. In fact, Slack kick-started this with an $80 million fund to support this sort of development by people who do not work for Slack. Not anticipating, but leaving room for the unanticipated, I should say.
DAVID WEINBERGER: In some ways the originators of this approach in software, one could say, were early video games, because many of them, and I mean we're going back now to early Wolfenstein and Doom and that sort of thing, would allow users to extend them: to change the graphics, to add maps and levels, to turn Half-Life into a physics simulator that's not even a game. Minecraft is incredibly popular.
DAVID WEINBERGER: 200 million copies sold. They allow these mods, as they're known, modifications; over 50,000 have been made, and each one extends the value of Minecraft, sometimes a little, and sometimes, I guess, makes it worse. But many of them increase the value of the game to its users. So there's a long history of doing this.
DAVID WEINBERGER: And it's not just in software. Open source and Creative Commons open access licensing open up a resource; they enable a resource to be compiled, sort of crowdsourced, to be used for unanticipated purposes. Think of the 100 million GitHub-hosted projects that are up.
DAVID WEINBERGER: Not all of them are open source, but a lot of them are. Open source clearly is a form of this: I'll make this, but you can use it any way you want, if you agree with the license. So this is a characteristic of the internet. It's really not unique, but to have a platform that encourages this so much, so widely, is new, and to have it be such an important part of our culture.
DAVID WEINBERGER: So we have done something remarkable. Over a very long period of time, we've added to the most fundamental strategy of strategies: we've gone from anticipating and preparing to anticipating and encouraging the unpredictable.
DAVID WEINBERGER: In fact, the very structure of the internet is designed to support uses that the designers of the internet, all these many decades ago, knew they could not anticipate. In the United States, that's what net neutrality is about: keeping the internet open to all possibilities and fair to all possibilities. Other countries don't have that concept because they don't need it, because they already have it; and maybe we will have it again sometime before too long.
DAVID WEINBERGER: A neutral net. So it seems as if we've spent 20 years making the world more unpredictable on purpose, because that also means making more possible in the world, and making more metadata, because it's metadata that increases possibilities. And so in this new environment, I think all of us want metadata to be more abundant, because that enables more possibilities, and more interoperable, excuse me, because that extends functionality into new products and new types of intelligence where we're sharing data as well as functionality.
DAVID WEINBERGER: And we want to do this in support of the unanticipated. This means a change that we're already seeing in the nature of metadata itself. In fact, we've already experienced, thanks to the internet and other digital networks, but really the internet, a pretty fundamental addition to the nature of metadata. You can think of it like this.
DAVID WEINBERGER: We've had metadata of the first kind, in which the metadata is attached to the object; it's a label. That's great for lots of reasons. It's also pretty limiting, since it limits the amount of data you can use: the spine of a book, for example, or the label of a pickle jar, is pretty small. And it's designed for use in a system where the objects themselves are physical, and so is the metadata, which means that they can only be put in one place at a time.
DAVID WEINBERGER: There's no multiple shelving of a single book to get it into multiple categories. So it reinforces a sort of rigidity of thinking about the nature of things, in which we have thought that to know what a thing is, is to know its essence, and it only has one essence. In the West we get this from the Greeks, but it has continued for 2,400 years or so. In metadata of
DAVID WEINBERGER: the second kind, which is also ancient, we realized that we can separate the metadata from the object and make a new physical object. It frequently has capacity for more information than the object does, and in the case of books we can have three or four or more ways of organizing the data at the same time: a combined catalog that lets you look by author, by title, by subject, and you can add some categories to that as well.
DAVID WEINBERGER: Not a lot, but that's a big breakthrough that really enables new possibilities for finding and putting together knowledge. In the age of the internet, we get a third kind, in which everything is digital: the metadata is digital, the content is digital. And this enables the rise of new types of metadata, which has been very exciting. For one thing, you have an unlimited amount of metadata, because the internet, in practical terms, has infinite capacity.
DAVID WEINBERGER: I know that doesn't literally make sense, but you know what I mean. You can start to use social metrics as a type of metadata, or user tags: you allow users to attach tags, and you can have as many different sorts of tags or different categories as you want. You can federate this; it doesn't all have to be controlled by a single entity.
DAVID WEINBERGER: And you can generate folksonomies, taxonomies that arise from observing what tags users attach to things, and many more possibilities. We've seen an enormous growth in the uses of metadata, and in our understanding of metadata, over the past 20, 25 years.
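A minimal sketch of the folksonomy idea described above: observe the tags that users actually attach to an item and let the most common ones emerge as de facto categories. All of the tags and counts here are invented purely for illustration.

```python
from collections import Counter

# Hypothetical tags applied to one item by different users:
# many vocabularies, no single controlling authority.
user_tags = [
    ["whales", "classic", "american-lit"],
    ["whaling", "classic", "sea"],
    ["american-lit", "whales", "obsession"],
    ["classic", "whales"],
]

# A folksonomy emerges by observing what users actually do,
# rather than by anticipating categories in advance.
tag_counts = Counter(tag for tags in user_tags for tag in tags)

# The most frequently applied tags become de facto categories.
for tag, count in tag_counts.most_common(5):
    print(f"{tag}: applied by {count} users")
```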
DAVID WEINBERGER: And as a result, we are in a world, and have been for a while, in which we can do searches like these. We can say: who wrote Moby-Dick? And it will tell us: Herman Melville. No surprise there. We can go to the same search site and ask,
DAVID WEINBERGER: I can't see what it says, but: what's the book that Melville wrote about whales? And it will tell us it's Moby-Dick. We can take the phrase "Call me Ishmael," misspell it, and ask the search engine where it came from.
DAVID WEINBERGER: And it will figure it out. Even though we misspelled it, it will correct our spelling. And we can even ask it things like: what's the name of that author who lived in Pittsfield, Massachusetts, and wrote about a whale, maybe? And it will tell us. The point is that in the current internet world, everything is metadata.
DAVID WEINBERGER: Even the address of the author of a book is now metadata. Who was the author who went up to the top of Monument Mountain in the Berkshires and had a picnic with another author, or with Hawthorne, if you want to be more specific? It knows that too; that is metadata. What isn't metadata? Nothing isn't metadata.
DAVID WEINBERGER: Everything is metadata, which illustrates what I think we've known all along, although practically we haven't been able to behave this way at all until we got into this more fluid environment: that there is no essential difference, there is only a functional difference, between data and metadata. It turns out that metadata is what you know and data is what you're looking for, and those roles can now change all around.
DAVID WEINBERGER: That's the difference. Which means that we have unbound metadata now. We have undone the bindings by no longer having to anticipate how we think people will look for things, how they put things together. It is now an era of metadata that doesn't always have to anticipate what users are going to want to look for, or look via.
DAVID WEINBERGER: So that complicates things. But that's nothing, because machine learning has just come along to complicate everything. I'm going to use AI and machine learning as synonyms, by the way. I want to talk about four uses of AI as a metadata tool, and I'll spend a little bit more time on the first one.
DAVID WEINBERGER: But I'm going to go pretty quickly through these. In the first case, we can use it to generate the sort of structured metadata that we traditionally have wanted and which is still often very useful. Google Books went this route when it launched, in order to provide the standard sort of metadata that users want without having to use the carefully, painstakingly assembled and highly reliable metadata that librarians and other catalogers have put together over the course of generations, carefully checked.
DAVID WEINBERGER: Instead, Google said, well, no, we'll just have our machines read the books, and they'll be able to figure out what the title and the author are, and which edition, and all the rest of it. And it worked OK, but not great. There are tons of errors in Google Books metadata. I mean, they try to fix it, but you still come across errors. So this may not always work. It may work well enough.
DAVID WEINBERGER: It's a judgment call. Here's an example of where it worked, but it's a much smaller example. It's extremely recent, though. Matt Webb is a fan of the podcast In Our Time, hosted by Melvyn Bragg on the BBC, a weekly show that has run for many, many years, and he wanted to automatically generate an index of the episodes as a service to the public.
DAVID WEINBERGER: So he scraped the In Our Time site to get the list of all the episodes, and then he ran that through GPT, which every talk is now obliged to mention at some point, and which I'm going to talk about more later. He prompted GPT to generate a Dewey Decimal category, or set of categories, for each of the episodes. And it did, and it worked just as it should.
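A rough sketch of that kind of workflow, not Matt Webb's actual code: feed each scraped episode title to a large language model and ask for a Dewey Decimal class. It assumes the OpenAI Python client; the model name, prompt wording, and episode titles are illustrative placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical scraped episode titles standing in for the real list.
episodes = [
    "The Battle of Salamis",
    "Photosynthesis",
    "The Odyssey",
]

for title in episodes:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You assign Dewey Decimal Classification numbers."},
            {"role": "user",
             "content": f"Give the most likely Dewey Decimal class (number "
                        f"and label) for a radio episode titled: {title}"},
        ],
    )
    # Print the model's suggested category next to the episode title.
    print(title, "->", response.choices[0].message.content)
```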
DAVID WEINBERGER: According to Matt Webb, it was actually quite accurate. It's also lower stakes than many other situations. But that worked, and that was a single person's hobby, a let-me-give-this-a-try sort of thing. So that's a very positive case. But there are going to be many, many cases of hallucinations, which is, seriously, a technical term for when these large language models make things up. ChatGPT uses the GPT large language model.
DAVID WEINBERGER: These language models tend to make things up, which is becoming quite well recognized, and that's a very healthy thing. They make things up because they know nothing about the world. They only know what we have said about the world, and we have said many, many things about the world. Furthermore, they decompose sentences into words, and sometimes into word groups as well.
DAVID WEINBERGER: It's very complicated. It looks for patterns among all the words it has found on the internet and other sources, looking for statistically derived probabilities: for any word, the likelihood that it follows or is near other words. Some words are wildly different and rarely used together; others are very frequently used together.
DAVID WEINBERGER: But as you can imagine, it's an immensely complex statistical, algorithmic analysis. And what it comes up with is amazingly good, shockingly, stupendously good, this ChatGPT stuff. But because it doesn't actually have any way of assessing whether what it's saying is true, only whether the words would probably go together based upon prior usage, it can easily hallucinate.
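A toy illustration of the probability idea just described: estimate, from a tiny made-up corpus, how likely one word is to follow another. Real models work over enormous corpora, long contexts, and learned representations, but the underlying intuition is the same: word-to-word probabilities, not knowledge of the world.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus, already split into tokens.
corpus = "call me ishmael . call me maybe . call it fate".split()

# Count which word follows which.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def next_word_probabilities(word):
    """Return the observed probability of each word following `word`."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Roughly {'me': 0.67, 'it': 0.33} for this toy corpus.
print(next_word_probabilities("call"))
```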
DAVID WEINBERGER: And you come across this, I do anyway, pretty frequently, where it's doing a great job writing an essay on something I asked it to, and it will casually say something like, "When Twain was writing the Bible, he found..." And you think, where did that come from? Well, it was a hallucination. So that's a pretty big issue to worry about, and not just in the case of metadata or structured metadata.
DAVID WEINBERGER: Another way in which AI may help with metadata is expanding classifications. I think this is fascinating and potentially really useful, as well as potentially really frivolous, depending on what you're trying to do. We have gotten better at letting objects, including books, be classified in multiple ways. Tagging does that, and metadata schemes that are more open-ended enable that as well.
DAVID WEINBERGER: Machine learning can do that for us. We can ask for a category and have it pull it together, or it may automatically cluster objects in ways that we would not have expected. Some of the examples that you're looking at are frivolous, but frivolity is worth something, too. So you could ask GPT,
DAVID WEINBERGER: or another chatbot built upon a large language model, to find a book that's similar but in a different field, a wildly different field. It will do its best to do that, and it may be helpful, may be a waste of time, or may bring insight. You could ask, for example, for the same idea in a different field, or for the work that most disagrees with the sample work's thesis. You could ask it to generate a bumper-sticker version of a book and then ask it to find other books, in any field, that are summarized the same way by machine learning, and that might turn up interesting things or be a complete waste of time.
DAVID WEINBERGER: You could ask it to find the book that is maximally opposite, not just in terms of argument, a work that disagrees, but in other ways: these large language models are deeply, highly multi-dimensional. There are so many different ways that things relate that finding the opposite, in some algorithmically derived sense of "opposite" and "maximally," might give interesting results or might be a complete waste of time.
DAVID WEINBERGER: And similarly, possibly a waste of time: say you're reading something, you've really liked the tone of voice, and you want to cook dinner while you're reading it. Stupid example, but what the heck. You could ask it to find works that are like the one you are reading, but are cookbooks. Who knows? In a highly dimensional model, lots of things are possible. So I actually think there are serious uses of that capability.
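One way to sketch that kind of cross-field matching: reduce each work to a one-line summary, embed the summaries, and rank candidates by similarity regardless of field. This assumes the sentence-transformers library; the titles and summaries are invented for illustration, not real catalog data.

```python
from sentence_transformers import SentenceTransformer, util

# A small general-purpose embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical bumper-sticker summaries of works in very different fields.
query_summary = "An obsessive pursuit of an elusive goal destroys the pursuer."
candidates = [
    ("A physics memoir", "A lifelong, single-minded hunt for a unified theory."),
    ("A gardening manual", "Practical advice for growing vegetables year round."),
    ("A startup history", "A founder risks everything chasing one impossible product."),
]

# Embed the query once, then compare each candidate summary to it.
query_vec = model.encode(query_summary, convert_to_tensor=True)
for title, summary in candidates:
    score = util.cos_sim(query_vec, model.encode(summary, convert_to_tensor=True)).item()
    print(f"{title}: similarity {score:.2f}")
```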
DAVID WEINBERGER: But I'm going to move on to one where I think the seriousness is more obvious, which is non-binary inclusion, by which I do not mean the gender issue at all. What I mean is that if we use machine learning to categorize, in some sense, a work, back to Moby-Dick here, we could also ask it to tell us what confidence level the AI has that the work is a member of that category, because you get that more or less for free in machine learning. When it comes up with a result, it has a confidence level, conventionally measured between 0 and 1, about how likely its prediction is to be accurate.
DAVID WEINBERGER: And so if you have it generate a set of categories, or you ask it about some, some of them it's likely to be very confident about. And I'm making all of this up; I haven't done this experiment. But it's going to be, I'm going to guess, really pretty confident that Moby-Dick counts as literature and is fiction.
DAVID WEINBERGER: Is it about environmentalism? Should it be categorized under that? I mean, a lot of readers today think that it is, maybe as a counterexample, so I don't know; it will give us some confidence level. Is it about maritime law? I don't think so, but that sort of is in there, maybe by implication.
DAVID WEINBERGER: Is it a romantic comedy? You know, that's a 0.02 or lower. Again, I'm making this up, and it's pretty clearly not a romantic comedy, though maybe there are ways in which it is. That could actually be really helpful, I think: rather than just the category, also how confidently we think it belongs in the category.
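A minimal sketch of that "non-binary inclusion" idea: ask a model not just whether a work belongs to a category, but how strongly. This assumes the Hugging Face transformers zero-shot classification pipeline; the description text and candidate categories are illustrative stand-ins.

```python
from transformers import pipeline

# A zero-shot classifier scores arbitrary candidate labels against a text.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

description = ("A sailor narrates an obsessive captain's pursuit of a white "
               "whale across the world's oceans.")
categories = ["fiction", "environmentalism", "maritime law", "romantic comedy"]

result = classifier(description, candidate_labels=categories)

# Each category comes back with a score between 0 and 1, not a yes/no.
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```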
DAVID WEINBERGER: And the fourth use, the one that I want to talk about and end with, is a possible use of AI in the world of metadata which I'm going to call generative metadata. So I went back; ChatGPT was too busy to take inquiries, so I went to a competitor called Chatsonic, and I prompted it with the question you see before you: Dante's Inferno has three levels; what are 10 artworks that also show something in three parts? A couple of seconds later, it had come up with a pretty good set of responses. Actually, there were only seven, and one of them repeats.
DAVID WEINBERGER: So six. But six pretty good answers. We're going to rerun that in case you missed it the first time. Actually, I think I made the same mistake. He's going to go to the next slide.
DAVID WEINBERGER: No, OK. Thank you, good job. Some of the examples you look at right away and say, yeah, The Garden of Earthly Delights, yes. In fact, it's a triptych. So yeah, good catch.
DAVID WEINBERGER: Others are interesting, and some of them seem pretty stretched, like Starry Night by Vincent van Gogh, or "van Gogh," sorry, my Dutch accent is not good. It says this painting is divided into three distinct parts, the lower portion, the middle portion, and the uppermost portion, each with its own recognizable elements in the image, which seems like a pretty loose sort of response to the question that I asked.
DAVID WEINBERGER: On the other hand, maybe I'll look at it and think, oh, it's now showing me something I hadn't noticed. It may be useful in some cases; getting that sort of response is interesting. In fact, I think it's a way of thinking about a fourth kind of metadata: metadata that creates its own data. That text that I showed you, that it generated, did not exist before.
DAVID WEINBERGER: That was not copy and paste. Large language models do not do copy and paste; they are generating, because they've dissolved all the words that they're trained on into soup. It's not copy and paste, it's new content, it's new data. And so this fourth kind of metadata generates new stuff.
DAVID WEINBERGER: It does it on the basis of word relationships, which I still maintain are a type of metadata: that a word is some distance from some other word is metadata about that word. It's not data, it's not the word, it's information about the data. So I'm going to maintain that those word relationships are metadata, and that it produces something that is new on the earth, possibly because it's wrong, but also because it's been generated by a system, by a machine learning system.
DAVID WEINBERGER: This fourth kind of metadata has some serious issues, worrying issues. The first is reliability, as we've discussed. The second is, well, you don't have to be postmodern to believe that everything we express comes from a point of view. And I think that's certainly the case; I thoroughly believe that.
DAVID WEINBERGER: And I think it's true for machine learning as well, although the point of view is a default point of view, that is, the status quo as defined by the assemblage of all this text from the cultures it was assembled from, in which sources are given priority and weight and the like. By the way, I want to be clear: the sources that are given weight are things like stuff from Wikipedia, not individual articles, as opposed to stuff from the internet generally.
DAVID WEINBERGER: The responses from ChatGPT and other chat AI things may reflect the bias of the status quo as a default. There are good things about that and bad things. The good thing is that we need stable, agreed-upon knowledge. The bad thing is that the status quo is often, one might say inevitably, determined by privileged folks who get to do more of the quo in the status quo.
DAVID WEINBERGER: That's not a proper use of language. And related to that: these are expressions that are being generated by the large language models, by ChatGPT. They speak in a voice, because just as everything has a point of view, every communication has a voice, even if it's the default monotone voice of a telegram or something.
DAVID WEINBERGER: These systems have a voice. But whose voice is it? It's very bland, very average. It's confident, generally, not always. It speaks with a confidence and an authority, and I think it often comes across as smug. It's a type of bot-splaining that
DAVID WEINBERGER: is often actually an expression of power. Even though these are machines, and must have different power interests than we do because they have none, because they're just machines, they express a type of power, or they reinforce a form of speaking that is typical of a confident, empowered class of people. In fact, if you forced me to guess, I would say that
DAVID WEINBERGER: ChatGPT sounds like a guy to me. I mean, it's not mansplaining exactly, but it's something like it. I know that's a dumb thing to say, but it's not an entirely dumb thing to say. And because these large language models are unbelievably expensive, they require so much computing power that so far they can only be built by large corporations who have the scale of equipment required.
DAVID WEINBERGER: They're very expensive to make, and that puts further tremendous power in the hands of large corporations and organizations. So those are issues. The overall issue, though, is not necessarily a negative thing: these new ways of generating metadata further confound metadata.
DAVID WEINBERGER: Metadata has been getting more and more complex, and that's good. This further confounds it. In fact, it goes from being complicated to being genuinely complex and even chaotic, because the black boxes that are machine learning systems are filled with analyses and patterns in multi-dimensional arrays, vectors, that often we cannot follow; we can't understand them.
DAVID WEINBERGER: There are issues with that, for sure, deep and important issues. But it is sort of what happens when you go from anticipation to complexity, when you open your arms to that complexity to see what you can get that you hadn't even imagined: what is newly made possible by the complexity that is hidden inside of these boxes?
DAVID WEINBERGER: And this is a tremendous opportunity, commercially, sure, but it also enables applications to be more user-driven, more contextually aware about the user and about events and things in the world, more connected, and more transparent about their ground. Because ask why: why does machine learning tend towards being a black box?
DAVID WEINBERGER: Not always; lots of work is being done to open up the black boxes. But the natural default state, if we didn't ask anything of machine learning, is that it would go ahead and manage all the data without a care in the world about whether we can understand it. In many instances, we would not be able to understand it. So why does that happen?
DAVID WEINBERGER: It's not because machine learning has it in for us. It's because the data that's going in is reflective of the world, one way or another. And the world is the real black box. It's the world that's so complex. That's why machine learning is able to do what it does: it absorbs more of the complexity and does not require simplification of the data going in.
DAVID WEINBERGER: You just give it the data and it figures out the patterns, without us telling it what to look for or what we think the patterns are. And it will find minute patterns that, when added up with other patterns and so forth, will produce a result. That's because that's how the world works. That is, the world is infinitely complex, and again I'm misusing "infinite," but you understand: the complexity of the world,
DAVID WEINBERGER: the chaotic nature of the world, surpasses human intelligence as much as anything can surpass human intelligence. That's why machine learning works: because it's able to accept that complexity. And so I hope, anyway, that it becomes more apparent that the ground of the success and complexity of machine learning, and of the metadata that can now take more advantage of it by becoming more complex, is the world itself: a universe that is itself infinitely dimensional, dimensional beyond counting, that is so thoroughly interconnected that we are lucky we are able to walk on the surface of a planet. That's the truth about the universe, which I think we all know, but which we have managed to hide from by having the sense that,
DAVID WEINBERGER: yeah, that's true, but let's anticipate, let's prepare. And that strategy works really well; it's gotten us this far. But we can go further if we are able to acknowledge the truth about the world, which is that it is complex beyond prediction, beyond understanding. And now we have a technology that lets us take more advantage of that very fact.
DAVID WEINBERGER: Thank you. Thank you, David. That was a wonderful, wonderful presentation. So many pieces and threads to pull from this, from your murder board at the very beginning. I'm going to try to pull a few of those threads here with just a quick couple of questions from the chat.
DAVID WEINBERGER: We had one person ask: if you were to give this talk five years from now, would we still be considering metadata, or will machine learning and other AI technology have done away with that whole concept of creating metadata? Do you think we're going to reach a point where the curation part is overwhelmed by the complexity? So, I don't know what will happen in five years, having just ended
DAVID WEINBERGER: on the note of the wild complexity of this. I have to say that first, but that's not going to stop me from attempting an answer; I have little confidence in it, of course. I think there are plenty of uses where we want structured metadata that is put together, or at least carefully vetted, by humans. One of the very common modes of interacting with machine learning, already but increasingly, will be: you, machine learning, can take a rough cut at it, but this is important enough that we need to get some human eyes on it.
DAVID WEINBERGER: And some of it's going to be so important that we're going to let it go forward without human eyes, because we can't figure out how it's doing its work, and even though it makes mistakes, it's still doing better than humans. I'm thinking in particular of health fields, where diagnoses may actually not be fully explicable, although lots of progress is being made in this area. But we learn empirically that, yeah, it's generally right when it predicts you have this, and we should do something about it because it's serious, even though I can't tell you why the machine thinks you have it.
DAVID WEINBERGER: So there will be those cases as well, and it's going to come down to the risk and the utility and the benefit of the predictions, I think. One of the things that might happen is this: we've been making progress over the past 10 or 15 years in moving from relatively non-complex metadata to building ontologies,
DAVID WEINBERGER: in which many more elements, attributes of some domain, can be connected, and connected multiply. The linked data standard generated some apps that take advantage of it, and that generated interest in ontologies and a lot of very useful work being done by humans. That's a place where it may well be that a machine learning system can generate an ontology that humans can read and inspect.
DAVID WEINBERGER: I would expect that it would generate way more complicated ones than the ones we draw; they would have more features or attributes and richer connections, and it would draw that and let us take a look at it. And if that form of data representation becomes common, then that, I don't know, contains a lot more information than, say, a relational database, because there are more connections.
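A minimal sketch of what a machine-proposed ontology fragment that humans can read and inspect might look like, expressed as linked data triples. It assumes the rdflib library; the vocabulary and triples are hand-written stand-ins for what a machine learning system might propose, not output from any real system.

```python
from rdflib import Graph, Literal, Namespace, RDF

# A hypothetical namespace for book-related attributes.
EX = Namespace("http://example.org/books/")

g = Graph()
g.bind("ex", EX)

# Triples a system might propose: richer, more numerous connections
# than a traditional catalog record would carry.
g.add((EX.MobyDick, RDF.type, EX.Novel))
g.add((EX.MobyDick, EX.hasTheme, Literal("obsession")))
g.add((EX.MobyDick, EX.setIn, Literal("the Pacific Ocean")))
g.add((EX.MobyDick, EX.relatedTo, EX.WhalingIndustry))

# Serialize in a human-readable form so a cataloger can review the proposal.
print(g.serialize(format="turtle"))
```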
DAVID WEINBERGER: Sure. And I don't expect that I'm going to be using machine learning to set up my spice drawer. Actually, I would love it. I wish I could give it all the books that I have, including where they are on my shelves, and have it help me understand myself that way. Oh, and also find my books.
DAVID WEINBERGER: That's a fascinating point, that finding the connections within the ways that you yourself organize the data might tell you something about your own thought processes behind the scenes, more than just chronology. Yeah, yeah. You never know.
DAVID WEINBERGER: Yeah, it could be. It could be completely banal reasons, like you like certain colors together, right? Or something not very helpful for finding things. Yeah, yeah. I'm sorry, the blur behind me is because of my office. It's mainly books anyway.
DAVID WEINBERGER: All right, so one more question. Maybe two if we have time, but one more. We had a couple of comments in the chat and also in the Q&A where attendees were making connections between these large language models, ChatGPT being the model du jour, but obviously there are others, both in history and in the near future, that are going to start emerging, and treating them as search engines, or using them in a sort of search-engine-like way.
DAVID WEINBERGER: And I've had discussions personally with librarians and others online over the last few weeks about that popular understanding of these large language models as engines of discovery rather than as, I don't think you used the word remix, but certainly that's the more technical sort of way of understanding what's going on in the background.
DAVID WEINBERGER: Can you speak a little bit about your take on the limits of this process, the large language model process, and how it relates to what we would more traditionally think of in the information science world as a search? Because my background doesn't let me put those two together too closely; they seem very distinct in my brain, but I don't think that's the case in popular media or understanding right now.
DAVID WEINBERGER: I know. And in fact, right now as we speak, Bing is launching its chat AI interface, and Google is going to very soon as well; it's perceived as lagging behind. It's intensely competitive, and there's a huge amount of pressure on the search engines to do this. And there are searches for which it's fine; that is what you want.
DAVID WEINBERGER: But there are also searches you should be very worried about. Students, well, researchers in general, and all of us... Oh, I've forgotten the name of the person whose term this is. That's terrible. It's Alison.
DAVID WEINBERGER: See, when you get older, with a person with two names you can only remember one of them, literally. Oh, I feel bad, because she's also a friend, and a very smart and good person. Well, OK. She has done work, about 10 years ago I think, field work on how students use university libraries.
DAVID WEINBERGER: Mm-hmm. And among her findings, at this point about using Wikipedia as well, is that many of them, they even say this, typically use Wikipedia not for research but for pre-search. It's a way of getting a grounding in the field. Right? You know nothing about it, you've got to write a paper on it, so look it up on Wikipedia. It's actually fantastic for that, and it's generally really reliable.
DAVID WEINBERGER: The chat AI stuff seems to me like, maybe, and I'm guessing of course, it may be very appealing as a pre-search tool, one from which you then branch out, and maybe for many people more useful than Google- or Bing-style search. But then you end up going into those traditional search engines in order to dig deeper once you know what it is you're looking for.
DAVID WEINBERGER: It may be, who knows, that we'll go back to using the old-style search that we're all currently using. One of the disadvantages of some of the possible implementations of the AI interface on top of a search engine is, let's say you ask it some question and it gives you a nice two- or three-paragraph essay, and, if we're lucky, it has links out, but you're not seeing the search results that we've all grown up with.
DAVID WEINBERGER: A list of pages you can go to. If most of us end up only using the newly AI-generated text, then we're in danger of losing sight of something that search engines have been great at, which is not just searching, but necessarily showing us, in a list, that there is an endless amount of knowledge, ideas, and research out there, and that knowledge is not a settled thing.
DAVID WEINBERGER: It's not the end product of your research. It actually lives in those thousands or millions of links, which are interconnected because everything is linked on the internet, and which have different ideas, different levels of authority, different truth values, different biases and perspectives. And if we lose that, we've lost something, a really important advance that the internet has given us, which is being able to see knowledge as it actually is, not a summation of it.
DAVID WEINBERGER: It turned us all, in that sense, a very limited sense, into scholars. That is, in the scholarly world, you're aware of all or most of the other work that's being done by other people; you're aware of which one supports which. That's the way it works: there are arguments all the time, not nasty, sometimes, but each of the pieces you see is referring to others and refuting or accepting them.
DAVID WEINBERGER: Now we're all in that environment, even if it's not scholarly work. It's, you know, how do I unlock the trunk on my car, or what's a good recipe, et cetera. It's tremendously helpful. It's very healthy for a culture to see this all the time. If I had to predict, and I don't have to, but I will anyway. So why am I? Excellent question.
DAVID WEINBERGER: I would think that the search results are going to stay a part of it, even if a secondary part. And I don't know at this point, for sure, about the manufactured, the generated answers. Every time one of those generated answers contains an error, people are going to laugh. It's going to be tweeted, or tooted on Mastodon, which I cannot say with a straight face, much as I like Mastodon. People are going to be having fun with that.
DAVID WEINBERGER: And every time that happens, it's going to wear down trust in those, which is good, because we need to be careful about them. They have no anchor, they only anchor to, sorry, they only anchor to language; they don't anchor to the world. Yes, absolutely. As a fellow recovering philosopher,
DAVID WEINBERGER: my academic background is in philosophy, much like yours, and that answer reinforces my belief that information science should include an epistemology class at this point, because there are long and deep conversations to get into about grounding all of this in truth and reality in some way, which is a question I would love to get into talking with you about.
DAVID WEINBERGER: I think we are out of time for this particular talk. So, David, again, thank you so very much. I appreciate you being willing to be a part of NISO Plus and to spend a little extra time to make sure that we get this recording nailed down for the attendees. I do appreciate it. Thank you for inviting me, thank you for your patience with the technical problems that were my fault, and thanks for the many years of friendship and conversation, Jason. Thank you, David.
DAVID WEINBERGER: It's interesting to draw a connection between the way you started the presentation, with a two-dimensional map, a kind of murder board as someone mentioned in the chat, connecting the dots between different places, and the way you ended, on this multidimensional, complex world.
DAVID WEINBERGER: How do you see information professionals managing this multidimensional ecosystem, where cataloging isn't any longer "here's the book, come up with five different keywords," but is a much more embracing ecosystem connecting the content in numerous ways?
DAVID WEINBERGER: How do you see professionals working in that, whether at publishers who are trying to create marketing metadata or librarians who are creating metadata for discovery? I think there's still tremendous need. I hope what I'm about to say is completely obvious: there's a continuing tremendous need for well-structured metadata. With the growth of data systems and the growth of ontologies, whether in linked data or some other form being put to use, the set of attributes that we care about goes from being limited to what the system can handle to anything that we think might be of interest or of use to our users.
DAVID WEINBERGER: So in the world of books, it goes from the relative handful of attributes that are created and managed by humans, and made available to humans to use, to a much broader set that is also created by humans but is much more generous in its willingness to connect attributes: not just attributes that people will maybe want to search on, but attributes that seem to have some semantic meaning, and even some that may not but may work in conjunction with other attributes to
DAVID WEINBERGER: turn up interesting things. I'll give you a pretty terrible example. Integrating weather data into book data, you'd have to do this historically, would let somebody wonder whether sad poets tend to come from cloudy areas, or rainy areas, dreary areas. As I say, it's not a great example of an actual use, but it's not a bad example of how unexpected metadata might turn out to be useful.
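A minimal sketch of that "sad poets and rainy places" example: join author metadata with weather data on a shared attribute and look for a pattern. This uses pandas, and all of the data, column names, and scores are invented purely for illustration.

```python
import pandas as pd

# Hypothetical author metadata, including a made-up sentiment attribute.
poets = pd.DataFrame({
    "poet": ["Poet A", "Poet B", "Poet C"],
    "hometown": ["Rainyville", "Sunnytown", "Rainyville"],
    "melancholy_score": [0.9, 0.2, 0.7],
})

# Hypothetical historical weather data keyed on the same place names.
weather = pd.DataFrame({
    "hometown": ["Rainyville", "Sunnytown"],
    "annual_rainy_days": [210, 60],
})

# Join the two datasets on hometown, then see whether melancholy
# tracks rainfall in this (entirely invented) sample.
merged = poets.merge(weather, on="hometown")
print(merged.groupby("annual_rainy_days")["melancholy_score"].mean())
```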
DAVID WEINBERGER: With the growth of machine learning, we get some of that sort of for free, because it will discern what it thinks are the attributes that enable it to form patterns that help differentiate the books whose information it has been given. Managing that is a fundamentally different task, I think, than
DAVID WEINBERGER: trying to structure a catalog or a linked open data set or an ontology for the category. The machine learning will find attributes that we didn't see. And so, first of all, we need help creating those machine learning systems: deciding what data to use, how to balance it, seeing if we can manage to provide locally based queries.
DAVID WEINBERGER: That is, ones that will give different answers depending upon the culture of the person asking or the culture in which the data is embedded; right now, as far as I know, that is not captured in large language models. Prompt design is going to matter, I think. We're going to need a lot of help learning and teaching how to write the right prompts, prompts that get us the results that we want.
DAVID WEINBERGER: On the other hand, maybe the machine learning folks are going to figure out how to design systems so that they're more responsive to the widest range of prompts that humans come up with. So in fact, I think there will be more and more need for structured metadata, where the structures get expanded and more and more categories of attributes are included, with machine learning, I think and hope, helping with that.