Name: Multilanguage metadata Recording
Uploaded: 2024-03-06T00:00:00.0000000
Duration: T00H44M02S
Description: Multilanguage metadata Recording

Thumbnail URL: https://cadmoremediastorage.blob.core.windows.net/6214c6e3-26cc-4d46-9c1d-24bee007ea60/videoscrubberimages/Scrubber_3.jpg

Embed URL: https://stream.cadmore.media/player/6214c6e3-26cc-4d46-9c1d-24bee007ea60

Content URL: https://cadmoreoriginalmedia.blob.core.windows.net/6214c6e3-26cc-4d46-9c1d-24bee007ea60/Multilanguage metadata -NISO Plus.mp4?sv=2019-02-02&sr=c&sig=0T09VwdxiRSxatGE%2BM%2BJ%2BLyJVmWHumA%2BDU2mhWw1X5c%3D&st=2025-01-22T04%3A15%3A51Z&se=2025-01-22T06%3A20%3A51Z&sp=r

Transcript: Language: EN.
Segment:0 .
GREGORY GRAZEVICH: Hello and welcome to the NISO Plus 2023 session, Multilanguage Metadata.
GREGORY GRAZEVICH: I'm Greg Grazevich and I work for the Modern Language Association of America where I am the associate director of bibliographic information services and editor of the MLA International Bibliography. I'm very pleased to serve as moderator for this exciting and timely discussion. This session features panelists from East Asia and North America who work with metadata in a wide range of languages with different writing systems.
GREGORY GRAZEVICH: I will ask our speakers to introduce themselves. Let's begin with Juan Pablo.
JUAN PABLO ALPERIN: Great thank you. Good morning. Good evening. Good afternoon. Wherever you may be. And thank you very much for taking the time to join us today.
JUAN PABLO ALPERIN: And thank you all for the invitation to join this panel to talk about something that, as you will soon hear, is near and dear to my heart. I want to talk a little bit about issues of multilingual metadata and why they matter. As I was just introduced, my name is Juan Pablo Alperin. I'm an associate professor in publishing at Simon Fraser University in Vancouver, Canada, where I'm speaking to you from today.
JUAN PABLO ALPERIN: But I'm also the scientific director of the Public Knowledge Project, an initiative that I'm sure many of you have heard about, and that creates open source software that is used by a variety of journals from around the world. The presentation that I'm giving today, although it is work that I'm very much personally involved in, is also going to present work done as part of an ongoing project that I'll tell you a little bit more about, on behalf of Mike Nason, Marco Tullney, Julie Shi, and Dennis Donathan.
JUAN PABLO ALPERIN: So I want to just acknowledge their work that has gone into what we will be presenting to you today. For those that are not familiar with the Public Knowledge Project, I only have a few minutes so I won't tell you too much about it. But just so you know, it's a project that has been around since 1998. It's a project that is primarily based here at Simon Fraser University.
JUAN PABLO ALPERIN: But we have team members spread out around the world. We've been creating open source software, and the one that we're best known for is Open Journal Systems, which, as I'll tell you in a few moments, is software that is very widely used for publishing academic journals around the world. In fact, just recently we published an article, which is already available in the journal QSS, that's Quantitative Science Studies.
JUAN PABLO ALPERIN: And the reason that I'm pointing out this article is because it really describes, I think for the first time, the vast extent of how many journals are using Open Journal Systems and where they are in the world. The mission of the Public Knowledge Project has really been to increase participation, and especially global participation, in scholarship.
JUAN PABLO ALPERIN: And trying not only to improve the quality of that scholarship and make it available open access, as the name Public Knowledge implies, but really believing that quality is improved when we're creating knowledge with everyone around the world who is trying to do so. And so this publication is one that I invite you to read, because it will give you a sense of what we believe is a large part of the scholarship, a lot of it not indexed in the commercial databases that are usually used to track scholarship, and a much larger sense of the truly global endeavor that is research and science. Just to highlight the figures for what we see published in Open Journal Systems:
JUAN PABLO ALPERIN: When we did the study with the data that was available, there were around 20,000 to 25,000 journals using Open Journal Systems spread out over 9,500 installations, so a few journals per installation, and over 1 million articles published in 2020 alone. And that number of journals has gone up since. We've published a more recent data set, without yet having done all of the analysis on it, but it's over 30,000 journals now that are publishing according to our latest figures.
JUAN PABLO ALPERIN: And I'm pointing this out, again, not just to give you the context of where I'm coming from and the project that I've been involved with for 15 years, but because Open Journal Systems has really been set up to be journal software and a management platform that is intended for use across languages and across cultural contexts. The platform is set up so that not just the interfaces can be in other languages, as in many systems where what you're clicking on can be translated, but actually all of the metadata.
JUAN PABLO ALPERIN: And this is just a quick example, taking a screenshot from the latest version where you can see that the title can be entered in multiple languages. In this case, the primary language was set to English, but then also you're able to set the Spanish. And we have a little iconography that tells you whether titles have been translated into other languages or not, really trying to facilitate the insertion of multilingual metadata into the scholarly record.
JUAN PABLO ALPERIN: And without going into all the details about what OJS does to help point out issues with the metadata: if a field hasn't been translated into one of the languages, you can see that. But also, for those that don't know, Open Journal Systems is able to then send that metadata off to places like Crossref and ORCID so that the metadata in all of its multilingual forms is available across the rest of the publishing record.
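As a rough illustration of the kind of completeness check described here, the sketch below reports which locales are missing a value for each metadata field. The record shape, the field names, and the `en`/`es` locale codes are assumptions made for this example; this is not OJS's actual data model or API.

```python
from typing import Dict, List

# Hypothetical record shape: field name -> {locale code -> value}.
# This is an illustration only, not the OJS data model.
Record = Dict[str, Dict[str, str]]


def missing_translations(record: Record, locales: List[str]) -> Dict[str, List[str]]:
    """Return, for each field, the locales whose value is absent or blank."""
    gaps: Dict[str, List[str]] = {}
    for field, values in record.items():
        absent = [loc for loc in locales if not values.get(loc, "").strip()]
        if absent:
            gaps[field] = absent
    return gaps


# Made-up example record: the Spanish abstract has not been entered yet.
record = {
    "title": {"en": "Open access in Latin America",
              "es": "Acceso abierto en América Latina"},
    "abstract": {"en": "We survey journals.", "es": ""},
}

print(missing_translations(record, ["en", "es"]))  # {'abstract': ['es']}
```

A check like this only shows that a value exists for a locale; it says nothing about whether the value is really written in that language.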
JUAN PABLO ALPERIN: Just to give you, again, a couple of highlight figures: mostly here I want to paint the picture of the truly global nature of scholarship, which I think is so often not thought about. We often concentrate very much on the North American and European context, and the world of scholarly communication way too often centers on those places. But we see that over half of these 25,000 journals are published in other parts of the world: about 80% of them are published in the Global South, over half of them in the Asia-Pacific states, and Latin America is also a big user of the software.
JUAN PABLO ALPERIN: And when we look at the languages of what's published in the software, we do see a dominance of English. That's not a surprise, but what is perhaps surprising to many people is that around 23%, over 5,000 of the journals, are primarily publishing Indonesian content, with Spanish and Portuguese also being quite prominent. When we look into journals that are publishing in more than one language, we find that a little less than half of the journals publish in just a single language.
JUAN PABLO ALPERIN: And actually the rest of them are publishing in two languages or more, and in some cases in three or more languages. That means that they have articles published in all of these different languages. The point here, again, is to say that there are actually thousands of mostly small, independent, distributed journals around the world that are publishing in multiple languages and publishing from very different cultural contexts than what, again, the scholarly communication world often centers on.
JUAN PABLO ALPERIN: And in all of these cases, I want to make the argument that the language that they're publishing in really matters, and it has an effect both on what is being captured and on how people are able to work and what they're able to do. In the remaining time that I have, let me pivot to talking a little bit about a project that we started with some support from Crossref.
JUAN PABLO ALPERIN: They had a call out for projects just a year or so ago. The ScholCommLab, which is the research group that I lead here at Simon Fraser University, and the Public Knowledge Project collaborated to present a project that we are, for short, calling Metadata for Everybody. It's really a project aimed at understanding the kinds of metadata quality problems that exist as they pertain to language and culture in particular, and the quality problems that emerge in what we're finding in the metadata record.
JUAN PABLO ALPERIN: And we decided to analyze that in a two-phase approach. The first was to take a very purposeful sample of 427 records. We collaborated with Crossref, and we drew on our own knowledge. This is looking beyond journals using OJS; I just wanted to use the journals using OJS to set the scene a little bit. We took records from across the whole Crossref database, from the many millions of records that they have.
JUAN PABLO ALPERIN: And we took 427 from journals and from instances where we anticipated there would be a problem, as a way of trying to classify and get a sense of what kinds of issues existed. And now we are in the midst of phase two, and I'm going to present to you things that we haven't actually shared out yet. It's very preliminary work, where we've taken a completely random sample of 100,000 Crossref records.
JUAN PABLO ALPERIN: And we've tried to see to what extent we can actually measure and identify, through computational approaches, those same kinds of issues that we were able to find. So in that first phase, again, we were looking very closely. This is where Julie, whom I mentioned earlier, comes in: Julie Shi went through and read not only the records, but then compared them to the actual articles that were published and really tried to analyze what was happening.
JUAN PABLO ALPERIN: And we tried to classify these different kinds of issues that we were able to find, and we were able to see that many of these issues are things that we are saying, and I can get into a little more of the details on this in the conversation, are emergent from or somehow linked to culture.
JUAN PABLO ALPERIN: So these are things to do with, for example, naming conventions and how people write their names; with the scripts that they're writing in, if they're not using Roman characters; or with languages and trying to include things in two or three or multiple languages. Just to give you a couple of concrete examples, we have found examples where the metadata is only in the Roman alphabet,
JUAN PABLO ALPERIN: when we click through and see that the article is actually published using another script. Yet in the metadata record, we only see it in that one script, or sometimes only see a transliteration of those characters, but not an actual translation of the content. We see instances where people present things in all caps, like surnames, or the surnames of only some of the people,
JUAN PABLO ALPERIN: again trying to signal something with this capitalization, and there are different cultural norms around the use of those characters. We obviously see missing translations, where some elements are translated in the actual document but not translated in the metadata. There are instances where the stated language is different from the language actually used: the record says it is in English, but when you actually look at the document, it is in another language. And we see some instances where they cram multiple languages into the same field.
JUAN PABLO ALPERIN: So you'll see an abstract field that has multiple languages written into it. And so, again, just a little smattering. I didn't really have time to break down those 33 different types, but this is just to give you a little bit of a sense of the work that we're doing, and I'm happy to continue that in the conversation. Next, I just want to quickly walk you through a couple of the analyses that we've done to see how prominent these errors are, where they are, and who is making them.
JUAN PABLO ALPERIN: The good news is that, across different publisher sizes, the number of errors is decreasing over time, from what we're able to see: they are more prominent in older records and a little less prominent in newer records. We can also see that publishers publish in different languages to different extents, depending on the size of the publisher. We just binned them into sizes like small, medium, and big.
JUAN PABLO ALPERIN: You see that small publishers are the ones that are primarily publishing in languages other than English. We see that the extra large publishers publish primarily in English. In terms of the different kinds of errors that we're seeing, we find that, for example, not entering what language the record is in happens in about 20% of cases. So 20,000 of the 100,000 records don't actually say what language the record is in.
JUAN PABLO ALPERIN: Again, the presumption there is that everything is in English. But as you can see, depending on the publisher, that is not always the case. Of those records that have a stated language, we find that around 9% are monolingual but non-English. Of the multilingual records, we find that around 6% have more than one language, but almost all of them are English and another language. And so English continues to be dominant in that regard.
JUAN PABLO ALPERIN: And in about 5 and 1/2 percent of the records, we see that example where the record says it's in one language, but when you look at the title and try to detect the language automatically, we find that it's in a different language than the one that is stated. And finally, just one more little tidbit. In fact, here is around...
JUAN PABLO ALPERIN: we find only a small number of records, about 1 and 1/2 percent, where they do that thing of inserting multiple languages into a single field, such as the abstract. I know I'm out of time here, so I just want to say two things about why we can see that language matters: we see that different errors are more or less prominent depending on the language.
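The two automated checks just described (a stated language that disagrees with the title, and several languages crammed into one field) can be roughly sketched as follows. This is not the project's actual method: it swaps full language detection for a much cruder check of Unicode scripts, so it cannot tell English from Spanish, for example. The language-to-script table, the function names, and the 20% threshold are all assumptions made for this illustration.

```python
import unicodedata
from collections import Counter
from typing import Dict

# Assumed mapping from a stated language code to the script its titles
# are expected to use. Illustrative only, and deliberately incomplete.
EXPECTED_SCRIPT = {
    "en": "LATIN", "es": "LATIN", "pt": "LATIN", "id": "LATIN",
    "ru": "CYRILLIC", "he": "HEBREW", "ar": "ARABIC", "ko": "HANGUL",
}


def script_shares(text: str) -> Dict[str, float]:
    """Share of letters per script, where the script is approximated by
    the first word of each character's Unicode name (e.g. 'CYRILLIC')."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()} if total else {}


def stated_language_mismatch(stated_lang: str, title: str) -> bool:
    """True when the title's dominant script is not the one expected
    for the stated language. Unknown languages are never flagged."""
    shares = script_shares(title)
    expected = EXPECTED_SCRIPT.get(stated_lang)
    if not shares or expected is None:
        return False
    return max(shares, key=shares.get) != expected


def has_mixed_scripts(field_text: str, min_share: float = 0.2) -> bool:
    """True when at least two scripts each make up a sizeable share of
    the field, a hint that several languages share one field."""
    shares = script_shares(field_text)
    return sum(1 for share in shares.values() if share >= min_share) >= 2


print(stated_language_mismatch("en", "Метаданные для всех"))
print(has_mixed_scripts("Metadata for everybody. Метаданные для всех."))
```

A real pipeline would more likely use a trained language identifier, since a script check misses every mismatch between languages that share an alphabet.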
JUAN PABLO ALPERIN: So we see that records in some languages tend to have more errors than others. Again, this shows that the cultural context that is generating these records is creating things that we are classifying here as errors, but that in some instances might just be capturing a different cultural reality. And so, as a conclusion, I just want to say that we need to really remember that scholarship is truly global and multilingual and that metadata needs to accommodate this diversity.
JUAN PABLO ALPERIN: We need to be careful about what sort of metadata we're trying to, quote unquote, fix, because in many instances we see that what is in the records is a reflection of just a different context and a different way of seeing the world. I'll finish with that. Thank you very much for your time. I look forward to the questions.
HIDEAKI TAKEDA: I will talk about multi-language issues in scholarly publishing in Japan. My name is Hideaki Takeda. I'm chair of the Japan Link Center, one of the registration agencies of the DOI Foundation, and my main job is professor of computer science at the National Institute of Informatics. Let's start.
HIDEAKI TAKEDA: So, first of all, I will explain what the Japan Link Center is. The Japan Link Center, or JaLC, is a registration agency of the DOI Foundation, founded in 2013 by these four leading institutes in Japan. Currently there are over 70 members and around 3,000 associate members. And an important thing: our scope for DOIs is scholarly publication in Japan.
HIDEAKI TAKEDA: So I talked about the meaning of DOIs for domestic publishing in Japan. Then what about scholarly activity in Japan? Actually, our activity is a mixture of domestic and international activities, but it differs by discipline. For example, in the natural sciences like mathematics, physics, and chemistry, the major language for scholarly publication is English.
HIDEAKI TAKEDA: Japanese is only used for domestic communication, not for publishing achievements. But the situation is different in engineering, medicine, agriculture, these kinds of domains. There are a lot of practitioners there, like engineers, doctors, and nurses.
HIDEAKI TAKEDA: Scholarly societies have a lot of such practitioners as members, so publications are also intended for such people. So Japanese is also an important language for publishing scholarly works. But scholars are also working for the international community.
HIDEAKI TAKEDA: So two languages, Japanese and English, are both used. Then social science and literature. There the main language is Japanese, because, especially for cultural issues, it is very important to discuss in Japanese. So that is the point.
HIDEAKI TAKEDA: And cultural studies also include various languages other than English and Japanese, so publications also use such various languages. The point is that the communities actually overlap. Let's see an example. This is my list of publications in one year, and you can see that some publications are in Japanese and some in English.
HIDEAKI TAKEDA: I'm working in computer science, so this is the case of one computer science researcher. OK, let's see what happens in JaLC DOI registration. This is data about the language of titles. Roughly, JA is Japanese, EN is English, and UNK is unknown,
HIDEAKI TAKEDA: but it's presumably Japanese. Simply put, Japanese to English is 2 to 1. OK, let's look at it more closely. This yellow part is J-Stage, one major journal platform. There the proportion is 10 to 7. The blue part is JAIRO or IRDB, which is a regional institutional repository.
HIDEAKI TAKEDA: Here it's 6 to 1, so there is some difference by source of data. Let's look at J-Stage closely. J-Stage, the most popular journal publishing platform in Japan, is operated by JST. Japanese societies are the actual users, and they use this platform to publish their journals.
HIDEAKI TAKEDA: So the main target is Japanese journals, because these domestic societies are mainly for Japanese scholars. Researchers are also publishing English papers in international journals, but international publishers are not included here. So look here.
HIDEAKI TAKEDA: The percentage is a little bit different by field: literature, law, and politics have a high share of Japanese, and mathematics, physics, and the like have a low share. So there are very different cultures in different disciplines. The institutional repository case is very different, because the major part of the articles comes from department bulletins, which I think are also part of the culture of Japanese universities.
HIDEAKI TAKEDA: It's a kind of periodical published by university departments, and it also looks like a journal. The most dominant language is Japanese, so Japanese titles are the majority. But it's not such a simple case, actually. The metadata is bilingual: one article can have two sets of metadata, one in English and one in Japanese.
HIDEAKI TAKEDA: Publications in Japanese often have English metadata in addition to Japanese metadata. That is a way of introducing activity in the Japanese scholarly community to English-language communities. You can see here some areas of overlap of activity. Some researchers publish an article in Japanese, but they want to tell the international community about the activity. That is the main role of English metadata.
HIDEAKI TAKEDA: This example is also taken from my publications. This is actually an article written in Japanese: the title, the authors' affiliations, and of course the content are in Japanese. But here you can see the English title, authors in English, affiliations in English, and also an abstract in English. This is typical in computer science.
HIDEAKI TAKEDA: So in JaLC, to handle this kind of metadata, journal metadata items can currently have values in multiple languages: title, creator, affiliation, abstract, and keyword. Looking closely, one third of the articles have two sets of metadata, and two thirds have one.
HIDEAKI TAKEDA: In the case of J-Stage, there are not so many English journals. The main type is the Japanese journal. A mixed journal is one allowing both Japanese and English articles, but the majority of articles are in Japanese. So you can see both Japanese and English
HIDEAKI TAKEDA: metadata appearing in these journals. There are some issues with bilingual metadata. One system issue is the treatment of multiple data values by language. It was once a serious problem, because some international systems and international activities only allowed one language, English.
HIDEAKI TAKEDA: Crossref metadata was once like that. But now that has changed, and most systems can treat such bilingual metadata. OK, that's good. But there's still a problem with search: search in a single language, mixed, in a selected language, or all together.
HIDEAKI TAKEDA: Or mapping between systems. For example, when we offer our metadata to other systems, there are differences in primary language. In the DataCite case, English is the major language, so we offer English metadata first and otherwise Japanese metadata. That's our decision. So how to provide data to other systems is sometimes problematic.
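The "English first, otherwise Japanese" decision can be sketched as a simple language-preference fallback over a field that holds one value per language. This is only an illustration under stated assumptions: the function name, the dictionary shape, and the sample titles are made up for this example, and it is not JaLC's or DataCite's actual mapping code.

```python
from typing import Dict, List, Optional


def pick_value(values: Dict[str, str], preference: List[str]) -> Optional[str]:
    """Return the first non-empty value following the language preference,
    falling back to any other non-empty value, or None if there is none."""
    for lang in preference:
        value = values.get(lang, "").strip()
        if value:
            return value
    for value in values.values():
        if value.strip():
            return value.strip()
    return None


# Made-up bilingual title, keyed by language code.
title = {
    "ja": "学術出版における多言語メタデータ",
    "en": "Multilingual metadata in scholarly publishing",
}

# A target system whose primary language is English.
print(pick_value(title, ["en", "ja"]))           # English title
print(pick_value({"ja": title["ja"]}, ["en", "ja"]))  # falls back to Japanese
```

Passing the preference list as a parameter keeps the same record usable for targets with a different primary language.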
HIDEAKI TAKEDA: And some semantic issues exist. One is the relationship to the content language. We have some English metadata for Japanese content, but what does this metadata mean? Is it just a translation, or something more? Because this kind of metadata is attached by authors, it is also related to the author's intention:
HIDEAKI TAKEDA: how much they want to deliver the content to an English-speaking audience, given that the content itself is in Japanese, and also how they want to express the metadata. A good example is the expression of names, like a local name or an English name. This is my case in ORCID: I put my name in the alphabet, and I put the alternative name in Japanese.
HIDEAKI TAKEDA: Because I use ORCID mostly for international communities, I prefer the alphabetic name as the primary one, but it depends on the researcher. An emerging issue is that automatic translation is coming. People can now access content, as well as metadata, via automatic translation. That brings new issues such as accuracy of translation,
HIDEAKI TAKEDA: risk of misunderstanding, and also authorization of translation, that is, formal or informal translation. And also willingness to be translated, which is about authors' expectations. Sometimes it's a surprise to authors that their content or metadata is translated. Sometimes it's serious, because of the risk of an unexpected audience, since some material includes ethical issues
HIDEAKI TAKEDA: that are very sensitive. OK, those are the emerging issues with automatic translation. Let's look at one case. It is again my case: this is funding information. This is domestic funding by the Japanese government, and I didn't know that the name of the fund had been translated.
HIDEAKI TAKEDA: Someone translated it into English automatically. That was a surprise to me. So in summary: multilingual metadata and content exist in some disciplines in scholarly publishing in Japan. How are other languages' cases similar or different? There may be two or more sets of metadata by language for a single publication. Some issues exist, but they are not so serious. The introduction of automatic translation will cause some change of culture.
HIDEAKI TAKEDA: Yeah, it's a big issue. Language is a big barrier, but now it's falling down. Some say it's good and welcome it, but some complain about such change. That's OK. Someone always complains about something anyway.
HIDEAKI TAKEDA: OK that's it. Thank you for listening.
JINSEOP SHIN: Hello, everyone. Today I am going to talk about Korean language and multilingual metadata in science and technology fields. The subtitle is "From table of contents to content." My presentation consists of three parts: Hunminjeongeum, Korean language diversity, and multilingual metadata.
JINSEOP SHIN: First of all, I'd like to talk about Hunminjeongeum, or Hangul, which is the Korean alphabet. King Sejong the Great created Hunminjeongeum in 1443 so that the common people could easily read and write the Korean language. He published the Hunminjeongeum manuscript on October 9, 1446, which is now Hangul Day in South Korea.
JINSEOP SHIN: It contains explanations and examples of the Korean alphabet. Hangul is the most scientific script, able to express almost all the sounds on Earth with 24 very concise letters. Korean is concise, so we can easily express words with our fingers. I learned the Korean finger alphabet in 10 minutes.
JINSEOP SHIN: To move on to the next part.
JINSEOP SHIN: About 80 million people living on the Korean peninsula and in Northeast China speak Korean as their first language. Korean is the official language of South Korea, North Korea and China. There are two studies analyzing the regional diversity of the Korean language. The first study found that about 37 percent of Korean words
JINSEOP SHIN: have different spellings or are not commonly used in Chinese Korean. The second study analyzed the terminology of middle school computer textbooks in South Korea and North Korea. It found that grammar and pronunciation are almost the same. However, there are significant differences in vocabulary. The two regions have different language policies.
JINSEOP SHIN: South Korea uses foreign words as they are written. However, North Korea adopted a policy of transcribing computer terms in Korean. As you can see at the bottom of the page, for the English word "server," we write it as "server."
JINSEOP SHIN: It corresponds to the sound of the original word. But in North Korea, they call it "bong sa gi," which means "serving device." Now, on to multilingual metadata for scientific articles. I asked a chatbot what multilingual metadata for scientific articles is. It answered with a definition and gave some examples and benefits of multilingual metadata.
JINSEOP SHIN: The most important thing is that multilingual metadata makes the article more easily discoverable and accessible to wider audiences. Moreover, it improves the visibility of the articles on search engines. KoreaScience is the platform to disseminate Korean research articles worldwide.
JINSEOP SHIN: We make metadata as written in the articles. Korean academic societies request that authors write bibliographic items in Korean and English. However, authors write the full text of articles in Korean or English, not in multiple languages. KoreaScience displays the metadata of the two languages together. The display order depends on user selection.
JINSEOP SHIN: The Korea DOI Center is a national DOI registration agency in Korea. When content holders assign DOIs to their content, we recommend that they put the author's name, affiliation, title, and source title of the content in Korean and English together. It facilitates content discovery and provides easy ways for readers to cite them with a DOI.
JINSEOP SHIN: We call it content negotiation. Now, to go from the list of contents into the content: at the beginning of the COVID-19 pandemic, we hired more than 2,000 young people. They made the HTML full text of 500,000 research articles on the online platform.
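For readers unfamiliar with the DOI content negotiation mentioned above, here is a minimal sketch of how a client can ask the DOI resolver for a formatted citation by setting an `Accept` header. The DOI is a placeholder, the function name is made up for this example, and whether a particular registration agency returns this format for a given DOI is an assumption; the network call is left commented out.

```python
import urllib.parse
import urllib.request


def citation_request(doi: str, style: str = "apa",
                     locale: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a content negotiation request that asks
    the DOI resolver for a formatted bibliography entry."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return urllib.request.Request(url, headers={"Accept": accept})


req = citation_request("10.1234/example")  # placeholder DOI, not a real article
print(req.full_url)
print(req.get_header("Accept"))

# To actually fetch the citation (requires network access and a real DOI):
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

The `locale` parameter is where multilingual metadata could matter to the reader, since it asks for the citation to be formatted for a particular language.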
JINSEOP SHIN: In 2022, we developed an AI model with that HTML full text to convert PDF into HTML files automatically. Users can read the converted HTML by translating it into their preferred language using Google Translate. As you can see in the figures, the original PDF files are automatically converted into HTML.
JINSEOP SHIN: And then you can translate the HTML into English with machine translation. Let me briefly summarize my presentation. King Sejong the Great created the Korean alphabet around 600 years ago to lower the people's language barrier. The Korean language has changed significantly from country to country
JINSEOP SHIN: since the division of the Korean peninsula, especially in its vocabulary. And we want to collaborate on lowering the language barriers to science and technology information with multilingual metadata and AI technology such as automatic transformation of PDF files into machine readable formats.
JINSEOP SHIN: Thank you for your attention.
FARRAH LEHMAN DEN: Hi, I'm Farrah Lehman Den. I am an index editor and instructional technology producer with the MLA International Bibliography and I'm going to be discussing working in the Hebrew alphabet from the point of view of an index editor. So searching library databases in romanized or transliterated Hebrew is not always convenient or reasonable.
FARRAH LEHMAN DEN: Different transliteration authorities include ISO 259, the Academy of the Hebrew Language and Library of Congress, which focus on Modern Hebrew and also the Society of Biblical Literature and Artscroll, which focus on Biblical and Rabbinic Hebrew. Additionally, many Hebrew language authors' preferred or authoritative proper names do not match any of these standards,
FARRAH LEHMAN DEN: often when there is an Ashkenazic as in Central or Eastern European surname. For example, most name authority files spell Israeli author Savyon Liebrecht's name differently from what, say, its ISO or Library of Congress standard transliteration would be. The MLA International Bibliography facilitates searching for Hebrew language author names and works in our romanized thesaurus by including searchable untransliterated Hebrew in the names and works thesaurus accessible through the EBSCO interface.
FARRAH LEHMAN DEN: Here, if a user searches Moshe ben Maimon in the Hebrew alphabet, they can then click on the preferred term Moses Maimonides, which allows them to find all related publications without having to guess at transliteration or what the authoritative name is, or locating only the records that might include a non-standard transliteration in an English or Spanish or French title.
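The lookup described here, where a name typed in any script leads to one preferred term, can be sketched as a normalized variant index. The variant list and function names below are made up for illustration and are not taken from the MLA thesaurus or the EBSCO interface.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Normalize Unicode form, case, and surrounding whitespace so that
    visually identical queries compare equal."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


# Illustrative variants only: Hebrew script, a romanization, Arabic script.
VARIANTS: Dict[str, List[str]] = {
    "Moses Maimonides": ["משה בן מימון", "Moshe ben Maimon", "موسى بن ميمون"],
}

# Every variant, plus the preferred term itself, points to the preferred term.
INDEX: Dict[str, str] = {
    normalize(variant): preferred
    for preferred, variants in VARIANTS.items()
    for variant in variants + [preferred]
}


def preferred_term(query: str) -> Optional[str]:
    """Return the preferred term for a query in any listed script, or None."""
    return INDEX.get(normalize(query))


print(preferred_term("משה בן מימון"))
print(preferred_term("moshe BEN maimon"))
```

This sketch does not handle Hebrew vowel points or competing romanization schemes; a production thesaurus would need explicit entries or extra normalization for those.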
FARRAH LEHMAN DEN: The back end view of our thesaurus shows that Maimonides' name is also searchable in Arabic, as well as in its direct Arabic and Hebrew transliterations. We are also no longer transliterating the Hebrew titles of publications that we index. Overall, this is an important step in making Hebrew publications more findable, but we've encountered some issues with punctuation mistakenly rendered as bidirectional, as in the example of the quotation marks that I've highlighted here.
FARRAH LEHMAN DEN: Or in some cases, metadata imported from other databases or e-journals is imported fully backwards, meaning rendered left to right. This is a poster that I spotted in the background of a television drama which is supposed to read "Kennedy lenasi," "Kennedy for president," but instead reads "isanel yedenek." And no one on set apparently noticed that.
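One rough way to flag Hebrew text that has been stored backwards is to look at the five final-form letters, which normally occur only at the end of a word. The sketch below is a heuristic assumed for this illustration, not a method described in the talk, and it says nothing about text that contains no final-form letters (the poster example is one such case).

```python
# The five Hebrew final-form letters: final kaf, mem, nun, pe, tsadi.
FINAL_FORMS = set("\u05da\u05dd\u05df\u05e3\u05e5")


def hebrew_letters(word: str) -> list:
    """Keep only the Hebrew letters (alef through tav) of a word."""
    return [ch for ch in word if "\u05d0" <= ch <= "\u05ea"]


def looks_reversed(text: str) -> bool:
    """True when more words start with a final-form letter than end with
    one, which suggests the characters are stored in reverse order."""
    starts = ends = 0
    for word in text.split():
        letters = hebrew_letters(word)
        if len(letters) < 2:
            continue
        if letters[0] in FINAL_FORMS:
            starts += 1
        if letters[-1] in FINAL_FORMS:
            ends += 1
    return starts > ends


# "shalom olam" in correct logical order; both words end in final mem.
correct = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"

print(looks_reversed(correct))        # False
print(looks_reversed(correct[::-1]))  # True
```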
FARRAH LEHMAN DEN: So it's everywhere. And this sort of backwards rendering is a common enough problem with Hebrew and Arabic that there are metadata editors for library cataloging that detect right-to-left writing systems like Hebrew and insert Unicode control characters in order to define direction. Google Translate doesn't recognize Hebrew and Arabic when it's mistakenly rendered backwards.
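As an illustration of what "inserting Unicode to define direction" can look like, the sketch below finds the first strongly directional character in a field and wraps the field in the matching Unicode directional isolate. It is a simplified example, not the behavior of any particular cataloging tool, and the function names are made up.

```python
import unicodedata

# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character.
    Neutral characters such as quotation marks and digits are skipped."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew is R, Arabic letters are AL
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"


def isolate(text: str) -> str:
    """Wrap a field in the isolate that matches its base direction so that
    surrounding text cannot reorder its punctuation."""
    opener = RLI if base_direction(text) == "rtl" else LRI
    return opener + text + PDI


hebrew_title = '"\u05e9\u05dc\u05d5\u05dd"'  # a quoted Hebrew word
print(base_direction(hebrew_title))   # rtl
print(repr(isolate(hebrew_title)))
```

Because the quotation marks sit inside the isolate, they stay attached to the Hebrew text, which is the quotation mark problem described in this talk.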
FARRAH LEHMAN DEN: And there have been reports of Google Translate reading PDFs from the web left to right, rendering them essentially gibberish. Per TEI, because the header and the attributes of the language itself already imply directionality, any additional markup beyond Unicode is technically redundant unless there is a mix of left-to-right and right-to-left languages.
FARRAH LEHMAN DEN: This has been an issue for the MLA International Bibliography with punctuation, as I pointed out before, particularly with quotation marks, and it can also be an issue when a Hebrew word is used inside an English title, which often occurs in the journal Hebrew Studies, the example I give here, where essays in English will focus on a single Hebrew word or phrase. Here we see the Hebrew words "hineni" and "hareini" rendered correctly,
FARRAH LEHMAN DEN: right to left within the English title. Software engineer Moriel Schottlender, who offers an excellent outline of what Unicode can and cannot do for rendering of right-to-left and bidirectional text, notes that Hebrew affects the characters around it. So yes, as Schottlender and others have noted, left-to-right rendering of Hebrew, reversals within bidirectional text, and punctuation such as quotation marks mistakenly read as bidirectional instead of as integral to the Hebrew text
FARRAH LEHMAN DEN: are all browser or interface issues. But when titles and authors are imported into our indexing system, where we, the index editors, are working with the data prior to publishing our indexing, then these reversals and punctuation issues essentially become metadata issues again. So in this sense, I'd argue that when we talk about directionality, browser rendering affects our metadata and not just the other way around.
FARRAH LEHMAN DEN: Thank you.
GREGORY GRAZEVICH: Thank you to our speakers. NISO Plus attendees, please click through and join us in a live conversation with the panelists with your questions and comments.
