Name:
NISO Unfettered Access Series #4 - Duff Johnson, PDF Association
Description:
NISO Unfettered Access Series #4 - Duff Johnson, PDF Association
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/e7a9e648-fa61-4679-a57e-1f42aaf7bc60/videoscrubberimages/Scrubber_10.jpg
Duration:
T00H30M42S
Embed URL:
https://stream.cadmore.media/player/e7a9e648-fa61-4679-a57e-1f42aaf7bc60
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/e7a9e648-fa61-4679-a57e-1f42aaf7bc60/Unfettered Access 4 - Duff Johnson - SD 480p.mov?sv=2019-02-02&sr=c&sig=8qOgnlTwBz0wV8lQeEm3UeQv2uihi9%2B5da7F%2BfiVoFY%3D&st=2024-05-19T23%3A45%3A46Z&se=2024-05-20T01%3A50%3A46Z&sp=r
Upload Date:
2024-04-25T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello and welcome to the next installment of Unfettered Access, NISO's occasional conversation series with leaders and innovators in our community. Today, I'll be speaking with Duff Johnson, CEO of the PDF Association.
The PDF format has been a mainstay of publishing and content distribution since it was released in 1993. Over the ensuing three decades, it has become ubiquitous, both favored for its preservation of and conformance with the appearance of page layout, as well as derided as some holdover of a bygone print era. The reality is actually much more nuanced, and many are not aware of the format's power and capabilities to address this.
And to follow up on a recent NISO educational webinar, I'm going to discuss the advances in the format, and I'm pleased to welcome Duff Johnson. So, Duff, hello. And could you tell us just a little bit about yourself and your role?
Sure. Thanks a lot for inviting me, Todd. So my name is Duff Johnson, and I've been managing the PDF Association as its CEO for the last four years, and before then as its executive director and also a member of the board of directors.
So I've been involved in the PDF standards development community for some time. Prior to that, I was in the industry myself. I served in various roles in a couple of different software companies and got into it through a service bureau that I started in the mid-1990s, scanning documents, turning them into digital documents, and that led me to PDF, which then led me to the industry and the PDF Association.
Great. So could you tell us a little bit about the PDF Association, what it does and who's involved?
The PDF Association is an international nonprofit. It's a trade association. So the members are primarily companies that produce software for the creation, manipulation, management, utilization, viewing and so on of PDF files. There are member companies from all over the world, from 30 different countries. In addition to companies, there are also a number of individuals, 20 or 30 individual consultants, who are also members in order to gain the technical and marketing benefits of membership.
So really the constituency of the Association is the technical organizations that have a business interest in developing and maintaining the format, as well as some of the institutional organizations that have a kind of overarching interest in what's happening in the digital document space. Examples would be the Library of Congress, the National Archives, or even in some cases the Department of Defense in the United States, and certainly then large customers with critical infrastructure investments in the technology, such as Boeing Corporation.
Now, I know everyone is familiar with PDF as a file format, and the technologies in content creation and distribution have changed over the years. So given the modern web and the rise of HTML documents, why is the PDF still necessary?
Well, it's a great question and we get it quite frequently, because the more the modern web develops, the more capabilities it grows. It's a very reasonable question to ask.
So why are we still using a 30-year-old file format? It turns out, if you look at Google Trends data, PDF is actually getting more and more popular over time, not less. And that's because, and there's data to back this up, PDF continues to fill a vital niche in the information technology world. PDF is the format that seemingly everybody uses for their final, their most critical documents: their contracts, their invoices, their bills of lading, their reports, their statements to the court, their final drafts for markup.
PDF files are used in this context precisely because people still need a portable object, that is to say, independent of web technology, independent of servers, even independent of a connection. They need a portable, self-contained object that they can pass around, that they can share with total reliability that the recipient of the file will see exactly what the author intended.
And PDF files can not only represent the original material, but they can accept all kinds of changes to that material that are extremely difficult or impossible to achieve in a web context. So in PDF files you can annotate documents, adding notes and markup to make suggestions or changes or remarks. You can redact content, remove it from the page, for example. So there are a variety of aspects of PDF that make it really the only general-purpose file format that people use when they need to commit stuff for the purposes of posterity, for long-term archiving, or when they really need to share something in a way where they can have a great deal of confidence that the recipient will get exactly the same experience, even if they're on the other side of the world using an unknown computer system, or they're not going to look at it for two or three or five years.
These are all use cases where PDF continues to offer a critical facility that web technology simply doesn't replace yet. As a result, in major document management systems, SharePoint for example, I believe it's still the case that PDF files are the number one format stored on corporate SharePoint systems.
Yeah, and particularly as word processing programs develop and change, that preservation of look and feel and representation and text is critically important. The modern word processing programs aren't really saving things in file structures in quite the same way that they used to. Having a reliable source file to go back to and say, this is what it looked like on this day, is critically important.
Yep. And I would say, you know, source formats are doing better and better with this. You know, I could take a Google Doc and roll it back to any given point in time, but I still need a connection to that server. We need permissions to view that file, and then it's limited if we need to remove PII or we need to do annotations to it.
Yeah, all these things are possible to an extent in the source environment, but it's still the case that any time you're going to need to share this content with third parties, you're usually going to make some sort of a decision: do I want to give them access to the source, or do I actually just want them to have a copy that represents exactly the current state of the document? PDF is very much for that latter role, and the need for that role has not yet gone away and we don't anticipate that it will.
Yeah. So PDF has a lot of functionality that most people don't pay attention to or dig into the details of. If there was one thing that you wish more people knew about PDF or the functionality within PDF, what would that be?
Yeah, that's a good question. You know, for somebody such as myself, when I consider the entirety of the format and the enormous variety of capabilities that it has, it's hard to pick one.
The one I would pick, I think, is one of the very first things that excited me about PDF, and yet I still see that people just don't use it very frequently. So I'll show you what I have in mind. This is a feature that's technically called outlines, but users typically know it by the name bookmarks. See here, this is a PDF viewer that a lot of people will be familiar with.
And so right now we're looking at a PDF file that's 1,000 pages long, and it's the specification for the PDF format. And so what I'm doing here is turning off and on the Bookmarks panel. Now the Bookmarks panel appears to the side of the page and provides a way of navigating through the document using a kind of floating table of contents. And characteristically what you'll find when people create PDF files is they make a file which, you know, might have a table of contents, and this document definitely does.
So there is a table of contents, but you have to be at this particular location in the document in order to use it. Whereas if I have the Bookmarks panel open, I can always get around the document through the Bookmarks panel. In my view, every PDF file over, I don't know, 20 pages, 15 pages, something like that, should almost certainly have bookmarks.
But it's extraordinary to me how frequently they don't.
And so when you're creating that PDF, is there something particular that someone would have to do in order to embed those bookmarks in the document?
So a lot of applications do this, but not enough. And I would, you know, strongly suggest that not only should application developers do this more frequently, adding this option that they present to users, but users should also be asking for this, because one of the things that I would definitely want to communicate to people is that the more you ask your vendor to provide a given feature, the more they will be interested in doing it for you.
There's a very clear dynamic there. Bookmarks are usually based on the headings, as in this case, in this document. Here's a heading, and then here are other subheadings below that, which are available via bookmarks. In this application you can simply add bookmarks using the context menu at the top of the Bookmarks panel.
You can create them onesie-twosie. With this feature, you can also create new bookmarks by examining the structure of the document, and it'll automatically generate a whole set of bookmarks for you based upon the headings and so on that were found in the tags.
Yeah, which kind of goes back to the guidance that I hope most people who are creating long documents are thinking of: using document structure, headings and sections, and delineating the elements of your document, rather than just creating a header by changing the font and making it bold.
That's a big subject. And yeah, so this is the tags view of PDF, which I'm now displaying. This feature was added to PDF in 2001, and it's an increasingly important aspect of making it possible for PDF files to be both reusable and to provide these sorts of navigation utilities. As you can see here, these are the kind of semantic identifiers for PDF that I'm highlighting now, these paragraphs.
And when I select the paragraph tag, the respective paragraph highlights. These tags can be put to use in all kinds of ways, from making these documents usable to people with disabilities to improving the content reuse experience, reflowing documents and so on.
I think it's really important to understand that PDF has the functionality to make things robust and understandable, semantically enriched, accessible.
The challenge is people not understanding how to use that functionality and not embedding it in their document creation process. Absolutely. As you showed, the modern PDF is not so disjointed from the current web. It has a lot of embedded functionality, and people can use it in more robust ways than simply generating a page image.
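The idea mentioned above of generating bookmarks automatically from the headings found in a document's structure can be sketched roughly as follows. This is only a minimal illustration in Python, not how any particular PDF application or library does it: the Bookmark class, the build_outline and render functions, and the sample headings are all hypothetical, and the sketch assumes the headings have already been extracted as (level, title, page) entries in document order.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bookmark:
    """One entry in a nested outline (hypothetical structure for illustration)."""
    title: str
    page: int
    children: List["Bookmark"] = field(default_factory=list)


def build_outline(headings: List[Tuple[int, str, int]]) -> List[Bookmark]:
    """Nest flat (level, title, page) headings into a bookmark tree.

    Level 1 is a top-level heading; deeper levels nest under the most
    recent heading with a smaller level number.
    """
    roots: List[Bookmark] = []
    stack: List[Tuple[int, Bookmark]] = []  # currently open ancestors
    for level, title, page in headings:
        node = Bookmark(title, page)
        # Close any open headings at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


def render(bookmarks: List[Bookmark], depth: int = 0) -> List[str]:
    """Return the outline as indented text lines, one per bookmark."""
    lines: List[str] = []
    for b in bookmarks:
        lines.append("  " * depth + f"{b.title} (p. {b.page})")
        lines.extend(render(b.children, depth + 1))
    return lines


if __name__ == "__main__":
    # Made-up headings, standing in for what a tool might find in the tags.
    sample = [
        (1, "Scope", 1),
        (1, "Syntax", 10),
        (2, "Objects", 12),
        (2, "File structure", 40),
        (1, "Graphics", 110),
    ]
    print("\n".join(render(build_outline(sample))))
```

The point of the sketch is that the nesting comes entirely from heading levels, which is why a document whose headings are only bold text in a larger font gives a tool nothing reliable to build bookmarks from.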
What are some other features in the PDF structure that you could highlight?
PDF portable collections are a great and powerful feature that was added relatively recently. A portable collection allows you to incorporate a number of documents and embed them within a PDF file, essentially turning a PDF into something kind of like a zip archive, but with a cover page, actually using a PDF file as the zip archive.
So within a collection you can include, you know, other PDF files, or Word files, like maybe the source of the original document. You can include Excel files that were used as part of it. You could put a variety of content in there and treat that collection as sort of a binder or a folder of related content. That way you don't have to send an email, so to speak, with the 10 or 20 files that make up a given folder of digital content.
You can put them all in a single PDF file and make them more portable that way, along with their respective metadata. That's one feature that I think people could get a lot of value out of. And this is something that we've talked to a number of different federal agencies about, you know, not only being aware of the feature, but also improving their utilization of it.
Digital signatures are another great example. Today, in many cases, people are adding a kind of image of a signature to a page, sending that, and calling that signed. Well, it's better in many ways than nothing, particularly if it's a reasonably trusted kind of situation. But if there are going to be, or one could imagine, significant or serious questions about whether the material is legitimate, it's possible to add digital signatures to files so that there's a kind of machine-authenticatable chain of custody on the document and you can ensure that it's not been tampered with.
Yeah. So another feature, or another kind of underappreciated element of PDF, is accessibility. During a NISO webinar last month you mentioned some of the work that the PDF Association has done to improve accessibility functionality within PDF. So can you talk a little bit more about some of the features and functionalities that PDF has that people can use to make content more accessible for the print disabled?
Yeah, so accessible PDF was made possible really in 2001. Adobe owned PDF at the time. Now PDF is an ISO standard; Adobe no longer owns it, which is another thing that people often don't know about PDF. Before then, PDF had no structure capabilities within it.
And so the results that a piece of software would get if it attempted to interpret a PDF file and present it using assistive technology to a user who needed this technology in order to read were limited, and most importantly, they were not interoperable in the sense that you could not guarantee the result.
Starting in 2001, with the advent of tagged PDF, it became possible to add this markup, as I showed earlier, to PDF files, such that software interpreting the file could get a reliable, consistent experience of what was a heading. What's a list? What's a paragraph? What's a table? What's a table cell? For that table cell, which is the row header, which is the column header?
These are questions that can't be answered in PDF without tags, because PDF doesn't have any natural structure to it. It's possible to make a PDF file that's simply a painting, simply colored dots on a page. And so in order to ensure that PDF files offer more information, making them usable to assistive technology, tagged PDF was invented to give authors the ability to give their PDF files some facility for assistive technology to be able to use them.
And so it's been a long time coming. The first ISO standard for accessible PDF was published in 2012. The technology for doing this has improved dramatically over the last, what, 23 years or so since tagged PDF was invented. I'd still say, though, that one of the factors that keeps the majority of PDF files from being properly tagged, so they're properly readable by users irrespective of disabilities, is that authors are not yet using the tools in their authoring applications, their word processors and so on, to create documents.
Forget PDF, just to create any document that actually is usable by end users with assistive technology. So, for example, many authors are still increasing the size of the font to indicate a heading. They're using the Tab key to space out text on the page so it looks like a list, right?
And the tools are getting better at offering the facility so that, you know, if you add a number and then a tab and you start typing and you hit Return, the software now thinks you're probably writing a list, and so it'll automatically make it a list for you. All those things help a great deal in terms of getting structure into the content, so that structure can then turn into tagged PDF.
But if that structure doesn't exist in the source file, you're going to have to use automation to kind of figure it out when you go to PDF, and that's definitely going to be an error-prone process. So one of the things that authors can do is to use the structure tools in their authoring applications to make sure they do the right thing when they're creating the document in the first place. Author awareness of the need for accessibility, and of the use of the tools in their software to move that content forward, is probably the single largest factor that holds up accessibility in documents.
And the particular advantage that the PDF format has over so many others in terms of helping end users to communicate their needs to the vendors is that the PDF specification is completely open. You can download it at no cost from the PDF Association. You may need a developer to help you find your way to the relevant portion of the specification, but it's possible to give a vendor extremely specific feedback, saying, you know, the implementation of this feature in your software, I don't like it because you're not using this aspect, or the quality is thus and such, please do something about it.
And the more technically literate input they get, the more they pay attention to it, I think.
Yeah, absolutely. So what's on the horizon in terms of developments and future improvements related to PDF from your perspective?
Sure. So there are a variety of different aspects of the format that are under development today. One that I would highlight certainly would be the trend towards more reusable and easily reflowable pages.
And this aligns very closely with accessibility. So the Association has just published a new specification called Well-Tagged PDF, or WTPDF. And this specification is essentially the same as the new ISO standard for accessible PDF, called PDF/UA-2. Both of these documents are based on PDF 2.0, which is the latest version of the specification, first published in 2017 and updated in 2020.
PDF 2.0 is the version of the spec that the industry is advancing at this point. The older-generation technology, PDF 1.7 from 2008, is not going to be progressing any further. All the work is in PDF 2.0, and in that context there is increasing support for easily reflowable pages. So if you wanted to have a PDF page that would easily reflow itself to be more readable, let's say on a phone, there's more facility for doing that in the PDF 2.0 world.
Smarter documents, too. I mentioned PDF portable collections, for example, or digitally signed documents, documents that include 3D data, geospatial data. There are a lot of capabilities and facilities in PDF that do all sorts of work in a variety of domains, engineering, medical imaging, for the government, a lot of different individual capabilities that people will find their way to as they have a need for them.
We have just started a new group for forensics. As you might imagine, as digital content becomes more and more critical in this world, and this was dramatically accelerated by the pandemic, there's more and more opportunity for people trying to create fake documents or to tamper with documents in order to, you know, cash a check or commit fraud and so on. So there's a need to be able to do accurate forensic analysis of PDF files, to ensure that we understand that this one looked such and such a way, it was modified, or what modified it, and to be able to correctly represent to a court or to an agency what's been going on with this file.
That's an important initiative. We just started a working group on that. One of the most significant initiatives, at least technically, and I think different users will become aware of this in different ways, is that the industry is now moving towards support for HDR, or high dynamic range, images in PDF. That essentially means whiter whites and blacker blacks and more lively or exciting colors that are possible in the context of PDF. Today, PDF uses imaging technologies that were more or less defined, you know, about 20 years ago.
And those things evolve as they go forward, and that's going to happen with PDF too.
So how can people engage with the developments of PDF if they're interested?
Yep. Well, I would encourage them to subscribe, for example, to the Association newsletter and follow us on LinkedIn. The PDF Association is all the time generating new articles on various aspects of the technology.
We are producing new specifications or other kinds of guidance documents, but the PDF Association is really very much of a technically oriented organization. We don't try and get in between vendors and the end user. We're not here to kind of provide education, broadly speaking, on the capabilities and functionality of PDF. We're really a service organization that creates a kind of a vendor neutral space that the various interest groups and vendors can come together in to consider what's going on with PDF, to consider new features, to answer questions, to resolve ambiguities in the specification and so on.
So although the PDF Association can help end users in terms of developing the right question for the problem that they seek to solve, we're really not so much about providing education on how to use PDF software and so on. The single best way for end users to engage more in improving PDF is really to work with the vendor of the software that they themselves are using.
They find a limitation, they want more support for 3D, or they're looking for a smarter way to correct tags, for example, to improve accessibility. These are the sorts of things that their vendors need to hear about. And I'd suggest to users that not only can they themselves get in touch with their vendor to make suggestions or requests, but importantly, they should get in touch with the people who do the procurement for the organization and suggest to them that they should be calling on the Adobe account manager, or the vendor of whatever software they use, and saying, hey, you know, could you please improve support for this or that or the other feature?
It's really the best way to influence the software companies to improve what they're doing.
OK. Anything else you'd like to add? Anything we didn't cover that you'd like to highlight before we break?
I'm not sure. You know, PDF is a remarkable technology to me because it's something that works so well for these basic applications that everybody seems to use it for that people don't ask a lot of questions about it.
You know, they know what they've got. It does the job that they think it needs to do very reliably, very capably, and people are pretty much happy with that. On the other hand, there are a lot of enterprise-like solutions for document management, for AI, for interpreting the internal content of PDF files or other content and then representing that in some other way.
I think these technologies could really benefit from improved utilization of the PDF format, from leveraging the fact that it's an open ISO standard, and could find in PDF many new capabilities and benefits that they could be delivering to their end users. Many of these organizations don't consider themselves to be PDF technology companies.
And that makes perfect sense to me. If you're doing document management, you have to deal with all kinds of files. On the other hand, with PDF, you're dealing with the single file type that is used generically for the most important documents, that is to say, the documents that matter the most to various organizations, and therefore probably the documents that people want to do the most work with, invest the most in, and get the most utilization out of.
So I encourage organizations to consider paying more attention to the particulars of the technology, to make sure that they're leveraging it to reduce their time and their costs and to maximize the benefits that they're getting from the digital document transformation environment.
All right. Well, Duff, thank you so very much for your time today.
Really appreciate the opportunity to speak with you and learn a little bit more about PDF and the PDF Association. So thank you so very much. Great talk.
Great. And this concludes our latest episode of Unfettered Access. You can get more information about NISO and all of our work at the NISO website, and we look forward to seeing you again at a future NISO event.
Thank you.