John Andrew Kunze – Biography

Metadata Research Center, Drexel University
3141 Chestnut Street, Philadelphia, PA 19104, USA
Curriculum Vitae | CV | ORCID: 0000-0001-7604-8041
LinkedIn: linkedin.com/in/jakkbl | GitHub: jkunze
Mastodon: fosstodon.org/@jakkbl | Bluesky: jakkbl.bsky.social
jkunze.net | ARK Alliance

John Kunze is a pioneer in the theory and practice of digital libraries. His passion for creating and sharing free, open, pragmatic digital solutions has guided his long public-sector career. As a Berkeley undergraduate in computer science and mathematics, he wrote software that comes pre-installed in every Mac and Linux system. He played leading roles in establishing identifier standards (URL, ARK), metadata standards (Dublin Core), archiving standards (BagIt, WARC), the Z39.50 library protocol, UC Berkeley’s first Campus Wide Information System, and repository microservices used in HathiTrust and OCFL. He currently leads the ARK Alliance and serves as a senior research associate at Drexel University, where he works on the crowdsourced vocabulary tool YAMZ.net.

Participating in open source at the University of California (UC) Berkeley from 1978 to 1983, 20 years before that term was coined, Kunze* began fixing BSD Unix bugs and writing tools that come pre-installed in today’s Mac and Linux systems (jot, lam, rs). During that time he maintained the global terminal capability database (termcap), created an online Unix help system, brought the Bell Labs Unix “learn” program and its computer-aided instruction scripts to life, and became principal author of the book Common Lisp: The Reference. Realizing that content is king in information technology, he became interested in working with libraries and archives.
In 1989 he proposed and began creating UC Berkeley’s first Campus Wide Information System (CWIS).† Into that system, called Infocal, he built pre-web hypertext navigation (echoing the “learn” scripts), a custom search engine, and open access to (a) library catalog search via the then-new Z39.50 search and retrieval protocol, (b) major campus datasets previously available only on paper (course catalog, schedule of classes, phone directory, job vacancies), and (c) the World Wide Web. Because Z39.50 interoperation had never been achieved before, the work required heavy investment in several areas: standards development; software development, which resulted in the release of the first complete open-source client-server codebase; and the first true interoperability demonstration, conducted among UC Berkeley, Penn State University, and the UC Division of Library Automation.
As Infocal became web-aware, Kunze began to work with identifier standards. In 1994 he declined principal editorship of the URL specification and instead agreed to write its functional requirements in an attempt to unblock the URL standard. The standard was at an impasse because the average URL link (web address) was seen to break (stop working) after about 100 days. His requirements, which permitted URLs to break, were published as RFC1736, resulting in immediate approval of the first URL standard as RFC1738. As part of a three-year fellowship at the US National Library of Medicine (NLM), he analyzed the persistent identifier landscape and in 2000 defined the framework for the NLM multi-dimensional permanence levels.
To deal with broken URLs at the California Digital Library (CDL), Kunze created the ARK (Archival Resource Key) persistent identifier scheme in 2001. With the goal of addressing broken links flexibly and affordably while leveraging the NLM permanence levels, he evolved the ARK specification, created the ARK resolver and registration infrastructure, and registered the first 600 ARK organizations. In 2018, with help from DuraSpace, he led the creation and growth of the ARK Alliance (arks.org). By the end of 2025, there were more than 1,720 ARK organizations, including 12 national libraries, 215 universities, 254 archives, 144 museums, 124 journals, and 59 scientific centers. Because ARKs are not paywalled, they are vital for open knowledge linking across the world’s cultural and scientific institutions, especially in the Global South.
Motivated both by the Unix philosophy, which favors simple, extensible tools that combine easily, and by a distaste for siloed solutions, Kunze developed open source tools for ARKs that also work for non-ARK identifiers. The Name-to-Thing (N2T.net) resolver supports ARKs as well as hundreds of compact identifier schemes. The EZID identifier service supports ARKs and DOIs, and support for URNs, PURLs, and IGSNs was planned. The Noid (Nice Opaque Identifier) tool mints billions of ARK and Handle identifiers. THUMP specifies inflections for ARKs that work with any URL-based identifier. He also co-authored RFC1625 (WAIS) and RFC2056 (Z39.50 URLs).
A key takeaway from his protocol interoperability work, especially for nonbibliographic applications, was the notion of shared attributes such as title, author, and date. This inspired him to propose the Uniform Resource Citation (URC) in 1992 and to join the Dublin Core initiative in 1995 to focus on something new that was being called metadata.‡ There he led publication of the world’s first metadata standards (RFC2413, RFC2731, ANSI/NISO Z39.85), upon which most metadata schemas are based, including Schema.org, OAI-PMH, MODS, METS, EPUB, DataCite, and Darwin Core. Considering metadata to be far from finished, he created the minimalist Dublin Kernel based on his Z39.50 work, the TEMPER date format, and a vision (1996) of a kind of “Dublin Mantle” that he would later implement as the YAMZ.net vocabulary builder.
In 2003, Kunze wrote the vision for a Library of Congress (LC) grant to harvest and preserve at-risk websites. Under that grant, he published the first draft of the WARC standard, now used in all large-scale web archiving (e.g., by the Internet Archive). To move files between archives, he worked with the LC team as principal author of the BagIt standard (RFC8493), which is widely deployed in libraries and archives (LC, Stanford, Cornell, Dryad, etc.). He also created repository microservice specifications used in HathiTrust (Pairtree), BagIt (Oxum), and OCFL (Namaste).

* IPA pronunciations: Kunze /ˈkʊnziː/, Infocal /ˈɪnfəʊkæl/, Noid /nɔɪd/, EZID /iːˌziːaɪˌdiː/, URL /juːɑːɹɛl/
† A few years before the web appeared, dozens of custom-built, state-of-the-art networked information systems were emerging at universities to share diverse types of information with students, faculty, staff, and the general public. In this brief era of the Campus Wide Information System (CWIS), institutions of higher education effectively piloted the web, insofar as they worked out how to present and maintain heterogeneous online data over network protocols such as FTP, NNTP, Z39.50, and Gopher. Gopher, from the University of Minnesota, was the first CWIS software packaged for easy installation. Just as thousands of non-campus sites were adopting it, the winning hypertext capability of the WWW software overtook it.
‡ This usage likely comes from the 1993 IETF URI Working Group meeting (p. 556), held 17 miles from Dublin, Ohio: ‘Tim Berners-Lee, John Kunze and Michael Mealling made presentations as to how to handle this “meta data” or “factoids.”’