A five-stage arc · Introduction · 1 of 5 live
How librarians solved identity, decades before Big Tech tried.
Every digital object has many names. The work of keeping those names in agreement — of deciding when two strings refer to the same thing and when they do not — is one of the foundational hard problems in computing. Catalogers and reference librarians have been doing it, publicly and at scale, since the nineteen-sixties. This is a walking tour of how.
…if you are.
You have built or maintained an identity-resolution system.
Start at Stage 01. Watch a free public API do, in production, what most modern systems try to rebuild from scratch — and see what its persistence guarantee actually rests on.
Resolve an identifier live
You work on entity resolution, knowledge graphs, or retrieval.
Start at Stage 02. Library catalogers have been disambiguating same-name entities longer than most retrieval systems have existed — including the cases your training data still gets wrong.
See two records, vote, then read the evidence
You have read about authority control but never touched the APIs.
Start at Stage 01 and walk forward. Each room is a live working interactive with the actual data your textbooks describe.
Walk the full arc
You work with SPARQL, RDF, FAST, or VIAF already.
Start at Stage 04 — a live Wikidata SPARQL walk grounded in labeled edge-types, with explicit framing of what each predicate asserts and where the graph is sparse for reasons of attention.
Walk Wikidata live
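For readers arriving from the SPARQL side, one step of such a walk might look like the sketch below. It only builds the query text and sends nothing over the network. The helper name `build_walk_query`, the `EDGE_TYPES` table, and the choice of Q42 (Douglas Adams) as the starting entity are assumptions for illustration, not this site's own code. P50 (author) and P737 (influenced by) are real Wikidata properties.

```python
# Labeled edge types: each Wikidata property asserts one specific thing.
EDGE_TYPES = {
    "works_by": "P50",        # P50 = author: ?item names the entity as its author
    "influenced_by": "P737",  # P737 = influenced by: the entity names ?item as an influence
}


def build_walk_query(qid: str, edge: str, limit: int = 20) -> str:
    """Return SPARQL text for one step outward from `qid` along `edge`."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata Q-number: {qid!r}")
    prop = EDGE_TYPES[edge]
    if edge == "works_by":
        # Works point at their author, so the entity is the object of the triple.
        pattern = f"?item wdt:{prop} wd:{qid} ."
    else:
        # The entity points at its influences, so the entity is the subject.
        pattern = f"wd:{qid} wdt:{prop} ?item ."
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  {pattern}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


print(build_walk_query("Q42", "works_by"))
```

An empty result for some entity means only that nobody has recorded that statement yet. It does not mean the relationship is absent in the world.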
Learn how libraries know what things are. The arc moves from "Who is this?" to "How does the knowledge graph know?" — and ends with the older, harder question of how that knowing is kept true over time.
Each stage builds on the one before it.
- Stage 01
What is this thing, really?
Every digital object carries a fistful of identifiers: ISBN, OCLC number, LCCN, VIAF cluster, DOI, Wikidata Q-number. They all claim to point at the same thing. The work of deciding whether they do is older than the web.
Enter the room
- Stage 02
Distinguish
in preparation
Which Stephen King?
There are hundreds of Stephen Kings in the world's catalogs. The discipline that resolves them — cluster by cluster, evidence by evidence — is what every identity-resolution system reinvents from scratch.
- Stage 03
Classify
in preparation
Where does it sit in human knowledge?
Dewey is hierarchical — a book belongs at one number. FAST is faceted — one book gets many headings that intersect. The difference between a tree and a graph, taught with real records.
- Stage 04
Connect
in preparation
What does it touch?
Authority data is a knowledge graph. Walk outward from an entity — works by, works about, influences, contemporaries, subjects in common — and watch the structure of human knowledge render itself.
- Stage 05
Maintain
in preparation
Identity as stewardship.
Names change. Records get corrected. Entities get merged and split. Cataloging is not clerical work; it is the ongoing care of identity, meaning, relationships, and trust. The stage where fingerprints become signatures, and stewardship becomes verifiable.
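Stage 01's claim that several identifiers can name the same thing has one small, checkable instance: an ISBN-10 and its ISBN-13 form identify the same edition. Below is a minimal sketch of the standard check-digit arithmetic. The function names are illustrative, and the sample ISBN is a widely used textbook example, not data from this site.

```python
DIGITS = "0123456789"


def isbn10_is_valid(isbn10: str) -> bool:
    """ISBN-10 rule: the weighted sum (weights 10 down to 1) is divisible by 11."""
    chars = isbn10.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch == "X" and weight == 1:
            value = 10  # 'X' stands for 10 and is allowed only as the check digit
        elif ch in DIGITS:
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 rule: weights alternate 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, and compute the new one."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    core = "978" + isbn10.replace("-", "")[:9]
    return core + isbn13_check_digit(core)


def same_edition(isbn_a: str, isbn_b: str) -> bool:
    """Two ISBN strings match if they normalize to the same ISBN-13."""
    def normalize(s: str) -> str:
        s = s.replace("-", "")
        return isbn10_to_isbn13(s) if len(s) == 10 else s
    return normalize(isbn_a) == normalize(isbn_b)


print(same_edition("0-306-40615-2", "978-0-306-40615-7"))
```

Plain string comparison would treat these two as different names. Normalization shows they are one. Identifiers such as OCLC numbers and LCCNs cannot be converted into one another by arithmetic, so matching them takes recorded evidence instead.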
The Authority Arc is built by Paul Clark using the public APIs of the Virtual International Authority File (VIAF) and — where access permits — OCLC's WorldCat Entities API. VIAF data is provided under the Open Data Commons Attribution License (ODC-BY). This site claims no affiliation with OCLC.