Library of Words

This blog post describes the rationale and motivation behind the Library of Words, a digital collection of pages filled with every possible combination of 320 words.

We are writers.

From the dawn of the human species, we have always felt the need to communicate. We developed complex languages that could accurately describe our abstract thoughts and feelings. That was the step that set us apart from other species. Yet spoken language was not enough. We felt imprisoned by its locality. We needed a form of communication that would span space and time, something that would make our finite and minuscule spark of existence immortal. The answer was yet to be written in stone, literally (1), but once it was, we soon realized its power. Writing elevated us and made us progress as a species. It was not simply a tool to make our thoughts durable, but also a way for people from other places and times to explore new ideas and build on them. Single minds created the seeds, but thanks to writing the collective could finally make them sprout and bloom.

Science is founded on writing. Progress and new discoveries come from a slow and steady process that could not happen within a single lifetime. As Isaac Newton once stated in a letter to his rival Robert Hooke: “If I have seen further, it is by standing on the shoulders of giants” (2). Those giants are not single individuals, but a collection of revisited, accumulated thoughts and ideas of many people from the past. Writing is one of the most important parts of research; it is what makes you one of the giants.

Writing goes hand in hand with reading. But reading has a darker shade. Pieces of writing are potentially eternal, and if every human writes something, the amount of scripta must be immense. If everybody on Earth wrote a single word this very second, the combined corpus of words would form the equivalent of about 9,000 Bibles. That’s the amount of writing the human race can potentially produce in an instant. But we write considerably more than a single word in our lifetime, so nobody can ever read every single word ever written. We are restrained by our own finite time and each one of us can only put a microscopic tap into the colossal source of knowledge. That is why we specialize, and why it gets harder to do so with time. That is why we select books to read and summarize them. And that is why we share our knowledge.

A library

When I walk into a library, the initial excitement of discovery is soon replaced by a feeling of disorientation. I feel lost and overshadowed by the vast amount of information I am facing. In front of me are books I will never be able to read and ideas I will never be able to grasp or even think. We are all armed with the will to know and to search for the truth, but we lack the instrument to comprehend it all.

An even more desolating experience could be given by a hypothetical library containing every book ever written. Google Books estimates this number to be 130 million (or - very roughly - about \(10^{8}\) (3)). Imagine walking through this library and reading only the titles of the books it contains. It would take you about 12 years of your life to read them all, without a pause.
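The 12-year figure can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the pace of 3 seconds per title is an assumption of mine (the text does not state one), chosen because it roughly reproduces the number quoted above.

```python
# Rough estimate of how long it takes to read every title without a pause.
# ASSUMPTION: 3 seconds per title; this pace is not given in the text.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_read_titles(num_books: int, seconds_per_title: float) -> float:
    """Return the years of non-stop reading needed to get through all titles."""
    return num_books * seconds_per_title / SECONDS_PER_YEAR


years = years_to_read_titles(130_000_000, 3.0)
print(f"{years:.1f} years")  # prints "12.4 years"
```

A slower or faster pace scales the result linearly: at 6 seconds per title, the same walk would take about 25 years.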

The Library

Now imagine a library which does not just contain all books, but all books that will be written and all books that could have been written. It could be a library containing books with every combination of characters. How big would this library be? What would a page at random look like? How hard would it be to extract knowledge from such a place? Jorge Luis Borges explored this idea in his short story The Library of Babel (4). An extract from the story reads:

The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings. From any of the hexagons one can see, interminably, the upper and lower floors. The distribution of the galleries is invariable. Twenty shelves, five long shelves per side, cover all the sides except two; their height, which is the distance from floor to ceiling, scarcely exceeds that of a normal bookcase. One of the free sides leads to a narrow hallway which opens onto another gallery, identical to the first and to all the rest. [...] There are five shelves for each of the hexagon’s walls; each shelf contains thirty-five books of uniform format; each book is of four hundred and ten pages; each page, of forty lines, each line, of some eighty letters which are black in color. [...] The Library is total and its shelves register all the possible combinations of the twenty-odd orthographical symbols (a number which, though extremely vast, is not infinite): Everything: the minutely detailed history of the future, the archangels’ autobiographies, the faithful catalogues of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books.

In such a universe, the people living in the library - called librarians - would spend their lives exploring it while knowing nothing about it. With time, they would realize that the library was made of all possible permutations of letters, by bumping into a book explaining combinatorial analysis. They would wonder about its finiteness, its possible periodicity, the presence of fundamental truths, or the existence of a person who had read the book containing them and who would be worshipped like a god. Cults formed, and books containing gibberish were destroyed in the vain hope of reducing the size of the library and finding the hexagon containing such truths.

Quantifying infinities

The size of this library would be roughly \(10^{1,834,097}\) books, a number with almost two million digits. Humans are bad at judging dimensions, but a number that big is not just hard to visualize or imagine: it is more than this universe can physically contain. To put things into perspective, the universe has roughly \(10^{80}\) atoms in it. But atoms are not the smallest measurable thing. The Planck length is a fundamental physical scale of the universe, and it is hypothesized to be the shortest theoretically measurable length. Its order of magnitude is \(10^{-35}\) meters. In comparison, there are around \(10^{185}\) cubic Planck lengths in the observable universe. Compared to the number of books in the Library of Babel (\(10^{1,834,097}\)), this number appears minuscule.
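The exponent 1,834,097 follows directly from the book format given in the extract. The sketch below assumes an alphabet of 25 symbols, which is the count given in Borges's story (the extract quoted here only says "twenty-odd"). It works with logarithms, since the number itself is far too large to handle comfortably.

```python
import math

# ASSUMPTION: 25 orthographical symbols, as stated in Borges's story.
SYMBOLS = 25
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80

# Every book has the same number of character slots.
chars_per_book = PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE  # 1,312,000

# Number of distinct books = SYMBOLS ** chars_per_book.
# Compute its base-10 logarithm instead of the number itself.
log10_books = chars_per_book * math.log10(SYMBOLS)
exponent = math.floor(log10_books)

print(f"characters per book: {chars_per_book:,}")
print(f"number of books is about 10^{exponent:,}")

# How many orders of magnitude separate the Library from the
# ~10^185 cubic Planck lengths in the observable universe.
PLANCK_VOLUMES_EXPONENT = 185
print(f"orders of magnitude beyond Planck volumes: {exponent - PLANCK_VOLUMES_EXPONENT:,}")
```

Even if every cubic Planck length in the observable universe held one book, the shortfall would still be a factor with more than 1.8 million digits.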

Any random page of a book from such a library would most likely look like a random sequence of characters. In fact, considering the space, comma and full stop as separator characters, the expected length of a string of letters is around 9 characters. The chance of finding a 9-character dictionary word on a random page of the Library of Babel is 1 in 298,625. In comparison, here are the odds of some causes of death in the U.S.: 1 in 164,968 of being struck by lightning and dying; 1 in 112 of dying in a motor vehicle crash; 1 in 7 of dying from cancer or heart disease (5).
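One simple way to reason about the length of a random "word" is a geometric model. The sketch below is my own simplification, not necessarily the calculation behind the figure above: it assumes 25 equally likely symbols, 3 of which (space, comma, full stop) are separators, and only counts non-empty runs of letters. Under those assumptions the expected run length is 25/3, or about 8.3 letters, in the same ballpark as the roughly 9 characters quoted.

```python
from fractions import Fraction

# ASSUMPTIONS: 25 equiprobable symbols, 3 of them separators.
ALPHABET_SIZE = 25
SEPARATORS = 3

# Probability that any given character is a separator.
p_sep = Fraction(SEPARATORS, ALPHABET_SIZE)

# A non-empty run of letters has length k with probability
# (1 - p_sep)**(k - 1) * p_sep, a geometric distribution with mean 1 / p_sep.
expected_run = 1 / p_sep

# Numerical cross-check of the closed form with a truncated series.
approx = sum(k * float((1 - p_sep) ** (k - 1) * p_sep) for k in range(1, 500))

print(f"expected run length: {expected_run} = {float(expected_run):.2f} letters")
print(f"truncated-series check: {approx:.2f}")
```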

Given how hard it is to find even a single dictionary word on a page of a book from the library, the odds of finding a full sentence are slimmer still, and the odds of finding a sentence that makes sense are slimmer again. The odds of finding something useful, interesting or new make winning the national lottery (1 in 175,000,000 in the U.S.) look like an extremely common event. Yet, by the law of large numbers, someone, sometimes, wins the lottery. With enough people browsing the library and with enough time, the chances of finding something useful in it shift slightly in our favor. The existence of such a god in the library is then not such a silly idea. The story mentions the figure of one man every three hexagons, making the population of librarians close to infinite. Yet it would take the population of librarians only 50 years, reading 4 lines a minute for 10 hours every day, to explore the entire Library of Babel. The problems arise when you realize that - given the size of the population - the god will most likely not be you (just like with lotteries), and would be close to impossible to find. Furthermore, there would be “pseudo-gods” who had read millions of copies of books stating the opposite of the truth or incomplete versions of it.