The Rarity of the Ampersand: Frequencies of Special Characters

We use certain letters more often than others (‘e’ or ‘t’, for example), but punctuation marks and other symbols occur far less frequently. How often are they used? Which are the most popular?

I was recently reading about the history of the cloverleaf shape on Apple keyboards, more properly known as Saint John’s Arms, and this got me thinking about special characters more generally.

I’ve always been enchanted by these rare symbols, found in odd keyboard combinations or LaTeX commands, which make up our typographical long tail. While we use certain letters more often than others (‘e’ or ‘t’, for example), punctuation marks and other symbols occur far less frequently. But just how rare are they? And can we learn anything about how we use symbols?

Happily, some have already begun to explore this topic. Michael Dickens has examined different types of text—prose, programming code, and more—all to understand the frequency of different characters. And through a system of weighting these different types of writing, he came up with a single canonical ordering for all characters (as well as just for special characters):

*Character Frequency*: SPC e t a o i n s r h l d c u m f g p y w ENT b , . v k - " _ ' x ) ( ; 0 j 1 q = 2 : z / * ! ? $ 3 5 > { } 4 9 [ ] 8 6 7 + | & < % @ # ^ ` ~
*Punctuation Frequency*: , . - " _ ' ) ( ; = : / * ! ? $ > { } [ ] + | & < % @ # ^ ` ~

As you can see, the comma is the favored symbol, followed by the period. The list seems to include only ASCII characters, so we unfortunately don’t get to see the rarity of the Saint John’s Arms. Nonetheless, there are some interesting features. For example, the right parenthesis ranks ahead of the left one, at least by a small amount, whereas for curly and square brackets the opening bracket comes first. My guess as to why? Emoticons, which use the right parenthesis quite often. In addition, while symbols such as the ampersand sit near the very bottom of the ranking, the comma is actually seen more often than certain letters of the alphabet, such as ‘v’ or ‘j’.
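To make the weighting idea concrete, here is a minimal Python sketch of how several kinds of text might be blended into one ranking. It is not Dickens’s actual method or data: the sample strings and the weights are invented for illustration, and `relative_freq` and `weighted_order` are hypothetical helper names.

```python
from collections import Counter

def relative_freq(text):
    """Return each character's share of the text (shares sum to 1)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def weighted_order(corpora, weights):
    """Rank characters by a weighted sum of per-corpus relative frequencies.

    corpora: dict mapping a corpus name to its text
    weights: dict mapping the same names to weights (assumed values)
    """
    combined = Counter()
    for name, text in corpora.items():
        for ch, share in relative_freq(text).items():
            combined[ch] += weights[name] * share
    # Most frequent first; ties are broken by code point so the order is stable.
    return sorted(combined, key=lambda ch: (-combined[ch], ch))

# Toy inputs, invented for illustration only.
corpora = {
    "prose": "Well, yes; it rained, and then it stopped.",
    "code": "for (i = 0; i < n; i++) { total += x[i]; }",
}
weights = {"prose": 0.7, "code": 0.3}

ranking = weighted_order(corpora, weights)
# Keep only punctuation and symbols, mirroring the second list above.
symbols = [ch for ch in ranking if not ch.isalnum() and not ch.isspace()]
print(" ".join(symbols))
```

Using relative shares rather than raw counts keeps a long corpus from drowning out a short one; the weights then decide how much each kind of writing matters.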

That being said, certain types of writing have wildly different frequencies. For example, in the world of passwords, symbols occur far less frequently than in regular text, and commas are much rarer than might be expected.

The frequency of all Unicode characters is left as an exercise to the reader.
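For readers who want a head start on that exercise, here is a rough sketch using Python’s standard `unicodedata` module. It assumes that “special character” means any code point in a Unicode punctuation (P*) or symbol (S*) category, which is one reasonable definition among several; `symbol_counts` is a hypothetical helper and the sample sentence is made up.

```python
import unicodedata
from collections import Counter

def symbol_counts(text):
    """Count characters whose Unicode general category is punctuation or symbol."""
    # Normalize to NFC so composed and decomposed forms are counted together.
    text = unicodedata.normalize("NFC", text)
    return Counter(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("P", "S")
    )

# Made-up sample; U+2318 is the glyph printed on Apple's Command key.
sample = "Press \u2318 + C, then \u2318 + V: easy & quick!"

for ch, n in symbol_counts(sample).most_common():
    name = unicodedata.name(ch, "UNKNOWN")
    print(f"{ch}  {n}  U+{ord(ch):04X}  {name}")
```

Run over a large and varied corpus instead of one sentence, the same counter would give a first approximation of where non-ASCII symbols fall in the long tail.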

Top image: Wikimedia Commons / Public Domain

Homepage Photo: Coulter Sunderman / Flickr