This page is prepared by Wentian Li of the North Shore LIJ Research Institute, New York City.

Zipf's law, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), is the observation that the frequency of occurrence of some event (P), as a function of the rank (i) when the rank is determined by that frequency of occurrence, is a power-law function P_i ~ 1/i^a with the exponent a close to unity.

The most famous example of Zipf's law is the frequency of English words. Click here to see a count of the top 50 words in 423 TIME magazine articles (245,412 word occurrences in total), with "the" as number one (appearing 15861 times), "of" as number two (appearing 7239 times), "to" as number three (6331 times), etc. When the number of occurrences is plotted as a function of the rank (1, 2, 3, etc.), the functional form is a power law with exponent close to 1.
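You can repeat this kind of count yourself. Below is a minimal sketch (my own illustration, not the analysis used for the TIME data): count word frequencies, sort them by rank, and estimate the exponent a by a least-squares fit of log(count) against log(rank). The tiny sample text is only a placeholder; a real estimate needs a large corpus.

```python
import math
import re
from collections import Counter

def zipf_exponent(text):
    """Estimate the exponent a in P_i ~ 1/i^a by a least-squares
    fit of log(count) against log(rank)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # a is the negative of the log-log slope

# placeholder text; substitute a Project Gutenberg file for a real test
sample = ("the quick brown fox jumps over the lazy dog and the dog "
          "barks at the fox while the fox runs to the woods")
print("estimated a =", round(zipf_exponent(sample), 2))
```

On a corpus the size of the TIME collection, the fitted exponent should come out close to 1; on a toy text like the one above it will not.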

If you want to download English texts and analyze them yourself, get texts from Project Gutenberg (National Clearinghouse for Machine Readable Texts) (one mirror site is at UIUC).

The second example Zipf showed in his book was the population of cities (or of communities). The population of a city, plotted as a function of its rank (the most populous city is ranked number one, etc.), is a power-law function with exponent close to 1.

The income or revenue of a company as a function of its rank is another example of Zipf's law (also in Zipf's book). This should perhaps be called Pareto's law, because Pareto observed it at the end of the 19th century.

Does Zipf's law describe rare or common events?

(new on sept-15-1999)

Well, both! It depends on the quantity used in ordering the events. If an event is number 1 because it is the most frequent, the Zipf plot describes common events (e.g. the use of English words). On the other hand, if an event is number 1 because it is unusual (biggest, highest, largest...), then it describes rare events (e.g. city population).

Actually, in Miller's preface to Zipf's book, he distinguished Zipf's "first law" from his "second law", one for rare events and the other for common events. We don't make that distinction here (it's hard to remember which is the first law and which is the second!)

Power-law or "stretched exponential" or "log-normal" or "Yule distribution"?

(new on may-02-2002)

I have yet to find a more complete list; for now, let me just start compiling papers that question whether a seemingly power-law function is really a power law...
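To see why the question is subtle, here is a small illustration of my own (not taken from any of the papers collected below): a log-normal sample with a fairly large sigma produces a rank plot whose upper tail looks nearly straight on log-log axes, so it can easily be mistaken for a power law.

```python
import math
import random

random.seed(0)
# draw a log-normal sample with a fairly large sigma
sample = sorted((random.lognormvariate(0.0, 2.0) for _ in range(5000)),
                reverse=True)
top = sample[:1000]   # the upper tail of the rank plot

# least-squares statistics for log(value) vs log(rank)
xs = [math.log(r) for r in range(1, len(top) + 1)]
ys = [math.log(v) for v in top]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
syy = sum((y - my) ** 2 for y in ys)
r2 = sxy * sxy / (sxx * syy)  # squared correlation of the log-log fit
print("log-log R^2 over the top 1000 ranks =", round(r2, 3))
```

The high R^2 of the straight-line fit is exactly why goodness-of-fit tests, rather than eyeballing a log-log plot, are needed to separate a true power law from a log-normal or stretched-exponential look-alike.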

Zipf's original work

pre-Zipf work: "Pareto-Estoup-Zipf law"

Mandelbrot's early work

Mandelbrot and Simon's debate

Zipf's law in natural languages

(updated on december-10-2001)

online reports (new on sept-15-1999)

Zipf's law in natural languages (papers written in non-English languages)

(new on feb-05-2002, I would like to thank Dr. Gabriel Altmann for this collection)

Zipf's law in monkey-typing texts
(updated on feb-12-2002)
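The monkey-typing result is easy to reproduce; here is a sketch of my own (a toy 5-letter keyboard, not any particular paper's setup): strike letter keys and the space bar uniformly at random, then count the "words" between spaces. Shorter words are exponentially more probable while all words of the same length are equally likely, which yields a Zipf-like rank-frequency curve.

```python
import random
from collections import Counter

random.seed(1)
keys = "abcde "  # toy keyboard: 5 letters plus the space bar
# the monkey types 200,000 uniformly random keystrokes
typed = "".join(random.choice(keys) for _ in range(200000))

# split on spaces to get "words", dropping empty strings
counts = Counter(w for w in typed.split(" ") if w)
ranked = [c for _, c in counts.most_common()]
print("top 10 word counts:", ranked[:10])
```

Plotting `ranked` against rank on log-log axes gives the characteristic staircase-shaped Zipf plot of random texts: flat steps (ties within each word length) descending in a roughly power-law envelope.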

Turing's formula?

Connection with information theory (added on may-10-2002)