This page is prepared by Wentian Li of North Shore
LIJ Research Institute, New York City.
Zipf's law, named after the Harvard linguistics professor George Kingsley Zipf
(1902-1950), is the observation that the frequency of occurrence of some event
($P$), as a function of its rank ($i$), where the rank is determined by
that frequency of occurrence, is a power-law function $P_i \sim 1/i^{a}$
with the exponent $a$ close to unity.
The most famous example of Zipf's law is the frequency of English words.
A count of the top 50 words in 423 TIME magazine articles (245,412 word
occurrences in total) puts "the" at number one (appearing 15861 times),
"of" at number two (appearing 7239 times), "to" at number three (6331
times), etc. When the number of occurrences is plotted as a function of the
rank (1, 2, 3, etc.), the functional form is a power-law function with exponent
close to 1.
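For readers who want to check this on their own text, here is a minimal Python sketch. It is my own illustration, not a procedure taken from Zipf's book: it counts words, ranks them by frequency, and estimates the exponent by a straight-line least-squares fit of log(count) against log(rank). The function names `rank_frequency` and `fit_exponent` and the word-splitting rule are assumptions made for this example. A log-log fit like this is only a rough estimate, and three data points are far too few to say anything reliable.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent."""
    # Assumption: a "word" is a run of letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    return [count for _, count in Counter(words).most_common()]


def fit_exponent(counts):
    """Estimate a in count_i ~ C / i**a by least squares on the log-log plot.

    Returns the negated slope of log(count) versus log(rank).
    """
    if len(counts) < 2:
        raise ValueError("need counts for at least two ranks")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# The three counts quoted above for "the", "of" and "to".
print(fit_exponent([15861, 7239, 6331]))
```

To use it on a downloaded text, read the file into a string and call `fit_exponent(rank_frequency(text))`.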
If you want to download English texts and analyze them yourself, get texts from
Project Gutenberg (National Clearinghouse
for Machine Readable Texts) (one mirror site is at UIUC).
The second example Zipf showed in his book was the population of cities (or
population of communities). The population of a city, plotted as a function
of its rank (the most populous city is ranked number one, etc.), is a power-law
function with exponent close to 1.
The income or revenue of a company as a function of the rank is also an
example of Zipf's law (also in Zipf's book). This should also be called
Pareto's law, because Pareto observed it at the end of the 19th century.
Does Zipf's law describe rare or common events?
(new on sept-15-1999)
Well, both! It depends on the quantity used in ordering the events. If an
event is number 1 because it is most popular, Zipf's plot describes the common
events (e.g. the use of English words). On the other hand, if an event is number
1 because it is unusual (biggest, highest, largest...), then it describes the
rare events (e.g. city population).
Actually, in Miller's preface to Zipf's book, he distinguished Zipf's "first
law" and "second law", one for rare events and another for common events. We
don't make such a distinction here (it's hard to remember which is the first law
and which is the second law!).
Power-law or "stretched exponential" or "log-normal" or "Yule distribution"?
(new on may-02-2002)
I have yet to find a complete list, so let me just start compiling papers
which question whether a seemingly power-law function is really a
power-law function...
- Colin Martindale, Andrzej K Konopka (1996), "Oligonucleotide
frequencies in DNA follow a Yule distribution", Computers & Chemistry,
20(1):35-38. (Yule distribution?)
- Richard Perline, "Zipf's law, the central limit theorem, and the
random division of the unit interval", Physical Review E, 54(1):220-223
(1996). (Log-normal distribution?)
- Jean Laherrere, D Sornette
(1998), "Stretched exponential distributions in Nature and Economy: 'Fat
tails' with characteristic scales", European Physical Journal B, 2:525-539.
(http://xxx.lanl.gov/abs/cond-mat/9801293) (Stretched exponential
distribution?)
- Ronald
Rousseau (1999), "A weak goodness-of-fit test for rank-frequency
distributions", in Proceedings of the Seventh Conference of the
International Society for Scientometrics and Informetrics, ed. C.
Macias-Chapula, Universidad de Colima (Mexico), pages 421-430.
- Carlos M Urzua (2000), "A simple and efficient test for Zipf's Law", Economics
Letters, 66:257-260. [PDF]
- Bill Reed
(2001), "The double Pareto-lognormal distribution - A new parametric
model for size distribution", preprint. [note: this paper is on size
distribution, not on rank-frequency distribution.]
- E Limpert, WA
Stahl, M Abbt (2001), "Lognormal distributions across the sciences:
keys and clues", Bioscience, 51(5):341-352. [a general discussion on the
lognormal distribution] [ PDF ]
Zipf's original work
- GK Zipf, Selective Studies and the Principle of Relative
Frequency in Language (?, 1932)
- GK Zipf, The Psycho-Biology of
Language (Houghton-Mifflin, 1935; MIT Press, 1965).
[Zipf actually thought about this 10 years earlier, i.e., around 1925.]
- GK Zipf, Human Behavior and the Principle of Least Effort
(Addison-Wesley, 1949).
pre-Zipf work: "Pareto-Estoup-Zipf law"
- V Pareto, Cours d'economie politique (Rouge, Lausanne et
Paris, 1897)
- JB Estoup, Gammes Stenographiques (Institut Stenographique
de France, Paris, 1916).
- JC Willis, Age and area (Cambridge Univ Press, 1922).
- GU Yule, " ", Phil Trans. Roy Soc. London, B213, 21 -? (1922).
- GU Yule, Statistical Study of Literary Vocabulary (Cambridge
Univ Press, 1944).
Mandelbrot's early work
- BB Mandelbrot, "Adaptation d'un message a la ligne de transmission.
I & II", Comptes Rendus (Paris), 232, 1638-1640 & 2003-2005 (1951).
- BB Mandelbrot, in "Contribution a la Theorie Mathematique des Jeux
de communication" (Institute of Statistics, Univ of Paris, page 124, 1953)
- BB Mandelbrot, "An informational theory of the statistical
structure of languages", in Communication Theory, ed. W. Jackson (Butterworth,
1953), pp. 486-502.
- BB Mandelbrot, "Simple games of strategy occurring in communication
through natural languages", symposium on statistical methods in communication
engineering (Berkeley, Aug 17-18, 1953). Appearing in Transactions of IRE
(professional groups on information theory), 3, 124-137 (1954).
- GA Miller, "Communication", Annual Review of Psychology, 5,
401-420 (1954).
[a summary of Mandelbrot's result.]
- BB Mandelbrot, "Information theory and psycholinguistics", in
Scientific Psychology: Principles and Approaches, eds. B. Wolman, E.
Nagel (Basic Books, 1965), pp. 550-562.
- BB Mandelbrot, "Les constantes chiffrees du discours", in
Encyclopedie de la Pleiade: Linguistique, ed. J. Martinet (Gallimard,
1968), pp. 46-56.
Mandelbrot and Simon's debate
- HA Simon (1955), "On a class of skew distribution functions",
Biometrika, 42:425-440. [PDF]
- BB Mandelbrot, "A note on a class of skew distribution functions:
analysis and critique of a paper by H.A. Simon", Information and Control,
2, 90-99 (1959).
[ABSTRACT: This note is a discussion of H.A. Simon's
model (1955) concerning the class of frequency distributions generally
associated with the name of G.K. Zipf. The main purpose is to show that
Simon's model is analytically circular in the case of the linguistic laws of
Estoup-Zipf and Willis-Yule. Insofar as the economic law of Pareto is
concerned, Simon has himself noted that his model is a particular case of that
of Champernowne; this is correct, with some reservation. A simplified version
of Simon's model is included. ]
- HA Simon, "Some further notes on a class of skew distribution
functions", Information and Control, 3, 80-88 (1960).
[ABSTRACT: This
note takes issue with a recent criticism by Dr. B. Mandelbrot of a certain
stochastic model to explain word-frequency data. Dr. Mandelbrot's principal
empirical and mathematical objections to the model are shown to be unfounded.
A central question is whether the basic parameter of the distributions is
larger or smaller than unity. The empirical data show it is almost always very
close to unity, sometimes slightly larger, sometimes smaller. Simple
stochastic models can be constructed for either case, and give a special
status, as a limiting case, to instances where the parameter is unity. More
generally, the empirical data can be explained by two types of stochastic
models as well as by models assuming efficient information coding. The three
types of models are briefly characterized and compared. ]
- BB Mandelbrot, "Final note on a class of skew distribution
functions: analysis and critique of a model due to H.A. Simon", Information
and Control, 4, 198-216 (1961).
[ABSTRACT: We shall restate in detail
our 1959 objections to Simon's 1955 model for the Pareto-Yule-Zipf
distribution. Our objections are valid quite irrespectively of the sign of
p-1, so that most of Simon's (1960) reply was irrelevant. We shall also
analyze the other points brought up in that reply. ]
- HA Simon, "Reply to 'final note' by Benoit Mandelbrot", Information
and Control, 4, 217-223 (1961).
[ABSTRACT: Dr. Mandelbrot's original
objection (1959) to using the Yule process to explain the phenomena of word
frequencies were refuted in Simon (1960), and are now mostly abandoned. The
present "reply" refutes the almost entirely new arguments introduced by Dr.
Mandelbrot in his "final note", and demonstrates again the adequacy of the
models in (1955). ]
- BB Mandelbrot, "Post scriptum to 'final note'", Information and
Control, 4, 300-304 (1961).
[ABSTRACT: My criticism has not changed
since I first had the privilege of commenting upon a draft of Simon (1955). ]
- HA Simon, "Reply to Dr. Mandelbrot's post scriptum", Information
and Control, 4, 305-308 (1961).
[ABSTRACT: Dr. Mandelbrot has proposed
a new set of objections to my 1955 models of the Yule distribution. Like his
earlier objections, these are invalid. ]
Editorial note: Dr.
Mandelbrot feels that no further comment is needed and this debate terminates
herewith.
Zipf's law in natural languages
(updated on december-10-2001)
- GA Miller, EB
Newman (1958), "Tests of a statistical explanation of the rank-frequency
relation for words in written English", American Journal of Psychology, 71,
209-218.
- GA Miller, EB
Newman, EA Friedman (1958), "Length-frequency statistics for written
English", Information and Control, 1, 370-389.
- Henry Kucera, W Nelson Francis (1967), Computational Analysis of
Present-Day American English (Brown Univ Press). [out of print: see Amazon]
- Ronald E Wyllys (1975), "Measuring scientific prose with
rank-frequency ('Zipf') curves: a new use for an old phenomenon," Proceedings
of the American Society for Information Science 12, 30-31. Washington, DC:
American Society for Information Science.
- H Dahl (1979), Word Frequencies of Spoken American
(Verbatim).
[rank-frequency of spoken words. The top twenty are: I, and,
the, to, that, you, it, of, a, know, was, uh, in, but, is, this, me, about,
just, don't]
- R Rousseau,
Qiaoqiao Zhang (1992), "Zipf's data on the frequency of Chinese words
revisited", Scientometrics, 24(2):201-220.
- EG Bard, RC Shillcock (1993),
"Competitor effects during lexical access: Chasing Zipf's tail", in
Cognitive Models of Speech Processing: The Second Sperlonga Meeting,
eds. GTM Altmann and RC Shillcock (Lawrence Erlbaum Associates).
- DR Ridley, EA Gonzales (1994), "Zipf's law extended to small
samples of adult speech", Percept. Mot. Skills, 79:153-154.
- J Cooke, S Gregor, J Luck, JL Clark, KT Lua, J McCallum, "Analyzing
the conformance of Chinese text to Zipf's law and Automatic indexing of
natural language text in the UNIX environment", (transcript of slides, 1996?
Univ of Central Queensland, Australia)
- J Tuldava (1996), "The frequency spectrum of text and vocabulary",
Journal of Quantitative
Linguistics, 3(1):?-?. [ABSTRACT: The present paper deals with some
problems of the analysis of the word-frequency distribution and the
possibility of its analytical description ]
- Colin Martindale, SM Gusein-Zade, Dean McKenzie, and Mark Yu.
Borodovsky (1996), "Comparison of equations describing the ranked
frequency distributions of graphemes and phonemes", Journal of Quantitative
Linguistics, 3(2):?-?.
- VK Balasubrahmanyan, S Naranan (1996), "Quantitative linguistics
and complex system studies", Journal of Quantitative
Linguistics, 3(3):?-?.
- S Naranan, VK Balasubrahmanyan (1998), "Models for power law
relations in linguistics and information science", Journal of Quantitative
Linguistics, 5(3):?-?.
- W Li, Letters to the
editor, Complexity, 3:9-10 (1998).
- B K Sen, Khong Wye Keen, Lee Soo Hoon, Lim Bee Ling, Mohd Rafae
Abdullah, Ting Chang Nguan, Wee Siu Hiang (1998), "Zipf's law and writings
on LIS",
Malaysian Journal of Library & Information Science,
3(2):93-98. [
abstract ]
- R Rousseau
(1998), "George Kingsley Zipf: life, ideas and recent developments of his
theories", preprint (talk presented at the Beijing International Seminar of
Quantitative Evaluation of R&D in Universities, and Fifth All-China Annual
Meeting for Scientometrics and Informatics. Dec 4-6, 1998).
- Leo Egghe (1999), "On the law of Zipf-Mandelbrot for multi-word
phrases", Journal of the American Society for Information Science, 50:?-?.
- Claudia Prun (1999), "G.K. Zipf's conception of language as an
early prototype of synergetic linguistics", Journal of Quantitative
Linguistics, 6(1):?-?.
- MA Nowak (2000), "The basic reproductive ratio of a word, the
maximum size of a lexicon", Journal of Theoretical Biology, 204(2):179-189.
- Marcelo A Montemurro (2001), "Beyond the Zipf-Mandelbrot law in
quantitative linguistics", arxiv.org e-print ,
cond-mat/0104066, [ abstract
]
- Alexander Gelbukh, Grigori Sidorov (2001), "Zipf and Heaps laws'
coefficients depend on language", Proceedings of the Conference on Intelligent
Text Processing and Computational Linguistics (CICLing'2001), ed. Alexander
Gelbukh, Lecture Notes in Computer Science, Vol 2004 (Springer-Verlag), pp.
332-335.
online reports (new on sept-15-1999)
Zipf's law in natural languages (papers written in non-English languages)
(new on feb-05-2002, I would like to thank Dr. Gabriel Altmann for this
collection)
- JB Estoup (1916), Les Gammes Stenographiques Paris,
Institut Stenographique. (in French)
- W Skalmowski (1961), "Polskie przeklady Hafiza w swietle prawa
Zipfa-Mandelbrota", Sprawozdania Kom. Orient. PAN 125-127.
- VM Kalinin (1964), "O statistike literaturnogo teksta", Voprosy
jazykoznanija Nr. 1, ?-?.
- VM Kalinin (1964), Razvitie schemy Puassona i ee primenenie
dlja statisticeskich svojstv reci, Leningrad: Diss. (in Russian)
- Ju A Srejder (1967), "O vozmoznosti teoreticeskogo vyvoda
statisticeskich zakonomernostej teksta (k obosnovaniju zakona Cipfa)", in
Problemy peredaci informacii, Vol 3, 57-63. Moskva.
- EA Kalinina (1968), "Izucenie leksiko-statisticeskich
zakonomernostej na osnove verojanotnoj modeli", in Statistika reci i
avtomaticeskij analiz teksta, Leningrad, ?-?.
- G Billmeier (1969), Worthaufigkeiten vom Zipfschen Typ,
uberprüft an deutschem Textmaterial, Hamburg: Buske. (in German)
- Ju K Orlov (1970), "Statisticeskaja struktura soobscenij,
optymal'nych dlja celoveceskogo vosprijatija", Naucno-techniceskaja
informacija, 2m(8):11-16.
- PM Alekseev, ST Naval'na (1971), "Pro graficnij opis zaleznosti
'rang-castota' lingvisticeskich odinic", Visnik Char'kivskogo universitetu 64,
filologija, vyp. 8:?-?.
- GG Belonogov, AP Novoselov (1971), "Nekotorye kolicestvennye
zakonomernosti v automatizirovannych informacionnych sistemach", in
Avtomaticeskaja pererabotka teksta metodami prikladnoj lingvistiki. Materialy
vsesojuznoj konferencii: 219-220. Kisinev.
- BA Volosin, JK Orlov (1972), Obobscennyj zakon
Cipfa-Mandel'brota i raspredelenie cvetovych ploscadej v proizvedenijach
zivopisi, Tbilisi, AN GSSR Institut kibernetiki.
- LS Kozackov (1973), Sistemy potokov naucnoj informacii,
Kiev: Naukova dumka.
- MV Arapov, EN Efimova (1975), "Ponjatie leksiceskoj struktury
teksta", Naucno-techniceskaja informacija, 2:3-7.
- MV Arapov, EN Efimova, Ju A Srejder (1975), "O smysle rangovych
raspredelenij", Naucno-techniceskaja informacija, 2:9-20.
- MV Arapov, EN Efimova, Ju A Srejder (1975), "Rangovye
raspredelenija v tekste i jazyke", Naucno-techniceskaja informacija, 2:?-? .
- AT Micevic (1975), "Issledovanija struktury potokov
naucno-techniceskoj informacii po masinostroenii", Naucno-techniceskaja
informacija, 2(5):3-16.
- Ju K Orlov (1976), "O svjazi mezdu raspredeleniem Pareto i
obobscennym zakonom Cipfa-Mandel'brota", Bulletin of the Academy of Sciences
of the Georgian SSR, 83:57-60.
- Ju K Orlov (1976), "Obobscennyj zakon Cipfa-Mandelbrota i castotnye
struktury informacionnych edinic razlicnych urovnej", in Vycislitel'naja
lingvistika, ed. EK Guseva, pp. 179-202. Moskva: Nauka.
- E Schurer (1976), Das Zipfsche Gesetz in der fruhen
Kindersprache, Munchen: Diss. (in German)
- MV Arapov, JA Srejder (1977), "Klassifikacija i rangovye
raspredelenija", Naucno-techniceskaja informacija, 2(1-12):15-21.
- MV Arapov (1977), "Dve modeli rangovogo raspredelenija", Voprosy
informacionnoj teorii i praktiki, 4: 3-42.
- AI Jablonskij (1977), "Struktura i dinamika sovremennoj nauki", in
Sistemnye issledovanija. Ezegodnik 1976 , ed. DM Gvisiani, pp. 66-90.
Moskva: Nauka.
- SV Kopejkin, VE Ostapenko (1977), "Zakon Cipfa i sopostavitel'nyj
analiz castotnych struktur anglijskogo, francuzskogo, rumynskogo i russkogo
jazykov na baze matematiceskich modelej", Naucnye trudy Kujbysevskogo
pedagogiceskogo instituta, 193:91-94.
- PM Alekseev (1978), "O nelinejnych formulirovkach zakona Cipfa",
Voprosy kibernetiki 41:53-65.
- MV Arapov, JA Srejder (1978), "Zakon Cipfa i princip dissimetrii
sistemy", Semiotika i informatika, 10:74-95.
- LS Kozackov (1978), "Informacionnye sistemy s ierarchiceskoj
('rangovoj') strukturoj", Naucno-techniceskaja informacija, 2(8):15-24.
- W Marx, E Schuprer-Necker (1978), "Uberlegungen zur Interpretation
des Zipfschen Gesetzes am Beispiel der fruhen Kindersprache", Glottometrika,
1:154-167. (in German)
- A Rouault (1978), "Loi de Zipf et sources markoviennes", Annales de
l'Institut H. Poincare, 14:169-188. (in French)
- H Birkhahn (1979), "Das 'Zipfsche Gesetz', das schwache Prateritum
und die germanische Lautverschiebung", Sitzungsberichte der osterreichischen
Akademie der Wissenschaften, philosophisch-historische Klasse 348. (in German)
- L Hoffmann, RG Piotrowski (1979), Beitrage zur
Sprachstatistik, Leipzig: ?
- C Muller (1979), "Du nouveau sur les distributions lexicales: la
formule de Waring-Herdan", in Langue Francais et Linguistique
Quantitative, ed. C Muller, pp. 177-195. Geneve: Slatkine (in French).
- A Babanarov (1980), "Castotnyj slovnik i avtomaticeskij slovar' dlja
masynnogo perevoda tereckich gazetnych textov", in Inzenernaja lingvistika
i optimizacija prepodavanija inostrannych jazykov, Leningrad, pp.?-?.
- MG Boroda (1980), "Haufigkeitsstrukturen musikalischer Texte",
Glottometrika, 3:36-69. (in German)
- Ju K Orlov (1980), "Informacionnye potoki: statisticeskij analiz i
prognozirovanie", Naucno-techniceskaja informacija, 2(2):23-30.
- Ju K Krylov (1982), "Stacionarnaja model' porozdenija svjaznogo
teksta", Acta et Commentationes Universitatis Tartuensis, 774:81-102.
- Ju K Orlov (1982), "Dynamik der Haufigkeitsstrukturen", in
Studies on Zipf's Law, eds. H Guiter, MV Arapov, pp. 116-153. Bochum:
Brockmeyer. (in German)
- Ju K Orlov (1982), "Ein Modell der Haufigkeitsstruktur des
Vokabulars", in Studies on Zipf's Law, eds. H Guiter, MV Arapov, pp.
154-233. Bochum: Brockmeyer. (in German)
- Ju K Orlov (1982), "Linguostatistik: Aufstellung von Sprachnormen
oder Analyse des Redeprozesses? Die Antinomie 'Sprache-Rede' in der
statistischen Linguistik", in ? , eds. Ju K Orlov, MG Boroda, IS
Nadarejsvili, pp. 1-55.
- Ju V Orlov, MG Boroda, IS Nadarejsvili (1982), Sprache, Text,
Kunst. Quantitative Analysen, Bochum, Brockmeyer. (in German)
- AN Lebedev (1983), "Zakonomernosti postroenija slov v reci",
Psichologiceskij zurnal, 4/5:11-23.
- SD Haitun (1983), Naukometrika. Sostojanie i perspektivy,
Moskva: Nauka.
- Ju K Orlov, RY Chitashvili (1983), "Generalized Z-distribution
generating the well-known 'Rank-Distributions'", Bulletin of the Academy of
Sciences of the Georgian SSR, 110(2):269-272.
- VN Byckov (1984), "K probleme obobscenija i interpretacija
rangovych raspredelenij v statisticeskoj lingvistike", Ucenye zapiski TGU,
689:61-70.
- RG Piotrowski, KB Bektaev, AA Piotrovskaja (1985),
Mathematische Linguistik , Bochum, Brockmeyer. (in German)
- J Tuldava (1985), "Castotnaja struktura teksta i zakon Cipfa",
Ucenye zapiski, TGU 711, 93-116.
- G Altmann (1988), Wiederholungen in Texten, Bochum,
Brockmeyer. (in German)
- Ju K Orlov (1988), "Unsichtbare Harmonie", Musikometrika,
1:281-315.
- C Prun (1995), Die linguistischen Hypothesen von G.K. Zipf aus
systemtheoretischer Sicht, Trier: Magisterarbeit.
- A Knuppel (1997), Untersuchungen zum Zipf-Mandelbrot Gesetz an
deutschen Texten, Gottingen: Staatsexamensarbeit. (in German)
- RG Piotrovskij, KB Bektaev, AA Piotrovskaja (1997),
Matematiceskaja lingvistika, Moskva: Nauka.
- J Tuldava (1998), Probleme und Methoden der
quantitativ-systemischen Lexikologie, Trier: WVT.
- A Knuppel (2001), "Untersuchungen zum Zipf-Mandelbrot-Gesetz an
deutschen Texten", in Haufigkeitsverteilungen in Texten ed. KH Best,
pp. 248-280. Gottingen: Peust & Gutschmidt. (in German)
Zipf's law in monkey-typing texts
(updated on feb-12-2002)
- GA Miller
(1957), "Some effects of intermittent silence", American Journal of
Psychology, 70:311-314.
- GA Miller, N
Chomsky (1963), in Handbook of Mathematical Psychology II, eds, R.
Luce, R. Bush, E. Galanter (Wiley), pp. 419-491.
- J Nicolis (1991), Chaos and Information Processing: A Heuristic
Outline (World Scientific). [out of print, see Amazon]
- W Li (1992),
"Random texts exhibit Zipf's-law-like word frequency distribution", IEEE
Transactions on Information Theory , 38(6):1842-1845.
- W Li (1996),
Comments to "Bell curves and monkey languages" (letter to the editor),
Complexity, 1(6):6.
- Richard Perline (1996), "Zipf's law, the central limit theorem, and
the random division of the unit interval", Physical Review E, 54(1):220-223.
- G Troll, P beim Graben (1998), "Zipf's law is not a consequence of
the central limit theorem", Physical Review E, 57(2), 1347-1355.
- Leo Egghe (2000), "General study of the distribution of N-tuples of
letters or words based on the distribution of the single letters of words",
Mathematical and Computer Modelling, 31:35-41.
- Leo Egghe (2000), "The distribution of N-grams", Scientometrics,
47(2):237-252.
- Ramon Ferrer, Richard V Sole (2002), "Zipf's law and random texts",
Advances in Complex Systems, to appear.
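The monkey-typing experiment discussed in the papers above is easy to try. Below is a minimal Python sketch, assuming a four-letter alphabet plus a space, all typed with equal probability. The alphabet, the text length, the seed, and the function names `monkey_text` and `ranked_words` are my own choices for illustration and are not taken from any of the cited papers. With equal probabilities every word of a given length is equally likely, so the ranked counts fall in steps: all one-letter words first, then all two-letter words, and so on. It is the envelope of this staircase that looks like a power law.

```python
import random
from collections import Counter


def monkey_text(n_chars, letters="abcd", seed=0):
    """Type n_chars symbols uniformly at random from letters plus a space."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    alphabet = letters + " "
    return "".join(rng.choice(alphabet) for _ in range(n_chars))


def ranked_words(text):
    """Return (word, count) pairs ordered from most to least frequent."""
    # str.split() with no argument drops the empty strings between
    # consecutive spaces, so only non-empty "words" are counted.
    return Counter(text.split()).most_common()


ranked = ranked_words(monkey_text(200_000))
for rank, (word, count) in enumerate(ranked[:8], start=1):
    print(rank, word, count)
```

Plotting log(count) against log(rank) for the full `ranked` list shows the staircase and its roughly straight envelope.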
Turing's formula?
- Christer Samuelson (1995), "Relating Turing's formula and Zipf's
law", Proceedings of the 4th Workshop on Very Large Corpora, Copenhagen,
Denmark, 1996. [
abstract ]
Connection with information theory
(added on may-10-2002)
- P Harremoees, F Topsoe
(2001), "Maximum entropy fundamentals", Entropy, 3:227-292.
- P Harremoees, F Topsoe
(2002), "Zipf's law, hyperbolic distributions and entropy loss", IEEE
International Symposium on Information Theory (ISIT) Proceedings, in press.
Zipf's law discussed in popular books/Tutorial
- BB Mandelbrot (1977), The Fractal Geometry of Nature (W.H.
Freeman and Company). section 38 "scaling and power laws without geometry".
[
Amazon entry]
- George A
Miller (1991), The Science of Words (Scientific American
Library, a division of HPHLP, distributed by W.H. Freeman and Company). [
Amazon entry]
- Manfred Schroeder (1991), Fractals, Chaos, Power Laws
(W.H. Freeman and Company), pp. 35-38. [
Amazon entry]
- Murray Gell-Mann (1994), The Quark and the Jaguar (W.H.
Freeman and Company), pp.92-97. [
Amazon entry]
- Lada A Adamic, Zipf, Power-laws, and Pareto - a ranking tutorial (online
tutorial: http://ginger.hpl.hp.com/shl/papers/ranking/)
Zipf's law in city populations
(updated on jul-30-2001)
- Bruce M Hill (1970), "Zipf's law and prior distributions for the
composition of a population", Journal of the American Statistical
Association, 65:1220-1232.
- R Gunther, L Levitin, B Schapiro, P Wagner (1996),
"Zipf's
law and the effect of ranking on probability distribution",
International Journal of Theoretical Physics, 35(2):395-417.
- Hernan A
Makse, Shlomo Havlin, H Eugene Stanley (1995),
"Modelling urban
growth patterns",
Nature, 377:608-612.
- DH Zanette
and SC Manrubia (1997),
"Role of intermittency in urban development:
a model of large-scale city formation",
Physical Review Letters, 79:523-526.
[ PDF]
comments by M Marsili, S Maslov and Y-C Zhang, and reply at Physical
Review Letters, 80:4831(1998).
(note: the x-axis in the paper is city
population, not rank)
- SC Manrubia, DH
Zanette (1998),
"Intermittency model for urban development",
Physical Review E, 58:295-302.
- Matteo Marsili, Yi-Cheng Zhang (1998),
"Interacting
individuals leading to Zipf's law",
Physical Review Letters, 80(12):2741-2744.
[
PDF]
- X Gabaix (1999), "Zipf's law for cities: an explanation",
Quarterly Journal of Economics, 114:739-767.
- Bill Reed
(2001), "On the rank-size distribution for human settlements", J
Regional Science, in press.
[ PDF ]
- LC Malacarne, RS Mendes, EK Lenzi (2002), "q-exponential
distribution in urban agglomeration", Physical Review E,
65(1):article017106.
Zipf's law in Web Access Statistics and Internet Traffic
(updated on mar-07-2001)
See also, Mark Crovella's publication list
Jakob Nielsen's column Zipf curve and website
popularity
Jakob Nielsen's column Traffic from
referring sites
Hewlett-Packard's information dynamics group
- Steve Glassman, "A caching relay for the world wide web", In
First International World-Wide Web Conference, pages 69-76 (May 1994). (
html)
- WE Leland, MS Taqqu, W
Willinger, DV Wilson (1994), "On the self-similar nature of Ethernet
traffic ", IEEE/ACM Transactions on Networking, 2:1-15.
- Carlos R Cunha, Azer Bestavros, Mark E Crovella
, "Characteristics of WWW client-based traces", Technical Report
TR-95-010, Boston University Computer Science Department, June 1995.
- Virgilio Almeida, Azer Bestavros, Mark Crovella,
and Adriana de Oliveira (1996), "Characterizing reference locality in
the WWW", Boston University Computer Science Department, TR-96-11, June
1996. In Proceedings of the Fourth International Conference on Parallel and
Distributed Information Systems (PDIS '96), December 1996.
- Martin F Arlitt, Carey L Williamson (1997), "Internet web server:
workload characterization and performance implications", IEEE/ACM
Transactions on Networking, 5(5):631-645.
- ME Crovella, A
Bestavros (1997), "Self-similarity in world wide web traffic: evidence
and possible causes", IEEE/ACM Transactions on Networking, 5(6):835-846.
- P Barford, ME Crovella
, "Generating representative web workloads for network and server
performance evaluation," in Proceedings of Performance '98/ACM SIGMETRICS
'98, 151-160, Madison WI. [Slightly expanded version appears as
BUCS-TR-1997-006, November 4, 1997.]
- ME Crovella, Murad S Taqqu, Azer Bestavros
(1998), "Heavy-tailed probability distributions in the world wide web",
in A Practical Guide To Heavy Tails, eds RJ Adler, RE Feldman, MS Taqqu, Chapter 1, 3-26
(Chapman & Hall)
- N Nishikawa, T Hosokawa, Y Mori, K Yoshida, H Tsuji (1998),
"Memory-based architecture for distributed WWW caching proxy", Computer
Networks and ISDN Systems,30:205-214.
- BA Huberman, PLT Pirollo, JE Pitkow, RM Lukose, "Strong
regularities in world wide web surfing", Science, 280:95-97 (April 3, 1998).
- M Harchol-Balter, ME
Crovella, CD Murta (1998), "On choosing a task assignment policy for
a distributed server system," in Proceedings of Performance Tools '98,
Lecture Notes in Computer Science Vol 1469, pp. 231--242, 1998.
- ME Crovella, R
Frangioso, M Harchol-Balter (1999), "Connection Scheduling in Web
Servers," Boston University Computer Science Technical Report
BUCS-TR-99-003.
- ME Crovella, MS Taqqu (1999), "Estimating
the heavy tail index from scaling properties," Methodology and Computing in
Applied Probability, 1(1):?-?.
- P Barford, A Bestavros, A Bradley, and ME Crovella (1999),
"Changes in Web client access patterns: characteristics and caching
implications," to appear in World Wide Web, Special Issue on
Characterization and Performance Evaluation.
- Albert-Laszlo Barabasi, Reka Albert (1999), "Emergence of scaling
in random networks", Science, 286(5439):509-512. (may be relevant, but I
haven't checked)
An ABC News online article on this work can be found at
http://abcnews.go.com/sections/science/WhosCounting/whoscounting991201.html (Dec
1, 1999)
- JM Carlson, J Doyle (2000), "Highly optimized tolerance: a
mechanism for power laws in designed systems", Physical Review E,
60(2):1412-1427. [PDF] (this
paper describes a general theory for power laws, not just in internet
traffic, but there is a section on this particular application.)
- Lee Breslau, Pei Cao, Li Fan, Graham Phillips, Scott Shenker
(2000), "Web caching and Zipf-like distributions: evidence and
implications", Proceedings of INFOCOM'99 (IEEE Press). [
abstract] [ PDF]
- Sidney Resnick, Holger Rootzen (2000), "Self-similar
communication models and very heavy tails", Annals of Applied Probability,
10(3):753-778.
- Lada
A Adamic, Bernardo A Huberman (2000), "The nature of markets in the
World Wide Web", Quarterly Journal of Electronic Commerce, 1:5-12.
[ PDF]
- Anders Johansen, Didier Sornette (2000), "Download relaxation
dynamics on the WWW following newspaper publication of URL", Physica A, 276:338-345.
Zipf's law in bibliometrics, informetrics, scientometrics, and library science
(updated on mar-07-2001)
This is similar to Zipf's law in natural language, but discussed in the
context of information retrieval and library science.
Some links to conferences:
7th International Conference on
Scientometrics and Informetrics (July 5-9, 1999, Mexico)
6th International Conference on
Scientometrics and Informetrics (June 16-19, 1997, Israel)
a collection of
links on bibliometrics
- RA Fairthorne (1969), "Empirical hyperbolic distributions
(Bradford Zipf Mandelbrot) for bibliometric description and prediction",
Journal of Documentation, 25:319-343.
- Bertram Brookes (1977), "Theory of the Bradford law", Journal of
Documentation, 33:180-209.
- Ronald E Wyllys (1981), "Empirical and theoretical bases of
Zipf's law," Library Trends. Summer; 30(1):53-64.
- Bertram Brookes (1982), "Quantitative analysis in the humanities:
the advantage of ranking techniques", in Studies on Zipf's law, ed.
H Guiter, MV Arapov (Brockmeyer), pages 65-115.
- J Fedorowicz (1982), "A Zipfian model of an automatic
bibliographic system: an application to MEDLINE", Journal of American
Society of Information Science, 33:223-232.
- Bertram Brookes (1984), "Towards informetrics: Haitun, Laplace,
Zipf, Bradford and Alvey programme", Journal of Documentation, 40:120-143.
- Linus Ikpaahindi (1985), "An overview of bibliometrics: its
measurements, laws and their applications", Libri, 35(2):163-177.
- Ye-Sho Chen, Ferdinand F Leimkuhler (1986), "A relationship
between Lotka's law, Bradford's law, and Zipf's law", Journal of the
American Society for Information Science, 37:307-314.
- Ye-Sho Chen, Ferdinand F Leimkuhler (1987), "Analysis of Zipf's
law: an index approach", Information Processing and Management, 23:71-182.
- Ye-Sho Chen, Ferdinand F Leimkuhler (1987), "Bradford's law: an
index approach", Scientometrics, 11:183-198.
- Leo Egghe (1989), The Duality of Informetric Systems with
Applications to the Empirical Laws, Ph.D Thesis (City University,
London).
- Michael J Nelsen (1989) "Stochastic models for the distribution
of index terms", Journal of Documentation, 45:227-237.
- Howard White, Katherine W McCain (1989) "Bibliometrics", Annual
Review of Information Science and Technology, 24:119-186.
- Abraham Bookstein (1990), "Informetric distributions. Part I:
unified overview", Journal of the American Society for Information Science,
41:368-375.
- Leo Egghe (1990), "The duality of informetric systems with
applications to the empirical laws", Journal of Information Science,
16:17-27.
- Leo Egghe, Ronald Rousseau
(1990), Introduction to Informetrics: Quantitative Methods in Library,
Documentation and Information Science (Elsevier).
- Liwen Qiu (1990), "An empirical examination of the existing
models for Bradford's law", Information Processing and Management,
26:655-672.
- Ronald
Rousseau (1990), "Relations between continuous versions of
bibliometric laws", Journal of the American Society for Information Science,
41(3):197-203.
- Leo Egghe (1991), "The exact place of Zipf's and Pareto's law
amongst the classical informetric laws", Scientometrics, 20:93-106.
- Ronald Rousseau, Qiaoqiao Zhang (1992), "Zipf's data on the frequency
of Chinese words revisited", Scientometrics, 24:201-220.
- Ronald Rousseau, Sandra Rousseau (1993), "Informetric distributions: a
tutorial review", CJILS/RCSIB, 18(2):51-63.
- Quoniam Luc, Balme Frederic, Rostaing Herve, Giraud Eric, Dou Jean
Mari (1997), "Bibliometric law used for information retrieval", in
Proceedings of the Sixth Conference of the International Society for
Scientometrics and Informetrics, eds. Bluma C Peritz, Leo Egghe, Hebrew
Univ of Jerusalem.
- S Redner (1998), "How popular is your paper? An empirical study
of the citation distribution", European Physical Journal B, 4:131-134.
(http://xxx.lanl.gov/abs/cond-mat/9804163)
- ZK Silagadze's preprint: "Citations and the Zipf-Mandelbrot's
law", arxiv.org e-print, physics/9901035 [abstract],
Complex Systems, 11:487-499 (1997).
- Another preprint, C Tsallis, MP de Albuquerque, "Are citations of
scientific papers a case of nonextensivity?", (March 1999)
(http://xxx.lanl.gov/abs/cond-mat/9903433)
- ZK Silagadze (2000), "Citations and the Zipf-Mandelbrot law",
Complex Systems, 11(6):?-?.
- Robert Losee (2001), "Term dependence: a basis for Luhn and Zipf
models", Journal of the American Society for Information Science and
Technology, 52(12):1019-1025. [PDF]
Zipf's law in finance and business
(updated on sep-09-2001)
- Of course, Pareto's paper should be listed here.
- If the distribution is plotted not as a rank-frequency plot, but as the
number of companies in each revenue/sales/income/whatever category (this is
actually the other type of Zipf's plot, see Zipf [1935]), the log-normal
distribution is usually relevant. (I have not yet had the chance to trace
the references...)
- D Champernowne (1953), "A model of income distribution", Economic
Journal, 63:318-351.
- BB Mandelbrot (1963), "", Journal of Business, 36:394-?.
- BB Mandelbrot (1963), "New methods in statistical economics",
Journal of Political Economy, 71:421-440 .
- E Fama (1965), " ". Management Science, 11:404-419.
- JP Bouchaud (1995), "More Levy distributions in physics", in Levy
Flights and Related Topics in Physics, Lecture Notes in Physics 450
(Springer), pp 239-250.
- MHR Stanley, SV Buldyrev, S Havlin, RN Mantegna, MA Salinger, HE
Stanley (1995), "Zipf's plots and the size distribution of firms", Economics
Letters, 49:453-457.
- BB Mandelbrot (1997), Fractals and Scaling in Finance:
Discontinuity, Concentration, Risk (Springer-Verlag, Nov 1997)
[Amazon entry]
- D. Sornette, D. Zajdenweber, "Economic returns of research: the Pareto
law and its implications", European Physical Journal B, 8:653-664 (1998).
(abstract)
- JP Bouchaud, D Sornette, C Walter, JP Aguilar, "Taming large events:
optimal portfolio theory for strongly fluctuating assets", International
Journal of Theoretical and Applied Finance, 1:25-41 (1998).
- N. Vandewalle and M. Ausloos, "The n-Zipf analysis of financial
data series and biased data series", Physica A, 268:170-176 (1999).
- Greg Ip, "Analyst discovers the order in internet stocks
valuations", Wall Street Journal, Dec 27 (1999).
[http://interactive.wsj.com/articles/SB946246776318315015.htm] [a local copy]
- J J Ramsden, Gy Kiss-Haypál (2000), "Company size distribution in
different countries", Physica A, 277:220-227.
- Sorin Solomon, Peter Richmond (2000), "Stability of Pareto-Zipf law in
non-stationary economies", arxiv.org e-print, cond-mat/0012479. [abstract]
- H Aoyama, W Souma, Y Nagahara, M P Okazaki, H Takayasu, M
Takayasu (2000), "Pareto's law for income of individuals and debt of
bankrupt companies", Fractals, 8(3):293-300.
- A Dragulescu, VM Yakovenko (2001), "Evidence for the exponential
distribution of income in the USA", European Physical Journal B, 20:585-589.
- Bill Reed (2000), "The Pareto law of incomes - an explanation and an
extension", submitted.
- Bill Reed (2001), "The Pareto, Zipf and other power laws", Economics
Letters, in press. [note: the paper also contains a model for Zipf's law
in general.] [PDF]
- Robert L Axtell (2001), "Zipf distribution of US firm sizes", Science,
293(5536):1818-1820. [note: it's a frequency-size plot, not the size-rank
plot.] [PDF]
Zipf's law in ecological systems
(updated on mar-07-2001)
(well, I haven't checked the original papers, so I'm not sure the papers
are in the right place ...)
- BM Hill, "The rank-frequency form of Zipf's law", Journal of the American
Statistical Association, 69:1017-1026 (1974).
- Juan Camacho, Richard V Sole (2001), "Scaling in ecological size
spectra", Europhysics Letters, 55:774-780.
Zipf's law in earthquakes?
- D Sornette, L Knopoff, YY Kagan, C Vanneste, "Rank-ordering statistics
of extreme events: application to the distribution of large earthquakes",
Journal of Geophysical Research, 101(B6):13883-13894 (1996). [PDF]
Biomolecular sequences
(note that I didn't use the words "Zipf's law", because these are not!)
- G Gamow, M Ycas , "Statistical correlation of protein and
ribonucleic acid composition", Proceedings of National Academy of Sciences,
41 (12), 1011-1019 (Dec 15, 1955).
- I won't list the recent papers on the so-called Zipf's law for
subsequences in DNA sequences, because these rank-frequency plots do not
follow a power law well, and the slope in the double-logarithmic plot is far
from -1. These are rank-frequency plots, but they are not Zipf's law!
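The slope criterion in the note above can be checked roughly by fitting a straight line to log(frequency) versus log(rank). What follows is only a minimal sketch under stated assumptions: the function name `loglog_slope` and the two toy data sets are made up for illustration and do not come from any of the papers listed here. An ordinary least-squares fit on a double-logarithmic plot is also a crude diagnostic, not a proper estimator of the exponent.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank).

    Ranks are assigned here by sorting the positive counts in
    decreasing order, so rank 1 is the most frequent item.
    (Hypothetical helper, for illustration only.)
    """
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (assumed, not measured): an exact 1/rank law, and a
# slowly decaying sequence that can still be ranked but is not a power law.
zipf_like = [1000.0 / r for r in range(1, 101)]
flat_like = [100.0 - 0.5 * r for r in range(1, 101)]

print("1/rank data slope:", round(loglog_slope(zipf_like), 3))
print("slowly decaying data slope:", round(loglog_slope(flat_like), 3))
```

Any set of counts can be ranked and plotted this way. Only when the points fall on a straight line with slope near -1 would the plot count as Zipf's law in the sense used on this page.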
Estimation issues
- BM Hill, "A simple general approach to inference about the tail of a
distribution", Annals of Statistics, 3, 1163-1174 (1975).
- G.S. Lo, "Asymptotic behavior of Hill's estimate and application",
Journal of Applied Probability, 23, 922-936 (1986).
- BM Hill, "Bayesian forecasting of extreme values in an exchangeable
sequence", Journal of Research of the National Institute of Standards and
Technology, 99:521-538 (1994).
Miscellaneous
(updated on jul-05-2001)
- CJ Brackenridge (1978), "A study of phenotypic arrays derived
from seven genetic systems in an Australian population sample", Ann. Human
Biology, 5:381-388.
- P Schuster, PF Stadler (1994), "Landscapes: complex optimization
problems and biopolymer structures", Computers & Chemistry,
18(3):295-324.
- P Schuster, W Fontana, PF Stadler, IL Hofacker (1994), "From
sequences to shapes and back: a case study in RNA secondary structures",
Proceedings of the Royal Society of London (B. Biological Sciences),
255:279-284.
- P Schuster (1995), "How to search for RNA structures. Theoretical
concepts in evolutionary biotechnology", Journal of Biotechnology,
41(2-3):239-257.
["The frequency with which a structure is realized in
sequence space is inversely proportional to some power c > 1 of the
structure's frequency rank, thus following a (generalized) Zipf law"]
- MS Watanabe (1996), "Zipf's law in percolation", Physical Review
E, 53(4):4187-4190.
- JD Burgos, P Moreno-Tovar (1996), "Zipf-scaling behavior in the
immune system", Biosystems, 39(3):227-232.
- E Bornberg-Bauer (1997), "How are model protein structures
distributed in sequence space?", Biophysical Journal, 73(5):2393-2403. [If I
understood correctly, some protein structures correspond to many protein
sequences, whereas other structures correspond to fewer sequences. So
structures can be ranked...]
- M Gerstein, H Hegyi (1998), "Comparing genomes in terms of
protein structure: surveys of a finite parts list", FEMS Microbiology Reviews,
22(4):277-304. [well, the words "Zipf's law" are mentioned in the abstract...]
- YG Ma (1999), "Zipf's law in the liquid gas phase transition of
nuclei", European Physical Journal A, 6:367-371.
- Piqueira JR, Monteiro LH, de Magalhaes TM, Ramos RT, Sassi RB, Cruz
EG (1999), "Zipf's law organizes a psychiatric ward", Journal of
Theoretical Biology, 198:439-443. [what?]
- W Li (April 2001), "Zipf's law in importance of genes for cancer
classification using microarray data", arxiv.org e-print, physics/0104028.
[abstract]
- J Kalda, M Sakki, M Vainu, M Laan (Oct 2001), "Zipf's law in
human heartbeat dynamics", arxiv.org e-print, physics/0110075. [abstract]
Relation with Benford's Law (also called first-digit law)?
(new on sep-19-2001)
- L Pietronero, E Tosatti, V Tosatti, A Vespignani (2001),
"Explaining the uneven distribution of numbers in nature: the laws of
Benford and Zipf", Physica A, 293:297-304.
More links to Benford's law:
- S Newcomb , "Note on the frequency of use of the different digits in
natural numbers", American Journal of Mathematics, 4:39-40 (1881).
- Frank Benford, "The law of anomalous numbers", Proc. American Phil
Society, 78:551-572 (1938).
- RA Raimi, "The peculiar distribution of first digits", Scientific
American, 221:109-119 (Dec 1969)
- J Burke, E Kincanon, "Benford's law and physical constants: the
distribution of initial digits", American Journal of Physics, 14:59-63
(1991).
- Mark J Nigrini, The Detection of Income Tax Evasion Through an
Analysis of Digital Frequencies (Ph.D Thesis, Univ Cincinnati, 1992)
(currently a professor of accountancy at the Southern Methodist University,
Dallas, TX)
- "He's got their number: Scholar uses math to foil financial fraud" (Wall
Street Journal, July 10, 1995)
- E Ley, "On the peculiar distribution of the US stock indices digits",
American Statistician, 1995
- Theodore P Hill, "A statistical derivation of the significant-digit
law", Statistical Science, 10(4):354-363 (1995).
- M Nigrini, "A taxpayer compliance application of Benford's law", Journal
of the American Taxation Association, 18:72-91 (1996).
- TP Hill, "The first digit phenomenon", American Scientist, 86:358-363
(1998).
- Matthews, "The power of one", New Scientist, July 10, 1999.
- Eric Weisstein's Treasure Troves of Science
http://www.treasure-troves.com/math/BenfordsLaw.html
- Alexander Bogomolny's Interactive Math Miscellany and Puzzles
http://www.cut-the-knot.com/do_you_know/zipfLaw.html
- New York Times, Aug 4, 1998 "Following Benford's Law, or Looking Out for
No. 1" (a copy from
http://courses.nus.edu.sg/course/mathelmr/080498sci-benford.htm)
- LM Leemis, BW Schmeiser, DL Evans (2000), "Survival distributions
satisfying Benford's law", The American Statistician, 54:1-6.