Lecture 10

"We [NSA] had very close contacts with the Bell Laboratories. They were very, let's say, willing to work along with us"

  • Solomon Kullback (1907-1994), who spent 1942 at Bletchley before a distinguished career as Chief Scientist at NSA, interviewed in 1982.

The thread from Bletchley crosses the Atlantic Ocean to the proto-NSA and Bell Labs. To that end, we'll set aside AI for now and turn to how computing with data flourished at Bell Labs, 1945-1966.

The #readings for March 26 will emphasize the mindset of applied computational statisticians at AT&T's Bell Labs, who worked quite directly with data and information about people. Bell Labs was the combined Google and Facebook of its day: a government-tolerated monopoly with all the data, all the researchers, and all the compute. The readings will also emphasize how these statisticians saw their work as quite distinct from the academic tradition many of them knew as mathematical statisticians, which we last encountered several weeks ago in the fights among Fisher, Neyman, and Pearson.

Readings this week are several short dishes, taken from

  • John Tukey, founding chairman of the Princeton Statistics Department while also directing the Bell Labs Statistics Research Department;
  • John Chambers, creator of the S programming language; and
  • Colin Mallows, who worked at Bell and AT&T 1960-2000.

  1. Tukey, John W. "The Future of Data Analysis." The Annals of Mathematical Statistics 33, no. 1 (1962). This is a 67-page paper, so please read only the first and last sections (12 pages and 4 pages): pp. 2-14 (end at "II. Spotty Data") and pp. 60-64 (start at "VIII. How shall we Proceed?").

  2. Tukey, John W. Exploratory Data Analysis. Vol. 2. 1977. This is a 711-page book, so please read only the opening sections (5 pages and 7 pages): the "Preface", pp. v-ix, and Sections 1A and 1B, pp. 1-7.

  3. Chambers, John M. "Greater or Lesser Statistics: A Choice for Future Research." Statistics and Computing 3, no. 4 (1993): 182-184. This is a very short (3-page) paper.

  4. Mallows, Colin. "Tukey's Paper After 40 Years." Technometrics 48, no. 3 (2006): 319-325. This is only 7 pages, and each is better than the next; it's a selective history of everything right and wrong in mathematical, computational, and industrial data in the second half of the 20th century.

BONUS: for added flavor, please do watch this 7-minute video of Bell's Claude Shannon demonstrating what would now be called a machine learning robot from 1950, or at least watch up to the 3-minute mark, so you can see how seamlessly he transitions from presenting the cute little e-mouse finding its e-cheese to a rockets-red-glare interlude celebrating the data-centered "military industrial complex" (a cautionary phrase not to appear until 1961) of his day: https://www.youtube.com/watch?v=vPKkXibQXGA