Colliding with Other Datasets: Population Versus Number of Shows

One of the things on my to-do list has been to collide my touring data with other available datasets from 1941. In the long term, such digital work not only offers new ways to analyze dance history, but also points up new ways to make dance materials available to historical studies in other fields.

Unfortunately, very little historical data from the tour year has been collated online. I did, however, use the 1941 and 1942 editions of the World Almanac and Book of Facts, together with the 1941 Rand McNally Commercial Atlas, to construct a small dataset of populations for the cities in which American Ballet Caravan performed. I then compared these populations against the number of shows performed in each city to see whether any cities stood out as anomalies.

For this map, I have colored the cities by how many standard deviations they fall above or below the mean. The farthest outliers are at -2 and +3. The dataset is not really large enough to support a high level of confidence, but it at least gives some sense of which cities are outliers. Here it’s possible to compare Mendoza, where 3 shows were performed for a population of 76,780, to Rosario, where 2 shows were performed for a population of over half a million, or Sao Paulo, which had 4 shows for 1,151,249 residents. Medellin comes out at precisely the mean. So why did the company do so few shows in Sao Paulo? Or so many in Mendoza?
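For readers who want to try this kind of comparison on their own data, here is a minimal sketch in Python of one way the banding could be computed. It is not necessarily the calculation behind the map: the choice of shows per 100,000 residents as the measure, the use of the population standard deviation, the rounding of each score to the nearest whole band, and the function names are all assumptions made for illustration. Only the two cities with exact figures quoted above are included, so the resulting scores are trivial; the full city list would be needed to reproduce anything like the -2 to +3 spread.

```python
import math
from statistics import mean, pstdev


def shows_per_100k(shows, population):
    """Number of shows per 100,000 residents (an assumed measure)."""
    return shows / population * 100_000


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # Every city has the same rate, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sd_band(z):
    """Round a z-score to the nearest whole band, e.g. -2, 0, or +3."""
    return math.floor(z + 0.5)


# Only the figures quoted in the text; the real dataset has more cities.
cities = [
    ("Mendoza", 3, 76_780),
    ("Sao Paulo", 4, 1_151_249),
]

rates = [shows_per_100k(shows, pop) for _, shows, pop in cities]
for (name, shows, pop), z in zip(cities, z_scores(rates)):
    print(f"{name}: {shows} shows, population {pop:,}, band {sd_band(z):+d}")
```

With more cities in the list, the same three functions would assign each city a band that could then drive the color scale on a map.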

The map is color coded by standard deviation; hovering over a city shows its name, population, and number of shows.