Merging Tables and Layering Maps

In progress: a post about researching and georeferencing fabulous historical basemaps. (Really… who can resist a 1942 map entitled “The Good Neighbor Pictorial Map of South America”?) But for now: a bit of experimentation with composite maps. Before moving on to keying off of other datasets, there is more to do with putting together my own data differently.

CartoDB allows users to merge datasets and also to layer multiple datasets. This screengrab comes from a map that is not very pretty (yet), but informative nonetheless.

[Screenshot: two-layer composite CartoDB map of routes and performances per city, 27 July 2015]

This map consists of two layers. The farther-back layer is drawn from a dataset with the routes converted to GeoJSON linestrings. Here each string is labeled with the type of transportation. Right now, these are simply represented as direct lines, although in the future they could be arced, or even made to run along a suggested route (so that boats would not be, for example, running on land).
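For readers curious what "routes converted to GeoJSON linestrings" involves, here is a minimal Python sketch using only the standard library. It assumes a route table with one row per leg (source city, destination city, and a transportation label). The coordinates are approximate and purely illustrative. Each leg becomes a straight two-point LineString, which matches the "direct lines" described above.

```python
import json

# Hypothetical route rows: (source, (lon, lat), destination, (lon, lat), mode).
# Coordinates are approximate and for illustration only.
ROUTES = [
    ("Valparaiso", (-71.62, -33.05), "Callao", (-77.13, -12.05), "boat"),
    ("Callao", (-77.13, -12.05), "Lima", (-77.04, -12.05), "train"),
]

def route_to_feature(src, src_xy, dst, dst_xy, mode):
    """Build one GeoJSON Feature: a straight two-point LineString labeled by mode."""
    return {
        "type": "Feature",
        # GeoJSON (RFC 7946) orders each position as [longitude, latitude].
        "geometry": {"type": "LineString", "coordinates": [list(src_xy), list(dst_xy)]},
        "properties": {"source": src, "destination": dst, "transportation": mode},
    }

def routes_to_geojson(rows):
    """Wrap every leg in a single FeatureCollection, ready to upload as one layer."""
    return {"type": "FeatureCollection", "features": [route_to_feature(*row) for row in rows]}

collection = routes_to_geojson(ROUTES)
print(json.dumps(collection, indent=2))
```

Arcing the lines, or routing boats around land, would mean adding intermediate positions to each `coordinates` list rather than changing the overall structure.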

The second layer is based on several tables which have been merged inside CartoDB on the basis of “stay” identifiers. The choropleth is based on the number of performances per city, from 1 to 18. I made the distribution uneven, because there is a much finer gradient of performances in the lower numbers than in the upper, and therefore the difference between 1 and 2 performances seems more significant than that between, say, 15 and 18. The cities with null values (those the dancers only passed through in transit) are white at 50% opacity. I’ve also added labels that appear on hover and list the city name and number of performances for the cities in which performances occurred, although they are currently blank for transit cities.
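The same two ideas, a merge on a shared stay identifier and uneven class breaks, can be sketched in plain Python. This is not how CartoDB performs the merge internally; it is only an illustration. The table contents and the break values below are hypothetical. The merge is a left join, so transit-only stays keep a null count, as the white cities do on the map.

```python
from bisect import bisect_left

# Hypothetical tables keyed by a shared stay_id; values are placeholders, not tour data.
stays = {
    1: {"city": "City A"},
    2: {"city": "City B"},   # transit only: no performances recorded
    3: {"city": "City C"},
}
performances = {1: 6, 3: 18}   # stay_id -> number of performances

# Hypothetical uneven upper bounds: fine steps at the low end, coarse steps at the top.
BREAKS = [1, 2, 3, 5, 9, 18]

def performance_class(count):
    """Index of the first break >= count; None for transit-only (null) stays."""
    if count is None:
        return None
    if count > BREAKS[-1]:
        raise ValueError("count exceeds the highest break")
    return bisect_left(BREAKS, count)

merged = []
for stay_id, stay in stays.items():
    count = performances.get(stay_id)   # left join: a missing stay_id becomes None (null)
    merged.append({"stay_id": stay_id, **stay, "performances": count,
                   "class": performance_class(count)})
```

With these breaks, 1 and 2 performances fall into different classes, while everything from 10 to 18 shares one.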

This composite map is useful in how it begins to pull the pieces together. Montevideo, for example, suddenly appears very important as the only location not “on the way” — compare this with the one-night stands in Vina del Mar near the water, or Manizales along the rail journey. The addition of certain transit (non-performance) cities also begins to flesh out a more circuitous journey, such as the transition in Callao from boat to train between Valparaiso and Lima.

Getting an Outside Eye

Yesterday, I met with Mike Migurski to look over some of the datasets and maps that I’ve been setting up. A few key takeaways from the conversation:

  • A project from last November does something very similar, circling around the same dataset in different ways to see what insights turn up: Paul Downey’s One CSV, Thirty Stories (he only actually makes it to 21, but it is still fascinating).
  • How to key off of other datasets (for example with a GeoNames id), but also the complications of doing so for historical work, where borders, roads, population, etc. will have been different. Another interesting collision might be to follow this historical thread explicitly and run up against a WWII dataset.
  • Restructuring my database. While I’ve been working with two primary tables (“shows” and “routes”), Mike suggested adding a third for “stays,” so that each stay has a unique identifier, even if the dancers pass through a town more than once. We also discussed the difference between entering a value, for example in a single “transportation” column, versus having five columns, one for each possible type of transportation, each holding only y/n.
  • How to deal with ambiguity. I’ve been struggling with the fact that I know when the dancers were in a given place in order to perform, but not how long they traveled to get there or how far in advance of a given performance they arrived to rehearse (other archival evidence suggests this was not consistent; sometimes they even performed the night they arrived). We talked about dealing with these stays and trips by keeping track of four dates for each: i) earliest possible begin; ii) latest possible begin; iii) earliest possible end; iv) latest possible end. This is appealing to me, because it has the potential to highlight the offstage time, which is harder to track than the performances.
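One way to see why the four-date approach highlights offstage time is to write a stay down as a small record. The following Python sketch is hypothetical: the field names and dates are placeholders, not Mike's or my actual schema. From the four dates it derives the shortest and longest possible length of a stay, and bounds on the travel gap between two consecutive stays.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Stay:
    """A stay whose beginning and end are each known only within a window."""
    stay_id: int
    city: str
    earliest_begin: date
    latest_begin: date
    earliest_end: date
    latest_end: date

    def __post_init__(self):
        # Each window must be ordered, and the stay cannot end before it could begin.
        if self.earliest_begin > self.latest_begin or self.earliest_end > self.latest_end:
            raise ValueError("begin/end windows are out of order")
        if self.latest_end < self.earliest_begin:
            raise ValueError("stay ends before it can begin")

    def min_days(self):
        """Shortest possible stay: begin as late and end as early as the evidence allows."""
        return max(0, (self.earliest_end - self.latest_begin).days)

    def max_days(self):
        """Longest possible stay: begin as early and end as late as the evidence allows."""
        return (self.latest_end - self.earliest_begin).days

def travel_gap(prev, nxt):
    """Bounds, in days, on the offstage time between two consecutive stays."""
    shortest = max(0, (nxt.earliest_begin - prev.latest_end).days)
    longest = max(0, (nxt.latest_begin - prev.earliest_end).days)
    return shortest, longest

a = Stay(1, "City A", date(1941, 7, 1), date(1941, 7, 3), date(1941, 7, 10), date(1941, 7, 12))
b = Stay(2, "City B", date(1941, 7, 14), date(1941, 7, 16), date(1941, 7, 20), date(1941, 7, 21))
```

Here stay `a` lasted between 7 and 11 days, and the trip to `b` took between 2 and 6 days. That travel range is exactly the offstage time a single date per stay would hide.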

Animating Performances on Tour with CartoDB

I have other visualization ideas to try, but first I wanted to look at one of the same ABC performance datasets from before in CartoDB. Even though the underlying data is the same, each platform has its own quirks that require reformatting the database through trial and error. Upload, stare, delete, edit, upload…

Whereas Palladio is not made for easy sharing, all datasets uploaded to CartoDB are public by default. Sharing work that is still in progress might be nerve-wracking for a historian. But this open data means it is very easy to share animated maps that are hosted directly on CartoDB’s site, for example by embedding them in other webpages. (Note: to see any of the maps embedded here larger/cleaner on CartoDB’s site, click the paper airplane icon in the top right corner, followed by “link to this map.”)

Like Palladio, CartoDB is also an evolving system. The last time I explored it, it was very easy to customize the CSS file, but there were a limited number of “out of the box” options. Since then, more have been added. One of the more straightforward settings animates a heat map of the number of performance days per location chronologically (note: performance days, not performances), which can also be set to accumulate, so that it leaves a trail across the map.
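Performance days and performances can differ because a matinee and an evening show on the same date count as one day. This minimal Python sketch makes that distinction concrete, using hypothetical show rows. It also computes a cumulative frame, roughly what the trailing animation shows at a given date. It is an illustration of the counting only, not of how CartoDB builds its heat map.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical show rows: (city, date, time of day).
shows = [
    ("City A", date(1941, 7, 5), "matinee"),
    ("City A", date(1941, 7, 5), "evening"),
    ("City A", date(1941, 7, 6), "evening"),
    ("City B", date(1941, 7, 9), "evening"),
]

# Every row is one performance.
performances_per_city = Counter(city for city, _, _ in shows)

# Distinct dates per city: two shows on the same date count as one performance day.
days_per_city = defaultdict(set)
for city, day, _ in shows:
    days_per_city[city].add(day)
performance_days_per_city = {city: len(days) for city, days in days_per_city.items()}

def cumulative_days(city_days, up_to):
    """Performance days per city on or before `up_to`, like one frame of a cumulative animation."""
    return {city: sum(1 for d in days if d <= up_to) for city, days in city_days.items()}
```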

Another possibility with CartoDB is to set up a poster-style map that plots cities based on the density of a particular dataset column. With this color scale, it is very easy to see at a glance the proportion of performance dates per location across the continent.


These maps can also be modified by other parameters. For example, in the database, I categorize the shows into three buckets: “matinee,” “evening,” and “[evening].” The last refers to a show that was likely in the evening but is listed in the budget documents only under a sub-type, e.g., “subscription” or “benefit.” Here is an example of a map where animation is essential, because multiple performances often occurred at different times of day in the same place. I’ve set the duration to be slower, so that there is time to see the color changes. However, the visibility of particular shows depends on a change in time of day.
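The three-bucket rule can be written down as a small function. The following Python sketch assumes each budget entry provides an optional time-of-day label and an optional sub-type. Those input fields are hypothetical. The mapping to "[evening]" follows the description above.

```python
# Sub-types that, per the budget documents, suggest an evening show.
SUBTYPES_IMPLYING_EVENING = {"subscription", "benefit"}

def show_bucket(time_label, subtype=None):
    """Return "matinee", "evening", or "[evening]" (inferred); None if nothing supports a guess."""
    label = (time_label or "").strip().lower()
    if label in ("matinee", "evening"):
        return label
    if subtype and subtype.strip().lower() in SUBTYPES_IMPLYING_EVENING:
        return "[evening]"   # probably evening, but only the sub-type is recorded
    return None
```

Returning `None` rather than guessing keeps genuinely unknown shows distinguishable from the inferred "[evening]" ones.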

Other notes so far on new additions to this version of CartoDB:

  • It is much easier now to set click or hover-based information pop-ups, which used to have to be entered in CSS form.
  • Although I haven’t done much with them, there are many more WYSIWYG tools than before for annotation, titles, etc.
  • Once you get used to it, the forking of one dataset into many possible maps is useful.
  • The timebar in the lower left does not have the same visual appeal as CartoDB’s other features. I need to look into custom mods people have done.
  • Any kind of point-to-point map still requires custom code. Next time!

Returning to Palladio

Since I’ve spent time with Palladio before, I returned there to begin working with the American Ballet Caravan data. It remains a very clean interface through which to visualize simple point and point-to-point poster-style maps, and to filter those based on simple parameters.

For example, here is a map of South America, with points sized for the number of performances in each city, ranging from a minimum of 1 to a maximum of 18.

[Screenshot: Palladio map of South America with points sized by number of performances]

And here is another map with the full route plotted, followed by versions filtered for two different types of travel.

[Screenshots: full route; filtered for boat; filtered for train]

Filtering the point-to-point map by the facet of transportation is fascinating. It not only helps to trace a route and establish distances between places, but also serves as a reminder of the functions of different forms of transportation. For example, in the train image, the gap between the third- and fourth-from-left dots represents a trip that was scheduled to occur by train, but was delayed by snow and ultimately occurred via airplane. At the same time, what looks like a small gap represents what was in fact a huge financial and logistical hurdle. It would be great to be able to apply colors to these different types of transportation and view them together.

Putting these two Palladio functions together could reveal which locations were critical versus which were merely “on the way.” For example, why travel a far distance via multiple forms of transportation for only a few shows? Unfortunately, each parameter can only have one extension, and only one primary table is allowed. Using the data from the point-to-point map as the primary table requires that “source” and “destination” cities be specified, each of which can only have one extension (coordinates), not also the number of performances. On the other hand, using the data from the map that sizes points by number of performances as the primary table means that those cities cannot also be extended as source/destination cities. A further issue is that the exact travel dates are not as clearly documented as the performance dates. I attempted to build a Franken-file as the primary table, with lots of redundancies (e.g., city, source, and destination all pull from the same coordinates extension), but am still searching for the correct database format.
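One possible shape for such a Franken-file is a fully denormalized table with one row per leg. Each row repeats the coordinates and performance counts for both endpoints, so no extension is needed at all. The following Python sketch is hypothetical. It assumes Palladio will read coordinates from a single "latitude,longitude" text field, and it uses placeholder cities and counts. I have not confirmed that this format resolves the one-extension limit.

```python
import csv
import io

# Hypothetical lookups: (lat, lon) pairs are approximate, counts are placeholders.
COORDS = {"City A": (-34.9, -56.2), "City B": (-33.0, -71.6), "City C": (-12.0, -77.0)}
PERFORMANCES = {"City A": 4, "City C": 2}   # cities missing here were transit only
LEGS = [("City A", "City B", "boat"), ("City B", "City C", "train")]

def denormalize(legs):
    """One row per leg, repeating coordinates and performance counts for each endpoint."""
    rows = []
    for src, dst, mode in legs:
        rows.append({
            "source": src,
            "source_coords": "%s,%s" % COORDS[src],   # assumed "lat,lon" text format
            "source_performances": PERFORMANCES.get(src, 0),
            "destination": dst,
            "destination_coords": "%s,%s" % COORDS[dst],
            "destination_performances": PERFORMANCES.get(dst, 0),
            "transportation": mode,
        })
    return rows

rows = denormalize(LEGS)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The cost of this design is redundancy: a city's performance count appears on every leg that touches it, so the count should be sized from one endpoint only, or it will be double-counted.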

Another feature of Palladio that I worked with previously was the timeline.

Palladio has come a long way, for example with the error downloads, although there are definitely still some bugs, such as raw code popping up in places where the user should be choosing parameters for a map. I have not yet tried installing it locally, but that should be the next step, because one of my frustrations with the browser-based version of Palladio has to do with its user interface. I understand all of the reasons they don’t store data (I wouldn’t want to be blamed for losing a humanities academic’s data, either, especially when there is a darn good chance it’s a PEBKAC kind of issue), but the flip side is that it is extremely unstable. There are quite a number of ways to hit a button that refreshes the page, or the equivalent, and knocks you back to the original upload page, where you have to add all of your data and begin again.

American Ballet Caravan Intro, Part 2: More Datasets

In the previous post, I wrote about some of the challenges I was facing in cleaning datasets based on letters from the Rockefeller Archive Center and certain collections at the New York Public Library for the Performing Arts. Since then, I have reviewed and manually entered material from two more important collections: one that provides concrete anchors for much of the previously less certain data, and another that ventures into the realm of historical fiction.

The former comes from the New York City Ballet’s meticulous archives, which include everything from bulletin-board notices instructing dancers where and when to receive particular shots before departure, to invaluable budgetary paperwork. These critical documents revise certain previous assumptions. For example, they clarify the dates, times, and types of performances in particular cities, thus identifying which fifteen cities mentioned in previous documents the dancers performed in, versus which of the total sixty they traveled through or only visited as tourists. This lends itself to cleaner timelines, or even exploratory maps that chart number of shows per city versus population. At the same time, it is worth noting that these only mark where dancers performed, but not durations of travel or interim stops along the way.

These documents also foreground mobility capital. While my previous data on types of transportation had been piecemeal, the budgets meticulously record amounts spent on trains, planes, cars, and buses. I have already made certain selections by only entering those used to transport dancers longer distances (e.g., from an inland city to a port city in order to take a boat), rather than also including shorter rides, for example the taxi from a train station to a hotel. A necessary decision going forward will be whether to trace only the movement of the larger group as a whole or also to include accessory movements, for example Lincoln Kirstein’s trip back to the US to request deferrals of male dancers’ draft orders, or the flight taken by an injured dancer in order to catch up with the train-based group after staying behind.

Another new opportunity suggested by the NYCB archives comes from the paperwork collected for the purposes of blanket and individual visas. These include lists of the dancers’ and other personnel’s places of birth and citizenship, which reinforce the disparate forms of contact engendered at a person-to-person level by such a tour. Beyond the scope of this project, there are also opportunities for material histories that focus on props, costumes, etc., such as logs of replacements and repairs, or the fact that the dancers left New York with 550 pairs of pointe shoes.

I said there were two new source collections. The second is a 400-page “fictional” unpublished manuscript that was written by one of the dancers while on tour, and comes with a key to a good portion of the pseudonyms used. In addition to a lot of gossip, the account fills out some of the empty spaces between the dots on the map. Although inconsistent, this perspective yields more data both on the locations in between and on the duration of those transitions. Events take place, for example, where boats stopped between New York and Rio de Janeiro, or on the way up the west coast from Vina del Mar to Lima. Likewise, the lengths of particular journeys are mentioned, as is the tendency to travel through the night.

Even amid the interpersonal relations of bored dancers far from home, this manuscript continues to fill out the dancers’ own interconnected worlds, such as that of the Russian wardrobe manager who had already done two tours of South America with another ballet company.

Together, these offer exciting new resources, and lots of angles from which to begin!

American Ballet Caravan Intro: Identifying Datasets

This summer, I am doing test visualizations with data from American Ballet Caravan’s 1941 tour of South America, with specific interest in the affordances of various available tools. My work is supported by a Digital Humanities Summer Grant from the University of Bristol.

This tour is special for a few reasons. First, over the course of five months, the forty-six employees of American Ballet Caravan managed to pass through almost every country in South America, with the exception of Bolivia and Paraguay, playing ninety engagements in about sixty cities. These performances took on a variety of formats, from subscriptions and benefits to lecture-performances for theatres unable to support full productions. This exceeded the touring scope of any previous North American or European dance company, and it also got them stuck in a blizzard in the Andes mountains.

Second is the particular mixture of public and private support for such an undertaking. I was first drawn to this tour via my research on Sol Hurok, among whose papers appear several suggested itineraries. Hurok, however, is rarely mentioned by name in other archival caches of documents on the 1941 tour. The credit for organization tends to go to Nelson A. Rockefeller, a friend of American Ballet Caravan’s director, Lincoln Kirstein. As Coordinator of Inter-American Affairs, Rockefeller supported a significant portion of the tour as a test case for the future use of dance in cultural diplomacy. In the final report, the question of official sponsorship is raised in terms of the tension between “commercial enterprise” and “good will.”

In this first post, I am going to explore the types of datasets available in the archives that I have had access to so far, and some of the challenges of organizing them for this spatial-history-based inquiry into the economics and backstage labor of transnational dance touring.

Key Types of Archival Documents Related to the 1941 ABC Tour:

  1. Suggested itineraries from Hurok’s people, organized by number of performances, given travel times and transit methods. (Note: None of these came to fruition, because the tour schedule was entirely reorganized two weeks before departure.)
  2. Contextual notes to accompany such itineraries (see 1), in particular by the South American management organization Sociedad Musical Daniel, whose annotations concerned everything from potential audiences to altitude.
  3. Formal reports prepared and submitted by Kirstein to Rockefeller before, during, and after the tour. These include reflections on the larger stakes of the endeavor, as in one report submitted four months into the tour, which addressed multiple categories of success (“financial return,” “reception by the press,” and “residual prestige”) on a city-by-city basis.
  4. Less formal letters and reports by Kirstein and others to Rockefeller and his colleagues. These include more piecemeal reports about reception and particular challenges encountered, as well as allusions to past and future plans.
  5. Personal documents by performers, including scrapbooks, photographs, and satirical writing.

These are all fascinating in themselves as historical stories that reveal the importance of infrastructures. For example, the tour was not only rerouted two weeks prior to departure, but it also had to be rescheduled once it was underway, due to weather and political conflicts (both World War II and the Ecuadorian-Peruvian War). In terms of means of transportation alone, a blizzard over the Andes cut off train travel and stranded the company in Mendoza for about ten days, before they were able to charter a private plane to Santiago. And while WWII would seem to have been taking place far away, it threw off Atlantic shipping schedules, and the company ended up needing to take a Spanish refugee ship originally from Bilbao between Santos and Buenos Aires.

I am trying to break these stories down into “raw” data and classify them. In terms of a confluence of dates and places, there is no single authoritative travel plan for the tour as it ultimately unfolded between June and November 1941. Therefore, I have been categorizing the pieces of information in various documents as “prospective” versus “past,” although even seemingly authoritative references to what already happened at times contradict one another! Some specific dates are clear, such as the opening night in a particular city. Others specify a range. For example, the company gave nineteen performances over a period of twelve days in one city, versus only two shows in another. Seventy-one performances had supposedly been given by the time of a report dated September 9th, but then again, the same report also refers to September 21st as being in the past. It is not consistently possible to distinguish travel dates from performance dates, and we know from complaints about performer fatigue that the dancers sometimes traveled during the day and performed the same night.
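Tagging each reference as "prospective" or "past" makes some contradictions mechanically detectable. The following Python sketch uses a hypothetical record: one dated claim per row, with an optional date range. It flags a "past" claim that falls entirely after the document that makes it, like the September 21st reference in a report dated September 9th. It also flags a "prospective" claim that falls entirely before its document.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Reference:
    """One dated claim extracted from an archival document (hypothetical schema)."""
    document_date: date                # when the report or letter was written
    event_start: date                  # exact date, or the start of a stated range
    event_end: Optional[date] = None   # end of the range, if the source gives one
    status: str = "past"               # "past" or "prospective", as the source frames it

    def is_inconsistent(self):
        """Flag claims that fall entirely on the wrong side of their own document date."""
        end = self.event_end or self.event_start
        if self.status == "past":
            return self.event_start > self.document_date   # "already happened", yet later
        if self.status == "prospective":
            return end < self.document_date                 # "planned", yet already over
        raise ValueError("unknown status: " + self.status)
```

Such a flag does not say which source is wrong. It only marks references to check against other documents.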

One of the techniques I have been using to smooth this data is to count by calendar weeks across the twenty-four-ish-week tour. Although this is not necessarily representative of the number of shows on a city-by-city basis, it allows for uncertainty in terms of dates while still setting up a clear temporal sequence. That said, establishing a comprehensive, sequential list of locations is equally problematic. While eighteen cities are named in the reports and letters, there are references to a total of sixty cities. This suggests fewer than a third are accounted for, namely the locations in which the company stayed more than one night. Many additional cities appear in the planning documents and, while it is possible to speculate as to where they would fit geographically, there is no evidence the company ultimately appeared there.
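Counting by calendar weeks amounts to mapping each date, or each uncertain date range, to a week index. This minimal Python sketch assumes week 1 is the Monday-to-Sunday week containing the departure date. The start date below is a placeholder, not the documented departure.

```python
from datetime import date, timedelta

# Placeholder start date: substitute the documented departure date.
TOUR_START = date(1941, 6, 2)

def tour_week(day, start=TOUR_START):
    """1-based calendar-week index of `day`, counted from the Monday of the start week."""
    first_monday = start - timedelta(days=start.weekday())
    return (day - first_monday).days // 7 + 1

def tour_weeks(earliest, latest, start=TOUR_START):
    """Every week index that an uncertain date range could fall into."""
    return list(range(tour_week(earliest, start), tour_week(latest, start) + 1))
```

A clear date such as an opening night maps to a single week. A claim like "twelve days in one city" with uncertain endpoints maps to a short run of weeks, which still preserves the order of events.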

Beyond performance and travel locations and dates, there are other types of data within these documents. In terms of logistics, we can note modes of travel (boat, train, plane, car); whether a given country offered a subsidy for the tours in their region; and the type of performance (subscription, benefit, regular, lecture-performance). Among other, more qualitative forms of data, we note where American Ballet Caravan reported gaining or losing money, and list the other international companies referenced as benchmarks for their reception in given places.
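Modes of travel are also where the choice between a single "transportation" column and one y/n column per mode matters. This Python sketch converts between the two encodings. It assumes five modes: boat, train, plane, and car as listed above, plus the buses recorded in the budgets. The `by_<mode>` column names are hypothetical. The y/n form allows a single leg to record more than one mode.

```python
from enum import Enum

class Mode(Enum):
    BOAT = "boat"
    TRAIN = "train"
    PLANE = "plane"
    CAR = "car"
    BUS = "bus"

def to_flag_columns(modes):
    """Single-value encoding -> one hypothetical y/n column per mode."""
    chosen = {Mode(m) for m in modes}   # raises ValueError for an unknown mode
    return {f"by_{mode.value}": "y" if mode in chosen else "n" for mode in Mode}

def from_flag_columns(row):
    """Inverse: recover the modes marked 'y', in a fixed order."""
    return [mode.value for mode in Mode if row.get(f"by_{mode.value}") == "y"]
```

The y/n layout makes filtering by one mode trivial in tools like Palladio. The single column is easier to enter by hand and to use as a color category.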

These datasets will be key to better understanding what John Urry would call the “mobility systems” of dance touring.