Yesterday, I met with Mike Migurski to look over some of the datasets and maps that I’ve been setting up. A few key takeaways from the conversation:
- A project from last November did something very similar, circling around the same dataset in different ways to see what insights turn up: Paul Downey’s One CSV, Thirty Stories (he only actually makes it to 21, but it is still fascinating).
- How to key off of other datasets (for example with a GeoNames id), and the complications this raises for historical work, where borders, roads, population, etc. will have been different. Another interesting collision might be to explicitly follow this historical thread and run up against a WWII dataset.
- Restructuring my database. While I’ve been working with two primary tables (“shows” and “routes”), Mike suggested adding a third for “stays,” so that each stay has its own unique identifier, which matters, for example, when dancers pass through a town more than once. We also discussed the difference between entering a value in a single column (for example, a “transportation” column) and having five columns, one for each possible type of transportation, each of which takes only y/n.
- How to deal with ambiguity. I’ve been struggling with the fact that I know when dancers were in a given place to perform, but not how long they traveled to get there or how far in advance of a given performance they arrived to rehearse (other archival evidence suggests this was not consistent: sometimes they even performed the night they arrived). We talked about dealing with these stays and trips by keeping track of four dates for each: i) earliest possible begin; ii) latest possible begin; iii) earliest possible end; iv) latest possible end. This is appealing to me, because it has the potential to highlight the offstage time, which is harder to track than the performances.
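
To make the four-date idea a little more concrete, here is a minimal Python sketch of what a single “stay” record might look like. It is only an illustration under assumptions: the `Stay` class, its field names, and the sample town and dates are all hypothetical placeholders, not the actual database schema or archival data. It shows how the four dates bound the shortest and longest possible stay, and how two visits to the same town each get their own identifier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stay:
    """One visit to a place, with uncertain begin and end dates (hypothetical schema)."""
    stay_id: int          # unique per visit, so a repeat visit gets a new id
    place: str
    earliest_begin: date  # i) earliest possible begin
    latest_begin: date    # ii) latest possible begin
    earliest_end: date    # iii) earliest possible end
    latest_end: date      # iv) latest possible end

    def __post_init__(self):
        # Basic sanity checks on the ordering of the four dates.
        if self.earliest_begin > self.latest_begin:
            raise ValueError("earliest_begin is after latest_begin")
        if self.earliest_end > self.latest_end:
            raise ValueError("earliest_end is after latest_end")
        if self.earliest_begin > self.latest_end:
            raise ValueError("stay would end before it could begin")

    def min_days(self) -> int:
        """Shortest possible stay: arrive as late and leave as early as allowed."""
        return max(0, (self.earliest_end - self.latest_begin).days)

    def max_days(self) -> int:
        """Longest possible stay: arrive as early and leave as late as allowed."""
        return (self.latest_end - self.earliest_begin).days


# Invented example: two separate visits to the same (placeholder) town.
first_visit = Stay(1, "Town A",
                   date(1940, 3, 1), date(1940, 3, 4),
                   date(1940, 3, 5), date(1940, 3, 7))
second_visit = Stay(2, "Town A",
                    date(1940, 4, 10), date(1940, 4, 10),
                    date(1940, 4, 10), date(1940, 4, 11))

for stay in (first_visit, second_visit):
    print(stay.stay_id, stay.place, stay.min_days(), stay.max_days())
```

The gap between `min_days()` and `max_days()` is one rough way to see how much offstage time is unaccounted for in a given stay; a real version would presumably live in the database rather than in Python objects.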