A few years back the ProQuest database folks implemented a nice feature in their search results. In the right-hand column there’s a bar graph showing the number of hits by date for your search term. It’s a search facet, so you can zoom in on a particular decade and it shows you a breakdown by year. I’ve found this feature to be instructive, especially in teaching, because it’s an easy, visual way to chart the significance of some phenomenon (say, “sugar tariffs”). Out of curiosity, I wanted to know how hard it would be to create such a bar graph for an online archive that did not automatically generate one. My in-house digital humanities consultant sat down with me yesterday to experiment with this. We used Cornell University’s HEARTH, a fabulous online archive of home economics, nutrition, dentistry, and random other topics dating from the 1850s through the 1950s. From an initial search for “sugar,” we got 149,847 matches in 4,146 records. After a few hours of playing around, this is what we came up with:
I find it quite fascinating (and not at all surprising) to see the peak in hits between 1913 and the mid-1920s, at about the same time and pace as the debates about sugar tariffs. People in those years cared a lot more about tariffs than people do now. People didn’t glaze over at the mere mention of the t word. The HEARTH archive features nutrition, etiquette, and health books and magazines, sources which do not in themselves discuss the tariff or sugar politics writ large. But these findings offer a kind of confirmation of my hunch that people talked a lot more about eating sugar at the same time that they had heated debates about the sugar tariff in the 1910s and 1920s.
So how did we make this graph? It may be that some DH wunderkind has some different tools up their sleeves to accomplish this, but here’s our homemade technique.
1. We tinkered with the URL from the search results so that all of the records for the “sugar” search showed up on one page. The very end of the unwieldy URL reads start=1;size=25. We changed it to start=1;size=5000, and it coughed up the results relatively quickly.
2. We copied and pasted the results into Notepad++ and went through a number of steps to clean up the data. With its nice find/replace function, Notepad++ is a great tool for systematically converting and cleaning up text. We essentially removed all text, leaving only the numbers. We did this slowly, one step at a time, until all that was left were pairs of numbers, tab-separated like this:
etc. The first number is the year, and the second is the number of hits. Each pair represents one book or magazine that had hits for “sugar.” Thus there were multiple lines for each year. We condensed these in the next step.
3. We copied and pasted this list into a Google Docs spreadsheet. I can’t remember exactly how, but I did some magical step that merged together all of the hits where the year matched. (Update: the magical step may have been converting it to a pivot table.) I tried to make a nice bar graph in Google Docs, but I couldn’t figure out how to make it label the axes properly. So I ended up cutting and pasting the data into Excel to make the chart. The hardest part was saving the chart out.
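For anyone who would rather script the steps above than click through them, here is a rough Python sketch. It is not what we actually did, just one way the same steps might look in code. The function names are made up for illustration, the example.org URL is a placeholder rather than the real HEARTH address, and the sample year/hit pairs are invented, not our data. It assumes the cleaned-up text is already in the tab-separated "year, hits" form described in step 2.

```python
import re
from collections import defaultdict


def widen_page_size(url: str, size: int) -> str:
    """Step 1: swap the size=N value at the end of the results URL."""
    return re.sub(r"size=\d+", f"size={size}", url)


def hits_by_year(lines):
    """Step 3: add up the hits for every line that shares a year.

    Each line is expected to look like "1913<TAB>12" (year, then hits).
    Blank lines are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        year, hits = line.split("\t")
        totals[int(year)] += int(hits)
    # Sort by year so the chart reads left to right in time.
    return dict(sorted(totals.items()))


def text_bar_chart(totals, width=40):
    """Draw a crude bar chart in plain text, one row per year."""
    if not totals:
        return ""
    peak = max(totals.values())
    rows = []
    for year, hits in totals.items():
        # Scale each bar against the busiest year; always show at least one mark.
        bar = "#" * max(1, round(hits / peak * width))
        rows.append(f"{year} {bar} {hits}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Placeholder URL, not the real archive address.
    print(widen_page_size("http://example.org/search?q=sugar;start=1;size=25", 5000))

    # Invented sample pairs, standing in for the cleaned-up Notepad++ output.
    sample = ["1913\t12", "1913\t3", "1920\t40"]
    print(text_bar_chart(hits_by_year(sample)))
```

The summing in hits_by_year does the same job as the pivot step in the spreadsheet. A proper chart with labeled axes would still need a plotting library or a spreadsheet, which this sketch leaves out.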
Not fancy, and somewhat labor intensive. But mildly entertaining for a Saturday afternoon!
If you have other ideas about how to do this, please feel free to drop us a line.