Everything I did not take a course on: Data Sets, APIs, Random Links & Intersectionality

I’ve been writing about the Digital Humanities Winter Institute, reporting on the Data Curation track here, and weird data archiving here. The DHWI was a faucet of information which I spongily absorbed, learning that I have a particular aptitude for, and interest in: large-scale data/text analysis, working with open data sets, and image algorithm analysis; and that I will continue to apply my radical and working-class understanding of the world to something as ostensibly “neutral” as data and technology.

After being in the Data Curation track for a few days, I comprehended that I am a huger nerd than I had realized, and that what I actually care about is sitting with a powerful computer and crunching data, because I think that ideas outside of institutions are important. Surprise! And there are, in fact, really cool and not-that-hard ways to do this. I also learned about APIs and gathered a bunch of rad resources.

Data Sets

Did you know that you can look up other people’s research RIGHT NOW?… and then make cool graphs with it, perhaps thinking about it in a way which someone locked into an institutional mindset hasn’t considered? YOU CAN!! That’s fucking cool.

“The DBpedia [http://dbpedia.org/About] data set uses a large multi-domain ontology which has been derived from Wikipedia. The English version of the DBpedia data set currently describes 3.77 million “things” with 400 million “facts”.”

Google Refine [http://code.google.com/p/google-refine/] “Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.”

Freebase [http://www.freebase.com/] is “An entity graph of people, places and things, built by a community that loves open data.”


APIs

Application Programming Interfaces [APIs] have felt like a unicorn to me for quite some time, a special black box in which one can place one’s Special Programming Skills and retrieve streams of pertinent data at lightning speeds. If only I were Special! If only the Black Box liked me, too! I sighed to myself, looking forlornly at the links to code and instructions.

An API is an information interface intended for machines rather than people. Your code sends a carefully structured request to the software that holds the data, and the API hands back the kinds of data you asked for, in a structure another machine can read. It’s kind of a data bottom that you negotiate with very carefully.
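To make that negotiation less of a black box, here is a minimal sketch in Python of what asking an API for data can look like. It is my own illustration and was not something handed out at the Institute. It assumes DBpedia's public SPARQL endpoint at https://dbpedia.org/sparql is up, and the query (ten things DBpedia classes as museums) plus the function names build_request_url, extract_rows and fetch_rows are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# DBpedia's public SPARQL endpoint. It may be slow or offline at any given time.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query (an assumption for this sketch): ten museums with English labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label WHERE {
  ?thing a <http://dbpedia.org/ontology/Museum> ;
         rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Turn a query into the URL a machine would request."""
    return endpoint + "?" + urlencode({"query": query})


def extract_rows(response_text):
    """Flatten a SPARQL JSON response into a list of plain dictionaries."""
    data = json.loads(response_text)
    rows = []
    for binding in data["results"]["bindings"]:
        # Each cell looks like {"type": ..., "value": ...}; keep only the value.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def fetch_rows(query):
    """Send the request over the network and return the parsed rows."""
    request = Request(
        build_request_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return extract_rows(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only this part touches the network.
    for row in fetch_rows(QUERY):
        print(row["label"], row["thing"])
```

The request is just a URL with the question packed into it, and the answer comes back as nested JSON that the code unpacks into ordinary rows. That is roughly the whole trick: ask in exactly the shape the API expects, then read the structure it returns.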

After this week, I’m still not a whiz at APIs but I’m no longer intimidated. And that’s why I’m excited that CollectiveAccess offers an API because the data that we’re placing in this radical archive needs a way to get *out*.

Intersectionality: Race, Class & Gender

I got to DC from an overnight bus and arrived in College Park, MD via train, so thoroughly out of it that I didn’t notice the shuttle bus to campus and just walked, weaving my way onto campus through the facilities and maintenance department; a working-class metaphor that did not escape me. After finding deliverance in the form of coffee and getting to the opening plenary, there was one awkward moment that stood out: the organizers asked everyone who’d gotten a scholarship to raise our hands [there were a few dozen of us, mostly graduate students] and then named that every teacher had taken an honorarium cut in order to be able to let us come.  I’m trying to narrativize why that felt bad — perhaps because it didn’t feel like a scholarship anymore, but like taking something from someone; or because my internalized shame around being the scholarship kid didn’t like being pointed out as needy; or because it’s a practice of naming I’d never seen done before; or just because the math didn’t make sense to me. It made me think a lot about the culture of money and access to money that I might encounter at the Institute.

I picked the Data Curation track specifically because it seemed to apply to my work with the Interference Archive on their CollectiveAccess build; because as a cultural producer I create quite a lot of data which I wanted to understand how to wrangle most efficiently; and because I want to learn from folks who are not white men whenever I can [which is not totally often and which is no discredit to amazing male teachers I’ve had who also happen to be white]. Either way, I walked from a conference plenary that was more white and male, into a room populated by women and wondered what about gender was going on in this particular production of knowledge.

That I picked a track based on the race and gender of my teachers is a commentary on the state of formal and institutional education* but I am also kicking myself a little, because making that kind of decision meant that I did not take the hardcore data crunching track: I thought it would be either tangential to my work or a man[splaining] party. As a result of taking the Data Curation track and being at the conference and exposed to the edges of all those other ideas, I am now more interested than ever in: large-scale data/text analysis, working with open data sets, and image algorithm analysis. Lord goddess get me a grant so I can play with the badass MacBook Pro I got and some of these programs.

*though that will change, since the last several years of enrolling classes in colleges and universities across the board include more women and are more ethnically diverse than ever.

