I attended the Digital Humanities Winter Institute at UMD’s MITH [http://mith.umd.edu] this past week, which meant I learned how to be a better researcher and overall nerd.
Firstly, hooray for CUNY GC for co-sponsoring, including making graduate student scholarships available, which meant I could go.
I enrolled in the Data Curation for the Digital Humanities track, co-taught by Trevor Muñoz and Dorothea Salo, where I learned about issues in data ordering; data sociology; general wrangling, selecting and retrieving; and linked data. I looked at all this with an eye to the CollectiveAccess [http://www.collectiveaccess.org/] build project I’m working on with the Interference Archive [http://interferencearchive.org/] at the moment.
According to Muñoz, data curation addresses challenges of maintaining digital information in a manner that preserves its meaning and usefulness as a potential input for further research and scholarship.
Why concern yourself with data if you’re not in the sciences? “Data is alleged evidence,” said Dorothea Salo; i.e., data is what you are going to show people to prove you didn’t pull your critical and analytic conclusions out of your a$$.
Data curation is larger than archiving. The individualism of humanists is a problem — you can’t curate alone! If you want a project to continue after you die, retire, get a new job, or start a new project, you need to think about how you will digitize, organize, and preserve your data. The basic instruction was DOCUMENT. BUT REALLY, DOCUMENT. It’s an instruction I could hear, and tell my colleagues, over and over. It’s just that important.
Data is most useful to people who care about it, which is to say: make your data available to your communities of interest, be they Whitman scholars or critical race theorists. There is a social element to data sharing; it lives on in social circles.
As in sociology [and life], other people matter. The audience is the REASON we ultimately select what to keep and share — we are *not* hoarders, we are scholars.
So you have some various data, perhaps these types:
• Image collections
• Page-scanned books
• Marked-up books
• Theses and dissertations
• Website preservation
• Audio & video
• Complex multimedia
Before your brain explodes, remember that there are many software options, and that choosing one comes last. First, consider your audience, your order, and your content!
General Data Wrangling & Collections Software & Tools:
Digital Library Software – Designed for image exhibitions
ContentDM [will do books]
Preservation: Fedora Commons, microservices
Deposit/mgmt: Hydra, Islandora [VREs, virtual research environments]
End-user UI: Hydra, Islandora, Omeka; glue [puts Omeka onto Fedora Commons]
Archivematica, ArchivesSpace [beta soon], Duke Data, CollectiveAccess, BitCurator
Data Management Platforms
Dataverse Network, thedata.org [db mgmt platform in a box]
See list on DCC wiki.
Linked Open Data
With Linked Open Data, the main idea is that researchers, and, well, anyone, can derive new kinds of value from existing technology. There is what is called the semantic web: a shared vocabulary of terms, replicable across the web, which one of my classmates likened to Esperanto, and which can be used to describe things. Every Thing, actually. And if “we” web-information-sharers agree to use shared semantics, then information about Things can be linked up using a friendly little identifier called a URI, which is like a URL, but specific to one particular thing.
Well how the HECK would I know what URI to pick? Is that tree a TREE or a DECIDUOUS TREE? Before your brain melts, I’m gonna tell you that’s the easy part, folks — thanks to the naming power of language, most things have already been assigned a taxonomy [creepy but comforting?]. If you’re talking about an author, try VIAF [http://viaf.org/]; if you’re talking about a book, you need the Library of Congress [LOC]; and if you’re still unclear, try OpenCalais [http://www.opencalais.com/] to help build your Linked Open Data URIs.
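To make the URI idea concrete, here’s a tiny sketch in plain Python of how Linked Open Data statements (“triples”) hang together. Real projects would use an RDF library like rdflib and real authority records; the example.org and VIAF URIs below are placeholders I made up for illustration (the Dublin Core predicate URIs, though, are a real shared vocabulary).

```python
# A Linked Open Data statement is a triple: (subject, predicate, object),
# where each part is a URI or, for the object, sometimes a plain literal.
triples = [
    ("http://example.org/book/leaves-of-grass",   # placeholder URI for a book record
     "http://purl.org/dc/terms/creator",          # shared Dublin Core predicate
     "http://viaf.org/viaf/0000000"),             # hypothetical VIAF URI for the author
    ("http://example.org/book/leaves-of-grass",
     "http://purl.org/dc/terms/title",
     "Leaves of Grass"),                          # a literal value, not a URI
]

def objects_of(subject, predicate):
    """Return everything the data asserts for a given subject + predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://example.org/book/leaves-of-grass",
                 "http://purl.org/dc/terms/title"))
# prints ['Leaves of Grass']
```

Because the author is identified by a URI rather than the string “Walt Whitman,” anyone else who uses that same VIAF URI is provably talking about the same person, and the two datasets can be linked up.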