Day 6: Delving into Data Mining

Back after a weekend break, ready to stuff more new information into my organic content management system (a.k.a. brain). This week we will see just how much will fit in there!

Robert Fludd, Utriusque cosmi maioris scilicet et minoris […] historia, tomus II (1619), tractatus I, sectio I, liber X, De triplici animae in corpore visione (Wikimedia Commons)

I like that the day often begins with an introduction to a tool that is super easy to use. Today that was the Google Ngram Viewer, which you can use to compare word frequencies across all the books Google has scanned as part of its Google Books project. I tried “Latin America” and “South America,” and it looks like this:

(Ugh. That’s A LOT of empty space there between the chart and the text, and I can’t seem to fix it. Clearly I do not get all the ins and outs of WordPress.) Although the tool is limited and even problematic (how do you determine exactly what the data set is? Does Google publish a list of the books it has scanned?), I can definitely see using it as a teaching tool, for example to show my students that the term “Latin America” is recent, and that scholarship on the topic has peaked at particular times that correspond with (or follow) political interest in the region. We also looked at Bookworm, which does something similar for other repositories of digitized texts.

So then we moved on to slightly more complex tools for text analysis, Voyant and Open Calais. I spent a good bit of time playing with Voyant. Again, I can see the potential, perhaps especially as a teaching tool. I am intrigued by what these tools can potentially reveal about what’s in a text.
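At its simplest, the kind of analysis Voyant starts from is just counting: how often does each word appear, once you strip out the little function words? This is not Voyant’s actual code, just a minimal sketch of that core idea; the sample sentence and the stopword list are invented for illustration.

```python
from collections import Counter
import re

# A toy stopword list; real tools ship much longer, language-specific lists.
STOPWORDS = {"the", "a", "of", "in", "and", "at", "to", "was", "were"}

def word_frequencies(text, top_n=5):
    """Return the most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-záéíóúñ]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# An invented sample sentence, standing in for a digitized biennial review.
sample = ("The biennial in Medellín was international in scope; "
          "the biennial brought international artists to Colombia.")
print(word_frequencies(sample, top_n=3))
# The two most frequent content words are "biennial" and "international".
```

Even this toy version hints at why the method appeals: the words that float to the top of a word cloud are a crude but fast summary of what a text is about.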

[Screenshot of a Voyant analysis, July 14, 2014]

Okay, so I love the idea of using data mining to analyze internationalism vis-à-vis biennials in Latin America, but like most other participants at this institute, I don’t have a lot of digitized text available to analyze. I’d like to analyze biennial reviews, for example, but the ones I have are PDFs and would need to be OCRed (I’m sure I’m not using the terminology correctly!) before I could use digital tools on them, and even then, I’m not sure the volume of text would be enough to make it worthwhile. I’ve learned, for example, that to do a good topic analysis, I’d need a minimum of 1,000 texts. So I’m reserving judgment on data mining as it might apply to my project and looking forward to talking about visualization tomorrow.

(A small mystery that’s bugging me: why does this post show up as having been published at 2:02 am on Tuesday, July 15, when it’s 10:11 pm on Monday, July 14? It’s not my computer clock, so it must be the clock linked to the host?)

Source: Day 6: Delving into Data Mining

Day 5

Learning how to use Google Maps Engine Lite to map… stuff. Anything with geographical points that are mappable. I created a small Excel spreadsheet with addresses and used it to make this test map of some public sculptures in Bogotá, Colombia. Then I added pictures. Click on the points to see the pictures and addresses! (I did this quickly and couldn’t find the exact address for Bursztyn’s Homenaje a López Pumarejo, so that one is not completely accurate.)
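For anyone curious what that spreadsheet looks like, the mapping tool just needs one row per point with a name and something it can geocode, like a street address. Here is a minimal sketch that writes such a file; the column names, sculpture names, and addresses are placeholders I made up, not the data from my actual map.

```python
import csv

# One row per point to map: a name, a short description, and an address
# the mapping tool can geocode. All values below are invented examples.
rows = [
    {"Name": "Sculpture A", "Description": "Public sculpture",
     "Address": "Carrera 7, Bogotá, Colombia"},
    {"Name": "Sculpture B", "Description": "Public sculpture",
     "Address": "Calle 26, Bogotá, Colombia"},
]

with open("sculptures.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Description", "Address"])
    writer.writeheader()
    writer.writerows(rows)
```

A CSV like this can be built in Excel just as easily; the point is only that the map is driven by a plain table, which is why the tool feels so approachable.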

Another cool, powerful, and easy-to-use tool.



Day 3

We delved (briefly) today into a discussion of how power structures are built into our data structures. Our guest speaker Kimon Keramidas, Assistant Professor and Director of the Digital Media Lab at the Bard Graduate Center, briefly referenced a project by Aaron Glass on Franz Boas that deals with indigenous knowledge systems and databases (must look that up). Someone gave the example of a person building a digital project to make a database of First Nations objects publicly available on the internet who, in consulting with a First Nations person, discovered that the typical systems of classification (such as Dublin Core) often lead museum professionals and others to replicate colonial systems of knowledge/power. Granted, the very idea of making a catalog of objects from another culture publicly available is itself deeply embedded in Western European civilization! But how do we catalog objects in ways that allow us to understand their original purposes, or to see them anew, instead of perpetuating stereotypes of knowledge?

I thought, wow, the power of the internet: One could catalog a small group of objects in various ways, according to various cultural values, and compare them, side by side! A way to visualize how differently various cultures think/value. (Even that project smacks of the desire of the Western mind to master all areas of knowledge though, right?) Of course, that would require a complex, relational database. Omeka wouldn’t cut it for a project such as that. Which brings me to Omeka.

Today’s assignment is to blog about the differences between the various content management platforms we’ve been introduced to (Omeka, Drupal Gardens, and Scalar). Sadly, we didn’t have much time for hands-on exploration of these; I only poked around Omeka myself. It’s amazing that this resource is available to scholars for free! And it is fairly easy to use (although I did not find it intuitive). But it is extremely limited, too. Just trying to enter a few “items,” I began to hate the Dublin Core. Several other participants observed that it would be nice to have the option of using the VRA Core instead (thanks to JJ Bauer for pointing me to the link). But standards of classification aside, Omeka is a flat database, not a relational one. I have been spoiled by my earlier experiences with FileMaker Pro, so I expect great flexibility from my data management systems. At any rate, I think Omeka will be a great tool for pedagogy. I look forward to playing later with Drupal Gardens and Scalar to see what they can do.
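What makes Dublin Core feel flat becomes obvious when you look at a record: it is just a list of element/value pairs attached to one item, with no links to other items. Here is a tiny hand-written record in the standard Dublin Core XML namespace, flattened into a plain dictionary; the object it describes is invented for illustration.

```python
import xml.etree.ElementTree as ET

# The official Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# A minimal, hand-written record; the sculpture described is fictional.
record = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Untitled (public sculpture)</dc:title>
  <dc:creator>Unknown artist</dc:creator>
  <dc:date>1970</dc:date>
</record>
"""

def dc_fields(xml_text):
    """Flatten a Dublin Core record into a dict of element -> value."""
    root = ET.fromstring(xml_text)
    return {elem.tag.replace(f"{{{DC_NS}}}", "dc:"): elem.text
            for elem in root}

print(dc_fields(record))
```

Notice there is nowhere in such a record to say “this sculpture was shown at that biennial, which was reviewed in this article.” Those relationships are exactly what a relational database models and a flat item catalog cannot.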


Day 2

Q: How to find and organize all that data? A: Zotero! Well, a partial answer, anyway. A very good answer, I think, for how to keep track of bibliographic sources, archival documents, and images. It still seems I will need a different kind of database to create the internet interface I want to have. This IS only day two of our ten-day workshop, however, so I am sure there will be lots of answers to the big question of HOW TO MAKE THIS PROJECT A REALITY.

It was fun to play around with Zotero today, and to discuss many issues about finding and organizing information. I am happy to be meeting art historians from all over and with many specializations. There is even an art historian from Brazil—ouch, Brazil! that was a painful loss today—and one from Argentina, too—let’s go, Argentina! for the Americas! But I digress.

Today’s homework is to “identify relevant digital repositories and consider ways to create an intentional archive of sources.” That seems both easy and difficult. Easy because I do already know many good sources. Difficult because there are surely several more and I don’t want to miss any. One wonderful fount of sources for my project will undoubtedly be the ICAA–MFAH’s “Documents of Twentieth-Century Latin American and Latino Art.” But how useful will it be to add documents from that site to my own intentional archive of sources when they already have a tool for saving “my documents” on their site? I suppose I will want to have all documents specific to my project saved in one place, like Zotero. Another repository that I know I will use is the Biblioteca Virtual of the Biblioteca Luis Ángel Arango (Bogotá), especially for the Colombian biennials. Fellow institute participant Georgina Gluzman has pointed me to the Internet Archive as a good place to find “many primary sources on Argentina and Latin America.” I look forward to scouring those sites, to begin with, for information to help build my project. Now that I know a bit more about metadata, I wonder: will they have metadata that I can easily scrape to save some time? So much to discover.

[Screenshot of the Biblioteca Virtual of the Biblioteca Luis Ángel Arango, July 8, 2014]

The Biblioteca Luis Ángel Arango’s Biblioteca Virtual is a great place to search for sources on Colombian art and a site I have used heavily in other projects. They have many of their own recent art exhibition catalogs available online, as well as images from their permanent collection, and much, much more, since they are primarily a library.

Altogether, another satisfying, if exhausting, day. As another participant tweeted, it’s like “digital art history bootcamp.”
