A very deep, dark, treacherous mine.

Tidy data will be in my mind as a guideline as I begin my data collection–because I do not have my data yet. So I won’t be cleaning house, as there’s no house to clean yet, but I will try to build and keep a clean house as I begin to collect and organize my data.

The biggest question in my mind is still: how will I collect my data in the first place? The books I want to analyze have not been digitized yet for the most part. I know it sounds ambitious to make digitization part of my project, but I also see this as a potential benefit–making these works available in digital form to a large public, and allowing these works to be known and preserved in a different way from their current analog formats (small editions, many out of print, languishing in public libraries…?).

I don’t want to choose my project based on what data is already available! And I can see my scholarship (and that of many others) as a way of correcting and complementing the Eurocentric biases of digitized collections and platforms. For example, in addition to digitizing the works, I realize I will have to find a good platform to process them. Many of those shown in class won’t work fully because their lexicons will be missing important words. When using Voyant, I had to enter many “stop words” in Portuguese by hand (pronouns, etc.) because even its multilingual option didn’t have them.
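To make the stop-word problem concrete, here is a minimal sketch of how a custom Portuguese stop-word list could be applied before counting terms. It is only an illustration, not how Voyant works internally: the list is a tiny hypothetical sample, the function name count_terms is my own, and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for
# illustration); a usable list for Portuguese would be much longer.
PORTUGUESE_STOP_WORDS = {
    "a", "o", "as", "os", "de", "da", "do", "e", "em",
    "que", "um", "uma", "eu", "ele", "ela", "se", "na", "no",
}


def count_terms(text, stop_words=PORTUGUESE_STOP_WORDS):
    """Count words in `text`, skipping anything in `stop_words`."""
    # In Python 3, \w matches accented letters too, so words such as
    # "são" or "coração" are kept whole.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


# Invented sample sentence, not a quotation from any of the books.
sample = "A cidade e os bondes da cidade"
counts = count_terms(sample)
print(counts.most_common())
```

The advantage of keeping the list in my own hands is that I could add or remove words as I learn which ones matter for these particular texts.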

I realize that a huge part of my project will be finding ways to build the tools and platforms necessary to begin and run the project in the first place–maybe that’s too much to aim for, but I will at least try.

I must confess I also still have some lingering questions about text mining. My project on the urban world of Brazilian modernists began with old-fashioned manual text mining, which I carried out over two years as an undergraduate (this was an independent research project I came up with at the time, and got funding to do). I read the books and wrote down all the things I was looking for (mentions of urban life broadly defined–from the words “city” and “urban” to specific locations and sites to aspects of modern urban life such as cars, elevators, machines, etc.). I was on the lookout for certain terms a priori, but I discovered most of the terms and themes by reading the books. I would have missed most of them if I had used a previously prepared lexicon (even a perfect lexicon for Sao Paulo in the 1920s). Many of those “mentions of urban life” were also figurative, and I discovered them by reading whole passages of poems, or by analyzing the plot of a short story or novel. Our discussion of text mining today made me realize that it might not be exactly the tool I thought I needed, not just because of the language and region limitations but mostly because what I was doing in the first place might not be best described as mining.

I must say that the volume of works I analyzed is also manageable. It wasn’t billions or even hundreds of books… Perhaps text mining would allow me to expand my field and include other texts besides literary works and journals–say, newspapers from the time–but then again they’re not digitized etc. etc.


Spatial humanities for an architectural historian

The main reason I am interested in Digital Humanities is spatial technologies, in particular mapping and the ability to overlay different data (from demographic information to literary references) over historical and contemporary maps. I have already posted about this regarding my project on the “urban world of Brazilian modernists.” I would also like to apply mapping to a couple of other projects. One of them is on informal urban and architectural projects in Sao Paulo (grassroots urbanism) in the last ten years. I want to plot these projects onto a map of the city and correlate them to socio-economic data and to sites of political street demonstrations. Another project is to map present and past sites of alternative living and art projects in Berlin. Both of these projects are tough in the sense that there isn’t a lot of documentation or data (digital or otherwise) because these were often temporary, ephemeral, and illegal initiatives. Data collection will be one of my biggest challenges; much of it will have to be gathered by hand and then digitized in some way (entered manually or scanned, depending on the data). I think the mapping platforms we explored (geolocation, Google Maps) will be helpful to create prototypes or early versions of these projects, but I suspect that I might have to explore GIS or maybe a custom solution (in collaboration, of course).
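As a first step toward plotting hand-collected sites, here is a minimal sketch of how manually entered records could be turned into GeoJSON, an open format that GIS software such as QGIS can read. Everything specific here is an assumption: the site names, coordinates, years, and the "kind" field are placeholders, not real data, and to_feature is a helper name I made up.

```python
import json


def to_feature(name, lon, lat, **properties):
    """Wrap one hand-entered site as a GeoJSON Feature."""
    return {
        "type": "Feature",
        # GeoJSON stores coordinates as [longitude, latitude], in that order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, **properties},
    }


# Placeholder records (invented for illustration, not real sites).
sites = [
    {"name": "Example site A", "lon": -46.63, "lat": -23.55,
     "year": 2013, "kind": "grassroots project"},
    {"name": "Example site B", "lon": -46.65, "lat": -23.56,
     "year": 2014, "kind": "street demonstration"},
]

collection = {
    "type": "FeatureCollection",
    "features": [to_feature(**site) for site in sites],
}

# ensure_ascii=False keeps accented characters readable in the output.
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
print(geojson_text)
```

Keeping the extra fields (year, kind) as properties on each point is what would later allow filtering the map or comparing the sites against socio-economic layers.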

Platform salad

Each platform offers possibilities for different aspects of my project:

1. I want to be able to have annotations that appear if you roll over tags; in these annotations I want to be able to include text, images, maps, links to other pages etc. ThingLink was wonderful for this, especially because it is embeddable. It is so great when these platforms can “talk to each other” and work together, which doesn’t always seem to be the case.

2. I didn’t get a chance to explore Scalar, but from the demo it looks very appealing as one way of presenting my project as a text rich with media, annotations, and dynamic features. I am not sure that I could build my whole project there. Right now I see it as a way to make a “book version” of my project, so to speak–with a sustained analysis, argument, etc., but not with all the interactive features of the map-based idea. I don’t mind thinking of this as a companion platform that could be linked to the map site.

3. I like OHMS for a different research project where one of my sources is a video on YouTube made by a Berlin collective, which is an oral/video history of that community. I’ve been annotating it manually by writing down the minutes and seconds of important passages and transcribing them, but OHMS would make the work so much easier.

4. Animoto was super fun, but I probably would only use it to make vignettes for teaching. I like the ease of adding content, but perhaps this ease is precisely what makes Animoto limited for me: I can’t control the timing of each shot, the templates are very formulaic and rigid (and most of them a bit campy!), and I can achieve similar results when I record a slideshow in PowerPoint–except that there I can easily add my voice-over, control timings, etc. PowerPoint is admittedly much slower and clunkier, and the final file is gigantic and not easy to share, so I suppose Animoto could be best in some situations.

5. Omeka 2.0 looks promising! I tested Omeka yesterday and liked it as a teaching tool, for building my course website and for having students work on their own. Omeka 2.0 seems more flexible, visually appealing (not just on the dashboard side but also on the “user view” side), and easier to work with.
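The manual video annotation described in item 3 above (writing down minutes and seconds next to transcribed passages) could be tidied up with a few lines of code. This is only a sketch under my own assumptions: the notes are placeholders, to_seconds is a helper name I made up, and the result is a plain sorted list, not the format OHMS itself uses.

```python
def to_seconds(stamp):
    """Convert "mm:ss" or "h:mm:ss" into a total number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        # Each new field is 60 times finer than the one before it.
        seconds = seconds * 60 + int(part)
    return seconds


# Placeholder notes (invented), in the order I happened to write them down.
notes = [
    ("12:40", "Placeholder passage about the founding of the collective"),
    ("03:05", "Placeholder passage about the building"),
    ("1:02:15", "Placeholder passage about the eviction"),
]

# Sort by position in the video so the notes read as a running index.
index = sorted((to_seconds(stamp), text) for stamp, text in notes)
for seconds, text in index:
    print(seconds, text)
```

Storing seconds rather than the written timestamp makes it easy to sort the notes and to jump to the same moment in any player.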

I am not sure what is available at my home institution because I’m joining them in the Fall. It’s a big university and I suspect there might already be resources, people, platforms etc. available. I also think they would be open to new projects and suggestions.

Platforms and projects

This post is mostly speculative, since I didn’t try out most of the platforms presented today. And I’m only just starting to learn the one platform I tried, Omeka.

1. Omeka appeals to me as a pedagogical tool because it seems simple to understand and use, and offers a great format for student projects with the “Exhibit” plug-in. I am considering using it in a Fall seminar, where each student would create her own website on a research topic in lieu of a research paper. The students will still have to do research and formulate an argument, but in a different form from a term paper.

I am not sure how to use Omeka for my own research projects so far. My projects (the one on Brazilian modernism described in earlier posts, and two other research projects on architecture and urbanism) are not really about collections of discrete objects, but rather about sites and buildings that are inseparable from their urban locations and larger socio-spatial contexts. I can’t quite picture separating them as “items” and individual files with labels and metadata. Although I could use this format to tell a story about my projects, I don’t think this would take full advantage of their spatiality. But then again, there is a map plug-in in premium Omeka, so perhaps that would open up possibilities that I can’t envision yet. I’m thinking of the site on Visualizing NYC that Kimon showed us today–I think that was Omeka, and it had a very cool map. BUT: Kimon mentioned that they used custom JavaScript for the map (which I don’t know how to write).

2. Prezi

Prezi seems great for organizing class materials for students. I didn’t try it out, but Kimon’s timeline was an amazing way of displaying information. I’d seen Prezi presentations before and I didn’t particularly like them, but today I changed my mind.

3. Drupal Gardens

I didn’t try it out, so I don’t have a sense of what it looks like or how it would work. It was described as very flexible, and based on a node structure–all of this seems appropriate for my projects, where I envision a map as the center of information and interactivity. As in: a page taken up mostly by a high-definition map with links to book passages and historical photographs placed on specific locations. From these links someone would open up new pages, which could be texts or images; and there could also be a reverse lookup from the texts to the maps. It all sounds very abstract; I might try to sketch this out later.

4. Scalar

I must confess I don’t quite have an idea of Scalar compared to the other platforms (maybe I was in the restroom when it was explained in class?).

5. WordPress

I like the ease of use. It is appealing as a teaching tool for this reason. I also like it as a way of developing my thoughts and recording my work process, just as we are doing with the homework assignments. I wish it were a little more flexible in terms of its calendar/blog structure.
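To make the map-centered idea from the Drupal Gardens entry above a little less abstract, here is a minimal sketch of the two-way linking I have in mind: from a place on the map to its passages and photographs, and back from a text to its places. This is plain Python, not Drupal, and every name in it (Place, Source, MapIndex, the example entries and coordinates) is a hypothetical placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lon: float
    lat: float


@dataclass(frozen=True)
class Source:
    title: str
    kind: str  # for example "passage" or "photograph"


class MapIndex:
    """Keeps place-to-source links and the reverse lookup in step."""

    def __init__(self):
        self._sources_at = defaultdict(list)
        self._places_in = defaultdict(list)

    def link(self, place, source):
        # Record the link in both directions at once.
        self._sources_at[place].append(source)
        self._places_in[source].append(place)

    def sources_at(self, place):
        return list(self._sources_at.get(place, []))

    def places_in(self, source):
        return list(self._places_in.get(source, []))


# Placeholder entries (invented for illustration).
square = Place("Example square", -46.63, -23.55)
poem = Source("Example poem, stanza 2", "passage")
photo = Source("Example photograph, 1925", "photograph")

index = MapIndex()
index.link(square, poem)
index.link(square, photo)

print(index.sources_at(square))  # what a click on the map would open
print(index.places_in(poem))     # the reverse lookup from text to map
```

The point of recording each link in both directions is that the map page and the text pages can be generated from the same data.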


Digital sources on Brazilian Modernism

I will need four main types of digital sources for my project:

1. Images of Brazilian modernist artworks by artists such as Tarsila do Amaral, Oswaldo Goeldi, Anita Malfatti, Lasar Segall, and Di Cavalcanti. I’m looking for paintings, prints, book covers, book illustrations, and magazine or journal illustrations, so repositories might include museums, cultural institutes, and digitized books.

Repositories: Established museums and cultural centers such as the Museu de Arte Contemporânea da Universidade de São Paulo and the Instituto Moreira Salles have images and metadata, but the images are not always high-res, and the sites are a bit clunky (not easily searchable).

2. Literary works by Brazilian modernist writers such as Mario de Andrade, Oswald de Andrade, Menotti del Picchia, Antonio de Alcantara Machado, and Patricia Galvao. These works include books (novels, short stories, poems) and also literary journals and newspaper pieces.

It doesn’t seem like much of this has been digitized at all. I read most of this work in print in the 1990s. A search on Google Books shows the titles, but most have not been digitized; some have very limited partial previews; and the ones with more generous (though not full) previews are scans from print editions and not e-books. I think this will probably be one of the challenges of this project. I’m dreaming of writing a big grant proposal that would include funding for many things, including the digitization of these works.

3. Historical maps of the city of Sao Paulo from the early 1900s to 1940.

The official website of the Sao Paulo city government (prefeitura) contains historical maps and photographs, neatly organized by year, and associated with the census (so there’s also demographic data). Maps are high res! Very zoomable! And can be easily downloaded (the images saved right to Zotero as jpgs in one step).

4. Historical photographs of Sao Paulo from the early 1900s to 1940.

In addition to the site mentioned above, the Departamento de Patrimonio Historico de Sao Paulo has an online database (easily searchable) with lots of historical photos.

On a side note, this department and this particular collection (which began as an analog collection in the 1930s) only exist because of the work of one of the modernist writers included in this study, Mario de Andrade, who founded and directed the city’s Department of Culture. In addition to being a writer and a government worker, he was a folklorist who traveled Brazil and documented many cultural practices (from music to oral narratives). So he was both an early archivist, and an early public intellectual committed to democratizing culture in many ways. After our discussions on public engagement today, I will surely add this as another component to my project and argument.

Summing up: As I suspected, there are few resources already available in digital form for this project (at least findable online). I will have to do more investigating to see if there are digitized resources in Brazil that are offline. I suspect that a huge part of this project will involve searching for funding for the digitization and cataloging of these materials, which in turn will entail finding contributors in Brazil (and maybe in the US too). I know it’s a lot to take on. The upside is that, out of this effort, not only would my project become possible, but these materials could also be made publicly available for the first time in a high-quality, easily searchable format (maybe a database that would be a parallel or sister website).


