Text mining

The Philadelphia Photographer, Volume 15, 1878. Most common words, presented visually


Today we were introduced to data and text mining. Structured data, unstructured data, dirty data, all kinds of texts and millions of words analysed in the same operation.

For researchers using images as their primary sources, the value of text mining was perhaps not immediately obvious, especially given the paucity of digitised texts (or any texts!) in some research areas.

Data mining, on the other hand, was deeply compelling, considering the variety of unstructured yet hugely important archival documents, notebooks, albums, artists' records and other material that are essential components of many projects. Turning this material into data that can be analysed seems complex but very productive.

Baby steps….


A very deep, dark, treacherous mine.

Tidy data will be in my mind as a guideline as I begin my data collection–because I do not have my data yet. So I won’t be cleaning house, as there’s no house to clean yet, but I will try to build and keep a clean house as I begin to collect and organize my data.

The biggest question in my mind is still: how will I collect my data in the first place? The books I want to analyze have not been digitized yet for the most part. I know it sounds ambitious to make it into part of my project, but I also see this as a potential benefit–making these works available in digital form to a large public, and allowing these works to be known and preserved in a different way from their current analog formats (small editions, many out of print, languishing in public libraries…?)

I don’t want to choose my project based on what data is already available! And I can see my scholarship (and that of many others) as a way of correcting and complementing the Eurocentric biases of digitized collections and platforms. For example, in addition to digitizing the works, I realize I will have to find a good platform to process them. Many of those shown in class won’t work fully because their lexicons will be missing important words. When using Voyant, I had to input many “stop words” in Portuguese (pronouns etc) because even their multilingual option didn’t have them.
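The stop-word problem can be sketched outside Voyant in a few lines of Python. This is only a minimal illustration, not how Voyant works internally: the stop list below is a hypothetical, deliberately tiny sample (a real Portuguese list would be far longer), and the function name and sample sentence are made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Portuguese stop list (an assumption for
# illustration; a real project would need a much fuller list).
PORTUGUESE_STOP_WORDS = {
    "a", "o", "as", "os", "de", "da", "do", "e", "em",
    "que", "um", "uma", "eu", "ele", "ela", "na", "no",
}

def most_common_words(text, stop_words, n=5):
    """Return the n most frequent words in text, ignoring stop words."""
    # [^\W\d_]+ matches runs of letters, including accented ones such as "ã".
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Made-up sample sentence, not taken from any of the works discussed.
sample = "A cidade e o bonde: a cidade que cresce, o bonde que passa na cidade."
print(most_common_words(sample, PORTUGUESE_STOP_WORDS, n=2))
```

Without the stop list, words like "a", "o" and "que" would crowd out "cidade" and "bonde", which is the same effect seen in Voyant before the stop words were entered by hand.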

I realize that a huge part of my project will be finding ways to build the tools and platforms necessary to begin and run the project in the first place–maybe that’s too much to aim for, but I will at least try.

I must confess I also still have some lingering questions about text mining. My project on the urban world of Brazilian modernists began with old-fashioned manual text mining, which I carried out over two years as an undergraduate (this was an independent research project I came up with at the time, and got funding to do). I read the books and wrote down all the things I was looking for (mentions of urban life broadly defined–from the words “city” and “urban” to specific locations and sites to aspects of modern urban life such as cars, elevators, machines etc.). I was on the lookout for certain terms a priori, but I discovered most of the terms and themes by reading the books. I would have missed most of them if I had used a previously prepared lexicon (even a perfect lexicon for São Paulo in the 1920s). Many of those “mentions of urban life” were also figurative, and I discovered them by reading whole passages of poems, or by analyzing the plot of a short story or novel. Our discussion of text mining today made me realize that it might not be exactly the tool I thought I needed, not just because of the language and region limitations but mostly because what I was doing in the first place might not be best described as mining.

I must say that the volume of works I analyzed is also manageable. It wasn’t billions or even hundreds of books… Perhaps text mining would allow me to expand my field and include other texts besides literary works and journals–say, newspapers from the time–but then again they’re not digitized etc. etc.


Source: A very deep, dark, treacherous mine.

Day 6 / homework?

The problem is the language. But it is not that I want the help of a particular tool in my own language. That seems neither important nor possible to me.

Sites like Opecalais.com or Voyant bring me back to a crucial question for those of us doing research focused on non-central areas (this is not a euphemism; it is just that decades of political correctness have left their mark on me). The sources, the people, the places, the events I want to know more about… are simply in another language and in another cultural universe. It is that simple.

I do not claim that these tools are absolutely useless (though I do not assert the contrary either). I am still thinking about the matter. What I can say for certain is that I will not have an answer by Friday.

Day 6 / homework (for real)

Today we started to work on data mining. Data. That is a very big word. You will see… there is something you must know about me. I have a secret to confess, a reality to share, a burden to be lifted with help. This is what my doctoral research files looked like just a few years ago:

My folders, not so long ago.

They do not look that different now, truth be told. Those are the files I have on paper. There is another disturbing reality: the folders on my computer.

Oh, yeah. Some of my folders.

These two images reflect the two complexities of my research. I have to deal with (1) a huge amount of data that (2) I have collected in research carried out in libraries and private archives. I have not found relevant materials in digitised collections. Period. This makes the whole data mining process… elusive, to say the least.

However, this does not mean I would not have found relevant data in digital collections. I may have. But I do not know, since most digital repositories are not free: university projects require subscriptions that are beyond the reach of my university, and the oldest primary sources from Google Books cannot be read from Argentina (yes, it is true).

And those are the challenges I face as a Latin America-based scholar. So far, data mining seems like a distant dream.

Source: Day 6 / homework (for real)

Day 5: Geospatial Art History & the Art of Mapping Hipsterdom

Digital Art History bootcamp ended on a high note (for me) as we delved into mapping and visualizing change over time. Before the institute started, I possessed little knowledge of mapping but knew it would be useful for my project. For example, I want to be able to show the areas affected by epidemics in sixteenth-century Mexico alongside those locations that display artworks related to death and dying. I assumed that mapping would be complicated and messy, but after Friday I learned that there are plenty of tools that are easy to use.

The Google Maps Engine proved straightforward and user-friendly. I didn’t have an Excel file with data for my project, so I decided to turn to the only reasonable alternative: mapping hipsters in a small part of DC (see the above embedded Google Map). While mapping hipsters doesn’t relate to my project, the data did allow me to experiment with mapping.

I was so excited by my newfound ability to create easy, readable maps that I called my husband during the lunch break to walk him through the process. I thought it would be useful for him as he thinks about opening a new practice. He plans to plot potential “rivals,” transportation stations, and rent-able locations, for instance.

I wasn’t sure that anything could top my excitement, but then we moved on to the New York Public Library’s Map Warper and StoryMap. I was in awe of the Map Warper. I found an eighteenth-century map of Mexico, which I georectified. This is an incredibly powerful tool made available to everyone by the NYPL. For example, the map I used allowed me to pin cities on both the map of New Spain (colonial Mexico) and a modern map of Mexico. Cities that had different names are now matched with their modern equivalents. The historical map “cloaks” the current one. For me this was one of the most powerful ways to demonstrate how maps lie and manipulate space. While the Map Warper wasn’t as immediately relevant to my project, it certainly does provide me with historical maps to orient people in the viceroyalty of New Spain.

StoryMap amazed me with its creative capabilities. I decided to create a story map of the spread of hipsters in Brooklyn (of course!). I’m not sure yet how I would incorporate it into my project on Mexican deathways, and I don’t know that I will. I abide by the notion that the tools and technology shouldn’t dictate the project. However, I know I will use it in my courses. It was easy and fun to use. Plus, I think it offers wonderful possibilities of engaging students with certain types of materials.

To sum up, I see great potential in using some of these mapping tools for my project. The Google Maps Engine in particular will become a crucial component of visualizing some of my data. I dreaded what I thought would be a complicated and time-consuming process, but in reality it will go much faster–the click of a few buttons, really.

Source: Day 5: Geospatial Art History & the Art of Mapping Hipsterdom

Homework Day_5: Ways that spatial humanities techniques might influence my scholarship.

In my dissertation work, I had already begun to experiment with spatial humanities techniques (although I did not know that was the term for them), by working with GoogleMaps and placing pins on the known ‘haunts’ of Degas in Rome, Naples and Florence, and then (as my dissertation is very analog) saving those maps as jpgs and including them as figures in the final product. The results were static, and (admittedly) not too helpful for readers.

From the get-go of my dissertation, I was frustrated with the limitations of more traditional tools, and had developed a few small-scale hacks to try to work around them (e.g. the static GoogleMaps), but the experience left me wanting. Thus, this experience was important as it was one of the catalysts that got me thinking about how what I have since learned is called ‘digital humanities’ could help me push arguments further, engage with the information in different ways, etc., and it led me to this point in my development as a provisional D(A)H.

Fast forward to the crystallization of my thinking on Mapping Paris, i.e. summer of 2013: in the first iterations of the project, I planned to plot simultaneously the social AND spatial maps of the artists (and their circles) living and working in Paris between the years of 1855-1889, but as the project truly moved into its planning stages, I realized not only that I needed to create geographical and temporal limitations (which would be easily expandable), but also that I needed to limit the project to the social sphere only, in order to make it focused and do-able in the short(er) term.

Long story short: the spatial elements are going on the back-burner for now, but I am excited to learn about them, and to think about ways to incorporate them at least into my pedagogy in the short term. I also think that I will include in the design of the database a field for spatial information, in the hope that as the project grows, that component can be ‘easily’ [and I knocked on wood as I wrote that] tacked on at a later date.

In other news, do keep an eye on the Drupal MappingParis website (which is still bare-bones), which I am using at the moment as a sandbox/showcase for some of the tools that we have learned. The contents of my attempts are slim-to-none, but the tools themselves are what is of import.

Note: the link did not work when I first posted it, but I have corrected the error here and in the original blog post.

Source: Homework Day_5: Ways that spatial humanities techniques might influence my scholarship.

Blog about the ways that spatial humanities techniques might influence your scholarship


Captions for Ancient Mayas & Space – Left: GIS reconstruction of site of La Milpa, Belize (Boston U); Center: 3D model of Castillo, Chichen Itza, Yucatan, Mexico; Right: detail of Pakal’s Sarcophagus Lid, Palenque, Mexico and artist’s (fanciful) reconstruction of Pakal piloting a spaceship.

How are these useful? GIS is very useful for reconstructing and analyzing what we can no longer see. It allows us to see the development of natural and urban spaces and to track patterns.

What about 3D models? Their creation can be labor intensive, and can allow a view of what once was. However, the creator must be absolutely certain that her/his information is accurate or else the result is faulty and misleading. Why do most 3D models spin? I don’t know, but they trivialize scholarship and tend to blur the lines between serious work and video games. (Creating games is serious work, but it is not academic work.)

Why do people still believe that ancient and non-western peoples were astronauts or ETs? Is it because many people think that they can read images without any training when in fact reading Maya art–or the visual culture of any society–requires a great investment of time and careful thought?

The spatial turn affected my work even before I knew there was a term “the spatial turn.” I am concerned not with the flat space of a map or a plan but rather with the three-dimensional space that people inhabit. Key Thinkers on Space and Place (P. Hubbard and Rob Kitchin, eds., 2011, rev. ed., Sage) includes about 60 philosophers and practitioners. My own list includes philosophers, anthropologists, urban historians, and architects: Foucault, Benjamin, Heidegger, Deleuze, Lévi-Strauss, Bourdieu, de Certeau, Keith Basso, Barbara Bender, Edward Soja, Dolores Hayden, Judith Butler, Tim Ingold, and Bernard Tschumi. Particular foci of my projects to understand and analyze spatiality, architecture, and ancient society include the “in-between,” “third spaces,” “phenomenology,” and ontology; other concerns are memory and landscape. In many cases I borrow modern concepts and adapt them to analyze the ancient world because, many times, those questions are more interesting to me, and more fundamental. I also approach architecture and space not as containers or stage sets for art and activity but instead as resources that people used–purposefully and not–that have the capacity to inform us about intimate and broad filial, social, and cultural patterns.

This diagram, which explains Yucatec Maya speech, is a much more useful aid for thinking about space than the previous images. Yucatec Maya was and is spoken in and around Chichen Itza, the site of my project. It contains some fascinating features regarding space and time that English lacks. One is the expression of deictic time; the expression of sequential time is limited, and cyclical time is emphasized instead. This is exemplified in gestures and in speaking about completed and future actions. Although time and space are linked, they are not linear, and a 3D project may be more relevant.

I have not found the ideal platform for my project yet. I would like to layer maps that relate information about geology, geography, astronomy, a restoration of part of the original site, and conceptual maps. The latter would supply vital ideas about the Maya world, including the ancestors and some deities who dwell below the surface of the earth (especially below ballcourts and cenotes, or sinkholes) and some of the deities who dwell in the sky.

Google Maps would work if I could add more than three layers and tip the maps to an angle that would allow a 3D view. I was able to load the Omeka plugin. I am exploring some other software that promises to do what I want. But we live in a Late Capitalist world and have all heard that promise before.

To be continued…

Source: Blog about the ways that spatial humanities techniques might influence your scholarship

Tracking the Production of Ceramics

Maps are an essential tool for tracking the production of ceramics. This production was concentrated in cities that had access to clay, water, and kilns. Using two different mapping applications, I located some of these production centers and provided illustrations. With greater amounts of data (especially with the Google application), these maps could be further developed as research tools to track stylistic differences and similarities and to consider trade and gift-giving in the sixteenth century.

Thinking About Mining Data


Data mining seems like a valuable resource, especially for statistical analyses of text, for historiography, and for detecting patterns. It seems that the field as presently configured favors modern and contemporary projects, where sources are most likely to be in digital format. (The programs and strategies for dealing with digitized print sources in old fonts seem to be time-consuming and fraught with problems.)

What are art historians who deal with older eras to do if they wish to include sources that are not in digital form? Some issues are:

We may have primary sources still in their original form, where paleography is still an issue, or sources that are not in digital form
In some cases we do not even have the requisite several hundred sources that would yield a corpus to analyze
If we have sources in more than one language, synonyms (not proper nouns) may have subtly different meanings that could skew results

However, I do see potential in using data mining to analyze student work, including exams, papers, and other written work. At some colleges there is an option for exams to be done in testing centers on computers, and more colleges are moving toward tablets for all students to use for reading and assigned work. Some possible data mining projects include:

Analyzing essay questions – if students were asked to discuss one (or more) art works, artists, or scholars, what are the proportions for who/what was mentioned? This could provide some insight into what sources students rely on (text book, outside reading, info mentioned in class) or what interests them.
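The essay-question idea could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the essays and artist names are invented, the function name is hypothetical, and matching is plain case-insensitive substring search, which would miss misspellings and could over-match short names.

```python
from collections import Counter

def mention_proportions(essays, names):
    """Share of essays that mention each name at least once (case-insensitive)."""
    counts = Counter()
    for essay in essays:
        lowered = essay.lower()
        for name in names:
            # Simple substring match; an assumption, not a robust name matcher.
            if name.lower() in lowered:
                counts[name] += 1
    total = len(essays)
    if total == 0:
        return {}
    return {name: counts[name] / total for name in names}

# Invented sample answers, for illustration only.
essays = [
    "Degas and Manet both painted modern Paris.",
    "Manet's Olympia shocked the Salon.",
    "Degas returned again and again to dancers.",
    "Monet painted the same cathedral in changing light.",
]
artists = ["Degas", "Manet", "Monet"]

for name, share in mention_proportions(essays, artists).items():
    print(f"{name}: mentioned in {share:.0%} of essays")
```

Counting each essay once per name, rather than every occurrence, keeps one long answer from dominating the proportions; counting raw occurrences instead would be an equally defensible choice.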

Source: Thinking About Mining Data

High-five! A Gigundo Week of Ginormous Discovery


Let’s just pause a moment and recap what I accomplished this week:

1. set up a new website domain (you’re reading it!)
2. learned smarter ways to search for images on-line (Googlerama)
3. played with Zotero
4. elevated my Twitter game
5. wrestled with Omeka, furrowed my brows at Scalar and Drupal
6. thought about the lack of oral history in architecture
7. was not made to feel better about copyright issues
8. impressed myself with Thinglink and had ridiculous fun with Animoto (my husband, who teaches a woodshop-safety class at our university, requested something that would attract and hold the attention of freshman students)
9. had big fun annotating film clips on YouTube (even if they’re not as immediately pretty as other tools)
10. got dizzy over the thrill of Google Maps Engine Lite
11. totally rectified an 18th-century map of Philadelphia
12. spent a few hours making a very spiffy StoryMap for my architect
13. crashed and burned with the new install for Omeka

Overall, many more successes than failures–and even the latter have value for defining limits and maybe encouraging re-thinking about the learning (or trial-and-error process) overall. While I am delighted that I can look back at having learned so much in really such a short amount of time, my work through the weekend did reveal some points of weakness. First, not everything has really sunk in, and I am reminded how important it is to practice new skills over and over to make sure they are really truly learned, even after an initial success. Second, as I tried to build a little project to display my new skills, I found there were aspects of the project that weren’t yet served by my little skill set, or that there were still things that I just don’t know how to do, even if I can imagine them or have seen them working in some other online source.

I am starting week two trying to balance these two main impressions: 1. delight that I know so much cool stuff, and 2. anxiety that I won’t know everything I really need soon enough!

Source: High-five! A Gigundo Week of Ginormous Discovery