Text mining

The Philadelphia Photographer, Volume 15, 1878. Most common words, presented visually


Today we were introduced to data and text mining. Structured data, unstructured data, dirty data, all kinds of texts and millions of words analysed in the same operation.
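
As a rough illustration of the kind of counting that sits behind a "most common words" image like the one above, here is a minimal Python sketch. It is not the tool we used in the session. The sample text, the short stop-word list and the function name `most_common_words` are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}

def most_common_words(text, n=10):
    """Return the n most frequent words in text, ignoring case and stop words."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample sentence, standing in for a digitised volume.
sample = "The plate is placed in the bath. The bath is silver, and the plate is glass."
top_words = most_common_words(sample, n=3)
print(top_words)
```

The same function would run unchanged on a whole digitised volume read from a file, which is roughly what "millions of words in the same operation" amounts to in practice.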

For researchers using images as their primary sources, the value of text mining was perhaps not immediately obvious, especially given the paucity of digitised texts (or any texts!) in some research areas.

Data mining, on the other hand, was deeply compelling, considering the variety of unstructured yet hugely important archival documents, notebooks, albums, artists' records and other materials that are essential components of many projects. Turning this material into data that can be analysed seems complex but potentially very productive.

Baby steps…