Data mining, part 2

The Woman in White

This is my word cloud (omitting the stopwords “said” and “say”, although keeping them might be interesting too, as an analysis of the narratology of saying versus showing).
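The counting step behind a word cloud like this one can be sketched roughly as below. This is only a minimal illustration, not the tool I actually used: the stopword list, the function name `word_frequencies`, and the sample sentence are all made up for the example (the sentence is not a quotation from the novel).

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real analysis would use a
# much fuller list. "said" and "say" are included to mirror the cloud above.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "said", "say"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens in text, skipping any listed stopword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Invented sample text, standing in for the full text of a novel.
sample = (
    "I said nothing. She said the letter was in the drawer, "
    "and the letter was gone."
)
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Taking “said” and “say” out of the stopword set and rerunning the count would be one way to start on the saying-versus-showing question.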

Google n-gram

I have been thinking about what “data” might mean for some of my work, and it isn’t obvious what a good category for this kind of research would be. I also work on ekphrasis in Roman poetry, but ekphrasis in this context isn’t marked by signal words that I could search for across different poets. Instead, Roman poets intentionally chose polysemous words that carried meaning in both Greek and Latin, and prided themselves on the wordplay this allowed; Augustan and so-called Silver Latin poets in particular tried to be as opaque and allusive as possible in their poetry. So one of the things I’ve been grappling with is how to capture that kind of wordplay in a large database, or whether it is even possible to do so.

Another question I’ve been considering concerns data in relation to the construction of meaning: data isn’t “natural” or “unmediated”, but the “cleanest” data has less interpretive baggage to dirty up the analysis. I need to think some more about this issue of data, what it is and isn’t.
