Monday, August 23, 2010

Understanding Shakespeare’s Word Frequencies

I’ve already seen this post about Understanding Shakespeare, which uses data visualization techniques, but I’m just not sure how I feel about it.  The play is presented as a grid of word clouds – characters across, acts and scenes down.  The theory is that you can learn about a character’s progression through the play by looking at how their word frequency changes.
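For the curious, here is roughly what I imagine sits underneath a grid like that. This is only my own sketch, not how the Understanding Shakespeare project actually does it. The `LINES` sample and the `frequency_grid` helper are made up for illustration; a real version would read the full text of the play.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample input: (act, speaker, line) tuples.
# The three lines are quoted from Hamlet; everything else is an assumption.
LINES = [
    (1, "HAMLET", "A little more than kin, and less than kind."),
    (3, "HAMLET", "To be, or not to be, that is the question."),
    (3, "OPHELIA", "O, what a noble mind is here o'erthrown!"),
]

def frequency_grid(lines):
    """Return a dict mapping (speaker, act) to a Counter of word counts."""
    grid = defaultdict(Counter)
    for act, speaker, text in lines:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        grid[(speaker, act)].update(words)
    return grid

grid = frequency_grid(LINES)
# One cell of the grid: Hamlet's most frequent words in Act 3.
print(grid[("HAMLET", 3)].most_common(3))
```

Each cell of the grid is just a bag of counts, which may be part of why the pictures feel thin: word order and context are thrown away before anything is drawn.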

Look, for example, at Hamlet.  Tell me what you see.  I can’t see anything enlightening, but maybe I’m missing it.

I think what you could do with this is apply another level of semantic detail to it.  Imagine if you could group all “light” and “dark” words together, and then look at Romeo and Juliet.  Or Macbeth.  Then, I think, you might start to see patterns.  Or what if you could select out and compare usage of “you” versus “thou” in certain interactions between characters?  I’m often told that this is a very important key to their relationships.
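A rough sketch of what I mean by grouping, with the caveat that the word lists are hand-picked guesses of mine and the "formal" and "familiar" labels for you and thou are a simplification. The `GROUPS` table and the `group_counts` helper are hypothetical names, not part of any existing tool.

```python
from collections import Counter
import re

# Assumed, hand-picked word lists; a real study would need a much richer lexicon.
GROUPS = {
    "light": {"light", "sun", "bright", "day", "torches"},
    "dark": {"dark", "night", "black", "shadow"},
    "formal": {"you", "your", "yours"},
    "familiar": {"thou", "thee", "thy", "thine"},
}

def group_counts(text, groups=GROUPS):
    """Count how many words in the text fall into each named group."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, members in groups.items():
            if word in members:
                counts[name] += 1
    return counts

# Two lines from Romeo and Juliet, used here only as sample input.
sample = ("But, soft! what light through yonder window breaks? "
          "It is the east, and Juliet is the sun.")
print(group_counts(sample))
print(group_counts("O Romeo, Romeo! wherefore art thou Romeo?"))
```

Run that per character and per scene instead of per line, and you would have the light-versus-dark or you-versus-thou comparison, at least in crude form.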

There’s a version of this technique that somebody does every year: a tag cloud of the current President’s State of the Union address.  Over time, that’s fascinating.  You see how some presidents spent more of their time talking about the Depression and economic issues, then some had to deal with war, Germany, Russia … all the way up to modern times, where the word terrorism shows up and never goes away.

I wonder if somebody could do Shakespeare’s usage over time, and see how his own vocabulary expanded.  I think to be valid, though, we’d really need to know when he wrote everything, and I don’t think we can ever really know that.

6 comments:

Miss P. said...

I've never used visualizations like these before, but I have used Wordle.net to generate a word cloud for soliloquies as a jumping-off point for analysis. It's very interesting, for example, to compare the funeral speeches from Julius Caesar. I've used this approach with my students who often feel overwhelmed by the language, but who responded quite well to the word-cloud approach.

In the lesson, I show them the word cloud for Brutus's speech and ask them to tell me what they notice. Which words are used most? Can they spot any patterns? Can they tell the most important parts of his message to the audience? When we look at Antony's, the word cloud is so strikingly different that it starts to bring up a lot of questions about rhetorical style, repetition, and key words in political speeches. I love this lesson during election years because YouTube is a feast of modern "Brutus" and "Antony" rhetorical style.

I've had students who struggle to comprehend the monologues and soliloquies use Wordle to pick out the key words, so that when they read the passages again they can focus their attention a little better on the main ideas. It seems to help.

Duane said...

See, P, that makes sense. And maybe that's what the study in general is going for, a sort of "drill down where it will do the most good" approach. But the actual visualizations they provide, a non-interactive glimpse of a single character's progression through the play? I think it's limited.

Cass said...

Duane, I'm glad you said, "I can’t see anything enlightening, but maybe I’m missing it," because I was feeling pretty dumb for not quite understanding the point of the visualizations. I mean, it's... neat, I guess, and in some cases it's interesting to see who talks the most and when -- but I feel like the point is hindered by having the "major character" decided for you. I mean, there's something a little odd about identifying Caesar as the major character of a play he dies halfway through, when his line count is significantly less than Brutus's, Cassius's, or Antony's. And I feel like the word clouds could be misleading -- fidget with the font sizes and spacing a bit, and you can make it look like whatever you want it to look like.

I really like Miss P's suggestion for using it with students, though. The comparison angle has potential to be fascinating.

Ren du Braque said...

I'm currently reading the book "le style et ses techniques" and just in the first chapter it describes how some words imitate the hissing of a snake, how alliteration, rhyming, choosing a long or short word can evoke orally the sensation that the words are conveying.

I fear that word clouds are only useful to look at content. I mean if you word clouded "how much wood would a woodchuck chuck..." you'd see that would and wood and chuck and woodchuck get a lot of density in the cloud, but it wouldn't say anything about the fun of saying the sentence...

Am I being a fuddy-duddy on this one?

Ian Thal said...

Part of what limits the utility of these visualizations is that, looking at these clouds, I can't tell either which characters use these words or in which acts the words are most used.

For instance, it's not useful to me to know that the word "Jew" shows up a lot in The Merchant of Venice; it is, however, far more useful to know which characters talk about Jews the most.

JM said...

The "surprisingly insightful way to 'read' a play in less than a minute" seems rather ill-suited to its stated benefit. Disjointed sentences that are sometimes totally disconnected from a preceding statement? Rejoinders responding to nothing, or appearing out of nowhere and changing the subject? Algorithms are great mathematical tools. But the recurrence or repetition of words, admittedly an important component in the Works, is best observed with their neighbors around them, in my opinion. I've been in the habit of doing it for years as a component of analysis, without the help of what seem to me over-protracted mathematical formulas. Maybe it's so brilliant a concept I don't get it. But then again, I don't need it either. David Crystal has already been through every play in word-for-word meaning analysis without it. And his results are a whole lot clearer and more edifying.