This is a Mostly Negative Post about Sentiment-ality

Resource, Building Open Brands, Aug. 30, 2012

“Automated sentiment analysis” is one of the most overhyped and under-understood aspects of “online listening.” Every major online listening platform (Radian6, SM2, Sysomos, Syncapse, etc.) offers the ability to identify the “sentiment” of individual tweets, posts, comments, and updates—positive, negative, neutral, mixed…or something else (“mostly positive,” “mostly negative,” etc.). This is data that marketers have never had, and it’s a siren song promising rapid and meaningful actionability!

The logic marketers follow that leads them to automated sentiment analysis is pretty straightforward:

  1. Consumers are talking about my brand and competitors using social media
  2. I don’t have the time to read every post and conversation
  3. I do have time to glance at a line chart on a dashboard
  4. That chart will give me a quick alert if there is a spike in the positive or negative conversation about my brand (or competitors)
  5. I will take action if that happens!

Unfortunately, this approach rarely (if ever) plays out as designed. Automated sentiment analysis can be useful (jump to the end of this post for that), but it’s worth understanding some of the not-so-pleasant realities that keep many of the “desired uses” from being meaningful or actionable.

Sentiment About a Brand Doesn’t Change Rapidly and Dramatically

For almost all brands almost all of the time…consumers don’t rise up as one with a massive change in their attitudes about the brand. The hype around automated sentiment analysis implies that dramatic changes in positive or negative sentiment posts occur regularly:

The chart above is highly doctored data from Radian6—it had to be doctored to show the step function change that is shown! The actual data—for a major retailer over a 30-day period—looks like this:

While there is day-to-day variation in the negative-sentiment posts, these are a fraction of the overall conversation volume about the brand, and none of this variation constitutes a true “spike” in the data.

Retailers don’t change business models, their customer service processes, store layouts, products, or any other aspect of their brand overnight. Even when they do make a dramatic change, they cannot simply wipe their customers’ memories of all past interactions with the brand. Changing consumers’ attitudes toward a brand overall takes months, if not years. Gauging consumers’ reactions to a specific event—outbound/planned activity by the brand (a campaign, a product launch), or an unplanned, brand-relevant occurrence—is more feasible, but these are fewer and farther between for most brands than marketers realize.

There are exceptions, of course—spill 100+ million gallons of oil in the Gulf of Mexico, and consumers will rise up with comments! But when these sorts of abrupt and dramatic changes in consumer sentiment about a brand occur, automated sentiment analysis tools aren’t needed—the phone calls to the company’s PR department will be one of numerous alerts to the issue!

140 Characters + Sarcasm = Noisy Data

One of my favorite illustrations of how hard it is to automatically—or even manually—determine the sentiment of tweets:

This is an extreme example, but the reality is closer to this anecdote than you might realize. Below are some tweets flagged as “negative” through Radian6’s automated sentiment analysis related to a major retailer:

  • “My mom says we dont have money but yet shes at <the retailer>… Thats bullshit”
  • “Every since I started working at <the retailer> I’m good as hell at flipping percentages”
  • “When people give away <coupons for the retailer> <<<< idiots”
  • “Hate the shirt I’m wearing, going to attempt a blitz shopping trip to <the retailer> before school. Wish me luck!”

It’s easy to see why each of these got flagged as negative, and just as easy to see how nearly impossible it would be for an automated platform, with nothing to go on but the characters in each tweet, to have done a better job. The same challenge occurs with positive sentiment. Tweets for the same retailer that were identified as having positive sentiment:

  • “goodbye to working as a cashier at <the retailer> to working in the bakery at <a competitor> #whatilovetodo #excited #cantwaittostart #makingmoney”
  • “amazing rainbow at <the retailer> instagr.am/p/O4JUXxXxXx/”
  • “So happy tomorrow is my last day at <the retailer>”
  • “Management meeting then off to B&N and <the retailer>, thank goodness everything is located on the same busy street! :)”

Culling the examples above took less than 10 minutes. And, as it was, Radian6 identified over 80% of the posts as being “Neutral.” Many of them were neutral (we don’t like to think about how often references are simply neutral factual statements about a brand), but many others were actually positive or negative comments. This is not because Radian6 is worse at sentiment analysis than other platforms. It’s an incredible challenge to pull this off, and no platform has the requisite magic dust to do it well.
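The misfires in the two lists above are what you would expect from any approach that keys on individual words. Here is a minimal sketch of that failure mode. It is an illustration only: the word lists and the `naive_sentiment` function are hypothetical, and nothing here reflects how Radian6 or any other platform actually scores posts.

```python
import re

# Hypothetical word lists, chosen purely for illustration.
NEGATIVE_WORDS = {"hate", "bullshit", "idiots", "hell"}
POSITIVE_WORDS = {"happy", "amazing", "excited", "love", "thank"}

def naive_sentiment(text: str) -> str:
    """Label text by counting word-list hits; context and sarcasm are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(w in POSITIVE_WORDS for w in words)
    negative_hits = sum(w in NEGATIVE_WORDS for w in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"

# A departing employee reads as a fan of the brand...
print(naive_sentiment("So happy tomorrow is my last day at the retailer"))
# ...and an eager shopper reads as a detractor.
print(naive_sentiment("Hate the shirt I'm wearing, going to attempt a blitz shopping trip"))
```

The first call prints `positive` and the second prints `negative`, which is the opposite of what each author feels about the retailer.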

Counting + Sentiment = Arbitrariness

Putting all of the realities above together leads to an undeniable conclusion: using discrete counts of positive and negative posts as a metric (much less a key performance indicator) is akin to throwing a dart at a target in a dark room. Blindfolded. While standing on one leg. And drunk. You might be able to hit the wall where the target is mounted, but the chances of even hitting the target, much less the bullseye, are slim.

BUT…Sentiment Analysis Does Have Its Place!

I’ve spent the bulk of this post highlighting some of the biggest problems with automated sentiment analysis. But, just as happened when I wrote about Klout, I’ll end by clarifying that there is value to be realized from automated sentiment analysis.

We regularly use the sentiment analysis tools within Radian6 to filter down to a subset of tweets and posts that we manually review for meaningful content themes for a brand. This may be because the platform picked up a sharp spike in overall conversation about the brand, and we want to see the posts that had the strongest language in them to try to determine why. It may also be that we are evaluating overall conversations about a particular topic, and even a wildly imperfect breakdown of those conversations by sentiment provides a more meaningful subset of content for us to cull through than simply pulling a random sample of the conversations.
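As a rough sketch of that filtering step, assume the posts have been exported with a sentiment label and some measure of how strong the language is. The `Post` fields, the `strength` score, and the `review_queue` function are all hypothetical names for illustration, not a real Radian6 export format, and the sample posts are made up.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    sentiment: str   # platform-assigned label, e.g. "positive", "negative", "neutral"
    strength: float  # hypothetical 0-1 score for how strong the language is

def review_queue(posts, labels=("positive", "negative"), limit=50):
    """Return the strongest-worded posts with the given labels for a human to read."""
    flagged = [p for p in posts if p.sentiment in labels]
    flagged.sort(key=lambda p: p.strength, reverse=True)
    return flagged[:limit]

# Made-up sample data standing in for an exported batch of posts.
posts = [
    Post("Management meeting then off to the store", "neutral", 0.1),
    Post("Hate the shirt I'm wearing", "negative", 0.8),
    Post("amazing rainbow in the parking lot", "positive", 0.6),
]

for post in review_queue(posts, limit=2):
    print(post.sentiment, "|", post.text)
```

The labels are treated as a coarse filter rather than a measurement: a person still reads every post that comes out of the queue.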

Like almost all aspects of social media data, there is a gap—a chasm, really—between the hype as to what is practical and possible and the reality of unstructured, quasi-anonymous, text-based data scattered across a wide range of exploding and evolving platforms. The data is valuable, but that value comes in the form of semi-precious gemstones buried under topsoil rather than from a Magic Gemstone Finder that identifies large, pre-cut diamonds scattered liberally on the ground at marketers’ feet.


