If you don’t subscribe to Pilot’s quarterly newsletter or if you’re just behind in your reading like I am, you might have missed the recent article that cautioned against always using the traditional red/green/yellow traffic lights to represent performance. The simplest argument is that three states may not provide enough granularity to distinguish between levels of performance. For example, when a driver approaches an intersection in a car, there are only two potential outcomes – proceed or stop – represented by green and red. While the extra state, yellow, provides the additional information that the light is about to turn red, it doesn’t add another outcome. On a yellow light, most drivers proceed through the intersection, albeit presumably with a little more caution than if the light were green.
In business, our decisions are usually significantly more complicated than proceed or stop, so the traffic light metaphor doesn’t always apply. Imagine that we targeted selling 100 units of a product every month. If we sold 50 units one month, our poor performance might be represented by red. Similarly, 80 units might yield a yellow to show caution while 90 units would give us a green for good performance. In this situation, however, how do we represent exceeding the target value by selling 150 units? In sales, as in many other areas, it would be demotivating not to differentiate between exceeding and almost reaching the target. The most natural solution is to include an additional state; dark green could represent exceeding the target.
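To make the sales example concrete, here is a minimal Python sketch of a four-state light. The function name `sales_light` and the exact cutoffs (70% and 90% of target) are my own assumptions for illustration; the example above only tells us that 50 units is red, 80 is yellow, 90 is green, and 150 should be dark green.

```python
def sales_light(units_sold: float, target: float = 100) -> str:
    """Map monthly sales to a four-state traffic light.

    The 0.7 and 0.9 cutoffs are assumed for illustration; pick
    whatever boundaries make sense for your own KPI.
    """
    ratio = units_sold / target
    if ratio > 1.0:
        return "dark green"   # exceeded the target
    if ratio >= 0.9:
        return "green"        # reached or nearly reached the target
    if ratio >= 0.7:
        return "yellow"       # caution
    return "red"              # poor performance


for units in (50, 80, 90, 150):
    print(units, sales_light(units))
```

With only three states, 90 units and 150 units would both come back as green; the fourth state is what separates them.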
Including additional states is also a way to handle an issue that can arise when you’re using qualitative KPIs based on surveys and you’re limited to three states. Because people naturally tend to avoid the extremes, they bias their answers such that many KPIs end up as yellow. As such, I recommend adding both a dark red and a dark green to the traffic light, which yields five potential states. With five states, even if people avoid the two extremes, there are still three potential results: light green, yellow, and light red.
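The counting argument is easy to check. The sketch below uses a hypothetical helper, `usable_states`, that simply drops the two extreme states from a scale, on the assumption that survey respondents never choose them.

```python
THREE_STATES = ["red", "yellow", "green"]
FIVE_STATES = ["dark red", "light red", "yellow", "light green", "dark green"]


def usable_states(states: list[str]) -> list[str]:
    """Return the states that remain if respondents avoid both extremes."""
    return states[1:-1]


print(usable_states(THREE_STATES))  # only the middle state is left
print(usable_states(FIVE_STATES))   # three distinct states are left
```

On the three-state scale everything collapses to yellow, while the five-state scale still leaves room to tell a mildly good result from a mildly bad one.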
For those of you who remember the difference between nominal, ordinal, interval and ratio measurements (statistics, anyone?), there’s a mathematical reason to prefer five states over three as well. The short argument is that red/green/yellow are nominal labels that – through convention – have become ordinal rankings with green as best and red as worst. The limitation with ordinal ranks is that, while they show that one result is better than another, they don’t show how much better one result is than another. If I finish first in a race, no one knows if I won by one second or one minute. Interval measurements like temperature (the differences between 80, 81, and 82 degrees are the same) and ratio measurements like age (40 is twice as old as 20, which is twice as old as 10) solve this problem but can’t be represented by a small number of states. The more states you have in an ordinal ranking, the more closely it approximates an interval one. So, 5 is better than 3. Statistics class dismissed.
For some organizations, using a red traffic light to indicate poor performance might send the wrong cultural message. (What if red is your corporate color?) These organizations could instead consider the emoticons used in instant messaging tools (smiling face, frowning face) or thumbs up/down images (gladiator style).
All images, including traffic lights, suffer from the fact that end users don’t intuitively understand the grading scales. At what value does a KPI change from red to yellow? How much better is dark green than light green? At the risk of pulling out the statistics book again, we have an ordinal ranking problem. To solve this communication problem, I often recommend that organizations use letter grades (A/B/C/D/F), which are widely understood and self-documenting (90% and above is an “A”). And, for those keeping track, letter grades behave much like interval measurements, since on the usual scale each grade from A through D covers the same ten-point band.
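Here is a minimal sketch of a letter-grade scale in Python. Only the "90% and above is an A" cutoff comes from the text above; the remaining cutoffs (80, 70, 60) follow the common ten-point convention and are an assumption, as is the name `letter_grade`.

```python
from bisect import bisect_right

# Lower bounds for D, C, B, A; anything below the first cutoff is an F.
# Only the 90 cutoff is stated in the post; the rest are assumed.
CUTOFFS = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]


def letter_grade(percent_of_target: float) -> str:
    """Convert a KPI score (as a percentage of target) to a letter grade."""
    return GRADES[bisect_right(CUTOFFS, percent_of_target)]


for score in (55, 72, 89.9, 90):
    print(score, letter_grade(score))
```

Because the cutoffs sit in one short list, the grading scale can be published right alongside the scorecard, which is the self-documenting quality that makes letter grades attractive.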
Some of my colleagues have suggested we go one step further and use thermometers. After all, you can often deduce close to the exact value from the height of the mercury. Therein lies my concern with thermometers: they track the actual value rather than the gap between the actual and the target. There are times when getting close to the target is OK (fundraising) and other times when it’s not good enough (zero defects in an airplane). I’ll expand on this idea in a later post.
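To illustrate the difference between tracking the actual value and tracking the gap, here is a rough Python sketch. The functions `gap` and `acceptable`, the `allowed_gap` parameter, and the sample figures are all hypothetical and exist only for this example.

```python
def gap(actual: float, target: float, higher_is_better: bool = True) -> float:
    """Shortfall against the target; a positive value means the target was missed."""
    return (target - actual) if higher_is_better else (actual - target)


def acceptable(actual: float, target: float, allowed_gap: float = 0.0,
               higher_is_better: bool = True) -> bool:
    """True when the shortfall is no larger than the KPI's allowed gap."""
    return gap(actual, target, higher_is_better) <= allowed_gap


# Fundraising: getting close is fine, so some shortfall is tolerated.
print(acceptable(95_000, 100_000, allowed_gap=10_000))

# Airplane defects: the target is zero and no shortfall is tolerated.
print(acceptable(1, 0, allowed_gap=0, higher_is_better=False))
```

The same near miss is acceptable for one KPI and unacceptable for the other, and a thermometer showing only the actual value cannot convey that.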
I’m a big fan of visual cues, but sometimes it may be better to use words rather than colors, images, or letter grades. A scoring system with intervals like “Best In Class,” “Exceeds Performance,” “On Target,” “Needs Work,” and “Unacceptable” is instantly understandable and therefore may be the most likely to be accepted. And, of course, acceptance leads to adoption.