Moneyball is about baseball. But it’s also about breaking down accepted preconceptions and finding new ways to look at individual skills and how they mesh as a team. Sometimes the characteristics that we believe the team needs aren’t all that important in assessing and improving quality. Moneyball is also about people deceiving themselves, believing something to be true because they think they experienced it. Some of the team’s accepted practices may have less of an impact on quality than we would like.
This presentation examines how to use data to tell the right story about our state of quality and our success in shipping high-quality applications. It examines whether our preconceptions are supported by facts, and identifies characteristics for building a high-performance testing team. It applies the Moneyball approach to testing and quality, giving teams a new way to evaluate their capabilities and their software so they can deliver the highest quality possible.
The Oakland Athletics major league baseball team had a big problem. It had a payroll that was a tenth the size of the very best teams in the league, and it had to field a winning team in order to remain profitable in its small market. Its general manager, Billy Beane, started looking deeply into the characteristics that produced winning teams.
What Billy Beane did was take what he had available and build a winning team, year after year. With some research, he understood the errors others were making in evaluating players and teams, and figured out how not to make the same errors.
You likely face a problem similar to Billy Beane's. What do you do when you seemingly lack the resources to build a team and perform testing in ways that seem ideal?
In Moneyball, Billy Beane discovered that the “experts” selected players who looked like their ideal of a baseball player, whether or not that person could actually improve their team. There was also a lack of understanding of which characteristics really won games over the course of a season. Beane wanted to build a competitive team, but didn’t have the budget to do it the way large-market teams did. So instead he began paying attention to why teams chose the players they did. After examining a lot of baseball data that was just becoming available, he found that there were some clear truths that weren’t acknowledged by any of the experts. These experts had many years of experience judging talent, and believed they were the best at doing so. But that was their bias talking.
Our beliefs influence our practice and our results. Beane looked at what others believed, and discovered that they were wrong. Their expertise was really a source of bias that led to incorrect conclusions. We do the same thing in testing. We typically come into a project with a set of biases that we may not even be aware of. The next 15 slides describe some of those biases, and how they affect our thinking.
Daniel Kahneman, in his book Thinking, Fast and Slow, defines two types of thinking. He calls them System 1 and System 2 thinking. These, of course, are models, and don’t have any physical representation in the brain or elsewhere.
50-80 percent of college students get it wrong. Why? Because we think we know the answer without any further thought. It’s an error of System 1 thinking.
Daniel Kahneman has developed a model that divides thinking into two components, which he calls System 1 and System 2. System 1 is immediate, reflexive thinking that is for the most part unconscious in nature. We do this sort of thinking many times a day. It keeps us functioning in an environment where we have numerous external stimuli, some important, and many not. System 2 is more deliberate thought. It is engaged for more complex problems, those that require mental effort to evaluate and solve. System 2 makes more accurate evaluations and decisions, but it can’t respond instantly, as is required for many types of day-to-day decisions. And it takes effort, which means it can tire out team members.
When we answer the second question, we intuitively think of our financial situation in framing the answer. That often means that our answer is based on our thoughts surrounding the prior question. We have been “primed” to think in a certain way. While plastic is convenient, it's also a threat to thrift: a 2011 study found that people paying with cash think more about a purchase's costs; those using credit dwell more on the benefits, and are primed to pay more.
If we are already thinking about something, those thoughts will influence our subsequent image of the situation, and any decisions that arise out of that situation. If we believe our software has defects because of complexity, confusion, or poor practices, we will likely find more defects than if we believed it was of high quality.
This error is familiar to most of us, because we often let first impressions dictate our subsequent beliefs and actions. We do this because we want to see ourselves as good judges of people and situations. But the halo effect is more than this. It also leads us to judge, or even change our judgments, based on subsequent events. We may believe that our manager is decisive because they have a ready decision whenever we ask. But in times of crisis, that same decisiveness may be perceived as rigidity, because they aren’t able to easily process new information and adjust decisions accordingly.
How do we deal with our instinct for early judgment?
How many of you have ever had pilot training? Years ago, in a PA-28 Cherokee 140 like this one, my instructor put me “under the hood” to practice recovery from unusual attitudes. With the hood down, he put the plane into unusual flying positions from which I had to recover as quickly as possible. When he brought the hood up, I could see only the instrument panel. I rapidly developed a heuristic that enabled me to quickly identify and correct an unusual attitude. In short, I focused on the turn indicator and artificial horizon, and worked to center both of them. My instructor figured out what I was doing, and the next time I did the same thing, it failed me. My turn indicator and artificial horizon were centered, but I was still losing over 1,000 feet a minute! I was stumped. My instructor had “crossed” the controls, leaving me in a slip that my heuristic couldn’t account for. I was worse than wrong; I couldn’t follow through at all once my heuristic failed. I never forgot that experience.
Kahneman had subjects spin a wheel of fortune rigged to stop at either 35 or 65, then asked them how many African nations were in the UN. The result of the wheel of fortune demonstrably swayed people’s subsequent answers up or down, even though it had no relationship to the question. They were anchored to a particular number, and that number influenced their subsequent guess.
Exceptionally good or bad performances are probably due in large part to luck, good or bad. Skill plays a role, but not as much as we believe. People can still be good or poor at a particular task, but luck is a variable on top of that skill.
The praise or criticism has nothing to do with it. An exceptional performance is almost certainly due in part to a large measure of luck (good or bad).
That leads to the question: why do we believe experts? Their analysis and forecasts may overall be better than chance, but they almost certainly won’t be exceptionally good. It seems we like experts who take strong stands or say controversial things, for the theater value. We also like it when experts are right, but both they and we tend to discount the times that they are wrong.
Do you watch House, MD? (starring Hugh Laurie) House deals with complex medical cases, and he is typically wrong in his diagnoses 3-4 times before he gets it right. It’s TV, certainly, but he is in a noisy data environment, and can’t really be expected to develop an intuition about cases that are all one-of-a-kind.
Kahneman was part of a group building a new national curriculum. At one point he asked the group how long they thought it would take for them to finish. In a secret ballot, their collective answer was about two years. He then asked a member of the group experienced in that task how other groups had fared in the past. This person paused and said, “About 40 percent didn’t finish at all, and the rest took between 7 and 10 years.” He also offered the opinion that their particular team was slightly below average in ability at the task. At that point, they should have abandoned the project as a bad bet. Yet they believed their result would turn out differently. Ultimately, it took 8 years to complete, and their curriculum was never put into use.
How can we apply these often-surprising facts about thinking to testing activities and testing teams?
While we use the word quality a lot, much of the time we really don’t know what we’re talking about. It’s an abstraction. So instead, we substitute operational definitions based on things we do know, and data we can collect and analyze.
We have operational definitions of quality – definitions that we can measure. Sometimes we explicitly draw a connection between them and quality, and sometimes we just assume that some relationship exists. But we may not know precisely what we mean by quality. And our definition of quality may well be different than that of our users or customers.
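To make this concrete, here is a minimal sketch of two operational stand-ins often used for "quality": defect density and defect escape rate. The metric choices and the sample numbers are hypothetical, for illustration only; the point is that each definition is something we can actually measure.

```python
# Two hypothetical operational definitions of "quality".
# Sample figures below are made up for illustration.

def defect_density(defects_found: int, kloc: float) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects_found / kloc

def escape_rate(field_defects: int, total_defects: int) -> float:
    """Fraction of all known defects that escaped to production."""
    return field_defects / total_defects

if __name__ == "__main__":
    print(f"Density: {defect_density(42, 120.0):.2f} defects/KLOC")
    print(f"Escape rate: {escape_rate(7, 42):.0%}")
```

Neither number is quality itself, but both can be tracked over time and compared against what users actually experience.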
How about exploratory testing as a System 2 stimulant? Exploratory testing seeks to find out how the software actually works, and to ask questions about how it will handle difficult and easy cases. In reality, testing is almost always a combination of exploratory and scripted testing, with a tendency toward one or the other depending on context. Going back and forth between exploratory testing and writing and running individual test cases makes it possible to shift between System 1 and System 2 thinking, and to more readily engage System 2 thinking when System 1 is likely to fail. Consciously alternating between the two thinking modes has the potential to make both better and more responsive.
Unintentional bias is a big cause of testing and assessment errors.
Automation helps reduce bias by standardizing process, including tests, workflows, and defect tracking. However, it also produces a high level of System 1 thinking, so it shouldn’t be relied on exclusively. We see the results of the automated test, decide based on bias or heuristics, and move on. Automation needs to be supplemented with active manual activities that bring out System 2 decision-making, such as exploratory testing or reporting that enables testers to visualize data.
This should come as no surprise to anyone. We estimate both time to test (and to develop) as though everything goes like clockwork. Life isn’t like that. We have meetings, time off, rework, more meetings, and other things that prevent us from devoting 100 percent of our time toward our direct work. People leave for other jobs, and hiring and training replacements is time-consuming. Instead of estimating based on your expertise, look carefully at past experiences. Those are your best guide to estimating future projects. Make use of data from those projects and use that data as a starting point for estimating. You may adjust those estimates based on greater expertise or better automation, but don’t do so without a good reason.
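The advice above amounts to reference-class forecasting: start from how past projects actually fared against their estimates, not from how we hope this one will go. A minimal sketch, with entirely hypothetical project data:

```python
from statistics import median

# Hypothetical reference class: (estimated days, actual days)
# for three past projects of similar scope.
past_projects = [
    (30, 45),
    (20, 36),
    (50, 70),
]

# Typical overrun ratio (actual / estimated) across the reference class.
overrun_ratios = [actual / est for est, actual in past_projects]
factor = median(overrun_ratios)

# Scale the expert's gut-feel estimate by the historical factor.
new_estimate = 25  # days, from expert judgment
adjusted = new_estimate * factor
print(f"Adjusted estimate: {adjusted:.1f} days (overrun factor {factor:.2f})")
```

The median keeps one wildly atypical project from dominating the adjustment; with more history, a range (best case, typical, worst case) is more honest than a single number.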
Bias is common in our judgments, even if it is fully unintended and unrecognized. I’ve described several types of bias. The best way to guard against them in building a team is to engage System 2 thinking, rather than working on autopilot (System 1). In managing people, we need to establish a balance between automatic work (driven in large part by automation) and thought work, driven by more difficult problems and decisions. Too much of either can result in errors in judgment and fatigue, which can cause us to make mistakes in testing and project work.
We have opinions of the value and quality of software projects before even starting them. These expectations will invariably skew our approaches, and result in mistakes in judgment and decisions. We can temper these expectations with group efforts to consider the opposite expectation, and come up with reasons why our initial opinions may be in error. We often agree not to rock the boat. That means that the loudest (or most senior) ideas win out, without a good vetting. One strategy for avoiding groupthink is to ask the team to imagine that, a year from now, the plan has clearly failed. Ask them to individually write down their “pre-mortem” of the failure. These pre-mortems should then be evaluated by the group to see if the plan is really as good as they thought it was.
How can System 1 and System 2 thinking effectively interact? Consider varying the two in a tester’s work. System 1 thinking involves rote activities, such as manual testing (from existing test cases) and automated testing. System 2 thinking is invoked in exploratory testing, context-driven testing, and data analysis. Formal meetings often give rise to System 1 thinking, while in-depth one-on-one conversations engage System 2.
It’s always good to have specific expertise on testing teams; however, you have to use that expertise effectively. You should know the limits of your experts, and call upon their expertise when you have enough information available for them to make a difference.
I knew an expert programmer. In fact, he was an expert C programmer using Borland Turbo C 3.0. He knew that product like no one else. But his expertise quickly disappeared as other languages and tools gained popularity. For a brief period, he commanded a high salary, but wasn’t able to easily transition into newer technology trends. This argues for a broad rather than deep skill set. Testers and other team members should be able to fill multiple roles, and feel comfortable with knowing what they don’t know. It is easier to acquire knowledge just-in-time, and it has the potential to reduce bias. Expertise still has value, but only if that expertise directly applies to the problem domain.
In summary, our projects are influenced in a number of ways by how we think about them. Some of those thought processes can be in error. I’ve discussed several common errors of thinking, based on Kahneman’s theories. And I hope I’ve provided a few ideas on how we can adapt and account for our errors in the realm of software projects and testing.