August 2009 – Three Kinds of Measurement and Two Ways to Use Them

This article was originally featured in the July/August 2009 issue of Better Software magazine. Read the entire issue or become a subscriber.

People often quote Lord Kelvin: “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”But, few note the sentence that precedes the passage: “In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it.” The missing sentence prompts some questions: Are software development and testing sciences subject to the same kind of numerical measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement?

Gerald M. (Jerry) Weinberg suggests thinking in terms of three broad categories. First-order measurement, he says, is what we need to get started–“just adequate to the task of getting something built.” First-order measurement tends to be qualitative, fast, and inexpensive; it generally doesn’t require mechanisms or devices to enhance or extend the observation. In a recent conversation, Jerry told me that first-order measurements “are unobtrusive, or minimally obtrusive, and can be used without a whole lot of fuss. They help give you a lot of important information that can lead to other information or, in the best case, to immediate action if needed.”

First-order measurement is what we’re doing most of the time as we’re driving a car. We look through the windows, listen to the engine, and feel the acceleration and deceleration. We make observations and comparisons without getting hung up on quantification. “The road is dry. It’s cloudy. There’s traffic on the right and a car up ahead with its brake lights on.” First-order measurement suggests answers to the questions What seems to be happening? and What should I do now? In this situation, if you feel like you’re driving too fast, you probably are driving too fast. If so, first-order measurement is enough to inform an immediate and appropriate action: slow down.

Because it’s based on ongoing experience and feelings, rather than on careful experiments and controlled data intake, wise use of first-order measurement requires us to consider a number of possible interpretations of the meaning and significance of what we see. Suppose you feel like you’re driving fast, but not too fast. Now you observe a set of red and blue lights on the top of the car ahead. The extra data suddenly prompts you to realize that you’re uncertain about your relationship to the speed limit. The situation and first-order measurement prompt a different response in the form of questions: What else do I need to know? and Where should I look? At this point, you move into second-order measurement and refer to the speedometer.

Second-order measurement, says Jerry, is the kind of measurement that engineers use to tune relatively stable systems, making them cheaper, stronger, lighter, more reliable, faster–or slower, if that’s what’s desired. Second-order measurement focuses on questions like What’s really happening? and How is it changing? tending to be more quantitative, subject to more refined models, and generally busier than first-order measurement. It is often assisted by external instruments to supplement or refine direct sensory intake. In particular, metrics–mathematical functions that relate objects or events to numbers via a model–are second-order measurements.

Back in the car, second-order measurement is the kind of information that you obtain from looking at the dashboard. You note that your speed is forty-three miles per hour and that the posted limit is thirty-five miles per hour. Your quantitative, second-order measurement tells you that you’re above the legal limit. The apprehensive feeling in your gut, triggered by the combination of police car and the second-order measurement, informs a decision to slow down.

What of third-order measurement? That, says Jerry, is the kind of precise, highly quantitative measurement that supports the physicist’s search for new natural laws. It helps us answer the question What happens? in a universal and general sense. But third-order measurement can be precise only because it tends to be about very simple systems (such as two interacting masses) or very simple models of complex systems (in which we choose to ignore many dimensions of the system, but analyze a very small number of dimensions very thoroughly). Perhaps most significantly, third-order measurement emphasizes and depends upon keeping messy human traits–variability, subjectivity, and values–out of the way. As noted in an important paper by Cem Kaner and Walter P. Bond, using metrics and higher-order measurement wisely depends on construct validity–critical rigor in evaluating the models and the functions that form the basis for the measurement.

In Rapid Testing, we define a control metric as any metric that drives a decision. Some development groups standardize the decision to ship the product when it contains a low-enough threshold number of high-severity bugs. Others consider a program adequately tested if there’s one positive and one negative test per “requirement” (meaning per line in a requirements document). Still others deem a test group “successful” if there is a low-enough percentage of rejected bug reports. By contrast, an inquiry metric is one that prompts a question: We have three open high-severity bugs–What’s the story there? Jim and Mark are two days behind where we thought they’d be–Do they need help? The program managers are deferring a lot of problem reports–Are the problems insignificant, or do we need more training because we don’t understand the product?

One of my recent clients rated the quality of its products and customer satisfaction with a basket of five second-order metrics. Each measurement collapsed months of work and tons of data into a single number. “Better” numbers earned praise; “worse” numbers earned a reprimand, so management meetings dragged on while people tried to explain changes from last month’s numbers–especially when things had gotten worse. At this company, schedules frequently slipped and shipments were often delayed. Yet when I asked testers the simple question: What slows you down? I got a wealth of information. They told me about broken and buggy builds, inadequate test environments, excessive emphasis on scripts that were out of date by the time the product arrived, and a lack of information about real customer needs. They also said they were wasting time collecting data that wasn’t being used to help speed up development or testing, and they offered dozens of ways in which the numbers could be gamed.

A different client, also working on one-year project cycles, focused on questions like: What happened this week? What did we get done? What problems did we run into? Managers used personal contact–direct observation of and conversation with people–as their primary approach to assessing the project’s status. They took a good number of quantitative measurements, but used them only as indicators to refine their initial assessments and to inform new first-order questions. The team made rough long-term estimates and more precise short-term estimates, dividing two-week cycles into tasks of two days or less, with clear deliverables that signaled completion. When tasks weren’t finished in the estimated time, no one was punished; instead, everyone considered what he hadn’t understood earlier, what he had learned, and what might inform a better estimate next time. Team members didn’t collect metrics on things that weren’t immediately interesting and important to them. They were interested in understanding the situation and optimizing the quality of the work, not in the appearances afforded by the metrics. They emphasized the game and the season over the box scores. And they consistently shipped high-quality products on time.

They did use one–and only one–control metric. When the amount of open problems exceeded a certain number, they stopped working on new features and fixed problems until the list was comprehensible and manageable again.

Jerry observes that in software engineering we seem obsessed with higher-order measurements. Why? He suggests that decisions about quality are political and emotional, based on discussions and decisions about whose values count and how much they count relative to one another. Such issues are often distasteful to people who want to appear rational and “scientific,” so we try to avoid those issues with appeals to higher-order measurement.

Each new software project involves a human context–interaction between different sets of clients, developers, tasks, and problems to solve, with high variability, contending values, and small sample sizes. In those environments, third-order measurement isn’t achievable; it’s an expensive distraction. That leaves us with cycles of first- and second-order inquiry measurement–not physics, but easily good enough to build and tune our systems.

References:

  • Thomson, William (Lord Kelvin). “Electrical Units of Measurement.” Popular Lectures and Addresses I (London, 1981-94).
  • Weinberg, Gerald M. Quality Software Management, Volume 2: First-Order Measurement. Dorset House Publishing, New York, 1993
  • Weinberg, Gerald M. Personal correspondence with the author, May 18, 2009.
  • Kaner, Cem and Walter P. Bond. “Software Engineering Metrics: What Do They Measure and How Do We Know?.” 10th International Software Metrics Symposium. Chicago, IL, 2004.
  • Weinberg, Gerald M. Quality Software Management, Volume 1: Systems Thinking. Dorset House Publishing, New York, 1991.
Michael Bolton

Michael Bolton provides training and consulting services in software testing and is a co-author with James Bach of Rapid Software Testing, a course and methodology on how to do testing more quickly, less expensively, and with excellent results. Contact Michael at mb@developsense.com.

Michael Bolton
Michael Bolton provides training and consulting services in software testing and is a co-author with James Bach of Rapid Software Testing, a course and methodology on how to do testing more quickly, less expensively, and with excellent results.

The Related Post

I’ve been reviewing a lot of test plans recently. As I review them, I’ve compiled this list of things I look for in a well written test plan document. Here’s a brain dump of things I check for, in no particular order, of course, and it is by no means a complete list. That said, if you ...
It’s a bird! It’s a plane! It’s a software defect of epic proportions.
Training has to be fun. Simple as that. To inspire changed behaviors and adoption of new practices, training has to be interesting, motivating, stimulating and challenging. Training also has to be engaging enough to maintain interest, as trainers today are forced to compete with handheld mobile devices, interruptions from texting, email distractions, and people who think they ...
Introduction Software Testing 3.0 is a strategic end-to-end framework for change based upon a strategy to drive testing activities, tool selection, and people development that finally delivers on the promise of Software Testing. For more details on the evolution of Software Testing and Software Testing 3.0 see: The Early Evolution of Software Testing Software Testing ...
The V-Model for Software Development specifies 4 kinds of testing: Unit Testing Integration Testing System Testing Acceptance Testing You can find more information here (Wikipedia): http://en.wikipedia.org/wiki/V-Model_%28software_development%29#Validation_Phases What I’m finding is that of those only the Unit Testing is clear to me. The other kinds maybe good phases in a project, but for test design it ...
Introduction Software Testing 3.0 is a strategic end-to-end framework for change based upon a strategy to drive testing activities, tool selection, and people development that finally delivers on the promise of software testing. For more details on the evolution of software testing and Software Testing 3.0 see: Software Testing 3.0: Delivering on the Promise of ...
Jeff Offutt – Professor of Software Engineering in the Volgenau School of Information Technology at George Mason University – homepage – and editor-in-chief of Wiley’s journal of Software Testing, Verification and Reliability, LogiGear: How did you get into software testing? What do you find interesting about it? Professor Offutt: When I started college I didn’t ...
Introduction This 2 article series describes activities that are central to successfully integrating application performance testing into an Agile process. The activities described here specifically target performance specialists who are new to the practice of fully integrating performance testing into an Agile or other iteratively-based process, though many of the concepts and considerations can be ...
Think you’re up for a challenge? Print this word search out! See if you can find all the words and learn a few new software testing terms in the process. To see how you’ve done, check your answers in the answer key below. *You can check the answer key here.
This article first appeared in BETTER SOFTWARE, May/June 2005. Executives and managers, get your performance testing teams out of the pit and ahead of the pack Introduction As an activity, performance testing is widely misunderstood, particularly by executives and managers. This misunderstanding can cause a variety of difficulties-including outright project failure. This article details the ...
This article was adapted from a presentation titled “How to Turn Your Testing Team Into a High-Performance Organization” to be presented by Michael Hackett, LogiGear Vice President, Business Strategy and Operations, at the Software Test & Performance Conference 2006 at the Hyatt Regency Cambridge, Massachusetts (November 7 – 9, 2006). Introduction Testing is often looked ...
When it is out of the question to delay delivery, the solution is a prioritization strategy in order to do the best possible job within the time constraints. The scenario is as follows: You are the test manager. You made a plan and a budget for testing. Your plans were, as far as you know, ...

Leave a Reply

Your email address will not be published.

Stay in the loop with the lastest
software testing news

Subscribe