August 2009 – Three Kinds of Measurement and Two Ways to Use Them

This article was originally featured in the July/August 2009 issue of Better Software magazine. Read the entire issue or become a subscriber.

People often quote Lord Kelvin: “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”But, few note the sentence that precedes the passage: “In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it.” The missing sentence prompts some questions: Are software development and testing sciences subject to the same kind of numerical measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement?

Gerald M. (Jerry) Weinberg suggests thinking in terms of three broad categories. First-order measurement, he says, is what we need to get started–“just adequate to the task of getting something built.” First-order measurement tends to be qualitative, fast, and inexpensive; it generally doesn’t require mechanisms or devices to enhance or extend the observation. In a recent conversation, Jerry told me that first-order measurements “are unobtrusive, or minimally obtrusive, and can be used without a whole lot of fuss. They help give you a lot of important information that can lead to other information or, in the best case, to immediate action if needed.”

First-order measurement is what we’re doing most of the time as we’re driving a car. We look through the windows, listen to the engine, and feel the acceleration and deceleration. We make observations and comparisons without getting hung up on quantification. “The road is dry. It’s cloudy. There’s traffic on the right and a car up ahead with its brake lights on.” First-order measurement suggests answers to the questions What seems to be happening? and What should I do now? In this situation, if you feel like you’re driving too fast, you probably are driving too fast. If so, first-order measurement is enough to inform an immediate and appropriate action: slow down.

Because it’s based on ongoing experience and feelings, rather than on careful experiments and controlled data intake, wise use of first-order measurement requires us to consider a number of possible interpretations of the meaning and significance of what we see. Suppose you feel like you’re driving fast, but not too fast. Now you observe a set of red and blue lights on the top of the car ahead. The extra data suddenly prompts you to realize that you’re uncertain about your relationship to the speed limit. The situation and first-order measurement prompt a different response in the form of questions: What else do I need to know? and Where should I look? At this point, you move into second-order measurement and refer to the speedometer.

Second-order measurement, says Jerry, is the kind of measurement that engineers use to tune relatively stable systems, making them cheaper, stronger, lighter, more reliable, faster–or slower, if that’s what’s desired. Second-order measurement focuses on questions like What’s really happening? and How is it changing? tending to be more quantitative, subject to more refined models, and generally busier than first-order measurement. It is often assisted by external instruments to supplement or refine direct sensory intake. In particular, metrics–mathematical functions that relate objects or events to numbers via a model–are second-order measurements.

Back in the car, second-order measurement is the kind of information that you obtain from looking at the dashboard. You note that your speed is forty-three miles per hour and that the posted limit is thirty-five miles per hour. Your quantitative, second-order measurement tells you that you’re above the legal limit. The apprehensive feeling in your gut, triggered by the combination of police car and the second-order measurement, informs a decision to slow down.

What of third-order measurement? That, says Jerry, is the kind of precise, highly quantitative measurement that supports the physicist’s search for new natural laws. It helps us answer the question What happens? in a universal and general sense. But third-order measurement can be precise only because it tends to be about very simple systems (such as two interacting masses) or very simple models of complex systems (in which we choose to ignore many dimensions of the system, but analyze a very small number of dimensions very thoroughly). Perhaps most significantly, third-order measurement emphasizes and depends upon keeping messy human traits–variability, subjectivity, and values–out of the way. As noted in an important paper by Cem Kaner and Walter P. Bond, using metrics and higher-order measurement wisely depends on construct validity–critical rigor in evaluating the models and the functions that form the basis for the measurement.

In Rapid Testing, we define a control metric as any metric that drives a decision. Some development groups standardize the decision to ship the product when it contains a low-enough threshold number of high-severity bugs. Others consider a program adequately tested if there’s one positive and one negative test per “requirement” (meaning per line in a requirements document). Still others deem a test group “successful” if there is a low-enough percentage of rejected bug reports. By contrast, an inquiry metric is one that prompts a question: We have three open high-severity bugs–What’s the story there? Jim and Mark are two days behind where we thought they’d be–Do they need help? The program managers are deferring a lot of problem reports–Are the problems insignificant, or do we need more training because we don’t understand the product?

One of my recent clients rated the quality of its products and customer satisfaction with a basket of five second-order metrics. Each measurement collapsed months of work and tons of data into a single number. “Better” numbers earned praise; “worse” numbers earned a reprimand, so management meetings dragged on while people tried to explain changes from last month’s numbers–especially when things had gotten worse. At this company, schedules frequently slipped and shipments were often delayed. Yet when I asked testers the simple question: What slows you down? I got a wealth of information. They told me about broken and buggy builds, inadequate test environments, excessive emphasis on scripts that were out of date by the time the product arrived, and a lack of information about real customer needs. They also said they were wasting time collecting data that wasn’t being used to help speed up development or testing, and they offered dozens of ways in which the numbers could be gamed.

A different client, also working on one-year project cycles, focused on questions like: What happened this week? What did we get done? What problems did we run into? Managers used personal contact–direct observation of and conversation with people–as their primary approach to assessing the project’s status. They took a good number of quantitative measurements, but used them only as indicators to refine their initial assessments and to inform new first-order questions. The team made rough long-term estimates and more precise short-term estimates, dividing two-week cycles into tasks of two days or less, with clear deliverables that signaled completion. When tasks weren’t finished in the estimated time, no one was punished; instead, everyone considered what he hadn’t understood earlier, what he had learned, and what might inform a better estimate next time. Team members didn’t collect metrics on things that weren’t immediately interesting and important to them. They were interested in understanding the situation and optimizing the quality of the work, not in the appearances afforded by the metrics. They emphasized the game and the season over the box scores. And they consistently shipped high-quality products on time.

They did use one–and only one–control metric. When the amount of open problems exceeded a certain number, they stopped working on new features and fixed problems until the list was comprehensible and manageable again.

Jerry observes that in software engineering we seem obsessed with higher-order measurements. Why? He suggests that decisions about quality are political and emotional, based on discussions and decisions about whose values count and how much they count relative to one another. Such issues are often distasteful to people who want to appear rational and “scientific,” so we try to avoid those issues with appeals to higher-order measurement.

Each new software project involves a human context–interaction between different sets of clients, developers, tasks, and problems to solve, with high variability, contending values, and small sample sizes. In those environments, third-order measurement isn’t achievable; it’s an expensive distraction. That leaves us with cycles of first- and second-order inquiry measurement–not physics, but easily good enough to build and tune our systems.


  • Thomson, William (Lord Kelvin). “Electrical Units of Measurement.” Popular Lectures and Addresses I (London, 1981-94).
  • Weinberg, Gerald M. Quality Software Management, Volume 2: First-Order Measurement. Dorset House Publishing, New York, 1993
  • Weinberg, Gerald M. Personal correspondence with the author, May 18, 2009.
  • Kaner, Cem and Walter P. Bond. “Software Engineering Metrics: What Do They Measure and How Do We Know?.” 10th International Software Metrics Symposium. Chicago, IL, 2004.
  • Weinberg, Gerald M. Quality Software Management, Volume 1: Systems Thinking. Dorset House Publishing, New York, 1991.
Michael Bolton

Michael Bolton provides training and consulting services in software testing and is a co-author with James Bach of Rapid Software Testing, a course and methodology on how to do testing more quickly, less expensively, and with excellent results. Contact Michael at

Michael Bolton
Michael Bolton provides training and consulting services in software testing and is a co-author with James Bach of Rapid Software Testing, a course and methodology on how to do testing more quickly, less expensively, and with excellent results.

The Related Post

Introduction All too often, senior management judges Software Testing success through the lens of potential cost savings. Test Automation and outsourcing are looked at as simple methods to reduce the costs of Software Testing; but, the sad truth is that simply automating or offshoring for the sake of automating or offshoring will only yield poor ...
“Combinatorial testing can detect hard-to-find software faults more efficiently than manual test case selection methods.” Developers of large data-intensive software often notice an interesting—though not surprising—phenomenon: When usage of an application jumps dramatically, components that have operated for months without trouble suddenly develop previously undetected errors. For example, newly added customers may have account records ...
Last week I went to StarWest as a presenter and as a track chair to introduce speakers. Being a track chair is wonderful because you get to interface more closely with other speakers. Anyway…one of the speakers I introduced was Jon Bach. Jon is a good public speaker, and I was pleasantly surprised that he ...
Are you looking for the best books on software testing methods? Here are 4 books that should be on your reading list! The Way of the Web Tester: A Beginner’s Guide to Automating Tests By Jonathan Rasmusson Whether you’re a traditional software tester, a developer, or a team lead, this is the book for you! It ...
This article was developed from concepts in the book Global Software Test Automation: Discussion of Software Testing for Executives. Introduction When thinking of the types of Software Testing, many mistakenly equate the mechanism by which the testing is performed with types of Software Testing. The mechanism simply refers to whether you are using Manual or ...
For mission-critical applications, it’s important to frequently develop, test, and deploy new features, while maintaining high quality. To guarantee top-notch quality, you must have the right testing approach, process, and tools in place.
The V-Model for Software Development specifies 4 kinds of testing: Unit Testing Integration Testing System Testing Acceptance Testing You can find more information here (Wikipedia): What I’m finding is that of those only the Unit Testing is clear to me. The other kinds maybe good phases in a project, but for test design it ...
March Issue 2020: Smarter Testing Strategies for The Modern SDLC
Jeff Offutt – Professor of Software Engineering in the Volgenau School of Information Technology at George Mason University – homepage – and editor-in-chief of Wiley’s journal of Software Testing, Verification and Reliability, LogiGear: How did you get into software testing? What do you find interesting about it? Professor Offutt: When I started college I didn’t ...
One of the most dreaded kinds of bugs are the ones caused by fixes of other bugs or by code changes due to feature requests. I like to call these the ‘bonus bugs,’ since they come on top on the bug load you already have to deal with. Bonus bugs are the major rationale for ...
Having developed software for nearly fifteen years, I remember the dark days before testing was all the rage and the large number of bugs that had to be arduously found and fixed manually. The next step was nervously releasing the code without the safety net of a test bed and having no idea if one ...

Leave a Reply

Your email address will not be published.

Stay in the loop with the lastest
software testing news