Finally, Three Ways to Automate iOS Testing

iOS culture, even in many large organizations with skilled engineers, is behind on up-to-date testing practices.

Agile development has long been all the rage; indeed, in most modern development shops the great agile methodologies are old hat. If you come from a software background like Ruby on Rails, Python, or certain Java niches, you may–until recently–have experienced a small jolt of culture shock when encountering the deep obstacles that agile development practices faced on the iOS platform.

This article outlines our experience using TDD to build HowAboutWe Dating for iPad and iPhone. We’ll describe the stack of tools we use for testing and continuous integration and how we use them to speed the delivery of quality software. 

When we made automated tests a requirement for completing a feature or bug-fix ticket, our QA churn dropped radically; our crash instances plummeted; developer confidence improved because we saw the risk of making changes go down; and we could better predict our release readiness without emergency feature cuts.

Our most important tools are Kiwi for unit testing (what Xcode calls “logic tests”) our model and controller logic; KIF for integration testing of user-facing behavior; and CruiseControl.rb for continuous integration to keep us honest. We also have some key practices that guide our use of these tools.

Tool number one: Kiwi for unit testing

If you’ve ever used RSpec, you’re familiar with the likes of:

describe RingOfPower do
  it 'takes a name in the constructor' do
    my_precious = RingOfPower.new('The One Ring')
    my_precious.name.should eq('The One Ring')
  end
end

Allen Ding’s Kiwi is a testing framework for iOS with an RSpec-inspired syntax. It makes slick use of Objective-C blocks and lends itself to readable, contextualized tests.

Kiwi is a very complete framework, with many of the levers and knobs you’d reach for regularly in RSpec:

  • Nestable contexts
  • Blocks to call before and after each or all specs in the context
  • A rich set of expectations
  • Mocks and stubs
  • Asynchronous testing

For example:

describe(@"Sting", ^{
    it(@"does not glow, normally", ^{
        [[theValue([Sting uniqueInstance].isGlowing) should] beFalse];
    });

    context(@"there's an orc about", ^{
        __block Orc *anOrc = nil;

        beforeEach(^{
            anOrc = [[Orc alloc] init];
        });

        it(@"glows", ^{
            [[theValue([Sting uniqueInstance].isGlowing) should] beTrue];
        });
    });
});

In addition, Kiwi is built on top of OCUnit, which means that it integrates seamlessly with Xcode logic tests and that you can reuse your old OCUnit tests, if you want to do a whole-hog migration to Kiwi. We prefer Kiwi to raw OCUnit, mainly for the elegant syntax–the nested blocks are easy to scan, and the specs are about as smooth to write as you could hope for in Objective-C.
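Mocks, stubs, and asynchronous expectations deserve a quick taste of their own. Here is a hedged sketch reusing the illustrative Sting and Orc classes from above (not real project code), showing Kiwi's stub:andReturn: and shouldEventually:

```objc
describe(@"Sting", ^{
    it(@"can have its glow stubbed out", ^{
        Sting *sword = [Sting uniqueInstance];
        // Force a canned return value without touching the real logic.
        [sword stub:@selector(isGlowing) andReturn:theValue(YES)];
        [[theValue(sword.isGlowing) should] beTrue];
    });

    it(@"eventually glows when an orc approaches", ^{
        Sting *sword = [Sting uniqueInstance];
        __unused Orc *anOrc = [[Orc alloc] init]; // assume this triggers glowing asynchronously
        // shouldEventually polls the expectation until it passes or times out.
        [[expectFutureValue(theValue(sword.isGlowing)) shouldEventually] beTrue];
    });
});
```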

We use Kiwi

With our models (most of which are subclassed from NSManagedObject), we test all the code not generated for us. This includes parsing JSON from our API into Objective-C instances; all model-level internal logic, such as converting a user’s gender and orientation into a complementary set of genders and orientations to search for; and important inter-model interactions, as between messages and message threads.

Helpers and categories are another place where Kiwi and TDD shine. We’ve test-driven a set of CGRect helper functions that aid us in smart photo cropping; a photo cache; and a category of time- and sanity-saving methods on NSLayoutConstraint.
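To make that concrete, here is what a spec for a cropping helper might look like. This is an illustrative sketch: the HAWCenteredCropRect function and its behavior are invented for this example, not our actual API.

```objc
// Hypothetical spec for a CGRect cropping helper; HAWCenteredCropRect
// is an invented name, not a real HowAboutWe function.
describe(@"HAWCenteredCropRect", ^{
    it(@"produces a square crop centered in a landscape rect", ^{
        CGRect source = CGRectMake(0, 0, 200, 100);
        CGRect crop = HAWCenteredCropRect(source, 1.0); // 1:1 aspect ratio
        [[theValue(crop) should] equal:theValue(CGRectMake(50, 0, 100, 100))];
    });
});
```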

We’ve also been driving toward thinning out our view controllers, and a lot of that involves factoring complex code into separate, single-responsibility objects. An example: In our app’s messaging module, we offer an Inbox, a Sent messages folder, and an Archive folder. The three boxes have different behaviors (e.g., you can only archive a thread from the Inbox), and an earlier revision of the messaging view controller had a lot of if-inbox-then-do-X-else-if-sent-do-Y-else, plus a lot of code to make sure the correct message folder was loaded and visible, that Sent and Inbox were properly synced but sorted slightly differently, different empty state strings were displayed for each folder, etc.

Fat controllers and repeated if-else chains are both code smells, and we used Kiwi tests to drive out a single solution to both of them: a separate MessageStore object that handled the juggling of messages and threads. The messaging view controller tells the MessageStore when the user switches modes and queries the MessageStore for the contents of the current folder, appropriate loading and empty strings, and for yes/no answers to behavioral questions like, “Should I expose an Archive button?” The controller is slimmed and the chained if-else-if statements are replaced by data structures that will be easily extensible if we decide to add a fourth folder.

Kiwi specs were integral to building the MessageStore with minimalism and correctness. To give you a taste, here are two specs that cover message-archiving behavior:

beforeEach(^{
    messageStore = [[MessageStore alloc] init]; // uninitialized
});

afterEach(^{
    messageStore = nil;
});

it(@"says whether you can archive messages in the current folder", ^{
    [[theBlock(^{ [messageStore canArchiveThreads]; }) should] raise];

    messageStore.mode = MessageStoreInbox;
    [[theValue(messageStore.canArchiveThreads) should] equal:theValue(YES)];

    messageStore.mode = MessageStoreSent;
    [[theValue(messageStore.canArchiveThreads) should] equal:theValue(NO)];

    messageStore.mode = MessageStoreArchive;
    [[theValue(messageStore.canArchiveThreads) should] equal:theValue(NO)];
});

This first spec tells us that if the MessageStore is uninitialized, it should throw an exception when asked whether archiving behavior should be exposed; otherwise, it should give a boolean answer appropriate to its current mode. If the user requests that a thread be archived, the MessageStore handles that, as defined in the second spec.
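A hedged sketch of that second spec follows; the archiveThreadAtIndex:toIndex: selector and the threads-for-mode accessors are invented for illustration, not the real MessageStore API.

```objc
it(@"moves a thread from the Inbox to the Archive", ^{
    messageStore.mode = MessageStoreInbox;
    // The store doesn't care about element types, so dummy NSNumbers stand in.
    [messageStore setThreads:@[@1, @2, @3] forMode:MessageStoreInbox];
    [messageStore setThreads:@[] forMode:MessageStoreArchive];

    // A valid archive request moves the object between folders...
    [messageStore archiveThreadAtIndex:1 toIndex:0];
    [[[messageStore threadsForMode:MessageStoreArchive] should] equal:@[@2]];

    // ...an out-of-range index should raise...
    [[theBlock(^{ [messageStore archiveThreadAtIndex:99 toIndex:0]; }) should] raise];

    // ...and so should archiving from a folder that forbids it.
    messageStore.mode = MessageStoreSent;
    [[theBlock(^{ [messageStore archiveThreadAtIndex:0 toIndex:0]; }) should] raise];
});
```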

This spec sets up an Inbox containing message objects (here represented by some dummy NSNumber objects–the MessageStore does not actually care about the type of the objects it is holding) and mimics various requests to pull an object from the Inbox and insert it into a particular place in the archive folder. For modes where the user should not be allowed to archive messages (as defined in the previous spec) or when invalid indices in the Inbox or archive collections are specified, an exception should be thrown; otherwise, the appropriate object should change folders.

The full spec is about 250 LoC, and canonical red-green-refactor TDD drove out an implementation of about 200 LoC. Visible, facile metrics like this scare some people off TDD, because they just see the cost of more code; I see this and know that I’ve written and tested a well-specified, tight bundle of logic, and I took one of the flakier, harder-to-maintain pieces of our app and broke it into solid, loosely-coupled modules that work reliably. The test-driven MessageStore and the concomitant simplification of the messaging view controller purged a whole class of hard-to-diagnose bugs from our issue tracker. When it comes to stabilizing the most-used parts of your app, 250 lines of straightforward, declarative test code is cheap.

One limitation of Kiwi is that it’s not so good for testing UIKit-derived classes or anything that touches them. This is actually a limitation of Xcode logic tests–they don’t fire up a UIApplication instance and don’t play nicely with UIKit. To test elements of the project that can’t be separated from the UI, we use automated integration tests.

Tool number two: KIF for integration testing

Kiwi helps us keep our lovely abstractions lovely, but what of the user-facing parts of the app? And how do we know that all the pieces work together? For integration tests of user-visible behavior, we use Square’s KIF library. It uses the iOS accessibility framework to simulate user interaction with the app.

Testing every facet of the app by automatically driving the app through every possible user action would be insanely costly, and it would rapidly get to the point of diminishing returns. In addition, the fact that the tests run in the simulator by faking user behavior means that the tests run at human-ish speeds, not as fast as the CPU can run through them. Integration testing in the sim requires a number of additional practices and judgment calls to make it sane and valuable.

First, the tests have to be decoupled from the outside world. We’ve used method swizzling to stub out all our network calls and give back dynamically generated, predictable data to drive the app.

Each of those stub methods returns a simulated API response, based on responses recorded in an actual session. This keeps us from having to stand up a web server to test the client app, and puts the inputs to the tests entirely under our control. We frequently have the stubs respond to different inputs by returning different data or exceptions so that we can simulate behaviors like paging data, network failure modes, etc.
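A minimal sketch of that swizzling approach is below. The HAWAPIClient class and its selectors are invented names for illustration; the real stubs return recorded API fixtures rather than inline literals.

```objc
#import <objc/runtime.h>

// Swap a real network call for a canned-response stub at test setup time.
@implementation HAWAPIClient (TestStubs)

- (void)stub_fetchProfilesWithCompletion:(void (^)(NSArray *profiles, NSError *error))completion {
    // Return predictable, locally generated data instead of hitting the network.
    completion(@[@{@"id": @1, @"name": @"Test User"}], nil);
}

+ (void)installStubs {
    Method real = class_getInstanceMethod(self, @selector(fetchProfilesWithCompletion:));
    Method stub = class_getInstanceMethod(self, @selector(stub_fetchProfilesWithCompletion:));
    // After the exchange, calls to the real selector run the stub body.
    method_exchangeImplementations(real, stub);
}

@end
```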

Second, the integration tests have to be decoupled from each other. If you run 50 integration tests one after the other and make a change to the fourth test that alters the app’s state in a persistent way, you risk breaking the next 46 tests. To mitigate this risk, we bundle the tests into related modules and run steps to log the test user out and clear the database between modules. Where it’s important that an intermodule dependency be tested (e.g., a message sent from a user profile should show up in the logged-in user’s Sent folder), we write a test for it, but otherwise we try to keep the KIF test scenarios limited to one screen or a small set of related screens, each testing a limited but meaningful set of user behavior.

Third, a lot of judgment needs to be exercised in what gets tested. It is impossible to test every possible user input, but you want to hit all your major error states as well as at least one valid input. It is impossible to test every path through the code, but you want to reasonably simulate the things a user is likely to do and spend a little more effort on the parts of the code that matter most to the user experience.

If you’ve read about the pros and cons of integration testing, you’ve heard some version of the issues I’ve described above (coupling-induced fragility, impossibility of total coverage, etc.) as reasons why integration tests are a bad thing. Certainly, we’ve found them costlier to write and maintain than unit tests. The way we’ve applied them, though, has given us far too much value to even consider discarding them: Where we’ve used unit tests to write quality code, the integration tests have been invaluable in helping us maintain it. They form our “regression firewall”, and if the test board is green, then the developers, product managers, and QA all know that none of the big stuff has gone wrong. Bugs still get through, but they tend to be around the edges.

In the rare case something big gets through, we add it to the suite. It happened recently that a release made it into the wild with a 100% reproducible crash when the subscription screen was reached by a certain path.
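A repro in KIF is just a short sequence of steps. A sketch of what ours looked like follows; the accessibility labels here are hypothetical stand-ins, not our actual screen elements.

```objc
// Sketch of a KIF repro scenario using KIF 1.x's built-in step factories.
KIFTestScenario *scenario =
    [KIFTestScenario scenarioWithDescription:@"Reaching the subscription screen via a profile"];
[scenario addStep:[KIFTestStep stepToTapViewWithAccessibilityLabel:@"Profile"]];
[scenario addStep:[KIFTestStep stepToWaitForViewWithAccessibilityLabel:@"Message"]];
[scenario addStep:[KIFTestStep stepToTapViewWithAccessibilityLabel:@"Message"]];
[scenario addStep:[KIFTestStep stepToWaitForViewWithAccessibilityLabel:@"Subscribe"]];
```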

From this short repro, you can get the feel of KIF tests: Check the state of the screen (mostly via the accessibilityLabel and accessibilityValue properties of screen elements), interact with screen elements, check the state, interact, and so on.

Our crash occurred on the last line of the repro above. We added the steps to the suite, ran it, watched it crash; then we fixed the bug, ran the test, watched it pass; then we ran the rest of the upgrade-related test module to make sure we didn’t break anything. This is a much longer process than just diving in and fixing the bug as soon as you’ve diagnosed it, but it boosts confidence among everyone who builds or inspects the app in two ways: They trust that we haven’t broken existing functionality (thanks to existing tests), and they trust that the bug being addressed in the new test won’t return.

One KIF trick that comes in handy is defining our own one-off steps. Any parameterizable process that gets used more than once gets a factory method in our own category on the KIFTestStep class, but sometimes the code is made more comprehensible when a task that only happens once is defined inline with a block.
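An inline one-off step looks roughly like this. The execution-block signature is KIF 1.x's; the check inside it (and the MessageStore sharedStore accessor) is invented for illustration.

```objc
[scenario addStep:
    [KIFTestStep stepWithDescription:@"Verify one unread thread (illustrative check)"
                      executionBlock:^(KIFTestStep *step, NSError **error) {
        // Any ad-hoc assertion can live here; the block returns a KIF result code.
        BOOL ok = ([MessageStore sharedStore].unreadCount == 1); // hypothetical API
        KIFTestCondition(ok, error, @"Expected exactly one unread thread");
        return KIFTestStepResultSuccess;
    }]];
```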

The cloud inside the silver lining

There are two major downsides to KIF. The first is the syntax–all those addStep: calls are actually building the test suite, not running it, and there’s no clean way to set a breakpoint at a particular instance of a step (unless you’ve defined the step yourself). We tolerate it because KIF is the best thing we’ve found for this type of testing. We feel it has yet to achieve true maturity, and we’ve extended it quite a bit for our own purposes, but it largely does what it says it will and has served as one of the pillars of our testing strategy.

The other pain point is the run time of the tests. Our full suite takes nearly 15 minutes to run, which makes it useless for fast-iterating styles of TDD/BDD. Our usual method of handling this goes something like:

  • Run only the test scenario related to the feature or bug being addressed.
  • Once the central test passes, run related test modules to make sure nothing was broken.
  • In cases where confidence is low or a change is far-reaching, run the full suite at the developer bench. Otherwise, merge your change (after review: See below) and be ready to jump back on it if the CI board (again: See below) goes red.

This is another one of those times when developer judgment plays a key role. Running the full suite is a major break in your rhythm, especially when you’re making (what feels like) a small change. The value of moving on to the next thing needs to be weighed against the risk inherent in the change being made, and the evaluation of that risk depends on one’s intimacy with the code and seasoning as a programmer. In the section on practices below, I go into how we buttress individual judgment with the collective wisdom of the team.

Tool number three: CruiseControl.rb for continuous integration

CruiseControl.rb is a darling of the Rails community, but it’s not just for Rails apps. It’s quick to set up–including on a Mac, which is required to run our Xcode-based tests–and can run and extract results from arbitrary build-and-test scripts. CC.rb handles polling our GitHub repositories for changes to projects; our custom scripts do the rest, and CC.rb reports red/green for each project by checking standard Unix return values from the scripts.

First the common library shared by our iPhone and iPad projects gets built, and its Kiwi tests are run:

#!/bin/bash
git submodule init
git submodule update
xcodebuild -scheme HAWCommonTestsCL -sdk iphonesimulator \
    TEST_AFTER_BUILD=YES -arch i386 clean build | grep "BUILD SUCCEEDED"

It’s that simple; the result of the final grep for Kiwi’s success message is the success or failure that CruiseControl.rb reports.

Running the KIF tests for the iPhone and iPad projects is a bit more of a production. We have to do extra steps to build the common library prior to the main project, use Waxsim (we prefer Jonathan Penn’s excellent fork) to run the simulator from the command line and capture console output, and sift through that output for success or failure messages. The end result is the same, though: The return value of the script is reported as the test outcome by CC.rb.

Continuous Integration, of course, is only as good as the speed with which it gives you feedback. We have our CI set up on a Mac Mini with Screen Sharing enabled and the CC.rb dashboard exposed on a convenient port. CruiseControl.rb can be set up to send email, but our inboxes are plenty cluttered already.

We get the results through two main channels:

To broadcast status to management and the wider team, we keep CruiseControl.rb’s web dashboard on a large screen mounted on a wall overlooking the developers’ corner of the shop.

The public display of results is an important driver of good testing habits. As soon as a team gets used to meeting high expectations for test reliability, a red test suite on display for all to see becomes a distracting irritant. While we don’t generally suggest introducing distracting irritants into technical workflows, in this case the irritation is confluent with the worthy goal of maintaining a reliable test suite that consistently inspires confidence in everyone who builds or depends on the software.

Our best practices: The rules to guide the tools

Tools are only valuable when they are used well. We surround our tools with processes to get the most out of them, and tune those processes as we go in response to real-world feedback. Here are three simple rules that guide our use of the tools described above:

A: TDD

Red: You write a test describing the behavior you want, run it, and watch it fail.
Green: You nudge your code into a state where the test passes.
Refactor: You inspect your code (and your tests) for duplication and other issues, and remove them. The tests keep you from breaking anything.

The last step is probably the least-understood, most-skipped one in the process. People carry a lot of weird, fuzzy definitions in their heads for the word refactor, often having to do with larger re-architecture of code. In the TDD context, it has a very specific meaning: Refactoring is changing code without changing behavior. An important sidebar to this is that you have to verify the constancy of the behavior, or you’re not really refactoring–you’re just changing stuff. Automated tests are one (relatively cheap) way to do this–you can have confidence that the behavior being tested hasn’t changed.

Skipping the refactoring step is a sure path to technical debt. The refactoring step is doing the dishes after dinner, pouring water on the ashes of your campfire, cleaning your rifle after you’ve fired it. After you’ve added or altered code, any duplicated bits should be factored out into methods (and tests run again); any ugly or slow bits should be reworked (and tests run again); any dead or commented code should be removed . . . You get the picture. Professional software development isn’t a game of seeing how quickly we can deliver working code; there is the cost of future change to consider, too, and leaving your code clean makes life better for the next person who touches it (and that will as often as not be you).

All that to say: The automated tests don’t increase your product quality. They provide the support and confidence for you to apply your skill and judgment toward improving the quality of your code.

B: Tests Should Always Be Green

Tests do not inspire confidence when they’re failing. Above, I mentioned the social incentive built into making our tests results public, but the practical value is important, too. The tests exist to convince us that we haven’t broken anything. The moment you allow yourself to get comfortable with broken or inconsistent tests, you’ve lost sight of why you built them to begin with. There may be situations where you allow them to be red for a short time, even a few days during a large re-architecture, but these should be extreme situations, and you should not get comfortable with them.

We foster a culture of personal responsibility around the tests. If your name is on the failing commit, it’s yours to fix. In the event that the committer can’t address it right then, the next person with a free hand is responsible for investigating the issue.

C: 2 > 1

A minimum of two people must look at every piece of our code before it gets merged into the master branch. We have two ways to meet this requirement: pair programming and GitHub pull requests. In both cases, the goal is to have a second active collaborator devoting attention to the problem, to overcome the individual tendency to code to the happy path, and to bring different skills and perspectives to the question of the best way to implement something.

In the case of pull requests, the programmer inspecting the code is expected to run both automated and manual tests, as well as applying a critical eye to the code. With both practices (pairing and pull requests), both parties are expected to be active collaborators; no one should be rubber-stamping someone else’s decisions.

Conclusion

iOS culture, even in many large organizations with skilled engineers, is behind on up-to-date testing practices. Aggressive mobile strategies up against lengthy App Store release cycles and manual user app updates create pressure to jettison code and best practices that might be seen as “extras.”

It’s ironic that iOS development–the catalyst of the consumer web explosion of the past few years–has been a reluctant latecomer to TDD, perhaps the most cherished methodology of the agile web development culture that is building the consumer Internet.

Over the platform’s short history, agile methodology and TDD have been at odds in iOS development culture. The agile desire for speed has taken precedence over other concerns due to a past dearth of high-powered automated testing frameworks, and the results have often been high crash rates, long QA cycles, and a whole series of tribulations that the modern developer associates with the antiquities of waterfall development.

Our experience building HowAboutWe Dating for iPhone and iPad has shown that TDD and CI on iOS are well worth the effort. The tools are young but rapidly maturing. It is possible! Our move to a genuine culture of TDD on iOS has transformed the quality of our software and how quickly and predictably we can deliver it. So we’re believers that any organization not already employing these practices should dive in and measure the results for themselves.

HowAboutWe is the modern love company and has launched a series of products designed to help people fall in love and stay in love.

Aaron Schildkrout
Aaron Schildkrout is co-founder and co-CEO of HowAboutWe, where he runs product.
Brad Heintz
Brad Heintz is the Lead iOS Developer at HowAboutWe. When he’s not bringing people better love through TDD, he’s tinkering, painting, or playing the Chapman Stick.
James Paolantonio
James Paolantonio is a mobile engineer at HowAboutWe, specializing in iOS applications. He has been developing mobile apps since the launch of the first iPhone SDK. Besides coding, James enjoys watching sports, going to the beach, and scuba diving.

This article was originally published in FastCompany.


