Testing Netflix on Android

When Netflix decided to enter the Android ecosystem, we faced a daunting set of challenges:

1. We wanted to release rapidly (every 6-8 weeks).

2. There were hundreds of Android devices of different shapes, versions, capacities, and specifications that needed to play back audio and video.

3. We wanted to keep the team small and happy.

Of course, the seasoned tester in you has to admit that these are the sort of problems you like to wake up to every day and solve. Doing it with a group of fellow software engineers who are passionate about quality is what made overcoming those challenges even more fun. 

Release rapidly

You probably guessed that automation had to play a role in this solution. However, automating scenarios on a phone or tablet is complicated when the core functionality of your application is to play back videos natively, but the user interface is HTML5 that lives in the application's web view.

Verifying an app that uses an embedded web view to serve as its presentation platform was challenging in part due to the dearth of tools available. We considered Selenium, Android Native Driver, and the Android Instrumentation Framework. Unfortunately, we could not use Selenium or the Android Native Driver, because the bulk of our user interactions occur on the HTML5 front end. As a result, we decided to build a slightly modified solution.

Our modified test framework heavily leverages a piece of our product code that bridges JavaScript and native code through a proxy interface. Though we were able to drive some behavior by sending commands through the bridge, we needed an automation hook to report state back to the automation framework. Since the web view does not display the HTML document's title to the user, we decided to use the title element as our hook. We rely on the onReceivedTitle notification to communicate back to our Java code whenever some JavaScript is executed in the HTML5 UI. Through this approach, we were able to execute a variety of tasks by injecting JavaScript into the web view, performing the appropriate DOM inspection, and then reporting the result through the title property.
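The title-based reporting channel described above could look roughly like the following. This is only a sketch in plain Java, not our product code: the class name TitleHook, the "AUTOMATION:" prefix, and the probeId=value layout are all assumptions made for illustration.

```java
import java.util.Optional;

/**
 * Illustrative sketch of a title-based reporting channel.
 * TitleHook, PREFIX and the "probeId=value" layout are hypothetical.
 */
final class TitleHook {
    // Assumed marker that separates automation reports from ordinary page titles.
    static final String PREFIX = "AUTOMATION:";

    /** One decoded report: which probe ran and what the DOM inspection returned. */
    static final class Report {
        final String probeId;
        final String value;

        Report(String probeId, String value) {
            this.probeId = probeId;
            this.value = value;
        }
    }

    /**
     * Builds the JavaScript to inject into the web view. The script evaluates
     * a DOM expression and writes the outcome into document.title, which is
     * what triggers the title notification on the native side.
     * probeId is assumed to contain no quote or '=' characters.
     */
    static String buildProbe(String probeId, String domExpression) {
        return "javascript:(function(){"
                + "var r;"
                + "try{r=String(" + domExpression + ");}catch(e){r='ERROR:'+e;}"
                + "document.title='" + PREFIX + probeId + "='+r;"
                + "})()";
    }

    /** Decodes a title string; returns empty for ordinary page titles. */
    static Optional<Report> parseTitle(String title) {
        if (title == null || !title.startsWith(PREFIX)) {
            return Optional.empty();
        }
        String payload = title.substring(PREFIX.length());
        int eq = payload.indexOf('=');
        if (eq <= 0) {
            return Optional.empty(); // no probe id before '='
        }
        return Optional.of(new Report(payload.substring(0, eq), payload.substring(eq + 1)));
    }
}
```

On Android, the string returned by buildProbe would typically be passed to WebView.loadUrl, and parseTitle would be called from an overridden WebChromeClient.onReceivedTitle(WebView, String). Keeping the encoding and decoding in a plain Java class like this one means the protocol itself can be unit tested without a device.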

With this solution in place, we are able to automate all our key scenarios such as login, browsing the movie catalog, searching, and controlling movie playback.

While we automate playback testing, the subjective assessment of quality is still left to the tester. By adding testability hooks to our software, automation can catch buffering and other streaming issues, but at the end of the day we still need a tester to verify qualities such as seamless resolution switching or HD picture quality, which are hard to check with automation today and cost-prohibitive to automate.

We have a continuous integration system that runs our automated smoke tests on a bank of devices for each code submission. With the framework in place, we can quickly ascertain build stability across the vast array of makes and models in the Android ecosystem. This quick, inexpensive feedback loop enables a very fast release cycle, because the testing overhead of each release is low relative to what is at stake.

Device diversity

To put device diversity in context, we see around 1000 different devices streaming Netflix on Android every day. We had to figure out how to group these devices into buckets so that we could be reasonably sure that what we release works properly on all of them. The devices we choose for our continuous integration system are selected using the following criteria:

We have at least one device for each playback pipeline architecture we support (the app uses several approaches for video playback on Android, such as hardware decoder, software decoder, OMX-AL, and iOMX).

We choose devices with high-end and low-end processors, as well as devices with different memory capacities.

We have representative devices for each major operating system version from each manufacturer, as well as devices running custom ROMs (most notably CM7 and CM9).

We choose the devices that are most heavily used by Netflix subscribers.

With this information, we took stock of all the devices we have in house and classified them based on their specs. We then worked out the combination of devices that gives us maximum coverage. As a result, we were able to reduce our daily smoke automation set to around 10 phones and 4 tablets, and we keep the rest for the longer, release-wide test cycles.

This list is updated periodically to track changing market conditions. Note that it covers phones only; we keep a separate list for tablets. Several other phones are tested using automation and a smaller set of high-priority tests, while the devices on the list above go through the comprehensive suite of manual and automated testing.

To put it another way, when it comes to watching Netflix, any device outside those ten can be matched to one of the high-priority devices based on its configuration. This in turn helps us quickly identify the class of problems associated with a given device.
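Choosing a small set of devices that together cover every required configuration, as described earlier in this section, can be approximated as a set-cover problem. The sketch below uses a simple greedy heuristic; it is an illustration under assumptions, not the process we actually followed. The class name DevicePicker, the device names, and the trait labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative greedy device selection; all names here are hypothetical. */
final class DevicePicker {
    /**
     * Greedy set cover: repeatedly pick the device that covers the most
     * still-uncovered traits. The result is a reasonable approximation,
     * not a guaranteed minimum. Ties go to the device seen first, so pass
     * an ordered map (for example LinkedHashMap) for repeatable results.
     */
    static List<String> pick(Map<String, Set<String>> traitsByDevice, Set<String> required) {
        Set<String> uncovered = new HashSet<>(required);
        List<String> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<String>> entry : traitsByDevice.entrySet()) {
                Set<String> gain = new HashSet<>(entry.getValue());
                gain.retainAll(uncovered); // traits this device would newly cover
                if (gain.size() > bestGain) {
                    best = entry.getKey();
                    bestGain = gain.size();
                }
            }
            if (best == null) {
                break; // the remaining traits are not covered by any device on hand
            }
            chosen.add(best);
            uncovered.removeAll(traitsByDevice.get(best));
        }
        return chosen;
    }
}
```

Traits here would be labels such as a playback pipeline, a processor tier, or a memory tier. Popularity among subscribers is not modeled in this sketch; one way to account for it would be to order the input map by usage, so that the more popular device wins a tie.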

Small happy team

We keep our team lean by focusing our full-time employees on building solutions that scale, and automation is a key part of this effort. When we do an international launch, we rely on crowd-sourced testing services like uTest to quickly verify network and latency performance. This gives us real-world assurance that all of our backend systems are working as expected. These approaches give our team time to watch their favorite movies to ensure that we have the best mobile streaming video solution in the industry.

Amol Kher
Amol is the Chief Technical Officer at Wello, a startup that focuses on connecting users with trainers over live video. He is currently developing a mobile app for the product. In his previous job, he was the engineering manager for the tools and test team on the Netflix mobile team and was responsible for shipping the Netflix mobile app on the iOS, Android, and Apple TV platforms. Prior to Netflix, he was an early engineer on the Chrome for Android team at Google.
