Testing methodology for Web browser development

This document is biased towards Mozilla's development system, but similar principles apply to other Web browser development efforts.

Types of coverage ("when")

There are several ways of finding failures. These form a broad spectrum of test coverage, ranging from the unsuspecting public to the simplest of targeted test cases. Let us examine each in turn.

Real world coverage. This is the ultimate test. Ideally, end users would never experience software defects (bugs) but, unfortunately, writing software as complex as a web browser on a tight schedule inevitably means a compromise must be made between perfection and shipping the product before the company goes bankrupt. In the case of CSS, bugs in the last released version of the product are sometimes reported by web developers in official feedback forms or in public forums. These are an important source of bug reports, but in practice the signal-to-noise ratio is usually too low to warrant spending much time here. (Typically, issues reported in official feedback forms and public forums are well known issues, or even incorrect reports.)
Pre-release beta testing. Errors reported in widely distributed releases of non-final versions of the product are often known issues (as with bugs reported in final versions), but by examining how many times each issue is reported, the most important bugs can be prioritized before the final version is released.
Dogfood. It is good practice to use the software one is developing on a daily basis, even when one is not actively working on developing or testing the product. This is known as "eating one's own dogfood". Many bugs, usually user interface (UI) issues but also occasionally web standards bugs, are found by people using daily builds of the product while not actively looking for failures. A bug found using this technique which prevents the use of the product on a regular basis is called a "dogfood" bug and is usually given a very high priority.
"top100" and site-specific testing. The web's most popular pages are regularly checked by visual inspection to ensure that they display correctly. (It is hard, if not impossible, to automate this task, because these pages change very, very frequently.) Bugs found through this technique are important, because many users would encounter them should a product be released with such a defect. In practice, many rendering issues found on top100 pages are actually caused by errors on the pages themselves, for example using invalid CSS or CSS which is incorrectly handled by other browsers. Most web authors do not check the validity of their pages, and assume that if the page "looks right" on popular browsers, it must be correct.
Smoketests. Each day, before allowing work to begin on the code base, the previous day's work must pass an extremely simple set of tests known as "smoketests". The name comes from the idea that these tests are the software equivalent of powering a new prototype circuit and seeing if it catches fire! CSS does not figure very prominently in the smoketest steps, so few, if any, CSS-specific bugs are caught this way. However, since CSS underpins a large part of the application's UI, if anything is seriously wrong with the CSS infrastructure, it will be caught by the smoketests. Bugs found this way are known as "smoketest blockers", and with good reason: all work is blocked until the bugs are fixed. This is to ensure that the bugs are fixed quickly.
Tinderbox tests. Tests are also run on a continuous basis on a system known as the tinderbox. This tests the absolute latest code base, and therefore is a good way of catching unexpected compile errors (code that works on one platform might not work on another) and major problems such as startup failures. There are no CSS tests currently being run on the tinderboxes, but this is a direction which will be worth pursuing in the future.
Automated tests. Some tests have been adapted for a test harness known as NGDriver. These tests run unattended and can therefore cover large areas of the product with minimum effort. Until recently, CSS could not easily be tested using an automation system. However, with the advent of the Layout Automation System (LAS), there now exists a test harness that is capable of displaying a test page and then comparing this test page, pixel for pixel, with a pre-stored image.

Automation is the holy grail of QA. Unfortunately, there are many aspects that are hard to impossible to automate, such as printing.
Engineer regression and pre-checkin tests. In order to catch errors before they are flagged on the tinderbox (which wastes a lot of time), engineers must run a set of tests before committing their changes to the main code base. These tests are known as "pre-checkin tests". Certain changes also require that the new code be run through specially designed regression tests that are written to flag any regressions (new defects which were not present in a previous version) in the new code.
Manual test runs. Before releases, and occasionally at other times as well (for instance when a new operating system is released) every test case is run through a test build of the product and manually inspected for errors. This is a very time consuming process, but (assuming the person running the tests is familiar with them and the specification being tested) it is a very good way of catching regressions.
QA test development. The main way of discovering bugs is the continuous creation of new test cases. These tests then get added either to the manual test case lists or the automated test case lists, so that they can flag regressions if they occur.
Engineer test development. When a bug is discovered, the file showing this bug is then reduced to the smallest possible file still reproducing the bug. This enables engineers to concentrate on the issue at hand, without getting confused by other issues. Oddly enough, during this process it is not unusual to discover multiple other related bugs, and this is therefore an important source of bug reports.
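
The pixel-for-pixel comparison described under "Automated tests" above can be illustrated with a short sketch. This is not the LAS implementation, whose internals are not described here. It assumes, purely for illustration, that both the rendered page and the pre-stored image are available as rows of RGB tuples; the names differing_pixels and passes are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels, top to bottom


def differing_pixels(rendered: Image, reference: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates at which the two images differ."""
    # Images of different sizes cannot be compared pixel for pixel.
    if len(rendered) != len(reference) or any(
        len(a) != len(b) for a, b in zip(rendered, reference)
    ):
        raise ValueError("images have different dimensions")
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(rendered, reference))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


def passes(rendered: Image, reference: Image) -> bool:
    """The test passes only if every pixel matches the stored reference."""
    return not differing_pixels(rendered, reference)
```

A comparison this strict flags any difference at all, which is why the stored image has to be regenerated whenever the expected rendering legitimately changes.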

Types of test cases ("where")

If the various techniques for finding failures give a list of when bugs are typically found, then the various kinds of test cases give a list of where the failures are found.

The original files of a bug found in the field. Typically, bugs reported by end users and beta testers will simply consist of the web address (URI) of the page showing the problem. Similarly, bugs reported by people while using daily builds (dogfood testing) and bugs found on top100 pages will consist of entire web pages.

An entire web page is usually not very useful to anyone by itself. If a bug is found on a web page, then the page will have to be turned into a reduced test to be useful for developers (engineer test development). The reduced test will then typically be used as the basis for a group of more complicated QA tests for regression testing (and maybe for finding more bugs in that area).

Example:
- http://www.cnn.com/ or any other web site.
Reduced test (attached to a bug). In order for an engineer to find the root cause of a bug, it is helpful if the original page which demonstrates the bug is simplified to the point where no unrelated material is left. This is known as minimizing or reducing a test case, and forms part of engineer test development.

Reduced tests are typically extremely small (less than one kilobyte including any support files) and extremely simple. There are obviously exceptions to these rules, for example tests for bugs that only manifest themselves with one megabyte files will be big (although still simple) and bugs that are only visible with a convoluted set of conditions will be complicated (although still small).

A good reduced test will also be self explanatory, if that can be done without adding text which would be unrelated to the test.

Examples:
- http://bugzilla.mozilla.org/attachment.cgi?id=25713&action=view
- http://bugzilla.mozilla.org/attachment.cgi?id=39662&action=view
Note that both these examples are attached to Bugzilla, the Mozilla bug tracking tool.
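
Reduction is normally done by hand, but the idea can be sketched in code. The following is a minimal sketch of a greedy, one-line-at-a-time reduction, not a description of any tool used by Mozilla. It assumes a hypothetical predicate, still_fails, which reports whether a candidate file still shows the bug; in practice that step would mean loading the file in the browser and inspecting the result.

```python
from typing import Callable, List


def reduce_test(lines: List[str],
                still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Drop lines one at a time, keeping each removal that still shows the bug."""
    if not still_fails(lines):
        raise ValueError("the original file does not reproduce the bug")
    current = list(lines)
    changed = True
    # Repeat full passes until a pass removes nothing more.
    while changed:
        changed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                # The line was unrelated to the bug: keep it removed.
                current = candidate
                changed = True
            else:
                # The line is needed to reproduce the bug: move on.
                i += 1
    return current
```

The result is minimal only in the sense that no single remaining line can be removed; a person reducing a test by hand will often also simplify the lines that are left.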
Simple test. When the implementation of a feature is in its infancy, it is useful to create a few simple tests to check the basics. These tests are also known as isolation tests (since the features are typically tested in isolation) or ping tests (in computing terms, to ping something means to check that it is alive).

Simple tests consist of a test as simple as a reduced test, but designed to be easy for QA to use, rather than for engineers, and therefore may have the appearance of a complicated test.

Simple tests are often used as part of a complicated test. When used in this way they are known as control tests, by analogy with the concept of a control in experimental physics. If the control test fails, then the rest of the test is to be considered irrelevant. For example, if a complicated test uses colour matching to test various colour related properties, then a good control test would be one testing that the 'color' property is supported at all.

Example:
- http://www.hixie.ch/tests/adhoc/css/cascade/style/001.xml
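
The role of a control test can be sketched as follows. This is only an illustration of the rule that a failed control makes the rest of the test irrelevant; the function run_with_control and the result labels are hypothetical, and each check is assumed to be a callable returning True on success.

```python
from typing import Callable, Dict


def run_with_control(control: Callable[[], bool],
                     checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run the control first; only interpret the other checks if it passes."""
    if not control():
        # Without a working control, the other results mean nothing.
        results = {"control": "fail"}
        results.update({name: "irrelevant" for name in checks})
        return results
    results = {"control": "pass"}
    for name, check in checks.items():
        results[name] = "pass" if check() else "fail"
    return results
```

Reporting the remaining checks as irrelevant, rather than as failures, avoids filing a separate bug for each of them when the real problem is the feature the control covers.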
Complicated test. This is the most useful type of test for QA, and is the type of test most worth writing. One well written complicated test can show half a dozen bugs, and is therefore worth many dozens of simple tests, since for a complicated feature to work, the simpler features it uses must all work too.

Complicated tests should appear to be very simple, but their markup can be quite convoluted since they typically test combinations of several things at once. The next chapter describes how to write these tests.

Example:
- http://www.hixie.ch/tests/adhoc/css/selectors/not/006.xml
Use case demo page. Occasionally, pages will be written to demonstrate a particular feature. Pages like this are written for various reasons -- they are written by marketing teams to show users the new features of a product, they are written by technology evangelism teams to show web developers features that would make their site more interesting or to answer frequently asked questions about a particular feature, sometimes they are even written for fun! CSS, due to its very graphical nature, has many demo pages.

Demo pages are really another kind of complicated test, except that because the target audience is not QA it may take longer to detect failures and reduce them to useful tests for engineers.

Example:
- http://damowmow.com/mozilla/demos/layout/demo.html
Extremely complicated demo page. When an area has been highlighted as needing a lot of new complicated tests, it may be hard to decide where to begin working. To help decide, one can attempt to write an entire web site using the feature in question (plus any others required for the site). During this process, any bug discovered should be noted, and then used as the basis for complicated tests.

This technique is surprisingly productive, and has the added advantage of discovering bugs that will be hit in real web sites, meaning that it also helps with prioritisation.

At least two web sites exist purely to act as extremely complicated demo pages:
- http://www.libpr0n.com/
- http://www.mozillaquestquest.com/
Automated tests. These are used for the same purposes as complicated tests, except that they are then added to automated regression test suites rather than manual test suites.

Typically the markup of automated tests is impenetrable to anyone who hasn't worked on them, due to the peculiarities of the test harness used for the test. This means that when a bug is found by an automated test, reducing it to a reduced test can take a long time, and sometimes it is easier to just use automated tests as a pointer for running related complicated tests. This is suboptimal, however, and well designed automated tests have clear markings in the source explaining what is critical to reproducing the test without its harness.

The worst fear of someone running automated tests is that a failure will be discovered that can only be reproduced with the harness, as reducing such a bug can take many hours due to the complexities of the test harnesses (for example the interactions with the automation server).

Example:
- http://www.hixie.ch/tests/ngdriver/domcss/sc2p004.html

Finding bugs

The following flowchart is a summary of this document.

                                  BETA FEEDBACK
                                        |
 EXTREMELY                             \|/
COMPLICATED        USER FEEDBACK --> WEB SITE <-- DOGFOOD
 DEMO PAGE                              |
     |                                  |
     |                                  |
    \|/                                \|/
  LIST OF -----> COMPLICATED ------> REDUCED
   BUGS   <-----    TESTS             TEST
    /|\            |                    |
     |            \|/                  \|/
     AUTOMATED TESTS                BUG FILED