CSS2.1 Test Case Authoring Guidelines

Tantek Çelik, Microsoft Corporation, tantekc@microsoft.com
Ian Hickson, ian@hixie.ch



This is a supporting document for authors of tests in the CSS2.1 test suite giving the best practices for writing test cases. It describes the key aspects of CSS tests, and gives various techniques authors should use when writing tests.

Status of this document

These guidelines were produced by members of the CSS working group which is part of the style activity (see summary).

Comments on, and discussions of, this document can be sent to the (archived) public mailing list public-css-testsuite@w3.org (see instructions). W3C Members can also send comments directly to the CSS working group.

These guidelines represent the current thinking of the working group and as such may be updated, replaced or rendered obsolete by other W3C documents at any time. Their publication does not imply endorsement by the W3C membership or the CSS Working Group (members only).

Patent disclosures relevant to CSS may be found on the Working Group's public patent disclosure page.


This document explains how to write test cases for the CSS2.1 test suite.

While this document describes tests in the context of CSS2.1, it also applies to other Web formats. For example, these guidelines are directly applicable to tests written for HTML, SVG, XML, etc.

Key aspects of tests

A badly written test can lead to the tester not noticing a failure, as well as breaking the tester's concentration. Therefore it is important that the tests all be of a high standard.

Easy to determine the result

Tests are viewed one after the other in quick succession, usually in groups of several hundred to a thousand. As such, it is important that the results be easy to interpret.

Quick to determine the result

The tests should need no more than a few seconds to convey their results to the tester.

Self explanatory

The tests should not need an understanding of the specification to be used.


Tests should be very short (a paragraph or so) and certainly not require scrolling on even the most modest of screens, unless the test is specifically for scrolling behaviour.


Unless specifically testing error-recovery features, the tests should all be valid.


In general, CSS2.1 tests should follow this template.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: Description of test</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   CSS for test
  </style>
 </head>
 <body>
  Content of test
 </body>
</html>

Naturally, not all tests fit in this template. For example, SVG tests would need a similar template created, but using SVG; tests checking for error recovery or some more esoteric aspects of XHTML would similarly have to have this template adapted to suit the test.

Kinds of test

Test suites should consist of the following kinds of tests:

a — atomic
tests that check whether the feature is supported at all. There should only be as many atomic test cases as necessary to check that the feature is implemented. Thus most features will require at least two atomic test cases, to check that different values have different results. (It is likely that these test cases will be in the same single test file.)
b — basic
tests that check the simple (but realistic) aspects of a feature, for example the various named color values.
c — composite
tests that check combinations of features, for example 'border-top-color' with 'color'.
d — detailed
tests that check explicit testable assertions that involve more than one feature.
e — evil
tests that check side-effects of combinations of features, or edge cases.
f — failure
tests that check failure conditions and error handling. (For example, for CSS tests, these tests are likely to involve invalid CSS stylesheets.)

Links to the specification

Each test should contain one or more links to the relevant part of the specification:

<link rel="help" href="link to part of specification"/>

The first of these links typically decides the position of the test in the test suite. The links should be ordered from rare to common such that the link to the rarest feature will be first.

For CSS2.1 tests, the URIs should point to the /TR/CSS21/ spec (i.e. the "latest version"), not to the version existing at the time of publication.

Recommended content

In atomic, basic, and failure tests, tests should limit themselves to <p>, <div>, and <span> elements.

For composite, detailed, and evil tests, tests may use any element from the XHTML Basic repertoire. Specific tests checking for the interaction of CSS with other XHTML elements, or elements from other namespaces, may include those other elements too (and the filenames should be adjusted to indicate the required support).


If possible without compromising the test itself, block-level markup should be indented to show the structure of the document, as in the template given above. This makes the tests much easier to peer-review. Tabs should be avoided, as editors are known to handle them inconsistently.

Content-Style-Type and Content-Script-Type

When the document contains inline style attributes, then the following line should be added immediately after the title element in the markup:

<meta http-equiv="Content-Style-Type" content="text/css"/>

When the document contains inline script (event handler) attributes, then the following line should be added immediately after the title element in the markup:

<meta http-equiv="Content-Script-Type" content="text/javascript"/>

If both are present, the Content-Style-Type header should be given first.

Other meta lines must not be included (unless they are an integral part of the test, obviously). Extraneous elements like this could interfere with the test.
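
As a minimal sketch of the ordering described above, the hypothetical test below uses both an inline style attribute and an inline event handler, so both meta lines follow the title, with Content-Style-Type first. The test content and the handler body are illustrative assumptions, and the help link keeps the template's placeholder URI.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <!-- The two meta lines come directly after the title: style first, then script. -->
  <title>CSS 2.1 Test Suite: inline style attribute (illustrative)</title>
  <meta http-equiv="Content-Style-Type" content="text/css"/>
  <meta http-equiv="Content-Script-Type" content="text/javascript"/>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
 </head>
 <body>
  <!-- The onclick handler is a hypothetical placeholder. It is present only
       to show why the Content-Script-Type line is needed. -->
  <p style="color: green" onclick="this.style.color = 'green'">This line should be green.</p>
 </body>
</html>
```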


Tests should use UTF-8 unless the test is specifically checking encoding handling. If the XHTML 1.1 source files need to define a character set other than UTF-8 or UTF-16, then they will define their character set using an XML declaration, as in:

<?xml version="1.0" encoding="iso-8859-1"?>

Special cases

It is expected that there will be special cases, e.g. tests that check for the cascading order of multiple stylesheets. Those tests will be handled on a case-by-case basis. Support files will all be kept in one support directory, if possible.


Tests will be given a filename in a special format.

The CSS2.1 section of the test, where AA is the chapter number, BB is the section number, and CC is the subsection number, all three zero-padded. If a test is for a section with no subsection, then the CC number is omitted.

For other test suites, this should be used in reference to section numbers in the relevant specification. For test suites covering more than one specification, or covering features not detailed by explicit specifications, this part may be omitted, being instead replaced by a longer "test-topic" string.

A short string naming the test or series of tests.
A zero-padded three-digit number.
The kind of test, as given above.
Test requirements as zero or more of the following (in alphabetical order):
a: test requires the Ahem font
f: test requires frames
g: test requires support for bitmap images (PNG, JPEG, or GIF)
h: test requires a session history
i: test requires user interaction
m: test requires support for MathML
n: test requires support for XML Namespaces (i.e. uses namespaces other than XHTML1)
o: test requires support for the Document Object Model
s: test requires support for SMIL
v: test requires support for SVG
If none are given, the hyphen before the reqs must be omitted too. (Note: It is likely that not all of these requirements will be used in the CSS2.1 test suite; some of these flags are included for future expansion.)
The format of the file, one of the following:
xht: XHTML 1.1 test
xhb: XHTML Basic test
htm: HTML 4.01 test
xxh: XML+XHTML test
xml: XML test
svg: SVG test
css: CSS file
png: PNG file

For example, for a devious but otherwise normal test for margin collapsing:


This is the filename of a composite test of the 'color' property's 'green' value which requires user interaction, DOM support, and namespaces:


Note: Due to limitations of the Mac platform, the total length of the filename must be at most 31 characters.

Requirements and assumptions

In general, tests should assume the following:

Tests with more detailed requirements should indicate these requirements with the relevant flags in the filename. (For example, a test should not assume the Ahem font is available unless it explicitly states that as a requirement in the filename.)

Writing ideal tests

Well designed tests typically fall into several categories, named after the features that the test will have when correctly rendered by a user agent.

Note: The terms "the test has passed" and "the test has failed" refer to whether the user agent has passed or failed a particular test — a test can pass in one web browser and fail in another. In general, the language "the test has passed" is used when it is clear from context that a particular user agent is being tested, and the term "this-or-that-user-agent has passed the test" is used when multiple user agents are being compared.

Indicating success

The green paragraph

This is the simplest form of test, and is most often used when testing the parts of CSS or other specs that are independent of the rendering, like the cascade or selectors. Such tests consist of a single line of text describing the pass condition, which will be one of the following:

This line should be green.
This line should have a green border.
This line should have a green background.
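
A minimal sketch of a green paragraph test, assuming the child selector as the feature under test (a hypothetical choice made only for illustration). The help link keeps the template's placeholder URI. If the second rule is not applied, the line stays red.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: child selector (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   /* Default to the failure color. */
   p { color: red; }
   /* The rule under test: it must match and override the rule above. */
   div > p { color: green; }
  </style>
 </head>
 <body>
  <div>
   <p>This line should be green.</p>
  </div>
 </body>
</html>
```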


The green page

This is a variant on the green paragraph test. Certain parts of CSS affect the entire page, and this category of test may be used when testing them. Care has to be taken when writing tests like this that the test will not result in a single green paragraph if it fails. This is usually done by forcing the short descriptive paragraph to have a neutral color (e.g. white).


(This example is poorly designed, because it does not look red when it has failed.)

The green block

This is the best type of test for cases where a particular rendering rule is being tested. The test usually consists of two boxes of some kind that are (through the use of positioning, negative margins, zero line height, or other mechanisms) carefully placed over each other. The bottom box is colored red, and the top box is colored green. Should the top box be misplaced by a faulty user agent, it will cause the red to be shown. (These tests sometimes come in pairs, one checking that the first box is no bigger than the second, and the other checking the reverse.)
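
The following is a rough sketch of a green block test, assuming absolute positioning is the feature under test. The class names, the sizes and the choice of feature are illustrative assumptions, and the help link keeps the template's placeholder URI. The red box is laid out by the normal block box model, and the green box should land exactly on top of it.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: absolute positioning (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   /* Containing block for the absolutely positioned box. */
   .container { position: relative; width: 100px; height: 100px; }
   /* Bottom box: placed by the normal block box model. */
   .bottom { background: red; width: 100px; height: 100px; }
   /* Top box: same size, placed by absolute positioning. */
   .top { background: green; position: absolute; top: 0; left: 0; width: 100px; height: 100px; }
  </style>
 </head>
 <body>
  <p>There should be a green square below, and no red.</p>
  <div class="container">
   <div class="bottom"></div>
   <div class="top"></div>
  </div>
 </body>
</html>
```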


The green paragraph and the blank page

These tests appear to be identical to the green paragraph tests mentioned above. In reality, however, they have more in common with the green block tests, but with the green block colored white instead. This type of test is used when the displacement that could be expected in the case of failure is likely to be very small, and so any red must be made as obvious as possible. Because of this, the test would appear totally blank when it has passed. This is a problem because a blank page is also the symptom of a badly handled network error. For this reason, a single line of green text is added to the top of the test, reading something like:

This line should be green and there should be no red on this page.
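
A minimal sketch of this kind of test, assuming negative margins are the feature under test and that the page background is white (set explicitly here). The class names and sizes are hypothetical, and the help link keeps the template's placeholder URI. If the negative margin is applied correctly, the white box covers the red one and only the green line remains visible.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: negative margin-top (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   body { background: white; }
   p { color: green; }
   /* Bottom box: any part of it left uncovered shows as red. */
   .bottom { background: red; height: 20px; }
   /* Top box: the negative top margin should pull it exactly over the red box. */
   .top { background: white; height: 20px; margin-top: -20px; }
  </style>
 </head>
 <body>
  <p>This line should be green and there should be no red on this page.</p>
  <div class="bottom"></div>
  <div class="top"></div>
 </body>
</html>
```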


The two identical renderings

It is often hard to make a test that is purely green when the test passes and visibly red when the test fails. For these cases, it may be easier to make a particular pattern using the feature that is being tested, and then have a reference rendering next to the test showing exactly what the test should look like.

The reference rendering could be either an image, in the case where the rendering should be identical, to the pixel, on any machine, or the same pattern made using totally different parts of the specification. (Doing the second has the advantage of making the test a test of both the feature under test and the features used to make the reference rendering.)
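
As a sketch of the second approach, the hypothetical test below draws the same pattern twice: once with the border properties (the feature assumed to be under test) and once with padding and a nested box as the reference. The class names, sizes and colors are illustrative assumptions, and the help link keeps the template's placeholder URI.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: border width and color (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   .test, .ref { width: 60px; margin: 10px; }
   /* Test: a 20px teal ring drawn with the border properties. */
   .test { border: 20px solid teal; background: yellow; height: 60px; }
   /* Reference: the same ring drawn with padding around a nested box. */
   .ref { padding: 20px; background: teal; }
   .ref div { background: yellow; height: 60px; }
  </style>
 </head>
 <body>
  <p>The two boxes below should look identical.</p>
  <div class="test"></div>
  <div class="ref"><div></div></div>
 </body>
</html>
```

Both boxes should come out as a 100px square with a 60px yellow center, so any difference between them points at one of the two sets of properties.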


The positioned text

There are some cases where the easiest test to write is one where the four letters of the word 'PASS' are individually positioned on the page. This type of test is then said to have passed when all that can be seen is the word with all its letters aligned. Should the test fail, the letters are likely to go out of alignment, for instance:




The problem with this test is that when there is a failure it is sometimes not immediately clear that the rendering is wrong (e.g. inexperienced testers might think the first example was the intended rendering).

This type of test is often useful for non-CSS tests.


The descriptive text

In some rare cases, there really is no way to get around describing what should happen. In these cases, it is important to describe what should happen, and not why. Generally, the tester is not concerned over what bidi embedding level the text is at, just that it should be right aligned, for example. Try to keep the text as brief as possible.


The success page

Rarely used for CSS tests, but very useful for tests of technologies such as XLink, is the success page. This consists of having a link (which is the test) pointing to a page which is green and says "PASS". If the link is correctly handled, the tester sees the pass page and knows it has succeeded, otherwise the test has failed.


Indicating failure

Ideal tests, as well as having well defined characteristics when they pass, should have some clear signs when they fail. It can sometimes be hard to make a test do something only when the test fails, because it is very hard to predict how user agents will fail! Furthermore, in a rather ironic twist, the best tests are those that catch the most unpredictable failures!

Having said that, here are the best ways to indicate failures. These are in addition to those inherent to the various test types, e.g., differences in the two halves of a two-identical-renderings test obviously also show a bug.


Showing red is probably the best way of highlighting bugs. Tests should be designed so that even if the rendering is just a few pixels off, some red is uncovered.


Overlapped text

Tests of the 'line-height', 'font-size' and similar properties can sometimes be devised in such a way that a failure will result in the text overlapping.

The word "FAIL"

Some properties lend themselves well to this kind of test, for example 'quotes' and 'content'. The idea is that if the word "FAIL" appears anywhere, something must have gone wrong.

Parsing tests for markup languages are also often able to use this technique.
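
A rough sketch of the technique, assuming a hypothetical failure test of CSS error handling that uses the 'content' property. The second declaration has an invalid value, so a conforming parser should ignore it and leave "PASS" in place; a user agent that accepts it would show "FAIL" instead. The help link keeps the template's placeholder URI.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: invalid 'content' value (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   p:before { content: "PASS"; }
   /* Invalid value: a length is not allowed here, so this declaration
      should be dropped and the rule above should still apply. */
   p:before { content: "FAIL" 12px; }
  </style>
 </head>
 <body>
  <p> (the word before this text should be PASS)</p>
 </body>
</html>
```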


Scrambled text

This is similar to using the word "FAIL", except that instead of (or in addition to) having the word "FAIL" appear when an error is made, the rest of the text in the test is generated using the property being tested. That way, if anything goes wrong, it is immediately obvious.



In addition to the techniques mentioned in the previous sections, there are some techniques that are important to consider or to underscore.


The overlapping technique should not be cast aside as a curiosity: it is in fact one of the most useful techniques for testing CSS, especially for areas like positioning and the table model.

The basic idea is that a red box is first placed using one set of properties, e.g. the block box model's margin, height and width properties, and then a second box, green, is placed on top of the red one using a different set of properties, e.g. using absolute positioning.

This idea can be extended to any kind of overlapping, for example overlapping two lines of identical text of different colors.

Special fonts

Todd Fahrner has developed a font called Ahem, which consists of some very well defined glyphs of precise sizes and shapes. This font is especially useful for testing font and text properties. Without this font it would be very hard to use the overlapping technique with text.


The font's em-square is exactly square. Its ascent and descent together are exactly the size of the em square. This means that the font's extent is exactly the same as its line-height, meaning that it can be exactly aligned with padding, borders, margins, and so forth.

The font's alphabetic baseline is 0.2em above its bottom, and 0.8em below its top.

The font has four glyphs:

X (U+0058): a square exactly 1em in height and width.
p (U+0070): a rectangle exactly 0.2em high, 1em wide, and aligned so that its top is flush with the baseline.
É (U+00C9): a rectangle exactly 0.8em high, 1em wide, and aligned so that its bottom is flush with the baseline.
space (U+0020): a transparent space exactly 1em high and wide.

Most other US-ASCII characters in the font have the same glyph as X.
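
A minimal sketch of the overlapping technique applied to text, assuming the Ahem font is installed and that negative margins are the feature under test. The class names and sizes are hypothetical, and the help link keeps the template's placeholder URI. Because each X glyph is an exact 1em square, the green line should cover the red line completely.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: overlapping Ahem text (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   /* With a line-height of 1, each line box is exactly 20px tall and
      each X glyph is a solid 20px square. */
   div { font: 20px/1 Ahem; }
   .bottom { color: red; }
   /* Pull the second line up by one line height so it covers the first. */
   .top { color: green; margin-top: -20px; }
  </style>
 </head>
 <body>
  <p>There should be a green bar below, and no red.</p>
  <div class="bottom">XXXX</div>
  <div class="top">XXXX</div>
 </body>
</html>
```

Under the filename convention described earlier, a test like this would carry the "a" requirement flag, since it depends on the Ahem font.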

The self explanatory sentence followed by pages of identical text

For tests that must be long (e.g. scrolling tests), it is important to make it clear that the filler text is not relevant, otherwise the tester may think he is missing something and therefore waste time reading the filler text. Good text for use in these situations is, quite simply, "This is filler text. This is filler text. This is filler text.". If it looks boring, it's working!


In general, using colors in a consistent manner is recommended. Specifically, the following convention has been developed:

Any red indicates failure.
In the absence of any red, green indicates success.
Tests that do not use red or green to indicate success or failure should use blue to indicate that the tester should read the text carefully to determine the pass conditions.
Descriptive text is usually black.
Fuchsia, yellow, teal, and orange are useful colors when making complicated patterns for tests of the two identical renderings type.
Descriptive lines, such as borders around nested boxes, are usually dark gray. These lines come in useful when trying to reduce the test for engineers.
Light gray is sometimes used for filler text to indicate that it is irrelevant.

Here is an example of blue being used:

Methodical testing

There are particular parts of CSS that can be tested quite thoroughly with a very methodical approach. For example, testing that all the length units work for each property taking lengths is relatively easy, and can be done methodically simply by creating a test for each property/unit combination.
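
As an illustration of one cell group in such a methodical matrix, the sketch below checks the absolute length units against each other for a single property, 'width'. The choice of property, the class names and the bar sizes are assumptions made for this example, and the help link keeps the template's placeholder URI. The instruction text is blue because the test does not use red or green to signal the result.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>CSS 2.1 Test Suite: absolute length units on 'width' (illustrative)</title>
  <link rel="help" href="http://www.w3.org/TR/CSS21/...#..."/>
  <style type="text/css">
   p { color: blue; }
   div { background: teal; height: 10px; margin: 5px 0; }
   /* Equivalent lengths: 1in = 2.54cm = 25.4mm = 72pt = 6pc. */
   .in { width: 1in; }
   .cm { width: 2.54cm; }
   .mm { width: 25.4mm; }
   .pt { width: 72pt; }
   .pc { width: 6pc; }
  </style>
 </head>
 <body>
  <p>The five bars below should all be the same length.</p>
  <div class="in"></div>
  <div class="cm"></div>
  <div class="mm"></div>
  <div class="pt"></div>
  <div class="pc"></div>
 </body>
</html>
```

A methodical suite would repeat this pattern for each property that takes lengths, and would cover the relative units (em, ex, px) with separate tests.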

In practice, the important thing to decide is when to be methodical and when to simply test, in an ad hoc fashion, a cross section of the possibilities.

This example is a methodical test of the :not() pseudo-class with each attribute selector in turn, first for long values and then for short values:

Tests to avoid

The detailed harness

Tests that have huge amounts of text above and below linking to other tests, to the specs, showing the source of the test, and giving generic information about the test suite, are a significant time sink. The test page should contain only the test, nothing else.

Links to other tests are generally to be avoided as they may interfere with the test, and are unlikely to be useful anyway (testers are more likely to be using their own harness). The source of the test is especially not useful, for four reasons: first, the tester probably doesn't understand the technology being tested well enough for that information to be useful; second, the tester is probably only trying to ascertain whether the test has passed or failed, not why; third, the source cannot be a complete picture of the test anyway (since it would have to contain itself to be so); and fourth, most user agents have their own, more accurate, source viewer if it is necessary.

This is an example of a test that has much too much detail around the test:

The long test

Any manual test that is so long that it needs to be scrolled to be completed is too long. The reason for this becomes obvious when you consider how manual tests will be run. Typically, the tester will be running a program (such as "Loaderman" or "BETTER") which cycles through a list of several hundred tests. Whenever a failure is detected, the tester will do something (such as hit a key) that takes a note of the test case name. Each test will be on the screen for about two or three seconds. If the tester has to scroll the page, that means he has to stop the test to do so.

Of course, there are exceptions -- the most obvious one being any tests that examine the scrolling mechanism! However, these tests are considered tests of user interaction and are not run with the majority of the tests.

In general, any test that is so long that it needs scrolling can be split into several smaller tests, so in practice this isn't much of a problem.

This is an example of a test that is too long:

The counterintuitive "this should be red" test

As mentioned many times in this document, red indicates a bug, so nothing should ever be red in a test.

There is one important exception to this rule... the test for the 'red' value for the color properties!

The first subtest on this page shows this problem:

Unobvious tests

A test that has half a sentence of normal text with the second half bold if the test has passed is not very obvious, even if the sentence in question explains what should happen.

There are various ways to avoid this kind of test, but no general rule can be given since the affected tests are so varied.

The last subtest on this page shows this problem:


Thanks to everyone who actually uses the guidelines in this document to write testcases: you will dramatically help Web browser vendors in their quest to implement Web standards interoperably.