===========================================================================
Hi. If you wish to discuss this document, the appropriate forum is
probably www-style@w3.org. Make sure you give the URI of this document
if you do bring it up!
===========================================================================
 
ABSTRACT

This is a proposal for changes to the selector module draft [1] and
user interface module draft [2] which takes into account all the
selector suggestions of which I am aware.


SUMMARY OF MAIN CHANGES FROM CSS3 DRAFTS

  + Not exact match simple selector: !ns|element
  + Substring match attribute selector: [ns|attr$="substring"]
  + Numeric attribute selectors: [ns|attr<=0] [ns|attr>=0]
  + Regular expression attribute selector: [ns|attr?="regexp"]
  + Negative attribute selectors: [!...]
  + Exact match content selector: ="text"
  + Substring match content selector: $="text"
  + Regular expression match content selector: ?="text"
  + Negative content selectors: !...
  + :replaced pseudo-class
  + :first-node, :last-node pseudo-classes
  + :only-child, :only-node, :only-of-type pseudo-classes
  + :child(n,m), :child-of-type(n,m) pseudo-classes
  + :children(n,m), :children-of-type(n,m) pseudo-classes       
  + :empty pseudo-class
  + :matches, :matched pseudo-classes (instead of :selected)
  + :not-... pseudo-classes
  + change pseudo-element prefix to "::" from ":"
  + ::first-word, ::first-words(n) pseudo-elements
  + ::line(n), ::line(n,m), ::lines(a,b) pseudo-elements
  + ::last-line pseudo-element
  + Reference combinator: /.../ 


COMPLETE PROPOSAL

  Selectors consist of one or more simple selectors separated from
  each other by combinators.

  SIMPLE SELECTORS

    A simple selector starts either with an optional universal
    selector or with a type selector, followed by zero or more of the
    following: attribute selectors, ID selectors, content selectors,
    pseudo-classes or pseudo-elements. (The universal selector is
    optional because it may be omitted if there are other parts to the
    simple selector as well.)

    The subject of the selector is *always* the set of elements
    represented by the last simple selector in the selector.

    TYPE SELECTORS

      A type selector represents a particular element, optionally in a
      particular namespace (see the namespace draft [3] for an
      explanation of how to declare a namespace).

      In each case the element type may be prefixed by a namespace
      specifier, which may be either a short name, a wildcard, or
      empty.

      The meanings are as follows:

        ns|GI -- elements with name GI in namespace ns.

        *|GI -- elements with name GI in any namespace.

        |GI -- elements with name GI without an explicit namespace.

        GI -- if no default namespace has been specified, this is
        equivalent to *|GI. Otherwise it is equivalent to ns|GI where
        ns is the default namespace.

      The element name may also be replaced by a wildcard, in which
      case it is known as the universal selector.

        ns|* -- all elements in the namespace "ns"
        
        *|* -- all elements
        
        |* -- all elements without a namespace specifier

        * -- if no default namespace has been specified, this is
        equivalent to *|*. Otherwise it is equivalent to ns|* where ns
        is the default namespace.

      Not specifying the namespace at all is equivalent to giving the
      namespace which has been set as the default. If the default is
      unspecified then it is taken to be "*". 

        EXACT MATCH
          e.g. ns|GI

        NOT EXACT MATCH
          e.g. !ns|GI
          matches all elements that are _not_ of type "GI" in the
          "ns" namespace. 

          e.g. !*|*
          matches no elements at all (and is therefore pointless).
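
        The namespace rules and the "!" negation above can be sketched
        as follows. This is only an illustration, not part of the
        proposal: the function names are hypothetical, namespaces are
        compared by their short names, and an element without a
        namespace is represented by the empty string.

```python
ANY = "*"  # wildcard marker used by this sketch


def parse_type_selector(text, default_ns=None):
    """Split a type selector such as '!ns|GI' into (negated, namespace, name).

    The namespace is ANY for '*', '' for "no namespace", else the short name.
    """
    negated = text.startswith("!")
    if negated:
        text = text[1:]
    if "|" in text:
        ns, name = text.split("|", 1)
    else:
        # No namespace part: use the default namespace, or the wildcard
        # if no default has been declared.
        ns = ANY if default_ns is None else default_ns
        name = text
    return negated, ns, name


def type_selector_matches(text, element_ns, element_name, default_ns=None):
    """Return True if the element (namespace, name) matches the selector."""
    negated, ns, name = parse_type_selector(text, default_ns)
    ns_ok = ns == ANY or ns == element_ns
    name_ok = name == ANY or name == element_name
    # "!" inverts the result of the exact match.
    return (ns_ok and name_ok) != negated
```

        With this sketch, "!*|*" matches nothing, because the positive
        form "*|*" matches every element.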

        -- NOTES
          Element type selectors probably do not need to be any more
          complicated than the above -- there is in any case a usually
          small and certainly finite set of element names, set by the
          DTD, so regexp or substring matches needn't be used to cater
          for all situations.

          For example, 

             ?"H[1-6]" 

          ...doesn't make life that much easier than saying

             H1, H2, H3, H4, H5, H6

          ...which, given the HTML DTD, is all it is standing for, and
          looks a lot neater.


    ATTRIBUTE SELECTORS

      An attribute selector matches elements which have the relevant
      attributes or attribute values. A rich set of methods of
      matching is required, since attributes come in many different
      syntaxes.

      In each case the attribute name may be prefixed by a namespace
      specifier and the vertical bar, and the namespace specifier may
      be either a short name, a wildcard, or empty. Not specifying the
      namespace is equivalent to giving an empty namespace.

      The meanings are as follows:

        ns|attr -- performs the test on the "attr" attribute of the
        "ns" namespace. The element's default attributes are _not_
        examined, even if the element itself is in the "ns" namespace. 

        *|attr -- performs the test on every attribute named "attr",
        whatever its namespace, including the default attribute
        (i.e., the one without an explicit namespace).

        |attr -- performs the test on the default attribute of the
        element (i.e., the attribute named "attr" which does not have
        an explicit namespace). 

        attr -- exactly equivalent to |attr (default namespaces do not
        apply to attributes).

        PRESENCE
          e.g. [ns|attr]

        EXACT MATCH
          e.g. [ns|attr="value"]

        SUBSTRING MATCH
          e.g. [ns|attr$="part"]

        SPACE SEPARATED KEYWORD MATCH
          e.g. [ns|attr~="keyword"]

        SPACE SEPARATED KEYWORD MATCH FOR html:class ATTRIBUTE
          e.g. .keyword

        HYPHEN SEPARATED ROOT MATCH
          e.g. [ns|attr|="root"]

        NUMERIC GREATER-THAN-OR-EQUAL-TO MATCH
          e.g. [ns|attr>=0]
          does not match if the attribute is not numeric.

        NUMERIC LESS-THAN-OR-EQUAL-TO MATCH
          e.g. [ns|attr<=0]
          does not match if the attribute is not numeric.

        REGEXP MATCH
          e.g. [ns|attr?="regexp"]

        NOT PRESENCE
          e.g. [!ns|attr]

        NOT EXACT MATCH
          e.g. [!ns|attr="value"]

        NOT SUBSTRING MATCH
          e.g. [!ns|attr$="part"]

        NOT SPACE SEPARATED KEYWORD MATCH
          e.g. [!ns|attr~="keyword"]

        NOT HYPHEN SEPARATED ROOT MATCH
          e.g. [!ns|attr|="root"]

        NOT NUMERIC GREATER-THAN-OR-EQUAL-TO MATCH
          e.g. [!ns|attr>=0]
          does not match if the attribute is not numeric.

        NOT NUMERIC LESS-THAN-OR-EQUAL-TO MATCH
          e.g. [!ns|attr<=0]
          does not match if the attribute is not numeric.

        NOT REGEXP MATCH
          e.g. [!ns|attr?="regexp"]

        -- NOTES
          Greater than but _not_ equal to can be selected by:
             [attr>=0][!attr<=0]
          Similarly, numeric equality is the same as:
             [attr>=0][attr<=0]
          Neither of the above will match if the attribute value is
          missing or non-numeric, however.
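
          As a rough sketch of the matching rules above (ignoring
          namespaces), the tests could be modelled like this. The
          function names are hypothetical. Two points are assumptions
          of the sketch rather than statements of the proposal: the
          regexp is an unanchored Python "re" search, and a negated
          non-numeric test succeeds when the attribute is absent.

```python
import re


def _number(text):
    """Return text as a float, or None if it is missing or non-numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def attr_test(value, op=None, operand=None, negated=False):
    """Test one attribute value (None if the attribute is absent)."""
    if op in (">=", "<="):
        number = _number(value)
        if number is None:
            # Numeric tests never match non-numeric values, negated or not.
            return False
        if op == ">=":
            result = number >= float(operand)
        else:
            result = number <= float(operand)
        return result != negated
    if op is None:
        result = value is not None                 # [attr]
    elif value is None:
        result = False
    elif op == "=":
        result = value == operand                  # [attr="value"]
    elif op == "$=":
        result = operand in value                  # [attr$="part"]
    elif op == "~=":
        result = operand in value.split()          # [attr~="keyword"]
    elif op == "|=":
        result = value == operand or value.startswith(operand + "-")
    elif op == "?=":
        result = re.search(operand, value) is not None
    else:
        raise ValueError("unknown operator: %r" % (op,))
    return result != negated


def strictly_greater(value, limit):
    """[attr>=limit][!attr<=limit], as described in the notes."""
    return (attr_test(value, ">=", limit)
            and attr_test(value, "<=", limit, negated=True))
```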


    ID SELECTORS

      An ID selector matches elements which have a particular ID. By
      definition, an ID selector can match at most one element per
      document.

        EXACT MATCH
          e.g. #id


    CONTENT SELECTORS

      A content selector matches elements which have particular
      content. (The syntax is purposefully the same as that for
      attribute selectors but without the square brackets.)

        EXACT MATCH
          e.g. ="text"

        SUBSTRING MATCH
          e.g. $="text"

        REGEXP MATCH
          e.g. ?="text"

        NOT EXACT MATCH
          e.g. !="text"

        NOT SUBSTRING MATCH
          e.g. !$="text"

        NOT REGEXP MATCH
          e.g. !?="text"

    PSEUDO-CLASSES

      A pseudo-class matches elements based on information that lies
      outside of the document tree or that cannot be expressed using
      the other simple selectors. 

        DYNAMIC PSEUDO-CLASSES

          LINK PSEUDO-CLASSES

            :link - matches elements that are links to documents that
            have not yet been visited. (How this is determined is left
            up to the UA and is outside the scope of CSS.)

            :visited - matches elements that are links to documents
            that have already been visited. (How this is determined is
            left up to the UA and is outside the scope of CSS.)

          UI ACTION PSEUDO-CLASSES

            :hover - matches elements that have the pointing device
            within their outer border edge.

            :active - matches elements while they are being activated
            by the user (only elements whose 'user-input' property has
            the value of "enabled" can become :active).

            :focus - matches elements that have the UI focus (only
            elements whose 'user-input' property has the value of
            "enabled" can acquire :focus).
           
            :enabled - matches elements whose 'user-input' property
            has the value of "enabled".
           
            :disabled - matches elements whose 'user-input' property
            has the value of "disabled".

            :checked - matches elements which have been checked or
            picked (only those elements whose 'user-input' property
            has the value of "enabled" or "disabled" can be :checked).

          TARGET PSEUDO-CLASS

            :target - matches elements whose ID is identical to the
            fragment identifier of the current URI.
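
            A minimal sketch of this test, assuming the comparison is
            a plain string comparison with no percent-decoding (the
            function name is hypothetical):

```python
from urllib.parse import urlsplit


def is_target(element_id, current_uri):
    """True if element_id equals the fragment identifier of current_uri."""
    fragment = urlsplit(current_uri).fragment
    # A URI without a fragment identifier has no target element.
    return fragment != "" and fragment == element_id
```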

          REPLACED PSEUDO-CLASS

            :replaced - matches replaced elements. 

            e.g. <img> elements would match this if the image was
            found, but if the image is broken/not displayed for
            whatever reason, and the alt text is shown instead, then
            it would not match. See footnote [B].

        LANGUAGE PSEUDO-CLASS

          :lang(x) - matches elements in language x. (The language is
          inherited down the document tree in document-language
          specific ways. Refer to the relevant specs - HTML, XML - for
          details.) The language code is matched in the same way as
          for the |= attribute selector.

        STRUCTURAL PSEUDO-CLASSES

          :first-child - same as :child(1).

          :first-node - same as :first-child, but *ONLY* if there is
          no #PCDATA anonymous content preceding the first child
          (ignoring any ignorable whitespace).

          :first-of-type - same as :child-of-type(1).


          :last-child - matches elements that have no later siblings.
          The same as :child(-1). The following will never match:
 
            *:last-child ~ * { }

          :last-node - same as :last-child, but *ONLY* if there is no
          #PCDATA anonymous content following the last child (ignoring
          any ignorable whitespace).

          :last-of-type - matches elements that have no later siblings
          with the same element name. The following will never match:

            X:last-of-type ~ X { }


          :only-child - matches an element that has no siblings. Same
          as :first-child:last-child or :child(1):child(-1).

          :only-node - matches an element that has no siblings, not
          even #PCDATA siblings. Same as :first-node:last-node.

          :only-of-type - matches an element that has no siblings with
          the same element name. Same as :first-of-type:last-of-type
          or :child-of-type(1):child-of-type(-1).
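
          The difference between the -child and -node variants can be
          sketched with Python's xml.etree.ElementTree, where text
          before the first child is stored in the parent's "text" and
          text after a child is stored in that child's "tail". The
          function names are hypothetical, and treating every
          whitespace-only run as ignorable is an assumption of this
          sketch.

```python
import xml.etree.ElementTree as ET


def _blank(text):
    """True if there is no character data, or only whitespace."""
    return text is None or text.strip() == ""


def is_first_child(parent, child):
    return len(parent) > 0 and parent[0] is child


def is_last_child(parent, child):
    return len(parent) > 0 and parent[-1] is child


def is_first_node(parent, child):
    # parent.text holds any character data before the first child element.
    return is_first_child(parent, child) and _blank(parent.text)


def is_last_node(parent, child):
    # child.tail holds any character data following this child element.
    return is_last_child(parent, child) and _blank(child.tail)


def is_only_child(parent, child):
    return is_first_child(parent, child) and is_last_child(parent, child)


def is_only_node(parent, child):
    return is_first_node(parent, child) and is_last_node(parent, child)
```

          For <p>Hello <em>world</em></p>, the em element is the
          first child but not the first node, since "Hello " precedes
          it.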


          :child(n) - directly equivalent to :child(n,0).

          :child-of-type(n) - directly equivalent to :child-of-type(n,0).

          :child(n,m) - matches an element that has n+xm-1 siblings
          before it in the document tree, for all x. (n>=1, m=0 or
          m>=n, x>=0). In other words, this matches the nth child of
          an element after all the children have been split into
          groups of m elements each. For example, this allows the
          selectors to address every other row in a table, and could
          be used, for example, to alternate the colour of paragraph
          text in a cycle of four.

            TR:child(1,2) /* address every odd row */
            TR:child(2,2) /* address every even row */

            /* Alternate paragraph colours: */                    
            P:child(1,4) { color: navy; }
            P:child(2,4) { color: green; }
            P:child(3,4) { color: maroon; }
            P:child(4,4) { color: purple; }

          When m=0, no repeating is used, so :child(5,0) matches only
          the fifth child.

          If n is negative, then start counting from the end of the
          element. For example,

            /* Alternate paragraph colours: */                    
            P:child(-1,4) { color: navy; }
            P:child(-2,4) { color: green; }
            P:child(-3,4) { color: maroon; }
            P:child(-4,4) { color: purple; }

          ...results in the same as the previous example, except that
          the last P of each block is guaranteed to be navy.
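
          The arithmetic can be sketched as follows, with positions
          counted from 1. The function names are hypothetical, and the
          sketch handles a negative n by mirroring the position, which
          is one reading of "start counting from the end".

```python
def child_matches(position, sibling_count, n, m=0):
    """Does the child at 1-based position match :child(n,m)?

    sibling_count is the total number of element children of the parent.
    """
    if n == 0:
        raise ValueError("n must not be zero")
    if n < 0:
        # Negative n counts from the end: mirror the position, use |n|.
        position = sibling_count - position + 1
        n = -n
    offset = position - n  # must equal x*m for some x >= 0
    if m == 0:
        return offset == 0
    return offset >= 0 and offset % m == 0


COLOURS = ["navy", "green", "maroon", "purple"]


def colour_cycle(sibling_count, from_end=False):
    """Colours given to each P by the four rules in the examples above."""
    result = []
    for position in range(1, sibling_count + 1):
        for index, colour in enumerate(COLOURS, start=1):
            n = -index if from_end else index
            if child_matches(position, sibling_count, n, 4):
                result.append(colour)
    return result
```

          For six paragraphs, colour_cycle(6) starts with navy, while
          colour_cycle(6, from_end=True) ends with navy.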

          :child-of-type(n,m) - matches an element that has n+xm-1
          siblings with the same element name before it in the
          document tree, for all x. (n>=1, m=0 or m>=n, x>=0). In
          other words, this matches the nth child of that type after
          all the children of that type have been split into groups of
          m elements each. For example, this allows us to alternate
          the position of floated images:

            IMG:child-of-type(1,2) { float: right; }
            IMG:child-of-type(2,2) { float: left; }

          When m=0, no repeating is used, so :child-of-type(5,0)
          matches only the fifth child of that type.

          If n is negative, then start counting from the end of the
          element. For example,

            IMG:child-of-type(-1,2) { float: right; }
            IMG:child-of-type(-2,2) { float: left; }

          ...results in the same as the previous example, except that
          the last IMG of each block is guaranteed to be on the right.


          :children(a,b) - matches all elements that are the ath
          child, the bth child, or any child in between, of their
          parent. Negative numbers mean count from the end of the
          element. For example, :children(3,5) matches the 3rd, 4th
          and 5th children of every element.

          :children-of-type(a,b) - matches all elements of each type
          that are the ath child of that type, the bth child of that
          type, or any child of that type in between, of their parent.
          Negative numbers mean count from the end of the element. For
          example, TR:children-of-type(2,-1) means all rows apart from
          the first (and is equivalent to TR:not-first-of-type).
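
          A sketch of the range test, with hypothetical function
          names. For :children-of-type, the position and count would
          be taken among siblings of the same type. Accepting the two
          bounds in either order is an assumption borrowed from the
          description of ::lines(a,b).

```python
def resolve_index(index, sibling_count):
    """Turn a 1-based, possibly negative index into a positive position."""
    if index == 0:
        raise ValueError("index must not be zero")
    # -1 is the last child, -2 the one before it, and so on.
    return index if index > 0 else sibling_count + index + 1


def children_matches(position, sibling_count, a, b):
    """Does the child at 1-based position fall within :children(a,b)?"""
    first = resolve_index(a, sibling_count)
    last = resolve_index(b, sibling_count)
    low, high = min(first, last), max(first, last)
    return low <= position <= high
```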


          :empty - matches an element which has no children (including
          text nodes).

          :root - matches elements that are the root of their
          document tree.

          :matches(SELECTOR) - matches elements if the selector so
          far with SELECTOR appended to it would match that element
          or one of its descendants. (See footnote [A].)

            H1:matches(+P) /* H1s that are followed by paragraphs */

            BODY:matches(BLOCKQUOTE P IMG.signature) /* BODY
            elements that contain one or more BLOCKQUOTEs containing
            a P containing an IMG element of class "signature". */

            P:matches(+ H2) CITE /* matches CITE elements inside P
            elements which are immediately before an H2 element */

          :matched(SELECTOR) - matches elements if the simple selector
          with SELECTOR _prefixed_ to it would match that element.

            H2:matched(P+) /* equivalent to P+H2 */

            H1 ~ H2:matched(P+) /* equivalent to H1 ~ P + H2 */

            A:matched(B):matched(C) /* matches elements A that have
            both a B ancestor and a C ancestor */


        NEGATIVE PSEUDO-CLASSES  

          For consistency, every pseudo-class has an equivalent that
          matches elements that do _not_ match the positive
          pseudo-class. For example, :not-hover matches elements that
          the pointing device is _not_ designating. Similarly,
          :not-child-of-type(3,7) matches elements that are _not_ the
          3rd element of their type in a group of 7.

          For completeness, here is a list of all the negative
          pseudo-classes:

            :not-*+*

          Note that :not-enabled is NOT the same as :disabled, as some
          (most) elements will be neither enabled nor disabled.


    PSEUDO-ELEMENTS

      ::before - the pseudo-element that is just inside the start of
      every element.

      ::after - the pseudo-element that is just before the end of
      every element.

      ::first-letter - the first letter of every element.

      ::first-word - the first word of every element. Equivalent to
      the long form ::first-words(1). The definition of "word" used is
      that used for the 'text-transform' property.

      ::first-words(n) - the first n words of every element. The
      definition of "word" used is that used for the 'text-transform'
      property.

      ::first-line - the root inline box of the first line box of
      every block element which contains inline elements. Equivalent
      to the long form ::line(1,0).

      ::line(n) - the root inline box of the nth line box of every
      block element which contains inline elements. Equivalent to
      ::line(n,0). Takes the same properties as ::first-line.

      ::line(n,m) - the root inline box of the (n+xm)th line box of
      every block element which contains inline elements, for all x.
      (n>=1, m=0 or m>=n, x>=0). When m=0, no repeating is used, so
      ::line(5,0) matches only the fifth line box of a block. Takes
      the same properties as ::first-line. If 'n' is negative, then
      perform the above calculations but starting from the bottom of
      the element. So ::line(-2,0) matches the penultimate line, and
      ::line(-2,2) matches every second line, starting from the
      penultimate line and going up the element.

      ::lines(a,b) - selects every root inline box from the ath to the
      bth in every block level element which contains inline elements.
      If either number is negative, then starts counting from the last
      line of the block. The range may go either forward or backwards.
      For example, ::lines(1,5) selects the first five lines;
      ::lines(-2,2) selects every line except the first and last (and
      demonstrates that a need not be less than b).

      ::last-line - the root inline box of the last line box of every
      block element which contains inline elements. Takes the same
      properties as ::first-line. Directly equivalent to ::line(-1,0).

      ::selection - applies to the portion of a document that has been
      highlighted by the user. Only elements whose 'user-select' has a
      value other than 'none' can have a ::selection.

      ::access-key - the part of the element which represents the
      'key-equivalent' key combination.

      ::menu - the contents of the element. See the CSS3 UI draft.

      ::inside, ::outside - see David's proposal [4].

      -- NOTES ON PSEUDO-ELEMENT RULES

        Something needs to be decided about how pseudo-elements
        inherit from their surroundings, how their contents inherit
        from them, and also which properties apply to each. For
        example, the ::selection pseudo-element can actually select
        straight across boundaries, so

          ::selection { font-size: 1.5em; }

        ...would probably not result in the same font-size all across
        the selection. This should be defined.


  COMBINATORS

    DESCENDANT COMBINATORS

      INDIRECT DESCENDANT COMBINATOR ( )

        e.g. A B
        matches B in:
           <A> <X> <B/> </X> </A>

      DIRECT DESCENDANT (CHILD) COMBINATOR (>)

        e.g. A > B
        matches B in:
           <A> <B/> </A>

    ADJACENT SIBLING COMBINATORS

      INDIRECT ADJACENT COMBINATORS (~)

        e.g. A ~ B
        matches B in:
           <A/> <X/> <B/>

      DIRECT ADJACENT COMBINATORS (+)

        e.g. A + B
        matches B in:
           <A/> <B/>

    REFERENCE COMBINATOR (/.../)

      e.g. IMG /USEMAP/ MAP AREA 

      Matches all AREA elements which are descendants of the MAP
      element pointed to by the attribute USEMAP of an IMG element.

      If the attribute is of type IDREF, then the element selected
      must be the one which would match #XXX where XXX is the contents
      of the attribute in question. Otherwise, it is the element
      pointed to by the URI which the attribute represents, as per the
      target-counter, target-content, and target-attr functions.
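
      A sketch of how IMG /USEMAP/ MAP AREA might be resolved, using
      Python's xml.etree.ElementTree. The function names are
      hypothetical. The sketch assumes that IDs are held in an
      attribute called ID, and that URIs are same-document references
      of the form "#fragment"; anything else is left unresolved.

```python
import xml.etree.ElementTree as ET


def follow_reference(root, element, attr_name, is_idref=False, id_attr="ID"):
    """Return the element that element's attribute points to, or None."""
    value = element.get(attr_name)
    if value is None:
        return None
    if is_idref:
        target_id = value
    else:
        # Sketch limitation: only "#fragment" URIs are followed.
        if not value.startswith("#"):
            return None
        target_id = value[1:]
    for candidate in root.iter():
        if candidate.get(id_attr) == target_id:
            return candidate
    return None


def select_img_usemap_areas(root):
    """Rough equivalent of the selector: IMG /USEMAP/ MAP AREA"""
    areas = []
    for img in root.iter("IMG"):
        target = follow_reference(root, img, "USEMAP")
        if target is not None and target.tag == "MAP":
            # Every AREA descendant of the referenced MAP is selected.
            areas.extend(target.iter("AREA"))
    return areas
```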


DISCUSSION

  WHY :SELECTED IS A BAD THING...

    What is :selected ? Is it a pseudo-class or pseudo-element?

    The answer is 'neither'. It is an entirely new type of selector
    which simply changes the subject of the selector chain. Thus it is
    inconsistent with the rest of the selectors draft. (I propose that
    we use :matches() instead. See footnote [A] below.)
    
  ...AND WE SHOULD SEPARATE PSEUDO CLASSES FROM PSEUDO ELEMENTS

    Secondly, what is :-foo-bar ? Is it a pseudo-class or
    pseudo-element? The "-foo-" prefix is a mechanism designed by the
    working group to flag extensions in a forward-compatible way.
    However, implementations with 'open-ended' engines have no way of
    knowing which it is supposed to be, pseudo-class or -element, and
    thus have no easy way of dealing with them internally.

    Thus I propose that pseudo-elements be changed to use the ::
    prefix instead, as I have used above.

      ::before

    ...and so forth. This will also hopefully make people think more
    carefully about whether something is a pseudo-class or -element,
    and they may even realise that things like :selected are neither
    one nor the other...
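
    To illustrate the point, an open-ended engine could then classify
    an unknown extension from its prefix alone. This is a sketch with
    a hypothetical function name, not a description of any existing
    implementation.

```python
def classify_pseudo(token):
    """Classify ':name' as a pseudo-class and '::name' as a pseudo-element."""
    if token.startswith("::"):
        return "pseudo-element", token[2:]
    if token.startswith(":"):
        return "pseudo-class", token[1:]
    raise ValueError("not a pseudo-class or pseudo-element: %r" % (token,))
```

    Under the "::" convention, ":-foo-bar" is a pseudo-class and
    "::-foo-bar" is a pseudo-element, even if the engine knows
    nothing else about the "-foo-" extension.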


FOOTNOTES

  [A] I suggest that in the initial specification of :matches(), this
  selector must be the last selector in the chain. This makes it
  directly equivalent to the :selected selector in the CSS3 Selectors
  WD of August 1999, with the only syntactic differences being the
  brackets:

    X:selected A  
    X:matches(A)

  Then, in later specifications, the :matches() pseudo-class can be
  extended to allow it to appear anywhere in a selector:

    X:matches(A) B

  ...which cannot be done using the :selected pseudo-thing.

  This means that the initial implementation burden on implementors is
  no greater for :matches than for :selected, and should thus remove
  most objections. (See the DISCUSSION section for why :selected is
  evil and :matches is better...)

  Also, the :matched() pseudo-class could be combined with the
  :matches() pseudo-class in some way, for example:

    X:matched(A):matches(B) 
  
  could be written as:

    X:matches(A # B)

  I do not have a view either way.

  [B] This would probably be defined in terms of the replaced content
  proposals, such as the "content:replaced(attr(src)), attr(alt)"
  suggestion or suchlike.


REFERENCES

  [1] CSS3 Selectors Draft, W3C:
    http://www.w3.org/TR/1999/WD-CSS3-selectors-19990803
  [2] CSS3 User Interface Draft, W3C:
    http://www.w3.org/TR/1999/WD-css3-userint-19990916
  [3] CSS3 Namespace Draft, W3C:
    http://www.w3.org/1999/06/25/WD-css3-namespace-19990625/
  [4] ::inside & ::outside, David Baron:
    http://lists.w3.org/Archives/Public/www-style/2000Mar/0043.html


ACKNOWLEDGEMENTS

  Thanks to:
    Sjoerd Visscher <sjoerd@heeten.nl>
    Bert Bos <Bert.Bos@sophia.inria.fr>
    David Baron <dbaron@fas.harvard.edu>

-- Ian Hickson <py8ieh=website=internet=wwwstyle=selectors@bath.ac.uk>