METADATA IN HTML
================

See also: http://www.bath.ac.uk/%7Epy8ieh/internet/metadata/linkelement.txt
Last updated: 2000-05-17 16:45 BST

ABSTRACT
--------

 * All this should be done in XUL.
 * We should show 'title' and 'summary' attributes in tooltips.
 * We need to have a status line with three panels: one for UA
   messages, one for js messages and metadata, and one for language
   information.
 * We need code to interpret the metadata attributes into
   human-readable strings.
 * We need a new menu item for the attributes that are links.

WHAT METADATA?
--------------

The following attributes in HTML provide metadata that we should be
showing to the user.

 attribute  element              format           description
-------------------------------------------------------------------------------
 title      most elements        free form text   supplementary information
 summary    TABLE                free form text   summary of table information
 datetime   INS, DEL             date/time stamp  date and time of change
 lang       most elements        language code    language of content
 hreflang   A, LINK              language code    language of link target
 rel        A, LINK              link type        target type wrt source doc
 rev        A, LINK              link type        source doc type wrt target
 href       A, AREA, LINK        uri              target uri
 action     FORM                 uri              script uri
 longdesc   IMG, FRAME, IFRAME   uri              alternative description uri
 cite       BLOCKQUOTE, Q,       uri              related doc uri
            DEL, INS

In addition, the "title" attribute in the XLink namespace should also be
made available. It is equivalent to the "title" attribute in HTML.

Any javascript status line messages should also be considered to be
metadata, since they originate from the web page itself.

IMPLEMENTATION CONCEPT
----------------------

The suggested implementation described below is only the default
configuration for the default chrome. Quite clearly, different chromes
would have different ways of showing the metadata. For this reason, all
the stuff described below should be implemented in XUL. This is very
important! This must be implemented in the configurable chrome!
Otherwise, it will not be very configurable, which would be bad, as some
people are likely to want to change this a lot!

FREE FORM TEXT: title AND summary
---------------------------------

The content of the 'title' and 'summary' attributes should be shown
verbatim as a word-wrapped tooltip. In the case of a table with both
title and summary, or an element with both an HTML 'title' and an XLink
'title', the titles and the summary would be separated by a blank line
in the tooltip. For example, take the following html table, which is
also an XLink:
The tooltip would be:

   +---------------------+
   | Sales for 1999      |
   |                     |
   | June had the        |
   | highest sales.      |
   |                     |
   | Raw data for 1999   |
   | sales               |
   +---------------------+

(It may be advisable to prefix the XLink title text with the text
"link: " or something like that (localised of course)).

If the relevant attribute is blank or consists of only whitespace, then
that part of the tooltip should be suppressed. If all parts of the
tooltip are suppressed or missing, then the entire tooltip should not be
shown.

The tooltip shown should be that of the first element directly under the
mouse cursor which has a title or summary set. The document tree is
unimportant in working this out, as positioning and z-index can result
in unrelated overlapping elements. For example, in this case:
1
2
When the cursor is in the 10px by 10px area at the top left hand corner
of the viewport where the "1" and "2" overlap, the tooltip should read
"second div", as it is 'on top'. If the second div had a z-index of
"-1", then the tooltip would read "first div".

The same concept of 'what is on top' is used to work out which element
the metadata in the next section should come from. A similar concept is
used to work out which elements are :hovering or :active in CSS.

SPECIAL METADATA: datetime, lang, hreflang, rel, rev, href, cite...
-------------------------------------------------------------------

* basic concepts: the status line

  The metadata is typically drawn in the status line (think about links
  and the HREF attribute, and javascript tickers). However, this means
  that the metadata overrides the UA's status messages. For this reason,
  we would want to have two parts to the status line: a UA status line
  and a web page status line.

  Also, because the language is a special piece of metadata which
  applies to every element and is inherited, it would make sense to have
  this as a small third panel on the status line.

  To recap, the status line would be split like this:

   |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|
   +--------+-----------------------------------+------------+
   |   ua   |             metadata              |  language  |
   +--------+-----------------------------------+------------+

  The parts of the status line are:

   ua = Messages from the web browser. For example:
           Connecting to www.mozilla.org...
           Connected to www.mozilla.org...
           Downloading from www.mozilla.org...
           45%...
           95%...
           Starting Java...
           Done.

   metadata = Messages from the web page. The data from each attribute
        of the topmost element which the mouse is currently hovering
        over and which has metadata should be concatenated to form the
        metadata status line text. (See below for more details.)

   language = The language of the element the mouse is hovering over.
        (All elements have this data, even in XML documents, because
        the lang attribute (xml:lang in XML) is inherited.) If the
        mouse is outside the viewport, then this should default to the
        root element's language.

  Note that the 'ua' and 'language' sections are of a fixed size, and
  the metadata status line takes all the remaining space. This should of
  course be configurable, see the next section.

* notes on the status line and the user interface

  1. The metadata and language panels should be optional (defaulting to
     enabled). To disable them, a simple selection from the
     right-click-on-the-status-line menu should be enough.

  2. The size of each panel should also be configurable, preferably by
     simply dragging the separator bars.

  3. The language status line should be able to act as a "translate"
     button (thus removing the need for a translate button on the
     toolbar, and also making it possible for the translate feature to
     be cleverer -- it would know what language the document was in).

  4. The metadata available for an element may well exceed the
     available space on the status line. So that the information is not
     lost, it should be possible (probably by right clicking and
     choosing "Details...") to see the complete metadata of the top-most
     metadata-aware element the mouse is hovering over. See also the
     "metadata window idea" section near the end.

* how to build the metadata line

  The textual interpretation for each of the attributes contributing
  text for the status line would first be collected. The exact method
  for doing that for each attribute is described in detail in separate
  sections below.

  Then, the pieces of text would be concatenated in the following
  order, separated by spaces:

     1. javascript statusline messages
     2. datetime
     3. hreflang
     4. rel
     5. rev
     6. cite
     7. href
     8. longdesc
     9. action

  Javascript comes first because it is what the author most likely wants
  the user to see.
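
  As a rough illustration only, the ordering above could be sketched in
  Python as below. The function name, the dictionary keys and the idea
  of passing in already-interpreted strings are assumptions made for
  this sketch, not part of the proposal; the real thing would live in
  the chrome.

```python
from typing import Dict

# Order taken from the numbered list above; "status" stands for the
# javascript status line message. The key names are hypothetical.
ORDER = ["status", "datetime", "hreflang", "rel", "rev",
         "cite", "href", "longdesc", "action"]


def build_metadata_line(texts: Dict[str, str]) -> str:
    """Join the already-interpreted attribute texts in the fixed order."""
    parts = []
    for key in ORDER:
        text = texts.get(key, "").strip()
        if text:  # attributes with no text contribute nothing
            parts.append(text)
    return " ".join(parts)


# The order of the input does not matter; only ORDER decides the result.
print(build_metadata_line({
    "href": "(http://www.foo.org/somedocument/chapter5.html)",
    "rel": "Next Chapter",
    "hreflang": "French",
}))
```
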
  Javascript tickers can easily be disabled in the preferences, see the
  section on that below.

  The four URI attributes come last, since they are likely to be the
  longest, and we don't want them obliterating the more useful and rarer
  attributes such as 'datetime' and 'hreflang'.

* new menu items

  The IMG, FRAME, IFRAME, INS, DEL, Q and BLOCKQUOTE elements should
  have an additional contextual menu item which brings up the page
  pointed to by the 'longdesc' or 'cite' attributes.

  For example, if you right clicked on a quote, you should be able to
  select something along the lines of "Open Quote Source" or "Open Quote
  Source in New Window", which would display the page from whence the
  quote came (as given by the 'cite' attribute). Similarly for the
  'longdesc' attribute.

* examples:

  INS element and datetime metadata:

   |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|
   +--------+-----------------------------------+------------+
   | Done.  | Section inserted on 1999/05/21... | English    |
   +--------+-----------------------------------+------------+

  Javascript ticker:

   |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|
   +--------+-----------------------------------+------------+
   | Done.  | uy our brand new power toys to... | English    |
   +--------+-----------------------------------+------------+

  Link with rel, hreflang and href attributes:

   |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|
   +--------+-----------------------------------+------------+
   | Done.  | French Next Chapter (http://ww... | English    |
   +--------+-----------------------------------+------------+

  Swiss French web page downloading, no particular metadata:

   |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^|
   +--------+-----------------------------------+------------+
   | 45%... |                                   | French...  |
   +--------+-----------------------------------+------------+

  Note that if something doesn't fit in its status line, an ellipsis is
  appended.
  To get to the additional information, the user would right click on
  the element and choose "More Details...", which would bring up a
  window with the complete metadata text.

* preference settings

  The number of settings which can be set in the preferences panel
  should be kept to a minimum. Here is the suggested set of items:

     [x] Show web site information (metadata) in status line
     [x] Enable dynamic status line content (JavaScript tickers)
     [x] Enable language panel

  Another option may be needed to control the "metadata window", if this
  feature is implemented:

     [ ] Show web site information (metadata) in popup window

THE lang AND hreflang ATTRIBUTES
--------------------------------

To convert a language code (as used in lang and hreflang) to a friendly
name, I would recommend the following algorithm (this is just an idea
and would have to be refined by examining the relevant specs):

If the first name token (recall that language codes are simply a hyphen
separated list of name tokens) is either 'x' or 'i', then take the
second token and capitalize it, and then include all remaining tokens
(if any) in a parenthesised section after it, replacing the hyphens with
spaces. Leave the case of these other tokens untouched.

   e.g., "x-klingon" -> "Klingon"
         "x-minbari-warrior-caste" -> "Minbari (warrior caste)"
         "i-default" -> "Default"

Otherwise, the first token should be expanded as per ISO639. If the
second token is two letters long and in capitals, then it should be
replaced by a country name as per RFC 1766 (I think), and prepended to
any other tokens, which should all be placed in a parenthesised section
afterwards, replacing the hyphens with spaces, and again leaving the
case of these other tokens untouched.
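
The two rules above might be sketched roughly as follows, in Python
purely for illustration. The function name, the tiny LANGUAGES and
COUNTRIES tables, and the fallback of showing an unknown code unchanged
are all assumptions of this sketch; a real implementation would load
the complete code lists and be checked against the specs.

```python
# Tiny illustrative tables (assumption): real code would use full lists.
LANGUAGES = {"en": "English", "fr": "French", "az": "Azerbaijani"}
COUNTRIES = {"US": "United States", "GB": "Great Britain",
             "CH": "Switzerland"}


def friendly_language(code: str) -> str:
    """Turn a language code such as "fr-CH-slang" into a friendly name."""
    tokens = code.split("-")
    first = tokens[0].lower()
    if first in ("x", "i") and len(tokens) > 1:
        # Private or registered code: the second token is the name.
        name = tokens[1].capitalize()
        rest = tokens[2:]
    else:
        # Unknown codes fall back to the raw token (assumption).
        name = LANGUAGES.get(first, tokens[0])
        rest = tokens[1:]
        # A two-letter, all-capitals second token is a country code.
        if rest and len(rest[0]) == 2 and rest[0].isupper():
            rest = [COUNTRIES.get(rest[0], rest[0])] + rest[1:]
    if rest:
        # Remaining tokens keep their case; hyphens become spaces.
        return f"{name} ({' '.join(rest)})"
    return name
```
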
   e.g., "en-US" -> "English (United States)"
         "en-GB" -> "English (Great Britain)"
         "en-gb" -> "English (gb)"
         "en-cockney" -> "English (cockney)"
         "fr-CH-slang" -> "French (Switzerland slang)"
         "az" -> "Azerbaijani"

See also:
   http://info.internet.isi.edu/in-notes/rfc/files/rfc1766.txt
   ftp://dkuug.dk/i18n/ISO_639

Note that this last section should be carefully checked once implemented
to make sure the relevant RFCs are followed.

THE rel AND rev ATTRIBUTES
--------------------------

To change 'rel' and 'rev' values into friendly names, the following
substitutions would do (note that this would obviously have to be
localised -- as would all the strings suggested herein -- and some of
those below could do with some improvements):

   Link Types    When used with 'rel'...   When used with 'rev'...
   ------------------------------------------------------------------
   alternate     Alternate Document        Alternate Document
   stylesheet    Stylesheet                Data
   start         First Document            Continuation Document
   next          Next Document             Previous Document
   prev          Previous Document         Next Document
   contents      Contents                  Chapter
   index         Index                     Main Document
   glossary      Glossary                  Main Document
   copyright     Copyright                 Main Document
   chapter       Chapter                   Main Document
   section       Section                   Main Document
   subsection    Subsection                Section
   appendix      Appendix                  Main Document
   help          Help Document             Main Document
   bookmark      Bookmark                  Index
   (keyword)     (keyword)                 Reverse of (keyword)

If keywords are mixed, then the expansions should be enumerated,
separated with semicolons. e.g., rel="appendix subsection" would become
"Appendix; Subsection". Duplicates should be ignored, so rev="copyright
glossary" would not become "Main Document; Main Document". This merging
would also happen if both rev and rel were used simultaneously, for
example "rel=next rev=prev" would become simply "Next Document", and not
"Next Document; Next Document".

A possible way of making this even cooler is described below. While it
is not exactly complicated, it is not a two liner, either.
I suggest this be delayed for the 5.1 release:

If any of the "Alternate", "Start", "Help", "Next" or "Prev" keywords
are used in conjunction with another keyword from the list on the 'rel'
attribute, then the word "Document" should be exchanged for the other
keyword's interpretation. e.g., rel="start chapter" would become "First
Chapter" and not "First Document; Chapter".

If they are used several times, then the substitution process should be
done several times, starting at the right hand side. For example,
rel="alternate start help" would become "Alternate First Help Document",
and rel="next alternate copyright appendix" would become "Next Alternate
Copyright; Appendix", but rel="next copyright appendix alternate" would
become "Next Copyright; Appendix; Alternate Document". This could be
made even cleverer, but that is for someone else to think about!

These strings could then be prepended to the HREF, which could be put in
parentheses. The following examples show first the link and then what
would appear in the status line. Note that the href is expanded to an
absolute URL first.

   Index (http://www.foo.org/somedocument/index.html)
   Chapter (http://www.foo.org/somedocument/chapter4.html)
   Glossary (http://www.foo.org/somedocument/glos.html)
   Next Chapter (http://www.foo.org/somedocument/chapter5.html)
   Previous Section (http://www.foo.org/somedocument/chapter5.html#part2)
   (http://www.mozilla.org/)
   French (Switzerland) (http://www.mozilla.org/)
   French book (http://www.foo.org/path/book.fr.html)

THE datetime ATTRIBUTE
----------------------

The 'datetime' attribute should be converted to the format specified by
the operating system's date and time settings. Then, if the attribute
was on an INS element, the text would be formatted as:

   "Section inserted on at
(http://www.bath.ac.uk/%7Epy8ieh/cgi/listbrowsers.pl)

Examples for are given in the 'rel' and 'rev' section.

THE JAVASCRIPT STATUS LINE STUFF
--------------------------------

As noted earlier, any messages which the javascript code tries to add to
the status line will be PREPENDED to the rest of the metadata. This
means that the href will still be visible -- one very frequent complaint
is that the uri is hidden by web authors, and this gets around the
problem.

The javascript status line stuff should be completely disable-able
anyway, as noted in the section on preference settings (see above).

For example:

   Hello. English (http://www.foo.org/path/a)

Note how the uri is still present, regardless of the javascript.

THE METADATA WINDOW IDEA
------------------------

An additional feature which could be implemented is a small, resizable,
persistent window which displays the same thing as the status line, but
floating around instead. This would get around the problem of the
metadata not fitting on the status line.

Maybe this could be the same as the window which appears when choosing
"Details..." from the context sensitive menu of elements.

==============================================================================
Contributors:
   Ian Hickson
   Heikki Toivonen
   Robert O'Callahan