Pingback 0.9

This version:
http://www.hixie.ch/specs/pingback/pingback-0.9
Latest version:
http://www.hixie.ch/specs/pingback/pingback
Previous versions:
http://www.kryogenix.org/writings/tech/pingback
http://www.kryogenix.org/days/000138.cas
Editors:
Stuart Langridge <sil@kryogenix.org>
Ian Hickson <ian@hixie.ch>

Abstract

Pingback is a mechanism for one Web site to inform another that the first now links to content on the second. Typically, these will both be frequently updated Web logs ("blogs").

For example, Alice writes an interesting article on her Web log. Bob then reads this article and comments about it, linking back to Alice's original post. Using pingback, Bob's software can automatically notify Alice that her post has been linked to.

Status of This Document

This is a working draft. Comments are welcome on the blogite mailing list (archived).

There are currently six known implementations of this specification, although no thorough testing has been done to check how compliant they are.

Available languages

The English version of this specification is the only normative version. However, for translations of this document, see http://www.hixie.ch/specs/pingback/translations/.


1. Introduction

The Pingback system is a way for a blog to be automatically notified when other blogs link to it. It is entirely transparent to the blogger doing the linking, requiring no user intervention to work, and operates on principles of automatic discovery of everything that it needs to know. A sample blog post involving Pingback might go like this:

  1. Alice posts to her blog. The post she's made includes a link to a post on Bob's blog.
  2. Alice's blogging system contacts Bob's blogging system and says "look, Alice made a post which linked to one of your posts".
  3. Bob's blogging system then, when people view the posts on Bob's blog, notes on the page that Alice linked to this post.
  4. Users can then follow this link back to Alice's post and read more.

It enables reverse linking — a way of going back up a chain of links rather than merely drilling down.

1.1. Technical Details

The pingback mechanism uses an HTML or XHTML <link> element for autodiscovery, and uses a single XML-RPC call for notifying the target site of the link on the source site.

1.2. Definitions

source URI
The address of the entry on the site containing the link.
pingback client
The software that establishes the connection to inform the server about the link from the source to the target. Typically, the source will be the client.
pingback-enabled page
A web page that advertises a pingback server using a pingback link element.
pingback server
The software that accepts XML-RPC connections. Typically, the target URI will be associated with the server (e.g. on the same host).
pingback user agent
A single system, which is both a pingback client and a pingback server.
target URI
The target of the link on the source site. This SHOULD be a pingback-enabled page.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].


2. Server Autodiscovery

Pingback-enabled pages MUST be valid HTML or XHTML pages containing a <link> element in one of the following two forms:

HTML
<link rel="pingback" href="pingback server">
XHTML
<link rel="pingback" href="pingback server" />

The link element MUST match the appropriate form exactly (including the whitespace before the slash, for instance).

Pages MUST NOT include more than one such element, and MUST NOT include any string matching the pattern described below unless it is intended to be the link element.

The pingback server placeholder MUST be replaced by the absolute URI of the pingback XML-RPC server. This URI MUST NOT include entities other than &amp;, &lt;, &gt;, and &quot;. Other characters that would not be valid in the HTML document or that cannot be represented in the document's character encoding MUST be escaped using the %xx mechanism as described in [RFC 2396].

These strict requirements are intended to drastically reduce the requirements on clients implementing server autodiscovery, as it was deemed that requiring clients to implement an HTML parser in addition to an XML parser was too heavy a burden, given how easy it would be for page authors to comply with the restrictions described above.
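On the publishing side, the rules above could be met with a small helper such as the following sketch. The function name pingback_link and its xhtml flag are illustrative and not part of this specification. Encoding non-ASCII characters as UTF-8 before %xx escaping is also an assumption, since the text above only requires that such characters be escaped.

```python
from urllib.parse import quote

# Characters left untouched: the reserved set and punctuation marks of
# RFC 2396, plus "#" and "%" so that fragments and existing %xx escapes
# are not escaped a second time.
_SAFE = ";/?:@&=+$,!*'()#%"


def pingback_link(server_uri: str, xhtml: bool = False) -> str:
    """Return a link element in one of the two exact forms required above.

    Hypothetical helper: name and signature are not defined by the spec.
    """
    # %xx-escape everything outside the safe set. This covers '"', '<' and
    # '>', so the only entity that can still be needed afterwards is &amp;.
    escaped = quote(server_uri, safe=_SAFE)
    escaped = escaped.replace("&", "&amp;")
    closing = " />" if xhtml else ">"
    return f'<link rel="pingback" href="{escaped}"{closing}'


print(pingback_link("http://bob.example.net/xmlrpcserver"))
print(pingback_link("http://bob.example.net/rpc?a=1&b=2", xhtml=True))
```

A general-purpose HTML escaping routine is deliberately avoided here: many of them emit a numeric entity for the apostrophe, which is not one of the four entities the specification allows.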

Pingback clients, given a source URI and a target URI, should fetch the target URI and search for the first match of the following regular expression:

<link rel="pingback" href="([^"]+)" ?/?>

Clients MUST then expand the four allowed entities (&amp; for &, &lt; for <, &gt; for >, and &quot; for "). Having extracted this URI, clients SHOULD use it to send an XML-RPC request as described below.

If the regular expression does not match, then the target in question does not support pingback as defined by this specification and the client MAY do whatever it likes. However, it is RECOMMENDED that clients do not attempt to be more lenient (e.g. by correctly parsing the HTML and looking for <link> elements that look like pingback links from an HTML point of view) because this will lead to some systems recognising the link and others ignoring it.
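The autodiscovery procedure above can be sketched as follows. The regular expression is the one given in this section. The function name discover_pingback_server and the choice to return None when nothing matches are assumptions made for illustration.

```python
import re
from typing import Optional

# The regular expression given in this section, unchanged.
PINGBACK_LINK = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')

# The four allowed entities. &amp; is expanded last so that a literal
# "&amp;lt;" becomes "&lt;" rather than "<".
_ENTITIES = (("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"'), ("&amp;", "&"))


def discover_pingback_server(page: str) -> Optional[str]:
    """Return the pingback server URI advertised by page, or None."""
    match = PINGBACK_LINK.search(page)  # first match only
    if match is None:
        return None  # the target does not support pingback as defined here
    uri = match.group(1)
    for entity, character in _ENTITIES:
        uri = uri.replace(entity, character)
    return uri


page = '<head><link rel="pingback" href="http://bob.example.net/rpc?a=1&amp;b=2"></head>'
print(discover_pingback_server(page))
```

Note that a link written with its attributes in another order, which an HTML parser would accept, is intentionally not recognised by this sketch, in line with the recommendation against leniency.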

Clients MAY optimise the search. For example:

Note, however, that these optimisations are prone to being caught out by legitimate documents, for example those having comments containing the strings given above, or those with large inline stylesheets appearing before the pingback link. Authors are encouraged to take these possible optimisations into account when deciding where to place their pingback links.
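As an illustration of this trade-off, the sketch below restricts the search to the first 5 kilobytes of the document, which is the limit used in the example in section 5. The names discover_in_prefix and PREFIX_LIMIT are hypothetical. The second call shows how a large inline stylesheet placed before the link hides it from such a client.

```python
import re
from typing import Optional

# Same pattern as in this section, compiled for bytes because the
# document is inspected before any character decoding.
PINGBACK_LINK = re.compile(rb'<link rel="pingback" href="([^"]+)" ?/?>')
PREFIX_LIMIT = 5 * 1024  # assumption: the 5 kilobytes used in section 5


def discover_in_prefix(document: bytes, limit: int = PREFIX_LIMIT) -> Optional[bytes]:
    """Search only the first `limit` bytes of the document for the link."""
    match = PINGBACK_LINK.search(document[:limit])
    return match.group(1) if match else None


link = b'<link rel="pingback" href="http://bob.example.net/xmlrpcserver">'
early = b"<html><head>" + link + b"</head></html>"
late = b"<html><head><style>" + b" " * 6000 + b"</style>" + link + b"</head></html>"

print(discover_in_prefix(early))  # link found
print(discover_in_prefix(late))   # link missed: it lies beyond the prefix
```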

How the client is told of the source and target URIs is out of the scope of this specification, but typically blogs will extract external links from posts being made to find the target URIs.
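Although it is out of scope, one way a blogging system might collect target URIs from its own post is sketched below. Everything here is an assumption for illustration: the class LinkCollector, the function external_links, and the rule that a link is "external" when its host differs from that of the source URI. Parsing the author's own post with an HTML parser does not conflict with section 2, whose restrictions concern the target page.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(post_html: str, source_uri: str) -> List[str]:
    """Return absolute http(s) links in the post that point to other hosts."""
    collector = LinkCollector()
    collector.feed(post_html)
    collector.close()
    source_host = urlsplit(source_uri).netloc
    targets: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(source_uri, href)  # resolve relative links
        parts = urlsplit(absolute)
        if (parts.scheme in ("http", "https")
                and parts.netloc != source_host
                and absolute not in targets):
            targets.append(absolute)
    return targets


post = '<p>See <a href="http://bob.example.net/#foo">Bob</a> and <a href="/about">me</a>.</p>'
print(external_links(post, "http://alice.example.org/#p123"))
```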

3. XML-RPC Interface

Pingback clients, having discovered a pingback server, SHOULD send the server an XML-RPC request with the method name pingback.ping and two arguments, the source URI and the target URI respectively. [XML-RPC]

pingback.ping
Notifies the server that a link has been added to sourceURI, pointing to targetURI.
Parameters
sourceURI of type string
The absolute URI of the post on the source page containing the link to the target site.
targetURI of type string
The absolute URI of the target of the link, as given on the source page.
Return Value
A string, as described below.
Faults
There are no predefined faults for this call.

Servers MUST respond to this function call either with a single string or with a fault code.

If the pingback request is successful, then the return value MUST be a single string, containing as much information as the server deems useful. This string is only expected to be used for debugging purposes.

If the request is unsuccessful, then the server MUST respond with an RPC fault value. This specification does not specify what error codes to use.

Clients MAY ignore the return value, whether the request was successful or not. It is RECOMMENDED that clients do not show the result of successful requests to the user.
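A client-side call might look like the following sketch, which uses the XML-RPC client from the Python standard library. The function names build_ping_request and send_pingback, and the (success, message) return shape, are assumptions and not part of this specification. build_ping_request exists only to show the request body without contacting a server.

```python
import xmlrpc.client
from typing import Tuple


def build_ping_request(source_uri: str, target_uri: str) -> str:
    """Return the XML-RPC request body for pingback.ping (for inspection)."""
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


def send_pingback(server_uri: str, source_uri: str, target_uri: str) -> Tuple[bool, str]:
    """Send pingback.ping to server_uri and report (success, message)."""
    proxy = xmlrpc.client.ServerProxy(server_uri)
    try:
        # Argument order is fixed: source URI first, then target URI.
        result = proxy.pingback.ping(source_uri, target_uri)
    except xmlrpc.client.Fault as fault:
        # No fault codes are predefined, so the code is only reported.
        return False, f"fault {fault.faultCode}: {fault.faultString}"
    except (xmlrpc.client.ProtocolError, OSError) as error:
        return False, f"transport error: {error}"
    # The returned string is meant for debugging, not for display to users.
    return True, str(result)


print(build_ping_request("http://alice.example.org/#p123", "http://bob.example.net/#foo"))
```

Since clients MAY ignore the return value, a caller could simply log the message and carry on with the next link.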

Upon receiving a request, servers MAY do what they like. However, the following steps are RECOMMENDED:

  1. The server MAY attempt to fetch the source URI to verify that the source does indeed link to the target.
  2. The server MAY check its own data to ensure that the target exists and is a valid entry.
  3. The server MAY check that the pingback has not already been registered.
  4. The server MAY record the pingback.
  5. The server MAY regenerate the site's pages (if the pages are static).
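The recommended steps could be arranged as in the sketch below. All names are hypothetical, the fault codes are arbitrary because this specification defines none, and the link check is a naive substring test. The fetch parameter stands in for whatever HTTP client the server uses; step 5 is left as a comment.

```python
import xmlrpc.client
from typing import Callable, Set, Tuple

# Arbitrary codes chosen for this sketch only.
FAULT_NO_LINK = 1
FAULT_BAD_TARGET = 2
FAULT_DUPLICATE = 3


def handle_ping(source_uri: str, target_uri: str, *,
                fetch: Callable[[str], str],
                known_targets: Set[str],
                recorded: Set[Tuple[str, str]]) -> str:
    """Process one pingback.ping call; raise Fault on failure."""
    # 1. Fetch the source and verify that it links to the target.
    if target_uri not in fetch(source_uri):
        raise xmlrpc.client.Fault(FAULT_NO_LINK, "source does not link to target")
    # 2. Check that the target exists and is a valid entry.
    if target_uri not in known_targets:
        raise xmlrpc.client.Fault(FAULT_BAD_TARGET, "target is not a known entry")
    # 3. Check that the pingback has not already been registered.
    if (source_uri, target_uri) in recorded:
        raise xmlrpc.client.Fault(FAULT_DUPLICATE, "pingback already registered")
    # 4. Record the pingback.
    recorded.add((source_uri, target_uri))
    # 5. Regenerating static pages would happen here.
    return f"Pingback from {source_uri} to {target_uri} recorded."


seen: Set[Tuple[str, str]] = set()
reply = handle_ping(
    "http://alice.example.org/#p123", "http://bob.example.net/#foo",
    fetch=lambda uri: '<a href="http://bob.example.net/#foo">Bob</a>',
    known_targets={"http://bob.example.net/#foo"},
    recorded=seen,
)
print(reply)
```

A function like this could be registered under the name pingback.ping with an XML-RPC server library. With Python's xmlrpc.server, for example, a raised Fault is passed back to the client as an XML-RPC fault.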

4. Conformance Requirements

To claim conformance to this specification a pingback client MUST support server autodiscovery as described in this specification and MUST correctly send pingback XML-RPC calls.

To claim conformance to this specification a pingback server MUST be able to receive pingback XML-RPC calls and MUST always return results that conform to the allowed return values.

Note that some pingback servers may not have associated pages. For example, a pingback gateway server could be standalone, and other pages would then use the link element to link to this gateway server instead of providing a server of their own. To claim conformance to this specification a pingback-enabled page MUST have a link element in order to allow for server autodiscovery.

To claim conformance to this specification a pingback user agent MUST support server autodiscovery as described in this specification, MUST correctly send pingback XML-RPC calls, MUST be able to receive pingback XML-RPC calls, MUST always return results that conform to the allowed return values, and MUST have a link element on all potential target pages in order to allow for server autodiscovery.


5. Example

Here is a more detailed look at what could happen between Alice and Bob during the example described in the introduction.

  1. Alice posts to her blog. The post she's made includes a link to a post on Bob's blog. The permalink to Alice's new post is http://alice.example.org/#p123, and the URL of the link to Bob's blog is http://bob.example.net/#foo.
  2. Alice's blogging system parses all the external links out of Alice's post, and finds http://bob.example.net/#foo.
  3. It then requests the first 5 kilobytes of the page referred to by the link.
  4. It scans this page fragment for the pingback link tag, which it finds:
    <link rel="pingback" href="http://bob.example.net/xmlrpcserver">
    If this tag had not been contained in the page, then Bob's blog would not support Pingback, so Alice's software would have given up here (moving on to the next link found in step 2).
  5. Next, since the link was there, it executes the following XML-RPC call to http://bob.example.net/xmlrpcserver:
    pingback.ping('http://alice.example.org/#p123', 'http://bob.example.net/#foo')
  6. Alice's blogging system repeats steps 3 to 5 for each external link that was found in the post.

There ends the work undertaken by Alice's system. The rest of the work is performed by Bob's blog.

  1. Bob's blog receives a ping from Alice's blog (the ping sent in step 5 above), naming http://alice.example.org/#p123 (the site linking to Bob) and http://bob.example.net/#foo (the page Alice linked to).
  2. Bob's blog confirms that http://bob.example.net/#foo is in fact a post on this blog.
  3. It then requests the content of http://alice.example.org/#p123 and checks the Content-Type of the entity returned to make sure it is text of some sort.
  4. It verifies that this content does indeed contain a link to http://bob.example.net/#foo (to prevent spamming of pingbacks).
  5. Bob's blog also retrieves other data required from the content of Alice's new post, such as the page title, an extract of the page content surrounding the link to Bob's post, any attributes indicating which language the page is in, and so forth.
  6. Finally, Bob's blog records the pingback in its database, and regenerates the static pages referring to Bob's post so that they mention the pingback.
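Steps 3 to 5 of Bob's side might be sketched as follows. The function names, the returned dictionary, the 40-character context window, and the reading of "text of some sort" as a text/* media type are all assumptions. The title and context extraction is deliberately crude, and a real system would likely strip markup from the extract.

```python
import re
from typing import Dict, Optional

TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def is_textual(content_type: str) -> bool:
    """Assumption: 'text of some sort' means a text/* media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")


def summarise_source(content_type: str, body: str, target_uri: str) -> Optional[Dict[str, str]]:
    """Return title and context for a verified source page, else None."""
    if not is_textual(content_type):
        return None  # step 3: not text, so do not inspect it further
    position = body.find(target_uri)
    if position == -1:
        return None  # step 4: the source does not link to the target
    # Step 5: gather data to show alongside the pingback.
    title_match = TITLE.search(body)
    title = " ".join(title_match.group(1).split()) if title_match else ""
    start = max(0, position - 40)
    end = position + len(target_uri) + 40
    return {"title": title, "context": body[start:end]}


page = ("<html><head><title>Alice's post</title></head><body>"
        '<p>I agree with <a href="http://bob.example.net/#foo">Bob</a>.</p>'
        "</body></html>")
print(summarise_source("text/html; charset=utf-8", page, "http://bob.example.net/#foo"))
```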

A. References

[RFC 2119]
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997. RFC 2119 is available at http://www.normos.org/ietf/rfc/rfc2119.txt.
[RFC 2396]
Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. IETF, August 1998. RFC 2396 is available at http://www.normos.org/ietf/rfc/rfc2396.txt.
[XML-RPC]
XML-RPC Specification, D. Winer. UserLand Software, Inc, June 1999. XML-RPC is available at http://www.xmlrpc.com/spec.