MAMA: Markup report, part 3: Basic BODY markup
This is it. This week we begin to cover the real heart of HTML markup. These elements are what make HTML tick—they give documents their life and primary expression. Hyperlinks create the "web" of the Web; images are the primary avenue for authors to incorporate multimedia, and the many phrase and block elements covered here imbue content with semantics and basic formatting. We will take a look at each of these areas in turn. For a deeper look at these areas and more, the following MAMA article topics are also available this week:
To read more details of MAMA's findings, check out the MAMA home page.
Hyperlinks make the Web the web that it is. It is little wonder then that
the A element is the most popular of any of the BODY
child elements. MAMA considered each occurrence of an Href
attribute in an A element
as a hyperlink and kept a running tally for each URL analyzed. It also compared
the domain of the URL in a hyperlink with the domain of the page being analyzed
in order to discover how many external domains were being referenced in a
document. Of the 3,337,666 URLs that contained hyperlinks, 72.2% had
at least one link pointing outside the domain of the URL that was analyzed. The
average number of hyperlinks per document was 38.4.
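
The tallying described above can be illustrated with a minimal Python sketch. It is not MAMA's actual code: the names `LinkCounter` and `summarize_links` are hypothetical, and it assumes that only Href attributes on A elements count as hyperlinks and that "domain" means the exact host name of the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCounter(HTMLParser):
    """Collects every href value found on an <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def summarize_links(page_url, html):
    """Return (hyperlink_count, set_of_external_hosts) for one page."""
    parser = LinkCounter()
    parser.feed(html)
    page_host = (urlparse(page_url).hostname or "").lower()
    external = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, href)).hostname
        if host and host.lower() != page_host:
            external.add(host.lower())
    return len(parser.hrefs), external
```

Under these assumptions, a page counts as linking outside its own domain whenever the returned set is non-empty. Links without a host, such as `mailto:` addresses, are counted as hyperlinks but never as external domains.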
| ||3,304,834||99.9%|| ||452,272||99.8%|
| ||1,978,018||59.8%|| ||450,478||99.4%|
| ||658,820||19.9%|| ||439,720||97.0%|
| ||485,168||14.7%|| ||203,624||44.9%|
| ||96,613||2.9%|| ||13,570||3.0%|
The Rel attribute for the A
element expresses the relationship that the destination URL has to the current
URL. Until relatively recently this attribute was underused. However, its
use has grown in
the last few years as microformats have been embraced. The most popular
value for this attribute is "nofollow", at more
than 2-to-1 over the next-nearest values, which include "bookmark".
|Attribute value||Frequency||Attribute value||Frequency|
MAMA kept track of how many images were detected in each document, including duplicate references to the same image. It tallied the total image references encountered (avg: 22.6), the number of unique images encountered (avg: 12.3), and the maximum number of times any single image was referenced (avg: 15.2). MAMA found 3,233,208 URLs (92.14%) using images via the following methods:
- The forms
- Elements with a
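
The three per-document image tallies described above (total references, unique images, and the most references to any single image) could be computed along these lines. This is only a sketch: the function name `tally_images` is hypothetical, and it assumes the image URLs have already been extracted from the document into a list.

```python
from collections import Counter


def tally_images(image_refs):
    """Return (total references, unique images, max references to one image)."""
    counts = Counter(image_refs)
    total = sum(counts.values())
    unique = len(counts)
    # default=0 covers documents that reference no images at all.
    max_repeat = max(counts.values(), default=0)
    return total, unique, max_repeat
```

For example, `tally_images(["a.gif", "b.png", "a.gif", "a.gif"])` returns `(4, 2, 3)`.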
The IMG element was by far the most popular method
for using images in a document. Of the child elements of BODY,
IMG is second in popularity only to hyperlinks, used in 91.7% of MAMA's URL set. Several of its attributes rank among the top
10 of all markup attributes.
| ||3,219,304||99.99%|| ||875,461||27.2%|
| ||2,957,808||91.9%|| ||526,348||16.3%|
| ||2,945,989||91.5%|| ||447,774||13.9%|
| ||2,810,265||87.3%|| ||445,580||13.8%|
| ||2,520,939||78.3%|| ||367,132||11.4%|
Authors use images in many ways, and there is definitely room on the Web for the many formats currently in play. In addition to keeping track of image totals, MAMA looked at the popularity of the GIF, JPEG, and PNG formats. MAMA defaulted to using an image's file extension to judge the format. If MAMA could determine the format from the extension alone, it did not dig any deeper. If the file extension check was inconclusive, MAMA would issue an HTTP HEAD request for the referenced image and examine the MIME type in the response to detect the format.
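
The two-step detection described above (extension first, then the MIME type from an HTTP HEAD request) might look roughly like the following Python sketch. The function names and the two lookup tables are assumptions made for illustration; the article does not list which extensions or MIME types MAMA actually recognized.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Assumed mappings; MAMA's real lists are not given in the article.
EXTENSION_FORMATS = {".gif": "GIF", ".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG"}
MIME_FORMATS = {"image/gif": "GIF", "image/jpeg": "JPEG", "image/png": "PNG"}


def head_content_type(url):
    """Issue an HTTP HEAD request and return the reported MIME type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get_content_type()


def detect_image_format(url, fetch_mime=head_content_type):
    """Guess the format from the extension, falling back to the MIME type."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[extension]
    # The extension was inconclusive, so ask the server what it is serving.
    return MIME_FORMATS.get(fetch_mime(url).lower())
```

The `fetch_mime` parameter is there so the network step can be swapped out. For instance, `detect_image_format("http://example.com/img?id=5", fetch_mime=lambda u: "image/png")` returns `"PNG"` without contacting a server, and an unrecognized MIME type yields `None`.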
JPEG has no real competition in depicting photographs or realistic scenes, but the PNG format and the dominant GIF format are at odds for the same use cases. Due to a number of historical issues, uptake of the PNG format has been slower than many expected. Authors seem to have no problem with both formats coexisting on their Web sites.
Image formats in combination: Venn diagram
The following diagram shows the usage overlap of the three dominant image formats. The relationship between GIF and PNG is usually characterized as a competitive one, so it was expected that these numbers would show authors having a clear preference for one or the other in their pages. That is definitely not the case. PNGs were detected in 374,408 URLs, and of those, 311,827 URLs (83.3%) also used the GIF format. If that is what constitutes a format war, the battle is a subtle one.
Note: Region sizes are not to scale
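
As a quick check of the overlap figure quoted above, the percentage can be recomputed from the two counts in the text. The variable names are illustrative, and the "PNG without GIF" count is derived here by subtraction rather than reported by MAMA.

```python
png_urls = 374_408          # URLs where PNG was detected (from the text)
png_and_gif_urls = 311_827  # of those, URLs that also used GIF (from the text)

overlap_share = png_and_gif_urls / png_urls
png_without_gif = png_urls - png_and_gif_urls  # derived, not a MAMA figure

print(f"{overlap_share:.1%}")  # prints 83.3%
print(png_without_gif)         # prints 62581
```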
The purpose of some of these inline elements is to assign semantic meaning to
text content. Many of the other elements in this category set their sights
lower; they convey simple formatting and appearance information. In fact,
the most popular of these elements is still the FONT
element, an element that exists purely to convey formatting. Markup purists
will be dismayed that
FONT remains in such high usage,
but they can console themselves that the overall use of CSS (80.4%) now exceeds
that of FONT (58.7%) by a comfortable margin. Other
interesting findings of note include
twice as popular as
in use almost 8 times as much as
Zeroing in on the FONT element can tell us a lot about "old school" HTML. When
it was introduced at the end of 1994, it filled an early void in typographical
capabilities for authors. CSS has since subsumed all the features that
FONT first brought to the Web. The values for the
main attributes of the
FONT element show a dominant
value preferred for each:
Color is usually white
("#ffffff" or "white"),
Face is typically "Arial", and
Size is most often "2".
|#000080||101,950||times new roman||197,881||5||332,907|
Block and replaced elements
These elements are used in a wide variety of situations to accomplish an assortment of tasks. Some of them are widely used and others are not. The only relationship that many of these elements have is that they share little in common with the other main MAMA categories.
The BR element is used most frequently in this group,
DIV are also
favored by authors. The
BR element ranks a little higher
than the numbers below indicate, because MAMA counted the
<br/> form separately. Together, the two variants of
BR were detected in 2,884,356 of MAMA's URLs (82.2%).
The heading elements (H1 through H6)
followed an expected popularity path:
H2 is found
less often than H1, H3
less often than
H2, and so on.
UL is found at almost 20 times the frequency of the OL
element; it is a little surprising that authors do not show more of a tendency to rank things.
This week's topics mark a turning point in the MAMA results. Although some other topics are also very popular (such as next week's tables and forms), hyperlinks and images constitute arguably THE most important part of HTML. Next week, we will wrap up MAMA's look at markup by covering the remaining popular markup topics: forms, plug-ins, tables, and XML.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.