MAMA: Markup report, part 4: Forms, tables, and plug-ins, oh my!
In this week's overview we wrap up MAMA's look at markup by covering its most complex structures—forms, tables, and plug-ins. These topics take Web pages from a simple series of text, links, images, and lists to an entirely different level. Forms greatly expand user interaction possibilities. Tables generate axial relationships—which authors have creatively distorted for their most popular (and questionable) use, creating pixel-perfect grid based layouts. Plug-ins afford extensibility beyond HTML's stock capabilities. Without any of these features, HTML would be a barren, unexciting markup language. For a deeper look at these areas and more, the following MAMA article topics are also available this week:
Aside from hyperlinks, forms are the main way in which users interact with the Web. Among their varied critical uses, forms allow people to find things with search engines, publish their thoughts with blogging systems, and buy things on e-commerce sites. Forms in general are very popular—found on up to one-third of all pages analyzed.
Elements used in forms
The popularity of the main types of form elements varies widely, and sometimes surprisingly. For example, almost every FORM has an INPUT, but relatively few make use of TEXTAREA. Such variations may be due to a number of factors, including inherent biases in MAMA's current URL set (a majority of MAMA's URLs are Surface/Home pages, which rarely have forms on them, apart from the increasingly popular search field). The intended use of a Web page often dictates the types of elements used, including form elements.
We will start our look at form elements with their main container element: FORM. It was detected in 1,040,771 of MAMA's URLs.
Notice that the Action attribute is used on most of these pages; it specifies what to do with the information the form is collecting. This attribute is required, so the dominance here is understandable. The Method attribute is only slightly less popular than the Action attribute (89.4% of all form usage).
Approximately 70% of pages that specify an explicit HTTP Method use the "post" method, while ~46% use the "get" method. This would indicate a clear authoring preference for the "post" method, but there are a few factors to consider. About 15% of the pages specifying the Method attribute use multiple forms on the page that mix both "post" and "get" methods, which is why the two percentages sum to more than 100%. There are also 110,428 URLs that used the FORM element without the Method attribute; "get" is the implied default value in such cases. This brings the relative preferences for Method amongst all FORM usages much closer: 62.2% for "post" and 51.6% for an explicit or implied "get" value.
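The adjustment above can be checked with a little arithmetic. The following is a minimal sketch, not MAMA's own calculation: it uses only the counts and the rounded percentages quoted in the text, and the function and variable names are hypothetical. Because the 70% and 46% inputs are rounded, the results land within about half a percentage point of the quoted 62.2% and 51.6% rather than matching them exactly.

```python
# Figures quoted in the text above.
FORM_URLS = 1_040_771        # URLs containing a FORM element
NO_METHOD_URLS = 110_428     # FORM URLs with no Method attribute at all

# Rounded shares among URLs that specify an explicit Method (from the text).
POST_SHARE_EXPLICIT = 0.70
GET_SHARE_EXPLICIT = 0.46


def method_shares(form_urls, no_method_urls, post_explicit, get_explicit):
    """Re-express the explicit-Method shares as shares of all FORM URLs.

    URLs without a Method attribute are counted as implied "get".
    """
    explicit_urls = form_urls - no_method_urls
    post_urls = post_explicit * explicit_urls
    get_urls = get_explicit * explicit_urls + no_method_urls
    return {
        "explicit_share": explicit_urls / form_urls,
        "post_share": post_urls / form_urls,
        "get_share": get_urls / form_urls,
    }


shares = method_shares(
    FORM_URLS, NO_METHOD_URLS, POST_SHARE_EXPLICIT, GET_SHARE_EXPLICIT
)
for name, value in shares.items():
    print(f"{name}: {value:.1%}")
```

The "explicit_share" value also reproduces the 89.4% Method usage figure, which suggests that the 110,428 URLs are exactly the FORM pages lacking a Method attribute.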
The INPUT element is used in 96.9% of all documents using forms. With the element's functionality being as overloaded as it is, this popularity is both understandable and expected. Some of its attributes are also very popular.
| ||1,005,152|| ||213,924|
| ||990,058|| ||172,843|
| ||947,403|| ||135,049|
| ||656,354|| ||120,420|
| ||335,990|| ||119,902|
Many of the attributes for the INPUT element are only applicable to specific Type attribute values, so we must examine this attribute's values first.
|Attribute value||Frequency||Attribute value||Frequency|
We can now look more deeply at the various uses of the messy
- The "empty" value indicates that an INPUT element did not have a Type value at all. In such situations, the widget is interpreted as Type="Text". In all, 79,050 URLs used INPUT elements where none of them specified a Type attribute.
- In the early days of forms, "Submit" buttons were usually paired with a "Reset" button, but today, that seems to be passé. By comparison, "Reset" is rarely encountered now.
- The "Submit" and "Image" types: Because "Image" is a type of submittal, and each will often be used to the exclusion of the other, looking at their combined totals shows that submittal is the most popular function of forms (more popular than "Text"). This is actually an expected result.
Type="Image" related attributes: Hspace (horizontal dimensions) have just a slight edge over Vspace (vertical dimensions), just like they do with the
- The exclusive choice widget,
Type="Radio", is twice as popular as the multi-choice
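To make the "empty" bucket concrete, here is a minimal sketch of how Type values could be tallied for a single document, using Python's standard html.parser module. It is not MAMA's implementation: the class and function names are hypothetical, it counts elements within one document rather than URLs across a crawl, and lowercasing the values is an assumption made so that "Submit" and "submit" fall into the same bucket.

```python
from collections import Counter
from html.parser import HTMLParser


class InputTypeTally(HTMLParser):
    """Counts the Type values of INPUT elements in one document."""

    def __init__(self):
        super().__init__()
        self.types = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "input":
            return
        value = dict(attrs).get("type")
        # A missing or blank Type goes into the "(empty)" bucket;
        # browsers treat such a widget as Type="Text".
        key = (value or "").strip().lower() or "(empty)"
        self.types[key] += 1


def tally_input_types(html):
    parser = InputTypeTally()
    parser.feed(html)
    parser.close()
    return parser.types


sample = (
    '<form><input name="q">'
    '<input type="Submit" value="Go">'
    '<input TYPE="image" src="go.png"></form>'
)
print(tally_input_types(sample))
```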
Tables have a bad reputation among the markup purists in the development community,
because many authors often use them solely for Web page layout. Tables
generally increase the complexity of documents and can make them more difficult
to maintain. Authors do not really see these factors as significant drawbacks, though,
judging by the overwhelming popularity of layout tables in the MAMA result set. In practice, the use of presentational
tables by authors is what makes the main table-related elements some of the most
popular sub-elements of
BODY, after the
IMG elements. The most frequently occurring of
these is the
TABLE element, found in 2,894,184 of MAMA's
URLs (82.5%). Authors have a definite preference for the table elements they use.
Almost every table uses the
TD elements. All of the other elements are used rarely
TFOOT are all used in less than 1% of
Attributes of the TABLE element
This wrapper element for table structures is (naturally) the most popular element
of its type. It ranks #8 overall in element popularity, used in 82.47% of all
MAMA's URLs. Many attributes were detected for this element, only some of which
are in the standards. A few of these attributes are VERY popular
with authors -
are used in ~90% of all URLs that use tables. Usage of other attributes, like
register; they are used in less than 0.5% of all
The TD and TH elements are grouped together because they mostly share the same attributes and are used in very similar ways. Their usage rates, however, could not be more different. The most popular table sub-element is TD (detected in 2,891,972 URLs), and it is the 9th most popular element overall (used in 82.4% of all URLs in MAMA and 99.9% of all URLs using the TABLE element). The TH sub-element, on the other hand, is used in only 5.1% of URLs using the TABLE element. Because of the inherent attribute overlap between TD and TH, it can be interesting to compare attribute usage rates between the two elements. Percentages of the total element usage are also provided to help cross-comparisons.
|TD Attribute||Frequency||% of
|TH Attribute||Frequency||% of|
How deeply are tables nested?
One of the features requested for MAMA was the ability to detect deeply nested tables. Such structures can be excellent stress tests for a browser. In theory, every TABLE open tag should have a corresponding closing tag. As MAMA traversed a document, any TABLE open tag added 1 to the current depth counter. A closing TABLE tag would subtract 1 from the depth counter. When the depth counter hit a new high score for the document, that value became the new "maximum table depth". This rather simplistic system yielded a number for a document's "maximum table nesting depth"; it does not necessarily mean that the open and closing tags are properly nested, which is another issue entirely. The average nesting depth when tables were used was 2.77.
The maximum nesting depth discovered was an astounding 745 deep at
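The depth counter described above can be sketched in a few lines. This is an illustration under stated assumptions rather than MAMA's actual code: it relies on Python's standard html.parser module, and the class and function names are hypothetical. Like the system described, it only counts open and closing TABLE tags and makes no attempt to verify that they are properly nested, so stray closing tags can push the running counter below zero.

```python
from html.parser import HTMLParser


class TableDepthCounter(HTMLParser):
    """Tracks a running TABLE depth and the highest value it reaches."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            # A new high score becomes the maximum table depth.
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1


def max_table_depth(html):
    counter = TableDepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


nested = (
    "<table><tr><td>"
    "<table><tr><td>inner</td></tr></table>"
    "</td></tr></table>"
    "<table></table>"
)
print(max_table_depth(nested))  # 2
```

Unclosed tables simply keep raising the counter, which is one way a malformed document could report a very large depth.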
The Web has multiple elements to handle plug-ins because of simple evolution.
At first, there was no standardized way to use plug-ins, so solutions arose
PARAM. The standards process produced a cohesive
solution in the
OBJECT element, but authoring inertia
seems to indicate that
are not going anywhere. Rather than OBJECT being used instead of EMBED, the majority of OBJECT tags are used in conjunction with EMBED. In all, 503,783 URLs use both elements (94.5% of all OBJECT and 92.3% of all EMBED usage).
MAMA tried to discover evidence of Flash usage in every document it analyzed. It had to resort to looking for a number of different factors, as authors can use Flash in many ways. Its use was detected by satisfying one or more of the following components:
PARAM element that contained the substrings ".swf" or "flash"
- Any MIME types containing the substring "flash"
from getting any
- Any scripting component containing the substring "flash" or ".swf"
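A check along these lines might look like the following minimal sketch. It is not MAMA's implementation: the function name and the three input lists are hypothetical (they assume the plug-in markup values, MIME types, and script text have already been extracted from the page), and the case-insensitive comparison is an assumption. A page is flagged if any one criterion matches.

```python
# Substrings taken from the criteria listed above.
FLASH_MARKUP_MARKERS = (".swf", "flash")
FLASH_MIME_MARKERS = ("flash",)
FLASH_SCRIPT_MARKERS = ("flash", ".swf")


def _has_marker(text, markers):
    """True if any marker occurs in text (case-insensitive: an assumption)."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def looks_like_flash(plugin_values, mime_types, scripts):
    """Flag a page when at least one of the three criteria is satisfied.

    plugin_values: attribute values found on plug-in related elements
    mime_types:    MIME type strings found in the page
    scripts:       text of the page's scripting components
    """
    return (
        any(_has_marker(v, FLASH_MARKUP_MARKERS) for v in plugin_values)
        or any(_has_marker(m, FLASH_MIME_MARKERS) for m in mime_types)
        or any(_has_marker(s, FLASH_SCRIPT_MARKERS) for s in scripts)
    )


print(looks_like_flash(["intro.swf"], [], []))
print(looks_like_flash([], [], ["var flashlight = true;"]))
```

The second call also returns True: a bare substring search matches unrelated identifiers such as "flashlight", which shows how loose this kind of test can be for script text.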
Using these criteria, 1,176,227 URLs were found to be using Flash. This is a
MUCH higher result than one would expect by looking solely at
This means that either some aspect(s) of MAMA's detection mechanism are too
relaxed, or that some part of the analysis is flagging a lot of positive matches
alone does not catch. If any part of the above detection is suspect, it is likely
to be the scripting detection of Flash (due to the simplistic nature of its
substring search). Judging by anecdotal evidence seen over the years, the number
is probably pretty accurate; scripting is frequently given the duty of dynamically
generating plug-in markup.
As with Flash, there were a number of methods MAMA used to detect Java usage. The following criteria were used to judge whether or not Java was being used in a URL and resulted in the detection of 53,688 matches:
- Any usage of the
PARAM element that contained the substrings ".class" or "java"
- Any MIME types containing the substring "java"
from getting any
- Any scripting component containing the substring "application/java-vm"
Now that we have spent several weeks looking intensely at HTML's many markup topics (and rightly so), we will next be turning our attention to other important Web page technologies that are vital to address in any examination of the Web. Next week we will look at the details of CSS usage: the whos, whats, wheres, whens, whys, and hows of the way CSS is used.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.