MAMA: CSS syntax
- Previous article—MAMA: CSS quantities and sizes
- Next article—MAMA: Scripting - quantities and sizes
- Table of contents
- External CSS file names
- External CSS MIME types
- Media types
- Pseudo-classes and pseudo-elements
- CSS properties
- Notable CSS syntax: Inherit and !important
- Miscellaneous CSS property values and other syntax
- Saarsoo's CSS study
MAMA's look at CSS covered a number of different areas. It looked at external CSS
(via the LINK element), embedded CSS (via the STYLE element), and inline CSS
(using the common Style attribute). It also delved into CSS specified using the
@import syntax and did its best to reveal CSS usage in XML with the "xml-stylesheet"
processing instruction. Overall, CSS was detected in 2,821,141 (80.39%) of the
URLs that MAMA analyzed. CSS properties, MIME types, media types, and other
syntax were tracked, but more can be done to analyze CSS usage; Saarsoo's study
proved that. We will look at the factors, many unique to this analysis, that MAMA
tracked in its study of CSS, and then we will compare some of the commonalities
and differences between the results MAMA found and what Saarsoo was able to
discover in his study.
MAMA tracked 3 types of CSS at-rule syntax: @charset, @media and @import.
For @charset, only the existence of the at-rule is tracked. In the case of
@media, the existence is tracked, along with the stated media type values
(see the Media types section below for more). Lastly, @import
statements were dissected and analyzed for their file names and media types.
|At-rule type||Frequency||% Total|
@Import usage: Quantities
This syntax represents an additional source of CSS content when used in a document.
@import syntax is necessary to analyze but tricky to handle: an @import
statement can point to a style sheet that contains further @import statements;
it can even point to itself in endless recursion. In order to sidestep such
logistical headaches, only the first-level @import URLs were resolved,
downloaded and added to the CSS analysis queue for MAMA. These first-level
@import situations were detected in 191,496 URLs.
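The first-level-only strategy described above can be sketched roughly as follows. This is not MAMA's actual code: the regular expression, the function names `first_level_imports` and `build_queue`, and the example URLs are all assumptions made for illustration. The point is only that imported sheets are queued once and never scanned for further @import statements, so recursion cannot occur.

```python
import re
from urllib.parse import urljoin

# Assumed pattern: covers @import "x.css"; @import url(x.css); and
# @import url("x.css") screen; (an optional media list before the semicolon).
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"'()\s;]+)["']?\s*\)?\s*([^;]*);""",
    re.IGNORECASE,
)


def first_level_imports(css_text, base_url):
    """Return (absolute_url, media_list) pairs for each @import in css_text."""
    results = []
    for match in IMPORT_RE.finditer(css_text):
        target = match.group(1)
        media = match.group(2).strip()
        results.append((urljoin(base_url, target), media))
    return results


def build_queue(page_css, page_url):
    """Queue only the URLs imported directly by the page's own CSS.

    The imported sheets are not scanned for @import statements of their
    own, which is what sidesteps chains and self-references.
    """
    return [url for url, _media in first_level_imports(page_css, page_url)]
```

A regular expression is a blunt tool for this job (it ignores CSS comments, for example), but it matches the spirit of a quick, large-scale survey.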
The quantities and sizes of @import-ed CSS were also tracked. The most extreme
case originally registered as having 1,224 @import statements, but more recent
scrutiny exposed only 68 (still high, but not astronomical like before). When
@import is used, the average number of statements was 2.3 and the most popular
number was 1. The top case that was verifiable at the time of writing points out
an issue in MAMA's detection strategy. Sure, 151 @import statements are detected,
but the majority of those are repeated declarations; there are only one to two
dozen unique URLs represented there. A full frequency table of
@import quantities is available.
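The gap between "statements detected" and "unique URLs" noted above is easy to measure once the @import targets of a document are in a list. The following is a minimal sketch under that assumption; `import_summary` and `corpus_summary` are hypothetical helper names, not part of MAMA.

```python
from collections import Counter
from statistics import mean, mode


def import_summary(import_urls):
    """Summarize the @import URLs found in one document."""
    counts = Counter(import_urls)
    return {
        "total": len(import_urls),   # every statement, repeats included
        "unique": len(counts),       # distinct target URLs
        "repeated": sorted(u for u, n in counts.items() if n > 1),
    }


def corpus_summary(per_page_totals):
    """Average and most common number of @import statements per page."""
    return mean(per_page_totals), mode(per_page_totals)
```

Reporting both "total" and "unique" per document would let a page with many repeated declarations stand out without inflating the headline numbers.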
@Import usage: sizes
As with external style sheets, the extreme size values here point out some
problems with MAMA's strategies for deciding what to do with @import-ed
content. The list of URLs that MAMA analyzes is not reduced to unique URLs,
which results in inflated sums. This is an issue when the same
object is referenced in multiple sub-frames, but some pages repeat the same
@import reference multiple times, even in the same document!
|Size range||Frequency||Size range||Frequency||Size range||Frequency|
|=0||3,331,624||>7000 && <=8000||6,881||>25000 && <=30000||6,089|
|>0 && <= 500||14,350||>8000 && <=9000||5,355||>30000 && <=35000||4,732|
|>500 && <=1000||7,569||>9000 && <=10000||4,737||>35000 && <=40000||8,540|
|>1000 && <=2000||17,123||>10000 && <=12000||8,461||>40000 && <=45000||2,423|
|>2000 && <=3000||17,252||>12000 && <=14000||6,765||>45000 && <=50000||2,055|
|>3000 && <=4000||9,426||>14000 && <=16000||5,396||>50000 && <=75000||4,791|
|>4000 && <=5000||8,751||>16000 && <=18000||5,108||>75000 && <=100000||1,533|
|>5000 && <=6000||8,693||>18000 && <=20000||4,138||>100000 && <=150000||1,523|
|>6000 && <=7000||7,024||>20000 && <=25000||8,280||>150000||561|
|http://www.boone.k12.ky.us/index.htm (URL no longer active)||874,643|
|http://www.waitrose.com/inspiration/wfi.aspx (URL no longer active)||751,734|
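For readers who want to reproduce the size ranges in the table above, here is a small sketch of how a byte count could be mapped to its range label. The boundaries are taken from the table; the function name `size_range` and the use of `bisect` are my own choices for illustration.

```python
from bisect import bisect_left

# Upper bounds of the size ranges, copied from the table above.
UPPER_BOUNDS = [
    500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
    12000, 14000, 16000, 18000, 20000, 25000, 30000, 35000, 40000,
    45000, 50000, 75000, 100000, 150000,
]


def size_range(size):
    """Return the table's range label for a total @import size in bytes."""
    if size == 0:
        return "=0"
    idx = bisect_left(UPPER_BOUNDS, size)
    if idx == len(UPPER_BOUNDS):
        return ">150000"
    lower = UPPER_BOUNDS[idx - 1] if idx > 0 else 0
    return f">{lower} && <={UPPER_BOUNDS[idx]}"
```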
External CSS file names
Tracking CSS file names is an example of a MAMA feature in search of a reason for
existing. The URLs of external style sheets from
@import-ed CSS were reduced to just the final filename
portion and this was stored in MAMA. This originally began as a request from a
co-worker to track file names used
by external scripts. With scripts, I knew that this data would
be compelling and useful. The code for tracking script file names was easy to replicate
for external CSS files, but I did not know what the result would be—it turns out it
is not very compelling. The popular file names used for external CSS files are rather
tedious and obvious: "style.css", "main.css",
"default.css", and the inspired "css.css"
are among the devastatingly insightful author choices for CSS file names. Yes, the
full frequency table is also available.
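Reducing a style sheet URL to "just the final filename portion" could look something like the sketch below. The helper names and the decision to drop query strings are assumptions; the article does not say how MAMA treated them.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def css_file_name(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(url).path)


def file_name_frequencies(urls):
    """Tally file names across a list of style sheet URLs."""
    return Counter(css_file_name(u) for u in urls)
```

For example, `file_name_frequencies(urls).most_common(10)` would give a top-ten list in the style of the frequency table.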
External CSS MIME types
This feature tracked the MIME type actually returned for each external CSS file
reference. It did not trust any reported Type attribute, if present. The actual
result is what one would expect: almost 99% of all external CSS is delivered with a
"text/css" MIME type. Other types were encountered,
but some are puzzling. Why would some external CSS be served as an image? The
"text/html" value has two easy explanations: misconfigured Web servers (again), or pseudo-404 errors redirecting to full
HTML error pages. 134,839 URLs with external style sheet references had at least
one that had no MIME type at all. Once again, MAMA comes through with a
full frequency table for your viewing pleasure.
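Tallying returned MIME types comes down to normalizing the Content-Type header of each response. The sketch below assumes the header values have already been collected (fetching is left out), and the "(none)" label for a missing header is my own placeholder rather than MAMA's.

```python
from collections import Counter


def returned_mime_type(content_type_header):
    """Normalize a Content-Type header value to its bare MIME type.

    Parameters such as "; charset=utf-8" are dropped. A missing or empty
    header is reported as "(none)" (an assumed label).
    """
    if not content_type_header:
        return "(none)"
    return content_type_header.split(";", 1)[0].strip().lower()


def mime_type_frequencies(headers):
    """Tally normalized MIME types over a list of header values."""
    return Counter(returned_mime_type(h) for h in headers)
```

Note that the Type attribute in the markup plays no part here, which matches the "do not trust the attribute" approach described above.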
Media types
In all, 404,212 pages specified at least one CSS media type. Media types were detected by looking at the Media attribute of all LINK and STYLE elements, as well as the CSS @media and @import at-rule syntax. The resulting list of media types was then matched against a regular expression of recognized media types.
Any media type that was not recognized fell into a catch-all category termed "other". What were some of the "other" media types? The 3 main types that were noticeable in significant quantities are all from CSS2: "braille", "embossed" and "tty". These values will definitely be added to that regular expression the next time a big analysis is done.
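The exact regular expression MAMA used is not reproduced here, so the following is only a guess at its general shape. The list of recognized names is an assumption: it is the CSS2 media type list minus "braille", "embossed" and "tty", which the text says ended up in "other".

```python
import re

# Assumed list of recognized media types (NOT MAMA's actual expression).
KNOWN_MEDIA_RE = re.compile(
    r"^(all|aural|handheld|print|projection|screen|tv)$", re.IGNORECASE
)


def classify_media(media_value):
    """Split a comma-separated media list and label unknown types "other"."""
    labels = []
    for item in media_value.split(","):
        name = item.strip().lower()
        if not name:
            continue  # skip empty entries such as a trailing comma
        labels.append(name if KNOWN_MEDIA_RE.match(name) else "other")
    return labels
```

Adding the three CSS2 stragglers would then be a one-line change to the alternation.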
|Media type||Frequency||% Total|
Pseudo-classes and pseudo-elements
There are a number of these constructs defined in CSS2 and CSS3. A subset of pseudo-classes and pseudo-elements was chosen for tracking
in MAMA. Some obvious/important pseudo-classes were overlooked in this analysis,
including ":visited". It must be stressed that these are
not the only possible pseudo-classes and pseudo-elements, just most of the ones
that were widely (or soon to be widely) implemented by browsers at the time they
were added to MAMA.
A simple regular expression match for the tracked names was performed on all CSS content.
:hover is used in two-thirds of all pages that use CSS.
:after is (strangely) 3 times more popular than
:before. The pseudo-class :first-child
is more than 4 times as frequent as :last-child
(although that can probably be attributed to :first-child
being in CSS2, while
:last-child was not added until
CSS3). The typographic distinctions that :first-letter and
:first-line provide are not that widely used, although
authors clearly prefer to control the initial letter of a block 3 times as much
as the initial line.
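A "simple regular expression match" of this kind might look like the sketch below. The pattern is an assumption that covers only the names discussed in this section; MAMA's real list was longer, and its expression is not reproduced here.

```python
import re

# Assumed pattern: only the pseudo-classes/elements named in this section.
PSEUDO_RE = re.compile(
    r":(hover|after|before|first-child|last-child|first-letter|first-line)\b",
    re.IGNORECASE,
)


def pseudo_names_used(css_text):
    """Return the set of tracked pseudo names that occur in css_text.

    Presence per document is what matters for a "pages using X" count,
    so repeated uses collapse into a single set entry.
    """
    return {name.lower() for name in PSEUDO_RE.findall(css_text)}
```

Because this is plain text matching rather than real selector parsing, it can be fooled by comments or string contents; that trade-off is part of what "simple" means here.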
CSS properties
The most popular CSS properties are the replacements for standard "old school"
HTML presentational markup. Three of the top five properties replicate the functionality
of the FONT element, and the remaining ones take over from presentational elements such as B. For CSS Box Model properties
(such as 'padding'), the shorthand versions are more popular
than their component forms, but the reverse is true for the
'background' properties. The most popular CSS Box
Model side properties are top for
'margin', and bottom
|CSS property||Frequency||% Total|
Browser vendor CSS property extensions
The major browser makers have extended CSS over the years, and documents on the
Web show just how much effect this has had on authoring practice. Mozilla's
'-moz-opacity' is the most popular one, with the
standard 'opacity' property being only slightly
more popular. Microsoft Office CSS extensions (prefixed by "mso-")
have the highest representation overall, with 202 (!!) different CSS properties
in the frequency table. Adobe ("-adbe-"), Apple/Safari
("-webkit-"), KDE ("-khtml-"),
Microsoft ("-ms-") and Opera ("-o-")
are all also represented by CSS browser extensions.
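Grouping property names by vendor is a matter of checking the prefixes listed above. In this sketch the prefix-to-vendor table comes straight from the paragraph, while the function names are hypothetical. Note that "mso-" has no leading hyphen, unlike the others.

```python
# Prefixes and vendors as listed in the text above.
VENDOR_PREFIXES = {
    "-moz-": "Mozilla",
    "mso-": "Microsoft Office",
    "-adbe-": "Adobe",
    "-webkit-": "Apple/Safari",
    "-khtml-": "KDE",
    "-ms-": "Microsoft",
    "-o-": "Opera",
}


def vendor_of(prop):
    """Return the vendor for a prefixed property name, or None."""
    name = prop.strip().lower()
    for prefix, vendor in VENDOR_PREFIXES.items():
        if name.startswith(prefix):
            return vendor
    return None


def group_by_vendor(properties):
    """Map each vendor to the set of distinct prefixed properties seen."""
    groups = {}
    for prop in properties:
        vendor = vendor_of(prop)
        if vendor:
            groups.setdefault(vendor, set()).add(prop.strip().lower())
    return groups
```

Counting `len(groups["Microsoft Office"])` over a whole property frequency table is the kind of calculation that would produce a figure like the 202 mentioned above.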
Notable CSS syntax: inherit and !important
Two keywords in CSS have special meaning—they are not selectors, and they are
not properties. The "inherit" keyword is a special
global property value used to explicitly pass on a particular value from a parent
to a child. Just under 10% of all URLs using CSS (278,743 URLs) use this keyword
at least once. The other special keyword is "!important",
which specifies a shift in the bias of a document's cascade order toward a specific
CSS rule. It was found in 155,449 of MAMA's URLs (over 5% of all cases using CSS).
These numbers seem significant, but if one frames the numbers in perspective with
the CSS property frequency table, optimism is quickly deflated. For instance,
there are almost 75 CSS properties that are more popular than the "!important"
keyword, including most of the non-standard scrollbar properties.
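Detecting these two keywords, and checking the percentages quoted above against the 2,821,141 URLs that used CSS, can be sketched as follows. The patterns and function names are assumptions; MAMA's own detection may have differed.

```python
import re

# Assumed patterns: "inherit" as a property value, and "!important"
# (CSS allows whitespace between "!" and "important").
INHERIT_RE = re.compile(r":\s*inherit\b", re.IGNORECASE)
IMPORTANT_RE = re.compile(r"!\s*important\b", re.IGNORECASE)


def keyword_usage(css_text):
    """Report whether each special keyword appears at least once."""
    return {
        "inherit": bool(INHERIT_RE.search(css_text)),
        "important": bool(IMPORTANT_RE.search(css_text)),
    }


def share(count, total):
    """Percentage of total represented by count."""
    return 100.0 * count / total
```

With the figures from the text, `share(278743, 2821141)` and `share(155449, 2821141)` give the "just under 10%" and "over 5%" shares respectively.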
Miscellaneous CSS property values
MAMA generally tracked only CSS properties in this version. Future MAMA versions plan to gather more details about CSS. Some other parts of CSS syntax were also harvested this time, but MAMA generally stayed away from the values used by the CSS properties. There were some exceptions—due to requests from co-workers, a few select property values were compiled.
Saarsoo's CSS study
René Saarsoo's university thesis work "Coding Practices of Web Pages" was groundbreaking in its coverage of both the breadth and depth of CSS usage on a large-scale URL set. I discovered this study very late in MAMA's most recent development cycle and was impressed with the scope of the information presented, especially compared with the CSS information that MAMA was gathering. Now, Saarsoo was able to discover a number of things that MAMA did not, but the reverse is also true. Together, these two studies reveal a substantial amount of information about CSS usage on the Web.
When developing code, some things are easy ... and some are hard. For MAMA and the way it was designed, information about CSS selectors, property values, and units was among the harder things to analyze. Saarsoo's study represented these areas very well. In the future, the Perl CSS::SAC parser used in Saarsoo's study will be integrated into MAMA in the hopes of gathering similar data for it to scrutinize and correlate.
By analyzing CSS selectors, Saarsoo was able to look at the actual Class and Id values
referenced by CSS. MAMA did not do this, but it did look at all Class and Id values
used in markup. By combining these two, an interesting comparison could be
generated about how the attributes specified in a page are used (and disused) by CSS.
Some loose comparisons between MAMA and Saarsoo's CSS results
Saarsoo's study looked at some factors that do not have direct comparisons in MAMA, but we can look at data of a similar nature for instructive parallels. For instance, Saarsoo's study looks at CSS usage of image formats for various purposes. It showed that the GIF format is used almost 4-to-1 over the JPEG format, with PNG trailing FAR behind both. MAMA's look at inline and background image usage in markup also shows that the GIF format is dominant, but JPEG usage is only slightly less popular, and the margin to PNG's third place ranking is much smaller.
MAMA's look at the FONT element reveals trends in font name usage and the
colors specified for them. These findings can be compared to Saarsoo's look at
the CSS 'font-family' property and the general usage of CSS color units. We can
see that the popularity of the top font names is almost exactly the same between
these studies: "Verdana" and "sans-serif" are definitely kings. Saarsoo concluded
that the #rrggbb colour syntax was the most popular in CSS, and this is also true
of the FONT element's markup usage. His results regarding the most popular colors
also agree with MAMA's findings about the FONT element.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.