About <!DOCTYPE>

Some, but not all, web pages begin with <!DOCTYPE ...>. This article discusses what it is, what it does, and how it can be used.

As you may be aware, HTML is an application of the Standard Generalized Markup Language (SGML). SGML is an international standard that describes the relationship between a document's content and its structure in an open, vendor-neutral format, so that information can be shared across computer platforms and applications. This supports information solutions ranging from electronic publishing to document repositories.

Each markup language defined in SGML is called an SGML application. An SGML application is generally characterized by:

1. An SGML declaration. The SGML declaration specifies which characters and delimiters may appear in the application.
2. A document type definition (DTD). The DTD defines the syntax of markup constructs. The DTD may include additional definitions such as character entity references.
3. A specification that describes the semantics to be ascribed to the markup. This specification also imposes syntax restrictions that cannot be expressed within the DTD.
4. Document instances containing data (content) and markup. Each instance contains a reference to the DTD to be used to interpret it.

A DTD contains the "syntax list" (and other information) for the markup language it defines. In principle, a document consumer should be able to load a DTD and get from it a list of all valid tags, their usage, allowed characters, etc. for the defined markup language.

In general, DOCTYPE tells a "program" (maybe "document consumer" is a better term) which DTD to use and where to get it, in order to correctly present the data in the document.

In the case of browsers, it does basically nothing. Browser versions before 6.x ignore it and use whatever DTD was built in by the browser's author(s). Netscape and IE version 6.x make some use of DOCTYPE.
Netscape 6.x exposes it as part of the document model; I've yet to find what, if anything, the browser does with it. With IE 6.x, DOCTYPE can be used to turn on strict standards-compliant mode; it's unclear how this is useful.

DOCTYPE Syntax

The <!DOCTYPE ...> declaration must be the first line of the file, preceding any document tags. Schematically, the syntax is:

<!DOCTYPE TopElement Availability "Registration//Organization//Type Label//Definition Language" "URL">

The elements of the DOCTYPE syntax are:

TopElement: Specifies the top-level element type declared in the DTD. This corresponds to the SGML document type being declared. Default (for web pages): HTML.
Availability: Specifies whether the formal public identifier (FPI) is a publicly accessible object or a system resource. PUBLIC (default): a publicly accessible object. SYSTEM: a system resource, such as a local file or URL.

The next elements are grouped together and enclosed in a pair of quotes:

Registration: Specifies whether the organization is registered by the International Organization for Standardization (ISO). It is written as + (default: the organization name is registered) or - (the organization name is not registered). The Internet Engineering Task Force (IETF) and World Wide Web Consortium (W3C) are not ISO-registered organizations, which is why their identifiers begin with "-".
Organization: Specifies a unique label indicating the name of the entity or organization responsible for the creation and maintenance of the DTD being referenced by the !DOCTYPE declaration, i.e., the OwnerID of the DTD.
Type: Specifies the public text class, the type of object being referenced. Default: DTD.
Label: Specifies the public text description, a unique descriptive name for the public text being referenced. It can be followed by the version number of the public text. Default (for web pages): HTML.
Language: Specifies the public text language, the natural language used in the creation of the referenced object. It is written as an ISO 639 language code (uppercase, two letters). Default: EN (English).
The last element is enclosed in quotes:

URL: Specifies the location of the referenced object (DTD). This should be specified; it provides the means for the "document consumer" to obtain the DTD being used for the document.

Examples:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Non-web page:

<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">

DOCTYPE - what it can be used for

Although browsers make little or no use of DOCTYPE, it can have some use for a web page author. There are a variety of HTML "validators" around, both as local programs and as web-accessible services (see Web References). They can be useful for checking HTML code for syntax errors. They can also be useful for getting an idea of the audience that will be able to view your page as you expect it to be seen.

For example, suppose you specify HTML 2.0 as the DTD and validate your page against it, and it passes without error. You then know your page can be viewed by almost any browser. Similarly, if a page passes HTML v3.2 verification, the page should work in most version 4.x+ browsers. If your code passes an HTML v4.01 check but not a v3.2 check, you can almost bet that at least some people will have problems with your pages.

HTML 4.x comes in three flavors. Their meanings and major differences are:

HTML 4.x Strict
HTML 4.x Strict is a trimmed-down version of HTML 4.x that emphasizes structure over presentation. Deprecated elements and attributes (including most presentational attributes), frames, and link targets are not allowed in HTML 4.x Strict.

HTML 4.x Transitional
HTML 4.x Transitional includes all elements and attributes of HTML 4.x Strict but adds presentational attributes, deprecated elements, and link targets.
HTML 4.x Transitional recognizes the relatively poor browser support for style sheets, allowing many HTML presentation features to be used as a transition towards HTML 4.x Strict.

HTML 4.x Frameset
HTML 4.x Frameset is a variant of HTML 4.x Transitional for documents that use frames. The FRAMESET element replaces the BODY in a Frameset document.

DOCTYPE statements for HTML 4.01:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/frameset.dtd">

DOCTYPE statements for HTML 4.0:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" "http://www.w3.org/TR/REC-html40/frameset.dtd">

DOCTYPE statement for HTML 3.2:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

DOCTYPE statement for HTML 2 (if there is no DOCTYPE statement on a web page, most validators will assume HTML 2):

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

Validation errors tell you which elements on your page will (or may) not work or not be rendered correctly. For example, many browsers will ignore a missing closing tag such as:

<b> this is bold <br> and this is a new line

(The correct syntax is: <b> ... </b> <br> ...)

The problem is that it is unpredictable what the browser will do. Tags preceding and following such "errors" can affect them, or be affected by them, depending on the browser.

Validation provides only nominal information about how pages will render in browsers; it checks syntax only and says nothing about what a particular browser will do with that syntax.
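
To make the missing-closing-tag example concrete, here is a minimal sketch of the kind of check a validator performs, written in Python with the standard library's html.parser module. It is not a real validator: it does not read any DTD, the names UnclosedTagFinder and find_unclosed are made up for this illustration, and it assumes every non-empty element needs a closing tag, which is stricter than the HTML 4 DTDs (they allow the closing tag to be omitted for elements such as P and LI).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML 4.
# Partial list, chosen for this sketch only.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link",
                 "area", "base", "col", "param"}


class UnclosedTagFinder(HTMLParser):
    """Tracks open tags on a stack and records tags that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"closing </{tag}> has no matching opening tag")
            return
        # Anything opened after `tag` and still on the stack was never closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> was never closed")


def find_unclosed(html_text):
    """Return a list of messages about unbalanced tags in html_text."""
    finder = UnclosedTagFinder()
    finder.feed(html_text)
    finder.close()
    # Tags still open at the end of the input were never closed either.
    for tag in reversed(finder.open_tags):
        finder.problems.append(f"<{tag}> was never closed")
    return finder.problems


print(find_unclosed("<b> this is bold <br> and this is a new line"))
# prints ['<b> was never closed']
```

Like a real validator, this reports only the syntax problem; it says nothing about how a given browser will recover from it.
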
You may have your HTML code "correct" and still find that it renders differently from browser to browser.

XML

It is interesting that this markup language's documents do NOT begin with DOCTYPE. Instead, XML documents begin with:

<?xml version="1.0" encoding="UTF-8"?>

This line is the XML declaration. It tells the parser both that this is an XML document and that it is encoded in UTF-8, an encoding in which the plain ASCII characters of most typical English documents take one 8-bit byte each. A DOCTYPE declaration, if there is one, follows it.

Another DOCTYPE

You will also see (and be seeing more of) XHTML. XHTML stands for EXtensible HyperText Markup Language. XHTML is essentially HTML v4.01 defined as an XML application. It uses 4.01 tags and XML syntax. It is expected that XHTML will replace HTML. Its DOCTYPE statement looks like:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Like HTML v4.01, it comes in three flavors: Strict, Transitional, and Frameset. XHTML has the same tags as HTML v4.01, and so these pages work correctly in IE 5+ and Netscape 6+.

Conclusion

Although DOCTYPE has been required at the beginning of HTML documents since HTML v3.2, its value for web page authors is limited to HTML syntax validation. Syntax validation is of limited value since browsers use an internal DTD and "do their own thing" with it. Validation is useful for gauging the browser audience that can view a web page, and it can reduce HTML errors, thus improving the viewability of web pages.

The newest web browsers recognize the DOCTYPE declaration but still utilize an internal DTD. The Web is transitioning toward XHTML and XML, and the current browsers that support them also utilize internal DTDs.

Conceptually, an SGML document specifying the DTD to be used makes sense. It would make a lot more sense if browsers actually utilized it.
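
For a sense of what a document consumer would have to read out of the declaration before it could fetch and use a DTD, here is a small Python sketch that splits a DOCTYPE into the parts described under DOCTYPE Syntax. It is an illustration under simplifying assumptions, not a conforming SGML parser: the regular expression covers only the schematic form used in this article, the names Doctype and parse_doctype are invented for the example, and Type and Label are kept together as one field because not every identifier has a separate type keyword.

```python
import re
from typing import NamedTuple, Optional

# Matches the schematic forms used in this article:
#   <!DOCTYPE TopElement PUBLIC "public identifier" "URL">
#   <!DOCTYPE TopElement SYSTEM "URL">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<top>[^\s>]+)'
    r'(?:\s+(?P<avail>PUBLIC|SYSTEM))?'
    r'(?:\s+"(?P<first>[^"]*)")?'
    r'(?:\s+"(?P<second>[^"]*)")?'
    r'\s*>',
    re.IGNORECASE,
)


class Doctype(NamedTuple):
    top_element: str
    availability: Optional[str]
    registration: Optional[str]
    organization: Optional[str]
    type_and_label: Optional[str]
    language: Optional[str]
    url: Optional[str]


def parse_doctype(text: str) -> Optional[Doctype]:
    """Return the parts of the first DOCTYPE found in text, or None."""
    m = DOCTYPE_RE.search(text)
    if m is None:
        return None
    avail = m.group("avail").upper() if m.group("avail") else None
    first, second = m.group("first"), m.group("second")
    if avail == "PUBLIC":
        # PUBLIC: first string is the public identifier, second is the URL.
        public_id, url = first, second
    else:
        # SYSTEM (or nothing): the only quoted string, if any, is the URL.
        public_id, url = None, first
    reg = org = type_label = lang = None
    if public_id:
        parts = public_id.split("//")
        if len(parts) == 4:
            reg, org, type_label, lang = parts
    return Doctype(m.group("top"), avail, reg, org, type_label, lang, url)


d = parse_doctype(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">'
)
print(d.organization, "|", d.type_and_label, "|", d.language)
# prints: W3C | DTD HTML 4.01 Transitional | EN
```

A page with no DOCTYPE makes parse_doctype return None, which is the case where, as noted earlier, most validators fall back to HTML 2.
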
Web References:

- DOCTYPE
http://www.htmlhelp.com/design/dtd/
http://www.w3.org/TR/html4/intro/sgmltut.html
http://www.htmlgoodies.com/tutors/descript.html
http://mfx.dasburo.com/htmlsgmlstuff.html

- Validators
http://www.htmlhelp.com/links/validators.htm
http://tech.irt.org/articles/js138/index.htm

- validator lists
http://www.w3schools.com/site/site_validate.asp

- XML & XHTML
http://www.w3.org/TR/REC-xml (XML)
http://www.w3.org/TR/xhtml1/ (XHTML)
http://www.devx.com/upload/free/features/webbuilder/2000/wb0300/kc0300/kc0300-1.asp