html

Lots of people use HTML, but not many really understand it. If you don’t know the difference between <br> and <br/>, this article is for you.

What came before the World Wide Web?

SGML (Standard Generalized Markup Language) is a format for writing text documents with additional information (tags). In 1986 it became an ISO standard (ISO 8879). An SGML document looks something like this:

<tag1>
    Some Text
    <tag2 attribute otherattribute="value">
</tag1>

The standard defines the syntax and how to parse it. An SGML parser knows that there is a tag named tag1 and that it has two child nodes: a text node and another tag.

But it doesn’t define what tag1 means (its semantics). SGML is a general-purpose language, a base for more specific applications like HTML.

Published in 1991, HTML defines, for example, that <a> is a link and that its href attribute contains a URL.

But apart from HTML, SGML wasn’t very popular. That’s why not many people even bother to remember the name. Still, I think it is important to know the difference between syntax and semantics.

XML and XHTML

In 1998 a new standard was completed: Extensible Markup Language (XML). It is a simplified successor of SGML (technically a subset of it), and like SGML it cares only about the syntax of a document.

In 2000 the W3C (the organization that makes web standards) published XHTML 1.0. It was HTML 4.01 with its syntax changed to XML: every XHTML document is a well-formed XML document.

What is the difference between HTML 4.01 and XHTML 1.0?

Here is a page written in XHTML 1.0 Strict:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
 <head>
   <title>XHTML 1.0 Example</title>
 </head>
 <body>
   <p>
     This is an example
     <br/>
     of XHTML
   </p>
   <div id="empty"/>
 </body>
</html>

And the equivalent page in HTML 4.01 Strict:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
 <head>
   <title>HTML 4.01 Example</title>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 </head>
 <body>
   <p>
     This is an example
     <br>
     of HTML
   </p>
   <div id="empty"></div>
 </body>
</html>

The two look similar, which is why many people don’t notice the differences at first sight.

The first big change is that XML doesn’t tolerate syntax errors. When an SGML-based HTML parser hits an error, it tries to recover and parse something anyway, but an XML parser must stop and report the error.

An XHTML document should start with an XML declaration, which states the XML version and the encoding (the XHTML 1.0 specification strongly encourages it, although it is not strictly required).

XML has namespaces, so you can combine many different formats in one document. For example, you can write XHTML inside SVG or RSS files.

Self-closing tags

HTML provides a mechanism for closing tags automatically. For example, <br> is self-closing: it cannot have any content. Another example: <p> cannot contain another <p>, so when you write

<p>first
<p>second
<p>last</p>

the parser knows that opening a new <p> should automatically close the previous one. But, as you can see, this requires the parser to know the individual behaviour of each tag. In XHTML you must close all tags explicitly, so any XML parser can parse the document correctly without any knowledge of these tags. If something isn’t closed, you will get an error.
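To make the difference concrete, here is a toy sketch in JavaScript. It is not how browsers really parse HTML (the real HTML5 parsing algorithm has many more rules), and `tokenize`, `buildTree` and `serialize` are hypothetical helpers made up for this example. In `"html"` mode it knows exactly one HTML rule, "a new <p> closes an open <p>"; in `"xml"` mode it knows no tag-specific rules at all, so it has to reject anything that isn’t closed.

```javascript
// Toy sketch, NOT a real HTML parser. In "html" mode it knows exactly one
// HTML rule (a new <p> closes an open <p>); in "xml" mode it knows none.

// Split markup into start tags, end tags and (trimmed) text.
function tokenize(source) {
  const tokens = [];
  const re = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    if (m[3] !== undefined) {
      const text = m[3].trim();
      if (text) tokens.push({ type: "text", value: text });
    } else {
      tokens.push({ type: m[1] ? "end" : "start", name: m[2].toLowerCase() });
    }
  }
  return tokens;
}

// Build a tree of { name, children } nodes using a stack of open elements.
function buildTree(tokens, mode) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const current = () => stack[stack.length - 1];
  for (const token of tokens) {
    if (token.type === "text") {
      current().children.push({ name: "#text", value: token.value });
    } else if (token.type === "start") {
      // The only HTML-specific rule: a new <p> implicitly closes an open <p>.
      if (mode === "html" && token.name === "p" && current().name === "p") stack.pop();
      const element = { name: token.name, children: [] };
      current().children.push(element);
      stack.push(element);
    } else if (mode === "xml") {
      // XML: an end tag must match the innermost open element exactly.
      if (current().name !== token.name) throw new Error(`Unexpected </${token.name}>`);
      stack.pop();
    } else {
      // HTML (simplified): close up to the innermost open element with this
      // name; a stray end tag with no matching open element is ignored.
      const index = stack.map((n) => n.name).lastIndexOf(token.name);
      if (index > 0) stack.length = index;
    }
  }
  if (mode === "xml" && stack.length > 1) {
    throw new Error(`Unclosed <${current().name}>`);
  }
  return root;
}

// Print the tree back as markup, with every end tag written explicitly.
function serialize(node) {
  if (node.name === "#text") return node.value;
  const inner = node.children.map(serialize).join("");
  return node.name === "#root" ? inner : `<${node.name}>${inner}</${node.name}>`;
}

const source = "<p>first\n<p>second\n<p>last</p>";
console.log(serialize(buildTree(tokenize(source), "html")));
// prints: <p>first</p><p>second</p><p>last</p>
try {
  buildTree(tokenize(source), "xml");
} catch (error) {
  console.log(error.message); // prints: Unclosed <p>
}
```

The HTML mode only works because the tree builder was told something about <p>; the XML mode gets the same markup wrong in the other direction and simply refuses it.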

To keep code readable, XML also has a short syntax: <div/> is equivalent to <div></div>. But it is important to know that in HTML this / is ignored (as I said before, HTML tries to tolerate syntax errors) and doesn’t do anything.

If you write:

<body>
  <div/>
  <p>Some text</p>
</body>

in an XHTML document, the browser will parse it as

<body>
  <div></div>
  <p>Some text</p>
</body>

but in an HTML document it will be

<body>
  <div>
    <p>Some text</p>
  </div>
</body>
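The rule behind these two results can be sketched as a small function. This is a simplified model, not browser code: `closesImmediately` is a hypothetical helper, and the set of void elements is the list from the HTML Living Standard at the time of writing.

```javascript
// Elements that can never have content in HTML ("void elements"),
// as listed in the HTML Living Standard.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: is the element created by this start tag already
// closed, or does it stay open and wait for content?
function closesImmediately(startTag, mode) {
  const match = /^<([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>$/.exec(startTag);
  if (!match) throw new Error(`Not a start tag: ${startTag}`);
  const name = match[1].toLowerCase();
  const hasSlash = match[2] === "/";
  if (mode === "xml") {
    // XML: only the "/>" syntax closes the element; the tag name is irrelevant.
    return hasSlash;
  }
  // HTML: the slash is ignored; only void elements are closed right away.
  return VOID_ELEMENTS.has(name);
}

console.log(closesImmediately("<div/>", "xml"));  // true: same as <div></div>
console.log(closesImmediately("<div/>", "html")); // false: the <p> ends up inside the div
console.log(closesImmediately("<br>", "html"));   // true: void element, no slash needed
console.log(closesImmediately("<br>", "xml"));    // false: XML expects a matching </br>
```

In other words, in HTML the tag name decides, and in XML the syntax decides.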

Abandoned XHTML 2.0

At the beginning, XHTML 1.0 and 1.1 had two big problems. First, Internet Explorer couldn’t display XHTML served as XML (application/xhtml+xml) until version 9 (released in 2011, 11 years after XHTML was standardized). And remember that even then users didn’t update their browsers very quickly. So if you wanted to write XHTML, you still had to serve it as HTML (text/html), the browser parsed it as HTML, and you lost all the advantages of XHTML.

The second problem was that programmers didn’t like the idea that any syntax error would stop the whole web page or application from working and show the user a big error message.

The W3C proposed XHTML 2.0, which was deliberately not backwards compatible. The idea didn’t win the approval of web developers, and in 2009 the project was abandoned. But that wasn’t the end of XHTML.

Where we are now – HTML5

Instead of XHTML 2.0, the W3C went with a new standard: HTML5. One of its main assumptions was that HTML and XHTML would be developed together as part of the same standard. You choose the syntax, but everything else is identical: once the browser has parsed the code, it treats both the same way.

The W3C also decided that HTML would no longer be formally compatible with the SGML specification, because in practice no one really cares. It also allowed MathML (not supported by Chrome at the time of writing 😡; Chrome added MathML Core in version 109) and SVG tags directly inside an HTML file.

It’s important to know that even if you don’t write namespaces in HTML, the browser adds them automatically while parsing. You can test it with simple JS code in the browser console:

// Run in a browser: the HTML parser assigns the namespaces by itself.
var div = document.createElement('div');
div.innerHTML = '<p></p><svg></svg>';
console.log(div.children[0].namespaceURI); // -> "http://www.w3.org/1999/xhtml"
console.log(div.children[1].namespaceURI); // -> "http://www.w3.org/2000/svg"

But if you add SVG elements to HTML from JS, you need to specify the namespace yourself (with createElementNS); otherwise the browser will not treat the element as SVG, but as an unknown HTML element.

var badSvg = document.createElement("svg");
console.log(badSvg.namespaceURI); // -> "http://www.w3.org/1999/xhtml"
var goodSvg = document.createElementNS("http://www.w3.org/2000/svg", "svg");
console.log(goodSvg.namespaceURI); // -> "http://www.w3.org/2000/svg"


Have you ever opened an e-mail and noticed that the text is shown correctly, but the images are not loaded? And your mail client tells you that the images were blocked and you need to click to load them?

This problem is common. It is so common that it’s hard to find a message with images that load correctly (unless you mark the sender’s address as trusted). It is so common that many people I talked to about it don’t notice it anymore, and others think that the only solution is to avoid using images.

But both the reason and the solution are very simple.

Reason

E-mails are usually sent as HTML, so many people treat them like normal web pages, but they aren’t.

On a normal web page, if you want to add an image, you put a link to it in an <img> tag, and for the web browser it doesn’t matter whether the image is on the same server or not. So in e-mails the easiest solution seems to be uploading the image to any web server and putting its full URL inside the <img> tag in the e-mail.

But this behaviour is a great opportunity for spammers. A spammer would like to know whether his spam was read, because then he knows which addresses are worth sending more spam to. So he can add to each message an <img> with a unique URL, all pointing to his server. When someone opens the message, the mail client also loads the image, which means connecting to the spammer’s server and telling him which message was opened. That’s why mail programs block remote images.

Solution

To send an e-mail with images, you need to add them as attachments. This solves the problem, because the image is sent together with the e-mail, without any need to contact another server.

When adding an attachment, you can give it a Content-ID (cid). You then use it as the URL inside the message’s HTML:

<img src="cid:your_content_id" alt="">

Also don’t forget to mark this attachment as inline (Content-Disposition: inline), so that it isn’t shown to the recipient in the attachment list.
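Mail libraries usually build this for you, but it helps to see what such a message looks like. Below is a minimal sketch for Node.js (using only the built-in Buffer) that assembles a raw multipart/related message: an HTML part that points at cid:logo@example.com, and an inline image part with the matching Content-ID header. The `buildMessage` helper, the addresses, the boundary string and the Content-ID are hypothetical placeholders, and a real message would also need headers such as Date and Message-ID.

```javascript
// Sketch: a raw multipart/related e-mail whose HTML references an inline
// image attachment through a cid: URL. All names and IDs are placeholders.
function buildMessage({ from, to, subject, html, image }) {
  const boundary = "related-boundary-42"; // hypothetical; must not appear in any part
  // Base64 body, wrapped at 76 characters per line as MIME requires.
  const base64 = (image.data.toString("base64").match(/.{1,76}/g) || []).join("\r\n");
  const lines = [
    `From: ${from}`,
    `To: ${to}`,
    `Subject: ${subject}`,
    "MIME-Version: 1.0",
    `Content-Type: multipart/related; boundary="${boundary}"`,
    "",
    `--${boundary}`,
    "Content-Type: text/html; charset=utf-8",
    "",
    html,
    `--${boundary}`,
    `Content-Type: ${image.type}`,
    "Content-Transfer-Encoding: base64",
    `Content-ID: <${image.cid}>`, // angle brackets in the header...
    `Content-Disposition: inline; filename="${image.filename}"`,
    "",
    base64,
    `--${boundary}--`,
    "",
  ];
  return lines.join("\r\n"); // e-mail lines end with CRLF
}

const png = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // placeholder bytes, not a real image
const message = buildMessage({
  from: "sender@example.com",
  to: "receiver@example.com",
  subject: "Hello",
  // ...but no angle brackets in the cid: URL
  html: '<p>Our logo:</p><img src="cid:logo@example.com" alt="">',
  image: { type: "image/png", filename: "logo.png", cid: "logo@example.com", data: png },
});
console.log(message);
```

Note the asymmetry: the Content-ID header wraps the ID in angle brackets, while the cid: URL in the HTML uses the bare ID. If you use a library such as Nodemailer, its attachment objects accept a cid property that does this wiring for you.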

You could also use a data: URL with base64 encoding, but this isn’t supported by all mail programs, especially Microsoft Outlook. As I said earlier, an e-mail is not a web page, and you can use only a small part of a browser’s capabilities if you want to be sure that everyone will read your message correctly.
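For comparison, this is roughly what the data: alternative looks like: the image bytes are base64-encoded straight into the src attribute, so nothing is fetched and nothing is attached. A minimal Node.js sketch; `toDataUrl` is a hypothetical helper, and the four bytes are only a placeholder (the start of a PNG signature), not a displayable image.

```javascript
// Sketch: turn raw image bytes into a data: URL for an <img src="...">.
// Many mail clients (notably Outlook) will not display such images.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

const placeholderPng = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // not a real image
console.log(`<img src="${toDataUrl(placeholderPng, "image/png")}" alt="">`);
// prints: <img src="data:image/png;base64,iVBORw==" alt="">
```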

TheBugger

From programmers for programmers