Arc Forumnew | comments | leaders | submitlogin
2 points by rocketnia 1713 days ago | link | parent

"And a self-closed tag lets you know there's no body, which is also a plus."

A <p> tag does have a body. HTML like this:

  <body>
    <p>First paragraph
    <p>Second paragraph
  </body>
Is treated basically like this:

  <body>
    <p>First paragraph
    </p><p>Second paragraph
  </p></body>
If instead you write:

  <body>
    <p />First paragraph
    <p />Second paragraph
  </body>
Then... Well, I should try to be precise....

It looks like the HTML specification defines this as a "non-void-html-element-start-tag-with-trailing-solidus parse error." The spec says that in this case, "The parser behaves as if the U+002F (/) is not present," but also that "[browsers] may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification."

I don't know of any browsers that abort the parsing altogether, so it's still reliable to write the HTML that way.

However, the similarity to XML is actively misleading in this case. When you process that document as HTML, you still get structure like this:

  <body>
    <p>First paragraph
    </p><p>Second paragraph
  </p></body>
But when you process it as XML, you get structure like this:

  <body>
    <p></p>First paragraph
    <p></p>Second paragraph
  </body>
So if you're trying to write a polyglot HTML/XML document, self-closing <p /> tags still probably aren't a great option. Closing the paragraphs explicitly, like so, makes it clearer how the structure will end up:

  <body>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
---

I think modern HTML does have a reliable common subset with XML. Modern HTML treats <br></br> and <p /> as parse errors, but it treats <br /> and <p></p> as valid. To write HTML/XML polyglot content, you just need to pay attention to whether you're dealing with a void element like "br" or a non-void element like "p".

Incidentally, why use an HTML/XML polyglot at all? There are at least a few situations where it can make sense:

- You're serving it as HTML, but (at least someday) you might want to use an XML-processing tool on it or serve it as XHTML.

- You're trying to serve it as XHTML, but you're worried you'll mess up your server configuration and serve it as HTML by mistake.

- You're confident you can serve it as XHTML today, but you have a backup plan to serve it as HTML if needed. In particular, you're afraid someday your XHTML will be invalid due to a bug in your code, a bug in a browser, an intentional spec violation in a browser (e.g. for security or user privacy), or a backwards-incompatible change in the spec. The XHTML spec dictates that an invalid page won't be displayed at all, so if you end up with invalid XHTML for any of those reasons, your site will be rather unusable until you can implement a fix. If that happens at a time you're not ready to drop everything and look at the bug in depth, then you can make a pretty quick switch to serving it as HTML, and most of the page will display again.

Because of the brittle handling of errors, XHTML still hasn't really gotten off the ground. So it seems like the primary value of the HTML/XML polyglot is to serve a document as HTML but use XML-processing tools on it behind the scenes.

---

A side note...

In the very early days of XML and XHTML, when people were trying to make their HTML pages as XML-like as possible, many browsers would interpret something like <br/> as an element with the tag name "br/". That's why people got into the habit of putting in a space like <br />. That way those browsers would instead interpret the / as an attribute named "/", which was mostly harmless. Nowadays, the space is pretty much vestigial and you can just write <br/> if you want to.



3 points by zck 1712 days ago | link

Yeah, when I dug more into the spec, I found out that <p> tags can't be self-closed. Only void and foreign elements can be self-closed (https://html.spec.whatwg.org/multipage/syntax.html#start-tag...).

The closing </p> can be omitted if the next tag is one of 25 different tags (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p, see "tag omission").

What I ended up coding was that the (para) call will always add a closing tag. This is more consistent with the spec -- as far as I can tell, the closing tag is never required to be omitted.

-----