# URL specification and browser implementation differences

> Recently, I encountered a difference in output in Ada compared to Safari, Chrome and Firefox. I thought it would be a good idea to write a blog post and explain the difference, and how browser implementations of URL is not kept in sync with the URL specification.

*Published: 2023-05-25 · Tag: algorithms*

---

Recently, I wrote a blog post about [Ada URL parser version 2.0.0][ada-url-parser-v2].
In that post, I mentioned  that the parser is fully compatible with the URL
parser specification, passes all [Web Platform Tests][web-platform-tests] and
currently used in [Node.js 20.0.0][node-20].

A couple of days ago, I noticed a big difference in how URLs were handled by
Ada, Safari, Chrome, and Firefox. It got me thinking that it would be a good
idea to explain why these differences occur and how different browsers handle
URL parsing in their own unique ways, which leads to inconsistencies and
compatibility issues. In this blog post, I'll break down these variations and
shed light on why URLs may not always work the same way across different
browsers.

**Disclosure**: In the context of this blog post, I will be focusing on WHATWG
URL specification, and not RFC 3986 or RFC 3987, even though historically it
was written a lot earlier than the WHATWG.

## Quick Recap

> Is WHATWG URL specification the only URL specification?

No. One notable alternative is the URL specification maintained by the [World Wide Web Consortium][w3c].

**RFC 3986**, also known as the *Uniform Resource Identifier (URI): Generic Syntax* provides its own set of rules and guidelines for working with URLs.

> Is there any differences between RFC 3986 and WHATWG?

It's worth noting that the WHATWG and the W3C URL specification have some
differences and have evolved separately. The WHATWG specification has been
widely implemented by all web browsers, while the W3C specification is used as
a reference by various web-related standards and technologies, such as cURL.

Both the WHATWG and W3C specifications are important references for web
developers and are widely followed in the industry. The choice of which
specification to follow may depend on factors such as browser support, specific
requirements of a project, or the recommendations of relevant standards
organizations.

> What is the meaning of WHATWG?

The term **WHATWG** stands for [Web Hypertext Application Technology Working
Group.][whatwg]. It is a community-driven organization that focuses on
developing and maintaining web standards. The WHATWG was initially formed in
response to the divergence between the World Wide Web Consortium (W3C) and the
browser vendors at the time, who felt that the W3C process was too slow to
address the evolving needs of web developers.

> What is the URL specification?

The WHATWG URL specification is a set of rules and guidelines for working
with URLs (Uniform Resource Locators) in web applications. URL is a
standardized way to identify and locate resources on the internet, such as
web pages, images, or files. The specification provides a detailed definition
of the URL syntax, parsing algorithms, and methods for manipulating URLs.

The URL specification is one of the many standards maintained by the WHATWG.
The group consists of a community of web developers, browser vendors, and other
interested parties who collaborate to define and improve web technologies.

> What is the difference between the WHATWG and the W3C?

While the [W3C][w3c] is another important organization involved in web
standards, the WHATWG operates independently and maintains its own set of
specifications, including [the URL specification][url-spec].

## The Problem

URL standard is a living document and it gets updated quite often. Whenever a
change is introduced several WHATWG members inform URL implementors about the
change. Depending on the reporting method, members of WHATWG open an issue on
the relevant application's bug tracker or send an email to the mailing list.

Due to the priorities, some implementors may not be able to update their
codebase to the latest version of the URL standard.


## Opaque hosts

URLs might have protocols that does not conform to the URL standard, and in the
context of specification, they are called non-special hosts. For example, HTTP,
HTTPS, FTP, SSH and FILE as special protocols.

We recently developed a playground for Ada URL parser, if you want to dig
deeper into the result of [the URL parser][playground-opaque-host-example].

```javascript title="Ada URL parser handling opaque-hosts"
> new URL('yagiz://blog/post/1?source=rss')
URL {
  href: 'yagiz://blog/post/1?source=rss',
  origin: 'null',
  protocol: 'yagiz:',
  host: 'blog',
  hostname: 'blog',
  pathname: '/post/1',
  search: '?source=rss',
}
```

In December 28, 2016, WHATWG added the notion of opaque hosts and added support for opaque host URLs to have hostnames. This change was introduced in [Add opaque hosts pull-request][opaque-host-pr].

Unfortunately, for this particular example, browsers return different results.

### Safari

Safari returns the same result as Ada URL parser, and is kept in sync with the specification. I've tested this on Safari 16.4.

### Google Chrome

Returns a different hostname and pathname, and does not support opaque-hosts.
[Chromium bug report][chromium-bug-report] is stil open after this change in specification was introduced back in 2016.

```javascript title="Result of Google Chrome 113.0.5672"
> new URL('yagiz://blog/post/1?source=rss')
URL {
  href: 'yagiz://blog/post/1?source=rss',
  origin: 'null',
  protocol: 'yagiz:',
  host: '',
  hostname: '',
  pathname: '//blog/post/1',
  search: '?source=rss',
}
```

### Firefox

Returns the same result as Firefox, hinting towards that they're not keeping up
with the specification.

```javascript title="Result of Firefox 113.0.2"
> new URL('yagiz://blog/post/1?source=rss')
URL {
  href: 'yagiz://blog/post/1?source=rss',
  origin: 'null',
  protocol: 'yagiz:',
  host: '',
  hostname: '',
  pathname: '//blog/post/1',
  search: '?source=rss',
}
```

I'm afraid there are more differences than opaque-hosts and with the
advancements in web3 applications, we will see more of these differences.

## Further reading

I recommend reading the following blog posts by Daniel Stenberg, the author of
cURL, mentioning the differences in RFC 3986 and WHATWG URL specification.

- [Don't mix URL parsers][dont-mix-url-parsers]
- [One URL standard please][one-url-standard-please]
- [My URL isn't your URL][my-url-isnt-your-url]

[ada-url-parser-v2]: https://www.yagiz.co/announcing-ada-url-parser-v2-0
[chromium-bug-report]: https://bugs.chromium.org/p/chromium/issues/detail?id=1291564&q=opaque%20host%20url&can=2
[dont-mix-url-parsers]: https://daniel.haxx.se/blog/2022/01/10/dont-mix-url-parsers/
[one-url-standard-please]: https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/
[my-url-isnt-your-url]: https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/
[node-20]: https://nodejs.org/en/blog/announcements/v20-release-announce
[playground-opaque-host-example]: https://playground.ada-url.com/?url=yagiz://blog/post/1?source=rss
[opaque-host-pr]: https://github.com/whatwg/url/pull/185
[url-spec]: http://url.spec.whatwg.org/
[web-platform-tests]: https://web-platform-tests.org
[whatwg]: https://whatwg.org/
[w3c]: https://www.w3.org/

---

Authored by Yagiz Nizipli