This is a rough transcript of the talk I gave at Web Directions Summit in Sydney on October 31st, 2019 (Halloween). I say rough transcript, because it’s based on the script I wrote during preparation. The words I actually end up saying during a talk are sometimes only loosely correlated with the original plan.
The video recording of the talk is available on the Conffab video service (which requires sign up).
The slides I used have been added next to the relevant text to add a sort of visual vibe. Obviously they were designed for a giant screen, so I’m not expecting them to be super-readable on personal devices.
Hello, today I’ll be talking to you about internat…
Err… supporting users who live where you don’t!
“Internationalisation” is hard to say and hard to spell; we can’t even agree on whether it contains an S or a Z. It’s much easier to shorten the term to “i18n”, because there are 18 letters between the first and last letters. The same happens with “localisation”, which becomes “L10n”.
Fun fact: A word that’s been shortened that way is known as a “numeronym”. Or, as I prefer, “n7m”.
Ok, but what is internationalisation? Simply put, it’s about displaying words and numbers in a way that makes sense to different people around the world.
Localisation in computing is based on the concept of locales, unsurprisingly. They’re identified by a standardised set of characters, most commonly just using language and country codes.
But a locale is not a language, and a locale is not a country. Also, a language is not a country, which is why you shouldn’t use flags to represent languages. (See also: Falsehoods programmers believe about flags)
A locale represents a region that shares a group of conventions. Language is just one of those conventions. There’s also the character set and alphabet people use for that language, how they write dates and numbers, in what calendar system… the list goes on.
Canada is a great example here, because English Canadian and French Canadian standard formats are different from each other, and different again from the formats used in England and France.
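To make the Canada example concrete, here is a minimal sketch using the built-in Intl APIs that come up later in this talk. The variable names are my own, and the exact output strings depend on the locale data shipped with your browser or Node version, so none are shown here.

```javascript
// Locale identifiers are language tags: a language code plus a region code.
// getCanonicalLocales() tidies up the casing of a tag.
const canonical = Intl.getCanonicalLocales(["EN-ca", "fr-ca"]);
console.log(canonical); // ["en-CA", "fr-CA"]

const value = 1234.56;
const when = new Date(Date.UTC(2019, 9, 31, 12)); // 31 October 2019, midday UTC

// Same country, different languages; same language, different countries.
const numbersByLocale = {};
for (const locale of ["en-CA", "fr-CA", "en-GB", "fr-FR"]) {
  numbersByLocale[locale] = new Intl.NumberFormat(locale).format(value);
  const date = new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(when);
  console.log(locale, numbersByLocale[locale], date);
}
```

Running this should show English Canadian and French Canadian disagreeing on number separators, even though both locales share a country.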
Getting to the point now… let’s walk through a scenario. You have a web-based product or service built initially just for your own local market. But now you want to expand, and you want things to still make sense for people in other countries.
An inevitable question that comes up early in this thinking is “what about translations”? Yes, to target other languages properly you’re going to have to translate all your text into those languages. However, full text translation is not what I’ll be talking about today. That’s a can of worms that deserves a whole presentation all to itself.
But other than translations of prose, there’s a bunch of other things that need to be taken care of. Dates and times, currencies, measurements — all these things differ across locales.
Then you’re also going to need a way to use the data neatly. And so we turn to myriad open source libraries to handle the complicated work for us.
But here’s the catch: the wider you cast your net of language support, the more likely it is that your users are running devices that don’t handle all that JS. To steal the phrase coined by Bruce Lawson, this is the World Wide Web, not the Wealthy Western Web.
Not everyone can afford the super-powered devices people like us tend to use for development. And all of that data you’re sending to their low-powered device has a real monetary cost.
So now it becomes a balancing act, with perfect international support on one side of the scales, and small JS bundle sizes on the other.
Operating systems and native applications built on top of them get their locale data from a veritable alphabet soup of standardised providers:
- International Organisation for Standardisation [ISO].
- Internet Assigned Numbers Authority [IANA].
- Internet Engineering Task Force [IETF].
- Unicode Common Locale Data Repository [CLDR].
- International Components for Unicode [ICU].
- World Translation Foundation [WTF].
- (Except not that last one, I just made it up.)
When you set your operating system to a particular language and region combination, it knows how to display numbers according to those settings.
Wouldn’t it be great if you could use the exact same settings and standardised data in your web application?
Well, you can, through the Intl APIs built in to the browser. But what are these APIs, and why should you care? There are several different APIs, and more are being proposed. They can be roughly sorted into 2 categories:
- Formatting.
- Everything else.
Before we jump in to how to use them, let’s talk a little about when and why you’d use them.
Simply put… do you display numbers anywhere in your application? Great! That’s where you’d use these APIs.
We’ll start with the formatting APIs, as they’re generally the ones you’ll need to use the most.
Before we go too much further though, I need to give a great big disclaimer. I’ve played around with these APIs in toy projects, but I haven’t properly used any of them in a shipped production system. So, taking inspiration from the legion of financial service products around these days, here’s my giant disclaimer:
“This presentation’s advice is general in nature and may not fit your specific circumstances. Talk to your front-end advisor before committing to these APIs. Past performance is not a reliable indicator of future performance.”
All of the formatting APIs follow the same general pattern. You create a new formatter for a specific locale with a bunch of options.
new Intl.ThingFormat(locale, options)
Then you can call the format() function on that formatter instance as much as you like, to produce strings in that chosen format.
Additionally, they all have the formatToParts() method, which returns a list of formatting tokens instead of a single string. This allows you to break down a formatted string into its component parts for further processing.
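As a rough illustration of that pattern, here is a minimal sketch using Intl.NumberFormat as the stand-in for “ThingFormat”. The locale, option, and variable names are just my picks for the example.

```javascript
// Create a formatter once for a locale and a set of options, then reuse it.
const formatter = new Intl.NumberFormat("en-AU", { maximumFractionDigits: 2 });

// format() returns a single display string.
const text = formatter.format(1234567.891);
console.log(text); // "1,234,567.89"

// formatToParts() returns the same output as a list of { type, value } tokens,
// with types such as "integer", "group", "decimal" and "fraction".
const parts = formatter.formatToParts(1234567.891);
console.log(parts);

// Joining the token values reproduces the formatted string.
const rejoined = parts.map((part) => part.value).join("");
console.log(rejoined === text); // true
```

The token list is handy when you want to style one piece differently, for example wrapping the fraction digits in their own element.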
I’m not going to go into too much detail for these — the idea is to give you a taste of what they can do, as a launching pad for further research. I’m never going to be able to properly cover all the nuances in a 20-minute talk. So, despite the fact that this is a talk focusing on a bunch of APIs, there’s actually going to be very few code samples.
First up, number formatting. Amazingly enough, you do this through an object called Intl.NumberFormat.
What’s so special about number formatting though? You shove some commas in for the thousands separators, round the decimal places, and you’re done, right?
Ah, not so fast. As anyone who’s visited Europe will know, in many countries the thousands and decimal separator characters are the other way around from how we use them in Australia. And that’s just a minor difference.
How about places that put the separators in different spots, like India? Or use different numerals entirely such as Arabic or Japanese?
Then there are currencies. Some currencies are traditionally written with the symbol before the number, some after. It varies by country and region, even for the same currency.
In this slide, the same value of Euro currency is formatted differently in Australia, Britain, France, Germany, Austria, and Switzerland.
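Here is a small sketch of those differences, assuming a runtime with full locale data. The locales are my own selection, and the exact strings (including which kind of space is used) vary between versions of the underlying data, so most outputs are left for you to inspect.

```javascript
const amount = 1234567.891;

// Same number, different conventions for separators, grouping and digits.
// The "-u-nu-arab" suffix explicitly requests Arabic-Indic digits.
const numbers = {};
for (const locale of ["en-AU", "de-DE", "en-IN", "ar-EG-u-nu-arab"]) {
  numbers[locale] = new Intl.NumberFormat(locale).format(amount);
  console.log(locale, numbers[locale]);
}

// Same currency (euro), formatted for different locales.
const euros = {};
for (const locale of ["en-AU", "en-GB", "fr-FR", "de-DE", "de-AT", "de-CH"]) {
  const euroFormat = new Intl.NumberFormat(locale, { style: "currency", currency: "EUR" });
  euros[locale] = euroFormat.format(1234.5);
  console.log(locale, euros[locale]);
}
```

Note that the currency is an option you pass in, not something derived from the locale: the locale only decides how that currency is written.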
But hang on, you might say (in this hypothetical conversation that we’re apparently now having), I’ve already been using a library to do number formatting. It’s really tiny too, only 8.2 femtobytes, so it barely adds anything to my JS bundle!
And you’re right, there are some open source libraries which handle this problem, such as Numeral.js and d3-format. But no matter how small they may get, a file size cost is still a file size cost, compared to an API already built in to the user’s browser. And the more locales you support, the more data need to be loaded as part of the library. As soon as you include more than one locale’s data, by definition you’ll be loading code that isn’t needed by some users.
But even if you’re lazy-loading code using a brilliant (and therefore complicated) strategy to optimise that locale data, how can you be sure it’s correct? Every open source library defines locale data in custom formats, most often sourcing that data from community-provided translations.
As I mentioned earlier, there are already standard datasets that provide this information… but it’s very rare that a 3rd-party library will actually process and use these datasets. So sometimes, the libraries disagree with each other about a particular locale, or they disagree with other applications on your computer.
Using ad-hoc locale data from an open source library is a bit like using Wikipedia as a reference. It’s probably correct, but you can’t really be sure.
As I said before, the Intl APIs load the exact same datasets already included in your OS. So not only is there zero download cost for the data, but it’s also guaranteed to be more consistent.
Although a special mention here must go to the FormatJS and Globalize projects, which do use the CLDR data. They aren’t listed in the file size comparison because they cover more than one formatting domain.
Right, number formatting got a little bit out of hand. This time let’s try something that’s so dead-simple, absolutely no-one has ever got it wrong in the history of computing… dates and times!
OK, so maybe it’s a little complicated. Day-first? Month-first? Names for days and months? Or numbers? Or both? In which order? With what separator characters? And is it 2 or 4 digits for the year? 12- or 24-hour time? For which time zone? In what language? Would you like fries with that?
Many of those choices are yours, but others depend on the locale of the user. Are you absolutely sure which is which? Yes, this is a complicated area, and the Intl.DateTimeFormat API has a dizzying list of options and configurations, which I won’t really dig into here due to time constraints. (Also, I prefer my audience to still be awake by the end of the talk.)
The key thing is that you choose what you want to display — month names yes or no, that kind of thing — and the API will work out how to display it, based on the chosen locale.
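A minimal sketch of that split of responsibilities might look like this. The options object says what to show; the locale decides the order, separators and words. The date and the list of locales are just my picks for the example.

```javascript
// A fixed moment in time: 31 October 2019, midday UTC.
const date = new Date(Date.UTC(2019, 9, 31, 12, 0, 0));

// We choose WHAT to show: numeric day, full month name, numeric year.
// Pinning the time zone to UTC keeps the output independent of the machine's zone.
const options = { day: "numeric", month: "long", year: "numeric", timeZone: "UTC" };

// The API works out HOW to show it for each locale.
const formatted = {};
for (const locale of ["en-AU", "en-US", "de-DE", "ja-JP"]) {
  formatted[locale] = new Intl.DateTimeFormat(locale, options).format(date);
  console.log(locale, formatted[locale]);
}
// en-AU → "31 October 2019"
// en-US → "October 31, 2019"
```

Notice that the code never specifies a pattern like “DD MMMM YYYY”; the ordering comes entirely from the locale data.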
Now this is definitely one area of the web that has no shortage of existing libraries, with MomentJS being the most well-known.
One criticism of Moment that often pops up is that it’s quite large. Including the Moment code and all of its locale data adds 230 KiB to your page, or 65 KiB with compression, and that’s not even including time zone support. Part of the reason is that Moment is an all-in-one library, so even if you only want the formatting parts, you still have the date parsing and calculation code coming along for the ride.
Date-fns is an alternative that uses a modular design, so you can include only the formatting function and nothing else. But even that is 6 KiB compressed before you add the locale data on top.
But here’s the thing. Even if you optimise your build process so you’re including the smallest possible formatting function, with the smallest possible locale dataset, there’s still a glaring problem. Most of the formatting libraries make you define the exact format, and they just provide the right numbers and words to go in the slots you’ve chosen. But can you guarantee that the format you’ve chosen is appropriate for every locale? To really hammer on this point, different regions have different expectations about how dates look, both in short and long forms.
Newer libraries such as Luxon are taking a different approach though, and are built as wrappers around the Intl APIs. Then they’re free to only worry about calculation logic, while delegating all the formatting responsibilities to the browser-provided APIs.
Moving on… but not very far. The next API to mention is still within the domain of dates and times. Specifically, it’s the RelativeTimeFormat API, useful for all those times when a UI has to say a post or comment happened “yesterday” or “3 days ago”, or “that was so last year, whatevs”.
This is a more recent addition to the spec, so there are some limitations. It’s much easier to start with a small API and add features in later, than to start big and realise half of it is wrong (like the current Date object). So currently it only works with pre-calculated numbers and a specified time period.
For example, you can tell it to return a localised string representing “2 months ago”, but you can’t get more than one type of unit in the same string, like “2 months, 5 days ago”. You also can’t pass in a Date object and have the API give you a difference from “now”; you’ll have to calculate that yourself.
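Here is a minimal sketch of both the API and the do-it-yourself part. The daysFromNow helper is hypothetical (my own name, not part of the API), and it only ever deals in whole days.

```javascript
// numeric: "auto" allows words like "yesterday" instead of "1 day ago".
const rtf = new Intl.RelativeTimeFormat("en", { numeric: "auto" });

// The API takes a pre-calculated number and a single unit.
console.log(rtf.format(-2, "month")); // "2 months ago"
console.log(rtf.format(-1, "day"));   // "yesterday"
console.log(rtf.format(3, "day"));    // "in 3 days"

// Hypothetical helper: the API won't diff dates for us, so we pick a unit
// and compute the number ourselves before handing it to the formatter.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
function daysFromNow(date, now = new Date()) {
  const days = Math.round((date.getTime() - now.getTime()) / MS_PER_DAY);
  return rtf.format(days, "day");
}

const halloween = new Date(Date.UTC(2019, 9, 31));
const earlier = new Date(Date.UTC(2019, 9, 28));
console.log(daysFromNow(earlier, halloween)); // "3 days ago"
```

A real implementation would also need to decide when to switch units (days to weeks to months), which is exactly the kind of logic the existing libraries already handle.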
Honestly, given the current limitations, you’re probably better off still relying on one of those libraries today. But it’s good to know what’s coming down the pipeline.
Right, that’s it for the big formatting APIs. Actually, that’s a lie: there is one more, the ListFormat API, but it’s very new. It’s only supported by one browser, and the spec hasn’t quite been finalised yet, so it’s best left for a future talk.
I mentioned earlier that the Intl APIs effectively divide into 2 categories. We’ve now covered formatting, so let’s look at the rest.
The remaining 2 APIs are PluralRules and Collator.
PluralRules actually ties in much more with full-text translations, so I’m going to pretty much ignore it for today, and move straight on to the last one.
Here’s a common UI scenario. You have a long list of items — sorted alphabetically — and a text input to allow people to quickly filter the list. How would you implement the sorting and filtering logic?
How many times have you written a filter or search method like this? Grab 2 strings, convert them both to lowercase, so the search is case-insensitive, and see if one contains the other. I’ve done this in many different projects.
Likewise with the alphabetical ordering: just use .sort() on the array, right? These two techniques are fairly common and obvious, and personally I naïvely assumed they were good enough.
But what happens when your users don’t just use the basic Latin character set we have in English, and have different expectations around ordering and matching? Or simpler still, what if your list of strings also contains numbers, and you want to sort them according to numeric order, as well as alphabetical?
For each language and character set, there are well-defined rules about how to sort them, and which characters can be matched with others. For example, matching “a” with all the accented variations of “a”. This set of rules is known as collation, and thus we have a handy API available to us in the form of Intl.Collator.
You create a Collator object with the options you need for a specific use case, then call the .compare() method on it.
For sorting, the .compare() method can handily be passed directly to the array’s .sort() method as a custom sort implementation. (This is not an accident.)
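A short sketch of the sorting case, using sample data of my own. It compares the default sort with locale-aware collation, then shows the numeric option for strings that contain numbers.

```javascript
const words = ["z", "ä", "a"];

// The default sort compares UTF-16 code units, so "ä" lands after "z".
const defaultOrder = [...words].sort();
console.log(defaultOrder); // ["a", "z", "ä"]

// German collation treats "ä" as a variant of "a".
const german = new Intl.Collator("de");
const germanOrder = [...words].sort(german.compare);
console.log(germanOrder); // ["a", "ä", "z"]

// Swedish collation treats "ä" as a separate letter that comes after "z".
const swedish = new Intl.Collator("sv");
const swedishOrder = [...words].sort(swedish.compare);
console.log(swedishOrder); // ["a", "z", "ä"]

// The numeric option compares runs of digits by value, so "2" sorts before "10".
const files = ["item 10", "item 2", "item 1"];
const natural = new Intl.Collator("en", { numeric: true });
const naturalOrder = [...files].sort(natural.compare);
console.log(naturalOrder); // ["item 1", "item 2", "item 10"]
```

The compare function is already bound to its collator, which is why it can be handed to .sort() without a wrapper.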
For searching and filtering, the .compare() function can also be used to find matches within a list. But there is a catch: currently you can only check for exact matches. If you want to search for strings which contain your search term anywhere within the string, this API can’t help just yet. (I might end up proposing a new addition to the spec for this use case.)
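As a sketch of the exact-match case: with the sensitivity option set to “base”, the collator ignores differences in case and accents, and .compare() returns 0 for strings it considers equal. The exactMatches helper and the sample data are my own.

```javascript
const cities = ["Zürich", "zurich", "Zurich West", "Bern"];

// sensitivity: "base" means only the base letters matter,
// so "ü" matches "u" and "Z" matches "z".
const matcher = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical helper: keep the items the collator treats as equal to the term.
function exactMatches(list, term) {
  return list.filter((item) => matcher.compare(item, term) === 0);
}

const found = exactMatches(cities, "zurich");
console.log(found); // ["Zürich", "zurich"]
```

“Zurich West” is left out, which is the catch described above: the whole string has to match, not just part of it.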
By this point, many people would have already been thinking “yes, that’s all well and good, but what about the browser support? Can we actually use this in reality?”
For many of the APIs, yes you can.
The more recent APIs, such as RelativeTimeFormat, still have patchy support. But the others are supported by all current major browsers. For those who use Node.js for rendering, the Node support is pretty good too. But your installed version of Node might not have all the locale data by default. To include all the standardised data, you might have to recompile Node yourself.
Note that I said “current browsers”, but that leaves one glaring omission — what about the elephant in the room (or IE-lephant, as I prefer)? Yes, good old Internet Explorer 11. You might still have to support IE11, because Microsoft still does, despite what we (or they) might want.
IE’s life cycle is tied to Windows 10, which is still receiving 6-monthly major updates. Which means that for the foreseeable future, IE11 is the browser that will never die.
Although speaking of the undead, today is Halloween, so perhaps an IE pumpkin is more appropriate…
Well, surprisingly enough, IE11 supports several of these APIs. The first version of the ECMA-402 spec was released 7 years ago, and IE11 implemented it. It can handle NumberFormat, Collator, and DateTimeFormat, but not arbitrary time zones.
Maybe you don’t even care about IE11 and just want to use one of the newest APIs that isn’t fully supported yet. Luckily there are polyfills available for most of the Intl APIs, and a special mention must go to the FormatJS library which provides and builds upon many of the polyfills. But hang on, doesn’t loading a full polyfill and its associated locale data just take us right back to square one with data costs?
Well… yes, quite frankly. Using a polyfill comes with the same filesize problems as using a different open source library. There are 2 slight advantages though:
- The polyfill data files come from the same root source as the in-built browser APIs, so there is consistency.
- If you reach a future point where all your supported browsers have implemented the specific API you’re polyfilling, you can just drop the polyfill with no other changes to your code.
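One way to keep that cost down is to only load a polyfill when it’s actually needed. Here is a minimal feature-detection sketch; the helper names are my own, and how you then load a polyfill depends on which one you choose, so that part is left out.

```javascript
// Does this runtime provide a given Intl constructor at all?
function hasIntlApi(name) {
  return typeof Intl === "object" && typeof Intl[name] === "function";
}

// Does it also have data for a given locale? supportedLocalesOf() returns
// the subset of the requested locales that the runtime can serve directly.
function hasLocaleData(name, locale) {
  return hasIntlApi(name) && Intl[name].supportedLocalesOf([locale]).length > 0;
}

console.log(hasIntlApi("NumberFormat"));
console.log(hasIntlApi("RelativeTimeFormat"));
console.log(hasLocaleData("DateTimeFormat", "de-DE"));
```

If both checks pass, the built-in API can be used as-is; if not, that’s the point where a polyfill (or a simpler fallback) would come in.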
Ultimately, though, the choice of using or not using a polyfill is not one that I can make for you. Like so much in the world of development, the primary answer is “it depends”.
Now I don’t want anyone thinking that I’m insisting that everyone starts using these APIs at all times. You might still only be focused on a single-country, single-language market, where these APIs would be complete overkill. That’s a perfectly valid choice.
I also don’t want you to think that I’m denigrating all open source libraries. Many of them are still incredibly useful, and do far more than just the parts covered by the APIs I showed today.
What I want is for all of you to be better informed about some of the things browsers let you do right now. Then you can make more informed choices about whether to add Yet Another Library to your project. You might weigh up the pros and cons and decide that, yes, that library is still worth adding in. And that’s a perfectly valid choice.
But equally, you might look at your current uses of particular libraries and think “hmmm, maybe we don’t need all of that after all”.
The choice is yours.