by subset on 6/26/23, 12:32 PM with 119 comments
by chrismorgan on 6/26/23, 12:51 PM
Previous discussions:
• https://news.ycombinator.com/item?id=21105625 (29 September 2019, 542 points, 169 comments)
• https://news.ycombinator.com/item?id=30330144 (14 February 2022, 399 points, 153 comments)
—⁂—
If you find this article interesting, you’ll probably also like the article someone else published a month later: “Text Editing Hates You Too” <https://lord.io/text-editing-hates-you-too/>. It has also been discussed here a couple of times:
• https://news.ycombinator.com/item?id=21384158 (29 October 2019, 875 points, 282 comments)
• https://news.ycombinator.com/item?id=27236874 (21 May 2021, 384 points, 182 comments)
by knuckleheadsmif on 6/26/23, 4:52 PM
What this article misses is a lot of the complex formatting issues that come along with text rendering. For example, even implementing tabs can be complex (remember that you have different tab types and conflicting constraints, like keeping something decimal- or left-aligned but not having enough space to do that without overlapping characters). In languages like German, if you have a computed or soft hyphen, the spelling of a word can change. Good paragraph breaking, widow/orphan control, page breaking, and computed headers/footers that can change height are also complex issues that have to be dealt with.
Back when I worked on this stuff we also had much slower computers, which made it even more difficult to let you type anywhere in the paragraph and still get responsive and correct output (you can't delay formatting if the characters change the context, although some formatting can be delayed).
by heleninboodler on 6/26/23, 8:05 PM
by aidos on 6/26/23, 1:00 PM
It gets even more wild as you descend into the myriad ways this information is specified within fonts. For bonus points, dig into how fonts are embedded within PDFs.
There’s definitely something wrong with me because I find the whole thing fascinating. Picking through the specs and existing open-source code feels more like an archaeological crusade than anything else.
by dahwolf on 6/26/23, 8:58 PM
We couldn't get it to look right on Windows. Only at a few select font sizes and weights did it look OK; in every other setup it looked like somebody had taken random bites out of the glyphs, meaning the individual characters had inconsistent weights.
On Apple devices, we had no such issues. It seems their anti-aliasing algorithm intervenes more deeply, prioritizing a good and consistent result over theoretical integrity.
You might have this same issue with thin fonts (this wasn't a thin issue) looking good on Mac and unreadable on Windows. A simple way to put it is that Mac renders font thicker (more black).
I've faced the issues above for multiple custom fonts. It's why I no longer believe in the supposed best practice of fluid font sizes. I hardcode a limited set of font sizes and only those proven to look good across systems.
Oh, another fun one is glyphs having internal or asymmetric padding, which messes with line heights and vertical centering. Or how about needing to tweak letter-spacing because characters overlap.
I've come across all of this and more for some widely used fonts. Most are anything but ready-to-go.
by plg on 6/26/23, 1:08 PM
The corollary is, if I’m writing a document that I know others will be judging (e.g. a research grant application or a scientific publication) I will absolutely 100% do what I can to make it a more pleasant experience for the reader, including using LaTeX for text rendering. I may change (or not) the default font, but for text rendering and all the tiny decisions about spacing, kerning, etc, I will trust LaTeX over MS Turd any day.
Sure, “it shouldn’t matter”, a reader ought to judge my ideas and not be affected by this other stuff - but, alas, the reader (for now ;) ) is human, and we are affected by this stuff.
by barbariangrunge on 6/26/23, 3:44 PM
Fonts are pretty much just quadratic or cubic beziers (second or third degree), plus a way to shade between the lines iirc (I may have my terminology wrong). Try it out sometime, I did my curves using tessellation shaders.
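If you want to experiment without GPU shaders, here is a minimal sketch (plain Python, not tied to any font format) of evaluating a quadratic Bézier - the curve type TrueType outlines use; CFF/PostScript fonts use cubics - and flattening it into line segments. The fill step ("shading between the lines") would then be a separate scanline/winding-rule pass.

    # Evaluate a quadratic Bezier and flatten it to a polyline.
    def lerp(a, b, t):
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    def quad_bezier(p0, p1, p2, t):
        # de Casteljau: interpolate twice
        return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

    def flatten(p0, p1, p2, steps=16):
        return [quad_bezier(p0, p1, p2, i / steps) for i in range(steps + 1)]

    # A rough arc from (0, 0) to (100, 0) bulging toward the control point (50, 80)
    print(flatten((0, 0), (50, 80), (100, 0), steps=4))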
Btw, you'll never find a better guide on beziers than here:
by aatharuv on 6/26/23, 8:19 PM
Devanagari (roughly speaking) has Consonants (with an inherent vowel A), dependent vowels signs added to a consonant, and independent vowel letters. And a few other signs for aspiration, nasalization, and cancelling the inherent vowel to combine consonants.
न्हृे starts with N (the consonant NA with the inherent vowel A cancelled) and ends with the Devanagari consonant HA with _two_ vowel signs added to it: DEVANAGARI VOWEL SIGN VOCALIC R and DEVANAGARI VOWEL SIGN E.
CV is the standard for Devanagari "syllables". When you do want to write two vowels after each other, you would write CV, and then another independent Vowel letter, so it would look like
न्हृए (which would end with DEVANAGARI LETTER E instead of DEVANAGARI VOWEL SIGN E)
*These are syllables as per the script definition rather than linguistic syllables.
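For anyone following along, a small illustration of how the two strings above differ at the code point level (plain Python, standard library only):

    import unicodedata

    # The first string ends with the dependent VOWEL SIGN E, the second
    # with the independent LETTER E.
    for text in ["न्हृे", "न्हृए"]:
        print(text)
        for ch in text:
            print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")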
https://www.unicode.org/versions/Unicode15.0.0/UnicodeStanda... section 12.1 has more details on the specifics of implementing Devanagari script, but not necessarily all of the conjunct forms between consonants, which are used especially when rendering Sanskrit.
by putlake on 6/26/23, 12:54 PM
There is no incantation, no combination of the CSS properties word-break, word-wrap, overflow-wrap, hyphens and white-space, that will do this. In 2023.
I believe word-break: break-word does #1 but it's not hyphenating for me. And MDN says word-break: break-word is deprecated.
by RcouF1uZ4gsC on 6/26/23, 1:12 PM
It had a language and alphabet that was amenable to a relatively simple encoding combined with a massive market of people that didn’t care if anything else worked.
Thus even the very slow and limited memory computers of that time could actually do useful text manipulation (like sorting phone books) in a reasonable amount of time.
by Izkata on 6/26/23, 2:00 PM
I'm guessing this was written around when Edge still used its own rendering engine; nowadays it looks the same as Firefox and Chrome.
by einpoklum on 6/26/23, 7:46 PM
1. The LibreOffice project (libreoffice.org), the free office application suite. This is where the rubber hits the road and developers deal with the extreme complexities of everything regarding text - shaping, styling, multi-object interaction, multi-language, you name it. And - they/we absolutely need donations to manage a project with > 200 million users: https://www.libreoffice.org/donate
2. harfbuzz (https://harfbuzz.github.io), and specifically Behdad Esfahbod, the main contributor. Although, TBH, I've not quite figured out whether you can donate to that or to him. At least star the project on GitHub, I guess.
by tayistay on 6/26/23, 3:14 PM
1. "Retina displays really don’t need [subpixel AA]" So eventually that hack will be needed less often. Apple has already disabled subpixel AA on macOS AFAICT.
2. Style changes mid-ligature: "I’m not aware of any super-reasonable cases where this happens." So that doesn't matter.
3. Shaping is farmed out to another library (Harfbuzz)
4. "Oh also, what does it mean to italicize or bold an emoji? Should you ignore those styles? Should you synthesize them? Who knows." ... yes, you should ignore those styles.
5. "some fonts don’t provide [bold, italic] stylings, and so you need a simple algorithmic way to do those effects." Perhaps this should be called "Text Rendering for Web Browsers Hates You." TextEdit on macOS simply doesn't allow you to italicize a font that doesn't have that style. Pages does nothing. Affinity won't allow it.
6. Mixing RTL and LTR in the same selection also seems like a browser problem, but I guess it could maybe happen elsewhere.
7. "Firefox and Chrome don’t do [perfect rendering with transparency] because it’s expensive and usually unnecessary for the major western languages." Reasonable choice on their part. Translucent text is just a bad idea.
Probably could go on. It's a good discussion of the edge cases if you really need to support everything, I suppose.
by jml7c5 on 6/26/23, 8:23 PM
Many layout and shaping issues are present even in monospace text, at least if you want to do anything beyond ASCII.
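One concrete example: even deciding how many cells a character occupies in a monospace grid needs Unicode data. A rough sketch using Python's standard unicodedata module (real terminals and the wcwidth() C function have more special cases than this):

    import unicodedata

    def cells(ch):
        # Wide/fullwidth characters take 2 cells, combining marks take 0,
        # everything else 1.
        if unicodedata.combining(ch):
            return 0
        return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

    for s in ["hello", "こんにちは", "e\u0301"]:  # last one is "é" as e + combining acute
        print(s, sum(cells(c) for c in s))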
by phkahler on 6/26/23, 2:06 PM
Well, for the languages used in newspapers for hundreds of years I'd say "sure you can". Just because we can do better than that with computers doesn't mean it's wrong.
Sure, text rendering is very complex, but this is also a first-world problem.
by PaulHoule on 6/26/23, 1:11 PM
Today I can’t set a serif display font for print and stand to look at the results unless I kern manually. This is true whether I use PowerPoint or Adobe Illustrator, and on both PC and Mac. I think most people just give up and use a sans-serif font. The thing is, I don’t remember having this problem with desktop publishing in the 1990s. Maybe I was less picky then, but I’d say the results I get setting serif headlines today are so bad it goes beyond being picky. Also, when I see posters other people make, they seem to just not use serif fonts anymore, so I think they feel the same way.
by thworp on 6/26/23, 2:52 PM
Many terminal emulators have greyscale anti-aliasing as an option or default, I know and have tested:
- Windows Terminal (Windows), where you can set it in settings.json (see the snippet below this list)
- kitty (linux) where it is default
- You can probably get it globally on Linux through fontconfig settings, but I've never looked into that.
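If I remember the key name right, the Windows Terminal setting lives on a profile (or in profiles.defaults) in settings.json as antialiasingMode; treat this fragment as a sketch:

    // settings.json fragment (Windows Terminal accepts // comments)
    "profiles": {
        "defaults": {
            "antialiasingMode": "grayscale"   // or "cleartype" / "aliased"
        }
    }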
If you're on a sufficiently dense display (4k at 27" is about 163 ppi) you can also disable font anti-aliasing completely. The effect that has on letter shapes (esp. on Windows) is pretty striking.
In Chromium-based browsers and Firefox there is a changing collection of about:config settings that control their DirectWrite rendering.
(edited to fix line breaks)
by Asooka on 6/26/23, 10:39 PM
Joking aside, I am one of those people who completely disagree with subpixel anti-aliasing. I wish I could turn it off completely everywhere on Windows. I can do it on GNU/Linux, and macOS dropped it a while ago. It always looks wrong to me, regardless of the monitor or settings I pick, like the letters have colour fringing. I hated it when Microsoft introduced ClearType in Windows XP and I still can't stand it.
by ggm on 6/26/23, 11:15 PM
the "may literally appear as" is actually EASIER TO UNDERSTAND
its a person, of colour, who is a female. The single emoji is almost unviewable on my monitor at any precision and I could not ascribe feminine qualities to it, I almost was unable to ascribe personhood qualities.
This to me is the central problem of emoji: They actually suck at conveying meaning, where alphabets and words do really really well.
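To make the "may literally appear as" concrete: an emoji like this is a single ZWJ sequence of several code points, and a renderer without matching font support may draw the pieces separately. The specific emoji below is my own example, not necessarily the one being discussed (plain Python):

    import unicodedata

    # "Woman scientist: medium-dark skin tone" - one sequence, four code points.
    # Without proper font support it may render as the separate pieces:
    # a woman, a skin-tone swatch, and a microscope.
    emoji = "\U0001F469\U0001F3FE\u200D\U0001F52C"
    for ch in emoji:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")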
by Calzifer on 6/27/23, 12:08 AM
Quite recently I noticed that Java2D text rendering cheats the same way as Firefox and Chrome do in the article's overlapping example.
The character Ø can be expressed in Unicode in two ways: either as the single code point U+00D8, or as an O plus a combining slash (U+0338, COMBINING LONG SOLIDUS OVERLAY). Since the first is one code point, Java2D renders it correctly with transparency, but it renders the second variant as two characters with notable overlap, since Java is lazy about the 'combining' part.
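For reference, the two encodings look like this (a small Python illustration; how they render is of course up to Java2D or whatever engine draws them):

    import unicodedata

    precomposed = "\u00D8"    # LATIN CAPITAL LETTER O WITH STROKE
    combining = "O\u0338"     # LATIN CAPITAL LETTER O + COMBINING LONG SOLIDUS OVERLAY

    for s in (precomposed, combining):
        print(len(s), [unicodedata.name(c) for c in s])

Interestingly, Unicode normalization doesn't unify the two: U+00D8 has no canonical decomposition, so a renderer genuinely has to cope with both forms.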
by chris_wot on 6/26/23, 2:46 PM
* segment the text into paragraphs. You’d think this would be easy, but Unicode has a lot of separators. Heck, in HTML you have break and paragraph tags, but Unicode has about half a dozen things that can count as paragraph separators (a small sketch of this follows below).
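As a rough illustration of how many candidates there are - Python's str.splitlines() happens to split on the full set of Unicode line/paragraph boundary characters, which is close to (though not identical to) what a layout engine has to treat as a break:

    text = ("one\n"        # LINE FEED
            "two\r\n"      # CARRIAGE RETURN + LINE FEED
            "three\u0085"  # NEXT LINE (NEL)
            "four\u2028"   # LINE SEPARATOR
            "five\u2029"   # PARAGRAPH SEPARATOR
            "six")
    print(text.splitlines())
    # ['one', 'two', 'three', 'four', 'five', 'six']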
* parse the text into style runs - each time the font, colour, slant, weight, or anything like this changes, you start a separate run
* parse the text into bidirectional runs - you must work out the points at which the text shifts direction and start a new run at each shift
* you need to figure out how to reconcile the two types of runs into a single combined bidi-and-style run list (a sketch of this merge follows below).
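The reconciliation is essentially an interval intersection over the two run lists. A toy sketch, purely my own simplification - runs here are just (start, end, value) tuples rather than any engine's real data structures, and both lists are assumed to cover the whole text with no gaps:

    # Boundaries from either run list split the combined output.
    def merge_runs(style_runs, bidi_runs):
        bounds = sorted({b for s, e, _ in style_runs + bidi_runs for b in (s, e)})
        merged = []
        for start, end in zip(bounds, bounds[1:]):
            style = next(v for s, e, v in style_runs if s <= start < e)
            level = next(v for s, e, v in bidi_runs if s <= start < e)
            merged.append((start, end, style, level))
        return merged

    # "hello שלום": bold starts at index 8, direction flips to RTL at index 6,
    # so the combined list has three runs.
    styles = [(0, 8, {"bold": False}), (8, 10, {"bold": True})]
    bidi = [(0, 6, 0), (6, 10, 1)]   # bidi embedding levels: 0 = LTR, 1 = RTL
    print(merge_runs(styles, bidi))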
Don’t forget that you might need to handle vertical text! And Japanese writing has ruby - small annotation characters set between the columns.
* fun bit of code - working out kashida length in Arabic. Took one of the real pros of the LO dev team to work out how to do this. Literally took them years!
* you then must work out what font you are actually going to use - commonly known as the itemisation stage.
This is a problem for office suites when you don’t have the font installed. There is a complex font substitution and matching algorithm. Normally you get a stack of fonts to choose from and fall back to - everybody has their own font fallback algorithm. The PANOSE system is one such scheme: it takes a set of font classification numbers and uses distance calculations to work out the best substitute font. It is not universally adopted; most people have bolted on their own font selection stack, but in general it’s some form of this (a toy sketch of the basic fallback loop follows below).
LibreOffice has a buggy matching algorithm that frankly doesn’t work, due to some problems with logical operators and a running font-match score they have baked in. At one point I did extensive unit testing around this in an attempt to document and pin down the existing behaviour. I submitted a bunch of patches and tests piecemeal, but they only accepted half of them because they kept changing how they wanted the patches submitted, and eventually someone who didn’t understand the logic point-blank refused to accept the code. I just gave up at that point, and the code remains bug-riddled and unclear.
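The core of the fallback idea, stripped of all the scoring and style matching (entirely my own toy version, with made-up font names and coverage tables):

    # For each character, pick the first font in the user's stack whose
    # cmap covers that code point; group adjacent characters that resolve
    # to the same font into runs.
    def itemise(text, font_stack, coverage, last_resort="LastResort"):
        runs = []
        for ch in text:
            font = next((f for f in font_stack if ord(ch) in coverage[f]),
                        last_resort)
            if runs and runs[-1][1] == font:
                runs[-1] = (runs[-1][0] + ch, font)
            else:
                runs.append((ch, font))
        return runs

    # Hypothetical coverage tables, just for the demo.
    coverage = {"Latin Font": set(range(0x0020, 0x0250)),
                "CJK Font": set(range(0x3000, 0xA000))}
    print(itemise("abc 漢字 é", ["Latin Font", "CJK Font"], coverage))
    # [('abc ', 'Latin Font'), ('漢字', 'CJK Font'), (' é', 'Latin Font')]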
On top of this, you need to take into account Unicode normalisation rules. Tricky.
* now you deal with script mapping. Here you take each character (usually a code point) and work out the glyph to use and where to place it. You get script runs - some scripts have straightforward codepoint-to-glyph rules, others less so. Breaking the text into script runs makes it far easier to work out this conversion.
* now you get to shaping the text into clusters. You’ll get situations where a glyph can be positioned and rendered in different ways - in Latin-based scripts an example is “ff”: this can be two “f” glyphs, but often it’s a single ligature glyph. It gets weirder with Indic scripts, where characters change based on the characters before and after… my mind was blown when I got to this point.
This gets complex, fast - luckily there are plenty of great quality shaping engines that handle it for you. Most open source apps use HarfBuzz, which gets better and better with each release.
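In practice you hand this step to HarfBuzz rather than writing it yourself. A minimal sketch via the uharfbuzz Python bindings - the font path is a placeholder and I'm going from memory on the exact API, so treat it as a sketch rather than gospel:

    import uharfbuzz as hb

    # Placeholder path - point this at any TTF/OTF you have lying around.
    with open("SomeFont.ttf", "rb") as f:
        blob = hb.Blob(f.read())

    face = hb.Face(blob)
    font = hb.Font(face)

    buf = hb.Buffer()
    buf.add_str("office")            # contains "ff"/"ffi" ligature candidates
    buf.guess_segment_properties()   # infer script, direction and language

    hb.shape(font, buf, {"liga": True})

    # Each output entry is a glyph ID plus the "cluster" (index of the first
    # input character it maps back to) - a ligature shows up as one glyph
    # whose cluster covers several input characters.
    for info, pos in zip(buf.glyph_infos, buf.glyph_positions):
        print(info.codepoint, info.cluster, pos.x_advance)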
* now you take these text runs, and a lot of the job is done. However, paragraph separation is not line separation. Given a long enough paragraph of text, you must determine where to break it into lines - basically word wrapping.
Whilst this seems very simple, it’s not, because then you get hyphenation, which can vary by language and script (a toy word-wrap sketch follows below).
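Even the naive version - greedy filling, no hyphenation, no bidi, and pretending every character is the same width, none of which holds in a real engine - looks something like this:

    # Greedy word wrap: fill each line until the next word would overflow.
    # Real line breaking uses UAX #14 break opportunities, measured glyph
    # advances from shaping, and sometimes hyphenation dictionaries; a word
    # longer than the line simply overflows in this toy version.
    def wrap(text, max_chars):
        lines, line = [], ""
        for word in text.split():
            candidate = word if not line else line + " " + word
            if len(candidate) <= max_chars:
                line = candidate
            else:
                if line:
                    lines.append(line)
                line = word
        if line:
            lines.append(line)
        return lines

    for row in wrap("text rendering hates you and line breaking hates you even more", 20):
        print(row)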
A lot of this I worked out from reading LO code, but I did stumble onto an amazing primer here:
https://raphlinus.github.io/text/2020/10/26/text-layout.html
The LO guys are pretty amazing, for the record. They are dealing with multiple platforms and using each platform’s text rendering subsystems where possible. Often they try to standardise across them, but it’s certainly not an easy feat. Hats off to them!
by mnutt on 6/26/23, 8:21 PM
(the issue ended up involving harfbuzz, but it wasn't clear initially since it dealt with HTML whitespace collapse)
by Rapzid on 6/27/23, 1:11 AM
Like seriously, font rendering is bonkers complicated.
by cyclotron3k on 6/26/23, 2:32 PM
by n6h6 on 6/26/23, 4:21 PM
by WhereIsTheTruth on 6/26/23, 2:19 PM
https://github.com/ryuukk/linux-improvements/blob/main/chrom...
by jokoon on 6/26/23, 3:40 PM
I use them in a few places, nothing more, and it's a bit difficult to make a good-looking bitmap font, but it's an important part of having software that uses simple things.
by chungy on 6/26/23, 5:08 PM
by AtNightWeCode on 6/26/23, 7:15 PM
by z3t4 on 6/26/23, 2:15 PM