RFC 454545 – Human Em Dash Standard
102 points - today at 2:37 PM
I learned about the em dash in high school and quickly adopted it into my writing style for analysis and opinion documents. It felt natural given how many tangents I tend to go off on, particularly when including analogies to help the reader understand.
I was surprised to find out in my career that others rarely used it. Subconsciously I pulled back on how often I used it, especially after someone once suggested that frequent use could imply neurodivergence. Important and lengthy documents that I'd written and published (internally) at work still contain them. On occasion there have been comments asking whether I'd somehow accessed early AI models to help write these documents, purely because the dashes were there. I think I averaged two em dashes per letter-size page.
I find myself on the fence about proposals like these. They have good intentions, but they do not address the issue at its core. An LLM is going to reflect one of many writing styles. If today it's frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game, with the cat always two steps behind. The real issue is that the social contract is broken because people try to pass off LLM output as human work. Instead, revise that social contract to account for the existence of these new tools.
AI stole the em-dash from my toolkit.
I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.
At the turn of the 21st century, I added the Windows Alt-code for the em-dash. Compared to parentheses, it is less jarring. Commas are dainty things. I use the em-dash, and I am human.*
* I confess that I also use semicolons; I still claim to be human.
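For the curious, here is a rough Python sketch of what those Alt-codes resolve to. It assumes a US-English Windows setup, where Alt plus a number without a leading zero goes through the OEM code page (CP437) and Alt plus a zero-prefixed number goes through the ANSI code page (Windows-1252); other locales use other code pages. `alt_code` is a made-up helper for illustration, not a Windows API.

```python
# Hypothetical helper: map a Windows Alt-code to the character it types,
# ASSUMING a US-English setup (OEM = CP437, ANSI = Windows-1252).
# Not modelled: codes 1-31, which Windows shows as CP437 glyphs (e.g. a
# smiley for Alt+1), and the few Windows-1252 bytes with no mapping.

def alt_code(digits: str) -> str:
    """Return the character typed by holding Alt and entering `digits`."""
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only handles single-byte codes 0-255")
    # A leading zero selects the ANSI code page; otherwise the OEM one.
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([value]).decode(codec)


if __name__ == "__main__":
    # em dash, diameter-style O-slash, degree, trademark, middle dot, and
    # the "blank" Alt+255 (a no-break space) used for folder names
    for code in ["0151", "0216", "248", "0153", "250", "255"]:
        ch = alt_code(code)
        # ascii() keeps the output printable on any console encoding
        print(f"Alt+{code} -> U+{ord(ch):04X} {ascii(ch)}")
```

The Alt+255 trick worked because CP437 byte 0xFF is a no-break space, which looks empty but is not treated as ordinary whitespace.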
$ unicode u+10eac u+10ead
U+10EAC YEZIDI COMBINING MADDA MARK
UTF-8: f0 90 ba ac UTF-16BE: d803deac Decimal: &#69292; Octal: \0207254
𐺬
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 10E80..10EBF; Yezidi
Bidi: NSM (Non-Spacing Mark)
Combining: 230 (Above)
Age: Newly assigned in Unicode 13.0.0 (March, 2020)
U+10EAD YEZIDI HYPHENATION MARK
UTF-8: f0 90 ba ad UTF-16BE: d803dead Decimal: &#69293; Octal: \0207255
𐺭
Category: Pd (Punctuation, Dash); East Asian width: N (neutral)
Unicode block: 10E80..10EBF; Yezidi
Bidi: R (Right-to-Left)
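Age: Newly assigned in Unicode 13.0.0 (March, 2020)
def replace_em_dash(text: str) -> str:
    """
    +-------------------+
    |   ( ͡° ͜ʖ ͡° )    |
    +-------------------+
    """
    # Swap each em dash (U+2014) for a Yezidi hyphenation mark (U+10EAD)
    # followed by a Yezidi combining madda mark (U+10EAC). Code points above
    # U+FFFF need the eight-digit \U escape; "\u10EAD" would parse as U+10EA
    # followed by a literal "D".
    return text.replace("\u2014", "\U00010EAD\U00010EAC")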
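For anyone who wants to check the `unicode` output above without that tool, Python's standard `unicodedata` module exposes most of the same properties. This is a rough sketch: `describe` is a made-up helper, and the name, category, and bidi values depend on the Unicode database bundled with your Python (the Yezidi block arrived in Unicode 13.0, so older builds will not know these names). It also shows why a four-digit `\u` escape cannot reach these code points.

```python
import unicodedata


def describe(cp: int) -> dict:
    """Gather a few of the properties the `unicode` CLI prints (hypothetical helper)."""
    ch = chr(cp)
    return {
        "codepoint": f"U+{cp:04X}",
        # Needs a Unicode 13.0+ database (Python 3.9+) to name Yezidi
        # characters; falls back to a placeholder otherwise.
        "name": unicodedata.name(ch, "<not in this Python's Unicode data>"),
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16be": ch.encode("utf-16-be").hex(),
        "decimal": f"&#{cp};",
        "octal": "\\0" + format(cp, "o"),
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
    }


# \u takes exactly four hex digits, so the first literal is two characters
# (U+10EA plus "D"); astral code points need \U with eight digits.
assert len("\u10EAD") == 2
assert len("\U00010EAD") == 1

if __name__ == "__main__":
    print("Python Unicode data version:", unicodedata.unidata_version)
    for cp in (0x10EAC, 0x10EAD):
        for key, value in describe(cp).items():
            print(f"  {key}: {value}")
```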
[0] usually attributed to Diogenes
The instructions for how to decide whether to enter these additional Unicode code points are also highly suspect.
Performative, but not helpful.