from Hacker News

Show HN: Analyzing GPT-4 Tokens with Llama3

by vnglst on 5/2/24, 3:05 PM with 0 comments

Inspired by Andrej Karpathy's excellent YouTube video on tokenizers, I used Llama3 to analyze all of the roughly 100,000 tokens in GPT-4's vocabulary. The results were somewhat expected: a strong focus on English and code. Interestingly, only 124 tokens were dedicated to my native Dutch, which might explain why GPT-4 underperforms in that language.
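
For anyone who wants to try a similar tally, here is a minimal sketch of the counting step only. It is not the code behind the post. The classifier `toy_classify` is a crude stand-in for the Llama3 call, and the sample tokens and label names are made up for illustration. In a real run you would presumably feed in the actual vocabulary (for example via tiktoken's `cl100k_base` encoding) and replace the stand-in with a call to a model.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_labels(tokens: Iterable[str], classify: Callable[[str], str]) -> Counter:
    """Count how many tokens fall under each label returned by `classify`."""
    return Counter(classify(tok) for tok in tokens)


def toy_classify(token: str) -> str:
    """Hypothetical stand-in for the LLM call; a crude heuristic, not the post's method."""
    text = token.strip()
    if not text:
        return "whitespace"
    if any(ch in "{}()[];=<>" for ch in text):
        return "code"
    if text.isascii() and text.isalpha():
        return "latin-word"
    return "other"


# Made-up sample; real tokens would come from the tokenizer's vocabulary.
sample = [" the", "();", " het", "\n\n", " 世界", "return"]

counts = tally_labels(sample, toy_classify)
total = sum(counts.values())

# Print each label with its count and share of the sample.
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

Keeping the classifier as a plain function argument means the counting code stays the same whether the labels come from a heuristic or from a model.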