Oh I think I kind of see what you’re saying. I use question marks and periods and letters to make words and sentences.
I wanna look up “atomic” to get more of what you’re saying. I found that Oxford Languages defines atomic like this:
Does that mean that English character symbols and punctuation are irreducible? It’d be nice to know what irreducible means, but in a way I don’t see why I would need to break down a period or an English letter to make a sentence or paragraph. Like, I can use those symbols and punctuation to make up any word, sentence, or paragraph.
So if we look at LLMs making sentences (particularly the OpenAI Tokenizer):
So the atomic units I, as a human, would use to make the sentence in the photo above are things like “k”, “t”, “e”, and “.”. OpenAI would use atomic units such as “I” (capital i), “ like” (with a space before “like”), and “.”. It would use those units to read or form words, sentences, etc.
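To make that comparison concrete for myself, here is a rough Python sketch. It is only a toy: the sentence “I like tea.” and the `TOY_VOCAB` list are made up for illustration, and the greedy longest-match split is a stand-in I chose for simplicity. As far as I understand, real OpenAI tokenizers use a learned byte-pair-encoding vocabulary, so their actual splits may differ.

```python
# Toy illustration only. TOY_VOCAB is a made-up vocabulary,
# NOT the real vocabulary used by any OpenAI tokenizer.
TOY_VOCAB = ["I", " like", " tea", "."]


def split_into_characters(text):
    """Human-style units: every letter, space, and punctuation mark."""
    return list(text)


def split_into_toy_tokens(text, vocab=TOY_VOCAB):
    """Greedy longest-match split using the toy vocabulary.

    Falls back to a single character when no vocabulary entry matches.
    """
    tokens = []
    i = 0
    # Try longer pieces first so " like" wins over any shorter overlap.
    by_length = sorted(vocab, key=len, reverse=True)
    while i < len(text):
        for piece in by_length:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # Nothing matched: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


sentence = "I like tea."  # hypothetical example sentence
print(split_into_characters(sentence))
print(split_into_toy_tokens(sentence))
```

The first print shows the units I would work with (single letters, spaces, and the period), and the second shows the bigger chunks a tokenizer-like splitter would work with. Either way, joining the pieces back together gives the same sentence.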
This is what I think I’m missing from trying to understand all this:
- Do we use atomic units? Is that the correct way to say that?
- Do LLMs construct words and sentences too, like humans do? That sounds obvious.
- What atomic units are
- What atomic units do
- How to apply the phrase “atomic units” in the English language and AI tokens
- Can I replace “AI tokens” and “English symbols and punctuation” with “atomic units”? Like, can I use them synonymously?
- What this all means as a whole. Like, how does the idea of atomic units relate to alphabets?

