Text Analysis Practice

Lots and lots of people have written lots and lots of explanations, and LLM training picks up on some statistical patterns in the words people write when explaining in response to other words.

LLMs basically predict the next word, token or letter based on statistical frequency and correlations in a large data set.

Like if the prompt is “rain” then, in the training data, the most likely next word might be “coat”, but other words like “sucks” and “is” also have probabilities. Based on the previous words before “rain” the LLM can make a better guess at which word should go after “rain”. Like if one of the previous words was “store” then “coat” is more likely, but if a previous word was “depressed” then that increases the probability for “sucks”.
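Here's a rough sketch of that "rain" example as code. It's just a toy: the counts are made up for illustration, the function name `next_word_probs` is hypothetical, and a real LLM uses a trained neural network over tokens rather than a lookup table like this. It only shows how an earlier word can shift the probabilities for the next word.

```python
from collections import Counter

# Invented toy "training data": (earlier context word, current word, next word).
# The triples and how often they repeat are made up for illustration only.
TOY_DATA = (
    [("store", "rain", "coat")] * 3
    + [("store", "rain", "is")]
    + [("depressed", "rain", "sucks")] * 2
    + [("depressed", "rain", "is")]
    + [("weather", "rain", "is")]
    + [("weather", "rain", "coat")]
)


def next_word_probs(word, context=None):
    """Estimate next-word probabilities from raw counts.

    With context=None this ignores earlier words; otherwise it only
    counts examples where the earlier word matches `context`.
    """
    counts = Counter(
        nxt
        for ctx, w, nxt in TOY_DATA
        if w == word and (context is None or ctx == context)
    )
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()}


if __name__ == "__main__":
    print(next_word_probs("rain"))               # "coat" is most likely overall
    print(next_word_probs("rain", "store"))      # "coat" gets even more likely
    print(next_word_probs("rain", "depressed"))  # "sucks" becomes most likely
```

With no context, "coat" wins; with "store" earlier its probability goes up, and with "depressed" earlier "sucks" takes over, which is the same pattern described above.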

I wanted to say what I thought was good from your paraphrase, but then I thought back to what ET said: that there’s no right answer for which words to include and the goal is to do a reasonable job (link to him saying this).

For there not being a single right answer: is that because paraphrases would be the same either way in their meaning? And if you include more or fewer things, are we talking about the same thing anyways?

Or is it because the goal is to have you think more about the original paraphrases? Or that whatever gets you thinking more and seeing new perspectives that that’s the right answer?

For the goal of doing a reasonable job: What is a reasonable job? What’s that for paraphrases? Is a reasonable job like you make an attempt to shorten the paraphrases and you find something that’s more grammatically essential?

Is a reasonable job like you strip down the paraphrases and that job is done within reason? For example, for the first sentence of the original paraphrase:

Childhood lead poisoning has declined steadily since the 1970s, when cars stopped spewing leaded exhaust into the environment and lead paint was formally banned.

I could say childhood lead poisoning has declined since 1970s, when cars stopped spewing leaded exhaust and lead paint was banned.

Was what I wrote a reasonable job? I excluded “steadily” cuz I think the reader gets the point that lead poisoning has declined at all and that’s a good thing and it’s something kind of easy to conceptualize.

I excluded “the” from 1970s cuz I thought people would know I’m talking about 1970-present.

I also excluded “into the environment” cuz I think people already know where exhaust goes automatically. That’s cuz there’s lots of commercials and ads about how the environment contains lots of CO2 from exhaust and so they’re able to automatically tell where any exhaust goes, even leaded.

I excluded “formally” cuz I think when people see that something is banned they know it’s illegal or not allowed.

I looked up what reasonable meant so I can have a better idea about how to check if something is a reasonable job or not:

From MW:

adjective,

being in accordance with reason

I thought I’d look up reason in one of ET’s articles (Introduction to Reason):

But how much attention do you give to using your mind in the best, most effective ways? How much effort do you put into good thinking? The best ways to think are called reason.

What does “the best ways to think” mean? Is it your best ways to get you to your goal? Is ET referring to the article’s definition of reason?

Probably he was talking about doing a reasonable job like how others say just try your best or don’t say anything outrageous.

Is reason the literal best ways to think that’s taken from all of mankind? Like, the best chess player has the best ways to think of playing the game and so they are the best in the world.

Is it situational? Cuz I’m not gonna know everything the best chess player knows when I’m playing. But, I know I can look for my best way to think and maybe improve it through learning.

I’ll let you know what I think about your attempt tomorrow. Idk if I spent too much time doing this sorry. I could just say what I think but idk. I thought I wanted to be more informed about what a good paraphrase looks like and comment after that.

From my first look, I like that your paraphrases are blended together and weren’t just five separate sentences like mine. Like just cuz we’re doing grammatically essential breakdowns that doesn’t mean that we can’t make the sentences flow.

Here’s my updated discussion tree for our discussion:

Discussion Tree of Text Analysis Practice.pdf (65.3 KB)

Most of the recent stuff is to the right of the tree

For @Eternity’s paraphrase of the original:

The breakdown being two sentences makes it easier to read. Like, I know they’re two complex sentences, but there’s fewer pauses between periods (there’s only 2).

“Poisoning has declined” is grammatically essential cuz it gets the point across that poisoning, something that’s bad, is going away.

Leaving in “when” in the first sentence is important cuz it shows the connection between poisoning declining and “cars stopped spewing exhaust”. Same goes for “paint was banned” and the first clause.

I see u say what would be the point of paraphrasing for paraphrase 1. I would respond with: what was the paraphrase’s goal? I think that would help find errors in the paraphrase.

I see you think it’s important to not leave out or to keep in information. I’m gonna write a pros/cons list for keeping info in a paraphrase:

Pros/cons list for keeping the same info in a paraphrase:

pros | cons
No lost info | too many words for ur reader
No changing the author’s words |
You dont put it in ur “own words”. “Own words” is like u write the paraphrase ur way n not the author’s. |
Keeping the same words like a quote | You dont just talk about the important parts
U show the reader word for word what is said so they get the same “build up” while reading the ideas. The build up is to get the point or understand | You’re not getting to the sweet part soon enough probably
You want the reader to judge for themselves about the original source | You dont want to include everything
You dont know if youre leaving something important out. | So much to write/type

Edit: in the pros/cons list this part should be in the cons side:

You dont put it in ur “own words”. “Own words” is like u write the paraphrase ur way n not the author’s.

Idk why it got flipped to the pros side

Idk if you mentioned this later but are you aware of what grammatically essential words are? Or, put differently, after doing that with AI did you come to learn what is considered grammatically essential? Or did you just do some pattern guessing?

How’s this organized?

A color corresponds to a person’s name. Correct?

The top node (in red representing Elliot) is the task Elliot shared and the child nodes are based on responses to that?

Hmm. I guess where I was surprised is because I just assumed there wouldn’t be enough information to make a pattern(?) on some topics. If an average “smart” college student can’t break down a passage from an old author, then I assume when that student becomes a professor they still can’t.

So, to check, what do you think me, you, and @Jarrod attempted? The assignment(?) shared by Elliot was to break down something into only its grammatically essential words, not to paraphrase. We aren’t doing a paraphrase here. We are evaluating the paraphrases done by someone else.

If you’re talking about what Elliot said:

There isn’t a perfect right answer in what to include in our stripped down versions.


I saw that you shared your thoughts on what’s essential and responded to mine. I’ll comment on those soon.

I think i did some pattern guessing. The idea I got was that grammatically essential meant just subject+verb+object/complement. After looking at one of your grammatically essential wordings, I think now that putting in conjunctions is grammatically essential as well.

I try to split clauses from a sentence and relate them to the more important clause. Sometimes, I try to make a clause out of an adj or modifier to help me break things down more and see what it says about a clause or word from a node.

Yea the color corresponds to a person’s name. I think i messed up a few times miscoloring someone

I thought our stripped down versions counted as paraphrases. I thought that cuz I thought changing the quotes in any way, like taking the subject, verb, then object (stripped down version), counted as a paraphrase. I thought that was what paraphrase meant. I did a quick Google search n after reading Gemini AI’s def of a paraphrase, I’m not too sure I knew what a paraphrase is or counts as.

Oh ok i thought our stripped down versions were paraphrases