Class 26: Randomization Does Nothing

William M Briggs

Oct 28, 2024

Jaynes’s book (first part): https://bayes.wustl.edu/etj/prob/book.pdf

10 Comments

Agustín SáncheZ-Cobos

Oct 30

Interesting... I have to pay attention to this.

Thank you for sharing!

Gunther Heinz

Oct 28

See, this is what makes me an idiot. Reading the initial paragraphs of the lesson, it did NOT occur to me, as it should have, that the experiment was obviously and absurdly unnecessary. Why? BECAUSE I WAS CONCENTRATING ON THE CORRECT ANSWER, and not on the lesson. The CORRECT ANSWER was, all by itself, supposed to lead me to COMPETENCY, just as the leash leads the dog to the park, in the mind of the dog. That's why the dog wags its tail when you pick up the leash! There are literally tens of millions of idiots like me, so I have a lot of empathy. Friendly is the teacher that gives you the correct answer while you're taking an exam.

A Whip of Cords

Oct 28

“Thus randomizing, if it means anything, means removing control.” Control > Randomizing. Thank you for helping me understand this. (BTW, your wooden vs metal crosses example was priceless. Bravo.)

Paul Fischer

Oct 28

Randomization is done to minimize bias by randomly distributing bias across the sample. The problem is you never know if that actually occurred. But, it's the best we have. I see no alternative.

Paul Fischer

Oct 28

Hmm. So in clinical trials, does randomization truly add value if we can’t be sure it balances out all biases? We also can’t control for hidden biases or traits—by definition, we don’t know what they are. But without randomization, we know for certain that bias can creep in, potentially undermining the entire experiment. So, how do we actually minimize these unknown biases?

This same dilemma shows up even more blatantly in polling and surveys. In my view, modern polls have become rubbish because polling companies now rely on online paid survey takers (“panels”). This group hardly represents the entire U.S. population; it only represents the population of paid survey takers. What we are after is a representative cross-section of the U.S. population.

And here’s the kicker: this issue of hidden biases has always been there, whether we notice it or not. When I was doing polling back in the ’70s and ’80s, for example, how would I have known if I was primarily reaching out to white male MBAs? If certain groups are unintentionally overrepresented, then even a “random” sample might not really be representative.

Honestly, Matt, I think we’re left with randomization as a tool to at least try to reduce some biases—or in polling, to aim for a more representative sample. What better approach is there? We can’t control for unknown biases, so does randomization remain our best option?

William M Briggs

Oct 28 (edited)

No, control does. And the prevention of cheating.

Put another way, since you don't know what the biases are, "randomly" allocating subjects does not guarantee equal mixing. Indeed, there is just as much chance of getting equal mixing, or any other arrangement of the biases, by just taking subjects as they come.

But nothing beats control.

Paul Fischer

Oct 28

How can you apply controls to a political poll? You can’t. Even in clinical trials, where there’s much more control, countless hidden variables still escape us. Say you run a trial with two groups of 14,000—how are you supposed to control for the googol of hidden factors? It’s just not possible. Randomization is the only tool we have that spreads out those hidden variables even a little.

William M Briggs

Oct 29

They do try. For instance, they sample the number of Ds they think will vote, and then Rs. They call by "block", trying not to miss areas which might be causally influential.

The point is randomization does not do what you suggest. The hidden factors are hidden, unknown. There is no way to prove randomization has not put all of one, say, in one group, and none in the other. There is no informational difference between just taking subjects as they come and randomizing them.

There is no magic happening with mixing. It doesn't hurt, as I say, and helps to guard against cheating. But it's not buying anything.

I think I have an idea how to prove this with an illustration. I will make it into a Class.

Thanks.
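
The claim that randomization cannot rule out a lopsided split can be checked with a little counting. The sketch below is only an illustration, not the author's code: the function name `prob_imbalance_at_least` and its parameters are hypothetical, and it assumes an even number of subjects split at random into two equal groups, with a known number of subjects carrying one hidden trait. It sums hypergeometric probabilities to get the chance that the two groups differ by at least a given number of carriers.

```python
from math import comb


def prob_imbalance_at_least(n_subjects, n_with_trait, min_gap):
    """Chance that a random split into two equal groups leaves the groups
    differing by at least `min_gap` carriers of a hidden trait.

    Assumes n_subjects is even (hypothetical setup for illustration).
    """
    half = n_subjects // 2
    total_splits = comb(n_subjects, half)  # equally likely choices of group A
    prob = 0.0
    for k in range(n_with_trait + 1):  # k = carriers landing in group A
        if k > half:
            continue  # group A cannot hold more than `half` subjects
        gap = abs(k - (n_with_trait - k))
        if gap >= min_gap:
            ways = comb(n_with_trait, k) * comb(n_subjects - n_with_trait, half - k)
            prob += ways / total_splits
    return prob


if __name__ == "__main__":
    # 4 subjects, 2 carriers: chance both carriers land in the same group.
    print(prob_imbalance_at_least(4, 2, 2))
```

With 4 subjects and 2 carriers, both carriers end up in the same group one time in three. The probability shrinks as groups grow but never reaches zero, and since the trait is hidden, nobody can check which split actually occurred.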

Paul Fischer

Oct 29

That's not been my experience. In fact, I’ve proven this repeatedly—randomizing a survey sample makes a huge difference. Over those years I conducted maybe a hundred survey research projects. Back in the ’70s and ’80s, we did these surveys regularly—monthly, per semester, and annually—with excellent response rates. Today? It’s a completely different story; now, it's entirely garbage. But back then, we were diligent about randomizing, and it paid off. When we tried sampling without randomizing, the results were highly skewed, completely unrepresentative of the population we were targeting.

That’s how it used to work, anyway. I think we're talking about two different things here—this isn’t like a controlled clinical trial where you can manage every variable. I hated doing survey research, but we had to; so much of the soft science faculty were set on it, and, frankly, they’d botch it without proper rigor. Still, I’ve always felt that asking people questions isn’t science. Now it’s become an absolute joke using paid survey takers.

It seems you’re talking about the essence of randomizing; I’m talking about trying to obtain a representative sample. I suppose we could have compiled a list of people that responded and just used them every time we wanted to ask questions. That’s essentially what they are doing now. But then you’re omitting a large portion of other opinions. You see, if what you’re after is someone’s opinions, it doesn’t much matter what their “hidden factors” are. Everyone has different “hidden factors”, so what? They also have different opinions.

William M Briggs

Oct 29

Success!

I have coded up a neat little example. I'll provide the code and explanation in the class after next.

Idea is simple: subjects come "pre-randomized" (as it were) in the hidden attributes.
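
The author's own code is promised for a later class and is not reproduced here. As a stand-in, here is a minimal sketch of the "pre-randomized" idea under stated assumptions: each subject carries a hidden yes/no trait with the same probability regardless of arrival order, and there are two equal groups. All names (`simulate`, `imbalance`, `p_trait`) are hypothetical. It compares the average hidden-trait imbalance when subjects are assigned alternately as they come against the same assignment after shuffling.

```python
import random


def imbalance(groups, hidden):
    """Absolute difference in hidden-trait counts between group 0 and group 1."""
    a = sum(h for g, h in zip(groups, hidden) if g == 0)
    b = sum(h for g, h in zip(groups, hidden) if g == 1)
    return abs(a - b)


def simulate(n_subjects=40, p_trait=0.3, n_trials=5000, seed=1):
    """Return mean imbalance for (as-they-come, shuffled) assignment."""
    rng = random.Random(seed)
    as_they_come = [i % 2 for i in range(n_subjects)]  # A, B, A, B, ...
    total_fixed = 0
    total_shuffled = 0
    for _ in range(n_trials):
        # Assumption: the hidden trait is unrelated to the order of arrival.
        hidden = [1 if rng.random() < p_trait else 0 for _ in range(n_subjects)]
        shuffled = as_they_come[:]
        rng.shuffle(shuffled)  # the "randomized" allocation
        total_fixed += imbalance(as_they_come, hidden)
        total_shuffled += imbalance(shuffled, hidden)
    return total_fixed / n_trials, total_shuffled / n_trials


if __name__ == "__main__":
    fixed, shuffled = simulate()
    print(f"as they come: {fixed:.3f}  shuffled: {shuffled:.3f}")
```

Under this assumption the two averages should agree up to simulation noise, since shuffling an assignment that is already unrelated to the trait changes nothing about the trait's distribution across groups. The sketch says nothing about the case where arrival order does track the trait, for instance when someone picks who goes where; that is the cheating and control question raised earlier in the thread.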
