Maths/probability question

Postby AleisterCrowley » January 26th, 2021, 4:25 pm

Numb brain today - this is work related so rephrased to include balls, and not work stuff

I have a big pile of 2,000 balls - 90% are White, and 10% are Red, totally mixed up. So 200 Red, 1,800 White
If I have 100 containers and fill them with 20 randomly selected balls each - what is the most likely number of containers containing at least one Red ball?
They could come out as all red for the first ten then remainder all White - so 10 with at least one Red
Or they could have two red in each, so 100 with at least one Red
The answer is between....

tia
AC
[ I somehow got to most likely is 87% with one or more reds, but this sounds wrong]

Postby onthemove » January 26th, 2021, 11:41 pm

AleisterCrowley wrote: Numb brain today - this is work related so rephrased to include balls, and not work stuff

I have a big pile of 2,000 balls - 90% are White, and 10% are Red, totally mixed up. So 200 Red, 1,800 White
If I have 100 containers and fill them with 20 randomly selected balls each - what is the most likely number of containers containing at least one Red ball?
They could come out as all red for the first ten then remainder all White - so 10 with at least one Red
Or they could have two red in each, so 100 with at least one Red
The answer is between....

tia
AC
[ I somehow got to most likely is 87% with one or more reds, but this sounds wrong]


If you fill the containers up sequentially, then for the first container to have no reds, it would need all white balls.

This, I believe, is then the hypergeometric probability(?)

https://stattrek.com/online-calculator/ ... etric.aspx

If you plug in your numbers...
population 2000,
'successes' = white = 1800,
sample size = container size = 20,
and number of 'successes' = 20 (we want all white)

Then you get p = 0.120291315
Which flipped around means probability of having a red ball in that first container = 0.88.
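As a cross-check on the calculator figure, here is a minimal Python sketch of the same hypergeometric calculation. The function name `p_all_white` is made up for illustration; it assumes nothing beyond the standard-library `math.comb`.

```python
from math import comb

def p_all_white(population: int, whites: int, sample: int) -> float:
    """Probability that a sample drawn without replacement is entirely white."""
    # Ways to pick the sample from the whites only, over ways to pick it from all balls.
    return comb(whites, sample) / comb(population, sample)

p = p_all_white(2000, 1800, 20)
print(f"P(all white)      = {p:.9f}")
print(f"P(at least 1 red) = {1 - p:.4f}")
```

The same function can be reused for the smaller populations considered further down (200 balls with 180 white, 40 balls with 36 white).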

But then this is where it gets interesting (read, beyond me!). This only holds for the first container.

If you are drawing the balls randomly, then (I think(?)) you'd still expect the same distribution - i.e. 90% white, 10% red, still on average, as the population declines. (i.e. if you have your initial 2000 balls, then randomly take out 1000, then for the remaining 1000, you'd still expect a 90%:10% split white to red, on average)

So if you then consider when you're down to 200 balls left, on average 90% should still be white, so you'd expect 180 white balls on average at this stage.

Plug those numbers in, and for that container from a population of 200 balls, you get ...

p(all white) = 0.108542037
So p(at least 1 red) = 0.89

i.e. because the population is now smaller, the effect of drawing a single ball (and not replacing it) changes the next probability in a different proportion to what it did when you drew the first out of the 2000.

Then consider when you're down to 40 balls...

p(all white) = 0.053014553
p(at least 1 red) = 0.95

How you'd actually go about combining all the probabilities from the 100 containers in between, to get an overall probability, I'm not sure.

That said, I'm not completely sure of the above reasoning, but it's the best I can come up with so far (stats weren't my strong point in maths)

So my answer would be a greater than 88% chance (i.e. you'd expect typically more than 88 containers to contain a red ball). But how much greater / how much more, I'm not sure.

I must admit, though, that although it's similar to your number, I agree with you that this feels a little 'wrong', and I'm not at all confident in my workings out! (In particular, I'm not sure whether it's valid to say that when you get down to 200 balls, the split should still be 90:10 on average, and so to assume 180 white, 20 red.)
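On the question of combining the per-container probabilities: one standard shortcut, offered here as a sketch rather than as anything worked out in the thread, is that the expected count only needs each container's marginal probability. If the balls are thoroughly mixed, any one container taken on its own is just a random 20 out of the original 2,000, whichever order the containers are filled in, so the first-container figure applies to all 100. This gives the mean only, not the full distribution or the most likely value. The constant names below are illustrative.

```python
from math import comb

N_BALLS, N_WHITE, PER_BOX, N_BOXES = 2000, 1800, 20, 100

# Marginal chance that any single container is all white (hypergeometric).
p_white = comb(N_WHITE, PER_BOX) / comb(N_BALLS, PER_BOX)

# Linearity of expectation: expected counts add up even though
# the containers are not independent of each other.
expected_all_white = N_BOXES * p_white
expected_with_red = N_BOXES - expected_all_white

print(f"Expected all-white containers:     {expected_all_white:.2f}")
print(f"Expected containers with >= 1 red: {expected_with_red:.2f}")
```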

Postby GoSeigen » January 27th, 2021, 7:34 am

AleisterCrowley wrote: Numb brain today - this is work related so rephrased to include balls, and not work stuff

I have a big pile of 2,000 balls - 90% are White, and 10% are Red, totally mixed up. So 200 Red, 1,800 White
If I have 100 containers and fill them with 20 randomly selected balls each - what is the most likely number of containers containing at least one Red ball?
They could come out as all red for the first ten then remainder all White - so 10 with at least one Red
Or they could have two red in each, so 100 with at least one Red
The answer is between....

tia
AC
[ I somehow got to most likely is 87% with one or more reds, but this sounds wrong]


FWIW 87% sounds about right to me, after a quick and rough estimate. If you need to understand and get the maths right, start with a small number of balls and containers, do the maths, then generalise for larger numbers. Or wait for one of the Puzzle Corner people to do a worked solution. ;-)

GS

Postby johnhemming » January 27th, 2021, 7:54 am

onthemove wrote: Then you get p = 0.120291315
Which flipped around means probability of having a red ball in that first container = 0.88.

I got a slightly different answer on the first container being all white and then thought that this was probably not the way to do it and decided I didn't have the time to spend on this.

Postby AleisterCrowley » January 27th, 2021, 8:57 am

Thanks all - for my purposes I think 88% is a reasonable guesstimate [I think I'd failed to round my 87.?? %] - would be interested in seeing the exact answer if anyone wants to try it! My last probability/stats course was 35 years ago...
I may run an Excel simulation later when work gets really boring (inevitable)

Postby johnhemming » January 27th, 2021, 9:08 am

I think the difficulty is that we are looking for the most probable number of containers with at least one red ball. It strikes me that this is quite a laborious calculation.

Postby onthemove » January 27th, 2021, 9:19 am

AleisterCrowley wrote: Thanks all - for my purposes I think 88% is a reasonable guesstimate [I think I'd failed to round my 87.?? %] - would be interested in seeing the exact answer if anyone wants to try it! My last probability/stats course was 35 years ago...
I may run an Excel simulation later when work gets really boring (inevitable)


I've just run my idea through excel...

Hypergeom.dist function for

Starting population 2000,
"Successes" in population = Population * 0.9,
Samples fixed at 20 (each container is the same size),
"Successes" in sample = 20 (since we're looking for all white)

And taken (1-answer) to get the P(at least 1 red)

Then copied that line 100 times, with population decreasing by the 20 balls removed each time.

Then taken a simple average of the results.

Comes out to 88.5133.

So, rounded up to whole containers, my answer is that 89 containers would be expected, on the balance of probability, to have at least 1 red ball.

Not completely convinced this is correct, and even if it is, I'm sure there must be a better, more direct way of calculating it.
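For anyone without Excel to hand, the spreadsheet steps described above can be sketched in Python as follows. This only reproduces the procedure as described (100 rows, population shrinking by 20 each time, exactly 90% white assumed at every step); it is not a check on whether that averaging is the right way to combine the containers, and no output is quoted here. The variable names are illustrative.

```python
from math import comb

BOX = 20          # balls per container
START = 2000      # starting population

probs = []
for population in range(START, 0, -BOX):        # 2000, 1980, ..., 20
    whites = population * 9 // 10               # assumed exact 90% white at every step
    p_all_white = comb(whites, BOX) / comb(population, BOX)
    probs.append(1 - p_all_white)               # P(at least 1 red) for this row

average = 100 * sum(probs) / len(probs)
print(f"{len(probs)} rows, average P(at least 1 red) = {average:.4f}%")
```

Note that the last row (20 balls left, 18 of them white) cannot be all white, so its probability of at least one red is exactly 1.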

Postby AleisterCrowley » January 27th, 2021, 9:47 am

Hmm, thinking about it again (I was in a hurry yesterday) I may have inadvertently misled people
What I've actually got is a set of 2000 'things' taken from a wider population of tens of thousands
Each thing has an independent 10% chance of failing an audit
They are bundled in blocks of 20 under different change references
If they have been audited already, then I know* that set contains 200 'fails' so if we look at each block consecutively, if there are a lot of fails earlier there are fewer to go round later
If the audits happen as we go through each block then each thing has an independent 10% chance of failing, so it shouldn't affect the probabilities of later blocks containing a fail
Confused...



*let's assume the 10% is entirely accurate and consistent

Postby UncleEbenezer » January 27th, 2021, 10:05 am

AleisterCrowley wrote: Hmm, thinking about it again (I was in a hurry yesterday) I may have inadvertently misled people
What I've actually got is a set of 2000 'things' taken from a wider population of tens of thousands
Each thing has an independent 10% chance of failing an audit
They are bundled in blocks of 20 under different change references


Hey, it's the risk of at least one covid-infected person in a group he's asking about!

Yes, I'd run a simulation. Unless Gengulphus posts a solution.

Postby modellingman » January 27th, 2021, 10:18 am

AleisterCrowley wrote: Numb brain today - this is work related so rephrased to include balls, and not work stuff

I have a big pile of 2,000 balls - 90% are White, and 10% are Red, totally mixed up. So 200 Red, 1,800 White
If I have 100 containers and fill them with 20 randomly selected balls each - what is the most likely number of containers containing at least one Red ball?
They could come out as all red for the first ten then remainder all White - so 10 with at least one Red
Or they could have two red in each, so 100 with at least one Red
The answer is between....

tia
AC
[ I somehow got to most likely is 87% with one or more reds, but this sounds wrong]


I actually looked at this yesterday, but forgot to post it. However, the last post from AC seems to have changed the rules slightly. Oh, well...



Quite a tricky problem.

Effectively, you are sampling without replacement from a finite population and this type of sampling is usually covered by the hypergeometric distribution. This would, for example, give you the distribution of the number of reds in your first container. However, your problem becomes more complex as the second, third, etc containers are added. Rather than trying to solve this problem analytically, I fell back on a bit of Monte-Carlo simulation.

It was slightly easier to count the simulated number of "all White" containers. The mean number of "all White" containers across 100,000 simulations was 11.91. The mode was 12. About 90% of the time the simulated number of "all Whites" was in the range 8-16 and the minimum and maximum simulated values across all the simulations were 2 and 24.

Subtract these numbers from 100 to get your "at least one Red" values. Your 87% doesn't look too far out judging by the simulation results.
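The simulation model itself isn't shown in the post, so the following is only a guess at the general shape of such a Monte-Carlo run, written in Python: shuffle the 2,000 balls, deal them into 100 containers of 20, and count the "all White" containers. The function names, the seed and the much smaller default number of runs (1,000 rather than 100,000) are assumptions chosen to keep it quick.

```python
import random
from collections import Counter

def count_all_white(rng: random.Random) -> int:
    """Shuffle 200 red + 1,800 white balls, deal into 100 boxes of 20, count all-white boxes."""
    balls = [1] * 200 + [0] * 1800            # 1 = red, 0 = white
    rng.shuffle(balls)
    return sum(1 for i in range(0, 2000, 20) if sum(balls[i:i + 20]) == 0)

def simulate(n_sims: int = 1000, seed: int = 1) -> list[int]:
    rng = random.Random(seed)                 # fixed seed so runs are repeatable
    return [count_all_white(rng) for _ in range(n_sims)]

results = simulate()
mean = sum(results) / len(results)
mode = Counter(results).most_common(1)[0][0]
print(f"mean all-white = {mean:.2f}, mode = {mode}, min = {min(results)}, max = {max(results)}")
```

Raising `n_sims` narrows the sampling error on the mean at the cost of run time.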

Postby mark88man » January 27th, 2021, 10:30 am

The Wikipedia article on the hypergeometric distribution is quite interesting, and contains a number of direct formulas, mainly involving combinatorial operators. I haven't done the maths, but those with spreadsheets open might be able to double-check.

Postby Eboli » January 27th, 2021, 10:31 am

Surely this is a probability tree like a Pascal triangle?

So on the first container you need to compute the probabilities for ALL outcomes. All outcomes mean:

0, 1, ..., 20 red balls, of which about 87.84%, i.e. 100 × (1 - 0.9^20), is the probability of the first box having anything other than 0 red balls. Of course it isn't exactly 100 × (1 - 0.9^20) because
- whereas the probability of drawing the first white ball is 0.9
- the probability then of drawing the second white ball is 1799/1999
- the probability then of drawing the third white ball is 1798/1998 ....&c
So it is a laborious calculation.
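Laborious by hand, but short as a loop. A minimal Python sketch of the ball-by-ball product described above, set beside the 0.9^20 shortcut (variable names are illustrative):

```python
# Draw 20 whites in a row without replacement: 1800/2000 * 1799/1999 * ... * 1781/1981
p_exact = 1.0
for i in range(20):
    p_exact *= (1800 - i) / (2000 - i)

p_approx = 0.9 ** 20      # what you get if each draw were an independent 90% chance

print(f"without replacement: P(no red) = {p_exact:.6f}, P(some red) = {100 * (1 - p_exact):.3f}%")
print(f"with replacement:    P(no red) = {p_approx:.6f}, P(some red) = {100 * (1 - p_approx):.3f}%")
```

The two differ only in the third decimal place, because removing 20 balls barely changes a pile of 2,000.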

But that only gives you the chance of the first box having at least 1 red ball. Of course there are 21 possible outcomes for the first box, as it could contain anything from 0 to 20 red balls, each of which will have a probability based on the total number of arrangements of the red balls. So one red ball can occur in 20 different ways (on the first, second, ..., twentieth ball). And, of course, the number of arrangements quickly increases for 2 red balls (20!/(2! × 18!) = 190 ways).

And for each of these possibilities there will be probabilities based on the remaining number of red balls for the boxes that follow. So, for example, if the first box contains 20 red balls, then 180 red and 1,800 white balls remain, and the probability of the second box containing at least 1 red ball will be about 100 × (1 - (1800/1980)^20), which is about 85.1%.

And then the probability tree continues.

This seems a far too long way of calculating it. And I am sure a mathematician has developed a short cut method.

But it tends to suggest that the probability of a container containing at least 1 red ball will be lower than 88%. I suspect it is nearer to 86%

Eb.

Postby AleisterCrowley » January 27th, 2021, 10:32 am

I think we're in the right ball park - better than the '..erm..25%' floating around (not here, and not me)
I'm not sure if my last post does change things? Perhaps the Monty Hall problem is easier to grasp...

Postby modellingman » January 27th, 2021, 1:10 pm

AleisterCrowley wrote: Hmm, thinking about it again (I was in a hurry yesterday) I may have inadvertently misled people
What I've actually got is a set of 2000 'things' taken from a wider population of tens of thousands
Each thing has an independent 10% chance of failing an audit
They are bundled in blocks of 20 under different change references
If they have been audited already, then I know* that set contains 200 'fails' so if we look at each block consecutively, if there are a lot of fails earlier there are fewer to go round later
If the audits happen as we go through each block then each thing has an independent 10% chance of failing, so it shouldn't affect the probabilities of later blocks containing a fail
Confused...



*let's assume the 10% is entirely accurate and consistent


I think you have two subtly different probability modelling problems.

The "audited" case: here you appear to know the total number of fails in your set of 2,000 but not how these fails are distributed amongst the 100 bundles.

The "non-audited" case: you don't know the total number of fails in your set of 2,000.

The "non-audited" case is fairly straightforward. The "things" are independent of each other, so too are the bundles. Independence of things means the number of fails in a bundle has a probability distribution given by B(20,10%) [binomial distribution with parameters n=20 and p=10%]. So, the probability of no fails in a bundle is 0.9^20 = 0.122 (approx). Independence of bundles means the number of bundles without fails also has a probability distribution given by a binomial distribution - in this case, B(100, 0.9^20). So, the mean number of bundles without fails is simply 100*(0.9^20) = 12.16.
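The "double-binomial" reasoning for the non-audited case can be sketched in a few lines of Python. The helper `binom_pmf` and the variable names are illustrative; the most likely count is found by simply scanning the B(100, 0.9^20) probabilities.

```python
from math import comb

p_clean = 0.9 ** 20                 # chance a bundle of 20 has no fails
n_bundles = 100

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean_clean = n_bundles * p_clean
pmf = [binom_pmf(k, n_bundles, p_clean) for k in range(n_bundles + 1)]
mode_clean = max(range(n_bundles + 1), key=pmf.__getitem__)

print(f"P(no fails in a bundle) = {p_clean:.4f}")
print(f"mean bundles without fails = {mean_clean:.2f}, most likely = {mode_clean}")
```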

This is slightly different from the mean value of 11.91 I previously gave and which corresponds to an "audited" case with 200 fails. As I and others have noted this "audited" case is difficult to deal with analytically, but I think things just got a whole lot easier...

Unless that difference of 12.16 vs 11.91 has material real-world implications (such as cost or quality) I would be very tempted to use the "double-binomial" approach set out above for the "non-audited" case and apply it to the "audited" case. But, I would use N/2000 as the probability value p rather than 10% (where N is the known number of fails in the set of 2000). This would give the expected (or mean) number of bundles without fails as

100*(1-p)^20

OK, it will be a slightly biased over-estimate, but much easier than building and testing a simulation model.
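As a small helper, the 100*(1-p)^20 formula with p = N/2000 might look like this in Python. The function name and default arguments are assumptions for illustration, and the three values of N in the loop are arbitrary examples rather than data.

```python
def expected_clean_bundles(n_fails: int, n_things: int = 2000,
                           bundle_size: int = 20) -> float:
    """Approximate mean number of bundles with no fails: bundles * (1 - p)^bundle_size."""
    p = n_fails / n_things                    # observed fail rate, N/2000 in the post
    n_bundles = n_things // bundle_size
    return n_bundles * (1 - p) ** bundle_size

for n in (175, 200, 225):                     # illustrative values of N only
    print(n, round(expected_clean_bundles(n), 2))
```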

One final point: your assumption that the 10% is entirely consistent and accurate does not mean that you will always have 200 fails in a (randomly) selected set of 2,000 things. It means the number of fails in the set will conform to the B(2000, 10%) distribution. Using the Normal approximation to the Binomial will give you some idea of the range of fails you might experience in practice in a set of 2,000. Whilst the expected number of fails will be 200, the standard deviation of the distribution is about 13.4. So, roughly 5% of the time the number of fails in your 2,000 will fall outside a range of about 174 to 226 (200 plus or minus 1.96 standard deviations).
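The Normal-approximation figures can be reproduced with the Python standard library. This is a sketch under the stated B(2000, 10%) assumption, using `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 2000, 0.10
mean = n * p
sd = sqrt(n * p * (1 - p))                    # binomial standard deviation

# Central 95% interval under the Normal approximation to B(2000, 10%).
approx = NormalDist(mu=mean, sigma=sd)
low, high = approx.inv_cdf(0.025), approx.inv_cdf(0.975)

print(f"mean = {mean:.0f}, sd = {sd:.1f}, 95% range = {low:.0f} to {high:.0f}")
```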

Postby AleisterCrowley » January 27th, 2021, 2:06 pm

Thanks!
The 'real world' problem is pretty much a guesstimate of how many change control 'bundles' will need reworking (minor time/cost impact), so close enough is close enough! Knowing that it's likely to be > 85% rather than the original estimate of < 50% is sufficient.
The 10% failure rate is (of course) very much finger in the air, but I fixed it for this exercise, rather than adding in another variable
I will investigate later, as it's more interesting than dealing with a certain software company's billing team
What I did in a hurry yesterday was:
probability of no fails = 0.9^20, which was about 12%. I think I did 1-(0.9^20) for the 'one or more fails' and didn't round up, hence 87%

