Originally posted by Acolyte
I still don't understand the merits of this strategy.
Also, could you give me the objective function that the hiring officer is trying to maximise? It can't be as simple as 'probability of best candidate being selected', because then he would just interview everyone so he could be sure of finding the best one.
This is because I haven't explained it properly...😳.
The officer INTERVIEWS the first M, and rejects them out of hand. He then interviews the rest, and selects the first one whose score on the objective criteria is higher than that of every candidate interviewed so far. If no such candidate exists, he hires the last candidate by default (that is what I forgot to mention). Furthermore, remember the crucial detail: A PROSPECTIVE PRANCER MUST BE ACCEPTED OR REJECTED IMMEDIATELY FOLLOWING HIS INTERVIEW (before others are interviewed). Given this strategy, I am asking you to tell me what value of M maximizes the probability that the best candidate is chosen. M is different from zero. Also, given that this is done properly, what is the probability that the best candidate is hired, and where should you put yourself in line? I think that I may have left this crucial detail out of my initial explanation, which gives rise to the problems you astutely mentioned.
Your second objection is answered by this detail. Sorry about the lack of clarity.
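The rule as just restated can be sketched as a quick simulation. This Python sketch is my own, not from the thread; the function name and parameters are mine, and scores are continuous random numbers so ties never occur:

```python
import random

def hire_best(n, m, rng):
    """One run of the stated rule: interview and reject the first m candidates,
    then hire the first later candidate who beats everyone seen so far;
    if nobody does, hire the last candidate by default.
    Returns True if the hired candidate was the best of all n."""
    scores = [rng.random() for _ in range(n)]
    benchmark = max(scores[:m], default=float("-inf"))
    for i in range(m, n):
        if scores[i] > benchmark:
            # Accepted immediately; nobody already passed over can be recalled.
            return scores[i] == max(scores)
    return scores[-1] == max(scores)  # forced default: the last candidate

rng = random.Random(1)
runs = 20000
hits = sum(hire_best(30, 11, rng) for _ in range(runs))
print(hits / runs)  # roughly 0.38 for N = 30, M = 11
```

Note that a candidate after the first M need only beat the benchmark set by the rejected batch, since every intervening candidate was passed over precisely because he fell below it.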
The best candidate can choose where to stand. He knows that the interviewer wants to minimise his workload. Therefore the cleverest candidate stands at the back. The interviewer discards the first N-2 people, interviews the second to last and hires the last one. Since only a very good candidate will work out that it is best to stand at the back, this hiring technique will work admirably.
Originally posted by royalchicken
Oh right, now it makes sense. So the quality of the last candidate is irrelevant, and if the best candidate of the other N-1 candidates is one of the first M, the hiring officer is bound to choose the last candidate. Is that right?
The officer INTERVIEWS the first M, and rejects them out of hand...
Originally posted by Acolyte
That is correct. Sorry I forgot to tell you that detail before. Some linguistic rigidity on my part or some such.
Oh right, now it makes sense. So the quality of the last candidate is irrelevant, and if the best candidate of the other N-1 candidates is one of the first M, the hiring officer is bound to choose the last candidate. Is that right?
Ok - my thoughts so far: (mucho apologies about previous attempt)
Forget about the instantly rejected people, for now
Given his inefficient search procedure (without the rejection step), say the probability that the interviewer will select the best candidate from a group of n is f[n]
The probability that, with the rejection step, the interviewer will select the best candidate from the total group of size G is (n/G)*f[n]
(i.e. the probability that the best candidate is not in the rejected batch of G-n candidates, multiplied by the probability that he is then selected)
f[n] is a summation for all p from 1..n of the values of function g[p,n], which gives the probability that, if the best candidate is at position p within the group of n, the search procedure will find him.
how does g[p,n] work:
well
g[1,n] = 0 (for n ≠ 1) and 1 (for n = 1)
g[2,n] = 1 for all n > 1
g[x,n], where x > 2, is more complicated. It is a summation of functions h[b,c,d], where h[b,c,d] is the probability that a random selection of c candidates from a set of d will all rank worse than candidate b
b ranges from 1..n-1, and represents the ranking of the first candidate within the group (with better candidates higher)
c = p - 2, and represents the number of candidates between the first candidate and position p
d = n - 2, and is the total number of candidates, excluding the best candidate and the first candidate
I'm getting there, but I am going to bed now. Perhaps someone else can carry this on?
A little bit more before I go to work... I didn't need parameter c
h[b,d] represents the probability that a random selection of candidates from the set d all rank worse than candidate b,
obviously there are a total of d-b candidates in d that rank worse than b. The total number of ways of arranging these d-b candidates is (d-b)!
(where ! means factorial)
the total number of arrangements of d-b candidates selected from the total set d-1 is (d-1)!/(b-1)! (I take 1 from d because I am excluding the b'th candidate)
so h[b,d] = (d-b)!.(b-1)!/(d-1)!
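This closed form can be sanity-checked by brute force. The following Python check is my own, not from the thread: it enumerates every ordered selection of d-b candidates from the d-1 candidates other than b, and counts the selections consisting entirely of worse-ranked candidates.

```python
from itertools import permutations
from math import factorial

def h_formula(b, d):
    # h[b,d] = (d-b)! * (b-1)! / (d-1)!, the closed form above
    return factorial(d - b) * factorial(b - 1) / factorial(d - 1)

def h_brute(b, d):
    # Rank 1 is best; "worse than b" means a rank number greater than b.
    # Enumerate ordered selections of (d - b) candidates from the d - 1
    # candidates other than b, and count the all-worse selections.
    others = [r for r in range(1, d + 1) if r != b]
    picks = list(permutations(others, d - b))
    good = sum(all(r > b for r in pick) for pick in picks)
    return good / len(picks)

for d in range(2, 7):
    for b in range(1, d):
        assert abs(h_formula(b, d) - h_brute(b, d)) < 1e-12
print("closed form agrees with brute force for all b < d <= 6")
```

For example, h[2,4] comes out as 1/3 both ways: of the six ordered pairs drawn from the three candidates other than rank 2, only the two pairs drawn entirely from ranks 3 and 4 qualify.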
Wow - I'll have to look up binomial expansions, I only did a chemistry degree.
Coincidentally, after listening to Jon Tickle's theory of women on BB4, I wrote a little program to test his hypothesis (that given 30 women, you look at the first 10 you meet, and then select the next woman you meet who is better than any you have previously met!!)
This is the same problem in a different wrapper, so I have some ready-made results...
Using N=30, I tried this for each value of M from 0 to 29 - and for each value did 1000 runs and calculated the average quality of the selected 'candidate'.
This shows, for example, that by just interviewing the first 3 candidates and then selecting the next candidate who is better than any of the first three (or the last candidate if the best candidate was in the first three), the average quality of the selected candidate will be approximately 82.15 (if all candidates have a random quality between 0 and 99). Jon's theory that M=10 was optimal is shown to be questionable by this sample, which produced a 79.18 quality rating...
0 = 49.32
1 = 69.44
2 = 80.92
3 = 82.15
4 = 82.03
5 = 81.26
6 = 82.58
7 = 80.95
8 = 83.75
9 = 78.60
10 = 79.18
11 = 73.39
12 = 80.25
13 = 76.59
14 = 74.81
15 = 73.37
16 = 75.39
17 = 66.94
18 = 65.13
19 = 61.22
20 = 64.87
21 = 61.36
22 = 54.48
23 = 53.23
24 = 57.24
25 = 54.51
26 = 60.61
27 = 51.65
28 = 53.25
29 = 50.67
Originally posted by royalchicken
To help, here is the code that was used...
Chris, maybe offer an accolade for the complete explanation of this one...let's see if the Answer Prancer will deliver 😉.
Also, can someone explain the fairly subtle reason why the theoretical answer (M=11 for N=30) disagrees with Chris's data? The program did not quite preserve everything.
ITEMS = 30
SAMPLES = 1000
FOR M = 0 TO ITEMS - 1
   TOTAL = 0
   * For each value of M, take a reasonable number of samples
   FOR LOOP1 = 1 TO SAMPLES
      ITEM = 0
      BEST = 0
      * Look at the first 'M' items and find the best
      FOR LOOP2 = 1 TO M
         ITEM = RND(100)
         IF ITEM > BEST THEN
            BEST = ITEM
         END
      NEXT LOOP2
      * Look at the remaining items and stop if any is better than BEST,
      * otherwise take the last item
      FOR LOOP3 = 1 TO (ITEMS - M)
         ITEM = RND(100)
         IF ITEM > BEST THEN
            * Exit loop
            LOOP3 = ITEMS + 1
         END
      NEXT LOOP3
      * Add the score of this item to the current total for 'M'
      TOTAL += ITEM
   NEXT LOOP1
   PRINT "Score for " : M : " = " : (TOTAL / SAMPLES)
NEXT M
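For anyone without a Pick-style BASIC interpreter to hand, here is a rough Python port of the program above. The translation and its names are mine; RND(100) becomes randrange(100), and I initialise best to -1 so that an item of quality 0 can still be accepted (the original initialises BEST to 0):

```python
import random

def average_quality(items=30, m=3, samples=2000, seed=0):
    """Average quality of the selected item: reject the first m items, then
    take the first later item beating the best of those m, else the last."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        best = -1
        for _ in range(m):
            best = max(best, rng.randrange(100))  # quality 0..99, as in RND(100)
        item = -1
        for _ in range(items - m):
            item = rng.randrange(100)
            if item > best:
                break  # accepted on the spot, as the problem demands
        total += item
    return total / samples

print(average_quality(m=0))  # close to 49.5: the first item is taken at random
print(average_quality(m=3))  # around 82, in line with the table above
```

Note that with only 100 quality levels, ties between items are possible, which the strict `>` comparison quietly resolves in favour of the earlier item; the idealised model assumes no ties.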
Right then...I guess I should explain. Here's how it works, as compared to what the program looks for. Let P(Best) be the probability that the best prancer candidate is selected; we seek to maximise this. Let p(i) denote the ith prancer candidate (i = 1, 2, ..., N). Clearly:
P(Best) = SUM (P[p(i) is best]*P[p(i) is selected | p(i) is best])
Where the sum is taken over i=M+1,M+2,...,N.
Since for each i, P[p(i) is best] = 1/N, and, given that p(i) is the best, the probability that p(i) is selected (i.e. that the best of the first i-1 candidates falls among the first M) is M/(i-1), we have:
P(Best) = (M/N)*SUM (1/(i-1)) for i = M+1,...,N
or (and I'll only get into the details of this step if requested):
P(Best) ~ (M/N)*(log N - log M)
Where log denotes a natural logarithm. We need the value of M that maximises this, so we differentiate the above wrt M and set that equal to zero:
log N - 1 = log M
or exp(log N - 1) = M
or M = N/e, where e = 2.7182818...
Thus the best value of M is approximately N/e, and the maximum probability of getting the best prancer is about 1/e, or 37%. So, for example, if there are 100 potential prancers, 37 should be rejected out of hand and the probability of getting the absolute best prancer will be about 37%.
Therefore, Jon Tickle was not off by much. But can anyone tell me why this theoretical answer does not square with Chris's program...? (There is a wee problemette, as iamatiger might put it.)
EDIT Also, I'd be interested in hearing whether anyone can come up with a more efficient algorithm for this sort of thing. This was the best I could do, but I suspect that proving optimality here would be quite difficult.
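The exact sum, before the logarithmic approximation, can also be evaluated directly. This Python check (mine, not from the thread) confirms that for N=30 the best cutoff is M=11, with P(Best) ≈ 0.3787, while N/e ≈ 11.04:

```python
from math import e

def p_best(n, m):
    # Exact expression derived above: P(Best) = (m/n) * SUM 1/(i-1), i = m+1..n
    return (m / n) * sum(1 / (i - 1) for i in range(m + 1, n + 1))

n = 30
best_m = max(range(1, n), key=lambda m: p_best(n, m))
print(best_m)                       # 11
print(round(p_best(n, best_m), 4))  # 0.3787
print(round(n / e, 2))              # 11.04
```

So for small N the exact optimum can sit one step away from N/e, and the exact success probability is slightly above 1/e.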
EDIT THE SECOND Chris, I like your profile quote.
Originally posted by royalchicken
Here is the output from the program when it is adjusted to count the number of times the BEST item was selected. The number outside the brackets is the average quality (0-99) of the selected candidate. The number in the brackets is the number of times the best candidate of all 30 was selected. Each value of M had 100,000 runs. This shows that the value M=11 produces the highest number of "best" selections, but for average candidate quality, M=4 seems to be best. I'm confused.
Right then...I guess I should explain. Here's how it works, as compared to what the program looks for. Let P(Best) be the probability that the best prancer candidate is selected; we seek to maximise this. Let p(i) denote the ith pr ...[text shortened]... te difficult.
EDIT THE SECOND Chris, I like your profile quote.
0 = 50.09 (3385)
1 = 73.34 (13061)
2 = 79.99 (19805)
3 = 82.54 (24612)
4 = 83.27 (28262)
5 = 83.43 (31196)
6 = 82.91 (33701)
7 = 82.08 (35444)
8 = 81.00 (36360)
9 = 79.94 (37118)
10 = 78.78 (37702)
11 = 77.67 (38357)
12 = 76.05 (37680)
13 = 74.56 (37015)
14 = 73.32 (36543)
15 = 71.79 (35559)
16 = 70.21 (34047)
17 = 68.98 (32934)
18 = 67.27 (31145)
19 = 65.85 (29642)
20 = 64.13 (27452)
21 = 62.63 (25384)
22 = 61.00 (22936)
23 = 59.62 (20759)
24 = 57.98 (18088)
25 = 56.50 (15435)
26 = 54.80 (12605)
27 = 53.29 (9659)
28 = 51.67 (6574)
29 = 50.01 (3307)
EQU QUALITY TO 15000
EQU SAMPLES TO 100000
EQU ITEMS TO 30
FOR M = 0 TO ITEMS - 1
   TOTAL = 0
   GOTBESTCOUNT = 0
   * For each value of M, take a reasonable number of samples
   FOR LOOP1 = 1 TO SAMPLES
      ITEM = 0
      BEST = -1
      * Look at the first 'M' items and find the best
      FOR LOOP2 = 1 TO M
         ITEM = RND(QUALITY)
         IF ITEM > BEST THEN
            BEST = ITEM
         END
      NEXT LOOP2
      * Look at the remaining items and stop if any is better than BEST,
      * otherwise take the last item
      PICKEDPOS = -1
      FOR LOOP3 = M + 1 TO ITEMS
         ITEM = RND(QUALITY)
         IF ITEM > BEST THEN
            * Exit loop
            PICKEDPOS = LOOP3
            LOOP3 = ITEMS + 1
         END
      NEXT LOOP3
      * If an item was found that was better than BEST then see if it is the
      * best of ALL the items by examining the remaining items
      IF PICKEDPOS > -1 THEN
         GOTBEST = 1
         FOR LOOP4 = PICKEDPOS + 1 TO ITEMS
            IF RND(QUALITY) > ITEM THEN
               GOTBEST = 0
               LOOP4 = ITEMS + 1
            END
         NEXT LOOP4
         GOTBESTCOUNT += GOTBEST
      END
      * Add the score of this item to the current total for 'M'
      TOTAL += ITEM
   NEXT LOOP1
   PRINT M : " = " : (TOTAL / SAMPLES / (QUALITY / 100)) : " (" : GOTBESTCOUNT : ")"
NEXT M
Originally posted by Chrismo
You are correct 😀. The problem asks for the way to produce the highest probability of selecting the best candidate, which, as you say, happens at M=11. In practice, shooting for highest average quality seems better from a business standpoint, though.
Here is the output from the program when it is adjusted to count the number of times the BEST item was selected. The number outside the brackets is the average quality (0-99) of the selected candidate. The number in the brackets is the number of times the best candidate of all 30 was selected. Each value of M had 100,000 runs. This shows that the value M ...[text shortened]... "best" selections, but for average candidate quality, M=4 seems to be best. I'm confused.
Can anyone work out, in view of what I said before, how to get the best average candidate quality in general (noting that Chris was correct in saying M=4 in that case)?
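As a starting point, a seeded Monte Carlo (my own Python sketch, with continuous scores so ties cannot occur) illustrates the trade-off between the two objectives: a small cutoff gives a better average hire, while M=11 more often lands the single best candidate:

```python
import random

def run(n, m, samples, rng):
    """Returns (average score of the hire, fraction of runs hiring the best)."""
    total = hits = 0.0
    for _ in range(samples):
        scores = [rng.random() for _ in range(n)]
        benchmark = max(scores[:m], default=-1.0)
        pick = n - 1  # default: forced to take the last candidate
        for i in range(m, n):
            if scores[i] > benchmark:
                pick = i
                break
        total += scores[pick]
        hits += scores[pick] == max(scores)
    return total / samples, hits / samples

rng = random.Random(7)
q4, b4 = run(30, 4, 20000, rng)
q11, b11 = run(30, 11, 20000, rng)
print(f"M=4:  average {q4:.3f}, best hired {b4:.1%}")
print(f"M=11: average {q11:.3f}, best hired {b11:.1%}")
# M=4 wins on average quality; M=11 wins on probability of the very best
```

Using continuous scores sidesteps the tie problem of RND(100); a closed-form treatment of the average-quality objective would still be needed to settle the optimal cutoff in general.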