Open letter to Russ re/engine use

Squelchbelch

Only Chess

28 Nov 08

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

01 Dec 08

3 edits

Originally posted by Kepler
I would suggest the point of human v human tournaments is to show that they play at significantly different levels from each other. By significantly different I assume we both mean there will be a clear winner.

Consider what high match up rates with a particular engine in a particular game tell us. That might be interpreted as evidence that that engine is ...[text shortened]... rence or because there is but the statistical analysis fails to find it. More work is required.

To be honest I don't see the logic in your conclusions.

And I'm still awaiting an answer to one question:

WHAT COULD BE LEGITIMATE REASON WHY ENGINE MATCHUP OF SUSPECT (if result is reached in objective investigation) IS MUCH HIGHER THAN STRONGEST HUMAN PLAYERS EVER HAD????

Until someone is able to give a plausible answer to this question, I will consider high engine matchup as evidence of engine use.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

1 edit

Originally posted by Korch
To be honest I don't see the logic in your conclusions.

And I'm still awaiting an answer to one question:

WHAT COULD BE LEGITIMATE REASON WHY ENGINE MATCHUP OF SUSPECT (if result is reached in objective investigation) IS MUCH HIGHER THAN STRONGEST HUMAN PLAYERS EVER HAD????

I don't know of a legitimate reason why a player would achieve a higher engine match up than the strongest human players had. I assume by legitimate you mean legal on this site? One reason would be if the player were using the same engine as that used for investigating his games but that hardly seems legitimate.

Unfortunately that is not the question I am trying to answer. What I am trying to discover is why the strongest engines do not get a significantly different match up than strong humans. I'll leave the matter of whether Alekhine, Reti, Rubinstein et al. were strong or not to others to debate.

I am not saying that high match rate is not an indication of engine use. What I am saying is that I didn't get a high match up even though I know in advance that all the games in a particular sample were played by engines. I see no evidence of difference between the humans and engines in the two samples because the engines have a surprisingly low match up not because Alekhine and co. have a high match up.

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

01 Dec 08

Originally posted by Kepler
I don't know of a legitimate reason why a player would achieve a higher engine match up than the strongest human players had. I assume by legitimate you mean legal on this site? One reason would be if the player were using the same engine as that used for investigating his games but that hardly seems legitimate.

Unfortunately that is not the question I am ...[text shortened]... he matter of whether Alekhine, Reti, Rubinstein et al. were strong or not to others to debate.

What I am trying to discover is why the strongest engines do not get a significantly different match up than strong humans.

I think that, according to the match-up analysis results in Gatecrasher's posts, engines tend to have obviously higher engine matchups than humans.

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08

2 edits

Originally posted by Kepler
I would suggest the point of human v human tournaments is to show that they play at significantly different levels from each other. By significantly different I assume we both mean there will be a clear winner.

Consider what high match up rates with a particular engine in a particular game tell us. That might be interpreted as evidence that that engine is ...[text shortened]... rence or because there is but the statistical analysis fails to find it. More work is required.

Consider what high match up rates with a particular engine in a particular game tell us.
Ok, we agree that this is the main question.

That might be interpreted as evidence that that engine is being used. However, that only helps if we know what engine was playing the original game. Was it just chance or is there a high match up because we are both using the same engine?
"Chance"? But that's what the test is for! The test statistic is being compared to what we would expect under the null (human vs human) and every high match-up game increases the probability that the null is wrong. The high-match-up is not calculated for a single game, but for a large amount of games.

Of course, if I happen to use Patzer 3.2 to analyse and get a high match up in many games I will be confident that the person whose games I am analysing also used some version of Patzer.
Hence, high-matchups are evidence of engine use.

If I get a lower match up rate all I can say is that he or she was unlikely to be using Patzer.
And? No actions here are taken when low match-up rates occur. If you think about it, all you're saying is that the match-up test is not very stringent and engine users might be let go because they're using a particular type of engine.

Nevertheless, none of this affects the probability of an innocent player having a high-match-up rate and being wrongly 'convicted'. Remember, the question is what we can interpret from HIGH match-up rates.

What I wanted was to demonstrate that it is possible to distinguish between human and engine moves in general on the basis of match up rates
Wrong. What you're saying is that low match-up rates are poor evidence of non-engine use. BUT high match-up rates can only be explained by engine use and nothing in your findings changes this interpretation.
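
The test logic described in this post (compare the observed match-up with what the null predicts, over many games rather than one) can be sketched in a few lines. This is only a rough illustration and not the mods' actual procedure: the 60% baseline, the move counts and the function name `binomial_upper_tail` are all made up for the example, and it treats every move as an independent trial, which real games are not.

```python
from math import comb


def binomial_upper_tail(matches: int, moves: int, null_rate: float) -> float:
    """P(X >= matches) when X ~ Binomial(moves, null_rate).

    This is a one-sided p-value: how likely a match-up count at least
    this high would be if the player matched at the null (human) rate.
    """
    return sum(
        comb(moves, k) * null_rate**k * (1.0 - null_rate) ** (moves - k)
        for k in range(matches, moves + 1)
    )


# Hypothetical numbers, for illustration only.
NULL_RATE = 0.60  # assumed match-up rate for unassisted humans

# The same 75% match-up rate, once over a single short game
# and once pooled over many games.
p_one_game = binomial_upper_tail(matches=15, moves=20, null_rate=NULL_RATE)
p_many_games = binomial_upper_tail(matches=300, moves=400, null_rate=NULL_RATE)

print(f"one game   (15/20):   p = {p_one_game:.4f}")
print(f"many games (300/400): p = {p_many_games:.2e}")
```

Under these assumptions the same match-up rate is weak evidence in a single game and very strong evidence once pooled over many games. The test is one-sided: a low match-up simply fails to reject the null and says nothing further.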

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by Korch
What I am trying to discover is why the strongest engines do not get a significantly different match up than strong humans.

I think that, according to the match-up analysis results in Gatecrasher's posts, engines tend to have obviously higher engine matchups than humans.

I think you may be right but there is a difference between a subjective obvious to us and an objective statistical test.

A preliminary look at Gatecrasher's data suggests that some human tournaments have a much greater difference in match up to the single engine tournament. My result may well be due to the particular human tournament I used.

There may also be issues with the engine I used, I notice that Gatecrasher's data suggest a higher match up for the engine tournament than I obtained. I also know that he was not using the same engine I used.

There may be other factors that need to be considered. Earlier you suggested that hardware might be a factor, that is an avenue I am currently investigating.

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

01 Dec 08

Originally posted by Kepler
I think you may be right but there is a difference between a subjective obvious to us and an objective statistical test.

A preliminary look at Gatecrasher's data suggests that some human tournaments have a much greater difference in match up to the single engine tournament. My result may well be due to the particular human tournament I used.

There may a ...[text shortened]... hardware might be a factor, that is an avenue I am currently investigating.

"I think you may be right but there is a difference between a subjective obvious to us and an objective statistical test."

If the average matchup of human tournaments and of engine tournaments (obviously higher than the matchup in human tournaments) can't be considered "an objective statistical test", then please explain what can.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

2 edits

Originally posted by Palynka
Consider what high match up rates with a particular engine in a particular game tell us.
Ok, we agree that this is the main question.

That might be interpreted as evidence that that engine is being used. However, that only helps if we know what engine was playing the original game. Was it just chance or is there a high match up because we are ...[text shortened]... nly be explained by engine use and nothing in your findings changes this interpretation.

Yes, high match up rates can only be explained by engine use if taken over a large enough sample of games. I do not dispute that. However, that is NOT the question I was addressing. Telling me that I am wrong when I plainly state what I was trying to achieve is basically claiming that you know my mind better than I do. Do not do that, you do not have such an ability.

Would you care to speculate what a low match up rate for known engine games tell us? Or better yet, why those engines get the same match up rate as humans from 1922? You will notice that I have NOT said that Alekhine et al. achieved a high match up rate.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by Korch
"I think you may be right but there is a difference between a subjective obvious to us and an objective statistical test."

If the average matchup of human tournaments and of engine tournaments (obviously higher than the matchup in human tournaments) can't be considered "an objective statistical test", then please explain what can.

OK. Here's a little test for you. The results I got were a match up rate of just over 61% in one sample and just under 60% in the other. Obviously one is higher than the other! Is the difference statistically significant? You may say yes on the basis of just those figures. The statistician will want some other information and then he will perform an appropriate test. The two sample t-test I already performed is a good candidate. Well I'm blowed, no significant difference. So, who is correct? I know who I will go with.

I will look at Gatecrasher's data (I have the full data set now, not just the summary) and see if there is indeed a significant difference. I hope there is! What I will not do is just look at the figures and say that there is an obvious difference.
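
For anyone who wants to see what such a test involves, here is a minimal sketch. It is not Kepler's actual calculation: the per-game percentages are invented for illustration (chosen only so the means land near 61% and 60%), the names `welch_t`, `sample_a` and `sample_b` are hypothetical, and it uses Welch's unequal-variance form of the two sample t-test, which may not be the variant used in the thread.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(sample_a) - 1) + se2_b**2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Invented per-game match-up percentages, for illustration only.
sample_a = [58.0, 66.5, 49.0, 71.0, 60.5, 63.0, 55.5, 68.0, 52.0, 67.5]
sample_b = [62.0, 51.5, 69.0, 57.0, 64.5, 48.5, 66.0, 59.5, 54.0, 65.0]

t_stat, dof = welch_t(sample_a, sample_b)
print(f"mean A = {mean(sample_a):.1f}%, mean B = {mean(sample_b):.1f}%")
print(f"t = {t_stat:.2f} on about {dof:.1f} degrees of freedom")
```

With these invented numbers |t| comes out well below 1.96. The two-sided 5% critical value of Student's t is never smaller than 1.96, whatever the degrees of freedom, so a gap like this would not be significant even though one mean is "obviously" higher. The spread from game to game matters as much as the gap between the averages.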

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08

Originally posted by Kepler
Yes, high match up rates can only be explained by engine use if taken over a large enough sample of games. I do not dispute that. However, that is NOT the question I was addressing. Telling me that I am wrong when I plainly state what I was trying to achieve is basically claiming that you know my mind better than I do. Do not do that, you do not have such ...[text shortened]... m 1922? You will notice that I have NOT said that Alekhine et al. achieved a high match up rate.

You didn't address any of my points. Interesting.

You were saying that we cannot differentiate between engine and human using match-up rates. Now you agree that high match up rates can only be explained by engine use. You can't have it both ways.

Either you were wrong before or are wrong now. Don't blame me, I'm just the messenger.

Would you care to speculate what a low match up rate for known engine games tell us?
It says we don't know whether he is using an engine or not. Like I said, it's IRRELEVANT because no action is taken using low-match-up rates.

I'll make this simple for you:
1) High match-up rates can only be explained by engine use
2) High match-up rates are considered by the mods as evidence of cheating

Anyone willing to say that the current system is flawed needs to address 1). Your experiment doesn't. It's simple logic.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

1 edit

Originally posted by Palynka
You didn't address any of my points. Interesting.

You were saying that we cannot differentiate between engine and human using match-up rates. Now you agree that high match up rates can only be explained by engine use. You can't have it both ways.

Either you were wrong before or are wrong now. Don't blame me, I'm just the messenger.

Would you car ...[text shortened]... rrent system is flawed needs to address 1). Your experiment doesn't. It's simple logic.

I did not say that people in general could not distinguish between engine and human using match up. I said that there is no evidence for a difference in the mean match up percentage in the two samples that I used. That is not the same statement. There may well be a difference, I just didn't find it this time. There may be other factors at work here as I have already said in other posts.

I say that high match up rates can be explained by engine use. I would hesitate to say that is the only explanation but it is certainly the most likely I can think of. However, that is still not the issue I was addressing here. If you want to think that I am trying to debunk the notion that high match up rate is indicative of engine use, be my guest. It will not change the fact that I am not trying to do that at all.

Would you care to speculate what a low match up rate for known engine games tell us?
It says we don't know whether he is using an engine or not. Like I said, it's IRRELEVANT because no action is taken using low-match-up rates.

Interesting, you didn't read the question did you? If I have analysed an engine v engine tournament and got a low match up rate how does that tell me the engines are not using an engine? Do you think that the engines in the 16th World Computer Chess Championship were replaced with humans?

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

01 Dec 08

Originally posted by Kepler
First 2 paragraphs cut.
There may also be issues with the engine I used, I notice that Gatecrasher's data suggest a higher match up for the engine tournament than I obtained. I also know that he was not using the same engine I used.

There may be other factors that need to be considered. Earlier you suggested that hardware might be a factor, that is an avenue I am currently investigating.

The thing is that when a moderator is checking some games they don't know what engine the suspected cheat might have been using. The chances of hitting the right engine are 1 in however many engines there are, and then you have to take settings into account. Hardware isn't really an issue, as you just have to wait longer (assuming a relatively modern machine with enough memory). To some extent I think this justifies the 3 move match up, as it helps compensate for engine differences: to a first approximation engines will agree on the top 3 moves and only put them in a different order.
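
The idea behind the 3 move match up can be sketched as follows. This is a toy example, not real engine output: the moves, the two engines `engine_x` and `engine_y` and the function name `match_up_rate` are all hypothetical, and the candidate lists are deliberately constructed so that both engines pick the same three moves in a different order.

```python
def match_up_rate(played_moves, engine_candidates, top_n):
    """Fraction of played moves found among the engine's first top_n choices."""
    if not played_moves:
        raise ValueError("no moves to compare")
    hits = sum(
        move in candidates[:top_n]
        for move, candidates in zip(played_moves, engine_candidates)
    )
    return hits / len(played_moves)


# Moves actually played by the suspect (made up for illustration).
played = ["e4", "Nf3", "Bb5", "O-O", "Re1", "c3"]

# Ranked candidate moves per position from two hypothetical engines.
engine_x = [
    ["e4", "d4", "c4"], ["Nf3", "Nc3", "d4"], ["Bc4", "Bb5", "d4"],
    ["O-O", "d3", "Nc3"], ["d3", "Re1", "c3"], ["c3", "h3", "a4"],
]
engine_y = [
    ["d4", "e4", "c4"], ["Nf3", "d4", "Nc3"], ["Bb5", "Bc4", "d4"],
    ["d3", "O-O", "Nc3"], ["Re1", "c3", "d3"], ["h3", "c3", "a4"],
]

for name, engine in (("X", engine_x), ("Y", engine_y)):
    top1 = match_up_rate(played, engine, top_n=1)
    top3 = match_up_rate(played, engine, top_n=3)
    print(f"engine {name}: top-1 = {top1:.0%}, top-3 = {top3:.0%}")
```

In this constructed case the first-choice match-up depends on which engine the checker happens to use, while the top 3 match-up does not. Whether real engines agree on the top three moves this closely is exactly the assumption being made above, and the sketch does not test it.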

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by DeepThought
The thing is that when a moderator is checking some games they don't know what engine the suspected cheat might have been using. The chances of hitting the right engine are 1 in however many engines there are, and then you have to take into account settings, hardware isn't really an issue as you just have to wait longer (assuming a relatively modern mac ...[text shortened]... a first approximation engines will agree on the top 3 moves, only put them in a different order.

You may be right about the 3 choice malarkey. However, a preliminary look at Gatecrasher's data suggests it doesn't work as well as one might like. I suspect I will get less difference using three choices than one. The reason may be simply that in many positions there aren't that many good moves and a good player, whether human or engine, is likely to spot the one or two good moves in a position. This effect may simply overwhelm those positions where there are apparently many almost equal moves. I may be wrong though, we shall see.

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08

1 edit

Originally posted by Kepler
I did not say that people in general could not distinguish between engine and human using match up. I said that there is no evidence for a difference in the mean match up percentage in the two samples that I used. That is not the same statement. There may well be a difference, I just didn't find it this time. There may be other factors at work here as I h ...[text shortened]... nk that the engines in the 16th World Computer Chess Championship were replaced with humans?

Look, Kepler, I thought you actually wanted to help. Now I see you're just trying to show off. Sorry, but based on that, I'll have to pummel you further.

The good thing of an internet forum, is that you can go back and see EXACTLY what people said. There's no escape. You can try to claim you didn't say this or that, but it's all there in black and white.

- Kepler: I did not say that people in general could not distinguish between engine and human using match up.
- Kepler (Page 1): This also suggests a reason why it has taken so long to ban some alleged cheats, match up rates are no indicator of engine use!

Moreover:
- Kepler: If you want to think that I am trying to debunk the notion that high match up rate is indicative of engine use, be my guest. It will not change the fact that I am not trying to do that at all.
- Kepler (Page 1): If anyone has been banned on the basis of match up rates alone I consider that there is at least a 50% chance that they were wrongly banned.

The first one is a blatant lie. It would have been easier to admit you were simply wrong, but you wanted to save face and decided to backtrack and claim you didn't claim what you, in fact, had...claimed.

The second is either a similar lie or an honest mistake. Your initial interpretation of your results included an attack on the use of match-up rates by the mods (obviously, high match-up rates as these are the ones that can lead to banning). Either you now reject your own initial claims or you don't mind contradicting yourself.

Interesting, you didn't read the question did you? If I have analysed an engine v engine tournament and got a low match up rate how does that tell me the engines are not using an engine?
Seriously, who can't read here? I said: It says we don't know whether he is using an engine or not. If I say the test is inconclusive, why should I think that the engines are not engines? It just proves my point. Low match-up rates should be interpreted as lack of evidence for engine use, not as evidence for non-engine use. If you're a statistician, you should know the difference.

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

01 Dec 08

Originally posted by Palynka
Look, Kepler, I thought you actually wanted to help. Now I see you're just trying to show off. Sorry, but based on that, I'll have to pummel you further.

The good thing of an internet forum, is that you can go back and see EXACTLY what people said. There's no escape. You can try to claim you didn't say this or that, but it's all there in black and white. ...[text shortened]... vidence for non-engine use. If you're a statistician, you should know the difference.

Rec'ed.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by Palynka
Look, Kepler, I thought you actually wanted to help. Now I see you're just trying to show off. Sorry, but based on that, I'll have to pummel you further.

The good thing of an internet forum, is that you can go back and see EXACTLY what people said. There's no escape. You can try to claim you didn't say this or that, but it's all there in black and white. ...[text shortened]... vidence for non-engine use. If you're a statistician, you should know the difference.

You obviously missed the bit where I pointed out that I had been deliberately controversial in that first post of mine. I was after some information and I got it. Sometimes one just has to take an extreme stance before people will engage in any debate. Unfortunate and I am fully aware I have ruffled some feathers but I got the information I needed. I knew that asking politely for it would result in silence because I have asked before. So I poked the hornet's nest. Rude of me, but what the hey, it worked.

Sorry if I have ruffled your feathers as well. Pummel away if it will make you feel any better.

I am not trying to prove or disprove the methods the mods use. I don't know exactly what methods they use. I do know there is more involved than just considering match ups. I was asked if it is possible to distinguish between the play of engines and the play of humans. The received wisdom is that there is an "obvious" difference. So that was the reply I gave. Then I was asked what the "obvious" difference is. High match up of moves when compared with an engine says I. Then I was asked if I had any evidence that there is actually an identifiable difference between engine and human play. Unfortunately I didn't have any to hand and no one else was willing to confirm that it did or did not exist. All I got was the same sort of stuff you are shoving at me. "If player x matches over n% with an engine then he is using an engine". And? That is not an answer to the question asked. It is a bit like asking where the butter is and being told that you can use butter to make cakes.

So I went off and did what statisticians do. I gathered data and applied some statistical jiggery pokery to it. The result surprised me. I was going to have to go back and say "Nope, there is no detectable difference between humans and engines". Well, that wouldn't do. What to do then? Well, I could just have tried more or other samples or other tests until I got the desired result but that sort of dubious practice could get a statistician booted out of the statisticians' union. So I thought I would try here. There are many knowledgeable people in here even if suspicion and paranoia sometimes get the better of them. I knew I would have a hard time getting the information I wanted. What if I am a cheat? What if I am protecting other cheats? What if I used the information to construct some kind of invisibility cloak for the cheaters? That is the reason for the initial post and maybe one or two others. Now no one knows which bits of information I wanted or what to do with them so I am not aiding cheats. In any case, the information wouldn't actually help anyone to cover their tracks, it just tells me where I should look to find out what went wrong with my investigation.

Low match up rates evidence for non-engine use? I prefer to take the statistician's view that low match up rates show there is no evidence to reject the idea that someone is a human. It doesn't actually give us evidence they are not something else. You can interpret that as "Low match up is not a guarantee that the suspect is not using an engine".