Open letter to Russ re/engine use

Only Chess

K
Demon Duck

of Doom!

Joined
20 Aug 06
Moves
20099
Clock
29 Nov 08

Originally posted by Tengu
Not a good one judging from the 1908 championship match!

😛
Unfortunately he was using an older model of steam powered PC to run his engine on.

FL

Joined
21 Feb 06
Moves
6830
Clock
29 Nov 08

Originally posted by Kepler
You may recall I posted in another thread now long gone stating that I was applying statistical analysis to two samples of games. One sample was taken from a tournament held in Vienna in 1922 and featured the likes of Reti, Gruenfeld and Rubinstein. The other sample was taken from the 16th World Computer Chess Championship which was recently won by Rybka.

...[text shortened]... it has taken so long to ban some alleged cheats, match up rates are no indicator of engine use!
Surely the reason for the low match up from the World Computer Championship is the fact that there were at least two or three absolutely dreadful programs in the competition? These would drag down the average match up for all the games in this tournament.
http://www.grappa.univ-lille3.fr/icga/tournament.php?id=178

Did your analysis take book moves into account? The human game which you said had an 80% match up was a 25 move draw, so probably too short a game to be statistically significant.
http://www.chessgames.com/perl/chessgame?gid=1148874

For individual games between players on this site we have seen 90% match ups with Fritz' first choice.
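The point about a 25 move game being too short can be made concrete with a confidence interval. The sketch below is only an illustration: it assumes the 80% figure came from 20 matching moves out of 25 counted moves (the thread does not give the actual counts) and uses a Wilson score interval, which is one common choice and not necessarily what anyone in this thread used.

```python
import math

def wilson_interval(matches, moves, z=1.96):
    """Approximate 95% Wilson score interval for a match-up proportion."""
    if moves <= 0:
        raise ValueError("moves must be positive")
    p = matches / moves
    denom = 1 + z * z / moves
    centre = (p + z * z / (2 * moves)) / denom
    half = z * math.sqrt(p * (1 - p) / moves + z * z / (4 * moves * moves)) / denom
    return centre - half, centre + half

# Hypothetical counts: 20 matches in 25 moves gives the 80% figure.
low, high = wilson_interval(20, 25)
print(f"25 moves: {low:.2f} to {high:.2f}")

# The same rate over ten times as many moves narrows the interval.
low_big, high_big = wilson_interval(200, 250)
print(f"250 moves: {low_big:.2f} to {high_big:.2f}")
```

Under these assumed counts the single short game is compatible with a wide range of underlying match-up rates, which is why one game on its own says little either way.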

a

cavanaugh park

Joined
27 Feb 05
Moves
50881
Clock
29 Nov 08

who is the player in question?! is it numero uno

rc

Joined
26 Aug 07
Moves
38239
Clock
29 Nov 08

Originally posted by alexstclaire
who is the player in question?! is it numero uno
i have been really itching to ask the same question but thought it inappropriate as the thread may be removed as a consequence, however, if someone would like to send any analysis privately i would really like to give it some consideration - regards Robbie.

a

cavanaugh park

Joined
27 Feb 05
Moves
50881
Clock
29 Nov 08

i guess im tactless, for the record i think numero uno is a good guy

c
Grammar Nazi

Auschwitz

Joined
03 Apr 06
Moves
44348
Clock
29 Nov 08

Originally posted by alexstclaire
i guess im tactless, for the record i think numero uno is a good guy
Please don't start a discussion about individual players, as this thread has some very valid information, and I'd hate to have it pulled.

rc

Joined
26 Aug 07
Moves
38239
Clock
29 Nov 08

Originally posted by alexstclaire
i guess im tactless, for the record i think numero uno is a good guy
i wouldnt say that, considering your rather tactful play with the pieces in reaching 2000 on icc, 😀

a

cavanaugh park

Joined
27 Feb 05
Moves
50881
Clock
29 Nov 08

Originally posted by robbie carrobie
i wouldnt say that, considering your rather tactful play with the pieces in reaching 2000 on icc, 😀
hehe, thanks 😉

D
Losing the Thread

Quarantined World

Joined
27 Oct 04
Moves
87415
Clock
29 Nov 08

Originally posted by Kepler
DT: Stuff I'm not responding to cut out
A little preliminary work on the top three issue suggested that this actually narrows the gap between engine and human, making the difference harder to detect, whereas I wanted the opposite.
I'd wondered about the choice of top 3 moves myself. Can you expand on how and why you decided that testing beyond the first choice engine move makes the difference between engines and humans harder to detect (other than the obvious "well, they've got three chances to match" aspect)? I think that this is a critical issue. For that matter I'd be interested to hear one of the former games mods comment on this.

I'm sympathetic to what you are trying to do as it is vital that the innocent are not found guilty. However, it's also annoying that people use engines at all - I'd like their behaviour modified. While I'm aware that you are an expert in statistics, which I'm not, I feel that your implied conclusion - that it is impossible to find a statistical test which can reliably detect engine use - is pessimistic.

I think you are right that a major problem is to find a good way of eliminating moves that both a human and an engine would make. Forced lines and automatic recaptures, or situations where there are a small number of sensible moves and the rest are obviously losing. We seem to agree that eliminating these moves from the sample should increase the difference between genuine players and GenuineIntel players.

S

Joined
14 Jul 06
Moves
20541
Clock
29 Nov 08
1 edit

Originally posted by Kepler
....
This is an extremely disturbing result. If anyone has been banned on the basis of match up rates alone I consider that there is at least a 50% chance that they were wrongly banned. I hope that match up rates have only been used as an indicator to suggest further scrutiny and that further tests have then been applied. This also suggests a reason why it has taken so long to ban some alleged cheats, match up rates are no indicator of engine use!
A decent player who also uses an engine isn't going to give you smoking gun engine-like moves. They will simply get all the tactics right & rarely if ever make a mistake.

If not overall matchup rates out of database & over time, then how exactly do you propose they are detected & banned?!

Maybe mods should simply ask suspects "are you using an engine?"
I'm sure they'd give an honest answer. 🙄

S

Joined
14 Jul 06
Moves
20541
Clock
29 Nov 08

Originally posted by DeepThought
...Forced lines and automatic recaptures, or situations where there are a small number of sensible moves and the rest are obviously losing. We seem to agree that eliminating these moves from the sample should increase the difference between genuine players and GenuineIntel players.
The methods I use compare the overall matchups between both players so this takes into account forcing lines.

If you have a game between 2 evenly rated players with forcing lines & get results like:

Result:
White: 2300(a)
Top 1 Match: 19/31 (61,3% )
Top 2 Match: 27/31 (87,1% )
Top 3 Match: 31/31 (100,0% )

Black: 2300(b)
Top 1 Match: 13/30 (43,3% )
Top 2 Match: 22/30 (73,3% )
Top 3 Match: 23/30 (76,7% )


Then who out of the 2300(a) or 2300(b) are you going to investigate further?

I honestly think that it is naive to discount forcing moves.
What next? Should we discount all tactics/combinations where the engine user forces the other player into a losing position because of a wonderful 10-move coup d’état?

I think games mods would still be investigating the first suspect on the site if we followed your discounted moves criteria to its logical conclusion!
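For anyone wanting to see how Top 1 / Top 2 / Top 3 figures of the kind shown above are counted, here is a minimal sketch. It is not the tool used in this thread: the input layout (each move paired with the engine's candidates, best first), the function names and the four-move sample are all assumptions made up for illustration.

```python
def top_n_matches(game, n):
    """Count moves that appear among the engine's first n choices."""
    hits = sum(1 for played, ranked in game if played in ranked[:n])
    return hits, len(game)

def report(label, game, max_n=3):
    """Format match-up counts in the style quoted in the post above."""
    lines = [label]
    for n in range(1, max_n + 1):
        hits, total = top_n_matches(game, n)
        rate = 100 * hits / total if total else 0.0
        lines.append(f"Top {n} Match: {hits}/{total} ({rate:.1f}%)")
    return "\n".join(lines)

# Hypothetical four-move sample: (move played, engine choices best-first).
sample = [
    ("Nf3", ["Nf3", "d4", "c4"]),
    ("d4", ["c4", "d4", "g3"]),
    ("h3", ["Bg5", "e3", "Bf4"]),
    ("e3", ["Bg5", "Bf4", "e3"]),
]
print(report("White:", sample))
```

Note that a move counted at Top 1 is automatically counted at Top 2 and Top 3 as well, so the three figures can only rise as n grows.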

J

benching

Joined
17 Jul 08
Moves
1218
Clock
29 Nov 08

This discussion is taking the well trodden path of the thread that was removed earlier.

Let us take the amazing game below :-



If forced moves and recaptures should not be factored out, then database moves should also not be factored out. Taking into account all the moves, what are the matchups for the above game? This illustrates also that a credible "game cheat hunter" should be at least 2200 OTB, not some prima donna loudmouth punk.

K
Demon Duck

of Doom!

Joined
20 Aug 06
Moves
20099
Clock
29 Nov 08

Originally posted by Fat Lady
Surely the reason for the low match up from the World Computer Championship is the fact that there were at least two or three absolutely dreadful programs in the competition? These would drag down the average match up for all the games in this tournament.
http://www.grappa.univ-lille3.fr/icga/tournament.php?id=178

Did your analysis take book moves into ...[text shortened]... ividual games between players on this site we have seen 90% match ups with Fritz' first choice.
Yes, I took all that into account. I removed all the games played by Mobile Chess for instance because it actually plays worse than I do when drunk. Opening moves were discarded even though opening theory was not as advanced in 1922 as it is now. In fact, the 1922 Vienna tournament player list is like looking at an opening book, most of the players' names are now opening or variation names.

I mentioned the two individual match up rates to show the range of results involved. We may not like a 25 move draw but it is part of the data and there is no good statistical reason to discard it, unlike the games played by Mobile Chess.

J

benching

Joined
17 Jul 08
Moves
1218
Clock
29 Nov 08

Originally posted by BlitzNewbie
Some time ago I posted in the OTB players club forum that I found simple matchup rates to be too one-dimensional to be used as conclusive evidence of engine use.
Kepler seems to back up my point of view.
The nature of the games analyzed need to somehow be taken into account. I have no idea of how this is done in praxis though...
People can quote statistics when it suits them and unquote them when it doesn't suit them. There was talk of a 25 move game as being "statistically insignificant" but when a certain individual was pressing for the exclusion of a certain player from the site he said "all it takes is a single game to prove 3b" (conveniently forgetting his own 100% match in a single game).

K
Demon Duck

of Doom!

Joined
20 Aug 06
Moves
20099
Clock
29 Nov 08

Originally posted by DeepThought
I'd wondered about the choice of top 3 moves myself. Can you expand on how and why you decided that testing beyond the first choice engine move makes the difference between engines and humans harder to detect (other than the obvious well they've got three chances to match aspect)? I think that this is a critical issue. For that matter I'd be interested ...[text shortened]... from the sample should increase the difference between genuine players and GenuineIntel players.
The main reason I chose first choice rather than top three is simply that no one could give me an adequate reason for preferring top three over any other number. In fact, I received no explanation at all. My conclusion is that it is an arbitrary number chosen (I suspect) because in the past it has given the desired result, namely an incriminating match up rate for a suspect. A good statistician does not modify his methods to give the desired result.

Thinking about the whole n-top choice thing, it occurred to me that if engines produce engine match up rates significantly higher than humans then this should be true of the first choice. If we now increase the number of moves considered the engine has less room for improvement than the human and the gap between the two gets narrower. Furthermore, why stop at three? If we were to increase the number of choices considered sufficiently we could "prove" that my neighbour's cat is an engine.

I decided to test my idea using some games played by two versions of Glaurung (guaranteed high match up) and some blitz games played by a couple of people I know elsewhere. The match up rates for the engines were high and increased a little with increasing number of moves analysed whereas the match up rates for humans were low but increased markedly with number of moves analysed.
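The "less room for improvement" argument is just arithmetic, and a toy example shows its shape. The ranks below are invented purely for illustration and are not the Glaurung or blitz data mentioned above; the function name is likewise hypothetical.

```python
def match_rate(ranks, n):
    """Fraction of moves whose engine rank is n or better (rank 1 = first choice)."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

# Hypothetical rank of each played move in the analysis engine's list.
# None means the move was outside the choices examined. Not real game data.
engine_like = [1, 1, 1, 2, 1, 1, 3, 1, 1, 1]
human_like = [1, 3, 2, None, 1, 4, 2, 5, 1, 3]

for n in range(1, 6):
    e = match_rate(engine_like, n)
    h = match_rate(human_like, n)
    print(f"top {n}: engine {e:.0%}, human {h:.0%}, gap {e - h:.0%}")
```

Because the engine-like sample starts close to 100%, widening n can add very little to it, while the human-like sample keeps picking up matches. Whether real games behave like this toy sample is exactly what would need testing.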

I do not think that a reliable statistical test for engine use is impossible, just that the current top three engine choices over 30 seconds may well be unable to distinguish accurately between engine and human. It occurs to me that this is not actually a surprising result. Strong players from 1922 will not produce significantly worse moves than players today and were probably playing better than the majority of players on this site. I suspect they may actually have been capable of better play than all the players on this site. Similarly, a strong engine will produce good moves. If that were not the case strong engines would lose regularly to humans. I am also unconvinced by the idea that engines somehow produce moves with some characteristic whiff of silicon. Modern engines may produce dubious moves at times but so do modern GMs.

There are many ways that a good player can distinguish between engine and human. Unfortunately most of them are subjective and we require an objective test of "engineness". My current thought is that a more reliable way to distinguish humans and engines would be blunder rate. This is quite easy to check if we use an interface that sticks ? or ?? next to moves it considers downright bad. Now, it is possible that the move is really bad (it loses material for no compensation) or it is actually good but the engine does not understand it (many positional sacrifices are "bad" to an engine). Whether objectively good or bad does not matter, the fact is moves that get a ? or ?? would not be played by the analysis engine and are therefore unlikely to be played by other engines. Preliminary investigation of the same sample games indicates that blunder rate is a very accurate method of distinguishing between strong OTB players and engines. It should be possible to use this to distinguish between anyone playing in OTB fashion (Many games, short move times etc) and an engine but it may be more difficult to distinguish between correspondence-style play (few games, long move times etc) and engines.
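A blunder rate of the kind described could be counted along these lines. This is a rough sketch under stated assumptions: it takes a list of move strings already annotated by some interface, counts only a bare ? or ?? suffix (so !? and ?! are ignored), and the two sample lists are invented rather than taken from the games discussed.

```python
import re

# Trailing run of annotation marks, e.g. "?", "??", "!?", "?!".
ANNOTATION = re.compile(r"[!?]+$")

def blunder_rate(moves):
    """Share of moves the annotating engine marked '?' or '??'."""
    if not moves:
        return 0.0
    flagged = 0
    for move in moves:
        found = ANNOTATION.search(move)
        if found and found.group() in ("?", "??"):
            flagged += 1
    return flagged / len(moves)

# Hypothetical annotated move lists for one side, not taken from real games.
otb_style = ["e4", "Nf3", "Bc4?", "O-O", "d3", "Qe2??", "Nc3!?", "h3"]
engine_style = ["d4", "c4", "Nc3", "Nf3", "Bg5", "e3", "Bd3?!", "O-O"]

print(f"OTB-style sample: {blunder_rate(otb_style):.1%}")
print(f"Engine-style sample: {blunder_rate(engine_style):.1%}")
```

As with match-up rates, a figure from one short game would carry a lot of uncertainty, so any threshold would need to be judged over many moves.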
