Open letter to Russ re/engine use

Squelchbelch

Only Chess

28 Nov 08

~~diskamyl~~

Joined: 29 Mar 07
Moves: 1260

01 Dec 08

1 edit

Originally posted by Gatecrasher
6th Correspondence Chess World Cup Final 1968-1971

{ Rittner, H. (Games: 15) }
{ Top 1 Match: 203/337 ( 60.2% )
{ Top 2 Match: 254/337 ( 75.4% )
{ Top 3 Match: 276/337 ( 81.9% )
{ Top 4 Match: 293/337 ( 86.9% )

{ Zagorovsky, V. (Games: 15) }
{ Top 1 Match: 201/378 ( 53.2% )
{ Top 2 Match: 267/378 ( 70.6% )
{ Top 3 Match: 311/378 ( 82. ...[text shortened]... tch: 1170/1565 ( 74.8% )
{ Top 3 Match: 1298/1565 ( 82.9% )
{ Top 4 Match: 1365/1565 ( 87.2% )

thanks for sharing the analysis. I think this, especially the matchup rate of correspondence chess world cup final, should be more than convincing for those who are rather skeptical about matchup rates being a sign of engine use.

could you also tell which engine was used for these?

luctruc

Joined: 28 Jan 04
Moves: 3570

01 Dec 08

Originally posted by diskamyl
thanks for sharing the analysis. I think this, especially the matchup rate of correspondence chess world cup final, should be more than convincing for those who are rather skeptical about matchup rates being a sign of engine use.

could you also tell which engine was used for these?

I believe it was the RCA Chess-aire Home Game Center.

Gatecrasher

Whale watching

33°36'S 26°53'E

Joined: 05 Feb 04
Moves: 41150

01 Dec 08

Originally posted by diskamyl
could you also tell which engine was used for these?

No. I didn't mention the engine or the time controls on purpose. The important thing is that they were the same for each batch.

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

01 Dec 08

Originally posted by Gatecrasher
No. I didn't mention the engine or the time controls on purpose. The important thing is that they were the same for each batch.

In terms of Kepler's analysis, the strength and maybe the style of the engine might be important, so implicitly the time controls would matter too, but I largely agree.

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

01 Dec 08

Originally posted by Kepler
I agree, we are not completely on the same page. I am not comparing a specific engine or human (x or y in your post) against the z entity, Glaurung in this case. What I did do was compare a sample of games known to have been played by strong humans, there was no possibility of engine use in 1922, and a sample of games known to have been played by strong engin ...[text shortened]... s one of these fancy superposition of states cat in box thingies that Schrodinger was fond of.

Thank you for your consideration of your interpretation of your two analyses and for clarifying a few details. The purpose of your study and your interpretation of the implications of your analysis were somewhat unclear to me. I now agree that the results of your two analyses do not have to be contradictory.

Your analysis indicated that humans and engines could not be distinguished by match-up rates with a third entity, an engine, Glaurung 2.1. An ideal test standard for the purposes of your analysis would be an engine equal to the strength of the subjects studied and for the purposes of your study I think that the engine that you used would suffice. Additionally the collective strengths of the engines and humans should be uniform and the strengths of the humans should be as high as possible for such an analysis.

Again, regardless of your methodology, I agree with the result of your analysis, although Gatecrasher’s analysis seems to indicate that the match-up rate of engines against another engine is slightly higher than that of humans compared with that engine. But Gatecrasher’s data does not take into account the disparity in strength between the humans and the engines. Clearly, the trend indicates that the stronger the player, the higher the matchup with an engine.

The moderators do an analysis of a human’s moves X versus a computer’s moves Y. Your analysis involves a third entity Z, so your analysis is not equivalent to the X-Y analysis that the moderators do. That is the basic distinction that I made, and I believe that is the cause of some confusion.

The problem with your analysis is that, in my opinion, it is not a pure X-Y analysis as you seem to imply; more importantly, it is not the best way to distinguish between humans and engines to find out what constitutes a threshold of certainty for engine use with regard to match-up rates. I think that this is the real issue here.

I do not really wish to discuss in detail how to find a threshold of certainty for engine use, for obvious reasons, even when just considering matchup rates. Somehow the game moderators can and do establish confidence intervals statistically, although I would dispute the feasibility of such a calculation if it did not account for the possible matchup rates of an exceptionally strong human in comparison with an engine.

Anyway, to determine the highest matchups with an engine that are theoretically possible for a human, we would have to extrapolate from the data of the strongest human players until we could determine how perfect humans would play. It is definitely doable [by statistical methods and/or by extensive analysis of human games] but maybe not very practical, since humans do not play perfect chess, although theoretically they could.

However, I believe that the result of your analysis and Gatecrasher’s statistics MAY have implications regarding what constitutes a threshold of certainty. If engines that are stronger than humans can match up 60-70% first choice with another strong engine, and there is no difference between human and computer matchups with an engine when the humans and computers are of the same strength, then a human who can play at Anand’s strength in correspondence chess may hit 63% in first-choice engine moves. And a human who plays stronger than Anand in correspondence chess may have a first-choice matchup with a strong engine slightly higher than 63%.

If, as Squelchbelch has noted, the matchup rates that constitute "overwhelming evidence of engine use" over time in many games once out of database are as follows: Top 1 match = 60%+, Top 2 match = 75%+, Top 3 match = 85%+, then the implication is that someone who plays at the strength of Anand or Fischer [both 60%+ Top 1 match] on this site could be banned for engine use IF matchup rates were considered alone.

Now, the idea of a human playing on this site at or above the strength of Anand or Fischer at his peak seems silly. And maybe it has not happened yet – but consider that the top legitimate players on this site can score against engines in correspondence chess and these top engines can play at or above the strength of these players. So I think it is very possible for a human to play at such a high level in correspondence chess.
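
To give a feel for how wide such an interval is, here is a minimal sketch, assuming a plain binomial model in which each move is an independent trial (a simplification, and not necessarily what the game moderators do). It computes a Wilson score interval for Rittner's Top 1 figure of 203/337 from Gatecrasher's post; the function name is only illustrative.

```python
import math


def wilson_interval(matches, total, z=1.96):
    """Wilson score interval for a match-up proportion.

    z=1.96 corresponds to roughly 95% confidence.
    Assumes independent moves, which real games do not strictly satisfy.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = matches / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Rittner, Top 1 match, figures taken from Gatecrasher's post
low, high = wilson_interval(203, 337)
print(f"Top 1: {203 / 337:.1%}, interval {low:.1%} to {high:.1%}")
```

Under these assumptions the interval for Rittner straddles the 60% line, which is why a single cut-off on Top 1 alone is hard to read for a sample of this size.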

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

01 Dec 08

Originally posted by Yuga
...
If, as Squelchbelch has noted, the matchup rates that constitute "overwhelming evidence of engine use" over time in many games once out of database are as follows: Top 1 match = 60%+, Top 2 match = 75%+, Top 3 match = 85%+, then the implication is that someone who plays at the strength of Anand or Fischer [both 60%+ Top 1 match] on this site could be banned for engine use IF matchup ...[text shortened]... I think it is very possible for a human to play at such a high level in correspondence chess.

Yes very interesting, but the player I analysed had

Top 1 Match: 438/673 (65.1% )
Top 2 Match: 559/673 (83.1% )
Top 3 Match: 612/673 (91.0% )

which I think even Anand, Fischer & Kramnik would find hard going.
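
For anyone who wants to check the arithmetic, here is a rough sketch, assuming the 60% / 75% / 85% figures quoted above are read as simple cut-offs. The names are made up for illustration and are not anything the game mods actually use.

```python
# Cut-offs as quoted in the thread: Top 1 = 60%+, Top 2 = 75%+, Top 3 = 85%+
THRESHOLDS = {1: 0.60, 2: 0.75, 3: 0.85}


def exceeds_thresholds(counts, total, thresholds=THRESHOLDS):
    """Return {top_n: (rate, at_or_over_cutoff)} for each Top N figure.

    counts maps N to the number of moves that matched one of the
    engine's top N choices; total is the number of moves analysed.
    """
    result = {}
    for top_n, limit in thresholds.items():
        rate = counts[top_n] / total
        result[top_n] = (rate, rate >= limit)
    return result


# The player analysed above: 438, 559 and 612 matches out of 673 moves
report = exceeds_thresholds({1: 438, 2: 559, 3: 612}, 673)
for top_n, (rate, over) in report.items():
    print(f"Top {top_n}: {rate:.1%} {'over' if over else 'under'} the cut-off")
```

Run on Rittner's figures from Gatecrasher's post (203, 254 and 276 out of 337), the same check clears the Top 1 and Top 2 lines but not Top 3.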

no1marauder

Naturally Right

Somewhere Else

Joined: 22 Jun 04
Moves: 42677

01 Dec 08

1 edit

Originally posted by Yuga
Thank you for your consideration of your interpretation of your two analyses and for clarifying a few details. The purpose of your study and your interpretation of the implications of your analysis were somewhat unclear to me. I now agree that the results of your two analyses do not have to be contradictory.

Your analysis indicated that humans and e ...[text shortened]... I think it is very possible for a human to play at such a high level in correspondence chess.

Yuga: So I think it is very possible for a human to play at such a high level in correspondence chess.

On what basis? Gatecrasher's, Kepler's, mine and others' analysis of the most successful human correspondence players in history shows that none of them reached such match up levels on a consistent basis before the advent of engines. Why do you think it's possible for players on RHP to do so?

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

01 Dec 08

Originally posted by Squelchbelch
Yes very interesting, but the player I analysed had

Top 1 Match: 438/673 (65.1% )
Top 2 Match: 559/673 (83.1% )
Top 3 Match: 612/673 (91.0% )

which I think even Anand, Fischer & Kramnik would find hard going.

Clearly that match-up is very high. It would be interesting to know that user's matchups against other engines as strong as the one you used for that analysis.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by Squelchbelch
Yes very interesting, but the player I analysed had

Top 1 Match: 438/673 (65.1% )
Top 2 Match: 559/673 (83.1% )
Top 3 Match: 612/673 (91.0% )

which I think even Anand, Fischer & Kramnik would find hard going.

That is extraordinary, it has to be said. That is better than some of the engines in Gatecrasher's data.

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

01 Dec 08

2 edits

Originally posted by no1marauder
Yuga: So I think it is very possible for a human to play at such a high level in correspondence chess.

On what basis? Gatecrasher's, Kepler's, mine and others' analysis of the most successful human correspondence players in history shows that none of them reached such match up levels on a consistent basis before the advent of engines. Why do you think it's possible for players on RHP to do so?

The strongest human players can hit 60%+ 1st choice OTB according to Gatecrasher's statistics. So I think we can assume that it is possible for the strongest human correspondence players to do so as well.

I think I recall that only Rittner in that CC tournament in the 60's hit above 60% first choice, but CC and chess theory have greatly improved in the last 40 years. And increased chess strength corresponds with increased engine match-up. That is the basis for my statement.

I don't know Kepler's statistics, only his general methodology and results.

no1marauder

Naturally Right

Somewhere Else

Joined: 22 Jun 04
Moves: 42677

01 Dec 08

Originally posted by Yuga
The strongest human players can hit 60%+ 1st choice OTB according to Gatecrasher's statistics. So I think we can assume that it is possible for the strongest correspondence players to do so as well.

I think I recall that only Rittner in that CC tournament in the 60's hit above 60% first choice, but CC and chess theory have greatly improved in the last 40 years. ...[text shortened]... my statement.

I don't know Kepler's statistics, only his general methodology and results.

There is absolutely no objective basis for your claims particularly that supposed increases in "chess theory" translate to increased engine match ups in correspondence play. It is also a claim that cannot be tested in any way, shape or form, so it is completely non-scientific conjecture on your part.

Yuga

Renaissance

OnceInALifetime

Joined: 24 Sep 05
Moves: 30579

01 Dec 08

Originally posted by no1marauder
There is absolutely no objective basis for your claims particularly that supposed increases in "chess theory" translate to increased engine match ups in correspondence play. It is also a claim that cannot be tested in any way, shape or form, so it is completely non-scientific conjecture on your part.

It is possible that what I said was not wholly accurate.

But an increase in chess theory implies an increase in chess strength. An increase in chess strength directly corresponds with an increase in engine match-up.

Carterson

Joined: 22 Nov 08
Moves: 981

01 Dec 08

So if I am right in thinking about this, the only place to test all of these matchups is really the middlegame, because of opening books and of course endgame books. But does it include the entire middlegame? I mean, I try to style my play after Lasker and I am sure I do not make the most correct move all of the time; I just try to find the move that makes my opponent the most uncomfortable. But what happens when a person hits a match on one move and then hits another a few moves down the road that leads to his plan, which was maybe different from the engine's plan but leads to sort of the same result?
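
For what it's worth, here is how I picture the counting, as a sketch only. It assumes each move is scored on its own against the engine's ranked list for that position, and that moves before some out-of-book point are skipped. The function, the window parameters and the moves in the example are all made up; nobody in the thread has said this is exactly how it is done.

```python
def top_n_matches(played, engine_choices, first_move=1, last_move=None, max_n=3):
    """Count how often a played move is among the engine's top 1..max_n choices.

    played: one player's moves in order (index 0 is that player's move 1).
    engine_choices: ranked candidate lists, one per entry in played.
    first_move / last_move: 1-based inclusive window, e.g. to skip book moves.
    """
    if len(played) != len(engine_choices):
        raise ValueError("played and engine_choices must be aligned")
    end = len(played) if last_move is None else min(last_move, len(played))
    counts = {n: 0 for n in range(1, max_n + 1)}
    total = 0
    for i in range(first_move - 1, end):
        total += 1
        ranked = engine_choices[i][:max_n]
        if played[i] in ranked:
            rank = ranked.index(played[i]) + 1
            # A first-choice match also counts as a Top 2 and Top 3 match
            for n in range(rank, max_n + 1):
                counts[n] += 1
    return counts, total


# Made-up example: first two moves treated as book and skipped
played = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1"]
engine = [
    ["e4", "d4", "c4"],
    ["Nf3", "Nc3", "Bc4"],
    ["Bc4", "Bb5", "d4"],
    ["Bxc6", "Ba4", "Bc4"],
    ["d3", "Nc3", "c3"],
    ["Re1", "d3", "c3"],
]
counts, total = top_n_matches(played, engine, first_move=3)
print(counts, total)
```

If the counting works like this, a move that matches is counted whichever plan it belonged to, because each position is looked at separately.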

no1marauder

Naturally Right

Somewhere Else

Joined: 22 Jun 04
Moves: 42677

01 Dec 08

Originally posted by Yuga
It is possible that what I said was not wholly accurate.

But an increase in chess theory implies an increase in chess strength. An increase in chess strength directly corresponds with an increase in engine match-up.

According to analysis already presented in this thread, Rubinstein, Reti and Co. were matching up at the same rate in 1922 in OTB that Rittner and other top correspondence players were matching up in the 1960's. Where is the evidence that supports the claim you are making? It seems to me from everything I have seen that there really is a ceiling that human players are not going to exceed as regards matching up to engines on a consistent basis.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08

Originally posted by Yuga
Thank you for your consideration of your interpretation of your two analyses and for clarifying a few details. The purpose of your study and your interpretation of the implications of your analysis were somewhat unclear to me. I now agree that the results of your two analyses do not have to be contradictory.

Your analysis indicated that humans and e ...[text shortened]... I think it is very possible for a human to play at such a high level in correspondence chess.

I agree with all of that except the continued notion that I am trying to set a threshold of some kind. I am actually just trying to show that it is possible to distinguish between engines and humans. To do that I do not need a threshold; all I need is to be able to show that the mean match-up rates produced by humans are significantly different from those produced by engines. It doesn't actually matter whether human match-up rates are lower or higher than those for engines, just that there be a significant difference. Of course, I expected the engine match-ups to be higher than the human match-ups. It is logical that engines should match engines at a higher rate than humans do, which is why I was surprised when my data came up with no statistical difference at all.
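
As a rough illustration of the kind of comparison I mean (a sketch, not the test I actually ran), here is a two-proportion z-test. It assumes every move is an independent trial, which is a simplification. The engine batches are shortened in the quotes in this thread, so the example uses the two human Top 1 figures from Gatecrasher's post (Rittner 203/337, Zagorovsky 201/378) as stand-ins. The function name is my own.

```python
import math


def two_proportion_z(m1, n1, m2, n2):
    """Two-sided z-test for a difference between two match-up proportions.

    m1/n1 and m2/n2 are matches over moves analysed for each sample.
    Returns (z, p_value) using the pooled standard error.
    """
    p1, p2 = m1 / n1, m2 / n2
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Stand-in samples: Rittner vs Zagorovsky, Top 1 match
z, p_value = two_proportion_z(203, 337, 201, 378)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A p-value below the usual 0.05 would count as a significant difference; the same call with a human batch and an engine batch is the comparison described above.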