Only Chess
06 Mar 08
Originally posted by Gatecrasher
Correct. Game selection is critical if you are using statistical inference from a batch of games. Any selection criteria imposed have to be unbiased with regard to the probability of matching an engine. You can't start with a large group, weed out all those games with low match-ups, and apply a statistical test to the balance. That would totally invalidate the results.

But at the same time, it will depend what hypothesis is being tested.
If the allegation, for example, is that a player cheated in a particular tournament, then it makes no sense to test the player's other games as well. Indeed, it would be methodologically wrong from a statistical point of view.
If someone stood accused of shoplifting at a particular shop at a particular time, it would be pretty pointless going over CCTV footage of other shops at other times as well.
To do so would be to test a different hypothesis from the one alleged - i.e., that the player always cheats / always shoplifts at all shops. And a lack of statistical significance would therefore only disprove this wider hypothesis (which was never alleged), not the more specific allegation.
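To make the narrow hypothesis concrete, here's a rough Python sketch of the sort of one-sided test being implied. Every number in it is invented for illustration, and the whole thing stands or falls on the assumed honest baseline rate:

    # One-sided binomial test of the NARROW hypothesis: "this player's
    # engine match rate in THIS tournament exceeds a plausible human
    # baseline" - it says nothing about his other games.
    from math import comb

    def binomial_tail(k, n, p):
        """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    baseline = 0.55   # assumed match rate for an honest player of this strength
    n_moves  = 400    # non-book moves in the tournament under suspicion
    matches  = 280    # how many matched the engine's first choice

    p_value = binomial_tail(matches, n_moves, baseline)
    print(f"match rate {matches / n_moves:.0%}, p-value {p_value:.2e}")
    # A tiny p-value says: IF the baseline is right, a rate this high
    # is wildly unlikely by chance. Everything hinges on that baseline.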
Originally posted by murrow
But at the same time, it will depend what hypothesis is being tested.
If the allegation, for example, is that a player cheated in a particular tournament, then it makes no sense to test the player's other games as well....

If you go back to my original post, you'll see that was addressed. It was even suggested that in selecting games from said tournament, say 50 total, you would probably even shave that down to 20 of the 50 games on the basis that he was playing opponents of insufficient skill to justify engine use. But to further reduce the set would, in this case, tend towards a personal bias rather than a logical one. In other words, I agree with what you are getting at. Even more, what you're getting at is the MOST IMPORTANT factor in selecting games. You just have to be careful not to cross the line from "player cheated from game A through game B" to "player cheated in game A and B", as this significantly skews the statistics.
Originally posted by Sydrian
If you go back to my original post, you'll see that was addressed. It was even suggested that in selecting games from said tournament, say 50 total, you would probably even shave that down to 20 of the 50 games on the basis that he was playing opponents of insufficient skill to justify engine use. But to further reduce the set would, in this case, tend towar ...[text shortened]... ru game B" to "player cheated in game A and B", as this significantly skews the statistics.

I don't see any problem with selecting specific games, provided this is done BEFORE the games are analysed.
Obviously it would be wrong to analyse a bunch of games and then only select the ones that match to include in the statistical test.
Although in a way I guess this is kind of what is done by the players alleging the abuse in the first place, before the games are analysed by the mod-squad.
In summary, then: 😕🙄 tricky!
Originally posted by Dragon Fire
One thing driving me nuts here. If cludi was cheating and that certainly has not been proven then I see a major flaw in the whole system.
As a game mod cludi would know how games were chosen, he would also know the criteria that needed to be looked at to determine if a player was cheating (i.e. he would know what constitutes "engine moves" ). Knowing ...[text shortened]... assume a mistake has been made here and hope cludi will return as he had so much to offer.

Since this post is pure speculation, let me add some speculation. The fact is Cludi wasn't "caught" by another Game Mod but by another player (admittedly an ex-Game Mod). As a Game Mod, he must have thought himself above suspicion. That would make him not a fool, but a bit of a gambler. He might have thought his risk of detection was minimal.
The "witch hunt" language is tiresome and predictable. Every single time a popular player (and many times when a not so popular one) has been IDed as a cheat, the same red flags have been thrown up. As I said before, there were no witches (at least ones that did actual harm), but there are engine cheats at RHP and they do harm to the site every day.
There is little basis for your claim that "this site has less cheaters than many others and far more is done than most sites." Even if it were true, we shouldn't accept that because other sites do nothing, we should be satisfied with very little at RHP.
I don't think Cludi is owed ANYTHING. A player came forth with evidence based on engine analysis that, in this player's expert opinion, indicated engine use. He submitted it to the Game Mods. The Site Admins then shut down the Game Mods before a determination as to guilt could be made. Given these facts, the player had only two choices: do nothing, with the probable result that what he considered an engine cheat would still be able to cheat, or make a stink. I consider the latter more honorable and better for the site. The evidence can be considered by the new Game Mod team.
If anybody is owed an apology, it's Dave Tebb, for the ridiculous assertion that he would make such a serious accusation because he lost. And I don't think we should assume anything; I think the new Game Mod team should look at the evidence with an open mind and judge the merits of the case based on the totality of that evidence.
Originally posted by murrow
I don't see any problem with selecting specific games, provided this is done BEFORE the games are analysed...

Because that is not how statistics work, for all practical purposes. With statistics, any time you use data you suspect of leaning one way, and ONLY data that leans one way, you have a high likelihood of skewing the result. Using statistics, if you took only the games you found suspicious of the accused, it is highly likely that the result is going to lean towards your suspicions, regardless of whether your suspicions are true or not.
Although, as a time-saving device, it would be a valid test to see if a more serious investigation was needed. With 5 games that stand out, if they don't come close to justifying your claim, you would have no reason to look into the rest of the games.
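To see the effect in numbers, here's a rough Python simulation. Every figure in it is made up for illustration: even a perfectly honest player whose moves happen to match an engine 55% of the time will look much more suspicious if you keep only his most engine-like games:

    # Cherry-picking demo: an honest player with a true 55% engine match
    # rate, over 50 games of 40 scored moves each. Keep only the 10 games
    # with the highest match rate and see what the "evidence" becomes.
    import random

    random.seed(1)
    TRUE_RATE, GAMES, MOVES = 0.55, 50, 40

    rates = [sum(random.random() < TRUE_RATE for _ in range(MOVES)) / MOVES
             for _ in range(GAMES)]

    overall = sum(rates) / GAMES
    cherry  = sum(sorted(rates, reverse=True)[:10]) / 10

    print(f"all 50 games: {overall:.0%}")   # hovers around 55%
    print(f"top 10 only:  {cherry:.0%}")    # noticeably inflated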
Originally posted by Sydrian
Because that is not how statistics work, for all practical purposes. With statistics, any time you use data you suspect of leaning one way, and ONLY data that leans one way, you have a high likelihood of skewing the result. Using statistics, if you took only the games you found suspicious of the accused, it is highly likely that the result is going to lean towards your sus ...[text shortened]... ome close to justifying your claim you would have no reason to look into the rest of the games.

We'll have to agree to disagree about 'how statistics work'.
According to your logic there would be no possible way of proving that someone cheated (or, by extension, does anything) on specific occasions, rather than generally.
I think it would still be overwhelming evidence if 20 specific games were shown to match at over, say, 85%...
Originally posted by Sydrian
Because that is not how statistics work, for all practical purposes. With statistics, any time you use data you suspect of leaning one way, and ONLY data that leans one way, you have a high likelihood of skewing the result. Using statistics, if you took only the games you found suspicious of the accused, it is highly likely that the result is going to lean towards your sus ...[text shortened]... ome close to justifying your claim you would have no reason to look into the rest of the games.

The thing is that here the game mods are not trying to estimate exactly what percentage of moves a "suspect" uses an engine for, but whether he uses one at all.
In that sense, any such skewing is only positive.
If he doesn't use an engine then the data will not be skewed.
If he uses an engine, the data will be skewed in the direction of easier identification.
It's actually better to do what you're criticizing.
Originally posted by murrow
According to your logic there would be no possible way of proving that someone cheated on specific occasions, rather than generally.

Basically, yes, that is what I believe. You cannot prove anything specifically, in a statistical sense. Statistics, by nature, are a generality. And, yes, 20 games with a high match-up rate would be fairly damning, but a study of, say, all 50 games in which they took place would be required for a fair evaluation. The real point of contention at that point is: at what probability is it conclusive?
Another rather important point is the strength of play in the games/moves with a high match-up versus the ones with a normal match rate. If the match-up % is exceedingly high in, say, 1 out of 5 games, but the quality of play from the accused is roughly the same in all games, it is quite reasonable to consider the possibility that the 1 in 5 games is an anomaly.
Considering the strength of play in a game by the player could be used to FURTHER narrow down the data set. This would probably be the BEST determining factor of what specific games to analyze. It SHOULD catch the games brought into question, and a reasonable amount of other games to test against. Of course this is a VERY DIFFICULT filter to apply to a data set, as only very strong players could tell a noticeable difference in the strength of play. I will assume there were players that could make this distinction in the old mod group, and would say it was a very important point in the discussion of what games to evaluate.
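One way to put the "high match-up game vs. his normal games" comparison on a footing is a two-proportion test. This is only a rough Python sketch, and every count in it is invented:

    # Compare one suspicious game's match rate against the player's own
    # other games with a two-proportion z-test (one-sided, upper tail).
    from math import sqrt, erf

    def two_proportion_z(k1, n1, k2, n2):
        """z-score and one-sided p-value for rate1 > rate2."""
        p1, p2 = k1 / n1, k2 / n2
        pooled = (k1 + k2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        return z, 0.5 * (1 - erf(z / sqrt(2)))

    # Invented counts: suspicious game 38/42 moves matched;
    # his other 19 games combined: 440/800.
    z, p = two_proportion_z(38, 42, 440, 800)
    print(f"z = {z:.2f}, one-sided p = {p:.6f}")
    # A large z says this game is out of line with HIS OWN other games,
    # which sidesteps arguing over what an "average human" rate should be.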
Originally posted by Sydrian
Basically, yes, that is what I believe. You cannot prove anything specifically, in a statistical sense. Statistics, by nature, are a generality. And, yes, 20 games with a high match-up rate would be fairly damning, but a study of, say, all 50 games in which they took place would be required for a fair evaluation. The real point of contention at that point is ...[text shortened]... group, and would say it was a very important point in the discussion of what games to evaluate.

Any such assessment is bound to be purely subjective, as a (very) strong player could play badly in some games for a whole variety of reasons, e.g. he is tired, drunk, in a hurry, over-confident, messing around on an "easy" win, etc., whereas against stronger opponents or in important games he may take more time and be much more careful and therefore play better.
Need I go on? There are simply dozens of reasons why a person may play differently in some games than in others, and to select only his best performances would skew the results to prove a predetermined point. At the very least it would be necessary to review a sizeable percentage of games against similar-strength opposition, and if a high engine match-up is then achieved, to review those games for engine moves. By itself a high engine match-up is indicative (as are many other things) but not conclusive proof in itself.
Originally posted by Dragon Fire
Any such assessment is bound to be purely subjective, as a (very) strong player could play badly in some games for a whole variety of reasons, e.g. he is tired, drunk, in a hurry, over-confident, messing around on an "easy" win, etc., whereas against stronger opponents or in important games he may take more time and be much more careful and therefore play ...[text shortened]... h engine match-up is indicative (as are many other things) but not conclusive proof in itself.

Cludi played 48 games in the 2007 Championship, which probably equates to well over 1,000 non-opening-book moves. Are you claiming that this sample is insufficient?
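For a sense of scale (a rough sketch, with an invented 55% honest baseline), 1,000 moves pins a match rate down quite tightly:

    # How much can a match rate wobble by chance over ~1,000 moves?
    # 95% normal-approximation interval around an assumed honest rate.
    from math import sqrt

    p, n = 0.55, 1000   # invented baseline and sample size
    half_width = 1.96 * sqrt(p * (1 - p) / n)
    print(f"honest range: {p - half_width:.1%} to {p + half_width:.1%}")
    # Roughly 52% to 58% - a sustained rate far above that over 1,000
    # moves is well outside what sampling noise alone can explain.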
Originally posted by Dragon Fire
There are simply dozens of reasons why a person may play differently in some games than in others....By itself a high engine match-up is indicative (as are many other things) but not conclusive proof in itself.

A very good point, Dragon... Now that I've taken the time to discuss this and think about it, I realize how difficult and unappreciated a job the game mods truly had/will have. I also think it's virtually impossible to say that someone was without a doubt using an engine in a very specific set of games based on the analysis of those games alone.
Originally posted by no1marauder
Cludi played 48 games in the 2007 Championship which probably equates to well over a 1000 non-opening book moves. Are you claiming that this sample is insufficient?
Marauder, the point here isn't so much about the size of the sample as it is the criteria for which games to analyze. In other words, do all 48 games need to be looked at, or can certain games be omitted? If games can be omitted, what reasons are valid?
Originally posted by Sydrian
A very good point, Dragon... Now that I've taken the time to discuss this and think about it, I realize how difficult and unappreciated a job the game mods truly had/will have. I also think it's virtually impossible to say that someone was without a doubt using an engine in a very specific set of games based on the analysis of those games alone.
Marauder, the p ...[text shortened]... e looked at, or can certain games be omitted? If games can be omitted, what reasons are valid?

If there is sufficient time, I see no reason why all 48 can't be analyzed. Or one could use consistent criteria to exclude games. When running checks on players here, I usually omit games between players with large rating differences or short, tactical games, both of which tend to have higher match-ups than normal. Such omissions are favorable to the "suspect".
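Exclusion rules like these are easy to state up front so they can't bend toward a verdict. A rough Python sketch; the game fields and cutoff values are all invented:

    # Consistent, pre-declared exclusion criteria, applied before any
    # engine analysis is run. Fields and thresholds are made up here.
    games = [
        {"id": 1, "rating_gap": 420, "moves": 61},  # big mismatch
        {"id": 2, "rating_gap": 35,  "moves": 48},
        {"id": 3, "rating_gap": 90,  "moves": 19},  # short, tactical
        {"id": 4, "rating_gap": 60,  "moves": 55},
    ]

    MAX_GAP, MIN_MOVES = 300, 25   # made-up cutoffs for illustration

    def eligible(game):
        """Drop mismatches and short games - both inflate match-ups."""
        return game["rating_gap"] <= MAX_GAP and game["moves"] >= MIN_MOVES

    print([g["id"] for g in games if eligible(g)])   # -> [2, 4]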
Originally posted by Sydrian
A very good point, Dragon... Now that I've taken the time to discuss this and think about it, I realize how difficult and unappreciated a job the game mods truly had/will have. I also think it's virtually impossible to say that someone was without a doubt using an engine in a very specific set of games based on the analysis of those games alone.
Marauder, the p ...[text shortened]... e looked at, or can certain games be omitted? If games can be omitted, what reasons are valid?

Sydrian, nothing personal here, but you are out of yer element. Lower rated players make bad moves and higher rated players make better moves, GMs better moves still, but chess computers make very strange moves that human players do not make. That is where the "science" of statistics comes into play.
Originally posted by no1marauder
If there is sufficient time, I see no reason why all 48 can't be analyzed. Or one could use consistent criteria to exclude games. When running checks on players here, I usually omit games between players with large rating differences or short, tactical games, both of which tend to have higher match-ups than normal. Such omissions are favorable to the "suspect".

Aha! A piece of the puzzle, thanks.
Originally posted by eldragonfly
...chess computers make very strange moves that human players do not make...

True, computers can make some strange moves, but I would have to assume that a strong player trying to cheat, especially one who knows how games are reviewed, would avoid playing those types of moves in his games. You're also right, I am out of my element. I wouldn't have the slightest clue between a strong human move and a strange computer one.
edit:
I'd also like to say: everything here is basically "in theory" and rather meaningless, and most importantly it's a discussion I found rather interesting. Nobody should take it all that seriously, nor am I making suggestions. Things were just bouncing around in my head.