In my endless quest to find the perfect time control (TC) for a rating list, there are a few considerations. One is speed. A rating list needs hundreds of engines, and it needs to keep up with the pace at which they’re released, so you want to go as fast as possible. Another is the number of games. You want the margin of error to be smaller than the Elo difference you’re measuring, and you want the same number of games for every tournament. That means you need enough games that no matter how closely matched two engines are, the margin of error will still be less than the difference between them. So you want as many games as possible. 200 isn’t enough, and 500 is barely sufficient. But I’m realizing these days that a third consideration is the willingness of others to accept the project, and I think this one is very important. I try to put myself in the place of someone who has found a rating list via Google and is deciding whether or not to trust it. What would they think if they saw 1+0.1? How far up do I have to go in this thought experiment before this theoretical user starts to trust the project? My guess is 60+1. I know someone who tests at 25+1, but I can see my theoretical user looking at that TC and being skeptical.
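
To put rough numbers on the margin-of-error point, here is a minimal sketch of the usual back-of-the-envelope estimate. It is not how any particular rating tool computes its error bars. The function names are made up for illustration, and the 50% draw ratio, the evenly matched engines (expected score 0.5), and the 95% confidence level (z = 1.96) are all assumptions you should swap for your own.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games: int, draw_ratio: float = 0.5,
               score: float = 0.5, z: float = 1.96) -> float:
    """Approximate +/- Elo margin of error for a match of `games` games.

    Assumed inputs (not from any real test run): `draw_ratio` is the
    fraction of drawn games, `score` is the expected score of the first
    engine, and `z` is the normal quantile for the confidence level.
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    # Variance of a single game's result (win = 1, draw = 0.5, loss = 0).
    variance = (win * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    std_err = math.sqrt(variance / games)
    upper = elo_from_score(score + z * std_err)
    lower = elo_from_score(score - z * std_err)
    return (upper - lower) / 2.0


for n in (200, 500, 1000):
    print(n, "games: about +/-", round(elo_margin(n), 1), "Elo")
```

The margin shrinks only with the square root of the number of games, which is why going from 200 to 500 games helps less than you might hope, and why two closely matched engines need so many games to separate.
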
But that’s only one extreme of the “willingness to accept”. The other, I think, is the prevalent view that a test should allow the engine enough time to do its best job. In this mindset, the short time control (STC) is not really important anyway. But the long time control (LTC) leaves us with tons of draws, which means that opening book makers will have less to work with, and it takes way more time. So if a short time control is 10+0.1 and a long time control is 600+5, neither is going to work. There wouldn’t be universal acceptance of a rating list built on such a short TC, and there wouldn’t be enough time to test enough engines at ten minutes plus five seconds with one thousand games per match-up.
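
For a feel of how quickly the time cost grows, here is a rough sketch that estimates a ceiling on wall-clock time for one match-up. The function name `match_hours` is hypothetical, and the 60 moves per side and the number of games run in parallel are assumptions, not measurements from my own testing. Real games usually finish well under this ceiling because engines rarely use their whole clock.

```python
def match_hours(base_s: float, inc_s: float, games: int,
                moves_per_side: int = 60, concurrency: int = 1) -> float:
    """Rough upper bound, in hours, on the time needed for one match-up.

    Assumes each side can spend at most its base time plus one increment
    per move, and that `concurrency` games run in parallel. Both
    `moves_per_side` and `concurrency` are illustrative guesses.
    """
    per_game_s = 2 * (base_s + inc_s * moves_per_side)
    return games * per_game_s / concurrency / 3600.0


# Compare the three time controls discussed above at 1000 games each.
for base, inc in ((10, 0.1), (60, 1), (600, 5)):
    hours = match_hours(base, inc, games=1000)
    print(f"{base}+{inc}: at most about {hours:.1f} hours")
```

Dividing by a higher `concurrency` brings the figure down, but the ratio between the time controls stays the same, which is the real problem with testing at 600+5.
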
This is why I’ve settled on 60+1. When you factor in that a test needs to be legitimate, as well as produce as many games as possible, you have to go for the fastest acceptable TC.
This does vastly slow down the production of the Chess Engine Rating List, but of course that’s fine. It’s just a personal project taking time away from housework. One match-up takes a day now, instead of half an hour.