This script will prompt you for a start and end year/month, then download every file in that range, extract each one, and filter each one with pgn-extract according to criteria that you can change.
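As a rough illustration of the first step, here is a small Python sketch of how a start and end year/month can be expanded into one download per month. It is not the script itself. The URL pattern and file naming are hypothetical placeholders, since the real download location is not given here.

```python
# Hypothetical URL pattern: replace it with the address the real script uses.
URL_TEMPLATE = "https://example.com/games-{year:04d}-{month:02d}.archive"


def month_range(start_year, start_month, end_year, end_month):
    """Return every (year, month) pair from start to end, inclusive."""
    if not (1 <= start_month <= 12 and 1 <= end_month <= 12):
        raise ValueError("months must be between 1 and 12")
    if (start_year, start_month) > (end_year, end_month):
        raise ValueError("start must not be after end")
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return months


def download_urls(start_year, start_month, end_year, end_month):
    """Build one (hypothetical) download URL per month in the range."""
    return [
        URL_TEMPLATE.format(year=y, month=m)
        for y, m in month_range(start_year, start_month, end_year, end_month)
    ]
```

Each URL in the list would then be downloaded, extracted, and handed to pgn-extract in turn.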
This is what the file pgn-extract-command.txt looks like:
Anything from -t to -w can be changed. For instance, if you want to change the minimum or maximum number of moves, or take one or both limits out, this is the place to do it. If you want to drop the requirement that a game end in checkmate, just take out the --checkmate option. The --evaluation option is what creates the evaluation for each move. The -w9999 option sets the maximum line length to 9999 characters, which is what puts each move list on one line. -D suppresses duplicate games, and -e adds ECO codes and opening names, which is what eco.pgn is for.
If you want to change the contents of tags.txt, you’d be changing the minimum Elo for White or Black, or the minimum time control, or taking any of those requirements out. The roster.txt file is a list of tag names, one per line, that defines the order the tags are written in. Specifying --xroster means that any tags *not* in the roster are removed, and the remaining ones are put in roster order.
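To make the role of each option easier to see, here is a Python sketch that assembles a pgn-extract argument list from the options described above. It is a reconstruction under assumptions, not the contents of pgn-extract-command.txt. The --minmoves, --maxmoves, -R and -o options are taken from pgn-extract's documented options rather than from this page, and the move limits and file names passed in are placeholders.

```python
def build_pgn_extract_args(input_pgn, output_pgn,
                           min_moves=None, max_moves=None,
                           require_checkmate=True):
    """Return an argument list for pgn-extract (assumed layout, see lead-in)."""
    args = ["pgn-extract.exe", "-ttags.txt"]   # -t: tag criteria file
    if min_moves is not None:
        args += ["--minmoves", str(min_moves)]
    if max_moves is not None:
        args += ["--maxmoves", str(max_moves)]
    if require_checkmate:
        args.append("--checkmate")             # keep only games ending in mate
    args.append("--evaluation")                # add an evaluation after each move
    args += [
        "-D",                # suppress duplicate games
        "-e",                # ECO classification, using eco.pgn by default
        "-Rroster.txt",      # tag ordering file
        "--xroster",         # drop tags that are not in the roster
        "-w9999",            # long line length, so each move list is one line
        "-o" + output_pgn,   # output file
        input_pgn,
    ]
    return args
```

Passing require_checkmate=False, or leaving min_moves and max_moves as None, corresponds to deleting those options from the command by hand.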
Because this script needs pgn-extract.exe to run, it’s necessary to keep the binary and all of its dependent files in the same directory as the script.
This is a C program written by ChatGPT o1, at my direction. To be clear, I know nothing about coding, especially in a low-level language. The form that the command takes is:
It should be run in the same directory as the executable, at least for the command to take the form above. If you wanted, you could have each path be a full path. If there are any spaces in the path, you should surround it with double quotes.
It should be noted that because this is written in C and optimized for speed, it runs bizarrely fast. It finishes a gigabyte in about two seconds. It should also be noted that for the same reasons, it’s just going to do it, and not check to make sure there’s enough room. So if you are filtering a very large PGN, make sure you have enough space and resources.
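Since the program does not check for free space itself, a quick check beforehand can save some trouble. The Python sketch below is not part of the C program. It assumes the worst case, where the filtered output is as large as the input, and the function name and safety_factor parameter are made up for illustration.

```python
import os
import shutil


def has_room_for_output(input_pgn, output_dir, safety_factor=1.0):
    """Return True if output_dir has at least as much free space as the input.

    Worst-case assumption: filtering removes nothing, so the output could be
    as large as the input. Raise safety_factor to leave extra headroom.
    """
    needed = os.path.getsize(input_pgn) * safety_factor
    free = shutil.disk_usage(output_dir).free
    return free >= needed
```

For example, has_room_for_output("big.pgn", ".") checks the current directory before a large PGN is filtered into it.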
If you simply double-click on the file, it should prompt you for a few things: the PGN file (or the path to a directory of PGNs) to convert, the time per move, and the color of the board. I think that’s all. The point is, all you need to bring is the path to your PGN(s), and it should do the rest. Here is a raw PGN that I’m using as a source in this case:
[Event "Casual game"]
[Site "London ENG"]
[Date "1851.06.21"]
[EventDate "?"]
[Round "?"]
[Result "1-0"]
[White "Adolf Anderssen"]
[Black "Lionel Adalbert Bagration Felix Kieseritzky"]
[ECO "C33"]
[WhiteElo "?"]
[BlackElo "?"]
[Source "La Régence, v3 n7, July 1851, pp221-222"]
[SourceNote "ends 19.Ke2"]
[Source2 "The Chess Player, vol.i no.1, 1851.07.19, p.2"]
[PlyCount "45"]

1.e4 e5 2.f4 exf4 3.Bc4 Qh4+ 4.Kf1 b5 5.Bxb5 Nf6 6.Nf3 Qh6
7.d3 Nh5 8.Nh4 Qg5 9.Nf5 c6 10.g4 Nf6 11.Rg1 cxb5 12.h4 Qg6
13.h5 Qg5 14.Qf3 Ng8 15.Bxf4 Qf6 16.Nc3 Bc5 17.Nd5 Qxb2 18.Bd6
Bxg1 {It is from this move that Black's defeat stems. Wilhelm
Steinitz suggested in 1879 that a better move would be
18... Qxa1+; likely moves to follow are 19. Ke2 Qb2 20. Kd2
Bxg1.} 19. e5 Qxa1+ 20. Ke2 Na6 21.Nxg7+ Kd8 22.Qf6+ Nxf6
23.Be7# 1-0
And here is a GIF produced with this program in green at 1s per move:
This was created from the combined CCRL that I have on this site. That means the engine ratings come from that list, though it was designed to adhere closely to the actual CCRL. One feature of that list is that all the extraneous information is removed from the engine names. If an engine is listed as, say, “Stockfish 15 avx2 4CPU” or something like that, it was first changed to “Stockfish 15” before the rating list was made. As a consequence, all the engine names here are just the plain engine name and the version number/name, so you’d have to make those changes first to whatever PGN you were adding ratings to.
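If you need to clean up engine names in your own PGN first, the idea can be sketched in Python like this. It is only an illustration, not how the list was actually made. The set of tokens treated as extraneous (CPU counts, bit widths, instruction-set suffixes) is an assumption and would need to be extended for real data.

```python
import re

# Assumed examples of "extraneous" tokens; this list is a guess, not exhaustive.
_EXTRA_TOKEN = re.compile(
    r"\d+cpu|\d+-?bit|x64|x86|avx2|avx512|bmi2|popcnt|sse\d*(?:\.\d)?",
    re.IGNORECASE,
)


def plain_engine_name(name):
    """Drop hardware/build tokens, keeping the engine name and version."""
    kept = [tok for tok in name.split() if not _EXTRA_TOKEN.fullmatch(tok)]
    return " ".join(kept)
```

With these patterns, "Stockfish 15 avx2 4CPU" becomes "Stockfish 15", and a name with nothing extra in it is returned unchanged.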
Just load this into Scid via Options — Resources…, or into Scid vs. PC via Options — Load Spellcheck File. It then works like a normal SSP ratings file.
These are just like the previous set of engine games, in that no player is rated below 2300, and no game has fewer than 10 moves or more than 150. Each game also has a high enough beauty score in ChessBase that it has three out of three medals. This represents about 2% of the original database. All games have opening tags, ply count, beauty scores, evaluations, and novelty annotations.
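The rating and length thresholds can be written down as a simple check. The Python sketch below only illustrates those stated limits. It is not the tool that was used to build the set, and it does not attempt the ChessBase beauty score. Deriving the move count from the PlyCount tag is an assumption made for the example.

```python
def is_strong_game(white_elo, black_elo, ply_count,
                   min_elo=2300, min_moves=10, max_moves=150):
    """Check the stated limits: both players 2300+, and 10 to 150 moves."""
    # Two plies make one full move; an unanswered final White move still counts.
    moves = (ply_count + 1) // 2
    return (white_elo >= min_elo
            and black_elo >= min_elo
            and min_moves <= moves <= max_moves)
```

For example, a 45-ply game between players rated 2500 and 2400 passes, while the same game with one player rated 2299 does not.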
“Artemis” games are games that are considered strong (2300+ and 10+ moves) and most beautiful (meaning three medals for beauty in ChessBase 17 or 18). These are the engine games that I ran for the FDRL (the Fixed Depth Rating List) and the CERL (the Chess Engine Rating List). Games have openings, evaluations, beauty scores, and novelty annotations.