Regular expressions aren't limited to programming or to text editors. They also turn up in office apps, search forms, and so on. But this is just a quick heads-up on how to run a regular expression search and replace in Notepad++, which is a Windows text editor.

Say you have a PGN file with only one game in it. We can experiment with this.

[Event "Patricia - 4ku"]
[Site "Chess Nerd"]
[Date "2024.12.02"]
[Round "1"]
[White "Patricia 3.1"]
[Black "4ku 5.1"]
[Result "1-0"]
[ECO "B22"]
[GameDuration "00:05:35"]
[GameEndTime "2024-12-02T10:59:02.042 Central Standard Time"]
[GameStartTime "2024-12-02T10:53:26.553 Central Standard Time"]
[Opening "Sicilian"]
[PlyCount "129"]
[TimeControl "120+1"]
[Variation "Alapin's variation (2.c3)"]

1. e4 {+0.54/21 5.1s} c5 {-0.31/22 4.6s} 2. c3 {+0.46/21 6.2s} d5 {-0.02/22 4.8s} 3. exd5 {+0.43/24 5.7s} Qxd5 {-0.14/23 4.3s} 4. d4 {+0.38/22 6.4s} Nc6 {+0.01/23 4.1s} 5. Nf3 {+0.44/22 3.6s} Nf6 {+0.04/23 4.2s} 6. Be2 {+0.44/23 4.5s} Bf5 {-0.08/24 7.8s} 7. c4 {+1.01/22 5.7s} Qd6 {0.00/27 5.3s} 8. d5 {+1.33/22 4.0s} Nb4 {0.00/27 3.6s} 9. O-O {+1.50/21 3.8s} Nc2 {+1.32/25 4.4s} 10. Nh4 {+1.55/18 3.4s} Bg6 {+0.97/27 6.2s} 11. Nc3 {+1.63/18 3.5s} Nxa1 {+0.49/26 8.3s} 12. Nb5 {+1.85/20 3.3s} Qb8 {+0.38/27 2.6s} 13. Nxg6 {+1.90/18 3.2s} hxg6 {+1.52/25 2.5s} 14. g3 {+1.89/20 4.0s} Qc8 {+0.76/24 4.5s} 15. Re1 {+2.23/20 4.1s} Qh3 {+2.07/26 12s} 16. Nd6+ {+2.15/19 3.8s} Kd7 {+3.24/23 2.0s} 17. Qa4+ {+1.88/22 4.7s} Kxd6 {+2.50/25 4.9s} 18. Bf4+ {+1.76/23 2.6s} e5 {+0.86/26 4.2s} 19. Bxe5+ {+1.71/23 4.0s} Ke7 {+0.88/26 1.7s} 20. Bf3 {+2.57/20 2.5s} Kd8 {+0.87/26 2.1s} 21. Bxf6+ {+2.51/22 6.0s} Kc7 {+0.72/25 1.7s} 22. Be7 {+3.21/22 2.8s} Bxe7 {0.00/24 3.0s} 23. Rxe7+ {+3.83/22 3.1s} Kd6 {0.00/25 1.9s} 24. Rxb7 {+4.09/22 4.8s} Qxh2+ {0.00/26 1.6s} 25. Kf1 {+4.21/19 1.9s} Qh3+ {0.00/27 1.8s} 26. Bg2 {+3.46/20 2.3s} Qf5 {0.00/29 1.4s} 27. Qc6+ {+3.46/22 2.5s} Ke5 {-1.19/26 3.3s} 28. Re7+ {+3.60/22 1.8s} Kd4 {-1.12/28 4.6s} 29. Re4+ {+3.32/23 4.1s} Qxe4 {-1.45/25 1.2s} 30. Bxe4 {+3.45/21 1.9s} Rac8 {-1.44/25 1.3s} 31. Qd7 {+3.90/20 1.7s} Kxe4 {-1.26/24 1.8s} 32. Ke2 {+3.83/21 1.6s} g5 {-2.81/24 2.8s} 33. Qxf7 {+3.93/20 2.1s} Rce8 {-2.82/24 4.7s} 34. d6 {+4.58/20 1.7s} Ke5 {-2.65/25 1.8s} 35. d7 {+4.77/20 1.5s} Rd8 {-3.31/22 2.2s} 36. Qe7+ {+5.16/20 1.7s} Kf5 {-3.42/22 1.1s} 37. b4 {+5.22/21 2.3s} cxb4 {-2.60/22 1.1s} 38. c5 {+5.44/20 1.6s} b3 {-2.53/22 1.1s} 39. Qf7+ {+5.52/20 1.7s} Ke4 {-5.86/22 2.0s} 40. axb3 {+5.52/21 2.3s} Nxb3 {-6.84/23 0.91s} 41. Qxb3 {+5.81/23 1.9s} Kd4 {-7.01/25 1.1s} 42. c6 {+6.28/20 1.4s} Ke5 {-7.01/27 1.2s} 43. Qd1 {+6.96/20 3.5s} Ke6 {-7.01/25 0.87s} 44. Ke3 {+7.06/19 1.2s} g4 {-8.10/22 0.92s} 45. Qd4 {+7.42/19 1.6s} a5 {-9.38/25 2.2s} 46. Kd3 {+7.36/21 1.4s} a4 {-9.34/25 1.9s} 47. Kc4 {+7.44/20 1.2s} a3 {-9.75/25 1.4s} 48. Qd5+ {+8.24/18 1.6s} Ke7 {-9.85/26 1.2s} 49. Qe5+ {+9.32/21 2.4s} Kf7 {-9.87/27 0.96s} 50. Qf5+ {+9.59/21 1.3s} Ke7 {-14.81/25 1.0s} 51. Kd5 {+9.75/21 1.6s} Rxd7+ {-16.62/28 1.9s} 52. Qxd7+ {+10.93/21 1.1s} Kf6 {-299.58/30 1.7s} 53. Qe6+ {+12.14/20 1.3s} Kg5 {-28.83/23 0.78s} 54. Qe7+ {+12.18/21 2.6s} Kg6 {-299.58/25 1.5s} 55. Qxa3 {+14.15/19 1.2s} Kf7 {-299.80/25 0.78s} 56. Qa4 {+15.13/20 1.3s} Rd8+ {-299.80/24 1.2s} 57. Kc5 {+22.44/21 1.6s} Ke6 {-299.68/26 1.7s} 58. Qxg4+ {+37.58/22 1.6s} Ke5 {-299.84/27 0.84s} 59. Qg5+ {+M15/26 1.1s} Ke4 {-299.86/26 0.73s} 60. c7 {+M11/25 1.4s} Rd6 {-299.90/26 1.0s} 61. Qe3+ {+M9/25 1.3s} Kf5 {-299.92/24 0.73s} 62. c8=Q+ {+M7/26 1.6s} Kg6 {-299.94/26 0.85s} 63. Qe4+ {+M5/26 1.2s} Kg5 {-299.96/27 1.1s} 64. Qf4+ {+M3/26 1.3s} Kh5 {-299.98/26 0.96s} 65. Qcg4# {+M1/26 0.93s, White mates} 1-0

Perhaps you would like to strip out the comments. As you can see, all PGN comments are surrounded by curly brackets. These appear nowhere else in the PGN.

To use a regular expression is to search for a pattern rather than a single literal search term. In this case, you would search on the following (note that it starts with a space):  \{.*?\}

Just look at part of a file. 88. Kg1 {-299.92/34 0.69s} Kg5 … The pattern takes out the space before the comment, then the comment and everything inside it. Replace each match with nothing, and the comments are gone. When I run that on the PGN included above, I get this result:

[Event "Patricia - 4ku"]
[Site "Chess Nerd"]
[Date "2024.12.02"]
[Round "1"]
[White "Patricia 3.1"]
[Black "4ku 5.1"]
[Result "1-0"]
[ECO "B22"]
[GameDuration "00:05:35"]
[GameEndTime "2024-12-02T10:59:02.042 Central Standard Time"]
[GameStartTime "2024-12-02T10:53:26.553 Central Standard Time"]
[Opening "Sicilian"]
[PlyCount "129"]
[TimeControl "120+1"]
[Variation "Alapin's variation (2.c3)"]

1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nc6 5. Nf3 Nf6 6. Be2 Bf5 7. c4 Qd6 8. d5 Nb4 9. O-O Nc2 10. Nh4 Bg6 11. Nc3 Nxa1 12. Nb5 Qb8 13. Nxg6 hxg6 14. g3 Qc8 15. Re1 Qh3 16. Nd6+ Kd7 17. Qa4+ Kxd6 18. Bf4+ e5 19. Bxe5+ Ke7 20. Bf3 Kd8 21. Bxf6+ Kc7 22. Be7 Bxe7 23. Rxe7+ Kd6 24. Rxb7 Qxh2+ 25. Kf1 Qh3+ 26. Bg2 Qf5 27. Qc6+ Ke5 28. Re7+ Kd4 29. Re4+ Qxe4 30. Bxe4 Rac8 31. Qd7 Kxe4 32. Ke2 g5 33. Qxf7 Rce8 34. d6 Ke5 35. d7 Rd8 36. Qe7+ Kf5 37. b4 cxb4 38. c5 b3 39. Qf7+ Ke4 40. axb3 Nxb3 41. Qxb3 Kd4 42. c6 Ke5 43. Qd1 Ke6 44. Ke3 g4 45. Qd4 a5 46. Kd3 a4 47. Kc4 a3 48. Qd5+ Ke7 49. Qe5+ Kf7 50. Qf5+ Ke7 51. Kd5 Rxd7+ 52. Qxd7+ Kf6 53. Qe6+ Kg5 54. Qe7+ Kg6 55. Qxa3 Kf7 56. Qa4 Rd8+ 57. Kc5 Ke6 58. Qxg4+ Ke5 59. Qg5+ Ke4 60. c7 Rd6 61. Qe3+ Kf5 62. c8=Q+ Kg6 63. Qe4+ Kg5 64. Qf4+ Kh5 65. Qcg4# 1-0

 \{.*?\}

The space is self-explanatory: it removes the space in front of each comment so you aren't left with doubled spaces. The backslash "escapes" the left curly bracket, i.e. it turns it into a literal search character rather than a regular-expression metacharacter. The next three go together: the dot means "any one character", the asterisk lets the dot repeat as many times as needed, and the question mark tells the asterisk not to get overambitious: it stops at the first closing bracket instead of gobbling everything up to the last one. The right curly bracket is then escaped (for the same reason the left one was), and that is really all there is to it.
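If you ever want to do the same thing outside Notepad++, the identical pattern works in Python's re module. Here's a minimal sketch, with placeholder file names; re.DOTALL is only there so the dot could also cross a newline if a comment ever wrapped onto the next line (the equivalent of Notepad++'s ". matches newline" checkbox).

import re

# Read the PGN, remove the space plus each {...} comment (non-greedy, so it
# stops at the first closing bracket), and write the cleaned text back out.
with open("game.pgn", encoding="utf-8") as f:
    text = f.read()

cleaned = re.sub(r" \{.*?\}", "", text, flags=re.DOTALL)

with open("game_clean.pgn", "w", encoding="utf-8") as f:
    f.write(cleaned)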

Say you want to change the Event value to Big Tournament. You would use the following search text: ^\[Event ".*?"\]$ to indicate the beginning of a line, the tag you're looking for, arbitrary content between the quotes that you aren't capturing (so no parentheses are needed), and then the rest of the tag and the end-of-line indicator.

^ means the beginning of a line and $ means the end of it, while the backslashes escape the square brackets. For the replacement, just type [Event "Big Tournament"] literally; square brackets have no special meaning in the Replace field, so they don't need escaping there.

You don’t need to specify beginning or end of line in replace text, and since it’s always the same value, you just type it in literally.
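The tag replacement translates to Python just as directly. Another minimal sketch with placeholder file names; the only wrinkle is that ^ and $ need the re.MULTILINE flag so they anchor at each line rather than at the start and end of the whole file.

import re

with open("game.pgn", encoding="utf-8") as f:
    pgn = f.read()

# ^ and $ anchor per line only with re.MULTILINE; the replacement is plain
# literal text, so nothing in it needs escaping.
pgn = re.sub(r'^\[Event ".*?"\]$', '[Event "Big Tournament"]',
             pgn, flags=re.MULTILINE)

with open("game_renamed.pgn", "w", encoding="utf-8") as f:
    f.write(pgn)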

These two examples should give you an idea of how to make this system work for you. Beyond that, it's mostly a matter of looking things up and asking the various robots how to write the scripts and command lines, things like that.

Still not totally convinced this is finished, since it took such a long time. I started with the 2025-01 FIDE XML players list and culled it with a ChatGPT-generated Python script so that only players rated 2000+ were included. Then I converted that to XLSX (amongst other formats) and copied out the spreadsheet column containing the FIDE IDs of those players. Then, with another Python script, I downloaded a JSON file for each player from the FIDE API, using a wrapper from a GitHub repository. These JSON files have all the information for those players: not just the general information, but the ratings and number of games for every month they have a rating for, so the files turned out to be pretty long. The next step was converting all of those into a new XML players list, which includes all that history as well as the general information. Even though the number of players is drastically reduced, to only a bit more than 19,000, the new XML players list is still about twice the size of the old one. I did make sure to streamline the elements so that, as much as possible, they resemble the elements in the regular players list.
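To give an idea of the culling step, here is a rough sketch of the sort of thing that script does (it is not the actual ChatGPT script): stream through the big XML list, keep the IDs of players rated 2000 or above, and write them out. The element names (player, fideid, rating) are assumptions based on the usual layout of the FIDE list, and the file names are placeholders.

import xml.etree.ElementTree as ET

# Stream-parse the full FIDE players list and collect the IDs of players
# rated 2000+. clear() frees each element as we go, since the list is huge.
ids = []
for _, elem in ET.iterparse("players_list.xml"):
    if elem.tag == "player":
        rating = elem.findtext("rating")
        if rating and rating.isdigit() and int(rating) >= 2000:
            ids.append(elem.findtext("fideid") or "")
        elem.clear()

with open("fide_ids_2000plus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(ids))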

https://www.mediafire.com/file/enoh1ljui47g305/fide-ratings-and-history-2000+-240115.zip/file

The size of the ZIP is about 40 MB, while the XML file inside is about 1.2 GB. I wasn't sure how it compressed so well, but XML is mostly the same tag names repeated over and over, so it squeezes down dramatically.

The combined players list is provided each month, but it isn't kept in the archives for previous months. While it would be somewhat useful to have an archive of it, there's no reliable way to rebuild one from scratch. However, all three rating lists (standard, rapid, and blitz) are provided as XML, and those can be combined with a script. So, using a Python script written by ChatGPT, I'm able to create combined XML files which don't contain the inactive players, but nonetheless could easily stand in for the players list. And these can be produced for every month for which there are XML files. Here is December 2024.
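The combining itself boils down to keying the players on their FIDE ID and copying each field the first time it appears. A rough sketch of that idea (again, not the actual script; file names and element names are assumptions, so adjust them to match the real downloads):

import xml.etree.ElementTree as ET

# Merge the standard, rapid, and blitz XML lists into one file keyed on fideid.
files = ["standard_rating_list.xml", "rapid_rating_list.xml", "blitz_rating_list.xml"]

players = {}  # fideid -> merged <player> element
for path in files:
    for player in ET.parse(path).getroot().iter("player"):
        fid = player.findtext("fideid")
        merged = players.setdefault(fid, ET.Element("player"))
        have = {child.tag for child in merged}
        for child in player:          # copy any field we don't already have
            if child.tag not in have:
                merged.append(child)

root = ET.Element("playerslist")
root.extend(players.values())
ET.ElementTree(root).write("combined_players_list.xml", encoding="utf-8")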

https://www.mediafire.com/folder/86n30ijqd3cos/2024-12

I've made many attempts now to create the perfect Python script for Stockfish commits (via ChatGPT, of course), and finally realized that the problem was that almost everything necessary is already at Abrok. Official releases are over at the official site anyway, but builds for the development commits are all at Abrok, going back to 2018 or so. Here is a list of those commits, with direct links to the executables.

https://www.mediafire.com/file/v3y5kjnvxuj5528/stockfish-data-250113.txt/file

The FIDE player list is a combination of the three ratings lists (standard, rapid, and blitz) but is very big and includes lots of players who are registered but have no rating. After running a Python script to keep only the players rated 2000 or above, I can provide a much smaller list that might also be more useful. In the MediaFire directory for 2025-01, you can find it in XML, CSV, JSON, XLSX, ODS, and TXT.
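Once the list has been cut down, converting it to the other formats is easy. Here is a sketch of the CSV case: one row per player, one column per child element, with placeholder file names.

import csv
import xml.etree.ElementTree as ET

# Flatten the (already filtered) players list into a CSV.
root = ET.parse("players_2000plus.xml").getroot()
players = list(root.iter("player"))
fields = sorted({child.tag for p in players for child in p})

with open("players_2000plus.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for p in players:
        writer.writerow({child.tag: (child.text or "") for child in p})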

https://www.mediafire.com/folder/kpyvsijjhsiiw/2025-01

CSVs made from the full tables downloadable from:
https://training.lczero.org/matches/?show_all=1
https://training.lczero.org/networks/?show_all=1

These are tables of data related to the Leela Chess Zero self-test matches and neural networks. Aside from all the relevant data in the matches table, you can also turn the first value in each record (the ID field) into a URL for downloading the PGN of that match, if you put it in the following form:

https://storage.lczero.org/files/match_pgns/1/ID-NUM.pgn

It should be noted that while most of the links come from run 1, some don't, so the /1/ before the ID-NUM is variable data as well; the run number is listed in the table along with the ID.
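Building the download URLs is then just a matter of pairing each ID with its run number. A quick sketch, assuming the CSV has columns named something like "id" and "run" (check the header row of your copy and rename accordingly):

import csv

# Print a direct PGN URL for every match in the CSV. The column names are
# guesses; adjust them to match the actual header.
with open("matches.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        run = row.get("run") or "1"   # fall back to run 1 if the column is missing
        print(f"https://storage.lczero.org/files/match_pgns/{run}/{row['id']}.pgn")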

The networks data, similarly relevant, cannot so easily be turned into download URLs. The links are present in the HTML table, but converting it to CSV strips them out, and so far I don't have a way around that. So instead I have the networks page itself, which is available at: https://storage.lczero.org/files/networks/

https://www.mediafire.com/file/ki2bt9vgbq55298/networks_storage_output.csv/file

The above link is to a CSV I just made of that networks page. It's a static snapshot, so it's of limited use. In the next iteration of the script, I'll try to put all this functionality together. Or, rather, get ChatGPT to do so, as I couldn't code my way out of a sack of pythons.

NOTE: those two training.lczero.org links should be saved directly to the hard drive rather than opened. Attempting to view either one will most likely crash your browser tab, as they're both far too large to display. Hence downloading them directly via the browser and then converting them to CSV, which is what I did, and what these files are:

https://www.mediafire.com/file/fmjc441b7y9dvht/lc0_data_241220.zip/file

And here is the Python script that ChatGPT wrote to generate the CSVs. Using PyInstaller (and with ChatGPT's help, of course), I managed to get it into EXE form. All you have to do is double-click it, and it will automagically download both HTML files for you, then convert each to CSV, putting everything into whatever folder the EXE happens to be in:

https://www.mediafire.com/file/8abgkivf5tl5fl9/lc0_tables_generator.zip/file
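For anyone who'd rather roll their own than use the EXE, the same idea can be sketched in a few lines with pandas, which can fetch a page and parse the HTML tables in it. This is only an outline of the approach, not the packaged script, and the pages really are enormous, so expect it to take a while and a fair amount of memory.

import pandas as pd

# Download each show_all page and save its first table (assumed to be the
# data table) as CSV in the current folder.
pages = {
    "lc0_matches.csv":  "https://training.lczero.org/matches/?show_all=1",
    "lc0_networks.csv": "https://training.lczero.org/networks/?show_all=1",
}

for out_name, url in pages.items():
    table = pd.read_html(url)[0]
    table.to_csv(out_name, index=False)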