Simple RegExp Tutorial

Regular expressions are not limited to programming or to text editors. They also turn up in office apps, search forms, etc. But this is just a quick heads-up on how to run a regular expression search and replace in Notepad++, a text editor for Windows.

Say you have a file containing a single PGN game. We can experiment with this.

[Event "Patricia - 4ku"]
[Site "Chess Nerd"]
[Date "2024.12.02"]
[Round "1"]
[White "Patricia 3.1"]
[Black "4ku 5.1"]
[Result "1-0"]
[ECO "B22"]
[GameDuration "00:05:35"]
[GameEndTime "2024-12-02T10:59:02.042 Central Standard Time"]
[GameStartTime "2024-12-02T10:53:26.553 Central Standard Time"]
[Opening "Sicilian"]
[PlyCount "129"]
[TimeControl "120+1"]
[Variation "Alapin's variation (2.c3)"]

1. e4 {+0.54/21 5.1s} c5 {-0.31/22 4.6s} 2. c3 {+0.46/21 6.2s} d5 {-0.02/22 4.8s} 3. exd5 {+0.43/24 5.7s} Qxd5 {-0.14/23 4.3s} 4. d4 {+0.38/22 6.4s} Nc6 {+0.01/23 4.1s} 5. Nf3 {+0.44/22 3.6s} Nf6 {+0.04/23 4.2s} 6. Be2 {+0.44/23 4.5s} Bf5 {-0.08/24 7.8s} 7. c4 {+1.01/22 5.7s} Qd6 {0.00/27 5.3s} 8. d5 {+1.33/22 4.0s} Nb4 {0.00/27 3.6s} 9. O-O {+1.50/21 3.8s} Nc2 {+1.32/25 4.4s} 10. Nh4 {+1.55/18 3.4s} Bg6 {+0.97/27 6.2s} 11. Nc3 {+1.63/18 3.5s} Nxa1 {+0.49/26 8.3s} 12. Nb5 {+1.85/20 3.3s} Qb8 {+0.38/27 2.6s} 13. Nxg6 {+1.90/18 3.2s} hxg6 {+1.52/25 2.5s} 14. g3 {+1.89/20 4.0s} Qc8 {+0.76/24 4.5s} 15. Re1 {+2.23/20 4.1s} Qh3 {+2.07/26 12s} 16. Nd6+ {+2.15/19 3.8s} Kd7 {+3.24/23 2.0s} 17. Qa4+ {+1.88/22 4.7s} Kxd6 {+2.50/25 4.9s} 18. Bf4+ {+1.76/23 2.6s} e5 {+0.86/26 4.2s} 19. Bxe5+ {+1.71/23 4.0s} Ke7 {+0.88/26 1.7s} 20. Bf3 {+2.57/20 2.5s} Kd8 {+0.87/26 2.1s} 21. Bxf6+ {+2.51/22 6.0s} Kc7 {+0.72/25 1.7s} 22. Be7 {+3.21/22 2.8s} Bxe7 {0.00/24 3.0s} 23. Rxe7+ {+3.83/22 3.1s} Kd6 {0.00/25 1.9s} 24. Rxb7 {+4.09/22 4.8s} Qxh2+ {0.00/26 1.6s} 25. Kf1 {+4.21/19 1.9s} Qh3+ {0.00/27 1.8s} 26. Bg2 {+3.46/20 2.3s} Qf5 {0.00/29 1.4s} 27. Qc6+ {+3.46/22 2.5s} Ke5 {-1.19/26 3.3s} 28. Re7+ {+3.60/22 1.8s} Kd4 {-1.12/28 4.6s} 29. Re4+ {+3.32/23 4.1s} Qxe4 {-1.45/25 1.2s} 30. Bxe4 {+3.45/21 1.9s} Rac8 {-1.44/25 1.3s} 31. Qd7 {+3.90/20 1.7s} Kxe4 {-1.26/24 1.8s} 32. Ke2 {+3.83/21 1.6s} g5 {-2.81/24 2.8s} 33. Qxf7 {+3.93/20 2.1s} Rce8 {-2.82/24 4.7s} 34. d6 {+4.58/20 1.7s} Ke5 {-2.65/25 1.8s} 35. d7 {+4.77/20 1.5s} Rd8 {-3.31/22 2.2s} 36. Qe7+ {+5.16/20 1.7s} Kf5 {-3.42/22 1.1s} 37. b4 {+5.22/21 2.3s} cxb4 {-2.60/22 1.1s} 38. c5 {+5.44/20 1.6s} b3 {-2.53/22 1.1s} 39. Qf7+ {+5.52/20 1.7s} Ke4 {-5.86/22 2.0s} 40. axb3 {+5.52/21 2.3s} Nxb3 {-6.84/23 0.91s} 41. Qxb3 {+5.81/23 1.9s} Kd4 {-7.01/25 1.1s} 42. c6 {+6.28/20 1.4s} Ke5 {-7.01/27 1.2s} 43. Qd1 {+6.96/20 3.5s} Ke6 {-7.01/25 0.87s} 44. Ke3 {+7.06/19 1.2s} g4 {-8.10/22 0.92s} 45. Qd4 {+7.42/19 1.6s} a5 {-9.38/25 2.2s} 46. 
Kd3 {+7.36/21 1.4s} a4 {-9.34/25 1.9s} 47. Kc4 {+7.44/20 1.2s} a3 {-9.75/25 1.4s} 48. Qd5+ {+8.24/18 1.6s} Ke7 {-9.85/26 1.2s} 49. Qe5+ {+9.32/21 2.4s} Kf7 {-9.87/27 0.96s} 50. Qf5+ {+9.59/21 1.3s} Ke7 {-14.81/25 1.0s} 51. Kd5 {+9.75/21 1.6s} Rxd7+ {-16.62/28 1.9s} 52. Qxd7+ {+10.93/21 1.1s} Kf6 {-299.58/30 1.7s} 53. Qe6+ {+12.14/20 1.3s} Kg5 {-28.83/23 0.78s} 54. Qe7+ {+12.18/21 2.6s} Kg6 {-299.58/25 1.5s} 55. Qxa3 {+14.15/19 1.2s} Kf7 {-299.80/25 0.78s} 56. Qa4 {+15.13/20 1.3s} Rd8+ {-299.80/24 1.2s} 57. Kc5 {+22.44/21 1.6s} Ke6 {-299.68/26 1.7s} 58. Qxg4+ {+37.58/22 1.6s} Ke5 {-299.84/27 0.84s} 59. Qg5+ {+M15/26 1.1s} Ke4 {-299.86/26 0.73s} 60. c7 {+M11/25 1.4s} Rd6 {-299.90/26 1.0s} 61. Qe3+ {+M9/25 1.3s} Kf5 {-299.92/24 0.73s} 62. c8=Q+ {+M7/26 1.6s} Kg6 {-299.94/26 0.85s} 63. Qe4+ {+M5/26 1.2s} Kg5 {-299.96/27 1.1s} 64. Qf4+ {+M3/26 1.3s} Kh5 {-299.98/26 0.96s} 65. Qcg4# {+M1/26 0.93s, White mates} 1-0

Perhaps you would like to strip out the comments. As you can see, all PGN comments are surrounded by curly brackets. These appear nowhere else in the PGN.

A regular expression lets you search for a pattern rather than one literal search term. In this case, with Search Mode set to Regular expression in the Replace dialog, you would search for  \{.*?\} (note the leading space) and replace it with nothing.

Look at just a fragment of such a file: 88. Kg1 {-299.92/34 0.69s} Kg5 … The pattern takes out the space before the comment, then the comment and everything inside it. When I run that on the PGN included above, I get this result:

[Event "Patricia - 4ku"]
[Site "Chess Nerd"]
[Date "2024.12.02"]
[Round "1"]
[White "Patricia 3.1"]
[Black "4ku 5.1"]
[Result "1-0"]
[ECO "B22"]
[GameDuration "00:05:35"]
[GameEndTime "2024-12-02T10:59:02.042 Central Standard Time"]
[GameStartTime "2024-12-02T10:53:26.553 Central Standard Time"]
[Opening "Sicilian"]
[PlyCount "129"]
[TimeControl "120+1"]
[Variation "Alapin's variation (2.c3)"]

1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nc6 5. Nf3 Nf6 6. Be2 Bf5 7. c4 Qd6 8. d5 Nb4 9. O-O Nc2 10. Nh4 Bg6 11. Nc3 Nxa1 12. Nb5 Qb8 13. Nxg6 hxg6 14. g3 Qc8 15. Re1 Qh3 16. Nd6+ Kd7 17. Qa4+ Kxd6 18. Bf4+ e5 19. Bxe5+ Ke7 20. Bf3 Kd8 21. Bxf6+ Kc7 22. Be7 Bxe7 23. Rxe7+ Kd6 24. Rxb7 Qxh2+ 25. Kf1 Qh3+ 26. Bg2 Qf5 27. Qc6+ Ke5 28. Re7+ Kd4 29. Re4+ Qxe4 30. Bxe4 Rac8 31. Qd7 Kxe4 32. Ke2 g5 33. Qxf7 Rce8 34. d6 Ke5 35. d7 Rd8 36. Qe7+ Kf5 37. b4 cxb4 38. c5 b3 39. Qf7+ Ke4 40. axb3 Nxb3 41. Qxb3 Kd4 42. c6 Ke5 43. Qd1 Ke6 44. Ke3 g4 45. Qd4 a5 46. Kd3 a4 47. Kc4 a3 48. Qd5+ Ke7 49. Qe5+ Kf7 50. Qf5+ Ke7 51. Kd5 Rxd7+ 52. Qxd7+ Kf6 53. Qe6+ Kg5 54. Qe7+ Kg6 55. Qxa3 Kf7 56. Qa4 Rd8+ 57. Kc5 Ke6 58. Qxg4+ Ke5 59. Qg5+ Ke4 60. c7 Rd6 61. Qe3+ Kf5 62. c8=Q+ Kg6 63. Qe4+ Kg5 64. Qf4+ Kh5 65. Qcg4# 1-0

 \{.*?\}

The leading space matches the space in front of each comment. The backslash “escapes” the left curly bracket, i.e. it keeps the bracket from being read as a special regexp character, so that it is searched for literally. The next three characters are often seen together: a dot to say “any one character”, an asterisk to modify the dot to match as many characters as needed, and a question mark to say, don’t get overambitious in your searching: stop at the first right curly bracket instead of running on to the last one on the line. The right curly bracket is then escaped (for the same reason the left one was) and that is really all it is.
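To see what the question mark buys you, here is a minimal sketch in Python rather than Notepad++. It assumes Python's built-in re module treats this particular pattern the same way Notepad++ does, and the movetext variable is just a short sample cut from the first two moves of the game above.

```python
import re

# Sample input: the first two moves of the game above, comments included.
movetext = "1. e4 {+0.54/21 5.1s} c5 {-0.31/22 4.6s} 2. c3 {+0.46/21 6.2s} d5 {-0.02/22 4.8s}"

# Lazy version: a space, a literal {, as few characters as possible, a literal }.
lazy = re.compile(r" \{.*?\}")

# Greedy version, for comparison only: the same pattern without the question mark.
greedy = re.compile(r" \{.*\}")

stripped = lazy.sub("", movetext)     # each comment is removed on its own
overreach = greedy.sub("", movetext)  # one match runs from the first { to the last }

print(stripped)   # 1. e4 c5 2. c3 d5
print(overreach)  # 1. e4
```

Without the question mark, the single match swallows every move between the first comment and the last one, which is exactly the overambitious behavior you want to avoid.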

Say you want to change the Event value to Big Tournament. You would use the following search text: ^\[Event ".*?"\]$ That indicates the beginning of a line, the tag you’re looking for, arbitrary content that you don’t need to capture with parentheses (you’re throwing it away anyway), and then the rest of the tag and the end-of-line indicator.

^ means the beginning of a line. $ means the end of it. Backslashes escape the square brackets in the search text, and you can just replace the whole thing with [Event "Big Tournament"] (square brackets are not special in the replace text, so they need no backslashes there).

You don’t need to specify beginning or end of line in replace text, and since it’s always the same value, you just type it in literally.
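The same replacement can be sketched in Python, again assuming the built-in re module as a stand-in for Notepad++. The pgn_headers variable is a shortened sample of the tags above. Two details are specific to Python: the re.MULTILINE flag is what lets ^ and $ match at each line rather than only at the ends of the whole string, and the replacement is written without backslashes because Python would copy them into the output.

```python
import re

# Sample input: the first three header tags of the game above.
pgn_headers = '[Event "Patricia - 4ku"]\n[Site "Chess Nerd"]\n[Date "2024.12.02"]'

# ^ and $ anchor the match to one whole line; the backslashes make the
# square brackets literal instead of starting a character class.
event_tag = re.compile(r'^\[Event ".*?"\]$', re.MULTILINE)

# The replacement is plain text, typed in literally.
updated = event_tag.sub('[Event "Big Tournament"]', pgn_headers)

print(updated)
```

Only the Event line changes; the Site and Date tags pass through untouched because the pattern insists on the word Event right after the opening bracket.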

These are two good examples to give you an idea of how to make this system work for you. It’s mostly a matter of looking things up and asking the various robots how to write the command line scripts, things like that.
