Overview of pgn-extract

pgn-extract is a command-line tool for searching, cleaning, and manipulating chess game files in Portable Game Notation (PGN). Below is a concise outline showing core usage and features.

1. Basic Usage

  • Syntax: pgn-extract [flags] [input-game-files]
  • If no arguments are given, it reads games from standard input and writes valid games (in SAN) to standard output.
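  • Example: pgn-extract -oclean.pgn raw.pgn (a short flag takes its argument attached, with no space; a long form such as --output clean.pgn takes it as a separate word)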
  • Use -r to check for errors without writing output: pgn-extract -r raw.pgn
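
The basic workflow can be sketched as follows. This is a minimal example, assuming pgn-extract is installed and on the PATH; the file names sample.pgn and clean.pgn and the game itself are invented for illustration.

```shell
# Create a one-game sample file (made-up data for illustration).
cat > sample.pgn <<'EOF'
[Event "Example"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Alpha"]
[Black "Beta"]
[Result "1-0"]

1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0
EOF

# Run only if pgn-extract is available on this machine.
if command -v pgn-extract >/dev/null 2>&1; then
    # Report errors without extracting any games.
    pgn-extract -r sample.pgn
    # No file arguments: read standard input, write standard output.
    pgn-extract < sample.pgn > clean.pgn
fi
```

Redirecting standard output, as in the last command, is an alternative to naming an output file with --output.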

2. Output Control

  • -o<file>, --output <file>: Write matched games to <file>, overwriting any existing content.
  • -a<file>, --append <file>: Append matched games to an existing file.
  • -n<file>: Write valid games that did not match to <file>.
  • -C, --nocomments: Remove comments.
  • -N, --nonags: Remove NAG symbols.
  • -V, --novars: Remove variations.
  • --nomovenumbers, --noresults, --notags: Suppress move numbers, results, or tags.
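
As a rough illustration of the stripping flags, the sketch below builds a small annotated game and removes its comments, NAGs, and variations in one pass. The file names annotated.pgn and bare.pgn and the game are hypothetical; the command runs only if pgn-extract is installed.

```shell
# A made-up game containing a NAG ($1), a comment, and a variation.
cat > annotated.pgn <<'EOF'
[Event "Annotated example"]
[White "Alpha"]
[Black "Beta"]
[Result "*"]

1. e4 $1 e5 {A solid reply.} 2. Nf3 (2. f4 exf4) 2... Nc6 *
EOF

if command -v pgn-extract >/dev/null 2>&1; then
    # -C drops comments, -N drops NAGs, -V drops variations.
    pgn-extract -C -N -V --output bare.pgn annotated.pgn
fi
```

The three flags are independent, so any subset can be used when only some annotations should go.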

3. Searching and Filtering

  • -x<file>: Positional variations. Matches if a game reaches the position produced by the moves listed in <file>, regardless of move order.
  • -v<file>: Textual variations. Matches move sequences as written (* stands for any single move; a leading ! excludes a move).
  • -t<file>: Tag-based criteria, e.g. players, dates, results, Elo, or FEN positions.
  • -T<letter><value>: Limited command-line tag filter (player, date, result, etc.), e.g. -TpFischer to match a player.
  • --fenpattern <string> / --fenpatterni <string>: FEN-based pattern match in the game.
  • --materialy, --materialz: Endgame or material-based filtering (e.g., R vs N endgames).
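
The file-based filters could be set up along these lines. This is a sketch under assumptions: the file names vars.txt, tags.txt, and input.pgn are hypothetical, and it is assumed that criteria on different tags in a tag file must all hold for a game to match.

```shell
# Textual variation (for -v): 1. e4 c5, then any White move except Nf3.
cat > vars.txt <<'EOF'
e4 c5 !Nf3
EOF

# Tag criteria (for -t): one criterion per line, written like PGN tags
# but without the square brackets.
cat > tags.txt <<'EOF'
White "Fischer"
Result "1-0"
EOF

if command -v pgn-extract >/dev/null 2>&1 && [ -f input.pgn ]; then
    pgn-extract -vvars.txt --output sicilian.pgn input.pgn
    pgn-extract -ttags.txt --output fischer-wins.pgn input.pgn
    # Command-line shortcut for a single player criterion.
    pgn-extract -TpFischer --output fischer.pgn input.pgn
fi
```

Keeping criteria in files makes a search repeatable; the -T form suits one-off queries.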

4. Limiting Game Length

  • -b[elu]<N> / -p[elu]<N>: Filter by number of moves (-b) or plies (-p); e means exactly N, l at least N, u at most N.
  • --minply, --maxply, --minmoves, --maxmoves: Long-form equivalents of the above.
  • --startply <N>: Start the output of each game from ply N.
  • --matchplylimit <N>: Stop searching for matches after N plies.
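
Move and ply bounds describe the same limits in different units. The arithmetic below is a sketch that assumes a move is counted as soon as White has played it, so a game of N moves has either 2N-1 or 2N plies; the variable names and file names are illustrative only.

```shell
min_moves=20
max_moves=40

# Assumption: N moves correspond to 2N-1 plies (White moved last)
# or 2N plies (Black moved last).
min_ply=$((2 * min_moves - 1))   # fewest plies in a 20-move game
max_ply=$((2 * max_moves))       # most plies in a 40-move game
echo "moves ${min_moves}-${max_moves} ~ plies ${min_ply}-${max_ply}"

if command -v pgn-extract >/dev/null 2>&1 && [ -f input.pgn ]; then
    # Same range expressed in moves, then in plies.
    pgn-extract --minmoves "$min_moves" --maxmoves "$max_moves" --output by-moves.pgn input.pgn
    pgn-extract --minply "$min_ply" --maxply "$max_ply" --output by-plies.pgn input.pgn
fi
```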

5. Duplicate Detection & ECO Classification

  • -d<file>, --duplicates <file>: Write duplicate games to <file>.
  • -D, --noduplicates: Skip duplicates in output.
  • -U, --nounique: Suppress unique games (so only duplicates appear in output).
  • -e<eco-file>: Add or replace ECO classification tags in the output. If no file name is given, eco.pgn is used.

6. Splitting Output

  • -#<N>: Split output into files of N games each (named 1.pgn, 2.pgn, ...).
  • -E<N>: Split output by ECO code at level N: 1 gives A.pgn, B.pgn, ...; 3 gives A00.pgn, A01.pgn, ... Normally combined with -e so that every game carries an ECO code.
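
A small sketch of splitting, assuming pgn-extract is installed; games.pgn and its two games are invented, and the ECO step runs only if a classification file named eco.pgn is present in the current directory.

```shell
# Two short made-up games in one file.
cat > games.pgn <<'EOF'
[Event "Game one"]
[White "Alpha"]
[Black "Beta"]
[Result "1-0"]

1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0

[Event "Game two"]
[White "Gamma"]
[Black "Delta"]
[Result "0-1"]

1. f3 e5 2. g4 Qh4# 0-1
EOF

if command -v pgn-extract >/dev/null 2>&1; then
    # One game per file, written as numbered files in the current directory.
    pgn-extract -#1 games.pgn
    # Split by ECO letter; needs an ECO classification file.
    if [ -f eco.pgn ]; then
        pgn-extract -eeco.pgn -E1 games.pgn
    fi
fi
```

Inside a word such as -#1 the shell does not treat # as the start of a comment, so no quoting is needed.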

7. Specialized Flags

  • --checkmate, --stalemate, --fifty, --repetition, --underpromotion: Match only games that exhibit these features.
  • -F: Output FEN string after the final move or replace placeholder comments with FEN strings.
  • --fencomments: Place FEN after every move.
  • --hashcomments: Place a hashcode after every move.
  • --addhashcode: Add HashCode tag to each output game.
  • --splitvariants: Output each variation as a separate game.
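
The feature filters and annotation flags might be combined as sketched here. The input file mixed.pgn and its games are hypothetical, and the commands run only if pgn-extract is installed.

```shell
# One game ending in checkmate and one short draw (made-up data).
cat > mixed.pgn <<'EOF'
[Event "Mate"]
[White "Alpha"]
[Black "Beta"]
[Result "1-0"]

1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0

[Event "Draw"]
[White "Gamma"]
[Black "Delta"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
EOF

if command -v pgn-extract >/dev/null 2>&1; then
    # Keep only games that end in checkmate.
    pgn-extract --checkmate --output mates.pgn mixed.pgn
    # Annotate every move with the FEN of the resulting position.
    pgn-extract --fencomments --output with-fen.pgn mixed.pgn
fi
```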

8. Helpful Examples

1) Convert raw PGN to cleaned SAN:
   pgn-extract -oclean.pgn raw.pgn

2) Extract only games by Fischer:
   pgn-extract -ttags.txt raw.pgn
   // tags.txt contains:
   // Player "Fischer"

3) ECO classify a file:
   pgn-extract -eeco.pgn -oecoclass.pgn bigfile.pgn

4) Remove duplicates, keeping one copy of each game:
   pgn-extract -D -ounique.pgn input.pgn

5) Extract short games (20 moves or fewer):
   pgn-extract -bu20 -oshortgames.pgn input.pgn

9. Further Documentation

  • Run pgn-extract --help or -h for a brief flag summary.
  • The tool’s source code and full manual are included with the distribution.