Shogi
Usage
or you can directly load Shogi class
Description
TBA
Specs
| Name | Value |
|---|---|
| Version | v1 |
| Number of players | 2 |
| Number of actions | 2187 |
| Observation shape | (9, 9, 119) |
| Observation type | bool |
| Rewards | {-1, 0, 1} |
Observation
We follow the observation design of dlshogi, an open-source shogi AI.
Ther original dlshogi implementations are here.
Pgx implementation has [9, 9, 119] shape and [:, :, x] denotes:
x |
Description |
|---|---|
0:14 |
Where my piece x exists |
14:28 |
Where my pieces x are attacking |
28:31 |
Where the number of my attacking pieces are >= 1,2,3 respectively |
31:45 |
Where opponent's piece x exists |
45:59 |
Where opponent's pieces x are attacking |
59:62 |
Where the number of opponent's attacking pieces are >= 1,2,3 respectively |
The following planes are all ones ore zeros
x |
Description |
|---|---|
62:70 |
My hand has >= 1, ..., 8 Pawn |
70:74 |
My hand has >= 1, 2, 3, 4 Lance |
74:78 |
My hand has >= 1, 2, 3, 4 Knight |
78:82 |
My hand has >= 1, 2, 3, 4 Silver |
82:86 |
My hand has >= 1, 2, 3, 4 Gold |
86:88 |
My hand has >= 1, 2 Bishop |
88:90 |
My hand has >= 1, 2 Rook |
90:98 |
Oppnent's hand has >= 1, ..., 8 Pawn |
98:102 |
Oppnent's hand has >= 1, 2, 3, 4 Lance |
102:106 |
Oppnent's hand has >= 1, 2, 3, 4 Knight |
106:110 |
Oppnent's hand has >= 1, 2, 3, 4 Silver |
110:114 |
Oppnent's hand has >= 1, 2, 3, 4 Gold |
114:116 |
Oppnent's hand has >= 1, 2 Bishop |
116:118 |
Oppnent's hand has >= 1, 2 Rook |
118 |
Ones if checked |
Note that piece ids are
| Piece | Id |
|---|---|
歩 PAWN |
0 |
香 LANCE |
1 |
桂 KNIGHT |
2 |
銀 SILVER |
3 |
角 BISHOP |
4 |
飛 ROOK |
5 |
金 GOLD |
6 |
玉 KING |
7 |
と PRO_PAWN |
8 |
成香 PRO_LANCE |
9 |
成桂 PRO_KNIGHT |
10 |
成銀 PRO_SILVER |
11 |
馬 HORSE |
12 |
龍 DRAGON |
13 |
Action
The design of action also follows that of dlshogi.
There are 2187 = 81 x 27 distinct actions.
The action can be decomposed into
directionfrom which the piece moves anddestinationto which the piece moves
by direction, destination = action // 81, action % 81.
The direction is encoded by
| id | direction |
|---|---|
| 0 | Up |
| 1 | Up left |
| 2 | Up right |
| 3 | Left |
| 4 | Right |
| 5 | Down |
| 6 | Down left |
| 7 | Down right |
| 8 | Up2 left |
| 9 | Up2 right |
| 10 | Promote + Up |
| 11 | Promote + Up left |
| 12 | Promote + Up right |
| 13 | Promote + Left |
| 14 | Promote + Right |
| 15 | Promote + Down |
| 16 | Promote + Down left |
| 17 | Promote + Down right |
| 18 | Promote + Up2 left |
| 19 | Promote + Up2 right |
| 20 | Drop Pawn |
| 21 | Drop Lance |
| 22 | Drop Knight |
| 23 | Drop Silver |
| 24 | Drop Bishop |
| 25 | Drop Rook |
| 26 | Drop Gold |
Rewards
Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:
| Reward | |
|---|---|
| Win | +1 |
| Lose | -1 |
| Draw | 0 |
Termination
Termination occurs when
- either player checkmates the opponent, or
512steps are elapsed (from AlphaZero[Silver+18])
Fourfold repetition is not implemented in v0.
Version History
v1: Bug fix in current player by @KazukiOta in #1298 (v2.6.0)v0: Initial release (v1.0.0)