Training Data ‐ Digital Chessboard Workflow - ObayAlshaer/ChessMentor-ML GitHub Wiki
Digital Chessboard Workflow - Training Data
Image Size, Cropping & Format
For optimal training quality, all images in our dataset will be cropped to 800×800 pixels before annotation and saved in JPEG (.jpg) format at 90–95% quality. This ensures that YOLO receives the same type of input during both training and real-world inference. We have developed a Chessboard Cropping Tool, available in our GitHub repository, to automate this process: 🔗 ChessboardDetector GitHub Repository
Why Crop the Images Before Training?
- Consistency with App Workflow: Our app pre-crops the chessboard before running YOLO. Training on full-screen images would introduce unnecessary variations (UI elements, glare, side menus), making detection harder.
- Better Model Accuracy: By removing irrelevant parts of the image, YOLO can focus entirely on piece detection rather than board localization.
- More Efficient Training: Smaller, focused images lead to faster training times and lower memory usage, improving overall performance.
Best Image Format for Roboflow & YOLO
To balance training speed, file size, and image quality, we use JPEG. The alternatives compare as follows:
Format | Pros | Cons | Best Use Case |
---|---|---|---|
JPEG (.jpg) ✅ | ✔ Small file size ✔ Fast training speed ✔ Good quality if saved at 90%+ | ❌ Slight compression loss | Best for YOLO training & Roboflow uploads |
PNG (.png) ⚠️ | ✔ Lossless quality ✔ Supports transparency | ❌ Larger file sizes ❌ Slower training | Not necessary for YOLO |
WebP (.webp) 🟡 | ✔ High compression + quality ✔ Smaller than JPEG | ❌ Not natively supported by YOLO | Not recommended |
TIFF (.tiff) ❌ | ✔ Lossless quality ✔ High bit depth | ❌ Huge file sizes ❌ Slows down training | Overkill for YOLO |
Example Image Properties
Below is an example of a properly formatted image for training YOLO and uploading to Roboflow. This image meets all dataset requirements:
✅ Format: JPEG (.jpg)
✅ Image Size: 800×800 pixels
✅ Color Model: RGB
✅ File Size: ~100-250 KB (efficient for processing)
✅ DPI: 72 pixels/inch (not an issue for YOLO training)
Recommended Dataset Size
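To achieve our target accuracy (≥ 95% chessboard detection and ≥ 90% piece detection), we will collect 1000 images. The two columns below are alternative ways to divide the same 1000 images: a 70/20/10 split as the recommended minimum for training data, and an 80/15/5 split as the ideal.
Dataset Split | Minimum Recommended (70/20/10) | Ideal (80/15/5) |
---|---|---|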
Training Set | 700 | 800 |
Validation Set | 200 | 150 |
Test Set | 100 | 50 |
Total | 1000 images | 1000 images |
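A reproducible split along these lines could look like the sketch below. The helper name `split_dataset`, the seed, and the file names are hypothetical; the default fractions follow the "Ideal" column, and passing `train=0.70, val=0.20` reproduces the other column.

```python
import random


def split_dataset(items, train=0.80, val=0.15, seed=42):
    """Shuffle items reproducibly and return (train, val, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )


# Hypothetical file names; real names depend on how the images are exported.
names = [f"board_{i:04d}.jpg" for i in range(1000)]
train_set, val_set, test_set = split_dataset(names)
print(len(train_set), len(val_set), len(test_set))  # 800 150 50
```

This sketch does not stratify by board theme. Splitting before any augmentation keeps augmented copies of one original image from landing in both the training and test sets.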
Dataset Composition
Research & Justification for Dataset Proportions
To ensure our dataset represents real-world usage, we conducted a deep analysis of Chess.com’s most popular board themes. Based on Chess.com user data, forum discussions, and community insights, we found that over 90% of players use a small selection of predefined board themes. Our dataset proportions are directly based on this research:
- Classic Green (35%): The most widely used board, the default choice for most users.
- Brown (20%): A widely chosen flat brown theme mimicking wood, using "Classic" pieces.
- Dark Wood (15%): A highly popular wood-textured theme.
- Blue (8%): A commonly used alternative for contrast and visibility.
- Icy Sea (8%): A sleek, modern choice, preferred by blitz players.
- Tournament Dark Green (5%): Resembles classic tournament boards.
- High-Contrast B&W (3%): Ensures visibility under extreme contrast conditions.
- Glass (3%): A niche but recognized transparent board design.
- Bubblegum (2%): A vibrant, less serious choice used by streamers and casual players.
- Lolz (1%): A rare, high-contrast black/grey board that presents unique challenges.
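The percentages above translate into per-theme image counts with simple arithmetic. The sketch below assumes a 1000-image dataset; the dictionary and function names are hypothetical.

```python
# Theme shares in percent, copied from the list above.
THEME_SHARES = {
    "Classic Green": 35,
    "Brown": 20,
    "Dark Wood": 15,
    "Blue": 8,
    "Icy Sea": 8,
    "Tournament Dark Green": 5,
    "High-Contrast B&W": 3,
    "Glass": 3,
    "Bubblegum": 2,
    "Lolz": 1,
}


def theme_quotas(total_images, shares=THEME_SHARES):
    """Return the number of images to collect per theme."""
    if sum(shares.values()) != 100:
        raise ValueError("theme shares must add up to 100%")
    # Integer division is exact when total_images is a multiple of 100;
    # for other totals it rounds down and may leave a few images unassigned.
    return {theme: total_images * pct // 100 for theme, pct in shares.items()}


quotas = theme_quotas(1000)
for theme, count in quotas.items():
    print(f"{theme}: {count}")
```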
Final Dataset Breakdown
Theme Name | # Images | Source Type |
---|---|---|
Classic Green (default) | 350 | Camera (280), Screenshots (70) |
Brown (Brown board, Classic Pieces) | 200 | Camera (160), Screenshots (40) |
Dark Wood (Walnut) | 150 | Camera (120), Screenshots (30) |
Blue | 80 | Camera (64), Screenshots (16) |
Icy Sea | 80 | Camera (65), Screenshots (15) |
High-Contrast (B&W) | 30 | Camera (24), Screenshots (6) |
Tournament (Dark Green) | 50 | Camera (40), Screenshots (10) |
Glass Pieces | 30 | Camera (24), Screenshots (6) |
Bubblegum | 20 | Camera (16), Screenshots (4) |
Lolz (Dark/grey) | 10 | Camera (8), Screenshots (2) |
Total | 1000 |
Labeling Strategy
- We will label individual chess pieces with bounding boxes on cropped chessboard images.
- We are using Roboflow for annotation, which allows us to efficiently label and manage our dataset while ensuring consistency across different board themes and piece types.
- Our labels will include:
  - white_pawn, white_rook, white_knight, white_bishop, white_queen, white_king
  - black_pawn, black_rook, black_knight, black_bishop, black_queen, black_king
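For reference, a bounding box on an 800×800 crop can be written as a YOLO-style label line (class index followed by normalized center x, center y, width, and height). The sketch below makes assumptions: the class order shown is arbitrary, the helper `to_yolo_line` is hypothetical, and the indices in a real export come from the exported dataset's own class list.

```python
# Assumed class order; the exported dataset's class list takes precedence.
CLASS_NAMES = [
    "white_pawn", "white_rook", "white_knight",
    "white_bishop", "white_queen", "white_king",
    "black_pawn", "black_rook", "black_knight",
    "black_bishop", "black_queen", "black_king",
]
CLASS_IDS = {name: index for index, name in enumerate(CLASS_NAMES)}


def to_yolo_line(label, box, image_size=800):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_size
    y_center = (y_min + y_max) / 2 / image_size
    width = (x_max - x_min) / image_size
    height = (y_max - y_min) / image_size
    return f"{CLASS_IDS[label]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A white king occupying a 100x100 pixel square on the bottom rank.
print(to_yolo_line("white_king", (400, 700, 500, 800)))
# 5 0.562500 0.937500 0.125000 0.125000
```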
Bias Prevention Strategy
- 1️⃣ Use a Mix of Opening, Midgame, and Endgame Positions: Ensure the dataset includes game states where some pawns are missing (not just opening positions).
- 2️⃣ Include Tactics & Studies: Part of the dataset can consist of puzzle positions, sacrifices, and pawnless endgames to balance the piece distribution.
- 3️⃣ Filter Out Too Many “Full Pawn Ranks” Boards: If too many images show all 8 pawns per side, rebalance by adding captured-pawn scenarios.
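One way to monitor the third point is to measure how many labeled images still show all 8 pawns on both sides. The sketch below assumes each image's annotations are available as a list of class names; the function names are hypothetical, and the acceptable share is left for the team to decide.

```python
from collections import Counter


def pawn_profile(labels):
    """Return (white_pawns, black_pawns) for one image's list of class names."""
    counts = Counter(labels)
    return counts["white_pawn"], counts["black_pawn"]


def full_pawn_rank_share(dataset):
    """Fraction of images in which both sides still have all 8 pawns."""
    if not dataset:
        return 0.0
    full = sum(1 for labels in dataset if pawn_profile(labels) == (8, 8))
    return full / len(dataset)


# Two toy annotations: an opening-like position and a rook endgame.
opening = ["white_pawn"] * 8 + ["black_pawn"] * 8 + ["white_king", "black_king"]
endgame = ["white_king", "black_king", "white_rook"]
print(full_pawn_rank_share([opening, endgame, endgame, opening]))  # 0.5
```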
Data Augmentation
We will use augmentation techniques to automatically simulate image quality variations such as:
- Perspective transformations (angled photographs)
- Brightness and contrast adjustments (lighting variations)
- Blur and noise (lower-quality captures)
This effectively increases our dataset size by approximately 3–4 times, significantly enhancing model generalization.
Augmentation Pipeline (Python example)
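import albumentations as A
import cv2

# Each transform is applied independently with probability p.
augmentation_pipeline = A.Compose([
    A.Perspective(scale=(0.05, 0.1), p=0.6),     # angled photographs
    A.RandomBrightnessContrast(p=0.7),           # lighting variations
    A.GaussianBlur(blur_limit=(3, 7), p=0.4),    # soft or out-of-focus captures
    # var_limit matches older Albumentations releases; newer ones may expect std_range.
    A.GaussNoise(var_limit=(10, 30), p=0.3),     # sensor noise
    A.Rotate(limit=10, p=0.5),                   # slight camera tilt
    # When this fires the output is 700×700, so resize back to 800×800 if needed.
    A.RandomCrop(width=700, height=700, p=0.2),
])

# Usage example (image only; piece bounding boxes are not transformed here):
image = cv2.imread('original_image.jpg')
if image is None:
    raise FileNotFoundError('original_image.jpg')
augmented = augmentation_pipeline(image=image)['image']
cv2.imwrite('augmented_image.jpg', augmented, [cv2.IMWRITE_JPEG_QUALITY, 92])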
Roles and Responsibilities
Our Role | Augmentation’s Role |
---|---|
Capture and label original Chess.com images | Generate realistic image variations |
Set augmentation parameters | Automatically simulate quality variations |
Periodically validate augmented images | Efficiently create diverse training examples |
Next Steps
- Collect our initial images based on the above table (total 1000).
- Set up the augmentation pipeline using Albumentations.
- Train the YOLO model and export it to CoreML for integration into our iOS application.
- Continuously validate the model’s performance and improve based on user feedback.