Chess AI Lab — v2 updated project map

Rebuilt from the original C1–C8, CSSLab/Maia research, the GM data layer, and a monetization lens · April 2026

Overview
5 core projects · 3 research tier · 2 dropped · 3 data layers
Kept and reframed: C1 (Style DNA), C2 (RL Coach), C3 (Openings), C4 (Opponent), C8 (Clock). All now have GM data as a comparison layer, not just personal games.

Upgraded to research tier: C7 becomes the Rating Band Atlas — genuinely publishable, addresses an open gap in CSSLab's Maia-2 work.

Two new projects: Behavioral Fingerprint Contribution (feeds the fair-play research community) and Style-Matched GM Curriculum (the coaching product).

Dropped: C5 Chess Vision (solved problem), I4 AWS Serverless (unnecessary complexity given Mac Mini + Hetzner setup).

Core principle shift: GM data stops being a separate module (C7) and becomes the comparison backbone that makes every personal project meaningful.
Concept categories used in the project cards below: ML fundamentals · Reinforcement learning · Gen AI — RAG · Gen AI — Agents · Gen AI — LLM / fine-tuning · Embeddings · Statistical / research methods · Data engineering
Personal: 10K chess.com games. Your behavioral fingerprint as a longitudinal time-series. Nobody else has this data about you.

Rating-band peers: Lichess games filtered to your Elo ±200. What humans at your level actually play — not GM theory.

GM corpus: 4M+ Lichess elite games. A pattern library for positions you reach, filtered to lines you actually play.

C1 — Behavioral fingerprint + drift tracker
High priority · Monetizable
Your move distribution is as unique as a fingerprint — CSSLab proved this. The upgraded C1 doesn't just build the fingerprint once; it tracks how it changes over time. Are you becoming more positional? Is your time-pressure behavior improving? Is your style converging toward or away from success patterns at your rating band?
Original framing: Compare your DNA against Magnus. One-time snapshot.
New framing: Longitudinal behavioral time-series. Monthly drift report. Alerts when your profile shifts significantly.
Feature pipeline extracting: move-type distribution per game phase, time allocation patterns per move, position-type preferences (tactical vs positional), deviation frequency from Maia predictions at your Elo. Stored monthly as behavioral snapshots in Postgres. Drift detection using PSI threshold same as C6.
Compare your fingerprint against Lichess players in your rating band ±200 who improved by 200+ Elo in 12 months. What did their behavioral profiles look like before the jump? That's your actual target signal.
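The drift check at the end of this pipeline can be sketched as a plain PSI computation over two monthly snapshots of one fingerprint feature. This is a minimal sketch: the function name, the toy snapshot data, and the 0.2 alert threshold are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population Stability Index between two samples of one fingerprint feature."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    c = np.histogram(current, bins=edges)[0] / len(current)
    b, c = np.clip(b, 1e-6, None), np.clip(c, 1e-6, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))

rng = np.random.default_rng(0)
may = rng.normal(0.40, 0.05, 500)   # e.g. share of positional moves per game
june = rng.normal(0.48, 0.05, 500)  # style drifting toward positional play
print(psi(may, may) < 0.01)   # identical snapshots: no drift
print(psi(may, june) > 0.2)   # 0.2 is a commonly used PSI alert level
```

In the monthly drift report, a feature crossing the alert threshold would trigger the "profile shifted significantly" notification.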
python-chess · Postgres · Maia features · Evidently AI · PSI drift
AI / ML concepts required
ML fundamentals: Feature engineering · Behavioural stylometry · Time-series analysis · Dimensionality reduction
Statistical: Distribution drift · Population stability index (PSI) · Cosine similarity
Data engineering: PGN parsing · Longitudinal data modelling · Snapshot storage
C2 — Mistake pattern coach with GM context
High priority · Monetizable · CSSLab adjacent
Your blunder classifier already finds your problem positions. The upgrade: when it surfaces a weakness pattern, it doesn't show you the engine line. It shows you how players 200–400 Elo above you — who actually reach the same positions from the same openings you play — handle it.
Original output: "You blunder in IQP positions under time pressure. Here's Stockfish's best move."
New output: "Here's how 1800-rated players who play your exact Sicilian line handle this structure — and what they do differently in the 5 moves before the critical moment."
After blunder classification, query Lichess GM DB filtered by: same opening line (ECO code), same position type, player Elo 1800–2200, games from last 3 years. Extract 10 moves around the critical moment. Feed to Claude for natural language pattern extraction in terms of principles, not engine lines.
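The retrieval step above can be sketched as one parameterised SQL query. This sketch runs against an in-memory SQLite stand-in so it stays self-contained; the table schema and column names are assumptions for illustration, not the real Lichess DB layout.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE games
              (eco TEXT, white_elo INT, black_elo INT, year INT, pgn TEXT)""")
db.executemany("INSERT INTO games VALUES (?,?,?,?,?)", [
    ("B22", 1950, 1980, 2025, "..."),  # your Alapin line, in the target band
    ("B22", 1450, 1470, 2025, "..."),  # same line, below the band
    ("E60", 2100, 2080, 2024, "..."),  # different opening
])

# Same opening line (ECO), players in the 1800-2200 band, recent games only
rows = db.execute(
    """SELECT pgn FROM games
       WHERE eco = ?
         AND white_elo BETWEEN ? AND ?
         AND year >= ?""",
    ("B22", 1800, 2200, 2023),
).fetchall()
print(len(rows))  # only the first game matches
```

The games returned here are what gets windowed to the 10 moves around the critical moment and handed to Claude for principle-level pattern extraction.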
XGBoost · Lichess DB · Stockfish eval · Claude API · PivotRL · PAHF
AI / ML concepts required
ML fundamentals: Supervised classification · Gradient boosting (XGBoost) · Anomaly detection · Class imbalance handling
Reinforcement learning: Reward shaping · Policy optimisation · Curriculum learning · Adaptive difficulty (PivotRL)
Gen AI — LLM: Structured prompting · Chain-of-thought · Personalised feedback (PAHF)
Data engineering: Eval batch pipelines · Position tagging · Game phase segmentation
C3 — Personalised repertoire builder
High priority · Monetizable
Not "you deviated from theory at move 8." The real insight: what do players at your exact rating band actually play in the positions you reach? Opening recommendations based on what wins at your level, filtered by your behavioral style from C1 — avoiding lines that reach your C2 problem positions.
Original framing: Detect where you deviate from GM theory. Flag bad deviations.
New framing: Build a repertoire from what actually works at your Elo, matched to your style profile. Avoid lines leading to your weak position types.
Take your C1 fingerprint — specifically your positional vs tactical preference score and endgame win rate. Filter opening recommendations to avoid lines that statistically lead to position types where you underperform. A tactical player who loses every Carlsbad endgame should not be recommended the Exchange Slav.
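The style-aware filter can be sketched as a scoring pass over candidate lines. The field names, the 0.6/0.4 weighting, and the sharpness scale are all illustrative assumptions; D13 (the Exchange Slav) stands in for the example from the paragraph above.

```python
def recommend_lines(candidates, weak_types, tactical_pref):
    """Score candidate opening lines for one player.

    candidates:    dicts with 'eco', 'leads_to' (position type), and
                   'sharpness' / 'win_rate' (both 0..1, at the player's band)
    weak_types:    position types where the player underperforms (from C2)
    tactical_pref: C1 tactical-vs-positional score in [0, 1]
    """
    scored = []
    for c in candidates:
        if c["leads_to"] in weak_types:
            continue  # never recommend lines funnelling into weak structures
        style_fit = 1.0 - abs(c["sharpness"] - tactical_pref)
        scored.append((c["eco"], round(0.6 * c["win_rate"] + 0.4 * style_fit, 3)))
    return sorted(scored, key=lambda kv: -kv[1])

lines = [
    {"eco": "D13", "leads_to": "carlsbad_endgame", "sharpness": 0.2, "win_rate": 0.55},
    {"eco": "B22", "leads_to": "iqp_middlegame", "sharpness": 0.7, "win_rate": 0.52},
]
# Tactical player who loses Carlsbad endgames: the Exchange Slav (D13) is dropped
print(recommend_lines(lines, {"carlsbad_endgame"}, tactical_pref=0.8))
```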
Lichess opening DB · ECO classifier · Qdrant · C1 fingerprint · Doc-to-LoRA
AI / ML concepts required
Gen AI — RAG: Hybrid search (BM25 + dense) · Vector embeddings · Retrieval-augmented generation · Reranking
Gen AI — fine-tuning: LoRA / QLoRA · Doc-to-LoRA compression · Domain adaptation
ML fundamentals: Multi-label classification (ECO) · Style-conditioned filtering · Recommendation systems
Embeddings: Position embeddings (FEN) · nomic-embed-text
C4 — Opponent profiling agent
Medium priority · Monetizable
Pre-game brief from opponent's public game history: openings, collapse patterns, time tendencies, Elo trajectory. The upgrade: cross-reference their weaknesses against your strengths from C1. Find matchup edges, not just their profile in isolation.
Your C1 fingerprint vs opponent's extracted profile → a matchup score per opening line. "Opponent struggles in endgames; your endgame win rate is 67% — steer toward these exchanges." This is what seconds do for elite players. Nobody has automated it for club players.
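The matchup scoring can be sketched as an elementwise product of your strength profile and the opponent's weakness profile. The tag names, toy values, and the multiplicative scoring rule are assumptions for illustration.

```python
def matchup_edges(my_strengths, opp_weaknesses):
    """Rank game areas by edge = your strength x opponent weakness (both 0..1)."""
    return sorted(
        ((tag, round(my_strengths[tag] * opp_weaknesses.get(tag, 0.0), 2))
         for tag in my_strengths),
        key=lambda kv: -kv[1],
    )

edges = matchup_edges(
    {"endgame": 0.67, "tactical_middlegame": 0.45, "opening_prep": 0.50},
    {"endgame": 0.80, "tactical_middlegame": 0.20},  # from their public games
)
print(edges[0])  # biggest edge first: steer the game toward endgames
```

The per-opening-line version of this score is what the pre-game brief surfaces as "steer toward these exchanges".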
chess.com API · C1 fingerprint · LangGraph · Claude API
AI / ML concepts required
Gen AI — Agents: Multi-agent orchestration · LangGraph state machines · Tool-calling agents · Agentic memory (MAPLE)
Gen AI — LLM: Structured output generation · Persona-conditioned prompting
ML fundamentals: Profiling / clustering · Similarity scoring · Behavioural pattern extraction
Data engineering: API data ingestion · PGN feature extraction
C8 — Psychological clock agent
Medium priority · Dataset contribution
Core unchanged — extract move timestamps, correlate think-time with eval drops, find your personal danger zone. Upgraded role: becomes your primary contribution to the fair-play research community. Your self-labeled time-pressure data is exactly what detection teams say they're missing.
Original role: Personal coaching tool. Find your clock danger zone.
New role: Personal tool + labeled behavioral dataset with granular self-labels: prep moves, calculation moves, time-pressure moves. CSSLab-compatible format.
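The timestamp extraction can be sketched with a regex over Lichess-style [%clk] annotations, kept dependency-free here; a real pipeline would read the clock comments through python-chess instead. The no-increment assumption and the function name are illustrative.

```python
import re

CLK = re.compile(r"\[%clk (\d+):(\d+):(\d+(?:\.\d+)?)\]")

def think_times(movetext, base_seconds):
    """Seconds spent on each half-move, from [%clk] comments (no increment assumed)."""
    clocks = [h * 3600 + m * 60 + s
              for h, m, s in ((int(a), int(b), float(c))
                              for a, b, c in CLK.findall(movetext))]
    remaining = [base_seconds, base_seconds]  # [white, black]
    spent = []
    for i, clk in enumerate(clocks):
        side = i % 2  # half-moves alternate white, black
        spent.append(round(remaining[side] - clk, 1))
        remaining[side] = clk
    return spent

moves = ("1. e4 { [%clk 0:03:00] } 1... c5 { [%clk 0:02:58] } "
         "2. Nf3 { [%clk 0:02:55] } 2... d6 { [%clk 0:02:40] }")
print(think_times(moves, 180.0))  # [0.0, 2.0, 5.0, 18.0]
```

These per-move think times are what get correlated with eval drops, and what the self-labels (prep / calculation / time-pressure) attach to.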
PGN timestamps · Stockfish eval · Behavioral labels · CSSLab format
AI / ML concepts required
ML fundamentals: Temporal pattern analysis · Correlation analysis · Behavioural labelling · Anomaly detection
Statistical: Time-series regression · Confidence intervals · Threshold-based alerting
Data engineering: PGN timestamp extraction · Move-level annotation · Dataset schema design · HuggingFace dataset release
R1 — Rating band behavioral atlas
High priority · Publishable · Extends Maia-2
Your strongest research contribution and the one that opens doors to CSSLab, KTH, and research hiring. Maia-2 (NeurIPS 2024) built a unified model of human chess behavior across skill levels but acknowledged it doesn't produce coherent improvement pathways. This project builds that missing map.
Take the full Lichess DB. Segment by rating in 100-Elo bands. For each band extract: time allocation per game phase, move-type distribution, tactical vs positional preference, blunder frequency by position type, opening diversity, endgame conversion rate. Build a continuous behavioral map showing what specifically changes between each band — not "play better moves" but concrete behavioral shifts.
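The band segmentation and per-band aggregation can be sketched with pandas. The toy feature table, column names, and values are placeholders for the real extraction output, not actual Lichess statistics.

```python
import pandas as pd

# Placeholder for the real per-player feature extraction output
players = pd.DataFrame({
    "elo":               [1130, 1180, 1240, 1290, 1710, 1760],
    "blunders_per_game": [3.1,  2.8,  2.5,  2.2,  1.4,  1.2],
    "opening_diversity": [0.31, 0.35, 0.38, 0.40, 0.52, 0.55],
})
players["band"] = (players["elo"] // 100) * 100  # 100-Elo bands

atlas = players.drop(columns="elo").groupby("band").mean()
# Behavioral delta between adjacent bands: the "what actually changes" signal
deltas = atlas.diff()
print(atlas.round(3))
print(deltas.round(3))
```

The `deltas` table is the atlas's core output: not "play better moves" but the concrete per-feature shift separating one band from the next.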
CSSLab explicitly said in the Maia-2 paper that coherent improvement pathways are the missing piece. You're building a data-driven answer to that gap using the same Lichess corpus they use. With solid results, a workshop paper at KDD or a CHI submission on human-AI learning in chess is realistic.
Once the atlas exists, plot your own behavioral fingerprint against it. See exactly where you sit, which band's patterns you're closest to, and which specific behavioral shifts would move you to the next band. Your personal project and research contribution become the same pipeline.
Lichess full DB · python-chess · Maia features · Postgres · NeurIPS/KDD target · CSSLab outreach
AI / ML concepts required
ML fundamentals: Large-scale feature extraction · Behavioural stylometry · Clustering (k-means / UMAP) · Longitudinal cohort analysis
Human-AI alignment: Skill-level modelling · Coherence across distributions · Maia-2 extension methodology
Statistical: Distributional shift analysis · Effect size measurement · Significance testing
Data engineering: Big data PGN processing · Elo band segmentation · Postgres time-series schema
R2 — Style-matched GM curriculum
High priority · Monetizable · Research angle
The coaching product. Every chess improvement platform teaches GM mainlines. None personalise to your behavioral style. This uses your C1 fingerprint to find which GMs' decision-making profiles most resemble yours structurally — then pulls their games filtered to positions you actually reach and areas where you struggle.
Extract behavioral features from a curated set of historical GMs using the same C1 pipeline: time allocation patterns, positional vs tactical ratio, endgame preferences, opening diversity. Cosine similarity between your feature vector and each GM's vector across your problem position types. Output: top 3 style-matched GMs with confidence score per position category.
Filter matched GM's Lichess games to: your opening lines (ECO codes you play), your C2 problem position types, comparable opponent Elo. Extract the 10–15 most instructive examples. Claude generates natural language annotations explaining the GM's decision at each critical moment in terms of principles, not engine lines.
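The style-matching step above can be sketched as cosine similarity over behavioral feature vectors. The feature dimensions, GM names, and values below are made-up toy data, not real profiles.

```python
import numpy as np

def top_matches(player_vec, gm_profiles, k=3):
    """Rank GMs by cosine similarity of behavioral feature vectors."""
    p = np.asarray(player_vec, float)
    p = p / np.linalg.norm(p)
    sims = {}
    for name, vec in gm_profiles.items():
        v = np.asarray(vec, float)
        sims[name] = round(float(p @ (v / np.linalg.norm(v))), 3)
    return sorted(sims.items(), key=lambda kv: -kv[1])[:k]

# Toy dimensions: [time_in_openings, tactical_ratio, endgame_preference]
me = [0.2, 0.8, 0.4]
gms = {
    "GM_A": [0.25, 0.75, 0.45],  # sharp tactician
    "GM_B": [0.60, 0.20, 0.80],  # slow positional grinder
}
print(top_matches(me, gms, k=1))  # closest style match with its score
```

In the full pipeline this similarity is computed per problem position category, which is where the per-category confidence score comes from.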
Chessable is a content platform. This is a personalisation engine. B2B angle: sell to chess coaches as a tool that auto-generates a personalised study plan for each student. Coaches pay €30–50/month for the tool. That's a different business model — recurring SaaS, not one-time course purchases.
C1 fingerprint · GM PGN corpus · Cosine similarity · Claude API · Coach B2B
AI / ML concepts required
Gen AI — RAG: Semantic retrieval · Context-window management · Personalised retrieval
Gen AI — LLM: Instructional annotation generation · Few-shot prompting · Long-context reasoning
Gen AI — Agents: Multi-step agentic pipeline · Memory-augmented agents (MemCollab) · Personalisation agents (PAHF)
ML fundamentals: Cosine similarity matching · Vector space modelling · Style transfer concepts
Embeddings: Behavioural feature vectors · Qdrant hybrid indexing
R3 — Behavioral dataset for fair-play research
Medium priority · Community contribution · CSSLab hook
The fair-play research community lacks labeled behavioral data capturing the spectrum of human move types — not just "human vs engine" but the subtler categories within human play. Your C8 work produces exactly this as a byproduct. Package it as a community contribution.
10K games with self-labeled move categories: opening prep (fast, theory-based), active calculation (slow, deliberate), intuitive decisions (fast, positional), time-pressure moves (fast, stressed), post-blunder moves (emotional state indicators). Combined with Maia move-match probabilities and Stockfish eval. Format compatible with Irwin and CSSLab datasets. Released on HuggingFace.
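One annotated move in the released dataset might look like the row below. Every field name and value here is an illustrative assumption about the schema, not a published spec.

```python
# Illustrative row schema for the self-labeled move dataset
LABELS = {"opening_prep", "calculation", "intuitive", "time_pressure", "post_blunder"}

row = {
    "game_id": "abc123",
    "ply": 47,
    "move_uci": "g1f3",
    "label": "time_pressure",   # self-assigned move category
    "clock_before_s": 41.2,     # seconds left before the move
    "think_time_s": 18.7,
    "maia_match_prob": 0.62,    # P(played move) under Maia at the player's band
    "stockfish_eval_cp": -35,   # centipawns, side-to-move perspective
}
assert row["label"] in LABELS
```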
A well-documented dataset release gets cited. Citations lead to collaborations. Collaborations lead to research roles. It costs relatively little — it's a byproduct of C8 — but its research value is high because this type of granular self-labeled data genuinely doesn't exist publicly.
C8 output · Maia features · HuggingFace release · CSSLab compatible · Irwin format
AI / ML concepts required
ML fundamentals: Dataset curation and labelling · Inter-rater reliability · Behavioural annotation schema
Human-AI alignment: Fair-play detection methodology · Human vs engine stylometry · Subtle cheat pattern taxonomy
Data engineering: Dataset versioning (DVC) · HuggingFace dataset schema · Irwin-compatible format · Move-level metadata
C5 — Chess vision: board detection
Dropped
YOLOv8 board detection from images and video is a solved, open-source problem. Multiple well-maintained repos already do this well. Adds nothing to your research portfolio and nothing to the community. The only interesting angle — pairing it with your behavioral model — is a product feature, not a research project. Can be added later as a thin UI wrapper at minimal cost.
If you revisit it: build it as a thin wrapper that converts any image/video frame to FEN, then pipes straight into your existing C1/C2 analysis pipeline. Two days of work, not a standalone project.
I4 — AWS serverless pipeline
Dropped
Lambda + SQS + DynamoDB + EventBridge is unnecessary complexity given you already have a Mac Mini M4 Pro and a Hetzner VPS running k3s. Adds cost unpredictability, vendor lock-in, and a second deployment target to maintain. Your I5 + I1 + I8 stack handles everything you need.
The only valid reason to revisit: if you productize R2 as a SaaS and need multi-tenant scale. At that point use AWS or Fly.io. Not before.
Now → May 2026

Foundation: fingerprint + data pipeline

Complete C1 behavioral fingerprint pipeline on your 10K games. Set up Lichess DB local extract (just your ECO codes + Elo band to start). Get C8 timestamp labeling running. These three feed everything else.

C1 · C8 · Lichess pipe · I5 ✓ · I9 ✓
May → Jul 2026

Core agents: C2 + C3 with GM layer

Blunder classifier with GM context queries. Repertoire builder with style-aware filtering. LangGraph API live. Qdrant RAG platform with hybrid search. First working end-to-end: game in → personalised feedback out with GM examples.

C2 · C3 · I3 · I7
Jul → Sep 2026

Research: Rating Band Atlas first results

Start R1 Atlas with a focused slice — 3–4 rating bands, 3–4 behavioral features. Write first blog post with findings. Email CSSLab. Build C4 opponent profiling and C6 continuous learning pipeline.

R1 v1 · C4 · C6 · Blog post · CSSLab email
Sep → Dec 2026

Full Atlas + R2 curriculum prototype

Full Rating Band Atlas across all Elo bands. R3 dataset packaged and released on HuggingFace. R2 Style-Matched GM Curriculum working prototype. k3s production deployment. If Atlas results are strong — workshop paper submission for KDD 2027.

R1 full · R2 · R3 release · I1 · Paper submission
2027

Productize or research path

If R2 curriculum prototype gets traction from coaches: productize as B2B SaaS. If R1 Atlas paper gets accepted: pursue research collaboration or apply to research-oriented roles at King, chess.com, or academic labs. These paths are not mutually exclusive.

R2 SaaS · Research collab · Hiring signal

R2 — Coach SaaS

Personalised curriculum tool for chess coaches. €30–50/month per coach. B2B, doesn't compete with Chessable. 50 coaches = €1500–2500/month.

12–18 months

R1 — Research role

Atlas results as portfolio piece. Opens doors at King, chess.com data teams, CSSLab collaboration, KTH/Chalmers research positions in Göteborg.

12 months

C1–C4 stack — ML hire

End-to-end behavioral modeling + agentic system. Positions you for ML engineer roles at gaming companies (King, Paradox) or any LLM product company.

6–9 months

R3 dataset — Research credibility

HuggingFace release cited by fair-play papers. Indirect: leads to collaborations. Direct: demonstrates research contribution in job applications.

10–12 months

Infra skills — Consulting

LangGraph + multi-agent + RAG expertise is scarce now. 800–1500 SEK/hr freelance rate in Sweden while building the research work in parallel.

Now
Ages well — invest here

Domain expertise in behavioral modeling. The Lichess + personal dataset. Research findings and publications. Chess + ML intuition for what to build next.

Ages poorly — don't over-invest

LangGraph specifics. k3s configuration. Any particular RAG library. The orchestration plumbing gets abstracted away in 2–3 years.