A self-improving math RL environment. The model practices on verified problems, generates new challenges when ready, and learns from solution attempts whose reasoning steps and final answers agree.
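The summary says the model learns only from attempts whose reasoning steps and final answers agree, but it does not say how agreement is checked. The sketch below uses one simple heuristic, assumed here purely for illustration: compare the last number that appears in the reasoning with the submitted final answer. `Attempt`, `is_consistent`, and `select_for_training` are hypothetical names and are not part of this environment's API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    reasoning: str     # free-text reasoning steps produced by the model
    final_answer: str  # the answer the model committed to


# Integers or decimals, optionally negative (a heuristic, not a full math parser).
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text: str) -> Optional[str]:
    """Return the last numeric token in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    return matches[-1] if matches else None


def is_consistent(attempt: Attempt) -> bool:
    """True when the last number in the reasoning equals the final answer."""
    derived = last_number(attempt.reasoning)
    if derived is None:
        return False
    try:
        return float(derived) == float(attempt.final_answer)
    except ValueError:  # the final answer is not numeric
        return False


def select_for_training(attempts: List[Attempt]) -> List[Attempt]:
    """Keep only attempts whose reasoning and final answer agree."""
    return [a for a in attempts if is_consistent(a)]
```

A real check would likely need symbolic comparison (fractions, expressions, units); the last-number rule only illustrates where such a filter would sit in the loop.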
Start a new episode. Returns a math question with topic and difficulty metadata.
Submit a solution for the current question. Returns the reward, feedback, and a scoring breakdown.
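The start-then-submit cycle of an episode can be sketched with an in-memory stand-in. `ToyMathEnv`, its single-digit addition questions, and the `correctness` key in the breakdown are hypothetical; the real service returns its own question format, reward, and breakdown fields over its API.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Observation:
    question: str
    topic: str       # topic metadata returned with each question
    difficulty: int  # difficulty metadata (hypothetical integer scale)


@dataclass
class StepResult:
    reward: float
    feedback: str
    breakdown: Dict[str, float] = field(default_factory=dict)


class ToyMathEnv:
    """Hypothetical stand-in for the start-episode and submit-solution calls."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._answer: Optional[int] = None
        self.step_count = 0

    def reset(self) -> Observation:
        # Start a new episode with a fresh question.
        a, b = self._rng.randint(1, 9), self._rng.randint(1, 9)
        self._answer = a + b
        self.step_count = 0
        return Observation(f"What is {a} + {b}?", topic="arithmetic", difficulty=1)

    def step(self, solution: str) -> StepResult:
        # Score a submitted solution against the current question.
        if self._answer is None:
            raise RuntimeError("call reset() before step()")
        self.step_count += 1
        try:
            correct = int(solution.strip()) == self._answer
        except ValueError:
            correct = False
        breakdown = {"correctness": 1.0 if correct else 0.0}
        feedback = "correct" if correct else f"expected {self._answer}"
        return StepResult(breakdown["correctness"], feedback, breakdown)
```

An agent would call `reset()` once per episode, read the question and its metadata, and then call `step()` with its solution, using the reward and feedback as its learning signal.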
Get the current episode state, including the episode ID and step count.
Returns JSON schemas for action and observation types.
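A client can use the returned schemas to catch malformed actions before submitting them. The sketch below checks only `required` fields and top-level property `type`s, a small subset of JSON Schema. The `solution` field in `ACTION_SCHEMA` is a hypothetical example, so substitute the schema the endpoint actually returns. For full validation, the `jsonschema` package's `validate(instance, schema)` is a more complete option.

```python
from typing import Any, Dict, List

# Hypothetical action schema; use the one returned by the schema endpoint instead.
ACTION_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"solution": {"type": "string"}},
    "required": ["solution"],
}

# Mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def check_payload(payload: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passes."""
    problems = [
        f"missing required field '{k}'"
        for k in schema.get("required", [])
        if k not in payload
    ]
    for name, spec in schema.get("properties", {}).items():
        if name not in payload:
            continue
        value = payload[name]
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        wrong_bool = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected is not None and (wrong_bool or not isinstance(value, expected)):
            problems.append(f"field '{name}' should be {spec['type']}")
    return problems
```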
Health check endpoint. Returns server status and environment availability.
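Because the health check reports whether the environment is available, a client can poll it before starting episodes. This sketch retries with exponential backoff. `probe` is a hypothetical callable that you would implement against the health endpoint (for example with `urllib.request`) and that returns `True` when the server reports it is ready; `sleep` is injectable so the retry schedule can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe with exponential backoff; return True once it reports healthy."""
    for i in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Refused connections and timeouts (including urllib's URLError)
            # are OSError subclasses; treat them as "not ready yet".
            pass
        if i < attempts - 1:
            sleep(base_delay * (2 ** i))  # 0.5 s, 1 s, 2 s, ...
    return False
```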
WebSocket endpoint for persistent sessions. Supports concurrent connections from multiple agents.
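The description does not say how concurrent connections are kept apart. One common design, assumed here only for illustration, keys a separate session (episode ID and step count) to each connection so that agents never share state. `SessionRegistry` and `agent` are hypothetical, and no real WebSocket I/O is involved; `asyncio.gather` simply interleaves two simulated agents.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Session:
    episode_id: int
    step_count: int = 0


class SessionRegistry:
    """Hypothetical per-connection store: one Session per connected agent."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}
        self._ids = itertools.count(1)

    def open(self, conn_id: str) -> Session:
        # asyncio runs on one thread, so this non-awaiting method runs atomically.
        session = Session(episode_id=next(self._ids))
        self._sessions[conn_id] = session
        return session

    async def step(self, conn_id: str) -> int:
        session = self._sessions[conn_id]
        session.step_count += 1
        await asyncio.sleep(0)  # yield so other agents can run in between
        return session.step_count

    def close(self, conn_id: str) -> None:
        self._sessions.pop(conn_id, None)

    def active(self) -> int:
        return len(self._sessions)


async def agent(registry: SessionRegistry, conn_id: str, n_steps: int) -> int:
    registry.open(conn_id)
    last = 0
    for _ in range(n_steps):
        last = await registry.step(conn_id)
    registry.close(conn_id)
    return last


async def main() -> Tuple[List[int], SessionRegistry]:
    registry = SessionRegistry()
    results = await asyncio.gather(
        agent(registry, "agent-a", 3), agent(registry, "agent-b", 5)
    )
    return list(results), registry
```

Even though the two agents' steps interleave, each one's step count reflects only its own submissions, which is the isolation property concurrent sessions need.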