Prompt used to regenerate this page:
Page: Neuroevolution (dqn)
Description: "Neural networks that learn to cross the road"
Category: artificial-intelligence
Icon: graph
Tags: neural, genetic
Status: validated
Front matter (index.md):
title: "Neuroevolution (dqn)"
description: "Neural networks that learn to cross the road"
icon: "graph"
tags: ["neural", "genetic"]
status: ["validated"]
HTML structure (index.md):
<section class="container visual size-600 ratio-1-1 canvas-contain">
<canvas id="neuro-canvas" width="600" height="600"></canvas>
</section>
Widget files:
- _stats.right.md (weight: 10): ##### Statistics
<dl> with:
- Generation: dd#stat-generation (initial "0")
- Best Score: dd#stat-best (initial "0")
- Alive: dd#stat-alive (initial "0")
- Crossed: dd#stat-crossed (initial "0")
- _controls.right.md (weight: 20): Two sections:
##### Settings — <dl> with:
- Population: input#population-size type=number min=1 max=200 value=10
- Network: select#network-arch with options:
"128", "256-128" selected, "512-256", "256-128-64", "512-256-128"
- Speed: select#speed with options: 0.5x, 1x, "2" selected, 4x
- Show Best: checkbox#show-best checked
- AI: checkbox#enable-ai (unchecked by default)
div.neuro-controls with:
{{< button id="btn-start" icon="play" aria="Start" class="is-start" >}}
{{< button id="btn-pause" icon="pause" aria="Pause" class="is-pause" >}}
{{< button id="btn-reset" icon="refresh" aria="Reset" >}}
##### Batch Training — div.neuro-controls with:
input#batch-count type=number min=1 max=1000 value=100 style="width: 4rem;"
{{< button id="btn-batch" label="Train" >}}
{{< button id="btn-stop" label="Stop" disabled=true >}}
Architecture (multi-file, 5 JS files):
- default.js: main controller (IIFE)
- _brain-dqn.lib.js: Deep Q-Network brain (exported class BrainDQN)
- _engine.lib.js: game engine (exported class Engine)
- _renderer-vision.lib.js: vision matrix renderer (exported class VisionRenderer)
- _vehicles.lib.js: vehicle definitions (exported classes Vehicle, Car, Truck + VEHICLE_TYPES constant)
=== default.js (main controller) ===
IIFE, imports: panic from '/_lib/panic_v3.js', BrainDQN from './_brain-dqn.lib.js', Engine from './_engine.lib.js', VisionRenderer from './_renderer-vision.lib.js'
Grid config: GRID_COLS=14, TOTAL_LANES=2048, VISIBLE_LANES=10
Cell sizes: CELL_WIDTH=43, CELL_HEIGHT=43 (600px / 14 cols ≈ 42.86, rounded)
Timing: STUCK_THRESHOLD=50 ticks (50 x 200ms = 10 seconds max without progress)
Neural network input: VISION_LANES=9 (-1 to +7 relative to frog), FRAME_STACK=4
INPUT_SIZE = GRID_COLS(14) * VISION_LANES(9) * FRAME_STACK(4) = 504
hiddenSizes=[256,128] (default, changeable via UI), OUTPUT_SIZE=4 (up, down, left, right)
Cached options: enableAI(false), enableVision(true), showGrid(false), snapCells(false), showBest(false). Updated from DOM checkboxes once per frame via updateCachedOptions().
Frog class:
constructor(agentIndex): col=Math.floor(GRID_COLS/2), lane=0, jumpCooldown=0, moveCooldown=0, stuckFrames=0, frameBuffer=[], lastState=null, lastAction=null
getLane(): returns this.lane
getScreenX(): (col + 0.5) * CELL_WIDTH
getScreenY(canvasHeight): converts lane to screen Y using engine.cameraLane for relative offset. Lane 0 at bottom, higher lanes above.
update(): reduces cooldowns; always builds frame inputs via getInputs(). If AI is disabled, returns early. Otherwise gets action from brain.getAction(agentIndex, inputs) and picks the strongest output above a 0.5 threshold. Actions (one per frame): 0=jumpUp (lane++, cooldown=10), 1=jumpDown (lane--, cooldown=10), 2=moveLeft (col--, cooldown=3), 3=moveRight (col++, cooldown=3). DQN reward: +10 when maxLane increases, -0.1 otherwise. Kills the frog if stuckFrames > STUCK_THRESHOLD or lanesCrossed >= TOTAL_LANES.
buildFrameMatrix(): 14x9 matrix, lanes from offset -1 to +7 relative to frog lane. Each cell: 1 if vehicle occupies (checks vehicle.col to vehicle.col+vehicle.cells), 0 otherwise. Returns flat array of 126 values.
getInputs(): frame stacking. Pushes current frame matrix to frameBuffer, keeps last FRAME_STACK(4) frames, pads with copies if insufficient. Returns concatenated 504-value array.
draw(ctx, canvasHeight, highlight): green circle rgba(46,204,113,0.6), radius = CELL_WIDTH*0.4. Red #e74c3c if highlighted. White eyes (radius 3) for highlighted frog only. Skips if off-screen.
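The vision encoding described above (buildFrameMatrix + getInputs) can be sketched as follows. This is a minimal sketch: the constants and the 504-value layout come from the spec, while `vehicles` as a flat `{lane, col, cells}` list is an assumption for illustration.

```javascript
// Constants from the spec: 14 grid columns, 9 vision lanes (-1..+7), 4 stacked frames.
const GRID_COLS = 14, VISION_LANES = 9, FRAME_STACK = 4;

// buildFrameMatrix: one 14x9 occupancy snapshot, flattened to 126 values.
// `vehicles` is a hypothetical list of {lane, col, cells} objects.
function buildFrameMatrix(frogLane, vehicles) {
  const frame = new Array(GRID_COLS * VISION_LANES).fill(0);
  for (let offset = -1; offset <= 7; offset++) {
    const lane = frogLane + offset;
    const row = offset + 1; // offset -1 maps to row 0
    for (const v of vehicles) {
      if (v.lane !== lane) continue;
      for (let c = Math.floor(v.col); c < Math.ceil(v.col + v.cells); c++) {
        if (c >= 0 && c < GRID_COLS) frame[row * GRID_COLS + c] = 1;
      }
    }
  }
  return frame;
}

// getInputs: frame stacking — keep the last 4 frames, pad with copies when short.
function getInputs(frameBuffer, frame) {
  frameBuffer.push(frame);
  if (frameBuffer.length > FRAME_STACK) frameBuffer.shift();
  const frames = [...frameBuffer];
  while (frames.length < FRAME_STACK) frames.unshift(frames[0]);
  return frames.flat(); // 14 * 9 * 4 = 504 values
}
```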
createBrain(): new BrainDQN with config: inputSize=INPUT_SIZE(504), hiddenSizes, outputSize=4, populationSize, learningRate=0.001, gamma=0.95, epsilon=1.0, epsilonDecay=0.995, epsilonMin=0.01, batchSize=32, bufferSize=10000, targetUpdate=100
init():
- VisionRenderer with gridCols=14, visionLanes=9, frameStack=4, width=600
- Binds btn-start/btn-pause/btn-reset/btn-batch/btn-stop
- Space = toggle, Escape = stop training
- population-size change triggers reset
- network-arch change parses "256-128" format, triggers reset
- speed change updates speed multiplier
- Theme color updates via visionRenderer.updateColors()
Game loop:
run(currentTime): engine.update(currentTime, speed) returns tick count, step() for each tick, then updateStats() and draw()
step(): updateCachedOptions, engine.tick(), update frogs, check collisions via engine.checkCollision(col, lane), camera follows maxLane via engine.followLane(), all dead or bestLanes >= TOTAL_LANES -> nextGeneration()
nextGeneration(): brain.setFitness per agent, final DQN experience with done=true, brain.evolve(), engine.init() reset, new frogs
updateStats(): stat-generation, stat-best shows "bestScore/TOTAL_LANES", stat-alive, stat-crossed shows current best frog's lanes
draw(): finds best alive frog, renders vision matrix via visionRenderer.draw(bestFrog.frameBuffer, bestFrog.col, bestFrog.lane)
Batch training: STEPS_PER_FRAME=100 per setTimeout(0) frame. Progress logging every 10 generations via panic.info. btn-batch disabled / btn-stop enabled during training. Same pattern as neuro-bird.
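The batch-training pattern above (fixed step budget per frame, yielding via setTimeout(0) so the page stays responsive) can be sketched like this; `step`, `onDone`, and the injectable `schedule` parameter are hypothetical names, not the page's actual API.

```javascript
// Cooperative batch loop: run STEPS_PER_FRAME simulation steps, then yield
// to the event loop before running the next chunk.
const STEPS_PER_FRAME = 100;
function runBatch(totalSteps, step, onDone, schedule = fn => setTimeout(fn, 0)) {
  let done = 0;
  function chunk() {
    const n = Math.min(STEPS_PER_FRAME, totalSteps - done);
    for (let i = 0; i < n; i++) step(); // advance the simulation
    done += n;
    if (done < totalSteps) schedule(chunk); // yield, then continue
    else onDone();
  }
  chunk();
}
```

Injecting `schedule` keeps the loop testable synchronously while defaulting to setTimeout(0) in the browser.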
=== _brain-dqn.lib.js ===
QNetwork class (internal):
constructor(inputSize, hiddenSizes, outputSize): He initialization (scale = sqrt(2/prevSize)), weights as 2D arrays [rows][cols], biases as 1D arrays per layer
randomMatrix(rows, cols, scale): creates random matrix with +-scale range
forward(state): hidden layers with ReLU (Math.max(0, sum)), output layer linear (raw Q-values)
copyFrom(other): deep copy all weights and biases
update(state, action, target, lr): full forward pass saving activations, backpropagation through all layers (ReLU derivative, output delta only for selected action), weight update via lr * delta * input. Returns squared loss.
getFlatWeights(): flattens bias[0] + all weights for visualization
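The QNetwork construction and forward pass described above can be sketched as follows (He initialization, ReLU hidden layers, linear output). A sketch only: names mirror the spec, but the full backpropagating `update()` is omitted.

```javascript
// Random matrix in the +-scale range, as described for randomMatrix().
function randomMatrix(rows, cols, scale) {
  return Array.from({length: rows}, () =>
    Array.from({length: cols}, () => (Math.random() * 2 - 1) * scale));
}

class QNetwork {
  constructor(inputSize, hiddenSizes, outputSize) {
    const sizes = [inputSize, ...hiddenSizes, outputSize];
    this.weights = [];
    this.biases = [];
    for (let i = 1; i < sizes.length; i++) {
      const scale = Math.sqrt(2 / sizes[i - 1]); // He initialization
      this.weights.push(randomMatrix(sizes[i], sizes[i - 1], scale));
      this.biases.push(new Array(sizes[i]).fill(0));
    }
  }

  forward(state) {
    let activ = state;
    for (let l = 0; l < this.weights.length; l++) {
      const prev = activ;
      const isOutput = l === this.weights.length - 1;
      activ = this.weights[l].map((row, j) => {
        let sum = this.biases[l][j];
        for (let k = 0; k < row.length; k++) sum += row[k] * prev[k];
        return isOutput ? sum : Math.max(0, sum); // ReLU hidden, linear output
      });
    }
    return activ; // raw Q-values
  }
}
```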
ReplayBuffer class (internal):
constructor(capacity): circular ring buffer using position index
push(experience): overwrites oldest when full
sample(batchSize): random unique indices via Set, returns batch
size(): buffer.length
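The ring-buffer behaviour above can be sketched directly from the description: a fixed-capacity array with a wrapping write position, plus unique-index sampling via a Set.

```javascript
// Circular replay buffer: overwrites the oldest experience once full.
class ReplayBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.buffer = [];
    this.position = 0;
  }
  push(experience) {
    if (this.buffer.length < this.capacity) this.buffer.push(experience);
    else this.buffer[this.position] = experience; // overwrite oldest
    this.position = (this.position + 1) % this.capacity;
  }
  sample(batchSize) {
    const indices = new Set(); // Set guarantees unique indices
    while (indices.size < Math.min(batchSize, this.buffer.length)) {
      indices.add(Math.floor(Math.random() * this.buffer.length));
    }
    return [...indices].map(i => this.buffer[i]);
  }
  size() { return this.buffer.length; }
}
```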
BrainDQN class (exported, default export):
constructor(config): stores all hyperparameters in this.config, initializes state (qNetwork, targetNetwork, replayBuffer as null; epsilon, stepCount, generation, agentFitness)
getType(): returns 'dqn'
init(): creates qNetwork + targetNetwork (copy), replayBuffer, resets epsilon/stepCount/generation. Logs topology via panic.info.
getPopulationSize(): returns config.populationSize
getAction(agentIndex, inputs): epsilon-greedy. Random: one-hot output for random action. Greedy: forward through qNetwork, one-hot for argmax Q-value.
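The epsilon-greedy policy can be sketched as a pure function over the Q-values; the injectable `rng` parameter is an assumption added here for testability.

```javascript
// Epsilon-greedy: with probability epsilon pick a random action, otherwise
// the argmax Q-value; either way, return a one-hot output vector.
function epsilonGreedy(qValues, epsilon, rng = Math.random) {
  const n = qValues.length;
  let best;
  if (rng() < epsilon) {
    best = Math.floor(rng() * n); // explore
  } else {
    best = qValues.indexOf(Math.max(...qValues)); // exploit
  }
  const output = new Array(n).fill(0);
  output[best] = 1;
  return output;
}
```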
setFitness(agentIndex, fitness): stores in agentFitness array
step(agentIndex, state, action, reward, nextState, done): pushes {state, action, reward, nextState, done} to replay buffer. Trains if buffer.size >= batchSize. Updates target network every targetUpdate steps.
train(): samples batch, Double DQN: online network selects best next action, target network evaluates Q-value. Target = reward (if done) or reward + gamma * targetQ[bestAction]. Updates via qNetwork.update().
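The Double DQN target computation can be isolated as a small function: the online network *selects* the best next action, the target network *evaluates* it. `forwardOnline` and `forwardTarget` are hypothetical stand-ins for the two networks' forward passes.

```javascript
// Double DQN target: reward alone on terminal transitions, otherwise
// reward + gamma * targetQ[argmax of onlineQ].
function dqnTarget(exp, gamma, forwardOnline, forwardTarget) {
  if (exp.done) return exp.reward; // terminal: no bootstrap
  const onlineQ = forwardOnline(exp.nextState);
  const bestAction = onlineQ.indexOf(Math.max(...onlineQ)); // selection
  const targetQ = forwardTarget(exp.nextState);             // evaluation
  return exp.reward + gamma * targetQ[bestAction];
}
```

Splitting selection from evaluation is what distinguishes Double DQN from vanilla DQN and reduces Q-value overestimation.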
evolve(): increments generation, decays epsilon (epsilon *= epsilonDecay, min epsilonMin), returns max fitness, resets agentFitness. Logs via panic.info.
getGeneration(): returns generation
getNetworkInfo(agentIndex): returns {inputSize, hiddenSizes, outputSize, weights: qNetwork.getFlatWeights(), epsilon}
updateConfig(updates): Object.assign(this.config, updates)
=== _engine.lib.js ===
DIFFICULTIES constant: ['easy', 'medium', 'hard']
Engine class (exported):
constructor(config): gridCols(14), totalLanes(2048), visibleLanes(10), tickMs(200), difficulty('easy'). State: vehicles[], laneGenerators Map, cameraLane=0, timing state.
setDifficulty(level): validates against DIFFICULTIES
randomGap(vehicleType): looks up VEHICLE_TYPES[type].gaps[difficulty], random in min..max range
initLaneGenerator(lane, vehicleType='car'): direction alternates per lane (even=right=1, odd=left=-1). Fills lane with initial vehicles spaced by random gaps. Stores generator with vehicleType, direction, spawnCol, nextSpawn counter.
updateLaneGenerator(lane): decrements nextSpawn, spawns new Car when 0, resets nextSpawn
init(): clears vehicles/generators/camera/timing. Creates generators for lanes 1 to min(visibleLanes+5, totalLanes).
updateVisibleVehicles(): removes vehicles off-screen or behind camera. Removes generators for old lanes. Creates generators for new lanes ahead. Updates all active generators.
tick(): updates all vehicle positions, calls updateVisibleVehicles
update(currentTime, speedMultiplier): fixed timestep accumulator. Returns number of ticks to execute (caller must call tick() for each). First call initializes lastTickTime.
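The fixed-timestep accumulator above can be sketched as follows; the field names on `engine` are assumptions matching the spec's timing-state description.

```javascript
// Fixed-timestep accumulator: convert elapsed wall-clock time (scaled by the
// speed multiplier) into a whole number of 200ms ticks; keep the remainder.
const TICK_MS = 200;
function pendingTicks(engine, currentTime, speedMultiplier) {
  if (engine.lastTickTime === 0) { // first call: initialize, no ticks yet
    engine.lastTickTime = currentTime;
    return 0;
  }
  engine.accumulator += (currentTime - engine.lastTickTime) * speedMultiplier;
  engine.lastTickTime = currentTime;
  const ticks = Math.floor(engine.accumulator / TICK_MS);
  engine.accumulator -= ticks * TICK_MS; // carry the remainder forward
  return ticks;
}
```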
followLane(targetLane, offset=3, smoothing=0.1): smooth camera lerp toward targetLane - offset
checkCollision(col, lane): iterates vehicles in lane, checks if col falls within [floor(vehicle.col), ceil(vehicle.col+vehicle.cells))
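The collision test can be sketched directly from the interval described above; representing the vehicle list as a flat array is an assumption for illustration.

```javascript
// Collision: the frog's column falls inside a same-lane vehicle's occupied
// cell range [floor(col), ceil(col + cells)).
function checkCollision(vehicles, col, lane) {
  return vehicles.some(v =>
    v.lane === lane &&
    col >= Math.floor(v.col) &&
    col < Math.ceil(v.col + v.cells));
}
```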
resetTiming(): resets lastTickTime=0 and accumulator=0
=== _vehicles.lib.js ===
VEHICLE_TYPES (exported constant):
car: { cells: 1, gaps: { easy: {min:3, max:9}, medium: {min:2, max:6}, hard: {min:1, max:4} } }
truck: { cells: 2, gaps: { easy: {min:6, max:15}, medium: {min:4, max:10}, hard: {min:2, max:6} } }
Vehicle class (exported): base class with type, cells (from VEHICLE_TYPES), lane, col, direction
getScreenX(cellWidth): col * cellWidth
getScreenY(cameraLane, canvasHeight, cellHeight): converts lane to screen Y relative to camera
update(): col += direction (moves 1 cell per tick)
isOffScreen(gridCols): true if exited grid based on direction
draw(ctx, cameraLane, canvasHeight, cellWidth, cellHeight): colored rectangle (car=#e74c3c red, truck=#3498db blue), dark windows at both ends
Car class (exported): extends Vehicle with type='car'
Truck class (exported): extends Vehicle with type='truck'
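Vehicle movement and the off-screen test can be sketched as follows; the exact exit condition is an assumption consistent with "true if exited grid based on direction".

```javascript
// One cell per tick in the lane's direction; a vehicle is off-screen once
// it has fully left the 14-column grid on its travel side.
function updateVehicle(v) { v.col += v.direction; }
function isOffScreen(v, gridCols) {
  return v.direction > 0 ? v.col >= gridCols : v.col + v.cells <= 0;
}
```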
=== _renderer-vision.lib.js ===
VisionRenderer class (exported):
constructor(config): gridCols(14), visionLanes(9), frameStack(4), width(600). cellSize=width/gridCols, drawWidth=width, drawHeight=cellSize*visionLanes. Colors: {background: '#fff', frog: '#2ecc71'}
init(canvas): gets 2D context, centers drawing (offsetX = (canvas.width - drawWidth) / 2, offsetY = 0). Logs dimensions via panic.info.
updateColors(): caches --background-color-surface from CSS computed styles
draw(frameBuffer, frogCol, frogLane): clears canvas. Builds a recency map: for each cell, finds the most recent frame (0=current, 1=one frame back, etc.) where the cell was occupied. Draws cells row by row (lane 0 at bottom = behind frog, lane 8 at top = ahead). Grayscale: age 0 = gray(0) black, older = gray(age/(frameStack-1)*200), never occupied = gray(255) white. Cell gap = 1px. Frog indicator: green stroke rectangle at frogCol on screen row 7 (= visionLanes-1-laneIndex with laneIndex 1, the frog's current lane).
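The recency-to-grayscale mapping used by this renderer can be sketched as two small helpers; the frameBuffer layout (last element = current frame) follows the getInputs() description.

```javascript
// Age of a cell: index of the most recent stacked frame where it was
// occupied (0 = current frame), or -1 if never occupied.
const FRAME_STACK = 4;
function cellAge(frameBuffer, cellIndex) {
  for (let age = 0; age < frameBuffer.length; age++) {
    const frame = frameBuffer[frameBuffer.length - 1 - age]; // walk back in time
    if (frame[cellIndex] === 1) return age;
  }
  return -1;
}

// Grayscale mapping: black for current occupancy, lighter for older,
// white for never occupied.
function grayFor(age) {
  if (age < 0) return 255;
  return Math.round((age / (FRAME_STACK - 1)) * 200); // 0 (black) .. 200
}
```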
SCSS file (default.scss):
- #neuro-canvas: 100% width/height, background var(--background-color-surface)
- #history-canvas, #network-canvas: 100% width, auto height, 1px solid var(--draw-color-surface) border, 4px border-radius, same background
- .neuro-controls: display flex, row nowrap, justify-content center, gap .5rem, margin-top 1rem
.is-start: display block (visible by default)
.is-pause: display none (hidden by default)
&.is-running: .is-start hidden, .is-pause visible
Page entirely generated and maintained by AI, with no human intervention.