Prompt used to regenerate this page:
Page: Neuroevolution (dqn)
Description: "Neural networks that learn to cross the road"
Category: artificial-intelligence
Icon: graph
Tags: neural, genetic
Status: validated
Front matter (index.md):
title: "Neuroevolution (dqn)"
description: "Neural networks that learn to cross the road"
icon: "graph"
tags: ["neural", "genetic"]
status: ["validated"]
HTML structure (index.md):
<section class="container visual size-600 ratio-1-1 canvas-contain">
<canvas id="neuro-canvas" width="600" height="600"></canvas>
</section>
Widget files:
- _stats.right.md (weight: 10): ##### Statistics
<dl> with:
- Generation: dd#stat-generation (initial "0")
- Best Score: dd#stat-best (initial "0")
- Alive: dd#stat-alive (initial "0")
- Crossed: dd#stat-crossed (initial "0")
- _controls.right.md (weight: 20): Two sections:
##### Settings — <dl> with:
- Population: input#population-size type=number min=1 max=200 value=10
- Network: select#network-arch with options:
"128", "256-128" selected, "512-256", "256-128-64", "512-256-128"
- Speed: select#speed with options: 0.5x, 1x, "2" selected, 4x
- Show Best: checkbox#show-best checked
- AI: checkbox#enable-ai (unchecked by default)
div.neuro-controls with:
{{< button id="btn-start" icon="play" aria="Start" class="is-start" >}}
{{< button id="btn-pause" icon="pause" aria="Pause" class="is-pause" >}}
{{< button id="btn-reset" icon="refresh" aria="Reset" >}}
##### Batch Training — div.neuro-controls with:
input#batch-count type=number min=1 max=1000 value=100 style="width: 4rem;"
{{< button id="btn-batch" label="Train" >}}
{{< button id="btn-stop" label="Stop" disabled=true >}}
Architecture (multi-file, 5 JS files):
- default.js: main controller (IIFE)
- _brain-dqn.lib.js: Deep Q-Network brain (exported class BrainDQN)
- _engine.lib.js: game engine (exported class Engine)
- _renderer-vision.lib.js: vision matrix renderer (exported class VisionRenderer)
- _vehicles.lib.js: vehicle definitions (exported classes Vehicle, Car, Truck + VEHICLE_TYPES constant)
=== default.js (main controller) ===
IIFE, imports: panic from '/_lib/panic_v3.js', BrainDQN from './_brain-dqn.lib.js', Engine from './_engine.lib.js', VisionRenderer from './_renderer-vision.lib.js'
Grid config: GRID_COLS=14, TOTAL_LANES=2048, VISIBLE_LANES=10
Cell sizes: CELL_WIDTH=43, CELL_HEIGHT=43 (600px / 14 cols ≈ 42.86, rounded)
Timing: STUCK_THRESHOLD=50 ticks (50 x 200ms = 10 seconds max without progress)
Neural network input: VISION_LANES=9 (-1 to +7 relative to frog), FRAME_STACK=4
INPUT_SIZE = GRID_COLS(14) * VISION_LANES(9) * FRAME_STACK(4) = 504
hiddenSizes=[256,128] (default, changeable via UI), OUTPUT_SIZE=4 (up, down, left, right)
Cached options: enableAI(false), enableVision(true), showGrid(false), snapCells(false), showBest(false). Updated from DOM checkboxes once per frame via updateCachedOptions().
Frog class:
constructor(agentIndex): col=Math.floor(GRID_COLS/2), lane=0, jumpCooldown=0, moveCooldown=0, stuckFrames=0, frameBuffer=[], lastState=null, lastAction=null
getLane(): returns this.lane
getScreenX(): (col + 0.5) * CELL_WIDTH
getScreenY(canvasHeight): converts lane to screen Y using engine.cameraLane for relative offset. Lane 0 at bottom, higher lanes above.
update(): reduces cooldowns; always builds frame inputs via getInputs(). If AI is disabled, returns early. Otherwise gets action from brain.getAction(agentIndex, inputs) and picks the strongest output above a 0.5 threshold. Actions (one per frame): 0=jumpUp (lane++, cooldown=10), 1=jumpDown (lane--, cooldown=10), 2=moveLeft (col--, cooldown=3), 3=moveRight (col++, cooldown=3). DQN reward: +10 when maxLane increases, -0.1 otherwise. Kills the frog if stuckFrames > STUCK_THRESHOLD or lanesCrossed >= TOTAL_LANES.
buildFrameMatrix(): 14x9 matrix, lanes from offset -1 to +7 relative to frog lane. Each cell: 1 if vehicle occupies (checks vehicle.col to vehicle.col+vehicle.cells), 0 otherwise. Returns flat array of 126 values.
getInputs(): frame stacking. Pushes current frame matrix to frameBuffer, keeps last FRAME_STACK(4) frames, pads with copies if insufficient. Returns concatenated 504-value array.
draw(ctx, canvasHeight, highlight): green circle rgba(46,204,113,0.6), radius = CELL_WIDTH*0.4. Red #e74c3c if highlighted. White eyes (radius 3) for highlighted frog only. Skips if off-screen.
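The vision encoding described above (buildFrameMatrix + getInputs) can be sketched as follows. This is a minimal sketch: the constants and the 504-value layout come from the spec, while `vehicles` as a flat `{lane, col, cells}` list is an assumption for illustration.

```javascript
// Constants from the spec: 14 grid columns, 9 vision lanes (-1..+7), 4 stacked frames.
const GRID_COLS = 14, VISION_LANES = 9, FRAME_STACK = 4;

// buildFrameMatrix: one 14x9 occupancy snapshot, flattened to 126 values.
// `vehicles` is a hypothetical list of {lane, col, cells} objects.
function buildFrameMatrix(frogLane, vehicles) {
  const frame = new Array(GRID_COLS * VISION_LANES).fill(0);
  for (let offset = -1; offset <= 7; offset++) {
    const lane = frogLane + offset;
    const row = offset + 1; // offset -1 maps to row 0
    for (const v of vehicles) {
      if (v.lane !== lane) continue;
      for (let c = Math.floor(v.col); c < Math.ceil(v.col + v.cells); c++) {
        if (c >= 0 && c < GRID_COLS) frame[row * GRID_COLS + c] = 1;
      }
    }
  }
  return frame;
}

// getInputs: frame stacking — keep the last 4 frames, pad with copies when short.
function getInputs(frameBuffer, frame) {
  frameBuffer.push(frame);
  if (frameBuffer.length > FRAME_STACK) frameBuffer.shift();
  const frames = [...frameBuffer];
  while (frames.length < FRAME_STACK) frames.unshift(frames[0]);
  return frames.flat(); // 14 * 9 * 4 = 504 values
}
```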
createBrain(): new BrainDQN with config: inputSize=INPUT_SIZE(504), hiddenSizes, outputSize=4, populationSize, learningRate=0.001, gamma=0.95, epsilon=1.0, epsilonDecay=0.995, epsilonMin=0.01, batchSize=32, bufferSize=10000, targetUpdate=100
init():
- VisionRenderer with gridCols=14, visionLanes=9, frameStack=4, width=600
- Binds btn-start/btn-pause/btn-reset/btn-batch/btn-stop
- Space = toggle, Escape = stop training
- population-size change triggers reset
- network-arch change parses "256-128" format, triggers reset
- speed change updates speed multiplier
- Theme color updates via visionRenderer.updateColors()
Game loop:
run(currentTime): engine.update(currentTime, speed) returns tick count, step() for each tick, then updateStats() and draw()
step(): updateCachedOptions, engine.tick(), update frogs, check collisions via engine.checkCollision(col, lane), camera follows maxLane via engine.followLane(), all dead or bestLanes >= TOTAL_LANES -> nextGeneration()
nextGeneration(): brain.setFitness per agent, final DQN experience with done=true, brain.evolve(), engine.init() reset, new frogs
updateStats(): stat-generation, stat-best shows "bestScore/TOTAL_LANES", stat-alive, stat-crossed shows current best frog's lanes
draw(): finds best alive frog, renders vision matrix via visionRenderer.draw(bestFrog.frameBuffer, bestFrog.col, bestFrog.lane)
Batch training: STEPS_PER_FRAME=100 per setTimeout(0) frame. Progress logging every 10 generations via panic.info. btn-batch disabled / btn-stop enabled during training. Same pattern as neuro-bird.
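The batch-training pattern above (fixed step budget per frame, yielding via setTimeout(0) so the page stays responsive) can be sketched like this; `step`, `onDone`, and the injectable `schedule` parameter are hypothetical names, not the page's actual API.

```javascript
// Cooperative batch loop: run STEPS_PER_FRAME simulation steps, then yield
// to the event loop before running the next chunk.
const STEPS_PER_FRAME = 100;
function runBatch(totalSteps, step, onDone, schedule = fn => setTimeout(fn, 0)) {
  let done = 0;
  function chunk() {
    const n = Math.min(STEPS_PER_FRAME, totalSteps - done);
    for (let i = 0; i < n; i++) step(); // advance the simulation
    done += n;
    if (done < totalSteps) schedule(chunk); // yield, then continue
    else onDone();
  }
  chunk();
}
```

Injecting `schedule` keeps the loop testable synchronously while defaulting to setTimeout(0) in the browser.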
=== _brain-dqn.lib.js ===
QNetwork class (internal):
constructor(inputSize, hiddenSizes, outputSize): He initialization (scale = sqrt(2/prevSize)), weights as 2D arrays [rows][cols], biases as 1D arrays per layer
randomMatrix(rows, cols, scale): creates random matrix with +-scale range
forward(state): hidden layers with ReLU (Math.max(0, sum)), output layer linear (raw Q-values)
copyFrom(other): deep copy all weights and biases
update(state, action, target, lr): full forward pass saving activations, backpropagation through all layers (ReLU derivative, output delta only for selected action), weight update via lr * delta * input. Returns squared loss.
getFlatWeights(): flattens bias[0] + all weights for visualization
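The QNetwork construction and forward pass described above can be sketched as follows (He initialization, ReLU hidden layers, linear output). A sketch only: names mirror the spec, but the full backpropagating `update()` is omitted.

```javascript
// Random matrix in the +-scale range, as described for randomMatrix().
function randomMatrix(rows, cols, scale) {
  return Array.from({length: rows}, () =>
    Array.from({length: cols}, () => (Math.random() * 2 - 1) * scale));
}

class QNetwork {
  constructor(inputSize, hiddenSizes, outputSize) {
    const sizes = [inputSize, ...hiddenSizes, outputSize];
    this.weights = [];
    this.biases = [];
    for (let i = 1; i < sizes.length; i++) {
      const scale = Math.sqrt(2 / sizes[i - 1]); // He initialization
      this.weights.push(randomMatrix(sizes[i], sizes[i - 1], scale));
      this.biases.push(new Array(sizes[i]).fill(0));
    }
  }

  forward(state) {
    let activ = state;
    for (let l = 0; l < this.weights.length; l++) {
      const prev = activ;
      const isOutput = l === this.weights.length - 1;
      activ = this.weights[l].map((row, j) => {
        let sum = this.biases[l][j];
        for (let k = 0; k < row.length; k++) sum += row[k] * prev[k];
        return isOutput ? sum : Math.max(0, sum); // ReLU hidden, linear output
      });
    }
    return activ; // raw Q-values
  }
}
```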
ReplayBuffer class (internal):
constructor(capacity): circular ring buffer using position index
push(experience): overwrites oldest when full
sample(batchSize): random unique indices via Set, returns batch
size(): buffer.length
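The ring-buffer behaviour above can be sketched directly from the description: a fixed-capacity array with a wrapping write position, plus unique-index sampling via a Set.

```javascript
// Circular replay buffer: overwrites the oldest experience once full.
class ReplayBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.buffer = [];
    this.position = 0;
  }
  push(experience) {
    if (this.buffer.length < this.capacity) this.buffer.push(experience);
    else this.buffer[this.position] = experience; // overwrite oldest
    this.position = (this.position + 1) % this.capacity;
  }
  sample(batchSize) {
    const indices = new Set(); // Set guarantees unique indices
    while (indices.size < Math.min(batchSize, this.buffer.length)) {
      indices.add(Math.floor(Math.random() * this.buffer.length));
    }
    return [...indices].map(i => this.buffer[i]);
  }
  size() { return this.buffer.length; }
}
```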
BrainDQN class (exported, default export):
constructor(config): stores all hyperparameters in this.config, initializes state (qNetwork, targetNetwork, replayBuffer as null; epsilon, stepCount, generation, agentFitness)
getType(): returns 'dqn'
init(): creates qNetwork + targetNetwork (copy), replayBuffer, resets epsilon/stepCount/generation. Logs topology via panic.info.
getPopulationSize(): returns config.populationSize
getAction(agentIndex, inputs): epsilon-greedy. Random: one-hot output for random action. Greedy: forward through qNetwork, one-hot for argmax Q-value.
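The epsilon-greedy policy can be sketched as a pure function over the Q-values; the injectable `rng` parameter is an assumption added here for testability.

```javascript
// Epsilon-greedy: with probability epsilon pick a random action, otherwise
// the argmax Q-value; either way, return a one-hot output vector.
function epsilonGreedy(qValues, epsilon, rng = Math.random) {
  const n = qValues.length;
  let best;
  if (rng() < epsilon) {
    best = Math.floor(rng() * n); // explore
  } else {
    best = qValues.indexOf(Math.max(...qValues)); // exploit
  }
  const output = new Array(n).fill(0);
  output[best] = 1;
  return output;
}
```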
setFitness(agentIndex, fitness): stores in agentFitness array
step(agentIndex, state, action, reward, nextState, done): pushes {state, action, reward, nextState, done} to replay buffer. Trains if buffer.size >= batchSize. Updates target network every targetUpdate steps.
train(): samples batch, Double DQN: online network selects best next action, target network evaluates Q-value. Target = reward (if done) or reward + gamma * targetQ[bestAction]. Updates via qNetwork.update().
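The Double DQN target computation can be isolated as a small function: the online network *selects* the best next action, the target network *evaluates* it. `forwardOnline` and `forwardTarget` are hypothetical stand-ins for the two networks' forward passes.

```javascript
// Double DQN target: reward alone on terminal transitions, otherwise
// reward + gamma * targetQ[argmax of onlineQ].
function dqnTarget(exp, gamma, forwardOnline, forwardTarget) {
  if (exp.done) return exp.reward; // terminal: no bootstrap
  const onlineQ = forwardOnline(exp.nextState);
  const bestAction = onlineQ.indexOf(Math.max(...onlineQ)); // selection
  const targetQ = forwardTarget(exp.nextState);             // evaluation
  return exp.reward + gamma * targetQ[bestAction];
}
```

Splitting selection from evaluation is what distinguishes Double DQN from vanilla DQN and reduces Q-value overestimation.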
evolve(): increments generation, decays epsilon (epsilon *= epsilonDecay, min epsilonMin), returns max fitness, resets agentFitness. Logs via panic.info.
getGeneration(): returns generation
getNetworkInfo(agentIndex): returns {inputSize, hiddenSizes, outputSize, weights: qNetwork.getFlatWeights(), epsilon}
updateConfig(updates): Object.assign(this.config, updates)
=== _engine.lib.js ===
DIFFICULTIES constant: ['easy', 'medium', 'hard']
Engine class (exported):
constructor(config): gridCols(14), totalLanes(2048), visibleLanes(10), tickMs(200), difficulty('easy'). State: vehicles[], laneGenerators Map, cameraLane=0, timing state.
setDifficulty(level): validates against DIFFICULTIES
randomGap(vehicleType): looks up VEHICLE_TYPES[type].gaps[difficulty], random in min..max range
initLaneGenerator(lane, vehicleType='car'): direction alternates per lane (even=right=1, odd=left=-1). Fills lane with initial vehicles spaced by random gaps. Stores generator with vehicleType, direction, spawnCol, nextSpawn counter.
updateLaneGenerator(lane): decrements nextSpawn, spawns new Car when 0, resets nextSpawn
init(): clears vehicles/generators/camera/timing. Creates generators for lanes 1 to min(visibleLanes+5, totalLanes).
updateVisibleVehicles(): removes vehicles off-screen or behind camera. Removes generators for old lanes. Creates generators for new lanes ahead. Updates all active generators.
tick(): updates all vehicle positions, calls updateVisibleVehicles
update(currentTime, speedMultiplier): fixed timestep accumulator. Returns number of ticks to execute (caller must call tick() for each). First call initializes lastTickTime.
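The fixed-timestep accumulator above can be sketched as follows; the field names on `engine` are assumptions matching the spec's timing-state description.

```javascript
// Fixed-timestep accumulator: convert elapsed wall-clock time (scaled by the
// speed multiplier) into a whole number of 200ms ticks; keep the remainder.
const TICK_MS = 200;
function pendingTicks(engine, currentTime, speedMultiplier) {
  if (engine.lastTickTime === 0) { // first call: initialize, no ticks yet
    engine.lastTickTime = currentTime;
    return 0;
  }
  engine.accumulator += (currentTime - engine.lastTickTime) * speedMultiplier;
  engine.lastTickTime = currentTime;
  const ticks = Math.floor(engine.accumulator / TICK_MS);
  engine.accumulator -= ticks * TICK_MS; // carry the remainder forward
  return ticks;
}
```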
followLane(targetLane, offset=3, smoothing=0.1): smooth camera lerp toward targetLane - offset
checkCollision(col, lane): iterates vehicles in lane, checks if col falls within [floor(vehicle.col), ceil(vehicle.col+vehicle.cells))
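The collision test can be sketched directly from the interval described above; representing the vehicle list as a flat array is an assumption for illustration.

```javascript
// Collision: the frog's column falls inside a same-lane vehicle's occupied
// cell range [floor(col), ceil(col + cells)).
function checkCollision(vehicles, col, lane) {
  return vehicles.some(v =>
    v.lane === lane &&
    col >= Math.floor(v.col) &&
    col < Math.ceil(v.col + v.cells));
}
```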
resetTiming(): resets lastTickTime=0 and accumulator=0
=== _vehicles.lib.js ===
VEHICLE_TYPES (exported constant):
car: { cells: 1, gaps: { easy: {min:3, max:9}, medium: {min:2, max:6}, hard: {min:1, max:4} } }
truck: { cells: 2, gaps: { easy: {min:6, max:15}, medium: {min:4, max:10}, hard: {min:2, max:6} } }
Vehicle class (exported): base class with type, cells (from VEHICLE_TYPES), lane, col, direction
getScreenX(cellWidth): col * cellWidth
getScreenY(cameraLane, canvasHeight, cellHeight): converts lane to screen Y relative to camera
update(): col += direction (moves 1 cell per tick)
isOffScreen(gridCols): true if exited grid based on direction
draw(ctx, cameraLane, canvasHeight, cellWidth, cellHeight): colored rectangle (car=#e74c3c red, truck=#3498db blue), dark windows at both ends
Car class (exported): extends Vehicle with type='car'
Truck class (exported): extends Vehicle with type='truck'
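Vehicle movement and the off-screen test can be sketched as follows; the exact exit condition is an assumption consistent with "true if exited grid based on direction".

```javascript
// One cell per tick in the lane's direction; a vehicle is off-screen once
// it has fully left the 14-column grid on its travel side.
function updateVehicle(v) { v.col += v.direction; }
function isOffScreen(v, gridCols) {
  return v.direction > 0 ? v.col >= gridCols : v.col + v.cells <= 0;
}
```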
=== _renderer-vision.lib.js ===
VisionRenderer class (exported):
constructor(config): gridCols(14), visionLanes(9), frameStack(4), width(600). cellSize=width/gridCols, drawWidth=width, drawHeight=cellSize*visionLanes. Colors: {background: '#fff', frog: '#2ecc71'}
init(canvas): gets 2D context, centers drawing (offsetX = (canvas.width - drawWidth) / 2, offsetY = 0). Logs dimensions via panic.info.
updateColors(): caches --background-color-surface from CSS computed styles
draw(frameBuffer, frogCol, frogLane): clears canvas. Builds a recency map: for each cell, finds the most recent frame (0=current, 1=one frame back, etc.) where the cell was occupied. Draws cells row by row (lane 0 at bottom = behind frog, lane 8 at top = ahead). Grayscale: age 0 = gray(0) black, older = gray(age/(frameStack-1)*200), never occupied = gray(255) white. Cell gap = 1px. Frog indicator: green stroke rectangle at frogCol on screen row 7 (= visionLanes-1-laneIndex with laneIndex 1, the frog's current lane).
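The recency-to-grayscale mapping used by this renderer can be sketched as two small helpers; the frameBuffer layout (last element = current frame) follows the getInputs() description.

```javascript
// Age of a cell: index of the most recent stacked frame where it was
// occupied (0 = current frame), or -1 if never occupied.
const FRAME_STACK = 4;
function cellAge(frameBuffer, cellIndex) {
  for (let age = 0; age < frameBuffer.length; age++) {
    const frame = frameBuffer[frameBuffer.length - 1 - age]; // walk back in time
    if (frame[cellIndex] === 1) return age;
  }
  return -1;
}

// Grayscale mapping: black for current occupancy, lighter for older,
// white for never occupied.
function grayFor(age) {
  if (age < 0) return 255;
  return Math.round((age / (FRAME_STACK - 1)) * 200); // 0 (black) .. 200
}
```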
SCSS file (default.scss):
- #neuro-canvas: 100% width/height, background var(--background-color-surface)
- #history-canvas, #network-canvas: 100% width, auto height, 1px solid var(--draw-color-surface) border, 4px border-radius, same background
- .neuro-controls: display flex, row nowrap, justify-content center, gap .5rem, margin-top 1rem
.is-start: display block (visible by default)
.is-pause: display none (hidden by default)
&.is-running: .is-start hidden, .is-pause visible
Page entirely generated and maintained by AI, with no human intervention.