Robots.txt Analyzer

Analyze and visualize robots.txt files from any website
Claude instructions
Prompt used to regenerate this page:
"Robots.txt Analyzer" page of the web-cylian-org toolbox.
Category: Toolbox. A tool for analyzing and visualizing the robots.txt
files of any website.

=== FRONT MATTER (index.md) ===
title: "Robots.txt Analyzer"
description: "Analyze and visualize robots.txt files from any website"
icon: "robot"
tags: [seo, robots, crawler]
features:
  - Parse all robots.txt directives
  - Visual Allow/Disallow rules
  - Sitemap detection
  - Request history

=== WIDGET LEFT (_history.left.md) ===
Front matter: title "History", weight 10.
HTML content:
<div id="history-widget">
  <h5>Recent URLs</h5>
  <ul id="history-list"></ul>
  <p id="history-empty" class="hidden">No history yet.</p>
  <button id="btn-clear-history" class="button hidden">Clear</button>
</div>

=== HTML CONTENT (index.md) ===
Everything goes inside <div id="robots-container">:

1. Form (div#robots-form):
   - div.input-group containing:
     <input type="url" id="url-input" placeholder="https://example.com" autocomplete="url">
     <button id="btn-analyze" class="button color-primary">Analyze</button>
   - <div id="form-error" class="error hidden"></div>

2. Loading (div#robots-loading class="hidden"):
   <span class="spinner"></span> Fetching robots.txt...

3. Results (div#robots-results class="hidden"):
   - Header: <h3 id="results-url"></h3> + <span id="results-status" class="badge"></span>
   - Content (div#results-content):
     - div#user-agents (dynamic per-agent sections)
     - div#sitemaps-section class="hidden": <h4>Sitemaps</h4> + <ul id="sitemaps-list">
     - div#host-section class="hidden": <h4>Preferred Host</h4> + <span id="host-value">
     - <details id="raw-section"><summary>Raw Content</summary>
       <pre id="raw-content"></pre></details>

4. Empty state (div#robots-empty class="hidden"):
   <p>Enter a URL above to analyze its robots.txt file.</p>

=== JAVASCRIPT (default.js) ===
No ES module imports; plain vanilla JS. Initialized on DOMContentLoaded.
Constants: HISTORY_KEY = 'robots-history', MAX_HISTORY = 20.

init():
- Gets url-input and btn-analyze
- Event listeners: click on btn-analyze → analyze(), Enter in the input → analyze()
- Exposes window.analyzeUrl for the history widget
- Shows the empty state (robots-empty)

analyze():
- Reads and trims the value of url-input, checks that it is non-empty
- Calls analyzeUrl(url)

analyzeUrl(url):
- Updates the input, hides the previous states, shows loading
- try: normalizeUrl → fetchRobotsTxt → parseRobotsTxt → renderResults → addToHistory
- catch: showError(err.message)
- finally: hides loading

normalizeUrl(input):
- Prepends 'https://' if missing
- Parses with new URL(), rebuilds protocol//host/robots.txt
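
The normalization steps above could look roughly like the following. This is a minimal sketch, not the page's actual code: the scheme check via a regular expression and the trimming of the input are assumptions, and the sketch relies on `new URL()` throwing on malformed input so that the caller's catch can report it.

```javascript
// Sketch of normalizeUrl as described above (scheme test and trim are assumptions).
function normalizeUrl(input) {
  let candidate = input.trim();
  // Prepend a scheme when the user typed a bare host such as "example.com".
  if (!/^https?:\/\//i.test(candidate)) {
    candidate = 'https://' + candidate;
  }
  // new URL() throws a TypeError on malformed input; the caller is expected to catch it.
  const parsed = new URL(candidate);
  // Keep only scheme and host (including any port), then point at /robots.txt.
  return `${parsed.protocol}//${parsed.host}/robots.txt`;
}
```

Any path, query string, or fragment in the input is discarded, so every URL on a site maps to that site's single robots.txt location.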

fetchRobotsTxt(url) → fetchWithProxy(url):
- CORS proxy: https://api.codetabs.com/v1/proxy?quest=<encoded url>
- Checks response.ok, returns response.text()
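
A possible shape for the proxy fetch described above, as a sketch only. `PROXY_BASE` and `buildProxyUrl` are hypothetical names introduced for illustration, the use of `encodeURIComponent` for "<encoded url>" is an assumption, and so is the wording of the error message.

```javascript
// Proxy endpoint taken from the spec above; the constant name is hypothetical.
const PROXY_BASE = 'https://api.codetabs.com/v1/proxy?quest=';

// Hypothetical helper: builds the proxied request URL for a target URL.
function buildProxyUrl(targetUrl) {
  return PROXY_BASE + encodeURIComponent(targetUrl);
}

async function fetchWithProxy(targetUrl) {
  const response = await fetch(buildProxyUrl(targetUrl));
  // A non-2xx answer is surfaced as an Error so the caller's catch can display it.
  if (!response.ok) {
    throw new Error(`Failed to fetch robots.txt (HTTP ${response.status})`);
  }
  return response.text();
}

// fetchRobotsTxt simply delegates to the proxy-based fetch.
function fetchRobotsTxt(url) {
  return fetchWithProxy(url);
}
```

Keeping the URL construction in its own small function makes it easy to swap the proxy later without touching the fetch logic.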

parseRobotsTxt(content):
- Splits into lines; for each line:
  - Strips comments (everything after #), trims, skips empty lines
  - Parses directive:value (split on the first ':')
  - Switch on the directive (lowercased):
    user-agent: creates {name, rules:[], crawlDelay:null}, pushes it to result.userAgents
    allow: pushes {type:'allow', path:value} to currentAgent.rules
    disallow: pushes {type:'disallow', path:value} to currentAgent.rules
    crawl-delay: stores parseFloat(value) in currentAgent.crawlDelay
    sitemap: pushes value to result.sitemaps
    host: sets result.host
- Returns result, initialized as {userAgents:[], sitemaps:[], host:null}
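
The parsing loop above can be sketched as follows. Two points are assumptions rather than part of the spec: lines without a ':' are skipped, and allow, disallow, and crawl-delay lines that appear before any user-agent line are ignored.

```javascript
function parseRobotsTxt(content) {
  const result = { userAgents: [], sitemaps: [], host: null };
  let currentAgent = null;

  for (const rawLine of content.split(/\r?\n/)) {
    // Drop everything after '#', then surrounding whitespace.
    const hashIndex = rawLine.indexOf('#');
    const line = (hashIndex === -1 ? rawLine : rawLine.slice(0, hashIndex)).trim();
    if (!line) continue;

    // Split on the first ':' only, so "Sitemap: https://..." keeps its URL intact.
    const colonIndex = line.indexOf(':');
    if (colonIndex === -1) continue; // assumption: malformed lines are skipped
    const directive = line.slice(0, colonIndex).trim().toLowerCase();
    const value = line.slice(colonIndex + 1).trim();

    switch (directive) {
      case 'user-agent':
        currentAgent = { name: value, rules: [], crawlDelay: null };
        result.userAgents.push(currentAgent);
        break;
      case 'allow':
      case 'disallow':
        // Assumption: rules with no preceding User-agent line are ignored.
        if (currentAgent) currentAgent.rules.push({ type: directive, path: value });
        break;
      case 'crawl-delay':
        if (currentAgent) currentAgent.crawlDelay = parseFloat(value);
        break;
      case 'sitemap':
        result.sitemaps.push(value);
        break;
      case 'host':
        result.host = value;
        break;
    }
  }
  return result;
}
```

As specified, every User-agent line opens a new entry, so a group that lists several agents before its rules attaches those rules only to the last agent listed.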

renderResults(url, parsed, raw):
- Header: URL as textContent, badge with the user-agent count
  (color-success if >0, color-warning otherwise)
- User agents: for each agent, creates a div.user-agent-section with:
  - div.agent-header: h4.agent-name + optional span.crawl-delay
  - If there are no rules: p.no-rules "No specific rules."
  - Otherwise ul.rules-list with li.rule.rule-allow or li.rule.rule-disallow:
    span.rule-type (the type as uppercase text) + code.rule-path
- Sitemaps: shown if >0, <a> links with target="_blank" rel="noopener"
- Host: shown if non-null
- Raw: puts the raw content in pre#raw-content
- Shows div#robots-results

showError(message): displays the error, hides the empty state.

addToHistory(url):
- Filters out duplicates, unshifts the new entry to the front, slices to MAX_HISTORY
- Saves to localStorage
- Dispatches CustomEvent 'robots-history-updated'

getHistory(): reads and parses from localStorage, returns [] by default.
clearHistory(): removes the key, dispatches the event.
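
The three history functions might be sketched like this. Several details are assumptions: entries are stored as `{ url }` objects (inferred only from the widget reading `entry.url`), the event is dispatched on `window`, and the optional `storage` parameter and the `notifyHistoryChanged` helper are hypothetical additions that make the code testable outside a browser.

```javascript
const HISTORY_KEY = 'robots-history';
const MAX_HISTORY = 20;

function getHistory(storage = globalThis.localStorage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY));
    // A missing key yields null; anything that is not an array counts as empty.
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupted JSON or unavailable storage
  }
}

function addToHistory(url, storage = globalThis.localStorage) {
  // Remove any earlier entry for the same URL, then put the new one first.
  const history = getHistory(storage).filter((entry) => entry.url !== url);
  history.unshift({ url }); // assumption: entries are { url } objects
  storage.setItem(HISTORY_KEY, JSON.stringify(history.slice(0, MAX_HISTORY)));
  notifyHistoryChanged();
}

function clearHistory(storage = globalThis.localStorage) {
  storage.removeItem(HISTORY_KEY);
  notifyHistoryChanged();
}

// Hypothetical helper: tells the history widget to re-render.
function notifyHistoryChanged() {
  // Assumption: the event is dispatched on window; skipped outside a browser.
  if (typeof window !== 'undefined' && typeof CustomEvent === 'function') {
    window.dispatchEvent(new CustomEvent('robots-history-updated'));
  }
}
```

In the browser the functions are called without the second argument and fall back to `localStorage`; the widget only has to listen for 'robots-history-updated' and call `getHistory()` again.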

Functions exposed on window: analyzeUrl, getHistory, clearHistory.

initHistoryWidget():
- Gets history-list, history-empty, btn-clear-history
- renderHistory(): if empty → shows empty + hides the button; otherwise → creates
  clickable <li><a> items that call analyzeUrl(entry.url)
- btnClear click event → clearHistory()
- Listens for 'robots-history-updated' → renderHistory()
- Calls renderHistory() on startup

escapeHtml(str): creates a div, sets textContent, returns innerHTML.

=== SCSS (default.scss) ===
#robots-container : flex column gap 1.5rem
#robots-form .input-group : flex gap 0.5rem, input[type="url"] flex 1 min-width 200px
  .error : margin-top 0.5rem, background rgba danger 0.1, color danger, border-radius 4px
#robots-loading : flex center gap 0.5rem, opacity 0.7
  .spinner : inline-block 1rem x 1rem, border 2px currentColor,
  border-right transparent, border-radius 50%, animation spin 0.75s linear infinite
@keyframes spin { to { transform: rotate(360deg) } }
#robots-results #results-header : flex center gap 1rem, border-bottom,
  h3 monospace word-break-all, .badge flex-shrink 0
  #results-content : flex column gap 1.5rem
.user-agent-section : background secondary, border-radius 8px, padding 1rem,
  border-left 4px solid primary
  .agent-header : flex center gap 1rem, .agent-name monospace
  .crawl-delay : small badge with tertiary background
  .no-rules : italic opacity 0.6
.rules-list : list-style none, flex column gap 0.25rem
.rule : flex center gap 0.75rem, padding, border-radius 4px
  .rule-type : uppercase, bold, min-width 70px, text-align center
  .rule-path : monospace, transparent background
  &.rule-allow : background success 0.1, .rule-type background success color white
  &.rule-disallow : background danger 0.1, .rule-type background danger color white
#sitemaps-section : h4 margin 0, ul list-style none, a monospace word-break-all
#host-section : h4 margin 0, #host-value monospace
#raw-section : summary cursor pointer opacity 0.7, pre background secondary max-height 300px
#robots-empty : padding 2rem center opacity 0.6
.hidden : display none !important
#history-widget : h5 margin-bottom, #history-list flex column gap 0.25rem,
  a monospace 0.75rem ellipsis, #history-empty italic, btn-clear margin-top

=== GOAL ===
SEO tool that fetches and parses robots.txt files through a CORS proxy,
displays the rules per user-agent with allow/disallow color coding,
lists the sitemaps and the preferred host, and keeps a history
of analyzed URLs in localStorage.
Page entirely generated and maintained by AI, without human intervention.