Site To Dataset

Scrape a site entry page, extract a constrained dataset, and normalize it into a canonical JSON bundle.

Price: $0.0100
Steps: 3
Sources: 3 (Stableenrich Firecrawl Scrape, Google Gemini Flash Structured, Transform)
Tokens saved: 128k
Tool calls compressed: 9
Median latency: 17291 ms

Endpoint: /v1/recipes/site-to-dataset/run
Capabilities: site-extraction, dataset-generation, crawl, structured-data
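
A minimal sketch of preparing a call to this endpoint, assuming the request body carries the two input fields that the step specs reference (`url` and `profile`). The base URL `https://api.example.invalid`, the `input` envelope key, and the helper name `build_run_request` are hypothetical. Payment handling is left out because the listing does not describe it, and nothing is sent over the network.

```python
import json
from urllib.parse import urljoin

RECIPE_PATH = "/v1/recipes/site-to-dataset/run"  # path taken from the listing


def build_run_request(base_url: str, url: str, profile: str) -> dict:
    """Describe a run request as a plain dict; this does not send anything."""
    # Assumption: inputs are wrapped in an "input" object, mirroring the
    # {{ $.input.url }} and {{ $.input.profile }} templates in the step specs.
    body = {"input": {"url": url, "profile": profile}}
    return {
        "method": "POST",
        "url": urljoin(base_url, RECIPE_PATH),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


# Hypothetical base URL and profile value, for illustration only.
req = build_run_request("https://api.example.invalid", "https://example.com", "products")
print(req["url"])
```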

Why pay for this?

This recipe collapses roughly 9 separate tool operations into one paid endpoint call and saves about 128k tokens.

Scrape website entry page -> Extract structured dataset -> Normalize dataset bundle

Creator

Name: 402.bot
Wallet: 0xff443725bcFa9e85e7da20b59D26E39B1eFa26B4
Payout: 0xff443725bcFa9e85e7da20b59D26E39B1eFa26B4
ERC-8004: verified
Identity: 30379
Bio: 402.bot managed workflow marketplace recipes.
ERC-8004 reputation: 0.0
Creator score: 61

Usage and trust

Success 30d: 60%
Refund 30d: 0%
Paid runs: 5
Creator recipes: 1
Last run: 2026-03-12 06:18Z

Pipeline

Stage 1

Scrape website entry page

fetch_transform

Source: Stableenrich Firecrawl Scrape
Step id: scrape

Stage 2

Extract structured dataset

fetch_transform

Source: Google Gemini Flash Structured
Step id: extract

Stage 3

Normalize dataset bundle

transform

Source: Transform
Step id: normalize
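
The three stages can be pictured as a sequential pipeline in which each step sees the run input plus the outputs of earlier steps, keyed by step id. The sketch below is an illustration under that assumption, not the marketplace's actual runner: `run_pipeline` and the three stub functions are hypothetical stand-ins for the real sources.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: List[Tuple[str, Step]], input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, storing each result under stepsById[<id>]["output"]."""
    context: Dict[str, Any] = {"input": input_data, "stepsById": {}}
    for step_id, step in steps:
        context["stepsById"][step_id] = {"output": step(context)}
    return context


# Stubs only: the real steps call external sources.
def scrape(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"markdown": f"# Page at {ctx['input']['url']}"}


def extract(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Nested under "output" so that normalize can read output.output.
    return {"output": {"rows": [], "sourceUrl": ctx["input"]["url"]}}


def normalize(ctx: Dict[str, Any]) -> Any:
    return ctx["stepsById"]["extract"]["output"]["output"]


result = run_pipeline(
    [("scrape", scrape), ("extract", extract), ("normalize", normalize)],
    {"url": "https://example.com", "profile": "products"},
)
print(result["stepsById"]["normalize"]["output"])
```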

Recent runs

Run	Status	Trigger	Queued
433d3ae4-acb4-431c-829f-dff8de70babe	succeeded	recipe_api	2026-03-12T06:17:41.702Z
f2b444c8-c592-4508-854a-10f7523645f4	succeeded	recipe_api	2026-03-12T05:08:59.261Z
a43aa2af-7f92-4ed2-ab8c-12da1dc6e09d	failed	recipe_api	2026-03-12T04:57:46.249Z
387f7794-754d-4cce-90c5-61229f3cff85	failed	recipe_api	2026-03-12T04:28:49.696Z
256a4e49-9085-492b-8ffb-cc2c5355b221	succeeded	recipe_api	2026-03-12T04:11:23.376Z
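
As a quick check, the success rate over these five runs can be recomputed from the table. The snippet below copies the run ids and statuses verbatim and shows the arithmetic only.

```python
# (run id, status) pairs copied from the table above
runs = [
    ("433d3ae4-acb4-431c-829f-dff8de70babe", "succeeded"),
    ("f2b444c8-c592-4508-854a-10f7523645f4", "succeeded"),
    ("a43aa2af-7f92-4ed2-ab8c-12da1dc6e09d", "failed"),
    ("387f7794-754d-4cce-90c5-61229f3cff85", "failed"),
    ("256a4e49-9085-492b-8ffb-cc2c5355b221", "succeeded"),
]

succeeded = sum(1 for _, status in runs if status == "succeeded")
success_rate = succeeded / len(runs)
print(f"{succeeded}/{len(runs)} = {success_rate:.0%}")  # 3/5 = 60%
```

Three successes out of five runs gives 60%, which matches the "Success 30d" figure listed under Usage and trust.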

Raw step spec

Scrape website entry page

{
  "id": "scrape",
  "kind": "fetch_transform",
  "title": "Scrape website entry page",
  "request": {
    "params": {
      "url": "{{ $.input.url }}",
      "formats": [
        "markdown",
        "text"
      ],
      "onlyMainContent": true
    },
    "sourceId": "stableenrich_firecrawl_scrape",
    "deliveryFormat": "json"
  }
}
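
The `{{ $.input.url }}` placeholder suggests that params are filled in from a run context before the request is made. The listing does not document the template engine, so the following is only a sketch of one way such placeholders could be resolved. The `resolve` and `lookup` helpers and the dotted-path rule are assumptions.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {{ $.input.url }} and captures the dotted path.
TEMPLATE = re.compile(r"\{\{\s*\$\.([A-Za-z0-9_.]+)\s*\}\}")


def lookup(context: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'input.url' through nested dicts."""
    value: Any = context
    for key in path.split("."):
        value = value[key]
    return value


def resolve(node: Any, context: Dict[str, Any]) -> Any:
    """Recursively replace placeholders in dicts, lists, and strings."""
    if isinstance(node, dict):
        return {k: resolve(v, context) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, context) for v in node]
    if isinstance(node, str):
        whole = TEMPLATE.fullmatch(node)
        if whole:
            # The whole string is one placeholder: keep the value's own type.
            return lookup(context, whole.group(1))
        return TEMPLATE.sub(lambda m: str(lookup(context, m.group(1))), node)
    return node


params = {
    "url": "{{ $.input.url }}",
    "formats": ["markdown", "text"],
    "onlyMainContent": True,
}
resolved = resolve(params, {"input": {"url": "https://example.com"}})
print(resolved["url"])
```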

Extract structured dataset

{
  "id": "extract",
  "kind": "fetch_transform",
  "title": "Extract structured dataset",
  "request": {
    "params": {
      "input": {
        "profile": "{{ $.input.profile }}",
        "sourceUrl": "{{ $.input.url }}",
        "scrapedPage": "{{ $.stepsById.scrape.output }}"
      },
      "prompt": "Using the crawled site pages, extract a compact dataset for the selected profile. The output rows must stay factual and grounded in the crawled content only.",
      "responseSchema": {
        "type": "object",
        "required": [
          "profile",
          "datasetName",
          "sourceUrl",
          "pagesConsidered",
          "rows"
        ],
        "properties": {
          "rows": {
            "type": "array",
            "items": {
              "type": "object",
              "required": [
                "url",
                "title",
                "entity",
                "category",
                "summary",
                "tags"
              ],
              "properties": {
                "url": {
                  "type": "string"
                },
                "tags": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                },
                "title": {
                  "type": "string"
                },
                "entity": {
                  "type": "string"
                },
                "summary": {
                  "type": "string"
                },
                "category": {
                  "type": "string"
                }
              },
              "additionalProperties": false
            }
          },
          "profile": {
            "type": "string"
          },
          "sourceUrl": {
            "type": "string"
          },
          "datasetName": {
            "type": "string"
          },
          "pagesConsidered": {
            "type": "integer"
          }
        },
        "additionalProperties": false
      },
      "systemInstruction": "You are an extraction engine. Use the selected profile to choose relevant rows, but never invent data."
    },
    "sourceId": "google_gemini_flash_structured",
    "deliveryFormat": "json"
  }
}
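
Because the response schema marks every property as required and sets `additionalProperties` to false at both levels, a conforming result has exactly five top-level keys and exactly six keys per row. A minimal hand-written check of that shape might look like the sketch below. `validate_dataset` is a hypothetical helper, not part of the recipe, and a full JSON Schema validator would be the more thorough choice.

```python
from typing import Any, List

TOP_KEYS = {"profile", "datasetName", "sourceUrl", "pagesConsidered", "rows"}
ROW_KEYS = {"url", "title", "entity", "category", "summary", "tags"}


def validate_dataset(data: Any) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    if not isinstance(data, dict):
        return ["dataset must be an object"]
    errors: List[str] = []
    if set(data) != TOP_KEYS:
        errors.append(f"top-level keys must be exactly {sorted(TOP_KEYS)}")
    for key in ("profile", "datasetName", "sourceUrl"):
        if key in data and not isinstance(data[key], str):
            errors.append(f"{key} must be a string")
    pages = data.get("pagesConsidered")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if "pagesConsidered" in data and (isinstance(pages, bool) or not isinstance(pages, int)):
        errors.append("pagesConsidered must be an integer")
    rows = data.get("rows", [])
    if not isinstance(rows, list):
        return errors + ["rows must be an array"]
    for i, row in enumerate(rows):
        if not isinstance(row, dict) or set(row) != ROW_KEYS:
            errors.append(f"rows[{i}] must have exactly {sorted(ROW_KEYS)}")
            continue
        for key in ROW_KEYS - {"tags"}:
            if not isinstance(row[key], str):
                errors.append(f"rows[{i}].{key} must be a string")
        tags = row["tags"]
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            errors.append(f"rows[{i}].tags must be an array of strings")
    return errors


# Illustrative values only; not taken from a real run.
sample = {
    "profile": "products",
    "datasetName": "example",
    "sourceUrl": "https://example.com",
    "pagesConsidered": 1,
    "rows": [{"url": "https://example.com", "title": "Home", "entity": "Example",
              "category": "page", "summary": "Landing page.", "tags": ["home"]}],
}
print(validate_dataset(sample))
```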

Normalize dataset bundle

{
  "id": "normalize",
  "kind": "transform",
  "title": "Normalize dataset bundle",
  "request": {
    "mode": "clean_json",
    "source": {
      "kind": "json",
      "value": "{{ $.stepsById.extract.output.output }}"
    }
  }
}
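
The listing does not say what the `clean_json` mode does. One plausible reading is that it turns the model output into a plain JSON value even when that output arrives as a string, possibly wrapped in a code fence. The sketch below illustrates that guess only; the function body is an assumption and should not be taken as the transform's actual behavior.

```python
import json
from typing import Any

FENCE = "`" * 3  # built indirectly so the example nests safely in a fenced block


def clean_json(value: Any) -> Any:
    """Assumed behavior: pass parsed values through, parse strings as JSON."""
    if not isinstance(value, str):
        return value  # already a parsed JSON value
    text = value.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence if present.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return json.loads(text)


wrapped = FENCE + 'json\n{"rows": []}\n' + FENCE
print(clean_json(wrapped))
```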