---
title: ScrapeGraphAI Roadmap
markmap:
  colorFreezeLevel: 2
  maxWidth: 500
---

# **ScrapeGraphAI Roadmap**

## **Short-Term Goals**

- Integrate more LLM APIs

- Test proxy rotation implementation

- Add more search engines inside the SearchInternetNode

- Improve the documentation (ReadTheDocs)
    - [Issue #102](https://github.com/VinciGit00/Scrapegraph-ai/issues/102)

- Create tutorials for the library

## **Medium-Term Goals**

- Node for handling API requests

- Improve SearchGraph to look into the first 5 results of the search engine

- Make scraping more deterministic
    - Create DOM tree of the website
    - HTML tag text embeddings with tag metadata
    - Study tree forks from root node
    - How do we use the tags parameters?

- Create scraping folder with report
    - Folder contains .scrape files, DOM tree files, report
    - Report could be an HTML page with scraping speed, costs, LLM info, scraped content and DOM tree visualization
    - We can use pyecharts with R-markdown

- Scrape multiple pages of the same website
    - Create a new node that instantiates multiple graphs at the same time
    - Make graphs run in parallel
    - Scrape only relevant URLs from user prompt
    - Use the multidimensional DOM tree of the website for retrieval
    - [Issue #112](https://github.com/VinciGit00/Scrapegraph-ai/issues/112)

- Crawler graph
    - Scrape all the URLs with the same domain in all the pages
    - Build many DOM trees and link them together
    - Save the multidimensional tree in a file

- Compare two DOM trees to assess the similarity
    - Save the DOM tree of the scraped website to a file as a cache, to compare against the website's structure in future runs
    - Create similarity metrics with multiple DOM trees (overall tree? only relevant tags structure?)

- Nodes for handling authentication
    - Use Selenium or Playwright to handle authentication
    - Pass the cookies to the other nodes

- Nodes that attach to an open browser
    - Use Selenium or Playwright to attach to an open browser
    - Navigate inside the browser and scrape the content

- Nodes for taking screenshots and understanding the page layout
    - Use Selenium or Playwright to take screenshots
    - Use an LLM to assess whether it is a block-like page, paragraph-like page, etc.
    - [Issue #88](https://github.com/VinciGit00/Scrapegraph-ai/issues/88)
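
To make the DOM tree comparison goal above more concrete, here is a minimal sketch of one possible similarity metric. It is not part of the library: the names `TagPathCollector`, `tag_paths` and `dom_similarity` are hypothetical, and the metric (multiset Jaccard over root-to-node tag paths) is just one assumed answer to the "overall tree?" question, using only the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Hypothetical helper: counts the root-to-node tag path of every element."""

    def __init__(self):
        super().__init__()
        self.stack = []         # tags that are currently open
        self.paths = Counter()  # e.g. "html/body/div" -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html: str) -> Counter:
    """Return the multiset of tag paths found in an HTML string."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def dom_similarity(html_a: str, html_b: str) -> float:
    """Multiset Jaccard similarity of tag paths: 1.0 means identical structure."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # two documents without any tags are treated as identical
    return sum((a & b).values()) / union


if __name__ == "__main__":
    cached = "<html><body><div><p>a</p><p>b</p></div></body></html>"
    current = "<html><body><div><p>a</p><span>b</span></div></body></html>"
    print(dom_similarity(cached, current))
```

A cached `Counter` of paths could be serialized to a file and compared against a fresh scrape. Restricting the paths to a subset of tags would be one way to explore the "only relevant tags structure" variant.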

## **Long-Term Goals**

- Automatic generation of scraping pipelines from a given prompt

- Create API for the library

- Fine-tune an LLM for HTML content