Marco Vinciguerra
f2dffe534f
fix: pdf scraper bug
2024-05-22 11:54:55 +02:00
VinciGit00
cc5adefd29
fix: revert to the old version
2024-05-15 15:54:00 +02:00
Marco Vinciguerra
cffcf80a75
Merge branch '88-blockscraper-implementation' into main
2024-05-15 15:20:12 +02:00
Marco Perini
a6e1813ddd
fix(fetch_node): bug in handling local files
2024-05-14 16:51:10 +02:00
Marco Perini
a296927624
feat(omni-scraper): working OmniScraperGraph with images
2024-05-14 13:46:49 +02:00
Marco Perini
0c1594737f
fix(fetch-node): removed isSoup from default
2024-05-13 12:09:55 +02:00
Marco Perini
7e8acd8e6a
Merge branch 'pre/beta' into fix/fetch-node-proxybroker
2024-05-13 11:17:37 +02:00
Marco Perini
1e9a564616
fix(proxy-rotation): removed duplicated arg and passed the loader_kwargs correctly to the node
2024-05-12 18:39:03 +02:00
mayurdb
300fd5d253
Fetch links in the page while parsing html
2024-05-11 09:46:51 +05:30
Federico Minutoli
fc2aa3ac1c
Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker
2024-05-10 21:20:40 +02:00
Federico Minutoli
768719cce8
feat(safe-web-driver): enhanced the original AsyncChromiumLoader web driver with proxy protection and flexible kwargs and backend
The original class prevents passing kwargs down to the playwright backend, which makes some configurations impossible, including passing a proxy server to the web driver.
The new class is backward compatible with the original, but it 1) allows any kwarg to be passed down to the web driver, 2) allows specifying the web driver backend (only playwright is supported for now), in case more backends (e.g., selenium) are supported in the future, and 3) automatically fetches a suitable proxy if one is not already passed.
2024-05-10 21:13:38 +02:00
Marco Perini
864aa91326
feat: revert fetch_node
2024-05-10 15:11:54 +02:00
Marco Vinciguerra
99adc9799f
Merge branch 'pre/beta' into fetchNodeFix
2024-05-10 11:13:54 +02:00
mayurdb
f8ce3d5916
fix: Augment the information getting fetched from a webpage
2024-05-10 13:28:53 +05:30
VinciGit00
0ab31c3fdb
fix: add json integration
2024-05-09 21:07:07 +02:00
VinciGit00
324e977b85
fix: fixed bugs for csv and xml
2024-05-09 20:46:46 +02:00
Shubham Kamboj
f10f3b1438
feat: Add support for passing pdf path as source
2024-05-09 21:55:05 +05:30
Marco Perini
186c0d035d
fix(examples): openai std examples
2024-05-08 14:56:44 +02:00
Marco Vinciguerra
b326886250
Merge branch '88-blockscraper-implementation' into asdt
2024-05-07 13:27:30 +02:00
Eric Page
3ae2ea1dbd
Miscellaneous "llm" -> "llm_model" refactors
2024-05-05 15:58:50 +02:00
Marco Perini
1409797475
docs: refactor nodes docstrings
2024-05-01 23:17:57 +02:00
Marco Perini
e9817963c8
docs: base and fetch node
2024-05-01 21:30:06 +02:00
EURAC\marperini
2dd7817cfb
feat: added verbose flag to suppress print statements
2024-04-30 15:31:57 +02:00
Marco Perini
d592d27bb4
Merge pull request #115 from VinciGit00/101-scrape-json-files
feat: add xml scraper and json scraper
2024-04-30 14:29:57 +02:00
EURAC\marperini
42ab0aa1d2
feat(fetch): added playwright support
2024-04-30 04:02:58 +02:00
VinciGit00
deb920a33e
fixing json and example
2024-04-29 16:11:57 +02:00
EURAC\marperini
6e7283ed8f
feat: add finalize_node()
2024-04-29 10:05:04 +02:00
VinciGit00
9cd516507c
fix: bug with fetch node
2024-04-27 21:23:35 +02:00
VinciGit00
adbc08f27b
fix: robot node and proxies
2024-04-27 19:07:37 +02:00
VinciGit00
b754dd909c
fix: changed proxy function
2024-04-27 14:40:50 +02:00
VinciGit00
f6077d1f98
feat: add new proxy rotation function
2024-04-27 13:31:53 +02:00
VinciGit00
3f95801737
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
2024-04-19 10:36:40 +02:00
Andrea Rota
b0e446f014
feat: apply remove to the document before updating the state
2024-04-17 11:24:56 +02:00
VinciGit00
4233430518
add integration on the fetch node
2024-04-16 12:19:23 +02:00
VinciGit00
cec4c43c64
add new local models context window
2024-04-09 10:23:38 +02:00
EURAC\marperini
17add20c13
implemented node_config, add embedder model choice, add azure endpoint, refactor graphs and examples
2024-04-08 12:18:32 +02:00
EURAC\marperini
c2709aa50c
working search graph
2024-04-06 13:01:04 +02:00
VinciGit00
be36ab8707
add new type of scraping through text
2024-03-24 21:22:26 +01:00
VinciGit00
e25f74ddc8
fixed bug for scraping from node + add example
2024-03-18 22:53:52 +01:00
VinciGit00
1afb950751
refactoring of code, add example on the readme
2024-03-18 14:35:47 +01:00
Perinim
7d3190a636
fixed example graphs and utils files
2024-03-18 10:20:23 +01:00
Perinim
52934bf007
implemented graph_config, fixed smart_scraper and speech graph
2024-03-17 20:35:04 +01:00
Perinim
3585cd81e5
refactor fetch node
2024-03-17 12:02:35 +01:00