Commit Graph

67 Commits

Author SHA1 Message Date
Marco Vinciguerra
30ca15ca28 Merge branch 'md_scraper_integration' into integration_markdown
2024-06-30 16:58:37 +02:00
Marco Vinciguerra
2804434a9e feat: add integrations for markdown files
2024-06-29 13:35:39 +02:00
Marco Vinciguerra
9b45ebcdcf modify fetch node with no cut mode 2024-06-28 14:38:36 +02:00
Marco Vinciguerra
228a1de2be add new force 2024-06-27 18:57:27 +02:00
Marco Vinciguerra
9917972c11 fixed request
2024-06-22 21:39:37 +02:00
Marco Vinciguerra
afd46ac77b fixed generate_answer_node
2024-06-22 11:31:54 +02:00
Marco Vinciguerra
d1c3de777f fixed a bug 2024-06-21 14:14:43 +02:00
Marco Vinciguerra
7af411aa99 add trigger
Co-Authored-By: Matteo Vedovati <68272450+vedovati-matteo@users.noreply.github.com>
2024-06-21 13:36:27 +02:00
Marco Vinciguerra
2f02830c81 refactoring of fetch node
2024-06-20 13:44:42 +02:00
Marco Vinciguerra
23bc6332d0 fixed a bug 2024-06-19 21:46:31 +02:00
Marco Vinciguerra
8bb560a489 add convert function 2024-06-19 20:17:45 +02:00
Marco Perini
4c8becc721 overwrite common params to affect nodes config
2024-06-16 15:19:40 +02:00
Marco Perini
12f4386552 Merge branch 'pre/beta' into 349-problem-with-scrapegraphaigraphspdf_scraper_graphpy 2024-06-14 15:22:43 +02:00
Marco Perini
203de83405 fix(pdf): correctly read .pdf files 2024-06-14 15:20:30 +02:00
Marco Perini
283b61fafc docs: better logging 2024-06-13 18:13:47 +02:00
supercoder-dev
d0e300af72 Update fetch_node.py 2024-06-12 14:32:01 +05:30
Marco Perini
a4ee757507 Merge branch 'pre/beta' into pdf_scraper_refactoring 2024-05-25 00:16:38 +02:00
Marco Perini
8d5eb0bb0d fix(local_file): fixed textual input pdf, csv, json and xml graph 2024-05-25 00:13:47 +02:00
Marco Vinciguerra
b913b51cca Merge branch 'logger-integration' into pre/beta 2024-05-24 12:39:14 +02:00
Federico Minutoli
c251cc45d3 fix(node-logging): use centralized logger in each node for logging 2024-05-24 01:09:49 +02:00
Marco Vinciguerra
b377467b29 add info 2024-05-23 12:51:08 +02:00
Marco Vinciguerra
f2dffe534f fix: pdf scraper bug 2024-05-22 11:54:55 +02:00
VinciGit00
cc5adefd29 fix: come back to the old version 2024-05-15 15:54:00 +02:00
VinciGit00
29d284e497 Merge branch 'main' into logger-integration 2024-05-15 15:28:20 +02:00
Marco Vinciguerra
cffcf80a75 Merge branch '88-blockscraper-implementation' into main 2024-05-15 15:20:12 +02:00
VinciGit00
05890835f5 refactoring of loggers 2024-05-15 10:54:53 +02:00
Marco Perini
a6e1813ddd fix(fetch_node): bug in handling local files 2024-05-14 16:51:10 +02:00
VinciGit00
e53766b16e feat: add logger integration 2024-05-14 15:20:39 +02:00
Marco Perini
a296927624 feat(omni-scraper): working OmniScraperGraph with images 2024-05-14 13:46:49 +02:00
Marco Perini
0c1594737f fix(fetch-node): removed isSoup from default 2024-05-13 12:09:55 +02:00
Marco Perini
7e8acd8e6a Merge branch 'pre/beta' into fix/fetch-node-proxybroker 2024-05-13 11:17:37 +02:00
Marco Perini
1e9a564616 fix(proxy-rotation): removed duplicated arg and passed the loader_kwarhs correctly to the node 2024-05-12 18:39:03 +02:00
mayurdb
300fd5d253 Fetch links in the page while parsing html 2024-05-11 09:46:51 +05:30
Federico Minutoli
fc2aa3ac1c Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker 2024-05-10 21:20:40 +02:00
Federico Minutoli
768719cce8 feat(safe-web-driver): enchanced the original AsyncChromiumLoader web driver with proxy protection and flexible kwargs and backend
the original class prevents passing kwargs down to the playwright backend, making some config unfeasible, including passing a proxy server to the web driver.

the new class has backward compatibility with the original, but 1) allows any kwarg to be passed down to the web driver, 2) allows specifying the web driver backend (only playwright is supported for now) in case more (e.g., selenium) will be supported in the future and 3) automatically fetches a suitable proxy if one is not passed already
2024-05-10 21:13:38 +02:00
Marco Perini
864aa91326 feat: revert fetch_node 2024-05-10 15:11:54 +02:00
Marco Vinciguerra
99adc9799f Merge branch 'pre/beta' into fetchNodeFix 2024-05-10 11:13:54 +02:00
mayurdb
f8ce3d5916 fix: Augment the information getting fetched from a webpage 2024-05-10 13:28:53 +05:30
VinciGit00
0ab31c3fdb fix: add json integration 2024-05-09 21:07:07 +02:00
VinciGit00
324e977b85 fix: fixed bugs for csv and xml 2024-05-09 20:46:46 +02:00
Shubham Kamboj
f10f3b1438 feat: Add support for passing pdf path as source 2024-05-09 21:55:05 +05:30
Marco Perini
186c0d035d fix(examples): openai std examples 2024-05-08 14:56:44 +02:00
Marco Vinciguerra
b326886250 Merge branch '88-blockscraper-implementation' into asdt 2024-05-07 13:27:30 +02:00
Eric Page
3ae2ea1dbd Miscellaneous "llm" -> "llm_model" refactors 2024-05-05 15:58:50 +02:00
Marco Perini
1409797475 docs: refactor nodes docstrings 2024-05-01 23:17:57 +02:00
Marco Perini
e9817963c8 docs: base and fetch node 2024-05-01 21:30:06 +02:00
EURAC\marperini
2dd7817cfb feat: added verbose flag to suppress print statements 2024-04-30 15:31:57 +02:00
Marco Perini
d592d27bb4 Merge pull request #115 from VinciGit00/101-scrape-json-files
feat: add xml scraper and json scraper
2024-04-30 14:29:57 +02:00
EURAC\marperini
42ab0aa1d2 feat(fetch): added playwright support 2024-04-30 04:02:58 +02:00
VinciGit00
deb920a33e fixing json and example 2024-04-29 16:11:57 +02:00