Commit Graph

207 Commits

Author SHA1 Message Date
Marco Vinciguerra
909af8d912 refactor gen answ node 2024-05-23 13:45:23 +02:00
Marco Vinciguerra
f00ed35f7b Merge branch 'pre/beta' into patch-1 2024-05-23 11:47:51 +02:00
Marco Vinciguerra
f2dffe534f fix: pdf scraper bug 2024-05-22 11:54:55 +02:00
Marco Vinciguerra
be4237a04d Merge branch 'pre/beta' into multi_scraper_graph 2024-05-21 13:44:06 +02:00
Marco Perini
fc58e2d3a6 feat(smart-scraper-multi): add schema to graphs and created SmartScraperMultiGraph 2024-05-21 13:13:27 +02:00
JGalego
3ffa896c39 Fixed model ID -> model name conversion 2024-05-20 17:35:55 +01:00
JGalego
05ecc3ae20 Added missing logic to extract model_name from model_id 2024-05-20 17:17:11 +01:00
Marco Perini
a338383399 feat(kg): removed import 2024-05-18 10:30:29 +02:00
Marco Perini
58cc903d55 feat(multiple): quick fix working 2024-05-18 05:57:43 +02:00
Marco Vinciguerra
6f62b0560a Merge branch 'multi_scraper_graph' of https://github.com/VinciGit00/Scrapegraph-ai into multi_scraper_graph 2024-05-18 02:42:27 +02:00
Marco Vinciguerra
b82f33aee7 fix: template names 2024-05-18 02:42:25 +02:00
Marco Perini
bed3eed50c feat(multiple_search): working multiple example 2024-05-18 01:51:12 +02:00
Marco Vinciguerra
05e511e36f add new prompts 2024-05-18 00:22:52 +02:00
Marco Perini
0196423bde feat(knowledgegraph): add knowledge graph node 2024-05-17 23:41:44 +02:00
Marco Perini
8c33ea3fbc feat(node): knowledge graph node 2024-05-17 18:40:58 +02:00
Marco Vinciguerra
3453f72397 add graph 2024-05-17 18:23:50 +02:00
Marco Vinciguerra
02745a4f63 Merge branch 'main' into pre/beta 2024-05-17 10:40:19 +02:00
VinciGit00
9e9f8f09d5 removed max depth 2024-05-16 12:39:08 +02:00
VinciGit00
9483afddd3 revert 2024-05-16 12:34:02 +02:00
Mayur Bhosale
1e0b2f7e70 Merge branch 'pre/beta' into nDeep 2024-05-15 23:36:23 +02:00
mayurdb
d60438cc89 Add a n-level deep search support 2024-05-15 23:32:15 +02:00
VinciGit00
ba8a4f7122 removed duplicates 2024-05-15 21:47:39 +02:00
VinciGit00
cc5adefd29 fix: come back to the old version 2024-05-15 15:54:00 +02:00
Marco Vinciguerra
008f8d9eae Merge pull request #248 from VinciGit00/main
realignment
2024-05-15 15:20:53 +02:00
Marco Vinciguerra
cffcf80a75 Merge branch '88-blockscraper-implementation' into main 2024-05-15 15:20:12 +02:00
Marco Vinciguerra
22cd9e3605 Merge branch 'search_link_context' into main 2024-05-15 15:16:57 +02:00
mayurdb
0b71b9a1e8 Add a new graph traversal that allows more than one edges out of a graph 2024-05-15 13:46:19 +02:00
Marco Vinciguerra
932df8d491 Merge pull request #238 from VinciGit00/gpt4-omni 2024-05-14 17:09:23 +02:00
mayurdb
a458ec4b9f Update the prompt for the search_link_node 2024-05-14 16:59:16 +02:00
Marco Perini
a6e1813ddd fix(fetch_node): bug in handling local files 2024-05-14 16:51:10 +02:00
Marco Perini
a296927624 feat(omni-scraper): working OmniScraperGraph with images 2024-05-14 13:46:49 +02:00
Marco Perini
90955ca52f feat(gpt-4o): image to text single node test 2024-05-14 11:43:21 +02:00
Marco Perini
367dea5cbd Merge branch 'pre/beta' into feat/parallel-node-execution 2024-05-13 23:50:40 +02:00
Marco Perini
a8d5e7db05 feat(batchsize): tested different batch sizes and systems 2024-05-13 23:49:48 +02:00
Marco Perini
dedc733047 fix(asyncio): replaced deepcopy with copy due to serialization problems 2024-05-13 18:46:34 +02:00
Marco Perini
0c1594737f fix(fetch-node): removed isSoup from default 2024-05-13 12:09:55 +02:00
Marco Perini
7e8acd8e6a Merge branch 'pre/beta' into fix/fetch-node-proxybroker 2024-05-13 11:17:37 +02:00
Marco Perini
1e9a564616 fix(proxy-rotation): removed duplicated arg and passed the loader_kwargs correctly to the node 2024-05-12 18:39:03 +02:00
VinciGit00
e2350eda62 feat: add new prompt info 2024-05-12 11:14:30 +02:00
mayurdb
9a67a26cd3 Update documentation 2024-05-11 16:57:22 +05:30
mayurdb
df271b6451 Add search link node that can find out relevant links in the webpage 2024-05-11 16:39:55 +05:30
Marco Vinciguerra
b752499fab Merge pull request #217 from mayurdb/fetchLinkFix
Fetch links in the page while parsing html
2024-05-11 09:42:40 +02:00
mayurdb
300fd5d253 Fetch links in the page while parsing html 2024-05-11 09:46:51 +05:30
Eric Page
0683e78e78 Merge branch 'pre/beta' into fix-GenerateScraperGraph 2024-05-11 01:59:28 +02:00
Eric Page
aac51ba290 Removed dead code, allows GenerateScraperNode to generate scraper with one chunk of context 2024-05-11 01:34:51 +02:00
Eric Page
40884747c7 Added parse_html option in parse_node 2024-05-11 00:32:01 +02:00
Federico Minutoli
627cbeeb20 feat(parallel-execution): add asyncio event loop dispatcher with semaphore for parallel graph instances
TODO: still untested
2024-05-11 00:13:27 +02:00
Federico Minutoli
fc2aa3ac1c Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker 2024-05-10 21:20:40 +02:00
Federico Minutoli
768719cce8 feat(safe-web-driver): enhanced the original AsyncChromiumLoader web driver with proxy protection and flexible kwargs and backend
The original class prevents passing kwargs down to the playwright backend, making some configurations infeasible, including passing a proxy server to the web driver.

The new class is backward compatible with the original, but 1) allows any kwarg to be passed down to the web driver, 2) allows specifying the web driver backend (only playwright is supported for now) in case more (e.g., selenium) are supported in the future, and 3) automatically fetches a suitable proxy if one is not already passed.
2024-05-10 21:13:38 +02:00
Marco Perini
864aa91326 feat: revert fetch_node 2024-05-10 15:11:54 +02:00