Marco Vinciguerra
f2dffe534f
fix: pdf scraper bug
2024-05-22 11:54:55 +02:00
VinciGit00
cc5adefd29
fix: come back to the old version
2024-05-15 15:54:00 +02:00
Marco Vinciguerra
cffcf80a75
Merge branch '88-blockscraper-implementation' into main
2024-05-15 15:20:12 +02:00
Marco Vinciguerra
22cd9e3605
Merge branch 'search_link_context' into main
2024-05-15 15:16:57 +02:00
Marco Vinciguerra
932df8d491
Merge pull request #238 from VinciGit00/gpt4-omni
2024-05-14 17:09:23 +02:00
mayurdb
a458ec4b9f
Update the prompt for the search_link_node
2024-05-14 16:59:16 +02:00
Marco Perini
a6e1813ddd
fix(fetch_node): bug in handling local files
2024-05-14 16:51:10 +02:00
Marco Perini
a296927624
feat(omni-scraper): working OmniScraperGraph with images
2024-05-14 13:46:49 +02:00
Marco Perini
90955ca52f
feat(gpt-4o): image to text single node test
2024-05-14 11:43:21 +02:00
Marco Perini
367dea5cbd
Merge branch 'pre/beta' into feat/parallel-node-execution
2024-05-13 23:50:40 +02:00
Marco Perini
a8d5e7db05
feat(batchsize): tested different batch sizes and systems
2024-05-13 23:49:48 +02:00
Marco Perini
dedc733047
fix(asyncio): replaced deepcopy with copy due to serialization problems
2024-05-13 18:46:34 +02:00
Marco Perini
0c1594737f
fix(fetch-node): removed isSoup from default
2024-05-13 12:09:55 +02:00
Marco Perini
7e8acd8e6a
Merge branch 'pre/beta' into fix/fetch-node-proxybroker
2024-05-13 11:17:37 +02:00
Marco Perini
1e9a564616
fix(proxy-rotation): removed duplicated arg and passed the loader_kwargs correctly to the node
2024-05-12 18:39:03 +02:00
VinciGit00
e2350eda62
feat: add new prompt info
2024-05-12 11:14:30 +02:00
mayurdb
9a67a26cd3
Update documentation
2024-05-11 16:57:22 +05:30
mayurdb
df271b6451
Add search link node that can find out relevant links in the webpage
2024-05-11 16:39:55 +05:30
Marco Vinciguerra
b752499fab
Merge pull request #217 from mayurdb/fetchLinkFix
Fetch links in the page while parsing html
2024-05-11 09:42:40 +02:00
mayurdb
300fd5d253
Fetch links in the page while parsing html
2024-05-11 09:46:51 +05:30
Eric Page
0683e78e78
Merge branch 'pre/beta' into fix-GenerateScraperGraph
2024-05-11 01:59:28 +02:00
Eric Page
aac51ba290
Removed dead code, allows GenerateScraperNode to generate scraper with one chunk of context
2024-05-11 01:34:51 +02:00
Eric Page
40884747c7
Added parse_html option in parse_node
2024-05-11 00:32:01 +02:00
Federico Minutoli
627cbeeb20
feat(parallel-execution): add asyncio event loop dispatcher with semaphore for parallel graph instances
TODO: still untested
2024-05-11 00:13:27 +02:00
Federico Minutoli
fc2aa3ac1c
Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker
2024-05-10 21:20:40 +02:00
Federico Minutoli
768719cce8
feat(safe-web-driver): enhanced the original AsyncChromiumLoader web driver with proxy protection and flexible kwargs and backend
The original class prevents passing kwargs down to the playwright backend, making some configurations infeasible, including passing a proxy server to the web driver.
The new class is backward compatible with the original, but 1) allows any kwarg to be passed down to the web driver, 2) allows specifying the web driver backend (only playwright is supported for now) in case more (e.g., selenium) are supported in the future, and 3) automatically fetches a suitable proxy if one is not already passed.
2024-05-10 21:13:38 +02:00
Marco Perini
864aa91326
feat: revert fetch_node
2024-05-10 15:11:54 +02:00
Marco Vinciguerra
99adc9799f
Merge branch 'pre/beta' into fetchNodeFix
2024-05-10 11:13:54 +02:00
mayurdb
f8ce3d5916
fix: Augment the information getting fetched from a webpage
2024-05-10 13:28:53 +05:30
VinciGit00
0ab31c3fdb
fix: add json integration
2024-05-09 21:07:07 +02:00
VinciGit00
324e977b85
fix: fixed bugs for csv and xml
2024-05-09 20:46:46 +02:00
Shubham Kamboj
f10f3b1438
feat: Add support for passing pdf path as source
2024-05-09 21:55:05 +05:30
Marco Perini
6b71ec1d2b
fix(examples): local, mixed models and fixed SearchGraph embeddings problem
2024-05-08 15:36:26 +02:00
Marco Perini
186c0d035d
fix(examples): openai std examples
2024-05-08 14:56:44 +02:00
Marco Vinciguerra
b326886250
Merge branch '88-blockscraper-implementation' into asdt
2024-05-07 13:27:30 +02:00
VinciGit00
67d5fbf816
feat: new search_graph
2024-05-06 22:09:18 +02:00
VinciGit00
5a67bca0db
Merge branch 'pre/beta' into pr/161
2024-05-06 14:50:04 +02:00
VinciGit00
51aa109e42
feat: add turboscraper (alpha)
2024-05-06 11:59:14 +02:00
VinciGit00
5e1d5db6da
Update search_internet_node.py
2024-05-06 10:09:17 +02:00
VinciGit00
80053a2358
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-ai into pre/beta
2024-05-06 10:07:56 +02:00
VinciGit00
389b52aabb
removed examples
2024-05-06 10:07:31 +02:00
VinciGit00
88d999e15b
add website content
2024-05-06 09:36:32 +02:00
Marco Perini
930adb38f2
feat(node): multiple url search in SearchGraph + fixes
Implemented GraphIteratorNode and MergeAnswersNode to create multiple instances of a graph and merge the scraped content from multiple pages
2024-05-06 00:30:09 +02:00
Marco Perini
dbb614a8dd
feat: multiple graph instances
2024-05-05 23:51:04 +02:00
Marco Perini
84fcb44aaa
feat: fixed custom_graphs example and robots_node
2024-05-05 22:02:24 +02:00
Lorenzo Padoan
e1b9d69360
dev basic class blockidentifier
2024-05-05 17:36:30 +02:00
Eric Page
cc27b21c3f
Merge branch 'pre/beta' into pass-common-params-graph
2024-05-05 16:18:32 +02:00
Eric Page
3ae2ea1dbd
Miscellaneous "llm" -> "llm_model" refactors
2024-05-05 15:58:50 +02:00
VinciGit00
cb1bd00a19
removed unused node
2024-05-05 09:40:35 +02:00
Eric Page
729d5d7597
Changed node_config["llm"] to node_config["llm_model"]
2024-05-05 09:36:11 +02:00