Commit Graph

63 Commits

Author SHA1 Message Date
Marco Perini
c75e6a06b1 feat(kg): working rag kg 2024-05-18 10:26:25 +02:00
Marco Perini
0196423bde feat(knowledgegraph): add knowledge graph node 2024-05-17 23:41:44 +02:00
VinciGit00
cc5adefd29 fix: come back to the old version 2024-05-15 15:54:00 +02:00
Marco Vinciguerra
cffcf80a75 Merge branch '88-blockscraper-implementation' into main 2024-05-15 15:20:12 +02:00
Marco Perini
a296927624 feat(omni-scraper): working OmniScraperGraph with images 2024-05-14 13:46:49 +02:00
Marco Perini
7e8acd8e6a Merge branch 'pre/beta' into fix/fetch-node-proxybroker 2024-05-13 11:17:37 +02:00
Marco Perini
5d6d996e8f fix(proxy-rotation): removed max_shape duplicate 2024-05-13 07:26:43 +02:00
Marco Perini
1e9a564616 fix(proxy-rotation): removed duplicated arg and passed the loader_kwarhs correctly to the node 2024-05-12 18:39:03 +02:00
VinciGit00
dc91719365 Update cleanup_html.py 2024-05-11 10:49:16 +02:00
mayurdb
300fd5d253 Fetch links in the page while parsing html 2024-05-11 09:46:51 +05:30
Federico Minutoli
fc2aa3ac1c Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker 2024-05-10 21:20:40 +02:00
Federico Minutoli
217013181d feat(proxy-rotation): add parse (IP address) or search (from broker) functionality for proxy rotation 2024-05-10 21:09:48 +02:00
the broker has been made fully configurable for anonymity level, admissible locations, scheme and max shape not to waste resources, unlike the original `free-proxy` package.

other options have been explored (e.g., `proxybroker`, `proxybroker2`) due to their built-in proxy server and rotation capabilities, but the former is no longer maintained, and the latter has issue with any python version outside of python 3.9
Federico Minutoli
db2234bf5d feat(webdriver-backend): add dynamic import scripts from module and file 2024-05-10 21:06:05 +02:00
Marco Perini
864aa91326 feat: revert fetch_node 2024-05-10 15:11:54 +02:00
mayurdb
f8ce3d5916 fix: Augment the information getting fetched from a webpage 2024-05-10 13:28:53 +05:30
Marco Vinciguerra
b326886250 Merge branch '88-blockscraper-implementation' into asdt 2024-05-07 13:27:30 +02:00
VinciGit00
aeb1acbf05 feat: refactoring search function 2024-05-03 21:06:09 +02:00
Perinim
cf038b33ea docs: update utils docstrings 2024-05-01 12:35:12 +02:00
VinciGit00
a9b11e433a fix: bug for calculate costs 2024-04-28 14:36:49 +02:00
VinciGit00
adbc08f27b fix: robot node and proxyes 2024-04-27 19:07:37 +02:00
VinciGit00
b754dd909c fix: changed proxy function 2024-04-27 14:40:50 +02:00
VinciGit00
f6077d1f98 feat: add new proxy rotation function 2024-04-27 13:31:53 +02:00
EURAC\marperini
e778d27169 added tree metadata 2024-04-26 15:17:44 +02:00
EURAC\marperini
dd99ac595e structural and textual hashing 2024-04-25 11:56:41 +02:00
EURAC\marperini
c5f9fcaabe Merge branch 'asdt' of https://github.com/VinciGit00/Scrapegraph-ai into asdt 2024-04-25 11:06:54 +02:00
EURAC\marperini
c927f707e9 subtrees implementation 2024-04-25 11:06:05 +02:00
VinciGit00
f232717bf8 REFACTORING 2024-04-25 09:19:44 +02:00
EURAC\marperini
ae69e4b340 dom tree structure 2024-04-24 18:59:44 +02:00
VinciGit00
74d7ef614a refactor of conbert to json function 2024-04-21 20:30:56 +02:00
Alok Saboo
6cbf992870 chore: Remove .csv extension from filename in convert_to_csv.py 2024-04-21 14:13:34 -04:00
VinciGit00
3f95801737 Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai 2024-04-19 10:36:40 +02:00
EURAC\marperini
134c94e0ed added first asdt implementation 2024-04-17 17:12:59 +02:00
Andrea Rota
b0e446f014 feat: apply remove to the document before updating the state 2024-04-17 11:24:56 +02:00
Marco Vinciguerra
4703a0b94c Update remover.py 2024-04-16 12:27:17 +02:00
VinciGit00
9661c77ebe add minimizer function 2024-04-16 12:07:43 +02:00
VinciGit00
3640434f5c add utils 2024-04-12 12:59:15 +02:00
VinciGit00
6dc4c6dec4 Merge branch 'main' into research_branch 2024-04-06 14:44:56 +02:00
EURAC\marperini
57a53bd6af refactor examples 2024-04-06 13:22:42 +02:00
EURAC\marperini
c2709aa50c working search graph 2024-04-06 13:01:04 +02:00
VinciGit00
4c3ea8f4ac fixed image_to_tex_node and refactoring 2024-04-03 12:53:12 +02:00
VinciGit00
b53bfefeea removed junk if 2024-04-02 12:57:27 +02:00
VinciGit00
471df48a22 add google search 2024-04-01 13:36:02 +02:00
VinciGit00
b8a3590235 add SearcInetnetNode 2024-03-28 13:14:39 +01:00
VinciGit00
fff1f9ee51 new function for searching 2024-03-28 12:41:45 +01:00
VinciGit00
a28ca993de refactoring of examples folder 2024-03-27 21:41:21 +01:00
VinciGit00
672e9a5191 refactoring of research function 2024-03-27 21:29:10 +01:00
VinciGit00
97d0a6887f add reserch web function 2024-03-27 21:22:04 +01:00
VinciGit00
587dfa5c68 refactoring of the saving functions 2024-03-26 14:15:36 +01:00
VinciGit00
1afb950751 refactoring of code, add example on the readme 2024-03-18 14:35:47 +01:00
Perinim
875a7cc4f0 refactored RagNode and GenerateAnswerNode 2024-03-17 16:48:58 +01:00