Marco Perini
c75e6a06b1
feat(kg): working rag kg
2024-05-18 10:26:25 +02:00
Marco Perini
0196423bde
feat(knowledgegraph): add knowledge graph node
2024-05-17 23:41:44 +02:00
VinciGit00
cc5adefd29
fix: come back to the old version
2024-05-15 15:54:00 +02:00
Marco Vinciguerra
cffcf80a75
Merge branch '88-blockscraper-implementation' into main
2024-05-15 15:20:12 +02:00
Marco Perini
a296927624
feat(omni-scraper): working OmniScraperGraph with images
2024-05-14 13:46:49 +02:00
Marco Perini
7e8acd8e6a
Merge branch 'pre/beta' into fix/fetch-node-proxybroker
2024-05-13 11:17:37 +02:00
Marco Perini
5d6d996e8f
fix(proxy-rotation): removed max_shape duplicate
2024-05-13 07:26:43 +02:00
Marco Perini
1e9a564616
fix(proxy-rotation): removed duplicated arg and passed the loader_kwargs correctly to the node
2024-05-12 18:39:03 +02:00
VinciGit00
dc91719365
Update cleanup_html.py
2024-05-11 10:49:16 +02:00
mayurdb
300fd5d253
Fetch links in the page while parsing html
2024-05-11 09:46:51 +05:30
Federico Minutoli
fc2aa3ac1c
Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai into fix/fetch-node-proxybroker
2024-05-10 21:20:40 +02:00
Federico Minutoli
217013181d
feat(proxy-rotation): add parse (IP address) or search (from broker) functionality for proxy rotation
...
The broker is fully configurable by anonymity level, admissible locations, scheme, and max shape, so it does not waste resources, unlike the original `free-proxy` package.
Other options were explored (e.g., `proxybroker`, `proxybroker2`) for their built-in proxy server and rotation capabilities, but the former is no longer maintained and the latter has issues with any Python version other than 3.9.
2024-05-10 21:09:48 +02:00
Federico Minutoli
db2234bf5d
feat(webdriver-backend): add dynamic import scripts from module and file
2024-05-10 21:06:05 +02:00
Marco Perini
864aa91326
feat: revert fetch_node
2024-05-10 15:11:54 +02:00
mayurdb
f8ce3d5916
fix: Augment the information getting fetched from a webpage
2024-05-10 13:28:53 +05:30
Marco Vinciguerra
b326886250
Merge branch '88-blockscraper-implementation' into asdt
2024-05-07 13:27:30 +02:00
VinciGit00
aeb1acbf05
feat: refactoring search function
2024-05-03 21:06:09 +02:00
Perinim
cf038b33ea
docs: update utils docstrings
2024-05-01 12:35:12 +02:00
VinciGit00
a9b11e433a
fix: bug for calculate costs
2024-04-28 14:36:49 +02:00
VinciGit00
adbc08f27b
fix: robot node and proxies
2024-04-27 19:07:37 +02:00
VinciGit00
b754dd909c
fix: changed proxy function
2024-04-27 14:40:50 +02:00
VinciGit00
f6077d1f98
feat: add new proxy rotation function
2024-04-27 13:31:53 +02:00
EURAC\marperini
e778d27169
added tree metadata
2024-04-26 15:17:44 +02:00
EURAC\marperini
dd99ac595e
structural and textual hashing
2024-04-25 11:56:41 +02:00
EURAC\marperini
c5f9fcaabe
Merge branch 'asdt' of https://github.com/VinciGit00/Scrapegraph-ai into asdt
2024-04-25 11:06:54 +02:00
EURAC\marperini
c927f707e9
subtrees implementation
2024-04-25 11:06:05 +02:00
VinciGit00
f232717bf8
REFACTORING
2024-04-25 09:19:44 +02:00
EURAC\marperini
ae69e4b340
dom tree structure
2024-04-24 18:59:44 +02:00
VinciGit00
74d7ef614a
refactor of convert to json function
2024-04-21 20:30:56 +02:00
Alok Saboo
6cbf992870
chore: Remove .csv extension from filename in convert_to_csv.py
2024-04-21 14:13:34 -04:00
VinciGit00
3f95801737
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
2024-04-19 10:36:40 +02:00
EURAC\marperini
134c94e0ed
added first asdt implementation
2024-04-17 17:12:59 +02:00
Andrea Rota
b0e446f014
feat: apply remove to the document before updating the state
2024-04-17 11:24:56 +02:00
Marco Vinciguerra
4703a0b94c
Update remover.py
2024-04-16 12:27:17 +02:00
VinciGit00
9661c77ebe
add minimizer function
2024-04-16 12:07:43 +02:00
VinciGit00
3640434f5c
add utils
2024-04-12 12:59:15 +02:00
VinciGit00
6dc4c6dec4
Merge branch 'main' into research_branch
2024-04-06 14:44:56 +02:00
EURAC\marperini
57a53bd6af
refactor examples
2024-04-06 13:22:42 +02:00
EURAC\marperini
c2709aa50c
working search graph
2024-04-06 13:01:04 +02:00
VinciGit00
4c3ea8f4ac
fixed image_to_tex_node and refactoring
2024-04-03 12:53:12 +02:00
VinciGit00
b53bfefeea
removed junk if
2024-04-02 12:57:27 +02:00
VinciGit00
471df48a22
add google search
2024-04-01 13:36:02 +02:00
VinciGit00
b8a3590235
add SearchInternetNode
2024-03-28 13:14:39 +01:00
VinciGit00
fff1f9ee51
new function for searching
2024-03-28 12:41:45 +01:00
VinciGit00
a28ca993de
refactoring of examples folder
2024-03-27 21:41:21 +01:00
VinciGit00
672e9a5191
refactoring of research function
2024-03-27 21:29:10 +01:00
VinciGit00
97d0a6887f
add research web function
2024-03-27 21:22:04 +01:00
VinciGit00
587dfa5c68
refactoring of the saving functions
2024-03-26 14:15:36 +01:00
VinciGit00
1afb950751
refactoring of code, add example on the readme
2024-03-18 14:35:47 +01:00
Perinim
875a7cc4f0
refactored RagNode and GenerateAnswerNode
2024-03-17 16:48:58 +01:00