Umut CAN
827f7260ad
This commit focuses on optimizing the utility modules in the codebase for better performance and maintainability. Key improvements include:
- More efficient HTML processing with combined regex operations and optimized tag handling
- Enhanced deep copy functionality with better type handling and optimized recursion
- Refactored web search with improved error handling and modular helper functions

The changes maintain all existing functionality while improving code quality, performance, and maintainability. Documentation and type hints have been enhanced throughout.
Optimize utils modules for better performance and maintainability
- Improve HTML cleanup and minification:
  - Combine regex operations for better performance
  - Add better error handling for HTML processing
  - Optimize tag removal and attribute filtering
- Enhance deep copy functionality:
  - Add special case handling for primitive types
  - Improve type checking and error handling
  - Optimize recursive copying for collections
- Refactor web search functionality:
  - Add input validation and error handling
  - Split search logic into separate helper functions
  - Improve proxy handling and configuration
  - Add better timeout and error management
  - Optimize URL filtering and processing
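As an illustration of the "combine regex operations" item, here is a minimal sketch assuming a plain `re`-based cleanup. The function name `minify_html` and the exact patterns are hypothetical and are not taken from the ScrapeGraphAI sources: a single alternation pattern drops `<script>` blocks, `<style>` blocks and comments in one pass instead of three separate substitutions.

```python
import re

# Hypothetical sketch, not the project's actual utility.
# One alternation strips <script> blocks, <style> blocks and HTML comments in a
# single pass; DOTALL lets ".*?" span newlines, IGNORECASE matches <SCRIPT> too.
_STRIP_RE = re.compile(
    r"<script\b[^>]*>.*?</script>|<style\b[^>]*>.*?</style>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)
# Whitespace that sits only between two tags.
_BETWEEN_TAGS_RE = re.compile(r">\s+<")
# Any remaining run of whitespace.
_SPACES_RE = re.compile(r"\s+")


def minify_html(html: str) -> str:
    """Strip scripts, styles and comments, then collapse whitespace."""
    if not isinstance(html, str):
        raise TypeError(f"expected str, got {type(html).__name__}")
    html = _STRIP_RE.sub("", html)
    html = _BETWEEN_TAGS_RE.sub("><", html)
    html = _SPACES_RE.sub(" ", html)
    return html.strip()


if __name__ == "__main__":
    sample = "<div>\n  <script>var x = 1;</script>\n  <p>Hello   world</p><!-- c -->\n</div>"
    print(minify_html(sample))
```

Compiling the patterns once at module level avoids recompiling them on every call. Note that collapsing whitespace this way would also flatten `<pre>` content, which a real implementation may need to preserve.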
Technical improvements:
- Better type hints and documentation
- More efficient data structures
- Improved error handling and validation
- Reduced code duplication
- Better separation of concerns
No breaking changes - all existing functionality maintained
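The deep copy items above (special-casing primitives, recursing through collections, clearer errors) can be sketched as follows. This assumes nothing about the real helper; `safe_deepcopy` is a hypothetical name used only for illustration.

```python
import copy
from typing import Any

# Immutable scalar types: returning the same object is already a safe "copy".
_PRIMITIVES = (int, float, complex, bool, str, bytes, type(None))


def safe_deepcopy(obj: Any) -> Any:
    """Deep-copy ``obj`` (hypothetical helper, not the project's real API).

    Primitives are returned unchanged, exact built-in containers are rebuilt
    recursively, and everything else falls back to ``copy.deepcopy``.
    """
    if isinstance(obj, _PRIMITIVES):
        return obj
    # Exact type checks, so subclasses (OrderedDict, namedtuple, ...) keep
    # their type by going through copy.deepcopy instead.
    obj_type = type(obj)
    if obj_type is dict:
        # Keys are hashable and normally immutable, so they are reused as-is.
        return {key: safe_deepcopy(value) for key, value in obj.items()}
    if obj_type is list:
        return [safe_deepcopy(item) for item in obj]
    if obj_type is tuple:
        return tuple(safe_deepcopy(item) for item in obj)
    if obj_type is set:
        return {safe_deepcopy(item) for item in obj}
    try:
        return copy.deepcopy(obj)
    except Exception as exc:
        # Re-raise with a clearer message (e.g. for locks or open sockets).
        raise TypeError(
            f"cannot deep-copy object of type {obj_type.__name__}"
        ) from exc


if __name__ == "__main__":
    original = {"a": [1, {"b": 2}], "c": (3, [4])}
    clone = safe_deepcopy(original)
    clone["a"][1]["b"] = 99
    print(original["a"][1]["b"])
```

One trade-off of this design is that, unlike `copy.deepcopy`, the fast path keeps no memo dictionary. Cyclic structures would therefore recurse without end, and shared sub-objects become independent copies.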
2024-10-28 22:40:32 +03:00
Marco Vinciguerra
2d91848b76
a
2024-10-28 14:16:47 +01:00
Marco Vinciguerra
15415eebbc
Update funding.json
2024-10-28 14:15:24 +01:00
Marco Vinciguerra
bc19e898ae
Update funding.json
2024-10-28 14:07:28 +01:00
Marco Vinciguerra
5ed28976d2
Update funding.json
2024-10-28 14:05:47 +01:00
Marco Vinciguerra
6418479f49
Update funding.json
2024-10-28 13:59:46 +01:00
Marco Vinciguerra
e97add5daf
Update funding.json
2024-10-28 13:58:01 +01:00
Marco Vinciguerra
8a69fb5ccc
Update funding.json
2024-10-28 13:55:23 +01:00
Marco Vinciguerra
300fd5ac5b
Create funding.json
2024-10-28 13:38:56 +01:00
Marco Vinciguerra
eb24da5a8d
Update overview.rst
2024-10-26 10:29:37 +02:00
Marco Vinciguerra
a7df68490e
Merge branch 'main' of https://github.com/ScrapeGraphAI/Scrapegraph-ai
2024-10-26 10:27:55 +02:00
Marco Vinciguerra
849fe395da
update doc
2024-10-26 10:27:53 +02:00
semantic-release-bot
3933d64601
ci(release): 1.27.0 [skip ci]
## [1.27.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.26.7...v1.27.0) (2024-10-26)
### Features
* add conditional node structure to the smart_scraper_graph and implement a structured way to check conditions ([cacd9cd](cacd9cde00))
* add integration with scrape.do ([ae275ec](ae275ec5e8))
* add model integration gpt4 ([51c55eb](51c55eb3a2))
* implement ScrapeGraph class for web scraping automation only ([612c644](612c644623))
* Implement SmartScraperMultiParseMergeFirstGraph class that scrapes a list of URLs, merges the content first, and finally generates answers to a given prompt. ([3e3e1b2](3e3e1b2f3a))
* refactoring of export functions ([0ea00c0](0ea00c078f))
* refactoring of get_probable_tags node ([f658092](f658092dff))
* refactoring of ScrapeGraph to SmartScraperLiteGraph ([52b6bf5](52b6bf5fb8))
### Bug Fixes
* fix export function ([c8a000f](c8a000f1d9))
* fix the example variable name ([69ff649](69ff649556))
* remove variable "max_result" not being used in the code ([e76a68a](e76a68a782))
### chore
* fix example ([9cd9a87](9cd9a874f9))
### Test
* Add scrape_graph test ([cdb3c11](cdb3c1100e))
* Add smart_scraper_multi_parse_merge_first_graph test ([464b8b0](464b8b04ea))
### CI
* **release:** 1.26.6-beta.1 [skip ci] ([e0fc457](e0fc457d1a))
* **release:** 1.27.0-beta.1 [skip ci] ([9266a36](9266a36b2e))
* **release:** 1.27.0-beta.10 [skip ci] ([eee131e](eee131e959))
* **release:** 1.27.0-beta.2 [skip ci] ([d84d295](d84d295389))
* **release:** 1.27.0-beta.3 [skip ci] ([f576afa](f576afaf0c))
* **release:** 1.27.0-beta.4 [skip ci] ([3d6bbcd](3d6bbcdaa3))
* **release:** 1.27.0-beta.5 [skip ci] ([5002c71](5002c713d5))
* **release:** 1.27.0-beta.6 [skip ci] ([94b9836](94b9836ef6))
* **release:** 1.27.0-beta.7 [skip ci] ([407f1ce](407f1ce4eb))
* **release:** 1.27.0-beta.8 [skip ci] ([4f1ed93](4f1ed939e6))
* **release:** 1.27.0-beta.9 [skip ci] ([fd57cc7](fd57cc7c12))
2024-10-26 08:06:36 +00:00
Marco Vinciguerra
b7d5a20ae0
Merge pull request #764 from ScrapeGraphAI/pre/beta
2024-10-26 10:05:15 +02:00
semantic-release-bot
eee131e959
ci(release): 1.27.0-beta.10 [skip ci]
## [1.27.0-beta.10](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.9...v1.27.0-beta.10) (2024-10-25)
### Bug Fixes
* fix export function ([c8a000f](c8a000f1d9))
2024-10-25 06:45:23 +00:00
Marco Vinciguerra
f9c1432342
Merge pull request #767 from ScrapeGraphAI/fix-export-function
2024-10-25 08:43:40 +02:00
semantic-release-bot
fd57cc7c12
ci(release): 1.27.0-beta.9 [skip ci]
## [1.27.0-beta.9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.8...v1.27.0-beta.9) (2024-10-24)
### Features
* add model integration gpt4 ([51c55eb](51c55eb3a2))
2024-10-24 22:39:44 +00:00
Marco Vinciguerra
9e5e76abbb
Merge pull request #765 from ScrapeGraphAI/add-model-integration-for-images
feat: add model integration gpt4
2024-10-25 00:38:16 +02:00
Marco Vinciguerra
4cd5ef296e
add docstring files
2024-10-24 15:28:27 +02:00
Marco Vinciguerra
6179ab99a4
Update data_export.py
2024-10-24 15:20:36 +02:00
Marco Vinciguerra
c8a000f1d9
fix: fix export function
2024-10-24 10:11:36 +02:00
Marco Vinciguerra
51c55eb3a2
feat: add model integration gpt4
2024-10-24 09:10:51 +02:00
semantic-release-bot
4f1ed939e6
ci(release): 1.27.0-beta.8 [skip ci]
## [1.27.0-beta.8](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.7...v1.27.0-beta.8) (2024-10-24)
### Bug Fixes
* removed tokenizer ([a184716](a18471688f))
### CI
* **release:** 1.26.7 [skip ci] ([ec9ef2b](ec9ef2bcda))
2024-10-24 06:55:58 +00:00
Marco Vinciguerra
066e77dbe7
Merge branch 'main' into pre/beta
2024-10-24 08:54:17 +02:00
semantic-release-bot
407f1ce4eb
ci(release): 1.27.0-beta.7 [skip ci]
## [1.27.0-beta.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.6...v1.27.0-beta.7) (2024-10-24)
### Features
* refactoring of get_probable_tags node ([f658092](f658092dff))
2024-10-24 06:45:14 +00:00
Marco Vinciguerra
a1bd05da10
Merge pull request #763 from ScrapeGraphAI/refactoring-get-probable-tags
feat: refactoring of get_probable_tags node
2024-10-24 08:43:49 +02:00
Marco Vinciguerra
f658092dff
feat: refactoring of get_probable_tags node
2024-10-23 12:15:16 +02:00
semantic-release-bot
94b9836ef6
ci(release): 1.27.0-beta.6 [skip ci]
## [1.27.0-beta.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.5...v1.27.0-beta.6) (2024-10-23)
### Features
* add integration with scrape.do ([ae275ec](ae275ec5e8))
2024-10-23 10:09:36 +00:00
Marco Vinciguerra
ae275ec5e8
feat: add integration with scrape.do
2024-10-23 12:08:00 +02:00
semantic-release-bot
5002c713d5
ci(release): 1.27.0-beta.5 [skip ci]
## [1.27.0-beta.5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.4...v1.27.0-beta.5) (2024-10-22)
### Features
* refactoring of export functions ([0ea00c0](0ea00c078f))
2024-10-22 07:06:26 +00:00
Marco Vinciguerra
34d2964f08
Merge pull request #761 from ScrapeGraphAI/refactoring-export-functions
feat: refactoring of export functions
2024-10-22 09:04:57 +02:00
Marco Vinciguerra
11ae717623
add new doc
2024-10-21 11:16:29 +02:00
Marco Vinciguerra
0ea00c078f
feat: refactoring of export functions
2024-10-21 10:30:21 +02:00
semantic-release-bot
3d6bbcdaa3
ci(release): 1.27.0-beta.4 [skip ci]
## [1.27.0-beta.4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.3...v1.27.0-beta.4) (2024-10-21)
### Features
* refactoring of ScrapeGraph to SmartScraperLiteGraph ([52b6bf5](52b6bf5fb8))
2024-10-21 08:14:25 +00:00
Marco Vinciguerra
52b6bf5fb8
feat: refactoring of ScrapeGraph to SmartScraperLiteGraph
2024-10-21 10:12:53 +02:00
Marco Vinciguerra
b84883bfd1
add smartscraper lite
2024-10-21 09:39:17 +02:00
Marco Vinciguerra
2991ca8dd2
add examples smart scraper lite
2024-10-21 09:33:40 +02:00
semantic-release-bot
f576afaf0c
ci(release): 1.27.0-beta.3 [skip ci]
## [1.27.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.2...v1.27.0-beta.3) (2024-10-20)
### Features
* implement ScrapeGraph class for web scraping automation only ([612c644](612c644623))
* Implement SmartScraperMultiParseMergeFirstGraph class that scrapes a list of URLs, merges the content first, and finally generates answers to a given prompt. ([3e3e1b2](3e3e1b2f3a))
### Bug Fixes
* fix the example variable name ([69ff649](69ff649556))
### chore
* fix example ([9cd9a87](9cd9a874f9))
### Test
* Add scrape_graph test ([cdb3c11](cdb3c1100e))
* Add smart_scraper_multi_parse_merge_first_graph test ([464b8b0](464b8b04ea))
2024-10-20 08:15:19 +00:00
Marco Vinciguerra
ffa1067f0d
Merge pull request #756 from shenghongtw/pre/beta
The smart_scraper_multi_graph method is too expensive
2024-10-20 10:13:47 +02:00
Marco Vinciguerra
b912904313
Merge pull request #758 from ScrapeGraphAI/fix-together-ai
chore: fix example
2024-10-19 07:25:57 +02:00
semantic-release-bot
ec9ef2bcda
ci(release): 1.26.7 [skip ci]
## [1.26.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.26.6...v1.26.7) (2024-10-19)
### Bug Fixes
* removed tokenizer ([a184716](a18471688f))
2024-10-19 05:20:39 +00:00
Marco Vinciguerra
a18471688f
fix: removed tokenizer
2024-10-19 07:18:56 +02:00
Federico Aguzzi
9cd9a874f9
chore: fix example
Committing even though this is not the bug we were looking for
2024-10-18 22:35:33 +02:00
semantic-release-bot
d84d295389
ci(release): 1.27.0-beta.2 [skip ci]
## [1.27.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.27.0-beta.1...v1.27.0-beta.2) (2024-10-18)
### Bug Fixes
* refactoring of gpt2 tokenizer ([44c3f9c](44c3f9c989))
### CI
* **release:** 1.26.6 [skip ci] ([a4634c7](a4634c7331))
2024-10-18 20:18:25 +00:00
Federico Aguzzi
8cb9646a45
Merge branch 'main' into pre/beta
2024-10-18 22:16:39 +02:00
Marco Vinciguerra
58b11334d3
Merge branch 'main' of https://github.com/ScrapeGraphAI/Scrapegraph-ai
2024-10-18 17:11:36 +02:00
Marco Vinciguerra
3f71f103a7
scrape do key added
2024-10-18 17:11:33 +02:00
semantic-release-bot
a4634c7331
ci(release): 1.26.6 [skip ci]
## [1.26.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.26.5...v1.26.6) (2024-10-18)
### Bug Fixes
* refactoring of gpt2 tokenizer ([44c3f9c](44c3f9c989))
2024-10-18 07:00:26 +00:00
Marco Vinciguerra
44c3f9c989
fix: refactoring of gpt2 tokenizer
2024-10-18 08:58:53 +02:00
Marco Vinciguerra
bde1e0fbad
Merge pull request #757 from yusefes/fix-tokenizer-loading
Fix tokenizer loading for GPT2
2024-10-18 08:57:42 +02:00