Scrapegraph-ai/scrapegraphai/utils
Umut CAN 827f7260ad
This commit focuses on optimizing the utility modules in the codebase for better performance and maintainability. Key improvements include:
- More efficient HTML processing with combined regex operations and optimized tag handling
- Enhanced deep copy functionality with better type handling and optimized recursion
- Refactored web search with improved error handling and modular helper functions
The changes maintain all existing functionality while improving code quality, performance, and maintainability. Documentation and type hints have been enhanced throughout.
Optimize utils modules for better performance and maintainability

- Improve HTML cleanup and minification:
  - Combine regex operations for better performance
  - Add better error handling for HTML processing
  - Optimize tag removal and attribute filtering
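
A minimal sketch of what "combined regex operations" could look like, using only the standard `re` module. The function name `sketch_minify_html` and the exact patterns are assumptions for illustration; they are not taken from `cleanup_html.py`.

```python
import re

# One alternation removes comments plus <script> and <style> blocks in a
# single pass over the input, instead of three separate substitutions.
_STRIP_RE = re.compile(
    r"<!--.*?-->|<(script|style)\b[^>]*>.*?</\1\s*>",
    flags=re.DOTALL | re.IGNORECASE,
)
_WHITESPACE_RE = re.compile(r"\s+")
_BETWEEN_TAGS_RE = re.compile(r">\s+<")


def sketch_minify_html(html: str) -> str:
    """Hypothetical helper: strip non-content blocks, then collapse whitespace."""
    if not isinstance(html, str):
        raise TypeError("html must be a string")
    stripped = _STRIP_RE.sub("", html)
    collapsed = _WHITESPACE_RE.sub(" ", stripped)
    # Drop the whitespace that remains between adjacent tags.
    return _BETWEEN_TAGS_RE.sub("><", collapsed).strip()
```

Compiling the patterns once at module level avoids recompiling them on every call. Regex-based stripping is only an approximation of HTML parsing and can misfire on unusual markup.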

- Enhance deep copy functionality:
  - Add special case handling for primitive types
  - Improve type checking and error handling
  - Optimize recursive copying for collections
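
The deep copy points above can be illustrated roughly as follows. This is a sketch under assumptions: the name `sketch_deepcopy` and the choice of which types count as primitive are hypothetical and are not the code in `copy.py`.

```python
import copy
from typing import Any

# Immutable built-ins can be returned as-is; copying them is wasted work.
_PRIMITIVES = (int, float, str, bool, bytes, type(None))


def sketch_deepcopy(obj: Any) -> Any:
    """Hypothetical helper: fast paths for primitives and built-in collections."""
    if isinstance(obj, _PRIMITIVES):
        return obj
    # Note: subclasses (e.g. OrderedDict, namedtuple) come back as the plain
    # built-in type in this sketch, and cyclic structures are not handled.
    if isinstance(obj, dict):
        return {sketch_deepcopy(k): sketch_deepcopy(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sketch_deepcopy(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(sketch_deepcopy(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return type(obj)(sketch_deepcopy(item) for item in obj)
    try:
        # Everything else falls back to the standard library.
        return copy.deepcopy(obj)
    except (TypeError, ValueError, RecursionError) as exc:
        raise TypeError(
            f"cannot deep copy object of type {type(obj).__name__}"
        ) from exc
```

The fallback to `copy.deepcopy` keeps behavior for arbitrary objects, while the explicit branches skip its memo bookkeeping for the common JSON-like case.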

- Refactor web search functionality:
  - Add input validation and error handling
  - Split search logic into separate helper functions
  - Improve proxy handling and configuration
  - Add better timeout and error management
  - Optimize URL filtering and processing
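
One way the refactoring above might be organized is sketched below. Every name here (`validate_inputs`, `format_proxy`, `filter_urls`, `sketch_search`, and the injected `backend` callable) is hypothetical, as are the proxy dict keys and the PDF filter; the real `research_web.py` may be structured differently. The backend is passed in so the sketch runs without network access.

```python
from typing import Callable, Dict, List, Optional, Union
from urllib.parse import urlparse

ProxyConfig = Union[str, Dict[str, str], None]
# Assumed backend signature: (query, max_results, proxy_url) -> list of URLs.
Backend = Callable[[str, int, Optional[str]], List[str]]


def validate_inputs(query: str, max_results: int) -> str:
    """Reject empty queries and non-positive result counts early."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(max_results, int) or max_results < 1:
        raise ValueError("max_results must be a positive integer")
    return query.strip()


def format_proxy(proxy: ProxyConfig) -> Optional[str]:
    """Turn an assumed {'server', 'username', 'password'} dict into a URL."""
    if proxy is None or isinstance(proxy, str):
        return proxy
    server = proxy.get("server")
    if not server:
        raise ValueError("proxy dict needs a 'server' entry")
    user, password = proxy.get("username"), proxy.get("password")
    if user and password:
        return f"http://{user}:{password}@{server}"
    return f"http://{server}"


def filter_urls(urls: List[str]) -> List[str]:
    """Keep well-formed http(s) URLs, skip PDFs, drop duplicates in order."""
    seen, kept = set(), []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if parsed.path.lower().endswith(".pdf") or url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept


def sketch_search(query: str, backend: Backend, max_results: int = 10,
                  proxy: ProxyConfig = None) -> List[str]:
    """Validate, delegate to the backend, then filter and truncate."""
    clean_query = validate_inputs(query, max_results)
    try:
        raw = backend(clean_query, max_results, format_proxy(proxy))
    except TimeoutError as exc:
        raise TimeoutError(f"search timed out for {clean_query!r}") from exc
    return filter_urls(raw)[:max_results]
```

Keeping validation, proxy formatting, and URL filtering in separate functions lets each be tested on its own, which is the kind of separation of concerns the commit message describes.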

Technical improvements:
- Better type hints and documentation
- More efficient data structures
- Improved error handling and validation
- Reduced code duplication
- Better separation of concerns

No breaking changes - all existing functionality maintained
2024-10-28 22:40:32 +03:00
screenshot_scraping removed unused imports and comments + removed dead code 2024-09-23 09:25:13 +02:00
tokenizers fix: removed tokenizer 2024-10-19 07:18:56 +02:00
__init__.py feat: refactoring of export functions 2024-10-21 10:30:21 +02:00
cleanup_code.py update readme and readability of the code 2024-10-02 10:07:03 +02:00
cleanup_html.py Optimize utils modules for better performance and maintainability 2024-10-28 22:40:32 +03:00
code_error_analysis.py fix: async invocation 2024-10-13 11:30:39 +02:00
code_error_correction.py fix: async invocation 2024-10-13 11:30:39 +02:00
convert_to_md.py refactoring of the code 2024-09-15 11:20:08 +02:00
copy.py Optimize utils modules for better performance and maintainability 2024-10-28 22:40:32 +03:00
custom_callback.py removed unused imports and comments + removed dead code 2024-09-23 09:25:13 +02:00
data_export.py Update data_export.py 2024-10-24 15:20:36 +02:00
dict_content_compare.py removed unused files 2024-10-12 09:41:02 +02:00
llm_callback_manager.py refactoring of examples 2024-10-08 08:54:18 +02:00
logging.py removed unused imports and comments + removed dead code 2024-09-23 09:25:13 +02:00
model_costs.py removed unused files 2024-10-12 09:41:02 +02:00
output_parser.py Update output_parser.py 2024-09-28 11:05:09 +02:00
parse_state_keys.py refactoring of the code 2024-09-15 11:20:08 +02:00
prettify_exec_info.py refactoring of the code 2024-09-28 09:02:20 +02:00
proxy_rotation.py removed unused files 2024-10-12 09:41:02 +02:00
research_web.py Optimize utils modules for better performance and maintainability 2024-10-28 22:40:32 +03:00
save_audio_from_bytes.py refactoring of the code 2024-09-15 11:20:08 +02:00
save_code_to_file.py fix: node refiner + examples 2024-09-25 14:53:51 +02:00
schema_trasform.py refactoring of examples 2024-10-08 08:54:18 +02:00
split_text_into_chunks.py removed unused files 2024-10-12 09:41:02 +02:00
sys_dynamic_import.py removed unused files 2024-10-12 09:41:02 +02:00
tokenizer.py fix: removed tokenizer 2024-10-19 07:18:56 +02:00