Scrapegraph-ai/scrapegraphai/utils
2024-09-13 04:18:53 +02:00
screenshot_scraping fix(ScreenShotScraper): static import of optional dependencies 2024-09-04 15:00:30 +02:00
tokenizers feat: refactoring of the tokenization function 2024-09-12 20:21:00 +02:00
__init__.py feat: removed semchunk and used tiktoken 2024-09-10 14:03:52 +02:00
cleanup_html.py refactoring of the code 2024-08-23 11:33:22 +02:00
convert_to_csv.py refactoring of the code 2024-08-23 11:33:22 +02:00
convert_to_json.py refactoring of the code 2024-08-23 11:33:22 +02:00
convert_to_md.py refactoring of the code 2024-08-23 11:33:22 +02:00
copy.py refactoring of the code 2024-09-11 16:04:43 +02:00
llm_output_parser.py fix: Added support for nested structure 2024-09-13 04:18:53 +02:00
logging.py refactoring of the code 2024-08-23 11:33:22 +02:00
parse_state_keys.py refactoring of the code 2024-09-11 16:04:43 +02:00
prettify_exec_info.py refactoring of the code 2024-09-11 16:04:43 +02:00
proxy_rotation.py refactoring of the code 2024-08-10 17:44:35 +02:00
research_web.py refactoring of the code 2024-09-11 16:04:43 +02:00
save_audio_from_bytes.py refactoring of the code 2024-08-10 17:44:35 +02:00
split_text_into_chunks.py use semchunk by default: the other code calls the tokenizer for every individual word, which is very slow, especially with the Mistral tokenizer 2024-09-12 08:46:52 +01:00
sys_dynamic_import.py refactoring of the code 2024-09-11 16:04:43 +02:00
tokenizer.py feat: refactoring of the tokenization function 2024-09-12 20:21:00 +02:00