Scrapegraph-ai

Mirror of https://github.com/VinciGit00/Scrapegraph-ai.git, synced 2026-06-28 21:01:55 +08:00


Lorenzo Paleari 66ea166438 fix: Added support for nested structure	2024-09-13 04:18:53 +02:00
screenshot_scraping	fix(ScreenShotScraper): static import of optional dependencies	2024-09-04 15:00:30 +02:00
tokenizers	feat: refactoring of the tokenization function	2024-09-12 20:21:00 +02:00
__init__.py	feat: removed semchunk and used tikton	2024-09-10 14:03:52 +02:00
cleanup_html.py	refactoring of the code	2024-08-23 11:33:22 +02:00
convert_to_csv.py	refactoring of the code	2024-08-23 11:33:22 +02:00
convert_to_json.py	refactoring of the code	2024-08-23 11:33:22 +02:00
convert_to_md.py	refactoring of the code	2024-08-23 11:33:22 +02:00
copy.py	refactoring of the code	2024-09-11 16:04:43 +02:00
llm_output_parser.py	fix: Added support for nested structure	2024-09-13 04:18:53 +02:00
logging.py	refactoring of the code	2024-08-23 11:33:22 +02:00
parse_state_keys.py	refactoring of the code	2024-09-11 16:04:43 +02:00
prettify_exec_info.py	refactoring of the code	2024-09-11 16:04:43 +02:00
proxy_rotation.py	refactoring of the code	2024-08-10 17:44:35 +02:00
research_web.py	refactoring of the code	2024-09-11 16:04:43 +02:00
save_audio_from_bytes.py	refactoring of the code	2024-08-10 17:44:35 +02:00
split_text_into_chunks.py	use semchunk by default as the other code is causing tokenizers to be called for every individual word which is very slow especially with the mistral tokenizer	2024-09-12 08:46:52 +01:00
sys_dynamic_import.py	refactoring of the code	2024-09-11 16:04:43 +02:00
tokenizer.py	feat: refactoring of the tokenization function	2024-09-12 20:21:00 +02:00