Commit Graph

217 Commits

Author SHA1 Message Date
Marco Vinciguerra
1c21e5a836
Update README.md
2026-03-17 09:21:17 -07:00
octo-patch
6a2f8ecc7b feat: add MiniMax as a supported LLM provider
MiniMax provides an OpenAI-compatible API, making integration
straightforward. This adds:

- MiniMax model wrapper class (OpenAI-compatible)
- Model token mappings for MiniMax-M1, M2, and M2.5 models
- Provider routing in abstract_graph factory
- README update listing MiniMax as a supported provider
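
As a rough illustration of the provider routing mentioned in the list above, the sketch below dispatches on a `provider/model` string. The `PROVIDERS` registry, the `route_provider` helper, and the `minimax/MiniMax-M2.5` identifier format are assumptions made for illustration; they are not taken from the ScrapeGraphAI source.

```python
# Hypothetical sketch: none of these names come from the ScrapeGraphAI code.
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSpec:
    name: str
    openai_compatible: bool


# Assumed registry; a real factory would map providers to wrapper classes.
PROVIDERS = {
    "openai": ProviderSpec("openai", openai_compatible=True),
    "minimax": ProviderSpec("minimax", openai_compatible=True),
}


def route_provider(model_id: str) -> tuple[ProviderSpec, str]:
    """Split 'provider/model' and look the provider up in the registry."""
    provider, sep, model = model_id.partition("/")
    if not sep or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    try:
        return PROVIDERS[provider.lower()], model
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None


spec, model = route_provider("minimax/MiniMax-M2.5")
print(spec.name, model)  # minimax MiniMax-M2.5
```

Because the API is described as OpenAI-compatible, a wrapper of this kind could plausibly reuse an existing OpenAI client class and change only the endpoint and model names.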
2026-03-14 22:54:38 +08:00
Marco Vinciguerra
909a0c9873 add new readme
2026-01-08 16:11:41 +01:00
adrienpacifico
6ea2cbf7cc
Add format key to LLM configuration, solve bug.
The example does not work with the current configuration.
Adding the JSON format solves the issue for llama3.2.
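
A minimal sketch of the change described above, assuming the example keeps its settings in a `graph_config` dictionary (the name used in the traceback in this message) with the model settings under an `llm` key. The `ollama/llama3.2` model string and the `with_json_format` helper are illustrative assumptions, not copied from the commit.

```python
import copy


def with_json_format(graph_config: dict) -> dict:
    """Return a copy of the config with the LLM's output format set to JSON."""
    patched = copy.deepcopy(graph_config)
    patched.setdefault("llm", {})["format"] = "json"
    return patched


# Assumed shape of the example's configuration before the fix.
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",  # assumed model identifier
    },
    "verbose": True,
}

fixed_config = with_json_format(graph_config)
print(fixed_config["llm"])  # {'model': 'ollama/llama3.2', 'format': 'json'}
```

The patched dictionary would then be passed as `config=` to `SmartScraperGraph`, as in the call shown in the traceback.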


The example is also currently slow (1m26); small modifications give the same results in as little as 10 seconds on an M4 MacBook.

----
Error with scrapegraphai v1.71.0, langchain-ollama v1.0.1, and langchain v1.2.1:


```python
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/output_parsers/json.py:84, in JsonOutputParser.parse_result(self, result, partial)
     83 try:
---> 84     return parse_json_markdown(text)
     85 except JSONDecodeError as e:

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/utils/json.py:164, in parse_json_markdown(json_string, parser)
    163     json_str = json_string if match is None else match.group(2)
--> 164 return _parse_json(json_str, parser=parser)

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/utils/json.py:194, in _parse_json(json_str, parser)
    193 # Parse the JSON string into a Python dictionary
--> 194 return parser(json_str)

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/utils/json.py:137, in parse_partial_json(s, strict)
    134 # If we got here, we ran out of characters to remove
    135 # and still couldn't parse the string as JSON, so return the parse error
    136 # for the original string.
--> 137 return json.loads(s, strict=strict)

File ~/.local/share/uv/python/cpython-3.13.7-macos-aarch64-none/lib/python3.13/json/__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)

File ~/.local/share/uv/python/cpython-3.13.7-macos-aarch64-none/lib/python3.13/json/decoder.py:345, in JSONDecoder.decode(self, s, _w)
    341 """Return the Python representation of ``s`` (a ``str`` instance
    342 containing a JSON document).
    343 
    344 """
--> 345 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    346 end = _w(s, end).end()

File ~/.local/share/uv/python/cpython-3.13.7-macos-aarch64-none/lib/python3.13/json/decoder.py:363, in JSONDecoder.raw_decode(self, s, idx)
    362 except StopIteration as err:
--> 363     raise JSONDecodeError("Expecting value", s, err.value) from None
    364 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

OutputParserException                     Traceback (most recent call last)
Cell In[11], line 22
     15 smart_scraper_graph = SmartScraperGraph(
     16     prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
     17     source="https://scrapegraphai.com/",
     18     config=graph_config
     19 )
     21 # Run the pipeline
---> 22 result = smart_scraper_graph.run()
     24 import json
     25 print(json.dumps(result, indent=4))

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/graphs/smart_scraper_graph.py:303, in SmartScraperGraph.run(self)
    295 """
    296 Executes the scraping process and returns the answer to the prompt.
    297 
    298 Returns:
    299     str: The answer to the prompt.
    300 """
    302 inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 303 self.final_state, self.execution_info = self.graph.execute(inputs)
    305 return self.final_state.get("answer", "No answer found.")

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py:363, in BaseGraph.execute(self, initial_state)
    361     state, exec_info = (result["_state"], [])
    362 else:
--> 363     state, exec_info = self._execute_standard(initial_state)
    365 # Print the result first
    366 if "answer" in state:

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py:308, in BaseGraph._execute_standard(self, initial_state)
    295         graph_execution_time = time.time() - start_time
    296         log_graph_execution(
    297             graph_name=self.graph_name,
    298             source=source,
   (...)    306             exception=str(e),
    307         )
--> 308         raise e
    310 exec_info.append(
    311     {
    312         "node_name": "TOTAL RESULT",
   (...)    319     }
    320 )
    322 graph_execution_time = time.time() - start_time

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py:281, in BaseGraph._execute_standard(self, initial_state)
    278     schema = self._get_schema(current_node)
    280 try:
--> 281     result, node_exec_time, cb_data = self._execute_node(
    282         current_node, state, llm_model, llm_model_name
    283     )
    284     total_exec_time += node_exec_time
    286     if cb_data:

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py:205, in BaseGraph._execute_node(self, current_node, state, llm_model, llm_model_name)
    200 curr_time = time.time()
    202 with self.callback_manager.exclusive_get_callback(
    203     llm_model, llm_model_name
    204 ) as cb:
--> 205     result = current_node.execute(state)
    206     node_exec_time = time.time() - curr_time
    208     cb_data = None

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/nodes/generate_answer_node.py:193, in GenerateAnswerNode.execute(self, state)
    190     chain = chain | output_parser
    192 try:
--> 193     answer = self.invoke_with_timeout(
    194         chain, {"content": doc, "question": user_prompt}, self.timeout
    195     )
    196 except (Timeout, json.JSONDecodeError) as e:
    197     error_msg = (
    198         "Response timeout exceeded"
    199         if isinstance(e, Timeout)
    200         else "Invalid JSON response format"
    201     )

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/scrapegraphai/nodes/generate_answer_node.py:79, in GenerateAnswerNode.invoke_with_timeout(self, chain, inputs, timeout)
     77 try:
     78     start_time = time.time()
---> 79     response = chain.invoke(inputs)
     80     if time.time() - start_time > timeout:
     81         raise Timeout(f"Response took longer than {timeout} seconds")

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/runnables/base.py:3151, in RunnableSequence.invoke(self, input, config, **kwargs)
   3149                 input_ = context.run(step.invoke, input_, config, **kwargs)
   3150             else:
-> 3151                 input_ = context.run(step.invoke, input_, config)
   3152 # finish the root run
   3153 except BaseException as e:

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/output_parsers/base.py:201, in BaseOutputParser.invoke(self, input, config, **kwargs)
    193 @override
    194 def invoke(
    195     self,
   (...)    198     **kwargs: Any,
    199 ) -> T:
    200     if isinstance(input, BaseMessage):
--> 201         return self._call_with_config(
    202             lambda inner_input: self.parse_result(
    203                 [ChatGeneration(message=inner_input)]
    204             ),
    205             input,
    206             config,
    207             run_type="parser",
    208         )
    209     return self._call_with_config(
    210         lambda inner_input: self.parse_result([Generation(text=inner_input)]),
    211         input,
    212         config,
    213         run_type="parser",
    214     )

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/runnables/base.py:2058, in Runnable._call_with_config(self, func, input_, config, run_type, serialized, **kwargs)
   2054     child_config = patch_config(config, callbacks=run_manager.get_child())
   2055     with set_config_context(child_config) as context:
   2056         output = cast(
   2057             "Output",
-> 2058             context.run(
   2059                 call_func_with_variable_args,  # type: ignore[arg-type]
   2060                 func,
   2061                 input_,
   2062                 config,
   2063                 run_manager,
   2064                 **kwargs,
   2065             ),
   2066         )
   2067 except BaseException as e:
   2068     run_manager.on_chain_error(e)

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/runnables/config.py:435, in call_func_with_variable_args(func, input, config, run_manager, **kwargs)
    433 if run_manager is not None and accepts_run_manager(func):
    434     kwargs["run_manager"] = run_manager
--> 435 return func(input, **kwargs)

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/output_parsers/base.py:202, in BaseOutputParser.invoke.<locals>.<lambda>(inner_input)
    193 @override
    194 def invoke(
    195     self,
   (...)    198     **kwargs: Any,
    199 ) -> T:
    200     if isinstance(input, BaseMessage):
    201         return self._call_with_config(
--> 202             lambda inner_input: self.parse_result(
    203                 [ChatGeneration(message=inner_input)]
    204             ),
    205             input,
    206             config,
    207             run_type="parser",
    208         )
    209     return self._call_with_config(
    210         lambda inner_input: self.parse_result([Generation(text=inner_input)]),
    211         input,
    212         config,
    213         run_type="parser",
    214     )

File ~/Projects/weekend_projects/coffee/coffee_llm_crawl/.venv/lib/python3.13/site-packages/langchain_core/output_parsers/json.py:87, in JsonOutputParser.parse_result(self, result, partial)
     85 except JSONDecodeError as e:
     86     msg = f"Invalid json output: {text}"
---> 87     raise OutputParserException(msg, llm_output=text) from e

OutputParserException: Invalid json output: This text is a web page for the company ScrapeGraphAI, which provides a web scraping API. The page includes information about the product, its features, and customer testimonials. Here's a breakdown of the different sections:

1. **Introduction**: A brief introduction to ScrapeGraphAI, its mission, and its unique approach to web data extraction.
2. **Testimonials**: Quotes from satisfied customers who have used ScrapeGraphAI for their web scraping needs.
3. **Team**: Information about the team behind ScrapeGraphAI, including their backgrounds and expertise.
4. **Give your AI Agent superpowers with lightning-fast web data!**: A call to action to get started with ScrapeGraphAI's API, which promises to provide fast and reliable web data for AI agents.
5. **ScrapeGraphAI**: A link to the company's GitHub page, where users can find more information about the project.
6. **Contact Us**: Information on how to contact ScrapeGraphAI, including a contact email address.
7. **Legal Pages**: Links to the company's privacy policy, terms of service, and manifesto.

The text also includes several links to external websites, such as LinkedIn profiles, GitHub repositories, and Reddit communities, where users can find more information about ScrapeGraphAI and its community.
For troubleshooting, visit: https://docs.langchain.com/oss/python/langchain/errors/OUTPUT_PARSING_FAILURE 
```
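
The innermost error in the traceback can be reproduced with the standard library alone: `json.loads` rejects any string that does not begin with a JSON value, which is what happens when the model answers in prose. The sample string below is shortened from the output quoted above.

```python
import json

# Prose reply of the kind quoted in the OutputParserException above.
reply = "This text is a web page for the company ScrapeGraphAI."

try:
    json.loads(reply)
except json.JSONDecodeError as err:
    message = str(err)
    print(message)  # Expecting value: line 1 column 1 (char 0)

# A reply that is valid JSON parses without error.
parsed = json.loads('{"company": "ScrapeGraphAI"}')
print(parsed["company"])  # ScrapeGraphAI
```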
2026-01-07 16:12:56 +01:00
Marco Vinciguerra
2ef88261f3 Update README.md 2025-12-26 09:52:00 +01:00
Marco Vinciguerra
2b711b44d3 Update README.md 2025-12-24 08:31:18 +01:00
Marco Vinciguerra
da112dbe14 feat: add benchmark 2025-12-19 10:01:47 +01:00
Marco Vinciguerra
3dc648402b
Add download badge and linting badges to README
2025-11-21 14:17:46 -08:00
Marco Vinciguerra
32d5636ac3
Remove downloads badge from README
2025-11-17 11:31:56 -08:00
Lorenzo Padoan
739b05ac79 doc: 1$ banner
2025-10-04 16:52:23 +02:00
Marco Vinciguerra
c07b3c08cd feat: update pr 2025-08-13 11:49:30 +02:00
Marco Vinciguerra
72b43b3b09
Update README.md 2025-08-13 11:22:23 +02:00
alecontuIT
c2abb9fd5d docs: removed duplicated line 2025-08-04 21:03:56 +02:00
Marco Vinciguerra
2dc6b9bff2
feat: update doc 2025-07-03 11:50:31 +02:00
Marco Vinciguerra
939e170eb6 feat: update the readme 2025-06-24 17:30:34 +02:00
Marco Vinciguerra
288c69a862 feat: removed sponsors 2025-06-24 09:48:31 +02:00
neo
07dec35f1b
docs: add links to other language versions of README
Added language selection links to the README for easier access to translated versions: German, Spanish, French, and Portuguese.
2025-06-16 16:00:57 +08:00
Marco Vinciguerra
1d1e4db94b
Update README.md 2025-06-16 09:47:51 +02:00
Marco Vinciguerra
2a73821cf1
Update README.md
2025-06-09 13:58:05 +02:00
Marco Vinciguerra
3322f9d32a
Update README.md 2025-06-06 20:02:05 +02:00
Lorenzo Padoan
68bb34cc5e chore: enhanced a readme
2025-06-04 17:08:39 +02:00
Lorenzo Padoan
c23e3b7abc feat: enhanced readme call to action 2025-06-04 17:04:12 +02:00
Marco Vinciguerra
d560070e63
Update README.md
2025-05-26 17:18:02 +02:00
Marco Vinciguerra
a1cfa98e89
Update README.md 2025-05-22 10:59:26 +02:00
Stelios Georgiou
5e60d49721
Update README.md - spellfix 2025-05-15 14:52:12 +01:00
Marco Vinciguerra
1fa0d4ffe9
Update README.md 2025-05-15 12:42:57 +02:00
Marco Vinciguerra
0b4b27e550 feat: add minor fixes 2025-05-13 17:23:40 +02:00
Marco Vinciguerra
0c34b76d53 update logo
2025-04-07 10:09:09 +02:00
Marco Vinciguerra
f5a369e2a3 Update README.md 2025-03-27 17:26:21 +01:00
Marco Vinciguerra
c085d6c7ff feat: add new logo 2025-03-27 17:25:03 +01:00
Marco Vinciguerra
ae60e2b8bf feat: add scrapeless logo 2025-03-27 17:23:10 +01:00
Marco Vinciguerra
ff7b33b376 feat: update terms 2025-03-10 11:27:33 +01:00
Marco Vinciguerra
dc0a138a7e run pre commit 2025-01-12 16:35:31 +01:00
Marco Perini
aa6a76e5bd
docs: added first ollama example
2025-01-12 14:11:52 +01:00
Marco Perini
02022cc5db
docs: code quality badge update
2025-01-06 19:03:19 +01:00
Marco Vinciguerra
ca94b39aba
Update README.md 2025-01-06 17:19:50 +01:00
PeriniM
5cdf0550fe chore: made some libs optional 2025-01-06 03:42:45 +01:00
PeriniM
f6009d1abf fix: better playwright installation handling 2025-01-06 02:01:17 +01:00
Marco Perini
96064f20ee
docs: fixed missing import
2025-01-02 15:20:49 +01:00
Marco Vinciguerra
b312251cc5 fix: revert 2025-01-02 14:49:46 +01:00
Marco Perini
fe89ae29e6
docs: updated documentation reference
2024-12-27 15:11:32 +01:00
Marco Perini
67038e1952
docs: added api reference 🔗 2024-12-16 16:07:43 +01:00
Marco Vinciguerra
2c4d06aea2 Update README.md 2024-12-14 12:02:53 +01:00
Marco Vinciguerra
bae92b0dcc Update README.md 2024-12-10 16:30:21 +01:00
Marco Vinciguerra
fbb131ce15 Update README.md 2024-12-10 15:16:21 +01:00
Marco Vinciguerra
14e28a9c15 Update README.md 2024-12-06 10:40:13 +01:00
Marco Vinciguerra
25e47d8398 Update README.md 2024-12-06 10:39:52 +01:00
Lorenzo Padoan
fde878fbe9
Update README.md
add website
2024-12-02 15:16:30 +01:00
Marco Vinciguerra
98cf5f1f83
Merge branch 'pre/beta' into main 2024-11-19 07:58:28 +01:00
Lorenzo Padoan
b400cc5a66
Update README.md
2024-11-07 16:02:19 +01:00