- Add timeout parameter to FetchNode (default: 30 seconds)
- Apply timeout to requests.get() calls to prevent indefinite hangs
- Implement timeout for PDF parsing using ThreadPoolExecutor
- Propagate timeout to ChromiumLoader via loader_kwargs
- Add comprehensive unit tests for timeout functionality
- Backward compatible with existing configs; set timeout to None to disable it
Fixes requests.get() calls and PDF parsing blocking indefinitely on
slow or unresponsive servers and on large documents.
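For reviewers, the ThreadPoolExecutor approach to bounding PDF parsing can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this change: `run_with_timeout` and `fake_pdf_parse` are hypothetical names, and the real FetchNode logic may be organized differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def run_with_timeout(func, timeout, *args):
    """Run func(*args) in a worker thread and wait at most `timeout` seconds.

    Hypothetical helper. timeout=None waits indefinitely, which matches the
    "timeout disabled" setting.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeoutError:
        # On Python 3.11+ this is the builtin TimeoutError; on older
        # versions it is a separate class, so convert it explicitly.
        raise TimeoutError(f"operation exceeded {timeout} seconds") from None
    finally:
        # Do not wait for a stuck worker. A thread cannot be killed, so it
        # keeps running in the background until func returns.
        executor.shutdown(wait=False)


def fake_pdf_parse(delay):
    """Hypothetical stand-in for a slow PDF parser."""
    time.sleep(delay)
    return "parsed"
```

One caveat of this design: the timeout only stops the caller from waiting. The worker thread itself continues until the parse finishes, so a very large document can still consume CPU and memory after the timeout fires.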
Usage:
node_config={'timeout': 30}    # Custom timeout
node_config={'timeout': None}  # Disable timeout
node_config={}                 # Use default 30s timeout
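The three configurations above differ only in whether the `timeout` key is present and whether its value is None. A minimal sketch of how that distinction could be resolved, assuming hypothetical names `DEFAULT_TIMEOUT` and `resolve_timeout` (the actual FetchNode code may differ):

```python
DEFAULT_TIMEOUT = 30  # seconds; the default stated in this change


def resolve_timeout(node_config):
    """Return the timeout in seconds, or None when the timeout is disabled.

    Hypothetical helper for illustration only.
    """
    config = node_config or {}
    # dict.get falls back to the default only when the key is absent,
    # so an explicit None is preserved and means "no timeout".
    return config.get("timeout", DEFAULT_TIMEOUT)
```

The resolved value can be passed straight to `requests.get(url, timeout=...)`, since requests treats `timeout=None` as "wait indefinitely".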