# 🤖 YOSO-ai: You Only Scrape Once

YOSO-ai is an **Open Source** Python library that uses LLMs and LangChain for faster, more efficient web scraping. Describe the information you want to extract and the library will do it for you.

Official documentation page: [yoso-ai.readthedocs.io](https://yoso-ai.readthedocs.io/)

# 🔍 Demo

Try out YOSO-ai in your browser:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/VinciGit00/YOSO-ai)

# 🔧 Quick Setup

Follow these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/VinciGit00/yoso-ai.git
   ```

2. (Optional) Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source ./venv/bin/activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   # if you want to install it as a library
   pip install .

   # or, if you plan on developing new features, it is best to also install the extra dependencies using
   pip install -r requirements-dev.txt
   # if you want to install it as a library
   pip install ".[dev]"
   ```

4. Create your personal OpenAI API key [here](https://platform.openai.com/api-keys).

5. (Optional) Create a `.env` file inside the main project folder and paste the API key into it:

   ```config
   API_KEY="your openai.com api key"
   ```

6. You are ready to go! 🚀

7. Try running the examples using:

   ```bash
   python -m examples.html_scraping
   # or if you are outside of the project folder
   python -m yoso-ai.examples.html_scraping
   ```

# 📖 Examples

### Case 1: Passing a URL

```python
import os
from dotenv import load_dotenv
from yosoai import _get_function, send_request

load_dotenv()

def main():
    # Get OpenAI API key from environment variables
    openai_key = os.getenv("API_KEY")
    if not openai_key:
        print("Error: OpenAI API key not found in environment variables.")
        return

    # Example values for the request
    request_settings = [
        {
            "title": "title_news",
            "type": "str",
            "description": "Give me the name of the news"
        }
    ]

    # Choose the desired model and other parameters
    selected_model = "gpt-3.5-turbo"
    temperature_value = 0.7

    # Mockup World URL
    mockup_world_url = "https://sport.sky.it/nba?gr=www"

    # Invoke send_request function
    result = send_request(openai_key, _get_function(mockup_world_url), request_settings,
                          selected_model, temperature_value, 'cl100k_base')

    # Print or process the result as needed
    print("Result:", result)

if __name__ == "__main__":
    main()
```

### Case 2: Passing your own HTML code

```python
import os
from dotenv import load_dotenv
from yosoai import send_request

load_dotenv()

# Example using HTML code
query_info = '''
    Given this code extract all the information in a json format about the news.
    Booker show with 52 points, whoever has the most games over 50
    Standings
    The Suns' No. 1 dominated the match won in New Orleans, scoring 52 points. It's about...
    ...
    Games with 50+ points: Booker in Top-20
'''

def main():
    # Get OpenAI API key from environment variables
    openai_key = os.getenv("API_KEY")
    if not openai_key:
        print("Error: OpenAI API key not found in environment variables.")
        return

    # Example values for the request
    request_settings = [
        {
            "title": "title",
            "type": "str",
            "description": "Title of the news"
        }
    ]

    # Choose the desired model and other parameters
    selected_model = "gpt-3.5-turbo"
    temperature_value = 0.7

    # Invoke send_request function
    result = send_request(openai_key, query_info, request_settings,
                          selected_model, temperature_value, 'cl100k_base')

    # Print or process the result as needed
    print("Result:", result)

if __name__ == "__main__":
    main()
```

Note: all the models are listed at [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models). Make sure your API key has access to the model you select.

# Example of output

Given the following input

```python
[
    {
        "title": "title",
        "type": "str",
        "description": "Title of the news"
    }
]
```

and using the website [https://sport.sky.it/nba?gr=www](https://sport.sky.it/nba?gr=www) as input, the output is a dict like the following:

```python
{
    'title': 'Booker show with 52 points, whoever has the most games over 50'
}
```

# Developed by

Vincios, Lurenss, PeriniLab