From a8e1c3c5d356146d2002edf2066c8a2fbc6ce700 Mon Sep 17 00:00:00 2001 From: Asabeneh Date: Wed, 11 Dec 2019 00:37:06 +0200 Subject: [PATCH] day 22 --- readme.md | 2 +- readme10-12.md | 5 ++- readme13-15.md | 4 +- readme16-18.md | 4 +- readme19-21.md | 12 ++++-- readme22-24.md | 100 +++++++++++++++++++++++++++++++++++++++++++++++++ readme4-6.md | 4 +- readme7-9.md | 4 +- 8 files changed, 121 insertions(+), 14 deletions(-) create mode 100644 readme22-24.md diff --git a/readme.md b/readme.md index e51e744..0d6fd8b 100644 --- a/readme.md +++ b/readme.md @@ -7,7 +7,7 @@ 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) 🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 diff --git a/readme10-12.md b/readme10-12.md index fcd1b97..c1c26c7 100644 --- a/readme10-12.md +++ b/readme10-12.md @@ -6,11 +6,12 @@ 🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) -🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 +--- --- - [📘 Day 10](#%f0%9f%93%98-day-10) diff --git a/readme13-15.md b/readme13-15.md index 66d378c..c206713 100644 
--- a/readme13-15.md +++ b/readme13-15.md @@ -6,8 +6,8 @@ 🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) -🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 diff --git a/readme16-18.md b/readme16-18.md index dfc9b20..91bb6de 100644 --- a/readme16-18.md +++ b/readme16-18.md @@ -6,8 +6,8 @@ 🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) -🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 diff --git a/readme19-21.md b/readme19-21.md index 1428ea4..cc7b146 100644 --- a/readme19-21.md +++ b/readme19-21.md @@ -7,10 +7,12 @@ 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) 🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: 
Day 22 - 24](#) 🔒 +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 +--- + - [📘 Day 19](#%f0%9f%93%98-day-19) - [File handling](#file-handling) - [Opening File for reading](#opening-file-for-reading) @@ -49,7 +51,7 @@ - [Method to modify class default values](#method-to-modify-class-default-values) - [Inheritance](#inheritance) - [Overriding parent method](#overriding-parent-method) - - [💻 Exercises: Day 20](#%f0%9f%92%bb-exercises-day-20) + - [💻 Exercises: Day 21](#%f0%9f%92%bb-exercises-day-21) # 📘 Day 19 @@ -1184,7 +1186,7 @@ Lidiya Teklemariam is 28 year old. She lives in Espoo, Finland. We can use super() function or the parent name Person to automatically inherit the methods and properties from its parent. In the above example, we override the parant method. The child method has a different feature, it can identify if the gender is male or female and assign the proper pronoun(He/She). -## 💻 Exercises: Day 20 +## 💻 Exercises: Day 21 1. Python has the module called _statistics_ and we can use this module to do all the statistical caluculations. Hower to challlenge ourselves, let's try to develop a program which calculate measure of central tendency of a sample(mean, median, mode) and measure of variability(range, variance, standard deviation). In addition to those measure, find the min, max, count and frequency distribution of the sample. Check the output below. 
@@ -1220,3 +1222,7 @@ Variance: 17.5 Standard Deviation: 4.2 Frequency Distribution: [(20.0, 26), (16.0, 27), (12.0, 32), (8.0, 37), (8.0, 34), (8.0, 33), (8.0, 31), (8.0, 24), (4.0, 38), (4.0, 29), (4.0, 25)] ``` + +[<< Part 6 ](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) | [Part 8 >>](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) + +--- \ No newline at end of file diff --git a/readme22-24.md b/readme22-24.md new file mode 100644 index 0000000..face945 --- /dev/null +++ b/readme22-24.md @@ -0,0 +1,100 @@ +![30DaysOfPython](./images/30DaysOfPython_banner3@2x.png) + +🧳 [Part 1: Day 1 - 3](https://github.com/Asabeneh/30-Days-Of-Python) +🧳 [Part 2: Day 4 - 6](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme4-6.md) +🧳 [Part 3: Day 7 - 9](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme7-9.md) +🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) +🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) +🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) +🧳 [Part 9: Day 25 - 27](#) 🔒 +🧳 [Part 10: Day 28 - 30](#) 🔒 + +--- +- [📘 Day 22](#%f0%9f%93%98-day-22) + - [Python Web Scraping](#python-web-scraping) + - [What is web scraping](#what-is-web-scraping) + - [💻 Exercises: Day 22](#%f0%9f%92%bb-exercises-day-22) +# 📘 Day 22 + +## Python Web Scraping + +### What is web scraping + +The internet holds a huge amount of data that can be used for many purposes. To collect this data we need to know how to scrape data from a website. 
+ +Web scraping is the process of extracting and collecting data from websites and storing the data on a local machine or in a database. + +In this section, we will use the beautifulsoup and requests packages to scrape data. The version of beautifulsoup we are using is beautifulsoup 4. + +To start scraping you need _requests_, _beautifulsoup4_ and a _website_ to scrape. + +```sh +pip install requests +pip install beautifulsoup4 +``` + +Scraping data from a website requires a basic understanding of HTML tags and CSS selectors. We target content on a page using an HTML tag, a class or an id. +Let's import the requests and BeautifulSoup modules + +```py +import requests +from bs4 import BeautifulSoup +``` + +Let's declare a url variable for the website we are going to scrape. + +```py + +import requests +from bs4 import BeautifulSoup +url = 'http://mlr.cs.umass.edu/ml/datasets.html' + +# Let's use the requests get method to fetch the data from url + +response = requests.get(url) +# let's check the status +status = response.status_code +print(status) # 200 means the fetching was successful +``` + +```sh +200 +``` + +Using BeautifulSoup to parse content from the page + +```py +import requests +from bs4 import BeautifulSoup +url = 'http://mlr.cs.umass.edu/ml/datasets.html' + +response = requests.get(url) +content = response.content # we get all the content from the website +soup = BeautifulSoup(content, 'html.parser') # BeautifulSoup parses the HTML so we can query it +print(soup.title) # <title>UCI Machine Learning Repository: Data Sets</title> +print(soup.title.get_text()) # UCI Machine Learning Repository: Data Sets +print(soup.body) # gives the whole body of the page +# print(soup.body) +print(response.status_code) + +tables = soup.find_all('table', {'cellpadding':'3'}) +# We are targeting the table by its cellpadding attribute and that attribute's value +# We can select using id, class or HTML tag, for more information check the beautifulsoup doc +table = tables[0] # the 
result is a list, we take its first element +for td in table.find('tr').find_all('td'): +    print(td.text) +``` +If you run the above code, you can see that the extraction is only half done. Finishing it is part of exercise 1. +For reference, check the [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start) + + +## 💻 Exercises: Day 22 +1. Extract the table at this url (http://mlr.cs.umass.edu/ml/datasets.html) and save it as a JSON file +2. Scrape the presidents table and store the data as JSON (https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States) + + +[<< Part 7 ](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) | [Part 9 >>](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme25-27.md) + +--- \ No newline at end of file diff --git a/readme4-6.md b/readme4-6.md index 0498119..a52b8b5 100644 --- a/readme4-6.md +++ b/readme4-6.md @@ -6,8 +6,8 @@ 🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 - 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) -🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒 diff --git a/readme7-9.md b/readme7-9.md index ac9189b..21e041e 100644 --- a/readme7-9.md +++ b/readme7-9.md @@ -6,8 +6,8 @@ 🧳 [Part 4: Day 10 - 12](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme10-12.md) 🧳 [Part 5: Day 13 - 15](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme13-15.md) 🧳 [Part 6: Day 16 
- 18](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme16-18.md) -🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) -🧳 [Part 8: Day 22 - 24](#) 🔒 +🧳 [Part 7: Day 19 - 21](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme19-21.md) +🧳 [Part 8: Day 22 - 24](https://github.com/Asabeneh/30-Days-Of-Python/blob/master/readme22-24.md) 🧳 [Part 9: Day 25 - 27](#) 🔒 🧳 [Part 10: Day 28 - 30](#) 🔒