Merge pull request #39 from pkiczko/update-18

Update 18
This commit is contained in:
Asabeneh 2020-05-16 13:46:41 +03:00 committed by GitHub
commit 6796a6e68b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -19,52 +19,62 @@
![30DaysOfPython](../images/30DaysOfPython_banner3@2x.png)
- [📘 Day 18](#%f0%9f%93%98-day-18)
- [Regular Expression](#regular-expression)
- [Import re module](#import-re-module)
- [re functions](#re-functions)
- [Regular Expressions](#regular-expression)
- [The *re* Module](#The-re-module)
- [Functions in *re* Module](#functions-in-re-module)
- [Match](#match)
- [Search](#search)
- [Searching all matches using findall](#searching-all-matches-using-findall)
- [Replacing a substring](#replacing-a-substring)
- [Spliting text using RegEx split](#spliting-text-using-regex-split)
- [Writing RegEx pattern](#writing-regex-pattern)
- [Square Bracket](#square-bracket)
- [Searching for All Matches Using *findall*](#searching-for-all-matches-using-findall)
- [Replacing a Substring](#replacing-a-Substring)
- [Splitting Text Using RegEx Split](#splitting-text-using-regex-split)
- [Writing RegEx Patterns](#writing-regex-patterns)
- [Square Brackets](#square-brackets)
- [Escape character(\\) in RegEx](#escape-character-in-regex)
- [One or more times(+)](#one-or-more-times)
- [Period(.)](#period)
- [Zero or more times(*)](#zero-or-more-times)
- [Zero or more times(\*)](#zero-or-more-times)
- [Zero or one times(?)](#zero-or-one-times)
- [Quantifier in RegEx](#quantifier-in-regex)
- [Quantifiers in RegEx](#quantifiers-in-regex)
- [Cart ^](#cart)
- [💻 Exercises: Day 18](#%f0%9f%92%bb-exercises-day-18)
# 📘 Day 18
## Regular Expression
A regular expression or RegEx is a small programming language that helps to find pattern in data. A RegEx can be used to check if some pattern exists in a different data type. To use RegEx in python first we should import the RegEx module which is *re*.
### Import re module
## Regular Expressions
A regular expression or RegEx is a special text string that helps to find patterns in data. A RegEx can be used to check if some pattern exists in a different data type. To use RegEx in python first we should import the RegEx module which is called *re*.
### The *re* Module
After importing the module we can use it to detect or find patterns.
```py
import re
```
### re functions
To find a pattern we use different set of *re* functions that allows to search a string for match.
* *re.match()*:searches only in the beginning of the first line of the string and return match object if found, else return none.
* *re.search*:Returns a Match object if there is a match anywhere in the string including or in multiline string.
* *re.findall*:Returns a list containing all matches
* *re.split*: Returns a list where the string has been split at each match
* *re.sub*: Replaces one or many matches with a string
### Functions in *re* Module
To find a pattern we use different set of *re* character sets that allows to search for a match in a string.
* *re.match()*: searches only in the beginning of the first line of the string and returns matched objects if found, else returns none.
* *re.search*: Returns a match object if there is one anywhere in the string, including multiline strings.
* *re.findall*: Returns a list containing all matches
* *re.split*: Takes a string, splits it at the match points, returns a list
* *re.sub*: Replaces one or many matches within a string
#### Match
```py
# syntac
re.match(substring, string, re.I)
# substring is a string or a pattern, string is the text we look for a pattern , re.I is case ignore
```
```py
txt = 'I love to teach python or javaScript'
# It return an object with span, and match
import re
txt = 'I love to teach python and javaScript'
# It returns an object with span, and match
match = re.match('I love to teach', txt, re.I)
print(match) # <re.Match object; span=(0, 15), match='I love to teach'>
# We can get the starting and ending position of the match as tuple using span
@ -76,19 +86,23 @@ print(start, end) # 0, 15
substring = txt[start:end]
print(substring) # I love to teach
```
As you can see from the above example, the pattern we are looking for or the substring *I love to teach* is the beginning of the text. The match function only returns an object if the text starts with the pattern.
As you can see from the example above, the pattern we are looking for (or the substring we are looking for) is *I love to teach*. The match function returns an object **only** if the text starts with the pattern.
#### Search
```py
# syntax
re.match(substring, string, re.I)
# substring is a pattern, string is the text we look for a pattern , re.I is case ignore flag
```
```py
txt = '''Python is the most beautiful language that a human begin has ever created.
import re
txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''
# It return an object with span, and match
# It returns an object with span and match
match = re.search('first', txt, re.I)
print(match) # <re.Match object; span=(100, 105), match='first'>
# We can get the starting and ending position of the match as tuple using span
@ -100,13 +114,15 @@ print(start, end) # 100 105
substring = txt[start:end]
print(substring) # first
```
As you can see search is much better than match because it can look for the pattern through out the text. Search return returns a match object right way a first match found. A much better *re* function is *findall*. This function check the pattern through the string and returns all the matches as a list.
#### Searching all matches using findall
As you can see, search is much better than match because it can look for the pattern throughout the text. Search returns a match object with a first match that was found, otherwise it returns _None_. A much better *re* function is *findall*. This function checks for the pattern through the whole string and returns all the matches as a list.
#### Searching for All Matches Using *findall*
*findall()* returns all the matches as a list
```py
txt = '''Python is the most beautiful language that a human begin has ever created.
txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''
# It return a list
@ -114,11 +130,12 @@ matches = re.findall('language', txt, re.I)
print(matches) # ['language', 'language']
```
As you can see, the word language found two times in the string. Let's practice more
Let's look for the word both Python and python in the string
As you can see, the word language was found two times in the string. Let's practice some more.
Now we will look for both Python and python words in the string:
```py
txt = '''Python is the most beautiful language that a human begin has ever created.
txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''
# It returns list
@ -126,9 +143,11 @@ matches = re.findall('python', txt, re.I)
print(matches) # ['Python', 'python']
```
Since we are using *re.I* both lowercase and uppercase are included but if we don't have the flag, we write our pattern differently. Let's see that
Since we are using *re.I* both lowercase and uppercase letters are included. If we don't have that flag, then we will have to write our pattern differently. Let's check it out:
```py
txt = '''Python is the most beautiful language that a human begin has ever created.
txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''
matches = re.findall('Python|python', txt)
@ -139,49 +158,60 @@ matches = re.findall('[Pp]ython', txt)
print(matches) # ['Python', 'python']
```
#### Replacing a substring
#### Replacing a Substring
```py
txt = '''Python is the most beautiful language that a human begin has ever created.
txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''
match_replaced = re.sub('Python|python', 'JavaScript', txt, re.I)
print(match_replaced) # JavaScript is the most beautiful language that a human begin has ever created.
print(match_replaced) # JavaScript is the most beautiful language that a human being has ever created.
# OR
match_replaced = re.sub('[Pp]ython', 'JavaScript', txt, re.I)
print(match_replaced) # JavaScript is the most beautiful language that a human begin has ever created.
print(match_replaced) # JavaScript is the most beautiful language that a human being has ever created.
```
Let's add one more example, the following string is really hard to read unless we remove the % symbol. Replacing the % with a empty string will clean the text.
Let's add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text.
```py
txt = '''%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing.
T%he%re i%s n%o%th%ing as m%ore r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing p%e%o%ple.
T%he%re i%s n%o%th%ing as r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing p%e%o%ple.
I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs.
D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher.'''
D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher?'''
matches = re.sub('%', '', txt)
print(matches) # ['Python', 'python']
print(matches)
```
```sh
I am teacher and I love teaching.
There is nothing as more rewarding as educating and empowering people.
There is nothing as rewarding as educating and empowering people.
I found teaching more interesting than any other jobs.
Does this motivate you to be a teacher.
Does this motivate you to be a teacher?
```
## Spliting text using RegEx split
## Splitting Text Using RegEx Split
```py
txt = '''I am teacher and I love teaching.
There is nothing as more rewarding as educating and empowering people.
There is nothing as rewarding as educating and empowering people.
I found teaching more interesting than any other jobs.
Does this motivate you to be a teacher.'''
print(re.split('\n', txt))
Does this motivate you to be a teacher?'''
print(re.split('\n', txt)) # splitting using \n - end of line symbol
```
```sh
['I am teacher and I love teaching.', 'There is nothing as more rewarding as educating and empowering people.', 'I found teaching more interesting than any other jobs.', 'Does this motivate you to be a teacher.']
['I am teacher and I love teaching.', 'There is nothing as rewarding as educating and empowering people.', 'I found teaching more interesting than any other jobs.', 'Does this motivate you to be a teacher?']
```
## Writing RegEx pattern
## Writing RegEx Patterns
To declare a string variable we use a single or double quote. To declare RegEx variable *r''*.
The following pattern only identifies apple with lowercase, to make it case insensitive either we should rewrite our pattern or we should add a flag.
```py
import re
regex_pattern = r'apple'
txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
matches = re.findall(regex_pattern, txt)
@ -190,7 +220,7 @@ print(matches) # ['apple']
# To make case insensitive adding flag '
matches = re.findall(regex_pattern, txt, re.I)
print(matches) # ['Apple', 'apple']
# or we can use set of characters method
# or we can use a set of characters method
regex_pattern = r'[Aa]pple' # this mean the first letter could be Apple or apple
matches = re.findall(regex_pattern, txt)
print(matches) # ['Apple', 'apple']
@ -198,71 +228,75 @@ print(matches) # ['Apple', 'apple']
```
* []: A set of characters
* [a-c] means, a or b or c
* [a-z] means, any letter a to z
* [A-Z] means, any character A to Z
* [a-z] means, any letter from a to z
* [A-Z] means, any character from A to Z
* [0-3] means, 0 or 1 or 2 or 3
* [0-9] means any number 0 to 9
* [A-Za-z0-9] any character which is a to z, A to Z, 0 to 9
* [0-9] means any number from 0 to 9
* [A-Za-z0-9] any single character, that is a to z, A to Z or 0 to 9
* \\: uses to escape special characters
* \d mean:match where the string contains digits (numbers from 0-9)
* \D mean: match where the string does not contain digits
* \d means: match where the string contains digits (numbers from 0-9)
* \D means: match where the string does not contain digits
* . : any character except new line character(\n)
* ^: starts with
* r'^substring' eg r'^love', a sentence which starts with a word love
* r'[^abc] mean not a, not b, not c.
* r'^substring' eg r'^love', a sentence that starts with a word love
* r'[^abc] means not a, not b, not c.
* $: ends with
* r'substring$' eg r'love$', sentence ends with a word love
* r'substring$' eg r'love$', sentence that ends with a word love
* *: zero or more times
* r'[a]*' means a optional or it can be occur many times.
* r'[a]*' means a optional or it can occur many times.
* +: one or more times
* r'[a]+' mean at least once or more times
* ?: zero or one times
* r'[a]?' mean zero times or once
* r'[a]+' means at least once (or more)
* ?: zero or one time
* r'[a]?' means zero times or once
* {3}: Exactly 3 characters
* {3,}: At least 3 character
* {3,}: At least 3 characters
* {3,8}: 3 to 8 characters
* |: Either or
* r'apple|banana' mean either of an apple or a banana
* r'apple|banana' means either apple or a banana
* (): Capture and group
![Regular Expression cheat sheet](../images/regex.png)
Let's use example to clarify the above meta characters
Let's use examples to clarify the meta characters above
### Square Bracket
Let's use square bracket to include lower and upper case
```py
regex_pattern = r'[Aa]pple' # this square bracket mean either A or a
txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
matches = re.findall(regex_pattern, txt)
print(matches) # ['Apple', 'apple']
```
If we want to look for the banana, we write the pattern as follows:
```py
regex_pattern = r'[Aa]pple|[Bb]anana' # this square bracket mean either A or a
regex_pattern = r'[Aa]pple|[Bb]anana' # this square bracket means either A or a
txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
matches = re.findall(regex_pattern, txt)
print(matches) # ['Apple', 'banana', 'apple', 'banana']
```
Using the square bracket and or operator , we manage to extract Apple, apple, Banana and banana.
### Escape character(\\) in RegEx
```py
regex_pattern = r'\d' # d is a special character which means digits
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
matches = re.findall(regex_pattern, txt)
print(matches) # ['6', '2', '0', '1', '9'], this is not what we want
regex_pattern = r'\d+' # d is a special character which means digits, + mean one or more
txt = 'This regular expression example was made in December 6, 2019.'
matches = re.findall(regex_pattern, txt)
print(matches) # ['6', '2019']
```
### One or more times(+)
```py
regex_pattern = r'\d+' # d is a special character which means digits, + mean one or more times
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
matches = re.findall(regex_pattern, txt)
print(matches) # ['6', '2019']
print(matches) # ['6', '2019'] - now, this is better!
```
### Period(.)
@ -277,60 +311,74 @@ matches = re.findall(regex_pattern, txt)
print(matches) # ['and banana are fruits']
```
### Zero or more times(*)
### Zero or more times(\*)
Zero or many times. The pattern could may not occur or it can occur many times.
```py
regex_pattern = r'[a].*' # . any character, + any character one or more times
regex_pattern = r'[a].*' # . any character, * any character zero or more times
txt = '''Apple and banana are fruits'''
matches = re.findall(regex_pattern, txt)
print(matches) # ['and banana are fruits']
```
### Zero or one times(?)
Zero or one times. The pattern could may not occur or it may occur once.
### Zero or one time(?)
Zero or one time. The pattern may not occur or it may occur once.
```py
txt = '''I am not sure if there is a convention how to write the word e-mail.
Some people write it email others may write it as Email or E-mail.'''
regex_pattern = r'[Ee]-?mail' # ? means optional
regex_pattern = r'[Ee]-?mail' # ? means here that '-' is optional
matches = re.findall(regex_pattern, txt)
print(matches) # ['e-mail', 'email', 'Email', 'E-mail']
```
### Quantifier in RegEx
We can specify the length of the substring we look for in a text, using a curly bracket. Lets imagine, we are interested in substring that their length are 4 characters
We can specify the length of the substring we are looking for in a text, using a curly bracket. Lets imagine, we are interested in a substring with a length of 4 characters:
```py
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'\d{4}' # exactly four times
matches = re.findall(regex_pattern, txt)
print(matches) # ['2019']
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'\d{1, 4}' # 1 to 4
matches = re.findall(regex_pattern, txt)
print(matches) # ['6', '2019']
```
### Cart ^
* Starts with
```py
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'^This' # ^ means starts with
matches = re.findall(regex_pattern, txt)
print(matches) # ['This']
```
* Negation
```py
txt = 'This regular expression example was made in December 6, 2019.'
txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'[^A-Za-z ]+' # ^ in set character means negation, not A to Z, not a to z, no space
matches = re.findall(regex_pattern, txt)
print(matches) # ['e-mail', 'email', 'Email', 'E-mail']
print(matches) # ['6,', '2019.']
```
## 💻 Exercises: Day 18
1. What is the most frequent word in the following paragraph ?
1. What is the most frequent word in the following paragraph?
```py
paragraph = 'I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.
```
@ -358,20 +406,26 @@ print(matches) # ['e-mail', 'email', 'Email', 'E-mail']
(1, 'Python'),
(1, 'If')]
```
2. The position of some particles on the horizontal x-axis -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers and find the distance between the two furthest particles.
2. The position of some particles on the horizontal x-axis -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles.
```py
points = ['-1', '2', '-4', '-3', '-1', '0', '4', '8']
sorted_points = [-4, -3, -1, -1, 0, 2, 4, 8]
distance = 12
```
3. Write a pattern which identify if a string is a valid python variable
3. Write a pattern which identifies if a string is a valid python variable
```sh
is_valid_variable('first_name') # True
is_valid_variable('first-name') # False
is_valid_variable('1first_name') # False
is_valid_variable('firstname') # True
```
4. Clean the following text. After cleaning, count three most frequent words in the string.
```py
sentence = '''%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?'''