Mirror of https://github.com/Asabeneh/30-Days-Of-Python.git, commit 6796a6e68b (synced 2026-06-12 21:01:48 +08:00)

- [📘 Day 18](#%f0%9f%93%98-day-18)
- [Regular Expressions](#regular-expressions)
- [The *re* Module](#the-re-module)
- [Functions in *re* Module](#functions-in-re-module)
- [Match](#match)
- [Search](#search)
- [Searching for All Matches Using *findall*](#searching-for-all-matches-using-findall)
- [Replacing a Substring](#replacing-a-substring)
- [Splitting Text Using RegEx Split](#splitting-text-using-regex-split)
- [Writing RegEx Patterns](#writing-regex-patterns)
- [Square Brackets](#square-brackets)
- [Escape character(\\) in RegEx](#escape-character-in-regex)
- [One or more times(+)](#one-or-more-times)
- [Period(.)](#period)
- [Zero or more times(\*)](#zero-or-more-times)
- [Zero or one time(?)](#zero-or-one-time)
- [Quantifiers in RegEx](#quantifiers-in-regex)
- [Caret ^](#caret-)
- [💻 Exercises: Day 18](#%f0%9f%92%bb-exercises-day-18)

# 📘 Day 18

## Regular Expressions

A regular expression, or RegEx, is a special text string that helps to find patterns in data. A RegEx can be used to check if some pattern exists in a string. To use RegEx in Python, we first need to import the RegEx module, which is called *re*.

### The *re* Module

After importing the module, we can use it to detect or find patterns.

```py
import re
```

### Functions in *re* Module

To find a pattern, we use a set of *re* functions that search a string for a match.

* *re.match()*: searches only at the beginning of the string and returns a match object if found, otherwise returns None.
* *re.search()*: returns a match object for the first match found anywhere in the string, including multiline strings, otherwise returns None.
* *re.findall()*: returns a list containing all matches
* *re.split()*: takes a string, splits it at the match points and returns a list
* *re.sub()*: replaces one or many matches within a string

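The five functions above can be compared side by side on one short string. Here is a small sketch. The sample text is made up for illustration, and the outputs in the comments come from running these exact calls:

```py
import re

txt = 'I love Python. Python loves me.'

starts = re.match('I love', txt)          # matches, because txt starts with 'I love'
not_at_start = re.match('Python', txt)    # None, because txt does not start with 'Python'
first = re.search('Python', txt)          # the first 'Python' anywhere in the string
every = re.findall('Python', txt)         # all non-overlapping matches
parts = re.split(r'\. ?', txt)            # split at '.' optionally followed by a space
replaced = re.sub('Python', 'JavaScript', txt)

print(starts)        # <re.Match object; span=(0, 6), match='I love'>
print(not_at_start)  # None
print(first.span())  # (7, 13)
print(every)         # ['Python', 'Python']
print(parts)         # ['I love Python', 'Python loves me', '']
print(replaced)      # I love JavaScript. JavaScript loves me.
```
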
#### Match

```py
# syntax
re.match(substring, string, re.I)
# substring is a string or a pattern, string is the text we look for the pattern in, re.I is the case-ignore flag
```

```py
import re

txt = 'I love to teach python and javaScript'
# It returns an object with span and match
match = re.match('I love to teach', txt, re.I)
print(match)  # <re.Match object; span=(0, 15), match='I love to teach'>
# We can get the starting and ending position of the match as a tuple using span
span = match.span()
print(span)  # (0, 15)
start, end = span
print(start, end)  # 0 15
substring = txt[start:end]
print(substring)  # I love to teach
```

As you can see from the example above, the pattern we are looking for (or the substring we are looking for) is *I love to teach*. The match function returns an object **only** if the text starts with the pattern.

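To see the "only if the text starts with the pattern" rule in action, here is a small sketch that reuses the same sentence. The text contains *python*, but not at its start, so *re.match* returns None:

```py
import re

txt = 'I love to teach python and javaScript'

at_start = re.match('I love', txt)    # txt starts with the pattern
in_middle = re.match('python', txt)   # 'python' is in txt, but not at the start

print(at_start)   # <re.Match object; span=(0, 6), match='I love'>
print(in_middle)  # None

# Checking for None before using the result avoids an AttributeError
if in_middle is None:
    print('python is not at the beginning of the text')
```
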
#### Search

```py
# syntax
re.search(substring, string, re.I)
# substring is a pattern, string is the text we look for the pattern in, re.I is the case-ignore flag
```

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

# It returns an object with span and match
match = re.search('first', txt, re.I)
print(match)  # <re.Match object; span=(100, 105), match='first'>
# We can get the starting and ending position of the match as a tuple using span
span = match.span()
print(span)  # (100, 105)
start, end = span
print(start, end)  # 100 105
substring = txt[start:end]
print(substring)  # first
```

As you can see, search is much better than match because it can look for the pattern throughout the text. Search returns a match object for the first match it finds, otherwise it returns _None_. When we need every occurrence, a better *re* function is *findall*. This function checks for the pattern through the whole string and returns all the matches as a list.

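The following sketch contrasts the two functions on the same text. *re.match* gives up because the text does not start with *first*. *re.search* finds it, reports only the first of the two *language* occurrences, and returns None when nothing matches:

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

not_at_start = re.match('first', txt)         # match only looks at the beginning
anywhere = re.search('first', txt)            # search scans the whole text
first_language = re.search('language', txt)   # only the first of the two occurrences
missing = re.search('JavaScript', txt)        # no match anywhere

print(not_at_start)           # None
print(anywhere)               # <re.Match object; span=(100, 105), match='first'>
print(first_language.span())  # (29, 37)
print(missing)                # None
```
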
#### Searching for All Matches Using *findall*

*findall()* returns all the matches as a list.

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

# It returns a list
matches = re.findall('language', txt, re.I)
print(matches)  # ['language', 'language']
```

As you can see, the word language was found two times in the string. Let's practice some more.

Now we will look for both Python and python in the string:

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

# It returns a list
matches = re.findall('python', txt, re.I)
print(matches)  # ['Python', 'python']
```

Since we are using *re.I*, both lowercase and uppercase letters are included. Without that flag, we have to write our pattern differently. Let's check it out:

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

matches = re.findall('Python|python', txt)
print(matches)  # ['Python', 'python']

# or, equivalently, using a set of characters
matches = re.findall('[Pp]ython', txt)
print(matches)  # ['Python', 'python']
```

#### Replacing a Substring

```py
import re

txt = '''Python is the most beautiful language that a human being has ever created.
I recommend python for a first programming language'''

# The pattern already covers Python and python, so no flag is needed
match_replaced = re.sub('Python|python', 'JavaScript', txt)
print(match_replaced)
# JavaScript is the most beautiful language that a human being has ever created.
# I recommend JavaScript for a first programming language

# OR
match_replaced = re.sub('[Pp]ython', 'JavaScript', txt)
print(match_replaced)  # same output as above
```

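One pitfall to watch for: the fourth positional parameter of *re.sub* is *count*, the maximum number of replacements, not the flags. Because re.I has the integer value 2, writing re.sub(pattern, repl, txt, re.I) caps the result at two replacements instead of ignoring case. Passing the arguments by keyword avoids this. The test string below is made up:

```py
import re

txt = 'python, Python, PYTHON, pYtHoN'

# flags passed by keyword: every case variant is replaced
all_replaced = re.sub('python', 'JS', txt, flags=re.I)
print(all_replaced)  # JS, JS, JS, JS

# count passed by keyword: only the first two matches are replaced
first_two = re.sub('python', 'JS', txt, count=2, flags=re.I)
print(first_two)     # JS, JS, PYTHON, pYtHoN
```
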
Let's add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text.

```py
import re

txt = '''%I a%m te%%a%%che%r% a%n%d %%I l%o%ve te%ach%ing.
T%he%re i%s n%o%th%ing as r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing p%e%o%ple.
I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs.
D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher?'''

clean_txt = re.sub('%', '', txt)  # replace every % with an empty string
print(clean_txt)
```

```sh
I am teacher and I love teaching.
There is nothing as rewarding as educating and empowering people.
I found teaching more interesting than any other jobs.
Does this motivate you to be a teacher?
```

## Splitting Text Using RegEx Split

```py
import re

txt = '''I am teacher and I love teaching.
There is nothing as rewarding as educating and empowering people.
I found teaching more interesting than any other jobs.
Does this motivate you to be a teacher?'''
print(re.split('\n', txt))  # splitting using \n - end of line symbol
```

```sh
['I am teacher and I love teaching.', 'There is nothing as rewarding as educating and empowering people.', 'I found teaching more interesting than any other jobs.', 'Does this motivate you to be a teacher?']
```

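Because the separator is a pattern, *re.split* can split at several different characters in one call, which plain str.split cannot do. The sketch below uses a made-up string and splits at any run of commas, semicolons or spaces. The optional maxsplit argument limits how many splits are made:

```py
import re

txt = 'apple, banana;orange  mango'

# [,; ]+ means: one or more commas, semicolons or spaces
fruits = re.split(r'[,; ]+', txt)
print(fruits)  # ['apple', 'banana', 'orange', 'mango']

# maxsplit=1 stops after the first split
head_and_rest = re.split(r'[,; ]+', txt, maxsplit=1)
print(head_and_rest)  # ['apple', 'banana;orange  mango']
```
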
## Writing RegEx Patterns

To declare a string variable we use single or double quotes. To declare a RegEx pattern, we usually write a raw string, *r''*, so that backslashes reach the RegEx engine unchanged.

The following pattern only identifies apple in lowercase. To make it case insensitive, we should either rewrite our pattern or add a flag.

```py
import re

regex_pattern = r'apple'
txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.'
matches = re.findall(regex_pattern, txt)
print(matches)  # ['apple']

# To make the search case insensitive, add the re.I flag
matches = re.findall(regex_pattern, txt, re.I)
print(matches)  # ['Apple', 'apple']

# or we can use a set of characters
regex_pattern = r'[Aa]pple'  # this means the first letter can be A or a
matches = re.findall(regex_pattern, txt)
print(matches)  # ['Apple', 'apple']
```

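Here is why the *r''* prefix matters. In a normal string, Python interprets some backslash sequences before the RegEx engine ever sees them. For example, '\b' becomes a single backspace character, while r'\b' reaches the engine intact and means a word boundary. The small sketch below shows the difference:

```py
import re

txt = 'I love Python'

print(len('\b'), len(r'\b'))  # 1 2: '\b' is one backspace character, r'\b' is two characters

# r'\bPython\b' matches Python as a whole word
with_raw = re.findall(r'\bPython\b', txt)
print(with_raw)     # ['Python']

# '\bPython\b' (no r prefix) looks for backspace characters, so nothing matches
without_raw = re.findall('\bPython\b', txt)
print(without_raw)  # []

# Without a raw string, the backslash itself must be doubled
print(r'\d' == '\\d')  # True
```
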
* []: A set of characters
* [a-c] means a or b or c
* [a-z] means any letter from a to z
* [A-Z] means any character from A to Z
* [0-3] means 0 or 1 or 2 or 3
* [0-9] means any number from 0 to 9
* [A-Za-z0-9] any single character, that is a to z, A to Z or 0 to 9
* \\: used to escape special characters
* \d means: match where the string contains digits (numbers from 0-9)
* \D means: match where the string does not contain digits
* . : any character except new line character(\n)
* ^: starts with
* r'^substring' e.g. r'^love', a sentence that starts with the word love
* r'[^abc]' means not a, not b, not c.
* $: ends with
* r'substring$' e.g. r'love$', a sentence that ends with the word love
* *: zero or more times
* r'[a]*' means a is optional or it can occur many times.
* +: one or more times
* r'[a]+' means at least once (or more)
* ?: zero or one time
* r'[a]?' means zero times or once
* {3}: Exactly 3 times, e.g. r'\d{3}' means exactly 3 digits
* {3,}: At least 3 times
* {3,8}: 3 to 8 times
* |: Either or
* r'apple|banana' means either apple or banana
* (): Capture and group

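The list names () for capturing and grouping, but no example below covers it. Here is a small sketch; the date layout and the second sentence are made up for illustration. Parentheses group part of a pattern. *group(n)* on a match object returns what the n-th group captured, and with two or more groups *findall* returns tuples of the captured parts:

```py
import re

txt = 'This regular expression example was made on December 6, 2019.'

# ([A-Z][a-z]+) captures the month, (\d{1,2}) the day and (\d{4}) the year
match = re.search(r'([A-Z][a-z]+) (\d{1,2}), (\d{4})', txt)
print(match.group())   # December 6, 2019 (the whole match)
print(match.group(1))  # December
print(match.groups())  # ('December', '6', '2019')

# With two or more groups, findall returns a list of tuples
dates = re.findall(r'([A-Z][a-z]+) (\d{1,2})', 'Born on March 3 and married on June 21')
print(dates)  # [('March', '3'), ('June', '21')]
```
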


Let's use examples to clarify the meta characters above.

### Square Brackets

Let's use square brackets to include both lower and upper case:

```py
import re

regex_pattern = r'[Aa]pple'  # this square bracket means either A or a
txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.'
matches = re.findall(regex_pattern, txt)
print(matches)  # ['Apple', 'apple']
```

If we also want to look for banana, we write the pattern as follows:

```py
import re

regex_pattern = r'[Aa]pple|[Bb]anana'  # [Aa] means A or a, [Bb] means B or b, | means or
txt = 'Apple and banana are fruits. An old cliche says an apple a day keeps the doctor away has been replaced by a banana a day keeps the doctor far far away.'
matches = re.findall(regex_pattern, txt)
print(matches)  # ['Apple', 'banana', 'apple', 'banana']
```

Using square brackets and the or operator (|), we manage to extract Apple, apple, Banana and banana.

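Square brackets also accept ranges such as [a-z], [A-Z] and [0-9] from the metacharacter list. Here is a short sketch with a made-up string of order codes:

```py
import re

txt = 'Order codes: A12, b7, C305 and x9.'

# [A-Z][0-9]+ : one uppercase letter followed by one or more digits
upper_codes = re.findall(r'[A-Z][0-9]+', txt)
print(upper_codes)  # ['A12', 'C305']

# [A-Za-z][0-9]+ : a letter of either case followed by one or more digits
all_codes = re.findall(r'[A-Za-z][0-9]+', txt)
print(all_codes)    # ['A12', 'b7', 'C305', 'x9']

# [0-3] : a single digit from 0 to 3
low_digits = re.findall(r'[0-3]', txt)
print(low_digits)   # ['1', '2', '3', '0']
```
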
### Escape character(\\) in RegEx

```py
import re

regex_pattern = r'\d'  # \d is a special sequence which means a digit
txt = 'This regular expression example was made on December 6, 2019.'
matches = re.findall(regex_pattern, txt)
print(matches)  # ['6', '2', '0', '1', '9'], this is not what we want

regex_pattern = r'\d+'  # \d means a digit, + means one or more
matches = re.findall(regex_pattern, txt)
print(matches)  # ['6', '2019']
```

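The backslash can also turn a special character back into a literal one. A plain . means any character, so matching an actual period takes \. instead. The price sentence below is made up. *re.escape* from the *re* module adds the backslashes automatically:

```py
import re

txt = 'The book costs 12.50 dollars, the pen costs 3x50 cents.'

# . matches any character, so 3x50 is matched as well
loose = re.findall(r'\d+.\d+', txt)
print(loose)   # ['12.50', '3x50']

# \. matches only a literal period
strict = re.findall(r'\d+\.\d+', txt)
print(strict)  # ['12.50']

# re.escape() escapes the special characters for us
escaped = re.escape('12.50')
print(escaped)  # 12\.50
```
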
### One or more times(+)

```py
import re

regex_pattern = r'\d+'  # \d means a digit, + means one or more times
txt = 'This regular expression example was made on December 6, 2019.'
matches = re.findall(regex_pattern, txt)
print(matches)  # ['6', '2019'] - now, this is better!
```

### Period(.)

```py
import re

regex_pattern = r'[a].+'  # [a] means the letter a, . any character except a new line, + one or more times
txt = '''Apple and banana are fruits'''
matches = re.findall(regex_pattern, txt)
print(matches)  # ['and banana are fruits']
```

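On its own, . stands for exactly one character (any character except a new line). In this sketch on the same fruit sentence, r'[a].' finds every lowercase a together with the single character after it. The second call, on a made-up two-line string, shows that . does not cross a new line:

```py
import re

txt = '''Apple and banana are fruits'''

# [a]. : the letter a followed by exactly one character
pairs = re.findall(r'[a].', txt)
print(pairs)  # ['an', 'an', 'an', 'a ', 'ar']

# . does not match a new line, so each match stops at the end of its line
across_lines = re.findall(r'[a].+', 'banana\nbread')
print(across_lines)  # ['anana', 'ad']
```
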
### Zero or more times(\*)

Zero or more times. The pattern may not occur at all, or it can occur many times.

```py
import re

regex_pattern = r'[a].*'  # . any character, * any character zero or more times
txt = '''Apple and banana are fruits'''
matches = re.findall(regex_pattern, txt)
print(matches)  # ['and banana are fruits']
```

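The difference between * and + shows up when the repeated part is missing entirely. In the sketch below, which uses a made-up word list, r'ba*' accepts a b with no a at all, while r'ba+' needs at least one a:

```py
import re

txt = 'b ba baa baaa'

zero_or_more = re.findall(r'ba*', txt)  # b followed by zero or more a
print(zero_or_more)  # ['b', 'ba', 'baa', 'baaa']

one_or_more = re.findall(r'ba+', txt)   # b followed by one or more a
print(one_or_more)   # ['ba', 'baa', 'baaa']
```
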
### Zero or one time(?)

Zero or one time. The pattern may not occur or it may occur once.

```py
import re

txt = '''I am not sure if there is a convention for how to write the word e-mail.
Some people write it email, others may write it as Email or E-mail.'''
regex_pattern = r'[Ee]-?mail'  # ? means here that '-' is optional
matches = re.findall(regex_pattern, txt)
print(matches)  # ['e-mail', 'email', 'Email', 'E-mail']
```

### Quantifiers in RegEx

We can specify the length of the substring we are looking for in a text using curly brackets. Let's imagine we are interested in a substring with a length of 4 characters:

```py
import re

txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'\d{4}'  # exactly four times
matches = re.findall(regex_pattern, txt)
print(matches)  # ['2019']

regex_pattern = r'\d{1,4}'  # 1 to 4 times (no space inside the braces)
matches = re.findall(regex_pattern, txt)
print(matches)  # ['6', '2019']
```

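The metacharacter list also mentions open-ended and ranged forms such as {3,} and {3,8}. This sketch uses made-up sentences. It shows that {m,} picks out runs of at least m letters, and that {m,n} counts characters, not words, so a longer word is cut into pieces:

```py
import re

txt = 'Python is a lovely and powerful language'

# {6,} : at least 6 letters in a row
long_words = re.findall(r'[A-Za-z]{6,}', txt)
print(long_words)  # ['Python', 'lovely', 'powerful', 'language']

# {2,3} : 2 to 3 letters in a row; longer words get cut into pieces
short_runs = re.findall(r'[A-Za-z]{2,3}', 'is and lovely')
print(short_runs)  # ['is', 'and', 'lov', 'ely']
```
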
### Caret ^

* Starts with

```py
import re

txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'^This'  # ^ means starts with
matches = re.findall(regex_pattern, txt)
print(matches)  # ['This']
```

* Negation

```py
import re

txt = 'This regular expression example was made on December 6, 2019.'
regex_pattern = r'[^A-Za-z ]+'  # ^ inside a character set means negation: not A to Z, not a to z, not a space
matches = re.findall(regex_pattern, txt)
print(matches)  # ['6,', '2019.']
```

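The metacharacter list also names $ (ends with), which has no example of its own. Here is a sketch with made-up two-line text. By default, ^ and $ refer to the start and end of the whole string. With the re.M (multiline) flag, they also match at the start and end of every line:

```py
import re

txt = '''I love teaching
Teaching is what I love'''

# $ : ends with, checked against the end of the whole string
ends_love = re.findall(r'love$', txt)
ends_teaching = re.findall(r'teaching$', txt)
print(ends_love)      # ['love']
print(ends_teaching)  # []

# with re.M, $ also matches right before each new line
line_ends = re.findall(r'[a-z]+$', txt, re.M)
print(line_ends)      # ['teaching', 'love']

# with re.M, ^ also matches at the start of every line
line_starts = re.findall(r'^[A-Z][a-z]*', txt, re.M)
print(line_starts)    # ['I', 'Teaching']
```
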
## 💻 Exercises: Day 18

1. What is the most frequent word in the following paragraph?

```py
paragraph = 'I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.'
```

```py
[(6, 'love'),
(5, 'you'),
(3, 'can'),
(2, 'what'),
(2, 'teaching'),
(2, 'not'),
(2, 'else'),
(2, 'do'),
(2, 'I'),
(1, 'which'),
(1, 'to'),
(1, 'the'),
(1, 'something'),
(1, 'if'),
(1, 'give'),
(1, 'develop'),
(1, 'capabilities'),
(1, 'application'),
(1, 'an'),
(1, 'all'),
(1, 'Python'),
(1, 'If')]
```

2. The positions of some particles on the horizontal x-axis are -12, -4, -3 and -1 in the negative direction, 0 at the origin, and 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles.

```py
points = ['-12', '-4', '-3', '-1', '0', '4', '8']
sorted_points = [-12, -4, -3, -1, 0, 4, 8]
distance = 8 - (-12)  # 20
```

3. Write a pattern which identifies if a string is a valid python variable

```py
is_valid_variable('first_name') # True
is_valid_variable('first-name') # False
is_valid_variable('1first_name') # False
is_valid_variable('firstname') # True
```

4. Clean the following text. After cleaning, find the three most frequent words in the string.

```py
sentence = '''%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?'''
```