Exercise Answers

Intro Exercises

These exercises are all in the intro.py file in the exercises directory. Edit the appropriate function in that file to complete each exercise. To run the tests, from the exercises folder, type python test.py <function_name>, like this:

$ python test.py has_vowel

Hint

Match objects are always “truthy” and None is always “falsey”. Truthy means when you convert something to a boolean, it’ll be True.

You can convert the result of re.search to a boolean to get True or False for a match or non-match like this:

>>> bool(re.search(r"hello", sentence))
True
>>> bool(re.search(r"hi", sentence))
False

Has Vowels

Create a function has_vowel that accepts a string and returns True if the string contains a vowel (a, e, i, o, or u), or False otherwise.

Tip

Modify the has_vowel function in the intro module.

Your function should work like this:

>>> has_vowel("rhythm")
False
>>> has_vowel("exit")
True

Answer

def has_vowel(string):
    """Return True iff the string contains one or more vowels."""
    return bool(re.search(r"[aeiou]", string))

Is Integer

Create a function is_integer that accepts a string and returns True if the string represents an integer.

By our definition, an integer:

  • Consists of 1 or more digits

  • May optionally begin with -

  • Does not contain any other non-digit characters.

Tip

Modify the is_integer function in the intro module.

Your function should work like this:

>>> is_integer("")
False
>>> is_integer(" 5")
False
>>> is_integer("5000")
True
>>> is_integer("-999")
True
>>> is_integer("+999")
False
>>> is_integer("00")
True
>>> is_integer("0.0")
False

Answer

def is_integer(string):
    """Return True if the string represents a valid integer."""
    return bool(re.search(r"^-?\d+$", string))
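
A possible variation, not part of the original answer: re.fullmatch anchors both ends of the pattern for us. It also treats a trailing newline differently, because $ in the search-based pattern is allowed to match just before a final newline. The name is_integer_strict is made up for this sketch.

```python
import re


def is_integer_strict(string):
    """Return True if the whole string is an optional - followed by digits."""
    # fullmatch succeeds only if the pattern consumes the entire string
    return bool(re.fullmatch(r"-?\d+", string))


# The search-based pattern accepts a trailing newline; fullmatch does not
print(bool(re.search(r"^-?\d+$", "5\n")))  # True
print(is_integer_strict("5\n"))            # False
print(is_integer_strict("-999"))           # True
```

Whether a trailing newline should count as valid depends on where the input comes from, so treat this as an option rather than a correction.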

Is Fraction

Create a function is_fraction that accepts a string and returns True if the string represents a fraction.

By our definition a fraction consists of:

  1. An optional - character

  2. Followed by 1 or more digits

  3. Followed by a /

  4. Followed by 1 or more digits, at least one of which is non-zero (the denominator cannot be the number 0).

Tip

Modify the is_fraction function in the intro module.

Your function should work like this:

>>> is_fraction("")
False
>>> is_fraction("5000")
False
>>> is_fraction("-999/1")
True
>>> is_fraction("+999/1")
False
>>> is_fraction("00/1")
True
>>> is_fraction("/5")
False
>>> is_fraction("5/0")
False
>>> is_fraction("5/010")
True
>>> is_fraction("5/105")
True
>>> is_fraction("5 / 1")
False

Answer

def is_fraction(string):
    """Return True iff the string represents a valid fraction."""
    return bool(re.search(r'^-?\d+/\d*[1-9]+\d*$', string))

Validation and Search Exercises

These exercises are all in the validation.py file in the exercises directory. Edit the appropriate function in that file to complete each exercise. To run the tests, from the exercises folder, type python test.py <function_name>, like this:

$ python test.py has_word

Has Word

Create a function has_word that accepts a single word string and a sentence string and returns True if the sentence contains the word (as a word by itself), or False otherwise.

Tip

Modify the has_word function in the validation module.

Your function should work like this:

>>> has_word('help', 'She was a big help when I learned French')
True
>>> has_word('help', 'She helped me learn French')
False

Answer

def has_word(word, sentence):
    """Return True iff sentence contains word as an individual word."""
    return bool(re.search(r'\b' + re.escape(word) + r'\b', sentence, re.IGNORECASE))

Four Letter Words

Create a function get_4_letter_words which accepts a sentence and returns all four letter words from the given sentence.

Tip

Modify the get_4_letter_words function in the validation module.

Your function should work like this:

>>> get_4_letter_words("She was a big help when I learned French")
['help', 'when']
>>> get_4_letter_words('What is going on here?')
['What', 'here']

Answer

def get_4_letter_words(string):
    """Return all four letter words found in the given string."""
    return re.findall(r'\b\w{4}\b', string)
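
Note that \w matches digits and underscores as well as letters, so the answer above also returns tokens such as 1234. If only alphabetic words are wanted, a stricter character class is one option. The name get_4_letter_alpha_words is hypothetical, and this sketch assumes ASCII letters only.

```python
import re


def get_4_letter_alpha_words(string):
    """Return all words made of exactly four ASCII letters."""
    return re.findall(r"\b[A-Za-z]{4}\b", string)


sentence = "Please call 1234 when done"
print(re.findall(r"\b\w{4}\b", sentence))  # ['call', '1234', 'when', 'done']
print(get_4_letter_alpha_words(sentence))  # ['call', 'when', 'done']
```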

Is Email

Create a function is_email that accepts a string and returns True if the string represents a valid email address.

Tip

Modify the is_email function in the validation module.

Your function should work like this:

>>> is_email('123@example.com')
True
>>> is_email('info123@help.example.com')
True
>>> is_email('help+info@help-example.com')
True
>>> is_email('100%@help-example.com')
True
>>> is_email('123@example.c')
False
>>> is_email('123example.com')
False

Answer

Using the IGNORECASE and VERBOSE flags:

def is_email(string):
    """Return True iff the string represents a valid email"""
    return bool(re.search(r"""
        ^
        [A-Z0-9._%+-] +
        @
        [A-Z0-9.-] +
        \.
        [A-Z] {2,}
        $
    """, string, re.IGNORECASE | re.VERBOSE))

Using the \w special sequence too:

def is_email(string):
    """Return True iff the string represents a valid email"""
    return bool(re.search(r"""
        ^
        [\w.%+-] +
        @
        [\w.-] +
        [.]
        [A-Z] {2,}
        $
    """, string, re.IGNORECASE | re.VERBOSE))

Is Phone Number

Create a function is_phone_number that accepts a string and returns True if the string represents a valid US-style phone number.

Let’s just concern ourselves with allowing (xxx)yyy-zzzz, (xxx) yyy-zzzz, or xxx-yyy-zzzz.

Tip

Modify the is_phone_number function in the validation module.

Your function should work like this:

>>> is_phone_number('202-762-1401')
True
>>> is_phone_number('(202)762-1401')
True
>>> is_phone_number('(202) 762-1401')
True
>>> is_phone_number('20-2762-1401')
False
>>> is_phone_number('202 762-1401')
True
>>> is_phone_number('2027621401')
True

Answer

def is_phone_number(string):
    """Return True iff the string represents a valid US phone number"""
    return bool(re.search(r"""
        ^
        [(]?               # optional open parenthesis
        \d{3}              # area code
        \s* [).-]? \s*     # optional space/symbols (including close paren)
        \d{3}              # prefix
        \s* [-.]? \s*      # optional space/symbols
        \d{4}              # remainder of number
        $
    """, string, re.VERBOSE))

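One subtlety when writing character classes like the ones above: a hyphen placed between two characters defines a range, so a class written as [)-.] covers every character from ) to . in ASCII order, which includes *, + and ,. Putting the hyphen last (or escaping it) keeps it literal. A small check, with variable names chosen just for this sketch:

```python
import re

range_class = re.compile(r"[)-.]")    # hyphen in the middle: a range from ')' to '.'
literal_class = re.compile(r"[).-]")  # hyphen last: just ')', '.' and '-'

# Print each character with whether the range class and the literal class match it
for char in ")*+,-.":
    print(char, bool(range_class.fullmatch(char)), bool(literal_class.fullmatch(char)))
```
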
Get Email

Create a function get_email that accepts a string, searches for an email address and returns the email address from the string. If there is no valid email, it should return None.

Tip

Modify the get_email function in the validation module.

Your function should work like this:

>>> get_email('Send an email to info@example.com for information')
'info@example.com'
>>> get_email('Do not use email of info@example.c.')
>>> get_email('Help is available at info123@help.example.com.')
'info123@help.example.com'

Answer

def get_email(string):
    """Search the string for a valid email and return it; else return None."""
    match = re.search(r"""
        \b
        [\w.%+-] +
        @
        [\w.-] +
        [.]
        [A-Z] {2,}
        \b
    """, string, re.IGNORECASE | re.VERBOSE)
    if match:
        return match.group()
    else:
        return None

More Search Exercises

These exercises are all in the grouping.py file in the exercises directory. Edit the appropriate function in that file to complete each exercise. To run the tests, from the exercises folder, type python test.py <function_name>, like this:

$ python test.py get_extension

Note

Most of these exercises involve searching in a dictionary. You can find the contents of this dictionary file in the dictionary variable within the grouping module.

Get File Extension

Make a function that accepts a full file path and returns the file extension.

Tip

Modify the get_extension function in the grouping module.

Example usage:

>>> get_extension('archive.zip')
'zip'
>>> get_extension('image.jpeg')
'jpeg'
>>> get_extension('index.xhtml')
'xhtml'
>>> get_extension('archive.tar.gz')
'gz'

Answers

Works with examples given:

def get_extension(filename):
    return re.search(r"([^.]*)$", filename).group()

Works with no extension:

def get_extension(filename):
    match = re.search(r"\.([^.]*)$", filename)
    return match.group(1) if match else ""

Works with only word-based extensions (try a.b/c):

def get_extension(filename):
    match = re.search(r"\.(?!.*\W)([^.]*)$", filename)
    return match.group(1) if match else ""
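
For comparison, the standard library's os.path.splitext already handles the no-extension and a.b/c cases. This is only a sketch of that alternative, not one of the regex answers; the name get_extension_stdlib is invented here.

```python
import os.path


def get_extension_stdlib(filename):
    """Return the extension without its leading dot, or "" if there is none."""
    _, extension = os.path.splitext(filename)
    return extension[1:]  # drop the leading "."; an empty string stays empty


print(get_extension_stdlib("archive.tar.gz"))     # gz
print(repr(get_extension_stdlib("a.b/c")))        # ''
```

One difference worth knowing: splitext treats a leading dot (as in .bashrc) as part of the name, so it reports no extension there, while the regex answers would return bashrc.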

Hexadecimal Words

Find every word that consists solely of the letters A, B, C, D, E, and F. The input is a variable containing all the words in the file dictionary.txt.

Tip

Modify the hexadecimal function in the grouping module.

Examples: decaf, bead, cab

>>> hexadecimal(dictionary)
['abbe', 'abed', 'accede', 'acceded', 'ace', 'aced', 'ad', 'add', 'added', 'baa', 'baaed', 'babe', 'bad', 'bade', 'be', 'bead', 'beaded', 'bed', 'bedded', 'bee', 'beef', 'beefed', 'cab', 'cabbed', 'cad', 'cafe', 'ceca', 'cede', 'ceded', 'dab', 'dabbed', 'dace', 'dad', 'dead', 'deaf', 'deb', 'decade', 'decaf', 'deed', 'deeded', 'def', 'deface', 'defaced', 'ebb', 'ebbed', 'ed', 'efface', 'effaced', 'fa', 'facade', 'face', 'faced', 'fad', 'fade', 'faded', 'fed', 'fee', 'feed']

Answers

def hexadecimal(dictionary=dictionary):
    """Return a list of all words consisting solely of the letters A to F."""
    return re.findall(r"\b[a-f]+\b", dictionary)

Tetravocalic

Find all words that include four consecutive vowels. The input is a variable containing all the words in the file dictionary.txt.

Tip

Modify the tetravocalic function in the grouping module.

>>> tetravocalic(dictionary)
['aqueous', 'aqueously', 'archaeoastronomies', 'archaeoastronomy', 'assegaaied', 'assegaaiing', 'banlieue', 'banlieues', 'beauish', 'bioaeration', 'bioaerations', 'bioaeronautics', 'blooie', 'booai', 'booais', 'braaied', 'braaiing', 'camaieu', 'camaieux', 'cooee', 'cooeed', 'cooeeing', 'cooees', 'dequeue', 'dequeued', 'dequeueing', 'dequeues', 'dequeuing', 'enqueue', 'enqueued', 'enqueueing', 'enqueues', 'enqueuing', 'epigaeous', 'epopoeia', 'epopoeias', 'euoi', 'euouae', 'euouaes', 'flooie', 'forhooie', 'forhooied', 'forhooieing', 'forhooies', 'giaour', 'giaours', 'gooier', 'gooiest', 'guaiac', 'guaiacol', 'guaiacols', 'guaiacs', 'guaiacum', 'guaiacums', 'guaiocum', 'guaiocums', 'homoiousian', 'homoiousians', 'hypoaeolian', 'hypogaeous', 'looie', 'looies', 'louie', 'louies', 'maieutic', 'maieutical', 'maieutics', 'meoued', 'meouing', 'metasequoia', 'metasequoias', 'miaou', 'miaoued', 'miaouing', 'miaous', 'mythopoeia', 'mythopoeias', 'nonaqueous', 'obsequious', 'obsequiously', 'obsequiousness', 'obsequiousnesses', 'onomatopoeia', 'onomatopoeias', 'palaeoanthropic', 'palaeoecologic', 'palaeoecologies', 'palaeoecologist', 'palaeoecology', 'palaeoethnology', 'pharmacopoeia', 'pharmacopoeial', 'pharmacopoeian', 'pharmacopoeias', 'plateaued', 'plateauing', 'prosopopoeia', 'prosopopoeial', 'prosopopoeias', 'queue', 'queued', 'queueing', 'queueings', 'queuer', 'queuers', 'queues', 'queuing', 'queuings', 'radioautograph', 'radioautographic', 'radioautographies', 'radioautographs', 'radioautography', 'radioiodine', 'radioiodines', 'reliquiae', 'rhythmopoeia', 'rhythmopoeias', 'saouari', 'saouaris', 'scarabaeoid', 'scarabaeoids', 'sequoia', 'sequoias', 'subaqueous', 'tenuious', 'terraqueous', 'toeier', 'toeiest', 'zoaea', 'zoaeae', 'zoaeas', 'zoeae', 'zooea', 'zooeae', 'zooeal', 'zooeas', 'zoogloeae', 'zoogloeoid', 'zooier', 'zooiest']

Answers

def tetravocalic(dictionary=dictionary):
    """Return a list of all words that have four consecutive vowels."""
    return re.findall(r"\b[a-z]*[aeiou]{4}[a-z]*\b", dictionary)

Hexaconsonantal

Find at least one word with 6 consecutive consonants. For this problem treat y as a vowel. The input is a variable containing all the words in the file dictionary.txt.

Tip

Modify the hexaconsonantal function in the grouping module.

>>> re.findall(r"\b.*[^aeiouy\s]{6}.*\b", dictionary)
['bergschrund', 'bergschrunds', 'borschts', 'catchphrase', 'catchphrases', 'crwths', 'eschscholtzia', 'eschscholtzias', 'eschscholzia', 'eschscholzias', 'festschrift', 'festschriften', 'festschrifts', 'grrrls', 'latchstring', 'latchstrings', 'lengthsman', 'lengthsmen', 'sightscreen', 'sightscreens', 'tsktsk', 'tsktsked', 'tsktsking', 'tsktsks', 'watchspring', 'watchsprings', 'watchstrap', 'watchstraps', 'weltschmerz', 'weltschmerzes']
>>> re.findall(r'\b.*[bcdfghjklmnpqrstvwxz]{6}.*\b', dictionary)
['bergschrund', 'bergschrunds', 'borschts', 'catchphrase', 'catchphrases', 'crwths', 'eschscholtzia', 'eschscholtzias', 'eschscholzia', 'eschscholzias', 'festschrift', 'festschriften', 'festschrifts', 'grrrls', 'latchstring', 'latchstrings', 'lengthsman', 'lengthsmen', 'sightscreen', 'sightscreens', 'tsktsk', 'tsktsked', 'tsktsking', 'tsktsks', 'watchspring', 'watchsprings', 'watchstrap', 'watchstraps', 'weltschmerz', 'weltschmerzes']

Answers

def hexaconsonantal(dictionary=dictionary):
    """Return a list of all words with six consecutive consonants."""
    return re.findall(r"\b.*[^aeiouy\s]{6}.*\b", dictionary)

Crossword Helper

Make a function possible_words that accepts a partial word with underscores representing missing letters and returns a list of all possible matches.

Tip

Modify the possible_words function in the grouping module.

Use your crossword helper function to solve the following:

  1. water tank: CIS____

  2. pastry: ___TE

  3. temporary: __A_S_E__

Answers

def possible_words(partial_word):
    pattern = rf"\b{partial_word.replace('_', '.')}\b"
    return re.findall(pattern, dictionary, re.IGNORECASE)
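
To see how the underscore-to-dot substitution behaves without the full dictionary, here is a sketch that takes the word list as a parameter. The sample_words string and the name possible_words_in are hypothetical stand-ins, not part of the grouping module.

```python
import re

# Hypothetical stand-in for the dictionary variable: one word per line
sample_words = "cab\ncob\ncub\npaste\nplate\nwaste"


def possible_words_in(partial_word, words):
    """Return the words matching partial_word, where _ stands for any one character."""
    pattern = rf"\b{partial_word.replace('_', '.')}\b"
    return re.findall(pattern, words, re.IGNORECASE)


print(possible_words_in("C_B", sample_words))    # ['cab', 'cob', 'cub']
print(possible_words_in("__STE", sample_words))  # ['paste', 'waste']
```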

Repeat Letter

Find every word that contains at least five occurrences of a given letter. The input is a variable containing all the words in the file dictionary.txt.

Tip

Modify the five_repeats function in the grouping module.

>>> five_repeats('n', dictionary)
['inconveniencing', 'nondenominational', 'nonindependent', 'nonintervention', 'noninterventions']

Answers

def five_repeats(letter, dictionary=dictionary):
    """Return all words with at least five occurrences of the given letter."""
    return re.findall(rf"\b(?:.*{letter}.*){{5}}\b", dictionary)
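
The pattern above puts .* on both sides of the letter inside a repeated group, which may cause a lot of backtracking on long inputs. One alternative sketch, assuming the dictionary holds lowercase words separated by whitespace, consumes only letters up to each occurrence. The name five_repeats_tight and the sample string are invented for illustration.

```python
import re


def five_repeats_tight(letter, words):
    """Return all words with at least five occurrences of the given letter."""
    letter = re.escape(letter)
    # Each repetition consumes letters up to and including one occurrence
    return re.findall(rf"\b(?:[a-z]*{letter}){{5}}[a-z]*\b", words)


sample = "banana\nnondenominational\nonion\ninconveniencing"
print(five_repeats_tight("n", sample))  # ['nondenominational', 'inconveniencing']
```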

Substitution Exercises

These exercises are all in the substitution.py file in the exercises directory. Edit the appropriate function in that file to complete each exercise. To run the tests, from the exercises folder, type python test.py <function_name>, like this:

$ python test.py normalize_jpeg

Normalize JPEG Extension

Make a function that accepts a JPEG filename and returns a new filename with the extension normalized to lowercase jpg (without an e).

Tip

Modify the normalize_jpeg function in the substitution module.

Hint

Look up how to pass flags to the re.sub function.

Example usage:

>>> normalize_jpeg('avatar.jpeg')
'avatar.jpg'
>>> normalize_jpeg('Avatar.JPEG')
'Avatar.jpg'
>>> normalize_jpeg('AVATAR.Jpg')
'AVATAR.jpg'

Answers

def normalize_jpeg(filename):
    return re.sub(r"\.jpe?g$", r".jpg", filename, flags=re.IGNORECASE)

Normalize Whitespace

Make a function that replaces all instances of one or more whitespace characters with a single space.

Tip

Modify the normalize_whitespace function in the substitution module.

Example usage:

>>> normalize_whitespace("hello  there")
'hello there'
>>> normalize_whitespace("""Hold fast to dreams
... For if dreams die
... Life is a broken-winged bird
... That cannot fly.
...
... Hold fast to dreams
... For when dreams go
... Life is a barren field
... Frozen with snow.""")
'Hold fast to dreams For if dreams die Life is a broken-winged bird That cannot fly. Hold fast to dreams For when dreams go Life is a barren field Frozen with snow.'

Answers

def normalize_whitespace(string):
    return re.sub(r"\s+", r" ", string)
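
A common non-regex alternative is to split on whitespace and rejoin. The two approaches differ at the ends of the string, as this small comparison sketch shows (the variable names are only for illustration):

```python
import re

text = "  hello \t there\n"
with_regex = re.sub(r"\s+", " ", text)  # collapses runs, keeps one space at each end
with_split = " ".join(text.split())     # collapses runs and trims both ends

print(repr(with_regex))  # ' hello there '
print(repr(with_split))  # 'hello there'
```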

Compress Blank Lines

Write a function that accepts a string and an integer N and compresses runs of N or more consecutive empty lines into just N empty lines.

Tip

Modify the compress_blank_lines function in the substitution module.

Example usage:

>>> compress_blank_lines("a\n\n\nb", max_blanks=1)
'a\n\nb'
>>> compress_blank_lines("a\n\nb", max_blanks=0)
'a\nb'
>>> compress_blank_lines("a\n\nb", max_blanks=2)
'a\n\nb'
>>> compress_blank_lines("a\n\n\n\nb\n\n\nc", max_blanks=2)
'a\n\n\nb\n\n\nc'

Answers

def compress_blank_lines(string, max_blanks):
    """Compress N or more empty lines into just N empty lines."""
    max_newlines = max_blanks + 1
    regex = r"\n{" + str(max_newlines) + ",}"
    return re.sub(regex, "\n" * max_newlines, string)
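
The quantifier can also be built with a raw f-string instead of string concatenation, as long as the literal braces are doubled. A sketch of that variant, under the hypothetical name compress_blank_lines_f:

```python
import re


def compress_blank_lines_f(string, max_blanks):
    """Compress runs of more than max_blanks empty lines down to max_blanks."""
    max_newlines = max_blanks + 1
    # In an f-string, {{ and }} are literal braces, so this builds e.g. \n{2,}
    regex = rf"\n{{{max_newlines},}}"
    return re.sub(regex, "\n" * max_newlines, string)


print(repr(compress_blank_lines_f("a\n\n\nb", max_blanks=1)))  # 'a\n\nb'
```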

Normalize URL

I own the domain treyhunner.com. I prefer to link to my website as https://treyhunner.com, but I have some links that use http or use a www subdomain.

Write a function that normalizes all www.treyhunner.com and treyhunner.com links to use HTTPS and remove the www subdomain.

Tip

Modify the normalize_domain function in the substitution module.

Example usage:

>>> normalize_domain("http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/")
'https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/'
>>> normalize_domain("https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/")
'https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/'
>>> normalize_domain("http://www.treyhunner.com/2015/11/counting-things-in-python/")
'https://treyhunner.com/2015/11/counting-things-in-python/'
>>> normalize_domain("http://www.treyhunner.com")
'https://treyhunner.com'
>>> normalize_domain("http://trey.in/give-a-talk")
'http://trey.in/give-a-talk'

Answers

def normalize_domain(url):
    return re.sub(
        r"^https?://(www\.)?treyhunner\.com",
        r"https://treyhunner.com",
        url
    )

Linebreaks

Write a function that accepts a string and converts linebreaks to HTML in the following way:

  • text is surrounded by paragraphs

  • text with two or more line breaks between is considered two separate paragraphs

  • text with a single line break between is separated by a <br>

Tip

Modify the convert_linebreaks function in the substitution module.

Example usage:

>>> convert_linebreaks("hello")
'<p>hello</p>'
>>> convert_linebreaks("hello\nthere")
'<p>hello<br>there</p>'
>>> convert_linebreaks("hello\n\nthere")
'<p>hello</p><p>there</p>'
>>> convert_linebreaks("hello\nthere\n\nworld")
'<p>hello<br>there</p><p>world</p>'

Answers

def convert_linebreaks(string):
    string = re.sub(r"\n{2,}", "</p><p>", string)
    string = re.sub(r"\n", "<br>", string)
    return f"<p>{string}</p>"

Or:

def convert_linebreaks(string):
    return "".join(
        f"<p>{paragraph}</p>"
        for paragraph in re.split(r"\n{2,}", string)
    ).replace("\n", "<br>")

Alternation Exercises

These exercises are all in the alternation.py file in the exercises directory. Edit the appropriate function in that file to complete each exercise. To run the tests, from the exercises folder, type python test.py <function_name>, like this:

$ python test.py is_number

Decimal Numbers

Write a function to match decimal numbers.

We want to allow an optional - and we want to match numbers with or without one decimal point.

Tip

Modify the is_number function in the alternation module.

Example usage:

>>> is_number("5")
True
>>> is_number("5.")
True
>>> is_number(".5.")
False
>>> is_number(".5")
True
>>> is_number("01.5")
True
>>> is_number("-123.859")
True
>>> is_number("-123.859.")
False
>>> is_number(".")
False

Answers

def is_number(num_string):
    return bool(re.search(r"^[-+]?(\d*\.?\d+|\d+\.)$", num_string))
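
The alternation is easier to read when the same pattern is spread out with the VERBOSE flag. This is a sketch of the answer above with comments added; the names NUMBER_RE and is_number_verbose are made up for the example.

```python
import re

NUMBER_RE = re.compile(r"""
    ^
    [-+]?                # optional sign
    (?:
        \d* \.? \d+      # ends in digits: 5, .5, 01.5, 123.859
      |
        \d+ \.           # digits with a trailing decimal point: 5.
    )
    $
""", re.VERBOSE)


def is_number_verbose(num_string):
    """Return True if num_string looks like a decimal number."""
    return bool(NUMBER_RE.search(num_string))


print(is_number_verbose("5."))   # True
print(is_number_verbose(".5."))  # False
```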

Abbreviate

Make a function that creates an acronym from a phrase.

Tip

Modify the abbreviate function in the alternation module.

Example usage:

>>> abbreviate('Graphics Interchange Format')
'GIF'
>>> abbreviate('frequently asked questions')
'FAQ'
>>> abbreviate('cascading style sheets')
'CSS'
>>> abbreviate('Joint Photographic Experts Group')
'JPEG'
>>> abbreviate('content management system')
'CMS'
>>> abbreviate('JavaScript Object Notation')
'JSON'
>>> abbreviate('HyperText Markup Language')
'HTML'

Answer

def abbreviate(phrase):
    """Return an acronym for the given phrase."""
    return "".join(re.findall(r"(?:[a-z](?=[A-Z])|\b)(\w)", phrase)).upper()
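
It can help to look at what findall returns before the letters are joined. Because the pattern has one capturing group, findall returns only the captured letter from each match: either the first letter of a word, or a capital that directly follows a lowercase letter. A quick sketch:

```python
import re

pattern = r"(?:[a-z](?=[A-Z])|\b)(\w)"
letters = re.findall(pattern, "JavaScript Object Notation")

print(letters)                   # ['J', 'S', 'O', 'N']
print("".join(letters).upper())  # JSON
```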

Hex Colors

Write a function to match hexadecimal color codes. Hex color codes consist of an octothorpe symbol followed by either 3 or 6 hexadecimal digits (that’s 0 to 9 or a to f).

Tip

Modify the is_hex_color function in the alternation module.

Example usage:

>>> is_hex_color("#639")
True
>>> is_hex_color("#6349")
False
>>> is_hex_color("#63459")
False
>>> is_hex_color("#634569")
True
>>> is_hex_color("#663399")
True
>>> is_hex_color("#000000")
True
>>> is_hex_color("#00")
False
>>> is_hex_color("#FFffFF")
True
>>> is_hex_color("#decaff")
True
>>> is_hex_color("#decafz")
False

Answer

def is_hex_color(string):
    return bool(re.search(r"^#([\da-f]{3}){1,2}$", string, re.IGNORECASE))

Or:

def is_hex_color(string):
    return bool(re.search(r"^#[\da-f]{3}([\da-f]{3})?$", string, re.IGNORECASE))

Or:

def is_hex_color(string):
    return bool(re.search(r"^#([\da-f]{3}|[\da-f]{6})$", string, re.IGNORECASE))

Valid Date

Create an is_valid_date function that returns True if given a date in YYYY-MM-DD format.

For this exercise we’re more worried about accepting valid dates than we are about excluding invalid dates.

A regular expression is often used as a first wave of validation. Complete validation of dates should be done in our code outside of regular expressions.
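
As one way that second stage might look, the sketch below checks the overall shape with a regular expression and then lets datetime.strptime reject dates that do not exist on the calendar. The function name is_real_date is hypothetical and is not part of the alternation module.

```python
import re
from datetime import datetime


def is_real_date(string):
    """Return True if string is a YYYY-MM-DD date that exists on the calendar."""
    # First pass: the regex checks only the overall shape
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", string):
        return False
    # Second pass: strptime rejects impossible dates such as February 30
    try:
        datetime.strptime(string, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_real_date("2016-02-29"))  # True
print(is_real_date("2016-02-30"))  # False
```

The shape check comes first because strptime on its own also accepts month and day values that are not zero-padded.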

Tip

Create this is_valid_date function in the alternation module.

Example usage:

>>> is_valid_date("2016-01-02")
True
>>> is_valid_date("1900-01-01")
True
>>> is_valid_date("2016-02-99")
False
>>> is_valid_date("20-02-20")
False
>>> is_valid_date("1980-30-05")
False

Answer

import re

def is_valid_date(string):
    return bool(re.search(r"^\d{4}-[01]\d-[0-3]\d$", string))

Or, with stricter ranges:

import re

def is_valid_date(string):
    return bool(re.search(r"""
        ^
        (19|20) \d \d
        -
        ( 0[1-9] | 1[0-2] )
        -
        ( 0[1-9] | [12]\d | 3[01] )
        $
    """, string, re.VERBOSE))