CSCA08H-Python-Assignment 3

2022/4/4 09:06 Assignment 3 https://q.utoronto.ca/courses/250905/assignments/834832 1/12

Assignment 3

Due: Friday by 10pm
Points: 15
Available until: Apr 10 at 11:59pm

CSCA08H Assignment 3: Poetry Form Checker

The starter code is now uploaded. You can download the Zip file for A3 (https://q.utoronto.ca/courses/250905/files/20233080/download?download_frd=1) now.

Due Date: Friday, April 8, 2022 before 10:00:00pm

This assignment must be completed alone (no partners). Please see the syllabus for information about academic offenses.

Late policy: There are penalties for submitting the assignment after the due date. These penalties depend on how late your submission is. Please see the Syllabus for more information.

Read this handout thoroughly before starting to work on the implementation of your solution.

Goals of this Assignment

- Writing code that uses dictionaries and reads from files.
- Using top-down design to break a problem down into subtasks and implementing helper functions to complete those tasks.
- Writing unittests to check whether a function implementation is correct.
- Practice reading problem descriptions written in English, together with provided docstring examples, and implementing function bodies to solve the problems.
- Continue to use Python 3, Wing 101, provided starter code, a checker module, and other tools.

Poetry Form Checker

Limericks, sonnets, haiku, and other forms of poetry each follow prescribed forms that specify the number of lines, the number of syllables on each line, and a rhyme scheme. We’re sure that you’ve all kept yourselves awake wondering if there was a way to have a computer program check whether a poem is a limerick or if it follows some other poetry form. Here’s your chance to resolve the question!

In this assignment, you will work on a program that reads a poem from a file, determines how to pronounce the poem, counts the number of syllables in each line, and determines whether lines rhyme.

Definitions

In this section, we provide definitions for terms that are used in the handout. All links go to https://dictionary.com.

poem (https://www.dictionary.com/browse/poem)
A composition in verse, especially one that is characterized by a highly developed artistic form and by the use of heightened language and rhythm to express an intensely imaginative interpretation of the subject.

rhyme (https://www.dictionary.com/browse/rhyme)
A word agreeing with another in terminal sound: Find is a rhyme for mind and womankind.

consonant (https://www.dictionary.com/browse/consonant)
(In English articulation) A speech sound produced by occluding with or without releasing (p, b; t, d; k, g), diverting (m, n, ng), or obstructing (f, v; s, z, etc.) the flow of air from the lungs (opposed to vowel).

vowel (https://www.dictionary.com/browse/vowel)
(In English articulation) A speech sound produced without occluding, diverting, or obstructing the flow of air from the lungs (opposed to consonant).

syllable (https://www.dictionary.com/browse/syllable)
An uninterrupted segment of speech consisting of a vowel sound, a diphthong, or a syllabic consonant, with or without preceding or following consonant sounds.

There are many vowel sounds. For example, the words freight, fraught, fruit, and fright all contain different vowel sounds; there are far more vowel sounds than there are letters that we use to describe them: a, e, i, o, u, and sometimes y.

Poetry Form Example: Limerick

Here is a stupendous work of limerick art. The lines have been numbered and we have highlighted the last word of each line because those words must rhyme according to the rhyme scheme for limericks.

1. I wish I had thought of a rhyme
2. Before I ran all out of time!
3. I’ll sit here instead,
4. A cloud on my head
5. That rains ’til I’m covered with slime.

Here is a description of the form of a limerick: Limericks are five lines long. Lines 1, 2, and 5 each contain eight syllables, and the last words on these lines rhyme with each other. Lines 3 and 4 each contain five syllables and the last words rhyme with each other. (There are additional rules about the location and number of stressed vs. unstressed syllables, but we’ll ignore those rules for this assignment; we will be counting syllables, but not paying attention to whether they are stressed or unstressed.)

The CMU Pronouncing Dictionary

To determine whether or not two words rhyme, we will use data that describe how to pronounce words.

The Carnegie Mellon University Pronouncing Dictionary (https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) describes how to pronounce words. Here is the entry for DAVID: D EY1 V IH0 D. There are five phonemes in David, namely 'D', 'EY1', 'V', 'IH0', and 'D'. Each phoneme describes a sound. The sounds are either vowel sounds or consonant sounds. We will refer to phonemes that describe vowel sounds as vowel phonemes, and similarly for consonants. The phonemes were defined in a project called Arpabet (http://en.wikipedia.org/wiki/Arpabet) that was created by the Advanced Research Projects Agency (ARPA) (http://en.wikipedia.org/wiki/Advanced_Research_Projects_Agency) back in the 1970s. One can download a text file containing the CMU Pronouncing Dictionary: all the words together with their pronunciations.

All vowel phonemes end in a 0, 1, or 2, with the digit indicating a level of syllabic stress. Consonant phonemes do not end in a digit. The number of syllables in a word is the same as the number of vowel sounds in the word, so you can determine the number of syllables in a word by counting the number of phonemes that end in a digit.
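
The counting rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the name count_vowel_phonemes is hypothetical and is not part of the starter code, and the phonemes are assumed to be given as a tuple of non-empty strings like the DAVID entry.

```python
from typing import Tuple


def count_vowel_phonemes(phonemes: Tuple[str, ...]) -> int:
    """Return the number of phonemes in phonemes that end in 0, 1, or 2.

    Hypothetical helper for illustration; assumes every phoneme is a
    non-empty string.
    """
    count = 0
    for phoneme in phonemes:
        # Vowel phonemes end in a stress digit; consonant phonemes do not.
        if phoneme[-1] in '012':
            count += 1
    return count


# The entry for DAVID has two vowel phonemes (EY1 and IH0), so two syllables.
david = ('D', 'EY1', 'V', 'IH0', 'D')
print(count_vowel_phonemes(david))  # 2
```

The sketch checks only the last character of each phoneme, which is all the rule requires; the stress level itself is ignored.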

As an example, in the word secondary (S EH1 K AH0 N D EH2 R IY0), there are 4 vowel phonemes, and therefore 4 syllables. The vowel phonemes are EH1, AH0, EH2, and IY0.

In case you’re curious, 0 means unstressed, 1 means primary stress, and 2 means secondary stress. Try saying “secondary” out loud to hear for yourself which syllables have stress and which do not. In this assignment, your program will not need to distinguish between the levels of syllabic stress.

The assignment zipfile includes the file pronunciation_dictionary.txt, which contains our version of the Pronouncing Dictionary. Your program will read this file. You must use it, and not any other versions of the CMU Pronouncing Dictionary, because our version differs slightly from the CMU version. We have removed alternate pronunciations for words, and we have removed words that do not start and end with alphanumeric characters (like #HASH-MARK, #POUND-SIGN and #SHARP-SIGN). Open the pronunciation_dictionary.txt file to see the format; notice that any line beginning with ;;; is a comment.

The words in pronunciation_dictionary.txt are all uppercase and do not contain surrounding punctuation. When your program looks up a word, use the uppercase form, with no leading or trailing punctuation. Function transform_string() in the starter code file poetry_functions.py will be helpful here.

We have also provided a small, three-word CMU Pronouncing Dictionary named pronunciation_dictionary_small.txt, for use in docstring examples.

Describing Poetry Forms

For each type of poetry form (limerick, haiku, etc.), we will write its rules as a poetry form description. Here’s our poetry form description for the limerick poetry form:

Limerick
8 A
8 A
5 B
5 B
8 A

On each line, the first piece of information is a number that indicates the number of syllables required on that line of the poem.
The second piece of information on each line is a letter that indicates the rhyme scheme. Here, lines 1, 2, and 5 must rhyme with each other because they’re all marked with the same letter (A), and lines 3 and 4 must rhyme with each other because they’re both marked with the same letter (B). (Note that the choice to use the letters A and B was arbitrary. Other letters could have been used to describe this rhyme scheme.)

Two lines of a poem rhyme with each other when the last syllables of the last words on the two lines rhyme. Two syllables rhyme when their vowels are the same and they end in the same sequence of consonant phonemes, like gosh and wash.

Some poetry forms don’t require lines that rhyme. For example, the haiku form has 5 syllables in the first line, 7 in the second line, and 5 in the third line, but there are no rhyme requirements. Here is an example:

Dan’s hands are quiet.
Soft peace surrounds him gently:
No thought moves the air.

We’ll indicate the lack of a rhyme requirement by using the symbol *. Here is our poetry form description for the haiku poetry form:

Haiku
5 *
7 *
5 *

Some poetry forms have rhyme requirements but don’t have a specified number of syllables per line. Quintain (English) is one such example; these are 5-line poems with an ABABB rhyme scheme, but with no syllable requirements. Here is our poetry form description for the Quintain (English) poetry form (notice that 0 is used to indicate that there is no requirement on the number of syllables in the line):

Quintain (English)
0 A
0 B
0 A
0 B
0 B

Here’s an example of a Quintain (English) from Percy Bysshe Shelley’s Ode To A Skylark:

Teach us, Sprite or Bird,
What sweet thoughts are thine:
I have never heard
Praise of love or wine
That panted forth a flood of rapture so divine.
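
The form descriptions above all share one line format: a syllable count followed by a rhyme symbol. Here is a minimal sketch of how such lines might be split into counts and symbols. The function name parse_form_lines is hypothetical, and returning a pair of tuples is a choice made to mirror the POETRY_FORM_DESCRIPTION type from poetry_constants.py; it is not taken from the starter code.

```python
from typing import List, Tuple


def parse_form_lines(
        lines: List[str]) -> Tuple[Tuple[int, ...], Tuple[str, ...]]:
    """Return the syllable counts and the rhyme symbols given in lines.

    Hypothetical helper for illustration; assumes each item of lines
    has the form '<count> <symbol>', for example '8 A' or '5 *'.
    """
    counts = []
    symbols = []
    for line in lines:
        # Split '8 A' into the count '8' and the rhyme symbol 'A'.
        count, symbol = line.split()
        counts.append(int(count))
        symbols.append(symbol)
    # Accumulate in lists, then convert to tuples once complete.
    return (tuple(counts), tuple(symbols))


haiku = parse_form_lines(['5 *', '7 *', '5 *'])
quintain = parse_form_lines(['0 A', '0 B', '0 A', '0 B', '0 B'])
print(haiku)     # ((5, 7, 5), ('*', '*', '*'))
print(quintain)  # ((0, 0, 0, 0, 0), ('A', 'B', 'A', 'B', 'B'))
```

In this layout, a count of 0 and a symbol of * are ordinary values; deciding that they mean "no requirement" is left to whatever code later checks a poem against the description.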

Your program will read a poetry form description file containing poetry form names together with their description. For each poetry form in the file:

- the first line gives the name of the poetry form
- subsequent lines contain the number of syllables and rhyme scheme for each line of poetry
- each poetry form is separated from the next by a blank line

You may assume that the poetry form names given in a poetry form description file will all be different.

We have provided two poetry form description files, poetry_forms.txt and poetry_forms_small.txt, as example poetry form description files. The first is used by the Poetry Form Checker program poetry_program.py while the second is used in doctest examples. We will test your code with these and other poetry form description files.

Note: Many poetry forms don’t have a fixed number of lines. Instead, they specify what a stanza looks like, and then the poetry is made up of as many stanzas as the poet likes. We will not consider stanza-based poems in this assignment.

Data Representation

We use the following Python definitions to create new types relevant to the problem domain. Read the comments in the starter code file poetry_constants.py for detailed descriptions with examples.

Type variables defined in poetry_constants.py

POEM_LINE                  str
POEM                       List[POEM_LINE]
PHONEMES                   Tuple[str]
PRONUNCIATION_DICT         Dict[str, PHONEMES]
POETRY_FORM_DESCRIPTION    Tuple[Tuple[int], Tuple[str]]
POETRY_FORM_DICT           Dict[str, POETRY_FORM_DESCRIPTION]

Valid Input

For all poetry samples used in this assignment, you should assume that all words in the poems will appear as keys in the pronunciation dictionary. We will test with other pronunciation dictionaries, but we will always follow this rule.

Required Functions

In the starter code file poetry_functions.py, complete the following function definitions.
In addition, you may add some helper functions to aid with the implementation of these required functions.

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

get_syllable_count: (POEM_LINE, PRONUNCIATION_DICT) -> int
The first parameter represents a non-empty line from a poem that has had leading and trailing whitespace removed. The second parameter represents a pronunciation dictionary. This function is to return the number of syllables in the line from the poem. The number of syllables in a poem line is the same as the number of vowel phonemes in the line. Assume that the pronunciation for every word in the line may be found in the pronunciation dictionary.
HINT: Method str.split() and helper functions transform_string() and is_vowel_phoneme() may be helpful.

check_syllable_counts: (POEM, POETRY_FORM_DESCRIPTION, PRONUNCIATION_DICT) -> List[POEM_LINE]
The first parameter represents a poem that has no blank lines and has had leading and trailing whitespace removed. The second parameter represents a poetry form description. And the third parameter represents a pronunciation dictionary. This function is to return a list of the lines from the poem that do not have the right number of syllables for the poetry form description. The lines should appear in the returned list in the same order as they appear in the poem. If all lines have the right number of syllables, return the empty list. The number of syllables in a line is the same as the number of vowel phonemes in the line. Recall that every line whose required syllable count value is 0 has no syllable count requirement to meet.

get_last_syllable: (PHONEMES) -> PHONEMES
The parameter represents a tuple of phonemes. The function is to return a tuple that contains the last vowel phoneme and any consonant phoneme(s) that follow it in the given tuple of phonemes.
The ordering must be the same as in the given tuple. The empty tuple is to be returned if the tuple of phonemes does not contain a vowel phoneme.
HINT: Helper function is_vowel_phoneme() may be helpful.

words_rhyme: (str, str, PRONUNCIATION_DICT) -> bool
The first parameter represents a word, as does the second parameter. The third parameter represents a pronunciation dictionary. The function is to return whether or not the two words rhyme, according to the pronunciation dictionary. Assume that the pronunciation for both words may be found in the pronunciation dictionary. Recall that two words rhyme if and only if they have the same last syllable.

all_lines_rhyme: (POEM, List[int], PRONUNCIATION_DICT) -> bool
The first parameter represents a poem that has no blank lines and has had leading and trailing whitespace removed. The second parameter represents a list of poem line indexes. The third parameter represents a pronunciation dictionary. This function is to return whether or not all of the lines in the poem whose indexes are in the list given by the second parameter rhyme with each other, according to the pronunciation dictionary. Recall that two lines in a poem rhyme if and only if the last words on the two lines rhyme. Assume that the pronunciation for the words in the poem lines may be found in the pronunciation dictionary, and that the list of line indexes is not empty and only contains valid line indexes for the given poem.

get_symbol_to_lines: (Tuple[str]) -> Dict[str, List[int]]
The first parameter represents a rhyme scheme, in the format of the second item in a POETRY_FORM_DESCRIPTION. The function is to return a Python dict where each key is a symbol given in the rhyme scheme. The value associated with a key is a list of the indexes in the rhyme scheme where the symbol appears. An empty dictionary should be returned when the rhyme scheme is empty.
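
The grouping idea behind get_symbol_to_lines can be illustrated with a short sketch. It is written under assumptions rather than taken from the starter code: the name group_indexes_by_symbol is hypothetical, indexes are assumed to be 0-based positions in the rhyme scheme tuple, and the indexes for each symbol are kept in increasing order.

```python
from typing import Dict, List, Tuple


def group_indexes_by_symbol(
        rhyme_scheme: Tuple[str, ...]) -> Dict[str, List[int]]:
    """Return a dict that maps each symbol in rhyme_scheme to the list
    of indexes at which that symbol appears.

    Hypothetical helper for illustration; indexes are assumed 0-based.
    """
    symbol_to_indexes = {}
    for index in range(len(rhyme_scheme)):
        symbol = rhyme_scheme[index]
        # Start a new list the first time a symbol is seen.
        if symbol not in symbol_to_indexes:
            symbol_to_indexes[symbol] = []
        symbol_to_indexes[symbol].append(index)
    return symbol_to_indexes


# The limerick rhyme scheme: lines 1, 2, 5 share A; lines 3, 4 share B.
print(group_indexes_by_symbol(('A', 'A', 'B', 'B', 'A')))
# {'A': [0, 1, 4], 'B': [2, 3]}
```

An empty rhyme scheme never enters the loop, so the sketch returns an empty dictionary in that case.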

check_rhyme_scheme: (POEM, POETRY_FORM_DESCRIPTION, PRONUNCIATION_DICT) -> List[List[POEM_LINE]]
The first parameter represents a poem that has no blank lines and has had leading and trailing whitespace removed. The second parameter represents a poetry form description. And the third parameter represents a pronunciation dictionary. Return a list of lists of lines from the poem that should rhyme with each other according to the poetry form description but do not. If all lines rhyme as they should, return the empty list. Recall that every line whose rhyme scheme symbol is * has no rhyme requirement to meet.
Notes:
- The lines should appear in each inner list in the same order as they appear in the poem.
- If n lines are supposed to rhyme with each other and at least one line does not, all n lines should appear in the inner list.
For example: if the rhyme scheme is ('A', 'A', 'B', 'B', 'A'), and the poem lines are ['On the', 'plains, a', 'triceratops climbs.', 'The day adjourns.', 'Absurd!'], this function should return either [['On the', 'plains, a', 'Absurd!'], ['triceratops climbs.', 'The day adjourns.']] or [['triceratops climbs.', 'The day adjourns.'], ['On the', 'plains, a', 'Absurd!']].

In the starter code file poetry_reader.py, complete the following function definitions. Add helper functions to aid with the implementation of these functions. You may assume that all files that are given for reading follow the file formatting rules given in this assignment.

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

read_pronunciation: (TextIO) -> PRONUNCIATION_DICT
The parameter represents a file in the format of the CMU Pronouncing Dictionary that has been opened for reading. Return the pronunciation dictionary based on the given file.
HINT: Method str.split() and function tuple() might be helpful.
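
The hint above can be illustrated on a single line of text. This sketch is only an assumption about the file layout, based on the DAVID entry shown earlier: a word followed by its phonemes, all separated by whitespace, with comment lines starting with ;;;. The names is_comment_line and parse_entry are hypothetical; open pronunciation_dictionary.txt to confirm the actual format.

```python
from typing import Tuple


def is_comment_line(line: str) -> bool:
    """Return True if and only if line begins with ;;; (a comment)."""
    return line.startswith(';;;')


def parse_entry(line: str) -> Tuple[str, Tuple[str, ...]]:
    """Return the word and the tuple of its phonemes given in line.

    Hypothetical helper for illustration; assumes line is a non-comment
    entry such as 'DAVID D EY1 V IH0 D'.
    """
    # str.split() with no argument splits on any run of whitespace and
    # ignores the trailing newline.
    pieces = line.split()
    return (pieces[0], tuple(pieces[1:]))


print(is_comment_line(';;; a comment'))       # True
print(parse_entry('DAVID D EY1 V IH0 D\n'))
# ('DAVID', ('D', 'EY1', 'V', 'IH0', 'D'))
```

Each parsed pair has the shape of one key and one value of a PRONUNCIATION_DICT, so building the whole dictionary amounts to repeating this step for every non-comment line.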

read_poetry_form_descriptions: (TextIO) -> POETRY_FORM_DICT
The parameter represents a poetry form description file that has been opened for reading. Return a dictionary where each key is a poetry form name and each value is the poetry form description for that form as given in the file.
HINT: Accumulate information in lists that are turned into tuples when complete.

The main program

Once you have correctly implemented the functions in poetry_functions.py and poetry_reader.py, execution of the main program (poetry_program.py) will:

1. Read our version of the CMU Pronouncing Dictionary (datasets/pronunciation_dictionary.txt)
2. Read datasets/poetry_forms.txt
3. Repeatedly ask the user for a poetry form to check and the name of a file containing a poem. The program will report on whether or not the poem satisfies the poetry form description for the chosen poetry form.

A sample of poems is provided in the folder sample_poems. A Sonnet sample has not been provided, as we could not find one that did not contain words that are missing from our pronouncing dictionary. If you try other poems and encounter a key error, it is likely that the poem contains a word whose pronunciation is not in the pronouncing dictionary.

Required Testing (unittest)

Write (and submit) a set of unittests for the function get_last_syllable. We have provided a starter file named test_get_last_syllable.py. Complete this file with your unittests. For each test method, include a brief docstring description specifying what is being tested. For unittest methods, the docstring description should not include a type contract or example calls.

Files to Download

Please click here (https://q.utoronto.ca/courses/250905/files/20233080/download?download_frd=1) to download the Assignment 3 Starter Files and then extract the files in the zip archive.
A description of each of the files that we have provided is given in the paragraphs below:

Datasets folder: datasets
This folder contains four data files that will be used in this assignment: poetry_forms.txt, poetry_forms_small.txt, pronunciation_dictionary.txt and pronunciation_dictionary_small.txt. You must use this version of the CMU Pronouncing Dictionary and should not change these files.

Sample poems folder: sample_poems
This folder contains some sample poems that you may use when running poetry_program.py. The files with names starting with bad_ may be used to test with poems that break the rules for their form. If interested, add new poems to this folder, but take care to only include poems that contain words whose pronunciation appears in the pronouncing dictionary. Otherwise, the program will fail with a key error.

Provided Python files: poetry_constants.py and poetry_program.py
The file poetry_constants.py contains definitions for the Data Representation types described earlier. The file poetry_program.py contains the main Poetry Form Checker program that you can run to test poems after completing the required functions. Do not change these files.

Provided Python starter file: poetry_functions.py
The file poetry_functions.py contains helper functions and headers with docstrings for the required functions described in the first table above. Complete the functions in this file, using helper functions when appropriate.

Provided Python starter file: poetry_reader.py
The file poetry_reader.py contains headers with docstrings for the required functions described in the second table above. Complete the functions in this file, using helper functions when appropriate.

Provided Python starter file: test_get_last_syllable.py
The file test_get_last_syllable.py contains a start to the required unittests for the get_last_syllable function.
Complete this module by adding appropriate unittest methods.

Checker: a3_checker.py
We have provided a checker program (a3_checker.py) together with associated files a3_pyta.json and checker_generic.py. See below for more details. Do not change these files.

Additional requirements

- Do not add statements that call print, input, or open, or use an import statement.
- Do not use any break or continue statements. We are imposing this restriction (and we have not even taught you these statements) because they are very easy to abuse, resulting in terrible, hard-to-read code.
- Do not modify or add to the import statements provided in the starter code.

Testing your Code

We strongly recommend that you test each function as you write it. As usual, follow the Function Design Recipe (we’ve done the first couple of steps for you). Once you’ve implemented a function, run it on the examples in the docstring, as well as some other examples that you come up with, to convince yourself that the function body works correctly.

Here are a few tips:

- Be careful that you test the right thing. Be clear on what the functions are doing before determining whether your tests work.
- Can you think of any special cases for your functions? Test each function carefully.
- Once you are happy with the behavior of a function, move to the next function, implement it, and test it.

Remember to run the Assignment 3 checker that is described below!

How to tackle this assignment

Principles:

To avoid getting overwhelmed, deal with one function at a time. Start with functions that don’t call any functions that are not provided; this will allow you to test them right away. The steps listed below give you a reasonable order in which to write the functions.

For each helper function that you write, start by adding at least one example call to the docstring before you write the function.

Keep in mind throughout that any function you have might be a useful helper for another function.

As you write each function, begin by designing it in English, using only a few sentences. If your design is longer than that, shorten it by describing the steps at a higher level that leaves out some of the details. When you translate your design into Python, look for steps that are described at such a high level that they don’t translate directly into Python. Design a helper function for each of these high-level steps, and put a call to the helpers into your code. Don’t forget to write a great docstring for each helper!

Since you are required to write unittests for get_last_syllable, write those tests before or as you write the function. That way you can execute the unittests to test the code you are writing.

Steps:

Here is a good order in which to solve the pieces of this assignment.

1. Read this handout thoroughly and carefully, making sure you understand everything in it.
2. Read the poetry_functions.py starter code to get an overview of what you will be writing.
3. Implement and test the required functions in poetry_functions.py, along with helper functions. Now is also a good time to write the unittest file test_get_last_syllable.py.
4. Next, read the starter code poetry_reader.py, and implement and test those functions.
5. Skim through the code provided in poetry_program.py and run it. If there are any problems with the results, try to identify which of your functions has an issue, and go back to testing that function.

Marking

These are the aspects of your work that may be marked for Assignment 3:

Coding style (20%): Make sure that you follow Python style guidelines that we have introduced and the Python coding conventions that we have been using throughout the semester.
Although we don’t provide an exhaustive list of style rules, the checker tests for style are complete, so if your code passes the checker, then it will earn full marks for coding style with one exception: docstrings for any helper functions you add may be evaluated separately. For each occurrence of a PyTA error, a one-mark deduction (out of 20) will be applied. For example, if a C0301 (line-too-long) error occurs 3 times, then 3 marks will be deducted. All functions you design and write on your own (helper functions) should have complete docstrings, including preconditions when you think they are necessary.

Correctness (80%): Your functions should perform as specified. Correctness, as measured by our tests, will count for the largest single portion of your marks. Once your assignment is submitted, we will run additional tests not provided in the checker. Passing the checker does not mean that your code will earn full marks for correctness.

Assignment 3 Checker

We are providing a checker module (a3_checker.py) that tests two things:

- whether your code follows the CSCA08 Python style guidelines, and
- whether your functions are named correctly, have the correct number of parameters, and return the correct types.

To run the checker, open a3_checker.py and run it. Note: the checker file should be in the same folder as your poetry_functions.py and poetry_reader.py and the other files included in the starter code folder. When you run the checker, be sure to scroll up to the top and read all messages.

If the checker passes for both style and types:

- Your code follows the style guidelines.
- Your function names, number of parameters, and return types match the assignment specification. This does not mean that your code works correctly in all situations. We will run a different set of tests on your code once you hand it in, so be sure to thoroughly test your code yourself before submitting it.

If the checker fails, carefully read the message provided:

- It may have failed because your code did not follow the style guidelines. Review the error description(s) and fix the code style. Please see the PyTA documentation (http://www.cs.toronto.edu/~david/pyta/) for more information about errors.
- It may have failed because:
  - you are missing one or more functions,
  - one or more of your functions is misnamed,
  - one or more of your functions has the incorrect number or type of parameters,
  - one or more of your function return types does not match the assignment specification, or
  - your .py file is misnamed or in the wrong place.
  Read the error message to identify the problematic function, review the function specification in the handout, and fix your code.

Make sure the checker passes before submitting.

No Remark Requests

No remark requests will be accepted. A syntax error could result in a grade of 0 on the assignment. Before the deadline, you are responsible for running your code and the checker program to identify and resolve any errors that will prevent our tests from running.

What to Hand In

The very last thing you do before submitting should be to run the checker program one last time. Otherwise, you could make a small error in your final changes before submitting that causes your code to receive zero for correctness.

Submit poetry_functions.py, poetry_reader.py and test_get_last_syllable.py on MarkUs by following the familiar instructions from MarkUs Perform exercises. Remember that spelling of filenames, including case, counts: your file must be named exactly as above.