Programming Example: COMP 1005/1405 Assignment 4

COMP 1005/1405 Summer 2021
Assignment 4
Word Fun (Sets, Dictionaries, Tuples) & Art

Due: Friday, June 11th at 2:00pm (no late submissions allowed)

Submit a single zip file called A4.zip. The assignment has 10 marks.

Notes:

It is essential that you use the built-in, default archiving program to create this zip file. If we cannot easily open your zip file and extract the Python files from it, we cannot grade your assignment. Other file formats, such as rar, 7zip, etc., will not be accepted.

● Windows: Highlight (select with ctrl-click) all of your files for submission. Right-click and select “Send to” and then “Compressed (zipped) folder”. Rename the new folder to “A4.zip”.
● MacOS: Highlight (select with shift-click) all of your files for submission in Finder. Right-click on one of the files and select “compress N items…” where N is the number of files you have selected. Rename the “Archive.zip” file to “A4.zip”.
● Linux: Use the zip program.

After submitting your A4.zip file to Brightspace, download it and unzip it to be certain that what you have submitted is what you wanted to submit. This also checks that your zip file is not corrupted and can be unzipped. Please note that reasons similar in nature to “I submitted the wrong files” or “I didn’t know the zip file was corrupt” will not be accepted as an excuse after the due date.

Submit early and often. Brightspace will save your latest submission. I highly suggest that you submit as soon as you have one question done and keep re-submitting each time you add another problem (or partial problem).

Q1: Word Stats [8 marks]

In this problem you will generate some statistics for a body of English text. You are NOT allowed to import any modules to help with this.
You will write the following functions (in a file called words.py):

unique_words( text : str ) -> list:

The function takes a body of text (a string) and returns a list of all unique words in the text. The output list will contain strings. For example, calling

    unique_words("The,. cat. Live's in the! the road.")

will return the list

    ["the", "cat", "live's", "road", "in"]

Notice that punctuation is removed from the words. Notice that all words in the output are lower-case. Notice that the word "the" appears only once in the output list. The order of the words in the output list does NOT matter.

top_words( text : str, number : int ) -> list:

The function takes a body of text (a string) and an integer. The function returns a list of tuples. Each tuple will look like (word, frequency), where word is a string and frequency is the number of times that word appears in the input text. The returned list will have “number” tuples in it, corresponding to the “number” most frequently used words in the input text. For example, calling

    top_words("cat dog cat. Dog cat cat kitten.", 2)

will return the list (of tuples)

    [("cat", 4), ("dog", 2)]

Notice that the most frequently used word appears first. Your list must return the tuples in decreasing order of word frequency. Again, we don’t care about the case of words (“Cat” is the same as “cat” for this). All outputs should be in lower-case, though.

What if “number” is larger than the actual number of unique words in the text? Only make your output list as big as needed to include all unique words.

What if there are multiple words with the same frequency? Your output list can have more than “number” tuples in it if needed. When you are creating your output list, if you reach “number” words and there are more words with the same frequency as the last word included, then you should include the rest of the words with that frequency.
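The behaviour described above for unique_words and top_words can be sketched as follows. This is only one possible approach, not the expected solution. The PUNCTUATION set and the clean_words helper are assumptions introduced for illustration (the handout does not list which punctuation marks to remove), and the built-in sorted function stands in for the sorting helper that the handout says will be provided. No modules are imported.

```python
# Assumption: the handout does not say which characters count as punctuation,
# so this set is a guess made for illustration only.
PUNCTUATION = ".,!?;:\"'()[]"


def clean_words(text):
    """Hypothetical helper: lower-case the text, split it on whitespace, and
    strip punctuation from both ends of each token (inner apostrophes stay)."""
    words = []
    for token in text.lower().split():
        word = token.strip(PUNCTUATION)
        if word != "":
            words.append(word)
    return words


def unique_words(text):
    # A set drops duplicates; the order of the resulting list is arbitrary.
    return list(set(clean_words(text)))


def top_words(text, number):
    # Count how many times each cleaned word appears.
    counts = {}
    for word in clean_words(text):
        counts[word] = counts.get(word, 0) + 1

    # Sort the (word, frequency) tuples by decreasing frequency.
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    if number <= 0:
        return []

    # Slicing never runs past the end, so a large "number" returns everything.
    result = ranked[:number]

    # Keep adding words that tie with the last word already included.
    position = len(result)
    while position < len(ranked) and ranked[position][1] == result[-1][1]:
        result.append(ranked[position])
        position += 1
    return result
```

Because strip only removes characters from the two ends of a token, a word such as "live's" keeps its inner apostrophe, which matches the unique_words example.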
For example, suppose you wanted the top 2 words, but the data was as follows:

● 'cat' appears 10 times
● 'dog' appears 7 times
● 'eel' appears 7 times
● 'cow' appears 4 times

The output list would be [('cat', 10), ('dog', 7), ('eel', 7)]. The order of dog and eel does not matter.

display_stats( text : str, characters : str ) -> str:

The function takes a body of text (a string) and a string consisting of the characters to consider. The function returns a string that, when printed, is a visual display (frequency plot) of how often each of those characters appears in the text. Note that the function returns a string; it does not print anything itself. When that string is later printed, it is a plot of the frequencies of the given characters.

For example, if text = "the, c!a!t. Sits.. on the! Bed's' edge.", then calling

    print( display_stats(text, ". ,!'") )

will display the following on the screen (there are no spaces at the end of any line; there is a newline at the end of each line EXCEPT the last):

    Character Stats (10 in total)
    -+---------------------------------------- (#=10)
    .|########################################
     |
    ,|##########
    !|##############################
    '|####################
    -+----------------------------------------

The order of the characters in the plot follows the same order as the input string.

The character (or characters) with the maximal count will display exactly 40 hash-tags. All others are scaled using the same scale (the one used to make the maximum 40 #’s). The second line indicates the scaling factor. In this example, ten “#”s correspond to one appearance of the character in the text. There were 3 exclamation marks, and so there are 3*10 = 30 hash-tags for this character (40 for “.”, 0 for “ ”, 10 for “,”, 30 for “!” and 20 for the apostrophe). You can round to the nearest integer number of “#”s when making your plot string. Do NOT round the scale itself, though (as it may round down to zero).
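The scaling rule above (the maximal count maps to exactly 40 “#”s, every other count uses the same unrounded scale) can be sketched as a small helper. The name bar_lengths and the BAR_WIDTH constant are hypothetical, and the sketch only computes how many “#”s each character gets; it does not build the full plot string. One detail is an assumption: Python's built-in round uses round-half-to-even, so round(2.5) gives 2, and this sketch rounds halves up instead.

```python
BAR_WIDTH = 40  # the character with the largest count gets exactly this many "#"s


def bar_lengths(counts):
    """Hypothetical helper: map a list of counts to a list of bar lengths."""
    largest = max(counts) if counts else 0
    if largest == 0:
        # Nothing was counted, so every bar is empty (also avoids dividing by 0).
        return [0 for _ in counts]

    # Keep the scale as a float; rounding it could turn it into zero.
    scale = BAR_WIDTH / largest

    # Round each bar to the nearest integer, with halves rounded up
    # (an assumption; round() would send 2.5 to 2 instead of 3).
    return [int(count * scale + 0.5) for count in counts]


# Counts for ".", " ", ",", "!" and the apostrophe in the example above.
print(bar_lengths([4, 0, 1, 3, 2]))
```

With the counts from the example, the scale is 40 / 4 = 10, which is the value shown as (#=10) on the second line of the plot.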
Note that this plot is NOT meant to be an exact representation of the data for all cases. For example, if characters = "ab" and there are 10000 occurrences of 'a' and 2 occurrences of 'b', then there will be 40 “#”s for a and zero “#”s for b. That is expected.

Program

Using the provided wordProgram.py file as your starting point, complete the program. It must include a main function (and main guard) that prompts the user for the name of a file, loads the file, and then runs your functions on the data read from the file. Your program will display some information as shown in the example below (here the user typed fancy-story.txt and .,! at the two prompts):

    Input name of file : fancy-story.txt
    Input the characters you want [whole alphabet if enter] : .,!
    fancy-story.txt stats
    ---------------------
    Number of unique words : 287
    Top 5 words used: the, cat, dog, indeed, was
    Character Stats (89 in total)
    -+---------------------------------------- (#=0.5)
    .|########################################
    ,|###
    !|##
    -+----------------------------------------

Suppose that there were 80 “.” characters in the file, 5 “,” characters and 4 “!” characters. Here, we should really have 2.5 “#”s for “,”, but we round this to 3 for the plot.

Note that the line of dashes under “fancy-story.txt stats” has as many dashes as needed to completely underline the text above it (and so depends on the filename used).

You can assume that (1) the file entered exists and is in the same directory as words.py and wordProgram.py, and (2) the characters entered will contain no whitespace (unless no characters were entered). If the user does not enter any characters (just hits enter), then the alphabet “abcdefghijklmnopqrstuvwxyz” will be used for the characters.

Include your words.py file and your wordProgram.py file in your submission zip file.
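Two small details of the program described above, the default alphabet and the underline that matches the heading length, can be sketched as follows. The helper names resolve_characters and stats_heading are hypothetical and are not taken from the provided wordProgram.py; the sketch leaves out the prompts and the file loading.

```python
DEFAULT_ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def resolve_characters(raw):
    """Return what the user typed, or the whole alphabet if they just hit enter."""
    if raw == "":
        return DEFAULT_ALPHABET
    return raw


def stats_heading(filename):
    """Build the '<filename> stats' heading with one dash under each character."""
    title = filename + " stats"
    return title + "\n" + "-" * len(title)


print(stats_heading("fancy-story.txt"))
```

Computing the underline from len(title) is what lets the row of dashes grow or shrink with the filename.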
Note: the marks breakdown is as follows:

● Style marks: 1 mark
● unique_words: 1 mark
● top_words: 2 marks
● display_stats: 2 marks
● program: 2 marks

Note: A helper function that sorts tuples of the form (word, frequency) based on frequency will be provided.

Q2: Drawing [2 marks]

Think about your experience so far in COMP 1005/1405. Think about what you have learned and what you have done. The joys and frustrations. Think about what you might be able to do with what you have learned.

Your task in this problem is to either draw a picture that expresses this reflection or to write about it (or a combination of both). My hope in asking you to do this exercise is that you will critically reflect on what you have learned and perhaps on where you would like to take what you have learned. It should also make this assignment a bit lighter than the others.

The intention is that this problem should not cause you any stress. Do not worry about your “artistic ability”. You will not be graded on how “artistic” your drawing is or how grammatically correct your writing is. If you put an honest effort into the problem, you will receive full marks. Have fun!

You can create your drawing or writing in any way you wish, but you should save it in PDF format. Ideally, your drawing should be standard letter size in horizontal orientation, and any writing should be no more than one page long.

Time permitting, we will show your pictures to the class. If you want your submission to remain private (and not shown to the class), then save your file as private-name.pdf, where name is your name.

If you agree to have your picture/text displayed in the #top-cows channel in Discord (and possibly in future semesters of this course or related courses), then submit your drawing in a file called public-name.pdf, where name is your first (given) name.
For public submissions, do NOT include your full name/ID in your picture/text unless you are OK with everyone seeing it. Since you are submitting using Brightspace, we already know who you are, so we don’t need this information in your picture.

Note: Offensive/rude/insensitive submissions will receive zero marks and may be forwarded to the Dean’s office depending on the severity. (This has never happened before, and I do not anticipate it happening now.)

Save your drawing or writing in a file named as specified above and add it to your submission zip file.

Recap [A4.zip]

Submit a single zip file called A4.zip. Your zip file should have three (or four) files in it:

● words.py
● wordProgram.py
● Either private-name.pdf or public-name.pdf (or both).