CSCI 2040 Python: Assignment 4

CSCI 2040 A/B: Introduction to Python
2021-2022 Term 1
Lab Assignment 4
Instructor: S.H. Or
Due: 23:59 on Friday, Dec. 3, 2021

Notes

1. You are allowed to form a group of two to do this lab assignment.
2. You are strongly recommended to bring your own laptop to the lab with Anaconda[1] and PyCharm[2] installed. You don't even have to attend the lab session if you know what you are required to do by reading this assignment.
3. For those of you using the Windows PC in SHB 924A (NOT recommended) with your CSDOMAIN account[3], please log in and open "Computer" on the desktop to check if an "S:" drive is there. If not, then you need to click "Map network drive", use "S:" for the drive letter, fill in the path ntsvr1userapps and click "Finish". Then open the "S:" drive, open the Python3 folder, and click the "IDLE (Python 3.7 64-bit)" shortcut to start doing the lab exercises. You will also receive a paper copy of this document; if the two differ, the paper copy takes precedence.
4. Passing the test scripts we have provided doesn't guarantee full marks for a question, as our grading scripts will test more cases.
5. You may assume that corner cases not mentioned in this document will not appear in the hidden test cases, so you do not have to worry too much about wrong inputs unless you are required to do so.
6. Your code should only contain the specified functions. Please delete all debug statements (e.g. print()) before submission.

[1] An open data science platform powered by Python. https://www.continuum.io/downloads
[2] A powerful Python IDE. https://www.jetbrains.com/pycharm/download/
[3] A non-CSE student should ask the TA for a CSDOMAIN account.

Exercise 1 (25 marks)

1. Given the dictionary vehicle_dict, which maps vehicles to their weights in kilograms:

    vehicle_dict = {"Sedan": 1500, "SUV": 2000, "Pickup": 3000, "Minivan": 1600, "Van": 2400, "Semi": 13600, "Bicycle": 7, "Motorcycle": 110}

   Use a list comprehension to construct a list of the names of vehicles with weight below 2500 kilograms, and capitalize all the names (convert them to upper case, as in the expected output) in the same list comprehension. For the convenience of testing, please name the list list1. The expected output should be a list as follows: (5 marks)

    ['SEDAN', 'SUV', 'MINIVAN', 'VAN', 'BICYCLE', 'MOTORCYCLE']

2. Use a list comprehension to construct a list that forms a multiplication table. For the convenience of testing, please name the list list2. The expected output should be a list as follows: (5 marks)

    ['1*1=1',
     '1*2=2 2*2=4',
     '1*3=3 2*3=6 3*3=9',
     '1*4=4 2*4=8 3*4=12 4*4=16',
     '1*5=5 2*5=10 3*5=15 4*5=20 5*5=25',
     '1*6=6 2*6=12 3*6=18 4*6=24 5*6=30 6*6=36',
     '1*7=7 2*7=14 3*7=21 4*7=28 5*7=35 6*7=42 7*7=49',
     '1*8=8 2*8=16 3*8=24 4*8=32 5*8=40 6*8=48 7*8=56 8*8=64',
     '1*9=9 2*9=18 3*9=27 4*9=36 5*9=45 6*9=54 7*9=63 8*9=72 9*9=81']

3. Let a be the list of values produced by range(1, 11). Using the function map with a lambda argument, write an expression that produces a list of the squares of the corresponding values in list a. For the convenience of testing, please name the list list3. The expected output should be a list as follows: (5 marks)

    [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

4. Let a be the list of values produced by range(1, 11). Using the function filter with a lambda argument, write an expression that collects all even numbers of a into a new list. For the convenience of testing, please name the list list4. The expected output should be a list as follows: (5 marks)

    [2, 4, 6, 8, 10]

5. Consider the following list:

    fruit = [{"apple": 10, "pear": 20, "banana": 30, "strawberry": 50},
             {"apple": 12, "pear": 5, "banana": 20, "strawberry": 5},
             {"apple": 15, "pear": 26, "banana": 32, "strawberry": 8}]

   fruit is a list of dictionaries, and each dictionary contains exactly the same keys. Using the function reduce with a lambda argument, write an expression to get a new dictionary with the same keys.
   The value for each key of the new dictionary is the sum of all the values for the corresponding key. For the convenience of testing, please name the dictionary dict5. The values may be of either int or float type. The expected output should be a dictionary as follows: (5 marks)

    {'apple': 37.0, 'pear': 51.0, 'banana': 82.0, 'strawberry': 63.0}

Note that in this exercise, you are required to write only one line of code for each expression. We will manually check the correctness of your answer.

Save your script for this exercise in p1.py

Exercise 2 (30 marks)

A wildcard character is a single character used to represent a number of characters or an empty string. It is often used in file searches so the full name need not be typed. Here we consider just two types of wildcard character: the asterisk (*, also called "star") and the question mark (?). When specifying file names or paths, * matches zero or more characters (e.g., doc* matches doc and document but not dodo), and ? matches exactly one character (e.g., do? matches doc but not dodo or do).

Write a program that reads an input file, and then prints the number of alphabet characters, words, lines and digits in the file to an output file. You only need to write one program satisfying all the requirements. The command line should be like

    python p2.py input_filename output_filename

but input_filename can contain the wildcard characters * and ?. Hint: You could use the argparse module[4]. (5 marks)

We only consider text files in the current working directory by default.
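The wildcard rules above can be tried out on plain strings before touching any files. The sketch below is only an illustration, not a required approach: it assumes the standard-library fnmatch module, which the assignment does not prescribe, and the variable names are hypothetical. The names and patterns are the examples from the text.

```python
from fnmatch import fnmatchcase

# Candidate names and patterns taken from the examples in the text.
names = ["doc", "document", "dodo", "do"]

# '*' matches zero or more characters.
star_matches = [n for n in names if fnmatchcase(n, "doc*")]

# '?' matches exactly one character.
question_matches = [n for n in names if fnmatchcase(n, "do?")]

print(star_matches)      # names matching doc*
print(question_matches)  # names matching do?
```

The standard-library glob module applies the same matching rules to file names on disk, which may be closer to what p2.py needs.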
If the input filename doesn't contain any wildcard character and the input file exists, then the output file should contain the statistics of the input file as follows: (10 marks)

    Number of characters: XXX
    Number of words: YYY
    Number of lines: ZZZ
    Number of digits: WWW

If the input filename contains a wildcard character but there is no matching file, your program should print "No matching!". If several matching files are found, then you should put the collected information for all the matched files into the output file. Suppose two matched files are found; then the content of the output file should be: (10 marks)

    Name of file: XXX1
    Number of characters: YYY1
    Number of words: ZZZ1
    Number of lines: KKK1
    Number of digits: WWW1
    Name of file: XXX2
    Number of characters: YYY2
    Number of words: ZZZ2
    Number of lines: KKK2
    Number of digits: WWW2

[4] https://docs.python.org/3/library/argparse.html

Handle a non-existent input file by using a try...except statement. Suppose your input filename is test2.txt and there is no such file in the current directory; the error message should be "Opening file test2.txt failed!". Only the message needs to be printed, and the program should exit after that. (5 marks)

Example: If there is a three-line file test.txt whose content is 123456 7 8 9 abc I go to school by bus, the output file should contain:

    Number of characters: 19
    Number of words: 11
    Number of lines: 3
    Number of digits: 9

Some notes:

- "[apple tree]" is regarded as 2 words; "abc@cse.cuhk.edu.hk" is regarded as one word.
- If there is no such file in the current directory, then we don't need to output a file; just print the message "No matching!" or "Opening file test2.txt failed!" and then exit.
- Please avoid inputting a single '*' on the command line (python p2.py * outputfilename), because some terminals may handle this case themselves, and then we cannot run our script successfully.
- If the input filename contains wildcard characters and only ONE file is found, we still need to output the line "Name of file: XXX1" in the output file.

Save your script for this exercise in p2.py

Exercise 3 (25 marks)

Visualization is widely used for data analysis. In this exercise, you will use Python scripts to draw 3 kinds of figures. All the scripts for this exercise should be in a single file p3.py. The package matplotlib is useful for this exercise, and can be imported as follows[5]:

    import matplotlib.pyplot as plt

Note: Your grade for this exercise depends on the readability of your figures. Besides the p3.py script, you should also submit the specifically named .png files, and all the figures should be clear to read.

Histogram and line chart (10 marks)

    import numpy as np
    np.random.seed(0)
    mu = 0
    sigma = 1
    random_numbers = mu + sigma * np.random.randn(10000)

A histogram[6] is a way to visualize the distribution of continuous variables. Write a script in p3.py to plot a histogram of the random numbers in random_numbers generated by the script above. Set the number of bins to 100 in the histogram. The x-axis is the value of the random numbers, and the y-axis is the probability density. Hint: You could use hist() in matplotlib.

Draw a line chart for the function $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$ in the same figure. Here, μ = 0, σ = 1 and x = bins, where bins is one of the return values of hist(). Hint: You could use plot() in matplotlib.

You are recommended to try the color='green' option in hist() and the linestyle='dashed' and color='red' options in plot().

Save your figure as histogram_line.png. Hint: You could use savefig() in matplotlib.
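As a sanity check on the density formula above, the following sketch evaluates it with the standard-library math module only. The names normal_pdf, peak and area are hypothetical and not part of the assignment. For μ = 0 and σ = 1 the curve should peak at 1/√(2π) and enclose an area close to 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate y = 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Height of the curve at its centre (x = mu).
peak = normal_pdf(0.0)

# Crude Riemann sum over [-6, 6]; the tails beyond that are negligible.
step = 0.01
xs = [-6.0 + i * step for i in range(1201)]
area = sum(normal_pdf(x) for x in xs) * step

print(peak, area)
```

When x is a NumPy array such as bins, np.exp and np.sqrt would take the place of the math calls so that the function works elementwise.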
[5] https://matplotlib.org/
[6] https://en.wikipedia.org/wiki/Histogram

Pie chart (10 marks)

Given the dictionary below:

    Colleges = {'New Asia College': 3345, 'United College': 3364, 'Shaw College': 3342, 'Morningside College': 300}

The keys of the dictionary Colleges are 4 selected colleges in CUHK, and the value for each key is the number of students in the corresponding college in 2020[7]. A pie chart[8] is a way to visualize the distribution of categorical data. Write a script in p3.py to plot a pie chart of the number of students of the 4 colleges in CUHK in 2020. The categories in the pie chart are the colleges in CUHK, and there should be label text for each category on the figure. Use the explode parameter of the pie function to make the wedge with the smallest value stand apart from the rest of the chart. Hint: You may use pie() in matplotlib.

Save your figure as pie.png. Hint: You could use savefig() in matplotlib.

Bar chart (5 marks)

A bar chart[9] is also suitable for visualizing categorical variables. Write a script in p3.py to plot a bar chart of the number of students of the 4 colleges in CUHK. There should be label text for each college on the figure. Hint: You could use bar() or barh() in matplotlib.

Save your figure as bar.png. Hint: You could use savefig() in matplotlib.

Save your script for this exercise in p3.py

Exercise 4 (20 marks)

Mastering a programming language is not only about the syntax; it also requires one to know the programming style. In this exercise, you will get a sense of the Pythonic way of programming. In a nutshell, a Pythonic way of programming is to utilize Python's features that are designed to make a programmer's life easier. Here are some examples:

1. Creating a list of lists (using list comprehension)

   Suppose you want a 2-dimensional array that is a list of 4 empty lists. Since Python does not have a declaration for a 2-dimensional array, you need to construct it from lists.
   The wrong way is to append the same list 4 times (why it is wrong[10]): all four entries then refer to one and the same list object.

    # wrong code
    list = []
    list_of_lists = []
    for i in range(4):
        list_of_lists.append(list)

   The "ugly" code runs an explicit for-loop.

    # correct but "ugly" code
    list_of_lists = []
    for i in range(4):
        list_of_lists.append([])

   The Pythonic code has only one line that utilizes list comprehension.

    # Pythonic code
    list_of_lists = [[] for _ in range(4)]

2. Opening and reading a file

   Suppose you need to process the contents of a file, line by line. The "ugly" code below uses a while-loop, and it is easy to forget to read the next line or to close the file.

    # "ugly" code
    file = open('some_file_name')
    line = file.readline()
    while line:
        # do something with the line
        line = file.readline()  # you may forget this
    file.close()  # you may forget this

   In a Pythonic way, we use with, which automatically closes the file after usage, and we run a for-loop directly over the file.

    # Pythonic code
    with open('some_file_name') as file:
        for line in file:
            pass  # do something with the line

3. Chained comparison

    # "ugly" code
    if 0 <= x and x <= 100:
        x = x + 1

    # Pythonic code
    if 0 <= x <= 100:
        x += 1

4. Conditional operator

    # "ugly" code
    if 0 <= x and x <= 100:
        y = x + 1
    else:
        y = x - 1

    # Pythonic code
    y = x + 1 if 0 <= x <= 100 else x - 1

5. Multiple assignment

    # "ugly" code
    x = 1
    y = 2

    # Pythonic code
    x, y = 1, 2

[7] https://www.iso.cuhk.edu.hk/images/publication/facts-and-figures/2020/html5/english/12/
[8] https://en.wikipedia.org/wiki/Pie_chart
[9] https://en.wikipedia.org/wiki/Bar_chart
[10] http://cryptroix.com/2016/10/25/python-call-object/

More examples can be found in many online posts by searching "Pythonic"[11].

In this exercise, you need to write a function named get_final_grades in p4.py that takes the name of the grading file as the input, and returns a dictionary of the final grades for each student of the Python course.
A prototype of your function can be

    def get_final_grades(filename='grades.csv'):
        return students_grades  # a dictionary

By default, the grades are recorded in an input file named grades.csv, in the same folder as your scripts. In this file, each line records the student ID and the grades of a student in the past lab assignments, separated by commas (this is called a "CSV" file). For example, if we have 3 students and 4 lab assignments, grades.csv has the following contents:

    SID1155000001,60,61,62.5,-1
    SID1155000002,-1,70,75,73
    SID1155000003,80,-1,87.5,-1

Here, if a student does not submit a lab assignment, the grade for that assignment is recorded as -1. For example, the student with SID 1155000001 in the first row has grades 60, 61 and 62.5 for the first three lab assignments respectively, and the "-1" indicates that this student did not submit the fourth lab assignment.

The final grade of a student is calculated according to his/her performance on the four assignments. Because the difficulty of each assignment is different, we assign different weights to each assignment and calculate the final score as a weighted average. The weights for assignments 1, 2, 3 and 4 are 20%, 25%, 25% and 30% respectively. If a student doesn't submit an assignment, we regard his/her grade for that assignment as 0 when we do the calculation. For the above example, the final grades for students 1, 2 and 3 are 42.875, 58.15 and 37.875; for instance, student 1 gets 0.20 × 60 + 0.25 × 61 + 0.25 × 62.5 + 0.30 × 0 = 42.875.

The output is a dictionary that maps each student ID to the weighted average grade of that student (each value should be a float, which will be compared with the sample answer). The return value of get_final_grades for the above example should be a Python dictionary:

    {'SID1155000001': 42.875, 'SID1155000002': 58.15, 'SID1155000003': 37.875}

[11] https://medium.com/the-andela-way/idiomatic-python-coding-the-smart-way-cc560fa5f1d6

Your scripts should not contain any of the 5 kinds of "ugly" code mentioned above.
4 marks will be deducted for each kind of "ugly" code in your scripts. Your scripts can be in any style that does not contain the "ugly" code mentioned above; you are NOT necessarily required to use the Pythonic code.

Save your script for this exercise in p4.py

Submission rules

1. Please name the functions and script files with the exact names specified in this assignment and test all your scripts. Any script that has a wrong name or a syntax error will not be marked.
2. For each group, please pack all your script files into a single archive named with the student IDs of the group members followed by lab4, joined by underscores, for example 1155012345_1155054321_lab4.zip; just replace the two IDs with your own student IDs. If you are doing the assignment alone, use only your own ID, e.g., 1155012345_lab4.zip.
3. Upload the zip file to your Blackboard (https://blackboard.cuhk.edu.hk). Only one member of each group needs to upload the archive file. The subject of your submission should follow the same pattern, e.g., 1155012345_1155054321_lab4 if you are in a two-person group or 1155012345_lab4 if not. Submit no later than 23:59 on Friday, Dec. 3, 2021.
4. Students in the same group will get the same marks. Marks will be deducted if you do not follow the submission rules. Any person or group caught plagiarizing will get a score of 0!