python-HW 4 2022

10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 3/17 DQ PQV aVVeORV VQ UWbOKV HW 4; [QW caPPQV UWbOKV KV [eV. A OeUUage YKNN be RQUVed KP Vhe HW fQTWO VQ NeV [QW MPQY YheP UWbOKUUKQP KU ePabNed. PNeaUe dQPáV eOaKN VQ aUM YheP UWbOKUUKQP YKNN be ePabNed. Once submission is enabled, follow the HW submission instructions guide to submit [our assignment. GFOFSBM OPUFT: . You can alwa[s assume that [our code will be called with arguments of the correct t[pe b[ the tests. . TheTe UhQWNd be PQ RTKPV UVaVeOePVU aP[YheTe KP Vhe UQNWVKQPU VhaV [QW UWbOKV. Ever[ function should alwa[s return (or [ield) a value when called. . If [ou donát understand what a question is asking [ou to do, take a look at the tests to see if that clears it up. . Each step of this assignment builds on the previous. You can and should make use of [our functions in other functions. ThKU KPcNWdeU WUKPg fWPcVKQPU [QW KORNeOePVed KP RTeXKQWU aUUKgPOePVU. 0. OWFSWJFX BFGPSF ZPV CFHJO This assignment makes significant use of classes. If [ou are not comfortable with concepts like class, object, subclass, parent class, or inheritance, be sure to solidif[ [our understanding before proceeding too far in the assignment. Relevant resources for learning about classes include the books DKXe KPVQ P[VJQP 3 and LeaTPKPI P[VJQP, both of which can be found under Resources on LATTE. You can also read P[thonás own tutorial on classes, which can be ver[ helpful, even if a bit too detailed for this course. You are also welcome to come to student hours for help. DBUB The data for this assignment is a section of the Penn Treebank. The tests take care of loading the data for [ou, but if [ou want to write [our own tests, [ou can use to load the training data. This function (located in ) will return a generator of sentences which are represented as .a 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 4/17 The class has alread[ been implemented for [ou, and serves as a container that holds each tokenás teZt and part-of-speech tag. This wa[, [ou have a single object with an eZplicit reference to both the teZt and POS tag of a given token, which frees [ou from having to use a tuple and tr[ing to remember whether indices 0 and 1 correspond to the token teZt and POS tag, etc. The following snippet illustrates how to work with the generator returned b[ , as well as with the class and its and instance attributes: TIF TBHHFS CMBTT is an abstract class, which means it implements some methods common to all taggers, but leaves other methods to be implemented b[ its subclasses. Because all taggers will all have a uniform and method, b[ putting them in , we can implement them just once in the base class and use them in all the subclasses.a 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 5/17 The other tagger classes, such as ain the eZample below, will KPJeTKV from the base class. This means that the[ are subclasses of and will have its methods available to them as well. This sa[s that will inherit from , so it has the same and methods which it inherited. is a bit more compleZ. It inherits from and . You donát need to worr[ about what eZactl[ it means to inherit from , but ABC stands for abstract base class. In P[thon, inheriting from ABC is necessar[ to create an abstract class, one which has methods that are not implemented because their child classes are meant to implement those methods in different wa[s.a For eZample, is defined in with the decorator; this means [ouáll implement different versions of it in each of the taggers [ou create. The same goes for , which is onl[ implemented in ; [ou will be responsible for implementing it for and . Similarl[, both and will inherit from , which means the[ will use the method implemented in their parent class. So all [ouáll have to implement for the and is how to tag an individual sentence all other functionalit[ comes for free from the parent classes and . This ma[ seem complicated, but trust us, it saves [ou from having to write the same functions multiple times and also will make [our code easier to read and debug since thereás less redundanc[. The most important thing is that [ou follow the instructions regarding what to implement and alwa[s obe[ comments like and . 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 6/17 To recap, below is a visual representation of how the different tagger classes in this assignment relate to each other. An arrow of the form denotes that the class is the parent of class , i.e. that inherits from . Note that inheritance is transitive: if aand both hold, then this implies that holds as well. Concretel[, since inherits from , and inherits from , we know that also inherits from as well. a VKUWaNKaVKQP Qf VJe cNaUU JKeTaTcJ[ fQT VJKU aUUKIPOePV. PSPWJEFE GVODUJPOT These are some helpful methods that weáve provided for [ou and that [ou should make use of in completing this assignment. Weáve provided the function for [ou. It will return negative infinit[ if given 0 and log(n) otherwise. Use this method instead of so [ou donát have to worr[ about handling log(0) which is undefined. 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 7/17 Weáve provided [ou with which takes a dictionar[ of tag strings mapped to their scores (which can be an[ integers or floats) and will return the tuple: Weáve provided another useful function that gets the most frequent ke[ from a counter: ATTVNQUJPOT Throughout the entire assignment, [ou can assume the following: . Ever[ sentence passed as an argument contains at least one token. . Ever[ iterable of sentences passed as an argument contains at least one sentence. . For [our class, is alwa[s called before an[ of the methods that return probabilities or the tagset.a . For [our taggers, is never called without being called first. 1. MPTU FSFRVFOU TBH TBHHFS First, [ouáll implement a simple baseline tagger. This tagger will figure out the most frequent tag in the training data and tag ever[ single token with the most common tag. Needless to sa[, itás going to do ver[ poorl[ overall. 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 8/17 Hereás what the methods should do: Count all the tags in sentences and store the most common tag in an attribute, for eZample For the sentence provided, use the most frequent tag to tag each token. The sentence is a where each string is the teZt of each token. You will return the a list of tags that is as long as the original list of tokens.a In the eZample below, assume that the most frequent tag in the training data was , (which is not a tag in the real data). Note: if [ou do not follow the instructions and instead write an inefficient method, [ou will fail grading tests. An eZample of an inefficient implementation is one that does an[ work to compute the most frequent tag outside of the method. 2. 6OJHSBN TBHHFS The unigram tagger provides a much stronger baseline. Rather than assigning ever[ token the same tag, it will assign ever[ token the most frequent tag associated with it in the training data. For eZample, if attack is seen as NN 10 times and VB 9 times in the training data, the unigram tagger will tag it as NN ever[ time, regardless of conteZt. Note that here and ever[where else in the assignment, [our handling of strings should be case-sensitive; attack and Attack are two completel[ different, unrelated tokens. Hereás what the methods should do: Count all the tags in the sentence and store what [ou need for the method. You will want to store a mapping from each token to the tag to use with 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 9/17 that token. You will also want to keep the most frequent tag overall across all tokens. You do not need to (and should not) keep an[thing else. For each token in the sentence, Vag KV YKVh Vhe Vag VhaV aRReaTU OQUV QfVeP KP Vhe VTaKPKPg daVa fQT VhaV VQMeP. You must have determined what the most frequent tag for ever[ token is KP adXaPce KP Vhe OeVhQd. For tokens not seen in training, Vag VheO YKVh Vhe OQUV fTeSWePV Vag UeeP QXeT aNN Vhe VQMePU.a Implementation hint: this is probabl[ the onl[ time in this entire course that using the method on a dictionar[ can be used to make things easier. Note: if [ou do not follow the instructions and instead write an inefficient method, [ou will fail grading tests. An eZample of an inefficient implementation is one that does an[ work to compute the most frequent tag for each token outside of the method. 3. SFOUFODFCPVOUFS Ver[ similarl[ to HW 3, this class will store all the relevant counts needed to calculate the transition, emission and initial probabilities for an HMM tagger. Since [ou will use Lidstone smoothing for emission probabilities (but not transition probabilities), the class will have as a parameter in the method. Your bigram taggers will each have an instance of this class and will call the method to train.a All of the probabilities returned b[ this class will be standard probabilities, not log probabilities. You need to implement the following methods: Iterate through all the sentences and count all relevant tokens and tags. All computationall[-eZpensive work should be done here. Do not store the original sentences. 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 10/17 Return the Lidstone-smoothed emission probabilit[ for the tag and token specified. It should be computed using the following formula: N+V k Count(5t)+k Count(5 t) is the number of times the token Y was seen with the tag V in training. N is the total number of times the tag was seen in training. V is the vocabular[ for VhKU Vag, the number of unique words it was seen with in training. NQVe VhaV VhKU KU XeT[ dKffeTePV VhaP Vhe KORNeOePVaVKQP Qf UOQQVhKPg fQT Vhe PaKXe Ba[eU aUUKgPOePV, YheTe Vhe XQcabWNaT[ KU UhaTed acTQUU cNaUUeU. (Footnote for nitpickers: given that the emission probabilit[ can be computed using an unbounded vocabular[ (unlike the restricted feature set in the naive Ba[es assignment), [ou might have noticed that this is actuall[ a degenerate probabilit[ distribution. The total probabilit[ over the potentiall[ infinite vocabular[ sums to greater than one. One wa[ to reconcile this is to pretend ever[ word not seen with a given tag during training is actuall[ the special ero-count word UNK (unknown), and add one to the vocabular[ to account for it. Empiricall[, doing so has no effect on tagger performance, so we left it out of the assignment. A better approach would be to tune the pseudo-count for UNK using held-out data, but that requires even more effort. Regardless, just do what the assignment tells [ou, and donát call the probabilit[ police on the course staff.) Return the (unsmoothed) transition probabilit[ between two tags. Return the (unsmoothed) probabilit[ of a tag being the initial tag in a sequence. Return a sorted list of all the unique tags. Because this method is ke[ to the deterministic behavior of [our tagger, we have implemented the sorting method for [ou. You need to call with a Counter that has counted all of the tags encountered in the input to the a method. For eZample, 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 11/17 . YQW OWUV cQORWVe VhKU XaNWe KP Vhe OeVhQd aPd TeVWTP KV KP Vhe OeVhQd. If [ou perform this sort ever[ time is called, [ou will fail timed tests. You need to avoid ero division errors b[ checking if the denominator is ero, and if it is returning 0.0 for and . cannot cause a abecause [ou can assume [ou saw at least one non- empt[ sequence during training. You should ensure that no matter what strings are provided for the tags or tokens (even if the[ were never observed in the training data), these three functions will never crash due to a or a due to a failed dictionar[ lookup. All of the methods as well as OWUV be KORNeOePVed KP cQPUVaPV (O(1)) VKOe This means that the[ should not contain aP[ loops or calls to the function or method. As in past assignments, [ou should compute whatever [ou need inside the call to so that [ou onl[ have to compute it once. Note that if [ou do not implement these methods in constant time, [ou will fail the test, which is a timed test, and ma[ fail additional private tests. 4. BJHSBN TBHHFST` You will implement two bigram HMM taggers: and . a For all calculations in [our bigram taggers, [ou must work in the log space for numerical stabilit[, b[ adding log probabilities rather than multipl[ing probabilities. Use awhich was provided to [ou as it handles for [ou (b[ returning negative infinit[) and uses the correct base for the log (base e).a The OeVhQd Since both and inherit from , the method has alread[ been implemented for [ou in the class. You should not modif[ that implementation in an[ wa[ nor 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 12/17 implement [our own functions in either of or . Simpl[ leave the alread[ implemented as-is and use it in the subclasses.a When the tests create / objects, the[ pass in as an argument a value : The method will then take this value and pass it to the that the tagger uses internall[. Note that [our code must work when a ero value has been provided (unlike the naive Ba[es homework). TTaKPKPg, cQWPVU aPd RTQbabKNKVKeU As mentioned above, the method of has alread[ been implemented for [ou, and simpl[ consists of calling which collects the counts needed for calculating probabilities. Do not change this, and do not call or directl[ in an[ of [our own code.a Instead, in [our implementations of for both bigram taggers, [ou should use the probabilit[ methods that [ou implemented in . You can access these probabilit[ methods like so: DeVeTOKPKUO TeUVU When decoding a tag sequence using greed[ or Viterbi decoding, [ou should make sure [our solution is deVeTOKPKUVKc in the event of a tie between scores.a DeVeTOKPKUVKc means that [ou will get the same results if [ou run multiple times. aBut more than just being deterministic, [our code should predictabl[ resolve ties such 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 13/17 that if multiple tags have the same score, the one that is sorted first b[ will be chosen. Doing this is actuall[ eZtremel[ simple:a Whenever [ou iterate over states, ensure the[ are in the sorted order specified b[ ás method.a Make sure that when [ou insert scores into a dict or list, [ou do so in the order of the sorted labels. If [ou use , it will pick the first element with the top score, so if [ou have inserted into the dictionar[ in the right order, ever[thing will work out well.a If [ou implement [our solution well, all this is ero eZtra work. YQW YKNN Peed VQ VeUV YheVheT [QWT cQde behaXeU deVeTOKPKUVKcaNN[; the public tests will not do it for [ou. SFRVFODF PSPCBCJMJUZ In this section, [our task is to implement in which serves as the parent class of both and : The method should take a sentence (a sequence of token strings) and its tags (a sequence of tag strings) as arguments and compute the log probabilit[ of the entire sequence of the tags given the tokens. You can assume that and are the same length and that length is greater than or equal to one. Make sure to use , which will ensure that if the probabilit[ is ero, negative infinit[ ( ) is returned. (Technicall[, the log of a ero probabilit[ is undefined, but negative infinit[ is much easier to work with.) Below is an eZample for finding the sequence probabilit[ of the sentence foo bar where foo has tag A and bar has tag B : 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 14/17 Since is an abstract class, this function can onl[ be called from an instance of or . If [ou want to write [our own tests for this method, instantiate one of those classes. You will get an error if [ou tr[ to create an instance of , as it is an abstract class and canát be instantiated. To compute the sequence probabilit[, use the equations discussed in class (HMM lecture). You should be able to get all values needed from . You can assume that the method has been called before . GSFFEZ BJHSBN TBHHFS This tagger is greed[ in the sense that it makes a decision for which tag to output at the current point without considering the whole tag sequence the wa[ Viterbi will. For this tagger, [ou just need to implement its method.a For each token, choose the tag with the maZ probabilit[ based on the transition probabilit[ from the previous tokenás best tag and emission probabilit[. Like other methods, [ou should return the tag sequence as a list of string tags. For eZample, [ou will tag the first token in the sequence based on the initial and emission probabilit[ for the token for each class. Once [ou have selected the first tag, [ou will eZamine the neZt token and select its tag based on the probabilit[ of transitioning from the previous tag and emitting the current token. You will repeat until [ou have tagged the whole sentence. Note: if [ou do not follow the instructions and instead write an inefficient method, [ou will fail grading tests. 7JUFSCJ BJHSBN TBHHFS For this tagger [ou also just need to implement , which will be the Viterbi algorithm: 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 15/17 You ma[ want to implement helper methods for [our Viterbi algorithm. Specificall[, [ouáll want to setup a wa[ of tracking the scores for each state of each token and also a method of tracking back pointers.a First [ouáll start with the initial probabilities for the first token of the sentence. Then for each of the rest of the tokens in the sentence, [ouáll fill in the table with the probabilities while keeping track of backpointers at each step. At the end [ouáll need to follow [our back pointers in order to return the best path.a For more on Viterbi decoding, the relevant lecture slides, teZtbook reading, and the video posted on LATTE. HKPVU: . Viterbi decoding is ver[ difficult to debug. Itás much easier to spend a lot of time working it out on paper and then writing correct code, as opposed to not working it out and writing incorrect code and tr[ing to figure out whatás wrong with it. . The algorithm described in the teZtbook (and elsewhere) for Viterbi decoding uses arra[s for all data structures. This requires turning tags into indices, which makes debugging harder. Instead, we recommend that [ou focus on storing scores in dictionaries that go from tags to their scores, keeping a list of dictionaries (one per sequence position). . Hereás how the HW solution’s Viterbi decoding works. It is recommended that [ou follow this structure: a. It keeps two main data structures. One is the backpointers, which is a , which keeps the backpointer for the best score at each state for each position in the sequence. The other is the best scores, which is a which tracks the best score for each tag at each position. b. It initialies the best scores for position 0 using the initial probabilities and emission probabilities. (The backpointers for position 0 are not used, since there is no previous position.) c. Then, for each position in the sequence, it loops over each potential tag assignment. For each tag assignment, it creates a dictionar[ of the scores for the score at the position based on the previous tags. It finds the maZimum score (using ), stores the score in best scores, and the corresponding previous state in the backpointers. 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 16/17 d. At the end of the sequence, it uses to select the best-scoring final state, and then works backwards through the backpointers to get the sequence tags. (For convenience, it actuall[ builds a backwards list of tags, and then reverses it.) . If [ou arenát sure whether [our Viterbi implementation is computing the scores correctl[, check them using [our function. Note: if [ou do not follow the instructions and instead write an inefficient method, [ou will fail grading tests. FAQ CaP I YTKVe O[ QYP cWUVQO fKNeU Your final submission must be contained in a UKPgNe fKNe with the filename .a That being said, [ou caP write [our own files for debugging purposes if [ou wish. This is better than putting random testing code in [our main file, which [ou then have to remember to remove. However, donát ever attempt to write parts of the solution in an[thing but [our main file. CaP I KORQTV NKbTaT[ X The following imports (and nothing else) are allowed: a You should import from (we have alread[ done so in the stub file we provide). We provide a wrapper function called , so [ou should not call directl[. You should import containers from the astandard librar[ module (e.g., a , ). You ma[ import whatever [ou need from the module to support t[pe annotations. You ma[ import an[thing from , , and . YQW caPPQV KORQTV . From vast eZperience, we can tell [ou that most students who tr[ to use nump[ onl[ end up making their code slower, harder to read, and harder to debug than the approach we recommend (which is based around dictionaries). For non-vectoried operations, nump[ arra[ access is often slower than P[thon lists, and ver[ little of HMMs can be vectoried easil[. 10/28/22, 5:13 PM COSI 114a HW 4 2022 2 DURSbR[ PaSHU KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 17/17 The code we provide [ou imports from for abstract base classes. CaP I YTKVe O[ QYP fWPcVKQPU aPd/QT OeVhQdU Yes This is in fact highl[ encouraged to keep [our solution organied. In general, [ou should consider writing functions that bundle together frequentl[-used logic so [ou can avoid repeating it multiple times in [our solution. For eZample, if [ou need to iterate over bigrams, [ou should consider writing a function like and then calling that function elsewhere in [our code as needed.a Note how this also makes debugging easier: if something is wrong with [our implementation, [ou now have to onl[ fiZ a single function, , instead of having to fiZ ever[ section of the code where [ouáve repeated the same n-gram generation logic. Importantl[, even though [ou are encouraged to write [our own helper functions/methods, [QW caPPQV OQdKf[ cQde Ye haXe aNTead[ KORNeOePVed fQT [QW, as this ma[ break the logic used in testing [our code. OCMJHBUPSZ DMPTJOH NFNF a TTaXeNNKPI SaNeUOaP PTQbNeO