Chemistry-CHEM3121

SCHOOL OF CHEMISTRY
SENIOR CHEMISTRY
CHEM3121 Chemical Biology
Generic skills plus
PROJECT L: Computing in Chemical Biology
EXPERIMENT L1: ANALYSING REACTION KINETICS – AN INTRODUCTION TO COMPUTER DATA ANALYSIS
EXPERIMENT L2: STEADY-STATE ENZYME KINETICS
EXPERIMENT L3: BIOINFORMATICS
2022

Project L
Computing in Chemical Biology

CONTENTS
L1 Analysing Reaction Kinetics – An Introduction to Computer Data Analysis
L2 Steady-State Enzyme Kinetics
L3 Further Investigations of Computing in Chemical Biology
– Bioinformatics
– Your Own Investigations

EXPERIMENT L1: ANALYSING REACTION KINETICS – AN INTRODUCTION TO COMPUTER DATA ANALYSIS

Aim
The aim of this experiment is to use computer methods to investigate reaction kinetics.

Introduction and Background
This experiment is designed to familiarise you with the program Excel. It also introduces you to the subject of chemical kinetics, which allows one to study how chemical reactions occur. To complete this computer-based experiment, you will need access to Excel and a copy of the files containing the sample data. This experiment is submitted as an electronic notebook, which can be accessed via the Canvas page for this project.

Experimental chemical kinetics usually proceeds by measuring reactant or product concentrations as a reaction proceeds and analysing them by the integrated rate law method. The integrated rate law expresses the concentration of one reactant or product as a function of time. It may be derived from the differential rate law (rate as a function of concentration) by integration. Two examples are shown below.

First order reaction: A → B
Differential rate law:
$$-\frac{d[A]}{dt} = k_1[A]$$
Integration:
$$\int_{[A]_0}^{[A]} \frac{d[A]}{[A]} = -k_1 \int_0^t dt$$
$$\ln[A]\,\Big|_{[A]_0}^{[A]} = -k_1 t\,\Big|_0^t$$
Integrated rate law:
$$\ln[A] = \ln[A]_0 - k_1 t$$

Second order reaction: 2A → B
Differential rate law:
$$-\frac{1}{2}\frac{d[A]}{dt} = k_2[A]^2$$
Integration:
$$\int_{[A]_0}^{[A]} \frac{d[A]}{[A]^2} = -2k_2 \int_0^t dt$$
$$\frac{1}{[A]}\,\Big|_{[A]_0}^{[A]} = 2k_2 t\,\Big|_0^t$$
Integrated rate law:
$$\frac{1}{[A]} = \frac{1}{[A]_0} + 2k_2 t$$

Before you start the experiment answer the following question.
How should the experimental kinetic data be plotted to obtain a straight line and obtain the value of the rate constant for:
a) a first order reaction
b) a second order reaction

The reaction being studied is
2A + B → 2P
where
$$\text{Rate} = k[A]^a[B]^b \tag{1}$$
However, in this case, the concentration of substance B is large enough that it does not change significantly during the reaction. Hence, the rate law can be written as:
$$\text{Rate} = k_{obs}[A]^a \tag{2}$$
where $k_{obs}$ is called the pseudo-rate constant and is equal to $k[B]^b$.

The aim of the exercise is to use the supplied experimental data to determine the reaction order with respect to reagent A (that is, “a”). To do this we use the integrated rate equations given in the introduction and analyse the data to see which form of plotting the data yields a straight line (see the question at the end of the introduction). The graphs generated will allow a determination of the order of reaction with respect to this species.

Objectives
1. To use the supplied “experimental data” (computer-generated) to determine the rate law for the reaction 2A + B → 2P, where Rate = $k[A]^a[B]^b$. Based on your analysis of the results, propose an acceptable mechanism for this reaction.
2. To become familiar with the use of a spreadsheet to manipulate, plot and perform statistical analysis of experimental data.

Experimental Procedure

1. Accessing your data
Two data files, CKDAT1.XLS and CKDAT2.XLS, contain the experimental data to be used in this exercise. These are available for download from the Experimental Data page for Project L. The sample data were collected by adding a large excess of substance B to reagent A, and subsequently monitoring the concentration of reagent A for the duration of the study. Each data set represents a different concentration of substance B and contains two columns of numbers.
The first column is the time (in seconds) at which the measurement was taken, and the second column is the concentration (moles per litre) of A at that time.

Before starting the data analysis make sure that you have the Analysis ToolPak installed in your version of Excel. If you do, there will be a Data Analysis box at the far right of the horizontal toolbar when you click on the Data tab. If it is not there, use the following steps to install it:
i. Click on the File tab
ii. Click More
iii. Click Options
iv. Click Add-Ins
v. Select Analysis ToolPak and click Go
vi. Click OK. Data Analysis should now appear when you are on the Data tab.

2. Generating the Concentration versus Time Plots to Calculate $k_{obs}$

Procedure for More Experienced Users of Excel
1. Start Excel.
2. Open the data file CKDAT1.XLS.
3. In a new column, calculate the values for the natural logarithm of the sample concentration data.
4. In another new column, calculate the values for the inverse of the sample concentration data.
5. Make plots of [A] vs time, ln[A] vs time, and 1/[A] vs time. Ensure these plots are appropriately labelled, including axis labels, because they will need to be submitted as part of your report.
6. By examining which plot is linear, determine the order of reaction with respect to A.
7. Perform a linear regression analysis of the linear data (using Data; Data Analysis; Regression) and plot the line of best fit through the data (using Trendline).
8. Record the slope of the line. (Note that in this case the error has no meaning because the data have been artificially computer generated, but for your own experimental data you should always quote errors on calculations performed using your experimental data.)
9. Determine the value of $k_{obs}$ for this data set, including units.
10. Repeat steps 2-9 for the other data file, CKDAT2.XLS.
11. Save your Excel files so that you can insert your charts and data analyses into your report.

Procedure for Less Experienced Users of Excel
1. Start Excel.
2. Open the data file CKDAT1.XLS.
Double click on the data file CKDAT1.XLS, found in the list of files given.
3. In a new column, calculate the values for the natural logarithm of the sample concentration data.
Select the first cell of the column where the new data are to go. Type ‘=ln(B2)’, where B2 is the cell containing the first row of sample data, and press ENTER.
To make the other cells of the column contain the same function as the one you just created, select the cell containing the function, grab the bottom right corner where a small black square is visible (the pointer changes to a black plus sign when this is possible) and drag the pointer down the column. Each cell “dragged over” will now contain that same function, applied to the sample data in the same row as that cell.
4. In the next column, input the values for the reciprocal of the sample data.
As before, select the first cell of the column where the data are to go. Type ‘=1/(B2)’, where B2 is the cell containing the first row of sample data, and press ENTER. Grab and drag this cell down the column as you did for the previous step to make each cell of this column contain the same function.
5. Make plots of [A] vs time, ln[A] vs time, and 1/[A] vs time. Ensure these plots are appropriately labelled, including axis labels, because they will need to be submitted as part of your report.
To do this, click and drag over the data values that you wish to plot. When you need to plot x and y values which are not in adjacent columns, click and drag over the x values (in this case time), then press and hold the Ctrl key as you click and drag over the y values. Select the Insert tab and in the Chart box, select the Scatter X-Y option, followed by the Scatter option. The time should be in the first column from cells 2 to 61, the concentration of A in the second column from cells 2 to 61, and so on.
Click on the chart (outside of the actual graph) and click the + sign at the top right-hand corner of the chart. Select axis titles and chart title as desired. Then click on the words “Chart title” and “Axis title” to edit the names (e.g., x axis = time, y axis = [A]).
Adjust the size and placement of the chart as required for easy viewing. To resize, click on the chart (outside the graph area). Small open circles appear at the four corners and in the centre of each side. Grab one of these and drag it until the desired size is obtained. To move the whole chart, click on the chart, grab the chart (not the open circles) and drag it to its new location.
6. By examining which plot is linear, determine the order of reaction with respect to A.
7. Perform a linear regression analysis of the linear data to determine the slope of the line.
One graph ([A], ln[A], or 1/[A]) will provide a straight line when plotted against time. To draw a line of best fit through the data, click on a data point on the graph. (The whole data set should light up.) Then right click on the data point and select Add Trendline. In the Format Trendline window that opens, select the type of line you want (e.g., solid, dashed) and under Trendline Options select Linear, Display Equation on Chart, and Display R-squared value on chart. Close the Format Trendline window.
The ADD TRENDLINE feature of the program does not perform a comprehensive linear regression analysis of the data. To do this and obtain the value of the slope, click on the DATA tab, followed by DATA ANALYSIS and REGRESSION. Click OK. Insert the X and Y ranges of the data you wish to fit by clicking and dragging down the respective columns of data. Select OUTPUT RANGE and click and drag over the cells where you wish the output to appear. Click OK. The intercept and slope (X variable), as well as their calculated errors, are given in the bottom left-hand corner of the analysis.
8. Record the slope of the line.
(Note that in this case the error has no meaning because the data have been artificially computer generated, but for your own experimental data you should always quote errors on calculations performed using your experimental data.)
9. Determine the value of $k_{obs}$ for this data set, including units.
10. Repeat steps 2-9 for the other data file, CKDAT2.XLS.
11. Save your Excel files so that you can insert your charts and data analyses into your report.

3. Order of Reaction with Respect to Reagent B
The two pseudo-rate constants obtained in part 2 of this exercise have different values because the concentration of B was different in each case. Both pseudo-rate constants are related to the “true” rate constant via the order of the reaction with respect to B. In the present exercise, the concentration of B used to collect the data in CKDAT2 was 3.00 times that used in CKDAT1. From eqn. 2, recall that
$$k_{obs} = k[B]^b \tag{3}$$
From the ratio of the two $k_{obs}$ values, determine the order of the reaction with respect to B. Now determine the rate law for the reaction by substituting the appropriate numbers into eqn. 1.

4. The Reaction Mechanism
One of the most important uses of kinetics is to discriminate between possible reaction mechanisms. Based upon the experimental rate law just determined, discuss the appropriateness of the following mechanistic schemes to this reaction. Which is most consistent with the rate law?
a) A + B → P + Q (slow)
   Q + A → P (fast)
b) A + B ⇌ Q (fast equilibrium)
   Q + A → 2P (slow)
c) A + A → Q (slow)
   Q + B → 2P (fast)
d) B + B → Q (slow)
   Q + 2A → R + P (fast)
   R → P + B (fast)
e) A + A ⇌ Q (fast equilibrium)
   Q → P + R (slow)
   R + B → P (fast)
This discussion should be added to the ‘Question’ tab of the electronic notebook for submission.

EXPERIMENT L2: STEADY-STATE ENZYME KINETICS

Aim
The aim of this experiment is to investigate the concept of steady-state kinetics using computational methods.

Introduction
In Experiment L1 you considered the kinetics of fairly simple reactions.
When the reaction mechanism becomes more complex, sometimes with several transient and difficult-to-measure intermediates, the time dependence of the concentration of any species (including the products that you may be interested in) can be complex. One objective of kinetic analysis is, of course, to allow you to predict and optimise the concentration of the species of interest under a variety of experimental conditions (including temperature, initial concentrations, etc.). Unfortunately, as the mechanism becomes more complex, so does the mathematical treatment required. In fact, for many complex mechanisms the kinetic equations cannot be solved analytically.

Fortunately, there are several situations where the complexity is reduced, leading to a better understanding of the important features of the reaction, and allowing prediction of optimal conditions. You should have come across at least one of these situations in First Year Chemistry: the “Rate Determining Step”. Other situations include the “Steady State Approximation” and the “Pre-equilibrium Condition” (which you came across at the end of Module 1).

In this module, you will carry out computer simulations to study a two-step reaction mechanism. Via the simulations you will further explore kinetic concepts such as the steady state approximation and the rate-determining step, and investigate under what conditions these approximations are valid.

Reactions Going to Completion
In Module L1, you considered reactions where the reverse reaction step was insignificant. The following discussion provides a brief summary. A reaction that goes to completion can be represented as:
$$S \xrightarrow{k_1} P$$
where S represents the Starting compound and P the Product(s). In this case the reverse reaction is considered to be extremely slow in comparison to the forward reaction, so that the reverse reaction can be neglected.
The rate of loss of S and gain of P can be expressed as:
$$-\frac{d[S]}{dt} = k_1[S] \qquad \frac{d[P]}{dt} = k_1[S] \tag{1a/b}$$
If the initial concentration of S is $[S]_0$ and the initial concentration of P is zero, then the time dependence of [S] and [P] is obtained by integrating (1a) and considering that $[P]_t = [S]_0 - [S]_t$ to obtain:
$$[S]_t = [S]_0 e^{-k_1 t} \qquad [P]_t = [S]_0\left(1 - e^{-k_1 t}\right) \tag{2a/b}$$

Reversible Reactions Approaching Equilibrium
In the previous section, only reactions where the reverse reaction step was insignificant were considered. As you might have realised, reversible reactions are more common. Let us now consider the reversible first-order equilibrium reaction
$$S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} P \qquad K_c = \frac{[P]}{[S]}$$
where both the forward and reverse reactions are important, and $K_c$ is the equilibrium constant for the reaction. The rate of change of S has two contributions: depletion through the forward reaction and replenishment through the reverse reaction. The net rate of change in S is therefore:
$$\frac{d[S]}{dt} = -k_1[S] + k_{-1}[P] \tag{3}$$
where $k_1$ and $k_{-1}$ are the rate constants for the forward and reverse reactions, respectively. If the initial concentration of S is equal to $[S]_0$ and there are no products present at the start of the reaction, then at all times $[P] = [S]_0 - [S]$ (providing of course that the volume remains constant). Consequently, (3) can be written as:
$$\frac{d[S]}{dt} = -k_1[S] + k_{-1}\{[S]_0 - [S]\} = -(k_1 + k_{-1})[S] + k_{-1}[S]_0 \tag{4}$$
The solution of this first-order differential equation is
$$[S] = [S]_0\,\frac{k_{-1} + k_1 e^{-(k_1 + k_{-1})t}}{k_1 + k_{-1}} \tag{5}$$
At equilibrium, the rate of change in the concentration of a species is zero. In other words, from (3),
$$\frac{d[S]}{dt} = 0 \tag{6}$$
and therefore,
$$k_1[S] = k_{-1}[P] \tag{7}$$
That is, at equilibrium, the rates of the forward and reverse reactions are equal. From (7) and the definition of the equilibrium constant, it is easily shown that $K_c$ is related to the rate constants by a simple expression:
$$K_c = \frac{k_1}{k_{-1}} \tag{8}$$
When an overall reaction is the sum of a sequence of reversible reactions, the overall equilibrium constant is simply the product of the equilibrium constants for each component step:
$$K_c = \frac{k_1 k_2 k_3 \ldots}{k_{-1} k_{-2} k_{-3} \ldots} \tag{9}$$

Consecutive Reactions
Many reactions proceed through the formation of intermediates. Consider the general consecutive first-order reaction
$$S \xrightarrow{k_1} C \xrightarrow{k_2} P$$
The concentrations of substances S, C and P change at rates according to:
$$\frac{d[S]}{dt} = -k_1[S] \tag{10}$$
$$\frac{d[C]}{dt} = k_1[S] - k_2[C] \tag{11}$$
$$\frac{d[P]}{dt} = k_2[C] \tag{12}$$
If $[S]_0$ is the initial concentration of S, then solution of the coupled series of differential equations (10-12) yields:
$$[S] = [S]_0 e^{-k_1 t} \tag{13}$$
$$[C] = k_1[S]_0\,\frac{e^{-k_1 t} - e^{-k_2 t}}{k_2 - k_1} \tag{14}$$
$$[P] = [S]_0\left\{1 + \frac{k_1 e^{-k_2 t} - k_2 e^{-k_1 t}}{k_2 - k_1}\right\} \tag{15}$$
(You can see how quickly the kinetic equations become complex for even a minor complication in the reaction mechanism.)

The Rate-Determining Step
When either $k_1 \gg k_2$ or $k_2 \gg k_1$, (15) can be approximated by a much simpler form:
$$k_1 \gg k_2: \quad [P] = [S]_0\left(1 - e^{-k_2 t}\right) \tag{16a}$$
$$k_2 \gg k_1: \quad [P] = [S]_0\left(1 - e^{-k_1 t}\right) \tag{16b}$$
Note that these equations have the same form as (2b). That is, the overall kinetics of P production resemble a simple one-step mechanism with a rate constant equal to the smaller of $k_1$ and $k_2$ (i.e. the rate constant of the rate-determining step). Normally one of the rate constants should be at least an order of magnitude (i.e., a factor of 10) greater than the other for the slower step to be considered the sole rate-determining step and to justify the approximations given by (16a) or (16b). Under these conditions the true amount of product formed, as given by the more exact expression (15), is within approximately 10% of that estimated by equations (16a) or (16b).

The Steady-State Approximation
The full kinetic equations of multi-step reactions can be very complex. Often, however, approximations can be made to simplify the mathematics, allowing the important parameters in the rate of product formation to be identified. One such simplification has already been discussed in the recognition of the rate-determining step. Another common simplifying assumption is the steady-state approximation (SSA).
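Before the SSA is developed further, the rate-determining-step approximation described above can be checked numerically. The following is a minimal Python sketch, not part of the set exercise, that evaluates the exact expression (15) and the one-step approximation (16a)/(16b) on a grid of times and reports the largest gap between them. The function names and the values $k_1 = 10$, $k_2 = 1$, $[S]_0 = 1$ are illustrative assumptions chosen so that one rate constant is ten times the other.

```python
import math

def product_exact(t, s0, k1, k2):
    """[P] from equation (15) for S -> C -> P (requires k1 != k2)."""
    return s0 * (1 + (k1 * math.exp(-k2 * t) - k2 * math.exp(-k1 * t)) / (k2 - k1))

def product_rds(t, s0, k1, k2):
    """[P] from (16a)/(16b): one-step kinetics using the smaller rate constant."""
    return s0 * (1 - math.exp(-min(k1, k2) * t))

# Illustrative values (assumptions, not taken from the manual): k1 is 10 x k2
s0, k1, k2 = 1.0, 10.0, 1.0
times = [i * 0.01 for i in range(1001)]  # 0 to 10 time units in steps of 0.01

# Absolute gap between the approximation and the exact result at each time
gaps = [abs(product_rds(t, s0, k1, k2) - product_exact(t, s0, k1, k2)) for t in times]
max_gap = max(gaps)
print(f"largest |approx - exact| = {max_gap:.4f} (a fraction of [S]0, since [S]0 = 1)")
```

The gap here is measured relative to $[S]_0$. Relative to [P] itself the discrepancy is larger at very early times, because the exact curve starts with an induction period while the one-step approximation does not. Changing the ratio of `k1` to `k2` shows how the agreement depends on the separation of the two rate constants.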
The steady-state approximation concerns the concentration of an intermediate species, where it is assumed that “for the major part of the duration of the reaction the concentrations of all reactive intermediates are constant”. This assumption is used to simplify the equation for the kinetics of product formation by excluding the intermediate concentrations from the final expression. Mathematically, the SSA can be written as:
$$\frac{d[C]}{dt} \approx 0 \tag{17}$$

Pre-Equilibria
One application of the SSA is to examine consecutive reactions where the intermediates are in equilibrium with the starting reactants. Such a reaction can be written as follows:
$$S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C \xrightarrow{k_2} P$$
Note that S could represent more than a single reactant and P more than a single product molecule. The rate of formation of P is given by:
$$\frac{d[P]}{dt} = k_2[C] \tag{18}$$
and the rate of change of [S] is given by:
$$\frac{d[S]}{dt} = -k_1[S] + k_{-1}[C] \tag{19}$$
Often the concentration of the intermediate, [C], is difficult to measure, and so (18) and (19) are not very useful. An alternate expression for d[P]/dt can be obtained using the SSA (17) as follows:
$$\frac{d[C]}{dt} = k_1[S] - k_{-1}[C] - k_2[C] \tag{20}$$
Applying the SSA means that (20) can be set to zero. Rearranging to make [C] the subject then gives:
$$[C] \approx \frac{k_1[S]}{k_{-1} + k_2} \tag{21}$$
(21) can now be substituted into (18) to yield the expression for d[P]/dt:
$$\frac{d[P]}{dt} = \frac{k_1 k_2[S]}{k_{-1} + k_2} = k'[S] \tag{22}$$
where
$$k' = \frac{k_1 k_2}{k_{-1} + k_2} \tag{23}$$
This is in exactly the same form as (1b), with the same solution, i.e.
$$[P] = [S]_0\left(1 - e^{-k' t}\right) \tag{2b$'$}$$
[This is of course why we look for these approximations: so that we can simplify the maths and provide a simpler physical explanation of the reaction. Remember that (21) will only be valid when the SSA conditions are met (i.e. when d[C]/dt = 0).]

Experimental Procedure

1. Berkeley Madonna
To check the validity of the SSA you need to calculate the time dependence of the concentrations of S, C and P to determine under what conditions the concentration of C is constant.
To do this you must calculate the concentrations of all of the species without making the steady state approximation. That requires a numerical integration of the set of simultaneous differential equations (18), (19) and (20). This can be done via the program Berkeley Madonna, which was developed by Robert Macey and George Oster at the University of California Berkeley. (We don’t know the origin of the “Madonna”, but the first version of the program was developed in the 1990s or perhaps even earlier, so one or both of the inventors could have been fans of the singer.)

To download the latest version of Berkeley Madonna (version 10) onto your computer, go to the website https://berkeley-madonna.myshopify.com/pages/download and download the MacOS or Windows version, whichever is appropriate for your computer. You will then automatically have the demo or trial version of the software. To obtain the full version of the software, you would have to register with Berkeley Madonna and pay for a license. However, for the purposes of this exercise, the demo version is sufficient.

The Berkeley Madonna program incorporates different numerical integration techniques which have been worked out by mathematicians to integrate a series of coupled differential equations. These include the following algorithms:
1. Euler’s method (Euler)
2. Runge-Kutta 2 (RK2)
3. Runge-Kutta 4 (RK4)
4. Runge-Kutta 5 (Auto)
5. Rosenbrock (Stiff)
Methods 1-3 utilise a fixed step size, i.e., they solve the set of coupled differential equations describing the mechanism at fixed time intervals. The Auto and Stiff methods (4 and 5) use a variable step size which is automatically adjusted, so that a large time interval is used when the concentrations of the various species change slowly with time and a small time interval is used when the concentrations are changing rapidly.
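The fixed-step methods in the list above can be illustrated outside Berkeley Madonna. The following minimal Python sketch uses the textbook Euler and classical fourth-order Runge-Kutta formulas; it is not Berkeley Madonna's own implementation. The test problem (first-order decay, dA/dt = -kA), the function names and the values of `k`, `a0`, `t_end` and `dt` are all illustrative assumptions. The exact solution of this problem is known, so the error of each method at the same step size can be compared directly.

```python
import math

def euler_step(f, t, y, dt):
    """One explicit Euler step for dy/dt = f(t, y)."""
    return y + dt * f(t, y)

def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, y0, t_end, dt):
    """Advance from t = 0 to t_end using a fixed step size dt."""
    n_steps = round(t_end / dt)
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y

# Hypothetical test problem: first-order decay dA/dt = -k*A
k = 1.0       # rate constant (illustrative value)
a0 = 100.0    # initial concentration (illustrative value)
t_end = 2.0
dt = 0.1      # deliberately coarse so the difference between methods is visible

def decay(t, a):
    return -k * a

exact = a0 * math.exp(-k * t_end)
euler_error = abs(integrate(euler_step, decay, a0, t_end, dt) - exact)
rk4_error = abs(integrate(rk4_step, decay, a0, t_end, dt) - exact)
print(f"exact value = {exact:.6f}")
print(f"Euler error = {euler_error:.6f}")
print(f"RK4 error   = {rk4_error:.6f}")
```

Reducing `dt` shrinks both errors, but the higher-order method reaches a given accuracy with far fewer steps, which is one reason to prefer RK4 over Euler at a fixed step size.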
Stiff sets of coupled differential equations are defined as those in which some of the variables (i.e., concentrations in this case) are changing rapidly with time and others are changing slowly. The Rosenbrock method was designed specifically to deal with such systems. (Note: It is not the sole purpose of Berkeley Madonna to solve differential equations. You can use it to simulate any equation, y = f(x). You just need to replace time in the program with x. You may find this very useful in the future.)

Before writing a program in Berkeley Madonna to simulate the reaction given above, for the purposes of understanding the syntax of the program language let’s consider a simpler system:
A ⇌ B (forward rate constant ka, reverse rate constant kb)
For this simple mechanism a typical Berkeley Madonna program is given below to determine the concentrations of A and B as a function of time using the Runge-Kutta 4 method.

To test that Berkeley Madonna is working on your computer, open the program by double-clicking on the Berkeley Madonna shortcut icon on your computer desktop. Select File from the top ribbon bar and New Document from the dropdown menu. Type the program given below into Berkeley Madonna’s equation window on the left-hand side of the screen. Alternatively, you can copy the equations into a Notepad text document (.txt file) and use Paste from the dropdown menu of the Edit option on Berkeley Madonna’s ribbon bar to enter the program. The new version of Berkeley Madonna does not allow you to copy and paste directly from a Word document if you are using the Windows operating system.

Once you have entered the program, click on Run in the run window on the right-hand side of the screen. If you have entered everything correctly, a graph of the concentrations of A and B should appear in the centre of the screen. Clicking on Table on the ribbon bar of the graph window will give you a table of the data points.
If you click and drag across the table and then select Copy from the dropdown menu of Edit on the main ribbon you can then paste the data into another program, e.g. Excel, for formatting and exporting to your lab report.

Program
METHOD RK4
STARTTIME = 0
STOPTIME = 10
DT = 0.02
d/dt (A) = -ka*A + kb*B
d/dt (B) = -kb*B + ka*A
init A = 100
init B = 0
LIMIT A >= 0
LIMIT A <= 100
LIMIT B >= 0
LIMIT B <= 100
ka = 1
kb = 1

Many of the program lines are almost self-explanatory. The first line (METHOD RK4) defines the method to be used to solve the differential rate equations. The commands STARTTIME and STOPTIME simply define when the calculation should start and stop, i.e., in this case starting at time = 0 and stopping after 10 seconds. DT defines the time interval for integration of the differential rate equations, i.e., in this case every 0.02 seconds. The next two lines are the differential rate equations for the species A and B. It is important to write an equation for every single species involved in the reaction mechanism.

The command init defines the initial concentrations of each species (in whichever concentration units you desire). However, for second order reactions it is important that the units of the concentrations, rate constants and time are consistent. For example, if the concentration is entered in M and the time is in seconds, a second order rate constant must be in units of M⁻¹ s⁻¹, not M⁻¹ min⁻¹ or mM⁻¹ s⁻¹.

The program lines containing the command LIMIT define the upper and lower limits of the concentrations of each of the species. These are based on the mechanism and the law of conservation of mass, i.e., for this particular mechanism the total concentration of A and B cannot exceed 100, because the initial concentration of A is only 100. It should also be clear that a negative concentration makes no sense.
The LIMIT lines are not absolutely necessary, but for complicated mechanisms they can be useful in preventing the numerical method from trying values of the concentrations which may represent a legitimate mathematical solution to the equations but make no physical sense. The final two lines merely specify the values of the rate constants for the mechanism.

Based on this simple example, now try to write your own Berkeley Madonna program for the two-step mechanism at the beginning of this section and its associated differential rate equations (18)–(20). Use a total simulation time of 1 second and a time interval of 0.002 seconds, with values of $k_1$ = 10 s⁻¹, $k_2$ = 0.1 s⁻¹ and $[S]_0$ = 1 M, which have been chosen arbitrarily just so that everyone uses the same values for the calculation. The initial value of $k_{-1}$ that you use is not important, but later we want to consider the three situations:
$k_{-1} \gg k_2$
$k_{-1} \approx k_2$
$k_{-1} \ll k_2$

Once you have typed in your program you can run it by clicking on the Run button. If your program runs successfully, a graph should appear showing the time course of the concentration of each of the species S, C and P. If you wish to change the appearance of the lines, there is an option you can click on the toolbar of the Graph Window.

You need to show one of the demonstrators a copy of your code, and show that you have successfully run this software, by sharing your screen within the Zoom breakout room; this will be marked off as part of the assessment.

Vary the value of $k_{-1}$ within your program to consider the three conditions:
$k_{-1} \gg k_2$
$k_{-1} \approx k_2$
$k_{-1} \ll k_2$
From the time course of [C], determine under which condition the steady state approximation is obeyed. Why does this condition yield the best agreement?

For each condition, produce a plot of the concentrations of S, C and P versus time to include in your presentation.
To do this you need to click on the option Table on the top ribbon bar of the graph window, so that you can see the concentration values of S, C and P and the corresponding times. For the plotting you can use Excel or any freely available plotting program, such as SciDAVis.

Apart from looking at the time course of [C], another test of the steady-state approximation is to choose a particular time point in your simulations and calculate [P] from (23) and (2b′), i.e., the predicted value of [P] based on the steady-state approximation. Then compare this to the value of [P] calculated in each of your simulations. Decide which simulation yields the best agreement, and explain why.

2. Michaelis-Menten Kinetics
A special case of the SSA is very frequently applied in the chemistry of living systems. Most enzyme-catalysed reactions rely on a pre-equilibrium step; the first step is the reversible reaction between an enzyme (E) and substrate (S) to form an activated complex (C), which then reacts irreversibly to give the product (P).
$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C \xrightarrow{k_2} E + P$$
The change in concentration of the intermediate complex with time can be expressed as:
$$\frac{d[C]}{dt} = k_1[E][S] - k_{-1}[C] - k_2[C] \tag{24}$$
Using the same steady-state approximation as previously (i.e., d[C]/dt = 0), this scheme leads to an equation practically identical to (21), hence:
$$[C] = \frac{k_1[E][S]}{k_{-1} + k_2} \tag{25}$$
and
$$\frac{d[P]}{dt} = k_2[C] = \frac{k_1 k_2[E][S]}{k_{-1} + k_2} \tag{26}$$
where [E] is the concentration of free enzyme. Unfortunately, [E] is almost always experimentally inaccessible; concentrations of enzyme are low, and it is impossible to determine them accurately in vivo. Biochemists customarily use a kinetic equation expressing d[P]/dt in terms of the total enzyme concentration.
To do this, we make the substitution $[E] = [E_{total}] - [C]$ in equation (24):
$$\frac{d[C]}{dt} = k_1[E_{total}][S] - k_1[C][S] - k_{-1}[C] - k_2[C] \tag{27}$$
Again applying the steady-state approximation, d[C]/dt = 0:
$$[C] = \frac{k_1[E_{total}][S]}{k_{-1} + k_2 + k_1[S]} \tag{28}$$
and
$$\frac{d[P]}{dt} = k_2[C] = \frac{k_1 k_2[E_{total}][S]}{k_{-1} + k_2 + k_1[S]} \tag{29}$$
Dividing the top and bottom of expression (29) by $k_1$ gives the expression:
$$\frac{d[P]}{dt} = \frac{k_2[E_{total}][S]}{K_M + [S]} \tag{30}$$
where
$$K_M = \frac{k_{-1} + k_2}{k_1}$$
is known as the Michaelis constant. Clearly, the maximum value of d[P]/dt will be obtained where $[C] = [E_{total}]$ (i.e. the binding sites of the enzyme are completely saturated by the substrate), and will be simply $k_2[E_{total}]$. Our expression then simplifies to the form familiar to biochemists:
$$\frac{d[P]}{dt} = \frac{V_{max}[S]}{K_M + [S]} \tag{31}$$
where $V_{max}$ is the experimental maximum rate of the reaction. $K_M$ can be determined experimentally by finding the concentration of substrate that gives a rate of reaction equal to half the maximum:
$$\frac{d[P]}{dt} = \frac{V_{max}}{2} = \frac{V_{max}[S]_{1/2}}{K_M + [S]_{1/2}} \tag{32}$$
Therefore, $K_M = [S]_{1/2}$.

$K_M$ for some enzymes is near the physiological concentration of their substrate. Can you suggest an advantage of this?

EXPERIMENT L3: FURTHER INVESTIGATIONS OF COMPUTING IN CHEMICAL BIOLOGY

This experiment is in two parts. The first involves using bioinformatic software to analyse proteins. The second part of the experiment involves “student-led enquiry”, where you will come up with a research question which can be answered by performing a literature search. A number of suggested avenues are listed in Part 2, but you are not limited to these.

Before beginning Part 2, you should follow the investigative experiment checklist. This involves checking with a demonstrator that your proposed research question is feasible. You should then prepare a HIRAC for approval for this literature analysis. Only the front page of the HIRAC form needs to be completed for literature-based investigations. An academic member of staff, not a demonstrator, must sign your HIRAC for Part 2.
L3 PART 1: BIOINFORMATICS

Aim
The aim of this experiment is to gain experience with a bioinformatic software package, MEGA X, and to construct a phylogenetic tree showing the evolution of a selected protein.

Introduction
Proteins can be considered as the workhorses of biology. Their building blocks are the 20 naturally occurring amino acids, which can be arranged in an infinite number of different sequences of varying lengths with a vast range of different three-dimensional structures. Thus, protein molecules are specifically engineered in living systems to carry out a wide range of different functions, including catalysis (i.e., enzymes), structural support (e.g., collagen and keratin), carriers of other molecules or ions (e.g., haemoglobin and transferrin), energy conversion (e.g., ATP synthase and the Na+-pump) and transport across membranes (e.g., Na+ and K+ channels).