MATLAB-SESM6038 AY2021

SESM6038 AY2021-22 Coursework 1 Detailed description v1.0 Page 1 of 9 pages

Background

There are over 10,000 species of birds, and the wide range of habitats occupied by these bipedal animals provides a unique opportunity to systematically explore skeletal adaptations to their environment (Prum et al. 2015; Braun and Kimball 2021). Variations in skeletal anatomy are often described using landmark-based analyses (Orkney et al. 2021), whereby a user identifies characteristic features by locating landmarks on the bone surface. Although the precision with which individual landmarks can be identified may remain limited even for experienced users, several approaches to reduce such user influence are known. These approaches include repeated determination of landmarks, the processing of groups of landmarks by fitting geometric primitives (lines, planes, spheres etc.), and the direct algorithmic analysis of the surfaces on which the landmarks are defined.

Figure 1 Left: Phylogenetic tree of birds (Braun and Kimball 2021); Right: Scaled surfaces of over 60 bird femora

In this coursework, you will employ landmark-based analysis of the morphology of the thigh bone (femur) in a sample of 44 bird species to identify skeletal features that help to robustly establish similarities and differences between the many species of birds, habitats, and behaviours.

Research question

The key research question to be addressed by this work is how size and shape variations in avian femora are linked to phylogeny, ecology, and behaviour. Of particular interest is to understand which morphological features are consistent or in conflict with groupings of birds according to phylogeny (e.g. order, family).

Key tasks

• definition of landmarks describing essential characteristics of femur morphology (repeatedly)
• adaptation of MATLAB code to derive a set of robust features from the landmarks and surfaces
• development of MATLAB code for the statistical analysis of the data and provision of a document (“ReadMe”) to enable expert users to reproduce the results from the raw data
• preparing a concise, structured report to document the methods, results, and critical interpretation of the findings

Addressing the coursework

To address the tasks related to this coursework you will use a range of software tools for landmark definition (MorphoDig), data visualisation (ParaView), and data analysis more generally (MATLAB). Please make sure to consult and use all material provided on the Blackboard site for lectures, labs, and background reading, in addition to the specific resources provided for the coursework. There is a dedicated Blackboard discussion forum for the coursework. Please check the posts for answers to questions that might already have been addressed, and use the forum to ask new questions.

Essential resources for the coursework – data, and code

Essential coursework-specific resources that you will need include the data and essential MATLAB functions and scripts, which are available from the assignment section on Blackboard.

Data

Computer models of the left femur (cf. Figure 1, right) of 44 bird species are available for analysis in the form of triangulated surfaces (PLY format), including 4 animals from the clade of Palaeognathae and 40 from within the clade of Neognathae. From within the former clade, there is data on one species each of the orders of Dinornithiformes, Apterygiformes, and Struthioniformes.
From within the clade of Neognathae, there are 5 femora for the Galloanserines (3 from the order of Galliformes and 2 from the order of Anseriformes), 5 Gruiformes, 5 Procellariiformes, 5 Strigiformes, 5 Piciformes, 5 Psittaciformes, and 10 Passeriformes. The names of the femur surfaces encode the species of the birds as documented in the Excel file “CW1_avian_femora.xlsx”, which further provides essential taxonomic/phylogenetic details on each species. Importantly, the Excel file provides the order and superorder (clade) of each animal for use in the analyses.

Code

To support the identification of features from landmarks and surfaces, we provide numerous MATLAB functions (collected in the directory Matlab_Functions) as well as one MATLAB script, named “analyse_avian_femur_morphology_CW1.m”. The script reads all surfaces and landmarks from a specified directory, performs the analysis to derive key features of the morphology of the proximal femur and diaphysis (already implemented), enables a more detailed analysis of the distal femur morphology, and saves the essential results to an Excel file for further analysis.

To use the code you will have to ensure that the functions in the directory named Matlab_Functions are in the MATLAB search path, and adjust the base_dir variable in the “analyse_avian_femur_morphology_CW1.m” script to reflect the directory structure on the machine that you use.

Amending the MATLAB code

In addition to small changes to the “analyse_avian_femur_morphology_CW1.m” script, you will have to amend the function “extract_LMbased_avian_femur_features.m” to derive a set of features for the distal femur that you find most suitable/interesting (remember that the function is located in the Matlab_Functions directory). You will also develop a new script for the statistical analysis of the results, “avian_femur_morphology_statistics.m”.
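
The path setup described above might look like the following minimal sketch. The folder assigned to base_dir is a hypothetical placeholder, and the assumption that Matlab_Functions sits directly inside base_dir is ours; adjust both to the layout on your machine.

```matlab
% Hypothetical location of the coursework material; adjust to your machine.
base_dir = fullfile(pwd, 'CW1');

% Assumption: the provided functions live in a sub-directory of base_dir.
fn_dir = fullfile(base_dir, 'Matlab_Functions');

if isfolder(fn_dir)
    addpath(fn_dir);   % make the provided functions visible to MATLAB
else
    warning('Directory not found: %s', fn_dir);
end

% Check whether the feature extraction function is now on the search path
% (exist returns 2 for a file found on the path, 0 if nothing is found).
on_path = exist('extract_LMbased_avian_femur_features', 'file') == 2;
fprintf('Feature extraction function found: %d\n', on_path);
```

Using fullfile rather than hard-coded separators keeps the same lines working on Windows, macOS, and Linux.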

Top tips:

• if you want to run and develop the code on your own computer, make sure all MATLAB toolboxes required here are installed ( )
• if you find that the speed of execution is slow, e.g. on a University computer, this might have to do with where the data that is read/stored is kept; using a networked drive often slows execution down considerably, and storing data locally on the machine will typically improve execution speed
• you can also use MATLAB Online; see the post on the coursework discussion forum

Addressing the key tasks

To complete the coursework you need to address the 4 essential tasks already mentioned:

TASK 1: definition of landmarks describing essential characteristics of femur morphology (repeatedly)
TASK 2: adaptation of MATLAB code to derive a set of robust features from the landmarks and surfaces
TASK 3: development of MATLAB code for the statistical analysis of the data and provision of a document (“ReadMe”) to enable expert users to reproduce the results from the raw data
TASK 4: preparing a concise, structured report to document the methods, results, and critical interpretation of the findings

In the following, we provide more specific details on how best to proceed with these tasks.

TASK 1 – Repeated landmark definition

Using the software MorphoDig, you are required to define landmarks on each of the femur surfaces that help to identify and characterise features of the femoral head, the trochlea of the patellofemoral joint (both the lateral and medial ridge), as well as the lateral and medial condyles (Figure 2).

Figure 2 Left: 3 landmarks (blue) identify the femoral head (view from antero-medial); Centre: 9 landmarks each define the lateral (red) and medial (green) ridges of the trochlea (view from anterior); Right: 9 landmarks each define the ridge of the lateral condyle (orange) and the approximately planar area of the medial condyle (green) (view from posterior)

To obtain full marks, you need to define the landmarks twice, and report the time between the two attempts as well as the order in which femora were processed and landmarks defined.

TASK 2 – Adaptation of MATLAB code, feature extraction

We have already highlighted the main coding-related tasks, which include making small changes to the “analyse_avian_femur_morphology_CW1.m” script and amending the function “extract_LMbased_avian_femur_features.m” to derive a set of features for the distal femur that you find most suitable/interesting (remember that the function is located in the Matlab_Functions directory). You will also develop a new script for the statistical analysis of the results, “avian_femur_morphology_statistics.m”.

In terms of the features to describe the femur morphology, you are required to extract and report at least the following 9 features:

• femur length (no further work required)
• femoral head radius (no further work required)
• the ratio of the moments of inertia Ixx/Iyy for the central cross-section (no further work required)
• 2 features related to the trochlea
• 2 features related to the lateral condyle
• 2 features related to the medial condyle

The first 3 features are already completely defined, and no further decision/coding is required from you other than to save/export and report on them. You do, however, have to decide on and implement appropriate code to extract the 6 further features related to the trochlea of the patellofemoral joint and the lateral and medial condyles.
The function “extract_LMbased_avian_femur_features.m” provides examples and a very good basis you can build on to do so.

Top Tips:

• for the development of the process and code, you can run the code on a subset of the data, as long as there is a landmark file for each bone surface file included in the respective directory. For example, you may only have 10 femur surfaces in a TestData directory. As long as you have defined landmarks for each surface, the script “analyse_avian_femur_morphology_CW1.m” should work fine.
• there might be sections of the code, and data currently written to an Excel file, that are useful for understanding the data, aid the development of the analysis process, and help in preparing the report, but which might not be required for the final statistical analysis. You can therefore adapt the code to the needs of the different stages of development:
  o you might be interested in saving details about the bone surfaces (number of vertices, number of faces) and creating summary statistics to help describe the sample for your report, but you only really need to extract such details once.
  o similarly, there is code that creates spheres located at the landmark position for all landmarks and saves these as .ply files to the result directory. Visualising these is a critical step to ensure that there are no issues with the landmark definition, and we strongly encourage you to do so (load and check the bone surface and the respective representation of landmarks with spheres in ParaView). However, once you have confirmed that all is good, you could turn off the writing of the landmarks to file every time you run the script “analyse_avian_femur_morphology_CW1.m”.
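
As one illustration of how a feature could be derived from a group of landmarks, the sketch below fits a least-squares plane to landmark coordinates, such as the 9 landmarks on the approximately planar area of the medial condyle, and reports the RMS distance of the landmarks from that plane. The variable names (LM, n, rms_dist) and the synthetic coordinates are hypothetical. The provided function “extract_LMbased_avian_femur_features.m” may organise landmarks differently, and this is not claimed to be one of its existing features.

```matlab
% Synthetic example: 9 landmarks (rows) with x,y,z coordinates (columns),
% lying close to the plane z = 0. Replace with real landmark coordinates.
[gx, gy] = meshgrid(-1:1, -1:1);
LM = [gx(:), gy(:), 0.01 * [1 -1 1 -1 0 1 -1 1 -1]'];

c = mean(LM, 1);               % centroid of the landmarks
[~, ~, V] = svd(LM - c, 0);    % principal directions of the centred points
n = V(:, 3);                   % plane normal = direction of least variance

d        = (LM - c) * n;       % signed distance of each landmark to the plane
rms_dist = sqrt(mean(d.^2));   % planarity measure (same units as the landmarks)

fprintf('normal = [%.3f %.3f %.3f], RMS distance = %.4f\n', n, rms_dist);
```

The sign of the normal returned by the SVD is arbitrary, so any feature built on it (e.g. an angle to the femoral long axis) would need a consistent orientation convention.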

TASK 3 – Development of MATLAB code for statistical analyses

Comparison of features between repeat definitions of landmarks

To establish how reliable your features are, you can explore several tests, including a paired t-test (or its non-parametric equivalent, the Wilcoxon signed-rank test, if the assumptions for the paired t-test are not met) and a Bland-Altman plot (Bland and Altman 1986) providing estimates for the bias and limits of agreement.

Paired t-test

The paired t-test is a method used to test whether the mean difference between pairs of measurements (v1,v2) is zero or not. A key prerequisite for using the test is that the differences between the 1st and 2nd measurements are normally distributed; to formally test whether that requirement holds one may use a Lilliefors test. These tests are readily available in MATLAB.

Lilliefors test in MATLAB

    [h,p,kstat,critval] = lillietest(v2-v1)

Paired t-test in MATLAB

    [h,p,ci,stats] = ttest(v1,v2)

Bland-Altman plot

A Bland-Altman plot (Bland and Altman 1986) provides estimates for the bias and limits of agreement of repeat measurements. Functions to create such plots are available from MATLAB Central; we provide the code from the implementation by H.J. Wisselink (2021).
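
The choice between the two tests described above, together with a hand calculation of the bias and limits of agreement, can be sketched for a single feature as follows. The data values are invented for illustration, and the 5% significance level is simply the MATLAB default. The limits of agreement are computed here as bias ± 1.96 SD of the differences, which is an assumption on our part and may differ in detail from what the provided BlandAltmanPlot function returns. The sketch requires the Statistics and Machine Learning Toolbox.

```matlab
% Invented repeat measurements of one feature (e.g. in mm) for 8 femora.
v1 = [10.0; 12.0; 14.0; 16.0; 18.0; 20.0; 22.0; 24.0];
v2 = v1 + [0.1; -0.1; 0.2; -0.2; 0.3; -0.3; 0.0; 0.0];
d  = v2 - v1;                          % paired differences

% Lilliefors test: h_norm = 1 rejects normality of the differences.
h_norm = lillietest(d);

if h_norm == 0
    [h, p] = ttest(v1, v2);            % paired t-test
    test_used = 'paired t-test';
else
    [p, h] = signrank(v1, v2);         % Wilcoxon signed-rank test
    test_used = 'Wilcoxon signed-rank test';
end

% Bland-Altman quantities computed by hand.
bias = mean(d);
loa  = bias + [-1.96, 1.96] * std(d);  % lower and upper limits of agreement

fprintf('%s: h = %d, p = %.3f\n', test_used, h, p);
fprintf('bias = %.3f, LoA = [%.3f, %.3f]\n', bias, loa);
```

Note that ttest and signrank return their outputs in a different order (h first versus p first), which is easy to overlook when switching between them.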
Here is some pseudo-code to illustrate how you could calculate the bias and limits of agreement for each of n_features using the BlandAltmanPlot function:

    BA = cell(1, n_features);   % one result structure per feature
    for i = 1:n_features
        BA{i} = BlandAltmanPlot( v1(:,i), v2(:,i) );
    end

Numerical values for the essential results are then available in these fields:

    BA{i}.data.mu          % mean value of var2-var1 (bias)
    BA{i}.data.loa         % lower and upper limits of agreement
    BA{i}.data.CI.mu       % lower and upper bound of the CI of the mean
    BA{i}.data.loa_lower   % lower and upper bound of the CI of the lower LoA
    BA{i}.data.loa_upper   % lower and upper bound of the CI of the upper LoA

Please consult the function BlandAltmanPlot and our example script (“avian_femur_morphology_statistics_MH.m”) for more details on its usage!

Top Tip: Though reporting results on the precision of the measurement of the features is important, we would not expect you to include e.g. Bland-Altman plots for each feature just because we have shown you how to create such plots. However, including a concise table that summarises the data shown above (with units!) would be a very good idea!

Descriptive statistics of features and visual inspection using plots

Once you have established the precision with which the features are determined, you should compute the average of the repeated measurements and take this forward for further analysis.

NOTE: Pseudocode (MATLAB) for performing the analyses and plots described below is provided in “avian_femur_morphology_statistics_MH.m”!

For each feature, you should calculate key descriptive statistics (mean, SD, range (min to max)). To develop your understanding of the characteristics of the results and their variation across the entire sample, we suggest you also inspect histograms for each parameter.
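
The averaging of the repeated measurements and the descriptive statistics described above could be computed along these lines. F1 and F2 (femora in rows, features in columns), the feature names, and the numerical values are hypothetical placeholders for the feature matrices exported from the two landmark definitions.

```matlab
% Hypothetical feature matrices from the 1st and 2nd landmark definition:
% rows = femora, columns = features (here: length in mm, head radius in mm).
F1 = [50.0 2.0; 80.0 3.1; 120.0 4.9; 30.0 1.2];
F2 = [50.4 2.2; 79.6 2.9; 120.2 5.1; 29.8 1.2];
feature_names = {'femur_length', 'head_radius'};

F = (F1 + F2) / 2;          % average of the repeated measurements

% Descriptive statistics per feature (computed column-wise across femora).
summary = table(mean(F, 1)', std(F, 0, 1)', min(F, [], 1)', max(F, [], 1)', ...
    'VariableNames', {'Mean', 'SD', 'Min', 'Max'}, ...
    'RowNames', feature_names);
disp(summary);

% Histogram of one feature for visual inspection, e.g.:
% histogram(F(:, 1)); xlabel('femur length (mm)'); ylabel('count');
```

Collecting the statistics in a table keeps feature names and values together and can be written to file with writetable if needed.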
In addition to providing numerical values of the descriptive statistics calculated across all femora, in tabulated form, in your report, you should also use appropriate visualisations to understand similarities and differences in features between orders, e.g. box plots with “order” as the grouping variable. The decision on which box plots to include in the results section of the report should be driven by the aspects you want to highlight in the discussion and where reference to such data would critically inform the discussion. DO NOT include box plots for features for which you will not offer substantial discussion.

Remember to link the presentation of the results to the key research question, i.e. determining how size and shape variations in avian femora are linked to phylogeny, ecology, and behaviour. Of particular interest is to understand which morphological features are consistent or in conflict with groupings of birds according to phylogeny (e.g. order, family).

Advanced analysis of features using linear regression analysis, PCA, and Spectral Clustering

Though the summary of the data using the statistical methods described above will help you to gain a basic understanding of the distribution of the features and the similarities and differences between birds, understanding detailed relationships for 9 or more features describing femoral anatomy in 44 birds might be a challenge. You will therefore make use of several data reduction methods as described below.

Linear regression analysis

One may very well speculate that in birds, as in humans, the radius of a sphere fitted to the femoral head is significantly related to the length of the bone, in a linear relationship. If a strong linear relationship were also established in birds, considering both rather than just one of these “features” would add very little insight.
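
A minimal sketch of how the relationship between femur length and femoral head radius could be examined, using synthetic data and only base MATLAB functions (polyfit, polyval). The variable names and values are hypothetical; fitlm from the Statistics and Machine Learning Toolbox would be an alternative that also reports p-values and confidence intervals.

```matlab
% Synthetic data: femur length (mm) and femoral head radius (mm) for 6 femora.
femur_length = [30; 50; 80; 120; 200; 300];
head_radius  = 0.04 * femur_length + 0.5 + [0.1; -0.1; 0.05; -0.05; 0.1; -0.1];

% Least-squares fit of head_radius = slope * femur_length + intercept.
coeffs    = polyfit(femur_length, head_radius, 1);
slope     = coeffs(1);
intercept = coeffs(2);

% Coefficient of determination R^2 from residual and total sums of squares.
fitted = polyval(coeffs, femur_length);
SS_res = sum((head_radius - fitted).^2);
SS_tot = sum((head_radius - mean(head_radius)).^2);
R2     = 1 - SS_res / SS_tot;

fprintf('slope = %.4f, intercept = %.3f mm, R^2 = %.4f\n', slope, intercept, R2);

% Scatter plot with regression line for visual confirmation, e.g.:
% plot(femur_length, head_radius, 'o', femur_length, fitted, '-');
```
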
To test the nature of the relationship between femur length and femoral head radius, we require you to perform a linear regression analysis, describe the specifics of the methods, and include the results in your report.

Note: if you suspect that such a linear relationship may also be present between further features (to be confirmed first by an appropriate scatter/xy plot of the relevant data), you can perform a similar regression analysis for these additional parameters, too.

PCA analysis

While a linear regression analysis as described above considers the relationship between pairs of features, a Principal Component Analysis (PCA) offers a more effective representation of the data by determining new variables (principal components) that more effectively describe the variance in the data.

You are therefore required to perform a PCA on the features. However, to remove issues arising from comparing features of possibly very different magnitudes, you will perform the PCA not on the features directly but on standardised features, specifically so-called z-scores (mean value of 0, SD of 1.0).

After calculating z-scores from the feature matrix, you perform a PCA on the z-scores of the relevant features and evaluate and report on the following results:

• the cumulative variance explained by the PCs (use quantitative data and a scree plot)
• bi-plots of the first 3 PCs to describe the relationship between PCs and original features, and to identify potential clustering (visually)

By investigating the scores you will further explore similarities/differences in the morphology between the various orders of birds in a more comprehensive manner than would be possible by considering individual features in isolation.
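
The standardisation and PCA steps described above could be sketched as follows, assuming the Statistics and Machine Learning Toolbox (zscore, pca, biplot) is available. The feature matrix here is random synthetic data and the feature labels are hypothetical placeholders, so the resulting plots carry no biological meaning.

```matlab
rng(1);                               % reproducible synthetic data
X = randn(44, 9) .* (1:9) + 10;       % 44 femora x 9 features, varying scale
labels = compose('feature_%d', 1:9);  % hypothetical feature names

Z = zscore(X);                        % column-wise mean 0, SD 1

% PCA on the z-scores; 'explained' holds the % variance per component.
[coeff, score, ~, ~, explained] = pca(Z);
cum_explained = cumsum(explained);

% Scree plot with cumulative variance explained.
figure;
plot(1:numel(explained), explained, 'o-', ...
     1:numel(explained), cum_explained, 's--');
xlabel('principal component'); ylabel('variance explained (%)');
legend('individual', 'cumulative', 'Location', 'east');

% Bi-plot of the first 3 PCs: loadings of the features plus scaled scores.
figure;
biplot(coeff(:, 1:3), 'Scores', score(:, 1:3), 'VarLabels', labels);
```

To look for clustering by order, the columns of score can additionally be plotted with a grouping variable (e.g. with gscatter), which is left out here because the synthetic data has no groups.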

Spectral Clustering

The keenest students may want to read up on and explore the use of Spectral Clustering (again, applied to the z-scores of the relevant features) as a more formal method to explore similarities/differences in scores between the various orders of birds.

TASK 4 – Preparing a concise, structured report

In a 6-page report, written for an expert in the field (no click-by-click instructions!), you will provide a concise description of what you did, what you found, and what the results mean. We require you to include a brief formulation of aim/hypotheses, a concise yet precise description of material and methods (enabling an expert to reproduce the study), an imaginative presentation of key results, and an insightful discussion of key findings, limitations, and suggestions for future work.

How you will be assessed – the marking scheme

There are a total of 100 marks awarded for successful completion of the coursework. Up to 30 marks are awarded for the (repeated) landmark definition for all bones. Up to 15 marks relate to amending and documenting the MATLAB code (max 5 marks for amending the feature extraction function, max 5 marks for amending the 2 scripts to run the analyses and evaluate the results, max 5 marks for a ReadMe.txt file providing essential instructions for generating and evaluating the results). Up to 55 marks are awarded for the 6-page report.

Of the marks for the report, up to 15 marks are awarded for a detailed materials & methods section (concise yet comprehensive description of the material/data set; appropriate referencing of all software tools used as well as hardware and operating system details; imaginative description of all the methods using text, tables, and figures, enabling an expert to reproduce the results). Up to 25 marks are awarded for a comprehensive yet focussed presentation of the results in writing (description is required!), figures, and tables.
Up to 10 marks are awarded for an insightful and critical discussion of findings concerning the research question (discuss at least 4 distinct, relevant findings). A further 5 marks will be awarded for a suggestion of an essential future development (what, why, how; 2.5 marks max) and a concise conclusion that should provide relevant quantitative insight into the research question obtained from your analyses (max 2.5 marks).

Top Tips:

• Material/Data: Provide a concise but complete description of the data set that includes not only an overview of essential features of the sample (number of bones, phylogenetics) but also captures key characteristics of the surfaces (number of vertices, triangles).
• Methods: Be imaginative in the use of text, tables, and figures to provide a clear description of the process, with an expert as a reader in mind. For all software tools used, report their name and version (DO NOT add such details in some form of a reference but include them directly in the body of the report text). For commercial software, provide the name of the manufacturer/company, and the city and country of their headquarters (e.g.: Boston Scientific (Boston, USA)). For open-source software, include the reference(s) to the respective papers as requested/suggested by the developers.
• Results: Focus on the presentation of the results and DO NOT include any description of how results were obtained (methods!) or what the results mean (discussion!). Provide clear graphs (font size!) and tables (decimal places!), adding units, titles, and sufficiently detailed captions.
• Discussion: Start by reminding the reader what the key question to be answered is before critically discussing what you found (be specific, cite results), what these results mean, and how they help address the question.
• References: Preferred citation style is (Author, Year). Please ensure that the reference list includes full author details, title, and journal information (name, volume/issue, pages).

SUBMISSION

• one (1) report, as PDF
• one (1) .zip file with any amended or new code, your readme.txt file, the landmark data (1st and 2nd definition), and results – DO NOT ADD the SURFACE FILES we provided, or any MATLAB functions that you did not modify
• online submission is through eAssignment, by 03.03.2022 (week 05)

Literature

Bland, J. Martin, and Douglas G. Altman. 1986. ‘Statistical Methods for Assessing Agreement between Two Methods of Clinical Measurement’, The Lancet, 327: 307-10.
Braun, Edward L., and Rebecca T. Kimball. 2021. ‘Data Types and the Phylogeny of Neoaves’, Birds, 2: 1-22.
Orkney, Andrew, Alex Bjarnason, Brigit C. Tronrud, and Roger B. J. Benson. 2021. ‘Patterns of skeletal integration in birds reveal that adaptation of element shapes enables coordinated evolution between anatomical modules’, Nature Ecology & Evolution, 5: 1250-58.
Prum, Richard O., Jacob S. Berv, Alex Dornburg, Daniel J. Field, Jeffrey P. Townsend, Emily Moriarty Lemmon, and Alan R. Lemmon. 2015. ‘A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing’, Nature, 526: 569-73.