BIA B452F Assignment 1
Weighting: 30%
(Deadline: 30 March 2022, Wednesday)

Learning outcome:
Explain and select analytic techniques for business intelligence and big data analysis.
Apply data visualization tools and predictive analytics to summarize and analyze business data.

Important note:
There may not be a single correct answer to the questions. Your answers may differ from those of other students and still be equally valid.

This is an individual assignment. Copying some or all of another student’s assignment is plagiarism. Discussing your assignment with other students and seeking their comments and advice is acceptable, but two students must not hand in assignments that are substantially the same. When you collaborate on an individual assignment, the final product must be your own work.

Task
In this assignment, you will perform an exploratory analysis of the 2019 salary survey of IT specialists in Europe. The sample dataset “IT Salary Survey EU 2019.csv” consists of 991 observations of the following 21 features (source: https://www.kaggle.com/parulpandey/2020-it-salary-survey-for-eu-region?select=IT+Salary+Survey+EU+2019.csv):

1. age – Age
2. gender – Gender
3. city – City
4. seniority – Seniority level
5. position – Position (without seniority)
6. experience – Years of experience
7. technology – Your main technology / programming language
8. brutto_salary – Yearly brutto salary (without bonus and stocks)
9. bonus – Yearly bonus
10. stocks – Yearly stocks
11. brutto_salary_before – Yearly brutto salary (without bonus and stocks) one year ago (only answered by respondents who stayed in the same country)
12. bonus_before – Yearly bonus one year ago (only answered by respondents who stayed in the same country)
13. stocks_before – Yearly stocks one year ago (only answered by respondents who stayed in the same country)
14. vacation_days – Number of vacation days
15. home_office_days – Number of home office days per month
16. language_at_work – Main language at work
17. company_name – Company name
18. company_size – Company size
19. company_type – Company type
20. contract_duration – Contract duration
21. business_sector – Company business sector

(Note: Brutto salary is the gross salary, i.e. the salary before the deduction of tax and insurance(s). Salaries and bonuses are in euros.)

You must apply exploratory analysis to the salary packages and trends of different types of IT specialists in European regions. You must define your own research questions (or hypotheses) and use summary statistics and data visualization to answer them. For example, you may hypothesize that the salary package of IT specialists is independent of gender. To collect evidence for this hypothesis, you may compute the average salary by type of IT specialist and gender, present the results in a bar chart, and then draw your conclusion about the hypothesis.

You must pre-process the data and select appropriate visualization methods for the analysis. You may need to handle missing data, re-code variables, and aggregate the data. You may use any appropriate approach to handle missing data and may make reasonable assumptions where necessary, but you must justify your methods and assumptions. You must analyze the statistical and graphical output in detail and write up your interpretation.

The following two references are a good starting point for preparing this assignment:
1. “IT Salary Survey December 2020” at https://www.asdcode.de/2021/01/it-salary-survey-december-2020.html
2. “Assignment 1 Sample Analysis” on OLE (Note: The sample analysis only illustrates how to write up an analysis report that uses R to perform an exploratory analysis of credit card usage. The program and analysis are not directly applicable to the given problem. You are expected to provide a more in-depth discussion of the findings in your analysis.)
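The gender hypothesis given as an example in the task can be explored with a simple group-by aggregation. The following is a minimal sketch under stated assumptions, not a required or reference solution. It assumes the CSV header uses the feature names listed above (e.g. position, gender, brutto_salary), treats a blank or non-numeric salary as missing and drops that respondent (listwise deletion), and recodes gender with a hypothetical rule. The helper names (load_rows, salary_by_group, gender_gap) are illustrative only. It uses the Python standard library so the logic is easy to follow; the brief recommends R, where the same steps can be reproduced.

```python
import csv
from collections import defaultdict
from statistics import mean, median


def load_rows(path):
    """Read the survey CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def to_float(value):
    """Convert a raw cell to float; blank or non-numeric cells become None (missing)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_category(value):
    """Strip whitespace; blank categorical answers are recoded as 'Missing'."""
    text = (value or "").strip()
    return text if text else "Missing"


def recode_gender(value):
    """Hypothetical recoding rule: collapse free-text answers into three groups."""
    text = (value or "").strip().lower()
    if text in ("male", "m"):
        return "Male"
    if text in ("female", "f"):
        return "Female"
    return "Other/Missing"


def salary_by_group(rows, group_keys=("position", "gender"), salary_key="brutto_salary"):
    """Return {group tuple: {"n", "mean", "median"}} of yearly brutto salary.

    Respondents with a missing or non-numeric salary are dropped (listwise deletion).
    """
    groups = defaultdict(list)
    for row in rows:
        salary = to_float(row.get(salary_key))
        if salary is None:
            continue  # no usable salary: exclude this respondent
        key = tuple(
            recode_gender(row.get(k)) if k == "gender" else clean_category(row.get(k))
            for k in group_keys
        )
        groups[key].append(salary)
    return {
        key: {"n": len(values), "mean": mean(values), "median": median(values)}
        for key, values in groups.items()
    }


def gender_gap(summary):
    """Female mean / male mean per position (assumes keys are (position, gender))."""
    gaps = {}
    for position in {pos for pos, _ in summary}:
        male = summary.get((position, "Male"))
        female = summary.get((position, "Female"))
        if male and female:  # only positions with both groups present
            gaps[position] = female["mean"] / male["mean"]
    return gaps
```

For example, gender_gap(salary_by_group(load_rows("IT Salary Survey EU 2019.csv"))) gives the ratio of female to male mean salary for each position where both groups are present; a ratio near 1 is consistent with the hypothesis. Because the ratio does not control for seniority, experience, or city, and some groups may contain very few respondents (check n), it is descriptive evidence only.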
Write a report that presents and discusses the findings of your exploratory analysis. You are recommended to use R Markdown to prepare the report. The report must include an overview of the problem, a descriptive analysis of the survey data, your hypotheses, your R programs and outputs, and your analysis.

This individual assignment will be graded on the following components (for further details, please see the rubrics on OLE):
1. Descriptive analysis
2. Research questions and data analysis
3. Organization and writing skills

Submission Details
Upload your completed work to OLE before the deadline (March 25, Friday) as follows:
1. Analysis report – “Assignment 1”
2. R program (or R Markdown) – “R program”

Marks will be deducted for any non-compliance with the submission requirements.