Screen Australia – QBUS6600 Project Outline

Background

An important source of Australian box office data for movies is a database collated weekly by a company called Numero. The University of Sydney Business School has obtained this database, going back to January 2000, for student project use. Additional data has been obtained from OpusData to enrich the box office information supplied by Numero. Screen Australia, our industry collaborator for this project, has expressed interest in our analysis of the movie data and assisted with the formulation of the questions for this project.

Screen Australia is the Australian Federal Government’s key funding body for the Australian screen production industry, created under the Screen Australia Act 2008. Screen Australia supports the development, production, promotion and distribution of Australian narrative and documentary screen content. The organisation has a research division which analyses various aspects of Australian films and has an interest in the performance of movies as revealed by Australian box office statistics.

The market for cinema-based movies is dynamic and has changed considerably over time as consumer tastes evolve and distributors and cinemas respond (for example, streaming has captured audience share, and although there are fewer actual movie theatres, there are now many more screens than was the case historically). One of the key questions facing the industry is how to better understand the drivers of financial success for different types of movies and, in particular, how to predict total revenues.

Problem Description

You have been provided with a dataset (see ‘Data Description’ below) that contains Australian theatrical (cinema) box office information from January 1, 2000 to January 31, 2022.
In this project, you will:

1. Use exploratory data analysis to identify the key attributes for predicting the total Lifetime Gross revenues earned by movies and to investigate how movies screened in Australian theatres (cinemas) changed over time. You should aim to find or reveal all relevant properties, characteristics, patterns, and statistics hidden in the dataset. For the ‘change over time’ investigation, we suggest that you compare the characteristics of the movies (including box office performance) over several similarly sized consecutive time periods (e.g., ~5-year periods). Because the final task below focuses on Australian, Asian and European movies, we ask that you also investigate whether (and in what ways) those groups of movies are different from each other and from the rest.

2. Develop a regression model for predicting the Lifetime Gross movie revenue. Use any statistical or machine learning approaches that you feel are appropriate. We suggest that you use the root mean squared logarithmic error (RMSLE) to evaluate the performance of your final model. Ensure that you justify the selection of your final model and interpret the final model in terms of the key attributes for predicting the Lifetime Gross movie revenue. Because the final task below focuses on Australian, Asian and European movies, we ask that you also investigate whether (and in what ways) your model implies that those groups of movies are different from each other and from the rest. If your model is too complex for this interpretation, we suggest that you also consider well-performing interpretable models (for example, linear models) for predicting the movie revenue.

3. Based on your analysis, highlight differences between Australian, Asian and European movies and outline strategies for maximizing the box office revenue of Australian, Asian and European movies. Your strategy should take advantage of the key movie attributes that you have identified for predicting the movie revenue and the models that you have built and validated.
As part of your proposed strategy, you should include a discussion of the movie attributes (other than early box office performance) that are likely to increase box office revenue.

Data Description

You have been provided with one tabular dataset in CSV format containing Australian box office data.

Movie box office

This dataset has ~9.3K rows (~2.5MB), one row per movie, covering the period from January 1, 2000 to January 31, 2022. The data contains fields including the name and genre of the movie, country of origin, main actors and other characteristics of the movie, production budget (for a subset of all movies), screening dates, number of screens, and information on the box office revenues generated.

Additional Information

Most of the data was extracted from Numero’s All Films Research database and reflects the information provided by the movie distributors. Prior to 2014, this information was collected by the Motion Picture Distributors Association of Australia. Information provided in the last 6 columns (production_budget, creative_type, source, production_method, sequel, and running_time) was extracted from the OpusData database.

As indicated in Screen Australia reports on trends in the cinema industry, the number of screens in Australia has increased over the years. For example, the number of screens available in 1980 was 829; by 2020 it had increased to 2,241. During the same period, the number of cinemas fell from 713 to 473. The net result of these changes is that the number of cinema patron seats has remained relatively stable over this period.

In their cinema trends analysis, Screen Australia considers the following movie categories based on Numero’s Opening day screens data:

- Limited (0-19 screens)
- Speciality (20-99 screens)
- Mainstream (100-199 screens)
- Wide (200-399 screens)
- Blockbuster (400+ screens)
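
The opening-day screen bands listed above could be turned into a categorical feature for analysis. The following is a minimal sketch, assuming the opening-day screen count is available as a non-negative integer; the function name `screen_category` is hypothetical and does not correspond to any column or tool supplied with the dataset.

```python
def screen_category(opening_day_screens: int) -> str:
    """Map an opening-day screen count to a Screen Australia release category.

    The function name and signature are illustrative assumptions; only the
    band boundaries come from the project outline.
    """
    if opening_day_screens < 0:
        raise ValueError("screen count cannot be negative")
    # Inclusive upper bound of each band, in ascending order.
    bands = [
        (19, "Limited"),
        (99, "Speciality"),
        (199, "Mainstream"),
        (399, "Wide"),
    ]
    for upper_bound, label in bands:
        if opening_day_screens <= upper_bound:
            return label
    return "Blockbuster"  # 400 or more screens
```

For example, `screen_category(250)` returns `"Wide"`. Movies with a missing screen count would need to be handled separately before applying this mapping.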
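
For the RMSLE suggested in the Problem Description as the evaluation metric, a common definition is the square root of the mean of (log(1 + predicted) - log(1 + actual))^2 over all movies. The sketch below implements that definition with the Python standard library only; the function name `rmsle` and its arguments are illustrative assumptions, and the outline itself does not prescribe a particular implementation.

```python
import math
from typing import Sequence


def rmsle(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared logarithmic error between two equal-length sequences.

    Assumes all values are non-negative (as revenues are); the name and
    signature are hypothetical, not taken from the project outline.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if min(actual) < 0 or min(predicted) < 0:
        raise ValueError("RMSLE is only defined here for non-negative values")
    # log1p(x) computes log(1 + x), which stays finite when x is 0.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is measured on a logarithmic scale, it reflects relative rather than absolute differences, which tends to suit revenue figures that range from very small releases to blockbusters.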