1. Group assignment (group of 2)
2. The assignment accounts for 30% of the whole unit.
3. Submit the assignment via the ICT583 LMS site using the Assignment unit tool.
4. Late work may attract a penalty of 10% (of the mark for that piece of assessment) per day late, up to and including 10 days late. Work submitted more than 10 days late might not be marked.
5. You must keep a copy of the final version of your assignment as submitted and be prepared to provide it on request.
6. The University treats plagiarism, collusion, theft of other students’ work and other forms of dishonesty in assessment seriously.
The healthcare industry has been one of the most prominent beneficiaries of the emergence of data science. Successful applications such as AI-assisted diagnosis and prognosis, computerized drug discovery, and virtual assistants can greatly improve patient care and save public money. Your final assignment is to apply your data science knowledge to two healthcare datasets: the mammographic masses dataset and the global burden of disease dataset. The goal of this project is to follow the data science analysis pipeline: pose interesting questions of your own choosing, acquire the data, perform data manipulations, design your visualizations, build predictive models using machine learning techniques, and present the results in a report format.
An essential part of your project is your R coding. Your R file should record the steps in developing your solutions and obtaining the final data analysis results. Make sure your code matches the findings you put in the report. For example, if there are three separate plots in the report, your code should produce exactly the same three separate plots.
You also need to submit an in-depth report with two parts: classification and clustering. Each part might address the following components and discussions.
Overview of the project:
Provide an overview of the project, its goals, and the motivation for it. Keep in mind that this will be read by people seeing your project for the first time.
Describe the background of the dataset and provide summary statistics.
Interesting questions:
What questions are you trying to answer? Do any questions evolve throughout the project? Are there any new questions you consider in the course of your analysis? ...
Data manipulation and cleaning:
Which data pre-processing steps did you perform, and why? Are there any questions that can be answered via data manipulation alone? ...
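As an illustration of the kind of pre-processing step that could be recorded in the R file, the sketch below handles missing values in two alternative ways. It is only a minimal example under stated assumptions: the toy data frame, its column names (age, density, severity), and the use of "?" as the missing-value marker are hypothetical stand-ins, not a description of the actual datasets.

```r
# Toy data standing in for a real file; all values are made up for illustration.
raw <- data.frame(
  age      = c("67", "43", "?", "58", "74"),
  density  = c("3", "?", "3", "2", "3"),
  severity = c("1", "0", "0", "1", "1"),
  stringsAsFactors = FALSE
)

# Recode the assumed missing-value marker "?" as NA, then convert to numeric.
clean <- raw
clean[clean == "?"] <- NA
clean[] <- lapply(clean, as.numeric)

# Count missing values per column before deciding how to handle them.
missing_per_column <- colSums(is.na(clean))
print(missing_per_column)

# Option A: drop every row that has at least one missing value.
complete_rows <- na.omit(clean)

# Option B: replace each missing value with the median of its column.
imputed <- clean
for (col in names(imputed)) {
  imputed[[col]][is.na(imputed[[col]])] <- median(imputed[[col]], na.rm = TRUE)
}
```

Whichever option is chosen, the report should explain why, since dropping rows shrinks the sample while imputation changes the distribution of the affected variable.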
Exploratory data analysis:
What visualizations did you use to look at your data in different ways? Did you detect any outliers? ...
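One common way to flag candidate outliers, sketched below, is the 1.5 x IQR rule that also underlies a standard boxplot. The age vector is made-up illustrative data, and the rule itself is just one possible choice rather than a required method.

```r
# Made-up ages for illustration; 130 is an implausible value planted on purpose.
age <- c(23, 35, 41, 44, 48, 52, 55, 57, 60, 63, 66, 70, 130)

# Quartiles and interquartile range.
q <- quantile(age, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences of the 1.5 x IQR rule.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Values outside the fences are candidate outliers to investigate, not to delete blindly.
outliers <- age[age < lower | age > upper]
print(outliers)

# The same points appear as isolated dots on a boxplot.
boxplot(age, horizontal = TRUE, main = "Age (illustrative data)")
```

A flagged value still needs a judgement call: it may be a data-entry error, or a genuine but rare observation that should stay in the analysis.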
What are the various machine learning methods you considered? Justify the decisions you made. What are the main ideas of the selected methods? How do you build the models? Are there any concerns when designing your model? ...
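For the classification part, a model-building step could look like the sketch below: a train/test split followed by logistic regression using base R's glm. The data are simulated, the variable names (age, severity) and the 70/30 split are assumptions made for illustration, and logistic regression is only one of many methods that could be considered.

```r
# Simulated data for illustration only; not drawn from the assignment datasets.
set.seed(583)
n <- 200
age <- round(runif(n, 20, 90))
severity <- rbinom(n, size = 1, prob = plogis(-4 + 0.07 * age))
masses <- data.frame(age = age, severity = severity)

# Hold out 30% of the rows so the model is evaluated on data it has not seen.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- masses[train_idx, ]
test <- masses[-train_idx, ]

# Fit a logistic regression on the training set.
fit <- glm(severity ~ age, data = train, family = binomial)

# Predict probabilities on the test set and threshold them at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Confusion matrix and accuracy on the held-out data.
confusion <- table(predicted = pred, actual = test$severity)
accuracy <- mean(pred == test$severity)
print(confusion)
print(accuracy)
```

Fixing the random seed keeps the split reproducible, which helps ensure the code produces the same figures that appear in the report.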
What did you learn about the data? Which method statistically outperformed the rest? Have you found the answers to the raised questions? How can you justify your answers? ... Present your results in an engaging way using text and visualizations.
Are there any limitations of your study? What is your future work?