You work at Nintendo as a data scientist. The marketing team have approached you because they want to develop a new Pokémon that will be the ultimate Pokémon king, ranked directly below Arceus (the creator of the Pokémon world). The marketing team have no preconceived ideas about the sorts of attributes this new Pokémon should have. They would like to create a Pokémon that could be perceived by other Pokémon as being superior. Nintendo head office have provided you with a dataset and have asked you to provide a report with recommendations about what attributes this new Pokémon should have.
First, the marketing team would like to get a better understanding of the sorts of attributes the current Pokémon have. They have asked you to describe the data and find interesting phenomena.
Second, the marketing team have asked you to explore the data in more detail. They would like you to use your expertise in data science to dig out anything you feel is interesting or significant. They are looking for attributes of strength that could be put together to create the profile of a Pokémon that could be the Pokémon King. Further, they would like you to be able to predict whether or not this Pokémon would win a battle against Dialga (one of Arceus’ protectors).
You are required to prepare a report about your findings and to make suggestions about which attributes you would recommend be included in the ultimate Pokémon’s profile. You are also required to provide the script of the code you have used to prepare and explore your data. A notepad template is provided for you to complete.
To learn more about Pokémon, check this link out. It will bring up the official Pokédex, where you can search for Pokémon to find pictures and learn more about them. If you aren’t familiar with Pokémon, it’s worth taking a look.
The potential audiences include other staff within Nintendo, such as executives or sales staff. These staff may have limited ICT or mathematical knowledge.
To prepare the report, please include the following sections:
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where does the data come from, what are the dimensions and structure of the data.
Describe how to load the data, and how the pre-processing is performed.
The original dataset is not ready for analysis and it is different from the data forms that we are familiar with in previous practices. This means we need to do some pre-processing, either for the whole dataset, or for a subset of the dataset required for each sub task described later.
Once you have some ideas for exploratory or advanced analysis, you need to adjust the form of the dataset. This can be achieved either by manipulating records in R (by transposition or subsetting), or with other tools (e.g. Notepad or Excel) before reading them into R. Please clearly explain the way you have cleaned the data in this section. If you use Excel, please still explain the steps in the Notepad document and the Report.
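As a rough illustration of loading and cleaning in R, the sketch below writes a tiny made-up CSV to a temporary file and then reads it back. The column names (name, type1, height_m, capture_rate, is_legendary), the values, and the non-numeric entry "unknown" are all assumptions made for the example; the real dataset may be laid out differently and may need other steps.

```r
# Hypothetical stand-in for the provided dataset: a tiny CSV in a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,type1,height_m,capture_rate,is_legendary",
  "Bulbasaur,grass,0.7,45,0",
  "Charmander,fire,0.6,45,0",
  "Minior,rock,0.3,unknown,0",
  "Dialga,steel,5.4,3,1",
  "Hoopa,psychic,,3,1"
), csv_path)

# Load the data and inspect its dimensions and column types.
pokemon <- read.csv(csv_path, stringsAsFactors = FALSE)
dim(pokemon)
str(pokemon)

# capture_rate was read as text because of the "unknown" entry;
# converting to numeric turns that entry into NA.
pokemon$capture_rate <- suppressWarnings(as.numeric(pokemon$capture_rate))

# Store categorical and yes/no columns with suitable types.
pokemon$type1 <- factor(pokemon$type1)
pokemon$is_legendary <- as.logical(pokemon$is_legendary)

# Count missing values per column, then keep only complete rows.
colSums(is.na(pokemon))
clean <- pokemon[complete.cases(pokemon), ]
nrow(clean)
```

Dropping incomplete rows is only one option. Depending on how many values are missing, replacing them (for example with a column median) may lose less information, and the choice should be explained in the report.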
Exploratory Data Analysis
One-variable analysis studies one variable (one row or one column) at a time. For example, the attribute “classification” could be selected to get a bar graph of the frequency of each Pokémon type. Or, “height” could be selected to show a histogram of height ranges of Pokémon. You can choose whichever attribute you want for this. Add your code to the Notepad template.
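A one-variable analysis along these lines might look like the sketch below. It uses randomly generated stand-in data, and the column names type1 and height_m are assumptions; swap in the columns from the real dataset.

```r
# Synthetic stand-in data (assumed column names, not the real dataset).
set.seed(1)
pokemon <- data.frame(
  type1    = sample(c("water", "fire", "grass"), 60, replace = TRUE),
  height_m = round(runif(60, 0.2, 3), 1)
)

# Categorical variable: frequency of each type as a bar graph.
type_counts <- table(pokemon$type1)
barplot(sort(type_counts, decreasing = TRUE),
        main = "Number of Pokémon per type",
        xlab = "Type", ylab = "Count")

# Numeric variable: distribution of heights as a histogram.
hist(pokemon$height_m,
     main = "Distribution of Pokémon height",
     xlab = "Height (m)")

# Numeric summary to quote alongside the plot.
summary(pokemon$height_m)
```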
A two-variable analysis studies the relation between two variables. For example, we might be interested to know how the attack strength or speed of Pokémon differs between groups (using the attribute “type1” or “classification” as the grouping variable). Which type is the strongest overall? Which is the weakest? It is up to you to decide which attributes/variables you use for this analysis. Just be sure to explain what you have done using sentences as well. Add your code to the Notepad template.
Perform 2 two-variable analyses. Plot one graph for each variable. Explain the findings for each graph.
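The sketch below shows two possible two-variable analyses on synthetic data: a numeric attribute compared across a categorical one, and two numeric attributes compared with each other. The column names type1, attack and speed are assumptions for the example.

```r
# Synthetic stand-in data (assumed column names, not the real dataset).
set.seed(2)
pokemon <- data.frame(
  type1  = rep(c("water", "fire", "grass"), each = 20),
  attack = round(c(rnorm(20, 70, 15), rnorm(20, 85, 15), rnorm(20, 75, 15))),
  speed  = round(rnorm(60, 65, 20))
)

# Analysis 1: numeric vs categorical. Mean attack per type, strongest first.
mean_attack <- aggregate(attack ~ type1, data = pokemon, FUN = mean)
mean_attack[order(-mean_attack$attack), ]
boxplot(attack ~ type1, data = pokemon,
        main = "Attack by type", xlab = "Type", ylab = "Attack")

# Analysis 2: numeric vs numeric. Scatter plot plus correlation coefficient.
plot(pokemon$speed, pokemon$attack,
     main = "Attack vs speed", xlab = "Speed", ylab = "Attack")
cor(pokemon$speed, pokemon$attack)
```

A box plot shows the spread within each type as well as the typical value, which helps when judging whether a difference in means between types is large compared with the variation inside each type.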
Briefly explain the concept of clustering and k-means (with references). Perform 1 clustering analysis. You can choose the attributes you want to evaluate but an idea is:
“Are there any clusters when capture rate and base happiness are examined?”
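One way to approach the capture rate and base happiness question is sketched below with base R's kmeans(). The data are synthetic, the column names capture_rate and base_happiness are assumptions, and the choice of three clusters is arbitrary here; on the real data the number of clusters needs to be justified.

```r
# Synthetic stand-in data with three loose groups (not the real dataset).
set.seed(3)
pokemon <- data.frame(
  capture_rate   = c(rnorm(30, 45, 10), rnorm(30, 190, 20), rnorm(30, 5, 2)),
  base_happiness = c(rnorm(30, 70, 5),  rnorm(30, 70, 5),   rnorm(30, 10, 5))
)

# Standardise both attributes so neither dominates the distance calculation.
features <- scale(pokemon[, c("capture_rate", "base_happiness")])

# k-means with k = 3 (an assumption); nstart repeats the random
# initialisation and keeps the best solution.
fit <- kmeans(features, centers = 3, nstart = 25)

# Cluster sizes and within-cluster sum of squares.
table(fit$cluster)
fit$tot.withinss

# Plot the original values, coloured by assigned cluster.
plot(pokemon$capture_rate, pokemon$base_happiness,
     col = fit$cluster, pch = 19,
     main = "k-means clusters (k = 3)",
     xlab = "Capture rate", ylab = "Base happiness")
```

Running kmeans() for several values of k and comparing tot.withinss is a common way to support the choice of k.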
Briefly explain the concept of linear regression (with references).
Perform 2 linear regression analyses. Plot the learned models. You can choose the attributes you want to evaluate but an idea is:
- “Which type is the most likely to be a legendary Pokémon?”
- “How likely is [a Pokémon type] to be a legendary Pokémon?”
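The sketch below fits two linear models with lm() on synthetic data. It assumes hypothetical columns type1, base_total and is_legendary (coded 0/1). Because the outcome is 0/1, the fitted values are read as approximate probabilities; this is a simplification, and a fitted line can fall below 0 or above 1.

```r
# Synthetic stand-in data (assumed column names, not the real dataset).
set.seed(4)
n <- 90
pokemon <- data.frame(
  type1      = rep(c("water", "dragon", "psychic"), each = 30),
  base_total = round(runif(n, 200, 700))
)
# Made-up rule: a higher base_total makes legendary status more likely.
pokemon$is_legendary <- as.integer(pokemon$base_total + rnorm(n, 0, 60) > 580)

# Model 1: legendary status against a numeric attribute.
fit_total <- lm(is_legendary ~ base_total, data = pokemon)
summary(fit_total)$coefficients
plot(pokemon$base_total, pokemon$is_legendary,
     main = "Legendary status vs base total",
     xlab = "Base total", ylab = "Legendary (0 = no, 1 = yes)")
abline(fit_total, col = "red")  # the learned model

# Model 2: legendary status against type. With no intercept, each
# coefficient equals the proportion of legendary Pokémon in that type.
fit_type <- lm(is_legendary ~ 0 + type1, data = pokemon)
coef(fit_type)
barplot(coef(fit_type),
        main = "Estimated share of legendary Pokémon by type",
        ylab = "Proportion legendary")
```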
Briefly explain the concept of a classification tree (with references). You can choose the attributes you want to evaluate but an idea is:
- “Is it possible to build a classification tree to identify legendary Pokémon?”
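A classification tree for this question could be sketched as follows, using the rpart package that ships with standard R installations. The data are synthetic and the columns base_total, capture_rate and is_legendary are assumptions, so the splits shown say nothing about the real dataset.

```r
library(rpart)

# Synthetic stand-in data (assumed column names, not the real dataset):
# legendary Pokémon are given higher totals and lower capture rates.
set.seed(5)
n <- 200
legendary <- rep(c("yes", "no"), times = c(40, 160))
pokemon <- data.frame(
  base_total   = round(ifelse(legendary == "yes",
                              rnorm(n, 620, 40), rnorm(n, 420, 80))),
  capture_rate = round(ifelse(legendary == "yes",
                              runif(n, 3, 45), runif(n, 30, 255))),
  is_legendary = factor(legendary)
)

# Fit the tree; method = "class" requests a classification tree.
tree <- rpart(is_legendary ~ base_total + capture_rate,
              data = pokemon, method = "class")
print(tree)

# Draw the tree with class counts at each leaf.
plot(tree, margin = 0.1)
text(tree, use.n = TRUE)

# Confusion matrix and accuracy on the training data.
pred <- predict(tree, pokemon, type = "class")
table(actual = pokemon$is_legendary, predicted = pred)
mean(pred == pokemon$is_legendary)
```

Accuracy measured on the same data used to fit the tree is optimistic. Holding back part of the data as a test set gives a fairer estimate.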
Sum up your findings and provide some insight into the findings.
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.
Drawing a funny picture of your Pokémon is encouraged but entirely optional. There are no marks for this.