INF30030 Business Analytics
Task
Data Set
https://archive.ics.uci.edu/ml/datasets/bank+marketing
Software to use
https://www.rstudio.com/products/rstudio/download/
Preparing and exploring Data
Exploring data
Once you have addressed the missing-value and duplicate-data problems, you will need to explore the inherent relationships between the variables. The focus variable for this study is the Result column, since you are asked to predict it. This section should therefore show your efforts to identify which of the remaining columns in the dataset are likely to have high predictive power for the ‘Result’ column. You may use basic statistical analyses such as correlations and present them as visual graphs or tables (raw data).
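The correlation check described above can be sketched in base R as follows. This is only an illustration: the data frame is a made-up stand-in for the bank marketing data, the helper column Result_num is hypothetical, and encoding the target as 0/1 is one possible choice rather than a required method.

```r
# Toy stand-in for the bank marketing data; all values are made up.
bank <- data.frame(
  age      = c(30, 45, 52, 28, 39, 61, 33, 47),
  balance  = c(1200, 300, 4500, 150, 2200, 5100, 800, 60),
  duration = c(80, 420, 610, 95, 130, 700, 510, 60),
  Result   = c("no", "yes", "yes", "no", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Encode the target as 0/1 so it can enter a correlation matrix
# (Result_num is a hypothetical helper column).
bank$Result_num <- ifelse(bank$Result == "yes", 1, 0)

# Pearson correlations between all numeric columns.
num_cols   <- sapply(bank, is.numeric)
cor_matrix <- cor(bank[, num_cols])

# Correlation of each predictor with the target, strongest first.
target_cor <- cor_matrix[, "Result_num"]
target_cor <- target_cor[names(target_cor) != "Result_num"]
print(round(sort(abs(target_cor), decreasing = TRUE), 2))
```

Columns near the top of the printed list are candidates for high predictive power. Correlation only captures linear association with numeric columns, so categorical columns need a different check, such as a cross-tabulation.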
Usage of “head()” function.
R screenshot
Usage of “dim()” function.
R screenshot
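A minimal sketch of what the head() and dim() steps might look like. The data frame below is a made-up stand-in for the downloaded file, and the variable names first_rows and shape are chosen only for this example.

```r
# Toy stand-in for the downloaded data set; all values are made up.
bank <- data.frame(
  age      = c(30, 45, 52, 28, 39, 61, 33, 47, 36, 58),
  job      = c("admin.", "technician", "retired", "student", "services",
               "retired", "admin.", "management", "technician", "retired"),
  duration = c(80, 420, 610, 95, 130, 700, 510, 60, 240, 330),
  Result   = c("no", "yes", "yes", "no", "no", "yes", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# head() returns the first six rows by default.
first_rows <- head(bank)
print(first_rows)

# A second argument changes how many rows are shown.
print(head(bank, 3))

# dim() returns the number of rows, then the number of columns.
shape <- dim(bank)
print(shape)
```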
Short description of the summary() function
Screenshot
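As a rough illustration of what summary() reports, the sketch below runs it on a small made-up data frame. For numeric columns it gives the minimum, quartiles, median, mean and maximum, plus a count of NA values when any are present. For factor columns it gives the count per level.

```r
# Toy data with one missing age; all values are made up.
bank <- data.frame(
  age     = c(30, 45, NA, 28, 39),
  job     = factor(c("admin.", "technician", "admin.", "services", "technician")),
  balance = c(1200, 300, 4500, 150, 2200)
)

# Summary of every column at once.
bank_summary <- summary(bank)
print(bank_summary)

# Summary of a single numeric column; the NA count appears as "NA's".
age_summary <- summary(bank$age)
print(age_summary)
```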
Missing & Duplicate data filtering.
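One way the missing-value and duplicate checks could be carried out in base R is sketched below. The data frame is made up. Treating the label "unknown" as a missing value is an assumption about how the data set codes missing categories and should be checked against the data set's own documentation.

```r
# Toy data: one NA, one "unknown" label, one exact duplicate row.
bank <- data.frame(
  age    = c(30, 45, NA, 28, 30),
  job    = c("admin.", "unknown", "services", "technician", "admin."),
  Result = c("no", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Assumption: "unknown" marks a missing category, so recode it as NA.
bank$job[bank$job == "unknown"] <- NA

# Count missing values per column.
missing_per_column <- colSums(is.na(bank))
print(missing_per_column)

# Count rows that exactly repeat an earlier row.
duplicate_rows <- sum(duplicated(bank))
print(duplicate_rows)

# Drop rows with any missing value, then drop repeated rows.
bank_clean <- unique(na.omit(bank))
print(bank_clean)
```

Whether to drop or keep such rows depends on the analysis; dropping is shown here only because it is the simplest option.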
Exploration of Inherent Relationships Between Variables (explain using charts)
Screenshot and short description
Plot chart and short description
Histogram chart and short description
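The charts listed above could be produced with base R graphics along the following lines. The data values are made up, and the choice of age and duration as the plotted variables is only an example.

```r
# Toy data; all values are made up.
bank <- data.frame(
  age      = c(30, 45, 52, 28, 39, 61, 33, 47, 36, 58),
  duration = c(80, 420, 610, 95, 130, 700, 510, 60, 240, 330),
  Result   = factor(c("no", "yes", "yes", "no", "no",
                      "yes", "yes", "no", "no", "yes"))
)

# Scatter plot of two numeric variables, coloured by the outcome.
plot(bank$age, bank$duration,
     col = as.integer(bank$Result), pch = 19,
     xlab = "Age", ylab = "Duration", main = "Duration vs. age by Result")
legend("topleft", legend = levels(bank$Result),
       col = seq_along(levels(bank$Result)), pch = 19)

# Box plot comparing a numeric variable across the two outcomes.
boxplot(duration ~ Result, data = bank,
        xlab = "Result", ylab = "Duration", main = "Duration by Result")

# Histogram showing the distribution of a single numeric variable.
age_hist <- hist(bank$age, breaks = 5,
                 xlab = "Age", main = "Distribution of age")
```

A box plot is included because it compares a numeric column directly against the two Result classes, which a plain scatter plot does not show as clearly.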
Preparing Data
You’ll use historical data to train your model. Data may contain duplicate records and outliers; depending on the analysis and the business objective, you decide whether to keep or remove them. The data could also have missing values, may need to undergo some transformation, and may be used to generate derived attributes that have more predictive power for your objective. Overall, the quality of the data determines the quality of the model. You need to provide a data dictionary of all data items used in your analysis, with a justification for including each one in your model.
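Two of the preparation steps mentioned above, flagging outliers and deriving a new attribute, might look like this in base R. The 1.5 x IQR rule is one common convention rather than a requirement of the task, the values are made up, and the column names balance_outlier and duration_min are hypothetical. The sketch also assumes that duration is recorded in seconds.

```r
# Toy data with one extreme balance; all values are made up.
bank <- data.frame(
  balance  = c(1200, 300, 4500, 150, 2200, 98000, 800, 60),
  duration = c(80, 420, 610, 95, 130, 700, 510, 60)
)

# Flag outliers with the 1.5 * IQR rule.
q1    <- unname(quantile(bank$balance, 0.25))
q3    <- unname(quantile(bank$balance, 0.75))
iqr   <- IQR(bank$balance)
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
bank$balance_outlier <- bank$balance < lower | bank$balance > upper

# Derived attribute: duration in minutes (assumes duration is in seconds).
bank$duration_min <- bank$duration / 60

print(bank)
```

Flagging rather than deleting keeps the decision about outliers open until the business objective has been considered.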
Short Description About Preparing Data
Removing Null Data and Variables that are Not Used
Screenshot and short description
Use summary() function and short description
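A possible base R version of this step is sketched below: drop a column that will not be used, keep only complete rows, then confirm the result with summary(). The data is made up, and day is dropped purely as an example, not as a recommendation about which variables to exclude.

```r
# Toy data with two incomplete rows; all values are made up.
bank <- data.frame(
  age      = c(30, 45, NA, 28),
  day      = c(5, 12, 3, 20),
  duration = c(80, 420, 610, NA),
  Result   = c("no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Remove a variable that is not used in the analysis
# (day is chosen only for illustration).
bank_reduced <- bank[, setdiff(names(bank), "day")]

# Keep only the rows that have no missing values.
bank_complete <- bank_reduced[complete.cases(bank_reduced), ]

# summary() should now report no NA's in any column.
print(summary(bank_complete))
```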
Renaming the columns
Image: Before renaming the columns
Use rename() function and short description
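Renaming can be sketched as follows. The example assumes that rename() refers to the function in the dplyr package and that the raw target column is called y; both are assumptions. A base R equivalent is shown first so the example runs without extra packages, and the dplyr call is executed only if that package is installed.

```r
# Toy data before renaming; values are made up.
bank <- data.frame(
  age = c(30, 45),
  y   = c("no", "yes"),
  stringsAsFactors = FALSE
)
print(names(bank))

# Base R: replace the names of the matching columns.
names(bank)[names(bank) == "y"]   <- "Result"
names(bank)[names(bank) == "age"] <- "Age"
print(names(bank))

# dplyr equivalent, using the form new_name = old_name.
if (requireNamespace("dplyr", quietly = TRUE)) {
  raw <- data.frame(age = c(30, 45), y = c("no", "yes"))
  print(names(dplyr::rename(raw, Age = age, Result = y)))
}
```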
Data Dictionary of All the Data that is Used in the Analysis
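A starting point for the data dictionary can be generated from the data frame itself, as in the sketch below. Names, types and missing counts are derived automatically, while descriptions and justifications still have to be written by hand. The data and the descriptions shown are illustrative only.

```r
# Toy data after cleaning and renaming; values are made up.
bank <- data.frame(
  Age      = c(30, 45, 52, 28),
  Duration = c(80, 420, 610, 95),
  Result   = factor(c("no", "yes", "yes", "no"))
)

# One row per variable: name, storage type, missing count, description.
data_dictionary <- data.frame(
  variable    = names(bank),
  type        = vapply(bank, function(col) class(col)[1], character(1)),
  missing     = colSums(is.na(bank)),
  description = c("Client age in years",
                  "Duration of the last contact",
                  "Target: outcome to be predicted"),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(data_dictionary)
```

A further column for the justification of each variable can be added in the same way once the variables have been selected.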