N00BIoT Email Sentry 2.0
N00BIoT’s Email Sentry is a malware detection platform. The first version of Email Sentry wasn’t particularly effective, so N00BIoT commissioned you as an expert in machine learning. An early phase of the project used principal component analysis to determine whether specific features of emails could help to identify malicious ones. Based on this, the N00BIoT software team tried to further refine their malware classification system. The results were still underwhelming.
Following your initial consultation with N00BIoT, the software development team has extracted data sets based upon your recommendations.
N00BIoT intends to launch a new version of Email Sentry at the end of the year. It will be marketed as N00BIoT ES2 (Powered By AI). The software team is scrambling to produce a reliable email detector and has turned to you to provide the machine learning expertise and analysis to deliver a product with the following goals:
Very low false-positive rate in malware detection.
High level of sensitivity in detecting malware.
You are to apply supervised machine learning algorithms to the data provided. You will train your ML models using the MalwareSamples data set, and then test them against the EmailSamples data set.
All analyses are to be done using R. You will report on your findings.
Part 1 – Preparing your data for constructing a supervised learning model using MalwareSamples10000.csv
You will need to write the appropriate code to:
i. Import the dataset MalwareSamples10000.csv into RStudio.
ii. Set the random seed using your student ID.
iii. Partition the data into training and test sets using an 80/20 split.
The variable isMalware is the classification label and the outcome variable.
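As a minimal sketch of steps i to iii, the code below imports the data, sets the seed and makes a stratified 80/20 split in base R. The student ID (12345678), the fallback synthetic data frame and its columns x1 and x2 are assumptions for illustration only; the real column names come from MalwareSamples10000.csv. The workshops may use a helper such as caret's createDataPartition(.) for the split instead.

```r
# Hypothetical student ID (assumption): replace with your own.
student_id <- 12345678
csv_path   <- "MalwareSamples10000.csv"

if (file.exists(csv_path)) {
  # i. Import the dataset
  dat <- read.csv(csv_path, stringsAsFactors = TRUE)
} else {
  # Synthetic stand-in so the sketch runs without the file (assumption:
  # x1 and x2 are made-up predictors, not columns of the real data).
  set.seed(1)
  dat <- data.frame(
    specimenId = seq_len(1000),
    x1         = rnorm(1000),
    x2         = rnorm(1000),
    isMalware  = sample(c("No", "Yes"), 1000, replace = TRUE,
                        prob = c(0.7, 0.3))
  )
}

# Treat the label as a factor, whatever its coding in the file
dat$isMalware <- factor(dat$isMalware)

# ii. Set the random seed using the student ID
set.seed(student_id)

# iii. Stratified 80/20 split: draw 80% of the row indices within each
# class so both partitions keep roughly the same class balance.
rows_by_class <- split(seq_len(nrow(dat)), dat$isMalware)
train_idx <- unlist(
  lapply(rows_by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
  }),
  use.names = FALSE
)

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Quick check of partition sizes and class balance
print(c(train = nrow(train), test = nrow(test)))
print(prop.table(table(train$isMalware)))
print(prop.table(table(test$isMalware)))
```

Stratifying on isMalware is a design choice, not a requirement of the brief: it keeps the proportion of malware similar in both partitions, which matters when one class is much rarer than the other.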
Part 2 – Evaluating your supervised learning models
a) Select three supervised learning modelling algorithms to test against one another by running the following code. Make sure you enter your student ID into the command set.seed(.).
b) For each of your supervised learning approaches you will need to:
i. Run the algorithm in R on the training set (exclude “specimenId” from the analysis).
ii. Optimise the hyperparameter(s) of the models (except for the binary logistic regression model).
iii. For the binary logistic regression model, perform recursive feature elimination (RFE) on the model to ensure the model is not overfitted. See Workshop 5 for an example, except in this instance, specify the argument functions = lrFuncs in the rfeControl(.) command instead.
iv. Evaluate the predictive performance of the models on the test set, and provide the confusion matrix for the estimates/predictions, along with the sensitivity, specificity and accuracy of the model.
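Steps i and iv can be sketched for the binary logistic regression model as follows, using only base R. The synthetic data, the predictors x1 and x2, the "Yes"/"No" label coding and the 0.5 probability cut-off are all assumptions for illustration; in the actual analysis the training and test partitions from Part 1 would take their place, and the label coding should match the real file.

```r
# Synthetic stand-in data (assumption): replace with the Part 1 partitions.
set.seed(42)
n <- 1000
dat <- data.frame(
  specimenId = seq_len(n),
  x1         = rnorm(n),
  x2         = rnorm(n)
)
dat$isMalware <- factor(
  ifelse(1.5 * dat$x1 - dat$x2 + rnorm(n) > 0.5, "Yes", "No"),
  levels = c("No", "Yes")
)
train_rows <- sample.int(n, size = 0.8 * n)
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# i. Fit binary logistic regression on the training set,
#    excluding specimenId from the predictors.
fit <- glm(isMalware ~ . - specimenId, data = train, family = binomial)

# iv. Predict on the test set. With a factor response, glm models the
#     probability of the second level ("Yes").
p_hat <- predict(fit, newdata = test, type = "response")
pred  <- factor(ifelse(p_hat > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix: rows are predictions, columns are true labels
cm <- table(Predicted = pred, Actual = test$isMalware)

TP <- cm["Yes", "Yes"]  # malware correctly flagged
TN <- cm["No",  "No"]   # clean email correctly passed
FP <- cm["Yes", "No"]   # clean email wrongly flagged (false positive)
FN <- cm["No",  "Yes"]  # malware missed

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
accuracy    <- (TP + TN) / sum(cm)

print(cm)
print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              accuracy    = accuracy), 3))
```

The two product goals map directly onto these numbers: a very low false-positive rate means specificity close to 1 (the false-positive rate is 1 minus specificity), while a high level of sensitivity means few missed malware samples.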
c) For the binary logistic regression model, report on the RFE process (i.e., information on which