Cyber-attacks have become one of the world's most serious problems. They cause significant financial damage to countries and individuals every day, and their growth brings a rise in cyber-crime with it. Two key factors in fighting crime and criminals are identifying the perpetrators of cyber-crime and understanding their methods of attack. Detecting and avoiding cyber-attacks are difficult tasks. However, researchers have recently been addressing these problems by developing security models and making predictions with artificial intelligence methods.
Computer crime, or cybercrime, refers to any crime that involves a computer and a network. Net crime is the criminal exploitation of the Internet. A cyber-attack is an exploitation of computer systems and networks: it uses malicious code to alter computer code, logic, or data, and it can lead to cybercrimes such as information and identity theft.
Such tools play a key role in understanding a cyber-attack that has occurred, and they can support a faster and more efficient incident response.
Most companies have large volumes of data to handle. The focus here is on server and storage security. Manual monitoring is slow and error-prone at this scale, so the work needs to be automated. Machine learning helps the team manage the servers and keep them safe.
What is a Confusion Matrix?
A confusion matrix is a summary that compares the predicted results with the actual results in a classification problem. This comparison is essential for judging the performance of a model after it has been trained on some training data.
When a machine learning model is applied to data, it returns a prediction for each record. The labels we already know are called the actual data, and the labels the model produces are called the predicted data.
Actual class 1 has the value 1, which corresponds to the positive value in a binary outcome.
Actual class 2 has the value 0, which corresponds to the negative value in a binary outcome.
The row labels on the left side of the confusion matrix indicate the actual values, and the column labels along the top indicate the predicted values.
A confusion matrix has several components, which are listed below:
- Positive (P): The predicted result is positive (Example: the image is a cat)
- Negative (N): The predicted result is negative (Example: the image is not a cat)
- True Positive (TP): The number of records that are positive in the actual data and that the machine correctly predicted as positive.
- False Positive (FP): The number of records that are negative in the actual data but that the machine wrongly predicted as positive. (Type I error)
- False Negative (FN): The number of records that are positive in the actual data but that the machine wrongly predicted as negative. (Type II error)
- True Negative (TN): The number of records that are negative in the actual data and that the machine correctly predicted as negative.
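The four counts can be tallied directly from paired labels. The following is a minimal sketch in plain Python, assuming labels are encoded as 1 (positive, for example "attack") and 0 (negative); the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels encoded as 1 and 0."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # actually positive, predicted positive
        elif a == 0 and p == 1:
            fp += 1  # actually negative, predicted positive (Type I error)
        elif a == 1 and p == 0:
            fn += 1  # actually positive, predicted negative (Type II error)
        else:
            tn += 1  # actually negative, predicted negative
    return tp, fp, fn, tn


# Hypothetical labels, not taken from any real dataset.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

# Rows are actual values, columns are predicted values (positive first).
matrix = [[tp, fn],
          [fp, tn]]
print(matrix)
```

Here the matrix is laid out with actual values on the rows and predicted values on the columns, so the diagonal holds the correct predictions.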
Type I error:
Type I error: False Alarm (False Positive)
This type of error is not very dangerous: the model predicted an attack, but in reality the system was safe. We would get notified and check for malicious activity, and nothing harmful would be found. The main cost is the security team's time, which is why these cases are termed false alarms.
Type II error:
Type II error (False Negative)
This type of error can prove to be very dangerous. The system predicted no attack, but in reality an attack took place. In that case no notification reaches the security team and nothing can be done to prevent the attack. The False Negative cases above fall into this category, so one of the aims of the model is to minimize this value.
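To make the two error types concrete, the sketch below separates a list of events into false alarms (Type I) and missed attacks (Type II). It assumes 1 means "attack" and 0 means "no attack"; the function name `split_errors` and the sample events are hypothetical.

```python
def split_errors(actual, predicted):
    """Return the indices of false alarms and of missed attacks."""
    false_alarms = []    # Type I: predicted attack, no attack happened
    missed_attacks = []  # Type II: predicted no attack, attack happened
    for i, (a, p) in enumerate(zip(actual, predicted)):
        if a == 0 and p == 1:
            false_alarms.append(i)
        elif a == 1 and p == 0:
            missed_attacks.append(i)
    return false_alarms, missed_attacks


# Hypothetical events: 1 = attack, 0 = no attack.
actual = [1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 1]

false_alarms, missed_attacks = split_errors(actual, predicted)
print("False alarms at events:", false_alarms)
print("Missed attacks at events:", missed_attacks)
```

In this sample, event 1 is a false alarm that costs some investigation time, while event 3 is a missed attack that never triggers a notification.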
We can use the confusion matrix to calculate various metrics:
1. The Accuracy (AC) is the proportion of the total number of predictions that were correct.
Accuracy = (TP + TN)/(TP + TN + FP + FN)
2. The Recall or True Positive Rate (TPR) is the proportion of actual positive cases that were correctly identified.
TPR (True positives / all actual positives) = TP / (TP + FN)
3. The True Negative Rate (TNR) or Specificity is the proportion of actual negative cases that were correctly identified as negative.
TNR or Specificity (True negatives / all actual negatives) = TN / (TN + FP)
4. The False Positive Rate (FPR) is the proportion of actual negative cases that were incorrectly classified as positive.
FPR (False positives / all actual negatives) = FP / (TN + FP)
5. The False Negative Rate (FNR) is the proportion of actual positive cases that were incorrectly classified as negative.
FNR (False negatives / all actual positives) = FN / (FN + TP)
6. The Negative Predictive Value (NPV) is the proportion of predicted negative cases that are actually negative.
NPV (True negatives / all predicted negatives) = TN / (TN + FN)
7. The Positive Predictive Value (PPV) or Precision is the proportion of predicted positive cases that are actually positive.
PPV (True positives / all predicted positives) = TP / (TP + FP)
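The seven metrics can be computed from the four counts in a few lines. This is a minimal sketch in plain Python; the function names `safe_div` and `confusion_metrics` and the example counts are assumptions chosen for illustration, not results from a real model.

```python
def safe_div(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fp, fn, tn):
    """Compute the metrics listed above from the four confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "tpr": safe_div(tp, tp + fn),  # recall / sensitivity
        "tnr": safe_div(tn, tn + fp),  # specificity
        "fpr": safe_div(fp, tn + fp),
        "fnr": safe_div(fn, fn + tp),
        "npv": safe_div(tn, tn + fn),
        "ppv": safe_div(tp, tp + fp),  # precision
    }


# Hypothetical counts for 100 events.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR and FNR always sum to 1, as do TNR and FPR, because each pair splits the same group of actual positives or actual negatives.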
**Machine learning is an important part of the IT industry. It is used across many domains and continues to be developed to meet the industry's needs. We have also discussed how the confusion matrix works and how it helps with real-world problems.
The confusion matrix is used with classification models to determine their performance on given data. It is a square matrix in which the rows represent the actual values and the columns represent the model's predicted values.**