Walkthrough of Advent of Cyber 2023 Day 2: O Data, All Ye Faithful [Log Analysis]
Hello, cyber enthusiasts! TryHackMe has unveiled challenges for Day 2 of Advent of Cyber 2023. Here is the link to the room: https://tryhackme.com/room/adventofcyber2023.
This room introduces what data science involves and how it can be applied to cybersecurity. You’ll learn the basics of data science, Python, and Python libraries such as Pandas and Matplotlib.
Data science uses programming, statistics, and artificial intelligence to examine large amounts of data. The goal is to find trends and patterns and use them to make predictions, which in turn help businesses make better decisions.
It is made up of several roles: data collection, data processing, data mining, analysis, and communication.
Data collection involves collecting the raw data. This could be a list of recent transactions.
Data processing involves turning the raw data that was previously collected into a standard format the analyst can work with. This can be time-consuming.
Data mining involves finding relationships, patterns, and correlations in the data that start to provide some insight. Think of it as chipping away at a big stone and discovering more and more as you go.
Analysis is where most of the exploration takes place. Here, the data is explored to provide answers to questions and some future projections. For example, an e-commerce store can use data science to understand the latest and most popular products to sell, as well as to predict the busiest times of the year.
Communication is extremely important. Even if you have the answers to the universe, no one will understand you if you can’t present them clearly. Data can be visualised as charts, tables, maps, etc.
Data science is increasingly used in cybersecurity because of the insights it offers. Analysing data such as log events gives a clearer understanding of what is happening within an organisation. Anomaly detection is one example.
After this, the room gives an introduction to Jupyter Notebook and Python basics. The explanations are very clear, so you should definitely read through the room’s material.
You’ll learn how to print your text in the Python programming language.
print("Hello world")
You’ll learn about variables, lists, the Pandas library, and the Matplotlib library.
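As a quick refresher on those basics, here is a minimal sketch of variables and a list in plain Python. The names and values (room_name, day, protocols) are made up for illustration and do not come from the room.

```python
# Hypothetical values, chosen only to illustrate variables and lists.
room_name = "Advent of Cyber 2023"   # a variable holding a string
day = 2                              # a variable holding an integer

protocols = ["TCP", "ICMP", "DNS"]   # a list of strings
protocols.append("HTTP")             # lists can grow after creation

print(f"{room_name}, Day {day}")
print(protocols[0])      # first element (indexing starts at 0)
print(len(protocols))    # number of items in the list
```

Pandas builds on the same ideas: a DataFrame column behaves a lot like a list with extra analysis methods attached.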
Let’s quickly move on to the first question.
How many packets were captured (looking at the PacketNumber)?
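To answer this question, we will use the count() function. Launch the machine, and a Jupyter notebook will open. In the fifth cell, add the following code. Leave off any trailing semicolon, because in Jupyter it suppresses the cell’s output.
df.count()
This returns the number of non-null values in each column. The count shown for the PacketNumber column is the total number of packets captured.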
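If you only want the number for one column, you can call count() on that column directly. The sketch below uses a tiny made-up DataFrame in place of the room’s capture, so the values are assumptions; only the column names PacketNumber and Source follow the walkthrough.

```python
import pandas as pd

# Hypothetical stand-in for the room's data; the real notebook loads its own df.
df = pd.DataFrame({
    "PacketNumber": [1, 2, 3, 4, 5],
    "Source": ["192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.3", "192.0.2.1"],
})

packet_total = df["PacketNumber"].count()  # non-null values in one column
row_total = len(df)                        # total rows, including any nulls

print(packet_total, row_total)
```

The two numbers match whenever the column has no missing values, which is why either approach works for a clean capture.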
Now for the second question.
What IP address sent the most amount of traffic during the packet capture?
Here, we’ll use groupby() together with size() to count how many packets each source IP address sent. Add this code to the cell.
df.groupby(['Source']).size()
We’ve got our result, and we can clearly see that the IP address 10.10.1.4 sent the most packets.
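With many source addresses, reading the top sender off the raw output gets tedious. One option is to sort the counts or ask pandas for the largest one. This is a sketch on made-up addresses (the 192.0.2.x values are placeholders, not the room’s data).

```python
import pandas as pd

# Hypothetical sample; in the room, df already holds the real capture.
df = pd.DataFrame({
    "Source": ["192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.3", "192.0.2.1"],
})

# Count packets per source, largest first.
packets_per_source = df.groupby("Source").size().sort_values(ascending=False)

top_source = packets_per_source.idxmax()  # address with the most packets
top_count = packets_per_source.max()      # how many packets it sent

print(packets_per_source)
print(top_source, top_count)
```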
Let’s move on to the third question.
What was the most frequent protocol?
We’ll use the value_counts() function to find the most frequent protocol, so we’ll change the code to the following.
df['Protocol'].value_counts()
We got our result, and we can see that ICMP is the most frequent protocol, appearing 27 times.
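Since value_counts() sorts its result from most to least frequent, the first entry is the answer. A small sketch on made-up protocol values (not the room’s data) shows how to pull it out in code:

```python
import pandas as pd

# Hypothetical sample of protocol labels.
df = pd.DataFrame({
    "Protocol": ["ICMP", "DNS", "ICMP", "HTTP", "ICMP", "DNS"],
})

protocol_counts = df["Protocol"].value_counts()  # sorted, most frequent first

top_protocol = protocol_counts.idxmax()   # label with the highest count
top_hits = int(protocol_counts.iloc[0])   # the highest count itself

print(protocol_counts)
print(top_protocol, top_hits)
```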
We’ve successfully answered all the questions. This room was beginner-friendly.
I hope you enjoyed this walkthrough. If you have any questions, feel free to ask me on LinkedIn.
You can check out the video walkthrough here: https://youtu.be/LLuKcNZu6tM?si=IKE5clPAyAk09kHa
Until next time, stay secure, stay curious, and keep up with the latest in cybersecurity.