How masscan works
Masscan is a fast port scanner capable of scanning the entire IPv4 internet in under five minutes. Reaching that speed requires a stable 10-gigabit link and a specialized network driver for Linux. A naive port scanner, by comparison, can take weeks or even months to do the same job. This article describes the key features behind masscan's internal design.
What is port scanning?
Port scanning is a method for determining which ports on a given list of IP addresses are open and accept connections. People use it to find web servers, proxy servers, databases, and other internet services for security research. Port scanners usually target TCP, although UDP port scanning is also possible. This article focuses only on TCP over IPv4 on Linux, since only the Linux version of masscan can handle more than two million packets per second.
TCP/IP and port scanning
Fast port scanners can make millions of connection attempts per second. Unfortunately, the operating system's TCP/IP stack was never designed to handle that many connections (sessions) from a single device.
TCP is a stateful protocol, so the OS has to keep track of the state of every session. A regular Linux machine typically can't handle more than about 300,000 simultaneous connections without tuning kernel parameters, and the exact limit varies from setup to setup. Maintaining hundreds of thousands of connections also consumes a lot of RAM (for buffers) and CPU. On Linux, the TCP/IP stack runs in kernel space, so every request from user space goes through a relatively expensive system call, which switches the CPU into kernel mode and copies data between buffers.
Instead of relying on the operating system's TCP/IP implementation, masscan implements its own TCP/IP stack tailored to port scanning. It runs in user space and makes as few system calls as possible. This is possible thanks to raw sockets: when you send through a raw socket, the operating system skips many of its internal checks. On the receiving side, a raw socket captures all incoming traffic on the system, acting as a packet sniffer. That is acceptable because the receive rate is much lower than the transmit rate: many endpoints never answer.
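To make the "packet sniffer" point concrete, here is a minimal Python sketch, assuming Linux, root privileges, and IPv4 over Ethernet. `open_sniffer` opens an `AF_PACKET` raw socket that receives every frame on an interface, and `classify` is a hypothetical helper (not masscan's API) showing the filtering the scanner then has to do on its own: a SYN-ACK is reported as an open port and a RST as a closed one.

```python
import socket
import struct

ETH_P_ALL = 0x0003   # Linux: deliver frames of every protocol to this socket
ETH_P_IP = 0x0800    # EtherType for IPv4
TCP_SYN, TCP_RST, TCP_ACK = 0x02, 0x04, 0x10

def open_sniffer(ifname: str) -> socket.socket:
    """Open a Linux AF_PACKET raw socket bound to one interface (needs root)."""
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.bind((ifname, 0))
    return s

def classify(frame: bytes):
    """Return (src_ip, src_port, 'open' | 'closed') for a SYN-ACK or RST, else None.

    A raw socket sees every frame on the interface, so anything that is not
    an IPv4/TCP reply has to be discarded by the scanner itself.
    """
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return None
    ihl = (frame[14] & 0x0F) * 4                  # IP header length in bytes
    if frame[14 + 9] != socket.IPPROTO_TCP or len(frame) < 14 + ihl + 20:
        return None
    src_ip = socket.inet_ntoa(frame[26:30])       # IP source address field
    tcp = 14 + ihl                                # start of the TCP header
    (src_port,) = struct.unpack_from("!H", frame, tcp)
    flags = frame[tcp + 13]                       # TCP flags byte
    if (flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK):
        return src_ip, src_port, "open"
    if flags & TCP_RST:
        return src_ip, src_port, "closed"
    return None
```

A receive loop would then call `classify(sniffer.recv(65535))` repeatedly on a socket from `open_sniffer("eth0")`, where the interface name is a placeholder.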
Internally, masscan uses libpcap, which also uses raw sockets to send and receive packets but with extra optimizations (e.g., PACKET_MMAP) and provides a portable interface.
When working with raw sockets, we have to generate Ethernet frames ourselves. This is because TCP and IP run on top of lower network layers: in the OSI model, IP is the network layer (layer 3) and TCP is the transport layer (layer 4), both above the Ethernet data link layer (layer 2). In a nutshell, Ethernet frames carry TCP/IP packets to the router: each frame encapsulates an IP packet, and each IP packet in turn encapsulates a TCP segment.
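The outermost layer of that nesting can be sketched in a few lines of Python. This is a simplified illustration rather than masscan's code: `ethernet_frame` and `mac_bytes` are hypothetical helpers, the payload is assumed to be an already-built IP packet (which itself contains the TCP segment), and the 4-byte FCS is left to the network card, which normally appends it on transmit.

```python
import struct

ETHERTYPE_IPV4 = 0x0800

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def ethernet_frame(dst_mac: str, src_mac: str, payload: bytes,
                   ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Wrap a payload (e.g. IP header + TCP header) in a 14-byte Ethernet II header.

    Payloads shorter than the 46-byte minimum are zero-padded.
    The FCS is not added here; the NIC computes it.
    """
    header = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype)
    return header + payload.ljust(46, b"\x00")
```

For a SYN probe, the payload would be a 20-byte IP header followed by a 20-byte TCP header, giving a 54-byte frame that is padded to 60 bytes before the FCS.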
Here are the layouts of the Ethernet frame and of the IP and TCP headers; the latter two are taken from RFC 791 and RFC 793:
Ethernet frame format
```
+----+----+------+------+-----+
| DA | SA | Len  | Data | FCS |
+----+----+------+------+-----+
DA    Destination MAC Address (6 bytes)
SA    Source MAC Address (6 bytes)
Len   Length of Data field in IEEE 802.3 framing, or the EtherType
      (0x0800 for IPv4) in the Ethernet II framing used for IP (2 bytes)
Data  Protocol Data (46 - 1500 bytes)
FCS   Frame Checksum (4 bytes)
```
IP header format
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
TCP header format
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
The Data field of the Ethernet frame contains the IP header followed by the TCP header. As you can see, many fields must be filled in before an Ethernet frame can be sent.
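A minimal Python sketch of filling in those fields for a SYN probe follows. The helper names and the TTL and window values are arbitrary choices for illustration, not masscan's. The checksum is the standard RFC 1071 one's-complement sum; the TCP checksum additionally covers a pseudo-header built from the IP addresses, the protocol number, and the TCP length.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum of 16-bit words, used by IP and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src: str, dst: str, payload_len: int, ident: int = 0) -> bytes:
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, IHL 5 (20 bytes, no options)
        0,                     # type of service
        20 + payload_len,      # total length
        ident,                 # identification
        0,                     # flags + fragment offset
        64,                    # time to live
        socket.IPPROTO_TCP,    # protocol (6)
        0,                     # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    csum = internet_checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]

def tcp_syn(src: str, dst: str, sport: int, dport: int, seq: int,
            window: int = 1024) -> bytes:
    header = struct.pack(
        "!HHIIBBHHH",
        sport, dport, seq,
        0,               # acknowledgment number (unused in a SYN)
        5 << 4,          # data offset: 5 32-bit words, no options
        0x02,            # flags: SYN only
        window,
        0,               # checksum placeholder
        0,               # urgent pointer
    )
    # The TCP checksum also covers a pseudo-header taken from the IP layer.
    pseudo = socket.inet_aton(src) + socket.inet_aton(dst) + struct.pack(
        "!BBH", 0, socket.IPPROTO_TCP, len(header))
    csum = internet_checksum(pseudo + header)
    return header[:16] + struct.pack("!H", csum) + header[18:]
```

A quick self-check: running `internet_checksum` over a finished header (plus the pseudo-header for TCP) yields 0 when the embedded checksum is correct.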
Establishing a TCP connection requires a three-way handshake. Since we only need to know whether a port is open, it is enough to send a single SYN packet (connection request) and wait for a single response: a SYN-ACK means the port is open, while a RST means it is closed.
To speed up packet generation, masscan uses a predefined SYN template and only modifies part of the data, such as source and destination IPs and port pairs, checksums, and TCP sequence number.
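The template idea can be sketched as follows, assuming a 54-byte frame (Ethernet + 20-byte IPv4 + 20-byte TCP, no options). The offsets, `make_template`, and `fill_probe` are hypothetical, and this sketch simply recomputes both checksums for each probe; a faster implementation could precompute the checksum contribution of the fixed fields and add only the words that change.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Byte offsets inside a 54-byte Ethernet + IPv4 + TCP frame without options.
IP_CSUM, IP_DST, TCP_DPORT, TCP_CSUM = 24, 30, 36, 50

def make_template(dst_mac: bytes, src_mac: bytes, src_ip: str, src_port: int) -> bytearray:
    """Build the parts of a SYN probe that never change; checksums stay zero."""
    eth = dst_mac + src_mac + b"\x08\x00"
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, socket.IPPROTO_TCP, 0,
                     socket.inet_aton(src_ip), bytes(4))
    tcp = struct.pack("!HHIIBBHHH", src_port, 0, 0, 0, 0x50, 0x02, 1024, 0, 0)
    return bytearray(eth + ip + tcp)

def fill_probe(template: bytearray, dst_ip: str, dst_port: int, seq: int) -> bytes:
    """Copy the template, patch the per-probe fields, and compute both checksums."""
    pkt = bytearray(template)                                # template stays untouched
    pkt[IP_DST:IP_DST + 4] = socket.inet_aton(dst_ip)
    struct.pack_into("!HI", pkt, TCP_DPORT, dst_port, seq)   # dest port, then seq
    # IPv4 checksum covers only the 20-byte IP header (bytes 14..33).
    struct.pack_into("!H", pkt, IP_CSUM, internet_checksum(bytes(pkt[14:34])))
    # TCP checksum covers the pseudo-header (src/dst IP, proto, length) + TCP header.
    pseudo = bytes(pkt[26:34]) + struct.pack("!BBH", 0, socket.IPPROTO_TCP, 20)
    struct.pack_into("!H", pkt, TCP_CSUM, internet_checksum(pseudo + bytes(pkt[34:54])))
    return bytes(pkt)
```

Per probe, only a handful of bytes are written and two short checksum passes run; everything else is copied from the template.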
Because packets are assembled without the operating system's involvement, masscan saves a lot of CPU time and RAM. When Linux assembles a packet itself, it performs many checks, including ARP lookups, routing lookups, and netfilter (firewall) checks.
Since masscan operates at the Ethernet level, it needs to know both the source MAC address and the destination MAC address (the router's). It implements the ARP protocol to resolve the router's address.
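As a rough sketch of what resolving the router's MAC address involves, the following Python builds an RFC 826 ARP request ("who has the router's IP? tell me") and recognizes the matching reply. `arp_request` and `parse_arp_reply` are hypothetical names; in practice the frames would be sent and received through the same raw socket as the probes.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST = b"\xff" * 6
ARP_FMT = "!HHBBH6s4s6s4s"   # RFC 826 body for Ethernet/IPv4 (28 bytes)

def arp_request(my_mac: bytes, my_ip: str, target_ip: str) -> bytes:
    """Broadcast 'who has target_ip? tell my_ip' inside an Ethernet frame."""
    eth = BROADCAST + my_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        ARP_FMT,
        1,                  # hardware type: Ethernet
        0x0800,             # protocol type: IPv4
        6, 4,               # hardware / protocol address lengths
        1,                  # operation: request
        my_mac, socket.inet_aton(my_ip),
        bytes(6),           # target MAC is unknown, so it is left as zeros
        socket.inet_aton(target_ip),
    )
    return eth + arp

def parse_arp_reply(frame: bytes, target_ip: str):
    """Return the sender MAC if frame is an ARP reply from target_ip, else None."""
    if len(frame) < 42 or struct.unpack_from("!H", frame, 12)[0] != ETHERTYPE_ARP:
        return None
    oper, sha, spa = struct.unpack_from("!H6s4s", frame, 14 + 6)  # skip 6 fixed bytes
    if oper == 2 and spa == socket.inet_aton(target_ip):
        return sha
    return None
```

Once the reply arrives, the router's MAC address goes into the destination field of every probe frame.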
Source IP spoofing
Another interesting trick is that masscan can use an arbitrary IP address from the local network, even one that is not assigned to the machine.
For example, suppose you set the --source-ip parameter to 192.168.1.111, but your machine has 192.168.1.10. In raw socket mode, Linux does not validate the IP fields of outgoing packets. For incoming packets, the kernel drops everything with an unknown destination address, but it still delivers those packets to listeners that use raw sockets! This neat trick keeps the OS from processing the scan traffic at all.
When a router doesn't know which device an IP address belongs to, it broadcasts an ARP request to all devices on the local network and expects the owner to reply with its MAC address. Masscan listens for these ARP requests and replies on behalf of the spoofed IP. Devices on a LAN communicate using MAC addresses, and one MAC address can be associated with multiple IPs.
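Answering for the spoofed address could look like the following sketch, assuming Ethernet II framing and IPv4. `answer_arp` is a hypothetical helper: it checks whether a captured frame is an ARP request for the spoofed IP and, if so, builds a reply claiming that IP for the scanner's MAC address. Without such a reply, the router would never learn where to deliver the responses.

```python
import socket
import struct

ARP_FMT = "!HHBBH6s4s6s4s"   # RFC 826 body for Ethernet/IPv4 (28 bytes)

def answer_arp(frame: bytes, my_mac: bytes, spoofed_ip: str):
    """If frame asks 'who has spoofed_ip?', return the ARP reply frame to send."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return None                                   # not an ARP frame
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack_from(
        ARP_FMT, frame, 14)
    if oper != 1 or tpa != socket.inet_aton(spoofed_ip):
        return None                                   # not a request for our IP
    eth = sha + my_mac + b"\x08\x06"                  # unicast back to the asker
    arp = struct.pack(ARP_FMT, htype, ptype, hlen, plen,
                      2,                              # operation: reply
                      my_mac, tpa,                    # "spoofed_ip is at my_mac"
                      sha, spa)                       # addressed to the requester
    return eth + arp
```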
When a TCP/IP packet leaves the LAN, the router uses NAT to replace the local source IP with its external IP so that responses from the internet come back to it. When a response arrives, the router replaces the destination address with the local address and forwards the packet to the associated MAC address. This is why traffic sent from a spoofed IP travels back to the machine.
What happens when the operating system does not ignore the responses to masscan's probes?
It resets such connections (sends a TCP RST) because it knows nothing about the TCP sessions masscan initiated. That is a problem if we want to grab a banner (e.g., an HTTP response). For systems with a direct link to the internet (no local network), masscan suggests using iptables rules that drop all incoming packets on a specified port, namely the source port its probes are sent from. Masscan still sees those packets, because its raw socket captures them before netfilter filters them. This is fast, too, because iptables is just a utility that creates firewall rules for netfilter, which runs inside the kernel.
Asynchronous transmission
To send millions of packets per second, masscan uses an asynchronous approach with just two threads: one for sending packets and one for receiving them.
Since TCP/IP is a stateful protocol, maintaining the state usually takes a lot of computing resources. Masscan keeps no state at all: the transmit and receive threads work independently, without any synchronization, and masscan doesn't even remember which packets it has sent.
To match incoming packets to its probes, masscan uses the Sequence Number and Acknowledgment Number TCP fields. Historically, these fields are used to reorder packets and to detect lost or duplicated data. When a client sends a connection request (SYN packet) to a server, it sets the initial sequence number. In general, the acknowledgment number is the peer's sequence number plus the number of bytes received; a SYN counts as one byte, so the server's SYN-ACK carries the client's initial sequence number plus one in its Acknowledgment Number field.
According to the TCP specification, the client's initial sequence number can be any value. Servers exploit this freedom to implement SYN cookies, which protect against SYN flood attacks. Masscan uses a very similar approach, but stores a special hash in the sequence number instead.
To generate the hash, it passes the source and destination IPs and ports to the SipHash function. SipHash produces a 64-bit hash, and masscan keeps 32 bits of it, which fit in the sequence number field. SipHash is a keyed hash, so outsiders cannot predict these values.
Because every incoming packet carries the source and destination IPs and ports, masscan can recompute the hash from the packet itself and compare it with the acknowledgment number (minus one) to make sure the packet is a response to one of its probes.
For example, suppose we are browsing Twitter while scanning it with masscan. When a TCP packet from Twitter arrives, we need to decide whom it belongs to: our browser or the scanner. By checking the hash (which can be recalculated from any packet), we can pick out all the packets that should be routed to masscan.
When raw sockets are used to receive packets, Linux delivers all the traffic on the system, so this filtering is essential. The technique also helps reject spoofed incoming packets, since a forged response would have to guess the correct hash. In that sense, the algorithm can be viewed as a checksum.
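The cookie scheme can be sketched in Python. The `siphash24` function below follows the reference SipHash-2-4 algorithm; the way the addresses and ports are packed into its input and the use of the low 32 bits are assumptions that may differ from masscan's code, and `syn_cookie` / `is_our_reply` are hypothetical names. The key would be a secret chosen once per scan (for example with `os.urandom(16)`).

```python
import socket
import struct

MASK = 0xFFFFFFFFFFFFFFFF

def _rotl(x: int, b: int) -> int:
    return ((x << b) | (x >> (64 - b))) & MASK

def siphash24(key: bytes, data: bytes) -> int:
    """Reference SipHash-2-4: 16-byte key, 64-bit output."""
    k0, k1 = struct.unpack("<QQ", key)
    v0, v1 = k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D
    v2, v3 = k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573

    def rounds(n: int) -> None:
        nonlocal v0, v1, v2, v3
        for _ in range(n):  # one SipRound per iteration
            v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
            v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
            v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
            v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)

    tail_len = len(data) % 8
    for i in range(0, len(data) - tail_len, 8):   # full 8-byte blocks
        m = struct.unpack_from("<Q", data, i)[0]
        v3 ^= m
        rounds(2)
        v0 ^= m
    # Last block: remaining bytes plus the message length in the top byte.
    last = ((len(data) & 0xFF) << 56) | int.from_bytes(data[len(data) - tail_len:], "little")
    v3 ^= last
    rounds(2)
    v0 ^= last
    v2 ^= 0xFF
    rounds(4)
    return v0 ^ v1 ^ v2 ^ v3

def syn_cookie(ip_them: str, port_them: int, ip_me: str, port_me: int, key: bytes) -> int:
    """32-bit initial sequence number derived from the connection 4-tuple."""
    # Hypothetical input layout; masscan's own packing may differ.
    data = (socket.inet_aton(ip_them) + struct.pack("!H", port_them)
            + socket.inet_aton(ip_me) + struct.pack("!H", port_me))
    return siphash24(key, data) & 0xFFFFFFFF

def is_our_reply(ack: int, ip_them: str, port_them: int, ip_me: str, port_me: int,
                 key: bytes) -> bool:
    """True if a SYN-ACK from ip_them:port_them acknowledges one of our probes."""
    # A SYN consumes one sequence number, so the reply acknowledges cookie + 1.
    return ack == (syn_cookie(ip_them, port_them, ip_me, port_me, key) + 1) & 0xFFFFFFFF
```

Because the receiver only recomputes the hash from the reply's own addresses and ports, no table of sent probes is needed.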
A naive scanner usually spawns a separate thread for each connection and relies on the operating system to track TCP state and timeouts. That's a lot of work! Masscan's approach lets the transmit and receive threads run asynchronously: there are no locks or synchronization, just two threads that send and receive binary data through the network card using system calls to the kernel.
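The stateless two-thread structure can be illustrated with a toy Python simulation. Here `queue.Queue` objects stand in for the network (they model the wire, not shared scanner state), `fake_target` is a made-up responder, and HMAC-SHA256 truncated to 32 bits replaces SipHash purely to keep the example short. The transmitter records nothing; the receiver accepts a reply only if its acknowledgment number equals a recomputed cookie plus one. This shows the structure, not the performance.

```python
import hashlib
import hmac
import os
import queue
import struct
import threading

KEY = os.urandom(16)  # per-scan secret; the only data both threads share (read-only)

def cookie(ip: int, port: int) -> int:
    # Stand-in keyed hash (HMAC-SHA256, not SipHash), truncated to 32 bits.
    digest = hmac.new(KEY, struct.pack("!IH", ip, port), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def transmit(targets, wire_out):
    for ip, port in targets:
        wire_out.put((ip, port, cookie(ip, port)))   # send SYN; nothing is recorded
    wire_out.put(None)

def fake_target(wire_out, wire_in, open_ports):
    """Simulated internet: open ports answer with SYN-ACK, others stay silent."""
    while (probe := wire_out.get()) is not None:
        ip, port, seq = probe
        if port in open_ports:
            wire_in.put((ip, port, (seq + 1) & 0xFFFFFFFF))
    wire_in.put((0x01020304, 80, 0))                 # unrelated traffic the sniffer sees
    wire_in.put(None)

def receive(wire_in, found):
    while (reply := wire_in.get()) is not None:
        ip, port, ack = reply
        if ack == (cookie(ip, port) + 1) & 0xFFFFFFFF:   # recomputed, not looked up
            found.append((ip, port))

def scan(targets, open_ports):
    wire_out, wire_in, found = queue.Queue(), queue.Queue(), []
    threads = [threading.Thread(target=transmit, args=(targets, wire_out)),
               threading.Thread(target=fake_target, args=(wire_out, wire_in, open_ports)),
               threading.Thread(target=receive, args=(wire_in, found))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found
```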
PF_RING
Raw sockets still go through the Linux kernel to send and receive packets. Each interaction requires a system call and copies data between kernel structures and user space.
Regular Linux network drivers can't process more than 2 million packets per second. To get around this limitation, masscan can use the PF_RING driver, which gives user space direct access to the network card. With such a driver, masscan can transmit up to 10 million packets per second.
With such a driver, you interact directly with the network adapter's buffers. Its main limitation is that the operating system and other applications can no longer receive traffic from the same network adapter.
For most setups, you won't need this driver at all, because the underlying network can't handle that many packets anyway. Two million packets per second is already much faster than most scanners.
Other optimizations
I've described the main optimizations that make masscan so fast, but there are many smaller ones as well. To learn more, read the source code, which contains many detailed comments.