Writing a simple SOCKS server in Python

Last updated on April 20, 2018, in Python

This article explains how to write a small, basic SOCKS 5 server in Python 3.6. I assume that you already have a basic understanding of proxy servers.

Introduction

SOCKS is a generic proxy protocol that relays TCP connections from one point to another through an intermediate host (the SOCKS server). Originally, SOCKS proxies were mostly used as circuit-level gateways, that is, firewalls between local and external resources (the internet). Nowadays, they are also popular for censorship circumvention and web scraping.

Throughout the article, I will refer to RFC 1928, which specifies the SOCKS 5 protocol.

Before reading this article, I recommend cloning a completed version of the implementation so you can see the full picture.

Handling TCP sessions

The SOCKS protocol is implemented on top of TCP, in such a way that the client must establish a separate TCP connection with the SOCKS server for each remote server it wants to exchange data with.

So, first of all, we need to create a regular TCP session handler. Python has a built-in socketserver module, which simplifies the task of writing network servers.

import select
import socket
import struct
from socketserver import ThreadingMixIn, TCPServer, StreamRequestHandler

SOCKS_VERSION = 5


class ThreadingTCPServer(ThreadingMixIn, TCPServer):
    pass


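class SocksProxy(StreamRequestHandler):
    # placeholder credentials, checked later by verify_credentials()
    username = 'username'
    password = 'password'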

    def handle(self):
        # Our main logic will be here
        pass



if __name__ == '__main__':
    with ThreadingTCPServer(('127.0.0.1', 9011), SocksProxy) as server:
        server.serve_forever()

Here ThreadingTCPServer is a threaded version of TCPServer: it listens for incoming connections on the specified address and port, and every time a new TCP connection (session) arrives, it spawns a new thread that runs a SocksProxy instance. This gives us an easy way to handle concurrent connections.
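To see the thread-per-connection behaviour for yourself, here is a small self-contained sketch. It binds to port 0 so the OS picks a free port, and the handler is a hypothetical ThreadNameHandler (not part of the proxy) that simply writes back the name of the thread it runs in; served_thread_names is likewise a made-up helper for the demo.

```python
import socket
import threading
from socketserver import ThreadingMixIn, TCPServer, StreamRequestHandler


class ThreadingTCPServer(ThreadingMixIn, TCPServer):
    pass


class ThreadNameHandler(StreamRequestHandler):
    # Hypothetical handler: reports which thread served this connection
    def handle(self):
        self.wfile.write(threading.current_thread().name.encode())


def served_thread_names(count=2):
    """Open `count` connections one after another and collect thread names."""
    with ThreadingTCPServer(('127.0.0.1', 0), ThreadNameHandler) as server:
        # Port 0 lets the OS choose a free port; read the real one back
        host, port = server.server_address
        loop = threading.Thread(target=server.serve_forever, daemon=True)
        loop.start()
        names = []
        for _ in range(count):
            with socket.create_connection((host, port)) as conn:
                chunks = []
                while True:  # read until the server closes the session
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    chunks.append(chunk)
                names.append(b''.join(chunks).decode())
        server.shutdown()  # stop serve_forever() in the background thread
    return names


if __name__ == '__main__':
    print(served_thread_names())
```

Each connection is served by a different worker thread, never by the main thread.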

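The ThreadingMixIn can be replaced with ForkingMixIn (the socketserver module also provides a ready-made ForkingTCPServer), which uses a forking approach, that is, it spawns a new process for each TCP session. Forking is only available on POSIX platforms.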

Connection establishment and negotiation

When a client establishes a TCP session to the SOCKS server, it must send a greeting message.

The message consists of 3 fields:


version | nmethods | methods
1 byte  | 1 byte   | 1 to 255 bytes

Here the version field is the protocol version, which equals 5 in our case. The nmethods field contains the number of authentication methods supported by the client, and the methods field lists those methods, one byte per method. Thus the nmethods field indicates the length of the methods sequence.
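To make the layout concrete, here is a minimal sketch that parses a greeting which has already been read into a bytes object. parse_greeting is a hypothetical helper, not part of the article's server, and it assumes the whole message is present in data.

```python
import struct

SOCKS_VERSION = 5


def parse_greeting(data: bytes):
    """Split a SOCKS 5 greeting into (version, methods).

    `data` holds VER, NMETHODS, then NMETHODS one-byte method codes.
    """
    version, nmethods = struct.unpack("!BB", data[:2])
    methods = list(data[2:2 + nmethods])  # iterating bytes yields ints
    if version != SOCKS_VERSION:
        raise ValueError(f"unsupported SOCKS version {version}")
    if nmethods == 0 or len(methods) != nmethods:
        raise ValueError("NMETHODS does not match the methods sequence")
    return version, methods


# A client offering "no authentication" (0x00) and USERNAME/PASSWORD (0x02)
print(parse_greeting(b"\x05\x02\x00\x02"))  # (5, [0, 2])
```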

According to RFC 1928, the values of the methods field are defined as follows:

  • X'00' NO AUTHENTICATION REQUIRED
  • X'01' GSSAPI
  • X'02' USERNAME/PASSWORD
  • X'03' to X'7F' IANA ASSIGNED
  • X'80' to X'FE' RESERVED FOR PRIVATE METHODS
  • X'FF' NO ACCEPTABLE METHODS

When the SOCKS server receives such a message, it should choose an appropriate method and answer back. Let's pretend we only support the USERNAME/PASSWORD method.

The format of the answer looks as follows:


version | method
1 byte  | 1 byte
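As a sketch of the server's side of this exchange, the function below builds the two-byte answer from a list of method codes the client offered. method_selection_reply is a hypothetical name; like the article, it accepts only USERNAME/PASSWORD and otherwise answers X'FF', after which the client is expected to close the connection.

```python
import struct

SOCKS_VERSION = 5
USERNAME_PASSWORD = 0x02      # X'02'
NO_ACCEPTABLE_METHODS = 0xFF  # X'FF'


def method_selection_reply(client_methods):
    """Build the VER/METHOD answer, accepting only USERNAME/PASSWORD."""
    if USERNAME_PASSWORD in client_methods:
        method = USERNAME_PASSWORD
    else:
        method = NO_ACCEPTABLE_METHODS  # none of the offered methods is usable
    return struct.pack("!BB", SOCKS_VERSION, method)


print(method_selection_reply([0, 2]))  # b'\x05\x02'
print(method_selection_reply([0]))     # b'\x05\xff'
```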

Here is how the whole process looks in Python:

    def handle(self):

        # Greeting header
        # read and unpack 2 bytes from a client

        header = self.connection.recv(2)
        version, nmethods = struct.unpack("!BB", header)

        # socks 5
        assert version == SOCKS_VERSION
        assert nmethods > 0

        # Get available methods
        methods = self.get_available_methods(nmethods)

        # accept only USERNAME/PASSWORD auth
        if 2 not in set(methods):
            # close connection
            self.connection.sendall(struct.pack("!BB", SOCKS_VERSION, 255))
            self.server.close_request(self.request)
            return

        # Send server choice
        self.connection.sendall(struct.pack("!BB", SOCKS_VERSION, 2))

    def get_available_methods(self, n):
        methods = []
        for i in range(n):
            methods.append(ord(self.connection.recv(1)))
        return methods

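Here recv reads up to n bytes from the client (it may return fewer if less data has arrived so far), and the struct module packs and unpacks binary data according to a format string; "!BB" means two unsigned bytes in network byte order.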
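Because recv(n) may return fewer than n bytes, a more robust server would loop until it has read everything it expects. The sketch below shows one way to do that; recv_exact is a hypothetical helper name, not a standard library function, and the demo uses socket.socketpair() as a stand-in for a real client connection.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # b'' means the peer closed the connection
            raise ConnectionError(f"peer closed after {n - remaining} of {n} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demo: two separate writes are reassembled into one 3-byte read
a, b = socket.socketpair()
a.sendall(b"\x05")
a.sendall(b"\x01\x02")
print(recv_exact(b, 3))  # b'\x05\x01\x02'
a.close()
b.close()
```

With such a helper, a call like self.connection.recv(2) in the handler would become recv_exact(self.connection, 2).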

Once the client has received the server's choice, it sends its username and password (this sub-negotiation is defined in RFC 1929):


version | ulen   | uname          | plen   | passwd
1 byte  | 1 byte | 1 to 255 bytes | 1 byte | 1 to 255 bytes

The version field is the version of the authentication sub-negotiation, which equals 1 in our case. The ulen and plen fields hold the lengths of the uname and passwd fields, so the server knows how much data to read from the client.
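Here is a minimal sketch of how the length prefixes drive the parsing. It assumes the whole message is already in a bytes object; parse_auth_request is a hypothetical helper, while the article's verify_credentials reads the same fields directly from the socket.

```python
def parse_auth_request(data: bytes):
    """Split a USERNAME/PASSWORD request into (version, username, password)."""
    version = data[0]
    ulen = data[1]                       # length of the uname field
    username = data[2:2 + ulen]
    plen = data[2 + ulen]                # length of the passwd field
    password = data[3 + ulen:3 + ulen + plen]
    if len(username) != ulen or len(password) != plen:
        raise ValueError("request is shorter than ULEN/PLEN announce")
    return version, username.decode("utf-8"), password.decode("utf-8")


# VER=1, ULEN=4, "user", PLEN=6, "secret"
print(parse_auth_request(b"\x01\x04user\x06secret"))  # (1, 'user', 'secret')
```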

The server response should look as follows:


version | status
1 byte  | 1 byte

A status of 0 indicates successful authentication; any other value indicates failure, in which case the server must close the connection.
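The sketch below builds that response. One deliberate change from a plain == check: it compares credentials with hmac.compare_digest, a standard library function that avoids leaking through timing how many leading characters matched. auth_response and AUTH_VERSION are hypothetical names chosen for this example.

```python
import hmac
import struct

AUTH_VERSION = 1  # version of the USERNAME/PASSWORD sub-negotiation


def auth_response(username, password, expected_user, expected_password):
    """Return (reply_bytes, ok) for one USERNAME/PASSWORD attempt."""
    # `&` instead of `and` so both comparisons always run
    ok = (hmac.compare_digest(username.encode(), expected_user.encode())
          & hmac.compare_digest(password.encode(), expected_password.encode()))
    status = 0x00 if ok else 0xFF  # any non-zero status means failure
    return struct.pack("!BB", AUTH_VERSION, status), ok


print(auth_response("user", "secret", "user", "secret"))  # (b'\x01\x00', True)
```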

The Python version of the authentication step looks as follows:

    def verify_credentials(self):
        version = ord(self.connection.recv(1))
        assert version == 1

        username_len = ord(self.connection.recv(1))
        username = self.connection.recv(username_len).decode('utf-8')

        password_len = ord(self.connection.recv(1))
        password = self.connection.recv(password_len).decode('utf-8')

        if username == self.username and password == self.password:
            # Success, status = 0
            response = struct.pack("!BB", version, 0)
            self.connection.sendall(response)
            return True


        # Failure, status != 0
        response = struct.pack("!BB", version, 0xFF)
        self.connection.sendall(response)
        self.server.close_request(self.request)
        return False

Once authentication has completed, the client can send the request details:


version | cmd    | rsv    | atyp   | dst.addr | dst.port
1 byte  | 1 byte | 1 byte | 1 byte | variable | 2 bytes

Where:

  • VERSION protocol version: X'05'
  • CMD
    • CONNECT X'01'
    • BIND X'02'
    • UDP ASSOCIATE X'03'
  • RSV RESERVED
  • ATYP address type of following address
    • IP V4 address: X'01'
    • DOMAINNAME: X'03'
    • IP V6 address: X'04'
  • DST.ADDR desired destination address
  • DST.PORT desired destination port in network octet order
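The variable-length DST.ADDR is the tricky part: its size depends on ATYP. Here is a sketch that parses a complete request held in a bytes object; parse_request is a hypothetical helper, and unlike the article's server it also decodes IPv6 (X'04') addresses.

```python
import socket
import struct


def parse_request(data: bytes):
    """Return (cmd, address, port) from a complete SOCKS 5 request."""
    version, cmd, _rsv, atyp = struct.unpack("!BBBB", data[:4])
    if version != 5:
        raise ValueError("not a SOCKS 5 request")
    if atyp == 1:    # IPv4: 4 bytes
        address = socket.inet_ntoa(data[4:8])
        rest = data[8:]
    elif atyp == 3:  # domain name: 1 length byte, then the name itself
        length = data[4]
        address = data[5:5 + length].decode("idna")
        rest = data[5 + length:]
    elif atyp == 4:  # IPv6: 16 bytes
        address = socket.inet_ntop(socket.AF_INET6, data[4:20])
        rest = data[20:]
    else:
        raise ValueError(f"unknown address type {atyp}")
    (port,) = struct.unpack("!H", rest[:2])  # network (big-endian) order
    return cmd, address, port


# CONNECT to github.com:443 using a domain name (ATYP=3)
print(parse_request(b"\x05\x01\x00\x03\x0agithub.com\x01\xbb"))  # (1, 'github.com', 443)
```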

The cmd field indicates the type of connection. This article covers only the CONNECT method, which is used for TCP connections. For more details, please read the SOCKS RFC.

If a client sends a domain name, the server resolves it using its own DNS. Thus the client does not need a working DNS server when working through SOCKS.

As soon as the server has established a connection to the desired destination, it replies with a status and the address and port it is bound to:


version | rep    | rsv    | atyp   | bnd.addr | bnd.port
1 byte  | 1 byte | 1 byte | 1 byte | variable | 2 bytes

Where:

  • VER protocol version: X'05'
  • REP Reply field:
    • X'00' succeeded
    • X'01' general SOCKS server failure
    • X'02' connection not allowed by ruleset
    • X'03' Network unreachable
    • X'04' Host unreachable
    • X'05' Connection refused
    • X'06' TTL expired
    • X'07' Command not supported
    • X'08' Address type not supported
    • X'09' to X'FF' unassigned
  • RSV RESERVED
  • ATYP address type of following address
    • IP V4 address: X'01'
    • DOMAINNAME: X'03'
    • IP V6 address: X'04'
  • BND.ADDR server bound address
  • BND.PORT server bound port in network octet order
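The reply's ATYP must describe the BND.ADDR bytes that actually follow it. A small sketch of that rule, assuming the bound address is an IP literal (as returned by getsockname()): build_reply is a hypothetical helper that uses the standard ipaddress module to choose X'01' or X'04'.

```python
import ipaddress
import struct

SOCKS_VERSION = 5


def build_reply(rep: int, bind_host: str = "0.0.0.0", bind_port: int = 0) -> bytes:
    """Pack VER, REP, RSV, ATYP, BND.ADDR, BND.PORT for an IP bind address."""
    ip = ipaddress.ip_address(bind_host)
    atyp = 1 if ip.version == 4 else 4  # X'01' IPv4, X'04' IPv6
    return (struct.pack("!BBBB", SOCKS_VERSION, rep, 0, atyp)
            + ip.packed                      # 4 or 16 raw address bytes
            + struct.pack("!H", bind_port))  # port in network order


print(build_reply(0, "192.168.1.10", 54321).hex())  # 05000001c0a8010ad431
```

For a failure, build_reply(5) yields a "Connection refused" reply with an all-zero IPv4 address and port.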

Here is how it looks in Python:

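# The rest of handle(), after the client has passed verify_credentials().

# client request
version, cmd, _, address_type = struct.unpack("!BBBB", self.connection.recv(4))
assert version == SOCKS_VERSION

if address_type == 1:  # IPv4
    address = socket.inet_ntoa(self.connection.recv(4))
elif address_type == 3:  # domain name: 1 length byte, then the name
    domain_length = self.connection.recv(1)[0]  # indexing bytes already gives an int
    address = self.connection.recv(domain_length).decode('utf-8')
else:
    address = None  # IPv6 (4) is not handled in this article

# read the port with recv() too: mixing the buffered self.rfile with
# self.connection.recv() can strand bytes in rfile's buffer
port = struct.unpack('!H', self.connection.recv(2))[0]

# server reply
remote = None
if address is None:
    rep = 8  # Address type not supported
elif cmd != 1:
    rep = 7  # Command not supported: only CONNECT is implemented
else:
    remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        remote.connect((address, port))  # domain names are resolved here
        rep = 0  # succeeded
    except OSError:
        remote.close()
        rep = 5  # report any connect() failure as Connection refused

if rep == 0:
    bind_address, bind_port = remote.getsockname()
    addr = struct.unpack("!I", socket.inet_aton(bind_address))[0]
else:
    addr, bind_port = 0, 0

# ATYP is 1 (IPv4) because BND.ADDR is always packed as a 4-byte address
reply = struct.pack("!BBBBIH", SOCKS_VERSION, rep, 0, 1, addr, bind_port)
self.connection.sendall(reply)

# Establish data exchange
if rep == 0:
    self.exchange_loop(self.connection, remote)
    remote.close()

self.server.close_request(self.request)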

If the server's reply indicates success, the client may now start passing data. To work with both the client and the remote host concurrently, we can use the select module, which supports the select() and poll() Unix interfaces.

Here is how we can read and forward data from both the client and the remote host in a single loop:

def exchange_loop(self, client, remote):
    while True:

        # wait until client or remote is available for read
        r, w, e = select.select([client, remote], [], [])

        if client in r:
            data = client.recv(4096)
            if not data:  # client closed the connection
                break
            remote.sendall(data)  # send() may write only part of the data

        if remote in r:
            data = remote.recv(4096)
            if not data:  # remote host closed the connection
                break
            client.sendall(data)
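The standard library's higher-level selectors module can do the same job and picks the most efficient mechanism the platform offers (epoll, kqueue, poll or select). Below is a sketch of that swapped-in approach as a standalone function; relay is a hypothetical name, and the demo uses two socket.socketpair() pairs to stand in for the client and remote connections.

```python
import selectors
import socket
import threading


def relay(client: socket.socket, remote: socket.socket, bufsize: int = 4096):
    """Forward bytes in both directions until either side closes its end."""
    peer = {client: remote, remote: client}
    with selectors.DefaultSelector() as sel:
        sel.register(client, selectors.EVENT_READ)
        sel.register(remote, selectors.EVENT_READ)
        while True:
            for key, _events in sel.select():
                data = key.fileobj.recv(bufsize)
                if not data:  # EOF: one side closed, stop relaying
                    return
                peer[key.fileobj].sendall(data)


if __name__ == "__main__":
    # (client_app, client_side) ~ client <-> proxy, (remote_side, remote_app) ~ proxy <-> remote
    client_app, client_side = socket.socketpair()
    remote_side, remote_app = socket.socketpair()
    t = threading.Thread(target=relay, args=(client_side, remote_side))
    t.start()
    client_app.sendall(b"ping")
    print(remote_app.recv(4))
    remote_app.sendall(b"pong")
    print(client_app.recv(4))
    client_app.close()  # relay() sees EOF on client_side and returns
    t.join()
```

The peer dictionary replaces the two mirrored if-blocks: whichever socket is readable, its data goes to the other one.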

That's it! We now have a working SOCKS 5 proxy.

Now we can test it using curl:

curl -v --socks5 127.0.0.1:9011 -U username:password https://github.com
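To poke at the server without curl, the client side can also be built by hand. The sketch below only constructs the three messages a client sends (greeting, credentials, CONNECT request with a domain name); client_messages is a hypothetical helper, and the byte layouts follow the tables above.

```python
import struct


def client_messages(username: str, password: str, host: str, port: int):
    """Return the greeting, auth request and CONNECT request a client sends."""
    # VER=5, NMETHODS=1, METHODS=[X'02' USERNAME/PASSWORD]
    greeting = bytes([5, 1, 0x02])
    user, pwd = username.encode(), password.encode()
    # VER=1, ULEN, UNAME, PLEN, PASSWD
    auth = bytes([1, len(user)]) + user + bytes([len(pwd)]) + pwd
    name = host.encode("idna")
    # VER=5, CMD=CONNECT, RSV=0, ATYP=3 (domain), length, name, port
    request = bytes([5, 1, 0, 3, len(name)]) + name + struct.pack("!H", port)
    return greeting, auth, request


greeting, auth, request = client_messages("username", "password", "github.com", 443)
print(greeting, auth, request)
```

These could be sent over socket.create_connection(("127.0.0.1", 9011)) with sendall, reading the 2-byte method choice, the 2-byte auth status and the 10-byte reply between the steps.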

