<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Artem Golubin</title><link>https://rushter.com/blog/feed/</link><description>Python, Machine learning, NLP, websec, etc.</description><language>en</language><lastBuildDate>Thu, 22 Jan 2026 17:14:08 -0000</lastBuildDate><item><title>Do not fall for complex technology</title><link>https://rushter.com/blog/complex-tech/</link><description>&lt;p&gt;Fifteen years ago, I wanted to set up a note-taking system.
At the time, Evernote was the tool everyone was talking about, so choosing it seemed like the right and easy decision.&lt;/p&gt;
&lt;p&gt;After eight years and around 500 notes, Evernote had become a mess to use.
It was bloated, heavily monetized, and slow to work with, so I decided to switch.&lt;/p&gt;
&lt;p&gt;Around that time, Notion appeared, and everyone was talking about it. I jumped on the bandwagon and migrated
the few hundred of my notes that were still relevant. It did not even occur to me that switching from one bloated,
slow app to a web app would lead to a similar outcome later. Once again, I followed the popular choice.&lt;/p&gt;
&lt;p&gt;After struggling for a year, I switched to Markdown notes with an editor plugin that renders inline images.
I still use this setup today. It is simple, and I will be able to open my notes 10-20 years from now.
I can edit them in any editor. It works offline and does not depend on commercial products.
I encrypt my notes locally so they can be stored safely on any cloud service.&lt;/p&gt;[......]</description><pubDate>Thu, 22 Jan 2026 17:14:08 -0000</pubDate><guid>https://rushter.com/blog/complex-tech/</guid></item><item><title>How ClickHouse handles strings</title><link>https://rushter.com/blog/clickhouse-strings/</link><description>&lt;p&gt;At my work, we use ClickHouse to process billions of records and hundreds of terabytes of data.
ClickHouse is fast, and its speed got me curious to learn some of its internals.&lt;/p&gt;
&lt;p&gt;Let's look at a few queries:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feed&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;unknown&amp;#39;&lt;/span&gt;

&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;┌──────&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="err"&gt;─┐&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;129375618342&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;-- 129.38 billion&lt;/span&gt;[......]</description><pubDate>Fri, 16 Jan 2026 17:14:08 -0000</pubDate><guid>https://rushter.com/blog/clickhouse-strings/</guid></item><item><title>You probably don't need Oh My Zsh</title><link>https://rushter.com/blog/zsh-shell/</link><description>&lt;p&gt;Oh My Zsh is still getting recommended a lot.
The main problem with Oh My Zsh is that it adds a lot of unnecessary bloat that affects shell startup time.&lt;/p&gt;
&lt;p&gt;Since OMZ is written in shell scripts, every time you open a new terminal tab, it has to interpret all those scripts.
Most likely, you don't need OMZ at all.&lt;/p&gt;
&lt;p&gt;Here are the timings from the default setup with a few plugins (git, zsh-autosuggestions, zsh-autocomplete) that are usually recommended:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;➜&lt;span class="w"&gt;  &lt;/span&gt;~&lt;span class="w"&gt; &lt;/span&gt;/usr/bin/time&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;%e seconds&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;zsh&lt;span class="w"&gt; &lt;/span&gt;-i&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;span class="m"&gt;0&lt;/span&gt;.38&lt;span class="w"&gt; &lt;/span&gt;seconds
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And that's only for the prompt and a new shell instance, without actually measuring the git plugin and the virtual-env plugins (which are often used for Python).[......]</description><pubDate>Fri, 09 Jan 2026 17:14:08 -0000</pubDate><guid>https://rushter.com/blog/zsh-shell/</guid></item><item><title>Recent optimizations in Python's Reference Counting</title><link>https://rushter.com/blog/python-refcount/</link><description>&lt;p&gt;It's been a while since I've written about CPython internals and its optimizations.
My last article on garbage collection was written 8 years ago.&lt;/p&gt;
&lt;p&gt;A lot of small optimizations have been added since then. In this article, I will highlight a new
reference-counting optimization that uses static lifetime analysis.&lt;/p&gt;
&lt;h3&gt;Background on reference counting in CPython&lt;/h3&gt;
&lt;p&gt;Reference counting is the primary memory management technique used in CPython.&lt;/p&gt;
&lt;p&gt;In short, every Python object (the actual value behind a variable) has a reference counter field that tracks how many references point to it.
When an object's reference count drops to zero, the memory occupied by that object is immediately deallocated.&lt;/p&gt;
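The per-object counter described above can be observed directly from Python. A minimal sketch using the standard `sys.getrefcount` (which reports one extra reference, for the temporary argument of the call itself):

```python
import sys

# Every CPython object carries a reference count field.
# sys.getrefcount() reports it, plus one extra reference
# for the temporary argument passed to the call itself.
obj = []
base = sys.getrefcount(obj)

alias = obj                              # new reference -> count goes up
assert sys.getrefcount(obj) == base + 1

del alias                                # reference destroyed -> count drops
assert sys.getrefcount(obj) == base      # back to the baseline
```

Every assignment, function call, and container insertion bumps this counter, which is exactly the per-operation cost the optimization below tries to avoid.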
&lt;p&gt;For hot loops, this can add significant overhead due to the frequent incrementing and decrementing of reference counts.
The counter must be updated whenever a reference is created or destroyed, which hurts performance and thrashes CPU caches.&lt;/p&gt;
&lt;p&gt;So, when you read a variable in Python, you actually &lt;strong&gt;write&lt;/strong&gt; to the memory as well.&lt;/p&gt;[......]</description><pubDate>Sun, 04 Jan 2026 17:14:08 -0000</pubDate><guid>https://rushter.com/blog/python-refcount/</guid></item><item><title>Hash tables in Go and advantage of self-hosted compilers</title><link>https://rushter.com/blog/go-and-hashmaps/</link><description>&lt;p&gt;Recently, I was looking at the Go codebase that was using &lt;code&gt;map[int]bool&lt;/code&gt; to track unique values.
As some of you may know, Go has no built-in set data structure. Instead, developers usually use hash maps (a key-value data structure).&lt;/p&gt;
&lt;p&gt;The first idea that comes to mind is to use &lt;code&gt;map[int]bool&lt;/code&gt;, where keys are integers and values are booleans.
But experienced Go developers know that Go has an empty-struct type that can be used for maps: &lt;code&gt;map[int]struct{}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The benefit of such a type is that it occupies zero bytes in memory; it's a so-called zero-sized type.
The compiler knows this and uses it to your advantage when it can. In this case, it should omit storing values and
keep only keys.&lt;/p&gt;
&lt;p&gt;So when I saw the bool map, my first thought was to switch to &lt;code&gt;map[int]struct{}&lt;/code&gt; to save some memory.
In theory, that map can hold more than 100,000 integers.&lt;/p&gt;
&lt;p&gt;To my surprise, this change had no effect on memory consumption when running in production.&lt;/p&gt;[......]</description><pubDate>Sun, 14 Dec 2025 17:14:08 -0000</pubDate><guid>https://rushter.com/blog/go-and-hashmaps/</guid></item><item><title>How I am using Helix editor</title><link>https://rushter.com/blog/helix-editor/</link><description>&lt;p&gt;I've been using Helix as my editor to develop on remote servers for quite some time now.&lt;/p&gt;
&lt;p&gt;There are a lot of emerging supply-chain attacks, and I simply don't like the idea of installing dozens of plugins in Vim/Neovim just to make the editor usable.&lt;/p&gt;
&lt;p&gt;To make the switch from Neovim easier, I had to make some changes to the configuration.
I want to share them to save you some time, because discovering them is not straightforward.&lt;/p&gt;
&lt;h2&gt;Tmux setup&lt;/h2&gt;
&lt;p&gt;I use tmux as a terminal multiplexer.&lt;/p&gt;
&lt;p&gt;One thing I miss from my Neovim setup is a good file manager and a TUI for git.
I rarely use a file manager, but when I do, I usually want to move a bunch of selected files quickly. Unfortunately, Helix does not support editing files in the explorer; you can only view them.&lt;/p&gt;
&lt;p&gt;To work around this, I added new keybindings to my tmux config:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Yazi related&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-g&lt;span class="w"&gt; &lt;/span&gt;allow-passthrough&lt;span class="w"&gt; &lt;/span&gt;on[......]</description><pubDate>Sat, 11 Oct 2025 20:28:08 -0000</pubDate><guid>https://rushter.com/blog/helix-editor/</guid></item><item><title>Tracking malicious code execution in Python</title><link>https://rushter.com/blog/python-code-exec/</link><description>&lt;p&gt;Recently, I have been working on a new library that statically analyzes Python scripts and detects malicious or harmful code.
It's called &lt;a href="https://github.com/rushter/hexora" target="_blank"&gt;hexora&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Supply-chain attacks have become increasingly common: more than 100 incidents were reported in the past five years for PyPI alone.
Let's consider a scenario where a threat actor attempts to upload a malicious package to PyPI.&lt;/p&gt;
&lt;p&gt;Usually, malicious packages imitate a legitimate library so that the main code works as expected.
The malicious part is typically inserted into one of the library's files and executed silently on import.&lt;/p&gt;
&lt;p&gt;If the actor lacks experience, they will likely not obfuscate their code at all.
Experienced actors usually obfuscate their code or try to evade simple heuristics such as regexes.
Regexes are fragile and can be fooled by simple tricks, and it's impossible to know all of them in advance.&lt;/p&gt;
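To illustrate why syntax-level analysis is more robust than regexes, here is a minimal sketch of my own (a hypothetical illustration, not hexora's actual logic): it flags direct calls to `eval`/`exec` by walking Python's AST, so formatting tricks that defeat a naive regex don't help.

```python
import ast

# Hypothetical illustration, not hexora's actual logic:
# flag direct calls to suspicious builtins by walking the AST,
# which is immune to whitespace and formatting tricks.
SUSPICIOUS = {"eval", "exec"}

def flag_suspicious_calls(source: str) -> list[int]:
    """Return the line numbers of direct calls to suspicious builtins."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SUSPICIOUS):
            hits.append(node.lineno)
    return sorted(hits)

# A naive regex like r"eval\(" misses the spaced-out call below:
sample = "eval  (\n    '1 + 1'\n)"
assert flag_suspicious_calls(sample) == [1]
```

Of course, a real analyzer has to handle aliasing, `getattr` tricks, and string-built payloads, which is where it gets interesting.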
&lt;p&gt;One of the problems is that the harder you obfuscate the code, the easier it can be detected.[......]</description><pubDate>Mon, 25 Aug 2025 11:14:08 -0000</pubDate><guid>https://rushter.com/blog/python-code-exec/</guid></item><item><title>Threat Hunting Introduction: Cobalt Strike</title><link>https://rushter.com/blog/threat-hunting-cobalt-strike/</link><description>&lt;p&gt;Cobalt Strike is a threat simulation tool that is used by &lt;a href="https://en.wikipedia.org/wiki/Red_team" target="_blank"&gt;red teams&lt;/a&gt; to perform
penetration tests (simulated cyber-security attacks).
However, it is also used by malicious actors to perform real-world attacks.
In this article, I will demonstrate how anyone can find Cobalt Strike servers on the internet and retrieve metadata from them.&lt;/p&gt;
&lt;p&gt;After reading this article, you will be able to peek inside Cobalt Strike servers and see how big
companies and malicious actors are using them.
Doing this can be pretty fun; it's like being able to hack into hackers or security experts.&lt;/p&gt;
&lt;p&gt;I will also introduce my tool written in Rust, called &lt;a href="https://github.com/rushter/SigStrike" target="_blank"&gt;SigStrike&lt;/a&gt;,
which increases crawling speed by 20x.&lt;/p&gt;
&lt;h2&gt;Threat Hunting&lt;/h2&gt;
&lt;p&gt;Threat hunting is the proactive process of searching for threats that could potentially target your company.
Part of the threat-hunting process consists of collecting so-called IoCs (Indicators of Compromise).[......]</description><pubDate>Mon, 23 Jun 2025 20:28:08 -0000</pubDate><guid>https://rushter.com/blog/threat-hunting-cobalt-strike/</guid></item><item><title>Shady economics of proxy services</title><link>https://rushter.com/blog/proxy-services/</link><description>&lt;p&gt;Residential proxies are the most in-demand type of proxy on the market, and their prices increase every year.&lt;/p&gt;
&lt;p&gt;In this article, I want to write down my understanding of the economics of proxy services. In particular, I describe types of proxy offerings, their typical clients, and why the majority of the market is supplied by malware.&lt;/p&gt;
&lt;h3&gt;Proxy types&lt;/h3&gt;
&lt;p&gt;There are two distinct types of proxies on the market:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data center (server) proxies&lt;/li&gt;
&lt;li&gt;Residential proxies (broadband and mobile)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Data center proxies&lt;/h3&gt;
&lt;p&gt;As the name implies, such proxies reside in data centers. To create a pool of proxies, proxy providers lease or buy IPv4 subnets. They are usually relatively cheap compared to residential proxies.&lt;/p&gt;
&lt;p&gt;Due to IPv4 address exhaustion, providers of such proxies have a limited number of IP addresses and fixed geographical locations. Buying an IPv4 subnetwork is also tricky nowadays. Hence, proxy providers usually lease them from small owners. Additionally, leasing gives flexibility and allows them to replace or abandon old IPv4 subnetworks.&lt;/p&gt;[......]</description><pubDate>Wed, 04 May 2022 16:28:08 -0000</pubDate><guid>https://rushter.com/blog/proxy-services/</guid></item><item><title>How masscan works</title><link>https://rushter.com/blog/how-masscan-works/</link><description>&lt;p&gt;&lt;a href="https://github.com/robertdavidgraham/masscan" target="_blank"&gt;Masscan&lt;/a&gt; is a fast port scanner capable of scanning the entire IPv4 internet in under five minutes. To achieve maximum speed, it requires a stable 10 Gigabit link and a custom network driver for Linux. In comparison, a naive port-scanner implementation can take weeks or even months. This article describes the key features behind the internal design of masscan.&lt;/p&gt;
&lt;h3&gt;What is port scanning?&lt;/h3&gt;
&lt;p&gt;Port scanning is a method of determining which ports on a specified list of IPs are open and accept connections. People use it to find web servers, proxy servers, databases, and other internet services for security research. Usually, port scanners target the &lt;a href="https://en.wikipedia.org/wiki/Internet_protocol_suite" target="_blank"&gt;TCP/IP&lt;/a&gt; protocol, although &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol" target="_blank"&gt;UDP&lt;/a&gt; port scanning is also possible. This article focuses only on TCP/IP (&lt;a href="https://en.wikipedia.org/wiki/IPv4" target="_blank"&gt;IPv4&lt;/a&gt;) on Linux, since only the Linux version can handle more than two million packets per second.&lt;/p&gt;[......]</description><pubDate>Sat, 30 Apr 2022 15:06:22 -0000</pubDate><guid>https://rushter.com/blog/how-masscan-works/</guid></item></channel></rss>