Hexora v0.3: New features and improvements

Last updated on June 25, 2026
in python

Recently, I've been improving hexora, a Python library I wrote to detect malicious Python code using static analysis.

In the new v0.3.0 release, I've added new detections, and hexora now also uses a simple machine learning model to analyze the whole file. The model combines code structure features, semantic features, and the findings from static code analysis to assess the entire Python file.
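As a rough illustration of what file-level features can look like, here is a minimal sketch built on Python's standard `ast` module. It is not hexora's actual feature set: the function `extract_features`, the `SUSPICIOUS_CALLS` set, and every feature name are hypothetical, and `static_findings` simply stands in for the number of detections the static analyzer reported for the file.

```python
import ast
from collections import Counter

# Hypothetical: call names this sketch treats as suspicious.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}


def extract_features(source: str, static_findings: int = 0) -> dict[str, float]:
    """Compute illustrative file-level features for one Python file (hypothetical)."""
    tree = ast.parse(source)
    node_types = Counter(type(node).__name__ for node in ast.walk(tree))

    # Structural features: size and shape of the syntax tree.
    features = {
        "num_lines": float(len(source.splitlines())),
        "num_functions": float(node_types["FunctionDef"] + node_types["AsyncFunctionDef"]),
        "num_imports": float(node_types["Import"] + node_types["ImportFrom"]),
    }

    # Semantic features: which names are called and how long string literals get
    # (very long strings often hint at inlined base64 payloads).
    called = [
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    ]
    features["suspicious_calls"] = float(sum(name in SUSPICIOUS_CALLS for name in called))
    string_lengths = [
        len(node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
    features["max_string_len"] = float(max(string_lengths, default=0))

    # Static-analysis signal: number of detections reported by the rule engine.
    features["static_findings"] = float(static_findings)
    return features
```

A gradient boosting classifier could then be trained on vectors like these, each labeled as malicious or benign.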

Although the model can flag malicious code even when static analysis reports no detections, its main use case is filtering out false positives.

I've been testing it against newly published PyPI packages, and it detects 2 to 10 new malicious packages each day.

Because so many packages are published every day, before adding the machine learning model I was getting around 5 to 10 false positives for every real finding. Now, I use the confidence score from a gradient boosting model to filter out low-confidence findings.
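Filtering by confidence can be sketched as a threshold on the model's predicted probability that a file is malicious. In this minimal sketch, `Finding`, `filter_findings`, and the 0.8 default threshold are hypothetical, not hexora's code; `score` is any callable that returns a probability, for example a wrapper around scikit-learn's `GradientBoostingClassifier.predict_proba`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Finding:
    """A hypothetical static-analysis finding for one file."""
    path: str
    rule: str
    features: Sequence[float]


def filter_findings(
    findings: list[Finding],
    score: Callable[[Sequence[float]], float],
    threshold: float = 0.8,  # hypothetical cut-off; tune on labeled data
) -> list[tuple[Finding, float]]:
    """Keep only findings whose malicious-probability score meets the threshold."""
    kept = []
    for finding in findings:
        confidence = score(finding.features)
        if confidence >= threshold:
            kept.append((finding, confidence))
    # Most confident first, so manual review starts with the likeliest hits.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

With a fitted scikit-learn classifier `clf`, `score` could be `lambda x: clf.predict_proba([x])[0][1]`, assuming the malicious class is label 1 (column order follows `clf.classes_`). Raising the threshold trades missed detections for fewer false positives.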

I'm still getting false positives. They mostly come from weird AI-related projects. As I mentioned in my previous article, these projects tend to abuse dynamic code execution, inline their assets or strings using base64, or embed telemetry. When telemetry is optional, or when you can opt out, I usually don't report such projects.
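To illustrate why such projects trip static rules, here is a minimal sketch of a check that flags `exec`, `eval`, or `compile` calls whose arguments contain a `b64decode` call. It uses only the standard `ast` module; `find_encoded_exec` and `DYNAMIC_EXEC` are hypothetical names, and a real rule would also need to handle aliased imports and indirection that this sketch ignores.

```python
import ast

# Builtins that execute or compile code at runtime.
DYNAMIC_EXEC = {"exec", "eval", "compile"}


def _is_b64decode(node: ast.AST) -> bool:
    """True if node is a call like base64.b64decode(...) or b64decode(...)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr == "b64decode"
    if isinstance(func, ast.Name):
        return func.id == "b64decode"
    return False


def find_encoded_exec(source: str) -> list[int]:
    """Return line numbers where exec/eval/compile receives base64-decoded data."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DYNAMIC_EXEC
        ):
            # Search the whole argument subtree, e.g. exec(b64decode(x).decode()).
            for arg in node.args:
                if any(_is_b64decode(inner) for inner in ast.walk(arg)):
                    hits.append(node.lineno)
                    break
    return sorted(hits)
```

A legitimate project that decodes an inlined base64 string and evaluates it can match the same pattern as a malicious loader, which is where a file-level confidence score helps separate the two.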

For some reason, many newer developers seem to think it's okay to add telemetry to open-source projects. They treat their projects as products.

Working on hexora is fun. As it turns out, you don't need a lot of resources to detect malicious stuff and make the open-source ecosystem safer.


If you have any questions, feel free to ask them via the e-mail address displayed in the footer.
All articles on this website are written by a human.
