
By Starseer Labs
Your security team has mature processes for vetting software packages and patches. Dependency scanning, SBOM generation, vulnerability databases, the works. Now someone on the ML team downloads a model from Hugging Face (or LM Studio, Ollama, or the latest AI-viral OpenClaw craze) and loads it into production. Did that model go through the same scrutiny?
For most organizations, the answer is no. And the gap between how we treat traditional software artifacts and how we treat AI model files is one of the most underappreciated risks in enterprise AI today.
This post is the first in a three-part series on AI model risk. We are starting at the most tangible layer: the model file itself. Not what the model does, not how it was trained, but what happens when you download it and open it.
If you are new to this world, here is a quick primer.
Hugging Face is the GitHub of AI models. It is the dominant public repository where researchers and companies publish pre-trained AI models that anyone can download and use. There are over a million models on the platform covering everything from text generation to image classification to code completion.
When your team "downloads a model," they are pulling a set of files that contain the model's architecture (its structure) and its weights (the learned parameters that make it useful). These files come in various formats: PyTorch's native pickle format, Microsoft's Open Neural Network Exchange (ONNX), Hugging Face's own safetensors format, GGUF, and others.
Here is where the security risk comes in. Some of these formats are not just data. They are executable.
Python's pickle format is one of the most common ways PyTorch models are saved and distributed. It is also, by design, capable of executing arbitrary code when a file is loaded into memory.
This is not a bug; it is how pickle works, and pickle was the default PyTorch model format for years before generative AI took off. When you load a pickle-based model file, the process can trigger any Python code embedded in that file. An attacker who uploads a malicious model to Hugging Face can embed a payload that runs the moment someone loads it.
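To make the mechanism concrete, here is a minimal, harmless sketch. It uses `eval` on an arithmetic string as a stand-in for a real payload; an actual attack would swap in `os.system`, `subprocess`, or a socket connection:

```python
import pickle

class NotJustData:
    """An object whose 'serialized data' is really an instruction to run code."""
    def __reduce__(self):
        # pickle calls eval("7 * 191") during loads(). A real payload
        # would invoke os.system, subprocess, or open a reverse shell.
        return (eval, ("7 * 191",))

blob = pickle.dumps(NotJustData())   # looks like any other serialized blob
result = pickle.loads(blob)          # "loading the model" executes the call

print(result)  # → 1337
```

No one calls the payload explicitly; deserialization itself is the trigger. That is the entire attack surface in three lines.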
This has been a well-documented attack vector since well before the generative and agentic AI eras. What started as a niche, narrow attack surface has grown substantially as enterprise and product teams have integrated AI models into applications across their environments.
JFrog's security research team built a scanning system to examine models hosted on Hugging Face and found over 100 models containing malicious payloads. In one notable case, a PyTorch model uploaded by a user called "baller423" contained a payload that established a reverse shell to an external IP address. Loading the model would silently give the attacker remote access to the victim's machine. The payload used pickle's __reduce__ method to execute the code during deserialization, embedding the malicious logic directly in the trusted serialization process.
Think about what that means in an enterprise context. A product team or data scientist downloads a popular-looking model to experiment with. The model loads, and it works as expected. But in the background, the attacker now has a foothold in your environment. No phishing email required. No complex exploit chain. Just a model file and a good old-fashioned reverse shell.
Pickle gets most of the attention, but it is not the only risky format. Multiple model serialization formats support code execution in various forms:
TensorFlow Keras models can execute arbitrary code through Lambda layers, which allow custom Python functions to be embedded in the model architecture. While the Hugging Face Transformers library mitigates this by loading only weights (not full model architectures), loading through TensorFlow's native API bypasses that protection.
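One reason this is detectable before anything loads: a Keras architecture is stored as JSON, so Lambda layers can be flagged with a plain config walk, no TensorFlow required. The config snippet below is a simplified illustration, not a real Keras export:

```python
import json

# Simplified stand-in for a Keras model config. Real configs carry
# many more fields, but Lambda layers appear with this class_name.
model_config = json.loads("""
{
  "class_name": "Sequential",
  "config": {
    "layers": [
      {"class_name": "Dense",  "config": {"units": 64}},
      {"class_name": "Lambda", "config": {"function": "<serialized bytecode>"}}
    ]
  }
}
""")

def find_lambda_layers(node):
    """Recursively collect any Lambda layers in a config tree."""
    hits = []
    if isinstance(node, dict):
        if node.get("class_name") == "Lambda":
            hits.append(node)
        for value in node.values():
            hits.extend(find_lambda_layers(value))
    elif isinstance(node, list):
        for item in node:
            hits.extend(find_lambda_layers(item))
    return hits

print(f"Lambda layers found: {len(find_lambda_layers(model_config))}")
# → Lambda layers found: 1
```

A Lambda layer in an untrusted model is not proof of malice, but it is exactly the kind of signal a review gate should surface.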
ONNX models, often considered "safer" because they represent models as computational graphs rather than executable code, have their own attack surface. Recent research has shown that the mathematical operations within an ONNX graph can be manipulated to create backdoors. But we will save that for Part 2, where we discuss architecture backdoors.
GGUF files, increasingly popular for running models locally through tools like Ollama, LM Studio, and llama.cpp (and widely produced by projects like Unsloth for efficient endpoint deployment), are generally safer at the tensor level. The format stores weights as raw data without pickle-style deserialization. But "safer" is not "safe." The GGML parsing library has had multiple heap overflow vulnerabilities (discovered independently by both Databricks and Cisco Talos) where a crafted GGUF file could achieve code execution through memory corruption during loading. And the "Llama Drama" vulnerability showed that GGUF's metadata fields, specifically Jinja2 chat templates, could be weaponized for server-side template injection and arbitrary code execution. Over 6,000 models on Hugging Face were affected before being patched. The pattern holds: any format that processes untrusted input has an attack surface, even if it was designed to be safe.
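The GGUF layout is simple enough that defensive pre-checks are cheap. Here is a minimal header sanity check following the published GGUF layout (magic bytes, a uint32 version, then uint64 tensor and metadata counts, all little-endian); the rejection thresholds are illustrative, not an official rule:

```python
import struct

GGUF_MAGIC = b"GGUF"

def check_gguf_header(data: bytes) -> str:
    """Sanity-check a GGUF header before handing it to a full parser.

    Absurd tensor/metadata counts are one way crafted files trigger
    overflows in sloppy parsers, so bound them up front.
    """
    if len(data) < 24 or data[:4] != GGUF_MAGIC:
        return "reject: not a GGUF file"
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    if n_tensors > 100_000 or n_kv > 100_000:
        return f"reject: implausible counts ({n_tensors} tensors, {n_kv} kv)"
    return f"ok: GGUF v{version}, {n_tensors} tensors, {n_kv} metadata keys"

# A well-formed header versus one with a bogus tensor count.
good = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
bad  = GGUF_MAGIC + struct.pack("<IQQ", 3, 2**63, 24)

print(check_gguf_header(good))  # → ok: GGUF v3, 291 tensors, 24 metadata keys
print(check_gguf_header(bad))   # → reject: implausible counts ...
```

A check like this does not catch template-injection payloads in the metadata values themselves (that requires inspecting the chat template), but it illustrates the broader point: validate untrusted structure before parsing it.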
Other formats, including NumPy, TorchScript, H5/HDF5, and more, each have their own code execution characteristics. The common thread: if you do not know what a format is capable of, you cannot assess the risk of loading it.
Hugging Face developed the safetensors format specifically to address the code execution risk in model files. Safetensors stores only tensor data (the raw numerical weights) with no ability to embed executable code. The library is written in Rust for additional memory safety, and a joint security audit by Trail of Bits (commissioned by Hugging Face, EleutherAI, and Stability AI) found no critical security flaws that could lead to arbitrary code execution. Of the "safer" formats, safetensors has the strongest security posture by design.
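You can see why safetensors is boring by design by building a minimal safetensors-style blob by hand, following the published layout: an 8-byte little-endian length, a JSON header describing each tensor, then raw bytes. There is simply no slot for executable code:

```python
import json
import struct

# Describe one 2x2 float32 tensor occupying bytes 0..16 of the data section.
header = {
    "embedding.weight": {
        "dtype": "F32",
        "shape": [2, 2],
        "data_offsets": [0, 16],
    }
}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 raw bytes

# File layout: [8-byte header length][JSON header][raw tensor data]
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes

# Parsing needs nothing beyond length-checking and json.loads.
(n,) = struct.unpack_from("<Q", blob, 0)
parsed = json.loads(blob[8 : 8 + n])
print(list(parsed))  # → ['embedding.weight']
```

Compare this with pickle: loading here is pure data interpretation, with no deserialization hooks to hijack.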
This is a real improvement. If every model on Hugging Face used safetensors exclusively, the file-level code execution risk would be greatly reduced!
But "safer format" does not mean "safe ecosystem." There are important nuances:
The conversion pipeline itself is attackable. Researchers demonstrated that the Hugging Face Safetensors conversion service (the official tool for converting pickle-based models to safetensors) could be compromised. An attacker could hijack the conversion bot to submit malicious pull requests to any repository on the platform. Models from companies including Google and Microsoft had been converted through this service. The format was safe; the process of getting to the format was not.
Companion files introduce risk. A model repository on Hugging Face is not just a .safetensors file. It includes other files, and metadata. Palo Alto's Unit 42 found that even libraries that work exclusively with safetensors could be vulnerable to code execution via malicious configuration metadata in companion files. The safe tensor file can sit next to an unsafe configuration file, and the loading library may process both without distinction.
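A crude repo-level triage illustrates the point: look past the weights file at everything else that ships alongside it. The suffix list and the `auto_map` check (a Transformers config key that points the loader at custom Python) are illustrative, not an exhaustive ruleset:

```python
import json
import pathlib
import tempfile

# File types that can carry or trigger code execution on load.
RISKY_SUFFIXES = {".py", ".bin", ".pt", ".pkl", ".ckpt"}

def triage_repo(repo: pathlib.Path) -> list[str]:
    """Flag companion files in a model repo that deserve human review."""
    findings = []
    for path in sorted(repo.rglob("*")):
        if path.suffix in RISKY_SUFFIXES:
            findings.append(f"code-capable file: {path.name}")
        if path.name == "config.json":
            cfg = json.loads(path.read_text())
            if "auto_map" in cfg:  # config requests loading custom code
                findings.append("config.json requests custom code (auto_map)")
    return findings

# Simulate a downloaded repo: safe weights next to unsafe companions.
with tempfile.TemporaryDirectory() as d:
    repo = pathlib.Path(d)
    (repo / "model.safetensors").write_bytes(b"")
    (repo / "modeling_custom.py").write_text("# arbitrary code")
    (repo / "config.json").write_text(json.dumps({"auto_map": {}}))
    findings = triage_repo(repo)

print(findings)
```

The weights file passes every tensor-level check, yet the repository as a whole still demands review.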
Adoption is not universal. Many popular models still ship with pickle-based files alongside (or instead of) safetensors versions. Older models, custom architectures, and models from smaller research groups may only provide pickle files. While users on Hugging Face can convert those to other formats, the question then becomes one of lineage. How trustworthy are the individuals or scripts converting the models? And even when safetensors are available, teams unaware of the risk may default to the loading method their existing code uses.
The gap between "a safer format exists" and "everyone can use it safely" is where real risk lives. Security has dealt with this problem for a long time in the form of legacy code and systems, which are often deployed well past the end of their security patching support. Understanding the risk is key to a stronger security posture when it comes to models downloaded from public repositories.
Hugging Face takes the file-level threat seriously. They have partnered with JFrog, Palo Alto (formerly Protect AI), and Cisco to deploy scanning across the platform and developed their own pickle scanner (Picklescan). Models flagged as containing potentially dangerous code are marked with an "unsafe" warning.
This is good. It is also incomplete.
Current scanners primarily work through static analysis: examining the pickle bytecode for known dangerous function calls and module imports. But this approach has the same limitations as any signature-based detection, much like old-school antivirus. Novel payloads, obfuscated calls, and techniques that use benign-looking operations in unexpected combinations can evade detection.
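A toy version of this style of scanner fits in a few lines using Python's own `pickletools`: walk the opcode stream and flag imports of modules that have no business in a weights file. The denylist and the string-tracking heuristic here are deliberately simplistic; real scanners model the pickle stack properly:

```python
import pickle
import pickletools

# Illustrative denylist; real scanners maintain far larger rulesets.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}

def scan_pickle(blob: bytes) -> list[str]:
    """Flag GLOBAL/STACK_GLOBAL imports of suspicious modules."""
    flags = []
    last_strings = []  # crude stand-in for tracking the pickle stack
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            last_strings.append(arg)
        elif opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            if module in SUSPICIOUS_MODULES:
                flags.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(last_strings) >= 2:
            module, name = last_strings[-2], last_strings[-1]
            if module in SUSPICIOUS_MODULES:
                flags.append(f"{module}.{name}")
    return flags

benign = pickle.dumps({"layer.weight": [0.1, 0.2]})

class Evil:
    def __reduce__(self):
        return (eval, ("1 + 1",))  # harmless stand-in for a real payload

print(scan_pickle(benign))                  # → []
print(scan_pickle(pickle.dumps(Evil())))    # → ['builtins.eval']
```

Note what this catches and what it misses: a payload that reaches `eval` through an attribute lookup on an innocuous-looking module would sail past the denylist, which is exactly the evasion problem described above.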
JFrog's own research highlighted this tension. Some models flagged as "unsafe" are actually benign (researchers testing the scanner's boundaries), while sophisticated payloads could potentially slip through pattern-matching-based detection. The scanner provides a useful signal, but treating "not flagged" as "safe" would be a mistake.
More importantly, file-level scanning only addresses one layer of model risk. Even if you solve the code execution problem entirely (use safetensors, scan everything, restrict dangerous formats), you have only addressed the most visible attack surface. A model file that contains zero executable code can still harbor backdoors in its architecture or its learned behavior.
That will be the subject of Part 2.
If your organization is deploying AI models from external sources, here are immediate steps that reduce file-level risk:
Default to safetensors. Wherever possible, download and load models in safetensors format. Configure your model loading pipelines to prefer or require it. It is the strongest option available at the format level.
Treat model files like untrusted code. Apply the same supply chain security principles you use for software packages. Models should be loaded in isolated environments, scanned before promotion to production, and, when available, sourced from verified publishers.
Know your formats. Audit which model file formats are in use across your organization. If teams are loading pickle files without understanding the implications, that is a gap worth closing. Pay particular attention to GGUF files pulled through consumer tools like Ollama and LM Studio, as these are often downloaded without any security review process.
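A first-pass format audit can be as simple as grouping model artifacts by extension across a directory tree. The extension groupings and the sample tree below are illustrative (and note that ONNX, while not pickle-executable at the file level, carries its own graph-level risks):

```python
import collections
import pathlib
import tempfile

# Extensions that can carry code on load vs. data-only formats.
EXECUTION_CAPABLE = {".pt", ".pth", ".bin", ".pkl", ".ckpt", ".h5", ".keras"}
DATA_ONLY = {".safetensors", ".gguf", ".onnx"}  # see Part 2 for ONNX caveats

def inventory(root: pathlib.Path):
    """Count model artifacts by extension; isolate the risky ones."""
    counts = collections.Counter(
        p.suffix for p in root.rglob("*") if p.is_file()
    )
    risky = {ext: n for ext, n in counts.items() if ext in EXECUTION_CAPABLE}
    return counts, risky

# Simulate a small model store with mixed formats.
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    for name in ("a/model.safetensors", "a/pytorch_model.bin", "b/llama.gguf"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"")
    counts, risky = inventory(root)

print(dict(risky))  # → {'.bin': 1}
```

Even this blunt instrument answers a question most organizations cannot currently answer: how many pickle-era files are sitting in our model stores right now?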
Don't stop at the weights file. Review the full model repository, not just the tensor data. Configuration files, tokenizer definitions, and chat templates all represent potential attack surface.
Monitor for provenance. Track where models come from, who published them, and whether the files have been modified since download. Model supply chain integrity is an emerging discipline, but the basics of provenance tracking apply today.
Recognize the limits. File scanning catches the obvious. It does not catch everything, and it does not address the deeper classes of model risk that exist below the file level.
This post covered the most accessible layer of AI model risk: the file itself. In Part 2, we will go deeper into the model's architecture and examine how backdoors can be embedded in computational graphs without any code execution at all. In Part 3, we will explore the hardest class of risk: sleeper agent backdoors embedded during training that resist detection through normal testing.
The file is where model security starts. It is not where it ends.
This is Part 1 of Starseer's AI Model Risk 101 series. Part 2: The Architecture is the Backdoor and Part 3: Hunting for Sleeper Agents will be published in the coming weeks. Subscribe to our newsletter for updates.
Starseer builds tools for understanding and securing AI model internals. If you want to see what is actually inside the models your organization is deploying, talk to us.
© 2026 - Starseer, Inc.