More On Coding Agents
Previously, we created a solution to run a coding agent locally in isolation. In my rush to share the working solution, I glossed over the security stuff—like what could actually go wrong running a coding agent on your machine, and what containerization helps with.
Worse still, the post might leave you with a dangerous false sense of security, which would haunt me endlessly.
So let’s do this properly. I’ll describe my simplified model of what a coding agent actually is, walk through the risks with examples of how things could go wrong, and clarify which problems containerization solves—and which ones it doesn’t.
Coding Agent Simplified
Nuanced discussion benefits a lot from having the model explicitly defined. It helps the speaker explain the ideas and the audience understand and challenge them.
This is what gets executed on your machine. The User Interaction Module reads your prompts and writes LLM output back to you. The LLM Module is most likely a very thin wrapper around sending a well-formatted HTTP POST to the model API. The Shell Execution Module is the code responsible for executing shell commands on your machine; I think of it as a glorified eval <LLM-requested-command>.
To be clear, the actual model is not running on your machine. The code running on your machine is simply a wrapper enabling terminal interaction between you, the remotely running model, and the local machine.
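To make the model concrete, here is a deliberately naive sketch of that loop in shell. The endpoint, the JSON shape, and the jq paths are made up for illustration; real agents add streaming, tool schemas, and richer approval flows, but the core shape is the same: send text, get a command back, eval it.

```bash
#!/usr/bin/env bash
# Naive coding-agent loop. API_URL and the response format are hypothetical.
API_URL="https://api.example.com/v1/chat"       # placeholder, not a real endpoint

while true; do
  # User Interaction Module: read a prompt, print responses
  read -r -p "you> " prompt || break

  # LLM Module: a thin wrapper around a well-formatted HTTP POST
  response=$(curl -s -X POST "$API_URL" \
    -H "Authorization: Bearer $MODEL_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"prompt\": \"$prompt\"}")

  echo "$response" | jq -r '.message'            # model's text back to the user

  # Shell Execution Module: glorified eval of whatever the model asked to run
  cmd=$(echo "$response" | jq -r '.command // empty')
  if [ -n "$cmd" ]; then
    read -r -p "run '$cmd'? [y/N] " ok
    [ "$ok" = "y" ] && eval "$cmd"
  fi
done
```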
While writing the above, I realized that Gemini CLI is open source, so we can validate these assumptions against it.
What Claude tells you about its competitor’s architecture
● Gemini CLI Architecture
Main Packages:
1. packages/cli - Terminal UI (input/output, rendering, configuration)
2. packages/core - Backend engine (API client, prompt management, tool orchestration)
3. packages/a2a-server - Agent-to-Agent server
4. packages/vscode-ide-companion - VS Code integration
Core Tools (packages/core/src/tools/):
- File ops: read-file, write-file, edit, ls
- Search: grep, glob
- Execution: shell
- Web: web-fetch, web-search
- Extensions: mcp-client, memoryTool, write-todos
Interaction Flow:
1. User input → CLI package
2. CLI → Core package (sends request)
3. Core → Gemini API (with prompt + tool definitions)
4. Gemini API → Core (response or tool call request)
5. Core → Tool execution (with user approval for modifying operations)
6. Tool result → Gemini API → Core → CLI → User

To wrap up, let’s define a simplified model of the actual interaction flow: you set the goal, the agent forwards your prompts to the remote model, the model answers with text or with a command it wants to run, and the agent executes approved commands locally and feeds the results back to the model.
Note that the remote model is the brain of the operation; the local coding agent is mostly a proxy for accessing the model conveniently. You are in charge of the goal, the model is responsible for figuring out how to achieve it, and the coding agent is just a messenger.
What Can Go Wrong
There are two types of risk I’m concerned about when interacting with coding agents:
- data privacy breach: any situation where data you did not intend to share gets uploaded remotely1
- malicious execution breach: any situation where your computer gets compromised and executes a malicious program without your intent to do so
It’s up to you which category concerns you.
I’ve heard responses disregarding data privacy ranging from “they (corporations) already have my data, I don’t care” to “as long as it gives me a competitive advantage”. The thing is, being concerned only about personal data is half of the picture. I’ve seen many developers store their credentials in plain text files right in the project directory. Once the coding agent is running, I would consider such credentials compromised2.
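To make that concrete, here is a hypothetical project layout (names and values are invented). One cat .env issued by the agent while gathering context and both secrets become part of a request to the model provider.

```bash
$ ls -a
.  ..  .env  .git  docker-compose.yml  src
$ cat .env
DATABASE_URL=postgres://admin:hunter2@prod-db.internal:5432/app
STRIPE_SECRET_KEY=sk_live_...
```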
For the second category, ask yourself:
- are you surprised people are concerned about installing via curl | sh?
- would you use an application known to have SQL injection vulnerabilities?
- would you let a new junior colleague use your computer to run a couple of scripts?
If your answer is straight yes to all of them, running a coding agent adds minimal additional risk to your already risky practices.
Let me clarify a couple of angles where I suspect these concerns might get misinterpreted.
But the companies behind coding agents have an incentive to make them safe
Even though I disagree, since we are in the “capture the market by increasing capabilities” phase, I have a different, more fundamental objection. I think a coding agent can be either safe but useless or useful but dangerous, with no middle ground.
Given the simplified model described above, the most concerning piece is the Shell Execution Module. When thinking about how to implement it, we have basically two options: blacklist commands deemed dangerous, or whitelist the safe ones. I consider both approaches doomed.
In the blacklisting scenario, you will inevitably omit an execution path, however unlikely3. Any gap gives the model basically full eval power.
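To make the gap concrete: suppose the blacklist blocks curl, wget and rm by name. None of the lines below invokes those binaries directly, yet each one is (or can carry) arbitrary code execution. These are just the well-known variants, not an exhaustive list.

```bash
echo 'aWQK' | base64 -d | sh              # decodes to a harmless `id` here, but could decode to anything
awk 'BEGIN { system("id") }'              # awk happily shells out
python3 -c 'import os; os.system("id")'   # so does any interpreter
tar -cf /dev/null /dev/null --checkpoint=1 --checkpoint-action=exec=id   # even GNU tar can exec
```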
Which leaves us with whitelisting. There is a version that would probably be safe in the sense of minimal risk of malicious execution: I tend to imagine it as allowing the agent to run just ls and cat. The agent can feed its context but do nothing else, not even write to files.
This is probably secure, but also useless: barely better than copy/pasting from a chat.
The current approach is likely a whitelist with sanitization sprinkled on top. I honestly believe that coding agent developers are doing their best to minimize the attack surface. It’s clear they are aware the model might be cheeky enough to slide malicious code in.
Unfortunately, the same argument as in the blacklisting case applies. As you increase the execution capabilities, you open a pathway to unbounded eval execution.
But the model is not intelligent, you are just another doomsayer
Let’s skip the discussion about the definition of intelligence and how close current models are to it. I’m not worried about AI overlords for now.
I’m concerned about two scenarios: the model making genuine errors in its suggestions, or being deliberately trained to generate malicious recommendations. Neither requires AI consciousness—just flawed probabilistic outputs or compromised training data.
To make the first case more concrete, let’s consider the increased risk of a typosquatting attack. A common scenario goes something along these lines:
Please write a script to do that
Sure, I will write such a script
Would you like me to add the following dependencies: <lib>, <misspelled-lib>, <another-lib> ?
Yes (missing the misspelled dependency)
Would you like me to install them and run the tests?
Yes (as it is so convenient)
And the malicious code gets executed on your local machine.
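The package names below are invented to show the pattern; the uncomfortable part is how little the approval prompt protects you when the difference is a single transposed character.

```bash
$ cat requirements.txt
requests==2.32.3
reqeusts==1.0.4           # typosquatted: one transposition away from the real thing
python-dateutil==2.9.0

$ pip install -r requirements.txt && pytest    # the "convenient" step that runs the attacker's install hooks
```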
In the second case, a government agency might pressure the company to train the model to inject subtle vulnerabilities. Such an attack might be aimed at high-profile targets only to maintain a low profile and avoid detection.
Alternative Remedies
We might consider alternatives to mitigate the risks.
Since part of the problem is data leaving your device, running the model locally helps. However, this solution has some issues.
If we do not have control over how the model was trained, it might still try to slip backdoor suggestions in. My “tightly air gapped” comment in the previous post was just a joke. The reality is that the container has network access, which means that even a locally running model could still exfiltrate data by uploading it.
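If the model itself runs locally, you can at least take the upload path away by removing the container’s network entirely. The image name below is a placeholder, and this obviously does not work for agents that need to reach a remote API.

```bash
# No network inside the container: a local model has no way to phone home.
podman run --rm -it --network=none -v "$PWD:/workspace:Z" -w /workspace localhost/local-agent bash
```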
Another alternative is using a disposable remote server. In essence, it’s very similar to running the container locally. You are likely to upload just the project folder, which solves part of the data privacy issue. If the server is truly disposable, you might not care about malicious execution either, as you will just spin up a new instance in case of a breach.
Ensure tight network policies, though. Ideally, run it in a cluster with limited access to the company network, as otherwise a compromised server can pivot into your company’s internal network.
What Containerization Solves (And Doesn’t)
So finally let’s talk about the containerization approach.
Since we are selectively choosing which files, directories and environment variables are mounted to the container, we have control over what data can be compromised. It is up to you how paranoid you are with your data.
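As a sketch of what that selectivity looks like in practice (the image name is a placeholder, and which environment variables you pass through depends on your agent): only the current project and explicitly listed variables cross the boundary; the rest of $HOME, your ssh keys and your global configs stay on the host.

```bash
# Mount only the project directory and pass only the variables you name explicitly.
podman run --rm -it \
  -v "$PWD:/workspace:Z" \
  -e ANTHROPIC_API_KEY \
  -w /workspace \
  localhost/coding-agent-sandbox \
  claude
```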
Because the container runs in isolation from the host, even if you accidentally execute malicious code, its blast radius is limited. The container is disposable, so if you suspect something is wrong, just terminate it and start a new session.
I always run podman run with --rm, and I tear down and spin up a new container quite often. If you are concerned about context being carried over between sessions, I would argue it’s a better idea to hack your way around jq '.projects[].history' ~/.claude.json than to keep a long-running container session.
Now to the most important part: what containerization does NOT solve.
Containerization doesn’t make LLM code safe to run blindly
My default trust level when executing LLM-generated code is the same as when executing random snippets from the internet. Nobody but you is responsible for what happens once you decide to execute code on your local machine. It pretty much does not matter whether the file was written by the coding agent or copy/pasted from a chat.
If you let the agent write code in a container and then execute that code on your host without reading it, you’ve completely missed the point. The container protects you during development, not during deployment. Containerization cannot save you from the hard work of actually reading and reviewing the code. That is your job (so you still have one), and I would highly recommend keeping that responsibility for yourself.
Containerization doesn’t prevent data exfiltration
Secondly, even if you are selective about which files get mounted, those files are still uploaded to remote servers. Containerization gives you no control over what happens with that data. Maybe it will become part of a training dataset, or will be used to profile you into spending even more time on a social media feed, or will leak your proprietary know-how.
Such is the price you pay for giving in to the temptation of making model responses more useful.