On-device safety for LLM code execution

2024-07-12 · By Memex Team

Greetings friends of Memex. Welcome to our second installment of “As we may think”.

First things first - we soft launched a new website! Check it out at memex.tech. You can download our Research Preview and join our Discord directly from there. We are still offering $50 in credits for you to try it out; DM us in Discord to get them.

Also — we released a Linux version of the app last week. It’s tested on Ubuntu 22.04 and 24.04. For all of you Linux users, we’d greatly appreciate you trying it out and letting us know what (inevitable) bugs you run into so we can start squashing them!

Now on to the update. In this note we cover:

  1. How might we design safe LLM code execution on personal computers?

  2. Update on the Private Research Preview

How might we design safe LLM code execution on personal computers?

LLM agents are capable programmers, but they lack deterministic reliability. This is cause for concern when an agent can write code and execute commands directly on the user’s machine.

For example, we don’t want an LLM agent to execute a shell script like this:

#!/bin/bash
# HARMFUL COMMAND
rm -rf /

On the one hand, adding security constraints to the LLM agent seems like a straightforward proposition: we want guardrails around the agent’s execution so it doesn’t inadvertently damage the user’s environment. On the other hand, security features often involve trade-offs in ease of use, development velocity, and other utility concerns. Our aim is to strike a balance between these competing priorities.

We think solving this challenge is valuable for our users, because the personal computer is the hub of a professional’s working environment. This section outlines three strategies we employ to keep user machines safe. We continue to push on this area of development; if you have feedback, we’d love to hear it!

System security (Mac)

We use macOS privacy and security features to restrict the Memex agent’s read and write permissions in protected directories (e.g. Downloads, Documents, Desktop, iCloud). Each file written by Memex (or by its code) in a protected directory is tagged with an additional UUID attribute indicating that Memex created it. If the user denies Memex access to those protected directories, Memex is restricted: it can only read and write the files it created.

Put simply, Memex requires user permission to change user data within protected directories, but it can freely change Memex data in those directories. This reduces friction, because users do not need to constantly approve Memex’s work in a protected directory, while also ensuring Memex cannot overwrite files it did not create.
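As a toy sketch of this allow/deny logic (not Memex’s implementation, which stores the tag as a macOS file attribute), a plain manifest file can stand in for the per-file UUID tag:

```shell
#!/bin/sh
set -eu
WORKDIR=$(mktemp -d)
MANIFEST="$WORKDIR/.memex-created"   # stand-in for the per-file UUID attribute
touch "$MANIFEST"

# Memex creates a file and tags it as its own.
memex_create() {
  printf 'generated by agent\n' > "$1"
  basename "$1" >> "$MANIFEST"
}

# Writes in a protected directory are allowed only for tagged files.
memex_can_write() {
  grep -qx "$(basename "$1")" "$MANIFEST"
}

memex_create "$WORKDIR/report.txt"
touch "$WORKDIR/user-notes.txt"      # pre-existing user file, untagged

memex_can_write "$WORKDIR/report.txt" && echo "report.txt: writable"
memex_can_write "$WORKDIR/user-notes.txt" || echo "user-notes.txt: blocked"
```

The real mechanism enforces this at the OS level; the sketch only shows the decision rule.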

If a user grants Memex access to a protected directory, these controls no longer apply there.

To our knowledge, Memex is the only coding agent that takes this approach, which gives it the benefits of a drag-and-drop installation, low resource footprint, and filesystem protection.

System prompting best practices

We instruct the agent to follow safe practices in its system prompt. For example:

  1. It uses a dedicated directory (~/Workspace/).

  2. It is instructed to never read secret keys.

  3. It is instructed to use version control frequently to enable rollbacks.

  4. It is instructed to use tools like Semgrep to identify and block harmful code (see image).
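For illustration, here is the kind of rule the agent can run against generated scripts. This is a minimal sketch in Semgrep’s generic mode, not Memex’s actual ruleset:

```yaml
rules:
  - id: block-recursive-rm-root
    languages: [generic]
    severity: ERROR
    message: Recursive delete of the filesystem root; refuse to execute.
    pattern: rm -rf /
```

Running something like `semgrep --config rules.yaml` over a generated script would flag the harmful command before execution.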

This guardrail is of course probabilistic, but in practice it appears to be effective. We have begun sketching an evaluation benchmark that will let us measure safety performance on these kinds of concerns. If this area interests you, reach out!

Virtualization (Linux)

Linux does not offer the same native security controls that macOS makes available. For Linux users who want similar protection on their machines, we recommend running Memex under virtualization (e.g. Docker). When we use Memex via Docker, we mount a local directory into the container so that the assets Memex creates persist outside of it.
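Concretely, such a setup might look like the following compose file. This is a sketch: `memex-linux:local` is a placeholder image name, since packaging details will vary:

```yaml
# docker-compose.yml (sketch)
services:
  memex:
    image: memex-linux:local   # placeholder: build/tag your own image
    volumes:
      # Mount a host directory so assets Memex creates outlive the container.
      - ./memex-workspace:/root/Workspace
```

The bind mount is the important part: everything the agent writes under /root/Workspace inside the container lands in ./memex-workspace on the host.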

Update on the Private Research Preview

We wrote our first line of code for the Memex app on May 11th — or 62 days ago. Since then, we’ve pushed 83 releases to the Mac app, created a Linux app, and soft launched our marketing site.

Thank you to all of you that have provided feedback! We’ve identified three key areas of improvement based on the early feedback that we’re working on now: “Collapsed Mode” UX, multi-agent problem-solving, and improved file editing.

“Collapsed Mode” UX

It’s inevitable that the LLM will make errors as it codes and executes tasks. The core issue we are solving is not creating an LLM Agent that makes no errors; instead, it is making LLM Agents that can efficiently solve valuable tasks, overcoming errors as they encounter them. That said, our current UX “feels” error prone because every error the LLM Agent encounters is displayed to the user. We’re working on a UX that collapses the agent’s iterations by default, so the primary agent<>user interaction is focused more on soliciting preferences from the user and less on “stepping through” the agent’s iterative cycles. The below visual demonstrates this new UX pattern:

Multi-agent problem solving

The user<>agent interactions we observe can be broken into two types:

  1. describing preferences and/or requirements to the agent

  2. helping the agent accomplish its tasks

The latter involves some common patterns we’ve been collecting in Agent Engineering 101. The patterns that help agents increase their success rates are relatively well defined, for example:

  • frequently create and execute well-defined tests

  • understand existing projects before changing them

  • commit work often

  • monitor for LLM hallucinations

Our most recent versions of Memex use a single LLM agent, which we call the “problem-solving agent”. We are working on a second, “critic” agent that explicitly advocates for the user and steers the problem-solving agent to follow best practices. The approach is inspired by OpenAI’s CriticGPT.


We’ve built a working intuition behind this approach. LLMs use transformers to model the joint probability distribution of tokens on the internet. Token sequences that interleave problem solving with inline self-criticism are rare, so a single LLM’s output sequence is unlikely to both solve a problem effectively and be critical of its own approach. Hence, it’s useful to divide these complementary concerns between two agents.

File editing bug

There was a bug causing the LLM Agent to overwrite files by emitting an “empty edit”. We pushed a fix, and we’re also working on a bigger overhaul of file editing.

This area of development is known as “Agent-Computer Interfaces”, and it’s an exciting area of R&D. If you’re interested in going deeper, we recommend the paper SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. Here’s a great visual that illustrates the difference between human-computer interfaces (HCI) and agent-computer interfaces (ACI):

Our new approach uses a file-diff methodology, which allows for targeted, “precision” edits. Here’s a sneak peek:
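To illustrate the general idea with a plain unified diff and the standard `patch` tool (not Memex’s internal edit format): a diff names only the lines it changes, so the rest of the file cannot be clobbered by an “empty edit”:

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d); cd "$dir"

# A small file with a typo on line 2.
printf 'def greet():\n    print("helo")\n' > app.py

# A unified diff that touches only the buggy line.
cat > fix.patch <<'EOF'
--- app.py
+++ app.py
@@ -1,2 +1,2 @@
 def greet():
-    print("helo")
+    print("hello")
EOF

# Apply the targeted edit; unnamed lines survive untouched.
patch -s app.py fix.patch
cat app.py
```

If the diff’s context lines don’t match the file, `patch` rejects the edit instead of silently overwriting, which is exactly the failure mode a full-file rewrite cannot guard against.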

Thanks!

That’s it for this issue. Stay tuned for more, and thanks for reading!

Ever forwards,

The Memex Team