Skills Are Not Enough

Skills encode procedure. Memory encodes everything else.

May 11, 2026 Nick Kinney ~7 min read

Substrate Over Procedure


Disclaimer: I am an actuary. ASA designation pending (expected June or July 2026). The thoughts and analyses here are my own explorations, not professional advice. Draw your own conclusions.

The dominant theory of getting good code out of an AI agent right now goes like this. You have a fleet of middling engineers. They are smart enough, they type fast, they never sleep. They also have no memory. They forget every project the moment the conversation ends. So you write down your process — your interview script, your PRD template, your TDD loop, your refactor checklist — and you hand it to them as a "skill." A skill is a markdown file with instructions. The agent reads it, follows it, ships the work, and forgets you ever met.

The output quality goes up. Visibly. Reliably. I have a directory full of skills and I use them every day.

But I want to push on the theory a little, because I think the people writing about it are describing a workaround and calling it a solution.

What a Skill Actually Is

A skill is a serialized procedure. It compresses a workflow — say, "interview the user about a feature until you've walked the design tree" — into a few hundred words of instruction the agent can execute. It is, mechanically, the same thing a senior engineer's playbook is to a junior engineer: here is how we do this, here are the questions to ask, here are the failure modes to avoid.

That framing is helpful because it makes the skill concrete. It is also unhelpful, because it borrows a human metaphor that fails at the load-bearing joint.

A junior engineer reads the playbook once. After that, they have it. They do not need to be handed the document at the start of every meeting. They absorb the procedure, integrate it with three other procedures they learned last quarter, and over a few years quietly stop needing playbooks at all. The whole point of a playbook is that it is temporary scaffolding for a system that can rewrite itself.

A skill, in current agent stacks, is not temporary scaffolding. It is the entire structure. The agent reads it on Monday, executes it, and reads it again on Tuesday as if Monday never happened. The skill is not metabolized into anything. There is no "learned this last week so I can skip steps two and three." There is only the file, sitting in a folder, being re-read on every invocation forever.
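To make the re-reading concrete, here is a minimal sketch that assumes a skill is nothing more than text placed in front of the request. The function name `build_prompt`, the skill contents, and the prompt layout are all made up for illustration; this is not any particular agent stack's API.

```python
def build_prompt(skill_text: str, request: str) -> str:
    """Assemble the only context a skills-only agent sees (hypothetical layout)."""
    return f"{skill_text.strip()}\n\nUser request: {request.strip()}"


# Hypothetical skill file contents, invented for this example.
SKILL = """
Feature interview
1. Ask what problem the feature solves.
2. Ask who the users are.
3. Ask what is out of scope.
"""

monday = build_prompt(SKILL, "Design the dashboard export feature")
tuesday = build_prompt(SKILL, "Design the dashboard export feature")

# Nothing produced on Monday is an input to Tuesday's prompt,
# so the two prompts are byte-for-byte identical.
print(monday == tuesday)
```

The function takes two inputs, the skill and the request. There is no third argument for anything the agent learned last time, which is the whole problem.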

That is not "an engineer with no memory." That is a Bash script with very good vibes.

The Missing Primitive

The piece nobody seems to want to name is that the skills-only approach is doing exactly one thing well — encoding procedure — and silently failing at three things that matter at least as much.

It does not encode preferences. When I correct an agent for the third time about my tone, that correction does not survive into the next session. A skill could say "prefer a brusque register," but every user has fifty of these, and writing all fifty into every skill is not a system; it is paperwork.

It does not encode environment. That my container runtime is OrbStack, that my Python launchd jobs need the Homebrew binary because the Apple shim breaks TCC, that my blog ships via bind-mount and not a build step — none of this belongs in a skill, because none of it is procedural. It is just true about my machine, and any agent operating on my machine should know it without me re-stating it.

It does not encode session-to-session continuity. When I worked on a problem yesterday and reach for the same problem today, a skill cannot tell me what I did. The skill is the recipe. The recipe is not the dinner I ate last night.

The frame "the agent is a fleet of memory-less engineers" implicitly accepts all three of these failures as natural. Of course the engineer doesn't remember your preferences — they're new. Of course they don't know the environment — they're new. Of course they can't reference yesterday — they're new. The frame absolves the architecture.

But the agents are not engineers. They are processes running against a model with a context window. The reason they "have no memory" is that nobody has built them one. Not because memory is impossible, and not because it is hard. Because the dominant theory says you don't need it if you have enough skills.

You need it.

What Memory Looks Like When You Build It

I run my own agent stack, Hermes Agent, and the architectural difference is a separate memory layer. There are three stores, not one.

The first is memory in the literal sense. A small file the agent reads at the start of every turn. It contains durable facts: my OS, my hardware, my container runtime, the names of projects I am currently working on, conventions I have established. It is updated by the agent itself when it learns something new, and pruned when something stops being true. It is not the procedure. It is the world the procedure is being executed in.
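A rough sketch of such a store, assuming the facts live in one small JSON file of key/value pairs. The class name `MemoryStore`, its methods, and the file layout are assumptions for illustration, not a description of how Hermes Agent actually stores anything.

```python
import json
import tempfile
from pathlib import Path


class MemoryStore:
    """Durable facts in a small JSON file (the layout is an assumption)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        # Read at the start of every turn; a missing file means no facts yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key: str, value: str) -> None:
        # Called when the agent learns something durable.
        facts = self.load()
        facts[key] = value
        self._save(facts)

    def forget(self, key: str) -> None:
        # Called when a fact stops being true.
        facts = self.load()
        facts.pop(key, None)
        self._save(facts)

    def _save(self, facts: dict[str, str]) -> None:
        text = json.dumps(facts, indent=2, sort_keys=True)
        self.path.write_text(text, encoding="utf-8")

    def as_context(self) -> str:
        # Render the facts as lines that can be placed ahead of the request.
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.load().items()))


# A throwaway directory stands in for wherever the real file would live.
store = MemoryStore(Path(tempfile.mkdtemp()) / "memory.json")
store.remember("container_runtime", "OrbStack")
store.remember("blog_deploy", "bind-mount, no build step")
store.remember("current_project", "dashboard")
store.forget("current_project")  # pruned once it is no longer true
print(store.as_context())
```

The part that matters is that `remember` and `forget` are calls the agent makes itself, so the file tracks the machine without anyone editing it by hand.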

The second is the user profile. Same mechanism, different content. Who I am, how I write, what I find irritating, what register I want my drafts in. This is the file that prevents the agent from writing the word "obsession" in a draft for me, because two months ago I corrected it once and the correction took.
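One way a correction like that could be made to stick, sketched under the assumption that the profile keeps a set of words to avoid and checks drafts against it. `UserProfile`, `record_correction`, and `violations` are hypothetical names, and persistence to disk is left out to keep the example short.

```python
import re
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Preferences that outlive a session (fields are illustrative assumptions)."""

    register: str = "neutral"
    avoid_words: set[str] = field(default_factory=set)

    def record_correction(self, word: str) -> None:
        # One correction is enough; the word is stored in lowercase.
        self.avoid_words.add(word.lower())

    def violations(self, draft: str) -> list[str]:
        # Whole-word match, so "session" does not trip on "obsession".
        words = set(re.findall(r"[a-z']+", draft.lower()))
        return sorted(words & self.avoid_words)


profile = UserProfile(register="brusque")
profile.record_correction("obsession")

draft = "My latest obsession is session search."
print(profile.violations(draft))
```

A real agent would more likely put the avoid list into the prompt than lint the draft afterwards. Either way the correction has to be stored somewhere outside the conversation.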

The third is session search. Every past conversation is indexed and queryable. When I say "what did we decide last week about the dashboard," the agent does not stare blankly and ask me to recap. It searches the transcript, summarizes the relevant session, and proceeds. The memory is not in the agent's head; the memory is in the substrate the agent runs on. The agent is the read head.
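A toy version of that lookup, assuming transcripts are plain strings and relevance is just a count of matching query terms. A real index would use proper full-text or embedding search. The `Session` type, the stop-word list, and the sample transcripts, including their dates and contents, are invented for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass

# A tiny stop-word list so filler words in the question do not match everything.
STOP_WORDS = {"a", "about", "did", "last", "of", "the", "to", "we", "week", "what"}


@dataclass(frozen=True)
class Session:
    date: str
    transcript: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def search_sessions(query: str, sessions: list[Session], limit: int = 3) -> list[Session]:
    """Rank past sessions by how often they mention the query's content words."""
    terms = set(tokenize(query)) - STOP_WORDS
    scored = []
    for session in sessions:
        counts = Counter(tokenize(session.transcript))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, session))
    # Sort on the score only; ties keep their original (chronological) order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [session for _, session in scored[:limit]]


# Invented transcripts standing in for an indexed history.
SESSIONS = [
    Session("2026-05-04", "We decided the dashboard will ship as a static page."),
    Session("2026-05-05", "Fixed the launchd job by pointing it at the Homebrew Python."),
]

hits = search_sessions("what did we decide last week about the dashboard", SESSIONS)
print([s.date for s in hits])
```

The matching is crude ("decide" does not match "decided"), but the shape is what matters: the question goes to the transcript store, not to the user.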

Skills sit on top of all three. The skill says how to do the work. Memory says for whom and in what environment. Session search says what came before. Take any one of these away and the output degrades — not catastrophically, but in the small annoying ways that compound into "AI is fine but I still have to babysit it."
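The layering can be sketched as a single function that stacks the four sources ahead of the request. The section titles and the `assemble_context` signature are assumptions chosen for readability, not a format any agent requires.

```python
def assemble_context(
    skill: str,
    memory: list[str],
    profile: list[str],
    prior: list[str],
    request: str,
) -> str:
    """Stack procedure, environment, user, and history ahead of the request."""
    sections = [
        ("Procedure", [skill]),
        ("Environment", memory),
        ("User", profile),
        ("Prior sessions", prior),
        ("Request", [request]),
    ]
    parts = []
    for title, lines in sections:
        if lines:  # an empty layer simply drops out of the prompt
            parts.append(f"{title}:\n" + "\n".join(lines))
    return "\n\n".join(parts)


full = assemble_context(
    skill="Interview the user before writing a PRD.",
    memory=["container runtime: OrbStack"],
    profile=["register: brusque"],
    prior=["Monday: agreed on the dashboard scope."],  # invented example entry
    request="Draft the PRD for the dashboard.",
)
skills_only = assemble_context(
    skill="Interview the user before writing a PRD.",
    memory=[],
    profile=[],
    prior=[],
    request="Draft the PRD for the dashboard.",
)
print(full)
```

Comparing `full` with `skills_only` shows what goes missing when a layer is absent: the procedure and the request survive, and everything about who is asking and what already happened is gone.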

The Test That Tells You Which One You Have

Here is the heuristic. Open a new conversation with your agent on a Wednesday. Ask it to do something you also asked for on Monday. Do not remind it.

If it asks you the same five questions Monday's session asked, you have skills without memory. The procedure is good. The substrate is missing.

If it says "I see we worked on this Monday and concluded X; want me to extend that or revise the approach?" — you have memory.

The difference is not how well-written your skills are. The difference is whether anything persists between the Monday session and the Wednesday session. Most current agent setups, including the most-discussed ones in the public conversation right now, do not persist anything between sessions. The usual recommendation is to write more skills, longer skills, better-organized skills. That is a real improvement, but it is also, structurally, the same response a programmer would give if you told them their function had no return statement: write a better function. You can write a much better function. It still won't return anything.
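The Monday/Wednesday test reduces to one branch: is there a stored conclusion for this topic or not. A deliberately small sketch, with `first_reply` and the dictionary of past conclusions as hypothetical stand-ins for whatever the memory layer actually returns:

```python
def first_reply(topic: str, past_conclusions: dict[str, str]) -> str:
    """Return the agent's opening line, depending on what has persisted."""
    conclusion = past_conclusions.get(topic)
    if conclusion is None:
        # Skills without memory: the interview starts over.
        return f"Before I start on {topic}: what is the goal, and what is out of scope?"
    # Memory present: pick up where the earlier session stopped.
    return (
        f"I see we worked on {topic} Monday and concluded {conclusion}; "
        "want me to extend that or revise the approach?"
    )


without_memory = first_reply("the dashboard", {})
with_memory = first_reply("the dashboard", {"the dashboard": "X"})
print(without_memory)
print(with_memory)
```

The skill text is identical in both calls. Only the second argument changes, and that argument is exactly what a skills-only stack has no place to keep.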

Why This Is Easy to Miss

The reason the skills-only theory feels complete is that for any individual task, it works. You sit down, you load the skill, you do the work, the work is good. The failure is not in the task. The failure is across tasks, across days, across the slow accumulation of preference and context that, in a human collaborator, would have made them gradually become better at working with you specifically. That accumulation does not happen in a skills-only stack. Every Wednesday is the first Wednesday.

Engineers who pair-program with the same colleague for two years end up so synchronized that they finish each other's commits. Engineers who pair-program with a memoryless model for two years end up with a folder full of skills and a colleague who still calls them by the wrong name.

I do not think the skills-only stack is wrong. I think it is a local maximum. The people on it are getting real value, and they are also unconsciously selecting against tasks where memory would have mattered, because they have learned that those tasks "don't work well with AI yet." The tasks would work fine. The runtime is the bottleneck.

What I'd Build If I Were Recommending One Thing

If I were writing a five-skill kit today and could only add one feature underneath it, it would not be a sixth skill. It would be a memory store the agent reads on every turn and writes to whenever it learns something durable. Plus a session search that runs over every past conversation by default. Skills are the procedure. These are the substrate. You can have great procedure on bad substrate and the output will still feel thin, because nothing accumulates.

The best procedure in the world, executed against a blank slate every day, produces a Bash script with very good vibes. Build the substrate. Then write the skills.

That's the post.