LLM caretaker
A common practice in the last year has been using LLMs to write code. For this purpose, multiple ways of feeding data to the model have been developed. Probably the most common one is to copy and paste exactly what is failing or what you need, but this misses a lot of information. A more complete approach is to feed the whole code, but even that omits important information. A convenient option is to use applications like Cursor or adaptations of VS Code, but again, essential information is missing.
What is missing? The environment. A project is not just code: it is the code, its evolution over time, the user stories that are pending, and the actual purpose of the project. All of these things compose an environment. I could give you a function named foo and ask you to fix it, but if you don’t know what the function is supposed to do, you will not be able to fix it; or even worse, you will guess wrongly what it is doing. On the other hand, if I gave you the function foo, told you what the user requirement is and what the last changes were, you could probably spot the mistake.
In this post, I will explain how I set up a complete (and sometimes overkill) environment so that the LLM can implement my user stories correctly.
In my case, I just feed the whole thing through the prompt, but this is applicable to any procedure you use. It is viable nowadays because of the huge context windows of LLMs.
Recently, MCP has gone viral for allowing users to connect Claude, or any LLM, to their computer and use the console. The content of this post is applicable to that scenario, but I highly discourage the use of this tool. All it takes to destroy a codebase or a computer is one wrong command. It is important to keep a wall between the hallucinations of the LLM and any real system.
Setting up the environment
LLMs are not magic: they take an input and give an output. As in all AI situations, the better the input, the better the output. Lack of information, or the wrong assumption that the LLM will be able to infer some common-sense ideas, will, sadly, disappoint you. So you, yes you, as the LLM caretaker, must prepare the environment for the LLM to work properly. Note that, since we will usually prompt the LLM to do things for us, the more things are stored in memory, the fewer things we need to explain to the LLM later, so the best thing you can do is provide as much useful non-redundant information as possible. “Useful non-redundant” is the key phrase here: the input must be enough to cover the knowledge that the LLM needs to perform the task, but no more than that, because an excess of information will, first, make the whole process slower; second, consume more tokens, which are expensive; and third, make the LLM more prone to hallucinations.
As with everything in life, the end result depends mostly on the situation you are in. I will give you some ideas of what might be useful information, but again, you are the one who knows what you need. If any of these elements grows too big, you can always crop it to keep only the most recent or most relevant information.
First step, creating the infrastructure
How is the code organized? What depends on what? These are common questions in software generally.
Before jumping into the information that will be stored, you need to create the infrastructure that will hold it. I have found it highly convenient to just use regular folders and files, stored in a folder named “.context”. With this approach, the whole project can be fed directly to an LLM and it will have the context of the whole codebase.
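As a minimal sketch, the folder can be scaffolded with a short script. The file names are the ones used throughout this post, except “architecture.context”, which is my own addition for the diagrams; adapt the list to your project:

```python
from pathlib import Path

# File names used in this post; "architecture.context" is my own addition.
CONTEXT_FILES = [
    "purpose.context",       # what the project and each of its parts is for
    "architecture.context",  # Mermaid diagrams of the codebase
    "style.context",         # formatting rules and personal patterns
    "changelog.context",     # version history, generated from git
]

def scaffold_context(root: str = ".") -> Path:
    """Create the .context folder with empty placeholder files."""
    context_dir = Path(root) / ".context"
    context_dir.mkdir(exist_ok=True)
    for name in CONTEXT_FILES:
        (context_dir / name).touch()
    return context_dir
```

Running `scaffold_context()` once at the project root gives you the skeleton; the rest of this post is about what to put inside each file.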
The folder might contain the following elements, which I have found generally useful:
Present the purpose of the project
Writing a compiler has a different purpose than writing a web app, and the architecture of each is different. A less extreme example: one web app might require millisecond response times, while another can afford a latency of one second. These kinds of details guide the development and how the code is modeled.
The context MUST include the purpose of the project. To do so, I create a file named “purpose.context” that contains it. Furthermore, if your application has 5 different pages, each with a different purpose, these should also be included in the purpose file. This documentation can sometimes be generated automatically from comments in the code, as with Javadoc in Java, but this is not always the case and you might need to write it manually.
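When the codebase already documents itself, part of this file can be generated instead of written by hand. A sketch for a Python project, using module docstrings as the source (the output format here is my own assumption):

```python
import ast
from pathlib import Path

def collect_purposes(src_root: str) -> str:
    """Summarize each module's purpose from its top-level docstring."""
    entries = []
    for path in sorted(Path(src_root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        doc = ast.get_docstring(tree)
        if doc:
            # The first docstring line is usually the one-sentence purpose.
            entries.append(f"{path.name}: {doc.splitlines()[0]}")
    return "\n".join(entries)
```

The result of `collect_purposes("src")` can be pasted into “purpose.context” as a starting point and then refined manually.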
Present the architecture of the project
One of the old forgotten heroes of software development is the diagram. Class diagrams, activity diagrams, infrastructure diagrams, literally any of them. Having a visualization of something as abstract as code is the greatest long-term investment anyone can make. This applies to LLMs too. Telling the LLM that a request to a microservice calls another internal microservice that you can also modify is a great way to avoid confusion. Another example is knowing how many functions call your class, from where, or even when.
Creating such diagrams is a quite common task when documenting software. For instance, the Unified Modeling Language (UML) is a standard for creating them. It defines a standard for:
- Class diagram
- Deployment diagram
- Use case diagram
- Activity diagram
- Object diagram
- State diagram
- Sequence diagram
Each one of these represents something different. The most relevant ones are the class diagram, the use case diagram, and the activity/sequence diagram. The others are good if you need to model something specific, but I never use them for LLMs.
To create those diagrams, I highly suggest using a code-to-diagram tool. It has the exceptional advantage of, first, allowing you to modify the elements of the diagram by modifying the code, and second, therefore allowing the LLM to modify the diagram directly. You also avoid having to work with images, skipping an error-prone and token-consuming step.
I recommend Mermaid for creating the diagrams; it is an extremely popular tool that supports all the diagrams mentioned above. You can check examples in its official documentation and examples page.
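Since Mermaid diagrams are plain text, they can live next to the rest of the context where the LLM can both read and rewrite them. A sketch (the classes here are hypothetical, just to illustrate the class-diagram syntax):

```python
from pathlib import Path

# Hypothetical classes, only to illustrate Mermaid's class-diagram syntax.
CLASS_DIAGRAM = """\
classDiagram
    class OrderService {
        +place_order(cart) Order
    }
    class PaymentGateway {
        +charge(amount) Receipt
    }
    OrderService --> PaymentGateway : uses
"""

def write_diagram(context_dir: str = ".context") -> Path:
    """Store the diagram as text so the LLM can both read and edit it."""
    target = Path(context_dir) / "architecture.context"
    target.parent.mkdir(exist_ok=True)
    target.write_text(CLASS_DIAGRAM, encoding="utf-8")
    return target
```

When the LLM changes the architecture, asking it to update this text block keeps diagram and code in sync without ever touching an image.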
Keep a debug history database
Define your own patterns
Each company and person likes their code a certain way. For the company, it is usually guidelines indicating things such as how to format the code, how many spaces to use, or which libraries to avoid. For the person, as the developer grows into an expert, they start to have their own preferences, cultivated from experience and errors. For instance, I never use ternary operators, never. What benefit do you get from replacing a 4-line, clearly explained if/else with a 1-line, language-specific ternary operator? Or, for example, I never use a negated condition in an if; the if branch is always the positive one.
Those patterns are explained, again, in a separate file, “style.context”. This file tells the LLM how to format the code and how to write it. It can be written as plain text or as code templates, but it must start with “This file contains common patterns used in the development of the application and the style to be used”, since “style.context” might otherwise be misunderstood as, say, the CSS of the context.
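A sketch of what generating this file can look like, using the rules from this section as content (the exact rules are examples; the header sentence is the part that matters):

```python
from pathlib import Path

# The header sentence matters: it tells the LLM what this file is.
STYLE_RULES = """\
This file contains common patterns used in the development of the \
application and the style to be used.

- Never use ternary operators; write an explicit if/else.
- Never negate an if condition; the if branch is always the positive one.
"""

def write_style(context_dir: str = ".context") -> Path:
    """Write the style rules next to the rest of the context."""
    target = Path(context_dir) / "style.context"
    target.parent.mkdir(exist_ok=True)
    target.write_text(STYLE_RULES, encoding="utf-8")
    return target
```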
Add cheatsheets for common tasks
Maintain a changelog
It is a good practice, not only for LLMs but for any software development, to keep a changelog. There are different ways to handle this information. Generally, I like to generate the changelog directly from the git history and tweak it a little manually when needed.
As general guidelines for the changelog, I store it in a single file named “changelog.context”. Within this file, for each version, minor or major, the relevant changes are shown, specifically things such as bug fixes, new features, and most importantly, deprecations and breaking changes.
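A sketch of the git-to-changelog step. I assume here that commits carry a type prefix (`feat:`, `fix:`, ...) in the style of Conventional Commits, fed in as one-line messages from something like `git log --oneline`; the mapping to sections is my own:

```python
# Map commit prefixes to changelog sections (my own convention).
SECTIONS = {"feat": "Added", "fix": "Fixed", "sec": "Security"}

def changelog_entry(version: str, date: str, commits: list) -> str:
    """Render one version entry from one-line commit messages."""
    grouped = {}
    for message in commits:
        prefix, _, rest = message.partition(":")
        section = SECTIONS.get(prefix.strip(), "Miscellaneous")
        grouped.setdefault(section, []).append(rest.strip() or message)
    lines = [f"## [{version}] - {date}"]
    for section, items in grouped.items():
        lines.append(f"### {section}")
        lines += [f"- {item}" for item in items]
    return "\n".join(lines)
```

Entries rendered this way can be prepended to “changelog.context”, newest version first, and tweaked by hand afterwards.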
The website https://keepachangelog.com/ has an excellent explanation of how a changelog should be written and what it should contain. I will cover a little of it here, but I highly recommend reading the full article; it takes around 5 minutes.
You might note that it starts by saying that the changelog is for humans, not machines. Well, not anymore.
If you follow these 6 bullet points, you will have a great, LLM-friendly changelog:
- Mention whether you follow Semantic Versioning.
- There should be an entry for every single version.
- The same types of changes should be grouped; specifically, a good partition of changes is this one:
- ‘Added’ for new features.
- ‘Changed’ for changes in existing functionality.
- ‘Deprecated’ for soon-to-be removed features.
- ‘Removed’ for now removed features.
- ‘Fixed’ for any bug fixes.
- ‘Security’ in case of vulnerabilities.
- ‘Miscellaneous’ for any other changes.
- Versions and sections should be linkable.
- The latest version comes first.
- The release date of each version is displayed.
Add LLM-friendly documentation
Second step, preparing the LLM
Large Language Models are like real workers: each one is more skilled at some specific tasks than others, and each has different characteristics such as reasoning capabilities, creativity, deep research, or the size of the context it can handle.
In my case, I have tried Gemini, Claude, GPT and DeepSeek, and they differ not only in capabilities but also in style. For instance, Gemini is the most verbose one, has the most recent data and has deep research capabilities; furthermore, it has a huge context window, usually allowing you to feed it the whole codebase. Claude, on the other hand, has recently released Sonnet 3.7, which has a whole different style; I have seen that it performs exceptionally well in web development tasks. Regarding DeepSeek, it is the cheapest one, its weights are public (I am not sure if they are open-source), allowing you to run it locally, and it has a good balance between cost and performance. Finally, GPT: I am not a fan after 2023, but it is still the most used one since it was the first to market. As an extra possibility, they can be combined; for example, DeepSeek can be used to generate the thinking process, and the reasoning it produces is fed to another LLM. Again, the best one for you will depend on your specific situation.
After choosing the LLM, you need to make it think it is a real worker. This might sound strange, but asking an LLM for exactly what you need is the most effective approach. Again, one important part of the process is the context, and this context can include the role the LLM has. For example, if you tell an LLM that it is a poetry writer and then ask it for code generation, it might tell you that it has no idea about computers, or it might give you code in which the variable names rhyme; this is obviously not the desired result. Instead, you want a software engineer, a programmer, or a UI/UX designer. With such a role, the results will be much more accurate for code. This context can be fed in different ways:
- Include it in the prompt
- Include it in the context files, such as in role.txt
- For Claude, you can create a project, which is their way to aggregate multiple conversations under the same context.
Given those options, make sure that you only feed the context when it is needed. For instance, it is always needed at the start of a conversation. Also, if a big change is made in the middle of a conversation, you can feed the LLM only that change.
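The file-based option can be as simple as concatenating whatever is in “.context” for the first message of a conversation. A minimal sketch, assuming the file layout from earlier in this post:

```python
from pathlib import Path

def initial_context(context_dir: str = ".context") -> str:
    """Concatenate every context file into one block for the first message."""
    parts = []
    for path in sorted(Path(context_dir).glob("*.context")):
        # Label each section so the LLM knows which file it came from.
        parts.append(f"--- {path.name} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Later messages in the same conversation then carry only the task, or the single file that changed.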
The prompt
An LLM is a black box: it learns and answers, but it is not (generally) possible to know which inner mechanics of the LLM relate to which actions. Therefore, here is the prompt I use to make the LLM think it is a real worker, split into parts:
You are a Software Engineer graduated in Computer Science from MIT (1). You have 10 years of experience working as {EXPERIENCE} (2). You are working on a project involving {PROJECT} (3). You have to do the following task: {TASKS} (4). You will be compensated for your work. You have access to the codebase attached to these instructions (5).
- (1) It studied at a great place with great developers, such as MIT.
- (2) It has a lot of experience working in whatever field you need. Generally, I like to give it a Fullstack role rather than two separate Frontend and Backend roles, since this way it can do everything.
- (3) The project should generally be in a big organization, such as a big company or a big open source project. This way, it has a lot of context to work with. For instance, if we put Google as the project, it might consider their guidelines.
- (4) The tasks are the things that you want done. You must experiment with this, but the better they are explained, the better it will work.
- (5) Copy and paste the whole codebase directly. For this I use code2prompt.
Now, this is VERY IMPORTANT:
- You MUST tweak the prompt to fit your situation.
- The tweak process is not automatic.
- A general prompt will always be worse than a specific one.
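The template with its placeholders made explicit, as a sketch (the wording follows this post; tweak it to your situation as said above):

```python
# Wording taken from the template in this post; adjust it to your situation.
PROMPT_TEMPLATE = (
    "You are a Software Engineer graduated in Computer Science from MIT. "
    "You have 10 years of experience working as {experience}. "
    "You are working on a project involving {project}. "
    "You have to do the following task: {tasks}. "
    "You have access to the codebase attached to these instructions."
)

def build_prompt(experience: str, project: str, tasks: str) -> str:
    """Fill the role-playing prompt; the codebase itself is appended
    afterwards (e.g. with code2prompt)."""
    return PROMPT_TEMPLATE.format(
        experience=experience, project=project, tasks=tasks
    )
```

For example, `build_prompt("a Fullstack developer", "Google Maps", "fix the login button color")` yields a complete specific prompt, ready for the codebase to be pasted after it.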
AI agents
I am sure that with the previous definition, a whole team of developers can be created using AI agents. This way, multiple LLMs can be put to work on independent tasks. I did not do it. I probably won’t do it. It feels totally overkill to use 5 LLMs to write a CRUD web app, change the color of a button, or implement a new request. But if you want to build a company based on AI agents, you can.
Conclusion
A LLM
Side effects
If the tips above don’t work, at least you will have pretty detailed documentation that a real person can also use to carry out the tasks.