ChatGPT, Gemini, DeepSeek, and Perplexity are already used by over 1.2 billion people worldwide, but the headline number matters less than how quickly the role of these models is changing. In business, an estimated 30–40% of office tasks could be automated with LLMs in the coming years.
That is why today we need to talk not just about LLMs, but about LLM Agents: what they are, how they work, and how they are used in work and business.
The simplest form is a single agent that has a set of tools and works independently, following the ReAct (Reason + Act) cycle. It consists of:
- The brain, usually a foundation model (GPT-4 or Claude 3.5), which is responsible for planning and decision-making.
- Tools, which are a set of APIs (search, calculator, database access, Python interpreter).
- Short-term and long-term memory.
- A system prompt, the instruction that steers and corrects the agent's behavior.
Such an agent runs in a loop, reasoning, calling tools, and observing the results, until it completes the task or hits a step limit.
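Here is a minimal sketch of that cycle in Python. Everything in it is illustrative: `call_llm` stands in for whichever chat-completion API you use, the `tool[input]` / `Finish[answer]` action format is one common convention rather than a standard, and the tools are toy stubs.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; replies with one action."""
    if "Observation:" not in prompt:
        return "search[what is a ReAct agent]"
    return "Finish[answer assembled from the observation]"

tools = {
    "search": lambda q: f"(search results for {q!r})",           # stub API
    "calculator": lambda e: str(eval(e)),  # demo only; unsafe on untrusted input
}

def react_agent(task: str, max_steps: int = 10) -> str:
    scratchpad = f"Task: {task}"                 # short-term memory of the episode
    for _ in range(max_steps):                   # bounded, not truly infinite
        reply = call_llm(scratchpad)
        if reply.startswith("Finish["):          # the agent decided it is done
            return reply[len("Finish["):-1]
        name, arg = reply.split("[", 1)          # e.g. "search[query]"
        observation = tools[name](arg.rstrip("]"))
        scratchpad += f"\nAction: {reply}\nObservation: {observation}"
    return "Stopped: step limit reached"

print(react_agent("explain ReAct agents"))
```

Production frameworks implement this same reason-act-observe loop with far more robust parsing and error handling, but the skeleton is the same.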
A great example is the startup Harvey AI, an assistant for legal research. The agent receives a description of a case, independently searches court databases for relevant decisions, filters thousands of documents, selects those whose circumstances are a close match (around 90% similarity), and drafts arguments. As a result, work that took a junior lawyer 20–30 hours now takes 5 minutes. Allen & Overy has officially announced the rollout of Harvey to its 3,500 lawyers in 43 offices.
For such an AI Agent to understand legal documents, it must be trained and tested on real cases. That requires specialists in the judicial field, because only they can validate the model's ability to interpret information correctly.
This is where companies such as Keymakr come in: they collect, annotate, and verify data together with experts from various fields. The annotation platform matters too, since it determines how quickly and efficiently project work gets done. It is therefore worth choosing a platform with a user-friendly interface, such as Keylabs, which makes it easy to organize and manage large datasets.
The next step is a group of specialized agents that communicate with each other, each with its own "role". They pass work results along the chain and critique each other's actions to achieve the final goal.
They consist of:
- Roles. For example, “Researcher”, “Copywriter”, “Editor”.
- Tools. One agent has access to Google Search, another to Python for calculations, and the third to your CRM API. It depends on the role they play.
- Memory. It is shared so that agents know what their “colleagues” have already done.
- An orchestrator, a special agent or coordination algorithm that decides which agent acts next and controls the quality of task performance.
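To make the division of labor concrete, here is a toy sketch of such a pipeline. The `Agent` class and `orchestrate` function are hypothetical names, `act()` is a stub where a real LLM call would go, and the fixed pipeline order stands in for the smarter routing a real orchestrator would do.

```python
class Agent:
    def __init__(self, role: str, system_prompt: str):
        self.role = role
        self.system_prompt = system_prompt       # fixes the agent's "role"

    def act(self, task: str, shared_memory: list[str]) -> str:
        context = "\n".join(shared_memory)       # what "colleagues" already did
        # placeholder for: call_llm(self.system_prompt, context, task)
        return f"[{self.role}] output for: {task}"

def orchestrate(task: str, pipeline: list[Agent]) -> str:
    shared_memory: list[str] = []                # visible to every agent
    for agent in pipeline:                       # fixed order here; real systems
        result = agent.act(task, shared_memory)  # pick the next agent dynamically
        shared_memory.append(result)             # pass work along the chain
    return shared_memory[-1]

team = [Agent("Researcher", "Gather facts."),
        Agent("Copywriter", "Draft the text."),
        Agent("Editor", "Critique and polish.")]
print(orchestrate("Write a product announcement", team))
```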
A striking example from software engineering is ChatDev. Researchers created a system in which agents simulate the full software development cycle (a Waterfall model). A team of agents (CEO, CTO, CPO, Programmer, Tester, and Designer) creates software products of varying complexity on request in under 7 minutes and for about $1 in API tokens.
Next come agents whose main task is to critique their own work. One agent creates a draft, then another agent reviews it for errors and sends it back for revision until it passes.
They consist of:
- An actor, the module that performs the task.
- An evaluator, a module with a strict prompt that acts as an inspector.
- Verification tools: a code compiler or fact-checker that the agent uses for self-analysis.
- Working memory, where previous unsuccessful attempts are stored so as not to repeat mistakes.
Such agents matter not only for business but also for researchers, because they make it possible to study how agents make decisions. In the study Reflexion: Language Agents with Verbal Reinforcement Learning, the agent is given a hard task: writing Python code (the HumanEval benchmark). The agent writes the code and then runs it in an internal "sandbox" (compiler). If an error occurs, the agent writes a critique of itself. That experience is stored in its short-term memory for the next attempt. As a result, accuracy increased from 80.1% to 91.0% purely through self-reflection cycles.
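A compressed sketch of that loop is below. It is not the paper's implementation: `generate_code` and `reflect` are stubs standing in for the actor and evaluator LLMs (the stub deliberately produces a bug first, then a fix, so the sketch runs end to end), and `exec` plays the role of the sandbox.

```python
import traceback

def generate_code(task: str, reflections: list[str]) -> str:
    # Placeholder for the actor LLM. To keep the sketch runnable, it returns a
    # buggy draft first and a corrected one once a reflection is in memory.
    if not reflections:
        return "def add(a, b):\n    return a - b"   # deliberate bug
    return "def add(a, b):\n    return a + b"

def reflect(task: str, error: str) -> str:
    # Placeholder for the evaluator LLM turning a traceback into a critique.
    return f"Attempt at {task!r} failed: {error.splitlines()[-1]}"

def reflexion_loop(task: str, tests: str, max_trials: int = 4) -> str | None:
    reflections: list[str] = []                     # memory of failed attempts
    for _ in range(max_trials):
        code = generate_code(task, reflections)
        try:
            namespace: dict = {}
            exec(code, namespace)                   # the "sandbox": run the draft
            exec(tests, namespace)                  # assertion-style unit tests
            return code                             # evaluator found no errors
        except Exception:
            reflections.append(reflect(task, traceback.format_exc()))
    return None                                     # trial budget exhausted

print(reflexion_loop("implement add", "assert add(2, 3) == 5"))
```

The key design point is that the critique, not just the raw error, is what goes back into memory: verbal feedback is something the next generation step can actually condition on.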
These are systems that accumulate experience from past interactions and use it for future tasks. They rely on vector databases (the storage layer behind RAG, retrieval-augmented generation) to remember how they solved similar tasks in the past or to capture the user's preferences.

They consist of:
- Short-term memory and long-term memory.
- A ranking mechanism, the system that decides which memories are "relevant" to the current task (semantic search).
- A learning module, the set of rules by which the agent decides what to keep and what to discard.
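Here is a back-of-the-envelope sketch of that ranking mechanism: memories stored as vectors and retrieved by cosine similarity. The `embed` function is a placeholder; the toy letter-frequency vector only exists so the sketch runs, and in practice you would swap in a real embedding model and a vector database.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder for a real embedding model: a toy letter-frequency vector
    # so the sketch runs end to end.
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self) -> None:
        self.memories: list[tuple[str, list[float]]] = []   # long-term memory

    def remember(self, text: str) -> None:
        self.memories.append((text, embed(text)))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)                                    # rank by similarity
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.remember("User prefers short, bullet-point answers")
store.remember("Solved a similar CSV-parsing task with pandas last week")
print(store.recall("how did we parse that CSV file?", k=1))
```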
The study “Generative Agents: Interactive Simulacra of Human Behavior” demonstrated that AI can have a “social memory” and build complex relationships. As in the game The Sims, 25 AI agents lived in a virtual town. Each had its own role (doctor, artist, shopkeeper), plans for the day, and memories. When they met, they recalled past conversations thanks to the memory archive. For example, one agent decided to throw a Valentine’s Day party. It told a few friends about it. Over the course of the day, those friends spread the word, invited others, and in the end the agents gathered at the right time and place on their own.
Finally, there are AI agents that act through physical or digital interfaces rather than text alone. They control robots, or they “see” a computer screen through cameras, sensors, or screen captures and click the mouse like a human.
They consist of:
- The brain (LLM/VLM), a large language or multimodal model that understands visual context.
- The body (hardware/interface), a robotic arm, a humanoid robot, or a software agent that controls your desktop.
- Sensors: cameras (vision), microphones (hearing), and lidars (distance).
- A policy/skill library, the movements or commands the agent can use (e.g., opening a door or clicking the search icon).
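All four components meet in a sense-think-act loop. The sketch below simulates it with stand-ins: `observe`, `decide`, and `execute` are placeholders for a camera or screen capture, a vision-language model, and a robot SDK or GUI-automation layer, and the `WORLD` list is a toy stand-in for the physical scene.

```python
WORLD = ["apple", "cup", "wrappers"]              # toy world state

SKILLS = ["pick_up(object)", "hand_over()", "open(container)", "click(element)"]

def observe() -> str:
    # Placeholder for a camera frame or screenshot; a caption stands in for pixels.
    return "table with: " + ", ".join(WORLD)

def decide(scene: str, goal: str, skills: list[str]) -> str:
    # Placeholder for the VLM that grounds the goal in what it "sees".
    if "eat" in goal and "apple" in scene:
        return "pick_up(apple)"
    return "done()"                               # nothing left to do

def execute(skill_call: str) -> None:
    # Placeholder for the robot SDK or GUI-automation layer.
    print("executing:", skill_call)
    if skill_call == "pick_up(apple)":
        WORLD.remove("apple")                     # acting changes the world

def embodied_loop(goal: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        frame = observe()                         # sensors
        action = decide(frame, goal, SKILLS)      # brain
        if action == "done()":
            return
        execute(action)                           # body

embodied_loop("find me something to eat")
```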
Proof that this works is the startup Figure, which combined its hardware with “brains” from OpenAI. In one of its videos, a robot is given a pile of objects on a table and asked to find something to eat. The robot doesn’t just follow a command: it sees an apple, realizes it is the only edible thing there, hands it to the person, and explains why as it does so. When asked to take out the trash, it talks and puts the wrappers in the garbage at the same time.
We can see how quickly the world of AI has changed: robots once had to be programmed for every movement, and now they understand natural language and visual context. Chatbots are no longer just text guides; they are AI Agents that understand you and take action themselves. They do not make decisions for a person; they help a person get to a result faster. That is why they have been integrated into our lives so quickly.