# AIOS: AI Agent Operating System

AIOS is an AI Agent Operating System that embeds large language models (LLMs) directly into an operating-system-like kernel to provide scheduling, memory management, storage management, and tool management for LLM-based AI agents. It is composed of two components: the **AIOS Kernel** (this repository) and the **AIOS SDK** ([Cerebrum](https://github.com/agiresearch/Cerebrum)). The kernel exposes a FastAPI HTTP server, manages all resource subsystems, and supports multiple deployment modes — local single-machine, remote kernel, and dev-mode — enabling agents to run on resource-constrained devices while the heavy kernel runs on a powerful server.

The AIOS kernel handles the full lifecycle of agent execution. It initialises LLM adapters that load-balance across multiple backends (OpenAI, Anthropic, Gemini, Groq, HuggingFace, Ollama, vLLM, Novita), applies FIFO or Round-Robin scheduling to serialise and batch syscalls, persists files through the LSFS semantic filesystem, and stores and retrieves agent memories through pluggable providers (in-house ChromaDB/Qdrant, Mem0, Zep). A Model Context Protocol (MCP) server enables computer-use agents to interact with virtual desktop environments. Every interaction from an agent is expressed as a typed syscall (LLM, Storage, Memory, Tool) routed through `SyscallExecutor.execute_request`, so all requests pass through a single dispatch point that can be scheduled and logged uniformly.

---

## Configuration — `aios/config/config.yaml`

All kernel behaviour is driven by a single YAML file. It specifies LLM backends, memory provider, storage root, scheduler log mode, agent factory concurrency, and server host/port.

```yaml
# aios/config/config.yaml

api_keys:
  openai: "sk-..."
  anthropic: "your-anthropic-key"
  groq: "your-groq-key"
  gemini: "your-gemini-key"
  huggingface:
    auth_token: "hf-..."
    cache_dir: "/tmp/hf_cache"
  novita: "your-novita-key"

llms:
  models:
    - name: "gpt-4o-mini"
      backend: "openai"

    - name: "qwen3:4b"
      backend: "ollama"
      hostname: "http://localhost:11434"

    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1"

    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "huggingface"
      max_gpu_memory: {0: "24GB"}
      eval_device: "cuda:0"

  router:
    strategy: "sequential"   # or "smart"
  log_mode: "console"
  use_context_manager: false  # set true to enable RR scheduler + context switching

memory:
  provider: "in-house"        # "in-house" | "mem0" | "zep"
  auto_extract: false         # save conversation turns as memories
  auto_inject: false          # inject relevant memories before each LLM call
  relevance_threshold: 0.5
  max_injected_memories: 5
  max_memory_tokens: 1500

  mem0:                       # used when provider: "mem0"
    user_id: "default"
    llm:
      provider: "ollama"
      config: {model: "qwen2.5:7b", ollama_base_url: "http://localhost:11434"}
    embedder:
      provider: "ollama"
      config: {model: "nomic-embed-text", ollama_base_url: "http://localhost:11434"}
    vector_store:
      provider: "chroma"
      config: {collection_name: "mem0_memories"}

  zep:                        # used when provider: "zep"
    api_key: "your-zep-api-key"
    session_id: "default"

storage:
  root_dir: "root"
  use_vector_db: true

scheduler:
  log_mode: "console"

agent_factory:
  log_mode: "console"
  max_workers: 64

server:
  host: "0.0.0.0"
  port: 8000
```
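Before launching, it can help to sanity-check the parsed configuration. The sketch below is not part of AIOS: `validate_config` is a hypothetical helper, and its allowed-value sets are taken only from the backends and memory providers named in this document. It expects the YAML to have been loaded into a plain dict already.

```python
# Hypothetical helper, not part of AIOS: checks a parsed config.yaml dict.
KNOWN_BACKENDS = {"openai", "anthropic", "gemini", "groq",
                  "huggingface", "ollama", "vllm", "novita"}
KNOWN_MEMORY_PROVIDERS = {"in-house", "mem0", "zep"}


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for i, model in enumerate(cfg.get("llms", {}).get("models", [])):
        if "name" not in model:
            problems.append(f"llms.models[{i}]: missing 'name'")
        backend = model.get("backend")
        if backend not in KNOWN_BACKENDS:
            problems.append(f"llms.models[{i}]: unknown backend {backend!r}")
    provider = cfg.get("memory", {}).get("provider", "in-house")
    if provider not in KNOWN_MEMORY_PROVIDERS:
        problems.append(f"memory.provider: unknown provider {provider!r}")
    port = cfg.get("server", {}).get("port", 8000)
    if not (isinstance(port, int) and 0 < port < 65536):
        problems.append(f"server.port: invalid port {port!r}")
    return problems


cfg = {
    "llms": {"models": [
        {"name": "gpt-4o-mini", "backend": "openai"},
        {"name": "qwen3:4b", "backend": "olama"},  # deliberate typo
    ]},
    "memory": {"provider": "in-house"},
    "server": {"host": "0.0.0.0", "port": 8000},
}
for problem in validate_config(cfg):
    print(problem)  # llms.models[1]: unknown backend 'olama'
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfiguration in a single pass.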

---

## `ConfigManager` — singleton configuration accessor

`ConfigManager` (singleton) loads and persists `aios/config/config.yaml`, exposing typed getters for each subsystem. The global `config` instance is imported across the kernel.

```python
from aios.config.config_manager import config   # global singleton

# --- Read sections ---
llms_cfg   = config.get_llms_config()       # -> dict with "models", "router", "log_mode"
storage    = config.get_storage_config()    # -> {"root_dir": "root", "use_vector_db": True}
memory     = config.get_memory_config()     # -> {"provider": "in-house", ...}
scheduler  = config.get_scheduler_config()  # -> {"log_mode": "console"}
factory    = config.get_agent_factory_config()  # -> {"log_mode": "console", "max_workers": 64}
server     = config.get_server_config()     # -> {"host": "0.0.0.0", "port": 8000}

# --- Update and persist ---
config.update_api_key("openai", "sk-new-key")   # writes config.yaml immediately
config.update_llm_config("claude-3-opus", "anthropic")

# --- Get an API key (config.yaml first, then env var fallback) ---
key = config.get_api_key("openai")   # returns None if missing
if key:
    print("OpenAI key found")

# --- Hot-reload without restarting ---
config.refresh()   # re-reads config.yaml into memory
```
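The "config.yaml first, then env var fallback" order used by `get_api_key` can be illustrated with a small standalone function. This is a sketch, not the `ConfigManager` implementation: `lookup_api_key` is hypothetical, and the `<PROVIDER>_API_KEY` environment variable naming is an assumption.

```python
import os


def lookup_api_key(cfg: dict, provider: str, environ=os.environ):
    """Sketch of a two-step key lookup: parsed YAML first, environment second.

    Assumption: the env var is named <PROVIDER>_API_KEY (e.g. OPENAI_API_KEY);
    the real ConfigManager may use different names.
    """
    value = cfg.get("api_keys", {}).get(provider)
    if isinstance(value, str) and value:
        return value
    # Empty, missing, or non-string entries fall through to the environment.
    return environ.get(f"{provider.upper()}_API_KEY")  # None if unset


cfg = {"api_keys": {"openai": "sk-from-yaml", "groq": ""}}
env = {"GROQ_API_KEY": "gsk-from-env"}   # stand-in for os.environ

print(lookup_api_key(cfg, "openai", env))  # sk-from-yaml
print(lookup_api_key(cfg, "groq", env))    # gsk-from-env
print(lookup_api_key(cfg, "gemini", env))  # None
```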

---

## Launching the Kernel

The kernel is a FastAPI application served by uvicorn. It initialises all subsystems on startup.

```bash
# Recommended: via the launch script
bash runtime/launch_kernel.sh

# Or directly with uvicorn
python3.10 -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000

# Background (survives shell close)
nohup python3 -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000 > uvicorn.log 2>&1 &
```
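Because subsystems are initialised on startup, a client script may want to wait until the kernel reports ready. The following sketch polls the `/status` endpoint (shown later in this document) using only the standard library. `wait_for_kernel` and `fetch_status` are hypothetical helper names, and the retry count and delay are arbitrary.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_status(base_url: str) -> dict:
    """GET <base_url>/status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=2) as resp:
        return json.load(resp)


def wait_for_kernel(base_url="http://localhost:8000", attempts=30, delay=1.0,
                    fetch=fetch_status) -> bool:
    """Poll /status until it reports "ok" or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(base_url).get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not accepting connections yet, or the body was not JSON
        time.sleep(delay)
    return False


# Usage, once the kernel has been launched in another shell:
#     if not wait_for_kernel():
#         raise SystemExit("AIOS kernel did not become ready")
```

The `fetch` parameter exists so the polling logic can be exercised without a running server.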

---

## `useCore` — initialise LLM adapter

Initialises an `LLMAdapter` that wraps multiple LLM backends under a single load-balanced interface.

```python
from aios.hooks.modules.llm import useCore

llm = useCore(
    llm_configs=[
        {"name": "gpt-4o-mini", "backend": "openai"},
        {"name": "qwen3:4b",    "backend": "ollama", "hostname": "http://localhost:11434"},
    ],
    log_mode="console",
    use_context_manager=False,   # True → RR scheduler with context switching
)
# llm is an LLMAdapter instance
# llm.execute_llm_syscalls([syscall1, syscall2])  # batch execution
```

---

## `LLMAdapter` — multi-backend LLM router

`LLMAdapter` distributes syscall batches across configured backends using either sequential or smart routing. It handles API-key setup, error classification, dynamic Ollama model registration, and parallel batch execution with `ThreadPoolExecutor`.

```python
from aios.llm_core.adapter import LLMAdapter
from cerebrum.llm.apis import LLMQuery

# The adapter is created by useCore(); direct instantiation example:
adapter = LLMAdapter(
    llm_configs=[
        {"name": "gpt-4o-mini", "backend": "openai"},
        {"name": "claude-3-5-sonnet-20241022", "backend": "anthropic"},
    ],
    log_mode="console",
    use_context_manager=False,
)

# adapter.execute_llm_syscalls() is called internally by the scheduler.
# To verify available models:
print(adapter.available_llm_names)
# -> ["gpt-4o-mini", "claude-3-5-sonnet-20241022"]

# Dynamic Ollama registration (if a model is requested but not in config.yaml):
success = adapter._dynamic_register_ollama_model("llama3.2:3b")
print(success)  # True if the model exists on the Ollama server
```
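To make the routing idea concrete, here is a minimal sketch of distributing a batch across models and executing it in parallel with `ThreadPoolExecutor`. It assumes that "sequential" routing means cycling through the configured models in order; `call_backend` and `execute_batch` are hypothetical stand-ins, not `LLMAdapter` methods.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def call_backend(model_name: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM request."""
    return f"[{model_name}] {prompt}"


def execute_batch(model_names, prompts, max_workers=4):
    """Assign prompts to models round-robin, then run them in parallel."""
    # zip() stops when the prompts run out, so cycle() never loops forever.
    assignments = list(zip(cycle(model_names), prompts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, regardless of finish order.
        return list(pool.map(lambda pair: call_backend(*pair), assignments))


results = execute_batch(
    ["gpt-4o-mini", "claude-3-5-sonnet-20241022"],
    ["q1", "q2", "q3"],
)
print(results)
# ['[gpt-4o-mini] q1', '[claude-3-5-sonnet-20241022] q2', '[gpt-4o-mini] q3']
```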

---

## `useFIFOScheduler` / `fifo_scheduler_nonblock` — request scheduler

The FIFO scheduler processes LLM requests in batches and Memory, Storage, and Tool requests one at a time, with each request type served by its own thread. `fifo_scheduler_nonblock` returns the scheduler object so `start()`/`stop()` can be called explicitly (used by the kernel server).

```python
from aios.hooks.modules.scheduler import fifo_scheduler_nonblock, useFIFOScheduler

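# llm_adapter, memory_mgr, storage_mgr and tool_mgr are the objects returned by
# useCore, useMemoryManager, useStorageManager and useToolManager respectively.

# --- Non-blocking (used in runtime/launch.py) ---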
scheduler = fifo_scheduler_nonblock(
    llm=llm_adapter,
    memory_manager=memory_mgr,
    storage_manager=storage_mgr,
    tool_manager=tool_mgr,
    log_mode="console",
    get_llm_syscall=None,      # None → uses global queue
    get_memory_syscall=None,
    get_storage_syscall=None,
    get_tool_syscall=None,
)
scheduler.start()   # spawns 4 background threads
# ... run agents ...
scheduler.stop()    # gracefully terminates threads

# --- Context manager form (for scripts/tests) ---
from aios.hooks.modules.scheduler import fifo_scheduler

with fifo_scheduler(
    llm=llm_adapter,
    memory_manager=memory_mgr,
    storage_manager=storage_mgr,
    tool_manager=tool_mgr,
    log_mode="console",
    get_llm_syscall=None,
    get_memory_syscall=None,
    get_storage_syscall=None,
    get_tool_syscall=None,
):
    # scheduler is running; submit agents here
    pass  # scheduler.stop() called automatically on exit
```
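The scheduling behaviour described above (one worker thread per request type, batching for LLM requests only) can be sketched with the standard `queue` and `threading` modules. `MiniFIFOScheduler` is a toy written for this document, not the AIOS scheduler; the batch size and the short poll timeout are arbitrary choices.

```python
import queue
import threading


class MiniFIFOScheduler:
    """Toy scheduler: one worker thread per queue, FIFO order within each."""

    def __init__(self, handlers, batch_size=4):
        # handlers maps a queue name to a callable taking a list of requests.
        self.handlers = handlers
        self.batch_size = batch_size
        self.queues = {name: queue.Queue() for name in handlers}
        self._stop = threading.Event()
        self._threads = []

    def submit(self, name, request):
        self.queues[name].put(request)

    def _drain(self, q, limit):
        """Wait briefly for one item, then take up to `limit` without waiting."""
        batch = [q.get(timeout=0.05)]
        while len(batch) < limit:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        return batch

    def _worker(self, name):
        # Only the "llm" queue is batched; other queues run one request at a time.
        limit = self.batch_size if name == "llm" else 1
        # Keep running until stop() was requested AND the queue is drained.
        while not (self._stop.is_set() and self.queues[name].empty()):
            try:
                batch = self._drain(self.queues[name], limit)
            except queue.Empty:
                continue
            self.handlers[name](batch)

    def start(self):
        for name in self.handlers:
            t = threading.Thread(target=self._worker, args=(name,), daemon=True)
            t.start()
            self._threads.append(t)

    def stop(self):
        self._stop.set()
        for t in self._threads:
            t.join()


seen = {"llm": [], "storage": []}
sched = MiniFIFOScheduler({
    "llm": lambda batch: seen["llm"].append(list(batch)),
    "storage": lambda batch: seen["storage"].append(list(batch)),
})
for i in range(5):
    sched.submit("llm", f"prompt-{i}")
sched.submit("storage", "create_file")
sched.start()
sched.stop()          # drains whatever is still queued, then joins the threads
print(seen["llm"])    # batches of at most 4, in submission order
```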

---

## `useFactory` — agent factory

`useFactory` returns two callables: `submitAgent` (downloads an agent from the AIOS Hub, instantiates it, and runs it in a thread pool) and `awaitAgentExecution` (polls the `Future` for results).

```python
from aios.hooks.modules.agent import useFactory

submit_agent, await_execution = useFactory(
    log_mode="console",
    max_workers=64,
)

# Submit an agent from the AIOS Hub (format: "author/agent-name")
process_id = submit_agent(
    agent_name="example/MathAgent",
    task_input="Calculate the derivative of x^3 + 2x",
)
print(f"Agent running with process_id={process_id}")

# Poll until done (returns None if still running)
import time

result = await_execution(process_id)
while result is None:
    time.sleep(0.5)   # back off only while the agent is still running
    result = await_execution(process_id)

print("Agent result:", result)
# -> {"output": "3x^2 + 2", "status": "completed"}
```
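The submit and non-blocking-await split can be reproduced in a few lines with `concurrent.futures`. The sketch below is an illustration of the pattern only: `make_factory` is hypothetical, it runs a plain function instead of downloading an agent from the AIOS Hub, and it uses a simple counter for process ids.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def make_factory(max_workers=4):
    """Return (submit, await_execution), mirroring the two-callable shape."""
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {}
    ids = itertools.count(1)

    def submit(fn, *args):
        pid = next(ids)
        futures[pid] = pool.submit(fn, *args)
        return pid

    def await_execution(pid):
        fut = futures[pid]
        if not fut.done():
            return None        # still running; the caller should poll again
        del futures[pid]
        return fut.result()    # re-raises here if the agent raised

    return submit, await_execution


gate = threading.Event()


def slow_agent(task):
    gate.wait()                # simulate long-running work
    return {"output": task.upper(), "status": "completed"}


submit, await_execution = make_factory()
pid = submit(slow_agent, "hello")
first_poll = await_execution(pid)   # None: the agent is still blocked
gate.set()                          # let the agent finish

result = await_execution(pid)
while result is None:
    time.sleep(0.01)
    result = await_execution(pid)
print(first_poll, result)
```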

---

## `useSysCall` / `SyscallExecutor` — unified syscall dispatcher

`useSysCall()` creates and returns a `(execute_request, SyscallWrapper, executor)` triple. `execute_request(agent_name, query)` routes any typed query (LLM, Tool, Memory, Storage) through the appropriate kernel module and returns a dict with `"response"` and timing metrics.

```python
from aios.syscall.syscall import useSysCall
from cerebrum.llm.apis import LLMQuery
from cerebrum.tool.apis import ToolQuery
from cerebrum.memory.apis import MemoryQuery
from cerebrum.storage.apis import StorageQuery

execute_request, SyscallWrapper, executor = useSysCall()

# --- LLM chat ---
llm_result = execute_request(
    "my_agent",
    LLMQuery(
        messages=[{"role": "user", "content": "Summarise the water cycle in 3 bullets."}],
        action_type="chat",
    ),
)
print(llm_result["response"].response_message)

# --- LLM tool call (LLM selects and invokes a tool) ---
tool_result = execute_request(
    "my_agent",
    LLMQuery(
        messages=[{"role": "user", "content": "What is 42 * 17?"}],
        tools=[{"type": "function", "function": {"name": "calculator", "parameters": {}}}],
        action_type="call_tool",
    ),
)
print(tool_result["response"].response_message)

# --- Storage file write ---
storage_result = execute_request(
    "my_agent",
    StorageQuery(
        operation_type="create_file",
        params={"file_path": "notes/todo.txt", "content": "Buy milk"}
    ),
)
print(storage_result["response"].response_message)  # "Operation completed"

# --- Memory add ---
memory_result = execute_request(
    "my_agent",
    MemoryQuery(
        operation_type="add_memory",
        params={
            "content": "The user prefers dark mode.",
            "tags": ["preferences", "ui"],
            "category": "User Preferences",
        }
    ),
)

# --- Memory retrieve ---
retrieved = execute_request(
    "my_agent",
    MemoryQuery(
        operation_type="retrieve_memory",
        params={"content": "UI preferences", "k": 3}
    ),
)
print(retrieved["response"].response_message)
```
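Conceptually, a unified dispatcher maps each query type to a handler and wraps the call with timing. The following sketch shows that idea in isolation. `Query`, `MiniExecutor`, and the `elapsed_seconds` key are hypothetical names chosen for the example; the real `SyscallExecutor` goes through the scheduler queues and may report different metrics.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical stand-in for the typed Cerebrum query classes."""
    query_class: str                      # "llm" | "storage" | "memory" | "tool"
    payload: dict = field(default_factory=dict)


class MiniExecutor:
    """One handler per query class, plus a simple timing metric."""

    def __init__(self):
        self._handlers = {}

    def register(self, query_class, handler):
        self._handlers[query_class] = handler

    def execute_request(self, agent_name, query):
        handler = self._handlers.get(query.query_class)
        if handler is None:
            raise ValueError(f"unsupported query class: {query.query_class!r}")
        start = time.perf_counter()
        response = handler(agent_name, query.payload)
        return {
            "response": response,
            "elapsed_seconds": time.perf_counter() - start,  # assumed metric name
        }


executor = MiniExecutor()
executor.register("storage", lambda agent, p: f"{agent} wrote {p['file_path']}")

out = executor.execute_request(
    "my_agent", Query("storage", {"file_path": "notes/todo.txt"})
)
print(out["response"])  # my_agent wrote notes/todo.txt
```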

---

## `useMemoryManager` — memory subsystem

`useMemoryManager` returns a `MemoryManager` that delegates to a configured provider (in-house, Mem0, Zep). The manager routes `add_memory`, `remove_memory`, `update_memory`, `get_memory`, `retrieve_memory`, and `retrieve_memory_raw` operations.

```python
from aios.hooks.modules.memory import useMemoryManager

# Provider is read from config.yaml (memory.provider)
memory_manager = useMemoryManager(log_mode="console")

# Direct provider interaction (normally done via execute_request syscalls):
# memory_manager.address_request(memory_syscall)
# The provider implements:
#   add_memory(memory_note)
#   get_memory(memory_id)
#   retrieve_memory(query)       <- semantic search
#   update_memory(memory_note)
#   remove_memory(memory_id)
#   retrieve_memory_raw(query)   <- returns MemoryNote objects
```
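A provider with the operations listed above could look roughly like the sketch below. The class names `Note`, `Provider`, and `DictProvider`, the `k` parameter, and the return types are all assumptions made for illustration, and `update_memory` and `retrieve_memory_raw` are omitted for brevity. Retrieval here ranks by simple word overlap, which is a deliberate simplification of the semantic search a real provider performs.

```python
import re
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Note:
    """Hypothetical stand-in for the kernel's MemoryNote."""
    content: str
    tags: list = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Provider(ABC):
    """Assumed provider shape; the real base class may differ."""

    @abstractmethod
    def add_memory(self, note: Note) -> str: ...

    @abstractmethod
    def get_memory(self, memory_id: str): ...

    @abstractmethod
    def retrieve_memory(self, query: str, k: int = 3) -> list: ...

    @abstractmethod
    def remove_memory(self, memory_id: str) -> bool: ...


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


class DictProvider(Provider):
    """In-memory provider that ranks notes by word overlap with the query."""

    def __init__(self):
        self._notes = {}

    def add_memory(self, note):
        self._notes[note.id] = note
        return note.id

    def get_memory(self, memory_id):
        return self._notes.get(memory_id)

    def retrieve_memory(self, query, k=3):
        wanted = _words(query)
        scored = [(len(wanted & _words(n.content)), n) for n in self._notes.values()]
        hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        return [n for _, n in hits[:k]]

    def remove_memory(self, memory_id):
        return self._notes.pop(memory_id, None) is not None


provider = DictProvider()
dark_id = provider.add_memory(Note("The user prefers dark mode.", tags=["ui"]))
provider.add_memory(Note("User is allergic to peanuts.", tags=["health"]))
print([n.content for n in provider.retrieve_memory("dark mode preference")])
# ['The user prefers dark mode.']
```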

**Switch providers at runtime via HTTP:**

```bash
# 1. Edit config.yaml: memory.provider = "mem0"
# 2. Hot-reload without restarting the server:
curl -X POST http://localhost:8000/core/refresh
```

---

## `useStorageManager` — LSFS semantic filesystem

`useStorageManager` initialises a `StorageManager` backed by the LSFS (LLM-based Semantic File System), which supports natural-language file operations and optional vector-database indexing for semantic search.

```python
from aios.hooks.modules.storage import useStorageManager

storage_manager = useStorageManager(
    root_dir="root",      # physical root directory on disk
    use_vector_db=True,   # index files with ChromaDB/Qdrant for semantic search
)

# Operations are dispatched via StorageQuery syscalls through execute_request:
# operation_type: "create_file" | "read_file" | "write_file" |
#                 "delete_file" | "list_files" | "search_files"

# Example via the REST API:
# curl -X POST http://localhost:8000/query \
#   -H "Content-Type: application/json" \
#   -d '{
#     "agent_name": "file_agent",
#     "query_type": "storage",
#     "query_data": {
#       "operation_type": "create_file",
#       "params": {"file_path": "reports/summary.txt", "content": "Q1 results: up 12%"}
#     }
#   }'
```

---

## `useToolManager` — tool execution with MCP backend

`useToolManager` creates a `ToolManager` that loads tools via the Cerebrum `AutoTool` registry and starts a background MCP (Model Context Protocol) server for computer-use agents.

```python
from aios.hooks.modules.tool import useToolManager

tool_manager = useToolManager()
# MCP server is started automatically on init

# Tools are invoked through ToolQuery syscalls:
# execute_request("agent", ToolQuery(tool_calls=[
#     {"name": "org/tool-name", "parameters": {"arg1": "value1"}}
# ]))

# Cleanup (stops the MCP server subprocess):
tool_manager.cleanup()
```
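The `tool_calls` list in the comment above is a sequence of `{"name", "parameters"}` entries. As a rough illustration of how such a list could be dispatched against a registry of callables, consider the sketch below. `run_tool_calls` and the `demo/add` tool are hypothetical; the actual `ToolManager` loads tools through `AutoTool` and may return results in a different shape.

```python
def run_tool_calls(registry: dict, tool_calls: list) -> list:
    """Invoke each requested tool and collect one result entry per call."""
    results = []
    for call in tool_calls:
        name = call["name"]
        tool = registry.get(name)
        if tool is None:
            results.append({"name": name, "error": "unknown tool"})
            continue
        try:
            output = tool(**call.get("parameters", {}))
            results.append({"name": name, "result": output})
        except Exception as exc:  # one failing tool should not abort the batch
            results.append({"name": name, "error": str(exc)})
    return results


registry = {"demo/add": lambda a, b: a + b}   # hypothetical tool
outcome = run_tool_calls(registry, [
    {"name": "demo/add", "parameters": {"a": 2, "b": 3}},
    {"name": "demo/missing", "parameters": {}},
])
print(outcome)
# [{'name': 'demo/add', 'result': 5}, {'name': 'demo/missing', 'error': 'unknown tool'}]
```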

---

## REST API — `/query` endpoint

The primary HTTP endpoint accepts any query type and routes it to the appropriate kernel subsystem.

```bash
# LLM chat
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "llm",
    "query_data": {
      "messages": [{"role": "user", "content": "What is quantum entanglement?"}],
      "action_type": "chat"
    }
  }'
# Response: {"response": {"response_message": "...", "finished": true, "status_code": 200}}

# Tool calling (LLM picks the tool)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "llm",
    "query_data": {
      "messages": [{"role": "user", "content": "Search for the latest AI news."}],
      "tools": [{"type": "function", "function": {"name": "google/search", "parameters": {}}}],
      "action_type": "call_tool"
    }
  }'

# Store a memory
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "memory",
    "query_data": {
      "operation_type": "add_memory",
      "params": {"content": "User is allergic to peanuts.", "tags": ["health"]}
    }
  }'

# Semantic memory search
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "memory",
    "query_data": {
      "operation_type": "retrieve_memory",
      "params": {"content": "food allergies", "k": 3}
    }
  }'
```
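The same requests can be issued from Python without extra dependencies. This sketch builds the three-field envelope (`agent_name`, `query_type`, `query_data`) used in the curl examples and posts it with `urllib`. `build_query` and `post_query` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_query(agent_name: str, query_type: str, query_data: dict) -> bytes:
    """Serialise the request envelope used by the /query endpoint."""
    body = {
        "agent_name": agent_name,
        "query_type": query_type,
        "query_data": query_data,
    }
    return json.dumps(body).encode("utf-8")


def post_query(agent_name, query_type, query_data,
               base_url="http://localhost:8000", timeout=60):
    """POST a query to the kernel and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_query(agent_name, query_type, query_data),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


payload = build_query(
    "demo_agent",
    "memory",
    {"operation_type": "retrieve_memory", "params": {"content": "food allergies", "k": 3}},
)
print(payload.decode("utf-8"))

# With a running kernel:
#     reply = post_query("demo_agent", "llm", {
#         "messages": [{"role": "user", "content": "What is quantum entanglement?"}],
#         "action_type": "chat",
#     })
```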

---

## REST API — Agent lifecycle endpoints

```bash
# Submit an agent from the AIOS Hub
curl -X POST http://localhost:8000/agents/submit \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "example/MathAgent",
    "agent_config": {"task": "Solve: integral of sin(x) from 0 to pi"}
  }'
# Response: {"status": "success", "execution_id": 471823}

# Poll agent status
curl http://localhost:8000/agents/471823/status
# Running:   {"status": "running",    "execution_id": 471823}
# Done:      {"status": "completed",  "result": {...}, "execution_id": 471823}

# List all agent processes
curl http://localhost:8000/agents/ps

# Check kernel health
curl http://localhost:8000/status
# {"status": "ok", "message": "All core components are active."}

# List configured LLMs
curl http://localhost:8000/core/llms/list

# Select specific LLMs for subsequent queries
curl -X POST http://localhost:8000/user/select/llms \
  -H "Content-Type: application/json" \
  -d '[{"name": "gpt-4o-mini", "provider": "openai"}]'

# Hot-reload configuration (e.g., after editing config.yaml)
curl -X POST http://localhost:8000/core/refresh

# Update an API key at runtime
curl -X POST http://localhost:8000/core/config/update \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "api_key": "sk-new-key"}'

# Shut down all components cleanly
curl -X POST http://localhost:8000/core/cleanup
```
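A client typically submits an agent and then polls the status endpoint until the run leaves the `"running"` state. The sketch below implements that loop with a timeout. `wait_for_agent` and `http_get_status` are hypothetical helpers, and treating every status other than `"running"` as terminal is an assumption.

```python
import json
import time
import urllib.request


def http_get_status(execution_id, base_url="http://localhost:8000"):
    """GET /agents/{execution_id}/status and decode the JSON body."""
    url = f"{base_url}/agents/{execution_id}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_agent(execution_id, get_status=http_get_status,
                   interval=0.5, timeout=300.0):
    """Poll until the agent is no longer running, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(execution_id)
        # Assumption: anything other than "running" is a terminal state.
        if status.get("status") != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"agent {execution_id} still running after {timeout}s")
        time.sleep(interval)


# Usage with a running kernel (execution_id comes from POST /agents/submit):
#     final = wait_for_agent(471823)
#     print(final["status"], final.get("result"))
```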

---

## AIOS Terminal — LLM-based semantic file system UI

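`runtime/run_terminal.py` starts an interactive terminal that classifies each natural-language command as a file operation or as chat. Classification is done by a two-stage `IntentRouter`: a keyword heuristic runs first, and an LLM classifier handles inputs the heuristic cannot decide.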

```bash
python runtime/run_terminal.py
# AIOS> create a file called meeting_notes.txt with today's agenda
# AIOS> list all files in the reports folder
# AIOS> hi, what can you help me with?
```

```python
# The IntentRouter classifies each input:
from aios.terminal.intent_router import IntentRouter, build_llm_classify_fn, Intent

router = IntentRouter(
    llm_classify_fn=build_llm_classify_fn("terminal_agent")
)
result = router.classify("delete the old backup files")
print(result.intent)      # Intent.FILE_OPERATION
print(result.confidence)  # Confidence.HIGH
print(result.source)      # "keyword"

result2 = router.classify("hello, how are you?")
print(result2.intent)     # Intent.CHAT
```
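The two-stage idea (a cheap keyword pass, then an LLM only when the keywords are inconclusive) can be sketched independently of the kernel. Everything below is illustrative: the local `Intent` enum is a stand-in rather than the class from `aios.terminal.intent_router`, the keyword sets are invented, and `llm_classify_fn` is any callable that maps text to an `Intent`.

```python
import re
from enum import Enum


class Intent(Enum):
    """Local stand-in, not aios.terminal.intent_router.Intent."""
    FILE_OPERATION = "file_operation"
    CHAT = "chat"


# Hypothetical keyword sets; the real router's heuristics are not shown here.
FILE_VERBS = {"create", "delete", "list", "rename", "move", "read", "write"}
FILE_NOUNS = {"file", "files", "folder", "directory"}


def classify(text: str, llm_classify_fn):
    """Return (intent, source), where source is "keyword" or "llm"."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Stage 1: a file verb together with a file noun is treated as decisive.
    if words & FILE_VERBS and words & FILE_NOUNS:
        return Intent.FILE_OPERATION, "keyword"
    # Stage 2: defer to the (possibly slow) LLM classifier.
    return llm_classify_fn(text), "llm"


def fake_llm(text):
    """Stub that stands in for a real LLM call."""
    return Intent.CHAT


print(classify("delete the old backup files", fake_llm))
print(classify("hello, how are you?", fake_llm))
```

Running the keyword stage first means obvious file commands never incur an LLM round trip.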

---

## Experimental Rust Kernel (`aios-rs`)

An early Rust rewrite that provides trait definitions and minimal placeholder implementations. It does not yet have feature parity with the Python kernel.

```bash
cd aios-rs
cargo build
cargo test
```

```rust
use aios_rs::prelude::*;

fn main() -> anyhow::Result<()> {
    let llm      = std::sync::Arc::new(EchoLLM);
    let memory   = std::sync::Arc::new(std::sync::Mutex::new(InMemoryMemoryManager::new()));
    let storage  = std::sync::Arc::new(FsStorageManager::new("/tmp/aios_store"));
    let tool     = std::sync::Arc::new(NoopToolManager);
    let mut scheduler = NoopScheduler::new(llm, memory, storage, tool);
    scheduler.start()?;
    scheduler.stop()?;
    Ok(())
}
```

---

## Summary

AIOS is used in two primary patterns. In **local agent execution**, a developer installs both the kernel and the Cerebrum SDK on one machine, launches the kernel with `bash runtime/launch_kernel.sh`, and uses the SDK to define agents that issue LLM, Tool, Memory, and Storage syscalls through `execute_request`. In **remote kernel mode**, a powerful server hosts the kernel while lightweight clients (laptops, mobile devices) install only the SDK and interact with the kernel over HTTP. In both modes, agents are submitted via `POST /agents/submit` with an `agent_id` of the form `author/AgentName` (resolved from the AIOS Hub), and results are polled via `GET /agents/{execution_id}/status`.

Integration into existing systems is straightforward: any HTTP client can drive the kernel through the `/query` endpoint, making AIOS embeddable as a managed AI inference and memory backend behind existing applications. Developers can extend the system by implementing the `MemoryProvider` abstract base class for custom memory backends, adding tools to the Cerebrum `AutoTool` registry, or porting components to the Rust scaffold in `aios-rs/` for performance-critical subsystems. The hot-reload endpoint (`POST /core/refresh`) and CLI commands (`aios env set`, `aios refresh`) apply configuration changes without restarting the kernel.