# AIOS: AI Agent Operating System

AIOS is an AI Agent Operating System that embeds large language models (LLMs) directly into an operating-system-like kernel to provide scheduling, memory management, storage management, and tool management for LLM-based AI agents. It is composed of two components: the **AIOS Kernel** (this repository) and the **AIOS SDK** ([Cerebrum](https://github.com/agiresearch/Cerebrum)). The kernel exposes a FastAPI HTTP server, manages all resource subsystems, and supports multiple deployment modes (local single-machine, remote kernel, and dev-mode), so agents can run on resource-constrained devices while the heavy kernel runs on a powerful server.

The AIOS kernel handles the full lifecycle of agent execution: it initialises LLM adapters that load-balance across multiple backends (OpenAI, Anthropic, Gemini, Groq, HuggingFace, Ollama, vLLM, Novita), applies FIFO or Round-Robin scheduling to serialise and batch syscalls, persists files through the LSFS semantic filesystem, and stores/retrieves agent memories through pluggable providers (in-house ChromaDB/Qdrant, Mem0, Zep). A Model Context Protocol (MCP) server enables computer-use agents to interact with virtual desktop environments. Every interaction from an agent is expressed as a typed syscall (LLM, Storage, Memory, Tool) routed through `SyscallExecutor.execute_request`, making the system fully modular and observable.

---

## Configuration — `aios/config/config.yaml`

All kernel behaviour is driven by a single YAML file. It specifies LLM backends, memory provider, storage root, scheduler log mode, agent factory concurrency, and server host/port.

```yaml
# aios/config/config.yaml
api_keys:
  openai: "sk-..."
  anthropic: "your-anthropic-key"
  groq: "your-groq-key"
  gemini: "your-gemini-key"
  huggingface:
    auth_token: "hf-..."
    cache_dir: "/tmp/hf_cache"
  novita: "your-novita-key"

llms:
  models:
    - name: "gpt-4o-mini"
      backend: "openai"
    - name: "qwen3:4b"
      backend: "ollama"
      hostname: "http://localhost:11434"
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1"
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "huggingface"
      max_gpu_memory: {0: "24GB"}
      eval_device: "cuda:0"
  router:
    strategy: "sequential"        # or "smart"
  log_mode: "console"
  use_context_manager: false      # set true to enable RR scheduler + context switching

memory:
  provider: "in-house"            # "in-house" | "mem0" | "zep"
  auto_extract: false             # save conversation turns as memories
  auto_inject: false              # inject relevant memories before each LLM call
  relevance_threshold: 0.5
  max_injected_memories: 5
  max_memory_tokens: 1500
  mem0:                           # used when provider: "mem0"
    user_id: "default"
    llm:
      provider: "ollama"
      config: {model: "qwen2.5:7b", ollama_base_url: "http://localhost:11434"}
    embedder:
      provider: "ollama"
      config: {model: "nomic-embed-text", ollama_base_url: "http://localhost:11434"}
    vector_store:
      provider: "chroma"
      config: {collection_name: "mem0_memories"}
  zep:                            # used when provider: "zep"
    api_key: "your-zep-api-key"
    session_id: "default"

storage:
  root_dir: "root"
  use_vector_db: true

scheduler:
  log_mode: "console"

agent_factory:
  log_mode: "console"
  max_workers: 64

server:
  host: "0.0.0.0"
  port: 8000
```

---

## `ConfigManager` — singleton configuration accessor

`ConfigManager` (singleton) loads and persists `aios/config/config.yaml`, exposing typed getters for each subsystem. The global `config` instance is imported across the kernel.
```python
from aios.config.config_manager import config  # global singleton

# --- Read sections ---
llms_cfg  = config.get_llms_config()           # -> dict with "models", "router", "log_mode"
storage   = config.get_storage_config()        # -> {"root_dir": "root", "use_vector_db": True}
memory    = config.get_memory_config()         # -> {"provider": "in-house", ...}
scheduler = config.get_scheduler_config()      # -> {"log_mode": "console"}
factory   = config.get_agent_factory_config()  # -> {"max_workers": 64}
server    = config.get_server_config()         # -> {"host": "0.0.0.0", "port": 8000}

# --- Update and persist ---
config.update_api_key("openai", "sk-new-key")  # writes config.yaml immediately
config.update_llm_config("claude-3-opus", "anthropic")

# --- Get an API key (config.yaml first, then env var fallback) ---
key = config.get_api_key("openai")  # returns None if missing
if key:
    print("OpenAI key found")

# --- Hot-reload without restarting ---
config.refresh()  # re-reads config.yaml into memory
```

---

## Launching the Kernel

The kernel is a FastAPI application served by uvicorn. It initialises all subsystems on startup.

```bash
# Recommended: via the launch script
bash runtime/launch_kernel.sh

# Or directly with uvicorn
python3.10 -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000

# Background (survives shell close)
nohup python3 -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000 > uvicorn.log 2>&1 &
```

---

## `useCore` — initialise LLM adapter

Initialises an `LLMAdapter` that wraps multiple LLM backends under a single load-balanced interface.
```python
from aios.hooks.modules.llm import useCore

llm = useCore(
    llm_configs=[
        {"name": "gpt-4o-mini", "backend": "openai"},
        {"name": "qwen3:4b", "backend": "ollama", "hostname": "http://localhost:11434"},
    ],
    log_mode="console",
    use_context_manager=False,  # True → RR scheduler with context switching
)

# llm is an LLMAdapter instance
# llm.execute_llm_syscalls([syscall1, syscall2])  # batch execution
```

---

## `LLMAdapter` — multi-backend LLM router

`LLMAdapter` distributes syscall batches across configured backends using either sequential or smart routing. It handles API-key setup, error classification, dynamic Ollama model registration, and parallel batch execution with `ThreadPoolExecutor`.

```python
from aios.llm_core.adapter import LLMAdapter
from cerebrum.llm.apis import LLMQuery

# The adapter is created by useCore(); direct instantiation example:
adapter = LLMAdapter(
    llm_configs=[
        {"name": "gpt-4o-mini", "backend": "openai"},
        {"name": "claude-3-5-sonnet-20241022", "backend": "anthropic"},
    ],
    log_mode="console",
    use_context_manager=False,
)

# adapter.execute_llm_syscalls() is called internally by the scheduler.
# To verify available models:
print(adapter.available_llm_names)
# -> ["gpt-4o-mini", "claude-3-5-sonnet-20241022"]

# Dynamic Ollama registration (if a model is requested but not in config.yaml):
success = adapter._dynamic_register_ollama_model("llama3.2:3b")
print(success)  # True if the model exists on the Ollama server
```

---

## `useFIFOScheduler` / `fifo_scheduler_nonblock` — request scheduler

The FIFO scheduler processes LLM requests in batches and Memory/Storage/Tool requests individually, each on its own thread. `fifo_scheduler_nonblock` returns the scheduler object so `start()`/`stop()` can be called explicitly (used by the kernel server).
```python
from aios.hooks.modules.scheduler import fifo_scheduler_nonblock, useFIFOScheduler

# llm_adapter, memory_mgr, storage_mgr and tool_mgr are the objects returned by
# useCore, useMemoryManager, useStorageManager and useToolManager.

# --- Non-blocking (used in runtime/launch.py) ---
scheduler = fifo_scheduler_nonblock(
    llm=llm_adapter,
    memory_manager=memory_mgr,
    storage_manager=storage_mgr,
    tool_manager=tool_mgr,
    log_mode="console",
    get_llm_syscall=None,  # None → uses global queue
    get_memory_syscall=None,
    get_storage_syscall=None,
    get_tool_syscall=None,
)
scheduler.start()  # spawns 4 background threads
# ... run agents ...
scheduler.stop()   # gracefully terminates threads

# --- Context manager form (for scripts/tests) ---
from aios.hooks.modules.scheduler import fifo_scheduler

with fifo_scheduler(
    llm=llm_adapter,
    memory_manager=memory_mgr,
    storage_manager=storage_mgr,
    tool_manager=tool_mgr,
    log_mode="console",
    get_llm_syscall=None,
    get_memory_syscall=None,
    get_storage_syscall=None,
    get_tool_syscall=None,
):
    # scheduler is running; submit agents here
    pass
# scheduler.stop() called automatically on exit
```

---

## `useFactory` — agent factory

`useFactory` returns two callables: `submitAgent` (downloads an agent from the AIOS Hub, instantiates it, and runs it in a thread pool) and `awaitAgentExecution` (polls the `Future` for results).

```python
import time

from aios.hooks.modules.agent import useFactory

submit_agent, await_execution = useFactory(
    log_mode="console",
    max_workers=64,
)

# Submit an agent from the AIOS Hub (format: "author/agent-name")
process_id = submit_agent(
    agent_name="example/MathAgent",
    task_input="Calculate the derivative of x^3 + 2x",
)
print(f"Agent running with process_id={process_id}")

# Poll until done (await_execution returns None while the agent is still running)
result = await_execution(process_id)
while result is None:
    time.sleep(0.5)
    result = await_execution(process_id)

print("Agent result:", result)
# -> {"output": "3x^2 + 2", "status": "completed"}
```

---

## `useSysCall` / `SyscallExecutor` — unified syscall dispatcher

`useSysCall()` creates and returns a `(execute_request, SyscallWrapper, executor)` triple.
`execute_request(agent_name, query)` routes any typed query (LLM, Tool, Memory, Storage) through the appropriate kernel module and returns a dict with `"response"` and timing metrics.

```python
from aios.syscall.syscall import useSysCall
from cerebrum.llm.apis import LLMQuery
from cerebrum.tool.apis import ToolQuery
from cerebrum.memory.apis import MemoryQuery
from cerebrum.storage.apis import StorageQuery

execute_request, SyscallWrapper, executor = useSysCall()

# --- LLM chat ---
llm_result = execute_request(
    "my_agent",
    LLMQuery(
        messages=[{"role": "user", "content": "Summarise the water cycle in 3 bullets."}],
        action_type="chat",
    ),
)
print(llm_result["response"].response_message)

# --- LLM tool call (LLM selects and invokes a tool) ---
tool_result = execute_request(
    "my_agent",
    LLMQuery(
        messages=[{"role": "user", "content": "What is 42 * 17?"}],
        tools=[{"type": "function", "function": {"name": "calculator", "parameters": {}}}],
        action_type="call_tool",
    ),
)
print(tool_result["response"].response_message)

# --- Storage file write ---
storage_result = execute_request(
    "my_agent",
    StorageQuery(
        operation_type="create_file",
        params={"file_path": "notes/todo.txt", "content": "Buy milk"},
    ),
)
print(storage_result["response"].response_message)  # "Operation completed"

# --- Memory add ---
memory_result = execute_request(
    "my_agent",
    MemoryQuery(
        operation_type="add_memory",
        params={
            "content": "The user prefers dark mode.",
            "tags": ["preferences", "ui"],
            "category": "User Preferences",
        },
    ),
)

# --- Memory retrieve ---
retrieved = execute_request(
    "my_agent",
    MemoryQuery(
        operation_type="retrieve_memory",
        params={"content": "UI preferences", "k": 3},
    ),
)
print(retrieved["response"].response_message)
```

---

## `useMemoryManager` — memory subsystem

`useMemoryManager` returns a `MemoryManager` that delegates to a configured provider (in-house, Mem0, Zep).
The manager routes `add_memory`, `remove_memory`, `update_memory`, `get_memory`, `retrieve_memory`, and `retrieve_memory_raw` operations.

```python
from aios.hooks.modules.memory import useMemoryManager

# Provider is read from config.yaml (memory.provider)
memory_manager = useMemoryManager(log_mode="console")

# Direct provider interaction (normally done via execute_request syscalls):
# memory_manager.address_request(memory_syscall)

# The provider implements:
#   add_memory(memory_note)
#   get_memory(memory_id)
#   retrieve_memory(query)      <- semantic search
#   update_memory(memory_note)
#   remove_memory(memory_id)
#   retrieve_memory_raw(query)  <- returns MemoryNote objects
```

**Switch providers at runtime via HTTP:**

```bash
# 1. Edit config.yaml: memory.provider = "mem0"
# 2. Hot-reload without restarting the server:
curl -X POST http://localhost:8000/core/refresh
```

---

## `useStorageManager` — LSFS semantic filesystem

`useStorageManager` initialises a `StorageManager` backed by the LSFS (LLM-based Semantic File System), which supports natural-language file operations and optional vector-database indexing for semantic search.
```python
from aios.hooks.modules.storage import useStorageManager

storage_manager = useStorageManager(
    root_dir="root",     # physical root directory on disk
    use_vector_db=True,  # index files with ChromaDB/Qdrant for semantic search
)

# Operations are dispatched via StorageQuery syscalls through execute_request:
#   operation_type: "create_file" | "read_file" | "write_file" |
#                   "delete_file" | "list_files" | "search_files"

# Example via the REST API:
# curl -X POST http://localhost:8000/query \
#   -H "Content-Type: application/json" \
#   -d '{
#     "agent_name": "file_agent",
#     "query_type": "storage",
#     "query_data": {
#       "operation_type": "create_file",
#       "params": {"file_path": "reports/summary.txt", "content": "Q1 results: up 12%"}
#     }
#   }'
```

---

## `useToolManager` — tool execution with MCP backend

`useToolManager` creates a `ToolManager` that loads tools via the Cerebrum `AutoTool` registry and starts a background MCP (Model Context Protocol) server for computer-use agents.

```python
from aios.hooks.modules.tool import useToolManager

tool_manager = useToolManager()  # MCP server is started automatically on init

# Tools are invoked through ToolQuery syscalls:
# execute_request("agent", ToolQuery(tool_calls=[
#     {"name": "org/tool-name", "parameters": {"arg1": "value1"}}
# ]))

# Cleanup (stops the MCP server subprocess):
tool_manager.cleanup()
```

---

## REST API — `/query` endpoint

The primary HTTP endpoint accepts any query type and routes it to the appropriate kernel subsystem.
```bash
# LLM chat
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "llm",
    "query_data": {
      "messages": [{"role": "user", "content": "What is quantum entanglement?"}],
      "action_type": "chat"
    }
  }'
# Response: {"response": {"response_message": "...", "finished": true, "status_code": 200}}

# Tool calling (LLM picks the tool)
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "llm",
    "query_data": {
      "messages": [{"role": "user", "content": "Search for the latest AI news."}],
      "tools": [{"type": "function", "function": {"name": "google/search", "parameters": {}}}],
      "action_type": "call_tool"
    }
  }'

# Store a memory
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "memory",
    "query_data": {
      "operation_type": "add_memory",
      "params": {"content": "User is allergic to peanuts.", "tags": ["health"]}
    }
  }'

# Semantic memory search
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "demo_agent",
    "query_type": "memory",
    "query_data": {
      "operation_type": "retrieve_memory",
      "params": {"content": "food allergies", "k": 3}
    }
  }'
```

---

## REST API — Agent lifecycle endpoints

```bash
# Submit an agent from the AIOS Hub
curl -X POST http://localhost:8000/agents/submit \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "example/MathAgent",
    "agent_config": {"task": "Solve: integral of sin(x) from 0 to pi"}
  }'
# Response: {"status": "success", "execution_id": 471823}

# Poll agent status
curl http://localhost:8000/agents/471823/status
# Running: {"status": "running", "execution_id": 471823}
# Done:    {"status": "completed", "result": {...}, "execution_id": 471823}

# List all agent processes
curl http://localhost:8000/agents/ps

# Check kernel health
curl http://localhost:8000/status
# {"status": "ok", "message": "All core components are active."}

# List configured LLMs
curl http://localhost:8000/core/llms/list

# Select specific LLMs for subsequent queries
curl -X POST http://localhost:8000/user/select/llms \
  -H "Content-Type: application/json" \
  -d '[{"name": "gpt-4o-mini", "provider": "openai"}]'

# Hot-reload configuration (e.g., after editing config.yaml)
curl -X POST http://localhost:8000/core/refresh

# Update an API key at runtime
curl -X POST http://localhost:8000/core/config/update \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "api_key": "sk-new-key"}'

# Shut down all components cleanly
curl -X POST http://localhost:8000/core/cleanup
```

---

## AIOS Terminal — LLM-based semantic file system UI

`runtime/run_terminal.py` starts an interactive terminal that classifies natural-language commands as file operations or chat using a two-stage heuristic + LLM `IntentRouter`.

```python
# Launch the terminal from a shell:
#   python runtime/run_terminal.py
#
# Example session:
#   AIOS> create a file called meeting_notes.txt with today's agenda
#   AIOS> list all files in the reports folder
#   AIOS> hi, what can you help me with?

# The IntentRouter classifies each input:
from aios.terminal.intent_router import IntentRouter, build_llm_classify_fn, Intent

router = IntentRouter(
    llm_classify_fn=build_llm_classify_fn("terminal_agent")
)

result = router.classify("delete the old backup files")
print(result.intent)      # Intent.FILE_OPERATION
print(result.confidence)  # Confidence.HIGH
print(result.source)      # "keyword"

result2 = router.classify("hello, how are you?")
print(result2.intent)     # Intent.CHAT
```

---

## Experimental Rust Kernel (`aios-rs`)

An early Rust rewrite providing trait definitions and minimal placeholder implementations. It does not yet have feature parity with the Python kernel.
```bash
cd aios-rs
cargo build
cargo test
```

```rust
use aios_rs::prelude::*;

fn main() -> anyhow::Result<()> {
    let llm = std::sync::Arc::new(EchoLLM);
    let memory = std::sync::Arc::new(std::sync::Mutex::new(InMemoryMemoryManager::new()));
    let storage = std::sync::Arc::new(FsStorageManager::new("/tmp/aios_store"));
    let tool = std::sync::Arc::new(NoopToolManager);

    let mut scheduler = NoopScheduler::new(llm, memory, storage, tool);
    scheduler.start()?;
    scheduler.stop()?;
    Ok(())
}
```

---

## Summary

AIOS is used in two primary patterns. In **local agent execution**, a developer installs both the kernel and the Cerebrum SDK on one machine, launches the kernel with `bash runtime/launch_kernel.sh`, and uses the SDK to define agents that issue LLM/Tool/Memory/Storage syscalls through `execute_request`. In **remote kernel mode**, a powerful server hosts the kernel while lightweight clients (laptops, mobile devices) install only the SDK and interact with the kernel over HTTP. In both modes, agents are submitted via `POST /agents/submit` with an `agent_id` of the form `author/AgentName` (resolved from the AIOS Hub), and results are polled via `GET /agents/{execution_id}/status`.

Integration into existing systems is straightforward: any HTTP client can drive the kernel through the `/query` endpoint, making AIOS embeddable as a managed AI inference and memory backend behind existing applications. Developers can extend the system by implementing the `MemoryProvider` abstract base class for custom memory backends, adding tools to the Cerebrum `AutoTool` registry, or porting components to the Rust scaffold in `aios-rs/` for performance-critical subsystems. The hot-reload endpoint (`POST /core/refresh`) and CLI commands (`aios env set`, `aios refresh`) let configuration changes take effect without restarting the kernel.
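
As a rough illustration of driving the kernel from an arbitrary HTTP client, the sketch below uses only the Python standard library. The endpoint paths and JSON shapes are taken from the curl examples in this README. The helper names (`build_query_payload`, `build_request`, `send`, `run_agent`), the default base URL, and the timeout and polling limits are assumptions made for this example and are not part of AIOS or Cerebrum.

```python
import json
import time
import urllib.request

# Assumption: the kernel listens on the host/port from the `server` section of config.yaml.
KERNEL_URL = "http://localhost:8000"


def build_query_payload(agent_name, query_type, query_data):
    """Body for POST /query, mirroring the curl examples in this README."""
    return {"agent_name": agent_name, "query_type": query_type, "query_data": query_data}


def build_request(path, payload=None, base_url=KERNEL_URL):
    """Create a GET request, or a JSON POST request when a payload is given."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_agent(agent_id, task, poll_interval=0.5, max_polls=120):
    """Submit a Hub agent, then poll its status until it is no longer running."""
    submitted = send(build_request(
        "/agents/submit",
        {"agent_id": agent_id, "agent_config": {"task": task}},
    ))
    execution_id = submitted["execution_id"]
    for _ in range(max_polls):
        status = send(build_request(f"/agents/{execution_id}/status"))
        if status.get("status") != "running":
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"agent {execution_id} still running after {max_polls} polls")
```

With a kernel running, a chat query would be sent by passing `build_request("/query", build_query_payload("demo_agent", "llm", {...}))` to `send`, and `run_agent("example/MathAgent", "...")` would cover the submit-then-poll flow. Building the request separately from sending it keeps the payload logic testable without a live server.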