### Example Tool Calls
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Provides an example list of ToolCall objects, each representing a call to the 'simple_add' tool with different arguments.
```python
from fastllm.types import ToolCall  # assumed export; ToolCall is defined in nbs/00_types
tool_calls = [
    ToolCall(id='123', name='simple_add', arguments={'a': 3, 'b': 5}, server=False, extra={'type': 'function'}),
    ToolCall(id='456', name='simple_add', arguments={'a': 10, 'b': 20}, server=False, extra={'type': 'function'}),
]
```
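To show how such a list is typically consumed, here is a minimal sketch that dispatches each call to a local Python function by name. `FakeToolCall` and `run_tool_calls` are hypothetical stand-ins that only mirror the fields shown above; they are not fastllm's classes. Treating `server=True` as "executed by the provider" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class FakeToolCall:
    """Hypothetical stand-in mirroring the ToolCall fields shown above."""
    id: str
    name: str
    arguments: dict
    server: bool = False
    extra: dict = field(default_factory=dict)

def simple_add(a: int, b: int) -> int:
    "Add two integers."
    return a + b

def run_tool_calls(calls: list, tools: dict) -> dict:
    """Run each client-side call by name; return results keyed by call id."""
    results = {}
    for tc in calls:
        if tc.server:  # assumption: server-side tools are run by the provider, so skip them
            continue
        results[tc.id] = tools[tc.name](**tc.arguments)
    return results

calls = [FakeToolCall('123', 'simple_add', {'a': 3, 'b': 5}),
         FakeToolCall('456', 'simple_add', {'a': 10, 'b': 20})]
print(run_tool_calls(calls, {'simple_add': simple_add}))  # {'123': 8, '456': 30}
```

Keying results by call id lets each result be paired with the call that produced it when building tool-result messages.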
--------------------------------
### Model Output Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Shows the Markdown label displayed before a model's reply when looping over several models. It contains only the model name (`models/gemini-3.1-pro-preview`), not the response itself.
```text
Result:
Markdown(**models/gemini-3.1-pro-preview:**)
```
--------------------------------
### Gemini Client Content Generation Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Generates content with the Gemini API client (`gem_cli`) for a given model and input, then converts the response with `mk_completion`. This is a non-streaming example used for testing; `mn`, `api_name`, and `vnd_nm` are defined earlier in the notebook.
```python
inp = [{"role": "user", "parts": [{"text": "Hi how are you?"}]}]
resp = await gem_cli.models.generate_content(model=mn, contents=inp)
comp = mk_completion(resp, mn, api_name, vnd_nm)
comp
```
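The `inp` above follows Gemini's `contents` shape: a list of turns, each with a `role` and a list of `parts`. A minimal sketch of building that shape from simpler role/text messages; `to_gemini_contents` is a hypothetical helper, not part of fastllm. It relies on the fact that Gemini names the assistant role `model`.

```python
def to_gemini_contents(msgs: list) -> list:
    """Convert {'role', 'text'} messages into Gemini `contents` entries.
    Gemini uses the roles 'user' and 'model', so 'assistant' maps to 'model'."""
    role_map = {'user': 'user', 'assistant': 'model'}
    return [{'role': role_map[m['role']], 'parts': [{'text': m['text']}]} for m in msgs]

contents = to_gemini_contents([
    {'role': 'user', 'text': 'Hi how are you?'},
    {'role': 'assistant', 'text': "I'm well, thanks!"},
])
print(contents[1])  # {'role': 'model', 'parts': [{'text': "I'm well, thanks!"}]}
```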
--------------------------------
### Chat with Tool Usage Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Demonstrates a chat interaction in which the assistant uses a tool to perform a calculation. The message flow runs: user question, assistant tool use, tool result, an injected user message noting that the tool-call budget for the turn is used up, and the final assistant summary.
```markdown
Result:
Markdown(**Msg**
- role: `user`
**Part** (`text`)
What is 5 + 7? Use the tool to calculate it
- data: `None`
**Msg**
- role: `assistant`
**Part** (`tool_use`)
- data: `{'type': 'function_call', 'status': 'completed', 'call_id': 'call_B51xXWFg10YkHqkJVMODTx4i', 'id': 'fc_0e2920bf05b71dc30069f311b880888191a42435c7c037419e', 'name': 'async_add', 'arguments': {'a': 5, 'b': 7}, 'server': False}`
**Msg**
- role: `tool`
**Part** (`tool_result`)
12
- data: `{'id': 'fc_0e2920bf05b71dc30069f311d949c88192a50f746f6e2da30d', 'name': 'async_add', 'arguments': {'a': 5, 'b': 7}, 'server': False}`
**Msg**
- role: `user`
**Part** (`text`)
You have used all your tool calls for this turn. Please summarize your findings. If you did not comp...
- data: `None`
**Msg**
- role: `assistant`
**Part** (`text`)
I used the tool to calculate 5 + 7, and the result is 12. If you have more calculations or questions, feel free to ask!
- data: `{'type': 'output_text', 'logprobs': [], 'text': 'I used the tool to calculate 5 + 7, and the result is 12. If you have more calculations or questions, feel free to ask!', 'citations': []}`
```
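The flow above can be sketched as a loop that runs tools until a budget is reached, then injects a user message asking for a summary. This is a hypothetical illustration, not fastllm's implementation: `run_turn` and `scripted_model` are invented names, messages are plain dicts, and since the real budget message is truncated above, the sketch uses only its first two sentences.

```python
from typing import Callable

BUDGET_MSG = ('You have used all your tool calls for this turn. '
              'Please summarize your findings.')  # first two sentences only

def run_turn(msgs: list, model_step: Callable, tools: dict, max_tool_calls: int = 1) -> list:
    """Call the model until it replies with text, running at most max_tool_calls tools."""
    used = 0
    while True:
        reply = model_step(msgs, used < max_tool_calls)  # 2nd arg: are tools still allowed?
        msgs.append(reply)
        call = reply.get('tool_call')
        if call is None:
            return msgs  # a plain text reply ends the turn
        result = tools[call['name']](**call['arguments'])
        msgs.append({'role': 'tool', 'call_id': call['id'], 'content': str(result)})
        used += 1
        if used >= max_tool_calls:
            msgs.append({'role': 'user', 'content': BUDGET_MSG})  # ask for a summary

def scripted_model(msgs, tools_allowed):
    """Stand-in for an LLM: request the tool once, then answer from its result."""
    if tools_allowed and not any(m['role'] == 'tool' for m in msgs):
        return {'role': 'assistant',
                'tool_call': {'id': 'call_1', 'name': 'add', 'arguments': {'a': 5, 'b': 7}}}
    last = next(m for m in reversed(msgs) if m['role'] == 'tool')
    return {'role': 'assistant', 'content': f"The result is {last['content']}."}

msgs = run_turn([{'role': 'user', 'content': 'What is 5 + 7?'}], scripted_model,
                {'add': lambda a, b: a + b})
print([m['role'] for m in msgs])  # ['user', 'assistant', 'tool', 'user', 'assistant']
```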
--------------------------------
### Example: Completion with the OpenAI Responses API
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb
Demonstrates how to obtain a non-streaming completion from the OpenAI Responses API for a given model and input, then convert it with `mk_completion`.
```python
mn,inp = 'gpt-4o-mini','Hi!'
resp = await oai_cli.responses.create_response(model=mn,input=inp)
comp = mk_completion(resp, mn, api_name, vnd_nm); comp
```
--------------------------------
### Import LiteLLM
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Imports LiteLLM, which the notebook uses alongside `acomplete` in its prompt-caching tests.
```python
import litellm
```
--------------------------------
### Commented Out Qwen API Client Initialization
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/03_oai_chat.ipynb
A commented-out example showing how to initialize an OpenAPIClient for the Qwen API with its specific endpoint.
```python
# qwen_cli = OpenAPIClient(oai_spec, headers={"Authorization": f"Bearer {os.environ['QWEN_API_KEY']}"})
# for op in qwen_cli.ops: op.base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1'
```
--------------------------------
### Example Completion Object Creation
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Demonstrates how to create a Completion object with multiple parts, including 'thinking' and 'text' types, simulating a complex LLM response.
```python
from fastllm.types import Msg, Part, PartType, Completion
# Interleave reasoning ('thinking') and visible answer ('text') parts
parts = [
    Part(type=PartType.thinking, text="First, let me consider the question..."),
    Part(type=PartType.text, text="The answer involves two parts. "),
    Part(type=PartType.thinking, text="Now for the second part, I need to..."),
    Part(type=PartType.text, text="And here's the conclusion."),
]
Completion(model='model', message=Msg(role="assistant", content=parts))
```
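When parts interleave like this, a consumer often wants the visible answer separately from the reasoning. A minimal sketch using a hypothetical `SimplePart` stand-in rather than fastllm's `Part`; `split_parts` is also an invented helper.

```python
from dataclasses import dataclass

@dataclass
class SimplePart:
    """Hypothetical stand-in for fastllm's Part."""
    type: str  # 'thinking' or 'text'
    text: str

def split_parts(parts: list) -> tuple:
    """Return (thinking, answer): reasoning joined by newlines, answer text concatenated."""
    thinking = '\n'.join(p.text for p in parts if p.type == 'thinking')
    answer = ''.join(p.text for p in parts if p.type == 'text')
    return thinking, answer

parts = [SimplePart('thinking', 'First, let me consider the question...'),
         SimplePart('text', 'The answer involves two parts. '),
         SimplePart('thinking', 'Now for the second part, I need to...'),
         SimplePart('text', "And here's the conclusion.")]
thinking, answer = split_parts(parts)
print(answer)  # The answer involves two parts. And here's the conclusion.
```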
--------------------------------
### Streaming Cache Test Setup
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
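Streaming variant of the cache test using `acomplete`: the first stream is expected to write the prompt cache and the second to read it. Relies on `msg1`, `comp1`, and `msg3` from the non-streaming cache test.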
```python
# Streaming - as a sanity check
it1 = await acomplete([msg1], model='claude-sonnet-4-20250514', max_tokens=64, stream=True) # writes cache
it2 = await acomplete([msg1, comp1.message, msg3], model='claude-sonnet-4-20250514', max_tokens=64, stream=True) # reads cache
async for comp1 in it1: pass
async for comp2 in it2: pass
```
--------------------------------
### Example: Usage with Code Execution Tool
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Illustrates a Gemini API call with the code execution tool enabled, then normalizes the usage metadata with `norm_usage`. Note that the normalized usage does not record the code execution tool call itself.
```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "Calculate the first 10 fibonacci numbers using code"}]}], tools=[{"codeExecution": {}}])
norm_usage(resp)
```
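Since the normalized usage does not show code execution, one way to inspect it is to read the response parts directly. A minimal sketch, assuming `resp` is the raw Gemini REST JSON as a dict, where code execution appears as `executableCode` and `codeExecutionResult` parts. `code_exec_parts` is a hypothetical helper, and the sample response is hand-written for illustration, not real output.

```python
def code_exec_parts(resp: dict) -> list:
    """Collect ('code', source) and ('result', output) pairs from a Gemini REST response."""
    out = []
    for cand in resp.get('candidates', []):
        for part in cand.get('content', {}).get('parts', []):
            if 'executableCode' in part:
                out.append(('code', part['executableCode']['code']))
            elif 'codeExecutionResult' in part:
                out.append(('result', part['codeExecutionResult'].get('output', '')))
    return out

# Hand-written illustrative response shape, not real API output
resp = {'candidates': [{'content': {'role': 'model', 'parts': [
    {'text': 'Computing with Python.'},
    {'executableCode': {'language': 'PYTHON',
                        'code': 'a, b = 0, 1\nfor _ in range(10):\n    print(a)\n    a, b = b, a + b'}},
    {'codeExecutionResult': {'outcome': 'OUTCOME_OK',
                             'output': '0\n1\n1\n2\n3\n5\n8\n13\n21\n34\n'}},
]}}]}
print([kind for kind, _ in code_exec_parts(resp)])  # ['code', 'result']
```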
--------------------------------
### Model Output: GPT-4o-search-preview to GPT-4o-search-preview
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Example streamed output for the `gpt-4o-search-preview -> gpt-4o-search-preview` model pairing, reporting current weather conditions in Brisbane.
```text
Output:
gpt-4o-search-preview -> gpt-4o-search-preview: As of 11:16 PM local time on Friday, June 12, 2026, in Brisbane, Australia, the current weather cond…
```
--------------------------------
### Model Output: GPT-4o-search-preview to GPT-4o-mini
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Example streamed output for the `gpt-4o-search-preview -> gpt-4o-mini` model pairing, giving a weather update for Brisbane.
```text
Output:
gpt-4o-search-preview -> gpt-4o-mini : As of 11:16 PM local time on Friday, June 12, 2026, in Brisbane, Australia, the weather is mostly cl…
```
--------------------------------
### PartAccum Example: Merging Tool Calls
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/01_streaming.ipynb
Demonstrates initializing PartAccum with a ToolCall and then merging parts, excluding tool calls.
```python
pa = PartAccum({0: ToolCall(id='toolu_01GF7HEH9s63gdAYL5dbcSj5', name='python', arguments={}, server=False, extra={'caller': {'type': 'direct'}})}, [])
pa.get_merged(False)
```
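Streaming APIs usually deliver a tool call's name and id first and its arguments as partial JSON text spread over later chunks (for example Anthropic's `input_json_delta` events or the OpenAI Chat `function.arguments` fragments). Keying by index mirrors the `{0: ToolCall(...)}` dict above. A minimal sketch of that accumulation; `ToolCallAccumulator` is hypothetical and is not how `PartAccum` is implemented.

```python
import json
from collections import defaultdict
from typing import Optional

class ToolCallAccumulator:
    """Hypothetical sketch: gather streamed tool-call fragments keyed by index."""
    def __init__(self):
        self.calls = defaultdict(lambda: {'id': None, 'name': None, 'args_json': ''})

    def add(self, index: int, call_id: Optional[str] = None,
            name: Optional[str] = None, args_delta: str = '') -> None:
        call = self.calls[index]
        if call_id: call['id'] = call_id  # id and name usually arrive in the first chunk
        if name: call['name'] = name
        call['args_json'] += args_delta   # arguments arrive as partial JSON text

    def merged(self) -> list:
        """Parse each call's concatenated arguments; only valid once the stream ends."""
        return [{'id': c['id'], 'name': c['name'], 'arguments': json.loads(c['args_json'] or '{}')}
                for _, c in sorted(self.calls.items())]

acc = ToolCallAccumulator()
acc.add(0, call_id='toolu_01', name='python')
acc.add(0, args_delta='{"code": "print(')
acc.add(0, args_delta='1+2)"}')
print(acc.merged())  # [{'id': 'toolu_01', 'name': 'python', 'arguments': {'code': 'print(1+2)'}}]
```

Parsing is deferred to `merged()` because individual fragments are not valid JSON on their own.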
--------------------------------
### Example: Basic Text Generation Usage
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Demonstrates a basic text generation call to the Gemini API and normalizes its usage metadata.
```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "hi!"}]}])
norm_usage(resp)
```
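Judging from the `Usage` objects printed elsewhere in this document, Gemini's `usageMetadata` fields map onto generic fields as follows: `promptTokenCount` to `prompt_tokens`, `candidatesTokenCount` to `completion_tokens`, `thoughtsTokenCount` to `reasoning_tokens`. A minimal sketch of that mapping; `norm_gemini_usage` is hypothetical and is not fastllm's `norm_usage`. Mapping `cachedContentTokenCount` to `cached_tokens` is an assumption. The sample values come from the Gemini completion output in this document.

```python
def norm_gemini_usage(meta: dict) -> dict:
    """Hypothetical mapping of Gemini usageMetadata to generic usage fields."""
    return {
        'prompt_tokens': meta.get('promptTokenCount', 0),
        'completion_tokens': meta.get('candidatesTokenCount', 0),
        'reasoning_tokens': meta.get('thoughtsTokenCount', 0),
        'cached_tokens': meta.get('cachedContentTokenCount', 0),  # assumed mapping
        'total_tokens': meta.get('totalTokenCount', 0),
    }

u = norm_gemini_usage({'promptTokenCount': 3, 'candidatesTokenCount': 9,
                       'totalTokenCount': 35, 'thoughtsTokenCount': 23})
# In this example the total includes reasoning: 3 + 9 + 23 = 35
print(u['prompt_tokens'] + u['completion_tokens'] + u['reasoning_tokens'] == u['total_tokens'])  # True
```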
--------------------------------
### Registering a Test API
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Demonstrates how to register a new API endpoint with the `api_registry`. This example registers a 'test' API with a simple function `f`.
```python
def f(): print('test')
api_registry.register('test', f=f)
```
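A registry like this maps an API name to a namespace of functions so that callers can look up an implementation by name. A minimal sketch of the idea; `ApiRegistry` and its `get` method are hypothetical and do not describe how fastllm's `api_registry` is built.

```python
class ApiRegistry:
    """Hypothetical minimal registry: API name -> {function name: function}."""
    def __init__(self):
        self._apis = {}

    def register(self, name: str, **funcs) -> None:
        self._apis[name] = funcs  # replaces any earlier registration under this name

    def get(self, name: str, func: str):
        try:
            return self._apis[name][func]
        except KeyError:
            raise KeyError(f'{name}.{func} is not registered') from None

reg = ApiRegistry()
reg.register('test', f=lambda: 'test')
print(reg.get('test', 'f')())  # test
```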
--------------------------------
### Example of Prepending System Message
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/03_oai_chat.ipynb
Demonstrates prepending a system message to user messages for OpenAI Chat. Shows the resulting message list structure.
```python
sp = 'You are a pirate. Always respond in pirate speak. Keep it to one sentence.'
msg1 = mk_user_msg('What are you?')
msgs = denorm_msgs([msg1])
msgs = denorm_system(sp, msgs); msgs
```
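In the OpenAI Chat format, the system prompt is simply the first message in the list, with role `system`. A minimal sketch on plain dicts; `prepend_system` is a hypothetical helper, not `denorm_system`. The Responses API instead accepts the system prompt through its `instructions` parameter.

```python
def prepend_system(sp: str, msgs: list) -> list:
    """Return a new OpenAI Chat message list with the system prompt first."""
    return [{'role': 'system', 'content': sp}] + msgs  # new list; msgs is not mutated

sp = 'You are a pirate. Always respond in pirate speak. Keep it to one sentence.'
user_msgs = [{'role': 'user', 'content': 'What are you?'}]
msgs = prepend_system(sp, user_msgs)
print([m['role'] for m in msgs])  # ['system', 'user']
```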
--------------------------------
### Example of Part Instantiation with Long Data
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Demonstrates creating a `Part` instance with a long text and a long dictionary value to showcase the truncation in its Markdown representation.
```python
Part(PartType.text, 'Hello world!'*1000, data={'long':"10"*1000})
```
--------------------------------
### Create OpenAI Response with System Instructions
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb
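Demonstrates an OpenAI Responses API request with the system prompt passed via `instructions`, streaming the output and collecting it into a completion. Assumes `oai_cli`, `mk_user_msg`, `denorm_msgs`, `denorm_system`, `acollect_stream`, and the vendor name `vnd_nm` are already set up.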
```python
sp = 'You are a pirate. Always respond in pirate speak. Keep it to one sentence.'
msg1 = mk_user_msg('What are you?')
resp = await oai_cli.responses.create_response(model='gpt-4o-mini', input=denorm_msgs([msg1]), instructions=denorm_system(sp), stream=True)
async for comp in acollect_stream(resp, vendor_name=vnd_nm): pass
comp
```
--------------------------------
### Get Model Information
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Retrieves detailed information about a specific model from a given vendor. Both calls below are commented out in the notebook.
```python
# get_model_info('claude-fable-5', 'anthropic')
```
```python
# get_model_info('MiniMax-M3', 'minimax')
```
--------------------------------
### Cache Test Setup with Litellm
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
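Mirrors the `acomplete` cache test using `litellm.completion`: a long `big_text` block marked with `cache_control`, plus a summarization request, followed by a follow-up asking for the answer in French. `big_text` is defined in the `acomplete` cache test.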
```python
big_msgs1 = [{"role":"user","content":[
{"type":"text","text":big_text,"cache_control":{"type":"ephemeral"}},
{"type":"text","text":"Summarize"}]}]
litecomp1 = litellm.completion(model="anthropic/claude-sonnet-4-20250514", messages=big_msgs1, max_tokens=64)
big_msgs2 = big_msgs1 + [
{"role":"assistant","content":litecomp1.choices[0].message.content},
{"role":"user","content":"Now in French"}]
litecomp2 = litellm.completion(model="anthropic/claude-sonnet-4-20250514", messages=big_msgs2, max_tokens=64)
```
--------------------------------
### Get Model Info for Codex Models
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Retrieves detailed information for specific Codex models with `get_model_info`. `codex53spark` and `codex55` are variables holding model identifiers, defined elsewhere in the notebook.
```python
get_model_info(codex53spark, 'codex')
```
```python
get_model_info(codex55, 'codex')
```
--------------------------------
### Direct Text Completion Output (Part 1)
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
This example shows the first part of a direct text completion output. It is a fragment of a larger response.
```text
Hello
```
--------------------------------
### Example Conversation with mk_msgs
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Demonstrates how to use `mk_msgs` to create a simple chat conversation with four messages. The output shows the resulting list of `Msg` objects.
```python
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
```
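The four strings above appear to become alternating user and assistant turns. A minimal sketch of that role assignment, assuming roles alternate starting with the user; `alternate_roles` is a hypothetical helper that returns plain dicts rather than fastllm `Msg` objects.

```python
def alternate_roles(texts: list) -> list:
    """Assign roles by position: even indices are 'user', odd are 'assistant'."""
    return [{'role': 'user' if i % 2 == 0 else 'assistant', 'content': t}
            for i, t in enumerate(texts)]

msgs = alternate_roles(['Hey!', 'Hi there!', 'How are you?', "I'm doing fine and you?"])
print([m['role'] for m in msgs])  # ['user', 'assistant', 'user', 'assistant']
```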
--------------------------------
### ToolCall Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
An example ToolCall object, representing a function call requested by an LLM. Its `extra` field carries provider-specific data, here a Gemini `thoughtSignature`.
```python
ToolCall(id='oxwvx1fm', name='simple_add', arguments={'b': 547982745, 'a': 5478954793}, server=False, extra={'thoughtSignature': 'EscDCsQDAQw51scPHdv+D5BX7JWdLzz3Bv8tsKFRuAJe2UkTFZ+NZKzNsLtmQBiia+/r4HJEUptq1zQB0q9HToX0qzCUqyNAbDLY76KxMeW9jpsnUvh6ZjPM5sDD7fAafF7cjdApNMsihPqIZBAZjAlFPcp1c/50MObH5f1q7hO7fgDS4iSJ3Q3FfbAYWnJ4nlA2peVMu/6WFcKZh1wcZCIuN6iFCj6nhH+6RKkaFRaM0b6XCmpti6qldSeZx+qtHmo+lzr1tct4Gz/CITDI7gRJ3qfLYV2u45jOhKzdd1t6gQ39XLJ93j0xd0AwpzcdZLbHWqwWJCQ43nNzhJ7IQTAWOSyPgKDnlAMHq2PTEoXBYkMBApCZ1x+HncBzt77kQrTTe7sWGVmD5boVnYAIFPFGXOULP5tDZ+nog+Fg8NV10vaFKlHVf+VDzFnVWxT259LN12ykGtBilfpTXiKCV12RAZwhuL7vXXHrsBGg5HNVImcXqgMvwf/rtQlJeop+9bEcAiU48hMFMzumOrCmmHD3HgxpYLW7T3vtDmbNdKCDqVtIwO4Rp5HE6GudRWmq8iC2UnyQglUXoXVnxIZW7eYYDsGAYrYgZ1A='})
```
--------------------------------
### Prepare and Create Response with Tools
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb
Sets up the model name, input prompt, and tools (OpenAI's built-in `web_search_preview` tool) for an OpenAI Responses API request, then creates a completion from the response.
```python
mn,inp,tools = 'gpt-4o-mini','What is the weather in Istanbul today?',[{"type": "web_search_preview"}]
resp = await oai_cli.responses.create_response(model=mn,input=inp,tools=tools)
comp = mk_completion(resp, mn, api_name, vnd_nm)
```
--------------------------------
### Basic Chat Completion Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
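Illustrates a chat completion from a Gemini model, showing the message structure, finish reason, usage statistics, and the raw streamed deltas. Note that `total_tokens` (77) includes the reasoning tokens: 3 prompt + 9 completion + 65 reasoning.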
```text
Result:
Completion(model='models/gemini-3-flash-preview', message=Msg(role='assistant', content=[Part(type=, text='Hello! How can I help you today?', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), tool_calls=[], api_name='gemini', vendor_name='gemini', raw={'deltas': [Delta(text='Hello! How can I', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=5, total_tokens=73, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 5, 'totalTokenCount': 73, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': 'Hello! 
How can I'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 5, 'totalTokenCount': 73, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'}), Delta(text=' help you today?', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': ' help you today?'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': '', 'thoughtSignature': 
'EtQCCtECAQw51sfRXDOcN9iujtIsIdL/3Hqy9Ppa7GABoJXxMd00zUZUs4rcHJs815F1BZP0RlKbRrxtSACPJBb5ypxaKrzijIymPV7n9FynodoT6/B7wJquuHXD6rIvPy9/nssqrWAcBA5fJOdjXRtfM3tMLhIcl6Np3L87f6KeOwgS/npqJLikxKJxHFukl1cRw2COc3gqKfksPcAwPydBUcegmji3elck26EZmqzqO8+jCETceWkThUCxmg9jM9oWI3JmmrOFSKZ9/IcIFf4kuz/xeFxzPbdh/PQW1GMHndLy/PErTkRIwu5HtcZQYAZWcwqB3ob6ulYi0NdDWl9Y1SeMCa911GpG1W3iOro46AZcpe/+eEj16TFCqReGU6nD2MSHx9iNcGhTu919tAW5BGw1sKfZV5PFltMBAzQTRvplLakXAwsdxE/jPheVo6PZj9VtXQ=='}], 'role': 'model'}, 'finishReason': 'STOP', 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'})]})
```
--------------------------------
### Initialize AsyncChat with Different Configurations
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Demonstrates how to initialize the AsyncChat class with various configurations: settings auto-inferred from the model name, a known vendor passed via `vendor_name`, and fully explicit API settings (`api_name`, `api_key`, `base_url`).
```python
# Auto inferred
c = AsyncChat("claude-opus-4-5")
c = AsyncChat("models/gemini-3-pro-preview")
c = AsyncChat("gpt-4.1")
# Known Vendor
c = AsyncChat("gpt-5.4", vendor_name="codex")
# Explicit
c = AsyncChat("gpt-oss-20b", api_name='openai_chat', api_key='...', base_url='https://openrouter.ai/api/v1')
```
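The three styles above differ in how much the caller specifies. A minimal sketch of the "auto inferred" idea, assuming inference works from the model-name prefix and an explicit `vendor_name` takes precedence. `infer_vendor` and its prefix table are hypothetical; fastllm's real inference rules may differ.

```python
from typing import Optional

# Hypothetical prefix table for illustration only
PREFIXES = [('claude-', 'anthropic'), ('models/gemini-', 'gemini'),
            ('gemini-', 'gemini'), ('gpt-', 'openai')]

def infer_vendor(model: str, vendor_name: Optional[str] = None) -> str:
    """Return vendor_name if given, else guess it from the model-name prefix."""
    if vendor_name:
        return vendor_name  # an explicit vendor always wins
    for prefix, vendor in PREFIXES:
        if model.startswith(prefix):
            return vendor
    raise ValueError(f'Cannot infer a vendor for {model!r}; pass vendor_name explicitly')

print(infer_vendor('claude-opus-4-5'))                 # anthropic
print(infer_vendor('gpt-5.4', vendor_name='codex'))    # codex
```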
--------------------------------
### Chat Model Completion Output Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
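An example completion from `accounts/fireworks/models/kimi-k2p6` (vendor `fireworks_ai`, `openai_chat` API) for an image-description task. The provider's `reasoning_content` field appears as a `thinking` Part before the `text` Part, followed by usage statistics and the raw response.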
```text
Result:
Completion(model='accounts/fireworks/models/kimi-k2p6', message=Msg(role='assistant', content=[Part(type='thinking', text="The user wants me to identify what's in the image. Looking at the image, I can see a small puppy with brown and white fur lying on green grass. Next to the puppy are some purple flowers (likely asters or similar small purple flowers). The puppy appears to be a Cavalier King Charles Spaniel puppy, given its distinctive coloring - white face with brown ears and markings, large dark eyes, and that specific puppy-like appearance. It's lying down with its front paws extended on the grass.\n\nI should describe the image clearly and accurately. The image shows:\n- A puppy (Cavalier King Charles Spaniel)\n- Brown and white coloring\n- Lying on green grass\n- Purple flowers nearby (to the left of the puppy)\n- Cute, looking at camera\n\nLet me provide a friendly, descriptive answer.", data=None), Part(type='text', text="This image shows an adorable **Cavalier King Charles Spaniel puppy** lying on green grass. The puppy has the breed's characteristic **brown and white coat**, with large dark eyes, floppy brown ears, and a white blaze down the center of its face. It's resting with its front paws stretched out, looking directly at the camera. \n\nNext to the puppy (on the left side of the image) is a cluster of **small purple flowers**
—likely asters or daisy-like blooms—growing in the grass. The overall scene is very cute and gives off a sweet, summery, outdoor vibe! 🐶🌸", data={'citations': []})]), finish_reason='stop', usage=Usage(prompt_tokens=107, completion_tokens=300, total_tokens=407, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'prompt_tokens': 107, 'total_tokens': 407, 'completion_tokens': 300, 'prompt_tokens_details': {'cached_tokens': 0}}), tool_calls=[], api_name='openai_chat', vendor_name='fireworks_ai', raw={'id': 'chatcmpl-1cde9440db084c7aaeb92b30e84a41b8', 'object': 'chat.completion', 'created': 1777533873, 'model': 'accounts/fireworks/models/kimi-k2p6', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "This image shows an adorable **Cavalier King Charles Spaniel puppy** lying on green grass. The puppy has the breed's characteristic **brown and white coat**, with large dark eyes, floppy brown ears, and a white blaze down the center of its face. It's resting with its front paws stretched out, looking directly at the camera. \n\nNext to the puppy (on the left side of the image) is a cluster of **small purple flowers**
—likely asters or daisy-like blooms—growing in the grass. The overall scene is very cute and gives off a sweet, summery, outdoor vibe! 🐶🌸", 'reasoning_content': "The user wants me to identify what's in the image. Looking at the image, I can see a small puppy with brown and white fur lying on green grass. Next to the puppy are some purple flowers (likely asters or similar small purple flowers). The puppy appears to be a Cavalier King Charles Spaniel puppy, given its distinctive coloring - white face with brown ears and markings, large dark eyes, and that specific puppy-like appearance. It's lying down with its front paws extended on the grass.\n\nI should describe the image clearly and accurately. The image shows:\n- A puppy (Cavalier King Charles Spaniel)\n- Brown and white coloring\n- Lying on green grass\n- Purple flowers nearby (to the left of the puppy)\n- Cute, looking at camera\n\nLet me provide a friendly, descriptive answer."}, 'finish_reason': 'stop', 'token_ids': None}], 'usage': {'prompt_tokens': 107, 'total_tokens': 407, 'completion_tokens': 300, 'prompt_tokens_details': {'cached_tokens': 0}}, 'prompt_token_ids': [163587, 2482, 163601, 45702, 1573, 306, 566, 4082, 30, 163602, 4017, 163603, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163604, 198, 163586, 163588, 69702, 163601, 163606]})
```
--------------------------------
### FastLLM Setup and Helper Functions
Source: https://github.com/answerdotai/fastllm/blob/main/README.md
Imports necessary types and functions from fastllm for chat completions. Includes helper functions for creating user messages and streaming responses, which print thinking processes and text as they arrive.
```python
from fastllm.types import Msg, Part, PartType, Completion
from fastllm.acomplete import acomplete, mk_tool_res_msg
import asyncio, json
# Helpers
def user(text): return Msg(role='user', content=[Part(type=PartType.text, text=text)])
async def stream(msgs, model, **kw):
    """Stream a response, printing text/thinking as it arrives. Returns the final Completion."""
    cnt, max_think = 0, 10
    async for o in await acomplete(msgs, model, stream=True, **kw):
        if not isinstance(o, Completion):
            # Show at most `max_think` brain markers while the model is thinking
            if o.get('thinking') and cnt < max_think: print('🧠', end='', flush=True)
            if txt := o.get('text'): print(txt, end='', flush=True)
            cnt += 1
    print()
    return o  # the last item yielded is the final Completion
```
--------------------------------
### Gemini Completion Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Demonstrates a successful completion from the Gemini API, including model name, message content, finish reason, and usage statistics.
```text
Result:
Completion(model='models/gemini-3-flash-preview', message=Msg(role='assistant', content=[Part(type=, text='Hello! How can I help you today?', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), tool_calls=[], api_name='gemini', vendor_name='gemini', raw={'deltas': [Delta(text='Hello! How can I help you today?', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), raw={'candidates': [{'content': {'parts': [{'text': 'Hello! 
How can I help you today?'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'zgMsau-6BuO2_uMP4erKgQc'}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), raw={'candidates': [{'content': {'parts': [{'text': '', 'thoughtSignature': 'ErkBCrYBAQw51sf6XKqYlAH0hfjkKYIf2UGH2zQzCbpusz4xPgjpm8sfiwdjf3sXmr4Ii0wXe5/JEaUY/gx6M2+GkSZj+D+bV12cYzLNZe0H2Jv27iPgCCH3/gNLkwz6sNcaxM3SdQ8ldXf/7Mj5gVBTedYL9LB8XkQPoF6jayDn5/lpR5iYmPXcd9NCgWRT73OoRbsbKqg1LLJcHclabsBZkCqQU8/PqDZrPPzP8KAaUgCS1sMzQPIqI54='}], 'role': 'model'}, 'finishReason': 'STOP', 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'zgMsau-6BuO2_uMP4erKgQc'})]})
```
--------------------------------
### Direct Text Completion Output (Part 2)
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
This example shows the second part of a direct text completion output. It continues the response from the previous snippet.
```text
! How can I help you today?
```
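Concatenated in order, the two fragments give the full reply `Hello! How can I help you today?`. A minimal sketch of how a stream consumer can keep a running snapshot of the text; `running_text` is a hypothetical helper.

```python
from typing import Iterable, Iterator

def running_text(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the accumulated text after each streamed chunk arrives."""
    buf = []
    for chunk in chunks:
        buf.append(chunk)
        yield ''.join(buf)

for snapshot in running_text(['Hello', '! How can I help you today?']):
    print(snapshot)
# Hello
# Hello! How can I help you today?
```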
--------------------------------
### Content Generation with Google Search Tool
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Generates content with Gemini's Google Search tool enabled. Passing an empty dictionary as the `googleSearch` tool config is enough to turn it on.
```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is the weather in Istanbul today?"}]}], tools=[{"googleSearch": {}}])
comp = mk_completion(resp, mn, api_name, vnd_nm)
comp
```
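Grounded Gemini responses carry their sources in a `groundingMetadata` object on each candidate, whose `groundingChunks` entries hold `web` items with a `uri` and `title`. A minimal sketch for listing those sources, assuming `resp` is the raw REST JSON as a dict. `grounding_sources` is a hypothetical helper, and the sample response and URL are hand-written placeholders, not real output.

```python
def grounding_sources(resp: dict) -> list:
    """Collect (title, uri) pairs from groundingMetadata in a Gemini REST response."""
    out = []
    for cand in resp.get('candidates', []):
        for chunk in cand.get('groundingMetadata', {}).get('groundingChunks', []):
            web = chunk.get('web')
            if web:
                out.append((web.get('title', ''), web.get('uri', '')))
    return out

# Hand-written illustrative response shape, not real API output
resp = {'candidates': [{
    'content': {'role': 'model', 'parts': [{'text': 'Istanbul is mild today.'}]},
    'groundingMetadata': {
        'webSearchQueries': ['weather in Istanbul today'],
        'groundingChunks': [{'web': {'uri': 'https://example.com/istanbul-weather',
                                     'title': 'example.com'}}]}}]}
print(grounding_sources(resp))  # [('example.com', 'https://example.com/istanbul-weather')]
```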
--------------------------------
### Example Usage of run_fence_tool
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Illustrates the usage of the `run_fence_tool` function with Python and Bash examples. It asserts that the output for a Python print statement is '3' and the output for a Bash 'ls' command is 'bash: ls'.
```python
out = await run_fence_tool('py', 'print(1+2)', _ns)
test_eq(_result_re.search(out).group(1), '3')
out = await run_fence_tool('bash', 'ls', _ns)
test_eq(_result_re.search(out).group(1), 'bash: ls')
```
--------------------------------
### Anthropic API Response Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/04_anthropic.ipynb
An example of a structured response from the Anthropic API, including message content, usage statistics, and raw API details. This shows the output after processing a streaming request.
```text
Result:
Completion(model=None, message=Msg(role='assistant', content=[Part(type=, text='I can see a very small red square or rectangle in the image. The image appears to be mostly white/transparent with just this small red geometric shape visible in what looks like the upper left area. The red element is quite small and appears to be a simple solid color shape.', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=77, completion_tokens=59, total_tokens=136, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}), tool_calls=[], api_name='anthropic', vendor_name=None, raw={'deltas': [Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'message_start', 'message': {'model': 'claude-sonnet-4-20250514', 'id': 'msg_01Jj7S7fWhFgBBXoFS6ACd6F', 'type': 'message', 'role': 'assistant', 'content': [], 'stop_reason': None, 'stop_sequence': None, 'stop_details': None, 'usage': {'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'cache_creation': {'ephemeral_5m_input_tokens': 0, 'ephemeral_1h_input_tokens': 0}, 'output_tokens': 2, 'service_tier': 'standard', 'inference_geo': 'not_available'}}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'text', 'text': ''}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'ping'}), Delta(text='I can', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': 'I can'}}), Delta(text=' see a very small 
red square or rectangle in the image. The image', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' see a very small red square or rectangle in the image. The image'}}), Delta(text=' appears to be mostly white/transparent with just this small', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' appears to be mostly white/transparent with just this small'}}), Delta(text=' red geometric shape visible in what looks like the upper left area. The red element is', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' red geometric shape visible in what looks like the upper left area. 
The red element is'}}), Delta(text=' quite small and appears to be a simple solid color shape.', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' quite small and appears to be a simple solid color shape.'}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_stop', 'index': 0}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=77, completion_tokens=59, total_tokens=136, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}), raw={'type': 'message_delta', 'delta': {'stop_reason': 'end_turn', 'stop_sequence': None, 'stop_details': None}, 'usage': {'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}})]})
```
--------------------------------
### Initiate Chat with Web Search Options
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
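Sends `smsg` through the chat callable `c` with an explicit model (`gpt54m`), a tool list (`[toolsc]`), and an empty `web_search_options` dict, presumably enabling web search with default settings. This suits conversations that need up-to-date information. `c`, `smsg`, `gpt54m`, and `toolsc` are defined earlier in the notebook.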
```python
await c(smsg, m=gpt54m, tools=[toolsc], web_search_options={})
```
--------------------------------
### Unified Chat Interface Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Demonstrates the unified chat interface: the same user message is sent through `acomplete` to Gemini, Claude, and OpenAI models, switching providers by changing only the model name.
```python
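from IPython.display import Markdown, display
# Msg, Part, PartType and acomplete are fastllm names, assumed already imported in the notebook.
ms = ["models/gemini-3.1-pro-preview", "models/gemini-3-flash-preview", "claude-sonnet-4-6", "gpt-4.1"]
# A single user message; its `data` dict carries an Anthropic-style cache_control hint.
msgs = [Msg(role='user', content=[Part(type=PartType.text, text='Hi there!', data={"cache_control": {"type": "ephemeral"}})])]
for m in ms:
    display(Markdown(f'**{m}:**'))  # label each model's reply
    display(await acomplete(msgs, m))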
```
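The loop above awaits each model in turn. A minimal sketch of a concurrent alternative with `asyncio.gather` follows; `fake_acomplete` is a hypothetical stub standing in for fastllm's `acomplete`, so the pattern runs without API keys.

```python
import asyncio


async def fake_acomplete(msgs, model):
    # Hypothetical stub for fastllm's acomplete: echoes the model name instead of calling an API.
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"{model}: reply to {msgs[-1]!r}"


async def complete_all(msgs, models):
    # Start one request per model and wait for all; gather returns results in input order.
    results = await asyncio.gather(*(fake_acomplete(msgs, m) for m in models))
    return dict(zip(models, results))


ms = ["models/gemini-3.1-pro-preview", "models/gemini-3-flash-preview", "claude-sonnet-4-6", "gpt-4.1"]
replies = asyncio.run(complete_all(["Hi there!"], ms))
for m, reply in replies.items():
    print(reply)
```

In a notebook, where an event loop is already running, use `await complete_all(...)` instead of `asyncio.run(...)`. Firing all requests at once is faster than the sequential loop but is more likely to hit per-provider rate limits.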
--------------------------------
### Cache Test Setup with Long Text and Summarization
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Tests prompt caching with `acomplete`: a long text part marked with `cache_control` plus a summarization request is sent first (writing the cache), then resent together with the reply and a follow-up question (reading the cache).
```python
cc = {"cache_control": {"type": "ephemeral"}}
big_text = 'The quick brown fox jumps over the lazy dog. ' * 200
msg1 = Msg('user', content=[Part('text', big_text, data=cc), Part('text', 'Summarize')])
comp1 = await acomplete([msg1], model='claude-sonnet-4-20250514', max_tokens=64) # writes cache
msg3 = Msg('user', content=[Part('text', 'Now in French')])
comp2 = await acomplete([msg1, comp1.message, msg3], model='claude-sonnet-4-20250514', max_tokens=64) # reads cache
```
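For reference, here is a sketch of what these two requests might look like in Anthropic's raw Messages API format. It assumes fastllm copies a `Part`'s `data` dict onto the provider content block, which this snippet does not confirm; `follow_up` is a hypothetical helper. A cache read needs the prefix up to the `cache_control` breakpoint to be identical across calls, which is why the first user turn is left untouched.

```python
cc = {"cache_control": {"type": "ephemeral"}}
big_text = 'The quick brown fox jumps over the lazy dog. ' * 200

# Assumption: the Part's `data` dict ends up merged into the Anthropic content block.
first_request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 64,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": big_text, **cc},  # cache breakpoint after the long text
            {"type": "text", "text": "Summarize"},
        ],
    }],
}


def follow_up(request, assistant_text, question):
    # Hypothetical helper: append the assistant reply and a new user turn,
    # leaving earlier turns untouched so the cached prefix still matches.
    messages = request["messages"] + [
        {"role": "assistant", "content": [{"type": "text", "text": assistant_text}]},
        {"role": "user", "content": [{"type": "text", "text": question}]},
    ]
    return {**request, "messages": messages}


second_request = follow_up(first_request, "(summary returned by the first call)", "Now in French")
print(len(second_request["messages"]))  # user, assistant, user
```

Anthropic also documents a minimum cacheable prefix length (1024 tokens for Sonnet models), which the repeated sentence in `big_text` is presumably there to exceed.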
--------------------------------
### Audio and Text Input Completion (Pro Model)
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
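Compares `acomplete` and LiteLLM `completion` on the same audio-plus-text input with the 'pro' model, then shows both reported costs side by side. `acomplete` receives `audio_b64` as-is, while LiteLLM receives only the base64 payload after the first comma, with the format given separately as `"wav"`.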
```python
import litellm
msg = Msg('user', content=[Part(PartType.input_audio, audio_b64), Part('text', 'What is this audio saying?')])
comp = await acomplete([msg], model=pro_mn, temperature=0.0)
# LiteLLM wants the bare base64 payload, so drop everything up to the first comma (the data-URL header)
audio = {"data": audio_b64.split(',', 1)[1], "format": "wav"}
litecomp = litellm.completion(model=lpro_mn, temperature=0.0, messages=[{"role": "user", "content": [
    {"type": "input_audio", "input_audio": audio}, {"type": "text", "text": "What is this audio saying?"}]}])
# test_close(litellm.completion_cost(completion_response=litecomp), comp.cost, 1e-3)
litellm.completion_cost(completion_response=litecomp), comp.cost  # (LiteLLM cost, fastllm cost)
```
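The `.split(',', 1)[1]` in the LiteLLM call suggests `audio_b64` is a data URL of the form `data:audio/wav;base64,<payload>`. Under that assumption, a small stdlib helper that separates the MIME type from the payload and derives the `format` value might look like this; `split_data_url` and the sample bytes are hypothetical.

```python
import base64


def split_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' into (mime, payload)."""
    header, payload = data_url.split(',', 1)  # same split the LiteLLM call applies
    if not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError(f"not a base64 data URL: {header!r}")
    mime = header[len('data:'):-len(';base64')]
    return mime, payload


# Hypothetical sample: a few bytes standing in for real WAV data.
audio_b64 = 'data:audio/wav;base64,' + base64.b64encode(b'RIFF....WAVE').decode()
mime, payload = split_data_url(audio_b64)
fmt = mime.split('/', 1)[1]  # 'wav', matching the "format" value passed to LiteLLM above
print(mime, fmt, base64.b64decode(payload)[:4])
```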
--------------------------------
### Get Model Name
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
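Indexes the `ms` list of model names: `ms[2]` is the third entry (indexing is zero-based), which is `'claude-sonnet-4-6'` if `ms` is the list defined in the unified chat interface example.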
```python
ms[2]
```
--------------------------------
### Example: Usage with URL Context Tool
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Demonstrates a Gemini API call using the URL context tool and normalizes the usage metadata, which includes significant token counts for tool use.
```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is solveit? https://solve.it.com/"}]}], tools=[{"urlContext": {}}])
norm_usage(resp)
```
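`norm_usage` is fastllm's own function, and its exact mapping is not shown here. As a rough sketch of what such a normalization involves, the hypothetical helper below maps Gemini's REST `usageMetadata` field names onto the `Usage`-style fields that appear elsewhere in these snippets (`prompt_tokens`, `completion_tokens`, and so on). How tool-use and thinking tokens are folded in is an assumption, flagged in the comments.

```python
def normalize_gemini_usage(usage_metadata):
    """Hypothetical stand-in for fastllm's `norm_usage`; takes a REST-style usageMetadata dict."""
    def get(key):
        return usage_metadata.get(key) or 0  # absent fields count as zero

    prompt = get('promptTokenCount')
    tool_prompt = get('toolUsePromptTokenCount')  # tokens added by tools such as urlContext
    output = get('candidatesTokenCount')
    thoughts = get('thoughtsTokenCount')
    return {
        # Assumption: tool-use tokens are input-side, so they are folded into prompt_tokens.
        'prompt_tokens': prompt + tool_prompt,
        # Assumption: thinking tokens are output-side, so they are folded into completion_tokens.
        'completion_tokens': output + thoughts,
        'total_tokens': get('totalTokenCount'),
        'cached_tokens': get('cachedContentTokenCount'),
        'reasoning_tokens': thoughts,
    }


# Made-up numbers for illustration, not real API output.
usage = normalize_gemini_usage({'promptTokenCount': 20, 'toolUsePromptTokenCount': 3000,
                                'candidatesTokenCount': 150, 'thoughtsTokenCount': 400,
                                'totalTokenCount': 3570})
print(usage)
```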
--------------------------------
### Retrieve Model Information
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb
Looks up information for a given model and vendor, here `'kimi-k2.7-code'` from `'moonshot'`. The call is commented out in the source notebook.
```python
# get_model_info('kimi-k2.7-code', 'moonshot')
```
--------------------------------
### Clear Chat History
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Resets the conversation history, allowing for a fresh start in a new conversation.
```python
chat.clear_history()
print("Chat history cleared.")
```
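Why clearing gives a fresh start: chat APIs are generally stateless, so a chat object keeps the conversation locally and resends it with every call. The toy class below is a hypothetical illustration of that idea, not fastllm's `Chat`; its `reply_fn` fakes the model by reporting how many messages it received.

```python
class ToyChat:
    """Hypothetical stand-in for a chat object; `reply_fn` plays the model."""

    def __init__(self, reply_fn):
        self.reply_fn = reply_fn
        self.history = []  # every turn so far, resent in full on each call

    def __call__(self, text):
        self.history.append({"role": "user", "content": text})
        reply = self.reply_fn(self.history)  # the "model" sees the whole history
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def clear_history(self):
        self.history.clear()  # the next call starts from an empty context


chat = ToyChat(lambda hist: f"I have seen {len(hist)} message(s)")
chat("Hi")
chat("Still there?")
before = chat("Third")        # 5 messages: two earlier exchanges plus this question
chat.clear_history()
after = chat("Fresh start")   # only this question remains
print(before, "|", after)
```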
--------------------------------
### Search Tool Call Example
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb
Demonstrates using the Gemini API with a Google Search tool. It shows how to call the API for a search query and then normalize the tool calls, which in this case returns an empty list.
```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is the weather in Istanbul today?"}]}], tools=[{"googleSearch": {}}])
norm_tool_calls(resp)
```
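One plausible reason the list is empty: Google Search grounding runs on Google's side, so the response carries text plus `groundingMetadata` rather than `functionCall` parts for the client to execute. The sketch below is a hypothetical stand-in for `norm_tool_calls` (not fastllm's implementation) that collects `functionCall` parts from REST-style response dicts; both responses are made up.

```python
def extract_function_calls(resp):
    """Collect functionCall parts from a REST-style Gemini response dict."""
    calls = []
    for cand in resp.get('candidates', []):
        for part in cand.get('content', {}).get('parts', []):
            fc = part.get('functionCall')
            if fc is not None:  # text parts and other part types are skipped
                calls.append({'name': fc['name'], 'arguments': fc.get('args', {})})
    return calls


# Made-up responses for illustration.
search_resp = {'candidates': [{
    'content': {'role': 'model', 'parts': [{'text': '(answer text)'}]},
    'groundingMetadata': {'webSearchQueries': ['weather Istanbul today']},
}]}
tool_resp = {'candidates': [{
    'content': {'role': 'model', 'parts': [{'functionCall': {'name': 'simple_add', 'args': {'a': 3, 'b': 5}}}]},
}]}
print(extract_function_calls(search_resp))
print(extract_function_calls(tool_resp))
```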
--------------------------------
### Initiate Chat with a Message
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb
Starts a chat conversation by awaiting the chat callable `c` (defined earlier in the notebook) with a single message, `msg`.
```python
await c(msg)
```
--------------------------------
### Model Output: Claude-Sonnet-4-6 to Claude-Sonnet-4-6
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Example output for the `claude-sonnet-4-6 -> claude-sonnet-4-6` pairing: a weather report for Brisbane, truncated in the source.
```text
Output:
claude-sonnet-4-6 -> claude-sonnet-4-6: Here
is the current weather in **Brisbane, Queensland, Australia** for today
, **Friday, June 12, 202…
```
--------------------------------
### Streaming Output - How
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Shows a single streamed chunk, the word 'How', from the `acomplete` streaming examples.
```text
Output:
How
```
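Streamed text arrives as a sequence of chunks like this one. A minimal sketch of consuming such a stream follows, with `fake_stream` as a hypothetical async generator standing in for a real streaming call. Printing with `end=''` keeps the chunks on one line; printing each chunk with the default newline would likely give the one-chunk-per-line layout seen in some outputs in this section.

```python
import asyncio


async def fake_stream():
    # Hypothetical chunk source standing in for a real streaming completion.
    for chunk in ["How", " are", " you", " today?"]:
        await asyncio.sleep(0)
        yield chunk


async def consume(chunks):
    parts = []
    async for chunk in chunks:
        print(chunk, end='', flush=True)  # render incrementally on a single line
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return ''.join(parts)


text = asyncio.run(consume(fake_stream()))
```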
--------------------------------
### Model Output: Claude-Sonnet-4-6 to GPT-4o-search-preview
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Example output for the `claude-sonnet-4-6 -> gpt-4o-search-preview` pairing: a weather report for Brisbane, truncated in the source.
```text
Output:
claude-sonnet-4-6 -> gpt-4o-search-preview: Here
is
the
current
weather
in
**
Br
isbane
,
Australia
**
for
today
,
**
Friday
,
June
12
,
202
6
**
:
## Wea…
```
--------------------------------
### Model Output: GPT-4o-search-preview to Gemini-3-flash-preview
Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb
Example output for the `gpt-4o-search-preview -> models/gemini-3-flash-preview` pairing: a description of the weather in Brisbane, truncated in the source.
```text
Output:
gpt-4o-search-preview -> models/gemini-3-flash-preview: As of 11:0
0 PM local time on Friday, June 12, 2026, in Brisbane, Australia, the weather is clear
and…
```
--------------------------------
### Using System Prompts with Different Providers
Source: https://github.com/answerdotai/fastllm/blob/main/README.md
Passes the same system prompt to a Claude model and a Gemini model through FastLLM's `stream`. The snippet assumes `mtok` (the maximum number of output tokens) is already defined.
```python
# Named sys_prompt rather than sys to avoid shadowing Python's sys module
sys_prompt = "You are a pirate chef. Always respond in pirate speak and mention food."
print("Claude: ", end='')
r = await stream([user("What should I do today?")], model='claude-sonnet-4-20250514', system=sys_prompt, max_tokens=mtok)
print()  # start Gemini's output on a new line
print("Gemini: ", end='')
r = await stream([user("What should I do today?")], model='models/gemini-3-flash-preview', system=sys_prompt, max_tokens=mtok)
```