### Example Tool Calls

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Provides an example list of `ToolCall` objects, each representing a call to the `simple_add` tool with different arguments.

```python
tool_calls = [ToolCall(id='123', name='simple_add', arguments={'a': 3, 'b': 5}, server=False, extra={'type': 'function'}),
              ToolCall(id='456', name='simple_add', arguments={'a': 10, 'b': 20}, server=False, extra={'type': 'function'})]
```

--------------------------------

### Model Output Example

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Shows the Markdown label rendered for the model 'models/gemini-3.1-pro-preview' when its output is displayed. The model's response itself is not part of this snippet.

```text
Result: Markdown(**models/gemini-3.1-pro-preview:**)
```

--------------------------------

### Gemini Client Content Generation Example

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

Demonstrates generating content through the Gemini client (`gem_cli`) with a specific model and input. This is a non-streaming example used for testing.

```python
inp = [{"role": "user", "parts": [{"text": "Hi how are you?"}]}]
resp = await gem_cli.models.generate_content(model=mn, contents=inp)
comp = mk_completion(resp, mn, api_name, vnd_nm)
comp
```

--------------------------------

### Chat with Tool Usage Example

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Demonstrates a chat interaction where the assistant uses a tool to perform a calculation. The example shows the message flow from user to assistant, tool use, tool result, and final assistant response.

```markdown
Result: Markdown(**Msg** - role: `user` **Part** (`text`) What is 5 + 7? Use the tool to calculate it
- data: `None`
**Msg** - role: `assistant` **Part** (`tool_use`)
- data: `{'type': 'function_call', 'status': 'completed', 'call_id': 'call_B51xXWFg10YkHqkJVMODTx4i', 'id': 'fc_0e2920bf05b71dc30069f311b880888191a42435c7c037419e', 'name': 'async_add', 'arguments': {'a': 5, 'b': 7}, 'server': False}`
**Msg** - role: `tool` **Part** (`tool_result`) 12
- data: `{'id': 'fc_0e2920bf05b71dc30069f311d949c88192a50f746f6e2da30d', 'name': 'async_add', 'arguments': {'a': 5, 'b': 7}, 'server': False}`
**Msg** - role: `user` **Part** (`text`) You have used all your tool calls for this turn. Please summarize your findings. If you did not comp...
- data: `None`
**Msg** - role: `assistant` **Part** (`text`) I used the tool to calculate 5 + 7, and the result is 12. If you have more calculations or questions, feel free to ask!
- data: `{'type': 'output_text', 'logprobs': [], 'text': 'I used the tool to calculate 5 + 7, and the result is 12. If you have more calculations or questions, feel free to ask!', 'citations': []}`
```

--------------------------------

### Example: Completion with OpenAI

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb

Demonstrates how to obtain a completion from the OpenAI Responses API using a specified model and input. The call does not pass `stream=True`, so the response arrives in one piece and is converted with `mk_completion`.

```python
mn,inp = 'gpt-4o-mini','Hi!'
resp = await oai_cli.responses.create_response(model=mn,input=inp)
comp = mk_completion(resp, mn, api_name, vnd_nm); comp
```

--------------------------------

### Import LiteLLM

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Imports the LiteLLM library so its functions can be used in later cells.

```python
import litellm
```

--------------------------------

### Commented Out Qwen API Client Initialization

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/03_oai_chat.ipynb

A commented-out example showing how to initialize an `OpenAPIClient` for the Qwen API and point its operations at the Qwen-specific endpoint.

```python
# qwen_cli = OpenAPIClient(oai_spec, headers={"Authorization": f"Bearer {os.environ['QWEN_API_KEY']}"})
# for op in qwen_cli.ops: op.base_url = 'https://dashscope.aliyuncs.com/compatible-mode/v1'
```

--------------------------------

### Example Completion Object Creation

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Demonstrates how to create a `Completion` object with multiple parts, alternating 'thinking' and 'text' types, to simulate a complex LLM response.

```python
parts = [
    Part(type=PartType.thinking, text="First, let me consider the question..."),
    Part(type=PartType.text, text="The answer involves two parts. "),
    Part(type=PartType.thinking, text="Now for the second part, I need to..."),
    Part(type=PartType.text, text="And here's the conclusion."),
]
Completion(model='model', message=Msg(role="assistant", content=parts))
```

--------------------------------

### Streaming Cache Test Setup

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Sets up a streaming cache test with `acomplete` to verify cache writes and reads: the first stream writes the cache and the second reads it.

```python
# Streaming - as a sanity check
it1 = await acomplete([msg1], model='claude-sonnet-4-20250514', max_tokens=64, stream=True)  # writes cache
it2 = await acomplete([msg1, comp1.message, msg3], model='claude-sonnet-4-20250514', max_tokens=64, stream=True)  # reads cache
async for comp1 in it1: pass
async for comp2 in it2: pass
```

--------------------------------

### Example: Usage with Code Execution Tool

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

Illustrates a Gemini API call requesting code execution and normalizes the usage metadata. Note that code execution tool use is not explicitly logged in the normalized output in this example.

```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "Calculate the first 10 fibonacci numbers using code"}]}], tools=[{"codeExecution": {}}])
norm_usage(resp)
```

--------------------------------

### Model Output: GPT-4o-search-preview to GPT-4o-search-preview

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Example output from gpt-4o-search-preview when interacting with itself, detailing current weather conditions.

```text
Output: gpt-4o-search-preview -> gpt-4o-search-preview: As of 11 : 16 PM local time on Friday , June 12 , 202 6 , in Brisbane , Australia , the current weather cond…
```

--------------------------------

### Model Output: GPT-4o-search-preview to GPT-4o-mini

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Example output from gpt-4o-search-preview when interacting with gpt-4o-mini, providing a weather update.

```text
Output: gpt-4o-search-preview -> gpt-4o-mini : As of 11 : 16 PM local time on Friday , June 12 , 202 6 , in Brisbane , Australia , the weather is mostly cl…
```

--------------------------------

### PartAccum Example: Merging Tool Calls

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/01_streaming.ipynb

Demonstrates initializing `PartAccum` with a `ToolCall` and then merging parts, excluding tool calls.

```python
pa = PartAccum({0: ToolCall(id='toolu_01GF7HEH9s63gdAYL5dbcSj5', name='python', arguments={}, server=False, extra={'caller': {'type': 'direct'}})}, [])
pa.get_merged(False)
```

--------------------------------

### Example: Basic Text Generation Usage

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

Demonstrates a basic text generation call to the Gemini API and normalizes its usage metadata.

```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "hi!"}]}])
norm_usage(resp)
```

--------------------------------

### Registering a Test API

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Demonstrates how to register a new API endpoint with the `api_registry`. This example registers a 'test' API with a simple function `f`.

```python
def f(): print('test')
api_registry.register('test',**{'f':f})
```

--------------------------------

### Example of Prepending System Message

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/03_oai_chat.ipynb

Demonstrates prepending a system message to user messages for OpenAI Chat. Shows the resulting message list structure.

```python
sp = 'You are a pirate. Always respond in pirate speak. Keep it to one sentence.'
msg1 = mk_user_msg('What are you?')
msgs = denorm_msgs([msg1])
msgs = denorm_system(sp, msgs); msgs
```

--------------------------------

### Example of Part Instantiation with Long Data

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Demonstrates creating a `Part` instance with a long text and a long dictionary value to showcase the truncation in its Markdown representation.

```python
Part(PartType.text, 'Hello world!'*1000, data={'long':"10"*1000})
```

--------------------------------

### Create OpenAI Response with System Instructions

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb

Demonstrates creating an OpenAI response using a system prompt and streaming the output. Requires `oai_cli`, `denorm_msgs`, `denorm_system`, `mk_user_msg`, `acollect_stream`, and the vendor name `vnd_nm` to be set up beforehand.

```python
sp = 'You are a pirate. Always respond in pirate speak. Keep it to one sentence.'
msg1 = mk_user_msg('What are you?')
resp = await oai_cli.responses.create_response(model='gpt-4o-mini', input=denorm_msgs([msg1]), instructions=denorm_system(sp), stream=True)
async for comp in acollect_stream(resp, vendor_name=vnd_nm): pass
comp
```

--------------------------------

### Get Model Information

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Retrieves detailed information about a specific model from a given vendor. This is a commented-out example.

```python
# get_model_info('claude-fable-5', 'anthropic')
```

```python
# get_model_info('MiniMax-M3', 'minimax')
```

--------------------------------

### Cache Test Setup with LiteLLM

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Sets up a cache test scenario using `litellm.completion` with long text input and a summarization request, followed by a follow-up question.

```python
big_msgs1 = [{"role":"user","content":[
    {"type":"text","text":big_text,"cache_control":{"type":"ephemeral"}},
    {"type":"text","text":"Summarize"}]}]
litecomp1 = litellm.completion(model="anthropic/claude-sonnet-4-20250514", messages=big_msgs1, max_tokens=64)
big_msgs2 = big_msgs1 + [
    {"role":"assistant","content":litecomp1.choices[0].message.content},
    {"role":"user","content":"Now in French"}]
litecomp2 = litellm.completion(model="anthropic/claude-sonnet-4-20250514", messages=big_msgs2, max_tokens=64)
```

--------------------------------

### Get Model Info for Codex Models

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Retrieves detailed information for specific Codex models. These are examples of using `get_model_info` with different model identifiers.

```python
get_model_info(codex53spark, 'codex')
```

```python
get_model_info(codex55, 'codex')
```

--------------------------------

### Direct Text Completion Output (Part 1)

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

This example shows the first part of a direct text completion output. It is a fragment of a larger response.

```text
Hello
```

--------------------------------

### Example Conversation with mk_msgs

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Demonstrates how to use `mk_msgs` to create a simple chat conversation with four messages. The output shows the resulting list of `Msg` objects.

```python msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"]) msgs ``` -------------------------------- ### ToolCall Example Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb An example of a ToolCall object, which represents a function call made by an LLM. ```python ToolCall(id='oxwvx1fm', name='simple_add', arguments={'b': 547982745, 'a': 5478954793}, server=False, extra={'thoughtSignature': 'EscDCsQDAQw51scPHdv+D5BX7JWdLzz3Bv8tsKFRuAJe2UkTFZ+NZKzNsLtmQBiia+/r4HJEUptq1zQB0q9HToX0qzCUqyNAbDLY76KxMeW9jpsnUvh6ZjPM5sDD7fAafF7cjdApNMsihPqIZBAZjAlFPcp1c/50MObH5f1q7hO7fgDS4iSJ3Q3FfbAYWnJ4nlA2peVMu/6WFcKZh1wcZCIuN6iFCj6nhH+6RKkaFRaM0b6XCmpti6qldSeZx+qtHmo+lzr1tct4Gz/CITDI7gRJ3qfLYV2u45jOhKzdd1t6gQ39XLJ93j0xd0AwpzcdZLbHWqwWJCQ43nNzhJ7IQTAWOSyPgKDnlAMHq2PTEoXBYkMBApCZ1x+HncBzt77kQrTTe7sWGVmD5boVnYAIFPFGXOULP5tDZ+nog+Fg8NV10vaFKlHVf+VDzFnVWxT259LN12ykGtBilfpTXiKCV12RAZwhuL7vXXHrsBGg5HNVImcXqgMvwf/rtQlJeop+9bEcAiU48hMFMzumOrCmmHD3HgxpYLW7T3vtDmbNdKCDqVtIwO4Rp5HE6GudRWmq8iC2UnyQglUXoXVnxIZW7eYYDsGAYrYgZ1A='}) ``` -------------------------------- ### Prepare and Create Response with Tools Source: https://github.com/answerdotai/fastllm/blob/main/nbs/02_oai_responses.ipynb Sets up the model name, input prompt, and tools for an OpenAI API request, then creates a completion response. This is a common setup for making API calls that involve tool usage. ```python mn,inp,tools = 'gpt-4o-mini','What is the weather in Istanbul today?',[{"type": "web_search_preview"}] resp = await oai_cli.responses.create_response(model=mn,input=inp,tools=tools) comp = mk_completion(resp, mn, api_name, vnd_nm) ``` -------------------------------- ### Basic Chat Completion Example Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb Illustrates a simple chat completion response from a Gemini model. This shows the structure of a successful message and usage statistics. 
```python Result: Completion(model='models/gemini-3-flash-preview', message=Msg(role='assistant', content=[Part(type=, text='Hello! How can I help you today?', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), tool_calls=[], api_name='gemini', vendor_name='gemini', raw={'deltas': [Delta(text='Hello! How can I', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=5, total_tokens=73, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 5, 'totalTokenCount': 73, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': 'Hello! 
How can I'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 5, 'totalTokenCount': 73, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'}), Delta(text=' help you today?', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': ' help you today?'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=77, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=65, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}), raw={'candidates': [{'content': {'parts': [{'text': '', 'thoughtSignature': 
'EtQCCtECAQw51sfRXDOcN9iujtIsIdL/3Hqy9Ppa7GABoJXxMd00zUZUs4rcHJs815F1BZP0RlKbRrxtSACPJBb5ypxaKrzijIymPV7n9FynodoT6/B7wJquuHXD6rIvPy9/nssqrWAcBA5fJOdjXRtfM3tMLhIcl6Np3L87f6KeOwgS/npqJLikxKJxHFukl1cRw2COc3gqKfksPcAwPydBUcegmji3elck26EZmqzqO8+jCETceWkThUCxmg9jM9oWI3JmmrOFSKZ9/IcIFf4kuz/xeFxzPbdh/PQW1GMHndLy/PErTkRIwu5HtcZQYAZWcwqB3ob6ulYi0NdDWl9Y1SeMCa911GpG1W3iOro46AZcpe/+eEj16TFCqReGU6nD2MSHx9iNcGhTu919tAW5BGw1sKfZV5PFltMBAzQTRvplLakXAwsdxE/jPheVo6PZj9VtXQ=='}], 'role': 'model'}, 'finishReason': 'STOP', 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 77, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 65}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'gRLzacDtIt6A3boP9vCWsAg'})]}) ``` -------------------------------- ### Initialize AsyncChat with Different Configurations Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb Demonstrates how to initialize the AsyncChat class with various configurations, including auto-inferred models, known vendors, and explicit API settings. ```python # Auto inferred c = AsyncChat("claude-opus-4-5") c = AsyncChat("models/gemini-3-pro-preview") c = AsyncChat("gpt-4.1") # Known Vendor c = AsyncChat("gpt-5.4", vendor_name="codex") # Explicit c = AsyncChat("gpt-oss-20b", api_name='openai_chat', api_key='...', base_url='https://openrouter.ai/api/v1') ``` -------------------------------- ### Chat Model Completion Output Example Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb An example of a chat model's completion output, including reasoning, descriptive text, and usage statistics for an image analysis task. ```python Result: Completion(model='accounts/fireworks/models/kimi-k2p6', message=Msg(role='assistant', content=[Part(type='thinking', text="The user wants me to identify what's in the image. Looking at the image, I can see a small puppy with brown and white fur lying on green grass. 
Next to the puppy are some purple flowers (likely asters or similar small purple flowers). The puppy appears to be a Cavalier King Charles Spaniel puppy, given its distinctive coloring - white face with brown ears and markings, large dark eyes, and that specific puppy-like appearance. It's lying down with its front paws extended on the grass.\n\nI should describe the image clearly and accurately. The image shows:\n- A puppy (Cavalier King Charles Spaniel)\n- Brown and white coloring\n- Lying on green grass\n- Purple flowers nearby (to the left of the puppy)\n- Cute, looking at camera\n\nLet me provide a friendly, descriptive answer.", data=None), Part(type='text', text="This image shows an adorable **Cavalier King Charles Spaniel puppy** lying on green grass. The puppy has the breed's characteristic **brown and white coat**, with large dark eyes, floppy brown ears, and a white blaze down the center of its face. It's resting with its front paws stretched out, looking directly at the camera. \n\nNext to the puppy (on the left side of the image) is a cluster of **small purple flowers** —likely asters or daisy-like blooms—growing in the grass. The overall scene is very cute and gives off a sweet, summery, outdoor vibe! 🐶🌸", data={'citations': []})]), finish_reason='stop', usage=Usage(prompt_tokens=107, completion_tokens=300, total_tokens=407, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'prompt_tokens': 107, 'total_tokens': 407, 'completion_tokens': 300, 'prompt_tokens_details': {'cached_tokens': 0}}), tool_calls=[], api_name='openai_chat', vendor_name='fireworks_ai', raw={'id': 'chatcmpl-1cde9440db084c7aaeb92b30e84a41b8', 'object': 'chat.completion', 'created': 1777533873, 'model': 'accounts/fireworks/models/kimi-k2p6', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "This image shows an adorable **Cavalier King Charles Spaniel puppy** lying on green grass. 
The puppy has the breed's characteristic **brown and white coat**, with large dark eyes, floppy brown ears, and a white blaze down the center of its face. It's resting with its front paws stretched out, looking directly at the camera. \n\nNext to the puppy (on the left side of the image) is a cluster of **small purple flowers** —likely asters or daisy-like blooms—growing in the grass. The overall scene is very cute and gives off a sweet, summery, outdoor vibe! 🐶🌸", 'reasoning_content': "The user wants me to identify what's in the image. Looking at the image, I can see a small puppy with brown and white fur lying on green grass. Next to the puppy are some purple flowers (likely asters or similar small purple flowers). The puppy appears to be a Cavalier King Charles Spaniel puppy, given its distinctive coloring - white face with brown ears and markings, large dark eyes, and that specific puppy-like appearance. It's lying down with its front paws extended on the grass.\n\nI should describe the image clearly and accurately. 
The image shows:\n- A puppy (Cavalier King Charles Spaniel)\n- Brown and white coloring\n- Lying on green grass\n- Purple flowers nearby (to the left of the puppy)\n- Cute, looking at camera\n\nLet me provide a friendly, descriptive answer."}, 'finish_reason': 'stop', 'token_ids': None}], 'usage': {'prompt_tokens': 107, 'total_tokens': 407, 'completion_tokens': 300, 'prompt_tokens_details': {'cached_tokens': 0}}, 'prompt_token_ids': [163587, 2482, 163601, 45702, 1573, 306, 566, 4082, 30, 163602, 4017, 163603, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163605, 163604, 198, 163586, 163588, 69702, 163601, 163606]}) ``` -------------------------------- ### FastLLM Setup and Helper Functions Source: https://github.com/answerdotai/fastllm/blob/main/README.md Imports necessary types and functions from fastllm for chat completions. Includes helper functions for creating user messages and streaming responses, which print thinking processes and text as they arrive. ```python from fastllm.types import Msg, Part, PartType, Completion from fastllm.acomplete import acomplete, mk_tool_res_msg import asyncio, json # Helpers def user(text): return Msg(role='user', content=[Part(type=PartType.text, text=text)]) async def stream(msgs, model, **kw): """Stream a response, printing text/thinking as it arrives. 
Returns the final Completion.""" cnt, max_think = 0, 10 async for o in await acomplete(msgs, model, stream=True, **kw): if not isinstance(o, Completion): if o.get('thinking') and cnt < max_think: print('🧠', end='', flush=True) if txt := o.get('text'): print(txt, end='', flush=True) cnt += 1 print() return o ``` -------------------------------- ### Gemini Completion Example Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb Demonstrates a successful completion from the Gemini API, including model name, message content, finish reason, and usage statistics. ```text Result: Completion(model='models/gemini-3-flash-preview', message=Msg(role='assistant', content=[Part(type=, text='Hello! How can I help you today?', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), tool_calls=[], api_name='gemini', vendor_name='gemini', raw={'deltas': [Delta(text='Hello! How can I help you today?', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=None, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), raw={'candidates': [{'content': {'parts': [{'text': 'Hello! 
How can I help you today?'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'zgMsau-6BuO2_uMP4erKgQc'}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=3, completion_tokens=9, total_tokens=35, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=23, raw={'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}), raw={'candidates': [{'content': {'parts': [{'text': '', 'thoughtSignature': 'ErkBCrYBAQw51sf6XKqYlAH0hfjkKYIf2UGH2zQzCbpusz4xPgjpm8sfiwdjf3sXmr4Ii0wXe5/JEaUY/gx6M2+GkSZj+D+bV12cYzLNZe0H2Jv27iPgCCH3/gNLkwz6sNcaxM3SdQ8ldXf/7Mj5gVBTedYL9LB8XkQPoF6jayDn5/lpR5iYmPXcd9NCgWRT73OoRbsbKqg1LLJcHclabsBZkCqQU8/PqDZrPPzP8KAaUgCS1sMzQPIqI54='}], 'role': 'model'}, 'finishReason': 'STOP', 'index': 0}], 'usageMetadata': {'promptTokenCount': 3, 'candidatesTokenCount': 9, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 3}], 'thoughtsTokenCount': 23, 'serviceTier': 'standard'}, 'modelVersion': 'gemini-3-flash-preview', 'responseId': 'zgMsau-6BuO2_uMP4erKgQc'})]}) ``` -------------------------------- ### Direct Text Completion Output (Part 2) Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb This example shows the second part of a direct text completion output. It continues the response from the previous snippet. ```text ! How can I help you today? 
```

--------------------------------

### Content Generation with Google Search Tool

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

This example demonstrates how to generate content by leveraging the Google Search tool. Passing an empty dictionary for the `googleSearch` tool enables it.

```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is the weather in Istanbul today?"}]}], tools=[{"googleSearch": {}}])
comp = mk_completion(resp, mn, api_name, vnd_nm)
comp
```

--------------------------------

### Example Usage of run_fence_tool

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Illustrates the usage of the `run_fence_tool` function with Python and Bash examples. It asserts that the output for a Python print statement is '3' and the output for a Bash 'ls' command is 'bash: ls'.

```python
out = await run_fence_tool('py', 'print(1+2)', _ns)
test_eq(_result_re.search(out).group(1), '3')
out = await run_fence_tool('bash', 'ls', _ns)
test_eq(_result_re.search(out).group(1), 'bash: ls')
```

--------------------------------

### Anthropic API Response Example

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/04_anthropic.ipynb

An example of a structured response from the Anthropic API, including message content, usage statistics, and raw API details. This shows the output after processing a streaming request.

```text
Result: Completion(model=None, message=Msg(role='assistant', content=[Part(type=, text='I can see a very small red square or rectangle in the image. The image appears to be mostly white/transparent with just this small red geometric shape visible in what looks like the upper left area. 
The red element is quite small and appears to be a simple solid color shape.', data={'citations': []})]), finish_reason=, usage=Usage(prompt_tokens=77, completion_tokens=59, total_tokens=136, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}), tool_calls=[], api_name='anthropic', vendor_name=None, raw={'deltas': [Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'message_start', 'message': {'model': 'claude-sonnet-4-20250514', 'id': 'msg_01Jj7S7fWhFgBBXoFS6ACd6F', 'type': 'message', 'role': 'assistant', 'content': [], 'stop_reason': None, 'stop_sequence': None, 'stop_details': None, 'usage': {'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'cache_creation': {'ephemeral_5m_input_tokens': 0, 'ephemeral_1h_input_tokens': 0}, 'output_tokens': 2, 'service_tier': 'standard', 'inference_geo': 'not_available'}}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'text', 'text': ''}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'ping'}), Delta(text='I can', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': 'I can'}}), Delta(text=' see a very small red square or rectangle in the image. 
The image', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' see a very small red square or rectangle in the image. The image'}}), Delta(text=' appears to be mostly white/transparent with just this small', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' appears to be mostly white/transparent with just this small'}}), Delta(text=' red geometric shape visible in what looks like the upper left area. The red element is', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' red geometric shape visible in what looks like the upper left area. 
The red element is'}}), Delta(text=' quite small and appears to be a simple solid color shape.', thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': ' quite small and appears to be a simple solid color shape.'}}), Delta(text=None, thinking=None, refusal='', tool_calls=[], citations=None, server_tool_result=None, finish_reason=None, usage=None, raw={'type': 'content_block_stop', 'index': 0}), Delta(text='', thinking='', refusal='', tool_calls=[], citations=[], server_tool_result=None, finish_reason=, usage=Usage(prompt_tokens=77, completion_tokens=59, total_tokens=136, cached_tokens=0, cache_creation_tokens=0, reasoning_tokens=0, raw={'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}), raw={'type': 'message_delta', 'delta': {'stop_reason': 'end_turn', 'stop_sequence': None, 'stop_details': None}, 'usage': {'input_tokens': 77, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'output_tokens': 59}})]}) ``` -------------------------------- ### Initiate Chat with Web Search Options Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb This snippet shows how to start a chat interaction using a specific model and tools, while also configuring web search options. It's useful for scenarios requiring real-time information retrieval during a conversation. ```python await c(smsg, m=gpt54m, tools=[toolsc], web_search_options={}) ``` -------------------------------- ### Unified Chat Interface Example Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb Demonstrates the unified chat interface by calling `acomplete` with different LLM models and a sample user message. This showcases the ability to switch providers easily. 
```python
ms = ["models/gemini-3.1-pro-preview", "models/gemini-3-flash-preview", "claude-sonnet-4-6", "gpt-4.1"]
msgs = [Msg(role='user', content=[Part(type=PartType.text, text='Hi there!', data={"cache_control": {"type": "ephemeral"}})])]
for m in ms:
    display(Markdown(f'**{m}:**'))
    display(await acomplete(msgs, m))
```

--------------------------------

### Cache Test Setup with Long Text and Summarization

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Sets up a cache test scenario using `acomplete` with long text input and a summarization request, followed by a follow-up question.

```python
cc = {"cache_control": {"type": "ephemeral"}}
big_text = 'The quick brown fox jumps over the lazy dog. ' * 200
msg1 = Msg('user', content=[Part('text', big_text, data=cc), Part('text', 'Summarize')])
comp1 = await acomplete([msg1], model='claude-sonnet-4-20250514', max_tokens=64)  # writes cache
msg3 = Msg('user', content=[Part('text', 'Now in French')])
comp2 = await acomplete([msg1, comp1.message, msg3], model='claude-sonnet-4-20250514', max_tokens=64)  # reads cache
```

--------------------------------

### Audio and Text Input Completion (Pro Model)

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Compares `acomplete` and LiteLLM `completion` for audio and text input using the 'pro' model. The LiteLLM call receives only the part of `audio_b64` after the first comma, and the last line returns both costs side by side for comparison.
```python
msg = Msg('user', content=[Part(PartType.input_audio, audio_b64), Part('text', 'What is this audio saying?')])
comp = await acomplete([msg], model=pro_mn, temperature=0.0)
litecomp = litellm.completion(model=lpro_mn, messages=[{"role":"user","content":[{"type":"input_audio","input_audio":{"data":audio_b64.split(',', 1)[1],"format":"wav"}},{"type":"text","text":"What is this audio saying?"}]}], temperature=0.0)
# test_close(litellm.completion_cost(completion_response=litecomp), comp.cost, 1e-3)
litellm.completion_cost(completion_response=litecomp), comp.cost
```

--------------------------------

### Get Model Name

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Retrieves the name of the model being used.

```python
ms[2]
```

--------------------------------

### Example: Usage with URL Context Tool

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

Demonstrates a Gemini API call using the URL context tool and normalizes the usage metadata, which includes significant token counts for tool use.

```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is solveit? https://solve.it.com/"}]}], tools=[{"urlContext": {}}])
norm_usage(resp)
```

--------------------------------

### Retrieve Model Information

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/00_types.ipynb

Retrieves information for a specific model and vendor. This is a commented-out example.

```python
# get_model_info('kimi-k2.7-code', 'moonshot')
```

--------------------------------

### Clear Chat History

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Resets the conversation history, allowing for a fresh start in a new conversation.
```python
chat.clear_history()
print("Chat history cleared.")
```

--------------------------------

### Search Tool Call Example

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/05_gemini.ipynb

Demonstrates using the Gemini API with a Google Search tool. It shows how to call the API for a search query and then normalize the tool calls, which in this case returns an empty list.

```python
resp = await gem_cli.models.generate_content(model=mn, contents=[{"role": "user", "parts": [{"text": "What is the weather in Istanbul today?"}]}], tools=[{"googleSearch": {}}])
norm_tool_calls(resp)
```

--------------------------------

### Initiate Chat with a Message

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/07_chat.ipynb

Example of initiating a chat conversation with a single message using the shortcut function.

```python
await c(msg)
```

--------------------------------

### Model Output: Claude-Sonnet-4-6 to Claude-Sonnet-4-6

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Example output from claude-sonnet-4-6 when interacting with itself, providing a weather report for Brisbane.

```text
Output: claude-sonnet-4-6 -> claude-sonnet-4-6: Here is the current weather in **Brisbane, Queensland, Australia** for today , **Friday, June 12, 202…
```

--------------------------------

### Streaming Output - How

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Demonstrates the streaming output of the word 'How'.

```text
Output: How
```

--------------------------------

### Model Output: Claude-Sonnet-4-6 to GPT-4o-search-preview

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Example output from claude-sonnet-4-6 when interacting with gpt-4o-search-preview, detailing the weather in Brisbane.
```text
Output: claude-sonnet-4-6 -> gpt-4o-search-preview: Here is the current weather in ** Br isbane , Australia ** for today , ** Friday , June 12 , 202 6 ** : ## Wea…
```

--------------------------------

### Model Output: GPT-4o-search-preview to Gemini-3-flash-preview

Source: https://github.com/answerdotai/fastllm/blob/main/nbs/06_acomplete.ipynb

Example output from gpt-4o-search-preview when interacting with gemini-3-flash-preview, describing the weather in Brisbane.

```text
Output: gpt-4o-search-preview -> models/gemini-3-flash-preview: As of 11:0 0 PM local time on Friday, June 12, 2026, in Brisbane, Australia, the weather is clear and…
```

--------------------------------

### Using System Prompts with Different Providers

Source: https://github.com/answerdotai/fastllm/blob/main/README.md

Demonstrates passing a system prompt to Claude and Gemini models using FastLLM. Ensure the 'mtok' variable is defined and represents the maximum tokens.

```python
sys = "You are a pirate chef. Always respond in pirate speak and mention food."
print("Claude: ", end='')
r = await stream([user("What should I do today?")], model='claude-sonnet-4-20250514', system=sys, max_tokens=mtok)
print("Gemini: ", end='')
r = await stream([user("What should I do today?")], model='models/gemini-3-flash-preview', system=sys, max_tokens=mtok)
```
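
The two calls in the system prompt example differ only in the model name, so the same pattern can be written as a loop. The following is a minimal offline sketch of that idea, not FastLLM's own code: `fake_stream` is a hypothetical stand-in that only mimics the keyword arguments seen in the example (`model`, `system`, `max_tokens`), `ask_all` is an illustrative helper, and `mtok = 64` is an assumed value because the source leaves `mtok` undefined. The prompt is stored in `system_prompt` rather than `sys` so that Python's standard `sys` module is not shadowed.

```python
import asyncio

# Hypothetical stand-in for the library's `stream` coroutine (assumption),
# so the sketch runs without network access or API keys.
async def fake_stream(msgs, model, system=None, max_tokens=None):
    return f"[{model}] system={system!r} max_tokens={max_tokens} user={msgs[-1]!r}"

async def ask_all(prompt, models, system, max_tokens, stream_fn=fake_stream):
    """Send the same prompt and system prompt to each model in turn."""
    results = {}
    for model in models:
        results[model] = await stream_fn(
            [prompt], model=model, system=system, max_tokens=max_tokens
        )
    return results

system_prompt = "You are a pirate chef. Always respond in pirate speak and mention food."
mtok = 64  # assumed value; not specified in the source
models = ['claude-sonnet-4-20250514', 'models/gemini-3-flash-preview']

replies = asyncio.run(ask_all("What should I do today?", models, system_prompt, mtok))
for model, reply in replies.items():
    print(f"{model}: {reply}")
```

In a notebook with a running event loop, `await ask_all(...)` would replace `asyncio.run(...)`, and passing a real streaming coroutine as `stream_fn` would be the point where this sketch meets the actual library.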