In past posts we explored In-Context Learning and an LLM-powered agent in Python. Today, let’s take a deeper look at a slightly more advanced technique: function calling. Function calling allows an LLM to detect when it should invoke an external tool to retrieve or process information, enabling smarter and more interactive assistants.
In this example, we use Phi 4, a compact yet capable LLM developed by Microsoft. We’ll show how a system prompt alone can be used to guide the model into issuing function calls based on user prompts. This is useful both in agentic applications and traditional LLM inference workflows.
The Starting Point: A Basic Assistant
Before introducing function calling, let’s look at how a basic LLM would respond to a query without access to real-time information. Below, we see the assistant declining to provide today’s weather in New York City due to its lack of external data access. This is expected behavior for a sandboxed model with no external tools.
User: Search today's weather in New York City, NY.
Assistant: As an AI text-based model, I don't have real-time capabilities or access to current data, so I can't provide today's weather information... [etc.]
Clearly, we need a way for the model to invoke a search tool when the query requires fresh, external information.
Enabling Function Calling via System Prompt
Below, we create a system prompt that includes few-shot examples (a form of In-Context Learning) that teach the model when and how to use a search_duckduckgo function. This is where the magic happens.
SYSTEM_PROMPT = (
    "You are an assistant that can perform function calls using search_duckduckgo({ \"query\": \"...\" }). "
    "Only use the function when the user asks for recent, location-specific, or online information..."
    "Examples:\n"
    "User: Who won the Oscar for Best Picture this year?\n"
    "Assistant: search_duckduckgo({ \"query\": \"2025 Best Picture Oscar winner\" })\n"
    "Assistant: The winner of Best Picture at the 2025 Oscars was The Midnight Star.\n"
    "...\n"
    "Begin."
)
This prompt format allows us to guide the LLM’s behavior without retraining or fine-tuning. Instead, we inject examples that show how to handle different types of user intent.
Why it works: LLMs are pattern matchers. When they see clear examples of user inputs and corresponding assistant outputs, they mimic those formats with high fidelity. By anchoring the pattern with multiple relevant examples, we improve the model’s ability to generalize to similar inputs.
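To make that concrete, here is a minimal sketch of the request the full script at the end of this post sends to a local, OpenAI-compatible endpoint; the model name "phi-4" is a placeholder for however your Phi 4 backend is registered.

import requests

# Minimal sketch: send the few-shot system prompt plus a user question to a
# local, OpenAI-compatible endpoint (same URL as the full script below).
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather like in New York City?"},
]
resp = requests.post(
    "http://localhost:5000/v1/chat/completions",
    json={"model": "phi-4", "messages": messages, "temperature": 0.7},  # model name is a placeholder
)
print(resp.json()["choices"][0]["message"]["content"])
# With the examples above in place, the reply typically mirrors the pattern:
# search_duckduckgo({ "query": "current weather in New York City" })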
Writing the Backend: Function and LLM Handler
We write Python code to:
- Send prompts to the LLM via a local API
- Parse the response for function call patterns
- Execute searches via DuckDuckGo
- Feed the results back into the model
import json

def extract_function_call(content):
    # Look for a search_duckduckgo(...) call and parse its JSON argument.
    if "search_duckduckgo" in content:
        try:
            start = content.find('{')
            end = content.find('}', start)
            if start != -1 and end != -1:
                return json.loads(content[start:end+1])
        except Exception:
            pass
    return None
After extracting a query, we pass it into a DuckDuckGo search utility and return formatted results:
from duckduckgo_search import DDGS

def duckduckgo_search(query, num_results=3):
    ddgs = DDGS()
    results = ddgs.text(query, max_results=num_results)
    # Normalize each hit into the small dict the model will consume.
    return [
        {
            "title": r.get("title", "No title"),
            "link": r.get("href", "No link"),
            "snippet": r.get("body", "No snippet")
        }
        for r in results
    ]
Small touches that add robustness:
- We default to “No title” or “No snippet” in case keys are missing.
- The search tool limits results to avoid token bloat downstream.
For production use, consider regex or structured parsing here for added robustness, but this simple example keeps it beginner-friendly.
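As a hedged sketch of what that might look like, a regex anchored to the function name avoids picking up stray braces elsewhere in the reply. The pattern and function name below are one possible approach, not part of the original script:

import json
import re

# Match search_duckduckgo( {...} ) and capture the JSON argument. Still a
# heuristic, but it won't be fooled by unrelated braces in the reply.
FUNC_PATTERN = re.compile(r"search_duckduckgo\s*\(\s*(\{.*?\})\s*\)", re.DOTALL)

def extract_function_call_regex(content):
    match = FUNC_PATTERN.search(content)
    if not match:
        return None
    try:
        return json.loads(match.group(1))  # e.g. {"query": "..."}
    except json.JSONDecodeError:
        return None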
You could extend this to fetch live APIs (like weather or stock data), interact with databases, or call local scripts. The same extraction and response flow applies—just swap out the duckduckgo_search() handler.
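For instance, a drop-in replacement only needs to return the same title/link/snippet shape that duckduckgo_search() produces. Here is a hypothetical sketch against an imaginary local weather endpoint; the URL and JSON fields are placeholders, not a real service:

import requests

def weather_lookup(query, num_results=1):
    # Hypothetical data source; swap in whatever API you actually have.
    resp = requests.get(
        "http://localhost:8080/weather",  # placeholder endpoint
        params={"q": query},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # Keep the title/link/snippet shape so the rest of the flow is unchanged.
    return [{
        "title": f"Weather for {data.get('location', query)}",
        "link": data.get("source_url", "No link"),
        "snippet": data.get("summary", "No snippet"),
    }]

Because the return shape matches, the extraction, result formatting, and follow-up call in the main loop work without modification.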
Demonstration: Weather Query
Once implemented, the assistant can now handle questions like:
User: What's the weather like in New York City?
Assistant: search_duckduckgo({ "query": "current weather in New York City" })
Assistant: It's currently 52°F and mostly cloudy in NYC, with winds around 10 mph.
Here’s an actual example:

This shows that the model:
- Identifies the need for a function call
- Generates it in the correct syntax
- Consumes the returned data and responds appropriately
You’re essentially teaching the LLM, on the fly, to outsource specific types of work.
Additional Enhancements
- Caching: Add caching to store query results locally for repeated prompts (a minimal sketch follows this list).
- Rate Limiting: Protect external search endpoints from abuse.
- Fallbacks: If the search fails, respond gracefully with suggestions.
- Tool Expansion: Add more tools like query_stock_price() or lookup_product_reviews() using the same mechanism.
These changes turn a simple Q&A model into a reactive agent that bridges generative reasoning with retrieval.
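As one example, the caching idea can start as a plain dictionary keyed by query, wrapped around the existing search function. This is a sketch, not part of the original script; a real deployment might use functools.lru_cache, Redis, or a TTL-based store instead:

# Minimal in-memory cache around the existing search function.
_search_cache = {}

def cached_search(query, num_results=3):
    key = (query.strip().lower(), num_results)
    if key not in _search_cache:
        _search_cache[key] = duckduckgo_search(query, num_results)
    return _search_cache[key]

In the main loop, calling cached_search(query) instead of duckduckgo_search(query) is the only change needed.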
Takeaways
- You don’t need an orchestration framework to build smart LLM workflows—prompt engineering and a few lines of Python can go a long way.
- Using In-Context Learning to guide function calling is flexible, explainable, and easy to modify.
- You can integrate external tools, APIs, or custom logic into your assistant pipeline without retraining the model.
- With smart prompts and basic code, you can start layering intelligence into assistants that behave more like agents.
This method is lightweight and adaptable, perfect for experimentation or prototyping real-world AI agents. From here, you can imagine combining function calling with Retrieval-Augmented Generation (RAG), multi-agent workflows, or local document indexing.
Python Code
import requests
import json
from duckduckgo_search import DDGS
# === Config ===
API_URL = "http://localhost:5000/v1/chat/completions"  # any OpenAI-compatible endpoint
API_KEY = ""     # leave empty for local backends that don't require auth
MODEL_NAME = ""  # set to whatever model name your backend expects
SYSTEM_PROMPT = (
    "You are an assistant that can perform function calls using search_duckduckgo({ \"query\": \"...\" }). "
    "Only use the function when the user asks for recent, location-specific, or online information. "
    "After the function is called, and relevant information is returned, continue the conversation using it.\n\n"
    "Examples:\n"
    "User: Who won the Oscar for Best Picture this year?\n"
    "Assistant: search_duckduckgo({ \"query\": \"2025 Best Picture Oscar winner\" })\n"
    "Assistant: The winner of Best Picture at the 2025 Oscars was The Midnight Star.\n\n"
    "User: What's the weather like in New York City?\n"
    "Assistant: search_duckduckgo({ \"query\": \"current weather in New York City\" })\n"
    "Assistant: It's currently 52°F and mostly cloudy in NYC, with winds around 10 mph.\n\n"
    "User: How do I reverse a string in Python?\n"
    "Assistant: You can reverse a string by using slicing: my_string[::-1]\n\n"
    "Begin."
)
# === Search Function ===
def duckduckgo_search(query, num_results=3):
    try:
        ddgs = DDGS()
        results = ddgs.text(query, max_results=num_results)
        return [
            {
                "title": r.get("title", "No title"),
                "link": r.get("href", "No link"),
                "snippet": r.get("body", "No snippet")
            }
            for r in results
        ]
    except Exception:
        return []
# === LLM Call ===
def call_llm(messages):
    headers = {"Content-Type": "application/json"}
    if API_KEY:
        headers["Authorization"] = f"Bearer {API_KEY}"  # Optional for local models
    payload = {
        "model": MODEL_NAME,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": 1000
    }
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json() if response.status_code == 200 else None
# === Extract Function Call ===
def extract_function_call(content):
    if "search_duckduckgo" in content:
        try:
            start = content.find('{')
            end = content.find('}', start)
            if start != -1 and end != -1:
                return json.loads(content[start:end+1])
        except Exception:
            pass
    return None
# === Main Loop ===
def main():
    history = [{"role": "system", "content": SYSTEM_PROMPT}]
    while True:
        user_input = input("\nUser: ")
        if user_input.lower() in {"exit", "quit"}:
            break
        history.append({"role": "user", "content": user_input})
        response = call_llm(history)
        if not response:
            continue
        message = response["choices"][0]["message"]
        content = message.get("content", "")
        function_args = extract_function_call(content)
        if function_args:
            query = function_args.get("query", "")
            results = duckduckgo_search(query)
            # Format result text for assistant use
            result_text = "\n\n".join(
                f"{r['snippet']} ({r['title']}, {r['link']})" for r in results
            )
            # Prepare list of sources for final output
            source_links = [f"{r['title']}: {r['link']}" for r in results]
            # Inject function call and results as assistant content
            history.append({"role": "assistant", "content": content})
            history.append({"role": "assistant", "content": result_text})
            # Request continuation with new knowledge
            followup = call_llm(history)
            if followup:
                final_message = followup["choices"][0]["message"]["content"]
                # Append sources at the end
                if source_links:
                    final_message += "\n\nSources:\n" + "\n".join(source_links)
                print(f"\n{final_message}\n")
                history.append({"role": "assistant", "content": final_message})
        else:
            print(f"\n{content}\n")
            history.append({"role": "assistant", "content": content})

if __name__ == "__main__":
    main()





