Agentic Systems: Ollama Model Suite vs Roo Code with Configs & Code Samples

1. Executive Summary

This report investigates the use of Ollama for running a suite of AI models locally within agentic systems and compares this approach with agent-based tools such as Roo Code. We explore Ollama’s features and model configuration methods and provide code samples that illustrate its implementation. The analysis highlights the advantages and limitations of local model execution in agentic systems compared to cloud-based alternatives.

2. Introduction to Ollama and Agentic Systems

2.1 What is Ollama?

Ollama is an open-source platform designed to enable the local execution of large language models (LLMs) on personal or organizational hardware. It simplifies the process of downloading, managing, and running LLMs, offering a user-friendly interface and robust customization options. By running models locally, Ollama provides enhanced privacy, reduced latency, and independence from cloud-based services [1] [2].

2.2 Agentic Systems Overview

Agentic systems are AI-powered frameworks that can perform tasks autonomously or semi-autonomously on behalf of users. These systems often leverage multiple AI models to handle various aspects of complex tasks, such as natural language processing, decision-making, and task planning.

3. Key Features of Ollama for Agentic Systems

Ollama offers a comprehensive set of features that make it a powerful tool for developers, researchers, and organizations building agentic systems. Below are its primary features:

3.1 Local Execution of AI Models

Models run entirely on local hardware, so prompts and outputs never leave the machine and there is no dependency on external inference APIs.

3.2 Extensive Model Library

Ollama’s library covers a wide range of open models (e.g., Llama, Mistral, Gemma, Phi, and code-focused variants), each downloadable with a single ollama pull command.

3.3 Customization and Fine-Tuning

Modelfiles let users set parameters, system prompts, and prompt templates, and attach (Q)LoRA adapters for domain-specific behavior (see Section 5).

3.4 Hardware Optimization

Ollama detects available CPUs and GPUs automatically and applies optimizations such as Flash Attention to improve memory usage and token generation speed.

3.5 Ease of Use

A one-line installer and a small CLI (pull, run, create, ps) keep the workflow approachable for developers without MLOps experience.

3.6 Cost Efficiency

There are no per-token or subscription fees; ongoing costs are limited to local hardware and electricity.

3.7 Integration with Development Tools

A local REST API (and community client libraries) makes it straightforward to embed Ollama in applications, scripts, and agent frameworks.

3.8 Privacy and Security

Because all inference happens on-premises, sensitive data stays under the user’s control, which simplifies compliance for regulated workloads.

4. How Ollama Runs AI Models Locally for Agentic Systems

Ollama’s architecture and workflow are designed to streamline the process of running LLMs on local hardware, making it ideal for agentic systems. Below is an overview of how it operates:

4.1 Installation

Ollama can be installed on macOS, Linux, and Windows (preview version) using a simple command or a downloadable installer. For example:

curl -fsSL https://ollama.com/install.sh | sh

[16]

4.2 Model Management

Models are downloaded from Ollama’s library using the ollama pull command. For instance:

ollama pull llama3.2

[17] [18]

The downloaded models are stored locally, typically in directories such as ~/.ollama/models [19].
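
Models that are no longer needed can be listed and removed with companion commands (a brief sketch):

ollama list          # show models that have been pulled locally
ollama rm llama3.2   # delete a model to free disk space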

4.3 Running Models

Once a model is downloaded, it can be executed using the ollama run command. For example:

ollama run llama3.2

This command initializes the model and opens a prompt for user interaction [20] [21].
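
The same command also accepts a one-off prompt for non-interactive use, which is convenient in scripts (a brief sketch):

ollama run llama3.2 "Summarize the benefits of local LLM inference in two sentences."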

4.4 Customization with Modelfiles

Users can create Modelfiles to configure model parameters and behavior. For example:

FROM llama3.2
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant."

The model can then be created and run as follows:

ollama create mymodel -f Modelfile
ollama run mymodel

[7] [22]

4.5 API Integration

Ollama exposes a RESTful API for programmatic interaction with models. Developers can send prompts and receive responses using HTTP requests. For example:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing."
}'

[10]
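
Note that /api/generate streams newline-delimited JSON by default. Adding "stream": false to the same request (a minimal sketch) returns a single JSON object whose response field contains the generated text:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing.",
  "stream": false
}'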

4.6 Hardware Utilization

Ollama automatically detects and utilizes available hardware resources, including GPUs and CPUs. It supports advanced features like Flash Attention to optimize memory usage and improve token generation speed.
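
As a brief sketch, Flash Attention can be enabled through an environment variable when starting the server:

OLLAMA_FLASH_ATTENTION=1 ollama serve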

4.7 Monitoring and Management

Users can monitor loaded models and resource usage using commands like ollama ps. The platform also supports concurrent processing and queue management for handling multiple requests [23] [24].
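
For example, concurrency behavior can be tuned with environment variables when launching the server; the values below are illustrative, not recommendations:

OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_QUEUE=256 ollama serve

ollama ps   # inspect which models are loaded and how much memory they use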

5. Ollama’s Model Configuration Method

Ollama’s model configuration method is centered around the Modelfile, which serves as a blueprint for creating, customizing, and sharing models within the Ollama ecosystem. The Modelfile is designed to be simple yet flexible, allowing users to define the behavior and parameters of their models [25] [26].

5.1 Structure of the Modelfile

The Modelfile follows a specific syntax and includes several key instructions, each serving a distinct purpose. The primary components of a Modelfile are as follows:

5.1.1 FROM Instruction

Specifies the base model to use. This is a mandatory field and defines the foundation upon which the custom model is built [27] [28].

Example:

FROM llama3.2

5.1.2 PARAMETER Instruction

Allows customization of model behavior through various settings. Common parameters include [29]:

  • temperature: controls the randomness of the output (higher values are more creative)
  • num_ctx: sets the size of the context window in tokens
  • top_k / top_p: restrict token sampling
  • repeat_penalty: discourages repetitive output
  • stop: defines sequences at which generation stops

Example:

PARAMETER temperature 1

5.1.3 TEMPLATE Instruction

Defines the full prompt template sent to the model. This can include placeholders for user input and system messages [30] [31].

Example:

TEMPLATE """<|user|> {{ .Prompt }} <|assistant|>"""

5.1.4 SYSTEM Instruction

Specifies the system message that sets the context or role for the model during interactions [32] [33].

Example:

SYSTEM """A helpful assistant that provides accurate and concise answers."""

5.1.5 ADAPTER Instruction

Defines (Q)LoRA adapters to apply to the base model for fine-tuning [34] [35].
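
Example (the adapter path is hypothetical):

FROM llama3.2
ADAPTER ./domain-adapter.safetensors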

5.1.6 LICENSE Instruction

Specifies the legal license under which the model is shared or distributed [36] [37].
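
Example (license text abbreviated):

LICENSE """
Apache License 2.0
"""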

5.1.7 MESSAGE Instruction

Allows specification of message history for the model to use when generating responses [38].
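
Example of seeding a short message history:

MESSAGE user Is Toronto in Canada?
MESSAGE assistant Yes, Toronto is in Canada.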

5.2 Creating and Running Models

To create and run a model using a Modelfile, the following steps are typically followed:

  1. Save the configuration in a plain-text file, conventionally named Modelfile.
  2. Use the ollama create command to generate a new model:
    ollama create model_name -f ./Modelfile
    
  3. Run the model using the ollama run command:
    ollama run model_name
    
  4. Verify the model’s behavior by interacting with it through the terminal or API (see the quick check below) [39] [40].
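
As a quick check (model_name is the name used above), the model’s applied configuration can be inspected from the CLI:

ollama list                          # confirm model_name appears among local models
ollama show model_name --modelfile   # print the Modelfile the model was built from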

5.3 Customization and Fine-Tuning

The Modelfile allows for extensive customization, enabling users to tailor models to specific tasks, for example by adjusting sampling parameters, enlarging the context window, setting a domain-specific system prompt, or attaching a fine-tuned adapter. The configurations in Section 5.4 illustrate this.

5.4 Examples of Modelfile Configurations

Example 1: Basic Modelfile

FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
TEMPLATE """<|user|> {{ .Prompt }} <|assistant|>"""
SYSTEM """A helpful assistant that provides concise answers."""

Example 2: Customized Modelfile for a Specific Task

FROM llama3.2
PARAMETER temperature 1
PARAMETER num_ctx 8192
TEMPLATE """<|system|> {{ .System }} <|user|> {{ .Prompt }} <|assistant|>"""
SYSTEM """An AI assistant specialized in blockchain and Web3 technologies."""

6. Implementing Agentic Systems with Ollama

To implement an agentic system using Ollama, developers can leverage the platform’s API and model configuration capabilities. Here’s a step-by-step guide with code samples:

6.1 Setting Up the Environment

First, ensure Ollama is installed and running on your system. Then, you can use Python to interact with the Ollama API. Install the required libraries:

pip install requests

6.2 Creating a Basic Agent

Here’s a simple implementation of an agent using Ollama:

import requests

class OllamaAgent:
    def __init__(self, model_name):
        self.model_name = model_name
        self.api_url = "http://localhost:11434/api/generate"

    def generate_response(self, prompt):
        payload = {
            "model": self.model_name,
            "prompt": prompt,
            "stream": False  # return one JSON object instead of a token stream
        }
        response = requests.post(self.api_url, json=payload)
        response.raise_for_status()
        return response.json()["response"]

    def perform_task(self, task_description):
        prompt = f"Task: {task_description}\nPlease provide a step-by-step solution."
        return self.generate_response(prompt)

# Usage
agent = OllamaAgent("llama3.2")
result = agent.perform_task("Create a Python function to calculate the Fibonacci sequence.")
print(result)

6.3 Implementing a Multi-Model Agent

For more complex agentic systems, you might want to use multiple models for different tasks. Here’s an example of how to implement this:

import requests

class MultiModelAgent:
    def __init__(self):
        self.api_url = "http://localhost:11434/api/generate"
        # All of these models must already be pulled locally (ollama pull <name>).
        self.models = {
            "general": "llama3.2",
            "code": "codellama",
            "math": "phi4"
        }

    def generate_response(self, model, prompt):
        payload = {
            "model": model,
            "prompt": prompt,
            "stream": False  # return one JSON object instead of a token stream
        }
        response = requests.post(self.api_url, json=payload)
        response.raise_for_status()
        return response.json()["response"]

    def analyze_task(self, task_description):
        prompt = (
            "Categorize this task as 'general', 'code', or 'math'. "
            f"Respond with the single word only: {task_description}"
        )
        return self.generate_response(self.models["general"], prompt)

    def perform_task(self, task_description):
        task_type = self.analyze_task(task_description).lower().strip()
        # Fall back to the general model if the classification is not an exact match.
        model = self.models.get(task_type, self.models["general"])
        prompt = f"Task: {task_description}\nPlease provide a detailed solution."
        return self.generate_response(model, prompt)

# Usage
agent = MultiModelAgent()
result = agent.perform_task("Implement a quicksort algorithm in Python.")
print(result)

6.4 Implementing an Agent with Memory

To create a more sophisticated agent with memory, you can use Ollama’s API to maintain conversation history:

import requests

class MemoryAgent:
    def __init__(self, model_name):
        self.model_name = model_name
        self.api_url = "http://localhost:11434/api/chat"
        self.conversation_history = []

    def generate_response(self, prompt):
        # Record the user turn, then send the full history to the chat endpoint.
        self.conversation_history.append({"role": "user", "content": prompt})
        payload = {
            "model": self.model_name,
            "messages": self.conversation_history,
            "stream": False  # return the complete assistant message in one response
        }
        response = requests.post(self.api_url, json=payload)
        response.raise_for_status()
        assistant_response = response.json()["message"]["content"]
        self.conversation_history.append({"role": "assistant", "content": assistant_response})
        return assistant_response

    def perform_task(self, task_description):
        return self.generate_response(task_description)

# Usage
agent = MemoryAgent("llama3.2")
result1 = agent.perform_task("What is the capital of France?")
print(result1)
result2 = agent.perform_task("What is its population?")
print(result2)

7. Comparison with Roo Code and Other Agent-Based Tools

While Ollama provides a powerful platform for running AI models locally, it’s important to compare it with other agent-based tools like Roo Code to understand the strengths and limitations of each approach.

7.1 Ollama Approach

Advantages:

  1. Local Execution: Ollama runs models locally, ensuring data privacy and reducing latency.
  2. Customization: Extensive model customization through Modelfiles.
  3. Cost-Efficiency: No recurring API costs for model usage.
  4. Offline Capability: Can operate without an internet connection.
  5. Wide Model Support: Access to various open-source models.

Limitations:

  1. Hardware Requirements: Requires sufficient local hardware resources.
  2. Update Management: Users need to manually update models and the Ollama platform.
  3. Limited Built-in Tools: Fewer pre-built tools compared to some cloud platforms.

7.2 Roo Code and Similar Agent-Based Tools

Advantages:

  1. Cloud-Based: No need for powerful local hardware.
  2. Managed Updates: Automatic updates to models and platform.
  3. Scalability: Easy to scale resources as needed.
  4. Integrated Tools: Often come with a suite of pre-built tools and integrations.
  5. Collaborative Features: Easier to implement multi-user and team collaboration features.

Limitations:

  1. Data Privacy Concerns: Data may be processed on external servers.
  2. Recurring Costs: Usually involve ongoing subscription or API usage fees.
  3. Internet Dependency: Requires a stable internet connection for operation.
  4. Limited Customization: May have less flexibility in model customization compared to Ollama.

7.3 Comparative Analysis

  1. Development Workflow:

    • Ollama: Suited for developers who prefer local development and have concerns about data privacy.
    • Roo Code: Better for teams that need collaborative features and don’t want to manage infrastructure.
  2. Cost Structure:

    • Ollama: Higher upfront costs for hardware, but lower long-term costs.
    • Roo Code: Lower initial investment, but ongoing costs based on usage.
  3. Customization:

    • Ollama: Offers deep customization through Modelfiles and local fine-tuning.
    • Roo Code: May offer easier integration with existing cloud services but less model-level customization.
  4. Scalability:

    • Ollama: Scaling requires additional hardware investment.
    • Roo Code: Can easily scale up or down based on cloud resources.
  5. Data Control:

    • Ollama: Complete control over data and model interactions.
    • Roo Code: Data may be subject to cloud provider’s policies and jurisdictions.
  6. Ecosystem and Integrations:

    • Ollama: Growing ecosystem, but may require more custom integrations.
    • Roo Code: Often part of larger ecosystems with pre-built integrations.

8. Practical Applications of Ollama in Agentic Systems

Ollama’s capabilities make it suitable for a wide range of applications in agentic systems, including:

8.1 Creative Writing and Content Generation

Agentic systems using Ollama can assist writers with brainstorming, drafting, and editing [43]. For example:

class ContentGenerationAgent:
    def __init__(self):
        self.agent = OllamaAgent("llama3.2")

    def generate_article_outline(self, topic):
        prompt = f"Create a detailed outline for an article about {topic}."
        return self.agent.generate_response(prompt)

    def expand_section(self, section_title, outline):
        prompt = f"Expand on the following section of an article: {section_title}\n\nOutline: {outline}"
        return self.agent.generate_response(prompt)

# Usage
content_agent = ContentGenerationAgent()
outline = content_agent.generate_article_outline("The Impact of AI on Healthcare")
print(outline)
expanded_section = content_agent.expand_section("AI in Diagnostic Imaging", outline)
print(expanded_section)

8.2 Code Assistance

Ollama can be used to create agentic systems for generating, debugging, and documenting code [44]. Here’s an example:

class CodeAssistantAgent:
    def __init__(self):
        self.agent = OllamaAgent("codellama")

    def generate_function(self, description):
        prompt = f"Write a Python function that does the following: {description}"
        return self.agent.generate_response(prompt)

    def explain_code(self, code):
        prompt = f"Explain the following code in detail:\n\n{code}"
        return self.agent.generate_response(prompt)

    def suggest_improvements(self, code):
        prompt = f"Suggest improvements for the following code:\n\n{code}"
        return self.agent.generate_response(prompt)

# Usage
code_agent = CodeAssistantAgent()
function = code_agent.generate_function("Calculate the factorial of a number")
print(function)
explanation = code_agent.explain_code(function)
print(explanation)
improvements = code_agent.suggest_improvements(function)
print(improvements)

8.3 Language Translation

Agentic systems can leverage Ollama for facilitating multilingual communication and localization:

class TranslationAgent:
    def __init__(self):
        self.agent = OllamaAgent("llama3.2")

    def translate(self, text, source_lang, target_lang):
        prompt = f"Translate the following text from {source_lang} to {target_lang}:\n\n{text}"
        return self.agent.generate_response(prompt)

    def localize_content(self, content, target_culture):
        prompt = f"Adapt the following content for a {target_culture} audience, considering cultural nuances:\n\n{content}"
        return self.agent.generate_response(prompt)

# Usage
translation_agent = TranslationAgent()
translated_text = translation_agent.translate("Hello, how are you?", "English", "French")
print(translated_text)
localized_content = translation_agent.localize_content("Join us for a summer barbecue!", "Japanese")
print(localized_content)

8.4 Research and Analysis

Ollama can power agentic systems for summarizing academic papers and extracting insights from data:

class ResearchAssistantAgent:
    def __init__(self):
        self.agent = OllamaAgent("llama3.2")

    def summarize_paper(self, abstract):
        prompt = f"Summarize the key points of this academic paper abstract:\n\n{abstract}"
        return self.agent.generate_response(prompt)

    def extract_insights(self, data_description):
        prompt = f"Based on the following data description, provide key insights and potential areas for further research:\n\n{data_description}"
        return self.agent.generate_response(prompt)

# Usage
research_agent = ResearchAssistantAgent()
summary = research_agent.summarize_paper("Abstract: This paper explores the effects of climate change on biodiversity...")
print(summary)
insights = research_agent.extract_insights("Dataset: Global temperature changes from 1900 to 2020, correlated with species extinction rates.")
print(insights)

8.5 Customer Support

Agentic systems using Ollama can power intelligent chatbots and virtual assistants for customer support:

class CustomerSupportAgent:
    def __init__(self):
        self.agent = MemoryAgent("llama3.2")

    def handle_query(self, customer_query):
        prompt = f"As a customer support agent, please respond to the following query: {customer_query}"
        return self.agent.generate_response(prompt)

    def escalate_issue(self, conversation_history):
        prompt = f"Based on the following conversation history, determine if this issue needs to be escalated to a human agent. If so, explain why:\n\n{conversation_history}"
        return self.agent.generate_response(prompt)

# Usage
support_agent = CustomerSupportAgent()
response1 = support_agent.handle_query("How do I reset my password?")
print(response1)
response2 = support_agent.handle_query("That didn't work. I'm still locked out of my account.")
print(response2)
escalation_decision = support_agent.escalate_issue(support_agent.agent.conversation_history)
print(escalation_decision)

9. Best Practices for Using Ollama in Agentic Systems

When implementing agentic systems with Ollama, consider the following best practices:

  1. Model Selection: Choose appropriate models for specific tasks. For example, use code-specific models for programming tasks and general-purpose models for broader queries.

  2. Prompt Engineering: Craft effective prompts to guide the model’s responses. Use clear instructions and provide context when necessary.

  3. Error Handling: Implement robust error handling to manage potential issues with model responses or API calls (see the sketch after this list).

  4. Resource Management: Monitor and manage system resources, especially when running multiple models concurrently.

  5. Regular Updates: Keep Ollama and the downloaded models up to date to benefit from the latest improvements and security patches.

  6. Fine-tuning: For specialized tasks, consider fine-tuning models using domain-specific data to improve performance.

  7. Ethical Considerations: Implement safeguards to prevent the generation of harmful or biased content.

  8. User Feedback Loop: Incorporate user feedback mechanisms to continuously improve the agentic system’s performance.
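
As a minimal sketch of the error-handling practice above (the function name, retry policy, and timeout are illustrative, not part of Ollama), the helper below wraps the generate endpoint from Section 6 with a timeout, limited retries, and response validation:

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def safe_generate(model, prompt, retries=3, timeout=120):
    """Call the Ollama generate endpoint with basic retries and error checks."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    for attempt in range(1, retries + 1):
        try:
            response = requests.post(OLLAMA_URL, json=payload, timeout=timeout)
            response.raise_for_status()  # surface HTTP errors (e.g. a missing model)
            return response.json()["response"]
        except (requests.ConnectionError, requests.Timeout) as exc:
            # Server not running or overloaded; retry a limited number of times.
            if attempt == retries:
                raise RuntimeError(f"Ollama unreachable after {retries} attempts") from exc
        except (KeyError, ValueError) as exc:
            # Unexpected response shape; do not retry blindly.
            raise RuntimeError("Malformed response from Ollama") from exc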

10. Conclusion

Ollama provides a powerful and flexible platform for implementing agentic systems with locally-run AI models. Its extensive customization options, wide range of supported models, and focus on privacy and security make it an attractive choice for developers and organizations looking to build sophisticated AI-powered agents.

While Ollama offers significant advantages in terms of data control and cost-efficiency, it requires more upfront investment in hardware and technical expertise compared to cloud-based solutions like Roo Code. The choice between Ollama and cloud-based alternatives ultimately depends on specific project requirements, resource availability, and privacy concerns.

As the field of AI continues to evolve, platforms like Ollama are likely to play an increasingly important role in democratizing access to powerful language models and enabling the development of innovative agentic systems across various domains.

References

  1. Run AI Models Locally with Ollama: Fast & Simple Deployment. https://www.hendryadrian.com
  2. Ollama: What is Ollama? https://medium.com
  3. The Future of Locally-Run AI Models: Embracing Ollama and Personal Compute. https://medium.com
  4. Ollama. https://ollama.com
  5. GitHub - ollama/ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models. https://github.com
  6. Ollama: What is Ollama? https://medium.com
  7. Ollama: A Deep Dive into Running Large Language Models Locally (Part 1). https://medium.com
  8. Intro to Ollama: Full Guide to Local AI on Your Computer — Shep Bryan. https://www.shepbryan.com
  9. How to Run Open Source LLMs on Your Own Computer Using Ollama. https://www.freecodecamp.org
  10. Ollama: What is Ollama? https://medium.com
  11. Ollama Installation¶. https://hostkey.com
  12. How to Run AI Models Private and Locally with Ollama (No Cloud Needed!). https://medium.com
  13. Run AI Models Locally with Ollama: Fast & Simple Deployment. https://www.hendryadrian.com
  14. ollama/docs/faq.md at main · ollama/ollama. https://github.com
  15. How to Run Open Source LLMs on Your Own Computer Using Ollama. https://www.freecodecamp.org
  16. The Future of Locally-Run AI Models: Embracing Ollama and Personal Compute. https://medium.com
  17. Local Ollama Parameter Comparison. https://medium.com
  18. Ollama - MindsDB. https://docs.mindsdb.com
  19. What is Ollama? Everything Important You Should Know. https://itsfoss.com
  20. Run AI Models Locally with Ollama: Fast & Simple Deployment. https://www.hendryadrian.com
  21. Running models with Ollama step-by-step. https://medium.com
  22. The 6 Best LLM Tools To Run Models Locally. https://getstream.io
  23. ollama/docs/faq.md at main · ollama/ollama. https://github.com
  24. ollama/docs/faq.md at main · ollama/ollama. https://github.com
  25. Local Ollama Parameter Comparison. https://medium.com
  26. Run AI Models Locally with Ollama: Fast & Simple Deployment. https://www.hendryadrian.com
  27. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  28. Understanding Modelfile in Ollama. https://www.pacenthink.io
  29. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  30. Understanding Modelfile in Ollama. https://www.pacenthink.io
  31. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  32. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  33. Understanding Modelfile in Ollama. https://www.pacenthink.io
  34. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  35. Understanding Modelfile in Ollama. https://www.pacenthink.io
  36. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  37. Ollama Docs Modelfile Overview | Restackio. https://www.restack.io
  38. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  39. Ollama Docs Modelfile Overview | Restackio. https://www.restack.io
  40. Ollama Docs Modelfile Overview | Restackio. https://www.restack.io
  41. ollama/docs/modelfile.md at main · ollama/ollama. https://github.com
  42. Ollama Model File Tutorial | Restackio. https://www.restack.io
  43. Fine-Tune Your AI with Ollama Model Files: A Step-by-Step Tutorial. https://www.linkedin.com
  44. Ollama - Building a Custom Model. https://unmesh.dev
  45. How to Customize LLMs with Ollama | by Sumuditha Lansakara | Medium. https://medium.com
  46. Ollama Tutorial: Running LLMs Locally Made Super Simple - KDnuggets. https://www.kdnuggets.com
  47. ollama/docs/faq.md at main · ollama/ollama. https://github.com
  48. ollama/README.md at main · ollama/ollama. https://github.com
  49. Ollama Model File Examples | Restackio. https://www.restack.io
  50. ollama/README.md at main · ollama/ollama. https://github.com