
Ollama & Chatbox

Frontend Chat UI ..



Introduction

Ollama is an open-source platform designed to run large language models (LLMs) locally on personal computers. It allows users to download, run, and fine-tune various AI models like Llama, Mistral, and other open-source LLMs without requiring cloud services or API calls.

Ollama features a simple command-line interface, an API for integration with other applications, and supports both CPU and GPU acceleration.

Ollama

As we're going to be using Ollama .. a lot .. let's run through a workshop that explores how to use it effectively on Windows 11, macOS, and Linux.

We'll focus on practical commands, real-world applications, and cross-platform techniques.

Prerequisites:

  • Ollama already installed on your system

  • Basic familiarity with command line interfaces

  • At least 8GB RAM (16GB+ recommended for larger models)

Take a look at the Setup > Quickstart section to ensure Ollama is installed and running.

Getting Started

Let's start by discovering what models are available, pulling and selecting ones to use ..

  1. List Ollama models.

# List all locally installed models
ollama list

# Note: there is no built-in command to list remote models;
# browse what's available to pull at https://ollama.com/library
  2. Let's pull several models to compare.

# Pull the default Llama 3 model
ollama pull llama3

# Pull Mistral model
ollama pull mistral

# For code-specific tasks
ollama pull codellama
  3. Examine model details.

# Get detailed information about a model
ollama show llama3

# Check model parameters
ollama show llama3 | grep "parameter"   # Linux/macOS


Interacting with Models

Playtime .. ask the model to generate some stories, process a file, and more.

  1. Once you've pulled the model .. just enter your question!

  2. Let's start with basic queries.

# Basic question
ollama run llama3 "What are three benefits of running AI models locally?"

# Generate creative content
ollama run llama3 "Write a short poem about technology"

# Technical question
ollama run codellama "Explain how to use async/await in JavaScript"
  3. Process files as input and output.

# Process content from a file
echo "Summarize the following text: $(cat article.txt)" | ollama run llama3 > summary.txt

# On Windows PowerShell
Get-Content article.txt | ollama run llama3 "Summarize the following text:" > summary.txt


Modelfile

  1. Create a file named Modelfile (no extension) with these contents.

FROM llama3
SYSTEM You are a helpful writing assistant that specializes in clear, concise business communication. You help users draft professional emails, reports, and presentations.
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
  2. Create the custom model.

ollama create business-writer -f Modelfile
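If you're scripting the creation of several model variants, the Modelfile text can also be assembled programmatically before being passed to `ollama create`. A minimal sketch — the `build_modelfile` helper is hypothetical, not part of Ollama:

```python
def build_modelfile(base: str, system: str, **params) -> str:
    """Assemble Modelfile text from a base model, system prompt, and parameters."""
    lines = [f"FROM {base}", f"SYSTEM {system}"]
    for name, value in params.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"

text = build_modelfile(
    "llama3",
    "You are a helpful writing assistant that specializes in clear, concise business communication.",
    temperature=0.7,
    top_p=0.9,
    num_ctx=4096,
)
print(text)
```

Write the result to a file named Modelfile and run `ollama create <name> -f Modelfile` as above.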


Model Customization

Ollama's abstraction layer exposes the underlying model parameters, enabling you to customize how a model generates its responses.
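To build intuition for the two parameters used in the Modelfile above: temperature rescales the model's token logits before sampling (lower = more deterministic), and top_p keeps only the smallest set of tokens whose cumulative probability reaches the threshold. A toy illustration of the idea — this is not Ollama's actual sampler:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return set(kept)

logits = [2.0, 1.0, 0.5, 0.1]
cool = softmax_with_temperature(logits, 0.5)   # low temperature: peaked distribution
warm = softmax_with_temperature(logits, 1.5)   # high temperature: flatter distribution
print(cool[0] > warm[0])        # True: low temperature concentrates mass on the top token
print(top_p_filter(cool, 0.9))  # only the top tokens survive nucleus filtering
```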

Models

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

Here are some example models that can be downloaded:

| Model | Parameters | Ollama Command |
| --- | --- | --- |
| Code Llama | 7B | ollama run codellama |
| DeepSeek-R1 | 7B/8B/14B | ollama run deepseek-r1 |
| Gemma 2 | 2B/9B/27B | ollama run gemma2:2b |
| Gemma 3 | 1B/4B/12B/27B | ollama run gemma3:12b |
| Llama 3.1 | 8B | ollama run llama3.1 |
| Llama 3.2 | 1B/3B | ollama run llama3.2 |
| Llama 3.2 Vision | 11B | ollama run llama3.2-vision |
| Mistral | 7B | ollama run mistral |
| Phi 3 Mini | 3.8B | ollama run phi3 |
| Phi 4 | 14B | ollama run phi4 |
| Qwen2.5 | 1.5B/3B/7B/14B | ollama run qwen2.5 |
| Qwen2.5-coder | 1.5B/3B/7B/14B | ollama run qwen2.5-coder |
| Neural Chat | 7B | ollama run neural-chat |
| Starling | 7B | ollama run starling-lm |
| Llama 2 Uncensored | 7B | ollama run llama2-uncensored |
| LLaVA | 7B | ollama run llava |
| Solar | 10.7B | ollama run solar |

| Command | Description |
| --- | --- |
| ollama list | Displays all installed models on your system. |
| ollama pull <model name> | Downloads a model; re-pulling an installed model updates it to the latest version. |
| ollama run <model name> | Runs an LLM for chat or inference. |
| ollama rm <model name> | Deletes a model. |
| ollama ps | Displays the currently loaded models and their memory usage. |

| Flag | Description |
| --- | --- |
| --verbose | Displays generation statistics after each response: load duration, prompt and response token counts, and tokens per second. |

Ollama also exposes a REST API, which you can call with curl:

curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Why is the sky blue?",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40
  }
}'

The response is streamed back as newline-delimited JSON ..

chatbot@Office:~$ curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Why is the sky blue?"
}'
{"model":"phi4","created_at":"2025-02-25T13:01:16.847307208Z","response":"The","done":false}
{"model":"phi4","created_at":"2025-02-25T13:01:16.895178316Z","response":" sky","done":false}
{"model":"phi4","created_at":"2025-02-25T13:01:16.926463821Z","response":" appears","done":false}
{"model":"phi4","created_at":"2025-02-25T13:01:16.957592326Z","response":" blue","done":false}
{"model":"phi4","created_at":"2025-02-25T13:01:16.988690731Z","response":" due","done":false}
...
{"model":"phi4", ... ,"done":true,"total_duration":14595395219,"load_duration":3775345351,"prompt_eval_count":16,"prompt_eval_duration":465000000,"eval_count":280,"eval_duration":10352000000}
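Each line of the stream is a standalone JSON object; a client simply concatenates the `response` fields until a chunk arrives with `done` set to true. A minimal sketch of that assembly step, run here against chunks captured from the example above (in practice you would read lines from the HTTP response of http://localhost:11434/api/generate):

```python
import json

def assemble_stream(lines):
    """Concatenate the `response` fields of Ollama's newline-delimited JSON chunks."""
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Chunks captured from the streamed output above (truncated)
sample = [
    '{"model":"phi4","response":"The","done":false}',
    '{"model":"phi4","response":" sky","done":false}',
    '{"model":"phi4","response":" appears","done":false}',
    '{"model":"phi4","response":" blue","done":false}',
    '{"model":"phi4","response":"","done":true}',
]
print(assemble_stream(sample))  # The sky appears blue
```

Setting `"stream": false` in the request body returns a single JSON object instead, if you'd rather skip this step.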

ChatBox AI

Chatbox AI is an AI client application and smart assistant, compatible with many cutting-edge AI models and APIs. It is available on Windows, macOS, Android, iOS, Web, and Linux.

  1. Download Chatbox AI.

  2. Select Use My Own API Key / Local Model.

  3. Open Chatbox settings and pick Ollama as your model provider.

  4. Select your running model from the dropdown.

  5. That's it .. you can modify the look and feel, change models, set the prompt (assistant) etc. via settings.

You may want to lower the temperature if the responses come out a little too creative ..!
