Ever wondered if you could ditch the cloud and run powerful AI models on your own laptop or server? I thought it sounded impossible until I installed Ollama. In just a few hours, I had Llama 3.2 up and running locally. I’ll walk you through exactly how I did it, the trade-offs I discovered (spoiler: it’s slower than ChatGPT), and how you can build your own custom AI agents without breaking the bank.
System Requirements & Speed Considerations
Before diving in, let me share why this matters. I tried running a 32 billion parameter model on a high-end gaming laptop with an Nvidia RTX 3080 GPU, and it was still painfully slow.
A better setup I found was a mini PC with an integrated CPU/GPU, which runs models from system RAM. It’ll be even slower, but you don’t need a dedicated GPU with massive amounts of memory.
You can then queue up requests with AI agents and custom scripts to run overnight, or use them to automate routine tasks (I’ll sketch a simple batch script at the end of this post).
If speed is your priority, stick with ChatGPT, DeepSeek, or Claude. But for casual experimentation or development, you can absolutely start with a laptop or a small home-server setup.
Ollama Installation Instructions
The easiest way to install Ollama is via the command line. On macOS, Linux, or Windows (WSL), run:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-r1
This installs the CLI and server and auto-detects your GPU. The second command pulls the latest deepseek-r1 model.
You can get a full list of models available here: https://ollama.com/search
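Before running anything, it’s worth a quick sanity check that the install worked and the model actually downloaded. Two commands I use for this:
ollama --version
ollama list
The first prints the installed version; the second lists every model currently stored on disk, along with its size.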

We can then run it locally with the command:
ollama run deepseek-r1
You’ll get a REPL to chat with. Thanks to automatic GPU detection, the model runs on whatever hardware is available.
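You don’t have to chat interactively, either. Passing a prompt as an argument gives you a one-shot answer and then exits, which is handy for scripting (the prompt below is just an example):
ollama run deepseek-r1 "Explain what a quantized model is in two sentences"
The output goes to stdout, so you can pipe or redirect it like any other command.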
What Size Models Can I Run?
You want scale, I get it. Here’s what I’ve tested successfully:
- 7B: Comfortable with 8GB RAM or 8GB VRAM
- 13B: Requires ~16GB RAM or a solid GPU
- 30-40B: Needs 32+GB RAM and/or high-end GPUs
In benchmarks on an Nvidia RTX 4090, 13B models ran at ~70 tokens/sec using ~65% of VRAM. 40B models still ran, but slowed to ~8 tokens/sec due to VRAM limits.
Note that some models are optimized for smaller home devices. Check out Gemma 3, for example, which is designed to run on a single GPU.
https://ollama.com/library/gemma3
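Most models in the library also come in several parameter sizes, selectable by tag, so you can match the model to your hardware from the list above. The exact tags are listed on each model’s library page; the two below are just illustrative examples:
ollama pull deepseek-r1:7b
ollama pull gemma3:4b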
How to Integrate with Custom AI Agents
The biggest benefit of running AI models locally is that you can reroute AI agent queries that aren’t time-sensitive from the OpenAI API directly to Ollama.
Ollama includes a built-in API on localhost:11434. Use it like this:
curl -X POST http://localhost:11434/api/chat \
-d '{ "model": "deepseek-r1", "messages": [{"role":"user","content":"Teach Me Something About Bitcoin"}] }'
Powerful stuff, and completely local. No API keys, no monthly memberships required.
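To make the “queue it up overnight” idea from earlier concrete, here’s a minimal sketch. It assumes a prompts.txt file with one prompt per line and appends each answer to results.md; both file names are just placeholders:
#!/bin/sh
# overnight.sh: run every prompt in prompts.txt through the local model
while IFS= read -r prompt; do
  printf '\n## %s\n\n' "$prompt" >> results.md
  # </dev/null stops ollama from swallowing the rest of prompts.txt
  ollama run deepseek-r1 "$prompt" < /dev/null >> results.md
done < prompts.txt
Kick it off before bed with nohup sh overnight.sh & and the answers are waiting in results.md in the morning. Swap the ollama run line for a curl call to the API above if you’d rather capture raw JSON.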
If you’re building something unique, want full control of your models, or are just tinkering, this is the ultimate sandbox. Give it a spin; you won’t regret owning your AI infrastructure.