James Bachini

How I Installed Ollama To Run AI Models Locally


Ever wondered if you could ditch the cloud and run powerful AI models on your own laptop or server? I thought it sounded impossible until I installed Ollama. In just a few hours, I had Llama 3.2 up and running locally. I’ll walk you through exactly how I did it, the trade-offs I discovered (spoiler: it’s slower than ChatGPT), and how you can build your own custom AI agents without breaking the bank.

System Requirements & Speed Considerations

Before diving in, let me share why this matters. I tried running a 32 billion parameter model on a high-end gaming laptop with an Nvidia 3080 GPU and it was still painfully slow.

A better setup I found was to use a mini-PC with an integrated CPU/GPU chip, as this will use system RAM. It’ll be even slower, but you don’t need a dedicated GPU with massive memory.

You can then queue up requests using AI agents and custom scripts to run overnight or to automate certain tasks (see the sketch below).

If your goal is speed, use ChatGPT, DeepSeek, or Claude. But for casual experimentation or development, you can absolutely start on a laptop or a small home-server setup.
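
Here’s a minimal sketch of that overnight idea, assuming the Ollama server is running on the default port and jq is installed; prompts.txt and the responses/ folder are just example names:

    #!/usr/bin/env bash
    # Work through one prompt per line in prompts.txt and save each answer
    # to its own file via Ollama's local /api/generate endpoint.
    mkdir -p responses
    i=0
    while IFS= read -r prompt; do
      i=$((i + 1))
      curl -s http://localhost:11434/api/generate \
        -d "$(jq -n --arg p "$prompt" '{model: "deepseek-r1", prompt: $p, stream: false}')" \
        | jq -r '.response' > "responses/answer-$i.txt"
    done < prompts.txt

Leave it running before bed, or schedule it with cron, and the answers are waiting for you in the morning.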


Ollama Installation Instructions

The easiest way to install Ollama is via the command line. On macOS, Linux, or Windows (WSL), run:

    curl -fsSL https://ollama.com/install.sh | sh
    ollama pull deepseek-r1

This installs the CLI and server, and auto-detects GPU support. Then we pull down the latest deepseek-r1 model.
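
Once the pull finishes, a quick sanity check from the same terminal confirms everything is in place:

    ollama --version   # confirm the CLI installed correctly
    ollama list        # list the models downloaded locally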

You can get a full list of available models here: https://ollama.com/search

We can then run it locally with the command:

    ollama run deepseek-r1

You’ll get a REPL to chat with. Thanks to auto GPU detection, the model runs on whatever hardware is available.
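
The REPL is great for chatting, but for scripting you can also pass a prompt directly on the command line and capture the output. For example (output.txt is just an example filename):

    ollama run deepseek-r1 "Summarise the trade-offs of running LLMs locally" > output.txt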


What Size Models Can I Run?

You want scale, I get it. Here’s what I’ve tested successfully:

  • 7B: Comfortable with 8GB RAM or 8GB VRAM
  • 13B: Requires ~16GB RAM or a solid GPU
  • 30-40B: Needs 32GB+ RAM and/or high-end GPUs

In benchmarks on an Nvidia RTX 4090, 13B models ran at ~70 tokens/sec using ~65% of VRAM. 40B models still ran, but slowed to ~8 tokens/sec due to VRAM limits.

Note that some models are optimized for smaller home devices. Check out Gemma3, for example, which is designed to run on a single GPU.

https://ollama.com/library/gemma3
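
Most library models are published in several parameter sizes, selected with a tag after the model name. A couple of illustrative pulls (check each model’s library page for the exact tags on offer):

    ollama pull deepseek-r1:7b   # smaller variant suited to ~8GB machines
    ollama pull gemma3:4b        # compact Gemma3 build aimed at a single GPU
    ollama run gemma3:4b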


How to Integrate with Custom AI Agents

The biggest benefit of running AI models locally is that you can reroute AI agent queries that aren’t time sensitive from the OpenAI API directly to Ollama.

Ollama includes a built-in API on localhost:11434. Use it like this:

    curl -X POST http://localhost:11434/api/chat \
      -d '{ "model": "deepseek-r1", "messages": [{"role":"user","content":"Teach Me Something About Bitcoin"}] }'

Powerful stuff, and completely local. No API keys, no monthly memberships required.
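
If your agent code already talks to the OpenAI chat completions API, Ollama also exposes an OpenAI-compatible endpoint under /v1, so switching can be as simple as changing the base URL. A minimal sketch, assuming jq is installed to pull out the reply text:

    curl -s http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Teach me something about Bitcoin"}]
      }' | jq -r '.choices[0].message.content'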

If you’re building something unique, want full model control, or are just tinkering, this is the ultimate sandbox. Give it a spin; you won’t regret owning your AI infrastructure.


Get The Blockchain Sector Newsletter, binge the YouTube channel and connect with me on Twitter

The Blockchain Sector newsletter goes out a few times a month when there is breaking news or interesting developments to discuss. All the content I produce is free; if you’d like to help, please share this content on social media.

Thank you.

James Bachini

Disclaimer: Not a financial advisor, not financial advice. The content I create is to document my journey and for educational and entertainment purposes only. It is not under any circumstances investment advice. I am not an investment or trading professional and am learning myself while still making plenty of mistakes along the way. Any code published is experimental and not production ready to be used for financial transactions. Do your own research and do not play with funds you do not want to lose.

