How to run Llama 3.1 on an M1 Mac with Ollama

mlapi
2 min read · Jul 25, 2024


In this blog you will learn how to run Llama 3.1 on an M1 Mac with Ollama.

Table of contents

  1. Ollama and how to install it on a Mac
  2. Using Llama 3.1 and Ollama with Python
  3. Conclusion

Ollama

With Ollama you can run large language models locally with a single command. Ollama provides a library of ready-to-use models you can pull and try, and you can also add your own model and have Ollama host it; there is a separate guide for that.

How to install Ollama on M1 Mac

Head over to Ollama.com, click the Download button, then choose Download for macOS.

NOTE: Ollama requires macOS 11 Big Sur or later

A zip file will be downloaded; unzip it and follow the installation steps.

To verify that Ollama is installed, run the following command:

ollama

Your terminal will display usage information for the ollama command.

Once everything is downloaded and set up, run the following command to install llama3.1:

ollama run llama3.1

Ollama will pull the model weights and manifest files needed to run llama3.1.

After the command above finishes downloading everything, you will see a "Send a message" prompt, and you can start chatting with Llama 3.1 directly in the terminal.

Using Llama 3.1 and Ollama with Python

Now that you have Llama 3.1 running on your Mac, you can test the Ollama API with a curl command:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Hi, tell me about yourself."
}'
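
By default the /api/generate endpoint streams its reply as a series of newline-delimited JSON objects, each carrying a fragment of the answer in its "response" field. If you would rather get one complete JSON object back, you can set "stream" to false. Here is a minimal Python sketch of that, assuming Ollama is listening on its default port 11434:

import requests

# Minimal sketch: ask /api/generate for a single (non-streamed) reply.
# Assumes Ollama is running locally on its default port 11434.
payload = {
    "model": "llama3.1",
    "prompt": "Hi, tell me about yourself.",
    "stream": False,  # return one JSON object instead of a stream of chunks
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])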

Here is a Postman request JSON as well; you can use it to make the API request through Postman or Thunder Client.

Postman request JSON

{
  "client": "MLAPI - Llama3.1",
  "collectionName": "https://mlapi.co",
  "requests": [
    {
      "name": "llama3.1-with-ollama",
      "url": "http://localhost:11434/api/generate",
      "method": "POST",
      "sortNum": 10000,
      "headers": [],
      "params": [],
      "body": {
        "type": "json",
        "raw": "{ \"model\": \"llama3.1\",\n \"prompt\":\"Hi, tell me about yourself.\"}",
        "form": []
      },
      "tests": []
    }
  ]
}

Python code

With this Python code you can make an API request to llama3.1 through Ollama and assemble the streamed reply into a single response string.

import requests
import json

def generate_response(prompt):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": "llama3.1",
        "prompt": prompt
    }
    try:
        response = requests.post(url, json=payload)
        response.raise_for_status()
        # Ollama streams its reply as newline-delimited JSON objects; parse each line
        response_data = [json.loads(line) for line in response.text.strip().split("\n") if line.strip()]
        # Each object carries a fragment of the reply in its "response" field; join them in order
        generated_response = "".join(data.get("response", "") for data in response_data)
        return generated_response
    except requests.exceptions.RequestException as e:
        return f"Error: {e}"

response = generate_response("Hi, tell me about yourself")
print("Response:", response)

Conclusion

This was a simple introduction to running Llama 3.1 with Ollama on macOS. There is a clear trend of more people becoming interested in running LLMs on their own machines.

Now you know how to do it on your Mac too.
