In this blog you will learn how to run Llama 3.1 on an M1 Mac with Ollama.
Table of contents
- Ollama and how to install it on Mac
- Using Llama 3.1 and Ollama with Python
- Conclusion
Ollama
With Ollama you can easily run large language models locally with just one command. Ollama provides a library of ready-to-use models you can pull and try, and you can also add your own model and have Ollama host it (Guide for that).
How to install Ollama on M1 Mac
Head over to Ollama.com and click the Download button, then click Download for macOS.
NOTE: Ollama requires macOS 11 Big Sur or later
A zip file will be downloaded; extract it and follow the installation steps.
To verify that Ollama is installed, run the following command:
ollama
Your terminal will display usage information for the ollama command.
Once everything is downloaded and set up, run the following command to download and run llama3.1:
ollama run llama3.1
Ollama will download and extract the model weights and manifest files needed to run llama3.1.
After the command above finishes and everything is pulled, you will see a "Send a message" placeholder; you can now start chatting with Llama 3.1 right in the terminal (type /bye to exit the session).
Using Llama 3.1 and Ollama with Python
Now that you have Llama 3.1 running on macOS, you can test it through Ollama's local REST API (it listens on port 11434 by default) with the curl command below:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt":"Hi, tell me about yourself."
}'
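By default, /api/generate streams its output as a series of newline-delimited JSON objects, one per generated chunk. The API also accepts a "stream": false option in the request body if you just want a single JSON object back. Below is a minimal Python sketch of that non-streaming call; it assumes the default local endpoint on port 11434 and that llama3.1 has already been pulled.
import requests

# Minimal non-streaming sketch (assumes Ollama is running on the default
# port 11434 and the llama3.1 model has already been pulled)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Hi, tell me about yourself.",
        "stream": False,  # request a single JSON object instead of a stream
    },
)
resp.raise_for_status()
print(resp.json()["response"])  # the full generated text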
Here is a request JSON that you can import to make the same API request through Postman or Thunder Client.
Postman request JSON
{
  "client": "MLAPI - Llama3.1",
  "collectionName": "https://mlapi.co",
  "requests": [
    {
      "name": "llama3.1-with-ollama",
      "url": "http://localhost:11434/api/generate",
      "method": "POST",
      "sortNum": 10000,
      "headers": [],
      "params": [],
      "body": {
        "type": "json",
        "raw": "{ \"model\": \"llama3.1\",\n \"prompt\":\"Hi, tell me about yourself.\"}",
        "form": []
      },
      "tests": []
    }
  ]
}
Python code
With the Python code below you can make an API request to the local Ollama server and assemble the streamed output into a single response string.
import requests
import json


def generate_response(prompt):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": "llama3.1",
        "prompt": prompt
    }
    try:
        response = requests.post(url, json=payload)
        response.raise_for_status()
        # The API streams one JSON object per line, so split the response
        # by newline characters and parse each line separately
        response_data = [
            json.loads(line)
            for line in response.text.strip().split("\n")
            if line.strip()
        ]
        # Extract the "response" field from each JSON object and concatenate
        # the chunks into a single string
        generated_response = " ".join(data.get("response", "") for data in response_data)
        # Replace double spaces with single spaces for proper formatting
        generated_response = " ".join(generated_response.split())
        return generated_response
    except requests.exceptions.RequestException as e:
        return f"Error: {e}"


response = generate_response("Hi, tell me about yourself")
print("Response:", response)
Conclusion
This was a simple introduction to running Llama 3.1 with Ollama on macOS. There is a clear trend of more and more people becoming interested in running LLMs on their own machines.
Now you know how to do it on your Mac as well.