LocalAI Self-hosted
| Platform | API Compatible | GPU Required | Cost | Privacy | Best For |
|---|---|---|---|---|---|
| LocalAI | OpenAI | No | Free | 100% | Self-hosted |
| Ollama | Custom + OpenAI | No | Free | 100% | Developers |
| LM Studio | OpenAI | No | Free | 100% | Desktop |
| vLLM | OpenAI | Yes | Free | 100% | Production GPU |
| OpenAI API | Native | Cloud-hosted | Pay-per-use | Data leaves your machine | Best quality |
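Because every self-hosted option in this table speaks the OpenAI protocol, switching between them is usually just a matter of changing the base URL. A minimal sketch (the ports shown are each tool's usual default and are assumptions here; verify them in each tool's docs):

```python
# Default local endpoints for OpenAI-compatible servers.
# Ports are the tools' common defaults (assumption -- check each tool's docs).
BASE_URLS = {
    "localai": "http://localhost:8080/v1",
    "ollama": "http://localhost:11434/v1",
    "lm_studio": "http://localhost:1234/v1",
    "vllm": "http://localhost:8000/v1",
}

def endpoint(platform: str) -> str:
    """Return the chat-completions URL for a given local platform."""
    return BASE_URLS[platform] + "/chat/completions"

print(endpoint("localai"))  # http://localhost:8080/v1/chat/completions
```

The rest of this article uses the LocalAI endpoint, but the same client code works against any row of the table.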
LocalAI Setup
# === LocalAI Installation ===
# Docker — Quick Start
# docker run -p 8080:8080 localai/localai:latest
# Docker Compose — Production
# version: '3.8'
# services:
#   localai:
#     image: localai/localai:latest-aio-cpu
#     ports:
#       - "8080:8080"
#     volumes:
#       - ./models:/build/models
#       - ./config:/build/config
#     environment:
#       - THREADS=8
#       - CONTEXT_SIZE=4096
#       - DEBUG=false
#     restart: unless-stopped
# Download Models
# curl -L "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf" \
# -o models/mistral-7b.gguf
# Model Config — models/mistral.yaml
# name: mistral
# backend: llama-cpp
# parameters:
#   model: mistral-7b.gguf
#   temperature: 0.7
#   top_p: 0.9
#   top_k: 40
# context_size: 4096
# threads: 8
# Test API — OpenAI Compatible
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d '{
#     "model": "mistral",
#     "messages": [{"role": "user", "content": "Hello!"}],
#     "temperature": 0.7
#   }'
# Python — Use OpenAI SDK
# pip install openai
# from openai import OpenAI
#
# client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
#
# response = client.chat.completions.create(
#     model="mistral",
#     messages=[
#         {"role": "system", "content": "You are a helpful assistant."},
#         {"role": "user", "content": "Explain Python list comprehension"},
#     ],
#     temperature=0.7,
#     max_tokens=1000,
# )
# print(response.choices[0].message.content)
from dataclasses import dataclass

@dataclass
class ModelChoice:
    model: str
    size: str
    ram_required: str
    speed: str
    quality: str
    use_case: str

models = [
    ModelChoice("Phi-3 Mini", "2.3GB", "4GB", "Very fast", "Good", "Chat, Code"),
    ModelChoice("Mistral 7B", "4.1GB", "8GB", "Fast", "Very good", "General Purpose"),
    ModelChoice("Llama 3 8B", "4.7GB", "8GB", "Fast", "Very good", "General Purpose"),
    ModelChoice("CodeLlama 7B", "4.1GB", "8GB", "Fast", "Very good (Code)", "Code Generation"),
    ModelChoice("Mixtral 8x7B", "26GB", "32GB", "Moderate", "Excellent", "Complex Tasks"),
]

print("=== Model Choices ===")
for m in models:
    print(f"  [{m.model}] Size: {m.size} | RAM: {m.ram_required}")
    print(f"    Speed: {m.speed} | Quality: {m.quality}")
    print(f"    Use Case: {m.use_case}")
Citizen Developer Workflows
# === AI Workflows for Citizen Developers ===
# Workflow 1: Document Summarizer
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8080/v1", api_key="x")
#
# def summarize_document(text, max_words=200):
#     response = client.chat.completions.create(
#         model="mistral",
#         messages=[
#             {"role": "system", "content": f"Summarize in {max_words} words. Thai language."},
#             {"role": "user", "content": text},
#         ],
#     )
#     return response.choices[0].message.content
# Workflow 2: Email Draft Generator
# def draft_email(topic, tone="professional"):
#     response = client.chat.completions.create(
#         model="mistral",
#         messages=[
#             {"role": "system", "content": f"Draft a {tone} email in Thai."},
#             {"role": "user", "content": f"Topic: {topic}"},
#         ],
#     )
#     return response.choices[0].message.content
# Workflow 3: Data Analysis Helper
# def analyze_data(data_description, question):
#     response = client.chat.completions.create(
#         model="mistral",
#         messages=[
#             {"role": "system", "content": "You are a data analyst. Give Python code."},
#             {"role": "user", "content": f"Data: {data_description}\nQuestion: {question}"},
#         ],
#     )
#     return response.choices[0].message.content
@dataclass
class Workflow:
    name: str
    input_type: str
    output: str
    model: str
    time: str
    difficulty: str

workflows = [
    Workflow("Document Summary", "Text/PDF", "Summary", "Mistral 7B", "5-15 sec", "Easy"),
    Workflow("Email Draft", "Topic + Tone", "Draft Email", "Mistral 7B", "5-10 sec", "Easy"),
    Workflow("Code Generator", "Description", "Python Code", "CodeLlama 7B", "10-30 sec", "Easy"),
    Workflow("Data Analysis", "Data + Question", "Analysis + Code", "Mistral 7B", "10-20 sec", "Medium"),
    Workflow("Chatbot", "User Message", "AI Response", "Llama 3 8B", "3-10 sec", "Medium"),
    Workflow("Translation", "Text + Language", "Translated Text", "Mistral 7B", "5-15 sec", "Easy"),
]

print("\n=== Citizen Developer Workflows ===")
for w in workflows:
    print(f"  [{w.name}] Difficulty: {w.difficulty}")
    print(f"    Input: {w.input_type} → Output: {w.output}")
    print(f"    Model: {w.model} | Time: {w.time}")
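All three workflow functions above follow the same shape: one system prompt plus one user message sent to the chat-completions endpoint. That shape can be factored into a single payload builder, so each new workflow only supplies its prompts. A sketch (`build_payload` is a hypothetical helper, not part of the OpenAI SDK):

```python
def build_payload(system_prompt: str, user_content: str,
                  model: str = "mistral", temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }

# Example: the document-summarizer workflow expressed as a payload.
payload = build_payload("Summarize in 200 words.", "Long report text ...")
print(payload["model"])           # mistral
print(len(payload["messages"]))   # 2
```

The resulting dict can be passed as keyword arguments to `client.chat.completions.create(**payload)` or posted as JSON to `/v1/chat/completions` directly.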
Production Considerations
# === Production LocalAI Setup ===
# Hardware Requirements
@dataclass
class HardwareReq:
    model_size: str
    ram: str
    cpu: str
    gpu: str
    users: str
    cost_estimate: str

requirements = [
    HardwareReq("3B (Phi-3)", "8GB", "4 cores", "Not required", "1-5", "$500 PC"),
    HardwareReq("7B (Mistral)", "16GB", "8 cores", "Recommended", "5-20", "$1,000 PC"),
    HardwareReq("13B (Llama)", "32GB", "8+ cores", "Recommended", "10-50", "$2,000 PC"),
    HardwareReq("8x7B (Mixtral)", "64GB", "16 cores", "Required", "20-100", "GPU Server"),
]

print("Hardware Requirements:")
for h in requirements:
    print(f"  [{h.model_size}] RAM: {h.ram} | CPU: {h.cpu}")
    print(f"    GPU: {h.gpu} | Users: {h.users} | Cost: {h.cost_estimate}")
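Turning those tiers into a sizing decision is a simple lookup: find the smallest tier whose user ceiling covers the expected load. A sketch (`recommend_tier` and the user ceilings are illustrative, taken from the estimates above):

```python
# Concurrent-user upper bound for each hardware tier (figures from the list above).
TIERS = [
    (5, "3B (Phi-3) on a $500 PC"),
    (20, "7B (Mistral) on a $1,000 PC"),
    (50, "13B (Llama) on a $2,000 PC"),
    (100, "8x7B (Mixtral) on a GPU server"),
]

def recommend_tier(concurrent_users: int) -> str:
    """Return the smallest hardware tier that covers the expected load."""
    for max_users, tier in TIERS:
        if concurrent_users <= max_users:
            return tier
    return "Multiple GPU servers behind a load balancer"

print(recommend_tier(12))  # 7B (Mistral) on a $1,000 PC
```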
# Cost Comparison
comparison = {
    "LocalAI (Mistral 7B)": {"setup": "$1,000 one-time", "monthly": "$50 electricity", "per_1k_tokens": "$0"},
    "OpenAI GPT-4o": {"setup": "$0", "monthly": "Variable", "per_1k_tokens": "$0.005-$0.015"},
    "OpenAI GPT-3.5": {"setup": "$0", "monthly": "Variable", "per_1k_tokens": "$0.0005-$0.0015"},
}

print("\nCost Comparison (10K requests/month):")
for name, costs in comparison.items():
    print(f"  [{name}]")
    for k, v in costs.items():
        print(f"    {k}: {v}")
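The trade-off can be made concrete with a break-even estimate: how many months until the one-time hardware cost pays for itself against per-token cloud pricing. A rough sketch (all figures are the illustrative estimates from the comparison above, not measured costs):

```python
def break_even_months(hardware_cost: float, electricity_per_month: float,
                      tokens_per_month: float, cloud_price_per_1k: float) -> float:
    """Months until self-hosting becomes cheaper than a pay-per-token cloud API.

    Assumes the cloud bill is purely per-token and self-hosting costs
    hardware plus electricity. Returns infinity if cloud is always cheaper.
    """
    cloud_monthly = tokens_per_month / 1000 * cloud_price_per_1k
    savings_per_month = cloud_monthly - electricity_per_month
    if savings_per_month <= 0:
        return float("inf")
    return hardware_cost / savings_per_month

# 10K requests/month at ~2K tokens each = 20M tokens/month, at GPT-4o's lower rate.
months = break_even_months(1000, 50, 20_000_000, 0.005)
print(round(months, 1))  # 20.0
```

At that volume the cloud bill is about $100/month, so the $1,000 machine pays for itself in roughly 20 months; at higher volumes or higher cloud rates the break-even arrives much sooner.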
Tips
- Start small: begin with Phi-3 Mini for experiments, then scale up
- Q4_K_M: the Q4_K_M quantization is a good balance of quality and file size
- OpenAI SDK: keep using the OpenAI SDK and change only the base_url
- Privacy: well suited to confidential data, since nothing leaves your machine
- Docker: use Docker Compose for production setups
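The Q4_K_M tip can be sanity-checked with back-of-the-envelope math: Q4_K_M stores roughly 4.5 bits per parameter. A rough estimator (the 4.5 bits/param figure is an approximation; actual GGUF files vary with metadata and per-tensor quant choices):

```python
def gguf_size_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Rough GGUF file size in GB for a quantized model."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return round(bytes_total / 1e9, 1)

print(gguf_size_gb(7))  # 3.9 -- close to the 4.1GB listed for Mistral 7B Q4_K_M
```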
What is LocalAI?
LocalAI is an open-source AI server that runs on your own machine and exposes an OpenAI-compatible API. It serves LLMs, image generation, audio, and embeddings from GGUF models downloaded from Hugging Face, runs on CPU or GPU via Docker, and is free with full privacy.
What is a citizen developer?
A citizen developer is someone who builds apps without being a professional developer, using low-code/no-code tools with AI assisting on code and automation. With LocalAI there are no API fees, so they can experiment without limits and keep data private, for example when building chatbots.
How do you install LocalAI?
Run the Docker image (`docker run localai/localai`) or use Docker Compose, download a GGUF model from Hugging Face, add a YAML model config, then call the API via POST /v1/chat/completions or with the OpenAI SDK by changing only the base_url.
How does LocalAI compare to the OpenAI API?
LocalAI is free, 100% private, and works offline, but it needs capable hardware and its models are smaller. The OpenAI API offers higher quality (GPT-4 class) but costs money, sends your data off-machine, and imposes rate limits.
Summary
LocalAI is a self-hosted, OpenAI-API-compatible LLM server that gives citizen developers private, free AI via GGUF models and Docker on CPU or GPU, powering chatbots, automation, and low-code production workflows.
