SiamCafe · Blog
TensorRT Optimization AR VR Development
บทความ

TensorRT Optimization AR VR Development

เผยแพร่ 28 พฤษภาคม 2569

TensorRT Optimization AR VR Development คืออะไร

TensorRT Optimization AR VR Development

TensorRT เป็น SDK จาก NVIDIA สำหรับ optimize deep learning inference บน GPU ให้เร็วที่สุด รองรับ quantization, layer fusion, kernel auto-tuning และ dynamic tensor memory AR (Augmented Reality) และ VR (Virtual Reality) ต้องการ inference latency ต่ำมาก (< 20ms) เพื่อให้ผู้ใช้ไม่รู้สึก lag การรวม TensorRT กับ AR/VR development ช่วยให้ AI features เช่น hand tracking, object detection, face mesh และ scene understanding ทำงานได้แบบ real-time บน headsets และ mobile devices

TensorRT Optimization Pipeline

# tensorrt_pipeline.py — TensorRT optimization pipeline
import json

class TensorRTPipeline:
    OPTIMIZATION_STEPS = {
        "export": {
            "name": "1. Export Model (ONNX)",
            "description": "แปลง PyTorch/TF model เป็น ONNX format",
            "tools": "torch.onnx.export(), tf2onnx",
        },
        "parse": {
            "name": "2. Parse & Build Engine",
            "description": "TensorRT parse ONNX → สร้าง optimized engine สำหรับ target GPU",
            "optimizations": "Layer fusion, kernel selection, memory optimization",
        },
        "quantize": {
            "name": "3. Quantization",
            "description": "ลด precision: FP32 → FP16 → INT8 — เร็วขึ้น 2-4x",
            "tradeoff": "INT8 เร็วสุด แต่ accuracy อาจลดเล็กน้อย (ต้อง calibrate)",
        },
        "profile": {
            "name": "4. Profile & Benchmark",
            "description": "วัด latency, throughput, memory usage",
            "target": "AR/VR: < 11ms per frame (90 FPS), < 16ms (60 FPS)",
        },
        "deploy": {
            "name": "5. Deploy to Device",
            "description": "Deploy engine ไป target device (Jetson, GPU server, mobile)",
        },
    }

    CODE = """
# optimize_model.py — TensorRT optimization
import tensorrt as trt
import numpy as np

def build_engine(onnx_path, precision='fp16', max_batch=1):
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    
    # Parse ONNX
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(f"Error: {parser.get_error(i)}")
            return None
    
    # Build config
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1GB
    
    if precision == 'fp16':
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == 'int8':
        config.set_flag(trt.BuilderFlag.INT8)
        # Need calibrator for INT8
    
    # Set optimization profile
    profile = builder.create_optimization_profile()
    profile.set_shape("input", (1, 3, 224, 224), (1, 3, 224, 224), (max_batch, 3, 224, 224))
    config.add_optimization_profile(profile)
    
    # Build engine
    engine = builder.build_serialized_network(network, config)
    
    # Save engine
    with open(onnx_path.replace('.onnx', '.engine'), 'wb') as f:
        f.write(engine)
    
    return engine

# Build for AR/VR (FP16 for balance of speed and accuracy)
engine = build_engine('hand_tracker.onnx', precision='fp16')
"""

    def show_steps(self):
        print("=== TensorRT Optimization Pipeline ===\n")
        for key, step in self.OPTIMIZATION_STEPS.items():
            print(f"[{step['name']}]")
            print(f"  {step['description']}")
            print()

    def show_code(self):
        print("=== Build Engine Code ===")
        print(self.CODE[:600])

pipeline = TensorRTPipeline()
pipeline.show_steps()
pipeline.show_code()

AR/VR AI Use Cases

# ar_vr_ai.py — AI use cases in AR/VR
import json

class ARVRAIUseCases:
    USE_CASES = {
        "hand_tracking": {
            "name": "Hand Tracking",
            "model": "MediaPipe Hands / Custom CNN",
            "latency_req": "< 10ms (real-time gesture recognition)",
            "tensorrt_gain": "3-5x speedup → enables 90 FPS tracking",
            "application": "Gesture control ใน VR, virtual keyboard, sign language",
        },
        "object_detection": {
            "name": "Object Detection & Recognition",
            "model": "YOLOv8 / EfficientDet",
            "latency_req": "< 15ms per frame",
            "tensorrt_gain": "2-4x speedup → real-time object overlay",
            "application": "AR labels บนวัตถุจริง, product recognition, navigation",
        },
        "face_mesh": {
            "name": "Face Mesh & Expression",
            "model": "MediaPipe Face Mesh / FaceNet",
            "latency_req": "< 8ms (468 landmarks)",
            "tensorrt_gain": "3-6x speedup → smooth face filters",
            "application": "AR face filters, avatar animation, emotion detection",
        },
        "depth_estimation": {
            "name": "Monocular Depth Estimation",
            "model": "MiDaS / DPT",
            "latency_req": "< 20ms",
            "tensorrt_gain": "2-4x speedup → real-time depth map",
            "application": "AR occlusion, spatial understanding, 3D reconstruction",
        },
        "slam": {
            "name": "Visual SLAM (Simultaneous Localization and Mapping)",
            "model": "ORB-SLAM3 + deep features",
            "latency_req": "< 33ms (30 FPS minimum)",
            "tensorrt_gain": "Feature extraction 2-3x faster",
            "application": "AR world tracking, indoor navigation, spatial anchors",
        },
        "scene_understanding": {
            "name": "Semantic Segmentation",
            "model": "DeepLab / SegFormer",
            "latency_req": "< 20ms",
            "tensorrt_gain": "3-5x speedup",
            "application": "AR furniture placement, virtual walls, room layout",
        },
    }

    def show_use_cases(self):
        print("=== AR/VR AI Use Cases ===\n")
        for key, uc in self.USE_CASES.items():
            print(f"[{uc['name']}]")
            print(f"  Model: {uc['model']}")
            print(f"  Latency: {uc['latency_req']} | TRT gain: {uc['tensorrt_gain']}")
            print()

arvr = ARVRAIUseCases()
arvr.show_use_cases()

Real-time Inference Engine

TensorRT Optimization AR VR Development
# inference_engine.py — TensorRT inference for AR/VR
import json
import random

class InferenceEngine:
    CODE = """
# ar_inference.py — Real-time inference engine
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
import time

class TRTInference:
    def __init__(self, engine_path):
        self.logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, 'rb') as f:
            runtime = trt.Runtime(self.logger)
            self.engine = runtime.deserialize_cuda_engine(f.read())
        
        self.context = self.engine.create_execution_context()
        self._allocate_buffers()
    
    def _allocate_buffers(self):
        self.inputs = []
        self.outputs = []
        self.bindings = []
        self.stream = cuda.Stream()
        
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = np.prod(shape)
            
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            self.bindings.append(int(device_mem))
            
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                self.inputs.append({"host": host_mem, "device": device_mem, "shape": shape})
            else:
                self.outputs.append({"host": host_mem, "device": device_mem, "shape": shape})
    
    def infer(self, input_data):
        np.copyto(self.inputs[0]["host"], input_data.ravel())
        cuda.memcpy_htod_async(self.inputs[0]["device"], self.inputs[0]["host"], self.stream)
        
        self.context.execute_async_v2(
            bindings=self.bindings, stream_handle=self.stream.handle
        )
        
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out["host"], out["device"], self.stream)
        
        self.stream.synchronize()
        return [out["host"].reshape(out["shape"]) for out in self.outputs]

# AR/VR loop
engine = TRTInference('hand_tracker.engine')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    input_tensor = preprocess(frame)  # resize, normalize
    
    start = time.perf_counter()
    results = engine.infer(input_tensor)
    latency = (time.perf_counter() - start) * 1000
    
    landmarks = postprocess(results)
    render_ar(frame, landmarks)
    
    if latency > 16:  # > 60 FPS budget exceeded
        print(f"Warning: latency {latency:.1f}ms")
"""

    def show_code(self):
        print("=== Inference Engine ===")
        print(self.CODE[:600])

    def benchmark(self):
        print(f"\n=== Benchmark Results ===")
        models = [
            {"name": "Hand Tracker", "fp32": random.uniform(20, 40), "fp16": 0, "int8": 0},
            {"name": "YOLOv8-nano", "fp32": random.uniform(15, 30), "fp16": 0, "int8": 0},
            {"name": "Face Mesh", "fp32": random.uniform(10, 25), "fp16": 0, "int8": 0},
            {"name": "MiDaS Depth", "fp32": random.uniform(30, 60), "fp16": 0, "int8": 0},
        ]
        for m in models:
            m["fp16"] = m["fp32"] * random.uniform(0.35, 0.55)
            m["int8"] = m["fp32"] * random.uniform(0.2, 0.35)
            print(f"  {m['name']:<16} FP32: {m['fp32']:>5.1f}ms | FP16: {m['fp16']:>5.1f}ms | INT8: {m['int8']:>5.1f}ms")

ie = InferenceEngine()
ie.show_code()
ie.benchmark()

Edge Deployment

# edge_deploy.py — Deploy to AR/VR devices
import json

class EdgeDeployment:
    DEVICES = {
        "jetson_orin": {
            "name": "NVIDIA Jetson AGX Orin",
            "gpu": "Ampere (2048 CUDA cores)",
            "tensorrt": "TensorRT 8.6+",
            "use": "Standalone VR headsets, robotics, edge AI",
            "perf": "INT8: 275 TOPS",
        },
        "jetson_nano": {
            "name": "NVIDIA Jetson Nano",
            "gpu": "Maxwell (128 CUDA cores)",
            "tensorrt": "TensorRT 8.x",
            "use": "Low-cost AR prototyping, education",
            "perf": "FP16: 0.5 TFLOPS",
        },
        "rtx_4090": {
            "name": "NVIDIA RTX 4090 (Desktop VR)",
            "gpu": "Ada Lovelace (16384 CUDA cores)",
            "tensorrt": "TensorRT 8.6+",
            "use": "PC VR (Quest Link, SteamVR), training + inference",
            "perf": "INT8: 1,321 TOPS",
        },
        "cloud_gpu": {
            "name": "Cloud GPU (A100/H100)",
            "gpu": "Hopper/Ampere",
            "tensorrt": "TensorRT 8.6+",
            "use": "Cloud rendering, multi-user AR/VR, heavy models",
            "perf": "H100 INT8: 3,958 TOPS",
        },
    }

    OPTIMIZATION_TIPS = {
        "model_size": "ใช้ lightweight models: MobileNet, EfficientNet-Lite, YOLO-nano",
        "input_resolution": "ลด input resolution: 640×480 → 320×240 สำหรับ mobile AR",
        "batch_size": "Batch=1 สำหรับ AR/VR (latency > throughput)",
        "async_inference": "Async inference — process frame N ขณะ render frame N-1",
        "multi_model": "Run multiple models: hand + face + object — pipeline ให้ overlap",
        "dynamic_quality": "Dynamic quality — ลด precision/resolution เมื่อ latency สูง",
    }

    def show_devices(self):
        print("=== Target Devices ===\n")
        for key, dev in self.DEVICES.items():
            print(f"[{dev['name']}]")
            print(f"  GPU: {dev['gpu']} | TRT: {dev['tensorrt']}")
            print(f"  Use: {dev['use']}")
            print()

    def show_tips(self):
        print("=== Optimization Tips ===")
        for key, tip in self.OPTIMIZATION_TIPS.items():
            print(f"  [{key}] {tip}")

edge = EdgeDeployment()
edge.show_devices()
edge.show_tips()

Performance Monitoring

# monitoring.py — AR/VR performance monitoring
import json
import random

class PerformanceMonitoring:
    METRICS = {
        "frame_time": "Total frame time (target: < 11ms for 90 FPS)",
        "inference_time": "AI inference latency (target: < 5ms)",
        "render_time": "Rendering time (target: < 5ms)",
        "motion_to_photon": "Motion-to-photon latency (target: < 20ms — ป้องกัน VR sickness)",
        "fps": "Frames per second (target: 72-120 FPS)",
        "gpu_utilization": "GPU usage % (target: < 80% — headroom สำหรับ spikes)",
        "thermal": "Device temperature (throttling > 80°C)",
    }

    def show_metrics(self):
        print("=== AR/VR Performance Metrics ===\n")
        for metric, desc in self.METRICS.items():
            print(f"  [{metric}] {desc}")

    def live_stats(self):
        print(f"\n=== Live Performance ===")
        fps = random.randint(85, 95)
        inference = random.uniform(3, 8)
        render = random.uniform(2, 5)
        frame = inference + render + random.uniform(0.5, 2)
        m2p = frame + random.uniform(5, 10)
        gpu = random.randint(50, 80)
        temp = random.uniform(55, 75)
        
        fps_status = "OK" if fps >= 72 else "LOW"
        m2p_status = "OK" if m2p < 20 else "HIGH"
        
        print(f"  FPS: {fps} [{fps_status}]")
        print(f"  Inference: {inference:.1f}ms")
        print(f"  Render: {render:.1f}ms")
        print(f"  Frame total: {frame:.1f}ms")
        print(f"  Motion-to-photon: {m2p:.1f}ms [{m2p_status}]")
        print(f"  GPU: {gpu}% | Temp: {temp:.0f}°C")

mon = PerformanceMonitoring()
mon.show_metrics()
mon.live_stats()

การนำความรู้ไปประยุกต์ใช้งานจริง

แหล่งเรียนรู้ที่แนะนำ ได้แก่ Official Documentation ที่อัพเดทล่าสุดเสมอ Online Course จาก Coursera Udemy edX ช่อง YouTube คุณภาพทั้งไทยและอังกฤษ และ Community อย่าง Discord Reddit Stack Overflow ที่ช่วยแลกเปลี่ยนประสบการณ์กับนักพัฒนาทั่วโลก

อ่านเพิ่ม: Embedding Model Troubleshooting แก้ปัญหา | SiamCafe Blog · อ่านเพิ่ม: Embedding Model Remote Work Setup | SiamCafe Blog · อ่านเพิ่ม: QuestDB Time Series Post-mortem Analysis | SiamCafe Blog

FAQ - คำถามที่พบบ่อย

Q: TensorRT จำเป็นสำหรับ AR/VR ไหม?

A: จำเป็นมากถ้าใช้ AI features บน NVIDIA GPUs — speedup 2-6x เทียบกับ PyTorch/TF inference ปกติ AR/VR ต้องการ < 11ms per frame (90 FPS) — ไม่มี TensorRT อาจไม่ทัน ทางเลือก: ONNX Runtime (cross-platform), CoreML (Apple), TFLite (mobile)

Q: FP16 กับ INT8 ใช้อันไหน?

A: FP16: ดีสำหรับเกือบทุก use case — เร็วขึ้น 2x, accuracy แทบไม่ลด INT8: เร็วสุด (3-4x) แต่ต้อง calibrate — accuracy อาจลด 1-3% AR/VR แนะนำ: FP16 สำหรับ quality-sensitive (face mesh), INT8 สำหรับ speed-critical (object detection)

Q: Motion-to-photon latency คืออะไร?

A: เวลาตั้งแต่ผู้ใช้ขยับหัว/มือ จนเห็น pixels เปลี่ยนบนจอ ต้อง < 20ms — ไม่งั้นเกิด VR sickness (เวียนหัว คลื่นไส้) ประกอบด้วย: sensor → compute → render → display = total M2P TensorRT ช่วยลด compute time → ลด M2P → ผู้ใช้สบายขึ้น

Q: Deploy บน Meta Quest ได้ไหม?

A: Meta Quest ใช้ Qualcomm Snapdragon (ไม่ใช่ NVIDIA GPU) → ใช้ TensorRT ไม่ได้โดยตรง ทางเลือก: Qualcomm SNPE, ONNX Runtime Mobile, TFLite ถ้าต้องการ TensorRT: ใช้ PC VR (Quest Link + NVIDIA GPU) หรือ cloud rendering