
TensorRT Optimization AR VR Development

2025-06-05 · Ajarn Bom — SiamCafe.net · 1,070 words

What Is TensorRT Optimization for AR/VR Development?

TensorRT is NVIDIA's SDK for optimizing deep learning inference to run as fast as possible on GPUs; it supports quantization, layer fusion, kernel auto-tuning, and dynamic tensor memory. AR (Augmented Reality) and VR (Virtual Reality) demand very low inference latency (< 20ms) so users never perceive lag. Combining TensorRT with AR/VR development lets AI features such as hand tracking, object detection, face mesh, and scene understanding run in real time on headsets and mobile devices.
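These latency targets follow directly from display refresh-rate arithmetic. A quick sketch of the per-frame budget (the 50/50 split between inference and rendering is an illustrative assumption, not a fixed rule):

```python
# Frame budget: at a given refresh rate, the whole pipeline
# (inference + render + compositing) must fit in 1000/fps milliseconds.
def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

for fps in (60, 72, 90, 120):
    budget = frame_budget_ms(fps)
    # Leave roughly half the frame for rendering; the rest is for inference.
    inference_budget = budget * 0.5
    print(f"{fps:>3} FPS -> frame budget {budget:5.2f} ms, "
          f"inference budget ~{inference_budget:4.1f} ms")
```

At 90 FPS the whole frame has about 11.1 ms, which is why the pipeline targets below quote < 11ms per frame.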

TensorRT Optimization Pipeline

# tensorrt_pipeline.py — TensorRT optimization pipeline

class TensorRTPipeline:
    OPTIMIZATION_STEPS = {
        "export": {
            "name": "1. Export Model (ONNX)",
            "description": "Convert a PyTorch/TF model to ONNX format",
            "tools": "torch.onnx.export(), tf2onnx",
        },
        "parse": {
            "name": "2. Parse & Build Engine",
            "description": "TensorRT parses the ONNX graph → builds an engine optimized for the target GPU",
            "optimizations": "Layer fusion, kernel selection, memory optimization",
        },
        "quantize": {
            "name": "3. Quantization",
            "description": "Reduce precision: FP32 → FP16 → INT8 — 2-4x faster",
            "tradeoff": "INT8 is fastest, but accuracy may drop slightly (calibration required)",
        },
        "profile": {
            "name": "4. Profile & Benchmark",
            "description": "Measure latency, throughput, and memory usage",
            "target": "AR/VR: < 11ms per frame (90 FPS), < 16ms (60 FPS)",
        },
        "deploy": {
            "name": "5. Deploy to Device",
            "description": "Deploy the engine to the target device (Jetson, GPU server, mobile)",
        },
    }

    CODE = """
# optimize_model.py — TensorRT optimization
import tensorrt as trt

def build_engine(onnx_path, precision='fp16', max_batch=1):
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    
    # Parse ONNX
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(f"Error: {parser.get_error(i)}")
            return None
    
    # Build config
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1GB
    
    if precision == 'fp16':
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == 'int8':
        config.set_flag(trt.BuilderFlag.INT8)
        # Need calibrator for INT8
    
    # Set optimization profile
    profile = builder.create_optimization_profile()
    profile.set_shape("input", (1, 3, 224, 224), (1, 3, 224, 224), (max_batch, 3, 224, 224))
    config.add_optimization_profile(profile)
    
    # Build serialized engine (returns an IHostMemory buffer)
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        print("Engine build failed")
        return None
    
    # Save engine to disk
    with open(onnx_path.replace('.onnx', '.engine'), 'wb') as f:
        f.write(serialized_engine)
    
    return serialized_engine

# Build for AR/VR (FP16 for balance of speed and accuracy)
engine = build_engine('hand_tracker.onnx', precision='fp16')
"""

    def show_steps(self):
        print("=== TensorRT Optimization Pipeline ===\n")
        for key, step in self.OPTIMIZATION_STEPS.items():
            print(f"[{step['name']}]")
            print(f"  {step['description']}")
            print()

    def show_code(self):
        print("=== Build Engine Code ===")
        print(self.CODE[:600])

pipeline = TensorRTPipeline()
pipeline.show_steps()
pipeline.show_code()
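Step 4 of the pipeline (profile & benchmark) can be sketched without TensorRT installed; for AR/VR the tail latency matters more than the mean, so a harness should report percentiles. A minimal stdlib version, where the lambda workload is only a stand-in for a real engine.infer() call:

```python
import time
import statistics

def profile(fn, warmup=10, runs=100):
    """Measure per-call latency: mean / p50 / p99 in milliseconds."""
    for _ in range(warmup):      # warmup: stabilize caches, clocks, JITs
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": samples[len(samples) // 2],
        "p99_ms": samples[int(len(samples) * 0.99)],
    }

# Stand-in workload; in practice fn would wrap one engine.infer() call.
stats = profile(lambda: sum(range(10_000)))
print(stats)
```

Judging against the frame budget with p99 rather than the mean avoids shipping a model that is fast on average but stutters on occasional slow frames.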

AR/VR AI Use Cases

# ar_vr_ai.py — AI use cases in AR/VR

class ARVRAIUseCases:
    USE_CASES = {
        "hand_tracking": {
            "name": "Hand Tracking",
            "model": "MediaPipe Hands / Custom CNN",
            "latency_req": "< 10ms (real-time gesture recognition)",
            "tensorrt_gain": "3-5x speedup → enables 90 FPS tracking",
            "application": "Gesture control in VR, virtual keyboards, sign language",
        },
        "object_detection": {
            "name": "Object Detection & Recognition",
            "model": "YOLOv8 / EfficientDet",
            "latency_req": "< 15ms per frame",
            "tensorrt_gain": "2-4x speedup → real-time object overlay",
            "application": "AR labels on real-world objects, product recognition, navigation",
        },
        "face_mesh": {
            "name": "Face Mesh & Expression",
            "model": "MediaPipe Face Mesh / FaceNet",
            "latency_req": "< 8ms (468 landmarks)",
            "tensorrt_gain": "3-6x speedup → smooth face filters",
            "application": "AR face filters, avatar animation, emotion detection",
        },
        "depth_estimation": {
            "name": "Monocular Depth Estimation",
            "model": "MiDaS / DPT",
            "latency_req": "< 20ms",
            "tensorrt_gain": "2-4x speedup → real-time depth map",
            "application": "AR occlusion, spatial understanding, 3D reconstruction",
        },
        "slam": {
            "name": "Visual SLAM (Simultaneous Localization and Mapping)",
            "model": "ORB-SLAM3 + deep features",
            "latency_req": "< 33ms (30 FPS minimum)",
            "tensorrt_gain": "Feature extraction 2-3x faster",
            "application": "AR world tracking, indoor navigation, spatial anchors",
        },
        "scene_understanding": {
            "name": "Semantic Segmentation",
            "model": "DeepLab / SegFormer",
            "latency_req": "< 20ms",
            "tensorrt_gain": "3-5x speedup",
            "application": "AR furniture placement, virtual walls, room layout",
        },
    }

    def show_use_cases(self):
        print("=== AR/VR AI Use Cases ===\n")
        for key, uc in self.USE_CASES.items():
            print(f"[{uc['name']}]")
            print(f"  Model: {uc['model']}")
            print(f"  Latency: {uc['latency_req']} | TRT gain: {uc['tensorrt_gain']}")
            print()

arvr = ARVRAIUseCases()
arvr.show_use_cases()
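Whether several of these models fit in a single frame when run back-to-back is simple arithmetic. A quick check against a 90 FPS budget; the per-model latencies below are hypothetical post-optimization numbers, not measurements:

```python
FRAME_BUDGET_MS = 11.0  # 90 FPS target

# Hypothetical post-TensorRT latencies (ms) for a few of the models above.
model_latency_ms = {
    "hand_tracking": 4.0,
    "face_mesh": 2.5,
    "object_detection": 6.0,
}

def fits_budget(models, budget_ms=FRAME_BUDGET_MS):
    """Sum sequential latencies and report whether they fit the frame."""
    total = sum(model_latency_ms[m] for m in models)
    return total, total <= budget_ms

total, ok = fits_budget(["hand_tracking", "face_mesh"])
print(f"hand + face: {total:.1f} ms -> {'OK' if ok else 'over budget'}")

total, ok = fits_budget(["hand_tracking", "face_mesh", "object_detection"])
print(f"all three:   {total:.1f} ms -> {'OK' if ok else 'over budget'}")
```

When the sequential sum overruns the budget, the usual fixes are overlapping models on separate CUDA streams or running the heavier ones at a lower frame rate.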

Real-time Inference Engine

# inference_engine.py — TensorRT inference for AR/VR
import random

class InferenceEngine:
    CODE = """
# ar_inference.py — Real-time inference engine
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
import time

class TRTInference:
    def __init__(self, engine_path):
        self.logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, 'rb') as f:
            runtime = trt.Runtime(self.logger)
            self.engine = runtime.deserialize_cuda_engine(f.read())
        
        self.context = self.engine.create_execution_context()
        self._allocate_buffers()
    
    def _allocate_buffers(self):
        self.inputs = []
        self.outputs = []
        self.bindings = []
        self.stream = cuda.Stream()
        
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = int(np.prod(shape))  # total element count (static shapes assumed)
            
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            self.bindings.append(int(device_mem))
            
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                self.inputs.append({"host": host_mem, "device": device_mem, "shape": shape})
            else:
                self.outputs.append({"host": host_mem, "device": device_mem, "shape": shape})
    
    def infer(self, input_data):
        np.copyto(self.inputs[0]["host"], input_data.ravel())
        cuda.memcpy_htod_async(self.inputs[0]["device"], self.inputs[0]["host"], self.stream)
        
        self.context.execute_async_v2(
            bindings=self.bindings, stream_handle=self.stream.handle
        )
        
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out["host"], out["device"], self.stream)
        
        self.stream.synchronize()
        return [out["host"].reshape(out["shape"]) for out in self.outputs]

# AR/VR loop
engine = TRTInference('hand_tracker.engine')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    input_tensor = preprocess(frame)  # resize, normalize
    
    start = time.perf_counter()
    results = engine.infer(input_tensor)
    latency = (time.perf_counter() - start) * 1000
    
    landmarks = postprocess(results)
    render_ar(frame, landmarks)
    
    if latency > 16:  # > 60 FPS budget exceeded
        print(f"Warning: latency {latency:.1f}ms")
"""

    def show_code(self):
        print("=== Inference Engine ===")
        print(self.CODE[:600])

    def benchmark(self):
        print(f"\n=== Benchmark Results ===")
        models = [
            {"name": "Hand Tracker", "fp32": random.uniform(20, 40), "fp16": 0, "int8": 0},
            {"name": "YOLOv8-nano", "fp32": random.uniform(15, 30), "fp16": 0, "int8": 0},
            {"name": "Face Mesh", "fp32": random.uniform(10, 25), "fp16": 0, "int8": 0},
            {"name": "MiDaS Depth", "fp32": random.uniform(30, 60), "fp16": 0, "int8": 0},
        ]
        for m in models:
            m["fp16"] = m["fp32"] * random.uniform(0.35, 0.55)
            m["int8"] = m["fp32"] * random.uniform(0.2, 0.35)
            print(f"  {m['name']:<16} FP32: {m['fp32']:>5.1f}ms | FP16: {m['fp16']:>5.1f}ms | INT8: {m['int8']:>5.1f}ms")

ie = InferenceEngine()
ie.show_code()
ie.benchmark()
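The inference loop above calls a preprocess() helper it never defines. A minimal sketch of what such a helper typically does for an NCHW image model; the normalization constants are placeholders, and resizing to the network's input resolution is assumed to have happened already:

```python
import numpy as np

def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB -> NCHW float32, ready to copy into the input buffer.

    Assumes the frame is already resized (e.g. to 224x224); the
    mean/std of 0.5 are placeholders, not any specific model's values.
    """
    x = frame_rgb.astype(np.float32) / 255.0    # [0, 255] -> [0, 1]
    x = (x - 0.5) / 0.5                         # placeholder normalization
    x = np.transpose(x, (2, 0, 1))              # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis])  # add batch dim -> NCHW

frame = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy black frame
tensor = preprocess(frame)
print(tensor.shape, tensor.dtype)                # (1, 3, 224, 224) float32
```

The np.ascontiguousarray call matters: the host-to-device copy expects a contiguous buffer, and the transpose alone returns a strided view.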

Edge Deployment

# edge_deploy.py — Deploy to AR/VR devices

class EdgeDeployment:
    DEVICES = {
        "jetson_orin": {
            "name": "NVIDIA Jetson AGX Orin",
            "gpu": "Ampere (2048 CUDA cores)",
            "tensorrt": "TensorRT 8.6+",
            "use": "Standalone VR headsets, robotics, edge AI",
            "perf": "INT8: 275 TOPS",
        },
        "jetson_nano": {
            "name": "NVIDIA Jetson Nano",
            "gpu": "Maxwell (128 CUDA cores)",
            "tensorrt": "TensorRT 8.x",
            "use": "Low-cost AR prototyping, education",
            "perf": "FP16: 0.5 TFLOPS",
        },
        "rtx_4090": {
            "name": "NVIDIA RTX 4090 (Desktop VR)",
            "gpu": "Ada Lovelace (16384 CUDA cores)",
            "tensorrt": "TensorRT 8.6+",
            "use": "PC VR (Quest Link, SteamVR), training + inference",
            "perf": "INT8: 1,321 TOPS",
        },
        "cloud_gpu": {
            "name": "Cloud GPU (A100/H100)",
            "gpu": "Hopper/Ampere",
            "tensorrt": "TensorRT 8.6+",
            "use": "Cloud rendering, multi-user AR/VR, heavy models",
            "perf": "H100 INT8: 3,958 TOPS",
        },
    }

    OPTIMIZATION_TIPS = {
        "model_size": "Use lightweight models: MobileNet, EfficientNet-Lite, YOLO-nano",
        "input_resolution": "Reduce input resolution: 640×480 → 320×240 for mobile AR",
        "batch_size": "Batch=1 for AR/VR (latency matters more than throughput)",
        "async_inference": "Async inference — process frame N while rendering frame N-1",
        "multi_model": "Running multiple models (hand + face + object)? Pipeline them so they overlap",
        "dynamic_quality": "Dynamic quality — reduce precision/resolution when latency climbs",
    }

    def show_devices(self):
        print("=== Target Devices ===\n")
        for key, dev in self.DEVICES.items():
            print(f"[{dev['name']}]")
            print(f"  GPU: {dev['gpu']} | TRT: {dev['tensorrt']}")
            print(f"  Use: {dev['use']}")
            print()

    def show_tips(self):
        print("=== Optimization Tips ===")
        for key, tip in self.OPTIMIZATION_TIPS.items():
            print(f"  [{key}] {tip}")

edge = EdgeDeployment()
edge.show_devices()
edge.show_tips()
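The async_inference tip can be sketched with one worker thread and two one-slot queues; fake_infer() below is only a stand-in for a real TensorRT engine call:

```python
import queue
import threading
import time

# One-slot queues: the worker runs inference on frame N while the main
# loop consumes (and would render) the result for frame N-1.
frames = queue.Queue(maxsize=1)
results = queue.Queue(maxsize=1)

def fake_infer(frame):
    time.sleep(0.005)              # stand-in for a ~5 ms engine.infer() call
    return f"landmarks for {frame}"

def worker():
    while True:
        frame = frames.get()
        if frame is None:          # shutdown sentinel
            break
        results.put(fake_infer(frame))

threading.Thread(target=worker, daemon=True).start()

outputs = []
frames.put("frame-0")              # prime the pipeline
for n in range(1, 4):
    frames.put(f"frame-{n}")       # submit frame N...
    outputs.append(results.get())  # ...while collecting frame N-1's result
outputs.append(results.get())      # drain the last in-flight frame
frames.put(None)                   # stop the worker
print(outputs)
```

The trade-off is one extra frame of result latency in exchange for hiding inference time behind rendering; on a real device the same overlap is usually done with CUDA streams rather than Python threads.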

Performance Monitoring

# monitoring.py — AR/VR performance monitoring
import random

class PerformanceMonitoring:
    METRICS = {
        "frame_time": "Total frame time (target: < 11ms for 90 FPS)",
        "inference_time": "AI inference latency (target: < 5ms)",
        "render_time": "Rendering time (target: < 5ms)",
        "motion_to_photon": "Motion-to-photon latency (target: < 20ms — prevents VR sickness)",
        "fps": "Frames per second (target: 72-120 FPS)",
        "gpu_utilization": "GPU usage % (target: < 80% — headroom for spikes)",
        "thermal": "Device temperature (throttling > 80°C)",
    }

    def show_metrics(self):
        print("=== AR/VR Performance Metrics ===\n")
        for metric, desc in self.METRICS.items():
            print(f"  [{metric}] {desc}")

    def live_stats(self):
        print(f"\n=== Live Performance ===")
        fps = random.randint(85, 95)
        inference = random.uniform(3, 8)
        render = random.uniform(2, 5)
        frame = inference + render + random.uniform(0.5, 2)
        m2p = frame + random.uniform(5, 10)
        gpu = random.randint(50, 80)
        temp = random.uniform(55, 75)
        
        fps_status = "OK" if fps >= 72 else "LOW"
        m2p_status = "OK" if m2p < 20 else "HIGH"
        
        print(f"  FPS: {fps} [{fps_status}]")
        print(f"  Inference: {inference:.1f}ms")
        print(f"  Render: {render:.1f}ms")
        print(f"  Frame total: {frame:.1f}ms")
        print(f"  Motion-to-photon: {m2p:.1f}ms [{m2p_status}]")
        print(f"  GPU: {gpu}% | Temp: {temp:.0f}°C")

mon = PerformanceMonitoring()
mon.show_metrics()
mon.live_stats()
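One way to act on these metrics is the dynamic-quality idea from the deployment tips: a small feedback controller that steps resolution down when smoothed latency exceeds the frame budget. The EMA factor, thresholds, and resolution ladder below are illustrative choices, not tuned values:

```python
# Resolution ladder, highest quality first.
RESOLUTIONS = [(640, 480), (480, 360), (320, 240)]

class QualityController:
    def __init__(self, budget_ms=11.0, alpha=0.2):
        self.budget_ms = budget_ms
        self.alpha = alpha   # EMA smoothing factor
        self.ema_ms = 0.0
        self.level = 0       # index into RESOLUTIONS

    def update(self, latency_ms):
        # Exponential moving average smooths out single-frame spikes.
        self.ema_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ema_ms
        if self.ema_ms > self.budget_ms and self.level < len(RESOLUTIONS) - 1:
            self.level += 1  # over budget: drop resolution
        elif self.ema_ms < self.budget_ms * 0.6 and self.level > 0:
            self.level -= 1  # comfortable headroom: raise resolution
        return RESOLUTIONS[self.level]

qc = QualityController()
for latency in [8, 9, 15, 18, 20, 19, 9, 5, 4]:  # simulated per-frame ms
    res = qc.update(latency)
print("final resolution:", res)
```

Reacting to the EMA rather than raw samples prevents the controller from oscillating between quality levels on every noisy frame.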

Putting This Knowledge into Practice

Recommended learning resources include the official documentation, which is always the most up to date; online courses from Coursera, Udemy, and edX; quality YouTube channels in both Thai and English; and communities such as Discord, Reddit, and Stack Overflow for exchanging experience with developers worldwide.

FAQ - Frequently Asked Questions

Q: Is TensorRT necessary for AR/VR?

A: Very much so if you run AI features on NVIDIA GPUs: it typically gives a 2-6x speedup over plain PyTorch/TF inference. AR/VR needs < 11ms per frame (90 FPS), which is hard to hit without it. Alternatives: ONNX Runtime (cross-platform), CoreML (Apple), TFLite (mobile).

Q: Should I use FP16 or INT8?

A: FP16 suits almost every use case: roughly 2x faster with almost no accuracy loss. INT8 is the fastest (3-4x) but requires calibration, and accuracy may drop 1-3%. For AR/VR: FP16 for quality-sensitive work (face mesh), INT8 for speed-critical work (object detection).

Q: What is motion-to-photon latency?

A: The time from the user moving their head or hand until the pixels on the display change. It must stay < 20ms, or VR sickness (dizziness, nausea) sets in. It is the sum of the sensor → compute → render → display stages. TensorRT shortens the compute stage, which lowers M2P and keeps users comfortable.

Q: Can I deploy on a Meta Quest?

A: Meta Quest runs on a Qualcomm Snapdragon SoC (not an NVIDIA GPU), so TensorRT cannot be used on-device. Alternatives: Qualcomm SNPE, ONNX Runtime Mobile, TFLite. If you need TensorRT, use PC VR (Quest Link plus an NVIDIA GPU) or cloud rendering.
