Lead AI Engineer (FM Hosting, LLM Inference)/Remote

Apetan Consulting · via Dice

Remote · Contractor · Public

Anywhere · 15h ago

Job Description

Location: Remote

Job Title: Lead AI Engineer – Foundation Model Hosting & LLM Inference

Job Summary

We are looking for an experienced Lead AI Engineer to design, deploy, and optimize large-scale Foundation Model (FM) hosting and LLM inference platforms. The ideal candidate will lead AI infrastructure initiatives, improve model serving performance, and build scalable, secure, and cost-efficient AI systems for enterprise applications.

Key Responsibilities

• Design and manage scalable infrastructure for hosting foundation models and LLMs.
• Develop and optimize high-performance inference pipelines for low latency and high throughput.
• Deploy and manage models using containerized and distributed environments.
• Work with GPU acceleration, model quantization, batching, caching, and inference optimization techniques.
• Implement APIs and microservices for AI model serving.
• Monitor system reliability, availability, scalability, and cost efficiency.
• Collaborate with AI/ML teams to productionize machine learning and generative AI models.
• Lead architecture decisions for model deployment, orchestration, and observability.
• Ensure security, governance, and compliance for AI infrastructure.
• Mentor engineering teams and drive AI platform best practices.

Required Skills

• Strong expertise in Python and backend system development.
• Hands-on experience with LLM serving frameworks such as vLLM, TensorRT-LLM, or Text Generation Inference.
• Experience with distributed computing, GPU infrastructure, and Kubernetes.
• Knowledge of transformer architectures, model optimization, and inference tuning.
• Experience with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
• Familiarity with Docker, CI/CD pipelines, and infrastructure automation.
• Understanding of vector databases, embeddings, and retrieval systems.
• Strong debugging, performance tuning, and problem-solving skills.
• Excellent leadership and stakeholder communication abilities.

Preferred Qualifications

• Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or related field.
• Experience deploying open-source or enterprise LLMs in production environments.
• Knowledge of MLOps and observability tools.
• Exposure to RAG architectures, fine-tuning, and AI agents is a plus.

Tools & Technologies

• Python, FastAPI
• vLLM / TensorRT-LLM
• Kubernetes, Docker
• PyTorch, CUDA
• Ray, Triton Inference Server
• Vector Databases (Pinecone, Milvus, FAISS)
• Amazon Web Services / Microsoft Azure / Google Cloud
• CI/CD & Monitoring Tools
