Lead AI Engineer (FM Hosting, LLM Inference)
Location: Remote
Job Title
Lead AI Engineer - Foundation Model Hosting & LLM Inference
Job Summary
We are looking for an experienced Lead AI Engineer to design, deploy, and optimize large-scale Foundation Model (FM) hosting and LLM inference platforms. The ideal candidate will lead AI infrastructure initiatives, improve model serving performance, and build scalable, secure, and cost-efficient AI systems for enterprise applications.
Key Responsibilities
⢠Design and manage scalable infrastructure for hosting foundation models and LLMs.
⢠Develop and optimize high-performance inference pipelines for low latency and high throughput.
⢠Deploy and manage models using containerized and distributed environments.
⢠Work with GPU acceleration, model quantization, batching, caching, and inference optimization techniques.
⢠Implement APIs and microservices for AI model serving.
⢠Monitor system reliability, availability, scalability, and cost efficiency.
⢠Collaborate with AI/ML teams to productionize machine learning and generative AI models.
⢠Lead architecture decisions for model deployment, orchestration, and observability.
⢠Ensure security, governance, and compliance for AI infrastructure.
⢠Mentor engineering teams and drive AI platform best practices.
Required Skills
⢠Strong expertise in Python and backend system development.
⢠Hands-on experience with LLM serving frameworks such as vLLM, TensorRT-LLM, or Text Generation Inference.
⢠Experience with distributed computing, GPU infrastructure, and Kubernetes.
⢠Knowledge of transformer architectures, model optimization, and inference tuning.
⢠Experience with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
⢠Familiarity with Docker, CI/CD pipelines, and infrastructure automation.
⢠Understanding of vector databases, embeddings, and retrieval systems.
⢠Strong debugging, performance tuning, and problem-solving skills.
⢠Excellent leadership and stakeholder communication abilities.
Preferred Qualifications
⢠Bachelorās or Masterās degree in Computer Science, AI, Machine Learning, or related field.
⢠Experience deploying open-source or enterprise LLMs in production environments.
⢠Knowledge of MLOps and observability tools.
⢠Exposure to RAG architectures, fine-tuning, and AI agents is a plus.
Tools & Technologies
⢠Python, FastAPI
⢠vLLM / TensorRT-LLM
⢠Kubernetes, Docker
⢠PyTorch, CUDA
⢠Ray, Triton Inference Server
⢠Vector Databases (Pinecone, Milvus, FAISS)
⢠Amazon Web Services / Microsoft Azure / Google Cloud
⢠CI/CD & Monitoring Tools