GPU Kernel Developer – AI/ML

NLB Services

via LinkedIn

Remote · Full-time · Public

Anywhere · 20h ago

Job Description

Role Summary

We are seeking expert-level GPU Software Engineers to support a high-visibility platform initiative within the Maya program, focused on building software tooling on top of a custom compiler and SDK. The role involves developing, optimizing, and porting GPU kernels and AI workloads to a specialized hardware platform. This is a critical, time-sensitive engagement with immediate onboarding expectations and long-term roadmap alignment (~18 months).

Key Responsibilities

• Develop GPU kernels for specialized hardware platforms using the PyTorch and Triton frameworks
• Build software solutions leveraging custom compiler and SDK capabilities
• Design and implement kernel-level optimizations to control hardware execution behavior
• Port open-source AI/ML models to custom SDK environments
• Port and adapt high-performance computing benchmarks and stress workloads such as:
  • Linpack (High Performance Linpack)
  • BERT/benchmark-style workloads (referred to as “Babu bench”)
• Develop stress testing and validation workloads aligned with hardware behavior and platform validation
• Support testing and stress testing of current and next-generation hardware platforms
• Collaborate closely with platform architects and compiler teams to enhance system capabilities

Core Technical Skills (Must-Have)

Programming & Frameworks
• Python
• C/C++ (systems-level programming)
• PyTorch
• Triton (Triton language / kernel development)

GPU & Systems Expertise
• GPU kernel development (mandatory and critical)
• Strong understanding of GPU architecture and compute optimization
• Experience with compiler-based optimizations / runtime execution layers
• Experience with custom SDKs or hardware abstraction layers

Performance & Workloads
• Experience in:
  • GEMM kernel development (matrix multiplication kernels)
  • Porting ML models to new hardware platforms
  • Performance tuning and stress testing at system level

Nice-to-Have Skills

• Experience working with custom silicon / hardware platforms
• Exposure to high-performance computing (HPC) workloads
• Familiarity with:
  • Linpack benchmarks
  • AI workload benchmarking tools
• Experience in compiler optimization ecosystems

Engagement Model & Structure

• Number of roles: 3 developers (initial hiring may start with 2)
• Location flexibility: Onsite / Offshore / Hybrid mix allowed
• Timeline: Immediate start required
• Duration: ~18 months program duration with phased platform evolution

Key Differentiators (Critical Expectation)

• This is NOT a DevOps / support / debugging role
• Requires deep hands-on engineering expertise in:
  • Kernel programming
  • GPU workloads
  • ML framework internals
• Candidates must demonstrate build-level competence, not just theoretical knowledge
