GPU Kernel Developer – AI/ML
NLB Services · via LinkedIn
Remote · Full-time · Public
Anywhere · 20h ago
Job Description
Role Summary
We are seeking expert-level GPU Software Engineers to support a high-visibility platform initiative within the Maya program, focused on building software tooling on top of a custom compiler and SDK.
The role involves developing, optimizing, and porting GPU kernels and AI workloads to a specialized hardware platform.
This is a critical and time-sensitive engagement with immediate onboarding expectations and long-term roadmap alignment (~18 months).
Key Responsibilities
• Develop GPU kernels for specialized hardware platforms using PyTorch/Triton frameworks
• Build software solutions leveraging custom compiler and SDK capabilities
• Design and implement kernel-level optimizations to control hardware execution behavior
• Port open-source AI/ML models to custom SDK environments
• Port and adapt high-performance computing benchmarks and stress workloads such as:
  • Linpack (High Performance Linpack)
  • BERT/benchmark-style workloads (referred to as “Babu bench”)
• Develop stress-testing and validation workloads aligned with hardware behavior and platform validation requirements
• Support testing and stress testing of current and next-generation hardware platforms
• Collaborate closely with platform architects and compiler teams to enhance system capabilities
Core Technical Skills (Must-Have)
Programming & Frameworks
• Python
• C/C++ (systems-level programming)
• PyTorch
• Triton (Triton language / kernel development)
GPU & Systems Expertise
• GPU kernel development (mandatory and critical)
• Strong understanding of GPU architecture and compute optimization
• Experience with compiler-based optimizations / runtime execution layers
• Experience with custom SDKs or hardware abstraction layers
Performance & Workloads
• Experience in:
  • GEMM kernel development (matrix multiplication kernels)
  • Porting ML models to new hardware platforms
  • Performance tuning and stress testing at system level
Nice-to-Have Skills
• Experience working with custom silicon / hardware platforms
• Exposure to high-performance computing (HPC) workloads
• Familiarity with:
  • Linpack benchmarks
  • AI workload benchmarking tools
• Experience in compiler optimization ecosystems
Engagement Model & Structure
• Number of roles: 3 developers (initial hiring may start with 2)
• Location flexibility: Onsite / Offshore / Hybrid mix allowed
• Timeline: Immediate start required
• Duration: ~18 months, with phased platform evolution
Key Differentiators (Critical Expectation)
• This is NOT a DevOps / support / debugging role
• Requires deep hands-on engineering expertise in:
  • Kernel programming
  • GPU workloads
  • ML framework internals
• Candidates must demonstrate build-level competence, not just theoretical knowledge
Decision Maker: Daniel Mercer, Head of Engineering