Lead Software Architect – AI Infrastructure & Cluster Performance | Singapore
Job Details
Vacancies: 1 position
Experience Required: No experience required
Job Description
Location: Singapore (travel-ready for regional AI cluster deployments)
Competitive Salary: From SGD 12,000
Work Hours: 9 AM – 6 PM, Monday – Friday
Office Location: Ubi Road (East)
Key Responsibilities
Leadership Responsibilities
Lead multi-disciplinary engineering teams in AI cluster performance and deployment projects.
Define technical roadmaps, standards, and best practices for large-scale AI infrastructure.
Mentor and upskill engineers in high-performance AI frameworks, cluster optimization, and security hardening.
Manage stakeholder communications, performance reporting, and deployment planning.
Drive decision-making for hardware-software trade-offs, contingencies, and multi-region deployments.
Foster a culture of reliability, proactive monitoring, and continuous performance improvement.
Collaborate with cross-functional teams to enforce governance, Zero-Trust access, and infrastructure hardening.
Technical Responsibilities
Cluster & Hardware Optimization:
Conduct cluster-level audits for software consistency across 3,456 GPUs and 48 racks.
Fine-tune BIOS, firmware, kernel, and network parameters (NVLink, InfiniBand, PCIe Gen5/6) for maximum throughput.
Validate collective communications using NVIDIA NCCL and SHARP for zero-bottleneck AI training.
AI Frameworks & Orchestration:
Deploy, integrate, and optimize PyTorch, JAX, Slurm, and Kubernetes with GPUDirect Storage.
Implement real-time telemetry for GPU/NPU health, power, and thermal metrics.
Security & Compliance:
Implement Hardware Root-of-Trust, Secure Boot, and Zero-Trust IAM policies.
Enforce inline encryption (AES-GCM 256) for AI fabrics and secure sensitive training data.
Conduct vulnerability scanning and penetration testing, and ensure compliance with ISO/IEC 27001, SOC 2, and the NIST AI RMF.
Implement secrets management for API keys, SSH keys, and SSL certificates.
Networking & Performance Tuning:
Optimize ultra-low-latency networks (400G/800G fabrics) and eliminate congestion using SHARPv4 and RoCEv2.
Audit network paths, validate topology alignment with SDN, and monitor fabric performance proactively.
Configure and validate GPUDirect RDMA/Storage for direct GPU-to-storage data movement.
Send Full Name, Contact Number & Resume to:
📱 +65 88914996
📲 Email: [email protected]
Please include your availability, notice period and expected salary in your application.
----------------------------------------------------------------
*Only shortlisted candidates will be contacted.
CREW by HRNet | HRnet Ventures Pte Ltd (24C2435)
Athirah Bte Rosli (R2197227)