Site Reliability Engineer, Machine Learning Operations

MANPOWER STAFFING SERVICES (SINGAPORE) PTE LTD

Location: D02 Anson, Tanjong Pagar
Job Type: Full-time
Experience: Mid

Job Details

Vacancies: 1 position
Experience Required: No experience required

Job Description

Purpose of Role:

Frontline On-Call Ownership: Serve as the primary responder for the Applied Machine Learning Engine, taking ownership of system availability, health monitoring, and immediate incident response to ensure high reliability.
Incident Lifecycle Management: Manage the end-to-end feedback loop for incidents, including rapid triage, effective resolution, and the facilitation of post-incident reviews to ensure closure and prevent recurrence.
SOP Execution & Optimization: Execute upgrades and deployments strictly adhering to Standard Operating Procedures (SOPs), while actively leveraging Machine Learning and Infrastructure expertise to refine, automate, and improve these processes for greater efficiency.

Responsibilities:

Analyse user needs related to the machine learning systems provided by the AML department, gathered through on-call shifts or other channels, and propose customer-oriented solutions.
Work with other software engineers to implement and deploy customer-oriented solutions for the machine learning framework, whether proposed by yourself or by others.
Update software, enhance existing software capabilities, and develop and deploy procedures for software testing, deployment, capacity management, and validation.
Work with hardware engineers to integrate hardware and software systems, and troubleshoot issues against specifications and performance requirements.

Minimum requirements:

Bachelor’s degree in Computer Science or equivalent, with 3+ years of relevant experience.
Proven experience in analyzing and troubleshooting distributed systems.
Prior experience designing or maintaining large-scale systems.
Scripting skills in at least one major language (Python, Go, or Shell/Bash) to automate repetitive operational tasks.

Nice to have:

Experience defining and managing Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and practicing Chaos Engineering.
Experience operating MLOps platforms and toolkits such as Kubeflow, MLflow, Feast, or Ray.
Deep understanding of Linux operating system internals or container technologies (Docker/containerd) and orchestration platforms (Kubernetes) in a production environment.
Basic understanding of Machine Learning concepts and familiarity with serving frameworks such as TensorFlow Serving, TorchServe, or Triton Inference Server.

About MANPOWER STAFFING SERVICES (SINGAPORE) PTE LTD

Manpower is the global leader in contingent and permanent recruitment workforce solutions. It is part...