MICHAEL PAGE

Site Reliability Engineer

Job Details

Company Overview

The Platform Engineering team is seeking a versatile and passionate Site Reliability Engineer to play a critical role in enhancing and optimizing our developer services infrastructure. You will join a highly experienced team that supports a wide array of critical systems, including source control, continuous integration pipelines, and observability tools, all of which are vital to the stability and performance of our trading platforms.

Client Details

The Platform Engineering organization is focused on accelerating tech team workflows by providing self-service tools, services, documentation, and support. Platform Engineering is responsible for designing, building, and maintaining the underlying runtime platforms that our software applications depend on. Its mission is to streamline development processes, establish a consistent technical foundation across regions, and give teams the resources they need to innovate efficiently.

Platform Engineering is a global team that acts as a bridge between the technical requirements of application development and the practical realities of deploying and maintaining those applications in production. By minimizing friction, it ensures that tech teams can operate seamlessly and keep driving progress forward.

Description

Optimize and enhance the reliability, scalability, and performance of our developer services infrastructure
Administer and oversee source control systems, continuous integration services, artifact repositories, metrics, logging, observability tools, and related systems, both in on-premises environments and on AWS cloud services
Integrate and deploy new cloud SaaS solutions, collaborating cross-functionally to deliver innovative tools and functionality to our teams
Partner with development teams to align system capabilities with their evolving needs, ensuring seamless operations and support and boosting organizational efficiency
Diagnose and resolve production incidents swiftly, supporting Global Operations in a 'follow the sun' model, working with regional peers to guarantee continuous 24/7 system availability
Identify and eliminate system performance bottlenecks and drive uptime improvements through proactive automation, monitoring, and performance tuning
Lead and mentor junior team members, fostering their growth and technical expertise

Profile

5+ years of work experience in a relevant role
Bachelor's degree in Computer Engineering, Computer Science, or equivalent
Strong programming experience in Python or Go; shell scripting is a plus
Strong knowledge of Linux/Unix systems
Experience operating and enhancing services across various environments, and designing and implementing disaster recovery plans, high availability (HA) configurations, and failover strategies
Hands-on experience with containerization and orchestration tools, including Kubernetes and Docker
Experience with CI/CD tools such as Jenkins, GitLab CI, or TeamCity, including the automation of build, test, and deployment processes
Understanding of security principles and best practices, including experience managing secrets and working with compliance frameworks

Job Offer

We offer the right candidate a competitive remuneration package and comprehensive fringe benefits, including medical and life insurance, excellent learning and development opportunities, and flexibility.

To apply online, please click the 'Apply' button below. For a confidential discussion about this role, please contact Royce Chan on +852 3602 2491.

Additional Information

Work Exp

5 Years

Education

Bachelor's Degree

Employment Type

Full Time

Location

Within Hong Kong

Job Level

Middle