Site Reliability Engineer
The Platform Engineering team is seeking a versatile and passionate Site Reliability Engineer to play a critical role in enhancing and optimizing our developer services infrastructure. You will join a highly experienced team that supports a wide array of critical systems, including source control, continuous integration pipelines, and observability tools, all of which are vital to the stability and performance of our trading platforms.
Client Details
The Platform Engineering organization is focused on accelerating tech team workflows by providing self-service tools, services, documentation, and support. Platform Engineering is responsible for designing, building, and maintaining the underlying runtime platforms that our software applications depend on. Its mission is to streamline development processes, establish a consistent technical foundation across regions, and give teams the resources they need to innovate efficiently.
Platform Engineering is a global team that bridges the technical requirements of application development and the practical work of deploying and maintaining those applications in production. By minimizing friction, it ensures that tech teams can operate seamlessly and drive progress forward.
Description
- Optimize and enhance the reliability, scalability, and performance of our developer services infrastructure
- Administer and oversee source control systems, continuous integration services, artifact repositories, metrics, logging, observability tools, and related systems, both in on-premises environments and leveraging AWS cloud services
- Integrate and deploy new cloud SaaS solutions, collaborating cross-functionally to deliver innovative tools and functionality to our teams
- Partner with development teams to align system capabilities with their evolving needs, ensuring seamless operations and support and improving organizational efficiency
- Diagnose and resolve production incidents swiftly, supporting Global Operations in a 'follow the sun' model alongside regional peers to ensure 24/7 system availability
- Identify and eliminate system performance bottlenecks and drive uptime improvements through proactive automation, monitoring, and performance tuning
- Lead and mentor junior team members, fostering their growth and technical expertise
Profile
- 5+ years of work experience in a relevant role
- Bachelor's Degree in Computer Engineering, Computer Science or equivalent
- Strong programming experience in Python or Go; shell scripting is a plus
- Strong knowledge of Linux/Unix systems
- Experience operating and enhancing services across various environments, including designing and implementing disaster recovery plans, high availability (HA) configurations, and failover strategies
- Hands-on experience with containerization and orchestration tools, including Kubernetes and Docker
- Experience with CI/CD tools such as Jenkins, GitLab CI, or TeamCity, including the automation of build, test, and deployment processes
- Understanding of security principles and best practices, including experience with secrets management and compliance frameworks
Job Offer
We offer a competitive remuneration package and comprehensive fringe benefits, including medical and life insurance, excellent learning and development opportunities, and flexibility for the right candidate.
To apply online, please click the 'Apply' button below. For a confidential discussion about this role, please contact Royce Chan on +852 3602 2491.
Additional Information
Education
Bachelor's Degree
Employment Type
Full Time
Published On
22 hours ago
Job Ref. No.
JN -032025-6692360_37090