Objectives:
- Ensure the production environment runs smoothly and monitor its availability
- Design, develop, and maintain software and systems for managing infrastructure
- Improve reliability and quality, and accelerate time-to-market for engineering teams
- Optimize system performance and stay responsive to business needs
Responsibilities:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Work closely with development teams to improve service quality
- Take part in system design consulting, platform management, and capacity planning
- Build sustainable systems and services through automation
Required Skills and Qualifications:
- Extensive knowledge of one or more high-level languages, such as Python, Java, Rust, C/C++, Ruby, or JavaScript
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- A knack for cloud technologies, Linux production environments, and scripting
- Experience with infrastructure-as-code tools (Terraform, Pulumi)