Grow With Thankz

Thankz offers you a thriving career with companies abroad without having to leave your home. We understand that finding a meaningful job can be challenging, so we create a rewarding work environment and offer opportunities that make going to work exciting. If you’re here to thrive, Thankz is the place!

Site Reliability Engineer (Remote)

Are you looking to hire?

Thankz offers a range of outstanding Site Reliability Engineer (Remote) candidates. If you're searching for top talent in this field or a similar position, our team can find the ideal person who meets your specific needs and requirements.

As a Site Reliability Engineer, you will play a vital role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to optimize our infrastructure, automate processes, and proactively address any potential issues.  

What you'll be doing 

  • Designing, implementing, and maintaining scalable and resilient infrastructure solutions 
  • Automating deployment, configuration, and monitoring processes to streamline operations 
  • Conducting performance testing and capacity planning to identify bottlenecks and optimize system performance 
  • Troubleshooting production incidents and implementing effective resolutions 
  • Implementing monitoring and alerting systems to proactively identify and address issues
  • Collaborating with development teams to optimize application performance and reliability 
  • Participating in on-call rotations and responding to incidents in a timely manner 
  • Conducting root cause analysis to identify underlying issues and prevent future occurrences 
  • Continuously researching and evaluating new technologies and best practices to enhance system reliability 

Requirements 

  • Bachelor's degree in Computer Science, Information Systems, or a related field  
  • Proven experience as a Site Reliability Engineer or in a similar role 
  • C1/C2 English proficiency (both written and spoken)
  • Strong background in Linux/Unix administration and scripting 
  • Proficiency in at least one programming language (e.g., Python, Go, Java) 
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet) 
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Familiarity with monitoring and logging tools (e.g., Prometheus, ELK stack) 
  • Understanding of networking principles and protocols (TCP/IP, DNS, HTTP) 
  • Excellent problem-solving and troubleshooting skills 

Preferred candidates possess a deep understanding of cloud platforms, containerization technologies, and monitoring tools, along with a strong background in infrastructure and automation. They have a passion for ensuring system reliability, scalability, and performance. Excellent problem-solving skills, the ability to work well in a remote team environment, and a proactive mindset are highly valued.

We offer a full-time remote position on US hours, with a 40-hour workweek (Monday to Friday) and excellent prospects for long-term growth for an ambitious, experienced Site Reliability Engineer. We can offer HMO and other benefits to Philippine candidates.