Site Reliability Engineer
+4 years México LATAM Remote/Full Time
Description:
We are hiring a development-oriented, collaborative, and detail-oriented Engineer responsible for solving operational, scalability, and reliability issues. You will apply software engineering methodologies to system administration processes and collaborate with software engineers and product developers to optimize system performance, stability, and reliability.
The Site Reliability Engineer is also responsible for creating ways to improve and automate operations tasks, and for managing system availability, latency, performance efficiency, change management, monitoring, emergency response, and capacity planning with the support of their team.
What will you do?
- Take ownership of learning our software in depth: its functions, how it fulfills our clients’ needs, and how clients use the product.
- Oversee systems to ensure reliability for customers.
- Monitor distribution systems and notify appropriate persons of any potential issues.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Partner with development teams to improve services through rigorous testing and release procedures.
Requirements:
- 4+ years of proven experience in a Site Reliability role or a similar role.
- Bachelor's Degree (B.A.) in Computer Science or Design, an equivalent four-year degree, or equivalent related experience.
- Excellent oral and written communication and consulting skills in English, including facilitating group presentations.
- Possess deep technical experience with AWS, containerization technologies, automated deployment frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
- Demonstrate hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges.
- Experience working with infrastructure and application monitoring tools such as New Relic, SumoLogic, uptime monitoring (Pingdom), CloudTrail, and CloudWatch Insights, as well as AWS deployment tooling such as CloudFormation, CodePipeline, and CodeDeploy.
- Extensive working knowledge of managing AWS and Linux OS.
- Experience working with MSSQL and MySQL in cloud-based environments, as well as demonstrable knowledge of and experience with AWS service technologies, e.g. Aurora MySQL.
- Experience working with NoSQL database technologies (ideally DynamoDB).
- Experience working with pipeline automation scripting and tooling, e.g. Jenkins, Terraform.
- Ability to learn new languages and technologies strongly preferred.
- Advanced English required.