Site Reliability Engineer
- 4 years or more, Mexico/LATAM, Remote/Full time
The Site Reliability Engineer is a collaborative, detail-oriented, development-focused engineer tasked with solving operational, scalability, and reliability issues. They apply software engineering methodologies to system administration processes and collaborate with software engineers and product developers to optimize system performance, stability, and reliability. With the support of their team, the Site Reliability Engineer is also responsible for finding ways to enhance and automate operations tasks, and for managing system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
What will you do?
- Take ownership of learning our software in depth: its functions, how it fulfills our clients’ needs, and how clients use the product.
- Oversee systems to ensure reliability for customers.
- Monitor distribution systems and notify the appropriate people of any potential issues.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Partner with development teams to improve services through rigorous testing and release procedures.
What do you need?
- 4+ years of proven experience in a Site Reliability role or similar experience.
- Bachelor's Degree (B.A.) in Computer Science or Design or equivalent four-year degree, or equivalent related experience.
- Excellent oral and written communication and consulting skills in English, including facilitating group presentations.
- Deep technical experience with AWS, containerization technologies, automated deployment frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
- Demonstrated hands-on technical leadership and business impact in combining software engineering and systems engineering skills to solve complex automation and reliability challenges.
- Experience working with infrastructure and application monitoring tools such as New Relic, Sumo Logic, Pingdom (uptime monitoring), CloudTrail, and CloudWatch Insights, and with AWS provisioning and deployment services such as CloudFormation, CodePipeline, and CodeDeploy.
- Extensive working knowledge of AWS networking (setup and administration of VPC peering connections, Transit Gateways, VPNs, etc.).
- Experience working with MSSQL and MySQL in cloud-based environments, as well as demonstrable knowledge of and experience with AWS service technologies, e.g. Aurora MySQL.
- Experience working with NoSQL database technologies (ideally DynamoDB).
- Experience working with pipeline automation scripting and tooling, e.g. Jenkins, Terraform.
- Ability to learn new languages and technologies strongly preferred.
Why work with us?
- Growth Opportunities.
- Home office.
- Long-term projects.