Site Reliability Engineer

  • Experience: 4+ years
  • Location: México (LATAM)
  • Contract: Remote / Full Time
Description:

We are hiring a development-oriented, collaborative, and detail-focused Site Reliability Engineer (SRE) to solve operational, scalability, and reliability challenges. In this role, you will apply software engineering methodologies to system administration processes and collaborate with software engineers and product developers to improve system performance, stability, and reliability.

The ideal candidate will focus on improving and automating operational tasks while ensuring system availability and scalability. You will manage critical aspects such as latency, performance efficiency, monitoring, emergency response, capacity planning, and change management alongside your team.

We are seeking a proactive individual with strong leadership, resource administration, and communication skills who thrives in a team-oriented environment. A background in development, combined with hands-on SRE or DevOps experience, is essential.

What will you do?

  • Gain a deep understanding of our software, its functions, how it fulfills our clients’ needs, and how clients use the product.
  • Oversee and maintain systems to ensure reliability for our customers.
  • Monitor distributed systems and notify the appropriate people of any potential issues.
  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to manage platform infrastructure and applications.
  • Implement Infrastructure as Code (IaC) principles to streamline infrastructure management.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance to expand our capabilities, anticipate customer needs, and drive continuous improvement.
  • Collaborate with development teams to enhance services through rigorous testing, release procedures, and automation.
  • Lead incident response efforts, including root cause analysis and the implementation of preventative measures.
  • Participate in on-call rotations and ensure proper incident management and escalation processes.
  • Develop and maintain monitoring tools and dashboards to improve system visibility.
  • Create robust documentation and playbooks for operational processes and emergency protocols.

Requirements:
  • 4+ years of proven experience in a Site Reliability Engineering role or a similar position.
  • Bachelor’s degree in Computer Science or equivalent experience.
  • Advanced English proficiency, with excellent oral and written communication skills.
  • Development background (not currently a developer but with prior experience in software development).
  • Expertise in AWS (mandatory), containerization technologies, automated deployment frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
  • Strong knowledge of Infrastructure as Code (IaC) tools, such as Terraform.
  • Demonstrated hands-on technical leadership and business impact from combining software engineering and systems engineering skills to solve complex automation and reliability challenges.
  • Hands-on experience using and developing monitoring tooling, such as New Relic, Sumo Logic, Pingdom, and AWS CloudTrail.
  • Familiarity with building and managing production environments, with a focus on reliability and scalability.
  • Experience working with relational databases (MSSQL, MySQL, or Aurora MySQL) and NoSQL databases (ideally DynamoDB).
  • Knowledge of SLOs and SLIs for measuring and managing system performance.
  • Experience with CI/CD pipelines and automation scripting, using tools like Jenkins.
  • Familiarity with on-call rotation processes and managing critical incidents.
  • Ability to quickly learn new tools, languages, and technologies.
  • Strong communication and leadership skills.
  • Experience in resource administration and prioritizing tasks effectively.
  • A team player who thrives in a collaborative environment.
  • Proactive mindset, eager to innovate and solve complex challenges.

Why work with us?