Senior Site Reliability Engineer (Contractor) - (R429)
Masabi
- Buenos Aires
- Self-employed
- Full-time
- Automation and Scalability: Drive automation to reduce operational overhead and human error. Build CI/CD pipelines, develop Infrastructure as Code (IaC) using tools like Terraform and CloudFormation, and design scalable systems to handle high traffic while optimising resource utilisation. Drive the effort to scale up new environments as we expand globally.
- Continuous Improvement: Refine processes, tools, and workflows to enhance system reliability, scalability, and efficiency. Plan capacity to anticipate future needs and support high-performance systems.
- Security and Compliance: Ensure infrastructure meets organisational security standards and supports compliance frameworks like SOC 2 and PCI.
- Monitoring and Reliability: Maintain real-time monitoring systems aligned with SLIs and SLOs, ensuring uptime and performance meet or exceed SLAs. Set up proactive alerting mechanisms to address issues before they escalate.
- Cost Optimisation: Monitor and optimise cloud infrastructure costs through autoscaling, rightsizing, and architectural reviews to balance cost-effectiveness with reliability.
- Disaster Recovery and Redundancy: Implement failover strategies, disaster recovery plans, and redundancy to ensure system resilience under all conditions.
- Incident Management: Respond to production incidents, minimise downtime, and restore availability. Perform root cause analysis, implement preventive measures, and contribute to post-incident reviews to share lessons learned.
- Collaboration and Mentorship: Partner with developers to design reliable, maintainable systems. Coach teams on best practices for reliability, scalability, and observability, fostering a culture of ownership.
- Documentation and Knowledge Sharing: Maintain detailed documentation for infrastructure, incident response, and workflows. Develop playbooks and runbooks to ensure seamless knowledge transfer.

Our platform is JVM-based and cloud-native, hosted on AWS. We utilise standard tooling, including GitLab, Terraform, CloudFormation, Puppet, Kibana, Grafana and Confluent Cloud.

**Key Tools and Technologies SREs Work With**
- Monitoring: Grafana, Prometheus, CloudWatch, Pingdom, Kibana
- CI/CD: GitLab CI, Rundeck
- IaC: Terraform, CloudFormation
- Cloud Platforms: AWS

**About You**
- Significant experience in SRE or related roles, with a proven track record in building and maintaining reliable systems
- Expertise in AWS Cloud technologies
- Hands-on experience with Terraform and Grafana, along with strong knowledge of security principles and networking components
- Experience in building pipelines and robust CI/CD infrastructure
- A collaborative team player who approaches projects with an open mind and prioritises security
- Passionate about leveraging technology to drive advancements while ensuring reliability and security
- Excellent communication skills, a collaborative mindset, and a willingness to learn and contribute to team success
- Self-sufficient and capable of working independently, while also knowing when to seek support or input

**Desirable**
- Familiarity with PCI DSS v4 compliance requirements
- AWS Cloud certification
- Experience with orchestrating containers

**Careers at Masabi are for people going places - driven by a mission to make transit fair and accessible for all.**

We are a network of innovators from all walks of life, passionate about making a difference.