Senior Software Engineer - SRE
Alkira
Est. Salary: ₹20 Lacs / year
Posted on: 27 Aug
Job Description
Are you passionate about building cloud networking infrastructure? Do you live and breathe Kubernetes and microservices? Join our innovative team at Alkira, Inc., a Network Infrastructure On-Demand company. As we continue to reinvent networking for the multi-cloud era and as our customers continue to grow with us, we are expanding our Engineering team in India.
We are looking for Site Reliability Engineers who can manage, maintain and troubleshoot Alkira's world-class cloud networking solution around the clock. In this role, you will work in a product company where you get to sharpen your existing skills and gain exposure to a wide range of technologies and constructs, from microservices and DevOps methodologies to Kubernetes, Terraform, data networking and security.
Responsibilities:
You will be responsible for the availability and integrity of the infrastructure that underpins Alkira's Cloud Networking platform
Hold the production systems together and troubleshoot issues that arise in production deployments
Provide 24x7 coverage as part of a scheduled shift and on-call rotation
Work with tools such as Prometheus, Grafana and Jira to monitor, manage, triage and document infrastructure issues in real time
Mentor and guide junior engineers in best practices for DevOps, SRE, and cloud technologies, fostering a culture of knowledge sharing and continuous learning.
Automate infrastructure deployment using CI/CD
Collaborate with software engineering teams to improve deployment processes, performance tuning, and application scalability.
Build necessary tools to evolve how we maintain and monitor our solution
Develop and execute system and integration test plans
Contribute to the development and documentation of standard operating procedures, runbooks, and incident response plans.
Requirements:
At least 5 years' experience in management of production systems
Self-starter with a solution-oriented mindset. You see potential challenges as opportunities to learn and grow
Very strong hands-on experience with Linux systems
Experience with cloud providers such as AWS, Azure or GCP
Have worked in a 24x7 operations environment
Experience with monitoring and logging solutions (e.g., Prometheus, Grafana) and incident management processes.
Experience with computer networking and network technologies
Experience with CI/CD tools such as Concourse CI or Jenkins.
Experience with Kubernetes
Excellent problem-solving skills and ability to quickly grasp new concepts
Highly desirable: HashiCorp Certified: Terraform Associate