Softcom was established in 2007 with a mission to “connect people and businesses with meaningful innovation”. We are a technology company that aims to solve problems and connect people and businesses to value that ultimately improves their lives. We want our products to enable inclusion and growth for people and businesses in Africa.
We are recruiting to fill the position below:
Job Title: Site Reliability Engineer
At Softcom Limited, we’re passionate about building software that solves problems.
As we expand our customer deployments, we are seeking an experienced SRE to deliver insights from massive-scale data in real time.
Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with cross-functional teams to develop real-world solutions and positive user experiences at every interaction.
Objectives of this Role
Run the production environment by monitoring availability and taking a holistic view of system health
Build software and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large distributed software applications
Daily and Monthly Responsibilities
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives
Requirements
Degree in Computer Science or a technology-related field required.
3 years' experience working in software engineering teams as an SRE or DevOps engineer.
Practical experience with computer operating systems such as MS Windows and UNIX/Linux.
Experience architecting, deploying, and scaling production workloads on AWS using services such as EC2, S3, EKS, VPC, and IAM.
Experience with containers and container orchestration tools such as Docker and Kubernetes.
Experience with CI/CD tools such as Jenkins, Bitbucket Pipelines, AWS CodeDeploy, AWS CodeBuild, or similar.
Experience with monitoring and observability tools such as the ELK stack, Prometheus, and CloudWatch.
Experience with incident management tools such as Opsgenie and PagerDuty.
Experience automating infrastructure, testing, and deployments using tools like Terraform or CloudFormation, and the ability to explain the Infrastructure as Code paradigm.
Good understanding of Chaos Engineering, even if you haven't implemented it yourself yet.
Experience debugging complex problems.
Good understanding of computer networking and messaging, especially between services.
Hands-on experience using source control (Git).
Experience with a variety of databases (MongoDB, PostgreSQL, MySQL).
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Excellent written and verbal communication skills and a high level of personal integrity.
Innovative thinking and leadership, with the ability to lead and motivate cross-functional, interdisciplinary teams.
Experience with contract and vendor negotiations and management including managed services.
Specific experience in Agile (scaled) software development or other best-in-class development practices.
Experience with cloud computing/elastic computing across virtualized environments.
Knowledge of relevant IT security-related hardware, software, and vendor solutions.
Deep-thinking, analytical mind with the ability to quickly get to the root cause of issues.