Senior Site Reliability Engineer (Remote) at Moniepoint Incorporated

Posted on Mon 18th May, 2026 - www.hotnigerianjobs.com

Moniepoint Incorporated is a global business payments and banking platform and recently became QED Investors’ first investment in Africa. We are the partner of choice for over 600,000 businesses of all sizes, powering the dreams of SMBs and providing them with equal access to the tools they need to grow and scale.

We are recruiting to fill the position below:

Job Title: Senior Site Reliability Engineer

Location: Remote

Job Summary

  • We are seeking an experienced SRE to engineer the reliability of our highly distributed platform. You will combine deep knowledge of distributed systems with strong coding skills to define SLOs, lead incident response, and build automation and self-healing mechanisms into our systems.
  • You will balance immediate operational stability with long-term strategic engineering to ensure our services scale linearly with our hyper-growth.

Responsibilities

  • Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during major-severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
  • Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
  • Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
  • Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
  • Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.

Requirements

  • Minimum of 4 years of experience in SRE or Backend Engineering with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
  • Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
  • Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
  • Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.
  • Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL) and streaming platforms (e.g., Kafka, RabbitMQ) in a high-throughput environment.

What we can offer you

  • Culture - We put our people first and prioritize the well-being of every team member. We’ve built a company where all opinions carry weight and where all voices are heard. We value and respect each other and always look out for one another. Above all, we are human.
  • Learning - We have a learning and development-focused environment with an emphasis on knowledge sharing, training, and regular internal technical talks.
  • Compensation - You’ll receive an attractive salary, pension, health insurance, annual bonus, plus other benefits.

Application Closing Date
Not Specified.

Method of Application
Interested and qualified candidates should:
Click here to apply online

What to expect in the hiring process

  • A preliminary phone call with the recruiter
  • A technical interview with the Hiring Manager
  • A behavioural and technical interview with a member of the Executive team