National Grid Site Reliability Engineering Lead - ESO in Wokingham, United Kingdom

About the role

This is the time for action – we need to move at pace and reform our energy system – to deliver energy security, to tackle climate change and to protect our people.

The National Grid ESO Digital, Data and Technology (DD&T) team is seeking an experienced Site Reliability Engineer to contribute to the operation, support and improvement of some of the UK’s newest and most critical IT systems.

Join us in pioneering a cutting-edge SRE function within a transformative IT programme that is reshaping the UK's energy landscape. Play a critical role in a nationally significant initiative, directly contributing to maintaining the country's energy operations. This is an opportunity to literally ensure the lights stay on across the nation.

As a Lead Site Reliability Engineer, you'll drive the proactive management and operational excellence of our production platform, applying modern SRE principles to steer the reliability and security of Critical National Infrastructure (CNI).

You'll be pivotal in validating and securing releases, managing change and ensuring operational security, all while maintaining agile delivery. Your role will focus not only on maintaining system reliability and availability but also on driving continuous improvement through innovation.

This role is based in Wokingham.

About us

The ESO has a key role to play in tackling climate change by transitioning GB’s electricity system to net zero. We already operate the fastest decarbonising electricity system in the world, with an ambition for zero carbon operation by 2025. And by 2035, we want to run 100% clean, green energy, all the time.

Becoming the National Energy System Operator

In summer 2024, the ESO will transition to become the National Energy System Operator, or NESO for short. Previously referred to as the Future System Operator (FSO), NESO will be the independent body responsible for planning Great Britain’s electricity and gas networks and operating the electricity system.

The ESO, including all of its existing roles, will be at the heart of the new National Energy System Operator. As NESO, we will significantly build on our existing roles, capabilities and ways of working to create the organisation that the energy system and its users need. Our new capabilities will enable us to look across energy vectors, including electricity, natural gas and hydrogen, and, crucially, to consider the trade-offs between them.

The organisation will be set up as a public corporation with its own Board of independent directors and complete operational independence from government, the regulator and all commercial interests. As the ESO is today, NESO will be licensed and regulated by Ofgem through price control agreements, and obligated to identify optimal solutions for system operation and planning in the most sustainable, affordable and secure way for all.

The time to deliver is now. As part of our team, you won’t just be touching the lives of almost everyone in Great Britain – you’ll be shaping the way we use and consume energy for generations to come.

Key Accountabilities

  • Team Leadership: Direct and nurture a rotating team of 10+ engineers, fostering an environment of continuous learning, collaboration, and innovation. You'll be instrumental in mentoring team members, encouraging their professional development, and ensuring a high level of team performance and morale.

  • Strategic Roadmap Planning: Take ownership of the strategic planning and prioritisation of the SRE roadmap, aligning team efforts with business goals and operational requirements. Your strategic insight will guide the development and implementation of long-term solutions that enhance system reliability, efficiency, and security.

  • Agile Delivery: Champion Agile methodologies within your team, facilitating sprint planning, stand-ups, retrospectives, and reviews to ensure rapid, iterative delivery of high-value features. Your leadership will ensure that the team remains adaptable, responsive to feedback, and aligned with the evolving needs of the business and its stakeholders.

  • Stakeholder Engagement: Act as a key liaison between the SRE team and other departments including the control room, development, security, and operations. You will communicate complex technical issues and solutions to non-technical stakeholders, ensuring clear understanding and alignment across the organisation.

  • Operational Control: Oversee the management of the production system, change management and incident response. You'll also spearhead release engineering, including validating new features and ensuring the security and integrity of our production environment.

  • Innovation through Automation: Engineer solutions to automate IT operations tasks, enhancing system resilience and efficiency. You'll play a crucial role in identifying and mitigating vulnerabilities by analysing the security posture of the production environment and addressing Common Vulnerabilities and Exposures (CVEs).

  • Incident Management: Lead L2 support for incident response, conducting thorough triage and blameless post-mortems. Your expertise will be key in measuring and communicating Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and in managing error budgets to foster a culture of continuous improvement.

  • Collaborative Engagement: Collaborate within and across teams, contributing to product development and rapid delivery. You'll also be responsible for working with auditors and ensuring compliance with Scaled Agile Framework (SAFe) principles, embodying an agile and responsive approach to change management.

Additional Responsibilities:

  • Security: Analyse and secure the production environment by identifying vulnerabilities, prioritising CVE remediation, and ensuring the implementation of best-practice security measures.

  • SAFe: Implement and advocate for Scaled Agile Framework (SAFe) practices, ensuring flexible and efficient project management and delivery.

  • Change Management: Manage and streamline change processes to minimise disruption and maintain system integrity, working closely with cross-functional teams to ensure seamless transitions.

About you

Must Have:

  • Comfortable presenting to audiences of varying seniorities

  • Team management capabilities

  • Proficiency in software development (Java, Python, Node.js)

  • Willingness to undergo Security Check (SC) clearance

  • Solid understanding of containerisation principles (e.g., Docker) and orchestration with OpenShift or Kubernetes

  • Experience in production support and operational security analysis

Should Have:

  • Knowledge of IT Service Management (ITSM)

  • Capacity for on-call support

  • Understanding of Microservices Architecture

  • Familiarity with GitOps and experience using Tekton Pipelines

  • Ability to measure and communicate SLO/SLI and manage error budgets

Could Have:

  • Background in Site Reliability Engineering (SRE)

  • Experience using the OpenShift Admin Console

  • Proficiency in monitoring tools like Grafana, Kibana, and Prometheus

  • Experience with Argo CD/GitOps, Elasticsearch, Istio, Kafka, VMware, Ceph/OpenShift Data Foundation/Software-Defined Storage (SDS), Software-Defined Networking (SDN), HashiCorp Vault, Open Policy Agent, Artifactory, Bitbucket

Qualifications:

  • At least 4-5 years of experience in the technology industry, ideally including 2 or more years in roles supporting business-critical systems.

What you’ll get

A competitive salary of between £64,000 and £90,000, depending on experience and capability.

As well as your base salary, you will receive a bonus of up to 15% of your salary for stretch performance, 28 days annual leave as standard, and a competitive contributory pension scheme where we will double match your contribution to a maximum company contribution of 12%.

You will also have access to a comprehensive benefits package tailored to support your wellbeing and professional success. We offer flexible working arrangements to promote your work-life balance. Enjoy fit-for-purpose wellbeing and lifestyle offerings, ongoing skill development aligned to our Purpose and Values, and be part of a supportive community that values your individuality and where you can belong.

More information

This role closes on 27/05/2024 at 23:59. However, we encourage candidates to apply as early as possible rather than waiting for the published closing date, as this may change.

We work towards the highest standards in everything we do, including how we support, value and develop our people. Our aim is to encourage and support employees to thrive and be the best they can be. We celebrate the difference people bring to our organisation. We welcome and encourage applicants with diverse experiences and backgrounds, and we offer flexible, tailored support at home and in the office.

We're committed to building a workforce that represents the communities we serve, and a working environment in which each individual feels valued, respected, fairly treated and able to reach their full potential.
