
Senior Site Reliability Engineer


Detroit, MI, US

Company:  DTE Eng Corp Svcs LLC
Job ID:  8817

DTE has been fueling communities for over a century as one of the country’s largest diversified energy companies.

At DTE, our employees improve lives with their energy.

Together, our company is changing the future of energy.

We strive daily to fulfill our vision of a clean, sustainable future, and our investments in renewable energy and emission control technologies are just the start of delivering impact-free, reliable power to our customers and communities.

As part of our journey, we invite you to join us as we launch a Digital Factory to reimagine the energy industry.

This new strategic unit will drive DTE's digital innovation, grounded in agile practices, unconstrained thinking, user-centricity, a bias toward speed, and enthusiasm for experimentation.

Our Digital Factory team will deliver energy solutions that best serve our customers, employees, communities, and investors while contributing to a cleaner world.

Whether your passion is design, data, engineering, or agile, we invite you to help shape DTE’s new strategic unit – our future is You.

Are you ready to make that kind of difference? Bring Your energy to DTE.

Together, we can achieve great things.

Job Summary

Leads software and systems engineering to build and run fault-tolerant systems. Oversees development teams and business partners in implementing enhanced monitoring and alerting capabilities for applications. Ensures that critical, high-visibility internal and external systems deliver reliability and uptime appropriate to user needs while sustaining a fast rate of improvement. Oversees availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Provides guidance to less experienced team members. Span of control 0; individual contributor.

Key Accountabilities

  • Leads agile teams and business partners in developing specifications that resolve problems and address enhancement needs, with a focus on monitoring and metrics for operational readiness
  • Works with technology, business, and development partners to gather inputs for new capabilities that display, monitor, and alert on key performance indicators by tracking business transactions in real time
  • Provides specialized technical guidance and expertise on operational technology stability and readiness through continuous improvement of our products
  • Uses technical knowledge, creativity, and company practices to drive down occurrences of incidents through the development of proactive monitoring and alerting
  • Provides continuous feedback to development teams on system stability, defect analysis, and system enhancements
  • Develops plans for validation and verification of changes deployed by infrastructure, development, and sustainment teams
  • Implements sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management, and related requirements
  • Develops runbooks and patterns to sustain applications in a production environment
  • Facilitates technical discussions and drives the transition to sustainment activities with the development and production operations teams

Minimum Education & Experience Requirements

  • This is a multi-track base requirement job; education and experience requirements can be satisfied through one of the following three options:
    • Bachelor’s degree with 7 years of experience in site reliability engineering and/or related environment; OR
    • Associate degree with 9 years of experience in site reliability engineering and/or related environment; OR
    • High school diploma or GED with 11 years of experience in site reliability engineering and/or related environment


Other Qualifications


  • Experience coordinating across upstream applications to resolve incidents


Other Requirements

  • Deep knowledge of enterprise-scale platforms and architectures
  • Advanced analytical, problem-solving, and leadership skills
  • Experience with multiple platforms, programming languages, and databases (e.g., Unix and Windows platforms, Java EE, JavaScript, Spring, Spring Boot, REST APIs/microservices, shell scripting, Python, PL/SQL, and Oracle)
  • Must be available to perform a primary assignment in support of DTE’s emergency response to storms or other events that impact service to our customers.

Additional Information

Incumbents may engage in all or some combination of the activities and accountabilities and utilize a variety of the competencies cited in this description depending upon the organization and role to which they are assigned. This description is intended to describe the general nature and level of work performed by incumbents in this job. It is not intended as an all-inclusive list of accountabilities or responsibilities, nor is it intended to limit the rights of supervisors or management representatives to assign, direct and control the work of employees under their supervision.

At DTE Energy, we are committed to providing an inclusive workplace where everyone feels welcome and has a sense of belonging. We seek individuals with a heart for service, a passion to help our communities prosper, and ideas to help shape the future of energy. We are proud to be an equal opportunity employer that considers all qualified applicants without regard to race, color, sex, sexual orientation, gender identity, age, religion, disability, national origin, citizenship, height, weight, genetic information, marital status, pregnancy, protected veteran status, or any other status protected by law.

Nearest Major Market: Detroit