Sr. Site Reliability Engineer

  • Remote (United States)
  • Digital, Data & Technology

Description

At Thrivent, we are focused on a digital transformation that will deliver modern, innovative experiences for our clients, financial advisors, and employees. We are investing in data and technology, using DevOps practices, and building an engineering culture of empowered technical experts. Our technologists are involved in work that includes cloud native development, digital architecture and integration, automation, cloud data platforms, artificial intelligence, and machine learning as well as maximizing platforms such as Salesforce, AWS and Microsoft.

We are looking for a Lead Site Reliability Engineer (SRE) to join our team who can establish best practices and deliver on application observability and monitoring processes for our Digital Client Experiences. As a Lead SRE, you should have experience with infrastructure-focused software development along with a deep understanding of monitoring, alerting, reliability, infrastructure (cloud/on-premises), debugging, product engineering, and security. SREs will be embedded with our DevOps teams to introduce and define SRE principles, establish reliability goals, and develop tooling for operational observability.

Site Reliability Engineers are responsible for influencing systems reliability and scalability practices across the enterprise. In this role, there is a strong focus on building the tooling and integrations necessary to easily onboard services. There will be a mix of platform and application-level work to support out-of-the-box visibility, monitoring, and dashboarding. Our lead is not only responsible for hands-on work establishing tooling and principles, but also partners with Product and Engineering Leadership to demonstrate the reliability of our client-facing applications. This Sr. Engineer will help others understand Site Reliability Engineering as a practice, how it can benefit Thrivent as a whole, and how it can support individual teams.

Job Duties and Responsibilities:

  • Research new and inventive ways to improve the overall reliability of sites, services, applications, and infrastructure using customer-focused, data-driven, and metrics-based software engineering approaches.
  • Evangelize, design, and deploy SRE (Site Reliability Engineering) concepts, methodologies, and practices using automation, targeted engagements, and other light-touch consulting with engineering, product, infrastructure, and other teams.
  • Deploy, configure, and consult on application/infrastructure monitoring, dashboards, observability, telemetry, logging, tracing, alerting, platform integrations, and other technology as required and recommended.
  • Define, code, and publish standards for modern KPIs and velocity/reliability control mechanisms using SLOs, SLIs, error budgets, and other OKRs recommended by SRE principles, and as agreed to by key stakeholders.
  • Proactively recommend design changes to new and existing applications or infrastructure to increase reliability.
  • Proactively manage performance of applications and other systems within the environment.
  • Troubleshoot high-visibility issues in production and other environments, applying SRE, debugging, and problem-solving techniques.
  • Promote and drive a blameless culture across the company to obtain factual data and information to continuously improve overall reliability and performance of company technology assets.

QUALIFICATIONS & SKILLS:  

Required:

  • Bachelor's degree in Computer Science or other technical field or equivalent work experience
  • 7+ years of experience in engineering environments, taking abstract concepts and ideas and formulating a detailed site reliability plan
  • Sound knowledge of industry standard Software Development Life Cycle (SDLC) practices and ability to act as a technology evangelist, driving innovative engineering solutions.
  • Sound knowledge of version and revision control practices and procedures  
  • Sound knowledge of systems design concepts that provide security and stability   
  • Expertise in debugging code and/or complex log files for troubleshooting and analysis of product defects 
  • Expert knowledge/experience with querying databases for complex data lookup/update 
  • Experience performing code reviews with associate team members

Preferred Job Qualifications:

  • Banking / Finance / Insurance experience
  • Bachelor’s degree or equivalent experience in computer science or related fields.
  • Deep experience with infrastructure-focused software development along with a deep understanding of monitoring, alerting, reliability, infrastructure (cloud/on-premises), networking, debugging, product engineering, and security.
  • Direct experience with tools and platforms such as Grafana, ELK Stack, Datadog, Dynatrace, Splunk, SCOM, Oracle Enterprise Manager, ServiceNow, Azure Monitor, and AWS CloudWatch.

We exist to help people achieve financial clarity. At Thrivent, we believe money is a tool, not a goal. Driven by a higher purpose at our core, we are committed to providing financial advice, investments, insurance, banking and generosity programs to help people make the most of all they’ve been given.  

At our heart, we are a membership-owned fraternal organization, as well as a holistic financial services organization, dedicated to serving the unique needs of our clients. We focus on their goals and priorities, guiding them toward financial choices that will help them live the life they want today—and tomorrow. 

Thrivent provides Equal Employment Opportunity (EEO) without regard to race, religion, color, sex, gender identity, sexual orientation, pregnancy, national origin, age, disability, marital status, citizenship status, military or veteran status, genetic information, or any other status protected by applicable local, state, or federal law. This policy applies to all employees and job applicants.

Thrivent is committed to providing reasonable accommodation to individuals with disabilities. If you need a reasonable accommodation, please let us know by emailing human.resources@thrivent.com or calling 800-847-4836 and requesting Human Resources.
