Software Engineer LMTS (Site Reliability Engineering) Job at Salesforce, Inc., San Francisco, CA

  • Salesforce, Inc.
  • San Francisco, CA

Job Description

This candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship, with the ability to meet customer and government screening standards applicable to this role. This position requires onsite presence in the Boston, San Francisco, or Bellevue office.

As a Software Engineer in Site Reliability Engineering (SRE) at MuleSoft, you will be part of a high-impact team focused on architecting, building, and scaling the infrastructure, tools, and platforms that improve the resiliency, reliability, performance, and scalability of distributed systems running on the MuleSoft Anypoint Platform. This is a software engineering-driven role, where you'll write production-grade code to automate operations, enhance observability, and strengthen service resilience, especially in high-security environments, including FedRAMP and Protected B, among others. Your work will span the entire stack: from shaping engineering practices and building proactive failure-prevention mechanisms to streamlining deployment pipelines and improving the end-to-end reliability of mission-critical services.

As stewards of observability, incident management, release automation, and reliability engineering, our team’s mission is to embed resiliency and reliability into every layer of the system and consistently exceed industry standards for uptime, latency, and performance.

What You’ll Be Doing

  • Engineering Resiliency and Reliability: Design and develop systems, libraries, and tools that strengthen the resiliency and reliability of distributed services running on the MuleSoft Anypoint Platform.
  • Observability by Design: Develop and extend monitoring, logging, and alerting capabilities using industry-standard observability platforms (e.g., metrics, tracing, and log aggregation tools) to ensure issues are detected and diagnosed before they impact customers.
  • Automation at Scale: Write production-grade code in Python, Go, or similar languages to automate operational tasks, scale deployment pipelines, and implement self-healing systems.
  • Incident Response & Prevention: Participate in on-call rotations, drive root cause analysis, and deliver software-based solutions that prevent recurrence and reduce mean time to recovery (MTTR).
  • Platform and Infrastructure Development: Build internal platforms, shared APIs, and systems that enhance developer velocity while improving overall system resilience and operability.
  • CI/CD and Deployment Engineering: Optimize and evolve our CI/CD pipelines using Jenkins, Spinnaker, and infrastructure-as-code tools such as Terraform and Kubernetes to enable safe and frequent delivery.
  • Security and Compliance as Code: Develop and maintain automated solutions to meet FedRAMP, Protected B, and other regulatory requirements, integrating security and compliance directly into deployment workflows.
  • Collaborative Reliability Advocacy: Work closely with product engineers, platform teams, and security stakeholders to influence architectural decisions and bake reliability into all layers of the stack.
  • Runbooks and Design Documentation: Create and maintain high-quality documentation for systems, processes, and playbooks to promote operational excellence and team scalability.

Requirements:

  • 8+ years of experience in Software Engineering, SRE, or DevOps roles, with a strong focus on building resilient, scalable, and highly available systems.
  • Proven proficiency in Java, Python, Go, Bash, with experience writing production-quality, maintainable, and testable code for infrastructure and platform automation.
  • Hands-on experience with infrastructure as code, CI/CD pipelines, and deployment automation using tools like Terraform, Jenkins, and Spinnaker.
  • Proven experience architecting, developing, and operating systems in cloud-native environments (AWS) and managing containerized workloads with Kubernetes.
  • Strong understanding of observability engineering, including instrumentation, metrics, logging, and distributed tracing; experience with OpenTelemetry, Grafana, Splunk, Sumo Logic, or similar platforms.
  • Solid knowledge of distributed systems, network protocols (TCP/IP, DNS, TLS), and API design standards (REST, RAML, OAS).
  • Demonstrated ability to diagnose complex system issues, design for fault tolerance and high availability, and continuously improve reliability through software.
  • Familiarity with compliance-bound environments, including FedRAMP, Protected B, or similar, and experience incorporating security and compliance into engineering workflows.
  • A passion for engineering reliability through software: you drive automation, eliminate toil, and foster a culture of operational excellence.
  • A related technical degree required.

Preferred:

  • Experience with chaos engineering, fault injection, or reliability gamedays to proactively validate system resilience and recovery readiness.
  • Background in platform-as-a-service (PaaS), internal developer tooling, or building self-service infrastructure that accelerates engineering productivity.
  • Prior experience operating in hybrid or multi-cloud environments, with a focus on portability, automation, and infrastructure standardization.

This candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship, with the ability to meet customer and government screening standards applicable to this role, including a Criminal Justice Information Services screening with fingerprint scan. Due to the citizenship requirements for this role, which supports U.S. federal, state, and/or local government customers, citizenship will be verified through two of the following REAL ID Act documents: U.S. Passport, Passport Card, REAL Driver’s License, Global Entry Card, U.S. Government CAC/PIV. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government and gain other clearances as deemed appropriate for the role.

Benefits & Perks

Check out our benefits site, which explains our various benefits, including wellbeing reimbursement, generous parental leave, adoption assistance, fertility benefits, and more.

Salesforce Information

Check out our Salesforce Engineering Site.

Job Tags

Local area
