Software Engineer LMTS (Site Reliability Engineering) job at Salesforce, Inc., San Francisco, CA, US

Job Description

The candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship and who is able to meet the customer and government screening standards applicable to this role. This position requires onsite presence in the Boston, San Francisco, or Bellevue office.

As a Software Engineer in Site Reliability Engineering (SRE) at MuleSoft, you will be part of a high-impact team focused on architecting, building, and scaling the infrastructure, tools, and platforms that improve the resiliency, reliability, performance, and scalability of distributed systems running on the MuleSoft Anypoint Platform. This is a software engineering-driven role, where you'll write production-grade code to automate operations, enhance observability, and strengthen service resilience, especially in high-security environments such as FedRAMP and Protected B.

Your work will span the entire stack: from shaping engineering practices and building proactive failure-prevention mechanisms to streamlining deployment pipelines and improving the end-to-end reliability of mission-critical services. As stewards of observability, incident management, release automation, and reliability engineering, our team’s mission is to embed resiliency and reliability into every layer of the system and consistently exceed industry standards for uptime, latency, and performance.

What You’ll Be Doing

Engineering Resiliency and Reliability: Design and develop systems, libraries, and tools that strengthen the resiliency and reliability of distributed services running on the MuleSoft Anypoint Platform.

Observability by Design: Develop and extend monitoring, logging, and alerting capabilities using industry-standard observability platforms (e.g., metrics, tracing, and log aggregation tools) to ensure issues are detected and diagnosed before they impact customers.

Automation at Scale: Write production-grade code in Python, Go, or similar languages to automate operational tasks, scale deployment pipelines, and implement self-healing systems.

Incident Response & Prevention: Participate in on-call rotations, drive root cause analysis, and deliver software-based solutions that prevent recurrence and reduce mean time to recovery (MTTR).

Platform and Infrastructure Development: Build internal platforms, shared APIs, and systems that enhance developer velocity while improving overall system resilience and operability.

CI/CD and Deployment Engineering: Optimize and evolve our CI/CD pipelines using Jenkins, Spinnaker, and infrastructure-as-code tools such as Terraform and Kubernetes to enable safe and frequent delivery.

Security and Compliance as Code: Develop and maintain automated solutions to meet FedRAMP, Protected B, and other regulatory requirements, integrating security and compliance directly into deployment workflows.

Collaborative Reliability Advocacy: Work closely with product engineers, platform teams, and security stakeholders to influence architectural decisions and bake reliability into all layers of the stack.

Runbooks and Design Documentation: Create and maintain high-quality documentation for systems, processes, and playbooks to promote operational excellence and team scalability.

Requirements:

8+ years of experience in Software Engineering, SRE, or DevOps roles, with a strong focus on building resilient, scalable, and highly available systems.

Proven proficiency in Java, Python, Go, Bash, with experience writing production-quality, maintainable, and testable code for infrastructure and platform automation.

Hands-on experience with infrastructure as code, CI/CD pipelines, and deployment automation using tools like Terraform, Jenkins, and Spinnaker.

Proven experience architecting, developing, and operating systems in cloud-native environments (AWS) and managing containerized workloads with Kubernetes.

Strong understanding of observability engineering, including instrumentation, metrics, logging, and distributed tracing, with experience in OpenTelemetry, Grafana, Splunk, Sumo Logic, or similar platforms.

Solid knowledge of distributed systems, network protocols (TCP/IP, DNS, TLS), and API design standards (REST, RAML, OAS).

Demonstrated ability to diagnose complex system issues, design for fault tolerance and high availability, and continuously improve reliability through software.

Familiarity with compliance-bound environments, including FedRAMP, Protected B, or similar, and experience incorporating security and compliance into engineering workflows.

A passion for engineering reliability through software: you drive automation, eliminate toil, and foster a culture of operational excellence.

A related technical degree is required.

Preferred:

Experience with chaos engineering, fault injection, or reliability gamedays to proactively validate system resilience and recovery readiness.

Background in platform-as-a-service (PaaS), internal developer tooling, or building self-service infrastructure that accelerates engineering productivity.

Prior experience operating in hybrid or multi-cloud environments, with a focus on portability, automation, and infrastructure standardization.

The candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship and who is able to meet the customer and government screening standards applicable to this role, including a Criminal Justice Information Services screening with fingerprint scan. Due to the citizenship requirements for this role, which supports U.S. federal, state, and/or local government customers, citizenship will be verified through two of the following REAL ID Act documents: U.S. Passport, Passport Card, REAL Driver’s License, Global Entry Card, U.S. Government CAC/PIV.

You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government and gain other clearances as deemed appropriate for the role.

Benefits & Perks

Check out our benefits site, which explains our various benefits, including wellbeing reimbursement, generous parental leave, adoption assistance, fertility benefits, and more.

Salesforce Information

Check out our Salesforce Engineering Site.
