We are the leading global information services company, providing data and analytical tools to our clients around the world. We help businesses to manage credit risk, prevent fraud, target marketing offers and automate decision making. We also help people to check their credit report and credit score, and protect against identity theft.
In 2019, for the fifth consecutive year, Experian was named to Forbes Magazine’s Top 100 list of the “World’s Most Innovative Companies.”
We employ approximately 17,000 people in 44 countries and our corporate headquarters are in Dublin, Ireland, with operational headquarters in Nottingham, UK; California, US; and São Paulo, Brazil.
At Experian, we are committed to building an inclusive culture and creating an environment where people can balance successful careers with their commitments and interests outside of work. Our flexible working practices support our belief that this balance brings long-lasting benefits for our business as well as our people. Some roles lend themselves to flexible options more than others, and if this is important to you, we are open to discussing agile working opportunities during the hiring process.
We are currently looking for a Senior Site Reliability Engineer to join our Experian Decision Analytics team.
As a Senior Site Reliability Engineer, you will lead the team’s technical vision, bridging the gap across platforms, infrastructure, automation and software. You will review and design non-functional requirements, prioritise key areas of operational architecture and guide both operational staff and software feature engineers on SRE best practice.
- Uptime of Experian One – Experian’s Cloud SaaS offering for Decision Analytics
- Enhancement and automation of the Monitoring and Alerting of our platform
- Responding to incidents and restoring services, but also identifying issues before they happen
- Gaining a strong understanding of the systems to efficiently triage issues and find owners for problem resolution
- Incident management; able to co-ordinate others and be co-ordinated during service disruptions with a focus on restoring availability
- Reviewing systems designs and implementations to identify resiliency, scalability and monitoring issues prior to implementation
- Strong relationships with other members of the SRE team, primarily based in Kuala Lumpur but also in London, Arizona and Sofia
- Working relationships with colleagues in other departments and with third parties who support backing applications
- Collaborative relationships with developers, security and architects to influence them to build resilient, maintainable solutions
- 5+ years of experience in supporting complex, highly scaled systems in production
- Proven experience with networking, troubleshooting and monitoring
- Experience with incident management and coordination
- Ability to identify an issue or a manual process and ensure that it never occurs again
- Ability to write complex queries using various tools
- Ability to identify high level root cause from symptoms, e.g. Networks, Application, Compute, Storage
- Strong knowledge of Kubernetes, Infrastructure as Code and high availability principles
- Linux knowledge, experience troubleshooting and predicting issues in advance
- Knowledge of cloud-native application design for high performance, scalability and resilience
- Knowledge of OpenShift, Splunk, Dynatrace, ThousandEyes, ServiceNow, Jira, Jenkins and Python
- Knowledge of Java, Cassandra, Redis, Rundeck, MongoDB, Apigee, Okta, PostgreSQL, AWS, Azure and GCP
- Knowledge of GitOps
- Line management, coaching or mentoring experience
- Curious, willing and able to learn new technologies and practices
- Cloud aware, you understand how cloud technologies differ from other technical approaches and are able to explain these to others
- Lives and breathes availability and operational excellence in technology
- You will be expected to work some weekends and to meet some on-call requirements
- Excellent communication skills in English
Is this you?
- You strive to remove repetitive tasks from your daily existence
- You are a keen follower of technology trends
- You believe that software is to be used, not to be admired
- You solve for the future as well as the immediate
- You empower others to deliver
- You develop trust, make conflict constructive, create commitment, and drive accountability and results
- You are articulate, clear, concise, and you can tailor your approach to the audience
- You can manage stakeholders at all levels and influence decision making
- Personal Development – career pathway for professional growth supported by learning and development programs and unlimited access to online educational training courses, learning materials & books
- Work environment – excellent working conditions in a friendly environment, with a recognized strong team spirit and fun, quality recreation time
- Social benefit package – life insurance, food vouchers, additional health insurance, corporate discounts, Multisport card and a share options scheme
- Work-life balance – 25 days paid vacation and 3 additional paid days for participation in social responsibility events
- Opportunity for flexible working hours and telecommuting
To stay safe and act responsibly, we have introduced a remote hiring process with online interviews for all candidates.