How we started DevOps in #komercka
Komerční Banka is a large Czech bank with 1.6M retail clients and many corporate clients. In 2016, Komerčka was a typical corporation with heavy IT processes and business separated from IT. If you wanted to change something, you needed a project. Then Jan Juchelka came and started an #agile transformation. And this is how it began…
This article is about our DevOps transformation, but first and foremost it is a big thank-you to the many clever people without whom our DevOps journey would have been impossible. I really love working with you, and I’m looking forward to making our IT world friendlier, more joyful, faster and more resilient for our customers.
My name is Jindřich Kubát, and I’ve been working at KB for two years as Head of Development. I’m also the main DevOps leader and the key budget owner of the DevOps transformation. This article describes our journey so far and some of the results we have already achieved.
We do DevOps. Not BizDevOps, DevSecOps or any other combination of these words. We think of DevOps as including all departments: Business, Security, Architecture, QA, Operations, etc. DevOps is just a word for our culture of better cooperation and shared goals.
Our DevOps Goals
Goals are important. Everyone should follow them and constantly check whether their daily work really pursues them. Never underestimate the power of goals that lead to your vision. Without goals, everyone does their job as well as they can, but one day you realize that people are sometimes fighting against each other because they have different goals. That’s why big corporations are so ineffective: it’s difficult to manage goals for all of their people. It’s also why I personally love SMART goals and use them as personal goals for my managers, and why I love OKRs, which we use at KB at the company level.
For our DevOps transformation, we chose the goals from DORA (DevOps Research and Assessment). Our mission is to constantly explain the goals and why they are important. That was the first problem in the company, because different departments had different goals. We also have to constantly explain which kinds of work follow the goals and which do not.
Lead time for changes
We think about this goal as end-to-end lead time for changes, even though it is more difficult to measure. It represents our ability to change fast and fail fast. That meant we had to heavily optimize our Software Delivery Life Cycle (SDLC). It would have been impossible without a “cloud-native environment”. Thankfully, we already had our internal cloud based on OpenStack and Kubernetes. The first big change that hit our culture was a completely redesigned SDLC where, instead of 10–15 different environments for every single NEW application, we have only four fixed environments: DEV -> QA -> STAGE -> PROD. Previously, every app used many environments for development, testing, acceptance, performance tests, training, etc. Maintaining all of them was super expensive, which led to the unavailability of non-prod environments, so developers had to use mocks instead of working integrations. We’ve changed that! QA is the first environment where the team can check stability, API contracts and that the app does what it should do. STAGE is a super-stable pre-production environment where everyone can test everything: stress tests, disaster recovery, penetration tests, integration tests and many others. The team guarantees that their application works there for most of the day. Our ambition is to release to PROD constantly, almost daily.
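The fixed path DEV -> QA -> STAGE -> PROD can be thought of as a strict promotion order. The sketch below is only an illustration of that idea, assuming a build may move exactly one step forward at a time; the function names `next_environment` and `can_promote` are hypothetical and not part of KB’s tooling.

```python
from typing import Optional

# The four fixed environments, in promotion order (from the article).
ENVIRONMENTS = ("DEV", "QA", "STAGE", "PROD")


def next_environment(current: str) -> Optional[str]:
    """Return the environment a build may be promoted to next, or None from PROD.

    Raises ValueError for an environment that is not on the fixed path.
    """
    idx = ENVIRONMENTS.index(current)
    return ENVIRONMENTS[idx + 1] if idx + 1 < len(ENVIRONMENTS) else None


def can_promote(source: str, target: str) -> bool:
    """Hypothetical rule: allow promotion only one step forward along the path."""
    return next_environment(source) == target


print(can_promote("QA", "STAGE"))  # a normal step forward
print(can_promote("DEV", "PROD"))  # skipping QA and STAGE is rejected
```

Keeping the path as a single ordered tuple makes it hard for a new application to quietly add an extra environment, which mirrors the point of fixing the number of environments at four.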
Our defined target for this goal is: decrease the E2E lead time for changes from 100 to 20 days by Q4/2025.
Goal comment: We measure lead time from the moment an idea is added to JIRA to the moment it is delivered and enabled in production. The measurement is done retrospectively for all ideas that were delivered.
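As a rough sketch of that measurement, assume each idea exported from JIRA carries a creation timestamp and the timestamp at which it was enabled in production. The field names and sample records below are made up for illustration, and the choice of the median as the summary statistic is an assumption; the article does not say which statistic is reported.

```python
from datetime import datetime
from statistics import median

# Hypothetical export: when each idea was created in JIRA and when it was
# enabled in production (None = not delivered yet, so it is skipped).
ideas = [
    {"key": "IDEA-1", "created": datetime(2021, 1, 4), "enabled_in_prod": datetime(2021, 3, 5)},
    {"key": "IDEA-2", "created": datetime(2021, 2, 1), "enabled_in_prod": datetime(2021, 3, 3)},
    {"key": "IDEA-3", "created": datetime(2021, 3, 1), "enabled_in_prod": None},
]


def lead_times_days(records):
    """E2E lead time in whole days, computed backwards for delivered ideas only."""
    return [
        (r["enabled_in_prod"] - r["created"]).days
        for r in records
        if r["enabled_in_prod"] is not None
    ]


times = lead_times_days(ideas)
print(times)          # [60, 30]
print(median(times))  # 45.0
```

Only delivered ideas are counted, which is what measuring "backwards" implies: an idea enters the statistic once it reaches production.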
We used to have three major releases per year. Yes, 3! All company processes, such as training at the branches, testing and release campaigns, were designed around the big integration release, in which we released about 15 major monolithic applications and some related ones. In the past, the company did a big analysis of what would happen if we did 4! It failed. Many people didn’t want change, and some voted for only two releases per year. Damn! I think many people thought I was a lunatic when I came up with the idea of doing 11 releases per year and implementing release trains for the current applications. It took almost a year, but we did it! With the help of many people, we already made our first shorter release in 2021, and I’m looking forward to the next one. This wouldn’t have been possible without our work on test automation, non-prod environment stabilization, and giving the development teams end-to-end responsibility for their applications in non-prod. It’s important to mention that release trains are designed for the current big monolithic applications. For new applications, we’ve prepared a new SDLC with the ambition of daily releases.
Our defined target for this goal is: increase the number of releases from 8 per team per year to 120 per team in 2025.
Goal comment: We consider a release to be anything that can affect production stability. All changes should be delivered via a CI/CD pipeline: business features, bug fixes, data changes and configuration changes. All of these changes count toward the total number of releases, and each release has to be properly categorized.
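A minimal sketch of counting categorized releases, assuming the CI/CD pipeline logs one record per production release with its team, year and category. The four categories come from the text above; the team names and log entries are hypothetical.

```python
from collections import Counter
from enum import Enum


class ReleaseType(Enum):
    BUSINESS_FEATURE = "business feature"
    BUG_FIX = "bug fix"
    DATA_CHANGE = "data change"
    CONFIGURATION_CHANGE = "configuration change"


# Hypothetical pipeline log: (team, year, release type) per production release.
releases = [
    ("payments", 2021, ReleaseType.BUSINESS_FEATURE),
    ("payments", 2021, ReleaseType.CONFIGURATION_CHANGE),
    ("payments", 2021, ReleaseType.BUG_FIX),
    ("cards", 2021, ReleaseType.DATA_CHANGE),
]


def releases_per_team(log, year):
    """Every categorized release counts toward the team's total for the year."""
    return Counter(team for team, y, _ in log if y == year)


def releases_by_type(log, team, year):
    """Breakdown of one team's releases by category."""
    return Counter(kind for t, y, kind in log if t == team and y == year)


print(releases_per_team(releases, 2021))
print(releases_by_type(releases, "payments", 2021))
```

Counting data and configuration changes as releases is what lets the 120-per-team target be reached without every release being a business feature.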
Mean time to recovery
Availability depends heavily on Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR). We used to invest a lot of money into MTTF. Our Operations department was responsible for releases, maintenance, problem investigation, incident management, etc. That created a silo between DEV and Operations. Now we focus instead on team autonomy, deployment automation and post-mortem analysis. We improved our cooperation on problem investigation and problem-solving. As a result, we improved our availability in production, but more importantly, the availability of our applications in non-production, which was a key enabler for speeding up our delivery and productivity.
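The link between MTTF, MTTR and availability can be made concrete with the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The sketch below applies it with an illustrative MTTF of 720 hours (one failure per 30 days), which is an assumption rather than a KB figure; the 7-hour and 1-hour recovery times are the targets from this section.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Illustrative MTTF of 720 h; MTTR values taken from the 7 h -> 1 h target.
before = availability(720, 7)
after = availability(720, 1)
print(f"{before:.4%} -> {after:.4%}")
```

With the same failure frequency, cutting recovery time alone moves availability from roughly 99.04 % to roughly 99.86 %, which is why shifting investment from MTTF toward MTTR can pay off.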
We also started investing in our observability capability. Instead of CPU, memory, I/O waits and the like, we now watch the golden signals: errors, latency, total traffic and saturation. We’re still at the beginning of our observability journey, but teams now know that monitoring is important for their job and that without monitoring, we’re blind. What you can’t measure, you can’t control.
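A minimal sketch of computing the four signals for one time window, assuming we have per-request latency and HTTP status plus a current in-flight count and a capacity limit. Treating only 5xx responses as errors, using the 95th percentile for latency and defining saturation as in-flight requests over capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Errors, latency, traffic and saturation for one window (hypothetical inputs)."""
    if not requests:
        raise ValueError("no requests in window")
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    latencies = sorted(r.latency_ms for r in requests)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies, n=20)[-1] if len(latencies) >= 2 else latencies[0]
    return {
        "error_rate": errors / len(requests),
        "latency_p95_ms": p95,
        "traffic_rps": len(requests) / window_seconds,
        "saturation": in_flight / capacity,
    }


window = [Request(120, 200), Request(80, 200), Request(300, 503), Request(95, 200)]
print(golden_signals(window, window_seconds=2, in_flight=3, capacity=10))
```

Unlike raw CPU or memory figures, each of these values describes what a client of the service actually experiences.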
Our defined target for this goal is: decrease time to restore service from 7 hours to 1 hour in 2025.
Goal comment: We measure the time between the delivery of a business feature and the delivery of a potential bug fix for it. If a bug fix had to be delivered, then some problem needed to be fixed.
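One way to read this measurement as code, assuming a chronological release log per application in which each entry is tagged either as a feature or a bug fix. Each bug-fix release is measured back to the most recent preceding feature release, and the mean of those gaps is taken as MTTR; both the pairing rule and the sample log are assumptions for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical release log for one application.
log = [
    (datetime(2021, 6, 1, 9, 0), "feature"),
    (datetime(2021, 6, 1, 13, 0), "bugfix"),
    (datetime(2021, 6, 8, 9, 0), "feature"),
    (datetime(2021, 6, 8, 11, 0), "bugfix"),
]


def restore_times_hours(entries):
    """Hours from each bug-fix release back to the latest preceding feature release."""
    last_feature = None
    result = []
    for when, kind in sorted(entries):
        if kind == "feature":
            last_feature = when
        elif kind == "bugfix" and last_feature is not None:
            result.append((when - last_feature).total_seconds() / 3600)
    return result


gaps = restore_times_hours(log)
print(gaps)        # [4.0, 2.0]
print(mean(gaps))  # 3.0
```

If several bug fixes follow the same feature, each one is measured from that feature, so a long tail of fixes pushes the average up, which arguably reflects a longer recovery.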
Change Failure Rate
The productivity of IT work depends on when we realize we’ve made a mistake. When a bug reaches production, it’s late and usually expensive: you have to stop your current work and start fixing the bug. That’s why we began to develop CI/CD pipelines and run tests during the deployment phase. CI/CD tools like Jenkins, in combination with other tools like SonarQube, have helped us invest in automated tests. That was (and still is) a necessary investment in our SDLC. Now, for many applications, automated tests are part of their delivery: they can’t release without automated tests. It’s part of our new “way of working”.
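The rule "no release without automated tests" can be pictured as a simple gate at the end of the pipeline. This is a hypothetical sketch, not Jenkins or SonarQube code: `PipelineResult` and `may_release` are invented names, and `quality_gate_passed` stands in for a pass/fail quality-gate status of the kind a static-analysis tool reports.

```python
from dataclasses import dataclass


@dataclass
class PipelineResult:
    tests_run: int
    tests_failed: int
    quality_gate_passed: bool  # hypothetical: pass/fail from static analysis


def may_release(result: PipelineResult) -> bool:
    """Block the release unless tests actually ran, all passed, and the quality gate passed."""
    return (
        result.tests_run > 0
        and result.tests_failed == 0
        and result.quality_gate_passed
    )


print(may_release(PipelineResult(tests_run=120, tests_failed=0, quality_gate_passed=True)))
print(may_release(PipelineResult(tests_run=0, tests_failed=0, quality_gate_passed=True)))
```

Requiring `tests_run > 0` matters: a pipeline with no tests would otherwise pass trivially, which is exactly what the "can’t release without automated tests" rule is meant to prevent.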
Our defined target for this goal is: increase the share of apps with a change failure rate under 10 % from 67 % to 95 % in 2025.
Goal comment: We measure the total number of releases, the number of business feature releases and the number of bug fix releases.
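A sketch of how those counts could feed the target, assuming the change failure rate of an app is its number of bug-fix releases divided by its total number of releases (dividing by feature releases instead would be an equally plausible reading). The app names and counts are hypothetical.

```python
def change_failure_rate(total_releases: int, bugfix_releases: int) -> float:
    """Assumed definition: share of an app's releases that were bug fixes."""
    if total_releases == 0:
        return 0.0
    return bugfix_releases / total_releases


def share_of_apps_under(threshold: float, apps: dict) -> float:
    """Fraction of apps whose change failure rate is strictly below the threshold."""
    rates = [change_failure_rate(total, fixes) for total, fixes in apps.values()]
    return sum(rate < threshold for rate in rates) / len(rates)


# Hypothetical counts per app: (total releases, bug-fix releases).
apps = {"app-a": (40, 2), "app-b": (25, 5), "app-c": (60, 3)}
print(share_of_apps_under(0.10, apps))  # 2 of 3 apps are under 10 %
```

The goal is framed as a share of apps rather than one company-wide rate, so a few very active applications cannot hide many struggling ones.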
Of course, I haven’t mentioned a lot of other work that contributes to the main goals, such as DevOps governance, measuring our DevOps journey, and work with other IT departments and business people. Our key vision is still to reliably deliver sustainable products instead of projects, and we believe that the only way to achieve long-term production availability alongside a short time to market is through Lean practices, the Theory of Constraints, continuous improvement, and our ability to change fast and fail fast.
Sometimes it isn’t easy to see the results of our work when you are working on it every day. That’s why it’s important to look back now and then and see everything we have achieved together. Thank you again, people!