Sergey Podgornyy – ottonova.tech

by Sergey Podgornyy - 2023-11-13

How We Transformed Our Daily Meetings for the Better

In the world of teamwork, our daily stand-ups play a crucial role in our collective success. We wanted to make things more efficient, and that’s when we found something awesome in the videos from “Development That Pays“: Walking the Board.

The Problem

We’ve all been there: the endless cycle of daily stand-ups where the team updates on what they did yesterday, what they’re working on today, and any blockers they’re facing.

While it seems like a straightforward approach, it has its drawbacks. The individual spotlight often overshadows the collaborative nature of the team event, with team members more focused on crafting their own stories than actively listening to others. This realization prompted us to seek a more effective alternative.

The Discovery

While browsing YouTube, I discovered “Development That Pays”. It proved to be a goldmine of helpful videos. Two videos, in particular, caught my attention: Daily Stand-up: You’re Doing It Wrong! and Agile Daily Standup – How To Walk the Board (aka Walk the Wall). These videos challenged our traditional approach and introduced an alternative — Walking the Board.

The “Walking the Board” Concept

Walking the Board, also known as Walking the Wall, offered a refreshing perspective. Instead of individual updates, the team starts with the ticket on the top right of the board. The team member assigned to that ticket provides an update, and we move on to the next ticket. This method shifts the focus from individuals to the work itself and changes the stand-up from a chain of individual updates to a team event.

This approach transforms the stand-up into a collaborative journey across the board, ensuring that the spotlight remains on the work itself, not on individual narratives. No more struggling to come up with a good story, the cards on the board provide the agenda. It’s a game-changer that keeps everyone focused on the work. It’s a shift from “What did I do?” to “What is the status of the work on the board?“

How To Walk the Board

Starting at the top right of the board makes financial sense — the items closest to being live are discussed first. If you’re familiar with the concept of net present value, you’ll understand that income now is more valuable than income later, and income tomorrow is more valuable than income next week.

The second reason for starting at the right is purely practical: we are going to move the cards across the board from left to right. By starting at the right, we create space for cards to move into.

Also, the “Development That Pays” video emphasized the importance of moments of glory — allowing team members to move their own cards and take pride in their progress. It’s not just about updating the board; it’s about actively participating in the collective journey.

At the end of the meeting, the board and the team’s understanding of its current “shape are up to date. Team members can also share non-board related topics or updates, fo example tasks not listed on the board.

Success Story: Implementing Walking the Board

Inspired by these videos, we decided to give “Walking the Board” a shot. The transformation was remarkable! Our daily stand-ups became more than just updates — they turned into collaborative sessions centered around the work on the board.

ELMO Rule

To keep discussions concise, we introduced the “ELMO” rule. ELMO stands for Enough, Let’s Move On. “ELMO” is a word that the guide and travelers may use to indicate a conversation is either off-topic or takes too long. If a discussion is going off track, anyone can simply say “ELMO”. This signals that we’ll discuss it later, outside our daily stand-up.

Secret Order

We established a specific order for leading the board and other meetings. This system not only enhances a sense of responsibility but also encourages shared leadership among the team.

The Result and Summary

Since adopting Walking the Board in the summer of 2020, our meetings have changed a lot. We have shifted away from giving individual updates. Instead, our focus is entirely on the work on the board. This change has made our stand-ups more productive and collaborative, as we’re now centered on the tasks at hand rather than individual narratives.

We switched from traditional stand-ups to “Walking the Board” because we wanted our meetings to be more efficient. The videos from “Development That Pays” played a key role in inspiring this change, showing us what wasn’t working with the old way and the benefits of a more team-focused approach. Now, Walking the Board is a regular part of our daily routine, making our meetings more focused, productive, and collaborative. If you’re looking to improve your stand-ups, the insights from “Development That Pays” are definitely worth exploring.

by Sergey Podgornyy - 2023-10-132023-11-13

Migrating from Self-Managed RabbitMQ to Cloud-Native AWS Amazon MQ: A Technical Odyssey

In the ever-evolving world of cloud-native solutions, it can be a daunting task to maintain message brokers. For a while, our team was responsible for a self-managed RabbitMQ instance. While this worked well initially, we encountered challenges in terms of maintenance, version updates, and data recovery. This led us to explore Amazon MQ, a fully managed message broker service offered by AWS.

In this article, we’ll discuss the advantages of both self-managed RabbitMQ and Amazon MQ, the reasons behind our migration, and the hurdles we faced during the transition. Our journey offers insights for other developers, who consider a similar migration path.

The Self-Managed RabbitMQ Era

Our experience with self-managed RabbitMQ was characterized by control, high availability, and the responsibility to ensure data integrity. Here are some of the advantages of this approach:

Total Control
Running your own RabbitMQ server gives you complete control over configuration, security, and updates. You can fine-tune the setup to meet your specific requirements: ideal for organizations with complex or unique messaging needs.
High Availability
It’s worth noting that our entity was running on AWS EC2, whose SLA guarantees only 99.99%, but de-facto we achieved a remarkable uptime rate of 99.999% with our self-managed RabbitMQ setup. The downtime was almost non-existent, which ensured a reliable message flow through our system. High availability is crucial for many mission-critical applications.
Data Recovery
Ironically, data recovery was a challenge with our self-managed RabbitMQ. In the event of a crash, we lacked confidence in our ability to restore data fully. This vulnerability urged us to consider Amazon MQ, a fully managed solution.

The Shift to Amazon MQ

As time passed, it became apparent that managing RabbitMQ was no longer sustainable for our team. Here are the primary reasons that drove us to explore Amazon MQ as an alternative:

Skills Gap
Our team lacked in-house experts dedicated to managing RabbitMQ, which posed a risk to our operations. As RabbitMQ versions evolved, staying up-to-date became increasingly challenging. This skill gap urged us to consider Amazon MQ, a fully managed solution.
AWS Integration
As an AWS service, Amazon MQ seamlessly integrated with our existing AWS infrastructure, providing us with a more cohesive and consistent cloud environment. It allowed us to leverage existing AWS services and tools, which resulted in a smooth migration process.
Managed Service
The promise of offloading the operational burden to AWS was enticing. Amazon MQ handles tasks like patching, maintenance, and scaling. This allows our team to focus on more strategic initiatives.
Enhanced Security
One key advantage of switching to AmazonMQ is its strong foundation on AWS infrastructure. This not only ensures robust security practices but also regular updates are integrated into the system. So, it gives us confidence, as we know that any potential vulnerabilities are under active monitoring and management.

The Amazon MQ Experience

While the move to Amazon MQ presented numerous benefits, we also encountered some challenges that are worth noting:

SLA Guarantees
Amazon MQ’s service level agreement (SLA) guarantees 99.9% availability. This is generally acceptable for many businesses but was a step down from our self-managed RabbitMQ’s 99.999% uptime. While the difference might seem small, it translated into more downtime. A trade-off we had to accept.
Limited Configuration
Amazon MQ abstracts many configuration details. This simplifies management for the most users. However, this simplicity comes at the cost of fine-grained control. For organizations with highly specialized requirements, this might be a drawback.
Cost Considerations
Amazon MQ is a managed service, which means there are associated costs. While the managed service helps reduce operational overhead, it’s crucial to factor in the cost implications when migrating.

What do three nines (99.9) really mean?

Here are my calculations according to Amazon MQ SLA:

if the monthly downtime is lower than ~43 minutes, they will charge 100% of the costs
if the monthly downtime is between ~43 minutes to ~7 hours, they will charge 90% of the costs of this downtime
if the monthly downtime is between ~7 hours to ~1day, they will charge 75% of the costs of this downtime
and if the monthly downtime is higher than ~1 day, they won’t charge any costs for this downtime

Conclusion

Our migration from self-managed RabbitMQ to Amazon MQ represented a shift in the way we approach message brokers. While Amazon MQ offered many benefits, such as reduced operational burden and seamless AWS integration, it came with some trade-offs, including a lower SLA guarantee and less granular control.

Ultimately, the decision to migrate should be based on your organization’s specific needs, resources, and objectives. For us, the trade-offs were acceptable given the advantages of a managed service within our AWS ecosystem.

The path to a cloud-native solution isn’t always straightforward, but it can lead to more streamlined operations and a greater focus on innovation rather than infrastructure management. Understanding the pros and cons of both approaches is vital for an informed decision about your messaging infrastructure.

As technology continues to evolve, it’s essential to stay adaptable and leverage the right tools and services to meet your business needs. In our case, the migration to Amazon MQ allowed us to do just that.

by Sergey Podgornyy - 2020-12-052022-05-03

How and why we updated RabbitMQ queues on production

In this article, I would like to share with you and the whole internet our experience of dealing with RabbitMQ Live updates. You will learn some details about our architecture and use cases. Let’s start from the simplest… Why do we need RabbitMQ in our business?

Backend with synchronous tasks processing

Our Architecture

As a health insurance company, our business depends on many different third-party services to analyze risks, process claimable documents, charge monthly payments etc. All these processes take some time to be processed, so to keep our services fast and autonomous from each other, we are using asynchronous processing of tasks that can be done in the background. This approach speeds up responses and allows to do more in the background, ie. email sending, policy creation, acceptance verification etc.

Whenever a client expresses some intent to the API by making a request to it, this intent can create follow-up tasks. These tasks do not need to be handled synchronously, i.e. they do not need to be handled while processing the initial request. Instead, we put a message about this intent onto the message queue where it can be picked up asynchronously by another process and handled independently from the original request.

Problem

But with great opportunities comes great responsibility. Message processing is very important and critical for our business. Some messages could expire without being consumed or inconsistent with queue restricted arguments. In theory, this should not happen or might happen in a very rare case. But as we are working with customers data, we do not want to lose important messages. To keep dead messages saved in the message broker and do not stuck them in the original queue, we are using dead-letter feature.

Messages are published to exchange and can be sent to multiple queues depending on the routing key. As you can see from the image above, we used the same dead-letter scheme as for the original queues, so dead messages may end up in the wrong dead-letter queues. It is not very critical if you pick up dead messages manually (considering that they are rare), but nevertheless, it is still strange to find these messages in the wrong place.

To solve this problem, we need to add a new argument to the properties of the queues, it is x-dead-letter-routing-key and it should be unique. As a unique value for the routing key, we can use the queue name itself. This idea brought our team one step closer to a good solution: we don’t need a dead-letter exchange anymore 🎉. To simplify it, we can use default nameless exchange "" with the dead-letter queue as the routing key and it will forward the message directly to the proper queue.

Dead-letter implementation with proper routing

Unfortunately, doing everything is not as easy as writing or talking about it 😒. To maintain the consistency and stability of the message broker, the RabbitMQ does not allow changing the arguments of already existing queues.

Deployment preparation

So, RabbitMQ does not allow you to change queue arguments in the runtime, so the only possible way to do it by removing queues and re-creating them again with updated arguments. But it is not possible in production, as we might lose some messages when they already removed, but new ones still do not exist. To solve this problem we need to introduce temporary queues to handle these messages, while old queues will be removed. For a simple system, this will be possible with 4 releases:

Create temporary queues, but do not handle messages from them for now.
Switch to the new queues and remove old queues. At this step, we already have a properly configured queues, but names are different. To return to old names, we need to do the same steps again.
Create new queues with old names, but with updated arguments. Do not consume messages from them for now.
Switch to the new queues with updated arguments.

4 releases, not a few, right? This requires not only a lot of small work, but also attention to make sure everything went right every time. How can we reduce them? 🤔

The simplest thing we can do is agree to rename the queues. This will reduce the number of releases by 2 times, since we will not need to rename them back. This was acceptable to us, and we even got more of it as we improved the message handling process. But that’s a completely different story 😉.

What else can you do? Enabling consumers and message handling in the new queues right away will reduce release count to only one, but we should accept the risk of duplicated messages when new queues already created but old ones are still processing.

At this point, I was stopped by the teammate, because I did not take into account the process of our deployment. We have blue-green deployment process, it’s when you have multiple instances of the same thing. And when you deploy, you take one down, upgrade, then put it up, then take the other one down to upgrade. This guarantees there is something always up. In our case, this means there is always a consumer there.

So, messages can definitely be duplicated if deployed during business hours. Deployment takes several minutes, which means that both old and new queues will be active for several minutes.

Time to analyze and decide whether it is safe to deploy the application at night (and do we really want to do it 🙂) when the message flow is low, or it is worth implementing a third-party service like a Redis to check if the message has already been processed by some consumer, old or new.

Release

The easiest way to check the load on our message broker is to check the number of logs by day of the week and time. Since we are a highly focused company working only in Germany, we have a very low message load from late evening to early morning.

It is not such a big highload as it could be, so we can accept the risk that some messages may be duplicated, but even if this happens, their number will be extremely small and we can manually solve them. This will save the resources and time that would be required for two releases.

After trying to release after midnight we found out that we couldn’t do it at night. Some of our third-party services are not available, so the container simply cannot be booted. Well, it was worth trying once, now we know it for sure. Nighttime for sleeping 😴.

But we can still do it late in the evening or early in the morning. One has only to pay attention to the RabbitMQ load.

Late in the evening:

Early in the morning:

We made the decision to press the release button early in the morning after a good night’s sleep. This time everything went fine and there were no duplicates.

It was not an easy way to solve this problem, but it was worth it. Solving this problem, our team and I learned a lot of interesting things about message consuming and deployment processes. Now it is even better than before, with correct queue settings and decoupled message handling 😎.

TL;DR

RabbitMQ does not allow to rename queues or change queue arguments;
to change something in the queue, you have to remove it and re-create;
to re-create it safe, you need to use temporary queues;
stable system could be run under multiple instances, so be aware of duplicated messages between old queues and new queues;
if your business is tied to one timezone and is not high loaded at night, it is acceptable to have duplicated messages instead of over-engineering your consumers.

by Sergey Podgornyy - 2020-08-102022-05-03

Attending GopherCon online

2020 will be remembered for a very long time by the quarantine and the accompanying restrictions. All events where there is a crowd of people have been cancelled and we are trying to adhere to all recommendations. It would seem that this year’s conference would be impossible. But tough times await new solutions, and now conferences are also moving online.

This innovative solution has its pros and cons. What I liked was:

ability to communicate with the speaker;
switch channels just in one click;
talks were recorded in advance, so speakers could answer questions in runtime;
talk to anyone you want in chat;
sitting in my favourite chair with two monitors;
slides or speaker monitor very clearly visible (people with poor eyesight will understand me 😉);
if you get bored, you can go about your business (conferences in the post-Soviet area are held on weekends, so you need to spend your personal time);
waking up late and no queues to get a pass.

Nevertheless, in addition to the pros, there were also disadvantages:

affiliate ads sound more intrusive and more like spam;
the platform they used for sharing had a few technical issues, so I met a lot of freezes;
only the winners of contests and quizzes can receive partner merchandise.

What about the conference

The conference is divided into two days. The first day was devoted to workshops, and the second day there were 2 tracks for talks. The overall level of the talks was quite high and I personally really liked it.

Workshops were held exclusively in Russian, so the audience was very limited (about 140 participants). But the talks were both in Russian and in English and were very reasonably distributed among the tracks (approximately the number of listeners on the stream was 150 and 80 per track).

Workshops

Observability in practice by Elena Grahovac

Quite an interesting and practical workshop, in which she showed by a practical example of how to log useful information using a uber-go/zap logger, tracing of application flow execution and gathering metrics using opentelemetry, visualization and analysis of the obtained data using jaeger.

The codebase available on GitHub, just use tags in this order clean, logger, tracer, meter and tools to follow the process:

rumyantseva / stayathome

TLA+/TLC: a practical tool for formal verification of algorithms that all gophers need to know for sure by Alexey Naidyonov

Despite the title, I personally think that this topic is important, but not so much that everyone should know it. It would be nice to know – yes, it can help you with your architecture planning, but for need – no, I don’t think so.

growler / gophercon-russia-2020-talk

TLA+ is a tool to design systems and algorithms, then programmatically verify that those systems don’t have critical bugs. It’s the software equivalent of a blueprint.

If you are interested to learn more, here are a few links for you:

The TLA+ Video Course by Leslie Lamport, author of TLA+ and PlusCal specification language
Learn TLA+
TLA+ Examples

If you are interested in a deeper study, then “Specifying Systems” and “Practical TLA+” books will serve as the best continuation for you.

Talks

Continuous profiling for Go applications by Mike Kabischev

Nice talk, started with an overview of profile types and basics profiling with runtime/pprof. Then several continuous profiling packages were compared, such as github.com/conprof/conprof and github.com/profefe/profefe.

Profiling is a part of observability, that’s why pprof should be always available, but net/http/pprof should be accessible in the different port.

Running net/http/pprof on the different port

As a follow-up you can also read Continuous Profiling of Go programs | by Jaana Dogan | Google Cloud – Community | MediumJaana Dogan ・ May 12, 2020 ・ 4 min read Medium

eBPF: Modern Introspection Capabilities in Linux by Marko Kevac

BPF is kernel-level profiling in Linux. It allows you to monitor what happens in the system, as Linux is an event-driven system and you can analyse these events with BPF program. The newer the version of your kernel, the more BPF features you can use. However, BPF is not fully adapted with Go, namely BPF program written in Go cannot work with the kernel part. The most commonly used package is iovisor/gobpf, but there are other alternatives like github.com/dropbox/goebpf and github.com/cilium/ebpf.

If you are interested and would like to know more, then it is best to read “BPF Performance Tools” and “Linux Observability with BPF” books:

Codegenerator in Go by Dmitriy Smotrov

Personally, I am too conservative for decisions such as code generation, as I prefer to do everything myself. Nevertheless, such solutions can speed up work on routine things, for example, describing a repository for a model, or writing tests for this model. In addition, it is important to note that Go has good functionality for such solutions.

Source code is available on GitHub

dsxack / gophercon2020

GoLand Tips & Tricks by Florin Patan

If you are using GoLand as an IDE for writing code, then the examples shown during the talk can be very useful for you.

Code samples can be found in the GitHub

dlsniper / golandtipsandtricks

Debugging concurrent programs in Go by Andrii Soldatenko

The talk was built on the use of the console version of the delve (dlv). Of course, GoLand will solve it for you as its debugger also uses devle, same as VSCode, but not everything from delve release will immediately appear in your IDE. So if you want to have a better and custom debugger, it is good to know how dlv works.

go-delve / delve

Slides can be found in the Dropbox

Go, please: language server under the microscope by Ilya Danilkin

A Language Server is meant to provide the language-specific smarts and communicate with development tools over a protocol that enables inter-process communication. The idea behind the Language Server Protocol (LSP) is to standardize the protocol for how such servers and development tools communicate. This way, a single Language Server can be re-used in multiple development tools, which in turn can support multiple languages with minimal effort.

In the past, there were many LSP implementations in Go, but over time, the Go core team developed the official LSP implementation gopls that we know today.

Slides can be found in slides.com

How to stop thinking about required fields and start writing contracts by Vladimir Serdyukov

The talk tells about the Buffer Protocol mechanism, invented by Google for serializing data structures. The speaker talked about the differences between proto2 and proto3, as well as how to use required fields in proto3. For validation, you can use either buf.build or github.com/uber/prototool.

golang / protobuf

In new projects and for better compatibility it is recommended to use proto3. apiv2 can and should be used, but prototool does not support it. buf.build looks promising, but plugins such as gogoproto lose their relevance.

Intro to AI for software engineers using go-learn by Miriah Peterson

GoLearn is an accessible ML library written primarily in Go with some C and C++. It uses with simple classification problems.

Checkout the examples

sjwhitworth / golearn

To learn more, go through the tutorials at

ardanlabs / training-ai

and

dwhitena / gc-ml

Growth of the open-source community: problems and solutions by Georgy Rylov

The speaker told how he organized a special course at the university and involved students in writing their project.

wal-g / wal-g

As a result, he summed up that students can write productive code in Go and it takes comparable time to review it as for regular developer. It is not necessary to have a curriculum in order to come to the university with your projects.

Generic Programming in Go by Vladimir Vivien, “Learning Go Programming” book author

The possibility of adding generics to Go is currently being developed. Preliminary, they should be expected no earlier than 2 years later.

Go core team assumes a level of performance in runtime, as generics should come with faster execution time. Nevertheless, compiler time may increase, but the Go core team are doing everything to keep compilation fast. Use of generics can be also complicated and the code with them may look unusual. Here is an example of using type parameters in functions:

fmt.Print(F(int)(param int))

The proposal can be found here:

vladimirvivien / go-generics-proposal

Examples using Go2 generics

Conclusion

I was pleased with the time spent listening to talks and workshops. In addition to the information from the official part, in the communication channels, I have gathered for myself several technologies that are worth paying attention to.

uber-go/zap logger might be a good alternative to the sirupsen/logrus which we are currently using at ottonova. Although it is simpler to implement and use, nevertheless its execution speed is several times lower than that of zap.
FluentD is an interesting alternative for LogStash. From a preliminary analysis of FluentD, it appears to be less resource-intensive and more flexible.
Observability is popular and demanded thing, and most of the conference was dedicated to it.

by Sergey Podgornyy - 2019-12-162022-05-03

Bulgaria PHP Conference 2019

Let’s talk about organization, preparation and venue first. From my point of view, the organizers did a lot to make this conference great, at least they tried to do their best. The conference, same as the workshop took place in the very center of the city, in the biggest public hall. It was quite easy to find it and to get there, either with public transport or by foot if you were staying in the city center. One day in advance I got an email with quite descriptive instruction about everything I should know: how to get there, recommended places to stay, what they prepared for attendees etc.

Unfortunately, I was a bit confused, because I did not figure out how to buy a ticket for the Workshop day if you already bought a conference ticket, when the workshop stream was not announced. Directly at the entrance to the workshop, there was a possibility to buy it, but I decided that it is not worth it and it is a bit expensive. Anyway, I am not sad about this fact, as conference organizers prepared a free of charge tour in the city and it was a good alternative.

On the conference day, everything started with registration, grabbing my personal badge, general community talk and breakfast. I felt pretty comfortable there as organizers always tried to take care of us: there was a lot of drinks and snacks there, lunch was served by a special catering company and in the afternoon they made homemade cakes for us.

And now more about the conference: it had 3 streams in parallel and in the afternoon one of these streams became unConf, where anyone could share something with everyone. The biggest stream had a lot of seats for all attendees, but not every talk assembled so many participants.

You have to know about me, that I do not believe I can learn something from any talk, because most of the things are already known from programming paradigms, web development and PHP in general. Usually, talks at conferences are just a shared experience, exploring new unknown stuff or repeating something like SOLID, caching and other. Everything you want to learn could be easily and faster found on the web, and if you missed some talks you could watch them later on YouTube, moreover, for free. Personally, all these conferences are just community spirit, free baubles and lunch. But this conference managed to absolutely surprise me!

The biggest discovery for me was a talk about modern SQL from Markus Winnand. How much I did not know about SQL in general. Knowing modern relational databases, such as MySQL, PostgreSQL, Oracle DB or SQLite, does not mean you know modern SQL. The most SQL standards and features were introduced since SQL-1999 (recursion), SQL-2003 (schemaless and analytical, like median), SQL-2011 (system versioning, aka time-travelling), SQL-2016 (JSON_TABLE), etc. A lot has happened since SQL-92, SQL has evolved beyond the relational idea. If you use SQL for CRUD operations only, you are doing it wrong.

Do not use self-joins in SQL anymore! Also, avoid OFFSETs from your statements, they are a performance leak!

The saddest conclusion I made: the most popular RDBMSes made themselves compliant with modern SQL only recently, but still, there are some features not ready in all RDBMSes. But what about modern ORMs? When will they be compliant with all the features we have in modern SQL? Or is it the best solution, for now, to avoid ORMs and write custom queries?

By the way, he has a book about SQL performance explained, it is highly recommended to read it. You can find more info on his website or buy his book with stickers and mug.

The conference was worth visiting at least for the sake of this talk, and I was very pleased with the fact that I learned so many new things I can use in my applications to boost performance. Anyway, there were also a few talks worth attending:

Encoding and charset, presented by Andreas Heigl. Worth to know that encoding is not a character set and what is what. How to properly work with UTF-8 in PHP and MySQL. Be aware that utf8 in MySQL is not a real UTF-8 encoding, you have to use utf8mb4 instead for proper UTF-8.
Automated PHP Refactoring, presented by Haralan Dobrev at unConf. He shared a collection of all known tools and showed how they could be implemented together.
Hexagonal Architecture by Nicolas Carlo. It was not that much for me personally as DDD is based on this architecture, but anyway it was a very good structured talk with good examples and real-life cases.
PHP-FIG Panel to describe a stack of standards they have. Be aware that PSR-2 is deprecated right now and PSR-12 should be used instead.

Here is a list of some useful slides for you: