The ottonova Tech Radar is a list of technologies. Each entry is defined by an assessment outcome, called ring assignment. There are four rings with the following definitions:
ADOPT – Technologies we have high confidence in to serve our purpose, also at large scale. Technologies with a usage culture in our ottonova production environment, low risk and recommended to be widely used.
TRIAL – Technologies that we have seen work with success in project work to solve a real problem; first serious usage experience that confirms benefits and can uncover limitations. TRIAL technologies are slightly more risky; some engineers in our organization have walked this path and will share knowledge and experiences.
ASSESS – Technologies that are promising and have clear potential value-add for us; technologies worth investing some research and prototyping effort in to see if they have impact. ASSESS technologies have higher risks; they are often brand new and highly unproven in our organisation. You will find some engineers who know the technology and promote it, and you may even find teams that have started a prototyping effort.
HOLD – Technologies not recommended to be used for new projects. Technologies that we think are not (yet) worth (further) investing in. HOLD technologies should not be used for new projects, but usually can be continued for existing projects.
What do we use it for?
The Tech Radar is a tool to inspire and support engineering teams at ottonova to pick the best technologies for new projects. It provides a platform to share knowledge and experience in technologies, to reflect on technology decisions and continuously evolve our technology landscape.
Based on the pioneering work of ThoughtWorks, our Tech Radar sets out the changes in technologies that are interesting in software development — changes that we think our engineering teams should pay attention to and use in their projects.
When and how is the radar updated?
In general, discussions about technologies and their implementation happen everywhere across our tech departments. Once a new technology is raised, we discuss and consolidate it in our Architecture Team.
We collect these entries and once per quarter the Architecture Team rates and assigns them to the appropriate ring definition.
Disclaimer: We used Zalando’s open source code to create our Tech Radar and were heavily influenced by their implementation. Feel free to do the same to create your own version.
Part of ottonova’s Software Engineering team since the very beginning.
Always on the lookout to improve our teams and to take the next step.
Clean code and KISS evangelist.
In this article, I would like to share with you and the whole internet our experience of dealing with RabbitMQ Live updates. You will learn some details about our architecture and use cases. Let’s start from the simplest… Why do we need RabbitMQ in our business?
Our Architecture
As a health insurance company, our business depends on many different third-party services to analyze risks, process claimable documents, charge monthly payments etc. All of these processes take time, so to keep our services fast and autonomous from each other, we process tasks asynchronously whenever they can be done in the background. This approach speeds up responses and allows us to do more in the background, e.g. email sending, policy creation, acceptance verification etc.
Whenever a client expresses some intent to the API by making a request to it, this intent can create follow-up tasks. These tasks do not need to be handled synchronously, i.e. they do not need to be handled while processing the initial request. Instead, we put a message about this intent onto the message queue where it can be picked up asynchronously by another process and handled independently from the original request.
Problem
But with great opportunities comes great responsibility. Message processing is critical for our business. Some messages could expire without being consumed, or conflict with the restricting arguments of a queue. In theory, this should not happen, or only in very rare cases. But as we are working with customer data, we do not want to lose important messages. To keep dead messages in the message broker without leaving them stuck in the original queue, we use the dead-letter feature.
Messages are published to an exchange and can be sent to multiple queues depending on the routing key. As you can see from the image above, we used the same routing scheme for dead letters as for the original queues, so dead messages may end up in the wrong dead-letter queues. This is not very critical if you pick up dead messages manually (considering that they are rare), but it is still strange to find these messages in the wrong place.
To solve this problem, we need to add a new argument to the properties of the queues: x-dead-letter-routing-key, and its value should be unique. As a unique value for the routing key, we can use the queue name itself. This idea brought our team one step closer to a good solution: we don’t need a dead-letter exchange anymore 🎉. To simplify things, we can use the default nameless exchange "" with the name of the dead-letter queue as the routing key, and it will forward the message directly to the proper queue.
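As a rough illustration, the queue arguments described above could be built like this. This is a sketch, not our production code: the `.dead` suffix for dead-letter queue names and the function names are assumptions made for the example. Only the two argument keys, `x-dead-letter-exchange` and `x-dead-letter-routing-key`, come from RabbitMQ itself.

```go
package main

import "fmt"

// deadLetterQueueName derives the dead-letter queue name from the original
// queue name. The ".dead" suffix is a hypothetical naming convention.
func deadLetterQueueName(queue string) string {
	return queue + ".dead"
}

// queueArguments returns the optional arguments for declaring the given queue
// so that dead messages go through the default exchange "" straight to the
// queue's own dead-letter queue.
func queueArguments(queue string) map[string]interface{} {
	return map[string]interface{}{
		"x-dead-letter-exchange":    "",
		"x-dead-letter-routing-key": deadLetterQueueName(queue),
	}
}

func main() {
	for _, q := range []string{"emails", "policies"} {
		fmt.Println(q, queueArguments(q))
	}
}
```

This works because every queue is automatically bound to the default exchange with its own name as the routing key, so the routing key alone decides where a dead message lands.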
Unfortunately, doing it is not as easy as writing or talking about it 😒. To maintain the consistency and stability of the message broker, RabbitMQ does not allow changing the arguments of existing queues.
Deployment preparation
RabbitMQ does not allow you to change queue arguments at runtime, so the only way to do it is to remove the queues and re-create them with updated arguments. But this is not possible in production, as we might lose messages in the window when the old queues are already removed but the new ones do not exist yet. To solve this problem, we need to introduce temporary queues to handle these messages while the old queues are removed. For a simple system, this is possible with 4 releases:
1. Create temporary queues, but do not handle messages from them for now.
2. Switch to the new queues and remove the old queues. At this step, we already have properly configured queues, but their names are different. To return to the old names, we need to do the same steps again.
3. Create new queues with the old names, but with updated arguments. Do not consume messages from them for now.
4. Switch to the new queues with updated arguments.
4 releases is quite a lot, right? This requires not only a lot of small pieces of work, but also attention to make sure everything goes right every time. How can we reduce the number? 🤔
The simplest thing we can do is agree to rename the queues. This halves the number of releases, since we do not need to rename the queues back. This was acceptable to us, and we even got more out of it, as we improved the message handling process. But that’s a completely different story 😉.
What else can you do? Enabling consumers and message handling for the new queues right away reduces the release count to just one, but we have to accept the risk of duplicated messages while the new queues already exist and the old ones are still being processed.
At this point, a teammate stopped me, because I had not taken our deployment process into account. We use a blue-green deployment process: you run multiple instances of the same thing, and when you deploy, you take one down, upgrade it and bring it back up, then take the other one down to upgrade it. This guarantees that something is always up. In our case, this means there is always a consumer running.
So, messages can definitely be duplicated if we deploy during business hours. Deployment takes several minutes, which means that both old and new queues will be active for several minutes.
Time to analyze and decide: is it safe to deploy the application at night (and do we really want to do it 🙂), when the message flow is low, or is it worth introducing a third-party service like Redis to check whether a message has already been processed by some consumer, old or new?
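For comparison, the "check if the message has already been processed" option could look roughly like the sketch below. It is an illustration only: the store is an in-memory map standing in for a shared service such as Redis, and it assumes every message carries a unique ID (the `msg-42` value is made up).

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessedStore remembers which message IDs were already handled.
// It is an in-memory stand-in; in the setup discussed above this role would
// be played by a shared store such as Redis.
type ProcessedStore struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

// NewProcessedStore returns an empty store.
func NewProcessedStore() *ProcessedStore {
	return &ProcessedStore{seen: make(map[string]struct{})}
}

// MarkIfNew atomically records id and reports whether it was seen for the first time.
func (s *ProcessedStore) MarkIfNew(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[id]; ok {
		return false
	}
	s.seen[id] = struct{}{}
	return true
}

// Consume runs handle only for messages that no consumer has processed yet.
func Consume(store *ProcessedStore, id string, handle func()) bool {
	if !store.MarkIfNew(id) {
		return false // duplicate delivered via the other queue; skip it
	}
	handle()
	return true
}

func main() {
	store := NewProcessedStore()
	count := 0
	// The same message arrives once via the old queue and once via the new one.
	fmt.Println(Consume(store, "msg-42", func() { count++ }))
	fmt.Println(Consume(store, "msg-42", func() { count++ }))
	fmt.Println(count)
}
```

Even in this small form it adds a shared dependency to every consumer, which is the extra engineering effort we weigh against simply deploying when the load is low.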
Release
The easiest way to check the load on our message broker is to check the number of logs by day of the week and time. Since we are a highly focused company working only in Germany, we have a very low message load from late evening to early morning.
The load is not as high as it could be, so we can accept the risk that some messages may be duplicated. Even if this happens, their number will be extremely small and we can resolve them manually. This saves the resources and time that two releases would require.
After trying to release after midnight, we found out that we couldn’t do it at night. Some of our third-party services are not available then, so the container simply cannot boot. Well, it was worth trying once; now we know for sure. Nighttime is for sleeping 😴.
But we can still do it late in the evening or early in the morning. One has only to pay attention to the RabbitMQ load.
Late in the evening:
Early in the morning:
We made the decision to press the release button early in the morning after a good night’s sleep. This time everything went fine and there were no duplicates.
It was not an easy problem to solve, but it was worth it. While solving it, our team and I learned a lot of interesting things about message consumption and deployment processes. Now it is even better than before, with correct queue settings and decoupled message handling 😎.
TL;DR
RabbitMQ does not allow renaming queues or changing queue arguments;
to change something in a queue, you have to remove it and re-create it;
to re-create it safely, you need to use temporary queues;
a stable system may run as multiple instances, so be aware of duplicated messages between old queues and new queues;
if your business is tied to one timezone and is not heavily loaded at night, it is acceptable to have duplicated messages instead of over-engineering your consumers.
2020 will be remembered for a very long time for the quarantine and the accompanying restrictions. All events that draw a crowd have been cancelled, and we are trying to adhere to all recommendations. It would seem that this year’s conference would be impossible. But tough times call for new solutions, and conferences are now moving online too.
This innovative solution has its pros and cons. What I liked was:
ability to communicate with the speaker;
switching channels in just one click;
talks were recorded in advance, so speakers could answer questions in real time while the talk was playing;
talk to anyone you want in chat;
sitting in my favourite chair with two monitors;
slides or speaker monitor very clearly visible (people with poor eyesight will understand me 😉);
if you get bored, you can go about your business (conferences in the post-Soviet area are held on weekends, so you need to spend your personal time);
waking up late and no queues to get a pass.
Nevertheless, in addition to the pros, there were also disadvantages:
affiliate ads sound more intrusive and more like spam;
the platform they used for streaming had a few technical issues, so I experienced a lot of freezes;
only the winners of contests and quizzes can receive partner merchandise.
What about the conference
The conference was divided into two days. The first day was devoted to workshops, and on the second day there were 2 tracks of talks. The overall level of the talks was quite high and I personally really liked it.
Workshops were held exclusively in Russian, so the audience was very limited (about 140 participants). But the talks were both in Russian and in English and were very reasonably distributed among the tracks (the streams had roughly 150 and 80 listeners per track, respectively).
Workshops
Observability in practice by Elena Grahovac
Quite an interesting and practical workshop. Using a practical example, she showed how to log useful information with the uber-go/zap logger, trace the application’s execution flow and gather metrics with OpenTelemetry, and visualize and analyse the collected data with Jaeger.
The codebase is available on GitHub. To follow the process, check out the tags in this order: clean, logger, tracer, meter and tools.
TLA+/TLC: a practical tool for formal verification of algorithms that all gophers need to know for sure by Alexey Naidyonov
Despite the title, I personally think that this topic is important, but not so important that everyone should know it. Is it nice to know? Yes, it can help you with your architecture planning. Is it a must? No, I don’t think so.
TLA+ is a tool to design systems and algorithms, then programmatically verify that those systems don’t have critical bugs. It’s the software equivalent of a blueprint.
If you are interested to learn more, here are a few links for you:
The TLA+ Video Course by Leslie Lamport, author of TLA+ and PlusCal specification language
If you are interested in a deeper study, then “Specifying Systems” and “Practical TLA+” books will serve as the best continuation for you.
Talks
Continuous profiling for Go applications by Mike Kabischev
A nice talk that started with an overview of profile types and basic profiling with runtime/pprof. Then several continuous profiling packages were compared, such as github.com/conprof/conprof and github.com/profefe/profefe.
Profiling is a part of observability, which is why pprof should always be available, but net/http/pprof should be served on a different port than the application itself.
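A minimal sketch of that last point, assuming a service with two HTTP servers. The ports, the `/healthz` route and the function names are made up for the example, and this is not code from the talk. The pprof handlers are registered explicitly on a separate mux, so the application mux never exposes them.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newAppMux serves only application routes; "/healthz" is a hypothetical example route.
func newAppMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return mux
}

// newDebugMux registers the pprof handlers explicitly on its own mux.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// status returns the HTTP status a handler gives for GET path, without opening a socket.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	app := &http.Server{Addr: ":8080", Handler: newAppMux()}
	debug := &http.Server{Addr: "127.0.0.1:6060", Handler: newDebugMux()}

	// The application server does not know the pprof routes; the debug server does.
	fmt.Println(app.Addr, status(app.Handler, "/debug/pprof/"))
	fmt.Println(debug.Addr, status(debug.Handler, "/debug/pprof/"))

	// In a real service both servers would be started, for example:
	//   go debug.ListenAndServe()
	//   app.ListenAndServe()
}
```

Note that importing net/http/pprof also registers its handlers on http.DefaultServeMux, so the sketch deliberately avoids the default mux for the application server.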
eBPF: Modern Introspection Capabilities in Linux by Marko Kevac
BPF enables kernel-level profiling in Linux. Linux is an event-driven system, and a BPF program lets you monitor and analyse these events to see what happens in the system. The newer your kernel version, the more BPF features you can use. However, Go support for BPF is incomplete: the kernel part of a BPF program cannot be written in Go. The most commonly used package is iovisor/gobpf, but there are alternatives such as github.com/dropbox/goebpf and github.com/cilium/ebpf.
If you are interested and would like to know more, the books “BPF Performance Tools” and “Linux Observability with BPF” are the best place to start.
Codegenerator in Go by Dmitriy Smotrov
Personally, I am too conservative for decisions such as code generation, as I prefer to do everything myself. Nevertheless, such solutions can speed up work on routine things, for example, describing a repository for a model, or writing tests for this model. In addition, it is important to note that Go has good functionality for such solutions.
Debugging concurrent programs in Go by Andrii Soldatenko
The talk was built around the console version of Delve (dlv). Of course, GoLand will handle this for you, as its debugger also uses Delve, same as VSCode, but not everything from a Delve release will immediately appear in your IDE. So if you want a better and more customizable debugger, it is good to know how dlv works.
Go, please: language server under the microscope by Ilya Danilkin
A Language Server is meant to provide the language-specific smarts and communicate with development tools over a protocol that enables inter-process communication. The idea behind the Language Server Protocol (LSP) is to standardize the protocol for how such servers and development tools communicate. This way, a single Language Server can be re-used in multiple development tools, which in turn can support multiple languages with minimal effort.
In the past, there were many LSP implementations in Go, but over time, the Go core team developed the official LSP implementation gopls that we know today.
How to stop thinking about required fields and start writing contracts by Vladimir Serdyukov
The talk covered Protocol Buffers, a mechanism invented by Google for serializing data structures. The speaker talked about the differences between proto2 and proto3, as well as how to use required fields in proto3. For validation, you can use either buf.build or github.com/uber/prototool.
In new projects and for better compatibility it is recommended to use proto3. apiv2 can and should be used, but prototool does not support it. buf.build looks promising, but plugins such as gogoproto lose their relevance.
Intro to AI for software engineers using go-learn by Miriah Peterson
GoLearn is an accessible ML library written primarily in Go with some C and C++. It is used for simple classification problems.
As a result, he summed up that students can write productive code in Go and that reviewing it takes about as long as reviewing a regular developer’s code. It is not necessary to have a curriculum in order to come to the university with your projects.
Generic Programming in Go by Vladimir Vivien, “Learning Go Programming” book author
The possibility of adding generics to Go is currently being worked on. According to preliminary estimates, they should not be expected for at least 2 years.
The Go core team aims to preserve runtime performance, and generics should even bring faster execution. Compile time may increase, but the Go core team is doing everything to keep compilation fast. Using generics can also be complicated, and code that uses them may look unusual. Here is an example of using type parameters in functions:
I was pleased with the time spent listening to the talks and workshops. Beyond the official programme, I picked up several technologies in the chat channels that are worth paying attention to.
uber-go/zap might be a good alternative to sirupsen/logrus, which we are currently using at ottonova. Although logrus is simpler to set up and use, it is several times slower than zap.
FluentD is an interesting alternative to LogStash. From a preliminary analysis, FluentD appears to be less resource-intensive and more flexible.
Observability is a popular and in-demand topic, and most of the conference was dedicated to it.
Moment.js is an awesome library when it comes to performing complex date-time manipulations. It provides a rich and clean API that covers many use cases. That aside, Moment.js shouldn’t always be the go-to library for date-time problems. Alternatives should be considered as well.
What are the alternatives?
Actually, there are plenty of alternatives out there:
Similar API to Moment.js – which means easier migration
How did the migration go?
All date-time functions used in our apps are located in the service date.service.ts. So the migration of this service made the switch possible for us.
In general, having the date-time manipulation centralised in one place is a good practice. It makes changes like this one possible without much effort.
The migration process
1. Make sure that the service is 100% covered with unit tests
2. Check if all Moment.js API usages are available in Day.js
3. Replace Moment.js with Day.js in the package.json
4. Adjust the service to use Day.js
Step 1 was an easy one. We just wrote the missing unit tests for our service.
In general, test coverage of utility functions should always be high.
In Step 2, we found out that the following changes were necessary:
// Day.js
dayjs.months(); // dayjs.months is not a function
The APIs are mostly compatible. Finding these key differences between the libraries helped us to tackle all the issues in Steps 3 and 4.
All other changes were specifically related to our business logic.
How did our bundle change?
The migration confirmed our intentions. Our bundle is 60KB (~10%) lighter.
The gzipped size of Moment.js was 72.47KB; Day.js now comes to 3.14KB (including the locale and UTC plugin).
TL;DR
So far, switching to Day.js seems like a great decision. We haven’t run into any issues since our migration, one month ago.
The goal of this blog post is not to convince you that Day.js is awesome and Moment.js is terrible, but to remind you that choosing a date-time library is not an easy task.
There are many options available, so take your time and find out which one might be the best for your apps and needs.
At ottonova, we believe in being transparent about the technology stack we use. We see no reason to keep it a secret.
Possible concerns we sometimes hear are about the following two domains:
Security: We believe that security cannot be improved by keeping our tech stack a secret. In fact, discussing our tech stack openly allows us to identify and address any weaknesses or vulnerabilities. Operations and data security should always be a top priority.
Competition: We don’t believe that hiding our tech stack gives us any competitive advantage. Building a successful business involves much more than just assembling technology components – it’s about the team, the culture, and the experience, which cannot be easily replicated.
Our approach is to use mature technology stacks, follow best practices, and incorporate new technology components as needed. This approach is based on our years of experience building scalable web-based platforms and services.
Sharing our tech stack also helps us attract talented individuals to join our team, which is a major motivation for us.