Why rent a cloud when you can build one?
Cozystack is a Kubernetes-based framework for building a private cloud environment.
Connect with Andrei on LinkedIn.
Today’s shoutout goes to user Adam for winning a Populist badge for their answer to Regex replace text but exclude when text is between specific tag.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I’m Ryan Donovan, your host. Today we’re talking about how to build a cloud all by yourself, and my guest today is Andrei Kvapil, who is founder of Ænix and the core developer of Cozystack. So welcome to the show, Andrei.
Andrei Kvapil: Hello. Thank you, Ryan.
Ryan Donovan: At the beginning of the show, we like to get to know our guest. So, tell us a little bit about how you got into software and technology.
Andrei Kvapil: Well, I was in a community for [a] long time when Kubernetes was invented. I jumped into it, and I really like this technology. I tried to bring and put everything into Kubernetes, even Kubernetes itself. Actually, you can find a lot of my articles where I show what to choose between Argo CD and Flux CD, how to use virtualization in Kubernetes, and various technologies like storage [and] networking stuff working there. Some of those articles, you can find even in the kubernetes.io blog.
Ryan Donovan: What do you like about Kubernetes? What interested you about it?
Andrei Kvapil: I really like Kubernetes because of API, actually. I found it is not just the thing which creates containers and runs it somewhere; this is not an alternative to cloud platforms, this is something more. So, you can use Kubernetes to program actually everything. They have a different approach. So, the idea of this declarative logic that you can define a desired state and leave business logic behind. So, you don’t need to think about how is it happening under the hood, you just work on higher-level abstractions, and you go deep just in case if something goes wrong. So, using Kubernetes, you can specify, ‘I want a pizza,’ you’ll have a specific operator which will handle this pizza for you.
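[Editor’s sketch: the declarative idea described above, where you state a desired result and an operator works out the steps, can be illustrated with a toy reconcile loop. This is a minimal Python illustration under stated assumptions, not Kubernetes code; the `Cluster` class, the `reconcile` function, and the app names are hypothetical.]

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the real world that an operator observes and changes."""
    running: dict = field(default_factory=dict)  # app name -> replica count


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Compare desired state with actual state and return the actions taken."""
    actions = []
    for name, want in desired.items():
        have = cluster.running.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
        cluster.running[name] = want
    # Anything still running that is no longer declared gets removed.
    for name in list(cluster.running):
        if name not in desired:
            actions.append(f"delete {name}")
            del cluster.running[name]
    return actions


cluster = Cluster(running={"pasta": 1})
desired = {"pizza": 2}              # the user only declares what they want
print(reconcile(desired, cluster))  # first pass converges the state
print(reconcile(desired, cluster))  # second pass finds nothing left to do
```

[The user never says how to get from pasta to pizza; the loop derives the steps, and running it again is harmless once the state matches.]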
Ryan Donovan: Today, I’m sure we’ll touch on Kubernetes again as we build up our own cloud. So, tell us a little bit about what Cozystack is.
Andrei Kvapil: Cozystack is a platform for managed services. So, nowadays we say that nobody needs just pure virtual machines – everybody is looking for managed services, [in] such [a] way, like, you can go to AWS or Google Cloud, you can order Kubernetes clusters, some databases, S3 buckets. We say that Cozystack is a next-gen hypervisor, which allows you to order not only virtual machines, but also PaaS services, and Cozystack is open source. We develop it as part of the CNCF project. That’s one of the things I like to do – to share our experience and deliver it as a platform.
Ryan Donovan: You’ve obviously built the foundation for a cloud system, the hypervisor. What does it take to build up a cloud, starting from the bare metal to running microservices on it? What’s the first thing you need to build on the bare metal?
Andrei Kvapil: Well, nowadays, so many companies are trying to build their own cloud. We can start [with], ‘why’re they trying to do this?’ There is a very fashion[able] thing in Europe – especially in Europe – it’s called digital sovereignty. People [are] trying to get their own clouds to get independent from hyperscalers. I think nowadays, it’s easier to build your own cloud than a few years ago, thanks to Kubernetes. Even core technologies like OpenStack and other competitors – let’s say, CloudStack, OpenNebula – nowadays, it’s pretty common to run them on Kubernetes. So, if you’re trying to build your own cloud, you need to think about many things. If you’re a cloud user, you usually work only on the top part of this cloud, actually [the] API, and you don’t care about the bottom part of this iceberg. Under the hood, you need to solve so many things, let’s say, starting from bare metal node provisioning. If you have three or 10 nodes, it’s okay, you don’t need to solve that at the beginning, but when you have hundreds and thousands of nodes, you start thinking about how to orchestrate it better. Also, if you are trying to do a product, let’s say, a cloud, which you can run in various environments for various clients, you need to make infrastructure reproducible, and it is very difficult to support this reproducibility for various operating systems. So, I think the first thing you need to do is to unify everything and find a way of how you would distribute this. In Cozystack, we start from the very bottom level of operating systems; in Cozystack, we decided we don’t want to support standard operating systems because they have different kernels, different modules. Instead of this, we chose a really small system, which provides Kubernetes. It’s called Talos Linux, and we built our system on top of it. Thanks to that, it allows us to bake the image with all the needed kernel modules and kernel versions, and to be sure that it works the same way on various environments.
So, very basic level is [the] operating system and, actually, Linux kernel.
Ryan Donovan: Yeah, get some sort of small, minimal surface-level, surface area Linux system, and then on top of that, are we ready for the hypervisor, or do we need to do some more work before that?
Andrei Kvapil: Well, there are also a lot of various hypervisors—let’s say, pretty common now is KVM and QEMU. KVM—so, you can use QEMU, but there are so many orchestrators which actually consume QEMU in different ways. I already mentioned OpenStack, OpenNebula, CloudStack. We can add Proxmox, we can add KubeVirt. So, so many technologies.
Ryan Donovan: We don’t need to be comprehensive here.
Andrei Kvapil: Yeah, yeah. I just want to say that, first, you need to decide what kind of cloud you’re building. I really like to split these virtualization systems into two categories. One is traditional virtualization systems – you can consider it the same way [as] running physical servers. If you create [a] virtual machine, if you need to have [an] ISO to install this virtual machine, it’s pretty similar [to a] physical server. This is more likely [a] traditional virtualization system, like VMware, Proxmox, and for some time, Oracle, which currently is maintained.
Ryan Donovan: It’s just a piece of software simulating a single computer, right?
Andrei Kvapil: Yeah. So [the] hypervisor, here, is just a tool to split [a] big server into small ones, but they also introduce a lot of tools like, let’s say, live migration, [an] HA manager, and the things to keep your virtual machine alive. [A] different category is your cloud system. If you want to build something [with a] more ‘cloud native approach,’ let’s say, if you want to run hundreds of virtual machines, all of them should be created from [a] golden image, from the predefined template. If you want to build something like DigitalOcean or AWS, it can be implemented [in] different systems, let’s say, the same OpenStack, CloudStack, KubeVirt, and things like so. So, I think the second step when you start building your cloud [is] you need to decide what kind of virtualization you’re going to provide. Because the traditional virtualization systems, they are good and they provide [a] good interface for creating virtual machines, installing operating systems, and keeping them alive. In [the] cloud pattern, you have something different: you just use resources, and virtual machines just need [an] identity to consume those resources.
Ryan Donovan: The traditional virtualization is sort of splitting a larger server into smaller parts, but the cloud server is doing the opposite – taking a large thing and running it across several servers or virtual machines. Is that accurate?
Andrei Kvapil: Yes, partially yes. So, cloud usually means that you have [an] API and you have a lot of automation for spawning hundreds of virtual machines, and inside of these virtual machines, you can also have integration with the cloud. Let’s say, pretty common now, if you run Kubernetes in [the] cloud, you’re expecting that this cloud will provide you persistent volume[s], a load balancer, and the other services, which you can order directly from Kubernetes.
Ryan Donovan: Do you need a hypervisor for the traditional virtualization or is that just for the larger cloud system?
Andrei Kvapil: I think these are two different things. We are not doing traditional virtualization very well. There is a good solution, Proxmox – just take and use it. I really love Proxmox because it is open source and it works very well. But we’re more concentrated on providing managed services. That’s such a thing where you can go and, just [like] in [the] cloud, order Kubernetes, Redis, Rabbit[MQ], and things like so. We implement [the] cloud pattern. In this way, we also use virtualization, but virtualization here is used more for isolating resources between the users.
Ryan Donovan: For the Kubernetes, I assume, is that the next step up to manage all this virtual hardware you’re spawning?
Andrei Kvapil: Yeah, you need to choose [an] orchestrator. So for traditional virtualization systems, I would suggest using Proxmox if you’re choosing between the open source tools. In case you want to build something more smart, something more automated, you need to choose between OpenStack, CloudStack, OpenNebula, KubeVirt, Harvester – there are so many solutions right now. Some of them [are] based on the same technologies – let’s say, Cozystack, Harvester, and OpenShift – they all consume KubeVirt under the hood. They implement it the same way in Kubernetes. I think we’ve got some experience building clouds. In [the] past, we built a few clouds, and I can see the trends, and that people like Kubernetes, and they like interacting with [it]. So, all of them were building clouds using Kubernetes because Kubernetes allows you to become such [an] orchestrator and provide services just by request.
Ryan Donovan: Yeah, it’s a YAML file, and you don’t have to worry about the infrastructure. I’ve heard a lot of people saying it’s too much, or you know, it’s resume-driven development. Is Kubernetes something that anybody should be touching if they’re thinking about cloud?
Andrei Kvapil: If you’ll take OpenStack: we’re communicating with the people who use OpenStack, and they usually have a team of 10 or 20 people who’re managing just OpenStack. If we will narrow down the scope of the engineers to only Kubernetes – in Cozystack, we have Kubernetes inside, and we have Kubernetes outside as [an] orchestrator. This makes less cognitive load on the people who’re managing this stuff. You can actually narrow the amount of people who’re managing these and the people who work with Kubernetes. Nowadays, it’s a very fashion[able] technology. I think it’s way more easier to find them on the market.
Ryan Donovan: We’ve got the operating system, the virtual machines, the hypervisor, Kubernetes. Is there something else, or do we have a cloud at this point?
Andrei Kvapil: Yeah, that’s still not even half of what you need to solve. The second part is storage and networking – [the] two most painful things if you want to run stateful services. Let’s start from the networking first: if you use Kubernetes to build your own cloud, it might be even more complex because Kubernetes wasn’t actually created to run virtual machines. It implements different patterns for the networking, which was originally designed to run stateless containers. Right now, Kubernetes allows you to run stateful services, but still, there are a lot of things which are not solved yet in core Kubernetes. So, you can use some technologies, actually, Red Hat – they have a solution, which is called OVN-Kubernetes; and also there is [the] Kube-OVN project, which we use in Cozystack – it allows you to solve [the] parts you need to run virtual machines in Kubernetes. Those [are] problems such as live migration of [a] virtual machine to another node, [where] you need to keep [the] IP address for this virtual machine. You also need to have MAC address management because originally, Kubernetes does not provide any MAC address management for this.
Ryan Donovan: How do you maintain those IP addresses? Are they sort of internally managed, or is there some DNS server that you have to sort of update, talk to?
Andrei Kvapil: Kubernetes usually comes with [the] kube-dns or CoreDNS service, which solves name resolution inside of the cluster; that just works out of the box. But we implement [a] very interesting network pattern. So, instead of doing the same thing like Red Hat does, or Harvester, because they also built their solution on top of Kubernetes – they’re trying to reimplement this traditional virtualization system networking for virtual machines running in Kubernetes. That creates another API, which is not very well controlled by Kubernetes itself. In Cozystack, we’re trying to stick [to] the same approach to Kubernetes networking [as] in every Kubernetes cluster. So, we have one big address range where we dedicate IP addresses for virtual machine[s], with one difference: the IP addresses – they can be moved between the nodes. The rest [of the] things work the same way [as] in Kubernetes. So, you have [a] service network for load balancers, and you have external load balancers, let’s say, for Layer2 or BGP announcements.
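[Editor’s sketch: the pattern of one flat address range with addresses that follow the virtual machine between nodes can be illustrated in a few lines of Python. This is a toy model, not Kube-OVN or Cozystack code; the `VmIpam` class, the address range, and the VM and node names are hypothetical.]

```python
import ipaddress


class VmIpam:
    """Toy IP address manager: one flat range, addresses pinned to VMs, not nodes."""

    def __init__(self, cidr: str):
        # Iterator over the usable host addresses in the range.
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.ip_of = {}    # vm name -> IP address
        self.node_of = {}  # vm name -> node currently hosting it

    def create_vm(self, vm: str, node: str) -> str:
        ip = str(next(self._free))
        self.ip_of[vm] = ip
        self.node_of[vm] = node
        return ip

    def migrate(self, vm: str, target_node: str) -> str:
        # Only the placement changes; the address follows the VM.
        self.node_of[vm] = target_node
        return self.ip_of[vm]


ipam = VmIpam("10.244.0.0/29")  # hypothetical range
ip_before = ipam.create_vm("vm-a", "node-1")
ip_after = ipam.migrate("vm-a", "node-2")
print(ip_before, ip_after, ipam.node_of["vm-a"])
```

[The address is keyed on the VM rather than on the node, which is the property live migration needs; a real implementation also has to track MAC addresses and release addresses when VMs are deleted.]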
Ryan Donovan: Do you use any sort of internal multicasting where you have one IP address that talks to many different individual locations?
Andrei Kvapil: We do not implement that yet, so in our case, every virtual machine has just one interface and has one IP address assigned. All the rest [of the] stuff is done thanks to Cilium. Cilium is [a] really modern CNI plugin for Kubernetes, which handles everything on [the] eBPF level. So, it uses eBPF technology to override IP addresses, to provide actually strict policies between the tenants. In this way, you don’t need to think about IP addresses at all. So [an] IP address is just a service thing [that] needs to allow you [to] send and receive packets. You start operating on higher-level labels, names, and things like so. And [with] Cilium, you can just say, ‘I want this namespace allow[ed] to go [to] this namespace,’ and you don’t care about [the] IP addresses assigned.
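[Editor’s sketch: the idea of writing policy against namespace names instead of IP addresses can be shown with a toy default-deny check in Python. This is not Cilium’s API; the `Rule` and `PolicyEngine` names and the tenant namespaces are hypothetical, and allowing traffic inside a single namespace is an assumption made for the example.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Allow traffic from one tenant namespace to another."""
    from_namespace: str
    to_namespace: str


class PolicyEngine:
    """Toy default-deny policy check keyed on namespace names, not IP addresses."""

    def __init__(self, rules):
        self._allowed = {(r.from_namespace, r.to_namespace) for r in rules}

    def is_allowed(self, src_ns: str, dst_ns: str) -> bool:
        # Assumption: traffic inside one namespace is permitted;
        # everything else needs an explicit rule.
        return src_ns == dst_ns or (src_ns, dst_ns) in self._allowed


engine = PolicyEngine([Rule("tenant-a", "shared-db")])
print(engine.is_allowed("tenant-a", "shared-db"))  # True: explicit rule
print(engine.is_allowed("tenant-b", "shared-db"))  # False: default deny
```

[No address appears anywhere in the rules, so a workload can change its IP, or move to another node, without the policy changing.]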
Ryan Donovan: Yeah, I mean, in most APIs you don’t really care about the IP address. You just– here’s the URI, URL, whatever it is, and then it goes through… what? API, gateway traffic shaping, load balancing? How are proxies– how much of that do you have to worry about when you’re setting up an API that attaches to your Kubernetes?
Andrei Kvapil: That’s always about how you prepare your environment. We have so many environments, and first, we’re still trying to unify all of them. You can unify everything inside of the cluster, but there are always questions [of] how to communicate with external routers. If they use BGP, they can use different versions, different technologies, to expose those IP addresses. If you’re talking about cloud platform[s], they usually work on Layer2 and Layer3, but I don’t know many clouds who provide a Layer7 communication service mesh. So yeah, it’s quite famous in [the] Kubernetes world. Some clouds – public clouds – they usually provide some additional service, but if you’re talking about providing infrastructure, you just need to solve Layer2 and Layer3, and such things like live migration, IP address allocation, the security groups, and communication between them. Also, ‘VPCs’ – very painful topic. We do not support them yet. We want to because people get used to, ‘I just want to create my network, so I want to use my own IP addresses.’
Ryan Donovan: What’s ‘VPC’ stand for?
Andrei Kvapil: That’s usually about the isolation – physical isolation of networking between the users, between the tenants – even despite the fact [that] for now, we have a really common space for all the customers. They’re not controlling their IP addresses, and that looks weird for the people who get used to [the] cloud. Even despite the fact we use strict eBPF policies to not allow traffic to go [to] different environment locations, they still want to have their own dedicated Layer2 network where they can choose their own IP addresses, send multicast traffic, and things like so.
Ryan Donovan: All of this, you know, connection, traffic shaping – it operates within Kubernetes?
Andrei Kvapil: There are few options how you can do this. So, our way is [to] maximally reuse existing Kubernetes solution and networking approach. But there is also a project, it’s called Multus – it’s quite common if you’re creating your own cloud in Kubernetes for organizing VPCs. So, Multus allows you to run multiple networks, multiple interfaces, for every container and virtual machine, but it has also many drawbacks. For example, they created networks [in] such [a] way [that] they will not be controlled by the Kubernetes itself, so you can’t actually use network policy to control them, and every solution is solving that a different way.
Ryan Donovan: Is there some way to propagate network policy to these things outside of the Kubernetes clusters?
Andrei Kvapil: You mean with Multus?
Ryan Donovan: Yeah, in any sort of consistent way – like, you said it can’t apply the network policy.
Andrei Kvapil: Kubernetes defines default entities for managing [the] network, such as network policies [and] services – it’s called kind Service – which is used for service discovery. And many CNI plugins like OVN-Kubernetes and Kube-OVN, they implement their own entities, which allow you to do something similar for additional networks.
Ryan Donovan: And you said there’s a network piece and then there’s the file storage piece. What do you have to think about for that?
Andrei Kvapil: The storage– also [a] very difficult topic. It can be very different, and also it’s very difficult to create something unified. Some people like to use proprietary black box solutions so they can just put a node, connect it to Kubernetes, install the driver, and use volumes from it. We decided to do [a] hyperconverged cloud, so we ship our storage via Kubernetes. But when you have a storage, that usually means that you’re bringing state into Kubernetes. When you start having state, you start thinking about, ‘I need to update this node. What to do with the data on this node? Should it be removed, migrated?’ And how to manage that properly.
Ryan Donovan: Yeah, because that interferes with the sort of like, ‘destroy, rebuild’ ease of Kubernetes, right?
Andrei Kvapil: Actually, thanks to operator approach, which was invented I think five, or I don’t know how many years ago, that was always about bringing state into Kubernetes. Because originally, it was just [a] stateless system and operators bring this option to you to manage stateful services in Kubernetes. And virtual machine, if we can just imagine, that’s just another stateful service. That’s just [a] QEMU process inside of the container, and that’s exactly how KubeVirt works. So, KubeVirt, we can say that this is [an] operator for running virtual machines inside the containers.
Ryan Donovan: Does it propagate the state across nodes? And how does it happen? Is that something that is gonna cost you money in terms of like, inflow, outflow costs?
Andrei Kvapil: In case [where] you have something stateful, it’s always better to keep state outside. If you’re talking from the storage perspective and don’t talk about the virtualization, let’s say, about normal container workloads, it[’s] always better to keep this state somewhere in S3 buckets, which don’t require any Linux operations to mount this file system, or to create the block device. So, when we are talking about the storage, I would split [it] also into three different categories: the object storage is something which you don’t care about – you have [an] API, let’s say, HTTP, or S3, where you can go and put or get your data back from it. But for virtual machines, you need something that works on the system level. You need block storage. So when you create virtual machines, it works with block devices where you can install your Linux or Windows operating system, doesn’t matter. But before you start using this, you need to think how you will deliver that. That’s good if you have external storage, but if you don’t, you need to install some storage system into your Kubernetes. [A] very common thing is Ceph. We use LINSTOR because Ceph is not working very well in hyperconverged infrastructure. It consumes so [much] CPU time. We use LINSTOR. It actually implements [a] block device. It works like MD, let’s say, soft RAID 1, but which works on the networking level. That means if one node goes down, your data will consistently start on another node. And if it goes down for a very long time, it also will even replicate this data to another node.
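[Editor’s sketch: the “soft RAID 1 over the network” behaviour described above can be modelled as a volume that writes every block to all online replicas and rebuilds onto a spare when a node stays down. This is a toy Python model, not LINSTOR or DRBD code; the `Replica` and `MirroredVolume` classes and their methods are hypothetical.]

```python
class Replica:
    """One node's local copy of a volume, modelled as block number -> bytes."""

    def __init__(self, node: str):
        self.node = node
        self.online = True
        self.blocks = {}


class MirroredVolume:
    """Toy network mirror: every write goes to all online replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, block: int, data: bytes) -> None:
        targets = [r for r in self.replicas if r.online]
        if not targets:
            raise IOError("no online replica")
        for r in targets:
            r.blocks[block] = data

    def read(self, block: int) -> bytes:
        # Serve the read from the first online replica that holds the block.
        for r in self.replicas:
            if r.online and block in r.blocks:
                return r.blocks[block]
        raise IOError("block unavailable")

    def replace(self, failed: Replica, spare: Replica) -> None:
        # Rebuild the mirror on a spare node from a healthy copy.
        healthy = next(r for r in self.replicas if r.online and r is not failed)
        spare.blocks = dict(healthy.blocks)
        self.replicas[self.replicas.index(failed)] = spare


r1, r2, r3 = Replica("node-1"), Replica("node-2"), Replica("node-3")
vol = MirroredVolume([r1, r2])
vol.write(0, b"bootloader")
r1.online = False       # node-1 goes down
print(vol.read(0))      # still served from node-2
vol.replace(r1, r3)     # node-1 stays down: re-replicate onto node-3
print([r.node for r in vol.replicas])
```

[The sketch ignores everything that makes real replication hard, such as write ordering, split-brain handling, and resynchronising a replica that missed writes while offline.]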
Ryan Donovan: And how much of this data is this, sort of like, individual nodes data is being saved in, you know, S3 buckets in case of, you know, persistence, or failures, or anything? Is that something that needs to happen, or can you just do it all in Ceph or LINSTOR?
Andrei Kvapil: You can’t just put all of your data into S3 buckets, unfortunately. And another thing, if you put your data into S3 bucket, you’re just moving your responsibility to someone else. What if you need to run your own S3 storage? You get exactly in the same situation. So, for running S3 storage, you need block devices or [a] file system where those files will be stored.
Ryan Donovan: So that’s if you’re building your own cloud, that’s another part of it that you have to build your own S3 buckets.
Andrei Kvapil: If you want to support this, then yes. If you don’t want to support it, it might be not required. Especially if you use Ceph, you can actually use both. It supports all three interfaces, actually: file system, block devices, and S3 object store. LINSTOR solves just one piece, actually block devices, but [it is] highly perform[ant]. And on top of these block devices, we run SeaweedFS, which makes object storage. Sometimes you also need a file system storage.
Ryan Donovan: What if something downloads a file? Right? What if it writes to a log file?
Andrei Kvapil: Also, [in] various cases, for example, if I run in our cloud – also databases, they actually work with [a] file system; that doesn’t mean that they need to have [a] shared file system because they use the same block devices, and they use normal ext4 or XFS created on top of this block device. So that works fine with block storage on there, but if you want to run tons of WordPress websites, or some PHP application, they’re expecting to have it redundant. They’re expecting that they will work with the same file system. In this way, you need to have something like [a] POSIX-compatible file system, which you can mount on multiple nodes with distributed locks. I would say this is [a] very difficult topic.
Ryan Donovan: Well, we’ll save that topic for another time. To our infrastructure stack, we’ve added the network and the file system. Do we have a cloud now?
Andrei Kvapil: Yes and no, because you need to manage your users somehow. You need to authenticate and authorize them, and you need to create [and] assign quotas for their tenant namespaces, and you need to bill them somehow. Starting from this point–
Ryan Donovan: You’re talking about the users on the cloud itself. You’re not talking about, like, users on whatever application, right?
Andrei Kvapil: Yeah, yeah, exactly. Cloud. Okay, so, you created storage [and] networking, you installed some virtualization system, let’s say KubeVirt, you have [a] pure API interface, and you need to control your users somehow. Let’s say, for example, if you are allowing them to create virtual machines, how [do you] avoid situations when they’re interacting with KubeVirt – with [the] pure KubeVirt API – and replacing their entities with custom images, [and] in such [a] way, compromising the system? So, you need to secure yourself somehow from that. There are not many options how you can do this. You can go through writing policy using Open Policy Agent or Kyverno. There are many solutions which allow you to control user input, but we prefer [a] different way, so we implemented our own API server in Kubernetes, which allows you to specify only [the] fields [the] user [is] allowed to change, and then generate the resources for the operators. In such [a] way, we have extensibility, and so, full aid. I would put here that we still have no web interface, which allows our users to connect. There might be some things like quotas, [a] billing system, so you need [to] somehow calculate. You still need to monitor it. So actually, in our project, the idea was to get so many open source projects, get them working all together, and provide a boxed solution, which includes everything like monitoring, dashboard, virtualization, storage, networking… so, you just install them on the system, on the nodes, and you get a ready cloud.
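[Editor’s sketch: the approach of exposing only a small set of user-editable fields and generating the full resource server-side can be illustrated with an allow-list in Python. This is not the Cozystack API server; the field names, the `render_vm` function, and the image reference are hypothetical.]

```python
ALLOWED_FIELDS = {"name", "cpu", "memory_gib"}    # hypothetical user-facing fields
TRUSTED_IMAGE = "registry.example/golden/ubuntu"  # hypothetical platform-owned image


def render_vm(user_request: dict) -> dict:
    """Accept only allow-listed fields and build the full resource server-side."""
    unknown = set(user_request) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"fields not allowed: {sorted(unknown)}")
    return {
        "kind": "VirtualMachine",
        "name": user_request["name"],
        "cpu": int(user_request.get("cpu", 1)),
        "memory_gib": int(user_request.get("memory_gib", 2)),
        "image": TRUSTED_IMAGE,  # never taken from user input
    }


print(render_vm({"name": "vm-a", "cpu": 2}))
try:
    render_vm({"name": "vm-b", "image": "attacker/custom"})
except ValueError as err:
    print("rejected:", err)
```

[Compared with validating arbitrary user-written resources after the fact, the user here never touches the low-level object at all, so a custom image cannot be slipped in.]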
Ryan Donovan: Right, and you just worry about your business logic.
Andrei Kvapil: Yeah. So we take care about infrastructure, and you take care about the business logic. And even more, after you get this infrastructure layer, you start thinking about how to integrate existing things, let’s say, [the] Kubernetes which you allow to run inside of this Kubernetes, how to teach it to order [and] hot plug volumes and load balancers. So, you need to have some API and interaction between the applications you deployed in your cloud [and] your cloud.
Ryan Donovan: I think we’ve only scratched the surface. We’ll include a couple of your articles about some of these deeper topics for folks who wanna check out more.
Ryan Donovan: Well, it is that time again where we shout out somebody who came onto Stack Overflow, dropped a little knowledge, shared some curiosity, and earned themselves a badge. Today, we’re shouting out a Populist badge winner – somebody who dropped an answer on a question that was so good, it outscored the accepted answer. So, congrats to Adam for answering ‘Regex replace text but exclude when text is between specific tag.’ If you’re curious about that, we have the answer for you. I’m Ryan Donovan. I am the host of the podcast at Stack Overflow, editor of the blog. If you have questions, concerns, topics, et cetera, please email me at podcast@stackoverflow.com. And if you wanna find me on the internet, you can locate me on LinkedIn.
Andrei Kvapil: I’m Andrei Kvapil. I’m CEO and founder of Ænix. I’m [a] developer of Cozystack. Please feel free to join our community. If you want to contact us, we have community channels: we have [a channel] in Kubernetes Slack, and [a] Telegram channel, just Cozystack in [the] Kubernetes space. If you want to, you can join our biweekly community meetings where we speak about the opportunities and how to develop the platform itself. That’s something amazing I learned from the CNCF meetings, because we like to share our experience and get it back from all our users from around the world. So take a look at [the] cozystack.io website. You’ll find all the links there.
Ryan Donovan: Well, thank you for listening, everyone, and we’ll talk to you next time.