Friday, November 16, 2018

Rancher Labs online classes

Rancher 1.x was a cool project. I liked the approach: deploy the controller, add workers, then deploy applications. Under the covers the orchestration supported several models, including sidekick containers and persistent containers that follow their primary. They did real work to spearhead persistent containers, which are complicated because of remote caching, change management, security and so on. They also supported several orchestration engines: their own Cattle, Swarm, Mesos, and Kubernetes.
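
Since sidekicks are the mechanism I miss most, here is a sketch of the shape of that wiring, rendered as the compose data a Rancher 1.x (Cattle) stack used. The io.rancher.sidekicks label is the piece I am sure of; the service names, images, and everything else are made up for illustration.

    # Build the compose structure in Python and print it as YAML
    # (pip install pyyaml). Everything except the io.rancher.sidekicks
    # label is a hypothetical placeholder.
    import yaml

    compose = {
        "version": "2",
        "services": {
            "web": {
                "image": "nginx:1.15",
                "labels": {
                    # Cattle schedules "logger" on the same host as "web"
                    # and manages their lifecycles together.
                    "io.rancher.sidekicks": "logger",
                },
            },
            "logger": {
                "image": "busybox",
                "command": "tail -F /var/log/nginx/access.log",
                # share the primary's volumes so the sidekick can follow it
                "volumes_from": ["web"],
            },
        },
    }

    print(yaml.safe_dump(compose, default_flow_style=False))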

With Rancher 2.x they cut the cord on every orchestrator but Kubernetes. There may be some backporting, but Rancher excels at reverse engineering running clusters as well as deploying them. They have not talked about the internal design or the motivations, but it's clear that a running cluster is more authoritative than the data structure you think you captured to represent the model.
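
To make that concrete: the closest I can get to "the running cluster is the authority" is to dump what is actually deployed and diff it against the manifests I think I applied. A minimal sketch with the official Kubernetes Python client, assuming a working kubeconfig; this is my approximation, not Rancher's internal design.

    # Walk the live deployments and print the facts a reverse engineer
    # would start from: namespace, name, replicas, images.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() inside a pod
    apps = client.AppsV1Api()

    for dep in apps.list_deployment_for_all_namespaces().items:
        containers = dep.spec.template.spec.containers
        print(
            dep.metadata.namespace,
            dep.metadata.name,
            dep.spec.replicas,
            [c.image for c in containers],
        )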

That said, picking the authority is a challenge. Worse still is trying to identify, recover, and repair broken systems. I described this problem months ago and it is still an open issue. Strangely, while I run Swarm in production, when it goes south I have to rebuild it from scratch. Docker does not like to be repaired.
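
Rebuilding from scratch looks roughly like this for me. A sketch for a single-manager swarm, where the stack file path and stack name are made up; a real multi-node swarm would also need each worker to leave and re-join, which this does not handle.

    # Tear down the wedged swarm and redeploy from the compose file.
    import subprocess

    def run(*args):
        print("+", " ".join(args))
        subprocess.run(args, check=True)

    run("docker", "swarm", "leave", "--force")  # abandon the broken raft state
    run("docker", "swarm", "init")              # fresh single-manager cluster
    run("docker", "stack", "deploy", "-c", "stacks/web.yml", "web")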

Years ago Hightower did some lights-out operational demos. It was exciting to watch containers crash and then be repaired, though even then the failures were pretty simple. Today Kubernetes is configured with a model that says "this is how I should look"; Kubernetes tests the live cluster against it, trying for a match and filling in the parts that do not. I'm reminded of some Erlang cluster networking I did. Repairing an Erlang cluster is near impossible; Erlang would prefer a total failure with a restart... and so does Docker.
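
That match-the-model behavior is just a reconcile loop. A toy sketch of the idea, where observe() and create_replicas() are hypothetical stand-ins for real API calls:

    import time

    # the model: "this is how I should look"
    desired = {"web": 3, "worker": 2}  # service -> replica count

    def observe():
        """Ask the live system what actually exists (stubbed here)."""
        return {"web": 3, "worker": 0}  # pretend the workers crashed

    def create_replicas(name, count):
        print(f"starting {count} replica(s) of {name}")

    def reconcile():
        live = observe()
        for name, want in desired.items():
            have = live.get(name, 0)
            if have < want:
                # fill in the parts that do not match the model
                create_replicas(name, want - have)

    while True:
        reconcile()
        time.sleep(30)  # keep testing for a match, not just once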


There is something to be said for complete redeploys, especially when the system is small and fast enough to rebuild. But if you've got hundreds or thousands of systems this is not practical. Then there's the separate challenge of hot spares: keeping the code and the data in sync. One thing is for certain: each system is different, with different disaster recovery and availability needs.

This is not really where I was taking this post, but it is clear that disaster recovery is still a thing, and none of Docker, Kubernetes, or Rancher has that problem solved.
