
If you have etcd, should everything use it?

For many years after Microsoft deployed the registry it was the hell of all hells. It was the one thing that could kill your Windows server or desktop and render the machine unusable. In some cases you could boot into single user mode and repair it; years later there was a snapshot tool, along with a plethora of third-party tools that did the same.

The etcd project from CoreOS defines etcd as "A highly-available key value store for shared configuration and service discovery".

On the subject of service discovery:

If you're deploying a gaggle of applications in your environment, they may need to discover each other in order to communicate on some level. Traditionally this was done with a DNS server and/or configuration files, in an environment where systems and services were typically static. In the world of containers and virtualization, however, anything can be anywhere, and worse yet, IP addresses change more frequently. High availability (HA) failovers, hardware and container failures, and system or application upgrades (blue/green) can all make the entire system unstable.

One challenge for DNS is that records carry a TTL, so it takes time for updates to pass through the system. On the other hand, the TTL can be set very low to allow frequent updates, and more importantly DNS is the authority on hostname-to-IP-address mappings. In the realm of service discovery, however, DNS is useless. DNS is essentially a key/value store where the key is the fully qualified domain name (FQDN) and the value is the IP address. Of course there are other record types and other ways to search the DNS repository, but in the end this is what you get.
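To make the "key/value store with a TTL" framing concrete, here is a minimal sketch of a resolver-side cache. It is not how any real DNS server is implemented; the class name `TtlCache`, the hostname, and the 30-second TTL are all made up for illustration. It only shows why a cached answer can stay in use until its TTL runs out, even if the service has already moved.

```python
import time
from dataclasses import dataclass


@dataclass
class Record:
    ip: str
    expires_at: float  # absolute time after which the cached answer is stale


class TtlCache:
    """Toy resolver cache: the key is an FQDN, the value is an IP plus an expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the expiry logic can be tested
        self._records = {}

    def put(self, fqdn, ip, ttl_seconds):
        self._records[fqdn] = Record(ip, self._clock() + ttl_seconds)

    def get(self, fqdn):
        record = self._records.get(fqdn)
        if record is None or self._clock() >= record.expires_at:
            self._records.pop(fqdn, None)
            return None  # the caller has to ask the authoritative source again
        return record.ip


if __name__ == "__main__":
    cache = TtlCache()
    cache.put("web.example.internal", "10.0.0.5", ttl_seconds=30)
    print(cache.get("web.example.internal"))
```

A lower TTL shrinks the window in which a stale address is served, at the cost of more lookups against the authoritative source.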

But what happens when you take etcd and put a DNS API wrapper around it? That's the premise behind SkyDNS. SkyDNS provides the same DNS APIs but stores its data in etcd instead of in the backing store of a traditional DNS server. This works for a number of reasons. First of all, having direct access to etcd means that the data can be updated using standard etcd queries. It also means that SkyDNS could implement a listener such that when records important to SkyDNS are updated, SkyDNS is given a poke so that it can figure out what happened and react.
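The "poke" idea can be sketched with an in-memory stand-in. Nothing here is the etcd client API or SkyDNS's actual code: `WatchableStore`, `DnsFrontend`, and the `/dns/` key prefix are hypothetical names chosen for illustration. The point is only the shape of the design: writers update the store directly, and the DNS front end refreshes its table when a watched key changes.

```python
from collections import defaultdict


class WatchableStore:
    """In-memory stand-in for a KV store that supports prefix watches."""

    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)  # prefix -> list of callbacks

    def watch(self, prefix, callback):
        self._watchers[prefix].append(callback)

    def set(self, key, value):
        self._data[key] = value
        # Notify every watcher whose prefix matches the changed key.
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)

    def get(self, key):
        return self._data.get(key)


class DnsFrontend:
    """Answers name lookups from a local table kept fresh by watch events."""

    PREFIX = "/dns/"  # hypothetical key layout, not SkyDNS's real one

    def __init__(self, store):
        self._table = {}
        store.watch(self.PREFIX, self._on_change)

    def _on_change(self, key, value):
        # Strip the prefix so the table is keyed by hostname.
        self._table[key[len(self.PREFIX):]] = value

    def resolve(self, fqdn):
        return self._table.get(fqdn)


if __name__ == "__main__":
    store = WatchableStore()
    dns = DnsFrontend(store)
    store.set("/dns/api.example.internal", "10.0.0.7")
    print(dns.resolve("api.example.internal"))
```

Because the front end reacts to changes instead of waiting for a timer, an updated record is visible to lookups as soon as the write lands in the store.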

The big benefit of this model is that storage and replication are shared and can be easily backed up. Depending on what you know about DNS this could actually be good or bad. DNS replication may or may not be realtime in the traditional sense. And from a Six Sigma/root cause analysis point of view, the fact that SkyDNS relies on etcd means that the aggregate system is inherently less reliable.
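The reliability point is simple arithmetic once you treat the two pieces as a chain in which both must be up. The sketch below assumes the components fail independently, and the 99.9% figures are made-up numbers, not measurements of SkyDNS or etcd.

```python
from math import prod


def series_availability(*components):
    """Availability of a chain in which every component must be up.

    Assumes the components fail independently of one another.
    """
    return prod(components)


if __name__ == "__main__":
    dns_layer = 0.999  # hypothetical availability of the DNS front end
    kv_store = 0.999   # hypothetical availability of the KV store behind it
    print(series_availability(dns_layer, kv_store))
```

Under those assumptions the combined figure is always lower than that of the weakest single component, which is the sense in which the stacked system is less reliable than either part alone.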

Now what is going to happen when you have 20 or 30 applications and services that use etcd in the same way you might previously have used postgres or some other replicated DB? Here is the point where I leave you hanging. etcd is a replicated KV store. There are many KV stores. I'm certain etcd is small (the rocket container is 3.5MB). Fleet, flannel, and possibly a few other systems use etcd for configuration management, so it's clear that it works. But where is the line? Should it be limited to orchestration configuration management, or used for just any kind of configuration?

etcd does not support TLS yet. So security is going to be a hack for a while. What else is missing?

PS: If I had to deploy a DNS server right now I'd probably go with GeoDNS. It has a good reputation and is used by the NTP project. I'm pretty certain that it uses a config file instead of a database for its configuration. This also makes replication simple. It also puts the burden on the application and the filesystem to replicate the new configuration and slide it in. I do not see it here, but I imagine that storing the configuration in a VCS repo and triggering changes from it would be very useful. As for services, it's time to start looking at the extended attributes.

