Wednesday, June 29, 2011

Pyzmq - not a lot of best practices

ZeroMQ is a Message Queue (MQ) framework that plainly works. Two of the most interesting elements are 1) ZMQ supports a number of client languages; 2) the broker (generally an application that exists to route traffic from a producer to a consumer) is left to the programmer to implement.

If you've read about or used any of the other MQs, or if you've done some interprocess communication (IPC) before, you probably have a good general idea of how this is supposed to work. RabbitMQ does a really good job of naming the different patterns and keeping the list to something manageable. The ZMQ doc, by contrast, is long and detailed but lacks examples for each of the client languages, and the examples that do exist are simple, buggy, or old. It does, however, cover many more patterns than RMQ.

One idea that keeps getting trapped in my head is: how do I send a request to the broker and wait for a response? And if the response does not arrive, then what? Basically I'm looking for a best practice here.

In my application design I have the following stack:
json_rpc -> broker -> worker -> remote_http

  1. a user application sends my stack a json-rpc call

  2. the call is forwarded to the broker

  3. the broker routes the request to a worker

  4. the worker forwards the request to a remote service provider

  5. and whatever response is returned... bubbles up through the call stack

So in my json-rpc application I have a module that waits for requests; each message is authenticated and the input data is validated. Since the response time can be between 250ms and 90secs we'll keep the socket open and wait for the reply. The challenge here is getting the json-rpc app to make the request, detect errors, handle certain errors, forward the request to the broker, wait for a reply, parse the response message, and return to the caller.
Here is the pseudo code:

import zmq

# socket, poller and request are assumed to be set up already
# (socket registered with poller for zmq.POLLIN)
retry = 3
socket.send(request, zmq.NOBLOCK)
while True:
    try:
        # wait 7 seconds between poll timeouts
        socks = dict(poller.poll(7000))
        if socks.get(socket) == zmq.POLLIN:
            # we got a message
            # NOBLOCK here almost makes no sense other than we want to get
            # all errors and we do not want to block at all
            # any errors will generate an exception
            # we could assert against an empty response but why?
            rsp = socket.recv(zmq.NOBLOCK)
            break
        # timeout
        retry = retry - 1
    except zmq.ZMQError as e:
        retry = retry - 1
        if e.errno == zmq.EFSM:
            # wrong state
            pass
        elif e.errno == zmq.EAGAIN:
            # no data
            retry = 0
    if retry <= 0:
        # reconnect to the broker
        # . . .
        retry = 3

The thing to keep in mind is that ZMQ will not return from a send or receive unless something good happened. That means that you can expect an exception to be fired if your message did not get out, the response was not received, or some other app in the stack restarted and thus changed the state of the socket. This makes sending messages reliably very simple... even though my post is even simpler than that.
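To make the retry-then-reconnect policy easier to test without a broker, here is a minimal sketch with the transport passed in as plain callables. The names request_with_retry, send, poll_reply, reconnect and ReplyTimeout are all hypothetical; with pyzmq, poll_reply would presumably wrap the poller.poll() and recv() calls, and reconnect would close and reopen the socket.

```python
class ReplyTimeout(Exception):
    """Raised when no reply arrives even after reconnecting."""


def request_with_retry(send, poll_reply, reconnect, request,
                       retries=3, timeout_ms=7000, max_reconnects=2):
    """Send a request, poll for a reply, reconnect after `retries` misses.

    send(request), poll_reply(timeout_ms) -> reply or None, and reconnect()
    are hypothetical stand-ins for the real socket operations.
    """
    reconnects = 0
    while True:
        send(request)
        for _ in range(retries):
            reply = poll_reply(timeout_ms)
            if reply is not None:
                return reply
        # every poll timed out: give up or rebuild the connection and resend
        if reconnects >= max_reconnects:
            raise ReplyTimeout("no reply after %d reconnects" % reconnects)
        reconnect()
        reconnects += 1


# Usage with a fake transport: four timeouts, then a reply.
replies = iter([None, None, None, None, b"ok"])
log = []
result = request_with_retry(
    send=lambda req: log.append(("send", req)),
    poll_reply=lambda timeout_ms: next(replies),
    reconnect=lambda: log.append(("reconnect",)),
    request=b"ping",
)
```

The request is sent again after each reconnect because a fresh connection knows nothing about the earlier attempt; whether that is safe depends on the request being idempotent.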

Thursday, June 23, 2011

NoSQL for your next project?

I keep going back and forth on the whole idea of NoSQL and it bothers me to no end. On the one side the idea of sharding the data at the server level is appealing. Then there are the key/value databases and then the document versions. There are graph databases, object databases and a few in between. But then, as always, there is this reality check.

As wonderful as NoSQL would appear to be there is no single use-case that would seem to make it/them the obvious choice. And this is never more obvious than when I stumble around a new project I'm considering... an open source merchant gateway and an open source issuing system. I would really like to have one and only one system that I could use for everything but that does not seem possible.

For example, in order to deal with the protocol impedance I need a FIFO queue of some kind. I like redis for this as it also has a TTL so that old data is simply removed from the queue. It also has the notion of fields in the value so that a single record can actually represent a dictionary or an array, making a useful container for the message components. (Converting an ISO-8583 message or variables from a POST into a dictionary is fun and useful.) Also, since there is a FIFO queue the transaction results and logging can be pushed into a different channel for aggregation and processing later.

In both the issuing and gateway systems the transactional part is considered an OLTP system, and OLTP systems benefit greatly from hash tables like those that redis provides. However, redis is useless when it comes to reporting, mapreduce, partitioning, and performance when the dataset approaches available memory limits. So it would seem that it would be useful to have a SQL or other storage mechanism to store all the persistent data, a queue for all of the logs and transactions, and a hash to act as a cache for the data from the primary store. The challenge here is that the first time there is a miss on the account data in the cache the compute node will have to go to storage to get the account data. This can be costly and 2x machines will negatively affect the sigma score. And then system recovery is going to be harder even if the data is replicated around the systems. (expire/TTL does not work as expected on replicated systems.)
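The cache-in-front-of-primary-storage idea can be sketched in a few lines. This is only an illustration under assumptions: a plain dict stands in for the redis hash, load_account is a hypothetical function that reads from the primary store, and the hit/miss counters are there just to show where the cost of a miss lands.

```python
class AccountCache:
    """Cache in front of a slower primary store (a dict stands in for redis)."""

    def __init__(self, load_from_store):
        self._load = load_from_store   # hypothetical loader for the primary store
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, account_id):
        if account_id in self._cache:
            self.hits += 1
            return self._cache[account_id]
        # first touch: pay for the round trip to primary storage
        self.misses += 1
        record = self._load(account_id)
        self._cache[account_id] = record
        return record

    def flush(self, account_id):
        """Drop an entry so the next read goes back to primary storage."""
        self._cache.pop(account_id, None)


# Usage with an in-memory stand-in for the primary store.
store = {"acct-1": {"balance": 10000}}
loads = []

def load_account(account_id):
    loads.append(account_id)
    return dict(store[account_id])

cache = AccountCache(load_account)
first = cache.get("acct-1")    # miss, goes to the store
second = cache.get("acct-1")   # hit, served from the cache
```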

The NoSQL Databases website claims that there are 122+ NoSQL databases. I just ran over a bunch of them and they all left me wanting something better... or at least feeling that I needed to go back to an RDBMS. (read PostgreSQL)

Some requirements:

  • a queue for message logging that is persistent and replicated

  • a queue for transaction logging account info back to the primary storage

  • a queue for transaction requests that can and will expire, however, the expiration will trigger a log entry

  • a cache that represents the account data from the primary storage

  • a cache that represents the config data from the primary storage that is reloaded on-demand

  • flush from cache when the transaction entry is recovered from the transaction log

  • evaluate the mapreduce hit/miss when the transaction log is processed or mapreduce once a day or hour

  • import the data from the cache into a detached slice until the data is ready for consumption. Then attach it. (Postgres)

One last comment. Replication is a pain, especially when you are exporting transactions this way. Given the amount of time it takes to sync the data, especially with transaction bursts, the overall system can and will experience slowdowns. It gets worse when the users are trying to interact with the data. On top of that importing CSV is so much better. And finally, doing the imports when the users are not using the system means that the data is stale but accurate.

I'm going to build a test harness made from redis+mongodb and redis+postgres... just for fun and testing.

A new approach : HamsterDB

Revisiting my favorite subject again, credit card processing, the hamsterDB's description on the NoSQL website triggered an alarm.
hamsterDB - (embedded solution) ACID Compliance, Lock Free Architecture (transactions fail on conflict rather than block), Transaction logging & fail recovery (redo logs), In Memory support – can be used as a non-persisted cache, B+ Trees – supported

The key words being "lock free". In any typical CC issuing system you can expect to see transaction times from 50 to 500ms depending on the amount of work the authorization system has to perform, DB latency and locking.

Typical transaction workflow looks like some code that just tries to get some data from the DB, do some work, get some more data from the DB and do some more work. And while performing I/O with the DB you always have to be ready for a failure. Typical failures are deadlocks, consistency because another process updated a record and so on. And when you think about the breadcrumbs and trying to recover from these failures it simply makes the code more complex.
update account set balance=balance-10.00, version=version+1 where ccnumber=? and version=11;

With the hamster approach you can and should get all the data that you need from the DB upfront at the beginning of the transaction. Keep in mind that in some use-cases this data could be prohibitively large so it's best to completely understand the scope. Then do the workflow as you would normally... leaving you with a set of DB updates/inserts that need to be executed. So execute them.

Now if anything goes wrong you have choices. 1) retry the DB changes from the last step; 2) retry the workflow from step two; 3) retry the entire transaction. It simply depends on the nature of the write failure and what you determined was the best recovery.

And here's why this is better. Given the performance profile for an authorization (50-500ms) and the timeout that is permissible by the association (10-45 seconds depending on the trantype) you can retry this transaction internally almost any number of times in order to get a positive response... providing the error was internal and not hardware, network, etc. related.
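Here is a minimal sketch of the read-upfront, work, write, retry-on-conflict flow. It uses sqlite3 purely as a stand-in (hamsterDB has its own API, which is not shown here), the debit function and table layout are hypothetical, and amounts are kept as integer cents. The UPDATE is the same version-checked statement as above; a rowcount of 0 means another process changed the record first.

```python
import sqlite3


def debit(conn, ccnumber, amount, max_attempts=5):
    """Debit `amount` (cents) using a version check instead of a lock."""
    for _ in range(max_attempts):
        # 1) get all the data the workflow needs up front
        row = conn.execute(
            "SELECT balance, version FROM account WHERE ccnumber = ?",
            (ccnumber,),
        ).fetchone()
        if row is None:
            raise KeyError(ccnumber)
        balance, version = row
        # 2) do the in-memory workflow
        if balance < amount:
            return False  # declined
        # 3) execute the write only if the version is still the one we read
        cur = conn.execute(
            "UPDATE account SET balance = balance - ?, version = version + 1 "
            "WHERE ccnumber = ? AND version = ?",
            (amount, ccnumber, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # approved
        # rowcount == 0: someone else updated the record, retry the transaction
    raise RuntimeError("gave up after %d attempts" % max_attempts)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (ccnumber TEXT PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES ('4111', 10000, 11)")
approved = debit(conn, "4111", 1000)
```

This sketch always retries the entire transaction (choice 3); retrying only the write would not work here because the version it read is already stale.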

One other thing that did not escape my eye: Kevin Smith (formerly from Riak) is the erlang client maintainer. There is also optional encryption (great for PCI).

On the downside there is no replication, sharding, python, perl, or traditional C. However, the approach would be interesting for other platforms... almost there.

Wednesday, June 22, 2011

Current News and Updates

Google released version 1.5.1 of App Engine. They added some significant APIs and features, however, in my mind it's missing a GO update.

Tornado has been in version 1.2.1 for a long time... and the developers just released version 2.0. (download here) Looking at the release notes there are 3 major updates and several minor. Many of the minor updates are prerequisites. The most impressive will undoubtedly be support for python 3.2. However there may be some minor backward compatibility issues.

Should Go replace my use of Python?

Here is an interesting post that posited the question in my title: Experience porting 4k lines of C code to go

There are a lot of reasons to use GO. I like that it's from Google but I don't like that there is a release-often approach. I need something that is a little more stable than that. Granted this offers some justification for deploying packages and the like, and for using goinstall to deploy and update packages as new releases of GO are made available. There is also something to be said about the monolithic codebase; however, that flies in the face of this deploy approach.

But I like the compiled performance, channels and the wealth of packages (It needs more like a performant web framework, templates and production ready database adapters.)

Go, while cool, is still a little half baked, whereas python and perl are still up to the challenge.

Tuesday, June 21, 2011

"The Network Is the Machine" : Erlang is not all that

I like erlang and I like it most because it solves a number of problems; however, the problems that I think it solves in general application development are not the kinds of problems that most erlang programmers want to solve. For the life of me I cannot understand why erlang programmers would implement a database like Riak. It's a complicated undertaking and frankly, considering how deep the callstack has to be at times, it does not seem practical without a real debugger.

As I consider the amount of work that it takes to implement a single credit card transaction I realize that the entire callstack is going to consist of a few thousand instructions regardless of the language. The hardest part of a credit card transaction is the DB record versioning and not the actual in-memory workflow.

So then we start talking about the threading and IPC. MEH. I no longer care about that stuff. Not even for a second. With libraries like ZMQ "we" should reconsider how we allow processes to communicate. Modern MQs are fast, reliable, persistent, and easy.

Finally, if you have a transaction that takes a predictable number of machine cycles (especially in the context of a CC transaction) then executing the transactions synchronously via a fixed number of workers will have less overhead than launching every transaction at the same time. As light as they may be there is still overhead. O(1)+1 still has a +1 and at some point... say after 1M transactions, those +1s will add up to a serious performance cost.
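As a rough illustration of the fixed-number-of-workers idea, here is a sketch using Python's standard thread pool. The authorize function, the 50000 cent limit and the transaction records are made up for the example; the point is only that 100 transactions are drained by 4 workers rather than by 100 simultaneous tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def authorize(txn):
    """Hypothetical authorization step: approve anything up to 50000 cents."""
    return {"id": txn["id"], "approved": txn["amount"] <= 50000}


transactions = [{"id": i, "amount": i * 1000} for i in range(100)]

# A fixed pool: each worker handles one transaction at a time,
# and results come back in the order the transactions were submitted.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(authorize, transactions))

approved_count = sum(1 for r in results if r["approved"])
```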

[update 2011-06-22: When the machine is idle then parallel execution and all the thread happiness in erlang makes sense; however, when the machine is busy then single process execution makes the most sense as there is no overhead no matter how small. i.e., if you have 100K transactions you still have 100K worth of work to complete. The mean execution time will be higher just because of the latency and overhead but on the whole there is no real advantage. see nodejs as a partial example.]

[update 2011-08-24: I get it and it was not because I was talking to the CTO at Riak. Two days ago I was looking at the connection pool to a Postgres DB. That's when I realized that an erlang implemented database... at least for socket(s) and systems with long runtimes like connection pooling; was a very strong use-case for erlang.]

The Really Big O

Get your mind(s) out of the gutter. I'm actually thinking about benchmarks not bed-marks.

Back in the olden days we used to refer to a program or algorithm's performance profile in terms of o-notation. I'm pretty sure that most computer scientists still follow this mantra and for the most part it's probably still true.

So if it is the case that o-notation is still a real form of measure then why do most languages have different performance profiles while they perform the same work. For example 100K compares is the same when it all boils down to the assembly instruction that makes the decision:

When it comes down to it every language makes decisions the same way. So why such different profiles. I say again.

First of all I think that the trouble lies in the libraries. I'm not convinced that the same care is put into every library so that the minimal number of instructions is executed. The reality is... how much code needs to execute on either side of the instruction above to actually do the work in the target language?

Secondly not all JITs are equal. And while that's good for some languages you'd think that at some point they'd all use the same JIT. But then I'd be complaining about benevolent dictators like Oracle. I really do not like the JDK being used for Scala, Clojure and others. While they are bolted on nicely... they simply inherit everything from the JDK and whatever Java libraries you care to use, as evidenced by Lift, which seems to depend heavily on Jetty. And if the purpose of the language is to get the benefits of a functional language... using Java libs directly would seem to defeat the purpose.

Finally whatever you are trying to accomplish in a single transaction is not unlike the CMP above. The number of comparisons over the transaction is no different between the many languages. So whichever language you select, the choice should have more to do with the overall impedance with the problem scope than with the lonely intellectual islands.

BerkeleyDB in Payments

BerkeleyDB is awesome... but I liked it better when it was a part of SleepyCat and not Oracle. I hope that Oracle does not bury the product and that it gets the attention it richly deserves.

A number of years ago, while I was working for WildCard Systems, I was designing an authorization system that had a few constraints. The first was that everything was deployed on Windows and second that the DB was going to be MS SQL Server and as a side effect of moving requirements from the sales team I was forced to implement the business rules as stored procedures. At the time SQL Server did not have a replication system and we were still running on souped up PCs pretending to be servers.

So I built my own replication engine. That failed. And then I tweaked it... it was OK for a while... until the transactions started to show up. Over the years we tried several, including Microsoft's version too. They all failed one way or another. But I digress.

At some point everything was moved to enterprise class DELL hardware including a SAN from EMC. And we had some new/serious execs running the show. So now we were doing things like performance testing and peak season preparation and so on. And then the sad news arrived. We were only able to perform 25TPS, sustained. I was dejected. In later years I read about SAN brownouts on EMC hardware. And then there was the in-house replication. I was completely dejected as the system I built for First Data was doing over 100TPS; but then First Data was running on Sun hardware and Oracle and we did not have the budget to compete yet.

Shortly after an "Oracle vs SQL Server" meeting, in which Oracle was buried, I left WildCard. But the idea of developing an authorization system that could perform had captivated me. So I started reading. Finally I read an article from the CEO of SleepyCat. It inspired me to implement some test programs using BerkeleyDB. In the end I was able to perform 1700TPS on very modest hardware doing almost a complete simulation of a transaction including reads and writes. Granted I had not included replication but I was able to test the basic premise.

I left the experiment with the feeling that BDB could do the job... and now that BDB is part of the NoSQL revolution, willing or not, it seems that maybe it deserves some more attention. So I've been thinking about it again... but now I don't think that it's a contender. Replication is left to the programmer to implement and if you ask me that sort of programming is highly specialized and demands that experts implement it. So I do not think BDB is a good match at all.

Redis In Payments

There are a number of hurdles for the merchant checkout/shopping-cart to overcome when accepting credit card transactions. There are a number of obvious and outwardly facing challenges like:


  • Acquirer contracts

  • Shopping Cart

  • Banking

  • requirements - payments, recurring payments

Once you make it past these not-so-technical speed bumps, there are a number of implementation details that follow. On the one hand there is the user experience and how that is implemented by the website; and then there are the many acquirers and the different protocols and payload formats. This is usually referred to as transaction impedance.


What does this mean? How is that implemented in Redis?

Let's start with the user experience. At some point the user will want to complete the purchase by providing some credit card information that you are going to use to send to an acquirer for processing. Given the number of ways this can be accomplished the best way will be an internal implementation using an iFrame. This way you can encapsulate and reuse the checkout in multiple places within your app.

The iFrame will then POST via REST or Ajax type message to the URL that provided the iFrame. The form data should be validated here and then forwarded to the message broker. I'm suggesting RestMQ is a good option as it uses Redis. The message is put on the message queue. Shortly thereafter a worker daemon that has been blocking on an empty queue will awaken, pull a message from the queue, reformat the message for the acquirer and forward the message to the Acquirer.

Here is where it gets tricky. Depending on the protocol either the worker is going to wait for a response from the Acquirer in response to the request or the worker is going to move on to the next message in the broker's queue. This depends on the protocol with the Acquirer. If the Acquirer is implemented in REST or HTTP then the worker can simply wait. Of course there can be as many workers as the number of simultaneous connections the acquirer will permit.

On the other hand, many acquirers like to use a single socket and process the transactions in an asynchronous fashion. In which case you'll need two threads and a cache for the transactions in flight. I know that's a tough concept.... here goes...

A worker thread pulls a message from the queue, assigns a UUID, stores the transaction in a hash with an expiration date, then writes the message to the async socket.

Some other thread is blocking while reading from the remote socket. When a response is received, the worker will queue the response. Another worker will read from that side-queue, locate the UUID, get the request from the cache, parse the response, construct a response for the user checkout and then forward the response to the user's iFrame.
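The cache of transactions in flight might look something like this sketch. It is an in-memory stand-in rather than Redis, and the InFlight class with its track, match and expire methods is hypothetical; the clock is injectable only so the expiration can be exercised without sleeping.

```python
import time
import uuid


class InFlight:
    """Requests written to the async socket and still awaiting a response."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = {}  # txn_id -> (expires_at, request)

    def track(self, request):
        """Assign a UUID, remember the request with an expiration, return the id."""
        txn_id = uuid.uuid4().hex
        self._pending[txn_id] = (self._clock() + self._ttl, request)
        return txn_id

    def match(self, txn_id):
        """Return the original request for a response, or None if unknown or expired."""
        entry = self._pending.pop(txn_id, None)
        if entry is None:
            return None
        expires_at, request = entry
        if self._clock() > expires_at:
            return None  # the response arrived too late, let it time out
        return request

    def expire(self):
        """Drop stale entries and report how many were removed."""
        now = self._clock()
        stale = [k for k, (exp, _) in self._pending.items() if exp < now]
        for k in stale:
            del self._pending[k]
        return len(stale)


# Usage with a fake clock.
now = [0.0]
cache = InFlight(ttl_seconds=30, clock=lambda: now[0])
a = cache.track({"pan": "4111", "amount": 1000})
b = cache.track({"pan": "4222", "amount": 2500})
matched = cache.match(a)   # response for a arrives in time
now[0] = 31.0
late = cache.match(b)      # response for b arrives after the TTL
```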

WOW... and now the interesting stuff. Since the messages in the queue and cache can have expiration dates, transactions that time out can be allowed to time out all over the system without adversely affecting the overall performance of the system. Historically transaction nodes have allowed transactions to time out rather than send back errors in response to transactions that time out. The logic required to ensure that the error responses make their way around the "system" is costly. And in fact experience has demonstrated that users get nutty with the "retry" and they can essentially DDOS your payment system.

This is a pretty exciting design. Tornado and Cyclone use epoll and address the C10K problem nicely. Redis can perform 100K writes per second. Daemon tools can handle keeping the system alive. Redis offers some replication although it is master-slave so HA is still possible. Since this is a cluster solution it is possible to distribute the transaction load over several servers. However, in a cluster arrangement you may need to route the transactions (cluster affinity) so that the same card numbers follow the same route... should there be any sort of stand-in processing, duplicate transactions, etc... you want to have the latest information possible. Implementing routing via cluster affinity is another use-case for Redis as you can store the routes and then replicate that data and read the record from the slaves. With any luck this will be faster than an errant user. But you still have to keep an eye out there for evildoers... so a good blacklist is also helpful.
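One way cluster affinity could be sketched: pick a node from a hash of the card number the first time the card is seen, store that route, and reuse it afterwards. A dict stands in for the Redis route table here, and the route and card_key functions and the node names are hypothetical. Keying the table by a digest instead of the raw card number is my own assumption, made so that the route store does not hold card numbers in the clear.

```python
import hashlib


def card_key(card_number):
    """Digest used as the route-table key instead of the raw card number."""
    return hashlib.sha256(card_number.encode("ascii")).hexdigest()


def route(card_number, nodes, routes):
    """Return the node for this card, assigning and storing a route on first use."""
    key = card_key(card_number)
    if key not in routes:
        # first sighting: derive a node from the digest and remember it
        routes[key] = nodes[int(key[:16], 16) % len(nodes)]
    return routes[key]


nodes = ["node-a", "node-b", "node-c"]
routes = {}  # stand-in for the replicated route table
first = route("4111111111111111", nodes, routes)
second = route("4111111111111111", nodes, routes)
```

Because the route is stored rather than recomputed, the same card keeps following the same path even if the node list changes later.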

Some elements I left out... end of day batch file, recurring transactions, reporting, and customer care.

[UPDATE: Redis does not like combining EXPIRE and replication. The EXPIRE can have unpredictable results when executing queries against the slave(s). OK, so we have learned that FIFOs and Caches are useful in MQ broker construction and as part of the impedance mismatch correction. Redis/RestMQ seem to be strong tools.]

membase + couchDB = couchbase : Why?

Just a short note because I have some reading to do. In the back of my mind there are echoes of membase being a ram-only cache-like db and that there were forks and the like that implemented the same APIs and had persistence. This guy does a good job breaking things down; however, he gives membase 200K/sec and does not say anything about Redis' performance. So while many of the elements that I would use for comparison are there they are not equally presented. Furthermore there is still a question in my mind about membase and its persistence that he suggests.

So many questions.

The biggest doubt I have is the title of this article. From the link in the article membase appears to have been absorbed into the couchbase project and so it does not exist on its own. Not to mention that redirects to The same can be said for and

The primary use-case for membase is caching of data from any traditional DB, so where is the benefit of merging with couchbase?

Monday, June 20, 2011

Domain Updates

I have updated my domain to Therefore many of the links are going to redirect etc.

Sunday, June 19, 2011

Stay Tuned

I've been working on some new articles that I hope you'll find interesting.

  • How Much Code Have you Written?

  • Do you pad your resume?

  • Perception is everything!

  • FaceTime - Killer App?

You may or may not be able to tell what's coming from the titles. Either way it should be somewhat interesting.

Friday, June 17, 2011

So Long Replay, Hello Tivo

[Image: ReplayTV shutting down operations]

I was going to watch some show I had previously recorded on my ReplayTV and this message popped up on my screen. It seems that ReplayTV is going to halt all operations on July 31, 2011. This really sucks because I like my ReplayTV. They were the first to offer remote room playback (record a show in one room and playback in another using a second ReplayTV and a wired ethernet network.)

This is sort of interesting because at least one of the ReplayTV devices has been misbehaving. Every once in a while it just hangs. So I was thinking that I would need to replace it in a few weeks or months at the most.

In the meantime I was looking at the latest Tivo. They seem really cool. Some of the interesting features include iPad/iPhone/web integration so that you can change the recording options while you are away from home but connected to the internet. And I recall reading that you can record 2 shows and watch a recorded show at once. That's one more show than the replay permitted... but it requires some investigation first.

The one show stopper that I am interested in is whether we can watch in the other room via wired network. This is a big deal and the most important feature. I once had Dish Network in the house because they said they could playback anything in any room of the house. That was far from the truth but it was the closest I had seen until the ReplayTV.
When you have more than one TiVo box connected to your home network through broadband, you can easily transfer non-copy protected HD shows between them. Simply connect all of your TiVo boxes to your home network and you can enjoy the convenience of multi-room viewing.

In other families it's simple.  His device and her device. And where the device is, is where you watch that show... or you record it on multiple devices. So my search begins to make sure that I can watch where I want to ...

PS: It is interesting that it also has some Netflix functions... so we can watch movies and TV shows streaming over the internet.

Loading CDRs into MongoDB

Sweet. This was as slick as you'd expect.

The task was to load 235529 records from 100+ CDR files into MongoDB using the mongoimport tool. Using a Rackspace server with 512M ram and 20GB disk... but it's all virtual anyway.

Here are the numbers (not scientific at all):

  • 1m 10s - with verbose turned on

  • 34s - with verbose turned off

I'm certain that some portion of the latency with verbose on is that the console was remote and so there was some lag in the i/o across the internet.
The import:

$ . ./bin/
connected to:
dropping: data.cadb
30700 10233/second
57500 9583/second
85000 9444/second
113600 9466/second
144000 9600/second
170700 9483/second
197200 9390/second
223600 9316/second
imported 235529 objects

Just to be sure I checked that all of the data was loaded... some people have been complaining that data has been lost.

$ wc -l /tmp/20110515/*
. . .(snip). . .
  235529 total

And then I checked the count on mongo.
$ ./mongo/mongo
MongoDB shell version: 1.9.0
connecting to: test
> use data
switched to db data
> db.cadb.find().count();


So everything is exactly where it needs to be in terms of performance. With any luck the loading is going to be linear. So that if I loaded 20M records I could expect to take about 40 minutes.
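The back-of-the-envelope math, assuming the load really does stay linear, goes something like this. The two rates are taken from the numbers above: the overall 235529 records in 34 seconds, and the last per-second rate that mongoimport reported.

```python
records = 235529
seconds = 34                      # wall time with verbose turned off
target = 20_000_000               # the hypothetical 20M record load

overall_rate = records / seconds  # roughly 6927 records/second end to end
reported_rate = 9316              # last rate printed by mongoimport

# Estimated minutes for 20M records at each rate.
minutes_overall = target / overall_rate / 60
minutes_reported = target / reported_rate / 60

print("%.1f to %.1f minutes" % (minutes_reported, minutes_overall))
```

So the estimate lands somewhere between about 36 and 48 minutes depending on which rate you trust, which brackets the 40 minute guess.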

What is interesting here... is that 40 minutes of loads all at once would normally cause a SQL/RDBMS to burp as the locks were escalated and as indexes needed rebalancing etc. This is one of the main reasons why DBAs prefer to load the initial data from bulk loads into temp tables before moving them into their final resting place. And why Postgres supports sharded tables that can be temporarily detached while the import takes place.


I decided to try loading a similar range of files remotely over the WAN. It got off to a slow start but then it got to about 75% of the performance that "on the same box" did... and this was through an encrypted tunnel.
rbucker@klub:~$ . ./bin/ 
connected to:
dropping: data.cadb's password:
100 33/second
23100 3850/second
48700 5411/second
80300 6691/second
106500 7100/second
131700 7316/second
158200 7533/second
183200 7633/second
imported 187600 objects

Reported my First Bug to MongoDB

I have a client that generates several million Asterisk CDRs (call detail records). These CDRs are not perfect. In fact they are formatted as TSV and not CSV; and they have a leading TAB character. Since the CDRs are generated in 5 minute intervals and the files contain a few thousand CDRs it does not make sense to load the DB a record at a time. It actually makes more sense to bulk load so that the data is processed at as low a level in the DB engine as possible.

My first attempt to load data into MongoDB failed. The data was all askew. The problem is/was that there was a leading tab in the TSV file. And during the normal processing of the input file the import utility was stripping all leading whitespace regardless of the filetype. Since the whitespace includes the TAB character and since the first column of my data was mostly empty... the file had a leading TAB character.

And this character was considered a whitespace and so it was deleted before the record was processed.
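The effect is easy to reproduce in a few lines of Python. This is only an illustration of the behavior, not the import utility's actual code, and the sample record is made up: stripping all leading whitespace eats the TAB that marks the empty first column, so every field shifts left by one.

```python
# A made-up TSV record whose first column is empty, hence the leading TAB.
line = "\t2011-05-15 10:00:00\t5551234\t42\n"

# Buggy: strip() removes the leading TAB along with the newline.
buggy = line.strip().split("\t")

# Fixed: only remove the line terminator, keep the leading TAB.
fixed = line.rstrip("\r\n").split("\t")

print(len(buggy), buggy)
print(len(fixed), fixed)
```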

So I did what any open source guy would do. I opened a ticket. Fixed the bug. And presented my patch in the ticket. I hope they will accept it.

Thursday, June 16, 2011

Is Social Networking Just a Fad?

I read the headline: Is Social Networking Just a Fad? And instead of reading the article I aggregated a number of other headlines that I read including One Simple Rule: Why Teens are Fleeing Facebook. There was also some grumbling about the upcoming IPO. Now that social network software is mainstream it's just a matter of time before individual businesses find a way to capitalize. If you've played any Cyberpunk RPG and you understand the backstory... it seems inevitable.

The rule of thumb in this fiction is always:
"The corporation" wants to spider away as much information as it can from any source that they can. Information from public sources is to be copied and then destroyed. The information from the competition is of utmost importance. And so on...

Sure there is going to be a FB but it will be different. It will be the public forum; it will be the anonymous platform; it will be the place of activism; games; and time wasting. Nothing is going to be real. It will be like most dating sites.

The corporation will exploit it for advertising a la Max Headroom. But have a very tight internal ghost network.

Yes! This post is from the FUD and Subjective folder.

Knuth has gotta go

I graduated HS in 1983 and unlike many of the other students who used the Apple computer lab in school I used my family's Radio Shack TRS-80. And since my father was mostly a business man I wrote text based software. And when we bought our first IBM PC it was monochrome. So it was no surprise that I ended up taking programming courses in college.

It took me 3 years to get out of community college and a 1 year hiatus to find myself and then another 3 or 4 years to catch back up. Since graduating HS I was always working in my field. At some point, while I was in college, I decided to buy the Knuth books. I mean this guy defined our/my existence and there has to be some knowledge in there that I need.
I would like to note that Vol 1 was initially published in 1973. Volume 4a is due out this year(2011) and Volume 5 is due by 2020.

So these books have been following me around for almost 20 years and I'm sad to say that I have never opened them other than to see the publishing date in volume 1.

This takes me back to other days in HS and College when the people and students around me would say things like "I'll never use that in real life". The fact of the matter is you probably won't. And so today I lose a monitor stand and the local library gains a reference... that I'm certain no one else will ever read.

Wednesday, June 15, 2011

A brand new annoyance: Adobe Air

I installed TweetDeck a few days ago and I don't have anything good to say about that experience. This morning my wife installed the latest Shutterfly express app and I'm not happy about that either. You see, both programs required that I install Adobe Air and in today's environment where nothing is free, not even the free stuff... it's just one more commercial entity with free access to my computer.

If I paid for the software I suppose I would feel more comfortable with the fact that the application was not going to spider my disk drive and give away my family or professional jewels. As it is we permit way too many software snippets access to our virtual homes. These intruders must be sandboxed, and in a way that does not create additional friction for the user.

It's for that reason that I really like the iOS development environment. Every application is sandboxed and that's just the way it is.

So I managed to get a little sidetracked... I don't want to know that Adobe Air is installed, I don't want to share versions of Adobe Air between applications, I do not want Adobe collecting data on me, and I don't want them sending that data to the master computer. It's bad enough that the search engines and ISPs already monetize everything I do.

Tuesday, June 14, 2011

Review: Google Music Beta

I've been granted a beta account on Google's new music service and my initial impression is that I'm disappointed (and I hope Apple is taking notes for iCloud).

I received my invitation yesterday and I was pretty happy about it. I requested an invitation recently and given the number of uber power users at Google I was not expecting anything.

I installed the desktop app, which is not really a desktop app at all. It's an application that quietly runs in the background aka daemon (or TSR for you DOS throwbacks). The actual user experience takes place in the browser. So on to my checklist of complaints:

  • The browser app seems sluggish (could be because the upload is running).

  • The daemon is uploading my entire library (8K songs) rather than just the signature.

  • They deployed an Android version of the player but no iPhone.

  • If you elect to receive the free songs you cannot tell them from yours unless you really know your library.

  • It only plays on my computer and I like the AirPlay in my iPhone, iPod, AirPort Express (maybe Google should buy Logitech).

All in all I think Pandora and Spotify are the ones that need to be concerned here unless iCloud does not deliver.

[update 2011-06-15: Two days later Google is still uploading my music. 6102 songs of 7543.]

[update #2: Cloud Music Comparison: What's the Best Service for Streaming Your Library Everywhere? - Lifehacker]

MarsEdit is from Pluto

My issues with MarsEdit are not that serious; so let's run the numbers.

Cost - $39.99. Yikes. I had to check the appstore to refresh my memory. I cannot believe I paid this much. The description suggests that it is the #1 blog editor for the Mac, so either there are no other blog editors on the Mac or this might actually be the best.

Features - Here I have to admit that ME supports a wide range of blogging platforms and maybe that's why the editor seems so clunky. It feels like using the old edlin (DOS) compared to vi circa 1983. I'm just grateful that I did not have to dust off my WordStar cheat sheet.

Append, not update - This could be a bug or it might be PEBKAC; I was doing some simple edit/publish cycles. I published the same document several times... thinking nothing of it. I decided to check my work when I noticed duplicate posts. So it seemed that with every publish I had an extra copy of my document.

Feels like a webapp - There is some RTF in the editor but it feels clunky. The user can either enter the HTML tags in manually or use some RTF keyboard shortcuts. It just does not feel like a Mac desktop app.

No sharing - I use several computers. So I might start the story on one computer and finish on another. ME does not offer any option for editing and sharing docs across machines. The best you can do is upload a draft version of the doc and then download it on the other machine. (This is not much fun.)

In summary: it does the job but there is so much user/application friction that it feels like I'm struggling to get things done. Maybe next time. For now, I'm just going to go with the actual webapp version. It's fast, seems reliable, offers some RTF, etc... and I can share my docs. Oh so much better.

Monday, June 13, 2011

VoIP on iPhone

I know a little about VoIP because I part-time manage 6 Asterisk servers and 3 CDR aggregators. These systems are involved in telephony arbitrage which I understand from the outside but not as an insider. Let's just say that from the outside looking in it's meant to be obtuse.

So it seemed natural to me to want to reduce my phone bill and still have all the comfort features and functions that I get from the local telco. I do not really want to manage my own asterisk server inside the house. I might want to use my analog phones. And I definitely need to maintain the same QOS as I get with the other guy.

That's when I found a reddit post that caught my attention and eventually got me looking at PlugPBX. This is a great idea once you get past some of the networking issues and the need for an analog connection to the local phone. However, if you're willing to install IP phones around the house or SIP capable wireless phones, this will be a game changer.

With the advent of wireless HDMI and other wireless technologies we are going to get to the point where the entire home will be wireless.

So this brought me full circle to VoIP phones and if I'm going to deploy a PBX then I need a service provider... and while I'm at it, why not a SIP phone application for my iPhone. A SIP phone application is also known as a soft phone.

I hate to say it's confusing because it's not. It's just a serious challenge going through all the different service providers trying to determine what the features are and what the final cost is going to be. Some are domestic and some international. Some offer unlimited rates and some do not; and then there are the other misc features like voice to email. I still do not have an answer or recommendation.

So I pushed that quest back a little and started looking at SIP phones for the iPhone. There were hundreds of them. Not very many of them had more than 10 or 15 ratings and all hovered at around 3 stars. Average.

So while I think the PBX is going to make its way into the home, or at least the last mile, and into the future of the wireless home, the status of the softphone is still wide open.

PS: I have plenty of GEN-Y'ers that only use cell phones. Which is starting to look like a good idea to me.

Wednesday, June 8, 2011

Cassandra; a game changer?

I'm not certain that Cassandra is a NoSQL contender. It may be part of a solution but not a solution unto itself. Upon first reading, the Apache group tells you all these wonderful things that Cassandra does, but it feels like it is not enough. The glaring omission is a MapReduce function, and the closest you're going to get is using Cassandra as the storage engine for a Hadoop NoSQL framework.

Hadoop is a beast of a different color. It seems to support different storage engines... HBase is its traditional storage engine and it also supports Cassandra. There may be others, but Hadoop is not the focus here. The last word on Hadoop is that it seems Google has given the Hadoop license a waiver from its patent on parallel queries.

There is a product named Brisk from DataStax. This is Hadoop+Cassandra+some DataStax sugar. I'm just not buying this setup. The website suggests to me that it's more about the DataStax (commercial) dashboard than it is about the integration of Hadoop and Cassandra.

Finally, one last note about Cassandra. Cassandra is written in Java and, as my peers may or may not finally attest, since I have been pushing Java since 1.02, Java is living up to its performance promise. However, now that Oracle has purchased Sun, and with all of the pain that the Java community is going through, it's anybody's guess what's going to happen next. (Think of Java as Cobol for the modern age.) Oracle is the half-witted red headed blathering step-child of the Java community. This can be seen in the Java/Hudson debacle.

While I like the passion and the strength that the Cassandra mailing list has, I'm not sure that it's a NoSQL database or that it will continue to thrive as it has, although I really like these enhancements. They are worthy.

PS: I have no interest in learning thrift.


seven in seven?

I learn a new language at least once a year. It's just something that I have tried to do since I started taking my profession seriously (1988-ish). Recently I started to get the itch to learn a new language and it did not take long to select one.

I had been working with ZMQ (ZeroMQ) for a while and luckily for me they have example code in a number of client languages. Since ZMQ is implemented in C, they have plenty of C examples but curiously enough all of those examples have Lua versions too. The remainder of the examples vary from language to language.

I do not select languages because of the geek factor or the cool factor but for their ability to shorten the development cycle, the tools they provide, community support, the development pool, community activity and viability in business. And using these criteria I had initially dismissed Lua.

For example, there have not been any releases or patches in over 5 years even though Lua is the scripting language used in WoW (World of Warcraft). According to GitHub, Lua is not in the top 86% of the languages stored there. The community seems to be very protective and a little snarky. And the origin of the language is based on some misguided protectionism on the part of the Brazilian government. And finally, performance.

So in the last 24 hours or so a couple of things have changed. First of all, the TIOBE Programming Community Index released some new numbers that suggest that Lua is making moves, although the sudden moves make me suspicious. Secondly, snarky people never really bother me. Next, WoW has a different sensibility when it comes to application correctness than banking applications.

Of course, nothing is going to change the origins of the language and I'm not sure how crazy I am about the fact that it was developed blindly but who cares. The language and its tools compile and install more simply and easily than Erlang. And to respond to a tweet about concurrency primitives:

for n=1,NBR_WORKERS do
    -- give each worker thread its own random seed
    local seed = os.time() + math.random()
    workers[n] = zmq.threads.runstring(nil, worker_task, self, seed)
end

I cannot say much about the benchmarks except that they would suggest that the language, in this testcase, is exceptional. And so even if the language does not support IPC for itself this mechanism might be even better for all the reasons one might use an MQ.
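For anyone who does not read Lua, here is a rough Python analogue of the loop above. It is only a sketch under a few assumptions: the value of NBR_WORKERS and the body of worker_task are made up for illustration, and the workers report back through a standard library queue rather than ZMQ sockets, so it shows the "one seeded worker per thread" shape and nothing about ZMQ itself.

```python
import queue
import random
import threading
import time

NBR_WORKERS = 4  # assumed value; the Lua snippet does not show it


def worker_task(worker_id, seed, results):
    """Hypothetical worker: seeds a private RNG and reports one number."""
    rng = random.Random(seed)
    results.put((worker_id, rng.randint(1, 100)))


def run_workers(count=NBR_WORKERS):
    """Start one thread per worker, wait for all of them, collect results."""
    results = queue.Queue()  # stand-in for the ZMQ socket in the Lua version
    workers = []
    for n in range(1, count + 1):
        # same idea as the Lua seed: wall clock plus a random fraction
        seed = time.time() + random.random()
        thread = threading.Thread(target=worker_task, args=(n, seed, results))
        thread.start()
        workers.append(thread)
    for thread in workers:
        thread.join()
    # every worker put exactly one item, so drain one item per worker
    return sorted(results.get() for _ in workers)


if __name__ == "__main__":
    for worker_id, value in run_workers():
        print(worker_id, value)
```

The queue keeps the workers from sharing state directly, which is the same reason one might reach for an MQ in the first place.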

So for the time being I think that Lua is still on the short list and as soon as I break free I'm going to take a much closer look.

Monday, June 6, 2011

The New Desktop

INTUITION: The new desktop is going to be an iPad or something based on iOS.

We have yet to see a virus, trojan or malware attack against an iOS device. Of course Apple has been singing the praises of OSX for years on the basis that it has not been attacked or penetrated (true or not). So for a moment just assume that an iOS device is as impenetrable as Steve Jobs would have you believe.

(I'm in my happy place)

So there are a few adjectives that I would use to describe the iOS devices:

  • secure

  • app-liscious

  • cloudable

  • mobile

  • inexpensive

  • self-destructible

  • accessible

The device is secure. It needs to be connected to a PC that has iTunes installed. The iTunes application requires an Apple ID. And somewhere in there is a chain of custody that links the device to the user... by everything short of a DNA scan. And as an application designer I know that each application is sandboxed, meaning that no application can access the data of another app.

Speaking of apps. There are plenty of them. The number of apps is significantly higher than its closest rival's. There are basically 3 types of apps: apps that you "buy" from iTunes; those that your enterprise installs; and third, close to the second, apps that you write yourself. Apps can access local data but the current thinking seems to suggest cloud computing is the way to go.

Cloud computing is such a misused term, in much the same way that people misuse .NET (this is a topic for another day). Cloud computing has come to mean that the local device is but a proxy for the interaction with the application that is running on remote computer(s). Cloud computing also implies that there is a lot of shared and distributed computing for storage and computing. (Google Docs is a good example of cloud computing and Dropbox is a good example of cloud storage.) So it's not enough to just say "cloud".

(as of this moment the WWDC keynote is scheduled for tomorrow)

The iPad has some battery life. We are advised that the device should last 10 hours. That's amazing. As a desktop replacement batteries are not really needed, but they make dealing with power outages much easier. The devices are easy to take to meetings, presentations, and airplanes (gotta love those seat backs). In a disaster all you need is WiFi or 3G, which makes traveling and setup so much easier. One of the best mobility features is that all you need is a docking array for all of the devices... then it's a pick-your-desk when it's time for you to put in your shift.

I also like the cost because it's basically inexpensive for what's inside. All you need is something that is fast enough to render whatever GUI you need; the rest is fluff. It may be slightly underpowered for some complex multitasking and there are some issues like keeping some sessions open (like comet) but overall it's workable.

And if you lose it or it is stolen it can be taught to phone home and/or self-destruct. Granted this is part of the security model but it's also part of the bigger enterprise strategy.

These devices are accessible and available everywhere. They are in a majority of countries in the world and they interoperate with Macs and PCs. Because it is a tablet it works for any language, although you can also use a Bluetooth keyboard.

iOS is one heck of a platform. I only wish the screen was bigger to replace my desktop completely and that I had an IDE that let me write code. The former is not likely to happen but the latter is on its way and should be here soon enough.

PS: but something has to happen to these batteries. One week ago a single charge would last me 24 to 48 hours depending on the usage. But now that I have installed Twitter, Facebook and LinkedIn the phone does not last 6 hours.


Hello world!

This is just one of the many first posts I've created today, from several Twitter accounts to ongoing Facebook customizations and trying to get the OTJA going with some strong alliances. Hass is the manager's manager and I am here for my technical intuition. I'm also looking for some additional contributors that complement the team.

This is not necessarily going to be a happy-happy house but one that produces some quality discussions... as I've discovered recently;

not much out there is actually fact.. in fact it's mostly subjective opinion.

So here's to hoping that there is room out there for my intuition and that I/we can convert our thoughts into something useful.



another bad day for open source

One of the hallmarks of a good open source project is just how complicated it is to install, configure and maintain. Happily gitlab and the ...