Saturday, January 28, 2012

REST semi-realtime transactions


The freelance pattern implemented with TornadoWeb and ZeroMQ.


I recently implemented one of the reliable broker patterns as described by the ZeroMQ guide. It's something very similar to beanstalkd but left to the reader to implement. That in itself is not a bad thing, but it is more code to design, write and test; and if you had the budget to hire these guys directly you would get the best broker money could buy. But how reliable is this model, really?

I'm not a big fan of the broker model. It's a lot of extra code to write for the broker itself, and the broker is a single point of failure. Then there is the error handling, as the client and worker negotiate the status of a transaction only to renegotiate it when the broker fails. And then there are all those places where transactions can queue up, and all that code that gets written that does not need to be (the crux of this article).

In a brokerless model each client connects to each server (many to many). In a traditional socket implementation that would not be possible, but it is with ZMQ (read the guide). So a user app can connect to more than one server at a time and the client will "fan-out" each send() to the next server:
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.setsockopt(zmq.HWM, 1)           # queue at most one message per endpoint
socket.connect('tcp://127.0.0.1:5555')  # ZMQ endpoints use tcp://, not http://
socket.connect('tcp://127.0.0.1:5556')
socket.connect('tcp://127.0.0.1:5557')
. . .
socket.send('a message for you')        # REQ expects a recv() before the next send()
socket.send('a message for you')
socket.send('a message for you')

What is going to happen here is that this code will send one message to each of the servers, assuming there is an actual connection, because the socket defines multiple endpoints. It's all very orderly and as expected.

The documentation talks about only round-robining active connections... sadly, a call to connect() without a matching bind on the other side is still considered a valid connection, so that endpoint would still be handed a transaction without actually delivering it to the server. Meaning that some transactions are going to be delayed; just how long depends on the restart time for the downed server.

So on the upside... when everything is running smoothly, the transactions are going to be distributed nicely and each server will be given some work to perform. The workers are still standard userspace applications that do not need any special threading or processing. Just bind to a socket endpoint and wait for incoming work. Do the work and send a response.
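A worker in this model can be as small as the following sketch (a minimal example, assuming pyzmq and one of the port numbers from the client code above):

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REP)
socket.bind('tcp://*:5555')        # each worker binds its own endpoint

while True:
    work = socket.recv()           # block until a client fans a request our way
    socket.send('done: ' + work)   # REP routes the reply back to the requester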

When things go wrong, or when you restart a server manually, that endpoint address is still in the client side. Should a transaction be headed that way before the connection has been reestablished, then that message will block until that endpoint reconnects. If the server is running via daemontools then it should restart any second; the transaction in the queue will be scooped up and processing will resume. The number of transactions queued per connection depends on the high water mark setting.

I say '1' transaction in the queue because we set the HWM (high water mark) to 1 when creating the connections. This is probably a good setting for realtime systems where losing transactions in an invisible queue is the least desirable event. You might also be able to add NOBLOCK to the send() call to get some other actionable events. It really depends on the application's tolerances.
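If the application would rather see an immediate error than let a message sit in a blocked queue, the send can be made non-blocking. A minimal sketch, reusing the socket from the example above; the retry/failover policy here is an assumption left to the application:

try:
    # NOBLOCK raises instead of queueing when no connected peer can take the message
    socket.send('a message for you', zmq.NOBLOCK)
except zmq.ZMQError as e:
    # typically EAGAIN: decide here whether to retry, fail over, or drop
    print('send would block: %s' % e)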

At first I did not like the idea of losing the transaction(s) but I'm warming to the idea that the codebase will be smaller and possibly more reliable overall.

'take me off the list'

A couple of years ago there was a "do not call list". Well, that did not work. Every new fly-by-night company ignores those laws anyway, and by the time law enforcement or the attorney general investigates they are long gone. So what is a person to do, especially when they call several times a day, when the babies are sleeping or when my wife and I are sleeping?

Also, these new phone systems are pretty smart. Their voice recognition is really good and their artificial intelligence or workflow is even better.  I've been fooled 2 or 3 times already but I think I have the magic now.

Hmmm... I tried a few things...

  1. Talking and talking and talking.... it did not work, the machine has more patience than I do. - FAIL

  2. Every swear word I could think of... the machine just ignores me and asks another question like Eliza did. - FAIL

  3. Answering in the negative to every question... but either it would keep trying to sell me something or I would receive another call in a day or two. - FAIL

  4. Answering in the affirmative to every question... but that did not work either. It told me someone human was going to call me... but it kept asking questions; probably based on some law that required an electronic/audio signature. - FAIL


And that's when I figured it out, all by accident. I happened to say 'take me off the list'. That's when the machine stopped talking and responded that it was going to take me off immediately.

Voila. So the next time I hear one of these stupid recordings that's exactly what I'm going to say.

Friday, January 27, 2012

Proper use of MQ designs

Bus, as a term referring to hardware/software components, has been around long enough that many noobs have no idea of its origin, how to use it properly or when to use it.

In modern computing the bus has its origin in hardware, but it was not always like that. Engineers went back and forth between directly connected components and bus architectures until the 1980s, when IBM introduced bus-master architecture in the PS/2. Things remained stagnant for a few years until there was yet another resurgence of directly connected hardware.

Memory I/O performance was increased by moving cache directly into the CPU and connecting the CPU to RAM via DMA-type access.

Disk I/O performance was increased when the disk controllers were allowed to talk directly to the system's RAM.

So it's no wonder now that MQs are becoming easier to deploy that they are becoming the connective tissue between components rather than direct connections or API calls.

Old school programmers remember MQs like IBM's very well: when main memory was very expensive it was cheaper to page services (SOA) in and out of main system memory when an actual event was there and ready for processing. The complete static application codebase could not fit in memory, so implementing smaller services was practical.

MQs were not implemented or made popular on PCs until recently with the advent of J2EE, SOA, and the notion of an application bus. The idea being that services could be distributed across a single machine, a cluster of machines or a WAN cluster of machines.

One of my earlier recollections was when I worked for IBM on the AIX team, on the AIX microkernel.

Now we are experiencing a renaissance of a sort where software/system architects are installing MQs in every corner of their application design. I'm clueless as to why. I'm constantly wrestling with my own designs, wondering where the value is and where the inevitable costs are. One thing is for sure: there are always tradeoffs. But here's the thing, and for me it's starting to become a rule instead of a thing.
If it takes more time and/or more code to MQ an event [or the event payload is relatively large] to a service, then the code that makes up that service should be statically linked rather than treated as a service, because you are just generating heat and not money.

For example, in modern programming languages the first target application is typically hello world, and in most MQ implementations the first service is typically echo or add. Sadly, if the rule were implemented then these two services would never be written and we would have to think about more complex services to demonstrate the design (echo and add are so simple that they fool most managers).

In the simple case:



There is very little overhead here. The contract between the client and the add class/service is clear. The code is small enough, and the data is small enough, that statically linking the code makes the most possible sense. Consider that the work the service performs is but a few assembly instructions once you get past the instruction stack and memory fetches.
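For contrast with what follows, the entire statically linked case fits in a few lines; that is exactly the point:

def add(a, b):
    return a + b

# a direct, in-process call: no stub, no marshaling, no network hop
result = add(2, 3)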

But if you've had a sip from the bug-juice then you see the world this way:



In this case the client thinks it's calling the add function locally, and the add function thinks it's being called locally, but the fact is that may not be the case. The remote add stub and the MQ may be on the same machine, or even the same CPU core in the same box, but they might not be.

In fact the stub has a certain amount of overhead just because it exists. Then there is the communication overhead (a remote connection across a WAN can take from 100 to 300ms on a good day), and the marshaling of the data from one system to another has some overhead too. (Back in the day of the original RPC there was a stub generator that would simply massage the endian-ness of the data so that it was cross platform; now things are much more complicated.) The actual add() is only going to take one or two CPU clock cycles, which cannot even be measured in ms. So for the sake of some grandiose style guide we have added huge amounts of overhead.
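A back-of-envelope way to see the ratio (not a benchmark; the 100ms WAN figure is the estimate quoted above):

import time

def add(a, b):
    return a + b

start = time.time()
for _ in range(1000000):
    add(1, 2)
local_cost = (time.time() - start) / 1000000.0  # seconds per local call

wan_round_trip = 0.100  # the low end of the WAN estimate above
print('remote/local overhead: %.0fx' % (wan_round_trip / local_cost))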

In a recent design I implemented an MQ/service-bus in order to handle the impedance mismatch between incoming connections using epoll, a single-threaded web server, and the need to handle high transaction volumes.



In this case the service does a lot more than just add two numbers. The message from the end-user client needs to be parsed (1), some data needs to be decrypted (2), some initial decisioning needs to be performed (3), and then the transaction needs to be reformatted (4) and directed to a 3rd party for processing (4 & 5). When the 3rd party completes the transaction, the response needs to be parsed (6), a few more decisions made (7), and then a response is assembled for the end-user (8). And then the response is sent back to the client through the same path (9).

What makes this different than the add() service is that it's performing real work in the form of the service. If I took all of the work-units that the service performed and split the service into its sub-parts, then I could potentially have hundreds of services, each with 100-300ms of communication overhead. I've identified 9 possible steps in the transaction... and at 100ms per step that's almost a full second.
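Making the arithmetic explicit (using the low end of the latency estimate above):

steps = 9
overhead_per_step = 0.100            # seconds; the 100ms low end
print(steps * overhead_per_step)     # 0.9 seconds of pure communication overhead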

I'm trying to find a silver lining for a fine-grained SOA but I cannot. The work performed by the service is sync not async and therefore the product of the non-essential MQ is heat.

Two examples:

1) Google is working on Go, a new programming language. Rob Pike and co. have been describing Google's code base; it covers several programming languages and is considered a monolith. Go is best used when it's statically linked; I do not recall if it even supports dynamic libs.

2) Depending on your magnification, the study of MQ starts with the hardware and some sub-components, then firmware, then the OS and its device drivers, and continues to build out. This looks a lot like a Mandelbrot image of sorts.

Our predecessors selected what was a service and what was a direct connect very carefully. Mostly based on ROI and the cost of gold. The same decisions are true today.

 

Wednesday, January 25, 2012

Dynamic Languages and PCI-DSS

Some security experts, including myself, thought that implementing financial software using dynamic languages would create a security threat for the "company" or the account holder. However, as I sit here this morning contemplating an open source payment platform delivery system I realize that it's a silly hypothesis.

Forgoing all of the traditional attack/fraud vectors, I'm thinking about the code. The PCI-DSS covers the production hardware and database(s), but it also covers the developers' computers, build machines, staging and QA, and the code repository. The "processor" is expected to treat them all securely and equally. (The highest priority goes to the encryption keys and devices.)

So if an attacker can get to any of these systems and inject code, then you have a bigger problem than whether the code was Python, Perl or Ruby. Of course, since Java can be executed anywhere it can also be compiled anywhere, and reverse engineering Java and then recompiling is no more or less difficult. Injecting code into a Java or C based system is more complicated, but if the attacker is already in... regardless of the programming language, you're cooked.

As I continue to consider my open source project it's just a matter of selecting the right dynamic language for it. I want to be productive enough that the extra time can be spent on physical security instead of the false hopes of obfuscation.

Tuesday, January 24, 2012

How to find programmers

Inc is running an article that finally makes sense of internet hiring. The leadership at Pulse, the company named in the article, told its programmers to start blogging. As a result they have started attracting attention from all corners. This certainly makes more sense than speed dating, code scraping, social ranking, etc... And of course it helps to have Inc do a story on your business.

First, you are attracting people who are interested in the company and the work being done. Second, you might be opening a dialog with the candidate, via blog comments, before they are actually a candidate. And finally, it likely is not going to cost you anything more than existing methods, and it's certainly less than professional recruiting services.

Pulse gets a +1 from me.

Monday, January 23, 2012

Hirelite - speed dating for jobs

Hirelite is another one of those last-minute entries in the fly-by-night job-site-of-the-day websites. There is no doubt that the likes of Careerbuilder, Monster and TopJobs have lost their luster, but this is starting to look like a clown car at the circus. Just how many of these so-called job search companies are there?

That was rhetorical. Don't answer it.

While it's true that there is a social aspect to the professional hiring process, it's certainly not akin to dating. The Bachelor is on TV right now and I do not see a resemblance to the hiring process there. When you date someone it's usually because there is intent toward a level of permanency; in an employment situation you're heading along a different vector. In the 1980s it was commonplace to ask "where do you see yourself in 5 years".

Anyway, as an employer I have a responsibility to find the best candidates through responsible means. As an employee I want to be hired by companies that aren't trendy and show common sense. A 5 minute speed session is no way to find a mate and certainly not a valid way to find an employee.

Hirewolf makes no excuses

Hirewolf is the latest in a series of employment service providers that promise to filter and test potential candidates in the hopes of getting a "golden ticket" to employment. (recently I wrote about GitHire)

What makes these guys different is that they make no excuses for the decision making process. They are going to decide the candidate's fate by:
"We will choose whichever project strikes us as most beneficial to the open source community."

This is probably the most shameless description of the 1% solution that I've ever heard or read. Just in case I've never said it... I do not want to be scrutinized by any of these services. These companies are not bound by the same rules as the HR departments of proper companies, which are prohibited from disclosing anything about your performance. All they can say is the dates of employment, and maybe your title.

So both of these companies are going to scrub your social identity and produce a score for a prospective employer... a score that you may or may not know about, and that even if you knew about you might not have a chance to defend or refute.

Right now social aggregation does not have a passive place in the hiring process. The candidate must be actively involved... let the games begin.

Domain Specific Framework

I'm grateful that Wikipedia does not have a reference to something called a Domain Specific Framework. So I get to define it here.
A Domain Specific Framework (DSF) is a set of hardware, languages, libraries, and best practices that make up a software development environment for programmers implementing applications larger than "hello world" and of median complexity that do not require too much specialization or edge-case libraries.

A perfect example of an ideal DSF might be Xcode for iOS application development. What makes it ideal is that it is self-contained and has most everything an iOS developer is going to need to implement, test and deploy an application for an iOS device. A second good example would be JEE (fka J2EE). (Grails, Rails and Django are good examples too)

Where I typically get derailed is when super heroes start with a base programming language, maybe having implemented version 1.0 of the application, and then start stapling on libraries in order to develop a DSF for the developers that have been added to the team. These might be called Custom Domain Specific Frameworks.

There is good reason for avoiding the DSF style of application development if an application is going to have a fixed and manageable level of complexity. But when you know that the application is going to expand beyond that in a short, albeit relative, period, then a DSF might be a better way to get started.

The justification for a DSF like JEE and Rails is huge. These and other DSFs are a basic common denominator for developers. It sets the bar for the conversation and for the development standards. To say that you know Ruby/RubyGems or Java/Maven and can read English is not enough to say that you know Rails or JEE development.

I cannot believe I'm about to suggest this but... .NET means something. So does Rails, JEE, Django, and so on. To say "we use java and a hundred other jars" will only add friction to your development lifecycle, whether it's hiring, training, building, testing or deploying. Hey, I get it. Not all of the DSFs out there use best-in-breed libraries. That's a completely different issue.

NOTE: The one important thing I forgot to mention is that when there is commercial potential for a DSF then it's likely that there is going to be a commercial version, as there are several JEE vendors (IBM, Oracle). Glassfish is an open source late entry but it is certified, and I do not remember who owns Jetty (probably RedHat). The other thing is that there are plenty of books and good online documentation. You cannot say the same for a lot of custom DSF(s).

Forced Pomodoro

I was in the middle of writing and testing my CRUD-fest article when it occurred to me that the evaluation versions of IntelliJ, PyCharm and RubyMine might actually be better than the paid-for versions. Granted, it's not English to use the eval version and not pay for it, but an American might split hairs a little while longer.

For the record, I am preparing a purchase order but I might continue to use the eval version. That's because the eval version pops up with a warning every 30 minutes, and every 30 minutes you have to restart the IDE. Now, I'm not a fan of IDEs because they hide so much from you in terms of the full stack; however, they are pretty good for productivity when you want to concentrate on code and not framework. I'm also not a big fan of using Java or the JVM unless it's packaged with the IDE, but that's off topic.

The fact that I have to restart the IDE every 30 minutes means I have a better concept of time. I have a moment to catch my breath in order to focus on the next 30 minutes. The only thing I'm not certain of is whether it's going to close my files before it turns itself off. All in all, even after I pay for it I might have to continue to use the eval version.

 

Saturday, January 21, 2012

CRUD-fest : grails, rails, django shootout

The mission is to deploy a CRUD implementation in all three frameworks by reverse engineering my schema from an existing Postgres database which I will construct with raw SQL. Later I would like to add some data to the tables, so let's see how it handles some ETL (extract, transform, load) in the form of a CSV file into some REST calls that I'd implement, or some other type of messaging; a sketch of that idea follows.
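As a rough sketch of that ETL step (hedged: the endpoint URL and CSV layout are hypothetical until the CRUD apps exist):

import csv
import json
import urllib2

# hypothetical endpoint exposed by one of the CRUD apps
URL = 'http://127.0.0.1:8000/api/message_field_dictionary/'

for row in csv.DictReader(open('fields.csv')):
    req = urllib2.Request(URL, json.dumps(row),
                          {'Content-Type': 'application/json'})
    urllib2.urlopen(req)  # one REST call per CSV row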

What I did not do! I think O'Reilly has the most comprehensive map of the history of all computer programming languages; however, GitHub has a list of languages that would seem to be current or relevant. Granted, some of this, to be effective, would mean investigating popular frameworks within the domain of languages. Well, of that semi-complete list from GitHub I picked out this set: php, go, lua, haskell, erlang, scala, clojure, perl, javascript, rhino, nodejs, iOS, Objective-C, C++, C, Pascal, Pro-SQL, CoffeeScript, OCaml, Scheme, tcl, Smalltalk, Visual Basic. I think they are the most relevant. As it goes, however, either they do not reverse engineer schema from a PG (Postgres) connection like most ORMs, or they do not have web or other application frameworks for a user to interact with the data, and many do not have IDEs or version managers the way that Ruby and Python do. (I cover the IDE topic later.) I think I picked the sweet spot of frameworks to test and skipped the ones that would distract me from the task.
Code Wars: PHP vs Ruby vs Python – Who Reigns Supreme [Infographic]

Let's start with the schema design. The actual SQL is here. The "message" represents an ISO8583 message. You can follow the link to read more about the message. What's important to know is that it represents the standard message used between certified credit card acquirers and their associations like Visa, MasterCard, Amex, Discover and many others. It also represents the message used between the associations and their issuing processors. The message format is also reused by many POS terminal software vendors as well as gateway and technical acquirers/processors. Each association has a slightly different implementation depending on their specific needs; however, most of the fields have common names and usage.

When building an endpoint or a gateway, or any point in between, there is a need to support ISO8583 and a bigger need to test the endpoint. As part of the testing phase it's important to build a test harness that can generate the necessary test transactions. Depending on the application's position in the network it will have different testing needs. One thing for certain is that the messages and their contents need to be modeled, testable and repeatable. One huge challenge is TDD (test driven development) and another is simple regression testing. It's my personal belief that a well defined toolset could be deployed in such a way that vendors across the board could contribute data and code for regression testing, thus reducing the load on everyone. PS: while this mission is ISO8583 there is no reason why a request/response transaction could not be JSON, XML or S-exp; it's a general enough schema.

  • TABLE: message_field_dictionary - this is a list of the field names and their formats. Nothing else.

    • FIELD: id (PK)

    • FIELD: field_id (FK)

    • FIELD: field_name

    • FIELD: field_description

    • FIELD: field_format (i.e.; 'YYYYMMDD')

    • FIELD: data_types (a, an, ans)

    • FIELD: default_values (i.e.; '1', '2', '3', 'abc'...)



  • TABLE: message_request - this is the set of fields used in a particular request transaction. It is also possible that this represents the incoming request pattern. If the pattern does not match then that's a separate issue.

    • FIELD: id (PK)

    • FIELD: test_case_id (FK)

    • FIELD: field_id

    • FIELD: request_type

    • FIELD: field_value



  • TABLE: message_response - this is the set of fields in the response generated from the request. This might also represent a response message based on the incoming request pattern.

    • FIELD: id (PK)

    • FIELD: test_case_id (FK)

    • FIELD: field_id

    • FIELD: response_type (i.e.; absent, required, optional, conditional)

    • FIELD: field_value (i.e.; a value, regex, or a combination set, private function or other field_id)



  • TABLE: message_test_cases

    • FIELD: id (PK)

    • FIELD: test_case_id (FK)

    • FIELD: short_name (FK)

    • FIELD: description

    • FIELD: expected_results

    • FIELD: elapsed_ceiling

    • FIELD: is_active

    • FIELD: group_name

    • FIELD: sub_group_name



  • TABLE: message_test_results

    • FIELD: id (PK)

    • FIELD: test_case_id

    • FIELD: started

    • FIELD: finished

    • FIELD: elapsed_time

    • FIELD: results

    • FIELD: request

    • FIELD: response

    • FIELD: errors

    • FIELD: trace



  • TABLE: test_cards - I decided to put the test cards in a separate table because if contributors provided test transactions the actual card numbers and magstripes would be considered confidential data... and the associations do not want anyone recording that info with one possible exception.

    • FIELD: id - (PK)

    • FIELD: card_number (FK)

    • FIELD: serial_number (FK)

    • FIELD: expiration_date

    • FIELD: issue_date

    • FIELD: street

    • FIELD: zipcode

    • FIELD: pin

    • FIELD: atm_pin

    • FIELD: CVV

    • FIELD: CVV2

    • FIELD: track1

    • FIELD: track2

    • FIELD: track3

    • FIELD: reset_balance

    • FIELD: is_decrement

    • FIELD: actual_balance

    • FIELD: open_to_buy




The schema is self explanatory. I did not create any real indexes. I'm not certain (right now) whether I'm going to create any FKs or referential integrity; it depends on how much reverse engineering the different frameworks are going to execute. One thing for sure, this is not intended to be a lesson in DB design. Maybe another time. FKs might be required in order for the reverse engineering to work properly, especially if I use tables to populate pulldowns and select lists.

The IDEs I decided to use were RubyMine, PyCharm and IntelliJ. It is purely by coincidence that I decided to use this family of IDEs from JetBrains. (For the record I'm currently using the demo versions. I'm hoping that the licenses do not expire before I finish this article... this paragraph was written before coding began.) What makes them interesting is that they support Django, Rails and Grails out of the box. After agonizing over it, IntelliJ was the only reason why I included Grails. Java does not have a version manager like Ruby or Python; however, you can get there by collecting your jar files in a single folder alongside the JDK you're using. And since many binary distributions of the JDK are in version folders, it makes resetting the CLASSPATH and PATH easier... but still more manual than the Ruby and Python versions.

RubyMine heads up: when I originally installed RubyMine I had not installed RVM. I found a post from the folks at JetBrains about version 2.0.2 where they probably added RVM support. Anyway, they said it just worked; that after a restart RubyMine would give you access to your tools. That was not the case. I had to take one extra step: I had to go into the preferences and navigate through the "Ruby and SDK" page. I also navigated through the gem sets for good measure. Now when I created my project I had access to my Rails version, previously unknown. I had a similar problem with PyCharm and its support for VirtualEnv, but I will have to verify it with Django. (Shame on me. My desktop virtualenv did not have Django installed. I will likely have to do a complete install based on my notes, which were originally for my virtual machine and not my desktop.)

Getting Started


Now that I've crawled through the minutia of project preparation and installed IntelliJ, PyCharm and RubyMine, it's time to create the projects. I'm calling them crud_fest_rb, crud_fest_py, and crud_fest_j.

Creating an empty Rails project and an empty Django project was pretty simple, especially after all the setup I've done in preparation. The one observation I'll make is that there are a lot more artifacts in an empty Rails project than there are in a similar Django project. The Grails project has a lot more artifacts than that, and since there is a compilation step it takes a lot longer to get started. One thing that bugs me about IntelliJ is that it starts the browser to a default page once it's ready to run.

This project was not meant to be a JetBrains tutorial; however, I'll mention a few more things. Rails started right away. Django required the user to enable the admin function, update the settings file to point to the proper DB, and then manually execute the 'manage.py syncdb' command in order to create a default admin user. This step will be required later in order to sync the db schema to the model. Rails and Grails are still under investigation.

One nice thing about the Rails and Django projects is that they respect SQLite3. Not that Grails goes out of its way to reject SQLite3, but the support is hard to come by. This means that when I put my SQL together it will need to support both SQLite3 and H2. Which will probably work, but what a pain. I suppose I could use Postgres, but there is nothing easier than a 'cp' command to reset the DB to its default. And if this project is successful, then copying the output table to the target application means that the SQLite DB file is now the config file.

The Schema


I have created the SQL. I was not going to embed the code directly, but the Gist is here. It's only six small tables and the foreign keys are few. There are a few constraints which should be removed when the constraints are fully represented in code instead of schema. (Keep in mind that when calculating performance, things like O(log n) no longer make sense when there are cascading reads based on constraints. And frankly it does not make any sense to have the constraint modeled in code and SQL at the same time.)

Import the Schema


... into the project is the next step.

Django


PyCharm could execute the following commands but currently it feels better to execute them manually from the command line.

The first step is making sure that the DBs are configured properly in the settings.py file. In a recent version of Django the developers made it possible and easy to support multiple, different databases simultaneously. That means I could connect to one DB for one set of actions and another DB for a different set; there are plenty of interesting use-cases here. So let's configure the DB:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3', # or 'postgresql_psycopg2', 'postgresql', 'mysql', 'oracle'
        'NAME': '/tmp/crud_fest.db',            # path to the database file when using sqlite3
        'USER': '',                             # not used with sqlite3
        'PASSWORD': '',                         # not used with sqlite3
        'HOST': '',                             # empty string for localhost; not used with sqlite3
        'PORT': '',                             # empty string for default; not used with sqlite3
    },
    'messages': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': '/tmp/crud_fest_py.db',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}
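With the extra 'messages' entry in place, any ORM query can be routed to the second database explicitly. A sketch (the model and import path come from the admin example later in this post):

from myproject.myapp.models import MessageFieldDictionary

# route this query to the 'messages' database instead of 'default'
rows = MessageFieldDictionary.objects.using('messages').all()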


The second step is to make certain that the admin functionality that we previously enabled (and now stored in the default database: crud_fest.db) has been sync'd properly. (You'll need to answer some questions about the admin user, including the username and password.)
/Users/rbucker/git/flafreeit/crud_fest_py/manage.py syncdb

Now that the admin tables have been created, you'll need to create the actual crud_fest_py tables.
cd ${HOME}/git/flafreeit/crud_fest_db
sqlite3 /tmp/crud_fest_py.db <./setup.sql

And then the last step is to dump or reverse engineer the table(s) into a models.py file.
/Users/rbucker/git/flafreeit/crud_fest_py/manage.py inspectdb > /Users/rbucker/git/flafreeit/crud_fest_py/test_config/models.py

Looking at the models.py file you'll see something like this (looks like I have some trimming to do; the admin tables were included):
# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#     * Rearrange models' order
#     * Make sure each model has one field with primary_key=True
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin.py sqlcustom [appname]'
# into your database.
from django.db import models

class MessageFieldDictionary(models.Model):
    id = models.IntegerField(null=True, primary_key=True, blank=True)
    field_id = models.IntegerField(unique=True, null=True, blank=True)
    field_name = models.CharField(unique=True, max_length=25, blank=True)
    field_description = models.TextField(blank=True)
    field_format = models.CharField(max_length=200, blank=True)
    data_types = models.CharField(max_length=200, blank=True)
    default_values = models.CharField(max_length=200, blank=True)

    class Meta:
        db_table = u'message_field_dictionary'

This is just a sample of the tables that inspectdb generated... because there is one final step. Now that we have a models.py file with the individual schema, we need to tell Django about the tables and the individual fields that need to be editable. There are some shortcuts; the online docs are really good. So we are going to create an admin.py file like this:
from django.contrib import admin
from myproject.myapp.models import MessageFieldDictionary

class MessageFieldDictionaryAdmin(admin.ModelAdmin):
    pass

admin.site.register(MessageFieldDictionary, MessageFieldDictionaryAdmin)

Once this last step is completed you need to launch the Django server and navigate to the admin site with your favorite browser. All of your tables should be there. You might still need to customize the widgets, but this is the place where we stop.
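Launching the development server is one command (same project path as the earlier steps):

/Users/rbucker/git/flafreeit/crud_fest_py/manage.py runserver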

One final note. Sadly, my decision to use SQLite means that I might have to do all of this over again. It seems that the foreign keys have not been incorporated into the results of the 'inspectdb' command. There is a pragma in SQLite that enables FKs, but FK support has to be compiled in beforehand. There was also at least one multi-field constraint that does not appear.

But this is a good place to stop for now.

Rails


The first thing I noticed is that there is no code in Rails for reverse engineering a legacy database the way that Django does it. So I had to install a missing gem:
gem install rmre

Then I had to reverse engineer my DB.
cd ${HOME}/git/flafreeit/crud_fest_rb
rmre -d /tmp/crud_fest_py.db -o ./app/models/

After the command completed I was returned to the command line. There were no error messages, so I assume that it completed OK. I looked in the ./app/models/ directory and noticed a handful of new files, 1:1 with the table names. Here's an example:
class MessageFieldDictionary < ActiveRecord::Base
    set_table_name 'message_field_dictionary'
end

This is a little hinky because none of the field names or types have been included. After doing some searching I found that this is OK and that ActiveRecord will fill in the blanks. I don't know if I buy that. I like that Django fills in the holes, and this sparse programming... *sigh*.

There is another command that is interesting:
rake db:schema:dump

This will dump the schema into the file db/schema.rb. It represents the complete schema. I suppose that this code could be copied into the model files, but for the moment this is not required... I think the db:migrate command will regenerate the db/schema.rb file when it completes.

Another caveat here is that it is possible to have multiple databases configured in the database.yml file. The difference, however, is that ActiveRecord needs to know which database goes with which definition, so there is some manual work to be done here. There are some simple Google searches you can execute and most of the results make perfect sense. It's a little beyond the scope here, even though I described the Python version.

Grails


Grails supports multiple databases in its DataSource.groovy file. Since I'm working with Grails 2.0.0 there is a possibility that the latest Hibernate is included. Hibernate is the ORM that Grails uses to communicate with the DB, but the tool for reverse engineering needs to be installed:
cd ${HOME}/git/flafreeit/crud_fest_j
grails install-plugin db-reverse-engineer

The output was pretty simple.
rbucker@rmac[crud_fest_j]$ grails install-plugin db-reverse-engineer
| Plugin installed.

... but I have no idea which version was installed. So we move forward for the moment. The next step is to locate the H2 command line version:
java -cp ${HOME}/lib/grails-2.0.0/lib/com.h2database/h2/jars/h2-1.2.147.jar org.h2.tools.Shell

You'll need to answer a few default questions. You can accept the default values for the moment.

Now that H2 is running and pointing to the same repository as the DataSource.groovy, we need to run the SQL to create the database tables as we did previously. (I had to make some changes to the code because there are some differences with H2.)

Now we try to do the reverse engineering... actually, it's not going to happen. I'm going to leave this up to the reader to complete. And if anyone wants to contribute, please, by all means. For the moment this is the end of the road for this project.

Conclusion


It is safe to say that I'm done with this project. While it seems plausible that I could reverse engineer a database using the Grails plugin, the amount of configuration required just amazes me. Spring is heavily dependent on XML config files and it appears that Grails uses some of each. One thing for certain is that Java has these huge namespaces everywhere, so even the slightest config requirement for reverse engineering is incredibly painful. I'm really surprised that the Grails guys did not do more Groovy scripting for this sort of thing.

Ruby/Rails, on the other hand, still required an outside gem in order to perform the reverse engineering. I'm surprised that with Rails 3 and 3.2 they never addressed the issue directly. And not to mention that the resulting models were still sparsely emitted.

Finally, Django seems to have gotten it right. It emits the code in its entirety. Getting to the CRUD is a simple matter of some manual labor which the user could script easily enough. My vote goes to Python as the all-around winner, with Ruby a close second. The Java code is on hold; maybe we can call this a "did not finish". The overall performance of the dependencies and the compile step make it a less valuable experience. And let's not forget the JDK version madness.

So for the time being I'm inclined to purchase a license for PyCharm and RubyMine just because this is where I'm going to be spending my time for a little while... and it's my money.

My Ruby Installation

[update 2012-01-22] A new project is underway: 'crud-fest-rb'. It's part of a new story I'm writing. I just tried to launch a new Rails project and I was blocked because I was missing some basic gems. They are now in the list below. The names are: jquery-rails, coffee-rails, sass-rails, uglifier. My empty RubyMine/rails-3.2 project is now running. The CRUD comes next.

[update 2012-01-21] I just published this article a few hours ago and I realized that I forgot some stuff. I forgot to mention that I need Twitter's Bootstrap project here too. So when I start working on my first project I'll need to import Bootstrap. I might have missed a few more things... I have been following Python and many of the Python libs longer than Ruby, so that makes sense. If you have any recommendations send them on. PS: TextMate is getting an update soon but I think I'm going to install and use RubyMine from jetbrains.com. It's like PyCharm, also from jetbrains.com, but for Ruby, and it supports Rails (hopefully Rails 3.2). Like PyCharm, RubyMine supports RVM out of the box. Nothing special required.

I'm not a great fan of Ruby. I was in the beginning, when I was first introduced to Spring. And it was not so much Ruby as it was Rails. The fact that it did all the CRUD one could want was a big deal. Now it's not that uncommon: Django offers CRUD for Python app developers and Grails offers CRUD for Java developers.

Reddit recently linked to a visual comparing Python, Ruby and PHP. There were some interesting observations about the number of programmers and projects in each vertical. That, combined with my new love for RVM, has given Ruby a new lease on life in my toolkit. (I have also worked on professional Ruby projects, but without RVM they were no fun.)

RVM is a big deal. Not unlike virtualenv for Python, RVM allows the programmer to install and configure his/her development environment in userspace. This means that many of the fears I had about creating my dev environment on my local hardware, version collisions and dependency mismatches, go away: all I really need to do is create a separate user directory on my laptop for each project. This way everything is nicely partitioned, and I can focus on backups rather than the crazy life of remote virtual servers for development. (There is still some value in those for demonstrations and client access.)

So let's set up a basic RVM install so that we can start our next project. Some of the installation is going to overlap with the Python installation (here). You can skip all of the userspace installation; the sudo installation is all you'll need. Keep in mind that this installation was all about a virtual server at RackSpace. If you are installing on a Mac or Windows machine your install will be different.

Install the default Ruby. I originally thought that I'd need to install at least one Ruby in the core OS, but it turned out to be unnecessary: RVM will install any version of Ruby for you. Once RVM is installed you can issue the command (rvm list known) and you'll get a list of the known rubies available. In fact I had to remove this package and start again because of some side effects down the road.
apt-get install ruby1.9.1

Install RVM
bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)

start up RVM... you can log off and back in, or you can look at the last line of your '.profile' and execute the source command
source "/home/rbucker/.rvm/scripts/rvm"

NOTE: I did some looking at the .rvm directory and I saw some things that did not make sense. It seems that many of the modules and Ruby artifacts were being installed in a folder labeled 1.9.1, presumably a Ruby version. I have restarted the install process several times now, and it appears that RVM is installing my version of Ruby (1.9.3-p0) as a set of patches over version 1.9.1, which explains the folder structure.

install a recent Ruby version
rvm install 1.9.3-p0

Make this version current (you will have to run this command every time you login)
rvm use 1.9.3-p0

If you want to make this version of Ruby the default, run this command (the --default flag will ensure this version of Ruby is ready each time you login)
rvm use 1.9.3-p0 --default

which rubies are installed and which is current
rvm list

install some gems in userspace. Note that RVM is taking over the gem installs. Lucky for me, Rails 3.2 was released a few days ago and now it's installing. I've elected to install Sinatra too, because it's a useful micro framework when Rails is just too much.
gem install rails
gem install jquery-rails
gem install coffee-rails
gem install sass-rails
gem install uglifier
gem install sinatra
gem install redis
gem install mongodb
gem install pg
gem install mustache
gem install fastercsv 
gem install iso8583
gem install sqlite3
gem install ruote 
gem install json
gem install sxp
gem install sexp
gem install zmq
gem install beanstalk-client
gem install rmre
gem install haml
gem install db-charmer
gem install mail
gem install activemerchant
...
gem update money rack rake sourcify sprockets

**There were some initial difficulties installing the 'pg' gem. It started off as a problem because the install would not complete. I realized that I was missing the Postgres dev files, so I installed that core package as the root user. When I installed the gem on my OSX machine it installed nicely. I'm not certain I know why or how the Postgres client was installed on my OSX machine, but it seems to have included enough code to compile the gem. MacPorts was not installed on either machine, or maybe it was corrupted, because it's not functioning and I'd swear that it was installed once before.
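On Ubuntu, the missing dev files are most likely the libpq headers; this package name is my assumption based on the symptoms, not something I recorded at the time:

sudo apt-get install libpq-dev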

**One other thing that is concerning me is the version number of the ZMQ gem. If the version number maps to the version of the main lib then they are mountains apart, and that's something I cannot afford. It's one of the reasons that I'm going down the path of RVM and native gem management. Some interesting notes: the version of the gem is 2.0.7, and when you run ZMQ.version() from IRB you get different results, so it must be checking the binary libs.

On my OSX
rbucker@rmac[usr]$ irb
1.9.3-p0 :001 > require 'zmq'
=> true
1.9.3-p0 :002 > ZMQ.version()
=> [2, 1, 10]

On my Ubuntu
rbucker@soldev:~$ irb
1.9.3-p0 :001 > require 'zmq'
=> true
1.9.3-p0 :002 > ZMQ.version()
=> [2, 1, 11]

install gems for building gems. I do not know anything about them except that I hope they work. I've installed all three because they should not collide. There are 4th and 5th options: some devs have general GitHub templates that they use, and you can always roll your own in a similar way. Personally I'm interested in these 3 projects because it might mean less editing for me.
gem install jeweler
gem install hoe
gem install echoe

At this point I'd create my bare bones project, install bootstrap just like the python version, and then start hacking my project.

Reference material: I really like pragprog.com. They seem to have the most current docs. Their PDF and epub ebooks are great and they keep me updated. (Rails 3.2 was just released and their eBook was updated just a few days later; granted, not much was supposed to have changed, but still.)

In the coming week I'll be writing an article on the CRUD-fest in django and rails as I start building my credit card association test harness. Stay tuned as this page will be updated as I add gems to my basic installation.

Thursday, January 19, 2012

socket adapter impedance mismatch

Netty 3.3.0 was released with some dependencies.

I'd like to try some Netty code, whether it's standalone or connected to Apache Camel, but when I downloaded the source I saw that I needed Maven2 in order to build it. So I started the install process for Maven2...

Yikes! I have no idea what licensing constraints I've entered into; why maven needs rhino and a large number of other libs.

Back in the day when Object Oriented programming was becoming popular, around 1983-ish, people gravitated to the private/protected/public guarding of methods and data. I'm not sure why that was, but I can guess it's probably ego. There was a point to it when code or libraries were distributed in binary-only form, but today open source has all but eliminated secret sauce and we typically use naming to identify usage, with documented examples and recommendations.

That said, the super-dumptruck needs to be replaced with a backpack approach. This means a more granular approach to installing dependencies and even interdependencies, but it's 2012 and we should be able to handle this.
rbucker@soldev:~/src/netty-3.3.0.Final$ sudo apt-get install maven2
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
antlr bsh bsh-gcj fop gcj-4.6-base gcj-4.6-jre-lib java-wrappers libantlr-java libasm3-java libavalon-framework-java libbackport-util-concurrent-java libbatik-java
libbsf-java libclassworlds-java libcommons-beanutils-java libcommons-cli-java libcommons-codec-java libcommons-collections-java libcommons-collections3-java
libcommons-configuration-java libcommons-digester-java libcommons-httpclient-java libcommons-io-java libcommons-jxpath-java libcommons-lang-java
libcommons-logging-java libcommons-net2-java libcommons-validator-java libdoxia-java libdoxia-sitetools-java libexcalibur-logkit-java libganymed-ssh2-java libgcj-bc
libgcj-common libgcj12 libgeronimo-jms-1.1-spec-java libgnuinet-java libgnujaf-java libgnumail-java libgoogle-collections-java libitext1-java libjdom1-java
libjline-java libjsch-java libjsr305-java libjtidy-java liblog4j1.2-java libmaven-archiver-java libmaven-clean-plugin-java libmaven-compiler-plugin-java
libmaven-dependency-tree-java libmaven-file-management-java libmaven-filtering-java libmaven-install-plugin-java libmaven-jar-plugin-java libmaven-plugin-tools-java
libmaven-reporting-impl-java libmaven-resources-plugin-java libmaven-scm-java libmaven-shade-plugin-java libmaven-shared-io-java libmaven2-core-java libmodello-java
libnekohtml-java libnetbeans-cvsclient-java liboro-java libplexus-ant-factory-java libplexus-archiver-java libplexus-bsh-factory-java libplexus-build-api-java
libplexus-cipher-java libplexus-classworlds-java libplexus-compiler-api-java libplexus-compiler-javac-java libplexus-compiler-manager-java
libplexus-component-api-java libplexus-container-default-java libplexus-containers-java libplexus-digest-java libplexus-i18n-java libplexus-interactivity-api-java
libplexus-interpolation-java libplexus-io-java libplexus-sec-dispatcher-java libplexus-utils-java libplexus-velocity-java libqdox-java libregexp-java librhino-java
libsaxon-java libservlet2.4-java libservlet2.5-java libslf4j-java libwagon-java libwerken.xpath-java libxalan2-java libxbean-java libxml-commons-external-java
libxmlgraphics-commons-java libxp6 rhino velocity
Suggested packages:
bsh-doc fop-doc libantlr-java-gcj libavalon-framework-java-doc libbackport-util-concurrent-java-doc jython libclassworlds-java-doc libcommons-beanutils-java-doc
libcommons-collections-java-doc libcommons-collections3-java-doc java-virtual-machine libcommons-digester-java-doc libcommons-httpclient-java-doc
libcommons-io-java-doc libcommons-jxpath-java-doc libcommons-logging-java-doc libcommons-net-java-doc libdoxia-java-doc libgcj12-dbg libgcj12-awt libgnumail-java-doc
libjline-java-doc libjsr305-java-doc libjtidy-java-doc liblog4j1.2-java-gcj libmx4j-java libmodello-java-doc libnekohtml-java-doc libplexus-classworlds-java-doc
libplexus-component-api-java-doc libplexus-container-default-java-doc libplexus-i18n-java-doc libplexus-interactivity-api-java-doc libplexus-utils-java-doc
libplexus-velocity-java-doc libqdox-java-doc libsaxon-java-doc libservlet2.4-java-gcj libjavassist-java libwagon-java-doc libxalan2-java-doc libxsltc-java
libxalan2-java-gcj groovy libspring-core-java libspring-beans-java libspring-context-java libspring-web-java libequinox-osgi-java librhino-java-doc velocity-doc
The following NEW packages will be installed:
antlr bsh bsh-gcj fop gcj-4.6-base gcj-4.6-jre-lib java-wrappers libantlr-java libasm3-java libavalon-framework-java libbackport-util-concurrent-java libbatik-java
libbsf-java libclassworlds-java libcommons-beanutils-java libcommons-cli-java libcommons-codec-java libcommons-collections-java libcommons-collections3-java
libcommons-configuration-java libcommons-digester-java libcommons-httpclient-java libcommons-io-java libcommons-jxpath-java libcommons-lang-java
libcommons-logging-java libcommons-net2-java libcommons-validator-java libdoxia-java libdoxia-sitetools-java libexcalibur-logkit-java libganymed-ssh2-java libgcj-bc
libgcj-common libgcj12 libgeronimo-jms-1.1-spec-java libgnuinet-java libgnujaf-java libgnumail-java libgoogle-collections-java libitext1-java libjdom1-java
libjline-java libjsch-java libjsr305-java libjtidy-java liblog4j1.2-java libmaven-archiver-java libmaven-clean-plugin-java libmaven-compiler-plugin-java
libmaven-dependency-tree-java libmaven-file-management-java libmaven-filtering-java libmaven-install-plugin-java libmaven-jar-plugin-java libmaven-plugin-tools-java
libmaven-reporting-impl-java libmaven-resources-plugin-java libmaven-scm-java libmaven-shade-plugin-java libmaven-shared-io-java libmaven2-core-java libmodello-java
libnekohtml-java libnetbeans-cvsclient-java liboro-java libplexus-ant-factory-java libplexus-archiver-java libplexus-bsh-factory-java libplexus-build-api-java
libplexus-cipher-java libplexus-classworlds-java libplexus-compiler-api-java libplexus-compiler-javac-java libplexus-compiler-manager-java
libplexus-component-api-java libplexus-container-default-java libplexus-containers-java libplexus-digest-java libplexus-i18n-java libplexus-interactivity-api-java
libplexus-interpolation-java libplexus-io-java libplexus-sec-dispatcher-java libplexus-utils-java libplexus-velocity-java libqdox-java libregexp-java librhino-java
libsaxon-java libservlet2.4-java libservlet2.5-java libslf4j-java libwagon-java libwerken.xpath-java libxalan2-java libxbean-java libxml-commons-external-java
libxmlgraphics-commons-java libxp6 maven2 rhino velocity
0 upgraded, 103 newly installed, 0 to remove and 1 not upgraded.
Need to get 60.3 MB of archives.
After this operation, 134 MB of additional disk space will be used.
Do you want to continue [Y/n]?

Now that I have installed Maven2 I performed a 'mvn clean' command. Maven is now downloading countless artifacts from the Maven server. (I'm not going to copy all of the filenames here.)

In conclusion... there is no way that a single person can manage a project with this dependency stack unless you a) don't care about full stack awareness; b) are just going to defer all responsibility to future generations; or c) have a crystal ball that points out bugs and provides magic workarounds.

Wednesday, January 18, 2012

Flask, Pystache and Bootstrap

[Update 2012-05-14] RDO pointed out that there were some changes in pystache. First and foremost, they changed the API structure entirely. I might not have implemented the best practices, as there are a number of ways to accomplish the same thing; I was more interested in staying true to the initial project structure. So if you read the pystache Renderer class you'll probably see everything you need. And to round the changes out, I found a few bugs in the MANIFEST.in and __init__.py that have been corrected here. This code runs, although it's not loading any of the images and some of the css... as I have not updated the html to work the way it should. But it's easy enough for the reader now that the rest is working. (One final note: edit the fluid.html file and change ../assets/ to /static/.)

I'm using flask, pystache and bootstrap in order to build a fast prototype. I've already written a little about this, but that text was getting long so I cut it off short of a complete description. I decided to break the work into a separate text... as it's more manageable. Some of these instructions will vary if you are using virtualenv.

- download bootstrap
cd ${HOME}
mkdir -p git
cd ${HOME}/git
git clone https://github.com/twitter/bootstrap.git

- download the latest pystache (pip does not have the latest code so go to github for it)
cd ${HOME}
mkdir -p git
cd ${HOME}/git
git clone https://github.com/defunkt/pystache.git
cd pystache
sudo python ./setup.py install

- download and install flask
sudo pip install flask

- download and install 'modern-package-template'
sudo pip install modern-package-template

- create your project (answer the questions as best as you can)
cd ${HOME}
mkdir -p hg
cd ${HOME}/hg
paster create -t modern_package helloworld

- now you need to do a number of things in order to get pystache to work with bootstrap. I'm going to list them out and give examples where I can.

  • copy the 'examples' directory from bootstrap to the 'src/helloworld' directory

  • create a 'static' directory in the 'src/helloworld' directory

  • copy the css and JS directory from the bootstrap directory to the static folder

  • update the MANIFEST.in file to include the examples and static directories

  • update src/helloworld/__init__.py

  • update src/helloworld/hello.py


FILE: src/helloworld/__init__.py
from helloworld import hello
def main():
    hello.main()

FILE: src/helloworld/hello.py
from pystache.loader import Loader
from pystache.renderer import Renderer
from flask import Flask

app = Flask(__name__)

@app.route("/simple")
def simple():
    # load the bootstrap 'fluid' example template from src/helloworld/examples
    loader = Loader(extension='html', search_dirs=[app.root_path + '/examples'])
    template = loader.load_name('fluid')
    renderer = Renderer()
    return renderer.render(template, {'person': 'Mom'})

def main():
    app.debug = True
    app.run()

FILE: MANIFEST.in
include README.rst
include NEWS.txt
recursive-include src/helloworld/examples *
recursive-include src/helloworld/static *

pystache is able to locate the templates stored in the source because I passed app.root_path into the search_dirs parameter of the pystache Loader.

Flask is able to locate the static files because flask defaults '/static' to the 'static' folder in the same directory as the current module. This did not require any special config. If you decided to move the folder you would have to tell Flask where it went by passing a different folder to the Flask() constructor; a sketch follows.
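A minimal sketch (the folder path here is hypothetical; static_folder and static_url_path are Flask constructor arguments in recent versions):

from flask import Flask

# 'static_folder' relocates the files on disk; 'static_url_path' changes the
# URL prefix. Both belong to the Flask() constructor, not to app.run().
app = Flask(__name__,
            static_folder='/srv/helloworld/assets',  # hypothetical location
            static_url_path='/static')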

Finally, since setuptools does not really recognize this project structure you have to update the MANIFEST.in file yourself in order to tell setuptools to package the example/templates and static files.

One parting note. Jinja2 is installed with flask as part of the dependencies. I would have used Jinja2 except for two reasons: 1) I'm experimenting with different languages and I'd rather use one template assembler everywhere. 2) There has been some recent criticism of template assemblers in general; the complaint is that they break the MVC model. Mustache's logic-less syntax makes that effectively impossible.
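To see what logic-less means, here is a tiny sketch of mine (not from the original post) using pystache's module-level render() function: the template contains only tags and section markers, so every decision has to live in the view data.

import pystache

# The template cannot compute anything; it can only expand the data it is
# given, so all of the logic stays on the controller side.
template = """<ul>
{{#people}}  <li>{{name}}</li>
{{/people}}</ul>"""

print pystache.render(template, {'people': [{'name': 'Mom'}, {'name': 'Dad'}]})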

Any objects like JS or PNG files either need to be in the static directory or they need to be served by a web gateway in front of the flask instance, like nginx, lighttpd or Apache.

I hope this article is better than the previous. I might clean that one up.

Bootstrapping your next project with Bootstrap.

Bootstrap is a nice little web app starter framework released by the kind folks at Twitter. I'm not sure why they did it but I suppose that does not matter much. It's nice, open, and fun.

For the purpose of this bootstrap project, which I'm calling freestrap, I'm going to select some technology that I previously installed in a recent article. The stack will be:

  • python

  • tornadoweb or maybe flask

  • beanstalkd

  • bootstrap and its deps

  • modern-package-template

  • mustache


I was going to implement a database layer too but I think that will be postponed until the next project is fully realized.

  1. At this point everything is already installed.

  2. You need to navigate to your project directory.  I like to create a git or hg folder immediately in my home.

    1. cd ${HOME}

    2. mkdir -p hg



  3. Then you'll need to create the project folder ('freestrap') with the modern packager.

    1. cd ${HOME}/hg

    2. paster create -t modern_package freestrap



  4. And you can run the application by typing
    freestrap

  5. If you execute the 'tree' command you should see something like this.


(currentenv)rbucker@soldev:~/hg/flafreeit/freestrap$ tree
.
├── bootstrap.py
├── buildout.cfg
├── HACKING.txt
├── MANIFEST.in
├── NEWS.txt
├── README.rst
├── setup.py
└── src
    ├── freestrap
    │   └── __init__.py
    └── freestrap.egg-info
        ├── dependency_links.txt
        ├── entry_points.txt
        ├── not-zip-safe
        ├── PKG-INFO
        ├── SOURCES.txt
        └── top_level.txt

3 directories, 14 files

That's it for the bulk installation. Now I'll integrate flask, mustache and bootstrap.

  • create src/freestrap/hello.py with the following code.


from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

def main():
    app.run()


  • update src/freestrap/__init__.py to look like this


# Example package with a console entry point
from freestrap import hello

def main():
    hello.main()

now install the package and rerun the app:

  • python ./setup.py install

  • freestrap


You'll see that the web server is running on 127.0.0.1:5000. If you're like me, however, you won't be able to load the test page because the app is on a remote server and Flask is only listening on localhost; it needs to listen on 0.0.0.0. So change the last line in hello.py to:

  • app.run(host='0.0.0.0', port=5000,)


and with the next restart you'll be able to point your browser at this app. Of course you can always tunnel instead:
ssh -L 5000:localhost:5000 rbucker@myhost.remote.com

and then browse to http://localhost:5000

Redis EVAL() in 2.6.0

Our friends on the Redis commit team are proponents of the Ruby language when not coding in Lua, Tcl or C. And so the EVAL() function example code is written in Ruby. That's all fine and well... but what about <my_lang>?

So I spent all of 30 seconds on a python version of the same code. Chances are pretty good that the code will work. I do not know for certain because 2.6.0 is not ready yet and I'm not in a position to install an unstable build yet. Of course I could run it in userspace but that's another topic.
import redis

r = redis.Redis()
RandomPushScript = """
local i = tonumber(ARGV[1])
local res
while (i > 0) do
    res = redis.call('lpush', KEYS[1], math.random())
    i = i - 1
end
return res
"""
r.delete('mylist')
print r.eval(RandomPushScript, 1, 'mylist', 10)
# __END__
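As an aside (my addition, and it assumes a redis-py recent enough to have register_script()): the script can also be registered once and invoked through EVALSHA, which avoids resending the script body on every call.

import redis

r = redis.Redis()
# register_script() returns a callable; redis-py sends EVALSHA and falls
# back to EVAL if the server has not cached the script yet.
push_random = r.register_script("""
local i = tonumber(ARGV[1])
local res
while (i > 0) do
    res = redis.call('lpush', KEYS[1], math.random())
    i = i - 1
end
return res
""")
r.delete('mylist')
print push_random(keys=['mylist'], args=[10])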

One of the crappy things about python is the indents. It makes copy/paste to a place like wordpress semi-functional.

Tuesday, January 17, 2012

I Want A Skype Replacement

I'm not happy with Skype. There I said it. But like many things in life I do not have much of a choice. The first time I used Skype I was in Barcelona Spain on my honeymoon and I needed to call our family to let them know we arrived before we headed out.

So it cost me $4 for the hotel wifi, and $5 for one day of Skype... and I was able to make one phone call that was interrupted 5 times by either bad wifi or Skype service. I'll never know. I should have used my cell phone thanks to hindsight.

Now I use Google Voice for as much as I can. There are still a few challenges there. It's decent quality but in order to keep the cellphone minutes down I'm forced to use Google Talk via the browser because they do not have a wifi client for the iPhone.

What I'd really like, now, is a Skype online number. That's a number that someone else, presumably on a landline or mobile, can call in order to reach me wherever my Skype connection is active. The problem is I cannot tell what the pricing is, where the discounts are, or what the best plan is for me.

The most annoying part is that they do not offer any sales support beyond their website and FAQ. I tried to enter into the system for a callback for a business-class service. They promised a callback, but all I got was an email. Granted there was a phone number at the bottom but I was not going to call it because they promised me a call.

Looking at their website they promised me a 50% discount on the online number but every time I went into my cart it cost $60/yr.  Where was the discount?  Then I found a small +++ reference.  There is a catch.  You have to purchase a subscription.

So after the 12-month (prepaid) subscription the cost is $2.50/month. But I still have to pay for an online number. Since I cannot buy both at the same time I have no idea if they are going to honor the advertised 50% discount. But here is something else I noticed.  I have a $10 balance on my Google Voice account and my Skype account.  I have not made a call that would cost anything ... So what is this all good for?

Monday, January 16, 2012

More! You Want Some More?

The Hungry Programmer was an interesting article. Unfortunately the comments have been closed so I could not reply directly. While he talks about the quality of food and relates it to programmers, I think he forgot one important analogy.

If you buy pre-washed green beans in a bag, they cost more, but they are simpler to cook, simpler to eat, and have a predictable shelf life; and if you belong to the school of shopping daily to reduce waste, then you've done that too, since consumption is well known.

So, after all that: it's ok to pay more for good programmers who write good code.

Thursday, January 12, 2012

GitHire - novel but nothing new

I'm getting tired of this subject. GitHire is trying to convince employers and candidates that they have a better way. But for one, candidates don't care how GitHire works because it's the employer that's going to make the call, and the employer... well, let's just hope they are not naive enough to spend $500.

GitHire uses GitHub's APIs. This gives them access to all sorts of project data. But it's all meaningless.

  1. there are a million programmers there, but they program JavaScript and Ruby (see http://github.com/languages/). You had better not be looking for an erlang programmer.

  2. besides, what about all the exceptional programmers who use BitBucket, or whose projects are private, or who don't use any DVCS at all?

  3. just how do they determine "exceptional"? Lines of Code, Check-ins ... we talked about these metrics and gaming the system in CS101.

  4. code reviews? But wait, they said they don't do code reviews. The fact that they got your creds from GitHub means that they've seen your code.

  5. resumes are obsolete. Really? The resume is your analog profile. So unless you like blind dates or being fixed up by your couple friends, you gotta say something. And just because some people fluff their resumes... that's what the interview is for.


There are no shortcuts for getting a new job or finding qualified candidates. You have to present your best work, be humble yet confident, creative and interesting. And that goes for the candidate too. Some day I will have to write "the job seeker's bill of rights", and it would look something like:

  • we will read your resume and verify its accuracy

  • we will talk candidly to your references

  • we will talk to you about you and your past, present and future

  • we will talk to you about your values

  • You will be interviewed by suitable staff members from Human Resources, and management in your department.

  • You will never be interviewed by potential peers.


I'm sure this list needs to be refined but it's a start.

Tuesday, January 10, 2012

SOA and Transaction Processing

I really like the idea of client/server and distributed processing. Cloud computing, with all of its distributed and cooperative nodes around the planet, is really cool. SOA might almost be cooler.

Yet another one of my mentors once said:
"I spend the first third of my career building monolithic applications and the second third converting them to client/server applications so it's no wonder that my last third was spent converting them back". --Sagg

The fact of the matter is that there are constant tradeoffs as we convert from one system to the next. And while hybrids reduce the cost somewhat they are incomplete as far as swinging fully one way or the other... and they offer their own problems.

SOA for Dummies draws a nice picture of what a typical SOA system looks like.



Once you get past the infrastructure requirements, like the hundreds, possibly thousands, of APIs available in ESB implementations like Apache Camel, ActiveMQ and JMS, you have to manage the many configuration files, which are no more or less complicated to implement, but there are so many of them. Then you need to know and understand the entry points into the public services... and as you implement your services you need to understand the database transaction model (ACID) and topics like two-phase commit. Finally, and no less importantly, you need to understand the performance profile for the entire project. There is no silver bullet like randomly moving services onto different CPUs; you really need full stack awareness to be effective.

Don't take my word for it. Jon Maron writes:
The ACID transaction model has served the industry well in the past, but Jon Maron points out that it has some major drawbacks when applied to the loosely coupled service domain.

Two-phase commit works in many cases and it solves a number of problems like the ones that SOA presents in distributed services. Keep in mind that all of the database locks, semaphores, spin locks, etc. will converge at some point in the transaction's lifecycle. These types of race conditions are well documented in the database and microkernel literature.
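To make the convergence point concrete, here is a toy sketch of mine (the names and structure are invented for illustration; this is not from any of the referenced articles) of the two-phase commit flow:

class Participant(object):
    # A stand-in for a real resource manager (a database, a queue, ...).
    def __init__(self, name):
        self.name = name

    def prepare(self, txn_id):
        # Phase 1: do the work tentatively, take the locks, write a durable
        # log record, and vote. A real resource manager may vote no here.
        print '%s: prepared %s' % (self.name, txn_id)
        return True

    def commit(self, txn_id):
        # Phase 2 (everyone voted yes): make it permanent, release the locks.
        print '%s: committed %s' % (self.name, txn_id)

    def rollback(self, txn_id):
        # Phase 2 (someone voted no): undo the tentative work.
        print '%s: rolled back %s' % (self.name, txn_id)

def two_phase_commit(txn_id, participants):
    # The coordinator is the point where every participant's locks converge.
    # (Simplified: a real coordinator would only roll back participants that
    # actually prepared, and would survive its own crash via a log.)
    if all(p.prepare(txn_id) for p in participants):
        for p in participants:
            p.commit(txn_id)
        return True
    for p in participants:
        p.rollback(txn_id)
    return False

two_phase_commit('txn-42', [Participant('orders-db'), Participant('billing-db')])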

While SOA for Dummies is a good read, it is more of a whitepaper for the executive set. Wikipedia has a more specific definition, including something called the eight specific service-orientation principles (not their definition). They go on to reference the SOA Manifesto, which has some good guiding principles that can be applied to almost any application development. What caught my eye here is a quote:
A service comprises a stand-alone unit of functionality available only via a formally defined interface. Services can be some kind of "nano-enterprises" that are easy to produce and improve. Also services can be "mega-corporations" constructed as the coordinated work of subordinate services.

SOA and Enterprise are the two themes that are recurring and conjoined in these two articles and in many of the articles I read as I got to this final point. They seem inseparable, and rightfully so: there is no money in implementing SOA for your local delicatessen's cash register. The question I'm wrestling with is identifying the tipping point: when would a business migrate from a simple monolithic application to distributed workers and then to SOA? I can look at First Data or AT&T and intuitively say SOA, but I cannot say the same for Starbucks. Home Depot and Amazon might require some inside knowledge, but likely a mixed (not hybrid) approach would work best.

PS: I found this tweet interesting (for all its distributed goodness, SOA might not be that good) but draw your own conclusions.
well actually this shows that a single server with pipelining reading replies asynchronously can do better than N parallel clients.--@antirez [talking about benchmarks against Redis]

 

Monday, January 9, 2012

Getting things done, read, or a reply

I'm not a productivity guru, but in a recent conversation with a client of mine we started to hash out why his vendor was not responding to more than a few of the questions posed. In some sample emails there would be several one-sentence questions, and in others there might be multi-paragraph descriptions before any number of questions. In the end we always seem to defer to a conference call. (I really wanted to get things going in email because there was a logical sense to the discussion.)

Then there is the resume. Recent studies (not cited) suggest that a one-pager is the best way to increase the likelihood that your resume will be read. I have both a one-pager and a multi-pager. With my history it seemed to me that a narrative approach, rather than the usual boring assignments, roles, responsibilities, languages, frameworks, etc., would make sense. And besides, it no longer fits on one page.

When I worked for NaBanco, later First Data, (circa 1994) we had a "one-pager" to describe all systems changes. Presumably, if it took more than a page to describe then it was too complicated and needed to be split into separate change requests. It had not occurred to me before this moment, but this is clearly an Agile process, and it was definitely pre-Agile Manifesto (copyright 2001).

So by extension, if you were writing a how-to or a best-coding-practices document for your company you might want to take the same advice. Of course this does not mean 6pt font, but it does mean a consistent, well formatted document that is compelling to read. The good news is that this is a good task for a document writer and not a programmer.

Interview Programming Problems Done Right ~ C for Coding

Interview Programming Problems Done Right ~ C for Coding: studying classic computer problems in an interview is no way to determine whether a person knows how to code. [http://bit.ly/xm3Q8L]

Sunday, January 8, 2012

Job Search

So you want to search for a new job? The first thing you must do is update your resume. You also need to take a good look at your career and decide if that is where you want to continue to work and just how far from center you are willing to go. You then need to ask yourself about relocation, benefits, and compensation. Once you have those things in perspective you are ready for the search.

Unfortunately the search breaks down into several paths. a) the general job board, b) the specific skill job board, and the ever present c) recruiter.

I'm not going to discuss (c) except to say: I like recruiters for a lot of good reasons, but the landscape has changed a lot in recent years. Management in most recruiting companies use their most senior staff to work with the client (employer) and the most junior to work with the resource. The process is always the same, and unless you've studied the process you won't know what I mean, so let's move on.

a) the general job boards include: FLUID - Florida Unemployment Internet Direct Claims, Job Central, Dice.com, CareerBuilder, Monster, and HotJobs. They are all fine sources for jobs, however there are a number of things to keep in mind. By the time you have located a real position on one of these sites it's because the employer is cheap and would rather pay the board's rates to post a position than give it to you. There is a level of desperation on the part of the candidate and the employers know it, so they'll lowball you. And then there's this numbers game that they play: they want to be seen and they are not afraid to discard resumes for the smallest prejudice. So unless you do not have access to a specialized board you should really stay away from these. (You'll also waste a lot of time filtering through the results to find that one perfect job, and by then it's too late.)

b) if you're a programmer, like me, then there are any number of sources for jobs. Sadly, I know that there is a lot that goes into trying to get to the top of the list, and when you hit, it can go viral. Also, these websites can further divide the specialization: perl, python, ruby... I'm sure there are a lot more. My favorites include jobs.perl.org, github, joel on software, 37signals, hacker news, and python weekly jobs. (I wish I had a java board to go with my list...) There are a few startup boards if you don't mind taking a risk or possibly relocating on your own dime. And of course there are specialized boards in just about every country in the world, but unless you speak the language you can forget those. And while the perl, python and ruby sites seem realistic, the nodejs jobs site seems a little premature.

Whoops, I missed d). That would be the freelance sites. As a general rule I avoid them entirely: a) they pay so little and expect so much, b) the requirements are unrealistic, and c) completing the task is subjective and subject to approval. Even if the payment is aligned properly, who's to say the client will pay? It's like ebay for source code.

Looking for a job is stressful and it consumes a vast amount of time. There is an advantage to social networking. (e).

(f) the newspaper; I have not seen a good technical job since 1987-ish.

Interview Problems *sigh*

[update 2012-01-10] Braintree does it too. While they list some attributes that they look for in their code reviews, it's subjective. The interpretation of the results does not appear to be based on any science, but probably intuition.

[update 2012-01-08] ** additional note at the end.

[update 2012-01-08] I was in such a hurry to get out of the house today that I forgot to add this little tidbit. [Top software dev job boards ~ Max Masnick http://bit.ly/wqcEuv]. One of the interesting services that Montana links to is something called InterviewStreet. I have no idea what it really is but Montana says "connects you with jobs if you can solve programming problems on their website." So after you spend all of that time customizing your resume, cover letter, making sure that your GitHub contributions are top notch... you now have to take a quiz so that maybe someone is going to notice you and give you a phone screen. And just maybe if you know whether to wear boxers or briefs you might actually get a face to face interview. But that's when I almost missed Drew Inglis' comments [Three ways to improve your InterviewStreet CodeSprint solution | Drew Inglis http://bit.ly/xh9i1O]. Apparently he's going to give the reader 3 ways in which the evaluator is going to give the candidate extra points... which is tantamount to taking an SAT prep course. The only difference is that the SAT is created by professional test makers and the quiz of the day is not.

I recently wrote an article talking about the interview coding question... something that [seems to be] common in most Silicon Valley startups. I suppose it would be interesting to know the exact origin of this sort of interview structure... but for me this might be unprecedented in the history of employment outside of some of the performing arts (besides maybe the WPM test for people in the secretarial pool).

For example, you would normally ask actors or musicians to audition. First and foremost you're looking for some raw talent, and then you need to see how they fit into the ensemble. Somewhere in the audition the director is looking for improvisation skills (it's common to forget lines, among other things) and some idea of how well they are going to do with a new script and possible changes (memorization). This is my basic understanding from when I performed in college rep. I'm certain that Broadway is much more complicated.

Conversely, there are other professions where one is not likely to be asked a coding type of question. I do not have specific knowledge, but I can imagine that the chief of surgery is not asking a veteran surgeon how to tie a knot or how to remove an appendix. And while patients might ask for a second opinion when it comes to a diagnosis, I'm also very certain that the question about how to tie a knot is not going to come up.

At some point in a person's professional career one is expected to be on top of the historical aspects and the current events. And quizzing that person directly, as opposed to indirectly, can be construed as insulting by some and confrontational by others. Keep in mind that an interview is not strictly about the candidate proving themselves to the company, but the company, and its managers, proving itself to the candidate. A panel of three programmers (who would have been my peers at Amazon) once asked me to design a word processor. Since I was interviewing for a position in the payments department it caught me completely off guard and was clearly not related to the Visa and MasterCard regulations, which I knew like the back of my hand at the time.

Which is a perfect segue. Charlie B. wrote (Why I Won’t Hire You | Golem Technologies http://bit.ly/xPySOq) and if not for the title alone I would not return his calls, let alone take the interview. There are reasons to have a 10 page resume as well as a one page summary. HR gurus have always said that it is best to have a custom resume and cover letter for every prospective employer. But Charlie's demands are completely subjective and prejudicial: before he has seen a single resume he has decided that a long resume is a waste of his time. This is not the sort of person I want to work for.

William Shields posted an article, in response to a 37signals article (discussed later), [Interview Programming Problems Done Right ~ C for Coding http://bit.ly/xm3Q8L]. Shields goes on to say that solving Pascal's Triangle is a good interview question because the solution is small. The tragedy of his thinking is the assumption that everyone remembers Pascal's Triangle; the problem. He further implies that everyone remembers CS101 or is currently an active mathematician. (I had to spell check mathematician; what makes you think I'm going to remember Pascal's Triangle?) Besides, in Shields' example it's just too easy to google a solution.

My second least favorite way to interview a candidate is through GitHub; as proposed by Assif [Gigantt Blog: The GitHub Job Interview http://bit.ly/yUUhJd]. GitHub is a code repository. It houses a lot of code. Some of that code is actually executing out there somewhere. From a social engineering perspective it's too easy to create sample projects that encapsulate all of the industry's best practices like PEP-8 or modern-perl. And even with this sort of material it's just as subjective as a coding problem.

[Why we don't hire programmers based on puzzles, API quizzes, math riddles, or other parlor tricks - (37signals) http://bit.ly/zujO1P] - David writes a short 4.1 paragraph essay on the subject, but certainly nothing that warrants the sort of criticism that the commenters provided. I'm not even certain I see why Shields would link to it except to get some hits. David seems to agree with Assif that GitHub samples are good and that coding questions are bad. Shields seems to think that quizzing age-old computer problems is the best way.

I'm just amazed by the coverage this topic is getting. It's popping up on many of the feeds that I follow. Programming can be an art, but it is certainly not a performance art in all but a few cases. The rest of the time it's just work, like digging a ditch without having to clean under your nails. For the most part programmers give themselves too much credit. We are in the business of supporting business and that's about it. As for the stock market programmers who claim they make changes on the fly and make 500K a year: they are a) in the extreme minority, b) going to lose their jobs when the SEC figures out that they are the cause of the many market crashes, and c) finished when electronic trading becomes regulated.

I am a pragmatic programmer (thanks to pragprog.com) with 25+ years experience. I've worked in different vertical markets using different languages and tools, from oscilloscopes to ZeroMQ. It was not until 15 years into my career that I realized the importance of the business, and the customer, and how little the perfect piece of code really matters. I'm not going to be sending anyone to the moon on my code, so whether it's yards or meters does not matter to me or my clients or employers. What matters is that the code works as expected and when it was expected. If they need something else then they'll ask for it.

As for selecting prospective candidates: that's a job for the professionals. Professional HR managers and professional technical managers. And unless it's in the culture of the organization you are not likely to see a quiz. Quizzes and coding samples are evil, subjective and prejudicial. You need to find a better way, like listening to the candidate describe the projects they were on and the roles they performed. What made it special, what was difficult, and how they resolved things. These are the questions to ask when hiring qualified programmers.

**It's my belief that the GitHub and coding test became a part of the startup culture because the freshman programmers who were cheap and naive enough to work for a generation-1 startup never had mentors or sufficient experience to know what the best interviewing techniques were. (They typically did not have HR departments either.) As the companies grew, the quiz culture remained behind. What was once created out of inexperience is now a part of the culture. The side effect, now, is that managers do not have to interview as many candidates. They have yet another tool to say NO! Interviewers should not be looking for reasons to say no. They should be looking for reasons to say yes. No is too simple.

Leave gaming the system to Captain Kirk and the Kobayashi Maru.

Tuesday, January 3, 2012

Response: startup coding challenges

Startup coding challenges are meaningless. They are a way for the "haves" to show they are superior to the "have-nots". In very many ways it's legalized cyber-bullying. [If you've ever watched a show like Big Brother or Survivor: your peers are not your friends. The workplace is not a utopia and therefore your peers are not your friends either.] In fact, under all but the rarest of circumstances a prospective employee should not be interviewed by a peer, because:

  • if the manager does not take the peer's advice and hires the candidate anyway, the peer will feel slighted

  • unless the peer is trained in what to look for in the interview, he/she will likely game the system so that lesser candidates are hired

  • the interview and hiring process is [already] subjective. The coding challenge is very subjective, disguised as objective. "Solve the challenge and get the job" is just not the way it is... "solve the problem using my favorite language with my favorite idioms and comments and maybe you have a job so long as your salary needs are less than mine" [I made this exact statement before Braintree posted exactly this description]


And while I'm on the subject... sample code is almost as bad. This is code that is usually picked over and over, incorporating all of the "best practices". It does not tell you anything about what to expect from the candidate in the first 100 days.

It is incumbent on the manager(s) and candidate's potential supervisors to evaluate the candidate on some very basic criteria.

  • does the candidate get "it"?

  • does the candidate have any useful specialized domain knowledge?

  • will the candidate work well with the team?

  • is the candidate a "hero" or "ego"?

  • self starter?

  • continuing education?

  • able to communicate clear thoughts?

  • able to work under pressure?

  • flexibility? Especially for the ideas of others.

  • how does the candidate handle stress?

  • how does the candidate handle differing opinions?


I would say that any serious human resources department's annual review questionnaire is a good place to start since these are the questions you're going to be asking anyway.

Good Luck!

Monday, January 2, 2012

Cloud Storage Options

I have been considering my cloud storage options and none of them stands up to serious consideration. The best seems to be the best of the worst. I've made a spreadsheet of the features that are important to me and my situation. They are:

  • Backup?

  • Sync?

  • Secure Share?

  • Mobile?

  • Net Drive?

  • Cross Platform?

  • Free? (more of a curiosity for testing)

  • First Pay? (the cost of first entry)

  • And whatever else I've learned that is a deal breaker.


The spreadsheet below is incomplete only because the services simply do not stack up. There are so many other considerations: speed, server location, trust, brand awareness, security, perceived benefit and service.



Given the state of my environment, which is a hybrid family/SOHO setup, I like/prefer CrashPlan and DropBox. However, I'm certain that things are going to change. These companies need growth, which will only come with more features and better service levels. Of course brand awareness like Google's means a lot.

PS: Google storage should be on the list but it's not.  It is simply a net drive and an unreliable one at that.

Sunday, January 1, 2012

New Years Resolution - Reduction

I've decided that the word that will describe 2012 is reduction. In that spirit I have noticed that I have a number of articles in my instapaper queue that need to be read. Once they are read I will do my level best to reduce the number of articles that I actually collect. So if I'm scanning my news feed the criteria for a read later is going to be more strict. But let's review the stories so far.

  • reducing code nesting: We've been talking about this since CS101.

  • SQL to MongoDB mapping: I really like mongo but the argument for SQL is too strong. There is no need for an end user mapper... just implement SQL.

  • programming competitions: really?

  • Assembly Language Hello World: It's not much bigger than a DOS assembler version of the same. It's not the assembler that I grew up on.

  • DropBox Automator: only if you do not care who reads everything in your account.

  • scripting the un-internet: history repeats itself is not the same as a loop.

  • Digital Wallets: They've been talking about digital wallets for years. No one is interested.

  • Interarchy: interesting and possibly helpful. Needs better docs because there are side effects to every action. Would be nice if it were in the app store.

  • Trying to mount a file system: So many bad choices to select from. Since this is sensitive data, how many can you try and trust?

  • Django Reusable Apps: It's just a general outline from 50K feet. Would be nice if there was some sample code or at least a sample project template. (see python's modern-package-template)

  • Avoid Apress: I don't think I care enough to read this article. Not sure I know why I saved it in the first place.

  • "good enough": It's a maturity thing. At some point some of the details no longer matter. For this reason ideas like PEP-8 and Agile no longer matter.  GTD is more important.

  • Rob Pike: I like GOLANG but this guy seems to be off the reservation.

  • Lua 5.2: This would be better if the Lua BDFLs were more open. They only release about every 18 months and it's just not that interesting other than its small footprint.

  • Show and Tell MongoDB: How many times is the same talk going to be linked to?

  • Burn DVDs on OSX Lion: There are a number of articles and most try to solve the problem. The real issue is that Apple does not include iDVD with new machines. Therefore iLife is not really installed with a new machine.

  • Mongo's write lock: First Python's GIL and now this.

  • direnv: Interesting, but a bad/incomplete description. Integration with rvm and virtualenv should be addressed better.

  • different by design: Maybe, but it's not enough to be different... you need to be good too.

  • Credit Reports: credit reports and social security numbers need better consumer protection; whatever that needs to be.

  • Don't hire remote programmers: I need to read this one for company research.

  • Hire more remote programmers: I need to read this one too.

  • Renegade: I don't think we agree on the role of a CTO.

  • MySQL to Solr: I'm not going to read this article because I think Solr is not a DB but a search tool like Lucene.

  • For-if anti-patterns: This does not help anything. Check your O() at the door.

  • ...


(I have about 10 more... I only allocated 30 minutes for this post... so in the interest of reduction; this is it.)
I thought I was done... but this one caught my eye, and the rest have been deleted before I waste another moment:

  • Pay your programmers $200/hr: I'll be reading and rereading this one. I'm not sure that I warrant 200/hr, but then I'm certain that the electronic traders do not. Not that they aren't smart or willing to take risks, but they make mistakes too, and the results are much like football games of old, without a replay system that can keep up or rules that allow for correction on a grand scale. Even in football, once the next play has started and the clock is ticking there is no looking back.



And two keepers that I have not read yet but are worth a last-minute reprieve: the future of retail, and recruiting for a startup.

Happy New Year! Welcome to 2012!

another bad day for open source

One of the hallmarks of a good open source project is just how complicated it is to install, configure and maintain. Happily gitlab and the ...