
TornadoWeb - web scraping, evented, recursion

I'm working with TornadoWeb and ZeroMQ at the moment, and I was having a heck of a time getting things to work correctly, especially when I was trying to call out to another web server and preserve the async mode of the system. (TornadoWeb provides a basic asynchronous HTTP client library.)

First of all, some months ago I found a post that told me to replace Tornado's IOLoop with the ZMQ version. It did not say why, but it was specific that Tornado's loop was to be replaced with ZMQ's and not the other way around. The code was not very interesting:
```python
from tornado import ioloop
import zmq.eventloop.ioloop
# override tornado's ioloop with zmq's
ioloop.IOLoop = zmq.eventloop.ioloop.IOLoop
```

While I was researching this project I found that ZMQ's Python binding (pyzmq) provides a method that does this work for you. I reviewed its code and there was nothing special in it; it was pretty much the same code I had originally implemented. I changed my code to use the new method anyway, not because they did it better but because I hope it will future-proof my project in case there is a change I cannot account for. So the code now looks like:
```python
import zmq.eventloop.ioloop

# override tornado's ioloop with zmq's
#ioloop.IOLoop = zmq.eventloop.ioloop.IOLoop
zmq.eventloop.ioloop.install()
```
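
Why does the order matter? As far as I understand the older Tornado API, `IOLoop.instance()` hands out one process-wide loop, so whichever class is in place when that loop is first created is the one every handler and HTTP client shares; as I read pyzmq's documentation, that is why `install()` should run before anything else touches the IOLoop. (The likely reason for the direction of the swap is that pyzmq's loop polls with `zmq.Poller`, which can wait on 0MQ sockets as well as ordinary file descriptors.) The stdlib-only sketch below illustrates the singleton effect with hypothetical stand-in classes, `BaseLoop` and `ZmqLoop`; they are not the real Tornado or pyzmq classes.

```python
import types


class BaseLoop:
    """Hypothetical stand-in for Tornado's IOLoop (not the real class)."""
    _instance = None

    @classmethod
    def instance(cls):
        # lazily create ONE shared loop, using whichever class is asked first
        if BaseLoop._instance is None:
            BaseLoop._instance = cls()
        return BaseLoop._instance


class ZmqLoop(BaseLoop):
    """Hypothetical stand-in for pyzmq's IOLoop subclass."""


# stand-in for the `tornado.ioloop` module namespace
ioloop = types.SimpleNamespace(IOLoop=BaseLoop)


def install():
    """Same idea as `ioloop.IOLoop = zmq.eventloop.ioloop.IOLoop`."""
    if BaseLoop._instance is not None:
        # too late: everybody already holds the old loop
        raise RuntimeError("IOLoop already created; install() must come first")
    ioloop.IOLoop = ZmqLoop


install()                         # 1. swap the class first...
loop = ioloop.IOLoop.instance()   # 2. ...then everyone shares a ZmqLoop
print(type(loop).__name__)        # prints: ZmqLoop
```

Calling `install()` a second time raises in this sketch, because the shared loop already exists and swapping the class at that point would not change the loop everyone is holding.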

Hey! Nothing spectacular there. Moving on to the next challenge...

The code I'm currently working on is an admin site: admin in the sense that only one or two users will ever use this app, and it is very unlikely that more than one user would be online at a time (even in an emergency). Even so, the synchronous version of the website did not perform very well, especially when I was deleting large amounts of data from Redis.

In this particular case I'm trying to implement a "test harness". While I like nosetests and it works great, I need something that is interactive and ubiquitous. In this use case the user enters the URL of the test index page, which is drawn from a dictionary of testcases. The user clicks on a testcase and it runs to completion, drawing its output into the buffer; the handler then goes back (recursively) to the admin website, pulls in some additional data (a call trace), and appends that to the buffer too. The challenge was threefold: (a) the built-in classes were not working asynchronously, (b) there were two client calls to make, 1) the authorization and 2) the call trace, and (c) it all had to work within the asynchronous framework.

Before I go any further: it works, and here is the code. I'm not going to explain it any more than this for now.
```python
import uuid

from tornado import httpclient, ioloop
from tornado.web import RequestHandler, asynchronous

# This targets the older callback-style Tornado API: the io_loop= argument
# and the @asynchronous decorator were removed in later Tornado releases.
# `testcases` is the app's module-level dict of testcases (defined elsewhere).


# test handler
class TestHandler(RequestHandler):
    """Interactive test harness: list the testcases or run one of them."""

    def initialize(self):
        self.server = 'http://myapiserver.local:8882'
        self.adminserver = 'http://myadminswerver:8881'
        self.path = 'api1.1'
        super(TestHandler, self).initialize()

    # normally the callback function does not get this decorator, however, this
    # was needed in order to make this work. Notice that this is the second handler
    @asynchronous
    def _handle_request2(self, response):
        """This is the second callback handler."""
        if response.error:
            self.write("Error: %s" % (response.error))
        else:
            self.write(response.body)
        # need the self.finish() because the asynchronous decorator
        # disables the auto_finish()
        self.finish()

    # this is the first callback handler; it gets the same decorator
    @asynchronous
    def _handle_request1(self, response):
        """This is the first callback handler."""
        if response.error:
            self.write("Error: %s" % (response.error))
        else:
            # write the output to the buffer but since we are not calling
            # finish() the data remains in the buffer.
            self.write(response.body)

        # make the second call and callback to the second handler.
        url = "%s/myfunction_two/" % (self.adminserver)
        request = httpclient.HTTPRequest(url)
        # it is important to replace the io_loop here (also needed to make it work)
        http_client = httpclient.AsyncHTTPClient(io_loop=ioloop.IOLoop.instance())
        # going to callback to the 2nd handler
        http_client.fetch(request, self._handle_request2)

    @asynchronous
    def _get(self, other=None):
        # `other` is the testcase path captured from the URL; fall back to
        # the default API path when it is missing
        url = "%s/%s" % (self.server, other or self.path)
        pay = self.path
        request = httpclient.HTTPRequest(url, body=pay, method='POST')
        # it is important to replace the io_loop here (also needed to make it work)
        http_client = httpclient.AsyncHTTPClient(io_loop=ioloop.IOLoop.instance())
        # going to callback to the 1st handler
        http_client.fetch(request, self._handle_request1)

    # notice that there is NO decorator here. It will be applied when _get() is called.
    def get(self, other=None):
        """Display the menu or execute the test."""
        self.guid = str(uuid.uuid1())
        if not other:
            # display the testcase menu
            for k in testcases.keys():
                self.write('<a href="/t/%s/">%s</a><br>' % (k, k))
            self.finish()
        else:
            self._get(other)
```
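
The callback chain in `TestHandler` is easier to follow in isolation. The sketch below is a stdlib-only simulation, not Tornado code: `FakeHandler` is a hypothetical stand-in that buffers `write()` calls until `finish()`, and `fake_fetch()` is a hypothetical stand-in for `AsyncHTTPClient.fetch()` that delivers a canned response later on an asyncio event loop. It shows the two points the comments above make: output written in the first callback stays in the buffer, and `finish()` runs exactly once, in the last callback.

```python
import asyncio


class FakeHandler:
    """Hypothetical stand-in for a RequestHandler: buffers write() output
    until finish() is called, like a handler with auto-finish disabled."""

    def __init__(self):
        self.buffer = []
        self.sent = None  # what the client finally receives

    def write(self, chunk):
        if self.sent is not None:
            raise RuntimeError("write() after finish()")
        self.buffer.append(chunk)

    def finish(self):
        if self.sent is not None:
            raise RuntimeError("finish() called twice")
        self.sent = "".join(self.buffer)


def fake_fetch(loop, body, callback):
    """Hypothetical stand-in for AsyncHTTPClient.fetch(): the canned
    'response' arrives later, on the event loop, and goes to the callback."""
    loop.call_later(0.01, callback, body)


async def run_test(handler):
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def handle_request2(body):
        handler.write(body)
        handler.finish()             # only the last callback finishes
        done.set_result(None)

    def handle_request1(body):
        handler.write(body)          # stays in the buffer: no finish() yet
        fake_fetch(loop, "[call trace]", handle_request2)

    fake_fetch(loop, "[test output]", handle_request1)
    await done                       # keep the loop alive until the chain ends


handler = FakeHandler()
asyncio.run(run_test(handler))
print(handler.sent)  # prints: [test output][call trace]
```

Moving `finish()` into `handle_request1` makes the second `write()` raise in this simulation, which is the ordering mistake the `finish()` comments in the handler are guarding against.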

It would have been nice to have "parallel" task execution, since the two queries are unrelated requests and could be executed at the same time. Granted, the callbacks would have to juggle the results to get them in the right order before displaying them, but I suppose it might be possible with a single handler if the handler could inspect the data before calling finish(). It's something worth posting about in the future.
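
The "parallel" variant can be sketched as a small join: both requests start at once, each callback stores its result in a fixed slot, and whichever callback arrives last writes the slots out in request order (where a real handler would then call finish()). Again this is a stdlib-only simulation under assumptions: `fetch_all_in_order()` is a hypothetical helper, `loop.call_later()` stands in for `AsyncHTTPClient.fetch()`, and the delays are made up so that the call trace answers before the test output.

```python
import asyncio


def fetch_all_in_order(loop, replies, on_done):
    """Start every (fake) request at once and call on_done(results) when the
    last one answers; results are in request order, not arrival order."""
    results = [None] * len(replies)
    pending = len(replies)

    def make_callback(slot):
        def callback(body):
            nonlocal pending
            results[slot] = body          # keep each reply in its own slot
            pending -= 1
            if pending == 0:              # last reply in: safe to write + finish
                on_done(results)
        return callback

    for slot, (delay, body) in enumerate(replies):
        # hypothetical stand-in for http_client.fetch(request, callback)
        loop.call_later(delay, make_callback(slot), body)


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    fetch_all_in_order(
        loop,
        [(0.05, "[test output]"),   # slot 0 answers last
         (0.01, "[call trace]")],   # slot 1 answers first
        done.set_result,
    )
    return "".join(await done)


page = asyncio.run(main())
print(page)  # prints: [test output][call trace]
```

For what it's worth, the `tornado.gen` documentation describes yielding a list of futures inside a coroutine so they run in parallel, which performs this same join for you in Tornado versions that have coroutines.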

I also want to mention that a similar strategy would probably apply to Mojolicious. (That's an assignment for the reader; I hope someone will post it and link back.)
