Tuesday, May 18, 2010

JSCoverage, cobertura and selenium

I spent a few hours this morning trying to add code coverage to our JavaScript code base. I came across the well-written JSCoverage tool and liked the look of its results. What I didn't like was the output format: I needed a way to collect the results for use in continuous integration. We use Hudson and its Cobertura plugin successfully for both C++ and Java projects, so I took a crack at converting the JSCoverage data into something usable in our system.

Here's what I came up with: JSCoverage to Cobertura

I based my solution on the reporting code embedded in the files generated by jscoverage-server. That report function is the main difference between the regular jscoverage output and the jscoverage-server output, so by sending this code along with the request we can use the stock jscoverage-generated files. My scripts are written in Ruby but could easily be ported to other languages. The key piece is the function that is uploaded and evaluated inside the browser, which returns a hunk of JSON mapping filenames to arrays of line hit counts. I call the collect_coverage function in the teardown of each of my Selenium tests, and since the @@coverage variable is shared across all of the test cases, the results of every run are merged together, which gives a good picture of overall coverage. Obviously there is no branch coverage, but if jscoverage gains that capability it would be relatively straightforward to add it to the Cobertura output.
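The two steps described above (summing hit counts across test runs, then emitting Cobertura XML) can be sketched in Ruby as follows. This is a minimal illustration, not the linked script: it assumes the browser-side function returns JSON shaped like `{"app.js": [null, 3, 0, ...]}`, where the array index is the line number and `null` marks a line with no executable code. The method names `merge_coverage`, `line_rate` and `to_cobertura_xml` are hypothetical, and the XML covers only a minimal subset of the Cobertura format (the exact attributes the Hudson plugin insists on are an assumption).

```ruby
require 'json'
require 'rexml/document'

# Sum one run's per-line hit counts into the accumulated map.
# Both maps look like {"file.js" => [nil, 3, 0, ...]}: the array index is the
# line number and nil marks a line with no executable code.
def merge_coverage(total, run)
  run.each do |file, lines|
    acc = (total[file] ||= [])
    lines.each_with_index do |hits, line_no|
      next if hits.nil?
      acc[line_no] = (acc[line_no] || 0) + hits
    end
  end
  total
end

# Fraction of executable lines that were hit at least once.
def line_rate(valid, covered)
  valid.zero? ? 1.0 : covered.to_f / valid
end

# Render merged coverage as a minimal Cobertura-style XML string
# (line coverage only; branch-rate is always 0).
def to_cobertura_xml(coverage)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  # version and timestamp are placeholder values
  root = doc.add_element('coverage', 'branch-rate' => '0', 'version' => '1.9',
                                     'timestamp' => (Time.now.to_f * 1000).to_i.to_s)
  package = root.add_element('packages')
                .add_element('package', 'name' => 'javascript',
                                        'branch-rate' => '0', 'complexity' => '0')
  classes = package.add_element('classes')
  all_valid = all_covered = 0

  coverage.sort.each do |file, lines|
    executable = lines.each_with_index.reject { |hits, _| hits.nil? }
    covered = executable.count { |hits, _| hits > 0 }
    all_valid += executable.size
    all_covered += covered
    klass = classes.add_element('class', 'name' => file, 'filename' => file,
                                'line-rate' => line_rate(executable.size, covered).to_s,
                                'branch-rate' => '0', 'complexity' => '0')
    klass.add_element('methods')
    line_list = klass.add_element('lines')
    executable.each do |hits, line_no|
      line_list.add_element('line', 'number' => line_no.to_s,
                                    'hits' => hits.to_s, 'branch' => 'false')
    end
  end

  # Overall line rate goes on both the root and the single package.
  rate = line_rate(all_valid, all_covered).to_s
  root.add_attribute('line-rate', rate)
  package.add_attribute('line-rate', rate)
  out = String.new
  doc.write(out)
  out
end
```

In a Selenium teardown, one would `JSON.parse` whatever the evaluated function returns, pass it to `merge_coverage` along with the shared map, and after the suite finishes write `to_cobertura_xml(total)` to whatever file the Hudson Cobertura plugin is configured to read.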

One problem is that data collection is initiated only by the test driver. This means it's impossible to test 'exit' behavior, or tests that exercise more than one page (since the coverage information is kept in the DOM, it is lost on reload). One way to handle this would be to invert control and have the page POST its results before it is closed (which is what jscoverage-server aimed to accomplish), but that requires a lot more plumbing, which makes it less flexible and more likely to break. Besides, this is not much of an issue when running JavaScript unit tests, since the page is never reloaded.

Using this setup, the JavaScript coverage of our unit and acceptance tests run by Selenium is now converted to the Cobertura XML format and tracked in Hudson. Isn't open source great? Let me know in the comments if there are issues with the script.

Wednesday, April 14, 2010

Scala Constructor Oddities and NPEs

While trying to track down a bug today, I discovered a set of behaviors in Scala that seem very dangerous to me, and they exposed a few Scala implementation details I wasn't expecting to see. The first thing I found was that it's trivial to make a class that is impossible to construct:


class UnConstructableClass {
  def dontExcept(): Int = b.size
  dontExcept()     // b has not been initialized yet, so this throws an NPE
  val b = List()
}


If you run "new UnConstructableClass", you will get a NullPointerException. It turns out that while the object is being constructed, all of its fields (vals and vars) hold null (or zero/false for primitive types) until the line in the constructor that initializes them has run, so it's possible to get unexpected null values and exceptions in Scala code that is supposed to be immune to nulls!

That's an obviously pathological case, and veteran Scala developers will say that "you should always define your vals/vars before using them". This is of course true and a good rule of thumb, but I would make two points. The first is that the compiler has enough information to error, or at least warn us, when we've done something stupid like this. The second is that function definitions don't behave the same way: we can use them before they have been defined. We can swap the order of the definition of dontExcept and the call to it, and it still throws. I think I understand the technical reasons why this occurs, but it's very non-intuitive behavior in my mind.

What makes this behavior so insidious is its effect when using abstract classes. If we call a base class's function, we have to be completely sure that all parts of our derived class have been constructed, or else we can end up with NPEs in completely unexpected locations. Consider the following:


import scala.util.Random

class DerivedClass(delayCall: Boolean) extends BaseClass(delayCall) {
  start()           // runs before b is initialized on the next line
  val b = List()
  def act(): Int = b.size
}


abstract class BaseClass(delayCall: Boolean) {
  // Lets a plain function literal be passed where a Runnable is expected.
  implicit def asRunnable(func: () => Unit): Runnable =
    new Runnable() { def run(): Unit = func() }

  def act(): Int // abstract or overridden function
  var result: Int = -1

  def start(): Unit = {
    if (delayCall) {
      // Call act() on another thread; record -2 if it hits the NPE.
      val t = new Thread(() => {
        try {
          result = act()
        } catch {
          case _: NullPointerException =>
            result = -2
        }
      })
      val r = new Random
      t.start()
      if (r.nextBoolean()) Thread.sleep(10)
    }
    else result = act()   // synchronous: always runs before b is set
  }
}


If we try to construct a DerivedClass(false), we get an immediate NPE during construction, because the behavior is totally deterministic. If instead we construct a DerivedClass(true), which calls our abstract function at a non-deterministic time in the future, we may or may not get an NPE. This effect may also occur with overridden functions, though I haven't investigated that yet.

Combining this type of issue with the often non-obvious asynchronous behavior of method calls (!?), I'm starting to question when it is safe to call functions inside the class body constructor. The original bug that drove me to investigate this issue was "autostarting" an actor by calling its start function inside the constructor. This caused a 1-in-10 failure where a derived class wasn't fully constructed before the act() method was called, and act() then tried to use an abstract field. For now, the rule of thumb of initializing all fields before calling any functions will have to suffice.

I have a working example of these issues with test cases in a gist: http://gist.github.com/366364

Monday, March 1, 2010

When a dash is not a dash... Or damn you Word!

We are in the final stages of testing a communication driver for a piece of custom hardware attached via serial. The application provides a command line interface for testing that the device is communicating properly and that commands sent to the device operate as expected. We had written up a long test procedure documenting what the tool is capable of and how best to use it. The document was written in Word and was well formatted and easy to read. We were satisfied with everything we had done and released the executable and supporting documents to some of our engineers to test. Things worked as expected for a few days, and then suddenly it stopped working with an odd error message. The standard command line to start the app was:

driver.exe -I -S default_driver.xml

This translates to "interactive" (-I), "simulated" (-S), and then the XML config file to use. Today, all of a sudden, this command line was producing a slew of odd "unexpected argument" errors. We have been using the same argument parsing library (http://graphics.stanford.edu/~drussel/Argument_helper/) for a few years, so it was hard to believe that the library was at fault. We tried it on our machines and it worked fine, but sure enough, on his machine we could replicate the error. We figured it must be a memory corruption bug and pored through the code looking for anywhere we could be borking memory, but couldn't find anything. Out of frustration we reordered the arguments, and it started working as expected. Then we went back to the original order and it still worked!

We puzzled over this for a while, then realized that we had told the engineer to copy the command line from the testing document. It turns out Word had helpfully replaced one of the argument flag dashes with a dash doppelganger. It looks exactly like a dash, but it isn't the ASCII hyphen-minus ('-', 0x2D) that our argument parsing library looks for to recognize a flag. What made this all the more frustrating was that copying and pasting this character into nearly anything else produced the expected dash. That included using the echo command to print the copy/pasted command to a file. This crazy dash character doesn't even copy/paste consistently within Word: if we select it, hit Ctrl-F and search, Word silently replaces it with a regular dash in the find box, so the character doesn't even show up when searching the file.

We have added a check to the command line parsing to make sure that all characters are in the printable ASCII range (between ' ' and '~') so we don't get bitten again.
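The real check lives in the driver itself; as an illustration of the same idea, here is a small Ruby sketch that flags any argument character outside ' ' (0x20) through '~' (0x7E) and reports its code point. The method names `non_ascii_chars` and `check_args!` are hypothetical, and the en dash (U+2013) in the test stands in for the mystery character, whose exact identity the post doesn't record.

```ruby
# Return [argument_index, "U+XXXX"] for every character outside printable
# ASCII, i.e. outside ' ' (0x20) through '~' (0x7E).
def non_ascii_chars(argv)
  argv.each_with_index.flat_map do |arg, i|
    arg.each_char
       .reject { |c| c.ord.between?(0x20, 0x7E) }
       .map { |c| [i, format('U+%04X', c.ord)] }
  end
end

# Fail fast, before real argument parsing, with a message that points at
# the offending characters instead of a vague "unexpected argument".
def check_args!(argv)
  bad = non_ascii_chars(argv)
  return if bad.empty?
  details = bad.map { |i, cp| "argument #{i}: #{cp}" }.join(', ')
  raise ArgumentError,
        "non-ASCII characters in command line (#{details}); " \
        'was it pasted from a word processor?'
end
```

Reporting the code point matters here: as the story shows, the bad character is visually indistinguishable from a real dash, so an error that merely says "unexpected argument" sends you hunting in the wrong place.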






Tuesday, February 2, 2010

Rails 2.3.5 Slow Webrick

I found an issue with Rails 2.3.5 WEBrick that made connections from remote machines (anything other than the host itself) brutally slow. I did some research and found that avahi-daemon was the culprit. On my machine I disabled the daemon, and speeds instantly rebounded to where they were supposed to be.

sudo /etc/init.d/avahi-daemon stop


If that fixes the issue for you, make sure you uninstall or disable it so it doesn't start the next time you reboot. (I deleted the symlink from /etc/rc3.d/).

Lighthouse ticket. (Though I think I should escalate this ticket somewhere else; no developers appear to be following it.)

Tuesday, January 12, 2010

Qpid Ruby Driver

We have been playing with Apache's Qpid as an AMQP broker, and I was getting annoyed with the need to include ext/sasl to run anything that used it as a gem. They were 99% of the way there, but they didn't write the gem spec so that it would actually run the sasl extconf.rb file. All you need to do is add the following line to the gem specification definition (line 108 in qpid-0.5/ruby/Rakefile):

s.extensions << 'ext/sasl/extconf.rb'
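For context, here is roughly where that line sits in a gem specification. This is a sketch, not the actual Qpid Rakefile: apart from the `extensions` line, every field (name, version, summary, author, file globs) is a placeholder.

```ruby
require 'rubygems'

# Illustrative gem specification; only the `extensions` line comes from the
# fix above, every other field is a placeholder, not Qpid's real values.
spec = Gem::Specification.new do |s|
  s.name    = 'qpid'
  s.version = '0.5'
  s.summary = 'Ruby client for the Qpid AMQP broker'
  s.authors = ['placeholder']
  s.files   = Dir['lib/**/*.rb'] + Dir['ext/sasl/*.{c,rb}']
  # Tell RubyGems to run this extconf.rb during `gem install`, so the native
  # sasl extension is compiled as part of installing the gem.
  s.extensions << 'ext/sasl/extconf.rb'
end
```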

Now you can just require "qpid" in your client code without having to put sasl on the load path.

This is one of those cases where GitHub would be good; I don't know the best way to push this change back to whoever manages this code. There was a GitHub fork of this code (colinsurprenant-qpid), but it is over a year out of date and doesn't even run against the current 0-10 version of Qpid. It's too bad, because he had addressed the strange way the AMQP protocol XML spec files were handled and stored. I may have to look at re-applying those changes and adding the protocol files to the gem.