Archive for September, 2006

Lang.Net Symposium

Thursday, September 28th, 2006

Microsoft recently hosted the Lang.NET Symposium 2006 conference.

Lang .NET 2006 Symposium is a forum for discussion on programming languages, managed execution environments, compilers, multi-language libraries, and integrated development environments. This conference provides an excellent opportunity for Programming Language Implementers and Researchers from both industry and academia to meet and share their knowledge, experience, and suggestions for future research and development in the area of programming languages.

I heard about this symposium and wanted to attend but couldn’t. The attendees were mostly programming language theory academics and language implementors: people writing compilers and interpreters for new static and dynamic languages that run on virtual machines such as .NET and the JVM.

Thanks to the nice people at Microsoft, including Erik Meijer, the talks were recorded and are available at
http://www.langnetsymposium.com/speakers.asp

Check it out.

Ruby Open3.popen3 doesn’t return the pid of the child process.

Thursday, September 21st, 2006

This week during our Ruby Hacking night, Bill Dolinar and I looked at Open3.popen3. I was disappointed that it didn’t seem to return a pid (process identifier) for the spawned process. The Ruby documentation shows that popen3 returns the three standard process IO streams.

stdin, stdout, stderr = Open3.popen3('nroff -man')

However, it doesn’t return the pid of the process it spawns, which means you can’t wait on the child process with Process.waitpid. Looking at the Open3 source, we see the following:

def popen3(*cmd)
  pw = IO::pipe   # pipe[0] for read, pipe[1] for write
  pr = IO::pipe
  pe = IO::pipe

  pid = fork{
    # child
    fork{ # .... CUT
      # grandchild
      exec(*cmd)
    }
    exit!(0)
  }

Open3 does save a pid, but not the pid of the spawned ‘nroff -man’ process. Instead it captures the pid of an intermediate process, which in turn spawns the desired ‘nroff -man’ process. So why the double fork? Matz’s reasoning is that the child process gets daemonized and reparented to the init process on Unix. Maybe so, but whether to daemonize should be the programmer’s choice. Maybe I want to wait on the child process myself using Process.waitpid. Maybe I want to send the child process a signal, or kill it. All these tasks require that the programmer be provided with the pid of the child process.

First, the double fork behavior should be selectable by the developer.

Second, the pid of the spawned process should be returned.
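As a rough illustration of the second point, here is a minimal sketch of a single-fork variant that hands back the child’s pid along with the three streams. The name popen3_with_pid is hypothetical (it is not part of Open3 and is not the open4.rb code), it assumes a Unix-like system where fork is available, and it uses ‘cat’ as a stand-in command.

```ruby
# Hypothetical helper, NOT part of Open3: forks once and returns the
# child's pid together with its stdin, stdout and stderr streams.
def popen3_with_pid(*cmd)
  in_r,  in_w  = IO.pipe   # parent writes -> child's stdin
  out_r, out_w = IO.pipe   # child's stdout -> parent reads
  err_r, err_w = IO.pipe   # child's stderr -> parent reads

  pid = fork do
    # child: drop the parent's pipe ends, then rewire the standard streams
    in_w.close
    out_r.close
    err_r.close
    STDIN.reopen(in_r)
    STDOUT.reopen(out_w)
    STDERR.reopen(err_w)
    exec(*cmd)
  end

  # parent: drop the child's pipe ends
  in_r.close
  out_w.close
  err_w.close
  [pid, in_w, out_r, err_r]
end

pid, stdin, stdout, stderr = popen3_with_pid('cat')
stdin.puts 'hello'
stdin.close               # send EOF so that cat exits
output = stdout.read
Process.waitpid(pid)      # possible because we hold the child's real pid
status = $?
```

Because only one fork happens, the child is not daemonized, so the caller is responsible for reaping it with Process.waitpid (or for deliberately detaching it).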

The solution: open4.rb

See also: RCR 206: Make popen3 raise an exception on failure

Skuery 0.0.1 Release

Thursday, September 21st, 2006

This is the first release of Skuery, a partial embedding of XQuery functionality in PLT Scheme.

The release was part of my master’s thesis work.

skuery-0.0.1.tgz

Statistics for Graduate Students

Thursday, September 21st, 2006

I’m “sitting in” today on a statistics course for graduate students that a few of my friends are taking this semester. The class is structured to help non-statistics majors use statistics in their graduate studies. I always get a few weird looks from people who know me and are taking the class whenever I do this, but I always learn something. I’ll blog the insights I find today.

Regarding T-tests

  • Used to compare two samples.
  • Assume that the populations have normal distributions and that the standard deviations are nearly the same.

Transformations of Data – transformations move your data to a different scale on which it forms nicer curves (for example, closer to a normal distribution).

  • Should be 1 to 1 if possible
  • logarithmic transformation for skewed data
    • Examples: pH, Richter scale
    • Use if the ratio of the largest to smallest observation is greater than 10, that is, more than an order of magnitude of range.
    • Use for positively skewed data for two or more groups when the group with the larger average also has the larger spread.
  • square root transformation for counts or yes/no data
    • radiation
    • bacterial cells
    • blood tests
  • reciprocal transformation
    • for waiting times
    • reverses the order of the data; negating the result (-1/x) preserves the original order
  • logistic transformation when you have proportions or percents
    • log(p/(1-p))
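The transformations listed above can be sketched in a few lines of Ruby. The function names and the sample numbers below are made up for illustration; only the formulas come from the notes.

```ruby
# Illustrative helpers; names and sample data are assumptions, not from the class.

# Ratio of largest to smallest observation (rule of thumb: log-transform if > 10).
def range_ratio(xs)
  xs.max.to_f / xs.min
end

# Logarithmic transformation for positively skewed data.
def log_transform(xs)
  xs.map { |x| Math.log(x) }
end

# Square root transformation for counts.
def sqrt_transform(xs)
  xs.map { |x| Math.sqrt(x) }
end

# Negated reciprocal: -1/x keeps the original ordering of positive values.
def negative_reciprocal(xs)
  xs.map { |x| -1.0 / x }
end

# Logistic (logit) transformation for a proportion 0 < p < 1.
def logit(p)
  Math.log(p / (1.0 - p))
end

waits = [1, 2, 4]            # made-up waiting times
range_ratio([2, 50])         # 25.0, so a log transform is worth considering
negative_reciprocal(waits)   # [-1.0, -0.5, -0.25], still in ascending order
logit(0.5)                   # 0.0
```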

Strategy for dealing with outliers

  • Check that the data is recorded properly, no typos.
  • Do the analysis with and without the outliers. Report both sets of results if they differ.
  • Could the outliers be from a different population?

Two-sample t-tests are not resistant to outliers. There are other methods that are.

Use graphical methods to check that you are using the correct transformation:

  • Normal probability plots
  • Side-by-side box plots
  • Dot plots or scatter plots (more useful for regression analysis)

The median is usually a better indicator of the central value in skewed data.

Ruby Performance

Wednesday, September 13th, 2006

I’m a little sick of hearing the Ruby community tell everyone that performance doesn’t matter. Instead of trying to justify away poor performance or argue that a better algorithm will make up for Ruby’s shortcomings, a Ruby proponent should DO something about it.

I would like to see responses to criticism of Ruby performance such as this:
X is slow in Ruby, I’ve worked on it a little bit and it is now Y% faster.

Let’s address the issue piece by piece and make it better instead of making ourselves look stupid by saying that performance isn’t important and everyone else has it wrong.

I’m thinking aloud about the whole Joel on Software discussion that Ruby is slow. We, as a Ruby community, need to quit bashing general opinions and do something about it. If you don’t like the shootout, make it better or provide documentation that describes the language subsystems that the benchmarks in the Great Computer Language Shootout are exercising.

As for my contribution, I’m working on Cardinal, the Ruby implementation on top of Parrot. I enjoy working on Cardinal because I get to interact with implementors who are working on almost every dynamic language imaginable. I see the tricks they use to improve performance and try to incorporate them into my work.

I would also encourage community members to submit more tests and patches to clean up and improve Matz’s C implementation of Ruby. I’ve been able to submit a couple of small bug fixes and I’m recommitting myself to making a couple more submissions this week.

It’s time to stop heckling and start working. :)

Latest Cardinal and Parrot News

Tuesday, September 12th, 2006

Well, my thesis is finished, so I’m able to spend more time working on Cardinal and Parrot.

This past week I wrote a PGE (Parrot Grammar Engine) grammar for C99. At first thought this may sound funny. A C parser for a dynamic language virtual machine? Well, almost all the dynamic languages being implemented on top of Parrot support C extensions. Parrot itself has both a C extension interface and a C interface for embedding the Parrot virtual machine in other programs.

Language extension builders often wish to make C function definitions and constants available to the dynamic scripting language they are extending. This is often accomplished with FFI (foreign function interface) libraries. Writing extension code, even with the help of FFI libraries, often involves replicating information stored in C header files. Having a C99 parser for Parrot helps to eliminate the manual process of replicating C constants and definitions in dynamic language extensions. If eliminating replication isn’t possible, hopefully the process can be automated.

In the Cardinal realm, I was able to add support for Ruby BEGIN and END blocks to Cardinal today. BEGIN blocks are code blocks that execute before the rest of the program begins. END blocks execute after the program has finished but before the Ruby interpreter begins to shut down.
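For readers who haven’t used them, here is a small sketch of how BEGIN and END blocks behave in standard Ruby. It illustrates the language feature only, not Cardinal’s implementation; the $order global is a made-up name used to record the execution order.

```ruby
# Appears first in the file, but runs second.
puts "main program"
$order << :main

# Registered when execution reaches this line; runs as the interpreter exits.
END {
  puts "END block"
}

# Appears last in the file, but runs before any other code in it.
BEGIN {
  $order = [:begin]   # hypothetical global, used only to record ordering
  puts "BEGIN block"
}

# Prints:
#   BEGIN block
#   main program
#   END block
```

Note that BEGIN is only permitted at the top level of a file, which is why the sketch is not wrapped in a method.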

Cardinal and Parrot are both in need of contributors. Interested volunteers can contact me at tewky@yahoo.com.