Mad science: 2008

Tuesday, September 23, 2008

SciPy 2008 Proceedings online

The proceedings of the 2008 SciPy conference are now online. The articles provide a snapshot of some of the great work being done using SciPy as well as valuable references for SciPy users. Gaël Varoquaux has done a great job getting the whole collection in shape. Thanks Gaël!

Sunday, March 16, 2008

Data type and byte order conversions for NSData using the STL

We've been spoiled working in Python's numpy array library. Converting data types or byte order for numeric data in a numpy array is trivial (thanks to the work of numpy's authors of course). When we're in Objective-C land, the inability to change data types or byte orders on NSData instances is frustrating. So we've written a short C++ header that provides templated functions for converting the data type and/or byte order of numeric data stored in NSData/NSMutableData instances. I think it might be useful for others who don't want to (re)learn C++ and the STL. Writing code in Objective-C to accomplish the same functionality is a mess of if/then/else or switch statements. Every job has a tool and C++ templates seem to be the best tool for this job (numpy's authors have a similar solution using a code generator to keep things all in C).

For the brave and/or interested, take a look at NumericDataTypeConversions.h (BSD license). Bug reports or patches are welcome. Since it's a C++ header, any Objective-C file that includes it will have to have the .mm extension (or you must otherwise ensure that the file is compiled as Objective-C++). Happy coding.
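
For comparison, here is roughly what those conversions look like on the numpy side. This is a minimal sketch: the sample values and variable names are made up for illustration and have nothing to do with the header above.

```python
import numpy as np

# Four 32-bit floats stored big-endian, as they might arrive from a file or socket.
big_endian = np.array([1.0, 2.5, -3.0, 4.25], dtype=">f4")

# Change the data type: float32 -> float64. The values are preserved.
as_double = big_endian.astype(np.float64)

# Change only the byte order: big-endian float32 -> little-endian float32.
little_endian = big_endian.astype("<f4")

# The numeric values match, but the underlying bytes are laid out differently.
same_values = bool(np.array_equal(big_endian, little_endian))
bytes_differ = big_endian.tobytes() != little_endian.tobytes()

print(as_double.dtype, same_values, bytes_differ)  # prints: float64 True True
```

Each conversion is a single `astype` call, which is the convenience the templated functions try to bring to NSData.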

Wednesday, January 30, 2008

scikits.ann in PyPI

After several folks had trouble getting the scikits.ann egg from the server listed in previous posts, I've uploaded the source and OS X 10.5 egg to PyPI. You can now install it via easy_install. Of course, you can still get it via the scikits SVN.

Tuesday, January 29, 2008

ANN namespace madness

I hadn't checked the scikits Developer's Trac recently. Apparently it's now OK to use non-BSD (but OSI approved) licenses in the scikits namespace. So I've moved our ANN wrapper back. You can get it from the scikits SVN or via

easy_install -f http://rieke-server.physiol.washington.edu/~barry/python/ scikits.ann

if you're on OS X 10.5.

ANN wrapper (v0.2) now LGPL, OS X egg available

With help and suggestions from Rob Hetland, I've made many changes to the API of our Approximate Nearest Neighbor wrapper for scipy. The API now hides much of the SWIG-yness of the old version and feels (I hope) more pythonic. On Rob's suggestion we've also added C++ code to make querying multiple points much faster. Because the ANN library is LGPL, I've relicensed our wrapper as LGPL to avoid any LGPL/BSD conflicts and have moved the wrapper to the scigpl namespace from scikits (too bad, scikits looks so much flashier).

I've also made an egg for OS X that statically links the ANN library. You should be able to install via (one line):

easy_install -f http://rieke-server.physiol.washington.edu/~barry/python scigpl.ann

Of course, you can still get it via the scikits SVN.

Here's how the new API looks:

>>> import scigpl.ann as ann

>>> import numpy as np

>>> k=ann.kdtree(np.array([[0.,0],[1,0],[1.5,2]]))

>>> k.knn([0,.2],1)
(array([[0]]), array([[ 0.04]]))

>>> k.knn([0,.2],2)
(array([[0, 1]]), array([[ 0.04, 1.04]]))

>>> k.knn([[0,.2],[.1,2],[3,1],[0,0]],2)
(array([[0, 1],
       [2, 0],
       [2, 1],
       [1, 2]]), array([[ 0.04,  1.04],
       [ 1.96,  4.01],
       [ 3.25,  5.  ],
       [ 1.  ,  6.25]]))

>>> k.knn([[0,.2],[.1,2],[3,1],[0,0]],3)
(array([[ 0,  1,  2],
       [ 2,  0,  1],
       [ 2,  1,  0],
       [ 1,  2, -1]]), array([[ 4.00000000e-002, 1.04000000e+000, 5.49000000e+000],
       [ 1.96000000e+000, 4.01000000e+000, 4.81000000e+000],
       [ 3.25000000e+000, 5.00000000e+000, 1.00000000e+001],
       [ 1.00000000e+000, 6.25000000e+000, 1.79769313e+308]]))
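
A note on reading that output: the distances are squared Euclidean distances (0.04 is 0.2 squared). To make the semantics concrete without the ANN library installed, here is a brute-force sketch in plain numpy. The function name `brute_force_knn` is hypothetical and is not part of the wrapper; it does an exhaustive search rather than a kd-tree search. It also makes no attempt to mimic how the transcript treats the query [0,0], which coincides with a stored point and does not get index 0 back.

```python
import numpy as np

def brute_force_knn(points, queries, k):
    """Hypothetical stand-in for k.knn: exhaustive search, squared distances."""
    points = np.asarray(points, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    # Pairwise squared distances, shape (n_queries, n_points).
    diff = queries[:, np.newaxis, :] - points[np.newaxis, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    # Indices of the k smallest distances for each query, nearest first.
    order = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(sq_dist, order, axis=1)

points = [[0.0, 0.0], [1.0, 0.0], [1.5, 2.0]]
idx, dist = brute_force_knn(points, [[0.0, 0.2], [0.1, 2.0], [3.0, 1.0]], 2)
print(idx.tolist())  # [[0, 1], [2, 0], [2, 1]]
```

For the first three query points this agrees with the k=2 transcript above. The cost grows with the number of queries times the number of points, which is the work a kd-tree is meant to avoid.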

Friday, January 18, 2008

scikits.ann part deux

I've updated our Python wrapper for David Mount and Sunil Arya's Approximate Nearest Neighbor (ANN) library. It now handles searching the tree for the k-nearest neighbors of a set of points. Since it's all done in C, this should be much faster than looping in Python for large sets of points. Along the way, I was able to clean up the API significantly: I got rid of the SWIG-isms and the whole thing feels much more Pythonic now.

If you need to do k-nearest neighbor searches, have a look. It's in the scikits SVN.

Tuesday, January 1, 2008

scikits.ann

Our Python wrapper for David Mount and Sunil Arya's Approximate Nearest Neighbor (ANN) library is now in the scikits repository at scipy.org. The scikits.ann module is a SWIG-generated Python wrapper for the ANN library. It provides a numpy-compatible immutable kd-tree implementation which can perform k-nearest neighbor and approximate k-nearest neighbor searches. It currently builds on unix/OS X and I'm working to incorporate Jose Martin's contributions to get things building on Windows with MinGW.

ANN is licensed under the LGPL and we've licensed the scikits.ann wrapper under the BSD license. If you need a kd-tree implementation for Python/numpy, check it out.
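
For anyone new to kd-trees, here is a toy version in pure Python that shows the idea behind exact and approximate nearest-neighbor search. It is only a sketch under my own assumptions: the names `Node`, `build`, `sq_dist` and `nearest` are hypothetical, it finds a single neighbor rather than k, and it is not how ANN or the wrapper is implemented. With `eps` greater than zero, the search is allowed to return a point up to (1 + eps) times farther away than the true nearest neighbor, which lets it skip more of the tree.

```python
from collections import namedtuple

# Hypothetical node type for this sketch; not part of scikits.ann or ANN.
Node = namedtuple("Node", "index axis left right")

def build(points, indices=None, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build(points, indices[:mid], depth + 1),
                build(points, indices[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, points, query, eps=0.0, best=None):
    """Return (index, squared distance); exact when eps == 0."""
    if node is None:
        return best
    d = sq_dist(points[node.index], query)
    if best is None or d < best[1]:
        best = (node.index, d)
    # Signed distance from the query to this node's splitting plane.
    diff = query[node.axis] - points[node.index][node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, points, query, eps, best)
    # Search the far side only if it could beat the current best
    # by more than a factor of (1 + eps).
    if diff * diff * (1.0 + eps) ** 2 < best[1]:
        best = nearest(far, points, query, eps, best)
    return best

points = [(0.0, 0.0), (1.0, 0.0), (1.5, 2.0)]
tree = build(points)
index, sq = nearest(tree, points, (0.0, 0.2))
print(index, round(sq, 2))  # prints: 0 0.04
```

The tree is built once and never modified afterwards, which matches the "immutable" description above.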

What is a UI?

This year, I'd like to say a big thank you to the writers of numpy and scipy, the numerical and scientific libraries for Python. We use these open source projects very heavily in our work. The combined efforts of all of the contributors to these projects have made Python a premier language for numerical computation.

The author of numpy, Travis Oliphant, recently moved from Utah to Texas to work with Enthought. One of the many contributions Enthought has made to the scientific software community is developing a whole suite of tools (in Python) that make developing cross-platform scientific applications much easier. In a recent post, Travis talks at length about what makes a "good" UI. Travis has obviously thought a lot about UIs for scientific computing and his discussion is very interesting. Definitely worth a read. Briefly, Travis notes that the user interface of an app is not just the buttons and menus, but the entire application framework: persistence, undo/redo, safe exploration, workflow, etc.

In scientific software, I think "workflow" is the most important part of an application's user interface. Application frameworks such as Apple's Cocoa, Microsoft's .NET, Trolltech's Qt, etc. have solved many of the application framework UI issues that Travis mentions (in fact, this is why we use Cocoa and Qt for most of our work at Physion). Apple's Cocoa framework is particularly impressive in this regard. An application with undo/redo, persistence to an SQLite database, automatic network discovery of distributed computing resources, you-name-it, is virtually code-free for the developer. But none of these frameworks has solved the scientific workflow problem. That's because we know how undo/redo should work, and we've known for decades how a database should work (for the most part). Science, though, is about the new, and very often discovering something new involves creating the entire workflow anew.

One reason why new workflows and new experiments often go together is that a workflow implicitly defines a world model (an idea about what objects in the world matter and how they interact), and new experiments also define a new world model. Therefore, new experiment, new workflow. Only by matching the workflow to the experiment can we make software that becomes invisible to the researcher while facilitating new discoveries.

We certainly haven't solved the workflow problem either, but we're working on it. Here's to one more year of trying.
