7 problems GSoC admins encounter

At the always-excellent Mentor Summit for the Google Summer of Code, I ran a session titled “Best Practices for GSoC Admins.” Many of these practices appear specific to a program like GSoC at first glance, but they transfer easily to recruiting new contributors outside GSoC; just imagine the same ideas with less process behind them. Here I want to share the main points of our session and expand upon some of them, in the hope that they help future GSoC admins and other people interested in recruiting new developers.

We focused primarily on the biggest problems we face as admins. Intriguingly, although perhaps 30 people attended, nearly all the problems were universal (fortunately, some groups had already solved them!). The only exception to their universal nature was that smaller organizations seemed not to require the same level of progress tracking, because missing/poor progress quickly became obvious to everyone in a small group.

Here are the top problems, with suggested solutions:

  1. Tracking progress. Require weekly updates from both students and mentors. This means admins don’t need to personally track every student or ensure the mentor is around. Blogs or wiki pages (“journals”) work for many projects, although some have issues with blogs. A key point is to offload work to mentors so they tell you whether students are on track. Keep a spreadsheet (possibly public for tracking and shame?) to stay on top of things, because it’s easy to lose track after a few weeks.
  2. Knowing student skills. Model the type of things they would do during GSoC on a smaller scale. Require a patch during the application period to prove they can build and modify your software. Additionally, require that students interact with your community so you can consider how (and whether) they will fit in.
  3. Avoiding failure. Check in with students at “quarter points”: halfway between the start and the midterm, and halfway between the midterm and the final. This leaves time to fix any show-stopping problems before they guarantee failure. During the application period, get a calendar of when both students and mentors will be gone so you can take this into account. Investigate problems early to avoid failure instead of waiting until it’s inevitable. In the case of conflicts between students and mentors, admins can act as a neutral mediator; make sure everyone knows this when the summer starts so they don’t feel helpless. Some students communicate poorly (the grad-school model of independent work), so try to catch this early and push them. Are there non-binary solutions, ways to do something besides just pass or fail? Could we withhold T-shirts, pay less money based on a final “grade,” increase the number of payments, pay late, etc.?
  4. Disappearing/lazy mentors. One major problem here is figuring out what motivates mentors: what are the incentives and punishments? The most common response to unacceptable mentoring was blocking that person from any future mentoring. Is that enough? Nobody knows; it seems to be mostly an after-the-fact solution that may not fix things during the summer.
  5. Inexperienced mentors. Pair new mentors with experienced mentors and/or backup mentors. Admins should offer to be “mentor-mentors,” teaching the beginners how it’s done.
  6. Increasing the number of proposals. Two student audiences exist: those familiar with your project and those who discover it through GSoC. For the first, target students who were not accepted in previous years (reject gently!) and publicize GSoC internally on your mailing lists, websites, etc. For the second, publicize your project in blogs, to college profs, etc. Have a good ideas list (where good means fun and exciting, so students apply to your project). Increase the time between org acceptance and the student deadline so students have time to discover exciting organizations and ideas. Have more flexible ideas that give students some ownership (they must expand upon them!).
  7. Improving the quality of proposals. A high-quality application template is key. Problem: at least one organization saw a correlation between adding a template and getting fewer proposals. Could applying be made a two-step process, so that the template is displayed after a student commits to applying to a specific organization? Require a timeline in the proposal to ensure they understand project details at a level sufficient to code them, but allow it to flex once coding starts. Ask specific questions to gauge both understanding and enthusiasm. Do live interviews by IRC or phone, possibly with live coding.
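
The “quarter point” check-ins from item 3 are easy to put on a calendar ahead of time. Here is a minimal sketch in Python; the function names and the dates are made up for illustration and are not real GSoC deadlines.

```python
from datetime import date


def midpoint(a, b):
    """Return the calendar day halfway between a and b (rounded down)."""
    return a + (b - a) // 2


def quarter_points(start, midterm, final):
    """Return the two check-in dates: start-to-midterm and midterm-to-final."""
    return [midpoint(start, midterm), midpoint(midterm, final)]


if __name__ == "__main__":
    # Hypothetical program dates, for illustration only.
    for day in quarter_points(date(2024, 5, 27), date(2024, 7, 8), date(2024, 8, 19)):
        print(day.isoformat())
```

Dropping these two dates into the same spreadsheet used for weekly updates keeps the check-ins from slipping once the summer gets busy.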

If you have any suggestions for these problems, or more problems you’ve encountered, please let me know in the comments!

Linux problems you never considered: Handling Fortran90 modules for multiple compilers

One of the strangest areas of Linux packaging is scientific software. Often it’s written by non-programmers, it has an ad-hoc, handwritten or poorly maintained build system, and it uses unusual features of strange languages (like Fortran, the topic of this post). In the past, I’ve given talks on how upstreams should package scientific software, but this post touches on a different issue: how distributions should handle one of the stranger aspects of Fortran packages.

The rough equivalent of libraries in Fortran90 is modules. One major problem, however, is that compiled modules (“libraries”) are stored in a format that differs for each compiler and compiler version used to build the package. For example, modules built using GCC’s gfortran and Intel’s ifort are entirely incompatible; even gfortran 4.3 and 4.4 are not expected to play nicely together.

This becomes a problem for people who care about performance, or people who develop Fortran programs, because these people need to have modules available for many different compilers. Initially, you might think we should store Fortran modules in directories reflecting this diversity. Running `gcc -print-file-name=finclude` on recent GCC versions prints the location where GCC installs its own Fortran modules: /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/finclude on my system. So you could imagine a series of directories like /usr/lib/$COMPILER/$VERSION/finclude/ where Fortran modules end up for each compiler.
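
To make that layout concrete, here is a minimal Python sketch. The helper names (`module_dir`, `gcc_finclude`) are hypothetical; the only real interface used is the `gcc -print-file-name=finclude` call mentioned above, and the `/usr/lib` prefix is just the one from my example.

```python
import subprocess
from pathlib import PurePosixPath


def module_dir(compiler, version, prefix="/usr/lib"):
    """Build the imagined per-compiler module path: $prefix/$COMPILER/$VERSION/finclude."""
    return PurePosixPath(prefix) / compiler / version / "finclude"


def gcc_finclude(gcc="gcc"):
    """Ask an installed GCC where it keeps its own Fortran modules."""
    result = subprocess.run(
        [gcc, "-print-file-name=finclude"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Compiler names and versions here are examples only.
    print(module_dir("gcc/x86_64-pc-linux-gnu", "4.3.3"))
    print(module_dir("ifort", "11.0"))
```

`gcc_finclude()` only works on a machine with GCC installed, so the example at the bottom sticks to building paths.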

But the problem arises when you consider how packaging actually works: you only get one simultaneous installation of each package+version. That means you can’t easily install modules for three different compiler+version combinations at once. For each module set, you need to rebuild the package for a new compiler and reinstall the package; this means you uninstall the old modules built for the other compiler.
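
A toy model shows why this hurts. The sketch below is not how any real package manager stores its database; it only assumes that installs are keyed by package and version, so a rebuild with a second compiler replaces the first. The package name is invented.

```python
def install(installed, package, version, compiler):
    """Record a build and return the compiler whose build it replaced, if any."""
    key = (package, version)  # the compiler is not part of the key
    replaced = installed.get(key)
    installed[key] = compiler
    return replaced


if __name__ == "__main__":
    db = {}
    install(db, "sci-libs/example-fortran-lib", "1.0", "gfortran-4.3")
    lost = install(db, "sci-libs/example-fortran-lib", "1.0", "ifort-11.0")
    print("modules removed for:", lost)
    print("modules now installed for:", db[("sci-libs/example-fortran-lib", "1.0")])
```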

Three possible solutions occurred to me:

  1. Litter modules by making the package forget it installed them. In this scenario, you would rebuild a package multiple times with different compilers, and the modules would get left behind in a compiler-specific directory like /usr/lib/$COMPILER/$VERSION/finclude/.
  2. Create a mechanism for switching between builds of the same package version made by different compilers. This might work by creating binary packages for module-installing packages, then storing them in directories like /usr/portage/packages/$COMPILER/$VERSION/. A switching script could examine these directories and switch between them on demand by installing those packages using Gentoo’s PKGDIR setting. One route to this might be to use package-specific settings in /etc/portage/env/ to decide when to create binaries (by setting FEATURES=buildpkg), then add a late hook that copies the binpkgs to the compiler-specific package directory.
  3. Build the same package version with many compilers at once, then bundle it in a single package and install modules for all of them. This would work similarly to Gentoo’s experimental multi-ABI support (available in some overlays), which rebuilds a package numerous times for 32-bit or 64-bit within a single ebuild. This approach has two major downsides: (1) It requires explicit support to be written into every ebuild using it, and (2) a change to just one version of one compiler requires rebuilding the package for every compiler+version.
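
The switching script from approach 2 could start out as something like the following Python sketch. It is untested against a real Portage tree and makes several assumptions: the binpkgs already sit under /usr/portage/packages/$COMPILER/$VERSION/, the function names are hypothetical, and the package name is invented. It only builds the environment and command line; it does not run anything.

```python
from pathlib import PurePosixPath

# Directory layout assumed from approach 2; adjust to wherever binpkgs are copied.
PACKAGES_ROOT = PurePosixPath("/usr/portage/packages")


def binpkg_dir(compiler, version):
    """Directory holding the binary packages built by one compiler+version."""
    return PACKAGES_ROOT / compiler / version


def switch_command(package, compiler, version):
    """Return (env, argv) that would reinstall `package` from that compiler's binpkgs."""
    env = {"PKGDIR": str(binpkg_dir(compiler, version))}
    # --usepkgonly installs from binary packages only; --oneshot leaves the world file alone.
    argv = ["emerge", "--oneshot", "--usepkgonly", package]
    return env, argv


if __name__ == "__main__":
    env, argv = switch_command("sci-libs/example-fortran-lib", "gfortran", "4.4.1")
    print(env, " ".join(argv))
```

To actually switch, the returned pair could be passed to something like `subprocess.run(argv, env={**os.environ, **env})` as root.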

I’m leaning toward approach 2, which looks relatively easy and quick to support. It feels much cleaner than approach 1, and it should be easier to implement and faster in action than approach 3. With approach 2, only one module directory is required rather than compiler-specific directories. A reasonably compiler-neutral location for Fortran modules would be /usr/$LIBDIR/finclude/, so that’s what I propose to use.

If you have any other ideas or think a different option is better, please let me know in the comments.