One of the strangest areas of Linux packaging is scientific software. Often it’s written by non-programmers, it has an ad-hoc, handwritten or poorly maintained build system, and it uses unusual features of strange languages (like Fortran, the topic of this post). In the past, I’ve given talks on how upstreams should package scientific software, but this post touches on a different issue: how distributions should handle one of the stranger aspects of Fortran packages.
The rough equivalent of libraries in Fortran 90 is modules. One major problem, however, is that compiled modules (“libraries”) are stored in a format that differs for each compiler and compiler version used to build the package. For example, modules built using GCC’s gfortran and Intel’s ifort are entirely incompatible; even gfortran 4.3 and 4.4 are not expected to play nicely together.
This becomes a problem for people who care about performance or who develop Fortran programs, because they need to have modules available for many different compilers. Initially, you might think we should store Fortran modules in directories reflecting this diversity. Running `gcc -print-file-name=finclude` on recent GCC versions prints the location where GCC installs its own Fortran modules: /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/finclude on my system. So you could imagine a series of directories like /usr/lib/$COMPILER/$VERSION/finclude/ where Fortran modules end up for each compiler.
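To make that layout concrete, here is a minimal sketch of a helper that builds such a path. The function name `fortran_module_dir` is my own invention for illustration, and the layout is just the one imagined above, not something any compiler or distribution actually provides.

```shell
# Hypothetical helper: print the compiler-specific module directory,
# following the /usr/lib/$COMPILER/$VERSION/finclude/ layout imagined above.
# Arguments: compiler name, compiler version.
fortran_module_dir() {
    local compiler=$1 version=$2
    printf '/usr/lib/%s/%s/finclude\n' "$compiler" "$version"
}
```

You might call it as `fortran_module_dir gfortran "$(gfortran -dumpversion)"`, then pass the result to the compiler as a module search path (with gfortran, via `-I`). How much of the version number `-dumpversion` reports can vary between GCC releases, so treat that part as an assumption.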
But the problem arises when you consider how packaging actually works: only one copy of each package+version can be installed at a time. That means you can’t easily install modules for three different compiler+version combinations at once. For each module set, you need to rebuild the package with a new compiler and reinstall it, which uninstalls the old modules built with the previous compiler.
Three possible solutions occurred to me:
- Litter modules by making the package forget it installed them. In this scenario, you would rebuild a package multiple times with different compilers, and the modules would get left behind in a compiler-specific directory like /usr/lib/$COMPILER/$VERSION/finclude/.
- Create a mechanism for switching between builds of the same package version made with different compilers. This might work by creating binary packages for module-installing packages, then storing them in directories like /usr/portage/packages/$COMPILER/$VERSION/. A switching script could examine these directories and switch between them on demand by installing those packages using Gentoo’s PKGDIR setting. One route to this might be to use package-specific settings in /etc/portage/env/ to set FEATURES=buildpkg for the packages that need binaries, then add a late hook that copies the binpkgs to the compiler-specific package directory.
- Build the same package version with many compilers at once, then bundle the results in a single package and install modules for all of them. This would work similarly to Gentoo’s experimental multi-ABI support (available in some overlays), which rebuilds a package for both 32-bit and 64-bit within a single ebuild. This approach has two major downsides: (1) it requires explicit support to be written into every ebuild using it, and (2) a change to just one version of one compiler requires rebuilding the package for every compiler+version.
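For the second option, the per-package FEATURES=buildpkg setting could look roughly like the sketch below. It assumes a Portage version that reads /etc/portage/package.env as a plain file mapping package atoms to settings files in /etc/portage/env/. The file name `fortran-binpkg.conf` and the helper `enable_buildpkg_for` are made up for illustration, and the late hook that copies binpkgs is not covered here.

```shell
# Hypothetical helper: enable binary-package creation for one package only.
# Arguments: a root prefix (empty on a real system, a scratch dir for testing)
# and a package atom. Assumes package.env is a file, not a directory.
enable_buildpkg_for() {
    local root=$1 atom=$2
    mkdir -p "$root/etc/portage/env"
    # Settings file that package.env entries can point at
    # ("fortran-binpkg.conf" is a made-up name).
    printf 'FEATURES="buildpkg"\n' > "$root/etc/portage/env/fortran-binpkg.conf"
    # Apply those settings to the given package atom only.
    printf '%s fortran-binpkg.conf\n' "$atom" >> "$root/etc/portage/package.env"
}
```

For example, `enable_buildpkg_for "" sci-libs/example` (with a placeholder atom) would write the two files under /etc/portage, so that only that package gets a binpkg built each time it is merged.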
I’m leaning toward the second approach (the switching mechanism), which looks relatively easy and quick to support. It feels much cleaner than the first approach, and it should be easier to implement and faster in use than the third. With the second approach, only one module directory is required rather than compiler-specific directories. A reasonably compiler-neutral location for Fortran modules would be /usr/$LIBDIR/finclude/, so that’s what I propose to use.
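A switching script for the binpkg-based approach might start from something like this sketch. The function names are hypothetical, and it assumes that every directory two levels below the package root is a compiler/version directory, which would not be true of a stock binary package directory. It only prints the emerge command instead of running it.

```shell
# Root of the per-compiler binpkg directories (path taken from the text;
# override PKG_ROOT to experiment elsewhere).
PKG_ROOT=${PKG_ROOT:-/usr/portage/packages}

# Print the binary-package directory for one compiler+version.
binpkg_dir() {
    printf '%s/%s/%s\n' "$PKG_ROOT" "$1" "$2"
}

# List available builds as "compiler/version", one per line.
# Assumption: everything two levels under PKG_ROOT is such a directory.
list_builds() {
    local dir
    for dir in "$PKG_ROOT"/*/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        printf '%s\n' "${dir#"$PKG_ROOT"/}"
    done
}

# Print (rather than run) the command that would reinstall a package
# from the binpkgs built with the given compiler+version.
switch_command() {
    local compiler=$1 version=$2 atom=$3
    printf 'PKGDIR=%s emerge --usepkgonly --oneshot %s\n' \
        "$(binpkg_dir "$compiler" "$version")" "$atom"
}
```

With the default root, `switch_command gfortran 4.3.3 sci-libs/example` prints `PKGDIR=/usr/portage/packages/gfortran/4.3.3 emerge --usepkgonly --oneshot sci-libs/example`, where the atom is a placeholder. Printing the command keeps the sketch safe to try; a real script would run it and probably check that the requested build exists first.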
If you have any other ideas or think a different option is better, please let me know in the comments.