One of the strangest areas of Linux packaging is scientific software. Often it’s written by non-programmers, it has an ad-hoc, handwritten or poorly maintained build system, and it uses unusual features of strange languages (like Fortran, the topic of this post). I’ve given talks on how upstreams should package scientific software in the past, but this post touches on a different issue: how distributions should handle one of the stranger aspects of Fortran packages.
The rough equivalent of libraries in Fortran 90 is modules. One major problem, however, is that compiled module files are stored in a compiler-specific format that changes with each compiler and version used to build the package. For example, modules built using GCC’s gfortran and Intel’s ifort are entirely incompatible; even gfortran 4.3 and 4.4 are not expected to play nicely together.
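To see this concretely, here’s a minimal session (hypothetical; the versioned binary names like `gfortran-4.3` are assumptions and depend on how your distro slots compilers):

```
# Build a trivial module with two gfortran versions and compare the
# resulting .mod files: identical source, incompatible outputs.
cat > points.f90 <<'EOF'
module points
  type point
     real :: x, y
  end type point
end module points
EOF

gfortran-4.3 -c points.f90 -o points-4.3.o
mv points.mod points-4.3.mod
gfortran-4.4 -c points.f90 -o points-4.4.o
mv points.mod points-4.4.mod

cmp -s points-4.3.mod points-4.4.mod || echo "module files differ"
```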
This becomes a problem for people who care about performance, or people who develop Fortran programs, because these people need to have modules available for many different compilers. Initially, you might think we should store Fortran modules in directories reflecting this diversity. Running `gcc -print-file-name=finclude` on recent GCC versions prints the location where GCC installs its own Fortran modules: /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/finclude on my system. So you could imagine a series of directories like /usr/lib/$COMPILER/$VERSION/finclude/ where Fortran modules end up for each compiler.
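A sketch of deriving such a directory, assuming you know which compiler is active (`FC` and `MODDIR` are my names, not anything portage defines):

```
# Hypothetical: compute a compiler+version-specific module directory.
FC=gfortran                          # or ifort, etc.
FC_VERSION=$(gcc -dumpversion)       # e.g. 4.3.3
MODDIR=/usr/lib/${FC}/${FC_VERSION}/finclude
echo "${MODDIR}"                     # /usr/lib/gfortran/4.3.3/finclude
```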
But the problem arises when you consider how packaging actually works: you only get one simultaneous installation of each package+version. That means you can’t easily install modules for three different compiler+version combinations at once. To get each module set, you have to rebuild the package with a new compiler and reinstall it, and reinstalling uninstalls the old modules built by the other compiler.
Three possible solutions occurred to me:
- Litter modules by making the package forget it installed them. In this scenario, you would rebuild a package multiple times with different compilers, and the modules would get left behind in a compiler-specific directory like /usr/lib/$COMPILER/$VERSION/finclude/.
- Create a mechanism for switching between builds of the same package version made by different compilers. This might work by creating binary packages for module-installing packages, then storing them in directories like /usr/portage/packages/$COMPILER/$VERSION/. A switching script could examine these directories and switch between them on demand by installing those packages using Gentoo’s PKGDIR setting. One route to this would be package-specific settings in /etc/portage/env/ that set FEATURES=buildpkg for the relevant packages, plus a late hook that copies the binpkgs into the compiler-specific package directory (see the sketch after this list).
- Build the same package version with many compilers at once, then bundle it in a single package and install modules for all of them. This would work similarly to Gentoo’s experimental multi-ABI support (available in some overlays), which rebuilds a package numerous times for 32-bit or 64-bit within a single ebuild. This approach has two major downsides: (1) It requires explicit support to be written into every ebuild using it, and (2) a change to just one version of one compiler requires rebuilding the package for every compiler+version.
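To make approach 2 a bit more concrete, here’s a rough sketch of the /etc/portage/env/ + hook idea from point 2 above. The package atom, compiler detection, and binpkg path are all illustrative assumptions, not a tested implementation:

```
# /etc/portage/env/sci-libs/my-fortran-lib (hypothetical package):
#   FEATURES="buildpkg"

# /etc/portage/bashrc: after each install, stash the binpkg in a
# compiler-specific package directory so it can be swapped in later.
post_pkg_postinst() {
    local compiler=gcc                        # however you detect this
    local version=$(gcc -dumpversion)
    local dest=/usr/portage/packages/${compiler}/${version}/${CATEGORY}
    if [[ -f ${PKGDIR}/${CATEGORY}/${PF}.tbz2 ]]; then
        mkdir -p "${dest}"
        cp "${PKGDIR}/${CATEGORY}/${PF}.tbz2" "${dest}/"
    fi
}
```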
I’m leaning toward approach 2, which looks relatively easy and quick to support, with the benefit of feeling much cleaner than approach 1 and easier to implement & faster in action than approach 3. With approach 2, only one module directory is required rather than compiler-specific directories. A reasonably compiler-neutral location for Fortran modules would be /usr/$LIBDIR/finclude/, so that’s what I propose to use.
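In an ebuild, installing into that location might look something like this (just a sketch; `get_libdir` comes from the standard multilib eclass):

```
# Hypothetical src_install() fragment for a module-installing package.
src_install() {
    emake DESTDIR="${D}" install
    # Put the generated .mod files in the compiler-neutral location.
    insinto /usr/$(get_libdir)/finclude
    doins "${S}"/*.mod
}
```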
If you have any other ideas or think a different option is better, please let me know in the comments.
Maybe you should speak to flameeyes
It sounds very much like the issues he had with ruby and they’ve kinda nailed most of that with ruby-ng
The common factor with ruby (assuming it’s the same as python) is multiple compilation of the same files, but a major difference is the time required to do that compilation. Building “real” source packages takes so much longer than compiling python files that it’s difficult to draw a good parallel for how to implement it.
I’m probably missing something, but don’t all the solutions require multiple compilation, so why is the time factor significant?
From what I understand, ruby-ng uses a different approach than python. It adds a RUBY_TARGETS variable to manage which implementations a package gets installed for. See http://blog.flameeyes.eu/2010/07/29/ruby-1-9-vs-python-3.
I can’t really tell you much more than that, but I think it would be worth having a look at.
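For reference, that knob ends up in make.conf; something like this (the implementation names are just examples):

```
# /etc/portage/make.conf
RUBY_TARGETS="ruby18 ruby19"
```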
They do. The main question between 2 & 3 is how often you have to rebuild. When a package is built for multiple compilers within a single ebuild, you need to rebuild for all of them every time any one of them changes. When you have multiple separate binpkgs, you never need to rebuild unnecessarily for the same compiler, only for new ones.
What about having one package per compiler version? So, fortran-foo-gcc4.4 and so on?
That works great for binary distributions, where you can break up a single source build into an arbitrary number of binary packages. But in a source-based distro, source packages map one-to-one to binary packages, so it doesn’t really work.
I’m not sure I understand your argument here. There are several packages elsewhere in the tree that share source files (KDE is one that pops to mind).
Are you saying that the non-build parts of the various packages would be duplicated work, or am I completely off-base?
Kevin, the problem is that you need to recompile the *same* source files once for every compiler minor version on the system. Does that clarify anything?
A little. If I understand, you’d have to either keep a large number of packages around (one per current package per compiler type per minor version), or come up with a new way to handle that automatically, which starts to lean towards the other choices anyway.
That problem gets stickier the more I think about it.
The number of scientific packages in fortran is constantly going down, but the clocks in science are slow and we will not get rid of fortran in the next 10 years (or ever) 😉
I like approach 2 because the fewer rebuilds are needed, the better the approach will be accepted in the community using Fortran codes. Scientists tend to work with the same version of a package for a long time to produce reliable results, and a rebuild is always a danger.
Well, as long as there are exactly 2.5 programming languages that are able to produce high performance code with the right compiler (Fortran, C, and C++ if you leave out all the goodies), scientific software will not move away from Fortran, ever.
This also means that the pain of keeping, say, MPI modules for gfortran, ifort and whatever other compiler you want to use will also stay.
Haven’t used portage in a while, so I can’t remember how it works exactly, but…
Couldn’t you have a separate source dir for each compiler/version, link the actual source in each directory to one (master) directory, but have a different description file for each compiler/version?
Then you aren’t *forced* to recompile every time, only when the description of one of the “meta”-sources changes… unless I’m missing something important about portage that prevents this.
I am not sure how fortran handles this, but in Ada (my previous comment, just below) it is done pretty much like this. Recompilation is only done once – when the proper lib gets installed and only for “primary” compilers. Interfaces and compiled/linked libs are “ready for consumption”. Of course, if users prefer they can force local recompilations as well (from within their project)..
I have just sent my comment to the sci list (there was a related discussion initiated there). Posting the relevant part(s) here, as this is even more on topic.
—————–
Ada has been implementing such a system for many years now (perhaps the first such multi-ABI implementation in the tree), so there are some technical issues that can be referenced and discussed: http://www.gentoo.org/proj/en/prog_lang/ada/dev_reference.xml (unfortunately incomplete, but I should have put the principal points there before “dropping it”).
The implementation is somewhat along the lines of point 2. In fact, 2 and 3 follow the same principle; the difference is only where the multi-build control code resides – in the ebuild or an external script. The Ada implementation places it in the “standard” locations: an eclass for building compilers, an eclass for taking care of libs, and an eselect module “to rule them all”. Thus there is no duplication of code, but it is still “where expected”. There is even an option of selecting “primary” profiles – the ones for which libs will be built – while keeping “experimental” compilers around just for play. Therefore I would suggest that interested people look at the code, contact me, etc. I think we can all benefit from discussing this topic.
Overall, though, I would very much like to push for standard “in portage” treatment of multiple ABIs – at least for the PM to provide some of the necessary “core” functionality. However, this is still well within the design phase, as I understand it.
It looks like the basic approach is to SLOT everything using dynamic slots that break the cache. Is that correct?
Well, I don’t know about SLOTs – every multi-ABI approach can be considered SLOTting-like at some level, I guess. Besides, I haven’t thought about it in portage terms – there was simply nothing that could be used, and any kind of support was not even considered back then.
Basically, every lib gets built for the defined set of compilers. Initially it was built for all of the installed ones, but later I rethought this and introduced “primary” compilers – a simple list. All libs get built for the primaries only, but for all of them, halting with an error if anything is missing. This forces some consistency and prevents hard-to-track issues.
Libs, for every ABI, get installed in an ABI-specific location. Since both compilers are gcc-based, I simply used the gcc layout. The active one can be selected with an eselect module (followed by “. /etc/profile”, as per gcc-config). This stuff has worked fine for many years. At least the bugs that I get never seem to involve the “core” stuff, mostly technicalities.
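For the curious, the selection step looks roughly like this (the module and profile names here are examples; see the Ada project docs linked above):

```
# List available compiler profiles, pick one, reload the environment.
eselect gnat list
eselect gnat set gnat-gcc-4.3      # example profile name
. /etc/profile
```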
Oh, I see where you got that idea of SLOTs from. Ok, see, with Ada the situation may be a bit more elaborate. There are two similar but separate compilers, by ACT and by the FSF. On top of that, each one is SLOTted, as the backend gets upgraded from time to time (each major gcc version), and we want to provide some tracking for older packages – Ada is no less conservative than some of the sci stuff :). The multi-ABI approach is “on top of” (or rather in addition to) that. Cache – what cache are you referring to? If you mean the portage cache, then it should be fine. Compilers themselves are separate, and libs have their own versions, which are “perpendicular” to compiler profiles. It’s a set product. Portage knows (as it is) nothing about compiler profiles, so nothing is there to be broken. Some mess could have arisen in the early years, when libs were installed for all present compilers (even then I did not get any bugs due to this – I guess Ada devs are accurate enough by training :)). After I implemented “primary compiler sets”, that possible route to a mess-up was closed too.
Kind of an interesting idea. Do people ever seem to make primary compilers anything besides “my current compiler only” or “everything”?
Well, I did, so it is tested :). I cannot say clearly for users, as they only submit bugs when something does not work as they want :), otherwise they are rather silent. At least I do not remember bugs filed against issues related to this part.
BTW (just checking), did you (or anybody) get my reply to the sci list? Gentoo’s mailing system seems to skip the author on list replies – I never see my own messages appear on the list.
Nope. http://archives.gentoo.org/gentoo-science/
Solution 4: Cut through this mess of WTFery, and define an ABI for fuck’s sake!