New GCC hotness

I’ve been talking with Lance lately about setting up a good development machine, and GCC 4.3 (still unreleased) came up, so I checked out its changes. Here are a few I found particularly cool:

  • A new forward propagation pass on RTL was added. The new pass replaces several slower transformations, resulting in compile-time improvements as well as better code generation in some cases.
  • A new command-line switch -frecord-gcc-switches has been added to GCC, although it is only enabled for some targets. The switch causes the command line that was used to invoke the compiler to be recorded into the object file that is being created. The exact format of this recording is target and binary file format dependent, but it usually takes the form of a note section containing ASCII text. The switch is related to the -fverbose-asm switch, but that one only records the information in the assembler output file as comments, so the information never reaches the object file.
  • A new internal representation for GIMPLE statements has been contributed, resulting in compile-time memory savings.
  • A new command-line option -fdirectives-only has been added. It enables a special preprocessing mode which improves the performance of applications like distcc and ccache.
  • Experimental support for the upcoming ISO C++ standard, C++0x, has been added.
  • Fortran: The -fexternal-blas option has been added, which generates calls to BLAS routines for intrinsic matrix operations such as matmul rather than using the built-in algorithms.
  • Fortran: Support has been added for producing a backtrace (compiler flag -fbacktrace or environment variable GFORTRAN_ERROR_BACKTRACE; on glibc systems only) or a core dump (-fdump-core, GFORTRAN_ERROR_DUMPCORE) when a run-time error occurs.
  • Java: libgcj now supports all 1.5 language features which require runtime support: foreach, enum, annotations, generics, and auto-boxing.
  • x86/amd64: Tuning for Intel Core 2 processors is available via -mtune=core2 and -march=core2.
  • x86/amd64: Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. This results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints.
  • x86/amd64: Support for SSSE3 built-in functions and code generation is available via -mssse3.
  • x86/amd64: Both SSE4.1 and SSE4.2 support can be enabled via -msse4.
  • x86/amd64: GCC can now utilize the ACML library for vectorizing calls to a set of C99 functions on x86_64 if -mveclibabi=acml is specified and you link to an ACML ABI compatible library.
  • MIPS: libffi and libjava now support all three GNU/Linux ABIs: o32, n32 and n64. Every GNU/Linux configuration now builds these libraries by default.
  • The configure options --with-pkgversion and --with-bugurl have been added. These allow distributors of GCC to include a distributor-specific string in manuals and --version output and to specify the URL for reporting bugs in their versions of GCC.
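
To make the memcpy/memset item a bit more concrete, here is a minimal sketch of the two situations it distinguishes: a block whose size is known at compile time and one whose size is only known at run time. The struct packet type and both function names are made up for illustration; nothing here comes from the GCC release notes, and what the compiler actually emits will depend on the optimization level and the CPU you tune for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct packet {
    unsigned char header[16];
    unsigned char payload[240];
};

/* Block size is a compile-time constant (sizeof *p), so the compiler
   can pick a strategy for it statically: a loop, an unrolled loop,
   a rep-prefixed instruction, or a library call. */
void packet_clear(struct packet *p)
{
    memset(p, 0, sizeof *p);
}

/* Block size is only known at run time. This is the kind of call the
   -minline-stringops-dynamically option is described as affecting. */
size_t packet_fill(struct packet *p, const void *src, size_t n)
{
    if (n > sizeof p->payload)
        n = sizeof p->payload;  /* clamp so the copy never overruns */
    memcpy(p->payload, src, n);
    return n;                   /* number of bytes actually copied */
}
```

Assuming the file is saved as packet.c, something like gcc -O2 -mtune=core2 -minline-stringops-dynamically -S packet.c should let you compare the generated assembly with and without the option.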

I’m already using GCC 4.2, but I hadn’t really looked into its changes either until now:

  • OpenMP is now supported for the C, C++ and Fortran compilers.
  • A new command-line option -Waddress has been added to warn about suspicious uses of memory addresses, such as using the address of a function in a conditional expression or comparing against the memory address of a string literal. This warning is enabled by -Wall.
  • C++: -Wextra will produce warnings for if statements with a semicolon as the only body.
  • C++/libstdc++: Enabled library-wide visibility control, allowing -fvisibility to be used.
  • x86/amd64: -mtune=generic can now be used to generate code running well on common x86 chips. This includes AMD Athlon, AMD Opteron, Intel Pentium-M, Intel Pentium 4 and Intel Core 2.
  • x86/amd64: -mtune=native and -march=native will produce code optimized for the host architecture as detected using the cpuid instruction.
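
For the -Waddress item, here is a small sketch of the two patterns it describes, written in their corrected form with the suspicious version kept in a comment. The function names are hypothetical and exist only for this example; the exact warning text varies between GCC versions, so none is quoted here.

```c
#include <string.h>

/* Hypothetical helper, used only for illustration. */
int is_ready(void)
{
    return 1;
}

int check_ready(void)
{
    /* Suspicious form: `if (is_ready)` tests the address of the
       function, which is never null, so the branch is always taken.
       Calling the function is what was meant. */
    if (is_ready())
        return 1;
    return 0;
}

int is_yes(const char *answer)
{
    /* Suspicious form: `answer == "yes"` compares two pointers, not
       the characters they point to. strcmp compares the contents. */
    return strcmp(answer, "yes") == 0;
}
```

Swapping either commented-out form back in and compiling with gcc -Wall should be enough to see whether your GCC version flags it.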