Detect project primary code languages

GitHub, GitLab and similar repository services deal with hundreds of coding languages. Accurate detection of coding languages in a project is useful for discovery of repositories that are of interest to users and for security scanning, among other purposes. Scientific computing developers are generally interested in a narrow subset of programming languages. HPC developers are generally interested in an even narrower subset of programming languages. We recognize the “long tail” of advanced research using specialized languages or even their own language. However, most contemporary HPC and scientific computing work revolves around a handful of programming languages.

Prior art

To rapidly detect coding languages at each “git push”, GitHub developed the open-source Ruby-based Linguist. GitLab also uses Linguist. We developed a Python interface to Linguist that requires the end user to install Ruby and Linguist. However, Linguist is not readily usable from native Windows (including MSYS2) because some of Linguist’s dependencies have Unix-specific code, despite being written in Ruby. The same kind of issue can arise in Python projects whose developers aren’t using multi-OS CI. GitHub recognized the accuracy shortcomings of Linguist (cited as 84% on average) and developed the closed-source OctoLingua, cited as 99% accurate. OctoLingua deals with the 50 most popular code languages on GitHub. Little has been heard about OctoLingua since July 2019.

New tool

We provide an initial implementation of a tool, code-sleuth, that actively introspects projects using a variety of heuristics and direct action. A key design factor of code-sleuth is to use specific techniques, such as invoking CMake or Meson, to introspect the languages the project developers intended. The goal is not to detect every language in a project, but instead to detect the primary languages of a project. We also want to resolve the language standards required, for example:

  • Python 2.6..2.7
  • Python > 3.6
  • C++14
  • C11
  • Fortran 2008

This detection will allow a user to know, in an automated fashion, what compiler or environment is needed.
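As a rough illustration of resolving languages and standards, the sketch below reads the text of a CMakeLists.txt statically instead of invoking CMake, so it is a simplification and not how code-sleuth itself works. The function names cmake_languages and cmake_standards are hypothetical, and the regular expressions only handle the simple project(... LANGUAGES ...) and set(CMAKE_<LANG>_STANDARD N) forms.

```python
import re


def cmake_languages(cmake_text):
    """Return the languages listed after LANGUAGES in the first project() call."""
    m = re.search(r"project\s*\(([^)]*)\)", cmake_text, re.IGNORECASE)
    if not m:
        return []
    tokens = m.group(1).split()
    upper = [t.upper() for t in tokens]
    if "LANGUAGES" not in upper:
        return []
    # other project() keywords end the language list
    keywords = {"VERSION", "DESCRIPTION", "HOMEPAGE_URL"}
    langs = []
    for tok in tokens[upper.index("LANGUAGES") + 1:]:
        if tok.upper() in keywords:
            break
        langs.append(tok)
    return langs


def cmake_standards(cmake_text):
    """Map language -> standard from set(CMAKE_<LANG>_STANDARD <number>) lines."""
    pat = re.compile(r"set\s*\(\s*CMAKE_(\w+?)_STANDARD\s+(\d+)", re.IGNORECASE)
    return {lang.upper(): std for lang, std in pat.findall(cmake_text)}


if __name__ == "__main__":
    example = "project(Foo LANGUAGES C CXX)\nset(CMAKE_CXX_STANDARD 14)\n"
    print(cmake_languages(example), cmake_standards(example))
```

A static read like this misses languages enabled conditionally or through variables, which is one reason to ask the build system itself.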

Boost install on Windows

The Boost library brings useful features to C++ that are not yet in STL. For example, the C++17 filesystem library was in Boost for several years. Until the most recent compiler releases, C++17 filesystem required Boost.

Boost install requires several hundred megabytes in general. While macOS and Linux users can simply install Boost via commands like brew install boost, on Windows installing Boost from the Boost binary distribution takes a lengthy build procedure.

Most developers using GCC or Clang on Windows can instead simply install Boost using MSYS2:

pacman -S mingw-w64-x86_64-boost

Install MSYS2 on Windows

MinGW has brought GNU compiler tools to Windows since the late 1990s. MSYS2 provides numerous developer tools, including MinGW, on Windows using the pacman package manager.

Install

  1. Download msys2-x86_64-*.exe and run the installer, installing to C:/msys64. MSYS2 needs to be on a non-FAT / non-ExFAT drive capable of symbolic links, such as C:.

  2. Start the MSYS2 console from the Windows Start menu. In the MSYS2 terminal, update MSYS2 to get the latest packages. Run this command multiple times until it says “nothing to do”.

    pacman -Syuu
    
  3. Add to your Windows user PATH:

    c:\msys64\mingw64\bin
    

PowerShell (optional)

To use MSYS2 / MinGW64 programs from PowerShell without disrupting other compiler use, we create ~/gcc.ps1 containing:

$Env:CC="gcc"
$Env:FC="gfortran"
$Env:CXX="g++"
$Env:path += ";c:/msys64/mingw64/bin/"

When you want to use MSYS2 from a PowerShell prompt, run ~/gcc.ps1.

Usage

From MSYS2 command prompt, tasks include:

Search for packages:

pacman -Ss gcc

Packages

MSYS2 packages of interest for scientific computing include:

Compilers

Libraries

build systems

tools

If you are unsure why another version of a program is being used, check the executable location like:

where gcc

You may need to reorder directories in your Windows Path variable, for example GNU Octave may need to be moved lower in the Path list or removed from Path.
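The same check can be scripted. This is a minimal sketch, assuming only the Python standard library; the function name find_all is hypothetical. It lists every match for a program name in PATH order, so the first entry is the one the shell would normally pick.

```python
import os
from pathlib import Path


def find_all(name, path=None, exts=None):
    """Return every executable named `name` found on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    if exts is None:
        exts = [""]
        if os.name == "nt":
            # Windows also tries the suffixes listed in PATHEXT (.EXE, .BAT, ...)
            exts += [e for e in os.environ.get("PATHEXT", "").split(os.pathsep) if e]
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in exts:
            candidate = Path(directory) / (name + ext)
            if candidate.is_file() and os.access(candidate, os.X_OK):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_all("gcc"):
        print(hit)
```

If an unexpected directory is listed first, that is the directory to move lower in the Path list.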

Notes

Comparison

The advantages of MSYS2 over complementary approaches include:

  • Cygwin:
    • MSYS2 works from the Windows Command Prompt or PowerShell
    • MSYS2 provides native Windows binaries
    • Cygwin does not have a command-line package installer
  • Windows Subsystem for Linux: same as Cygwin
  • Chocolatey provides many general Windows programs of interest to end users. MSYS2 is available via Chocolatey:
    choco install msys2
    
  • Scoop is similar to Chocolatey, but more developer oriented. MSYS2 is available via Scoop:
    scoop install msys2
    
  • AppGet is similar to Chocolatey, with a smaller set of packages
  • WinGet is from Microsoft; it is also like Chocolatey, with a design inspired by AppGet
  • standalone MinGW is generally not up to date: it has an old GCC version and no way to install packages

Software executable dry run

Developers covering multiple platforms and architectures can benefit from including a self-contained dry run. We define a software dry run as a fast self-contained run of the executable, exercising most or all of the program using actual input files. The concept of a dry run is used by popular programs that rely on several components and connections, including rsync.

Benefits

A dry run self-check can be used from Python or any other script calling the executable to ensure the binary is compatible with the current platform environment. The dry run helps mitigate confusing error messages by checking that the executable runs on the platform before making a large program run.

The dry run can catch platform-specific issues like:

  • incompatible executable format (running an executable built for another platform)
  • executable built for incompatible arch (using CPU feature not available on this platform)
  • shared library (DLL) path / arch issues
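A calling script can wrap the dry run in a small helper. This is a minimal sketch, assuming the executable accepts a -dryrun flag and exits with code 0 on success; the function name dry_run_ok and the 30 second timeout are hypothetical choices.

```python
import subprocess


def dry_run_ok(exe, args=("-dryrun",), timeout=30):
    """Run a fast self-check of `exe` and return (ok, message)."""
    try:
        ret = subprocess.run(
            [str(exe), *args], capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        return False, f"{exe} not found"
    except OSError as err:
        # for example an executable built for another platform
        return False, f"{exe} could not be started: {err}"
    except subprocess.TimeoutExpired:
        return False, f"{exe} dry run exceeded {timeout} seconds"
    if ret.returncode != 0:
        # crashes and missing shared libraries typically show up here
        return False, f"{exe} dry run failed with code {ret.returncode}: {ret.stderr.strip()}"
    return True, ret.stdout.strip()
```

Calling dry_run_ok before a large run lets the script report a short, specific message instead of failing deep inside the main computation.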

Implementation

The dry run does not output any files besides temporary files. For example, in a simulation, the dry run might run one complete time step. To test file I/O, optionally write temporary file(s) using the same file format. An advanced dry run might read in those temporary files and do a basic sanity check.

By our definition, a dry run is distinct from an integration test. A dry run just checks that the platform environment is OK to run this binary, that is, that the code executes without crashing. It does not emphasize deep checks of program output as an integration test would. Consider making the dry run return code 0 on success for compatibility with CMake and other high-level build systems.
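To make the idea concrete, here is a minimal sketch of a dry run mode for a toy simulation. Everything in it is hypothetical: the step function, the JSON file format, the -dryrun flag name and the “OK: myprogram” text. It runs one time step, writes and re-reads a temporary file, and returns 0 on success.

```python
import argparse
import json
import tempfile
from pathlib import Path


def step(state):
    """Hypothetical single time step of a simulation."""
    return {"t": state["t"] + 1, "x": state["x"] * 0.5}


def dry_run():
    """Run one step, write it to a temporary file, read it back, sanity check."""
    state = step({"t": 0, "x": 1.0})
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "state.json"
        out.write_text(json.dumps(state))
        back = json.loads(out.read_text())
    # the temporary directory is removed here, so no files are left behind
    return back == state


def main(argv=None):
    p = argparse.ArgumentParser()
    p.add_argument("-dryrun", action="store_true", help="fast self-check, then exit")
    args = p.parse_args(argv)
    if args.dryrun:
        ok = dry_run()
        print("OK: myprogram" if ok else "FAIL: myprogram")
        return 0 if ok else 1
    # the full program run would go here
    return 0
```

The program's entry point would call main() and pass its return value to the operating system as the exit code.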

CMake dry run test

Assuming you have configured the project executable code as above, implement a check of the dry run with CMake.

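# minimum version is an assumption; any reasonably recent CMake should work
cmake_minimum_required(VERSION 3.10)
project(Foo LANGUAGES C)
enable_testing()

add_executable(foo foo.c)

# run the built executable in dry run mode as a CTest test
add_test(NAME check_foo COMMAND $<TARGET_FILE:foo> -dryrun <other command line flags>)
# pass only if the dry run prints its special text
set_tests_properties(check_foo PROPERTIES PASS_REGULAR_EXPRESSION "OK: myprogram")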

Here we make optional use of PASS_REGULAR_EXPRESSION to verify the special dry run text you put in the executable code. The dry run test should have return code zero.

Python f2py install problem workaround

f2py is a somewhat fragile submodule of NumPy that we do not generally recommend. f2py works with legacy Fortran 77 code, but generally does not work with modern Fortran code. Projects should carefully consider alternative approaches to f2py, such as a command-line + file interface with Python.

If experiencing compiler errors when using f2py, a last-resort workaround is to build on another computer where the install works and copy the results. This can work on Windows or Linux, provided the donor computer has the same operating system and compiler ABI.

Donor computer

On the “donor” working computer:

python setup.py bdist_wheel

This creates mypkg/dist/mypkg-x.y.z-cp3x-cp3xm-win_amd64.whl (similar for other OS). This can only be used on Python 3.x (as per the filename) and the same CPU architecture.

python setup.py develop

This creates mypkg/src/mypkg/fortranmodule.cp3x-win_amd64.pyd

Recipient computer

Both of those files are copied from the “donor” computer to the “recipient” computer. The *.pyd file is placed or soft-linked to the Python current working directory. The *.whl file is one-time installed by:

python -m pip install mypkg-x.y.z-cp3x-cp3xm-win_amd64.whl
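Before copying files between computers, it can help to check that the wheel's filename tags match the recipient's Python. This is a minimal sketch that splits the standard wheel filename fields; the function names are hypothetical, it only compares the CPython version tag, and pip remains the authority on whether a wheel is actually installable.

```python
import sys


def wheel_tags(filename):
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_this_python(filename):
    """True if the wheel's Python tag names the running CPython major.minor."""
    this = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return this in wheel_tags(filename)["python"].split(".")


if __name__ == "__main__":
    print(wheel_tags("mypkg-1.2.3-cp38-cp38m-win_amd64.whl"))
```

Run on the recipient computer, matches_this_python gives an early warning when the donor built against a different Python version.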

Access Windows Subsystem for Linux files from Windows

It is possible to safely access the WSL filesystem from Windows. For WSL2, the WSL distro need not be running first to access the files within. WSL2 will automatically start the requested filesystem Linux image and the 9P file server in less than a second upon attempting to access the WSL2 image filesystem.

The WSL distro files are available from Windows under:

\\wsl$\Ubuntu\

To keep things simpler, we still keep files that need to be accessed from WSL and Windows under the usual Windows file system, making softlinks in WSL as useful.

For example, code in Windows under c:/users/username/code is accessed from WSL after a one-time:

ln -s /mnt/c/users/username/code ~
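The two path mappings used above can be written down as small helpers. This is a minimal sketch using only pathlib; the function names are hypothetical and the default distro name “Ubuntu” is taken from the example path above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(win_path):
    """Map a Windows drive path such as c:/users/me to /mnt/c/users/me."""
    p = PureWindowsPath(win_path)
    if len(p.drive) != 2 or p.drive[1] != ":":
        raise ValueError(f"expected a path with a drive letter: {win_path}")
    return PurePosixPath("/mnt", p.drive[0].lower(), *p.parts[1:])


def wsl_to_windows(posix_path, distro="Ubuntu"):
    """Map an absolute WSL path such as /home/me to its wsl$ network path."""
    p = PurePosixPath(posix_path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path: {posix_path}")
    # parts[0] is the leading "/", which the network path does not need
    return "\\\\wsl$\\" + "\\".join((distro, *p.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl("c:/users/username/code"))
    print(wsl_to_windows("/home/username"))
```

Inside WSL, the wslpath utility performs this kind of conversion directly; the sketch is only meant to show the mapping.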

Notes

Raw WSL files

DO NOT EDIT THESE RAW FILES!

Windows Subsystem for Linux places files for each WSL image uniquely named like:

%LOCALAPPDATA%\Packages\CanonicalGroupLimited.UbuntuonWindows*\LocalState\rootfs\ext4.vhdx

Mount external drives in WSL

N1MM Logger on Linux

N1MM Logger is popular amateur radio contest logging software designed for Windows. It may also be usable on Linux using WINE.

Caveats

This procedure requires some expertise with using WINE and may not work easily. It’s much easier to just run N1MM Logger on Windows, perhaps in a virtual machine.

Given the rapid development of N1MM, this unsupported procedure may break at any time. This N1MM Logger on Linux procedure was tested using:

  • Ubuntu 18.04 / 20.04
  • WINE 4.0 (WINE 3.x is fine too)
  • winetricks 20181203
  • winecfg set to Windows 7 32-bit

Install

Set up a 32-bit Windows 7 WINE environment with .NET 4.0, then install the N1MM logger.

  1. Set WINE to Windows 7 under

    WINEPREFIX=~/.wine_n1mm WINEARCH=win32 winecfg
    

    This implicitly creates a new 32-bit Wineprefix.

  2. Install .NET 4.0 in WINE 32-bit. It takes about 3-5 minutes, and at a couple points in the install, the progress bar seems to freeze, but the console text keeps scrolling. Note that .NET newer than 4.0 might not work for N1MM (thanks Harry Bloomberg for noting this).

    WINEPREFIX=~/.wine_n1mm winetricks dotnet40
    
  3. Download and run N1MM Full Install

    WINEPREFIX=~/.wine_n1mm wine N1MM*FullInstaller*.exe
    
  4. Download and run N1MM latest update

    WINEPREFIX=~/.wine_n1mm wine N1MM*Update*.exe
    
  5. Start and configure N1MM Logger as per the directions for your particular contest. The binary is at:

    WINEPREFIX=~/.wine_n1mm wine "$HOME/.wine_n1mm/drive_c/Program Files/N1MM Logger+/N1MMLogger.net.exe"
    

    Create a script ~/n1mm.sh containing:

    #!/bin/bash
    
    WINEPREFIX=~/.wine_n1mm wine "$HOME/.wine_n1mm/drive_c/Program Files/N1MM Logger+/N1MMLogger.net.exe"
    

    then

    chmod +x ~/n1mm.sh
    

Run N1MM Logger by simply typing in Terminal:

~/n1mm.sh

Radio control

N1MM can OPTIONALLY interface with your radio to pull out the frequency/mode for the log. You’ll need to map the WINE serial port and then select that COM port in N1MM Logger.

  1. Look for the USB ↔ serial adapter before and after plugging it in with:

    dmesg -w
    
  2. Start the WINE registry editor:

    WINEPREFIX=~/.wine_n1mm wine regedit
    
  3. Configure the port. Suppose your device is seen at /dev/ttyUSB0, and you want it to appear to WINE on COM1. Edit HKEY_LOCAL_MACHINE/Software/Wine/Ports to have a new string entry named COM1 with value /dev/ttyUSB0.

  4. Restart WINE:

    wineserver -k
    

    then reopen N1MM Logger with the script you created in the installation:

    ~/n1mm.sh
    
  5. Verify this setting (but do not edit) by:

    ls -l ~/.wine_n1mm/dosdevices/com1
    

    The listing should show: com1 -> /dev/ttyUSB0

Note: Harry Bloomberg notes that you may be able to specify the specific long device name under /dev/serial instead of /dev/ttyUSB0. This may help avoid the USB device changing port numbers when plugging / unplugging the USB device.
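To find those long device names, one option is to list the symbolic links that udev-based Linux systems commonly create under /dev/serial/by-id. This is a minimal sketch; that directory is an assumption about your system, and the function name stable_serial_names is hypothetical.

```python
import os
from pathlib import Path


def stable_serial_names(by_id_dir="/dev/serial/by-id"):
    """Map each persistent device name to the device node it points at now."""
    root = Path(by_id_dir)
    if not root.is_dir():
        # no adapter plugged in, or the system does not provide this directory
        return {}
    return {str(link): os.path.realpath(link) for link in sorted(root.iterdir())}


if __name__ == "__main__":
    for name, node in stable_serial_names().items():
        print(name, "->", node)
```

The long name printed on the left could then be used as the COM1 value in place of /dev/ttyUSB0.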

Notes

Alternatives

Currently, ReactOS 0.4.10 is not able to install N1MM logger. The N1MM Logger install hangs at:

Downloading RGB9RAST_x86.msi

Advanced use

Phil Erickson of MIT Haystack noted that for certain SDRs that use hamlib, you may be able to rewire the output of N1MM into hamlib via socat.

Alternative Matlab editor with lint

A key strength of the Visual Studio Code editor is the high-quality plugins available. Xavier Hahn’s Matlab plugin uses Matlab’s mlint command-line utility to lint code in the VS Code editor. The lint is shown as squiggle underlines with hover messages on the detected code issues.

Another key feature provided for Matlab .m code is Go to Definition that allows clicking on a function name and automatically opening to the location where the function is defined, even in another file.

Setup

A bit of manual setup is needed: in VS Code preferences, set “matlab.mlintpath” to the full path of the mlint executable. This path would be like “c:/Program Files/MATLAB/R2020a/bin/win64/mlint.exe”.

Recursively lint directory tree of Matlab code

Matlab (dark) theme changer

Currently, Matlab does not have a factory-built method to programmatically change the color theme of the Matlab IDE (interactive code-editing GUI). Using undocumented functionality (a common technique to do advanced things in Matlab), it is possible to change the color theme of the main IDE. Not all UI colors are changed; in particular, the buttons, borders and line numbers remain with the factory colors. Many data manipulation and analysis UI elements also remain at factory colors. This technique allows users to mitigate the need for an alternative Matlab code editor.

We join the voices of those calling on the Mathworks to make Matlab color theme changing built-in from the factory, particularly to address accessibility concerns.

Matlab schemer_import

The Matlab schemer_import utility was a 2018 File Exchange Pick of the Week with favorable comments from Yair Altman among others. If you are currently using Matlab defaults for IDE color, you can use the command schemer_import right away from the downloaded code. If you wish to preserve your existing custom color theme, read the documentation for schemer_export and export your theme before importing a new one.

Check website for broken link with Python

Our small Python-based Markdown link-checking script is effective for large (thousands of pages, tens of thousands of links) Markdown-based websites. It is much faster than the legacy HTML LinkChecker program of the next section. Alternatives exist for Go and JavaScript.
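The core idea of a Markdown link check can be sketched in a few lines. This is not the script mentioned above, only a minimal illustration under simplifying assumptions: it finds inline [text](target) links with a regular expression and checks only relative file links, skipping remote URLs and in-page anchors. The function names are hypothetical.

```python
import re
from pathlib import Path

# inline Markdown links and images: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
# anything starting with a URL scheme such as https: or mailto:
SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def markdown_links(text):
    """Return the targets of all inline links found in Markdown text."""
    return LINK.findall(text)


def broken_local_links(md_file):
    """Return relative link targets in md_file that do not exist on disk."""
    md_file = Path(md_file)
    broken = []
    for url in markdown_links(md_file.read_text(encoding="utf-8")):
        if SCHEME.match(url) or url.startswith("#"):
            continue  # remote or in-page link, not checked in this sketch
        target = url.split("#", 1)[0]
        if not (md_file.parent / target).exists():
            broken.append(url)
    return broken


if __name__ == "__main__":
    for md in sorted(Path(".").rglob("*.md")):
        for url in broken_local_links(md):
            print(f"{md}: {url}")
```

Because no network requests are made, a scan like this finishes quickly even on a large site; checking remote URLs would be a separate, slower step.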

If you’re using Netlify, consider a link-checking plugin that checks tens of thousands of links for each “git push” of the website Markdown in about two minutes.

HTML LinkChecker

If your website is not Markdown-based, there is a large HTML LinkChecker Python program that was an effective offline or online method to recursively check websites from the command line. However, it is not frequently maintained, and has a growing number of false positives and false negatives.

Install

The PyPI releases are out of date, so instead of the usual

pip install linkchecker

we recommend using the development LinkChecker code:

git clone --depth 1 https://github.com/linkchecker/linkchecker/

cd linkchecker

python -m pip install -e .

Internal/external links are tested recursively. This example is for a Jekyll website running on my laptop:

linkchecker --check-extern localhost:4000

The checking process takes several minutes, perhaps even 20-30 minutes, depending on your website size (number of pages & links). Pipe to a file as below if you want to save the result (recommended).

Examples

List options for recursion depth, output format and much more:

linkchecker -h

Save the output to a text file:

linkchecker --check-extern http://localhost:4000 &> check.log

Monitor progress with:

tail -f check.log