PunchcardOS

December 21, 2024

PunchcardOS is a hyper-minimal Linux distro that contains just enough pre-compiled code to compile and run C programs, for the purpose of bootstrapping a trustworthy toolchain. Unless you compile from source, you cannot really know that the code you are running is free of backdoors, unwanted telemetry, malware, and so on. Even if you inspect the source, the compiler or another tool you use could be inserting malware into its outputs (a supply chain attack).

This repository is meant to provide a minimal set of tools for "bootstrapping" other bigger, more complex tools like the GNU C compiler without using a pre-compiled GNU C compiler and glibc (among many other things) so that you can have a trustworthy system to build up from.

Namesake

"PunchcardOS" harkens back to the bootstrapping of computing by the use of punchcard programs.

Details

PunchcardOS will start with a pared-down Linux kernel, built to run only on the specific hardware it targets, with no networking capability (by default, at least), and with other features added or removed to minimize code size while still retaining some security features. Ideally, this kernel will use no hardware-specific cryptography features. Can you really trust Intel to generate your keys without backdoors?

The binaries included in ./programs will be written so as to compile using nolibc, meaning that they can run as static binaries on a system that has no libc implementation. Every program will be small, understandable, and single-file, so that very little tooling is needed to build it.

For this system to be usable, you will at least need a pre-compiled shell and a C compiler. These will be pre-compiled and included in the distro so you can bootstrap a system. (Maybe we could even do away with the shell in the future by making some small C compiler compile the shell, then run it.)

You won't even have an ls command, a text editor, cat, less, or anything like that until you compile it in this constrained environment.

The minimal C compiler will be based on ChibiCC, and the shell will be based on lsh. All of the above will be modified to compile with nolibc, possibly with reduced functionality. The assembler will be based on minias. Maybe this will work as a small linker, or perhaps this? ELF utils is just absolutely huge, so it would be preferable to avoid it.

Source for the programs in the ./programs folder will be included, which you will need to compile in this constrained environment to use as a part of the bootstrapping.

These programs include:

  • A text editor based on kilo
  • An HTTP server
  • Tools for encrypting and hashing
  • Tools for making an HTTP POST request
  • Common unix utilities, such as ls and rmdir
  • Tools for inspecting assembly and binaries

(Obviously some of these won't work if you have no networking.)

The distro will also include clones of the stage0 source and GNU Mes, among other repositories that contain common build tooling useful for bootstrapping other projects. This project isn't meant to compete with these, but rather, complement them: to provide a trustworthy, minimal distro on which you can use these tools for more bootstrapping.

The Goal

The goal of this project is to create a distro that is small enough that all of its code can reasonably be reviewed and understood, but which can also bootstrap itself using only a pre-compiled Linux kernel, C toolchain, and shell: the bare minimum to bootstrap.

Once PunchcardOS itself has been bootstrapped using its own small, trustworthy tooling, you could assume that its pre-compiled kernel, C toolchain, and shell are also trustworthy, and therefore you have a fully trustworthy system. On the second pass, the kernel will be compiled with networking support, so that source archives can be fetched from remote sources to bootstrap any software.

In short:

  1. Build PunchcardOS (using untrustworthy tooling)
  2. Use PunchcardOS to bootstrap PunchcardOS (thereby producing a trustworthy PunchcardOS)
  3. Use PunchcardOS (now trustworthy) to bootstrap any software
  4. Publish the hashes of your build outputs so others can verify, or publish the build outputs themselves.

Executable Standards

All executables in this project will be able to be built against nolibc, consist of only a single file, and will use hard-coded timestamps (except the date command).

Hard-Coded Timestamps

We want the executables produced using the tools in this repository to have consistent outputs, regardless of the machine on which, or the time at which, they were created. A given set of inputs and tooling should always produce a byte-for-byte identical output, so that outputs (or hashes of them) can be compared between bootstrapping attempts on different machines.
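A byte-for-byte check between two bootstrapping attempts can be very small. The sketch below is only an illustration: the name files_identical is hypothetical, and it uses hosted stdio for brevity, so a nolibc variant would likely need raw open() and read() calls instead.

```c
#include <stdio.h>

/*
 * Compare two files byte for byte.
 * Returns 1 if they are identical, 0 if they differ,
 * and -1 if either file could not be opened or read.
 */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {
                /* Covers differing bytes and differing lengths. */
                result = 0;
                break;
            }
        } while (ca != EOF);
        if (ferror(a) || ferror(b))
            result = -1;
    }

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

Calling files_identical() on the same build output from two machines should return 1 if the build is reproducible.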

As such, source files in this repository, if taken from other projects, will be modified so that they use a fixed timestamp where they would normally use the real timestamp.

Areas where code was modified to hard-code a timestamp will be annotated with a comment that contains HARDCODED_TIMESTAMP, in that exact casing, so that you can search for these and change them, if desired.

The timestamp to which these will be set is to be decided upon, but it will be documented here.
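As a rough sketch of what such a modification could look like: the names PUNCHCARD_FIXED_TIME and build_time below are hypothetical, and the value 0 (the Unix epoch) is only a placeholder, since the project's actual timestamp has not been decided.

```c
#ifndef NOLIBC
#include <time.h>
#endif

/* Placeholder value only: the project's real fixed timestamp is undecided. */
#define PUNCHCARD_FIXED_TIME ((time_t)0)

/* Returns the time a tool should record in its output. */
time_t build_time(void)
{
    /* HARDCODED_TIMESTAMP: upstream code would call time(NULL) here. */
    return PUNCHCARD_FIXED_TIME;
}
```

Searching the source tree for HARDCODED_TIMESTAMP then finds every place like this one.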

nolibc

About nolibc

nolibc was a feature snuck into the Linux kernel without much fanfare. Basically, it is a few headers that implement just a subset of a proper libc, so that applications do not have to link against a third-party libc. This is a libc that comes with the Linux kernel! The root header, tools/include/nolibc/nolibc.h in the kernel source, includes documentation in its comments.

You have to do a little work of your own to implement any libc functionality you are not getting from these headers, but small programs can function without.

This is the perfect solution for PunchcardOS, since it cuts down the amount of software needed to bootstrap a trustworthy system.

Compiling with nolibc headers

First, your code has to be written to actually use nolibc. To do this, wrap the headers you normally include with #ifndef NOLIBC and #endif. nolibc defines NOLIBC. Doing this prevents duplicate definitions coming from glibc or whatever other libc you use by default. If there are any unimplemented standard library things you need, define them in an #else block.
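A minimal sketch of this guard pattern follows. It assumes, purely for illustration, that your copy of nolibc does not provide strpbrk() (check your kernel version). The helper first_word_len is hypothetical too.

```c
/* When built with -include .../nolibc.h, NOLIBC is already defined,
 * so the hosted headers below are skipped. */
#ifndef NOLIBC
#include <stddef.h>
#include <string.h>
#else
/* Assumption for illustration: this nolibc copy has no strpbrk(). */
static char *strpbrk(const char *s, const char *accept)
{
    for (; *s != '\0'; s++) {
        for (const char *a = accept; *a != '\0'; a++) {
            if (*s == *a)
                return (char *)s;
        }
    }
    return NULL;
}
#endif

/* Length of the first word in s, where a word ends at a space or tab. */
size_t first_word_len(const char *s)
{
    const char *end = strpbrk(s, " \t");
    return end != NULL ? (size_t)(end - s) : strlen(s);
}
```

The same source then builds against a regular libc and against nolibc without duplicate definitions.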

To compile a static binary that is linked against nolibc (as described in the header of nolibc.h in the Linux Kernel), run:

gcc -static -nostdlib -include PATH_TO_LINUX_SRC/tools/include/nolibc/nolibc.h \
 -o YOUR_BIN_NAME ./programs/YOUR_SRC_NAME -lgcc

(After replacing YOUR_BIN_NAME, YOUR_SRC_NAME, and PATH_TO_LINUX_SRC, obviously.) The ordering of most arguments does not seem to matter, but keep -lgcc after the source file: the linker resolves libraries in the order they appear on the command line.

You can see a good example of this in ./programs/basename.c. The standard library doesn't have a basename function, so I had to implement one that is used only when compiling the nolibc variant.
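For readers without the repository at hand, a fallback of this kind might look like the sketch below. This is not the implementation in ./programs/basename.c: the name path_basename is hypothetical, and the handling of trailing slashes is deliberately simplistic.

```c
/*
 * Return a pointer to the last component of path without modifying it.
 * Simplification: a trailing slash is kept, so "/usr/lib/" yields "lib/".
 */
const char *path_basename(const char *path)
{
    const char *base = path;

    for (const char *p = path; *p != '\0'; p++) {
        /* Only move past a slash that is followed by a real component. */
        if (*p == '/' && p[1] != '\0' && p[1] != '/')
            base = p + 1;
    }
    return base;
}
```

Because it relies on nothing but pointer arithmetic, a helper like this compiles the same way with or without a libc.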

Building PunchcardOS

# Define which kernel modules are minimally required to run your hardware.
lsmod > mods

# Actual build of the distro.
docker build . -t pcosbuild -f ./punchcardos.dockerfile

# Privileged step required to mount filesystem.
sudo docker run --privileged pcosbuild bash \
-c 'mkdir m && mount boot m && cp bzImage init.cpio syslinux.cfg files/* m && umount m'

# This gets the ID of the most recent pcosbuild container.
CONTAINER_ID=$(docker ps --format 'table {{.ID}} {{.Image}}' -a | grep pcosbuild | head -n1 | cut -d ' ' -f1)

docker cp $CONTAINER_ID:/build/boot pcos.img

Running PunchcardOS

On QEMU

At minimum, you must run this:

qemu-system-x86_64 -drive file=pcos.img,format=raw,index=0 -m 2G

It is recommended that you add -serial stdio at the end of that, but I am not your boss. UPDATE: It seems like the graphical console doesn't work anymore. I suspect I removed something needed for the keyboard to work.

When it boots up, you should see a Syslinux prompt. This is because there is no configuration to tell Syslinux what to boot and how to boot it. At this prompt, type in:

/bzImage -initrd=/init.cpio root=/dev/sda

If you included -serial stdio on the QEMU command, add console=ttyS0 to the end of the above Syslinux prompt, and hit ENTER. You should see a shell appear in either the graphical QEMU window, or in the terminal where you ran QEMU. Type ls and hit ENTER. If you see a listing of entries, it booted!

Compiling within PunchcardOS

Since PunchcardOS only comes with a minimally working C compiler, you will need to compile a C program to build anything else. As an example, you can compile a program with the Tiny C Compiler using a command like:

cc -nostdlib -static -g -include /src/straplibc/src/nolibc.h \
  /lib/syscall.o /src/tcc/lib/va_list.o /src/punchcard/ls.c

You might need to include /src/tcc/lib/va_list.o. The compiled binary does not have access to the __va_arg symbol without this for some reason (to be investigated).

/lib/syscall.o is a pre-compiled ELF object allowing you to make system calls using straplibc. straplibc is an even smaller nolibc that was specifically tailored for bootstrapping purposes.

On Physical Hardware

PunchcardOS has not yet been tested on physical hardware.

Tests

There is setup involved for these to work, but you can run the tests under test/ like so: node ./test/enc.mjs.

License

Some of the software in this repository was copied from other repositories and is governed by different licenses. Unless otherwise noted, all software in this repo is licensed under the MIT License, whose text follows:

MIT License

Copyright 2023 Jonathan M. Wilbur <jonathan@wilbur.space>

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the “Software”), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Bootstrapping Order

These are kind of my chicken-scratchy notes. I will clean these up as I figure out how this is going to work.

Get a trustworthy GCC

In order (I think):

The above steps will be done:

  1. Once, quickly and sloppily, just to make sure that it's possible.
  2. Again, but carefully reading all of the code before compiling.
  3. Again, after implementing hardware obfuscation.
  4. Again, on different hardware.

Get a trustworthy Linux Kernel

For some reason, this page is where the requirements to build are found:

https://docs.kernel.org/process/changes.html

Other Notes

Ambiguous Error with Docker

After adding the steps to the dockerfile to fetch the tarballs from remote sources, I was getting an error that said something to the effect of

failed to solve: failed to prepare $SOMETHING_1 as $SOMETHING_2: max depth exceeded

As it turns out, this absolutely spectacular and unambiguous error message means that you have too many steps in your Dockerfile, and you need to condense steps. I got past it by collapsing similar steps into the scripts ./scripts/configure_kernel.sh and ./scripts/fetch_sources.sh.

Getting the distro booting

Building Linux 6.9.6 on an 11th Gen Intel(R) Core(TM) i7 with 16 GB RAM using the allmodconfig build followed by the config changes in the punchcardos.dockerfile and using make -j4 took almost exactly two hours. I tried this because the kernel was panicking, and I assumed that I was probably missing some driver Qemu needed, so compiling absolutely everything should make it work, right?

(I do wonder if the compiled kernel modules would even be available in initramfs, so this might not have fixed my problem anyway. I don't really know how kernel modules work, but I think they are stored on disk, so I don't know how the kernel would be able to read them before there are any disks mounted!)

After using the allmodconfig, the distro still failed to start up, but differently this time. It seemed to be panicking on some sort of tracing self-test. I tried disabling kernel configs related to this and I don't remember what the result was, but since I was going to have to do another two-hour long allmodconfig build again to see if that fixed it, I just gave up on this avenue of troubleshooting.

After trying to get this to work for hours, I finally figured out how to get the kernel logs to display in stdout using Qemu (specifically so I can scroll up and see earlier kernel ring buffer messages that are otherwise lost when they go off-screen). You have to add -serial stdio to your qemu-system-x86_64 command like so:

qemu-system-x86_64 -drive file=pcos.img,format=raw,index=0,media=disk -serial stdio

and when the Syslinux prompt appears, you have to add console=ttyS0 to the kernel parameters, like so:

/bzImage -initrd=/init.cpio console=ttyS0

The former tells Qemu to use stdin and stdout for the serial terminal, and the latter tells the Linux kernel to use this serial terminal as its console. It seems like some kernel messages will still get displayed in the Qemu window, but they are sparse enough to not be a problem.

This was useful for figuring out why I couldn't boot: the Linux kernel was panicking because I did not specify a root filesystem via the root parameter. I didn't know that was required. I still haven't figured out the fix to this, but I have gotten further.

After testing all options for the root parameter, I found that the Linux kernel was successfully mounting a disk, but not finding /sbin/init. It turned out that, because I was using the find command in the directory above the initramfs to create the CPIO archive, this archive had all of its contents in an initramfs folder, making the correct path /initramfs/sbin/init. I haven't fixed this yet, but I figured I should document it as soon as I realized it. You can view the contents of CPIO archives using cpio -itv < archive.cpio.
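The same check can be sketched in C. The sketch assumes the archive is in the "newc" cpio format (magic 070701, a 110-byte ASCII header, name size at offset 94), which is the format the kernel expects for an initramfs. The function name cpio_first_name_has_prefix is hypothetical, and only the first entry is inspected.

```c
#include <stddef.h>
#include <string.h>

#define NEWC_HEADER_LEN   110 /* 6-byte magic + 13 fields of 8 hex digits */
#define NEWC_NAMESIZE_OFF 94  /* offset of the c_namesize field */

/* Parse 8 ASCII hex digits; returns -1 on a non-hex character. */
static long long parse_hex8(const char *s)
{
    long long v = 0;

    for (int i = 0; i < 8; i++) {
        char c = s[i];
        int d;

        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            return -1;
        v = v * 16 + d;
    }
    return v;
}

/*
 * Returns 1 if the first entry's name starts with prefix, 0 if it does not,
 * and -1 if the buffer does not look like a newc archive.
 */
int cpio_first_name_has_prefix(const char *archive, size_t len,
                               const char *prefix)
{
    if (len < NEWC_HEADER_LEN || memcmp(archive, "070701", 6) != 0)
        return -1;

    long long namesize = parse_hex8(archive + NEWC_NAMESIZE_OFF);
    if (namesize <= 0 || (size_t)namesize > len - NEWC_HEADER_LEN)
        return -1;

    const char *name = archive + NEWC_HEADER_LEN;
    size_t plen = strlen(prefix);

    /* namesize counts the trailing NUL, so the name has namesize - 1 bytes. */
    if ((size_t)namesize - 1 < plen)
        return 0;
    return memcmp(name, prefix, plen) == 0;
}
```

If this returns 1 for the prefix "initramfs", the archive was probably created from the parent directory, and every path inside it carries the extra directory level.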

After getting past that hurdle, the only way I could get the disk to be "found" by the kernel was to remove the media=disk parameter from Qemu options. It is weird that this worked, because apparently this is the default behavior.

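After that, it could not find /sbin/init. I changed the build so that this was placed at /init. That fixed it and now it works, which surprised me: the documentation specifically lists where the kernel searches for init programs, and this is not one of those places! The likely explanation is that this search only applies once a real root filesystem has been mounted; when booting from an initramfs, the kernel runs /init by default (overridable with the rdinit= parameter).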

I had another problem later on: after adding the downloading of tarballs from online into the init.cpio, and after refactoring this into a Bash script outside of the Dockerfile, the Syslinux boot prompt would complain that there was "No such file or directory" for init.cpio, even though copying this file into the final image succeeded. Once again, a piece of shit program is reporting a misleading error message: the problem was that the virtual machine didn't have enough memory to load the 224M initrd filesystem. I raised the memory to 1G (the -m option to QEMU), which got it to boot up successfully.

A C Compiler

As it turns out c4 is not viable as a bootstrapping compiler. It basically only works for compiling itself, and its codebase is so abstruse that it is not suitable for a "bootstrapping" compiler. ChibiCC seems more promising. I think I will leave c4 in the distro, though. If somebody can make it work for their needs, it's fine to have another C compiler lying around.

UPDATE: After experimenting with ChibiCC, I don't think that will work either. Specifically, nolibc uses too many GCC-specific features for me to get it compiling under ChibiCC. See comments in ./programs/chibicc.c. I might be able to make this work by using a modified nolibc. The biggest thing I need to eliminate is the GCC-style inline assembly.

Segfault in strlen when using nolibc

Make sure you are using recognized format specifiers in printf and the like. The segfault was happening because I was using %X, when only %x was supported. I need to report this. I don't think it should do this.
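One way to sidestep the missing %X is to format uppercase hex by hand and print the result with %s. The sketch below is an illustration only, and the helper name hex_upper is hypothetical.

```c
#ifndef NOLIBC
#include <stddef.h>
#endif

/*
 * Write value as uppercase hexadecimal into out (NUL-terminated).
 * Returns the number of digits written, or 0 if outlen is too small.
 */
size_t hex_upper(unsigned long value, char *out, size_t outlen)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[sizeof(unsigned long) * 2];
    size_t n = 0;

    /* Collect digits from least to most significant. */
    do {
        tmp[n++] = digits[value & 0xF];
        value >>= 4;
    } while (value != 0);

    if (outlen < n + 1)
        return 0; /* no room for the digits plus the terminating NUL */

    /* Reverse into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

A caller would fill a small buffer with hex_upper() and pass it to printf with %s, avoiding the unsupported specifier entirely.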

See Also

To Install

To Do

  • Real MVP
    • readelf -rW support (actually, you might not need to do this if you get rid of vDSO)
    • Ext2 Filesystem
    • Make bash run scripts
    • Test minised with the expressions Linux uses
  • MVP
    • readelf
      • Print symbols
      • Print symbols in executable code
      • Print relocations
      • readelf -rW support
    • ld
      • This might get a lot simpler if I can get VDSO to not be built.
      • You might be able to use neatld (see ./programs/neatld.c)
    • minibash
    • hexdump
    • hunt
    • peck
    • pex (Peck, but for Hex)
    • ar (Should be easy, just copy mkcpio and change a few things.)
    • minised
    • miniawk
      • This is too heavily dependent on regex and math. It will have to compile against musl.
      • Proven to be required, even by an allnoconfig build of the Linux kernel.
    • cpio
    • A bootloader (https://github.com/owenson/tiny-linux-bootloader/tree/master)
    • m4
    • dd
    • truncate
    • chmod
    • chown
    • find (This doesn't seem to be necessary.)
    • grep
    • httpget
    • env
    • stty
    • sleep
    • dmesg
    • stat
    • touch
    • hostname
    • strings
    • tty
    • test (https://raw.githubusercontent.com/michael105/minicore/master/porting/minutils/src/%5B.c)
      • [ ! -r include/generated/utsversion.h ]
      • [ 10 -gt 255 ] (What is this for?)
      • [ ! -d '/build/stage/src/linux/tools/objtool/libsubcmd/include/subcmd/' ]
    • mv
    • cp
    • rsync
    • printf (Used to build linux, but not much)
    • perl (Does not seem to be needed for a Linux build)
    • sort
    • uniq
    • xargs
    • tr
    • objcopy
    • nm (Used in Linux, but might not be needed)
    • wc might be needed for Linux
    • ln
    • cmp
    • Some hashing program
    • C runtime that drops capabilities, changes root, landlock?
  • Mount initramfs as read-only
  • Is it possible to make single-file binutils / elfutils commands?
  • Tools for blacklisting or removing modules?
  • Syslinux configuration
  • Clear kernel ring buffer messages from screen
  • Get a working C compiler
  • Shuffle kernel syscall numbers so malicious hardware is clueless
  • Make a linker
  • Make an assembler
  • Container building via Earthly
  • https://github.com/oriansj/torture_c/tree/main
  • https://github.com/jart/sectorlisp
  • https://github.com/realchonk/microcoreutils/tree/master
  • I still haven't found a suitable ELF / x86-64 disassembler.
  • Signed commits
  • Signed releases
  • Signed container images
  • Use seccomp()
  • Use chroot()
  • Document special / non-standard programs
  • Use TPM for some RNG
  • Configure Integrity Measurement Architecture (IMA)?
  • no_new_privs
  • Use Landlock
  • Drop capabilities
  • setcap / getcap programs
  • Executable minification
  • Clone and use: https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git
  • Get QEMU graphical console working again
  • Delete the unused arch/* and test folders from Linux to minimize the amount of code
  • Programs
    • My own ideas
      • nofun - Turn unused functions into nop sleds + exit
      • hashsyms - Hash functions and symbols for easy visual comparison of binaries
      • elfdiff
    • arch
    • asn1parse
    • awk
    • basename
    • bc4linux
    • bc
    • c4
    • cat
    • chgrp
    • chibicc
    • chmod
    • chown
    • chroot
    • cksum
    • clear
    • cmp
    • col
    • colrm
    • comm
    • cowsay
    • cp
    • cpio
    • cut
    • date
    • dd
    • diff3
    • diff
    • dirname
    • dmesg
    • echo
    • edit
    • enc
    • env
    • expand
    • expr
    • false
    • file
    • find
    • fmt
    • fold
    • getcap
    • getconf
    • getopt
    • grep
    • groff
    • groups
    • halt
    • head
    • hexdump
    • hostid
    • hostname
    • httppost
    • hunt
    • id
    • installkernel
    • ischroot
    • join
    • kill
    • kilo
    • ld
    • less
    • lisp
    • logger
    • look
    • ln
    • ls
    • m4
    • make
    • man
    • minias
    • minilisp
    • mkdir
    • mount
    • mv
    • nl
    • nm
    • otp
    • patch
    • pbget
    • pbput
    • pivot_root
    • pmap
    • poweroff
    • ps
    • pwd
    • rand
    • readelf
    • reboot
    • reset
    • rm
    • rmdir
    • rsync
    • sdiff
    • sed
    • seq
    • setcap
    • sha256
    • sha3
    • sh
    • shuf
    • shutdown
    • sleep
    • sort
    • split
    • stat
    • stty
    • sum
    • tabs
    • tac
    • tail
    • tar
    • tee
    • test
    • time
    • timeout
    • tinylisp
    • touch
    • tr
    • tree
    • true
    • truncate
    • tset
    • tty
    • uname
    • unexpand
    • uniq
    • uptime
    • users
    • uuidgen
    • wc
    • whereis
    • which
    • whoami
    • xargs

Long-Term Approach

  • Use hexedit UEFI command to manually create binaries
    • This will only work for very small binaries!
  • Make a super tiny assembler
    • Actually, this might need to assemble EFI Bytecode... but maybe not
    • This should be minimal enough to just work with the output of ChibiCC
      • addq
      • addsd
      • addss
      • and
      • cdq
      • cqo
      • cvtsd2ss
      • cvtsi2sdl
      • cvtsi2sdq
      • cvtsi2ssl
      • cvtsi2ssq
      • cvtss2sd
      • cvttsd2sil
      • cvttsd2siq
      • cvttss2sil
      • cvttss2siq
      • dec
      • divsd
      • divss
      • faddp
      • fcomip
      • fdivrp
      • fistpl
      • fistpq
      • fistps
      • fmulp
      • fnstcw
      • fstp
      • fstpt
      • fsubrp
      • idiv
      • imul
      • inc
      • jbe
      • je
      • jmp
      • lock cmpxchg
      • mov
      • movl
      • movq
      • movsbl
      • movsd
      • movss
      • movswl
      • movsxd
      • movzb
      • movzbl
      • movzwl
      • movzx
      • mulsd
      • mulss
      • neg
      • or
      • rep stosb
      • sar
      • seta
      • setae
      • setb
      • sete
      • setl
      • setle
      • setne
      • setnp
      • shl
      • shr
      • subsd
      • subss
      • test
      • ucomiss
      • ucomisd
      • xchg
      • xor
  • Assemble a ChibiCC that has been modified to be as simple as possible, but which can still self-host, after verifying source
  • Use that C compiler to build up to TinyCC
  • Build all tools needed
  • Build the Linux kernel

Kernel Version Choice

I have been reconsidering using a newer Kernel version. Fabrice Bellard (maker of TinyCC) managed to compile the Linux Kernel 2.4.x with TinyCC.

I have decided on 2.6.27.21 for the bootstrapping version for the following reasons:

  • 2.6.27.x is an LTS release, and although it is already at its end of life, I would assume that it is more stable than Kernel versions adjacent to it.
  • It is a release that is early enough that far less complex tooling is needed to build it.
  • There are Linux-libre releases available for this version.
  • This version is modern enough that it still supports USB keyboards and other such peripherals you are likely to use.
  • This version has support for the allnoconfig Make target.
  • Greg Kroah-Hartman took over the Linux kernel

Compiling on the Host System

This version did not compile right out of the box for me. It still took some work to make it compile.

  1. Add -fno-pie to KBUILD_CFLAGS in Makefile.
    • PIE became a default in newer versions of GCC, which is why this is needed.
  2. In arch/x86/vdso/Makefile, replace VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 with VDSO_LDFLAGS_vdso.lds = -m64.
  3. Generate kernel/timeconst.h and remove FORCE from its Make target.
  4. There seems to be some problem with duplicate functions, notably where one is marked extern and another is not. It seems that maybe older versions of GCC or perhaps the linker treated an extern implementation like a "weak" linkage. This might involve a lot of changes to fix.
  5. In include/linux/inotify.h, change the pin_inotify_watch and unpin_inotify_watch implementations (NOT the declarations) from extern to static. This may need to be reversed once you install headers. I am not sure.
  6. Change __mutex_unlock_slowpath and __mutex_lock_slowpath from static to extern in kernel/mutex.c. I have no idea why this was needed. It doesn't make any sense.