from Hacker News

Hello World

by fbrusch on 4/8/24, 9:07 AM with 79 comments

by MuffinFlavored on 4/8/24, 6:56 PM

I got bored the other day and tried to achieve something similar on MacOS with Rust:

    #![no_std]
    #![no_main]
    
    use core::panic::PanicInfo;
    
    #[panic_handler]
    fn panic_handler(_panic: &PanicInfo<'_>) -> ! {
        // TODO: write panic message to stderr
        write(2, "Panic occured\n".as_bytes()); // TODO: panic location + message
        unsafe { sc::syscall!(EXIT, 255 as u32) };
        loop {}
    }
    
    fn write(fd: usize, buf: &[u8]) {
        unsafe {
            sc::syscall!(WRITE, fd, buf.as_ptr(), buf.len());
        }
    }
    
    #[no_mangle]
    pub extern "C" fn main() -> u32 {
        write(1, "Hello, world!\n".as_bytes());
        return 0;
    }

Then I inspected the ELF output in Ghidra. No matter what it was about ~16kb. I'm sure some code golf could be done to get it done (which has obviously been done + written about + documented before)

by praptak on 4/9/24, 7:42 AM
There's another rabbit hole which Musl seems to have skipped. Using `syscall` directly is not all there is to calling system functions on Linux.
The "better behaved" way is to call vDSO. It's a magic mini-library which the kernel automatically maps into your address space. Thus the kernel is free to provide you with whatever code it deems optimal for doing a system call.
In particular some of the system calls might be optimized away and not require the `syscall` at all because they are executed in the userspace. Historically you could expect vDSO to choose between different mechanisms of calling the kernel (int 0x80, sysenter).
https://man7.org/linux/man-pages/man7/vdso.7.html
by qweqwe14 on 4/9/24, 10:00 AM
Also see this blog post, which compares "Hello World" programs in different languages by overhead: https://drewdevault.com/2020/01/04/Slow.html
Follow-up: https://drewdevault.com/2020/01/08/Re-Slow.html
This legendary blogpost makes the smallest Linux program (the program simply exits with status 42): https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...
You can also find the smallest "Hello World" program on that website.
by bkallus on 4/8/24, 8:21 PM
This almost entirely skips the role of the dynamic linker, which is arguably the true entry point of the program.
If you are interested in that argument, see https://gist.github.com/kenballus/c7eff5db56aa8e4810d39021b2....
by susam on 4/9/24, 8:58 AM
In case there are any DOS enthusiasts out here, a "hello, world" program written in assembly/machine code in DOS used to be as small as 23 bytes: https://github.com/susam/hello
Out of these 23 bytes, 15 bytes are consumed by the dollar-terminated string itself. So really only eight bytes of machine code that consists of four x86 instructions.
by cancerhacker on 4/8/24, 9:01 PM
I liked and appreciated this, two points I'd like to make: you should disable optimizations and whatever inlining caused printf to become puts (or, alternatively, write the hello world to use puts directly), second would be to break your compile step into the 4 real parts: preprocess, compile, assemble, link. Or add --save-temps to the cc line and describe the various files created. There's a lot less magic involved if you can see the pipeline.
by Syntaf on 4/8/24, 7:31 PM
This reminds me of one of my favorite CS assignments in college for a systems programming class:
```
    > Given a hello world C++ snippet, submit the smallest possible compiled binary
```
I remember using tools like readelf and objdump to inspect the program and slowly rip away layers and compiler optimizations until I ended up with the smallest possible binary that still outputted "hello world". I googled around and of course found someone who did it likely much better than any of us students could have ever managed [1]
[1]: https://www.muppetlabs.com/%7Ebreadbox/software/tiny/teensy....
by norir on 4/8/24, 8:43 PM
> I’m sorry the ending maybe wasn’t as satisfying as you hoped. I’m happy someone found this interesting. I’m not quite sure why I wrote this, but it’s now after midnight so I should get some sleep.
This was actually a perfect ending to this piece.

by delta_p_delta_x on 4/8/24, 7:17 PM

Sadly, like most 'hello world' deep dives, the author stops at the `write` syscall and glosses over the rest. Everything before the syscall essentially boils down to `printf` calling `puts` calling `write`—it's one function call after another forwarding the `char const*` through, with some book-keeping. In my opinion, not the most interesting.

What comes after the syscall is where everything gets very interesting and very very complicated. Of course, it also becomes much harder to debug or reverse-engineer because things get very close to the hardware.

Here's a quick summary, roughly in order (I'm still glossing over; each of these steps probably has an entire battalion of software and hardware engineers from tens of different companies working on it, but I daresay it's still more detailed than other 'tours through 'hello world'):

  - The kernel performs some setup setup to pipe the `stdout` of the hello world process into some input (not necessarily `stdin`; could be a function call too) of the terminal emulator process. 
  - The terminal emulator calls into some typeface rendering library and the GPU driver to set up a framebuffer for the new output. 
  - The above-mentioned typeface rendering library also interfaces with the GPU driver to convert what was so far just a one-dimensional byte buffer into a full-fledged two-dimensional raster image:
    - the corresponding font outlines for each character byte is loaded from disk;
    - each outline is aligned into a viewport; 
    - these outlines are resized, kerning and font metrics applied from the font files set by the terminal emulator;
    - the GPU rasterises and anti-aliases the viewport (there are entire papers and textbooks written on these two topics alone). Rasterisation of font outlines may be done directly in hardware without shaders because nearly all outlines are quadratic Bézier splines.
  - This is a new framebuffer for the terminal emulator's window, a 2D grid containing (usually) RGB bytes.
  - The windowing manager takes this framebuffer result and *composits* with the window frame (minimise/maximise/close buttons, window title, etc) and the rest of the desktop—all this is done usually on the GPU as well.
    - If the terminal emulator window in question has fancy transparency or 'frosted glass' effects, this composition applies those effects with shaders here.
  - The resultant framebuffer is now at the full resolution and colour depth of the monitor, which is then packetised into an HDMI or DisplayPort signal by the GPU's display-out hardware, depending on which is connected.
  - This is converted into an electrical signal by a DAC, and the result piped into the cable connecting the monitor/internal display, at the frequency specified by the monitor refresh rate.
    - This is muddied by adaptive sync, which has to signal the monitor for a refresh instead of blindly pumping signals down the wire
  - The monitor's input hardware has an ADC which re-converts the electrical signal from the cable into RGB bytes (or maybe not, and directly unwraps the HDMI/DP packets for processing into the pixel-addressing signal, I'm not a monitor hardware engineer).
  - The electrical signal representing the framebuffer is converted into signals for the pixel-addressing hardware, which differs depending on the exact display type—whether LCD, OLED, plasma, or even CRT. OLED might be the most complicated since each *subpixel* needs to be *individually* addressed—for a 3840 × 2400 WRGB OLED as seen on LG OLED TVs, this is 3840 × 2400 × 4 = 36 864 000 subpixels, i.e. nearly 37 million pixels.
  - The display hardware refreshes with the new signal (again, this refresh could be scan-line, like CRT, or whole-framebuffer, like LCDs, OLEDs, and plasmas), and you finally see the result.

Note that all this happens at most within the frame time of a monitor refresh, which is 16.67 ms for 60 Hz.

by stevefolta on 4/8/24, 10:36 PM
> Unlike python, however, you can’t just call an interpreter to run this program.
Sure you can: "tcc -run hello.c". Okay, technically that's an in-memory compiler rather than an interpreter.
For extra geek points, have your program say "Hellorld" instead of "Hello world".
by qznc on 4/8/24, 10:43 PM
Another interesting fact: Write a Hello World program in C++, run the preprocessor on it (g++ -E), then count the lines of the output.
I just tried and it shows me 33738 lines (744 lines for C btw).
In a language like C++, even Hello World uses like half of all the language features.
by bitwize on 4/9/24, 12:42 PM
One of the things I really like about the "hello, world" example in K&R is it's used as a template to show the reader what a complete, working (if bare minimum) C program looks like. All the relevant parts are labeled including the #include preprocessor directive, function definition, function call (to printf), etc. There are a few paragraphs explaining these parts in greater detail. Finally some explanation is provided about how one might compile and run this program, using Unix as an example (the authors are pretty biased toward that system).
This article is very much in that same friendly, explanatory spirit, although obviously it goes into greater depth and uses a modern system.
by sohzm on 4/8/24, 9:02 PM
This reminded me of this video "Advanced Hello World in Zig - Loris Cro" https://youtu.be/iZFXAN8kpPo although not a equivalent comparison but still a intresting watch
by baudaux on 4/9/24, 5:11 AM
GNU hello program is quite complex as well
https://www.gnu.org/software/hello/
by Bengalilol on 4/8/24, 9:34 PM
Just saying hi ! (and thanks for the read, got me back to the old days of ASM One and SEKA on Amiga, trying to clean up my memory dust pile)
by mseepgood on 4/9/24, 10:31 AM
Could have continued with following the syscall code in the kernel.
by deepsun on 4/9/24, 4:26 PM
Would be cool to see the same for statically-linked HelloWorld.
by ben_w on 4/8/24, 6:54 PM
I've been thinking recently, we might have too many layers of abstraction in our systems.
And I didn't even know about most of the ones in this post.
by kenneth on 4/9/24, 12:48 PM
What I think is shocking is that 99% of the software engineering world these days would have zero ability to comprehend this explanation. In my opinion, you could not be a real programmer without the ability to understand what your program is under the hood. You might be able to get away with it, but you're just faking it.
And yet, 99% of the people I've ever seen in the industry have no idea how any of the code they write works.
I used to ask a simple interview question, I wanted to see if potential hires could explain what a pointer or memory was. Few ever could.
by Retr0id on 4/8/24, 8:09 PM
> All modern big and important programs that make a computer work are written this way [AoT-compiled to native code].
This is the conventional wisdom, but it's increasingly not true.
by nektro on 4/9/24, 8:26 PM
loved it! except the final conclusion
by vivzkestrel on 4/9/24, 2:42 AM
there was another post like this on HN that was actually illustrated in a better manner, anyone got links to it?