this post was submitted on 23 Nov 2023
25 points (87.9% liked)


Being a FOSS enthusiast, I can configure most of my software in way too many ways. However, I noticed that this is not true for most compilers, which got me thinking: why isn't that the case? In gcc (or your favorite compiler tool) I have a shitload of options about what counts as an error or a warning, how the code should be compiled, and tons of other things. But there are none for how the code should be interpreted or what the code should look like.

Why can't I simply add a module to a build process to make it [object oriented | use indentation for brackets | automatically allocate memory | automatically assume types | auto forward-declare | some other thing that differentiates one language from another]* ? It's so weird that I have a PDF reader that has an option to set the window icon, a mail client that lets me specify a regex to search for a mentioned but forgotten attachment, and a game that lets me set my texture picmip resolution, but the tool (gcc) that builds these things hasn't even got a config file built in. We have build tools around them to supply arguments.

This could look like the following: ( oversimplified )

  1. preprocess
  2. compile
  3. assemble
  4. link

v

  1. add brackets from indentation
  2. preprocess
  3. check if object oriented constraints are all satisfied
  4. do something else
  5. compile
  6. assemble
  7. run the assembly through, as an example, an AI for antivirus scanning
  8. link
  9. run test
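
The stage list above can be sketched as an ordered list of interchangeable functions. This is only a minimal illustration in Python, not an existing build tool: the names `run_pipeline`, `strip_line_comments` and `expand_macros` are made up for the example, and the stages are toy stand-ins for a real preprocessor or compiler.

```python
from typing import Callable, List

# A stage is any function that takes the current artifact (here: text)
# and returns the transformed artifact.
Stage = Callable[[str], str]


def run_pipeline(source: str, stages: List[Stage]) -> str:
    """Feed the source through each stage, in the order given."""
    artifact = source
    for stage in stages:
        artifact = stage(artifact)
    return artifact


# Hypothetical stand-in stages; real ones would call a preprocessor,
# compiler, assembler and so on.
def strip_line_comments(text: str) -> str:
    """Drop everything after '//' on each line (ignores string literals)."""
    return "\n".join(line.split("//")[0].rstrip() for line in text.splitlines())


def expand_macros(text: str) -> str:
    """Replace one hard-coded macro name, as a toy preprocessor."""
    return text.replace("ANSWER", "42")


if __name__ == "__main__":
    stages = [strip_line_comments, expand_macros]
    print(run_pipeline("int x = ANSWER; // the answer", stages))
```

Adding, removing or reordering a step is then just a change to the `stages` list, which is roughly the per-project configuration the post asks for.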

There could also be a fork in this process: sending for example the source code both to a compiler and an interpreter to detect edge case behavior while compiling. Or compile with both automatic typing and your defined typing so that when rounding errors are big you can instantly compare with a dynamically typed version of your program. Or the other way around, maybe you want different parts of your code to be handled with different preprocessors.

The build process should be configured per project for things about the input like syntax and per computer for things about the output like optimizations.

There are of course some drawbacks, one being a trust issue where someone pulls in an obscure module to build malicious releases. It is probably also harder to maintain stability when you have to keep in mind that your preprocessor isn't the first to be run. And your compiling process can take a lot longer if you have to go through multiple pre, post or even compilation phases.

If you know such a build tool, or c (: haha :) some obvious reasons that this should not exist, please let me know. Thank you for reading this lengthy post.

Thanks for the comments, based on them I think I can better explain what I want. I would like a language that has a minimal specification, so that its preprocessor, compiler, assembler and linker are a collection of plugins rather than one chunky program.

So the compiler reads, for example, a line: void main(int argc, char argv). Then all main body plugins get an event_newline. The function plugin reads this and creates a new object that contains the function main. It then emits an event_functionBody that is caught by other plugin(s), which read the contents of main and return what it has to do.
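
A rough sketch of that event flow, in Python. Everything here is hypothetical: `EventBus`, `FunctionPlugin` and `compile_lines` are invented for illustration, and the "function header" check is deliberately naive.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Minimal publish/subscribe hub that plugins attach to."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, *args) -> None:
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._handlers[event]):
            handler(*args)


class FunctionPlugin:
    """Recognises lines that look like '<type> <name>(...)' (very naive)."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.functions: List[str] = []
        bus.on("event_newline", self.on_line)

    def on_line(self, line: str) -> None:
        head = line.split("(")[0].split()
        if "(" in line and len(head) == 2:
            name = head[1]
            self.functions.append(name)
            # Hand over to whichever plugins handle function bodies.
            self.bus.emit("event_functionBody", name)


def compile_lines(lines: List[str], bus: EventBus) -> None:
    """The 'compiler' core: it only reads lines and announces them."""
    for line in lines:
        bus.emit("event_newline", line)
```

The core loop knows nothing about functions; all language knowledge lives in plugins that subscribe to events, which is the split described above.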

top 47 comments
[–] jeffhykin@lemm.ee 24 points 11 months ago* (last edited 11 months ago) (3 children)

does this compiler exist

TLDR; 65% of what you want exists as the Rust compiler, which is probably as close as you're going to get at the moment (edit: I was wrong, see the comment about Racket for a less practical but more flexible system). Take a look at macros like view! on this page. Rust doesn't support html-like syntax, but it does within that view! because someone made a macro that supports it. Rustc doesn't directly have a config file AFAIK, but it also doesn't need any build tools (no make, cmake, autoconf, etc) because everything can be done with Rust itself (because its macro system is Turing complete with full file access).

Full Response:

I agree with the general idea, but I think there are lots of misconceptions. Gcc does allow doing things before the preprocess step, after the preprocess step, before the linking step, etc. It's possible, but not easy, to run your own programs in between those kinds of steps. As for why there's no config file, it's probably because gcc is really old, but I'll have to let someone else comment on that.

However, syntax support is effectively a completely different feature request. For example the "adding brackets to indentation" couldn't really/correctly come before the preprocessing step. I mean a really hacky solution like my indent experiment from a long time ago can, but it will never be even slightly reliable because of the preprocessor, multi-line strings, comments and other edge cases. Let me explain.

  • The syntax cannot be parsed without running the preprocessor. Things like un-matched brackets are completely allowed before the preprocessing step. It would be literally impossible for the parser to run before preprocessing.
  • So let's talk preprocessing. The preprocessor is so stupid it won't even notice the difference between C, Haskell, or Ada. It's just looking for strings, comments, ints, and preprocessor directives. That's it. It has no idea about scopes or brackets or anything like that.
  • So for the "adding brackets to indentation" to work, it would need to run its own preprocessor step, then do some parsing of its own, and then run the indent-to-bracket conversion.

But note, preprocessor strings just coincidentally parse the same as C strings. There's already a limitation of the preprocessor failing on, let's say, Python, where Python has triple-quote strings.
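
To make the "really hacky solution" concrete, here is a deliberately naive indent-to-bracket pass in Python. It is not the indent experiment mentioned above, just an assumed minimal version; `indent_to_braces` is a made-up name, and it only works on spaces with a fixed indent width.

```python
def indent_to_braces(source: str, indent: int = 4) -> str:
    """Naively turn indentation changes into '{' and '}' lines.

    Assumes spaces only, a fixed indent width, and no multi-line strings,
    comments or preprocessor directives, which is exactly why this kind
    of pass is unreliable on real C code.
    """
    out = []
    depth = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        level = (len(line) - len(line.lstrip(" "))) // indent
        while depth < level:  # a deeper indent opens a block
            out.append(" " * (indent * depth) + "{")
            depth += 1
        while depth > level:  # a shallower indent closes blocks
            depth -= 1
            out.append(" " * (indent * depth) + "}")
        out.append(line)
    while depth > 0:  # close blocks still open at end of input
        depth -= 1
        out.append(" " * (indent * depth) + "}")
    return "\n".join(out)
```

It handles a two-line function body, but an indented continuation line of a multi-line `#define`, or an indented line inside a block comment, would also get a brace inserted in front of it, which is the unreliability described above.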

That said, preprocessing is actually highly unusual in the sense that it can be done as a separate step. Usually parsing needs to be done as a unified operation. Not to say it can't be modular, but rather the module must be given to a central controller that knows about everything rather than just having a code-transformation step.

With those misconceptions out of the way, now I want to talk about the parts I agree with.

IMO the perfect language is the one that has an "engine" that is completely separate from the syntax. And then the language/compiler should allow for different syntax. LLVM IR could be argued as being "an engine", but man is it a messy edgecase-y engine with no unified front-end.
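
As a toy illustration of an "engine" that is separate from the syntax, the Python sketch below has one evaluator and two unrelated front ends that both produce the same tree. The tree shape, the `OPS` table and both parsers are assumptions made up for this example; a real engine would be far richer.

```python
from typing import List, Tuple, Union

# Shared "engine" representation: an int, or a tuple (op, left, right).
Node = Union[int, Tuple[str, "Node", "Node"]]

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(node: Node) -> int:
    """The engine: it knows nothing about surface syntax."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


def parse_postfix(text: str) -> Node:
    """Front end 1: postfix, e.g. '1 2 3 mul add'."""
    stack: List[Node] = []
    for token in text.split():
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append((token, left, right))
        else:
            stack.append(int(token))
    return stack.pop()


def parse_prefix(text: str) -> Node:
    """Front end 2: Lisp-like prefix, e.g. '(add 1 (mul 2 3))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int) -> Tuple[Node, int]:
        if tokens[pos] == "(":
            op = tokens[pos + 1]
            left, pos = read(pos + 2)
            right, pos = read(pos)
            return (op, left, right), pos + 1  # skip the closing ')'
        return int(tokens[pos]), pos + 1

    node, _ = read(0)
    return node
```

Both `parse_postfix("1 2 3 mul add")` and `parse_prefix("(add 1 (mul 2 3))")` yield the same tree, so the engine cannot tell which syntax the program was written in.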

The closest current thing to what you're talking about is almost certainly Rust macros. Unlike the preprocessor, Rust macros fully understand Rust and are a part of the parsing process. They are decently close to what you're saying; instead of compiler flags it's just imports within Rust. You can write HTML, SQL, and other code just right in the middle of a Rust program (and it's not a string either, it's actual syntax support). Not only is it possible, but I have been eagerly waiting for someone to create a garbage-collected syntax within a Rust macro. People have already created garbage collectors, it's just a matter of making a nice wrapper and inter-op.

That said, and even though Rust macros are head-and-shoulders above basically every other language, I personally still think Rust macros don't go far enough. Indent-based code isn't really possible within Rust macros, Rust macros can't have imbalanced braces, and there can be escaping issues that prevent things like YAML syntax from ever being possible. They also can't allow for extensions like units, e.g. 10gallons, without wrapping it with some kind of delimiter (which defeats the point).

AFAIK currently there is no compiler that supports a composable syntax like that. I've worked on designing such a system, and while I don't think it's impossible, it is extremely hard. There's a lot of complications, like parsing precedence, lookaheads, operator precedence. Two syntax modules that don't know about each other can easily break each other. Like I said, I don't think it's impossible, but it is difficult.
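
One small piece of that difficulty can be shown in code: two syntax modules that each claim the same operator token with a different precedence. The Python sketch below only detects the clash, it does not resolve it; `OperatorTable` and `SyntaxConflict` are hypothetical names, and the precedence numbers are arbitrary.

```python
from typing import Dict, Tuple


class SyntaxConflict(Exception):
    """Raised when two syntax modules define the same token differently."""


class OperatorTable:
    """Central table that every syntax module must register with."""

    def __init__(self) -> None:
        # token -> (precedence, name of the module that registered it)
        self._ops: Dict[str, Tuple[int, str]] = {}

    def register(self, module: str, token: str, precedence: int) -> None:
        if token in self._ops:
            old_precedence, owner = self._ops[token]
            if old_precedence != precedence:
                raise SyntaxConflict(
                    f"'{token}': {owner} uses precedence {old_precedence}, "
                    f"{module} wants {precedence}"
                )
            return  # identical definition, nothing to do
        self._ops[token] = (precedence, module)

    def precedence(self, token: str) -> int:
        return self._ops[token][0]
```

For example, an "arithmetic" module registering `^` as exponentiation and a "bitwise" module registering `^` as xor with a lower precedence would raise `SyntaxConflict` on the second call. Detecting this needs a central controller that sees every module, and lookahead or grammar ambiguities are much harder to catch than this simple case.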

[–] spykyvenator@programming.dev 2 points 11 months ago (1 children)
[–] jeffhykin@lemm.ee 1 points 11 months ago* (last edited 11 months ago) (1 children)

also if you're interested in languages join c/programming_languages!

[–] spykyvenator@programming.dev 1 points 11 months ago

Seems like the better place to post this idd

[–] porgamrer@programming.dev 2 points 11 months ago* (last edited 11 months ago)

I mentioned it elsewhere here but I think the Terra research language has explored this area more thoroughly than Rust, just because that's its only purpose. The website and academic papers are definitely worth a skim: https://terralang.org/

It's basically a powerful LLVM-based compilation library where everything is exposed through Lua bindings. The default Terra compiler is just a Lua script that you can pull apart, extend, rearrange, etc. It's all designed for ease of experimentation, whereas Rust has to worry about being a rock-solid production compiler.

Honourable mention to C# source generators too. They are janky as hell but very effective.

[–] bizdelnick@lemmy.ml -1 points 11 months ago* (last edited 11 months ago) (4 children)

There's nothing new in rust that was not already possible with C++. It is possible to change language syntax using macros and templates... if you want to write code that nobody will understand.

[–] deur@feddit.nl 8 points 11 months ago

Yeaaah except that rust-analyzer can honest to god manage to inspect macro codegen.

And the fact that macros are made to retain "span" information...

And that macros aren't a huge hack...

[–] xmunk@sh.itjust.works 2 points 11 months ago

I personally prefer to compile my LISP using gcc.

[–] jeffhykin@lemm.ee 1 points 11 months ago
[–] Miaou@jlai.lu 1 points 11 months ago

I like how you addressed the problem with this approach in C++ but somehow still clicked on the "post" button

[–] Redkey@programming.dev 14 points 11 months ago (1 children)

Some of the things you mentioned seem to belong more properly in the development environment (e.g. code editor), and there are plenty of those that offer all kinds of customization and extensibility. Some other things are kind of core to the language, and you'd really be better off switching languages than trying to shoehorn something in where it doesn't fit.

As for the rest, GCC (and most C/C++ compilers) generates intermediate files at each of the steps that you mentioned. You can also have it perform those steps individually, one at a time. So, if you wanted to perform some extra processing at any point, you could create your own program to do so by working with those intermediate files, and automate the whole thing with a makefile.
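
As a sketch of what running those steps separately looks like, the Python helper below builds one gcc command per stage using the standard `-E` (preprocess only), `-S` (compile to assembly) and `-c` (assemble, do not link) options. The helper name `gcc_stage_commands` and the file naming scheme are assumptions for the example; it only builds the command lists and does not run anything.

```python
from pathlib import Path
from typing import List


def gcc_stage_commands(source: str, output: str = "a.out") -> List[List[str]]:
    """Return one gcc invocation per stage for a single C source file."""
    stem = Path(source).stem
    return [
        ["gcc", "-E", source, "-o", f"{stem}.i"],       # preprocess only
        ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble
        ["gcc", f"{stem}.o", "-o", output],             # link
    ]


if __name__ == "__main__":
    for command in gcc_stage_commands("main.c", "main"):
        print(" ".join(command))
```

A custom tool would slot in between two of these commands, reading and rewriting the `.i` or `.s` file; each list could be passed to `subprocess.run(command, check=True)` on a machine with gcc installed.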

You could be on to something here, but few people seem to take advantage of the possibilities that already exist, and combining that with the fact that most newer languages/compilers deliberately remove these intermediate steps, this suggests to me that whatever problems this situation causes may have other, existing solutions.

I don't know much about them myself, but have you read about the LLVM toolchain or compiler-compilers like yacc? If you haven't, it might answer some questions.

[–] spykyvenator@programming.dev 3 points 11 months ago (1 children)

LLVM is something I've wanted to check out for some time now but never did. I haven't heard of yacc, but it's indeed what I'm getting at: why haven't we got a single language that you can adapt to all needs?

[–] stifle867@programming.dev 11 points 11 months ago (1 children)

The more generic you make something the worse it is at specific goals. The more use cases you support, the more complex and harder to maintain, the more it's likely to fail. There will never be a "universal" programming language.

Imagine if you had a programming language that did "everything". Well there are people who want a simple programming language. Don't these two things seem completely at odds?

[–] spykyvenator@programming.dev 1 points 11 months ago* (last edited 11 months ago) (2 children)

I agree. I put an example in my main post: it isn't really a language, in that it has as few language specifications as possible. It could have a simple or complex syntax based on what plugins you select for your use case. It's not a universal programming language, more like a universal programming language specification that most languages fit into.

[–] stifle867@programming.dev 4 points 11 months ago* (last edited 11 months ago) (1 children)

You're essentially describing a Turing machine. I don't mean to be facetious and I don't have proof for this, but my gut tells me that by the time you make something this generic it will no longer be a "universal programming language" and will become a specification to allow for anything while failing to provide anything actually useful.

Anything more specific and you're essentially implementing YACC or some form of code generation that's already been invented and is not specific enough to be useful for this purpose.

EDIT: In my mind it's like saying we have cars, boats, airplanes, bicycles, etc. Why isn't there a platform where if we wanted we could add wings and jet engines and make it into a plane? Or instead add a horse and carriage? Or 4 wheels and a steering wheel?

Maybe you could do so, but the result wouldn't be anything actually useful because making a plane has specific design goals that aren't shared with a bicycle.

[–] spykyvenator@programming.dev 1 points 11 months ago* (last edited 11 months ago) (1 children)

I don't think that, but it could be. Variables, functions and things like loops, switches and if statements are things that many programming languages have in common. They can be specified without forcing a specific syntax and already take you far from Turing machines.

[–] stifle867@programming.dev 2 points 11 months ago* (last edited 11 months ago) (1 children)

I'm starting to understand what you're saying. It wouldn't be a universal programming language because even those things you list are not universal.

So now I am imagining a system very roughly where you could say (for example):

language.add(Variables)
language.add(Functions)
language.add(Loops)
language.add(Strings)
language.add(BracketScope)
language.add(Regex)
language.add(ActorConcurrency)

You would add support for various features and maybe control the syntax via configuration? Is that more along the lines of what you are envisioning?
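
A runnable Python version of that imagined API might look like the sketch below. All of it is hypothetical: the `Feature` and `Language` classes, the `requires` dependency check and the `open`/`close` options are assumptions made to illustrate the idea, not an existing system.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Feature:
    """A language feature plugin, identified by name."""
    name: str
    requires: Tuple[str, ...] = ()  # names of features that must be added first


class Language:
    """Hypothetical configurable language: an ordered set of features."""

    def __init__(self) -> None:
        self.features: Dict[str, Dict[str, Any]] = {}

    def add(self, feature: Feature, **options: Any) -> "Language":
        missing = [r for r in feature.requires if r not in self.features]
        if missing:
            raise ValueError(f"{feature.name} requires {', '.join(missing)}")
        self.features[feature.name] = options  # per-feature syntax config
        return self  # allow chaining


Variables = Feature("Variables")
Functions = Feature("Functions", requires=("Variables",))
Loops = Feature("Loops")
BracketScope = Feature("BracketScope")

if __name__ == "__main__":
    language = Language()
    language.add(Variables).add(Functions).add(Loops)
    language.add(BracketScope, open="{", close="}")
    print(list(language.features))
```

Syntax choices are passed as options to each feature, and the dependency check hints at the real difficulty: features are rarely independent of each other.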

[–] spykyvenator@programming.dev 2 points 11 months ago (1 children)

Yes, indeed, I had a hard time explaining this but it's what I mean.

[–] stifle867@programming.dev 3 points 11 months ago* (last edited 11 months ago) (1 children)

Interesting. I don't see any immediately obvious technical reasons why this wouldn't be possible.

There are languages that include a variety of different programming paradigms (I'm thinking of D). I can't think of any that support different syntaxes but I'm sure one would exist. However, a language that is configurable I feel does not exist and could be an interesting experiment.

I still do fear however, that any attempt would still not be practical as if you design a language feature that is generic enough to work with/without other features and with different syntaxes then it would not be specific enough to be clearly useful. In other words by trying to support everything it becomes good at nothing.

[–] ericjmorey@programming.dev 1 points 11 months ago

OCaml has 2 syntax variations. The original OCaml syntax and ReasonML.

[–] killeronthecorner@lemmy.world 3 points 11 months ago* (last edited 11 months ago)

To repeat the other person's point a bit, what you're describing sounds very much like LLVM, and other IR languages.

IRs exist to allow a variety of programming languages to be specified in a way that doesn't require direct compilation of that language to asm. This means the IR has to support some representation of the superset of all those languages' features.

So I guess your question could be interpreted as: why don't we just use an IR to write code? Mostly because they require you to forego many of the conveniences of modern programming languages. The whole point of going higher level and more opinionated in language choice is to allow you to turn designs into code faster than you can with lower level representations.

I don't entirely follow what you're trying to achieve with the plugins idea but it very much sounds like a combination of ideas that are found in LLVM combined with features from modern workbench IDEs. You might want to read about the architecture of Eclipse.

Eclipse was a popular development "workbench" that allowed you to plug in various tools at every level and stage of development and configure them to your taste, as well as allowing you to build your own plugins to work with languages in a bespoke way.

https://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html

[–] PowerSeries@lemmy.ca 10 points 11 months ago (1 children)

Have you looked at the Lisps / Scheme / Racket yet? Racket in particular makes it quite nice to go #lang blah at the top of the file and change the parsing or interpretation entirely.

For example all the documentation pages and guides are written in scribble:

https://docs.racket-lang.org/scribble/getting-started.html#%28part._first-example%29

#lang scribble/base
 
@title{On the Cookie-Eating Habits of Mice}
 
If you give a mouse a cookie, he's going to ask for a
glass of milk.

And it has an entire document markup language created in it, which can output pdf or html. But you can still use @ syntax to drop in racket code to compute values. Or create templates.

I even implemented a #lang which took assembly directly (and interpreted it, it was for a class).

So if you are really after full control, you should study Lisps and their macro systems.

[–] jeffhykin@lemm.ee 1 points 11 months ago

I've used racket before and I did not know about this! If you're willing to share I'd love to hear more about how you defined that assembly lang.

[–] bizdelnick@lemmy.ml 7 points 11 months ago (2 children)

Did you read the gcc documentation? Don't you think it is already overcomplicated?

[–] jeffhykin@lemm.ee 3 points 11 months ago* (last edited 11 months ago) (1 children)

IMO a config.yaml or toml would simplify things

[–] bizdelnick@lemmy.ml 1 points 11 months ago (1 children)

You have make, cmake, b2, meson etc. to define flags that you want using another syntax.

[–] jeffhykin@lemm.ee 3 points 11 months ago* (last edited 11 months ago) (2 children)

"Yeah we could make a single de facto config, one that requires no additional dependencies, and one that entirely skips the mess of cli-args-via-env-vars ... OR 😏 We could make users pick one of several competing options, all of which do effectively the same thing but are mostly incompatible with each other and allow for a new tabs-vs-spaces-kind-of debate while also not letting you understand other people's code bases unless you learn all of them. And, not only does it require everyone to install a separate binary, but also they need to somewhat coordinate on which version--versions that are independent from the gcc version but must be kinda-sorta-coordinated with the gcc version."

Sorry, I'm not convinced

[–] bizdelnick@lemmy.ml 2 points 11 months ago* (last edited 11 months ago)

I agree. The problem is that we already have a lot of compatibility breaking options in gcc: different language standards, non-standard extensions, language features that can be disabled, warnings that can be turned into errors... Multiplying them is not the thing that will make a programming language/compiler better.

[–] spykyvenator@programming.dev 1 points 11 months ago (1 children)

I'm not trying to target gcc specifically but compilers in general. I just know gcc best, which is why it's in my examples.

[–] bizdelnick@lemmy.ml 6 points 11 months ago

Well, every C/C++ compiler is overcomplicated because it has a preprocessor, provides numerous pragmas, attributes etc. etc. What you want is not just a new compiler, it is a new customizable language.

[–] porgamrer@programming.dev 6 points 11 months ago

If I understood correctly, the closest thing I know of to what you are describing is probably Terra:

https://terralang.org/

It is an academic project with various papers presenting case studies that do things like change the whole programming paradigm of the language, or the execution model, or the syntax.

The wider paradigm is called multi-stage programming. The other obvious languages to mention are the lisp family, and more recent spin-offs like Julia.

[–] nodoze313@lemmynsfw.com 4 points 11 months ago (2 children)
[–] Zeth0s@lemmy.world 2 points 11 months ago

This was my immediate reaction as well.

For those who like living a messy life, there's always Visual Studio (the original beast, not VSCode)

[–] spykyvenator@programming.dev 1 points 11 months ago* (last edited 11 months ago) (2 children)

Yes, not sure what you mean by this, but it's indeed what I'm getting at: our compilers aren't built enough in the Unix fashion for my liking. gcc handles preprocessing, compilation and linking, but I wouldn't know how to run a second preprocessor after the first one in gcc. I just did a quick search and apparently gcc -E handles this, but it doesn't seem that intuitive to run gcc -E on all files into some temporary directory, run some other program on all the code there, and then compile and link. A pipeline would be nicer, and I also don't know any tools that can do additional preprocessing.

[–] noli@programming.dev 5 points 11 months ago (2 children)

LLVM is designed in a very modular way and the LLVM IR allows you to specify e.g. if memory management should be manual/garbage collected.

You could make a frontend (design a language) for LLVM that exposes those options through some compiler directives.

In general I'd heavily recommend looking into LLVM's documentation.

[–] spykyvenator@programming.dev 1 points 11 months ago (1 children)

LLVM really looks like something that I need to look into

[–] jeffhykin@lemm.ee 2 points 11 months ago* (last edited 11 months ago)

LLVM is the engine everything compiles to. The problem is there's no car, it's just the engine lol.

And other than Rust (which uses LLVM) the existing cars are not very configurab--well I mean they're configurable but not at the extreme level of configuration you're talking about.

[–] jeffhykin@lemm.ee 1 points 11 months ago (1 children)

Wow I knew some about LLVM IR but I had no idea it had high level options like garbage collection.

[–] noli@programming.dev 2 points 11 months ago

Oh yeah, it's actually pretty extensive and expressive. If you're interested in this sort of stuff it's worth checking out the IR language reference a bit. Apparently you can even specify the specific garbage collection strategy on a per-function basis if you want to. They do however specify the following: "Note that LLVM itself does not contain a garbage collector, this functionality is restricted to generating machine code which can interoperate with a collector provided externally" (source: https://llvm.org/docs/LangRef.html#garbage-collector-strategy-names )

If you're interested in this stuff it's definitely fun to work through a part of that language reference document. It's pretty approachable. After going through the first few chapters I had some fun writing some IR manually for some toy programs.

[–] nodoze313@lemmynsfw.com 1 points 11 months ago (1 children)

Does running lint prior not resolve the issue? Isn't this the entire goal of make, cmake, autotools, etc? Why do you need to run it after? So you can re-process the macros after they are in line? Should just validate the macros before running gcc.

[–] spykyvenator@programming.dev 1 points 11 months ago (1 children)

It is somewhat like running multiple linters and prettifiers, but these are hefty tools. The build tool should provide an interface that lets you attach different programs for every little step from code to machine language.

[–] monotremata@kbin.social 1 points 11 months ago* (last edited 11 months ago)

It really sounds like you're describing Make (or LLVM). Is there something you need it to do that those can't handle?

[–] REdOG@lemmy.world 4 points 11 months ago (1 children)

Isn't that what configure, make and makefiles are for?

[–] spykyvenator@programming.dev 2 points 11 months ago

Yes, they are there to combine several programs into the building process, and could be used for this. What I would want is programs like TypeScript that preprocess your code, with possible changes in syntax and language specification.

[–] Restaldt@lemm.ee 3 points 11 months ago (1 children)

Linting

This is called linting

[–] jeffhykin@lemm.ee 2 points 11 months ago* (last edited 11 months ago)

Linting lets you use indentation based block delineation instead of curly brackets while keeping the rest of the language functionality?