Millennials Are Killing the Modules TS
In my previous post, I mentioned that I have serious issues with the current Modules TS as it is written. I have wanted to say more on the subject, since my tweets, posts, and general comments are missing quite a bit of context for why I think the implementation is wrong and will be more trouble than it is worth.
Part of this is due to what the community (as found on r/cpp, and other subreddits) seems to think modules are and how they work, how some members on the committee would like modules to work, and the reality of how modules actually work. None of these understandings are the same, and they barely even overlap.
From what I understand, there will be a competing modules proposal from several Clang developers that tries to solve some of these issues. However, we will not see this until January/February, and the vote to turn the modules proposal into a full-fledged technical specification is coming up in November. Something needs to be done.
Before we get into it: any time I use the word modules, please mentally substitute it with the phrase “modules, as they are implemented in the Modules TS”.
Even as recently as today (October 9th), I’ve seen incorrect understandings from both the C++ community and even the Rust community on what C++ modules will let us do.
Reddit user /u/Abyxus neatly sums up my biggest issue with this comment
And even then, they get it wrong! If I have import x in a file named y.cxx, it just means that y.cxx depends on a module named x. There is no rule that the module is inside a file named x.ixx. For all a consumer knows, the module x could be located in a file named i.like.turtles.ixx. We’ll get into more of this specific issue later on in this post, but I want to point out: I have yet to see anyone on the C++ subreddit beyond a committee member know exactly what modules do and how they work. I hope you find this as concerning as I do.
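To make the name-versus-filename point concrete, here is a minimal sketch of what a build tool is forced to do: open every file and look for the declaration, because the filename tells it nothing. It assumes the interface declaration is spelled export module x; and uses an in-memory map in place of real files. The function name index_module_interfaces and the file contents are hypothetical, and a real tool could not get away with a regex (more on that later).

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Maps each declared module name to the file that declares it.
// `sources` is a hypothetical stand-in for files on disk:
// key = filename, value = file contents.
std::map<std::string, std::string>
index_module_interfaces(const std::map<std::string, std::string>& sources) {
    // Naive textual match for `export module <name>;` at the start of a line.
    static const std::regex decl(
        R"((?:^|\n)[ \t]*export[ \t]+module[ \t]+([A-Za-z_][A-Za-z0-9_.]*)[ \t]*;)");
    std::map<std::string, std::string> module_to_file;
    for (const auto& entry : sources) {
        std::smatch match;
        if (std::regex_search(entry.second, match, decl)) {
            module_to_file[match[1].str()] = entry.first;
        }
    }
    return module_to_file;
}

// Usage: module x lives in a file whose name has nothing to do with "x".
void index_demo() {
    const std::map<std::string, std::string> sources = {
        {"i.like.turtles.ixx", "export module x;\nexport int f();\n"},
        {"y.cxx", "import x;\nint g() { return f(); }\n"},
    };
    const auto index = index_module_interfaces(sources);
    assert(index.at("x") == "i.like.turtles.ixx");
}
```

The only way to answer "where is module x?" is to read the contents of every candidate file, which is exactly the work the Modules TS leaves unassigned.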
Without support from build tools, modules are effectively dead in the water. During the Grill The Committee talk, I asked Gabby Dos Reis about this dependency problem. His response was short (mostly due to Jon Kalb placing a rule that answers could be no longer than two sentences). It was, to paraphrase, “We’ll make the build systems understand”. While in written form this statement reads like a threat that a mafioso might make (“you come to me on this, the day of my build system’s wedding?"), it doesn’t really speak as to how this might work.
Near the end of their comment, /u/Abyxus states three possibilities for how to interact with modules:
Just to nip this in the bud: None of the compiler vendors want that last option. We’re absolutely not going to get it. The author of build2, Boris Kolpackov, starts his CppCon talk with a very important quote from Richard Smith (once more, I’m paraphrasing): “The compiler must not become a build system.” Frankly, I agree. Our compiler should not be a build system, and likewise, our build system should not become a compiler! (As this now means we have a compiler that is a build system. Oops!) What I mean by this is that whatever we do to find these dependencies, we should not have to partially execute any of the (currently 9) translation phases that the C++ compiler must execute.
With that said, why don’t we have our build systems parse the source files and keep track of everything? There are several reasons why this is a terrible idea.
Currently, within C++ we can have many headers to many implementation files. Under modules, this approach does not go away. We will have interface modules and implementation modules. This separation might not be a bad thing, but it’s a bit of a shock, especially if you were expecting modules to be anything like any language that has had module support in the past 20 years. Additionally, this move to interface and implementation modules does not suddenly cut down on our implementation files, nor does it cut down on headers. In fact, that isn’t even a primary goal for modules. The entire goal is to give users a guaranteed exported interface. There is no module hierarchy, no guaranteed single file module implementation, no de facto way of finding a module. These are where the biggest problems lie.
Yes, there is no hierarchy. To be quite honest, I really don’t care if there
is. However, this seems to be something people are expecting. That the compiler
will somehow enforce a module to be contained within a given location,
relative to a parent module name. Here’s the truth: There is no such thing as a
submodule with the Modules TS. They don’t exist. Sure, I can manually create
an amalgamation module, that shares a common prefix with these modules, but
there is no actual relationship between their names. I can have two modules,
sol.saturn, and while I could amalgamate them into a module
sol, I could just as easily amalgamate them into a module named
cygnus, or even into a module called
worse, your build system won’t know where these modules are located, and
unless you want to manually list the order of compilation for modules we need
some way of determining the correct dependency order.
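For what it’s worth, the ordering itself is the easy part once somebody has handed you the import graph. A minimal sketch, assuming the graph is already known (which is precisely what nobody is providing): a topological sort using Kahn’s algorithm. The names ImportGraph and compile_order are mine, and sol.mars and app are hypothetical modules made up for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// module name -> names of the modules it imports
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Returns the modules so that every module appears after everything it imports.
// Throws std::runtime_error if the imports form a cycle.
std::vector<std::string> compile_order(const ImportGraph& imports) {
    std::map<std::string, int> pending;  // imports not yet compiled, per module
    std::map<std::string, std::vector<std::string>> importers;
    for (const auto& entry : imports) {
        pending[entry.first];  // make sure every module has a node
        for (const auto& dep : entry.second) {
            pending[dep];      // including modules that are only imported
            ++pending[entry.first];
            importers[dep].push_back(entry.first);
        }
    }
    std::set<std::string> ready;  // sorted, so the result is deterministic
    for (const auto& node : pending) {
        if (node.second == 0) ready.insert(node.first);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string next = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(next);
        for (const auto& importer : importers[next]) {
            if (--pending[importer] == 0) ready.insert(importer);
        }
    }
    if (order.size() != pending.size()) throw std::runtime_error("import cycle");
    return order;
}

// Usage: sol amalgamates two modules; app imports sol.
void order_demo() {
    const ImportGraph graph = {
        {"app", {"sol"}},
        {"sol", {"sol.saturn", "sol.mars"}},
    };
    const std::vector<std::string> expected = {"sol.mars", "sol.saturn", "sol", "app"};
    assert(compile_order(graph) == expected);
}
```

Note that nothing in this code cares that sol.saturn starts with sol. The shared prefix is a naming habit, not a relationship the sort (or the compiler) can rely on.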
Under the current C++ compilation model, a header is opened by the preprocessor (with guaranteed directories to search). Because we have a name for the file, the preprocessor is able to recursively open each header until an include does not exist, or there are no more headers to include. This also means we can get our directed acyclic graph of dependencies and have our compiler give this information to us in some way. Additionally, these steps taken by the preprocessor are part of the language. Translation phases 1 - 4 are just for the preprocessor.
With modules, there is no translation phase for creating the binary interface file of dependent modules unless they’ve already been created outside of the translation unit’s first run. We’re placing the creation of these IFC files (or whatever represents the exported interface) into the hands of a magic black box. We’re taking steps that should be part of the language and placing them into the hands of… some unknown tool that doesn’t understand the C++ language, isn’t part of the compiler, isn’t provided by vendors in any capacity, and is not guaranteed to be some tool that just runs a regex over everything. Forget performance, name ONE language that requires some unknown, unnamed, unspecified tool to find its modules’ dependencies. Where the entire language’s system of public or private interfaces within a project depends entirely on this tool being run separately from the compiler, and not having a known step in which it executes. And let’s not forget, this tool has to run the preprocessor, because it can’t enforce that a module name isn’t some preprocessor defined token, and oh yes, it can’t enforce the location of dependent modules either (some of which might be conditional imports based on preprocessor tokens!).
The compiler does not know where the binary interface came from, when it came from, how it came into existence, or even if it was a different version of your current compiler. It doesn’t even know if it was the same compiler (yes, currently the various compilers do know if it was them, however it’s been a stated goal by Gabby to standardize the IFC format as part of the TS when it is added to the standard). Additionally, because we don’t have a global if constexpr, a build system has to run the preprocessor to know if there are platform specific imports, or in the event that someone makes a module name a preprocessor macro. What’s to stop you? After all, the compiler has no idea where these compiled interface files come from, and it can’t assume anything about the location because (as I mentioned earlier) the names of a module are not tied to the filename, their location, or literally anything else. They exist as a name, unique to the module, and nothing more.
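Here is a small sketch of why a tool that “just runs a regex” falls over in both of the cases above. It assumes, as argued in this post, that an import may name a preprocessor macro and may sit inside a conditional block. The function naive_imports and the module names platform.win32 and platform.posix are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects every name that appears in a line of the form `import <name>;`.
// Purely textual: no preprocessing, no knowledge of #if/#define.
std::vector<std::string> naive_imports(const std::string& text) {
    static const std::regex import_decl(
        R"((?:^|\n)[ \t]*import[ \t]+([A-Za-z_][A-Za-z0-9_.]*)[ \t]*;)");
    std::vector<std::string> names;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), import_decl); it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}

void scan_demo() {
    // Case 1: the module name is a macro. The scanner reports the macro's
    // spelling, not the module that would actually be imported.
    const std::string macro_source =
        "#define PLATFORM_MODULE platform.win32\n"
        "import PLATFORM_MODULE;\n";
    const std::vector<std::string> from_macro = naive_imports(macro_source);
    assert(from_macro.size() == 1 && from_macro[0] == "PLATFORM_MODULE");

    // Case 2: conditional imports. The scanner reports both branches,
    // although only one of them survives preprocessing.
    const std::string conditional_source =
        "#ifdef _WIN32\n"
        "import platform.win32;\n"
        "#else\n"
        "import platform.posix;\n"
        "#endif\n";
    assert(naive_imports(conditional_source).size() == 2);
}
```

In the first case the tool goes looking for a module that does not exist, and in the second it adds a dependency that may not even build on the current platform. The only fix is to run the preprocessor first, which is the whole complaint.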
Could a current build system enforce that? Yes. Will they? No. Their users most likely consist of a large number of varying project layouts, tool execution, code generation, and naming conventions. Existing build systems can’t enforce any of these (their users just won’t use modules), which means that any implementation of C++ modules can’t enforce anything, which means that build systems can’t enforce anything, which means that any future changes to C++ modules can’t enforce anything, which means that…this circle repeats until we’re all dead or using another language. This is something I call the project layout ouroboros (coincidentally, mentioned in my CppCon talk 😉) and it affects build systems as much as the C++ language itself.
We won’t get into that today because there are, in my opinion, more important issues and questions that need answering. Take, for example, the fact that members of the committee cannot even agree as to what a module is. Some on the committee think that a module should be represented by a single translation unit. Others say we should try to minimize the impact of modules by having both interface and implementation modules. I’ve even seen people in various programming communities talking about how C++ modules will get rid of library files (they won’t, and you’ll know why if you think about it for a second).
Should we be able to export preprocessor macros from modules? If you say yes, keep in mind that you now absolutely need to run through a preprocessor when running through the module dependency graph in case an import of one module suddenly changes the meaning of an import further down the declaration. I don’t even know if this is possible because I’ve seen no mention of what translation phase the module’s imports are brought into!
What does extern module do??? I’ve asked this question so many times. And yet, no one can tell me what in Satan’s black heart it is supposed to do or even mean. Why are we adding syntax to a technical specification, if there isn’t any specification of the semantics that syntax is to provide?
We’re placing a feature (modules) that isn’t finished into a Technical
Specification with the plan of placing it into C++20 as soon as possible. If it
seems like this is a rant, it’s because I’m beyond angry. These issues have
not been thought through from a tooling perspective. Terrible decisions have
been made and reversed by the committee (just look at the whole
syntax that was voted in and then out this past year). This whole situation is
just a mess. If this language feature is for build systems, then why is the
feature being implemented in compilers first, with a hand-wavy motion as an
excuse for there being only one build system that has implemented support
thus far, two years after modules were first announced?
It’s very clear to me that we’re going to get a feature that is so vague no build system will be able to use it at scale. Most of the time for starting a build will be spent running a tool that is just trying to find the order of files because a change to a single interface file can be a change to:
In a directory of 30 files, this is nothing. But if you tell me that I have to go from running the preprocessor once for 10'000 header files, to running it twice for each module that has replaced each header, in addition to a new tool that has to magically know which file represents which module (with no guarantee as to the location of dependent modules) I’m going to write a massive run-on sentence about it and then yell “eat me, nerd” into your ear with a megaphone while tweeting “FITE ME IRL” at you on Twitter.
In my extremely agitated opinion, modules as a language feature will be dead
on arrival. It either needs serious scrutiny from the community, a rewrite, an
acquiescence from the committee that the compiler should handle these steps, or
we need to kill it and start over. We did this last option with
(which is now
if constexpr), and we can surely do it with modules. You can’t
call this the Modules Technical Specification if it barely specifies any
behavior for translation phases, how the compiler is to interact with the build
system, and how build systems are to comprehend modules.