C++ Needs Better IO Facilities

C++ is not well liked by a large number of non-C++ programmers. One of the arguments that has stood the test of time against C++ is the iostreams library. It’s even a point of contention within the C++ community. The main arguments against the entire iostreams library can usually be summed up as:

- They abuse operator overloading (the << and >> shift operators)
- They perform poorly compared to C stdio
- They are God Objects

We’re going to ignore that first bullet point, however, because it has been argued to death and back. Developers can argue all day about what should and should not be overloaded (and they usually do).

However, iostreams are (to say the least) less performant than their C stdio counterparts, enough to warrant a variety of StackOverflow posts about it. Out of all the classes and interfaces in the C++ standard library, the iostreams library is the least modern (and indeed the one that received the least attention during the C++11 and C++14 revisions). Among its issues are a reliance on inheritance and virtual functions, and a lack of user-defined allocator support at type instantiation time. Out of all the C++ standard libraries, iostreams are the least C++, and arguably the most over-engineered. C++’s motto is “pay for what you use”, and a C++ programmer shouldn’t have to pay such a heavy tax for something as simple as reading bytes from a file, string, or other resource.

The last bullet point, that iostreams are God Objects, is the most important. If there is one issue that I have with C++ streams, it is that they do too much. They do formatting and scanning of text, reading and writing of binary data, the handling of locale information (though not well enough to avoid needing something like Boost.Locale to get it right), buffer overflow and underflow handling, and position seeking. Worst of all, the very nature of a stream results in an overcomplicated user-defined stream insertion/extraction overload.

When we rely on Argument Dependent Lookup and the ostream& operator << overload, we end up in a bit of a pickle. What happens if we want our user-defined type to be output as binary instead of text? How do we, as the user of this library, decide if we want to have a binary formatted output function or a text formatted output function? There is no good (or rather easy) answer. What ends up happening is a user provides an ostream& operator << so they can just dump text to the console or to a string for debug information. There is no way to say “When writing to this resource, we treat it as binary data. When writing to a different one, we will use text formatting for printing log information”. We can’t just let ADL kick in and take care of the rest for us. It is for this reason that we end up with libraries like cereal and Boost.Serialization, so that developers can explicitly state “here’s how we store our data”, even if it is for the smallest of utilities.
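To make the pickle concrete, here is a minimal sketch. The app::point type and the insert_into helper are made up for illustration. A type gets exactly one ostream& operator << overload, so a stream opened with the binary flag receives the very same text as one opened without it.

```cpp
#include <cstdint>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

namespace app {

// Hypothetical user-defined type.
struct point {
    std::int32_t x;
    std::int32_t y;
};

// The one and only insertion overload this type can have. It formats text.
inline std::ostream& operator<<(std::ostream& os, point const& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

} // namespace app

// Inserts a point into a string stream opened with the given mode and
// returns whatever ended up in it.
inline std::string insert_into(std::ios_base::openmode mode) {
    std::ostringstream os(mode);
    os << app::point{1, 2}; // ADL finds app::operator<<
    return os.str();
}
```

Calling insert_into(std::ios_base::out) and insert_into(std::ios_base::out | std::ios_base::binary) yields the same characters. The binary flag only affects how the underlying file layer treats line endings; it never reaches the overload, so the type has no way to pick a representation based on where it is being written.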

How do we solve this? The committee won’t (or maybe can’t; backwards compatibility is a big deal). The streams library is here to stay. But we need a better alternative: one that lets us rely on ADL, that doesn’t focus on the use of inheritance or virtual functions, and that lets the user decide how they read and write binary data vs formatting or scanning text data, or even printing.

What we need are better, more generic I/O facilities for C++. We need a library that splits up the different Concepts of Resources, Readers, Writers, Streams, Formatters, Scanners, Buffers, and even smaller concepts such as position Seeking within a Resource.

Something like this wouldn’t replace serialization libraries like Boost.Serialization, but would be a better foundation for their output.

I’ve looked to other programming languages for inspiration on what a potential Modern C++ I/O API might look like. Rust has some pretty good concepts, some that would even map 1 : 1 with C++. However, there are some Rust-specific language features that they rely on. They do not distinguish between Read/Write and Format/Scan, nor do they treat each possible Resource as one, opting instead to write one Reader for each possible Resource (MemReader, FileReader, BufReader, etc). Even Java has some decent concepts (allowing for a writeObject function); however, because Java is “all aboot the oop”, it relies on a user defining a class to handle Read/Write vs Format/Scan, as well as on its builtin reflection system.

With these ideas in mind, I was able to develop some brief concepts that a Modern C++ I/O API should express. First, we have our verbs. These are free functions, found via ADL, that allow a generic approach to performing actions on those types which meet our Concepts. All of them relate to either a Concept, or are taken from a C stdio-like name.

read
Read binary data from a Resource
write
Write binary data to a Resource
scan
Read text from a Resource
format
Write text to a Resource
open
Opens a resource for I/O operations
close
Closes a resource for I/O operations
flush
Flushes a Buffer to its Resource
sync
Synchronizes a resource with the operating system if possible
tell
Gets the current position for I/O operations
seek
Sets the current position for I/O operations
skip
Read and discard data from a Resource until a non-white-space or given character is encountered. Use only on Scanners
print
Output text to stdout or stderr
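
As a rough sketch of how a few of these verbs could look as free functions found by ADL: the io::memory Resource, the app::point type, the two example helpers, and every signature below are assumptions made for illustration, not an existing library. The point is that the caller picks the verb, so the same type can be written as binary to one Resource and formatted as text to another.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

namespace io {

// Hypothetical in-memory Resource: a byte vector plus a read position.
struct memory {
    std::vector<unsigned char> bytes;
    std::size_t position = 0;
};

// write: append raw bytes to the Resource; returns the number written.
inline std::size_t write(memory& m, void const* data, std::size_t size) {
    auto const* first = static_cast<unsigned char const*>(data);
    m.bytes.insert(m.bytes.end(), first, first + size);
    return size;
}

// read: copy up to `size` bytes from the current position; returns the
// number of bytes actually read.
inline std::size_t read(memory& m, void* data, std::size_t size) {
    std::size_t const available = m.bytes.size() - m.position;
    std::size_t const count = size < available ? size : available;
    if (count != 0) std::memcpy(data, m.bytes.data() + m.position, count);
    m.position += count;
    return count;
}

// tell / seek: query and set the read position (clamped to the end).
inline std::size_t tell(memory const& m) { return m.position; }
inline void seek(memory& m, std::size_t position) {
    m.position = position < m.bytes.size() ? position : m.bytes.size();
}

} // namespace io

namespace app {

struct point {
    std::int32_t x;
    std::int32_t y;
};

// Binary representation: the two raw integers, x first, in host byte order.
template <class Writer>
std::size_t write(Writer& w, point const& p) {
    std::size_t count = write(w, &p.x, sizeof p.x); // ADL finds the Resource's write
    count += write(w, &p.y, sizeof p.y);
    return count;
}

// Text representation: "(x, y)".
template <class Formatter>
std::size_t format(Formatter& f, point const& p) {
    std::string const text =
        "(" + std::to_string(p.x) + ", " + std::to_string(p.y) + ")";
    return write(f, text.data(), text.size());
}

} // namespace app

// The caller, not the type, decides between text and binary.
inline std::string formatted_example() {
    io::memory log;
    format(log, app::point{1, 2});
    return std::string(log.bytes.begin(), log.bytes.end());
}

inline std::size_t binary_example() {
    io::memory blob;
    return write(blob, app::point{1, 2});
}
```

Nothing here uses inheritance or virtual functions. A new Resource only has to provide the raw read and write overloads in its own namespace, and a new user-defined type only has to provide the verbs it actually supports.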

We now need to express our Concepts. These would have an equivalent type trait available to check if a given type meets one of these requirements, allowing for SFINAE within APIs that rely on them.

Reader
Can read from a Resource
Writer
Can write to a Resource
Stream
Is both a Reader and Writer
Pipe
Holds both a Reader and Writer. For every Read, there is a Write
Scanner
Can scan from a Resource
Formatter
Can format to a Resource
Channel
Is both a Scanner and Formatter
Filter
Holds both a Scanner and Formatter. For every Scan, there is a Format
Resource
Represents a data source or data target
Buffer
Manages a Resource by buffering I/O operations
Seeker
Allows moving the current I/O operation position of a Resource
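
One possible shape for such type traits, sketched in C++11 with the usual detection idiom. The trait names, the assumed read and write signatures, the io::copy function, and the app::source and app::sink types are all hypothetical. A type counts as a Reader if an unqualified call to read with a buffer and a size compiles for it, and likewise for Writer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

namespace io {

// C++11-friendly void_t.
template <class... Ts> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

// Reader: `read(resource, void*, size_t)` is callable via ADL.
template <class T, class = void>
struct is_reader : std::false_type {};
template <class T>
struct is_reader<T, void_t<decltype(read(
    std::declval<T&>(), std::declval<void*>(), std::size_t{}))>>
    : std::true_type {};

// Writer: `write(resource, void const*, size_t)` is callable via ADL.
template <class T, class = void>
struct is_writer : std::false_type {};
template <class T>
struct is_writer<T, void_t<decltype(write(
    std::declval<T&>(), std::declval<void const*>(), std::size_t{}))>>
    : std::true_type {};

// Stream: both a Reader and a Writer.
template <class T>
struct is_stream
    : std::integral_constant<bool, is_reader<T>::value && is_writer<T>::value> {};

// An API constrained with SFINAE: only participates in overload resolution
// for a (Reader, Writer) pair. Every read is followed by a write.
template <class Reader, class Writer>
typename std::enable_if<
    is_reader<Reader>::value && is_writer<Writer>::value, std::size_t>::type
copy(Reader& from, Writer& to) {
    unsigned char chunk[256];
    std::size_t total = 0;
    for (;;) {
        std::size_t const count = read(from, chunk, sizeof chunk);
        if (count == 0) break;
        total += write(to, chunk, count);
    }
    return total;
}

} // namespace io

namespace app {

// Reader-only Resource over a string.
struct source {
    std::string data;
    std::size_t position;
};
inline std::size_t read(source& s, void* out, std::size_t size) {
    std::size_t const available = s.data.size() - s.position;
    std::size_t const count = size < available ? size : available;
    std::memcpy(out, s.data.data() + s.position, count);
    s.position += count;
    return count;
}

// Writer-only Resource that appends to a string.
struct sink {
    std::string data;
};
inline std::size_t write(sink& s, void const* in, std::size_t size) {
    s.data.append(static_cast<char const*>(in), size);
    return size;
}

} // namespace app

static_assert(io::is_reader<app::source>::value, "source is a Reader");
static_assert(!io::is_writer<app::source>::value, "source is not a Writer");
static_assert(io::is_writer<app::sink>::value, "sink is a Writer");
static_assert(!io::is_stream<app::sink>::value, "sink is not a Stream");
```

Passing the arguments in the wrong order, as in io::copy(sink, source), simply fails to find a viable overload instead of erroring deep inside the function body.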

This kind of approach would work well for mixins, as well as allowing user-defined resources. For instance, implementing a Reader-only Resource for the SQLite3 Blob object would be as simple as implementing a basic read function that wraps the sqlite3_blob_read function. One would also be able to hook into additional types in other libraries, such as libsdl’s SDL_RWops. The possibilities for inter-library interaction are numerous, possibly limitless.
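
Here is a hedged sketch of that wrapping pattern. To keep it self-contained it does not include sqlite3.h. Instead the vendor namespace below is a made-up stand-in for a third-party C API with an offset-based read call, loosely in the style of sqlite3_blob_read. The app::blob_reader type and the read signature are likewise assumptions.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a third-party C API. Not a real library.
namespace vendor {

struct blob {
    char const* bytes;
    int size;
};

// Total size of the blob in bytes.
inline int blob_bytes(blob const* b) { return b->size; }

// Copies `count` bytes starting at `offset` into `out`.
// Returns 0 on success, nonzero if the range is out of bounds.
inline int blob_read(blob const* b, void* out, int count, int offset) {
    if (count < 0 || offset < 0 || offset > b->size - count) return 1;
    std::memcpy(out, b->bytes + offset, static_cast<std::size_t>(count));
    return 0;
}

} // namespace vendor

namespace app {

// Reader-only Resource: the vendor handle plus our own read position,
// since the wrapped API is offset-based and keeps no cursor itself.
struct blob_reader {
    vendor::blob const* handle;
    int position;
};

// The only function needed to make blob_reader usable as a Reader.
// Returns the number of bytes read, or 0 at end of blob or on error.
inline std::size_t read(blob_reader& r, void* out, std::size_t size) {
    int const remaining = vendor::blob_bytes(r.handle) - r.position;
    if (remaining <= 0) return 0;
    int const count = size < static_cast<std::size_t>(remaining)
                          ? static_cast<int>(size)
                          : remaining;
    if (vendor::blob_read(r.handle, out, count, r.position) != 0) return 0;
    r.position += count;
    return static_cast<std::size_t>(count);
}

} // namespace app
```

Because blob_reader provides no write overload, any generic code that requires a Writer would reject it at compile time, which is exactly what Reader-only should mean.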

If there were a library that expressed these concepts, it would most certainly make C++’s approach to I/O competitive with other languages.
