The article "C++ creator calls for help to defend programming language from 'serious attacks'" mentioned a few proposals:
- Profiles (C++) by Bjarne Stroustrup, work in progress on GitHub.
- TrapC by Robin Rowe, news report also has example code.
- Fil-C by Filip Pizlo.
- Mini-C by Aymeric Fromherz (Inria) and Jonathan Protzenko (Microsoft).
- Safe C++ (also known as Circle C++) by Sean Baxter.
Assuming we all know why memory safety is important, let's dive into how each of these proposals delivers on memory safety. Bear in mind that these are working proposals, so some of the details are "magic" or wishful thinking that needs to be fleshed out further.
Profiles (C++)
Profiles by Bjarne Stroustrup is not a single proposal, but a collection of proposals. Each profile states a promise about the safety property it provides and the language features or checks needed to achieve it. Enforcement of a profile can be specified in code or toggled through compiler flags.
Through the use of a new expect() function, which is like assert(), error handling can be done in one of the following ways: ignore, log (and ignore), log (and throw an exception), throw an exception, or exit the program.
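To make the handling modes concrete, here is a minimal sketch of an expect()-like helper. The ViolationPolicy enum, the g_policy global, and the ExpectationFailed exception are names I made up for illustration; the actual interface in the Profiles papers may look quite different.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical names throughout: this is not the interface proposed for Profiles.
enum class ViolationPolicy { ignore, log, log_and_throw, throw_only, exit_program };

// Assumed to be set once, e.g. from a compiler flag or build configuration.
inline ViolationPolicy g_policy = ViolationPolicy::throw_only;

struct ExpectationFailed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns true when the condition holds; otherwise applies the current policy.
inline bool expect(bool condition, const std::string& what) {
    if (condition) return true;
    switch (g_policy) {
    case ViolationPolicy::ignore:
        break;
    case ViolationPolicy::log:
        std::cerr << "expect failed: " << what << '\n';
        break;
    case ViolationPolicy::log_and_throw:
        std::cerr << "expect failed: " << what << '\n';
        throw ExpectationFailed(what);
    case ViolationPolicy::throw_only:
        throw ExpectationFailed(what);
    case ViolationPolicy::exit_program:
        std::exit(EXIT_FAILURE);
    }
    return false;  // reached only for the ignore and log policies
}
```

With the policy set to ignore, a failed check simply returns false; with throw_only it raises ExpectationFailed, so the same call site can be tightened or relaxed without touching the code around it.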
There are several profiles and their summaries:
- Profile: Type stipulates that "every object is used only in accordance with its definition." It may be a union of several profiles such as Ranges, Invalidation, Algorithms, Casting, RAII and Union. One idea is to eliminate raw pointer handling from collection objects through a new span abstraction.
- Profile: Arithmetic detects overflow, underflow, and conversion errors.
- Profile: Concurrency detects race conditions and deadlocks. It acknowledges that this is the "least mature of the suggested profiles" and "has received essentially no work specifically related to profiles."
- Profile: Ranges detects out-of-range indexing.
- Profile: Pointers stipulates that "every pointer points to an object or is the nullptr; every iterator points to an element or the end-of-range; every access through a pointer or iterator is not through the nullptr nor through a pointer to end-of range." It introduces a new language feature not_null, which looks like a type qualifier but it is not clearly specified.
- Profile: Algorithms stipulates that "no range errors from mis-specified ranges (e.g., pairs of iterators or pointer and size). No dereferences of invalid iterators. No dereference of iterators to one-past-the-end of a range." There is some overlap with the Pointers profile regarding the iterator end of range. It introduces a new language feature not_end(c, p), which returns whether iterator p is not at the end of the container c.
- Profile: Initialization stipulates that "every object is explicitly initialized."
- Profile: Casting prevents integer truncation by narrowing casts. It provides a narrow_cast<> with runtime checking.
- Profile: Invalidation prevents "access through an invalidated pointer or iterator." The compiler is supposed to "ban calls of non-const functions on a container when a pointer to an element of the container has been taken," and suggests that the compiler does it through "serious static analysis involving both type analysis and flow analysis" (i.e. magic).
- Profile: RAII prevents resource leaks by representing every resource as a scoped object. The constructor and destructor handle the acquisition and release of the resource. This can also be used to do reference counting.
- Profile: Union says "every field of a union is used only as set" and suggests later providing pattern matching (i.e. algebraic data types) as an alternative.
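As an illustration of what the Casting profile's runtime-checked narrowing could look like, here is a small sketch. The name narrow_cast_checked and the choice to throw std::range_error are my assumptions, not the proposal's; the round-trip test is the same trick that checked-narrowing helpers in guideline support libraries commonly use.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: converts between integer types and throws if the
// value does not survive the conversion unchanged.
template <typename To, typename From>
To narrow_cast_checked(From value) {
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "this sketch only handles integer types");
    const To result = static_cast<To>(value);
    // The value must convert back to the original...
    const bool round_trips = static_cast<From>(result) == value;
    // ...and must not silently change sign (e.g. -1 becoming a huge unsigned).
    const bool same_sign = (result < To{}) == (value < From{});
    if (!round_trips || !same_sign) {
        throw std::range_error("narrowing changed the value");
    }
    return result;
}
```

A call such as narrow_cast_checked<unsigned char>(200) succeeds, while passing 300 or a negative number to an unsigned target throws instead of truncating silently.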
It is clear that Profiles leverage many C++-only features and will not apply to C. However, the strength of this approach is that it recognizes safety as a synthesis of many issues that can be addressed incrementally. It allows legacy code to be incrementally updated to satisfy one profile at a time, so there is less upfront cost towards memory safety.
Profiles can also become unnecessarily broad. For example, concurrency through flow analysis is another can of worms, since detecting race conditions requires reasoning about arbitrary interleavings of concurrent accesses. Invalidation is also magic, as most code does not sufficiently express its intent to transfer resource ownership. On the other hand, it is unclear if all the profiles together will guarantee what people now expect from "safe" Rust.
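To see why the Invalidation profile wants to ban non-const calls on a container once an element pointer has been taken, consider std::vector. The helper below is my own illustration (the function name is hypothetical): it reports whether an append is about to reallocate the storage, which is exactly the case where any previously taken pointer or iterator would dangle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: appends x and reports whether the append had to
// reallocate, i.e. whether earlier element pointers are now invalid.
// A vector reallocates on push_back exactly when it is already full.
inline bool append_invalidates_pointers(std::vector<int>& v, int x) {
    const bool will_reallocate = v.size() == v.capacity();
    v.push_back(x);
    return will_reallocate;
}
```

The caller cannot tell from the call site alone which case applies, because it depends on the capacity at runtime. A static analysis therefore has to be conservative and reject every non-const call while an element pointer is live, or track far more state than ordinary C++ code expresses.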
TrapC
TrapC by Robin Rowe proposes a new dialect of C with the following modifications:
- malloc() always allocates from a garbage collected heap, and free() is a no-op.
- Pointers are instrumented with type and size information, and pointer dereferencing is checked at runtime.
- Access violations can be caught using a new trap statement. Unhandled violations will terminate the program.
- goto and union are not supported.
- TrapC can call C functions but not vice versa.
- Typesafe printf() with a generic "{}" format specifier.
- No special provision for thread-safety.
It is supposed to be able to compile unmodified legacy C code with additional runtime checks. When access violations cause unwanted program termination, users can write trap handlers as necessary. The white paper suggests a possible goal to "produce executables that are smaller and faster than from C compilers" but it is not clear how it is possible with additional runtime checking overhead (i.e. magic).
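The idea of a pointer instrumented with size information can be sketched in ordinary C++ as a "fat pointer". CheckedPtr below is a hypothetical class of my own, not TrapC's (or Fil-C's) actual representation, and a C++ exception stands in for the trap. It also shows where the runtime overhead comes from: every dereference carries a comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fat pointer: base address, element count, and current offset.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(T* base, std::size_t count)
        : base_(base), count_(count), offset_(0) {}

    // Pointer arithmetic only moves the offset; the bounds travel along.
    CheckedPtr operator+(std::ptrdiff_t n) const {
        CheckedPtr moved = *this;
        moved.offset_ += n;
        return moved;
    }

    // Every dereference is checked; a violation throws instead of
    // reading or writing out of bounds.
    T& operator*() const {
        if (base_ == nullptr || offset_ < 0 ||
            static_cast<std::size_t>(offset_) >= count_) {
            throw std::out_of_range("checked pointer access violation");
        }
        return base_[offset_];
    }

private:
    T* base_;
    std::size_t count_;
    std::ptrdiff_t offset_;
};
```

For a three-element array wrapped as CheckedPtr<int> p(data, 3), dereferencing p + 2 works, while p + 3 or p + -1 throws, and the caller can catch that much like a trap handler would.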
There is some escape analysis, so if a scoped object is returned as a pointer, it becomes heap allocated, similar to Go. Through this feature, TrapC proclaims that "it is possible to have code that is wrong in C, yet right in TrapC," but it is not clear how much legacy code that used to have undefined behavior in C will now benefit from having a defined behavior.
Fil-C
Fil-C by Filip Pizlo proposes a new dialect of C with the following modifications:
- malloc() always allocates from a garbage collected heap, but free() puts an object on a free list.
- Pointers are instrumented with type and size information, and pointer dereferencing is checked at runtime.
- Garbage collection implementation supports concurrent collection without stop-the-world.
I have some doubts about the free list. The proposal does not prevent pointers from being aliased (having multiple pointers to the same object). Freeing an object will nullify one pointer, but any other pointer to the same object is still valid. The proposal may be a little immature.
Much of the manifesto extols the virtues of the author's garbage collector design, so it's not clear whether the author is selling a new language or a new garbage collector. A garbage collector is not supposed to be tied to the language. There is no one-size-fits-all garbage collector, so it ought to be possible to use different garbage collection strategies depending on the workload requirements of the application.
Mini-C, or "Compiling C to Safe Rust, Formalized"
Aymeric Fromherz (Inria, France) and Jonathan Protzenko (Microsoft Azure Research, US) explore how to compile C to Rust without resorting to "unsafe" Rust. The resulting code therefore carries the same safety guarantees that safe Rust provides. Some of the considerations include:
- Static analysis to translate pointer arithmetic in C to slices and splitting in Rust.
- Inference of when a reference needs mutable borrowing, including references from a struct.
They validated the feasibility of their approach on a subset of C that is already formally verified through other means, but it is probably a long way from being able to accept legacy C code.
It relies on the fact that some carefully written C code has internal consistencies that are not explicitly expressed, but we can design inference algorithms to figure out what these internal consistencies are. Some of the inference techniques used in this paper could be applied in reverse to Rust, to reduce the notational requirements of Rust code.
The resulting executable does not need garbage collection, but still relies on runtime bounds checking (in Rust).
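The translation from pointer arithmetic to slices and splitting can be approximated in C++ to show the shape of the idea. IntSlice and zero_tail are hypothetical names of mine, and this is only an analogy to Rust slices, not the paper's actual output. Where C would write a loop over buf + mid up to buf + n, the slice version splits the buffer into two non-overlapping views and only touches the tail.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Hypothetical view type: pointer plus length, loosely modeled on a Rust slice.
struct IntSlice {
    int* data;
    std::size_t len;

    // Bounds-checked element access, like indexing a Rust slice.
    int& at(std::size_t i) const {
        if (i >= len) throw std::out_of_range("slice index out of range");
        return data[i];
    }

    // Rough analogue of Rust's split_at_mut: two non-overlapping halves.
    std::pair<IntSlice, IntSlice> split_at(std::size_t mid) const {
        if (mid > len) throw std::out_of_range("split point out of range");
        return {IntSlice{data, mid}, IntSlice{data + mid, len - mid}};
    }
};

// C style would be: for (int* p = buf + mid; p < buf + n; ++p) *p = 0;
inline void zero_tail(IntSlice buf, std::size_t mid) {
    IntSlice tail = buf.split_at(mid).second;
    for (std::size_t i = 0; i < tail.len; ++i) {
        tail.at(i) = 0;
    }
}
```

The bounds check in at() is the same runtime check that remains in the generated Rust; what the translation removes is the raw pointer arithmetic, not the check.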
Safe C++ (also known as Circle C++)
Sean Baxter started the Circle C++ compiler around 2019 as "a compiler that extends C++17 for new introspection, reflection and compile-time execution" with a flair for meta-programming. Some of the memory safety extensions implemented by this compiler over the years are now being proposed as a C++ draft.
Some highlights of the proposal:
- #feature on safety activates the compile-time memory safety checks, like #pragma in existing compilers.
- A safe specifier that requires usage of safety language extension in a function (though it still allows explicit unsafe code), like the noexcept specifier that disallows a function from throwing an exception.
- It still relies on runtime bounds checking.
- Ownership tracking through checked references T^ (mutable) and const T^ (shared). Each object may have either a single mutable reference, or any number of shared references, but not both at once.
- Named lifetime parameter /a and borrowing reference T^/a.
- Lifetime binder template parameter typename T+.
- A mut statement prefix to establish a mutable context that allows conversions from lvalues to mutable borrows and references.
- A new standard library with safe containers and algorithms. In particular, it replaces begin() and end() iterators with slice iterators annotated by named lifetime parameters.
- Pattern matching with "choice type" to enforce that optional values have to be checked before access.
- Type traits for thread safety: T~is_send, T~is_sync.
- Type traits for allowing unsafe pointer arithmetic: T~as_pointer, T~as_length; not fully explained.
- Type traits T~string, T~is_trivially_destructible; not fully explained.
The safety semantics are inspired mostly by Rust, which is mentioned in the proposal 85 times.
Safe C++ may very well provide some concrete design for some of the Profiles work by Stroustrup. In contrast to Profiles, Safe C++'s monolithic "all or nothing" approach might make it more difficult to port legacy code, due to the upfront cost of satisfying all memory safety requirements at once. Perhaps choice type, thread safety, and pointer arithmetic can be split into their own Profiles.
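The rule of "a single mutable reference or any number of shared references, but not both" is enforced by Safe C++ at compile time, which standard C++ cannot express. The sketch below only demonstrates the rule itself, using runtime counters instead; BorrowCell and its member names are hypothetical and do not come from the proposal.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical runtime analogue of the borrowing rule: many shared
// borrows or one mutable borrow, never both at the same time.
template <typename T>
class BorrowCell {
public:
    explicit BorrowCell(T value) : value_(std::move(value)) {}

    const T& borrow_shared() {
        if (mutable_borrowed_) throw std::logic_error("already mutably borrowed");
        ++shared_count_;
        return value_;
    }

    T& borrow_mut() {
        if (mutable_borrowed_ || shared_count_ > 0) {
            throw std::logic_error("already borrowed");
        }
        mutable_borrowed_ = true;
        return value_;
    }

    // In this sketch the caller releases borrows by hand; a fuller
    // version would return RAII guard objects instead.
    void release_shared() { if (shared_count_ > 0) --shared_count_; }
    void release_mut() { mutable_borrowed_ = false; }

private:
    T value_;
    int shared_count_ = 0;
    bool mutable_borrowed_ = false;
};
```

What this version rejects with an exception at runtime, the Safe C++ borrow checker is meant to reject before the program ever runs, and that is what makes the upfront porting cost so visible.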
Conclusion
There are several ways to compare and contrast these approaches.
- Whether they expect significant modification to legacy code:
  - Upfront: Safe C++, Mini-C.
  - Incrementally: Profiles C++.
  - No: TrapC, Fil-C.
- Whether they force the use of garbage collection:
  - Yes: TrapC, Fil-C.
  - No: Profiles C++, Safe C++, Mini-C.
- Whether they require C++:
  - Yes: Profiles C++, Safe C++.
  - No: TrapC, Fil-C, Mini-C.
In light of the recent Rust-in-Linux debacle, if we were to port Linux kernel code to a memory safe C dialect, we would not be able to use garbage collection, nor would we be able to use C++. This leaves Mini-C as the only viable option. However, the inference algorithm may not be able to handle the complexity of Linux kernel code, so some kind of object borrowing or lifetime annotation will still be needed.
Incremental safety checking is a useful feature, as it spreads out the cost of fixing legacy code instead of requiring every memory safety issue to be fixed at once.
It's worth noting that all of the proposals above require runtime bounds checking. Without some kind of size annotation throughout the code, it would be hard for static analysis to infer whether bounds checking can be safely omitted. ATS solves precisely this problem through the use of dependent types. Perhaps it would be useful to design a dependent type system for a dialect of C, for those who aren't used to the ML-style programming of ATS. We can take some inspiration from Mini-C to reduce the notational overhead.
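C++ templates give a small taste of what size annotations buy, although only when sizes and indices are compile-time constants, which is far weaker than the dependent types of ATS. In the sketch below, element and first_plus_last are hypothetical helpers of mine: the array length is part of the type, so an out-of-range index becomes a compile error and no runtime check is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The index I and the length N are both known to the compiler, so the
// bounds check happens at compile time and costs nothing at runtime.
template <std::size_t I, typename T, std::size_t N>
constexpr T& element(std::array<T, N>& a) {
    static_assert(I < N, "index out of range");
    return a[I];
}

// The length annotation also lets a function state a precondition
// ("non-empty") in its signature rather than checking it when called.
template <typename T, std::size_t N>
constexpr T first_plus_last(const std::array<T, N>& a) {
    static_assert(N > 0, "array must be non-empty");
    return a[0] + a[N - 1];
}
```

Writing element<3>(a) for a three-element array fails to compile. A dependent type system would extend this kind of reasoning to sizes and indices that are only known at runtime.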