TL;DR: the new behavior in 1.24.0 broke the rlua crate, and is being
reverted. If you have since changed your code to take advantage of the behavior
in 1.24.0, you’ll need to revert it for now. While we still plan to introduce
this behavior eventually, we will be rolling it out more slowly and with a new
implementation strategy.
Quoting the 1.24 announcement:
There’s one other change we’d like to talk about here: undefined behavior.
Rust generally strives to minimize undefined behavior, having none of it in
safe code, and as little as possible in unsafe code. One area where you could
invoke UB is when a panic! goes across an FFI boundary. In other words, this:
extern "C" fn panic_in_ffi() {
    panic!("Test");
}
This cannot work, as the exact mechanism of how panics work would have to
be reconciled with how the “C” ABI works, in this example, or any other ABI
in other examples.
In Rust 1.24, this code will now abort instead of producing undefined behavior.
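One way to keep a panic from ever reaching the FFI boundary is to catch it inside the extern function and turn it into an ordinary return value. The following is a minimal sketch of that idea, not code from the announcement: the function name `checked_div`, its signature, and its status codes are all hypothetical and chosen only for illustration.

```rust
use std::panic;

/// Hypothetical FFI entry point (illustrative name and status codes).
/// Returns 0 on success, -1 if `out` is null, -2 if the Rust body panicked.
///
/// # Safety
/// `out` must be null or point to writable memory for one `i32`.
pub unsafe extern "C" fn checked_div(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -1;
    }
    // Integer division panics on a zero divisor; catch_unwind stops that
    // panic here so it never unwinds into the caller's non-Rust frames.
    match panic::catch_unwind(|| a / b) {
        Ok(quotient) => {
            unsafe { *out = quotient };
            0
        }
        Err(_) => -2,
    }
}
```

The caller sees only an integer status, so nothing about Rust's unwinding mechanism leaks across the ABI.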
As mentioned above, this caused breakage. It started with a bug filed against
the rlua crate. rlua is a
package that provides high level bindings between Rust and the Lua programming
language.
Side note: rlua is maintained by Chucklefish,
a game development studio from London that’s using Rust. Lua is a very
popular language to use for extending and scripting games. We care deeply about
production Rust users, and so handling this was a very high priority for the
Rust team.
On Windows, and only on Windows, any attempt to handle errors from Lua would
simply abort. This makes rlua unusable, as any error of any kind within Lua
causes your program to die.
After digging in, the culprit was found: setjmp/longjmp. These functions
are provided by the C standard library as a means of handling errors. You
first call setjmp, and then, at some later point in time, call longjmp.
When you do, control flow returns to where you had previously called
setjmp. This is often used as a way to implement exceptions, and sometimes,
even coroutines. Lua’s implementation uses setjmp/longjmp to implement
exceptions:
Unlike C++ or Java, the C language does not offer an exception handling
mechanism. To ameliorate this difficulty, Lua uses the setjmp facility from
C, which results in a mechanism similar to exception handling. (If you
compile Lua with C++, it is not difficult to change the code so that it uses
real exceptions instead.)
The issue is this: what happens when some C code setjmp/longjmp’s through
Rust stack frames? Because drop checking and borrow checking know nothing
about this style of control flow, if you longjmp across a Rust stack
frame that has any type that’s not Copy on its stack, undefined
behavior will result. However, if the jump happens entirely in C, this
should work just fine. This is how rlua was managing it: every call
into Lua is wrapped with lua_pcall:
When you write library functions for Lua, however, there is a standard way
to handle errors. Whenever a C function detects an error, it simply calls
lua_error, (or better yet luaL_error, which formats the error message and
then calls lua_error). The lua_error function clears whatever needs to be
cleared in Lua and jumps back to the lua_pcall that originated that
execution, passing along the error message.
So, the question becomes: Why does this break? And why does it break on
Windows?
When we talked about setjmp/longjmp initially, a key phrase here wasn’t
highlighted. Here it is:
After digging in, the culprit was found: setjmp/longjmp. These functions
are provided by the C standard library as a means of handling errors.
These functions aren’t part of the C language, but part of the standard
library. That means that platform authors implement these functions, and
their implementations may differ.
Windows has a concept called SEH, short for “Structured Exception
Handling”.
Windows uses SEH to implement setjmp/longjmp, as the whole idea of SEH
is to have uniform error handling. For similar reasons, C++ exceptions use
SEH, as do Rust panics.
Before we can sort out the exact details of what’s happening, let’s look at how rlua
works. rlua has an internal function, protect_lua_call, used to call into
Lua. Using it looks like this:
protect_lua_call(self.state, 0, 1, |state| {
    ffi::lua_newtable(state);
})?;
That is, protect_lua_call takes some arguments, one of which is a closure. This
closure is passed to lua_pcall, which catches any longjmps that may be thrown
by the code passed to it, that is, by that closure.
Consider the code above, and imagine that lua_newtable here could call
longjmp. Here’s what should happen:
- protect_lua_call takes our closure, and passes it to lua_pcall.
- lua_pcall calls setjmp to handle any errors, and invokes our closure.
- Inside our closure, lua_newtable has an error, and calls longjmp.
- The initial lua_pcall catches the longjmp with the setjmp it called earlier.
- Everyone is happy.
However, the implementation of protect_lua_call converts our closure to an
extern fn, since that’s what Lua needs. So, with the changes in 1.24.0, it
sets up a panic handler that will cause an abort. In other words, the code
now looks roughly like this pseudocode:
protect_lua_call(self.state, 0, 1, |state| {
    let result = panic::catch_unwind(|| {
        ffi::lua_newtable(state);
    });

    if result.is_err() {
        process::abort();
    }
})?;
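The pseudocode above can be approximated by hand in ordinary Rust. This is a sketch of the general shape only, assuming nothing about how the compiler actually generates its handler; the helper name `abort_on_panic` is hypothetical.

```rust
use std::panic::{self, UnwindSafe};
use std::process;

/// Hypothetical helper: runs `body`, and aborts the whole process if it
/// panics instead of letting the panic unwind any further.
fn abort_on_panic<F, R>(body: F) -> R
where
    F: FnOnce() -> R + UnwindSafe,
{
    match panic::catch_unwind(body) {
        // No panic: hand the closure's result back unchanged.
        Ok(value) => value,
        // A panic reached this frame: terminate immediately.
        Err(_) => process::abort(),
    }
}
```

For example, `abort_on_panic(|| 2 + 2)` simply returns `4`, while a closure that panics would end the process at this frame.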
Earlier, when discussing setjmp/longjmp, we said that the issue with it in
Rust code is that it doesn’t handle Rust destructors. So, on every platform but
Windows, a longjmp is effectively invisible to the catch_unwind wrapper above,
and everything works. However, on Windows, since both setjmp/longjmp and Rust
panics use SEH, the longjmp gets “caught”, and runs the new abort code!
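For contrast, here is what a Rust panic does as it unwinds: it runs the destructors of each frame it leaves, and catch_unwind can intercept it. This is the machinery that a longjmp is not expected to interact with. It is only an illustrative sketch; the `Guard` type and the function names are made up for this example.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Records whether the destructor below has run.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// A value with a destructor, i.e. a type that is not Copy.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

/// A stack frame that owns a `Guard` and then fails.
fn frame_with_destructor() {
    let _guard = Guard;
    panic!("simulated error");
}

/// Returns true if the panic was caught and the destructor ran on the way out.
fn destructor_ran_during_unwind() -> bool {
    CLEANED_UP.store(false, Ordering::SeqCst);
    let result = panic::catch_unwind(frame_with_destructor);
    result.is_err() && CLEANED_UP.load(Ordering::SeqCst)
}
```

With the default unwinding panic strategy, `destructor_ran_during_unwind()` returns true: the panic is caught and `Guard` is dropped first.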
The solution here is to
generate the abort handler, but in a way that longjmp won’t trigger it. It’s
not 100% clear if this will make it into Rust 1.25. If the landing is smooth,
we may backport; otherwise, this functionality will be back in 1.26.