(This assumes you know Rust and probably something about compilers. We're also going to pretend for this document that Rust traits and Haskell typeclasses are the same thing, and use the terms basically interchangeably. Also I am very much not an expert in type theory or related mathematics; I am at best a very determined idiot.)
The essential problem generics try to solve is one of code reuse. How
do we write a data type List<T>
and a bunch of methods for it that can
work with any type T
? If we use C then we copy-paste all our list
processing code and find-and-replace the type involved for each
different T
we need, and call it a day. And it works. (Yes C has
macros, they're terrible.) So generics are a purely(?) ergonomic
feature, the CPU doesn't care how many times you copy-paste that code
(instruction cache aside). But it's absolute hell trying to actually
fix and update stuff written like this, so nobody really wants to do
it this way. It's a place where it's obvious our tools should take the
burden of the work from us.
The alternative to having programming language support for generics or copy-pasting your general-purpose code is to work outside the type system, i.e. use void pointers everywhere. This chucks compile-time safety out the window and so is unsatisfactory, and it also turns out the CPU does care when you write code this way. If your generic data structures always have to touch their data through a pointer, then doing that extra pointer load is slower than having data structures that store their contents inline. Generic code the compiler specializes for the data type can be faster than the workarounds.
Generic types alone are not too complicated to implement. Every
language with typed arrays has some kind of generic type built in.
The complexity comes up when you want to start reasoning about the
relationships between types, including types you don't know about yet.
It's easy to implement List<T>
. It is more complicated to make it so
you can write an equals()
function that takes a List<T>
, and have it
work for any List<T>
where T
can be compared for equality. The
key point is that the Eq<T>
definition doesn't need to know about our
List
impl, and the Eq<List<T>>
impl doesn't need to know
anything about T
other than it has an Eq
impl. You reduce
copy-pasting by decoupling the code into a few different interchangeable
parts, and let the compiler jigsaw-puzzle it back together for you to
get the result you want. So that is our goal. How can we do this?
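To make that goal concrete, here is roughly what the decoupling looks like with Rust's actual traits. `MyEq` and `list_equal` are hypothetical stand-ins invented for this sketch, not std items:

```rust
// A hypothetical Eq-style trait; the list code only needs to know
// that T has an impl of it, and the trait knows nothing about lists.
trait MyEq {
    fn equal(&self, other: &Self) -> bool;
}

// Works for any slice of T, as long as T: MyEq.
fn list_equal<T: MyEq>(a: &[T], b: &[T]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.equal(y))
}

// The compiler jigsaw-puzzles this impl together with list_equal.
impl MyEq for i32 {
    fn equal(&self, other: &Self) -> bool {
        self == other
    }
}

fn main() {
    println!("{}", list_equal(&[1, 2, 3], &[1, 2, 3])); // true
    println!("{}", list_equal(&[1, 2], &[1, 3]));       // false
}
```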
Syntactic macros, or their more refined children type templates, are actually a fairly tempting solution to this. I should explore it more, it's on my to-do list. The concept is simple: automate the C-like copy-paste process.
Zig's comptime
code generation is an example of this approach. From
what I've seen it works pretty well in practice. It has one big
property that I don't like though: whether or not your code typechecks
is tested when the macro/template is expanded, not when it is defined.
This means that you can write a comptime function that generates invalid code
in some cases, and you never find out about it as long as those cases
never happen. For small macros/templates this is perfectly fine because
it's easy to make sure all the code is tested. But when you start using
it for large and complicated things that may have unused code paths
lurking in unexpected places, it starts to worry me. Maybe it works
fine in practice most of the time, but the idea leaves a bad taste in my
mouth.
Also, the moment you start having non-trivial relationships between
your types, this approach starts falling apart because the
compiler can't reason about them very well. It's difficult to express
relationships like "Every implementation of Ord
is also an
implementation of Eq
" using macros. So while this approach's power is
effectively unlimited, it doesn't offer much help to the programmer
beyond the minimum possible.
I really feel like a lot of Rust's learning curve and complexity
comes from traits. Traits add another layer of semantic reasoning to
understand and figure out that is separate from your existing normal
code of functions and type definitions, with its own set of concepts
like coherence, trait constraints, associated types, and so on. (Then
it gets even more complex when you add lifetimes into it; I am not even
going to pretend to consider lifetimes for Garnet yet.)
Plus when you start to take traits further
you start getting into higher kinded types and more complex theoretical
stuff which, like deep inheritance hierarchies, tends to bog one down
with data-model concerns that seldom have much marginal benefit. Rust
also makes some decisions here that I believe don't help the matter,
such as using traits to represent magical compiler behaviors such as
Copy
.
So in the vein of making different choices from Rust I'm going to try to use "modules", which are a different sort of generic abstraction. Modules may trade off some power for simplicity, or maybe just trade off one kind of complexity for another, we will see. That said, typeclasses and modules are definitely related; in some way they are duals of each other, the same image being viewed from different directions. So the goal of this is to try to see how modules work out in practice.
The design path that led me to modules was basically one of minimizing language constructs. With traits you define a trait by making a collection of function types, then implement it by writing functions that conform to those types, and so on. And well, we already have a way to make a group of named functions with particular signatures and use it as a type: they're called "structs". So what if our language primitives were actually just functions and structs? What if our "interface"-like definitions were simply structs that worked exactly the same as any other struct, and to make our type-checker happy we just use them in a way that it knows it can solve?
Turns out this is pretty much exactly what SML/OCaml "modules" are. When you strip out the niceties those languages add, like inheritance, defining relationships with types from other modules, global type inference, and all that jazz, then a module is just a struct that can have types as members as well as functions. ML traditionally makes things all kinda weird and confusing with a separate module syntax and a plethora of features, but I don't think there's any real reason they can't just be treated like normal structs and manipulated via the same language constructs as any other value. If not entirely, then at least mostly. There's a cool research paper for a language called 1ML that does pretty much exactly this, but frankly if we choose to require type annotations on functions and some values then I don't think we even need 1ML. So far. I could be wrong! This is an experiment after all.
(Digression: Lua does something spiritually similar to this with
its metatables. It's pretty great! Basically each value has a "metatable" attached that
has some magical methods on it that are called by operators,
comparisons, etc. It also includes methods that are called on function
application and on method lookup failure, so with these small primitives
you can build just about anything. Getting too fancy with this sort of
metatable probably isn't a good idea, but you can 100% define a type
List
, define a new type T
, implement a custom comparison method for
T
and List
separately, and have list1 == list2
do the Right Thing.
But the key is that a metatable is just a normal table, it looks and
works exactly like it and you write code for it in Lua like anything
else. The only problem is that Lua is a dynamic language, so all of
this dispatch is resolved at runtime. If there's a mistake in a
method name, it is found at runtime. I want to be able to make the type
checking work at compile time, so that I don't have programs with
unpredictable runtime characteristics. So we need to be able to have
our type-checker reason about this at compile time.)
Ok, digression over. Let's get to a use case, play it out, and see how
it goes. Ignore the syntax, this is all a proof of concept that will
change, and I am blatantly just stealing the design and names from the
equivalent Rust traits. Let's make an Eq
module definition for testing
equality of a type:
type Eq(@Self) = struct
equal: fn(@Self, @Self): Bool
end
This just defines a struct with one generic type parameter, called
Self
. (The @
is there to make it easy to parse type parameters in
my proof of concept code. Like I said, ignore the syntax.) There is
absolutely nothing special about the Self
type, you could name it T
or Arglebargle
or whatever you wanted. The struct contains one
member, a function that takes two arguments of type Self
and returns
a Bool
. Ok, say we define a new type and want to implement this Eq
ability for it. We just make a new instance of this struct:
const IntEq = $Eq(I32) {
.equal = fn(lhs: I32, rhs: I32): Bool = ... end,
}
-- Actually use it...
IntEq.equal(1, 2)
(The $Eq(T)
syntax is a constructor that produces a new struct of
the type Eq(T)
. Our structs are all nominally typed, like Rust's
structs: if we have a different struct definition
Foo(T)
with the same fields as Eq(T)
, you still cannot interchange
them.)
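For comparison, this module-as-struct shape can be written in today's Rust without using traits at all; `EqModule` and `INT_EQ` are made-up names for illustration:

```rust
// A "module" is an ordinary generic struct holding function pointers.
struct EqModule<T> {
    equal: fn(&T, &T) -> bool,
}

fn int_equal(lhs: &i32, rhs: &i32) -> bool {
    lhs == rhs
}

// The "impl" is an ordinary, nominally typed constant: it has a name,
// a type, and could be an exported symbol like any other value.
const INT_EQ: EqModule<i32> = EqModule { equal: int_equal };

fn main() {
    println!("{}", (INT_EQ.equal)(&1, &2)); // false
    println!("{}", (INT_EQ.equal)(&3, &3)); // true
}
```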
With that out of the way... take a look at our "trait implementation".
This is just a normal constant like any other value in the code! It
has a name, it has a type, if we want it can have an exported
symbol in the binary, etc. We can call IntEq.equal(3, 4)
and it just
works, and if the compiler can prove that IntEq
is constant in your
program then it can inline it down so you don't need to go through a
function pointer in practice. It's kinda weird that you need to know
it's called IntEq
instead of the name just being Eq<I32>
like it is
for Rust traits but that's okay for now. Maybe we can figure out a shortcut for that later,
or at worst a naming convention. Let's try this again for something else... how about
making a fake List
type and implementing index and length methods on it?
type List(@T) = struct
dummy_data: @T,
end
type Idx(@Self, @Output) = struct
idx: fn(@Self, I32): @Output,
end
type Len(@Self) = struct
len: fn(@Self): I32,
end
const ListLen = $Len(List(@T)) {
.len = fn(self: List(@T)): I32 = ... end
}
const ListIdx = $Idx(List(@T), @T) {
.idx = fn(self: List(@T), i: I32): @T = ... end
}
Cool, let's do something with it!
fn sum(l: List(I32)): I32 =
let mut total: I32 = 0
for i in range(0, ListLen.len(l)) do
total += ListIdx.idx(l, i)
end
total
end
Great, looks good to me. Kinda chonky, but it works. Now... What if we
want to implement the equivalent of Rust's Eq<List<T>> where T: Eq
?
Well we have a type for Eq
already, so we know Eq(List(T))
has to
look something like this...
const ListEq = $Eq(List(@T)) {
.equal = fn(lhs: List(@T), rhs: List(@T)): Bool =
if ListLen.len(lhs) != ListLen.len(rhs) then return false
else
for i in range(0, ListLen.len(lhs)) do
let thing1 = ListIdx.idx(lhs, i)
let thing2 = ListIdx.idx(rhs, i)
if not thing1 == thing2 then -- OH NO
return false
end
end
end
return true
end
end
...uh oh. We do thing1 == thing2
which needs to call some method
Eq.equal(thing1, thing2)
but we don't have a module of type Eq(T)
.
We have to provide an instance of the Eq
module that knows how to compare our
T
type! Uhhhhhh oh dear well let's just write it how we think it should
look and see what happens:
...
.equal = (lhs: List(@T), rhs: List(@T), eq_impl: Eq(@T)): Bool =
if ListLen.len(lhs) != ListLen.len(rhs) then return false
else
for i in range(0, ListLen.len(lhs)) do
let thing1 = ListIdx.idx(lhs, i)
let thing2 = ListIdx.idx(rhs, i)
if not eq_impl.equal(thing1, thing2) then
return false
end
end
end
return true
end
...
Great, but we've changed the signature of the equal()
method! We can't
do that! If only we could sneak the eq_impl
value into the struct via
a closure:
fn make_list_eq_thingy(eq_impl: Eq(@T)): Eq(List(@T)) =
let our_new_eq_method = fn(lhs: List(@T), rhs: List(@T)): Bool =
if ListLen.len(lhs) != ListLen.len(rhs) then return false
else
for i in range(0, ListLen.len(lhs)) do
let thing1 = ListIdx.idx(lhs, i)
let thing2 = ListIdx.idx(rhs, i)
if not eq_impl.equal(thing1, thing2) then
return false
end
end
end
return true
end
$Eq(List(@T)) { .equal = our_new_eq_method }
end
Great, well we don't have a const of type Eq(List(T))
for any
T
, but we have a function that can produce a value of that
type. Weird. But it turns out this kind of thing is perfectly valid!
ML calls it a "functor", but really it's just a function that takes a
module and returns a new module. (Math people, don't bother correcting
me.) So then we can use this to make our Eq(List(@T))
implementations
for various T
's:
const IntListEq = make_list_eq_thingy(IntEq)
const FloatListEq = make_list_eq_thingy(FloatEq)
...and so on for whatever type implementations we want. This way if a user
creates their own type Foo
, they can add their own implementation of
Eq(List(Foo))
just by calling this function. Even better, if they
make an Eq(Foo)
impl for their custom type but don't impl
Eq(List(Foo))
, then a third party can do it. You don't have the
coherence rules of Haskell or Rust where you can only make an
implementation of a trait/typeclass if you are the creator of the trait
or the type you are implementing it for. Win! I've made many PR's to
various Rust crates that just added derive(Clone)
or derive(Hash)
or
such to specific types, and it's a giant pain in the ass.
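For what it's worth, this functor pattern can be sketched in Rust today using boxed closures, since plain function pointers can't capture the inner eq impl. `EqModule` and `make_list_eq` are invented names:

```rust
// Module fields are boxed closures so they can close over other modules.
struct EqModule<T> {
    equal: Box<dyn Fn(&T, &T) -> bool>,
}

// The "functor": a plain function from a module to a new module.
fn make_list_eq<T: 'static>(eq: EqModule<T>) -> EqModule<Vec<T>> {
    EqModule {
        equal: Box::new(move |lhs, rhs| {
            lhs.len() == rhs.len()
                && lhs.iter().zip(rhs.iter()).all(|(a, b)| (eq.equal)(a, b))
        }),
    }
}

fn main() {
    let int_eq = EqModule { equal: Box::new(|a: &i32, b: &i32| a == b) };
    // Build the Eq(List(I32)) impl by calling the functor.
    let int_list_eq = make_list_eq(int_eq);
    println!("{}", (int_list_eq.equal)(&vec![1, 2], &vec![1, 2])); // true
}
```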
Plus, importantly for a lowish-level language like Garnet, if you use my preferred code
analysis method of squinting at it and thinking real hard, you can
convince yourself that a Sufficiently Smart Compiler can still evaluate
this into something sensible at compile time. It's just a function
after all, which does nothing but return
a struct. If it's a pure function there's no problem with evaluating
it at compile-time, and if it's not a pure function it has no business
being called outside of another function anyway: Garnet's const
is
like Rust's const
or Zig's comptime
in that it means "this must be
able to be evaluated at compile time". The only trick is that the struct
has a closure in it, not a normal function. But the compiler has all
the info it needs to figure that out and generate efficient code for it.
This has a couple downsides though. One, the ergonomics is something of
a bitch. To compare two List(T)
's you have to find the matching
implementation const for that particular List(T)
. Two, it
turns out that coherence isn't just for show. If you call
make_list_eq_thingy(IntEq)
twice, you get two different
Eq(List(I32))
values which, by ML's type rules at least, are not the
same. You can say "the compiler knows these constant values are going
to be identical and can let you take advantage of that", and it probably
works in many cases, but if you have multiple different impl's and mix
up one with the other the compiler will just say "Yep they're both
Eq(List(I32))
, looks good to me." The best example I've found of the
hazards of this is when you have two different implementations of Hash(Foo)
and try to make a hashtable with a Foo
as the key. You get a type
HashTbl(Foo, Val)
but if it doesn't encode the actual implementation
of Hash
involved in the HashTbl
then you can have the case where
one function in one library calls HashImpl1.insert(tbl, key, val)
and
another calls HashImpl2.get(tbl, key, val)
with the same tbl
value.
When this happens then you just straight up die. There's ways to make
it work that we'll talk about more later, but the bookkeeping involved
with trying to keep track of all these module impl's can get intense.
But this kind of coherence issue is a fundamental problem with
having multiple separate implementations of the same interface, not
unlike diamond inheritance in OO systems, and is
just as awful to try to solve.
IMO when that sort of thing occurs you're better off just telling the
programmer to rethink their life instead of trying to be clever and
solve it automatically.
Rust/Haskell's coherence rules dodge this problem by just making it
impossible to have two different implementations of the same trait that
the compiler needs to choose among.
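Here is a deliberately tiny sketch of that Hash hazard. Every name is invented, and the "table" is a toy with eight buckets and no collision handling, just enough to show two modules of the same type silently disagreeing:

```rust
// Two different "Hash(i32)" modules for the same key type; the table
// type does not record which one built it.
struct HashModule<T> {
    hash: fn(&T) -> usize,
}

fn hash_a(x: &i32) -> usize { *x as usize }
fn hash_b(x: &i32) -> usize { (*x as usize).wrapping_mul(31) }

const HASH_A: HashModule<i32> = HashModule { hash: hash_a };
const HASH_B: HashModule<i32> = HashModule { hash: hash_b };

fn insert(tbl: &mut [Option<i32>; 8], h: &HashModule<i32>, key: i32) {
    tbl[(h.hash)(&key) % 8] = Some(key);
}

fn get(tbl: &[Option<i32>; 8], h: &HashModule<i32>, key: i32) -> bool {
    tbl[(h.hash)(&key) % 8] == Some(key)
}

fn main() {
    let mut tbl: [Option<i32>; 8] = [None; 8];
    insert(&mut tbl, &HASH_A, 5);
    // Same table, same key, different Hash module: straight up dead.
    println!("{}", get(&tbl, &HASH_A, 5)); // true
    println!("{}", get(&tbl, &HASH_B, 5)); // false
}
```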
Ok, so there's some problems and tradeoffs to this approach. That's
fine, we're all about tradeoffs here. The main practical difference
between traits/typeclasses and these sorts of modules is that with
traits the compiler does more of the accounting work for you. But that
also introduces this icky layer of abstraction for you to reason about
that loses nice properties like "a module can have the obvious runtime
representation of a struct full of function pointers" and "your
namespacing is identical for modules and types, and for module impls and
values". You add more things to your language, instead of fitting
together existing things in new ways. So I can think of a few ways to
start smoothing out the bumps in this sort of module system, the
simplest just being really good naming conventions with probably some
compiler help. But can we do some more fancy stuff? What does it
look like if we want to write a
sum()
function that can take anything vaguely list-y? I don't want
to get into iterators right now, so we'll just do it the old fashioned
way with our existing Idx
and Len
modules. Then
we need an add()
operation for arbitrary types, and a default value to
return if the list-like thing we are summing has length 0:
type Add(@T) = struct
add: fn(@T, @T): @T
end
type Default(@T) = struct
default: fn(): @T
end
Great, now we can just write a function that iterates through anything list-y with number-ish contents and sums them all up:
fn generic_sum(
container: @C,
idx_impl: Idx(@C, @Item),
len_impl: Len(@C),
add_impl: Add(@Item),
default_impl: Default(@Item)
): @Item =
let mut total: @Item = default_impl.default()
for i in range(0, len_impl.len(container)) do
total = add_impl.add(total, idx_impl.idx(container, i))
end
total
end
Hey there we go! Simple, obvious, slightly verbose but perfectly normal code with no real tricks except some basic generic parameters, types and structs. It looks and works exactly how you'd expect. There's only one problem:
Trying to actually call this function is absolutely batshit. The signature of this function involves 4 different module impl's, and you have to keep track of them all, remember which values implement these modules for which types, and put them all in explicitly. If the goal is to make code reuse easier, then this is an abject failure.
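To feel that pain concretely, here is a rough Rust rendering of the same signature with every module passed by hand. All the names (`IdxModule`, `VEC_LEN`, etc.) are hypothetical:

```rust
// Four separate "modules", all as structs of function pointers.
struct IdxModule<C, Item> { idx: fn(&C, usize) -> Item }
struct LenModule<C> { len: fn(&C) -> usize }
struct AddModule<T> { add: fn(T, T) -> T }
struct DefaultModule<T> { default: fn() -> T }

fn generic_sum<C, Item>(
    container: &C,
    idx_impl: &IdxModule<C, Item>,
    len_impl: &LenModule<C>,
    add_impl: &AddModule<Item>,
    default_impl: &DefaultModule<Item>,
) -> Item {
    let mut total = (default_impl.default)();
    for i in 0..(len_impl.len)(container) {
        total = (add_impl.add)(total, (idx_impl.idx)(container, i));
    }
    total
}

fn vec_idx(v: &Vec<i32>, i: usize) -> i32 { v[i] }
fn vec_len(v: &Vec<i32>) -> usize { v.len() }
fn int_add(a: i32, b: i32) -> i32 { a + b }
fn int_zero() -> i32 { 0 }

const VEC_IDX: IdxModule<Vec<i32>, i32> = IdxModule { idx: vec_idx };
const VEC_LEN: LenModule<Vec<i32>> = LenModule { len: vec_len };
const INT_ADD: AddModule<i32> = AddModule { add: int_add };
const INT_DEFAULT: DefaultModule<i32> = DefaultModule { default: int_zero };

fn main() {
    // The call site is the problem: four impls to remember and thread through.
    let s = generic_sum(&vec![1, 2, 3], &VEC_IDX, &VEC_LEN, &INT_ADD, &INT_DEFAULT);
    println!("{}", s); // 6
}
```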
I was feeling kinda down about this realization, but apparently people
have been down this road before and in 2015 came up with an idea for a solution called modular
implicits. This sounds both hopeful
and hazardous from the start, so I was cautious. I hate anything
implicit. But I also hate the idea of going from foo.sum()
in Rust to
sum(foo, std.FooIdx, std.FooLen, std.IntAdd, std.IntDefault)
in Garnet. So maybe it's time for a little implicit-ness.
Even though it's originally inspired by Scala, the concept behind
modular implicits is very simple. You write your hideous
generic_sum()
function signature as above and mark certain arguments
as implicit
. Then when you write a module definition you can also
mark it implicit
. Then instead of calling
sum(foo, std.FooIdx, std.FooLen, std.IntAdd, std.IntDefault)
you can just call
sum(foo)
and it will look for implicit
definitions with types
Idx(Foo, I32)
, Len(Foo)
, Add(I32)
, and Default(I32)
and if it finds them
then it will fill in the appropriate args with them. If there is more
than one possible value for any arg, it will stop with a type error and
tell you to disambiguate, and then you just pass the impl you want in
like a normal param. And if you want to specify a particular impl over
the default, again you can just pass it in. If you want to add a
particular module to the list of implicits it will search for, then you
just import it into the local scope, similar to Rust's traits. Making a
type checker that is capable of handling these implicit args often
requires that you give it explicit signatures for function definitions,
but we do that already; no need for global inference IMO. So modular
implicits are actually mind-bogglingly simple in concept, and apparently
it works out pretty well. Let's try rewriting our horrible generic_sum()
function using them:
const implicit ListIdx = ...
const implicit ListLen = ...
const implicit IntAdd = ...
const implicit IntDefault = ...
fn generic_sum(
container: @C,
idx_impl: implicit Idx(@C, @Item),
len_impl: implicit Len(@C),
add_impl: implicit Add(@Item),
default_impl: implicit Default(@Item)
): @Item =
let mut total: @Item = default_impl.default()
for i in range(0, len_impl.len(container)) do
total = add_impl.add(total, idx_impl.idx(container, i))
end
total
end
let some_container: List(I32) = ...
generic_sum(some_container)
I don't like the implicit
syntax per se, because despite appearances I
don't actually like random modifier keywords hanging around all over the
place. But that's just cosmetics. There's other ways you could specify
what values to search for, up to and including just letting it use any
value with the appropriate type, but same as the @
and $
sigils,
starting off with explicit implicit
markers is easy to parse and works
fine for a proof of concept.
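For comparison, Rust's trait bounds already behave like this implicit search (with canonicity: there is exactly one impl per bound for the compiler to find). A rough analogue of generic_sum, substituting `AsRef` for the hand-rolled Idx/Len modules:

```rust
use std::ops::Add;

// Trait bounds play the role of implicit module arguments: the caller
// names no impls, the compiler resolves them.
fn generic_sum<C, Item>(container: &C) -> Item
where
    C: AsRef<[Item]>,
    Item: Add<Output = Item> + Default + Copy,
{
    let mut total = Item::default();
    for &x in container.as_ref() {
        total = total + x;
    }
    total
}

fn main() {
    // No impls at the call site; the bounds fill them in implicitly.
    let s: i32 = generic_sum(&vec![1, 2, 3]);
    println!("{}", s); // 6
}
```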
So... yeah. It's a braindead-obvious solution that seems to work pretty well, which appeals to me. There's some actual finesse involved with doing the search successfully, of course. You can read the paper and it discusses various details like "does this actually always work?" and "does the search for implementations actually always terminate?" and "do we get different results if we do the search in different orders?" You know, the little things.
Of course, it isn't all smooth sailing. Here are the major issues I am aware of.
One wrinkle is module inheritance. IMO this is rarely necessary, but
when it is useful it's very useful. A good example is having an
Eq
module and an Ord
module. Anything that implements Ord
is also
Eq
by definition. Meanwhile something that is not Eq
can not be
Ord
. The Modular Implicits paper uses this case to demonstrate a sort
of diamond-inheritance problem, let's re-implement it in Garnet to see
how it looks (making up new syntax as necessary):
type Eq(@Self) = struct
equal: fn(@Self, @Self): Bool
end
type Ord(@Self) = struct
inherits Eq(@Self)
cmp: fn(@Self, @Self): Ordering
end
So then you could do something like this:
fn make_list_eq(eq: Eq(@T)): Eq(List(@T)) = ...
fn make_list_ord(ord: Ord(@T)): Ord(List(@T)) = ...
const IntListOrd: Ord(List(I32)) = make_list_ord(IntOrd)
const IntListEq: Eq(List(I32)) = make_list_eq(IntEq)
fn equal(eq: implicit Eq(@T), x: @T, y: @T): Bool =
eq.equal(x, y)
end
equal(list_of_int1, list_of_int2)
Since IntListOrd
is a valid implementation of Eq(List(@T))
as well as
of Ord(List(@T))
, the call to equal()
can resolve the implicit argument to
be Eq(List(I32))
or Ord(List(I32))
. To paraphrase the Modular
Implicits paper, we can restrict Ord(I32)
to its parent Eq(I32)
and
then apply make_list_eq()
to it to get Eq(List(I32))
, or we can apply
make_list_ord()
to IntOrd
to get Ord(List(I32))
and then restrict it to an
Eq
to get Eq(List(I32))
. We have two
unrelated implementations that can resolve to the same
type, and ambiguous implicit resolutions are not allowed, so this will
not typecheck without you giving it an implementation to use explicitly.
And which is the right impl for you to give, anyway? So with this sort
of inheritance, if you have
different implementations in scope, different parts of the program may
do different things from the same function call! Again from the paper,
"in Haskell, the problem is avoided by canonicity -- it doesn't matter
which way around the diamond we go, we know the result will be the
same", because there's only one valid implementation of each trait.
So doing this kind of inheritance can make things explode, taking our already-not-necessarily-great coherence problem and making it worse. In the paper they instead do something similar to inheritance using "module aliases". Which, as far as I can tell, are baaaaasically just variables in your modules that refer to other modules. Well... what if our modules were just structs? You can already nest structs. So once again we turn OCaml's special case into our general case:
type Eq(@Self) = struct
equal: fn(@Self, @Self): Bool
end
type Ord(@Self) = struct
self_eq: Eq(@Self),
cmp: fn(@Self, @Self): Ordering
end
Ok let's see what the world looks like with these new definitions:
-- A functor to define Eq for anything that already defines Ord.
-- This feels a little circular... but that circularity is
-- precisely the problem we're trying to solve, so here we go.
-- This is our "inheritance" operation... we want Ord to inherit
-- from Eq, so any time you have an Ord(T) then you can turn it
-- into an Eq(T).
fn make_ord_eq(ord: Ord(@T)): Eq(@T) =
$Eq(@T) {
.equal = ord.self_eq.equal,
}
end
const IntEq = $Eq(I32) {
.equal = fn(lhs: I32, rhs: I32): Bool = ... end
}
const IntOrd = $Ord(I32) {
.self_eq = IntEq,
.cmp = fn(lhs: I32, rhs: I32): Ordering = ... end
}
-- Functors to implement our Eq and Ord modules for lists of
-- particular T's.
fn make_list_eq(eq: Eq(@T)): Eq(List(@T)) =
$Eq(List(@T)) {
.equal = fn(lhs: List(@T), rhs: List(@T)): Bool = ... end
}
end
fn make_list_ord(ord: Ord(@T)): Ord(List(@T)) =
$Ord(List(@T)) {
.self_eq = make_list_eq(ord.self_eq),
.cmp = fn(lhs: List(@T), rhs: List(@T)): Ordering = ... end
}
end
Ok, now if you try to create an Eq(List(@T))
you can call
make_ord_eq(make_list_ord(IntOrd))
OR you can call
make_list_eq(IntEq)
, which is a great demonstration of the diamond
problem. (And also IMO a great example of how much ML's separate module
syntax obscures its semantics, since it took me like half an hour on
one page of the paper to figure out what the hell is actually going on
here.) But (supposedly) the compiler is able to trace the value through
the self_eq
fields of the Ord
implementations to the
make_list_eq(@T)
call either way, so it knows it gets to the same actual
value of type Eq(Int)
both times, IntEq
.
The inheritance diamond still exists but, like Haskell, you can prove
that you end up with the same actual implementation of your module no
matter which path you trace through the diamond.
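A sketch of this diamond in Rust, with modules as structs of boxed closures (all names invented): both paths around the diamond produce an equality test backed by the same inner impl, which is exactly the property the compiler would have to verify:

```rust
use std::cmp::Ordering;

struct EqModule<T> { equal: Box<dyn Fn(&T, &T) -> bool> }
struct OrdModule<T> {
    self_eq: EqModule<T>,
    cmp: Box<dyn Fn(&T, &T) -> Ordering>,
}

// The "inheritance" functor: any Ord module yields its Eq module.
fn make_ord_eq<T>(ord: OrdModule<T>) -> EqModule<T> {
    ord.self_eq
}

fn make_list_eq<T: 'static>(eq: EqModule<T>) -> EqModule<Vec<T>> {
    EqModule {
        equal: Box::new(move |a, b| {
            a.len() == b.len() && a.iter().zip(b.iter()).all(|(x, y)| (eq.equal)(x, y))
        }),
    }
}

fn make_list_ord<T: 'static>(ord: OrdModule<T>) -> OrdModule<Vec<T>> {
    let OrdModule { self_eq, cmp: elem_cmp } = ord;
    OrdModule {
        // The key move: the list's Eq is built from the same inner
        // self_eq, so both paths around the diamond share one impl.
        self_eq: make_list_eq(self_eq),
        cmp: Box::new(move |a, b| {
            for (x, y) in a.iter().zip(b.iter()) {
                match (elem_cmp)(x, y) {
                    Ordering::Equal => continue,
                    other => return other,
                }
            }
            a.len().cmp(&b.len())
        }),
    }
}

fn int_ord() -> OrdModule<i32> {
    OrdModule {
        self_eq: EqModule { equal: Box::new(|a: &i32, b: &i32| a == b) },
        cmp: Box::new(|a: &i32, b: &i32| a.cmp(b)),
    }
}

fn main() {
    // Path 1: build Ord(List(I32)), then restrict it to Eq.
    let eq1 = make_ord_eq(make_list_ord(int_ord()));
    // Path 2: restrict Ord(I32) to Eq(I32), then lift it to lists.
    let eq2 = make_list_eq(make_ord_eq(int_ord()));
    let (a, b) = (vec![1, 2, 3], vec![1, 2, 3]);
    println!("{} {}", (eq1.equal)(&a, &b), (eq2.equal)(&a, &b));
}
```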
As the paper notes, this requires that all our functors that produce these modules are totally absolutely 100% pure functions. Fortunately, our functors need to be evaluated at compile time so that the compiler can figure out all the unknown types and optimize unneeded indirection away, and the way you make functions easy to evaluate at compile time is also to make them totally absolutely 100% pure functions. Since we can demonstrably do this kind of type metaprogramming with nothing but functions, structs and type params, I don't think this restriction is much of a problem.
So let's talk about this whole coherence problem more, since it seems to
keep coming up.
The modular implicits paper makes a useful distinction here: canonicity
is the property that, essentially, each trait/typeclass has only one
implementation per type. In Rust you can't have one crate that implements
Hash<Foo>
and another crate that implements Hash<Foo>
for the same
trait Hash
and the same type Foo
. With modules, we do not have
this sort of canonicity, and in fact we can not since a programmer can
just create whatever const
struct value they want.
However, canonicity is a means, not an end in itself. The actual goal is coherence, which is basically the property that the definition of a function can't result in two different implementations depending on how its types are inferred. The above inheritance problem demonstrates a case where you can have two different valid solutions to a type inference problem that lead to different module implementations. So how can we have coherence without canonicity? Once again we'll take an example from the paper, that of an ordered set implemented by a binary tree:
type TreeSet(@Item) = struct
contents: @Item,
left: Option(TreeSet(@Item)),
right: Option(TreeSet(@Item)),
end
fn contains(ord: implicit Ord(@T), set: TreeSet(@T), value: @T): Bool = ... end
let set_of_ints: TreeSet(I32) = ...
contains(IntOrd, set_of_ints, 3)
-- Or if IntOrd is already in scope we can let it be implicit:
contains(set_of_ints, 3)
However, this function doesn't quite work the way you want it to,
because you can have multiple implementations of Ord(I32)
, say one
that compares integers in ascending order and one that compares in
descending order.
The solution the paper seems to give for this problem is "don't do
this". You can instead write your contains()
function as a method in
a module that incorporates a particular Ord
implementation, and use a
functor to generate the implementation from whatever ordering you give it:
type Set(@SetType, @T) = struct
contains: fn(@SetType, @T): Bool,
end
fn make_tree_set(ord: implicit Ord(@T)): Set(TreeSet(@T), @T) =
$Set(TreeSet(@T), @T) {
.contains = fn(set: TreeSet(@T), itm: @T): Bool = ... end
}
end
fn contains(set_impl: implicit Set(@SetType, @T), set: @SetType, value: @T): Bool =
set_impl.contains(set, value)
end
-- All right now let's imagine we have a file foo.gt:
const implicit IntAscendingOrd: Ord(I32) = ...
const MySetType = make_tree_set()
fn do_stuff(set: MySetType) =
...
end
-- then we have file bar.gt:
const implicit IntDescendingOrd: Ord(I32) = ...
const MySetType = make_tree_set()
let bar_set: MySetType = ...
foo.do_stuff(bar_set)
This successfully doesn't compile! foo.do_stuff()
takes a value of
type foo.MySetType
, which is created by calling make_tree_set(foo.IntAscendingOrd)
,
while bar.MySetType
looks like it's defined the same way but
is a different value created by calling
make_tree_set(bar.IntDescendingOrd)
. These types are not
interchangeable and the compiler knows it!
So if you define your Set
module
interface this way, then you get coherence without canonicity and the
compiler will infallibly enforce it for you. In other words, you can't
accidentally mix up Set
impls with different implementations of Ord
, as
long as you make sure you define its interface properly.
So the compiler doesn't enforce this design for you, but if you follow
it, you can work around the coherence problem with a bit of care. "Don't
do that" isn't necessarily a particularly satisfactory solution to this
sort of issue... but honestly, the canonicity solution of "you didn't
create this type or this trait, so you better make a PR upstream because
someone forgot to put #[derive(Ord)]
on their structure" isn't very
satisfactory either. And, upon inspection, Rust's HashMap
implementation does something very similar to this in practice to allow
you to substitute in different hash algorithms and create hashmaps with
identical signatures but non-interchangable types. So really it turns
out that the world is just an imperfect place where careful design is
always a priority, either by action or inaction.
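For reference, the HashMap trick mentioned above looks like this: the hasher is baked into the map's type as the third parameter `S`, so maps built with different hashers simply don't unify:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A map type that names its hasher explicitly in its type.
type MyMap = HashMap<i32, i32, BuildHasherDefault<DefaultHasher>>;

fn takes_default_map(_m: &HashMap<i32, i32>) {}

fn main() {
    let mut custom: MyMap = HashMap::default();
    custom.insert(1, 10);
    // This would not compile: `custom` has a different hasher type
    // than the RandomState that HashMap<i32, i32> defaults to.
    // takes_default_map(&custom);
    println!("{}", custom[&1]); // 10
}
```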
An associated type is basically just the ability to have a module with a field that is a type rather than a value:
type Foo = struct
type SomeType
foo: fn(self.SomeType)
end
If you're brave then you can just have associated types act like
normal struct fields with the type type
:
type Foo = struct
SomeType: type,
foo: fn(self.SomeType)
end
You can do this in practice quite simply: you just say that any instance
of this struct cannot exist at runtime, you can only create it and
evaluate it at compile time and it all must be optimized away into a
puff of logic in the final program. This is basically the approach we've
been taking with modules this whole time anyway. The problem is that
having a type
type that can represent any type is a heckin' fast
track to making your type checker undecidable. Letting a program
manipulate arbitrary types without any constraints
makes it very easy to construct paradoxes like T = List<T>
or
T1 = T2, T2 = T1
, and for some reason people tend to frown on type
checkers that never terminate, and which can't even necessarily tell you
why your program isn't type checking.
So to prevent that you need more limitations and stuff on just what kind
of type type
can be, and proving that it all fits together correctly
gets capital-T Tricky. It's a solvable problem with the right
restrictions, and finding different sets of self-consistent restrictions
leads you down the road of type systems like System F. But I'd prefer to
just avoid it entirely if I can. No matter what choice I make it will
be a layer of special cases atop our nicely simple system.
The thing is... I'm not convinced that we need associated types anyway. I've yet to run into a case with this module system where I couldn't just lift the associated type out of the struct into a type parameter, turning the module impl into a functor if necessary. In other words, every time I've translated a Rust trait with an associated type into Garnet, I've been able to use type parameters and turn this:
type Foo = struct
    SomeType: type,
    foo: fn(self.SomeType)
end
into this:
type Foo(@SomeType) = struct
    foo: fn(@SomeType)
end
That isn't to say that associated types won't be needed at some point, just that I haven't needed them yet. I may be missing something; these ARE fundamentally different things. A type parameter is an input chosen by the code that uses the type, while an associated type is an output chosen by the code that implements the trait. So maybe I'll eventually run into a case where associated types are necessary.
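The same lifting can be sketched in Rust itself. The trait names here (Container1, Container2) are made up for illustration; one trait uses an associated type, the other lifts it into a trait-level type parameter:

```rust
// Associated-type version: each implementing type picks exactly one Item.
trait Container1 {
    type Item;
    fn first(&self) -> Option<&Self::Item>;
}

// Type-parameter version: Item is lifted into the trait's signature.
trait Container2<Item> {
    fn first(&self) -> Option<&Item>;
}

struct IntVec(Vec<i32>);

impl Container1 for IntVec {
    type Item = i32;
    fn first(&self) -> Option<&i32> {
        self.0.first()
    }
}

impl Container2<i32> for IntVec {
    fn first(&self) -> Option<&i32> {
        self.0.first()
    }
}

fn main() {
    let v = IntVec(vec![1, 2, 3]);
    // Fully-qualified calls, since both traits define `first`:
    assert_eq!(<IntVec as Container1>::first(&v), Some(&1));
    assert_eq!(<IntVec as Container2<i32>>::first(&v), Some(&1));
}
```

The practical difference shows up in who chooses the type: IntVec could also implement Container2<String>, but Container1 pins exactly one Item per implementing type.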
But if I can get away without having associated types, then I would like to. So let's look at a couple Rust traits that use associated types and try to implement them with modular generics:
-- Iterator module
type Iterator(@Iter, @Item) = struct
    next: fn(@Iter) -> Option(@Item),
end

-- Actual iterator type for our lists
type IteratorForList(@T) = struct
    list: List(@T),
    current_pos: I32,
end

-- impl Iterator<List<Item>> for IteratorForList<Item>
const ListIter = $Iterator(IteratorForList(@Item), @Item) {
    .next = fn(iter: IteratorForList(@Item)): Option(@Item) =
        if iter.current_pos == List.len(iter.list) then
            None
        else
            let itm = List.idx(iter.list, iter.current_pos)
            iter.current_pos += 1
            Some(itm)
        end
    end
}
Seems fine, really.
Basically, we allow module implementations to have unbound type
parameters in them, which is fine 'cause we allow functions to have
unbound type parameters in them anyway, and our module implementations
are just piles of functions.
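The shape of this can be approximated in Rust, with the caveat that all the names here are illustrative: the Iterator module becomes a trait parameterized over both the iterator type and the item type, and a unit struct stands in for the module value holding the pile of functions:

```rust
// The Garnet `Iterator(@Iter, @Item)` module as a parameterized trait.
trait IteratorMod<Iter, Item> {
    fn next(it: &mut Iter) -> Option<Item>;
}

// Actual iterator type for our lists.
struct IteratorForList<T> {
    list: Vec<T>,
    current_pos: usize,
}

// Stand-in for the `ListIter` module value; the impl below leaves T
// unbound, like the unbound @Item in the Garnet version.
struct ListIterImpl;

impl<T: Clone> IteratorMod<IteratorForList<T>, T> for ListIterImpl {
    fn next(it: &mut IteratorForList<T>) -> Option<T> {
        if it.current_pos == it.list.len() {
            None
        } else {
            let itm = it.list[it.current_pos].clone();
            it.current_pos += 1;
            Some(itm)
        }
    }
}

fn main() {
    let mut it = IteratorForList { list: vec![10, 20], current_pos: 0 };
    assert_eq!(ListIterImpl::next(&mut it), Some(10));
    assert_eq!(ListIterImpl::next(&mut it), Some(20));
    assert_eq!(ListIterImpl::next(&mut it), None);
}
```
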
I might be doing this wrong; I'm still writing the code that actually implements this. But it works in my head.
Let's try again, and make something a bit more sophisticated than our
previous Add
definition:
type Add(@Lhs, @Rhs, @Out) = struct
    add: fn(@Lhs, @Rhs): @Out
end

const IntAdd = $Add(I32, I32, I32) {
    .add = fn(lhs: I32, rhs: I32): I32 = ... end
}

-- So we can do `instant + some_duration` and get a new instant out
const InstantDurationAdd = $Add(Instant, Duration, Instant) {
    .add = fn(lhs: Instant, rhs: Duration): Instant = ... end
}
So, seems to work. We don't need to refer to Self::Output or anything like the Rust version does; all those type parameters are in the same namespace introduced by $Add(...). If there's something where we don't know all those types until implementation time, we can write functors to fill in the gaps.
(Section 3.6 of the Modular Implicits paper actually talks about exactly
this, but I figured it out myself before reading it, hah!)
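For comparison, the fully-parameterized Add can be sketched as a Rust trait where Lhs, Rhs, and Out are all ordinary parameters, with unit structs standing in for the module values. Instant and Duration here are made-up stand-in types, not the std ones:

```rust
// All three types are parameters of the trait, mirroring
// `Add(@Lhs, @Rhs, @Out)`; no Self::Output needed.
trait AddMod<Lhs, Rhs, Out> {
    fn add(lhs: Lhs, rhs: Rhs) -> Out;
}

#[derive(Debug, PartialEq)]
struct Instant(u64);
struct Duration(u64);

struct IntAdd;
impl AddMod<i32, i32, i32> for IntAdd {
    fn add(lhs: i32, rhs: i32) -> i32 {
        lhs + rhs
    }
}

// So we can do `instant + some_duration` and get a new instant out.
struct InstantDurationAdd;
impl AddMod<Instant, Duration, Instant> for InstantDurationAdd {
    fn add(lhs: Instant, rhs: Duration) -> Instant {
        Instant(lhs.0 + rhs.0)
    }
}

fn main() {
    assert_eq!(IntAdd::add(1, 2), 3);
    assert_eq!(InstantDurationAdd::add(Instant(10), Duration(5)), Instant(15));
}
```

(Rust's real std::ops::Add splits the difference: Rhs is a type parameter but Output is an associated type, so each (Self, Rhs) pair gets exactly one output type.)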
So yeah. I'm fairly convinced this approach should work. I'm even reasonably convinced that it can create code that isn't a gigantic pain in the ass, which is the whole point after all. (And that's also more than I can say for stock ML modules.) I have a proof of concept of a type checker that implements the basic type system like 90% worked out as of Dec 2022, and adding the implicit searching atop it should really just be a transformation layer without any changes to the core type checker. So I should hopefully be able to play around with this for real fairly soon.
Things that are out of scope for now but will need to be solved later:

- Making foo:do_stuff() be syntactic sugar for do_stuff(foo), which then takes an implicit module argument do_stuff(SomeModule, foo). It will take a bit of juggling to see if that works out well in practice though. (On the flip side, our explicit modules already give us the ability to specialize function calls a la Rust's turbofish.)
- impl Foo for Bar<T> where T: Eq is way more readable than fn make_foo(Bar(@T), Eq(@T)): Foo(Eq(@T)) or whatever. We really need some syntactic sugar around this sort of nonsense, and it might also be quite nice to make it so that functors can get called implicitly when we need them. Possibly we have a macro or something that takes a type and a module, and implements that module for that type and creates a value of that impl with a consistent naming scheme? Or we extend our modular implicits so that instead of searching for a module impl Foo(Bar) they can also search for a functor that returns a Foo(Bar) and calls it if necessary? This could end up related to whatever we do that is similar to derives.

Discussion: https://lobste.rs/s/hot5by/garnet_generics_problem
commit 980fe3d6097d4fb6046777fa72acd0af6dce7f22
Author: Simon Heath <simon.heath@nearearth.aero>
Date: 2024-09-05T17:49:25-04:00

    does this ever matter? idk maybe.