TYPES.HK

Why I tend to leave Agda for Lean (Part 1)

Sep 26, 2025

Before starting everything, I want to make it clear that I'm not an expert in Agda, and I do think I only use a small subset of Agda's features. I can be wrong and biased. I first learned Agda in 2022 with PLFA book, and decided to use it for my research in 2023, the beginning year of my PhD study. In the before I used Coq (now Rocq) in my master's thesis. What bugs me is that I always felt like I do not have a full control over it when programming in Rocq. The decision towards Agda was then obvious: it's simplicity in syntax and theory, and everything is *explicit*. I loved this explicitness so much. My first Agda project is finished in about a year. This is the language I enjoy working with, and I am quite grateful to. Later I have been continuing developing my theory&system of my agda project, by adding new constructs, new features, changing to well-scoped infrastructure, and wrote a lot of lemmas. It turns out it is increased to 10x lines of code than previous project, and the explosive growth of lemmas is making this project hard to maintain. When the lines of code reached 5k, I started to feel a bit uncomfortable; when the lines reached 10k, I had to tolerate to keep splitting the files (for speeding type-checking). Now it is over 15k and already becomes painful to work with. The awkward situations include but are not limited to: * You're proving a transitivity lemma, and you are destructing two subtyping judgments which are both complicated, the agda-mode takes 30 seconds to make a case-splitting, and each time you refine and fill a hole, it still takes 30 seconds to re-check the whole file; * You tended to get along with its slow-compilation, you googled a lot about the [performance tips of Agda](https://wiki.portal.chalmers.se/agda/Main/PerformanceTips), also searching around in Zulip and mailing list. And you are told the best practice is to split the files, and in this way you can avoid type-checking proved lemmas. You know it only helps a little, the best practice is actually to buy an Apple M-chip machine, which performs great in single-core tasks. * You felt okay because you were not fighting alone. Your colleague was also using Agda to prove a lemma in a complex system. Each-hole filling took minutes to compile, and he had 50+ holes to finish. Then some day, he told you he had **a great finding** and shared with you, which is to abstract each case in that theorem as separate lemmas, together with IH as a premise, then you can avoid re-type-checking the whole theorem you're working on, for speeding up the process. You felt happy to know, but you can't stop worrying about yours. Ok, we should end this topic. I think that's all about working on a large project, it's just that Agda is not very suitable for it, there's nothing wrong with the design of Agda itself and I'm still a fan of it. I just need to make a change. The timing of finding an alternative isn't perfect but also not bad: * I just wrapped up my second Agda project. I can be ready for the next one, with the *right* tool, which is good. * The next project will still be based on the previous one, this requires a certain amount of migration. * and I'm perhaps(?) entering into the senior stage of PhD study, giving up the tool accompanying me for years is definitely a hard decision. That's my situation, and it may not be applicable to everyone. What could be potentially helpful to readers, instead of listening to my whining, is that I want to further share my thoughts about **why Lean** after immersing myself into it in recent weeks. ## Namespace The **namespace** is quite good. It may be not a theoretical improvement, but it is definitely a practical game-changer in proof engineering. Consider in polymorphic type systems, you have defined several judgments around types, for example: well-formed types `WF`, closed types `Closed`, and perhaps others. Then you may write code like: ``` data WF : Env -> Type -> Set where wf-int : WF T Int wf-var : (inScope x T) -> WF T (Var x) wf-arr : (WF T A) -> (WF T B) -> WF T (A -> B) wf-forall : WF (tvar T) A -> WF T (forall A) ``` Then for the `Closed`, you may name your instance constructors `clo-int`, `clo-var`, `clo-arr`, `clo-forall`. You're duplicating things here. And duplication is more than tedious to write, but also hard to maintain and predict. All those judgments are inductive definitions over types, and they should share the same part of the name. In lean, you can write ``` inductive WF : Env -> Type -> Prop | int : WF T Int | var : (inScope x T) -> WF T (Var x) | arr : (WF T A) -> (WF T B) -> WF T (A | forall : WF (tvar T) A -> WF T (forall A) inductive Closed : Env -> Type -> Prop | int : Closed T Int | var : (inScope x T) -> Closed T (Var x) | arr : (Closed T A) -> (Closed T B) -> Closed T (A -> B) | forall : Closed (tvar T) A -> Closed T (forall A) ``` two judgments will automatically create namespaces `WF` and `Closed`, and then you can refer them as `WF.int`, and `Closed.int`. What makes things more interesting, is Lean has the feature of namespace inference, in most places you can just put `int` or `.int`, without opening the namespace. ## Metaprogramming This is more like a *for-fun* than a *for-profit* feature to me. Racket is the language that takes me to the PL research, I used it in my undergrad thesis, and I even bet half of my time to sell it to my supervisor in the application interview. I always look forward to a language that has macro systems not just about S-expressions. There's a Racket2 project, but my work stops me from distracting myself to it, I haven't tried it seriously. Now there's a theorem prover with a powerful macro system, why not? I said it's for fun but it's actually really useful in at least two places: (1) Extending tactics or writing new tactics; (2) EDSL for the system modelled. ## Automation I use scores to make my point clear, but scores are only chosen based on my impression when writing this post, and it does not mean anything statistically. Again, I'm not an expert on neither languages. I can be wrong and biased, and I do not intend to start a holy war. * In Agda you can get 1/5 level of automation with 5/5 efforts about tinkering with reflection, ring-solver. * In Rocq you can get 4/5 level of automation with 1/5 efforts, thanks to amazing tools like *CoqHammer*; you can get 5/5 level of automation with 5/5 efforts by choosing good tools, configuring good search bases and writing good tactics. * In Lean you can perhaps get 2/5 level of automation with 1/5 efforts, get 5/5 level of automation with 4/5 efforts. I'm quite ok with Lean in terms of automation, and it may require more tweaks, but on the other hand, you have more control over it. ## Dependent Types As a modern dependently typed language, Agda and Lean share a lot of similarities. ### 1. Dependent Pattern Matching Dependent pattern matching is super helpful when destructing two related expressions. For example, to prove the determinism of a small-step operational semantics, you may have a lemma like: ``` theorem step_det : (e --> e1) -> (e --> e2) -> e1 == e2 ``` Often we induction on the first `step` and destruct the second `step`. If you reason it in a tactic rewriting style, the proof tends to shape like ``` induction h1 generalizing b2 case PlusConstConst => cases h2 with | PlusConstConst => rfl | Plus1 hstep => trivial | Plus2 hstep => trivial case Plus1 h1 ih => cases h2 with | PlusConstConst => trivial | Plus1 h2' => have ih' := ih _ h2'; rw [ih'] | Plus2 h2' => simp; trivial case Plus2 h1' ih => cases h2 with | PlusConstConst => trivial | Plus1 h2' => simp; trivial | Plus2 h2' => have ih' := ih _ h2'; rw [ih'] ``` However, we only care about three cases, which can be captured by using dependent pattern matching: ``` match h1, h2 with | .PlusConstConst, .PlusConstConst => rfl | .Plus1 h1', .Plus1 h2' => ... | .Plus2 h1', .Plus2 h2' => ... ``` But you can use some dummy-cases filtering tactics like `<;> try (solve | trivial)` to achieve the same effect, with a bit proof noise: ``` induction h1 generalizing b2 case PlusConstConst => cases h2 <;> try (solve | trivial) case Plus1 h1 ih => cases h2 <;> try (solve | trivial) case Plus1 h2' => have ih' := ih _ h2'; rw [ih'] case Plus2 h1' ih => cases h2 <;> try (solve | trivial) case Plus2 h2' => have ih' := ih _ h2'; rw [ih'] ``` I'm glad that Lean supports both styles. ### 2. Indexed Datatypes Agda has a good support for indexed datatypes, and I use well-scoped de Bruijn indices, where to use natural numbers as indices to indicate the available number of free variables in the environment. But there seems to be a problem when I want more computable and richer indices in Agda, but Agda rejects it. For example, Agda cannot match terms with `Type (1 + m)` and terms will `Type (m + 1)`, and will report an error that cannot unify two terms. It could happen to generalize the index here to arbitrary `n`, and make some proof. But it is still tedious to write. I don't have a good idea about how to have a good way to write it in Agda. But in Lean, it works well, and it leaves a place for users to filling the proof between indices, and you can use tactics to automate it, since properties around indices are often trivial. Maybe after later, when I have more experience with Lean in terms of indexed datatypes, I might write a more detailed post about it. ### 3. Mutual Induction Direct support for mutual induction is my main reason for using Agda, since my research involves a lot of mutually defined judgments. Rocq prover seemed to generate wrong mutual induction principles in old days, but I'm not sure how it behaves now. Lean has a built-in support for mutual induction, which is nice to know. ## Great editor support: VS Code I have been using Emacs for eight years, I am into its philosophy, I have no complaints about `agda2-mode`. But, Lean on VS Code, is just, fantastic. It's highly recommended trying it out. Also, the info-view (aka goal view) looks to be extensible. Once a time I want to hack into the `agda2-mode` info view to generate better goals, but gave up after a few hours exploring ELisp and Haskell codebase and getting lost. ## AI Assistance Lean is chosen as a platform for boosting and benchmarking AI4Math research. I haven't tried the specialized tools&models like [Lean Copilot](https://github.com/lean-dojo/LeanCopilot), but I did try to use Claude4 to generate Lean code, given some finished Agda proof I let it read and generate the corresponding Lean proof for me, for routine lemmas like commutativity between shifts, it took several iterations to get a correct proof, but works well. ## Final Words I will still choose Agda for small projects, it does a great job for hands-on experimentation and quick prototyping. However, for larger projects and more complex formalization, tactics-based proof assistants like Lean seem to make the life easier. At the time of writing this post, I'm still on the transition and getting myself familiar with Lean. In the part 2, I will give a deeper discussion about Lean. For those who are interested, I want to share some references about Lean, in the fields of PL: * [PLFA in Lean](https://github.com/rami3l/PLFaLean): A port of the PLFA book in Lean. * [FSub in Lean](https://github.com/Linyxus/cslib/tree/wellscoped-fsub): A well-scoped de Bruijn version of FSub formalized in Lean, using parallel substitution. * [Coq to Lean Cheatsheet](https://github.com/funexists/coq-lean-cheatsheet)