Computer scientists and mathematicians discuss the impact of Artificial Intelligence on Mathematics research
The JMM panel, titled “The use of AI tools in Mathematics research,” was organized by Swarat Chaudhuri, Ayush Khaitan and Amitayush Thakur. It brought together leading mathematicians and computer scientists working at the intersection of AI and mathematics. The motivation for the panel was twofold: to have these experts describe the current state of adoption of AI tools in mathematics research, and to give the computer science community a better understanding of the kinds of tools that would be most useful to mathematicians.
Given below are some highlights from our panel discussion. Any factual errors are the author’s alone.
Before the discussion began, the moderator welcomed the panelists and explained the structure of the panel: the moderator would pose questions either to a particular panelist or to the panel as a whole, and panelists were free to weigh in on either kind of question. They were also encouraged to get technical and to draw on their own research to inform their answers.
The discussion began with the moderator asking all the panelists about the kinds of AI tools currently being used in mathematics research. Alex Kontorovich highlighted that foundational large language models (LLMs) like ChatGPT and Claude are widely used by mathematicians in their research. He said that discussing research ideas with a powerful LLM is akin to talking to an expert over tea in his department.
Alex also mentioned that tools such as Lean are slowly gaining traction, along with FunSearch, AlphaGeometry, and PatternBoost. He noted that while DeepMind is developing the Alpha family of technologies, giving mathematics professors access to these tools could significantly help their research programs. Thomas agreed that open access to such technologies would be helpful to mathematicians at universities.
Francois Charton shared his experience from a semester-long workshop at Harvard CMSA, where he helped solve hard math problems using basic transformers. He emphasized that transformers can capture complex mathematical concepts and can produce solutions to problems that challenge even professional mathematicians.
The panel then turned to technologies like FunSearch and PatternBoost and their current use in constructing examples and counterexamples in combinatorics. Francois noted that combinatorics problems are especially amenable to such methods because of their finite search spaces; extending these technologies to fields like differential geometry or analysis, however, presents challenges.
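To make the shape of these tools concrete, here is a minimal sketch (our illustration, not code from FunSearch or PatternBoost) of the search loop they share: propose candidate combinatorial objects, score them with an exact evaluator, and keep the best to seed the next round. The proposer here is stubbed with random edge-toggling, and the toy objective, finding a large triangle-free graph, is our own choice; in the real systems the proposer is an LLM writing programs (FunSearch) or a transformer trained on the current best examples (PatternBoost).

```python
import random

N = 12          # number of vertices (toy scale)
ROUNDS = 200    # search iterations
POOL = 20       # candidates kept per round

def is_triangle_free(edges):
    """Check that no three vertices are mutually adjacent."""
    adj = {v: set() for v in range(N)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # An edge (u, v) with a common neighbor would form a triangle.
    return all(not (adj[u] & adj[v]) for u, v in edges)

def score(edges):
    """Exact evaluator: more edges is better, triangles disqualify."""
    return len(edges) if is_triangle_free(edges) else -1

def mutate(edges):
    """Stub proposer: toggle one random edge. FunSearch would instead
    ask an LLM to rewrite a generating program; PatternBoost would
    sample from a transformer trained on the current best examples."""
    u, v = random.sample(range(N), 2)
    e = (min(u, v), max(u, v))
    new = set(edges)
    new.symmetric_difference_update({e})
    return frozenset(new)

pool = [frozenset()]  # start from the empty graph
for _ in range(ROUNDS):
    pool += [mutate(random.choice(pool)) for _ in range(POOL)]
    pool = sorted(set(pool), key=score, reverse=True)[:POOL]

best = pool[0]
print(f"best triangle-free graph found: {len(best)} edges "
      f"(optimum for n={N} is {N * N // 4} by Mantel's theorem)")
```

The division of labor is the point: the proposer can be arbitrarily clever or arbitrarily wrong, because the evaluator is exact. That is why finite combinatorial search spaces are such a natural fit, and why fields without a cheap exact evaluator are harder.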
The moderator suggested that even in geometry, where there is often a finite list of manifolds to consider, automation through LLMs coupled with tools like FunSearch or PatternBoost could be feasible. Francois agreed, proposing that geometry problems could be tackled by decomposing manifolds into finitely many components, a potential direction for future exploration.
The conversation then shifted to AlphaProof, an AI system that performs "almost" at the level of a gold medalist at the International Mathematical Olympiad (IMO). Thomas said that grounding LLMs with formal theorem provers like Lean is expected to yield significant results in proving research-level mathematical statements. He clarified that while AlphaProof had been trained on IMO problems, it had been trained much the way a human would train for the competition, rather than by memorizing large datasets of solutions; this is why the approach holds promise for research-level mathematics.
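For readers who have not seen Lean, the "grounding" Thomas described means that the model's output must be a proof that Lean's kernel can check mechanically; a plausible-sounding but wrong argument simply fails to compile. A toy Lean 4 example (our illustration, not an AlphaProof output):

```lean
-- The formal statement: adding zero on the left leaves n unchanged.
-- A model in the AlphaProof style must produce the tactic proof below;
-- Lean's kernel then accepts or rejects it, with no room for rhetoric.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                 -- base case: 0 + 0 = 0 by computation
  | succ k ih =>                -- inductive step, with ih : 0 + k = k
    rw [Nat.add_succ, ih]
```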
Kaiyu Yang elaborated on AlphaProof's performance, noting that it successfully solved 4 out of 6 problems at the 2024 IMO. The two combinatorics problems that went unsolved required translating non-standard terms like "monsters" into Lean, which suggests that a more standard phrasing of the questions might have let AlphaProof perform even better. Kaiyu posited that the integration of LLMs with formal theorem provers holds promise for tackling the most challenging problems in mathematics.
The moderator asked Kaiyu how confident he was in the opinion stated in his recent position paper: that LLMs should be coupled with formal theorem provers, which are needed to "check the LLM's work," in order to reliably solve complex math problems. After all, OpenAI's o3 is not coupled with a formal theorem prover, and it solves about 25% of the questions in Epoch AI's FrontierMath benchmark.
Kaiyu responded that although o3 does not use a formal theorem prover, it probably does use a learned verifier. It remains an open question whether it is better to couple LLMs with verifiers or with formal theorem provers.
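The distinction is easy to state in code. Below is a schematic contrast (our illustration; o3's internals are not public, and every function here is a hypothetical stand-in) between best-of-n sampling with a learned verifier, which yields a plausible answer with no guarantee, and formal checking, which yields a guarantee but only for fully formalized statements.

```python
import random

def sample_candidate_proofs(n):
    """Stand-in for an LLM sampling n candidate solutions."""
    return [f"candidate proof #{i}" for i in range(n)]

def learned_verifier_score(proof):
    """Stand-in for a learned verifier (reward model): a heuristic
    score in [0, 1]. High-scoring answers are probably right, with
    no guarantee."""
    return random.random()

def formal_checker(proof):
    """Stand-in for Lean: accepts or rejects with certainty, but
    only applies once the statement and proof are formalized."""
    return proof.endswith("#0")  # toy acceptance rule

candidates = sample_candidate_proofs(8)

# Best-of-n with a learned verifier: pick the highest-scoring sample.
best = max(candidates, key=learned_verifier_score)
print("verifier picks:", best, "(plausible, not certain)")

# Formal checking: keep only candidates the kernel actually accepts.
proved = [c for c in candidates if formal_checker(c)]
print("formally checked:", proved, "(certain, but needs formalization)")
```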
Addressing the skepticism around LLMs' understanding of mathematical concepts, Francois Charton argued that mathematical formulas can be encoded as sentences. In his research at Harvard CMSA, he demonstrated that transformers could translate mathematical questions and formulas into answers much as one might translate English sentences into French.
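Concretely, in the line of work Charton described (see Lample and Charton's "Deep Learning for Symbolic Mathematics"), each expression tree is serialized into a sequence of tokens in prefix (Polish) notation, so a standard sequence-to-sequence transformer can map question sequences to answer sequences exactly as in machine translation. A minimal sketch of that encoding, with a tree representation and token names of our own choosing:

```python
# Encode an expression tree in prefix (Polish) notation, the sequence
# format used in transformer-for-math work. Tuples are
# (operator, *children); leaves are variable names or integers.
def to_prefix(expr):
    if isinstance(expr, tuple):           # internal node: operator
        op, *args = expr
        return [op] + [tok for a in args for tok in to_prefix(a)]
    return [str(expr)]                    # leaf: variable or constant

# d/dx (x^2 + sin x) as a tree:
question = ("add", ("pow", "x", 2), ("sin", "x"))
# its derivative, 2x + cos x:
answer = ("add", ("mul", 2, "x"), ("cos", "x"))

print(to_prefix(question))  # ['add', 'pow', 'x', '2', 'sin', 'x']
print(to_prefix(answer))    # ['add', 'mul', '2', 'x', 'cos', 'x']
# A transformer is then trained to map the first token sequence to
# the second, exactly like translating an English sentence to French.
```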
Alex Kontorovich seemed unconvinced, and recalled that ChatGPT's ability to play chess is fairly limited, although it has ingested most books about the game. He suggested that without a deeper understanding, LLMs might not reliably solve math problems merely by predicting the next token. Francois, however, maintained that transformer architectures are capable of reliably solving most mathematical problems and logical puzzles that they are trained or fine-tuned on.
All in all, the discussion was electric, and the experts skillfully wove intuitive explanations and technical details into helpful, complete answers for the 100-odd mathematicians and computer scientists in the audience. By the end of the panel, there was a palpable feeling in the audience that something quite fundamental had changed about the nature of mathematics research, and that this wouldn't be the last panel on AI tools and math.