The Rust programming language has been gaining sustained popularity over the last few years. At the same time it also has its own share of passionate detractors.
Some time ago, Richard Apodaca suggested Rust as a language to consider for chemoinformatics in his Depth-First piece, Cheminformatics in Rust. The site as a whole is a rich mine of chemoinformatics insight. Having explored Rust on my own, I believe it can be a wonderful fit for chemoinformatics.
Chemoinformatics (sometimes written as cheminformatics) applies computational and information science methods to chemical data, enabling researchers to store, search, and analyze molecular information to accelerate discovery.
Rust is gaining popularity in systems software thanks to its impressive performance, and programmers in other language ecosystems are also adopting it to rewrite foundational tooling such as compilers and type checkers. For example, in the Python ecosystem, Astral has used Rust to build the much-acclaimed uv toolkit for package management, and the same team also ships the Ruff linter.
However, no other language ecosystem has embraced Rust as widely as the JavaScript/TypeScript ecosystem. Much of its tooling is being rewritten in Rust, from Oxc challenging Babel and ESLint to Turbopack accelerating bundling.
It came as a surprise (even an unjustifiable one to some Rust diehards) that Microsoft chose to rewrite its TypeScript compiler in Go instead of Rust. Be that as it may, the reasons Microsoft gave for choosing Go make a lot of sense from a technical and portability standpoint.
The proliferation of Rust in the JavaScript ecosystem has even led some to call Rust a key component of the Third Age of JavaScript. I believe the chemoinformatics software ecosystem can benefit from a similar wave of Rust adoption.
Existing chemoinformatics software
As of now, most mature chemoinformatics software is written in C/C++ or Java. Take, for example, the popular trifecta of open source chemoinformatics toolkits: Open Babel, RDKit and CDK. They are primarily written in C++, C++ and Java respectively.
With the wide adoption of Python in the data science and AI/ML ecosystem, it made sense to elevate Python as a first class citizen in cheminformatics software too. It is thus not a surprise that most of the existing C++/Java toolkits expose Python bindings while a good number of new cheminformatics packages are being entirely written in Python.
It would thus seem that Python should be the natural choice for cheminformatics software. Why propose Rust? Python has a gentler learning curve and a wide range of mature data analysis libraries such as pandas and ML toolkits such as scikit-learn.
What I am proposing is not an either/or choice between Python and Rust; that would be an artificial distinction. The chemoinformatics ecosystem can benefit further by embracing both Rust and Python, instead of one or the other.
Where we should direct our replacement energy is the legacy C++ and Java code that still underpins many core cheminformatics engines. Rust delivers the same systems-level control and performance as C++ while offering memory safety guarantees and modern tooling, making it a natural successor for that layer. By rewriting the performance-critical kernels in Rust and keeping Python as the ergonomic interface, teams can incrementally modernize without sacrificing speed or usability.
Below I will try to establish the case for Rust, and how it can be used together with Python. Among other reasons, Rust is a good choice for chemoinformatics software because of its performance (comparable to C/C++, often better than Java), memory safety, type safety, functional programming primitives, great interoperability with both C/C++ and Python, and a growing community and ecosystem.
Performance and memory safety
Computer memory can roughly be managed in two ways: manually, through explicit allocation and deallocation (ask the allocator for this much space, then hand it back when done), or automatically, through a garbage collector built into the language runtime. While a garbage collector removes the complexity of manual memory management and prevents many memory safety bugs, it comes with a performance cost.
As Rust has shown, you can have automatic memory management without the performance hit of a garbage collector. Instead of using a garbage collector, Rust does this by the use of a borrow checker and an affine type system. The Rust compiler enforces memory safety guarantees at compile time without the overhead of a garbage collector at runtime.
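To make the ownership idea concrete, here is a minimal sketch. The `Molecule` struct and the two helper functions are hypothetical names invented for illustration, not part of any existing toolkit. It shows a shared borrow that leaves the caller in control, and a move after which the compiler rejects any further use of the value.

```rust
/// A hypothetical, minimal molecule record used only for illustration.
#[derive(Debug)]
struct Molecule {
    name: String,
    // Heap-allocated; freed automatically when its owner goes out of scope.
    atom_masses: Vec<f64>,
}

/// Borrows the molecule immutably: the caller keeps ownership.
fn molecular_mass(mol: &Molecule) -> f64 {
    mol.atom_masses.iter().sum()
}

/// Takes ownership: the molecule is moved in, and whatever is not returned
/// is dropped (freed) when this function ends. No garbage collector runs.
fn consume(mol: Molecule) -> String {
    mol.name
}

fn main() {
    let water = Molecule {
        name: String::from("water"),
        atom_masses: vec![1.008, 1.008, 15.999],
    };

    // Shared borrow: `water` is still usable afterwards.
    let mass = molecular_mass(&water);
    println!("{} weighs about {:.3} u", water.name, mass);

    // Ownership moves into `consume`; `water` can no longer be used.
    let name = consume(water);
    println!("consumed {name}");

    // Uncommenting the next line is a compile-time error (use after move):
    // println!("{:?}", water);
}
```

The rejected line is the point of the exercise: the mistake is caught by the compiler, not discovered at runtime.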
However, I must also address the elephant in the room: dealing with the borrow checker has been deemed complex by some. This criticism is not without merit, and those who dismiss such concerns are doing a disservice to the language.
Dealing with the borrow checker is challenging and may take time to get right, but in my opinion the benefits far outweigh the inconvenience. It demands some patience, but the payoff is worth it. Hopefully future improvements to the language will make this easier.
Type safety and functional programming primitives
Rust is a statically and strongly typed language, meaning variables have types that are known and enforced at compile time. A type system not only enforces good data abstractions, it also encodes domain rules that catch whole classes of bugs before the program ever runs.
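As a small illustration of encoding a domain rule in a type, consider the sketch below. The `BondOrder` enum and its helper functions are hypothetical and deliberately simplified (aromatic and higher-order bonds are left out). A closed enum makes an invalid bond order unrepresentable, the exhaustive `match` forces every variant to be handled, and an iterator chain does the summing.

```rust
/// Bond orders as a closed set: a value outside these variants cannot exist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BondOrder {
    Single,
    Double,
    Triple,
}

impl BondOrder {
    /// Number of shared electron pairs. The `match` must cover every variant,
    /// so adding a variant later makes this function fail to compile until
    /// it is updated.
    fn electron_pairs(self) -> u8 {
        match self {
            BondOrder::Single => 1,
            BondOrder::Double => 2,
            BondOrder::Triple => 3,
        }
    }
}

/// Sums the bond orders around one atom using an iterator chain.
fn total_bond_order(bonds: &[BondOrder]) -> u8 {
    bonds.iter().map(|b| b.electron_pairs()).sum()
}

fn main() {
    // The carbon in formaldehyde: two C-H single bonds and one C=O double bond.
    let carbon_bonds = [BondOrder::Single, BondOrder::Single, BondOrder::Double];
    println!("total bond order: {}", total_bond_order(&carbon_bonds));
}
```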
The Rust type system, coupled with the functional programming primitives baked into the language, makes it a powerful tool. The typestate pattern, for instance, lets you encode data invariants at compile time so invalid states never compile, as explored in the Rust Typestate Pattern article.
In practice this means we can guard chemistry-specific invariants at compile time. Imagine an AtomBuilder that starts in an Unvalidated state and only transitions to Ready after we check valence, charge, and permitted hybridisation. The compiler refuses to let us call add_to_molecule() while the builder sits in Unvalidated, so a developer cannot accidentally emit an atom with a valence of five on carbon or an impossible charge distribution.
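A minimal sketch of that builder might look like the following. The names `AtomBuilder`, `Unvalidated`, `Ready` and `add_to_molecule` come from the description above. Everything else is an assumption made for illustration: the `Atom` and `Molecule` structs, the `validate` method, and the toy rule set, which covers four neutral elements and skips the hybridisation check.

```rust
use std::marker::PhantomData;

// Zero-sized marker types for the two builder states.
struct Unvalidated;
struct Ready;

#[derive(Debug, Clone, PartialEq)]
struct Atom {
    symbol: String,
    bond_count: u8,
    formal_charge: i8,
}

#[derive(Debug, Default)]
struct Molecule {
    atoms: Vec<Atom>,
}

/// The `State` parameter exists only at compile time.
struct AtomBuilder<State> {
    atom: Atom,
    _state: PhantomData<State>,
}

impl AtomBuilder<Unvalidated> {
    fn new(symbol: &str, bond_count: u8, formal_charge: i8) -> Self {
        AtomBuilder {
            atom: Atom { symbol: symbol.to_string(), bond_count, formal_charge },
            _state: PhantomData,
        }
    }

    /// Consumes the unvalidated builder; only a successful check yields `Ready`.
    fn validate(self) -> Result<AtomBuilder<Ready>, String> {
        // Toy rule set (an assumption): neutral atoms and a tiny valence table.
        let max_bonds: u8 = match self.atom.symbol.as_str() {
            "H" => 1,
            "C" => 4,
            "N" => 3,
            "O" => 2,
            other => return Err(format!("unsupported element: {other}")),
        };
        if self.atom.formal_charge != 0 {
            return Err(format!(
                "formal charge {} is outside this toy rule set",
                self.atom.formal_charge
            ));
        }
        if self.atom.bond_count > max_bonds {
            return Err(format!(
                "{} cannot form {} bonds (max {})",
                self.atom.symbol, self.atom.bond_count, max_bonds
            ));
        }
        Ok(AtomBuilder { atom: self.atom, _state: PhantomData })
    }
}

impl AtomBuilder<Ready> {
    /// This method exists only on `AtomBuilder<Ready>`.
    fn add_to_molecule(self, molecule: &mut Molecule) {
        molecule.atoms.push(self.atom);
    }
}

fn main() {
    let mut methane = Molecule::default();

    let carbon = AtomBuilder::new("C", 4, 0);
    // The next line does not compile: `AtomBuilder<Unvalidated>` has no such method.
    // carbon.add_to_molecule(&mut methane);

    match carbon.validate() {
        Ok(ready) => ready.add_to_molecule(&mut methane),
        Err(reason) => eprintln!("rejected: {reason}"),
    }

    // Five bonds on carbon never reaches the `Ready` state.
    assert!(AtomBuilder::new("C", 5, 0).validate().is_err());
    println!("atoms in molecule: {}", methane.atoms.len());
}
```

Note that the chemistry check itself still runs at runtime inside `validate`. What the type system guarantees is that nobody can skip it on the way to `add_to_molecule`.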
Great interoperability with C/C++ and Python
Many chemoinformatics workhorses, such as Open Babel, RDKit, and the Chemistry Development Kit, are written in C++ or Java. Their maintainers layer Python bindings on top so analysts can stay in Python. Picking Rust does not mean abandoning that mature code. Rust talks to native libraries cleanly through its FFI, and crates such as cxx and cc make binding to existing C++ code routine.
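The usual shape of such a binding is a thin safe wrapper around an unsafe C-ABI call. The sketch below shows that shape using only the standard library. The function `toy_count_double_bonds` is a hypothetical stand-in, written in Rust with the C ABI so the example builds without linking a native library. With a real library it would instead be a declaration in an `extern "C"` block, or generated by a binding crate.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::{c_char, c_int};

/// Stand-in for a function exported by a C library (an assumption for this
/// sketch). It counts '=' characters in a NUL-terminated SMILES string and
/// returns -1 for a null pointer, the way a C API might.
unsafe extern "C" fn toy_count_double_bonds(smiles: *const c_char) -> c_int {
    if smiles.is_null() {
        return -1;
    }
    // SAFETY: the caller promises a valid, NUL-terminated string.
    let bytes = unsafe { CStr::from_ptr(smiles) }.to_bytes();
    bytes.iter().filter(|&&b| b == b'=').count() as c_int
}

/// Safe wrapper: converts a Rust `&str` into a C string and keeps the
/// `unsafe` call confined to one small, auditable place.
fn count_double_bonds(smiles: &str) -> Result<c_int, NulError> {
    // Fails if the input contains an interior NUL byte.
    let c_smiles = CString::new(smiles)?;
    // SAFETY: `c_smiles` is a valid NUL-terminated string that outlives the call.
    let n = unsafe { toy_count_double_bonds(c_smiles.as_ptr()) };
    Ok(n)
}

fn main() {
    // Acetic acid, CC(=O)O, has one explicit double bond.
    match count_double_bonds("CC(=O)O") {
        Ok(n) => println!("double bonds: {n}"),
        Err(e) => eprintln!("invalid input: {e}"),
    }
}
```

Callers of `count_double_bonds` never touch raw pointers, which is the property a Rust wrapper around an existing C or C++ toolkit would aim to preserve.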
On the other side of the stack, Rust crates can ship first-class Python modules thanks to PyO3 and the maturin build tool, so you can expose fast Rust kernels to Python users without asking them to compile anything themselves.
A good example here is Polars, a direct competitor to pandas that is gaining traction thanks to its performance. It is written in Rust and exposes both a Rust and a Python API.
A growing userbase and flourishing ecosystem
Plenty of other languages promise pieces of the puzzle: OCaml, Haskell, or F# for functional programming, Odin or Zig for lean systems programming. But none of them combines Rust’s safety story with its momentum. Rust’s draw is not just the language on paper; it is the energetic community, the crates ecosystem, and the cadence of real-world adopters shipping production code every month.
How do we use Rust and Python in chemoinformatics software?
Use Python where performance is not critical: throwaway scripts, prototypes, and IO-bound operations where language choice makes little difference.
Where performance is critical, write the core in Rust and expose both a Rust and a Python API. Put the most polish into the API that will have the most users. For example, Polars chose to put more effort into making its Python API ergonomic and easy to use because most Polars users are Pythonistas.
Where we need a library written in C or C++, use the FFI. For example, InChI is written in C, and most chemoinformatics libraries use the official InChI C code in some form. I have been prototyping ways to use the InChI C code in a Rust package.
How do I get started with Rust?
If you are still here, hopefully I have convinced you to use Rust in your chemoinformatics projects. A few tips:
- The Rust Book is a good place to start; follow it up with the Rustlings exercises.
- Brush up on a few functional programming concepts (algebraic data types, pattern matching, iterators) with guides like Rust by Example or Programming Rust.
- Brush up on the fundamentals of how computer memory works, virtual memory, and pointers with Beej’s Guide to C Programming or CMU’s Computer Systems course notes.
There are many other wonderful resources for learning Rust on the internet, and LLMs or coding agents can be quite a help.