My first journey with Rust

In more than 10 years of programming, I have tried to learn Java, Python, Golang, PHP and JavaScript.
C and C++, of course, are painful memories from when I first went to college.
For me, learning a new programming language is often a painful process:
you have to drop some of the syntax expectations you carried over from the previous language,
and at the same time you have to tune yourself to the design logic of the whole language.
With Rust, this process was especially painful.
But when you achieve something, even something small, you are also very happy.

After two months of learning from the Rust documentation,
I had an ambitious idea: rewrite the Apache Uniffle server in Rust.

As an important optimization for big data computing frameworks,
Uniffle takes over the shuffle data of a large number of Spark computing tasks.
A service that writes and reads this temporary data must not only provide high performance and high stability,
but also keep memory consumption as low as possible.
Because of the terrible GC problems of gRPC Java, Netty is often used in the Java world to avoid GC.

But this time I was longing for Rust's GC-free design and its performance,
and I also hoped it could become the foundation of my future big data components.
With that in mind, I started to rewrite Apache Uniffle, which was the beginning of this journey.

Rewriting the Uniffle server is a challenging task.
The whole Rust project includes gRPC protocol support, support for various types of persistent
storage, asynchronous/synchronous coordination, and more.

gRPC support

I found that the Rust world is full of creative people who love to share.
Thanks to good package management, I quickly built
the skeleton of the gRPC interface with tonic.
But because tonic does not support using the bytes crate for protobuf bytes fields, we had to use
some workarounds.

Similarly, compared to the Java ecosystem, tonic lacks documentation
and examples, which made it hard for me to understand how to implement monitoring for gRPC,
including network connections, throughput, etc. Fortunately, once I grasped the key points,
I found some projects that also use tonic and learned its more advanced usage from them.

Anyway, this is a great project for me!

Support for multiple storages

Uniffle needs to write some large data blocks to HDFS.
I must say that the Rust world has no easy-to-use and complete HDFS client.
Fortunately, Apache OpenDAL provides a basic JNI-based HDFS client. Because my work
is in big data infrastructure, I could quickly use OpenDAL to access HDFS.
For those who have not been exposed to Hadoop, the complicated configuration will take a lot of time.

Thanks to the open source community, I was able to quickly implement support for HDFS.
At the same time, I also contributed some pull requests to give back to the open source community.

tokio's journey

As far as I know, tokio is the most widely used asynchronous runtime in the Rust community.
Candy brings happiness, but makes you fat. With tokio as my asynchronous executor,
I successfully ran the demo end to end with a small amount of data.

But when I compared the Rust version of Uniffle (I call it Riffle) with the Java version,
the results were very poor: Riffle was more than 2x slower.
So I started a long journey of investigation and performance tuning.
I think it gave me a deeper understanding of some Rust features, and it brought
the Rust implementation a step closer to production grade.

First, we enabled CPU profiling and used tikv's jemalloc package,
which can expose the associated CPU counters and draw flame graphs.
Many thanks to a teammate, who implemented this part.

But the flame graph did not reveal the performance bottleneck,
so unfortunately I had to keep troubleshooting, falling back to the most basic method: logging.
Along the way I began to suspect tokio's scheduling performance,
because when Riffle receives writes, the concurrency is as high as 4000-5000
and the read and write pressure is very high.

I went looking for a painless tool to confirm my hunch.
tokio console was one option. From its dashboard
I could see some problems, but I had no way of knowing which crate or code segment was causing them,
which worried me a lot.

After searching on Google, I settled on one tool: await-tree, which can expose the wait time of an async execution.
This was very important to me, because it let me probe every gRPC request and see where the problem was,
that is, which await point was executing very slowly.

After using it, the problem was immediately exposed. The tokio mutex was causing
too many context switches, and performance was very bad.
After I replaced it with the std mutex in that part of my code, performance improved noticeably.
The asynchronous lock is a big trap for beginners: it doesn't make you deadlock,
but it brings serious performance problems under high concurrency. In the future,
I will use other tools and add more metrics to further improve performance.
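
To make the std mutex point concrete, here is a minimal sketch of the pattern, not the actual Riffle code. The names StagingBuffer, append and run_writers are hypothetical, and plain threads stand in for tokio tasks so the example needs no external crates. The idea is that the lock guard lives only inside a short synchronous function, so in async code it would never be held across an `.await`. The tokio documentation itself notes that the ordinary std mutex is often preferred in that situation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory staging buffer: partition id -> buffered bytes.
#[derive(Default)]
struct StagingBuffer {
    inner: Mutex<HashMap<u32, Vec<u8>>>,
}

impl StagingBuffer {
    /// Append a block and return the new size of that partition's buffer.
    /// The guard is dropped when this function returns, so the critical
    /// section is short and never spans a suspension point.
    fn append(&self, partition: u32, block: &[u8]) -> usize {
        let mut map = self.inner.lock().unwrap();
        let buf = map.entry(partition).or_default();
        buf.extend_from_slice(block);
        buf.len()
    }

    /// Total number of buffered bytes across all partitions.
    fn total_bytes(&self) -> usize {
        self.inner.lock().unwrap().values().map(Vec::len).sum()
    }
}

/// Simulate concurrent writers. Threads stand in for async tasks here
/// (an assumption made to keep the sketch dependency-free).
fn run_writers(writers: u32, blocks_per_writer: u32) -> usize {
    let buffer = Arc::new(StagingBuffer::default());
    let handles: Vec<_> = (0..writers)
        .map(|w| {
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || {
                for _ in 0..blocks_per_writer {
                    // Each writer appends 16-byte blocks to one of 4 partitions.
                    buffer.append(w % 4, &[0u8; 16]);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    buffer.total_bytes()
}
```

Calling run_writers(8, 100) appends 800 blocks of 16 bytes each and returns 12800. If the work under the lock really has to await something, an async mutex is still the right tool; this sketch only covers short, purely synchronous critical sections.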

It's been a really interesting journey: I not only built an ambitious
little project from 0 to 1, but also gained insight into different kinds of performance profiling.

Finally, I would like to give beginners a few development suggestions that
I picked up during this Rust journey:

Don't get stuck staring at complicated macros, painful ownership rules, or advanced features like Pin. Writing a real project is the most important thing.
Do not use the tokio mutex on performance-critical paths.
Do not use nested DashMap structures. They also showed up in the flame graph as a cause of performance loss.
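
One way to avoid nesting concurrent maps is to flatten the levels into a single map with a composite key. The sketch below is only an illustration under my own assumptions, not how Riffle is implemented: the (app id, shuffle id, partition id) key, FlatIndex and demo_flat_index are hypothetical names, and a std HashMap behind an RwLock is used so the example has no dependencies. With the dashmap crate, the same tuple could serve as the key of a single DashMap.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical composite key: (app id, shuffle id, partition id).
type PartitionKey = (String, u32, u32);

/// One flat map instead of map -> map -> map.
#[derive(Default)]
struct FlatIndex {
    sizes: RwLock<HashMap<PartitionKey, u64>>,
}

impl FlatIndex {
    /// Add `bytes` to a partition and return its new total.
    /// A single lookup replaces one lookup (and one lock) per nesting level.
    fn add(&self, app: &str, shuffle: u32, partition: u32, bytes: u64) -> u64 {
        let mut sizes = self.sizes.write().unwrap();
        let total = sizes
            .entry((app.to_string(), shuffle, partition))
            .or_insert(0);
        *total += bytes;
        *total
    }

    /// Read the current total for a partition, if it exists.
    fn get(&self, app: &str, shuffle: u32, partition: u32) -> Option<u64> {
        let sizes = self.sizes.read().unwrap();
        sizes.get(&(app.to_string(), shuffle, partition)).copied()
    }
}

/// Small usage example: two writes to one partition, then two reads.
fn demo_flat_index() -> (u64, Option<u64>, Option<u64>) {
    let index = FlatIndex::default();
    index.add("app-1", 0, 7, 100);
    let total = index.add("app-1", 0, 7, 50);
    (total, index.get("app-1", 0, 7), index.get("app-1", 1, 7))
}
```

Here demo_flat_index() returns (150, Some(150), None). The trade-off is that operations over a whole app or shuffle, such as deleting everything for one app, now need a scan or a secondary index.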

Here is the result of my Rust journey: https://github.com/zuston/riffle