I visit news.ycombinator.com (Hacker News) at least a few times a day. It often features interesting news items and intelligent comments. Last week I found a language comparison post by losvedir. The author came up with an interesting and compact problem: serving merged data from two CSV files over HTTP. So far he had tested the idea only with static/compiled programming languages.
Since Rye is focused on data/information processing and also on "the backend", this sounded like a perfect job for it. Rye also has this very high-level datatype called Spreadsheet (more about the reasoning in the previous blog-post).
This datatype, together with some powerful functions, an elegant setup for an HTTP server, and easy loading of CSV files, made the whole Rye code very simple. While the Go solution used 180 lines of code and the Rust one 200, Rye (which is a dynamic language) took just 38 lines.
The problem
The program needs to load two CSV files. The first (trips) has 72k lines, the second (stop_times) 1.7M. The program then starts an HTTP server that uses part of the URL to determine the route. It needs to find all the trips for this route, and then for each trip the stop times (from the second CSV). It returns a two-level JSON structure of all that data.
This is the Rye code that does it; you can find it in our GitHub repo:
You can find all 180 lines of Go code here, and Rust code here.
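To make the shape of the problem concrete, here is a minimal Go sketch of the task described above. It is not the code from either linked repository. The struct fields and JSON keys follow GTFS column names, and the `/schedules/` path prefix, `schedulesForRoute`, and `newHandler` are assumptions made for illustration. Tiny in-memory slices stand in for the two CSV files.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Trip and StopTime hold only the columns this sketch needs.
// The names are assumptions based on GTFS column names.
type Trip struct{ RouteID, ServiceID, TripID string }
type StopTime struct{ TripID, StopID, Arrival, Departure string }

// StopResponse is the inner level of the result: one stop of one trip.
type StopResponse struct {
	StopID    string `json:"stop_id"`
	Arrival   string `json:"arrival_time"`
	Departure string `json:"departure_time"`
}

// TripResponse is the outer level: a trip with its stop times nested inside.
type TripResponse struct {
	TripID    string         `json:"trip_id"`
	ServiceID string         `json:"service_id"`
	RouteID   string         `json:"route_id"`
	Schedules []StopResponse `json:"schedules"`
}

// schedulesForRoute finds all trips on a route and, for each trip,
// scans every stop time for matching rows (no index yet).
func schedulesForRoute(route string, trips []Trip, stopTimes []StopTime) []TripResponse {
	result := []TripResponse{}
	for _, t := range trips {
		if t.RouteID != route {
			continue
		}
		tr := TripResponse{TripID: t.TripID, ServiceID: t.ServiceID,
			RouteID: t.RouteID, Schedules: []StopResponse{}}
		for _, st := range stopTimes {
			if st.TripID == t.TripID {
				tr.Schedules = append(tr.Schedules,
					StopResponse{st.StopID, st.Arrival, st.Departure})
			}
		}
		result = append(result, tr)
	}
	return result
}

// newHandler takes the route from the URL path and answers with JSON.
// The "/schedules/" prefix is an assumption of this sketch.
func newHandler(trips []Trip, stopTimes []StopTime) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		route := strings.TrimPrefix(r.URL.Path, "/schedules/")
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(schedulesForRoute(route, trips, stopTimes)); err != nil {
			log.Println("encode:", err)
		}
	})
}

func main() {
	// Tiny in-memory stand-ins for the two CSV files.
	trips := []Trip{{"Red", "weekday", "t1"}, {"Blue", "weekday", "t2"}}
	stopTimes := []StopTime{
		{"t1", "s1", "08:00:00", "08:01:00"},
		{"t1", "s2", "08:05:00", "08:06:00"},
		{"t2", "s9", "09:00:00", "09:01:00"},
	}

	// Start a throwaway local server and request one route from it.
	srv := httptest.NewServer(newHandler(trips, stopTimes))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/schedules/Red")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```

Running it prints a JSON array with one element for trip `t1`, whose `schedules` array holds the two matching stop times. Note that the inner loop walks every stop time for every trip.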
Loading times
Since the CSV files are big, the author benchmarked loading times (his results).
Go was among the fastest at this. I can see from the author's results that he has a much faster "computer" than me :).
Rye features some unboxed data structures, useful mostly for bulk data: values are loaded and stored unboxed, and are boxed lazily, on demand. List and Dict are such structures, and Spreadsheet has this mode too.
That is why Rye loads CSV into the Spreadsheet value as fast as Go does. I compared it with his Go example locally.
Seek times
For some routes, like "Red" on the screenshot, there were a couple thousand trips in the trips CSV, and the program needed to search for the related stop times among all 1.7M rows for each of them. This proved to be far too slow.
That is why I added simple indexing to the Spreadsheet type. This made it at least 100x faster, and usable. I load-tested the server with the author's script, and while it's nowhere near the Go solution, it managed 228 requests/second, which is not so bad for a work-in-progress interpreted language.
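A simple index of this kind can be sketched in Go as a map from a column value to the positions of the rows that hold it. This is an assumption about the general technique, not a description of how the Spreadsheet index is implemented in Rye; `Index`, `buildIndex`, and `scan` are hypothetical names.

```go
package main

import "fmt"

// StopTime holds a few columns of one stop_times row.
// Column names follow GTFS and are assumptions of this sketch.
type StopTime struct{ TripID, StopID, Arrival string }

// Index maps a trip ID to the positions of all rows holding it.
type Index map[string][]int

// buildIndex makes one pass over the rows, so each later lookup
// is a single map access instead of a scan over every row.
func buildIndex(rows []StopTime) Index {
	idx := make(Index)
	for i, r := range rows {
		idx[r.TripID] = append(idx[r.TripID], i)
	}
	return idx
}

// scan is the unindexed baseline: it checks every row.
func scan(rows []StopTime, tripID string) []int {
	var hits []int
	for i, r := range rows {
		if r.TripID == tripID {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	rows := []StopTime{
		{"t1", "s1", "08:00:00"},
		{"t2", "s9", "09:00:00"},
		{"t1", "s2", "08:05:00"},
	}
	idx := buildIndex(rows)
	fmt.Println(scan(rows, "t1")) // prints [0 2]
	fmt.Println(idx["t1"])        // prints [0 2]
}
```

Without an index, a route with about 2,000 trips costs roughly 2,000 × 1.7M = 3.4 billion comparisons per request. With the index, the 1.7M rows are walked once at load time and each request does about 2,000 map lookups.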
Code size
Even with very high-level languages, there is no magic that would reduce any general code to 20% of, for example, the Go code. The Rye code here is so much shorter because we have this very high-level datatype (Spreadsheet) that matched what we needed to do so well, while the author had to do it all "by hand" with less appropriate structures.
Below is the Rye vs. Go code, for illustration.
Code beauty
Beauty is in the eye of the beholder, but I think the code really came out beautifully in this case. If you haven't already, read the previous post about the high level ideas in Rye.