Spreadsheet in action - Transit data demo

I visit news.ycombinator.com (Hacker News) at least a few times a day. It often features interesting news items and intelligent comments. Last week I found a language comparison post by losvedir. The author came up with an interesting and compact problem: serving merged data from two CSV files over HTTP. So far the author has tested the idea only with static/compiled programming languages.

Since Rye is focused on data/information processing and also on "the backend", this sounded like a perfect job for it. Rye also has this very high-level datatype called Spreadsheet (more about the reasoning in the previous blog-post).

This datatype, together with some powerful functions, an elegant setup for an HTTP server and easy loading of CSV files, made the whole Rye program very simple. While the Go solution used 180 lines of code and the Rust one 200, Rye (which is a dynamic language) took just 38 lines.

The problem

The program needs to load two CSV files. The first (trips) has 72k lines, the second (stop_times) 1.7M. Then the program starts an HTTP server which uses a part of the URL to determine the route. The program needs to find all the trips for this route and then, for each trip, the stop times (from the second CSV). It returns a two-level JSON structure of all that data.
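To make the shape of the task concrete, here is a minimal Go sketch of the join described above. It is not the author's Go solution and not the Rye code; the column names (trip_id, route_id, stop_id, arrival_time) and the JSON key "schedules" are assumptions for illustration, and the lookup is the naive one that scans every row.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trip is one row of the trips CSV (column names are assumed).
type Trip struct {
	TripID  string
	RouteID string
}

// StopTime is one row of the stop_times CSV (column names are assumed).
type StopTime struct {
	TripID  string `json:"-"`
	StopID  string `json:"stop_id"`
	Arrival string `json:"arrival_time"`
}

// TripResponse is the outer level of the two-level JSON result.
type TripResponse struct {
	TripID    string     `json:"trip_id"`
	Schedules []StopTime `json:"schedules"`
}

// BuildResponse collects the trips of one route and, for each trip,
// its stop times. Both lookups are plain linear scans.
func BuildResponse(route string, trips []Trip, stopTimes []StopTime) []TripResponse {
	out := []TripResponse{}
	for _, t := range trips {
		if t.RouteID != route {
			continue
		}
		r := TripResponse{TripID: t.TripID, Schedules: []StopTime{}}
		for _, st := range stopTimes {
			if st.TripID == t.TripID {
				r.Schedules = append(r.Schedules, st)
			}
		}
		out = append(out, r)
	}
	return out
}

func main() {
	trips := []Trip{{"t1", "Red"}, {"t2", "Blue"}}
	stopTimes := []StopTime{
		{"t1", "s1", "08:00:00"},
		{"t1", "s2", "08:05:00"},
		{"t2", "s9", "09:00:00"},
	}
	body, err := json.Marshal(BuildResponse("Red", trips, stopTimes))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

In a real server the route would come from the request URL and the two slices from the loaded CSV files.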

This is the Rye code that does it. You can find it in our GitHub repo:

You can find all 180 lines of Go code here, and Rust code here.


Loading times

Since the CSV files are big, the author benchmarked loading times (his results). Go was among the fastest at this. I can see from the author's results that he has a much faster "computer" than I do :).
 
Rye features some unboxed data structures, useful mostly for bulk data: values are loaded and stored unboxed, and are boxed lazily, on demand. List and Dict work this way, and Spreadsheet has this mode too.
 
That is why Rye loads CSV into the Spreadsheet value as fast as Go does. I compared it with his Go example locally.
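The idea of storing cells unboxed and boxing them only on demand can be sketched in Go roughly like this. It is an illustration of the general technique, not Rye's actual implementation; the Value and RawTable types and the Box method are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// Value stands in for an interpreter's boxed value (hypothetical type).
type Value struct {
	Kind string // "integer" or "string"
	Int  int64
	Str  string
}

// RawTable keeps every cell exactly as it was read from the CSV,
// so loading does no per-cell conversion or wrapping.
type RawTable struct {
	Cols []string
	Rows [][]string
}

// Box wraps a single cell in a Value only when a caller asks for it.
func (t *RawTable) Box(row, col int) Value {
	raw := t.Rows[row][col]
	if n, err := strconv.ParseInt(raw, 10, 64); err == nil {
		return Value{Kind: "integer", Int: n}
	}
	return Value{Kind: "string", Str: raw}
}

func main() {
	t := &RawTable{
		Cols: []string{"trip_id", "stop_sequence"},
		Rows: [][]string{{"t1", "3"}},
	}
	fmt.Println(t.Box(0, 0).Kind, t.Box(0, 1).Kind)
}
```

With 1.7M rows, most cells are never touched by a single request, so the wrapping cost is paid only for the rows that are actually returned.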

Seek times

For some routes, like "Red" on the screenshot, there were a couple of thousand trips in the trips CSV, and for each of them the program needed to search for related stop times among all 1.7M rows. This proved to be far too slow.
 
That is why I added simple indexing to the Spreadsheet type. This made it at least 100x faster, and usable. I load tested the server with the author's script, and while it's nowhere near the Go solution, it could manage 228 requests/second, which is not so bad for a work-in-progress interpreted language.
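The effect of such an index can be sketched in Go as follows. This is a generic hash index built once over a key column, given here only as an assumption about the kind of indexing meant; it does not show how Rye's Spreadsheet implements it, and BuildIndex and ScanRows are hypothetical names.

```go
package main

import "fmt"

// BuildIndex maps each key to the positions of the rows that carry it.
// It is built once, right after loading.
func BuildIndex(keys []string) map[string][]int {
	idx := make(map[string][]int)
	for i, k := range keys {
		idx[k] = append(idx[k], i)
	}
	return idx
}

// ScanRows is the unindexed lookup: it walks the whole column on every call.
func ScanRows(keys []string, want string) []int {
	var rows []int
	for i, k := range keys {
		if k == want {
			rows = append(rows, i)
		}
	}
	return rows
}

func main() {
	// The trip_id column of stop_times, reduced to four rows.
	tripIDs := []string{"t1", "t2", "t1", "t3"}
	idx := BuildIndex(tripIDs)

	// Both lookups return the same row positions; the indexed one
	// does not touch the rows that belong to other trips.
	fmt.Println(idx["t1"], ScanRows(tripIDs, "t1"))
}
```

Without the index, a route with a couple of thousand trips means a couple of thousand full scans over 1.7M rows per request; with it, each trip costs one map lookup.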
 

Code size

Even with very high-level languages, there is no magic that would reduce any general code to 20% of, for example, Go code. The Rye code here is so much shorter because we have this very high-level datatype (Spreadsheet) that matched what we needed to do so well, while the author of the Go version had to do it all "by hand" with less appropriate structures.

Below is the Rye vs. Go code, for illustration.


Code beauty

Beauty is in the eye of the beholder, but I think the code really came out beautifully in this case. If you haven't already, read the previous post about the high-level ideas in Rye.

And of course, follow us on GitHub!
