Preskoči na glavno vsebino

Spreadsheet in action - Transit data demo

I visit news.ycombinator.com - the Hacker News - at least few times a day. It often features interesting news items and intelligent comments. Last week I found a language comparison post by losvedir. Author thought of an interesting and compact problem around serving merged data from two CSV files over HTTP. Author so far tested idea only with static/compiled programming languages.

Since Rye is focused on data/information processing and also on "the backend", this sounded like a perfect job for it. Rye also has this very high-level datatype called Spreadsheet (more about the reasoning in the previous blog-post).

This datatype, with some powerful functions, elegant setup for a HTTP server and easy loading of CSV files made whole Rye code very simple. While Go's solution used 180 lines of code and Rust 200, Rye (which is a dynamic language) took just 38 lines.

The problem

Program needs to load two CSV files. First (trips) features 72k lines, second (stop_times) 1.7M. Then the program creates a HTTP server which uses a part of the URL to determine the route. Program needs to find all the trips for this route, and then for each trip the stop times (from the second csv). It returns a two-level JSON structure of all that data.

This is the Rye code that does it, you can find it in our github repo:

You can find all 180 lines of Go code here, and Rust code here.


Loading times

Since the CSV files are big, author bench-marked loading times (his results). Go was among the fastest at this.  I can see from author's results that he has a much faster "computer" than me :).
 
Rye features some unboxed data-structures, useful mostly for bulk data, where values are boxed lazily / on demand, but they are loaded and stored unboxed. Such are the List and Dict, and also Spreadsheet has this mode.
 
That is why Rye loads CSV into the Spreadsheet value as fast as Go does. I compared with hid Go example locally.

Seek times

For some routes, like "Red" on the screenshot, there were couple 1000 trips in the trips CSV, and program needed to search for related stop-times amongst all 1.7M rows for each of them. This proved to be absolutely too slow. 
 
That is why I added simple indexing to the Spreadsheet type. This made it at least 100x faster and usable. I load tested the server with author's script and while it's nowhere near the Go solution it could manage 228 requests/second which is also not so bad for a work-in-progress interpreted language.
 

Code size

Even with very high level languages, there is no magic that would reduce any general code to 20% of, for example, Go code. Rye code here is so much shorter because we have this very high level datatype (Spreadsheet), that matched what we needed to do so well, and the author had to do it all "by hand" with less appropriate structures.

Below is the Rye vs. Go code, for illustration.


Code beauty

Beauty is in the eyes of the beholder, but I think the code really came out beautifully in this case. If you haven't already, read the previous post about the high level ideas in Rye.

And of course follow us on Github!

Komentarji

Priljubljene objave iz tega spletnega dnevnika

Ryelang - controlled file serving example and comparison to Python

This is as anecdotal as it gets, but basic HTTP serving functions in Rye seem to be working quite OK. They do directly use the extremely solid Go 's HTTP functions, so that should be somewhat expected. I made a ryelang.org web-server with few lines of Rye code 3 months ago and the process was running ever since and served more than 30.000 pages. If not else, it  seems there are no inherent memory leaks in Rye interpreter. Those would probably show up in a 3 month long running process? And now I got another simple project. I needed to make a HTTP API for some mobile app. API should accept a key, and return / download a binary file in response if the key is correct. Otherwise it should return a HTTP error. So I strapped in and created Rye code below. I think I only needed to add generic methods stat and size? , all other were already implemented, which is a good sign. Of course, we are in an age of ChatGPT, so I used it to generate the equivalent  Python code. It used the ele...

Receiving emails with Go's smtpd and Rye

This goes a while back. At some project for user support, we needed to receive emails and save them to appropriate databases. The best option back in the day seemed project Lamson . And it worked well ever since. It was written in Python by then quite known programmer Zed Shaw. It worked like a Python based SMTP server, that called your handlers when emails arrived. It was sort of Ruby on Rails for email. We were using this ever since. Now our system needs to be improved, there are still some emails or attachments that don't get parsed correctly. That isn't the problem of Lamson, but of our code that parses the emails. But Lamson development has been passive for more than 10 years. And I am already moving smaller utilities to Rye.  Rye uses Go, and Go has this nice library smtpd , which seems like made for this task. I integrated it and parsemail into Rye and tested it in the Rye console first. Interesting function here is enter-console , that can put you into Rye console any...

Go's concurrency in a dynamic language Rye

  The Rye programming language is a dynamic scripting language based on REBOL’s ideas, taking inspiration from the Factor language and Unix shell. Rye is written in Go and inherits Go’s concurrency capabilities, including goroutines and channels. Recently, Rye gained support for Go’s select and waitgroups. Building blocks Goroutines Goroutines are lightweight threads of execution that are managed by the Go/Rye runtime. They operate independently, allowing multiple tasks to run concurrently without blocking each other. Creating a Goroutine in Rye is straightforward. The go keyword is used to launch a new Goroutine, followed by the Rye function to be executed concurrently. For instance, the following code snippet creates and starts a Goroutine that prints a message after a delay: ; # Hello Goroutine print "Starting Goroutine" go does { ; does creates a function without arguments sleep 1000 print "Hello from Goroutine!" } print "Sleepi...