Preskoči na glavno vsebino

Html parsing dialect demo

In the last post I said that I will focus on data (pre)processing with Rye next and gradually try to make Rye a useful tool in this smaller niche.

I already made few demos in this area previously. The first step will be making a HTML parsing / scraping functionality in a form of a custom dialect. The dialect will share most of the traits with other tree navigation dialects (XML, JSON, ...).

It's a "streaming" parser, Sax (not DOM) like. This means it doesn't load document into memory and then you navigate it, but it "streams" the document and you can react to events of finding specific nodes. This way you can parse huge files, it's very memory efficient and documents can be parsed as they are loaded.

Below is an image of the Rye repl that shows few examples. I will write a tutorial about the dialect soon. New code will be pushed to github in next few days.

 

Komentarji

Priljubljene objave iz tega spletnega dnevnika

Less variables, more flows example vs Python

In the last blogpost ( Less variables, more flows ) I wrote a quick practical script I needed. It was an uncommon combination of CGI, two GET requests with Cookies and a POST request with Authorization header. I really like practical random/dirty problems, rather than ideal - made up problems to test the language. To get a sense of comparison I rewrote the example 2 times while removing specific Rye features. But that comparison is meaningless to a person that doesn't know Rye or at least Rebol already. So I went on fiverr and made a request for a Python script with these requirements. I got a nicely written Python script that uses functions for each step. To be more comparable, I rewrote the Rye code to a similar structure. Below is the result ... For a next step, it would be interesting, to extract a little simpler example out and add error handling. With Rye-s specific failure handling, I think the difference would become even greater. You can find Rye on github .

Ryelang - controlled file serving example and comparison to Python

This is as anecdotal as it gets, but basic HTTP serving functions in Rye seem to be working quite OK. They do directly use the extremely solid Go 's HTTP functions, so that should be somewhat expected. I made a ryelang.org web-server with few lines of Rye code 3 months ago and the process was running ever since and served more than 30.000 pages. If not else, it  seems there are no inherent memory leaks in Rye interpreter. Those would probably show up in a 3 month long running process? And now I got another simple project. I needed to make a HTTP API for some mobile app. API should accept a key, and return / download a binary file in response if the key is correct. Otherwise it should return a HTTP error. So I strapped in and created Rye code below. I think I only needed to add generic methods stat and size? , all other were already implemented, which is a good sign. Of course, we are in an age of ChatGPT, so I used it to generate the equivalent  Python code. It used the elegant

Receiving emails with Go's smtpd and Rye

This goes a while back. At some project for user support, we needed to receive emails and save them to appropriate databases. The best option back in the day seemed project Lamson . And it worked well ever since. It was written in Python by then quite known programmer Zed Shaw. It worked like a Python based SMTP server, that called your handlers when emails arrived. It was sort of Ruby on Rails for email. We were using this ever since. Now our system needs to be improved, there are still some emails or attachments that don't get parsed correctly. That isn't the problem of Lamson, but of our code that parses the emails. But Lamson development has been passive for more than 10 years. And I am already moving smaller utilities to Rye.  Rye uses Go, and Go has this nice library smtpd , which seems like made for this task. I integrated it and parsemail into Rye and tested it in the Rye console first. Interesting function here is enter-console , that can put you into Rye console any