In the last post I said that I would focus next on data (pre)processing with Rye and gradually try to make Rye a useful tool in this smaller niche.
I have already made a few demos in this area. The first step will be HTML parsing / scraping functionality in the form of a custom dialect. The dialect will share most of its traits with other tree navigation dialects (XML, JSON, ...).
It's a "streaming" parser, SAX-like rather than DOM-like. This means it doesn't load the whole document into memory for you to navigate afterwards. Instead it "streams" the document and you react to events that fire when specific nodes are found. This way you can parse huge files, it's very memory efficient, and documents can be parsed as they are loaded.
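To make the event-driven idea concrete, here is a minimal sketch in Python rather than Rye, since the dialect's syntax isn't shown in this post. It uses the standard library's `html.parser.HTMLParser`, which works in the same spirit: you feed it the document in chunks and it calls handlers as tags are found. The names `LinkCollector` and `collect_links` are made up for this illustration and say nothing about how the Rye dialect will look.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Reacts to start-tag events and records every href found on an <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser each time it recognises a complete start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(chunks):
    """Feed the document piece by piece; no document tree is ever built."""
    parser = LinkCollector()
    for chunk in chunks:
        parser.feed(chunk)  # a tag split across two chunks is held until complete
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Chunks as they might arrive from a network stream or a large file.
    stream = [
        '<html><body><a hr',
        'ef="/one">1</a><p>x</p>',
        '<a href="/two">2</a></body></html>',
    ]
    print(collect_links(stream))
```

The handler only ever sees one event at a time, so memory use depends on what you choose to keep (here, the list of links) and not on the size of the document.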
Below is an image of the Rye REPL that shows a few examples. I will write a tutorial about the dialect soon. New code will be pushed to GitHub in the next few days.