
Off Topic: Exploring the Cayley Graph Database (part 1)


I am trying to integrate Gizmo, the query language of the Cayley graph database, into Rye. To do that, I need to know it well enough: not just the query language itself, but also how to solve problems with graph databases in general.

Outside of the official documentation and tutorial I couldn't find many resources about Gizmo, so I decided to write my notes down.

I am using the 30kmoviedata graph, which you can find in Cayley's GitHub repository. AFAIK the graph includes at least a network of movies, directors and actors. I will try to compare Gizmo to SQL where applicable.

I am just learning, so take it all with a grain of salt and let me know if I made any mistakes.

The very basics

Let's start with the basics and get the title of a film by its ID. In SQL this would be something like:

select name from films where id = '/en/alien_1979';

and in Cayley's Gizmo:

g.V().Is("</en/alien_1979>").Out("<name>").All()

The query language is inspired by Gremlin, and queries are written as JavaScript. "g" is an alias for "graph", and "V()" is short for "Vertex()".

In general we find some nodes and then follow paths to more nodes. The name is not a property of the film node per se, but another node reached through a "<name>" edge.

Vertex is another name for a node, and edges are the links between nodes.
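
One way I picture this is as a list of subject, predicate, object triples. The sketch below is plain JavaScript and not Cayley's API: the triples array and the out helper are names I made up for illustration, and it only contains a few links that show up in this post.

```javascript
// Toy model of a graph as (subject, predicate, object) triples.
// Not Cayley code: `triples` and `out` are illustrative names only.
const triples = [
  ["</en/alien_1979>", "<name>", "Alien"],
  ["</en/alien_1979>", "</film/film/directed_by>", "</en/ridley_scott>"],
  ["</en/ridley_scott>", "<name>", "Ridley Scott"],
];

// Follow outgoing edges labelled `predicate` from `node`.
function out(node, predicate) {
  return triples
    .filter(([s, p]) => s === node && p === predicate)
    .map(([, , o]) => o);
}

// The title is not a field on the film node;
// it is the node at the end of a <name> edge.
const titles = out("</en/alien_1979>", "<name>");
console.log(titles); // [ 'Alien' ]
```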

Predicates

The functions Out and In follow links from the current nodes: Out follows outgoing links and In follows incoming ones. You can call them without an argument to follow any type of link, or with a predicate to follow only that type.

g.V().Is("</en/alien_1979>").Out().All()

{
	"result": [
		{
			"id": "</en/ridley_scott>"
		},
		{
			"id": "_:2597"
		},
		... more of these ...
		{
			"id": "Alien"
		},
		{
			"id": "</film/film>"
		}
	]
}

This wasn't very helpful. We got all the nodes that the film links to, but not how it links to them. To see what kinds of links are possible, we use the functions OutPredicates and InPredicates:

g.V().Is("</en/alien_1979>").OutPredicates().All()

{
    "result": [
        {
            "id": "</film/film/directed_by>"
        },
        {
            "id": "</film/film/starring>"
        },
        {
            "id": "<name>"
        },
        {
            "id": "<type>"
        }
    ]
}

So our film node links to its directors, its actors, its name and its type.

InPredicates doesn't return anything in the case above, so nothing links to the film node.
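
To make the difference between the two directions concrete, here is a toy sketch in plain JavaScript. It is not how Cayley implements these functions: the outPredicates and inPredicates helpers are my own illustrative names, the triples are only a subset of the film's links, and pairing "<type>" with "</film/film>" is my assumption.

```javascript
// Toy model, not Cayley code. Only some links from this post are listed;
// the pairing of <type> with </film/film> is an assumption.
const triples = [
  ["</en/alien_1979>", "<name>", "Alien"],
  ["</en/alien_1979>", "<type>", "</film/film>"],
  ["</en/alien_1979>", "</film/film/directed_by>", "</en/ridley_scott>"],
  ["</en/ridley_scott>", "<name>", "Ridley Scott"],
];

// Distinct values, in first-seen order.
const unique = (xs) => [...new Set(xs)];

// Labels of edges leaving `node` (roughly what OutPredicates reports).
const outPredicates = (node) =>
  unique(triples.filter(([s]) => s === node).map(([, p]) => p));

// Labels of edges arriving at `node` (roughly what InPredicates reports).
const inPredicates = (node) =>
  unique(triples.filter(([, , o]) => o === node).map(([, p]) => p));

const filmOut = outPredicates("</en/alien_1979>");
const filmIn = inPredicates("</en/alien_1979>");
const directorIn = inPredicates("</en/ridley_scott>");

console.log(filmOut);    // [ '<name>', '<type>', '</film/film/directed_by>' ]
console.log(filmIn);     // []
console.log(directorIn); // [ '</film/film/directed_by>' ]
```

In this toy graph the film has only outgoing links, while the director node has an incoming "</film/film/directed_by>" link from the film.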

Finding director(s)

We can see above that a film node points to its directors. Let's get their names for this movie:

graph.V().Is("</en/alien_1979>").Out("</film/film/directed_by>").Out("<name>").All()

{
    "result": [
        {
            "id": "Ridley Scott"
        }
    ]
}
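
The query above is two hops chained together: film to director, then director to name. A minimal sketch of that idea in plain JavaScript, assuming the same toy triples model (the out helper is an illustrative name, not Cayley's API):

```javascript
// Toy model, not Cayley code: each step maps a set of nodes
// to the nodes one edge away.
const triples = [
  ["</en/alien_1979>", "<name>", "Alien"],
  ["</en/alien_1979>", "</film/film/directed_by>", "</en/ridley_scott>"],
  ["</en/ridley_scott>", "<name>", "Ridley Scott"],
];

// Follow edges labelled `predicate` from every node in `nodes`.
const out = (nodes, predicate) =>
  triples
    .filter(([s, p]) => nodes.includes(s) && p === predicate)
    .map(([, , o]) => o);

const film = ["</en/alien_1979>"];
const directors = out(film, "</film/film/directed_by>"); // first hop
const names = out(directors, "<name>");                  // second hop

console.log(names); // [ 'Ridley Scott' ]
```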


At first glance, we would do this in SQL with a single join:

select p.name
from films f join people p on f.directed_by = p.id
where f.id = '/en/alien_1979';


But there is a problem: as we will see later, a film can have more than one director, so we need an additional many-to-many table, which makes the code a little messier:

select p.name
from films f
 join film2director f2d on f.id = f2d.id_film
 join people p on p.id = f2d.id_director
where f.id = '/en/alien_1979';
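
To see what the many-to-many table does, here is a small sketch of the same lookup over in-memory rows in plain JavaScript. The film, the directors and their ids are hypothetical; only the table and column names come from the SQL above.

```javascript
// Hypothetical rows: the film, the directors and their ids are made up.
// Table and column names follow the SQL in this post.
const people = [
  { id: "/en/director_a", name: "Director A" },
  { id: "/en/director_b", name: "Director B" },
];
const film2director = [
  { id_film: "/en/example_film", id_director: "/en/director_a" },
  { id_film: "/en/example_film", id_director: "/en/director_b" },
];

// Same idea as the two joins: pick the link rows for the film,
// then look up each linked person.
function directorNames(filmId) {
  return film2director
    .filter((row) => row.id_film === filmId)
    .map((row) => people.find((p) => p.id === row.id_director).name);
}

const names = directorNames("/en/example_film");
console.log(names); // [ 'Director A', 'Director B' ]
```

Each row of film2director plays the role of one "</film/film/directed_by>" edge, which is, as far as I understand it, why the Gizmo query needs no extra step when a film has several directors.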

 
We will go a little deeper in the next blog post.
