exploring typecheking sql query in typed/racket

This post will describe a POC on how to typecheck sql query in typed/racket.


The goal is to make this typecheck:

(: some-query : (Listof (Vector String String)))
(define some-query
  (query
   movies
   #:from     ([actor a] [movie m])
   #:where    (= a.name "Jhon Wayne")
   #:select   (a.name m.name)))

With the schema defined as:

(define-schema movies
  (actor    [id Integer]
            [name String])
  (movie    [id Integer]
            [name String]
            [director_id Integer])
  (director [id Integer]
            [name String]))

This POC already take me too much time so there will be shortcomings the full source code is available here and its example.

The define-schema macro

This macro will define a syntax transformer retaining the schema information to be used later by the query macro. The type of the columns didn’t reflect any SQL type it is just typed/racket types, user defined types could be used also.

The code isn’t worth commenting the expanded code will be:

(define-syntax movies
  (schema-st 
   (list 
    'movies
    (quote-syntax
     ((actor ((id Integer) (name String)))
      (movie ((id Integer) (name String) (director_id Integer)))
      (director ((id Integer) (name String))))))))

(define-for-syntax (struct schema-st (data) #:property prop:procedure (lambda (st) (schema-st-data st))))

This is completely straight forward it defines a movie transformer that retains data available for other macros.

For shortcoming reasons it’s unclear if using the struct schema is necessary or if (define-syntax movies (list …)) would have done the trick.

Also, the data structure shouldn’t have been a complete syntax object of the whole schema information but just a syntax object of the types, like:

`(movies
  ((actor ((id ,#'Integer) (name ,#'String)))
   (movie ((id ,#'Integer) (name ,#'String) (director_id ,#'Integer)))
   (director ((id ,#',#'Integer) (name ,#'String)))))

And it also could output type alias to have abstract type:

(define-type movies.actor.id Integer)
(define-type movies.actor.name String)
(define-type movies.movie.id Integer)
;;...

The query macro

Copied the example here to refresh your mind.

(: some-query : (Listof (Vector String String)))
(define some-query
  (query
   movies
   #:from     ([actor a] [movie m])
   #:where    (= a.name "Jhon Wayne")
   #:select   (a.name m.name)))

This one take the definition by whatever you have named it, movies in this case. It will fetch the schema definition thought syntax-local-value so you have to pass the movies every time you use the query macro.

As you would expect this macro check that #:from and #:select are present only once, bark if an alias isn’t defined for each table, for select and where it checks the columns name are present in table actor and movie.

The expanded code isn’t even useful just to ensure the typecheking mechanism works.

The two interesting parts are:

  • It uses ann on the where expression to ensure it has the type of a.name so it gives (ann “Jhon Wayne” String)

  • It uses cast on the query result and generate runtime assertion on every rows. Typed/racket didn’t provide an unsafe-cast form yet.

The expansion could look like:

(: some-query : (Listof (Vector String String)))
(define some-query
  (cast (exec-query 
         "SELECT a.name, m.name FROM actor AS a, movie AS m WHERE a.name = ?"
         (ann "Jhon Wayne" String))
        (Listof (Vectorof String String))))

Conclusion

  1. Using cast could be costly maybe generate our own guard with predicate could reduce this cost.

  2. Have to pass the schema definition to every query could be tedious and inelegant. Maybe racket/stxparam could help or maybe define-schema should generate a transformer query-movies that is like query without the movie part.

  3. The query DSL is redundant with SQL because it’s already a DSL. It may be possible to use a reader for SQL something like #sql"SELECT * FROM table".

  4. The types of the schema information didn’t reflect the SQL type, the ability to pass any kind of type alone isn’t useful, it should also pass two procedures one to convert from the complex type to some SQL representation and the inverse. Example: if you want to use json in some column option of type TEXT the definition should be [options Json read-json write-json].

  5. All this is an awful lot of works. :)