This is a tutorial on migrating from SQL to LogsQL. It assumes you are familiar with SQL and know how to execute queries in VictoriaLogs.
## Data model
SQL is usually used for querying relational tables. Every such table contains a pre-defined set of columns with pre-defined types. LogsQL is used for querying logs. Logs are stored in log streams, so a log stream is an analogue of a table in relational databases. Log streams and relational tables have the following major differences:
- Log streams are created automatically when the first log entry (row) is ingested into them.
- There is no pre-defined schema in log streams - logs with an arbitrary set of fields can be ingested into every log stream. Both names and values in every log entry have string type and may contain arbitrary string data.
- Every log entry (row) can be represented as a flat JSON object: `{"f1":"v1",...,"fN":"vN"}`. See these docs.
- By default VictoriaLogs selects log entries across all the log streams. The needed set of log streams can be specified via stream filters.
- By default VictoriaLogs returns all the fields across the selected logs. The set of returned fields can be limited with the `fields` pipe.
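For example, the following query is a minimal sketch combining a stream filter with the `fields` pipe; the `app` stream field and the `level` log field are assumptions about how your logs were ingested:

```
_stream:{app="nginx"} level:=error | fields _time, _msg
```

It selects error logs only from the `nginx` log stream and returns just the timestamp and the log message for every selected entry.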
## Query structure
SQL query structure is quite convoluted:
```
SELECT
    <fields, aggregations, calculations, transformations>
FROM <table>
<optional JOINs>
<optional filters with optional subqueries>
<optional GROUP BY>
<optional HAVING>
<optional ORDER BY>
<optional LIMIT / OFFSET>
<optional UNION>
```
LogsQL query structure is much simpler:
```
<filters>
  | <optional_pipe1>
  | ...
  | <optional_pipeN>
```
The `<filters>` part selects the needed logs (rows) according to the provided filters. Then the provided pipes are executed sequentially. Every such pipe receives all the rows from the previous stage, performs some calculations and/or transformations, and then pushes the resulting rows to the next stage. This simplifies reading and understanding the query - just read it from the beginning to the end in order to understand what it does at every stage.
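For example, the following sketch (the `host` field is an assumption about your logs) reads naturally from top to bottom: select logs containing the word `error` over the last 5 minutes, count them per host, and return the three hosts with the most errors:

```
error _time:5m
  | stats by (host) count() errors
  | sort by (errors desc)
  | limit 3
```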
LogsQL pipes cover all the functionality from SQL: aggregations, calculations, transformations, subqueries, joins, post-filters, sorting, etc. See the conversion rules on how to convert SQL to LogsQL.
## Conversion rules
The following rules must be used for converting a SQL query into a LogsQL query (a combined worked example follows the list):
- If the SQL query contains `WHERE`, then convert it into LogsQL filters. Otherwise just start the LogsQL query with `*`. For example, `SELECT * FROM table WHERE field1=value1 AND field2<>value2` is converted into `field1:=value1 field2:!=value2`, while `SELECT * FROM table` is converted into `*`. `IN` subqueries inside `WHERE` must be converted into `in` filters. For example, `SELECT * FROM table WHERE id IN (SELECT id2 FROM table)` is converted into `id:in(* | fields id2)`.
- If the `SELECT` part isn't equal to `*` and there are no `GROUP BY` / aggregate functions in the SQL query, then enumerate the selected columns at the `fields` pipe. For example, `SELECT field1, field2 FROM table` is converted into `* | fields field1, field2`.
- If the SQL query contains `JOIN`, then convert it into the `join` pipe.
- If the SQL query contains `GROUP BY` / aggregate functions, then convert them to the `stats` pipe. For example, `SELECT count(*) FROM table` is converted into `* | count()`, while `SELECT user_id, count(*) FROM table GROUP BY user_id` is converted to `* | stats by (user_id) count()`. Note how the LogsQL query mentions the `GROUP BY` fields only once, while SQL forces mentioning these fields twice - at the `SELECT` and at the `GROUP BY`. How many times did you hit a discrepancy between `SELECT` and `GROUP BY` fields?
- If the SQL query contains additional calculations and/or transformations at the `SELECT`, which aren't covered by `GROUP BY`, then convert them into the corresponding LogsQL pipes. The most frequently used pipes are `math` and `format`. For example, `SELECT field1 + 10 AS x, CONCAT("foo", field2) AS y FROM table` is converted into `* | math field1 + 10 as x | format "foo<field2>" as y | fields x, y`.
- If the SQL query contains `HAVING`, then convert it into the `filter` pipe. For example, `SELECT user_id, count(*) AS c FROM table GROUP BY user_id HAVING c > 100` is converted into `* | stats by (user_id) count() c | filter c:>100`.
- If the SQL query contains `ORDER BY`, `LIMIT` and `OFFSET`, then convert them into the `sort` pipe. For example, `SELECT * FROM table ORDER BY field1, field2 LIMIT 10 OFFSET 20` is converted into `* | sort by (field1, field2) limit 10 offset 20`.
- If the SQL query contains `UNION`, then convert it into the `union` pipe. For example, `SELECT * FROM table WHERE filters1 UNION ALL SELECT * FROM table WHERE filters2` is converted into `filters1 | union (filters2)`.
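Putting several of these rules together, here is a sketch of a full conversion; the `table`, `status` and `user_id` names are illustrative. The SQL query:

```
SELECT user_id, count(*) AS c
FROM table
WHERE status = 'error'
GROUP BY user_id
HAVING c > 100
ORDER BY c DESC
LIMIT 10
```

is converted into the LogsQL query:

```
status:=error
  | stats by (user_id) count() c
  | filter c:>100
  | sort by (c desc) limit 10
```

Note that the `WHERE` clause becomes a filter at the beginning of the query, while `GROUP BY`, `HAVING` and `ORDER BY` / `LIMIT` become `stats`, `filter` and `sort` pipes executed in that order.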
SQL queries are frequently used for obtaining the top N column values, which are seen most frequently in the selected rows. For example, the query below returns the top 5 `user_id` values, which are present in the biggest number of rows:
```
SELECT user_id, count(*) hits FROM table GROUP BY user_id ORDER BY hits DESC LIMIT 5
```
LogsQL provides a shortcut syntax with the `top` pipe for this case:
```
* | top 5 (user_id)
```
It is equivalent to the longer LogsQL query:
```
* | stats by (user_id) count() hits | sort by (hits desc) limit 5
```
LogsQL pipes support much wider functionality compared to SQL, so spend some spare time reading the pipe docs and playing with them at the VictoriaLogs demo playground.