SQL to LogsQL tutorial #

This is a tutorial for the migration from SQL to LogsQL . It is expected you are familiar with SQL and know how to execute queries at VictoriaLogs .

data model #

SQL is usually used for querying relational tables. Every such table contains a pre-defined set of columns with pre-defined types. LogsQL is used for querying logs. Logs are stored in log streams . So log streams is an analogue of tables in relational databases. Log streams and relational tables have the following major differences:

Log streams are created automatically when the first log entry (row) is ingested into them.
There is no pre-defined scheme in log streams - logs with arbitrary set of fields can be ingested into every log stream. Both names and values in every log entry have string type. They may contain arbitrary string data.
Every log entry (row) can be represented as a flat JSON object: {"f1":"v1",...,"fN":"vN"}. See these docs .
By default VictoriaLogs selects log entries across all the log streams. The needed set of log streams can be specified via stream filters .
By default VictoriaLogs returns all the fields across the selected logs. The set of returned fields can be limited with fields pipe .

query structure #

SQL query structure is quite convoluted:

      SELECT
  <fields, aggregations, calculations, transformations>
FROM <table>
  <optional JOINs>
  <optional filters with optional subqueries>
  <optional GROUP BY>
  <optional HAVING>
  <optional ORDER BY>
  <optional LIMIT / OFFSET>
  <optional UNION>
    

LogsQL query structure is much simpler:

      <filters>
  | <optional_pipe1>
  | ...
  | <optional_pipeN>
    

The <filters> part selects the needed logs (rows) according to the provided filters . Then the provided pipes are executed sequentially. Every such pipe receives all the rows from the previous stage, performs some calculations and/or transformations, and then pushes the resulting rows to the next stage. This simplifies reading and understanding the query - just read it from the beginning to the end in order to understand what does it do at every stage.

LogsQL pipes cover all the functionality from SQL: aggregations, calculations, transformations, subqueries, joins, post-filters, sorting, etc. See the conversion rules on how to convert SQL to LogsQL.

conversion rules #

The following rules must be used for converting SQL query into LogsQL query:

If the SQL query contains WHERE, then convert it into LogsQL filters . Otherwise just start LogsQL query with * . For example, SELECT * FROM table WHERE field1=value1 AND field2<>value2 is converted into field1:=value1 field2:!=value2, while SELECT * FROM table is converted into *.
IN subqueries inside WHERE must be converted into in filters . For example, SELECT * FROM table WHERE id IN (SELECT id2 FROM table) is converted into id:in(* | fields id2).
If the SELECT part isn’t equal to * and there are no GROUP BY / aggregate functions in the SQL query, then enumerate the selected columns at fields pipe . For example, SELECT field1, field2 FROM table is converted into * | fields field1, field2.
If the SQL query contains JOIN, then convert it into join pipe . – If the SQL query contains GROUP BY / aggregate functions, then convert them to stats pipe . For example, SELECT count(*) FROM table is converted into * | count(), while SELECT user_id, count(*) FROM table GROUP BY user_id is converted to * | stats by (user_id) count(). Note how the LogsQL query mentions the GROUP BY fields only once, while SQL forces mentioning these fields twice - at the SELECT and at the GROUP BY. How many times did you hit the discrepancy between SELECT and GROUP BY fields?
If the SQL query contains additional calculations and/or transformations at the SELECT, which aren’t covered yet by GROUP BY, then convert them into the corresponding LogsQL pipes . The most frequently used pipes are math and format . For example, SELECT field1 + 10 AS x, CONCAT("foo", field2) AS y FROM table is converted into * | math field1 + 10 as x | format "foo<field2>" as y | fields x, y.
If the SQL query contains HAVING, then convert it into filter pipe . For example, SELECT user_id, count(*) AS c FROM table GROUP BY user_id HAVING c > 100 is converted into * | stats by (user_id) count() c | filter c:>100.
If the SQL query contains ORDER BY, LIMIT and OFFSET, then convert them into sort pipe . For example, SELECT * FROM table ORDER BY field1, field2 LIMIT 10 OFFSET 20 is converted into * | sort by (field1, field2) limit 10 offset 20.
If the SQL query contains UNION, then convert it into union pipe . For example SELECT * FROM table WHERE filters1 UNION ALL SELECT * FROM table WHERE filters2 is converted into filters1 | union (filters2).

SQL queries are frequently used for obtaining top N column values, which are the most frequently seen in the selected rows. For example, the query below returns top 5 user_id values, which present in the biggest number of rows:

      SELECT user_id, count(*) hits FROM table GROUP BY user_id ORDER BY hits DESC LIMIT 5
    

LogsQL provides a shortcut syntax with top pipe for this case:

      * | top 5 (user_id)

It is equivalent to the longer LogsQL query:

      * | by (user_id) count() hits | sort by (hits desc) limit 5

LogsQL pipes support much wider functionality comparing to SQL, so spend your spare time by reading pipe docs and playing with them at VictoriaLogs demo playground or at the playground for Grafana datasource for VictoriaLogs .

Previous LogsQL examples Next How to convert Loki queries to VictoriaLogs queries