This is a tutorial on migrating from SQL to LogsQL. It is expected that you are familiar with SQL and know how to execute queries against VictoriaLogs.
See also this playground, which automatically converts SQL queries to LogsQL queries. The source code of this project is available here.
Data Model #
SQL is usually used for querying relational tables. Every such table contains a predefined set of columns with predefined types. LogsQL is used for querying logs. Logs are stored in log streams, so log streams are analogous to tables in relational databases. Log streams and relational tables have the following major differences:
- Log streams are created automatically when the first log entry (row) is ingested into them.
- There is no predefined schema in log streams, so logs with an arbitrary set of fields can be ingested into every log stream. Both field names and field values in every log entry are strings. They may contain arbitrary string data.
- Every log entry (row) can be represented as a flat JSON object: `{"f1":"v1",...,"fN":"vN"}`. See these docs.
- By default, VictoriaLogs selects log entries across all log streams. The needed set of log streams can be specified via stream filters.
- By default, VictoriaLogs returns all the fields across the selected logs. The set of returned fields can be limited with the `fields` pipe. See the example below this list.
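For example, the following query combines both concepts. This is a minimal sketch, assuming a log stream labeled with a hypothetical `app` field; `_time` and `_msg` are the built-in fields VictoriaLogs uses for the log timestamp and the log message. The query selects logs only from the `{app="nginx"}` log stream and returns only the timestamp and the message for every selected entry:
_stream:{app="nginx"} | fields _time, _msg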
Query Structure #
SQL query structure is quite convoluted:
SELECT
<fields, aggregations, calculations, transformations>
FROM <table>
<optional JOINs>
<optional filters with optional subqueries>
<optional GROUP BY>
<optional HAVING>
<optional ORDER BY>
<optional LIMIT / OFFSET>
<optional UNION>
LogsQL query structure is much simpler:
<filters>
| <optional_pipe1>
| ...
| <optional_pipeN>
The <filters> part selects the needed logs (rows) according to the provided filters. Then the provided pipes are executed sequentially. Every such pipe receives all the rows from the previous stage, performs some calculations and/or transformations, and then pushes the resulting rows to the next stage. This simplifies reading and understanding the query: just read it from beginning to end to understand what it does at every stage.
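For example, the following sketch (the `app` field name is hypothetical) selects logs containing the word `error`, counts them per `app` value via the `stats` pipe, and then returns the 3 values with the biggest counts via the `sort` pipe:
error | stats by (app) count() hits | sort by (hits desc) limit 3
Because every pipe is self-contained, intermediate results can be inspected by simply truncating the query after any pipe.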
LogsQL pipes cover all the functionality of SQL: aggregations, calculations, transformations, subqueries, joins, post-filters, sorting, etc. See the conversion rules below on how to convert SQL queries to LogsQL.
Conversion Rules #
The following rules must be used for converting an SQL query into a LogsQL query:
- If the SQL query contains `WHERE`, then convert it into LogsQL filters. Otherwise, just start the LogsQL query with `*`. For example, `SELECT * FROM table WHERE field1=value1 AND field2<>value2` is converted into `field1:=value1 field2:!=value2`, while `SELECT * FROM table` is converted into `*`. `IN` subqueries inside `WHERE` must be converted into `in` filters. For example, `SELECT * FROM table WHERE id IN (SELECT id2 FROM table)` is converted into `id:in(* | fields id2)`.
- If the `SELECT` part isn't equal to `*` and there are no `GROUP BY` / aggregate functions in the SQL query, then enumerate the selected columns at the `fields` pipe. For example, `SELECT field1, field2 FROM table` is converted into `* | fields field1, field2`.
- If the SQL query contains `JOIN`, then convert it into the `join` pipe.
- If the SQL query contains `GROUP BY` / aggregate functions, then convert them to the `stats` pipe. For example, `SELECT count(*) FROM table` is converted into `* | count()`, while `SELECT user_id, count(*) FROM table GROUP BY user_id` is converted to `* | stats by (user_id) count()`. Note how the LogsQL query mentions the `GROUP BY` fields only once, while SQL forces you to mention these fields twice: in the `SELECT` and in the `GROUP BY` clauses. How many times have you run into a discrepancy between `SELECT` and `GROUP BY` fields?
- If the SQL query contains additional calculations and/or transformations at the `SELECT`, which aren't covered yet by `GROUP BY`, then convert them into the corresponding LogsQL pipes. The most frequently used pipes are `math` and `format`. For example, `SELECT field1 + 10 AS x, CONCAT("foo", field2) AS y FROM table` is converted into `* | math field1 + 10 as x | format "foo<field2>" as y | fields x, y`.
- If the SQL query contains `HAVING`, then convert it into the `filter` pipe. For example, `SELECT user_id, count(*) AS c FROM table GROUP BY user_id HAVING c > 100` is converted into `* | stats by (user_id) count() c | filter c:>100`.
- If the SQL query contains `ORDER BY`, `LIMIT` and `OFFSET`, then convert them into the `sort` pipe. For example, `SELECT * FROM table ORDER BY field1, field2 LIMIT 10 OFFSET 20` is converted into `* | sort by (field1, field2) limit 10 offset 20`.
- If the SQL query contains `UNION`, then convert it into the `union` pipe. For example, `SELECT * FROM table WHERE filters1 UNION ALL SELECT * FROM table WHERE filters2` is converted into `filters1 | union (filters2)`.
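As a worked example that combines several of the rules above, consider the following SQL query (the `status`, `region` and `user_id` field names are hypothetical):
SELECT user_id, count(*) AS c
FROM table
WHERE status=value1 AND region<>value2
GROUP BY user_id
HAVING c > 10
ORDER BY c DESC
LIMIT 5
Applying the rules step by step (`WHERE` becomes filters, `GROUP BY` becomes the `stats` pipe, `HAVING` becomes the `filter` pipe, `ORDER BY` plus `LIMIT` become the `sort` pipe) yields the following LogsQL query:
status:=value1 region:!=value2 | stats by (user_id) count() c | filter c:>10 | sort by (c desc) limit 5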
SQL queries are frequently used to obtain the top N values of a column that are seen most frequently in the selected rows.
For example, the query below returns the top 5 `user_id` values that are present in the largest number of rows:
SELECT user_id, count(*) hits FROM table GROUP BY user_id ORDER BY hits DESC LIMIT 5
LogsQL provides a shortcut syntax with the `top` pipe for this case:
* | top 5 (user_id)
It is equivalent to the longer LogsQL query:
* | stats by (user_id) count() hits | sort by (hits desc) limit 5
LogsQL pipes support much wider functionality compared to SQL, so spend some time reading the pipe docs and playing with them at the VictoriaLogs demo playground or at the playground for the Grafana datasource for VictoriaLogs.