q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
Query clicks.csv with a WHERE clause:
q -H -t "SELECT request_id,score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"
2cfab5ceca922a1a2179dc4687a3b26e 1.0
f6de737b5aa2c46a3db3208413a54d64 0.986665809568
766025d25479b95a224bd614141feee5 0.977105183282
2c09058a1b82c6dbcf9dc463e73eddd2 0.703255121794
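To make the mechanics of a header-based query like the one above more concrete, here is a minimal Python sketch that loads tab-delimited text with a header row into an in-memory SQLite table and runs a similar WHERE / ORDER BY / LIMIT query. The sample rows, the `query_tsv` helper and the table name `clicks` are assumptions made for illustration; they are not q's implementation. Every value is stored as text in this sketch, so the query casts `score` explicitly, whereas q detects column types on its own.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for clicks.csv (tab-delimited, header row).
SAMPLE = "request_id\tscore\nr1\t0.95\nr2\t0.40\nr3\t0.71\n"


def query_tsv(text, sql, table="clicks"):
    """Load tab-delimited text with a header row into SQLite, then run sql."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Column names come from the header row, as with q -H.
    cols = ", ".join('"%s"' % name for name in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    marks = ", ".join(["?"] * len(header))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), data)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


top = query_tsv(
    SAMPLE,
    "SELECT request_id, score FROM clicks "
    "WHERE CAST(score AS REAL) > 0.7 "
    "ORDER BY CAST(score AS REAL) DESC LIMIT 5",
)
print(top)
```

Only the rows with a score above 0.7 come back, highest first; LIMIT 5 is an upper bound, so fewer rows are returned when fewer match.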
Read input from standard input and compute the total size per user/group in the /tmp subtree:
sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"
mapred hadoop 304.00390625
root root 8.0431451797485
smith smith 4.34389972687
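When the input has no header row, columns are addressed as c1..cN. The following sketch imitates that naming for whitespace-delimited lines and runs the same kind of GROUP BY aggregation. The sample lines are made up to resemble `find -ls` output, and `load_headerless` and the table name `t` are hypothetical names used only for this example.

```python
import sqlite3

# Hypothetical lines shaped like `find -ls` output (c5=user, c6=group, c7=size).
LINES = [
    "101 8 -rw-r--r-- 1 root root 1048576 Jan 1 10:00 /tmp/a",
    "102 8 -rw-r--r-- 1 root root 1048576 Jan 1 10:01 /tmp/b",
    "103 4 -rw-r--r-- 1 smith smith 524288 Jan 2 09:30 /tmp/c",
]


def load_headerless(lines):
    """Split lines on whitespace and store them in a table with columns c1..cN."""
    rows = [line.split() for line in lines]
    width = max(len(row) for row in rows)
    names = ["c%d" % i for i in range(1, width + 1)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (%s)" % ", ".join(names))
    marks = ", ".join(["?"] * width)
    for row in rows:
        # Pad short rows with NULLs so every row has the same column count.
        padded = row + [None] * (width - len(row))
        conn.execute("INSERT INTO t VALUES (%s)" % marks, padded)
    return conn


conn = load_headerless(LINES)
totals = conn.execute(
    "SELECT c5, c6, SUM(CAST(c7 AS INTEGER))/1024.0/1024 AS total "
    "FROM t GROUP BY c5, c6 ORDER BY total DESC"
).fetchall()
print(totals)
```

Dividing by 1024.0 first forces floating-point arithmetic, which is why the query in the text writes `/1024.0/1024` rather than `/1024/1024`.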
Join queries with q:
q "SELECT myfiles.c8,emails.c2 FROM exampledatafile myfiles JOIN group-emails-example emails ON (myfiles.c4 = emails.c1) WHERE myfiles.c8 = 'ppp'"
ppp dip.1@otherdomain.com
ppp dip.2@otherdomain.com
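The join above matches column c4 of one file against column c1 of another. As a rough illustration of what that produces, the sketch below builds two small SQLite tables and runs the same join shape. The table contents and the example.com addresses are invented for the example, and only the columns used by the query are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns referenced by the query are created here.
conn.execute("CREATE TABLE myfiles (c4 TEXT, c8 TEXT)")
conn.execute("CREATE TABLE emails (c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO myfiles VALUES (?, ?)",
    [("dip", "ppp"), ("mail", "postfix")],
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [
        ("dip", "a@example.com"),
        ("dip", "b@example.com"),
        ("mail", "c@example.com"),
    ],
)

# One output row per matching (file, email) pair; ORDER BY keeps it stable.
matches = conn.execute(
    "SELECT myfiles.c8, emails.c2 FROM myfiles "
    "JOIN emails ON (myfiles.c4 = emails.c1) "
    "WHERE myfiles.c8 = 'ppp' ORDER BY emails.c2"
).fetchall()
print(matches)
```

A file whose group has two email entries yields two rows, which is the same pattern as the two `ppp` lines in the output above.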
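Use the column names from the header row to find the 3 user IDs that own the most processes, sorted in descending order. Note the use of the automatically detected column name UID in the query.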
ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
root 152
harel 119
avahi 2018
Appendix: general description of the q command from its official site:
q allows performing SQL-like statements on tabular text data.
Its purpose is to bring SQL expressive power to manipulating text data using the Linux command line.
Basic usage is q "<sql-like query>" where table names are just regular file names (use - to read from standard input).
When the input contains a header row, use -H, and column names will be set according to the header row content. If there isn't a header row, then columns will automatically be named c1..cN.
Column types are detected automatically. Use -A in order to see the column name/type analysis.
Delimiter can be set using the -d (or -t) option. Output delimiter can be set using -D
All sqlite3 SQL constructs are supported.
Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
This example would print a count of each unique permission string in the current folder.
Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
This example would provide the average and the sum of the numbers in the range 1 to 1000
Example 3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc"
This example will output the total size in MB per user+group in the /tmp subtree
See the help or https://github.com/harelba/q/ for more details.
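As a quick sanity check of Example 2, the same aggregate can be reproduced with Python's built-in sqlite3 module. This is only a sketch of the arithmetic, with a hypothetical table `t` holding the numbers 1 to 1000 in a column named c1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
# Same values that `seq 1 1000` would produce, one per row.
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(1, 1001)])

avg_c1, sum_c1 = conn.execute("SELECT avg(c1), sum(c1) FROM t").fetchone()
print(avg_c1, sum_c1)  # 500.5 500500
```

The sum follows from n(n+1)/2 with n = 1000, and the average is that sum divided by 1000.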
-h, --help show this help message and exit
-v, --version Print version
-V, --verbose Print debug info in case of problems
Save database to an sqlite database file
Method to use to save db to disk. 'standard' does not
require any deps, 'fast' currently requires manually
running `pip install sqlitebck` on your python
installation. Once packing issues are solved, the fast
method will be the default.
Input Data Options:
-H, --skip-header Skip header row. This has been changed from earlier
version - Only one header row is supported, and the
header row is used for column naming
Field delimiter. If none specified, then space is used
as the delimiter.
-t, --tab-delimited
Same as -d <tab>. Just a shorthand for handling
standard tab delimited file. You can use $'\t' if you
want (this is how Linux expects to provide tabs in the
command line).
-e ENCODING, --encoding=ENCODING
Input file encoding. Defaults to UTF-8. set to none
for not setting any encoding - faster, but at your own risk
-z, --gzipped Data is gzipped. Useful for reading from stdin. For
files, .gz means automatic gunzipping
-A, --analyze-only Analyze sample input and provide information about
data types
-m MODE, --mode=MODE
Data parsing mode. fluffy, relaxed and strict. In
strict mode, the -c column-count parameter must be
supplied as well
-c COLUMN_COUNT, --column-count=COLUMN_COUNT
Specific column count when using relaxed or strict mode
-k, --keep-leading-whitespace
Keep leading whitespace in values. Default behavior
strips leading whitespace off values, in order to
provide out-of-the-box usability for simple use cases.
If you need to preserve whitespace, use this flag.
Disable support for double double-quoting for escaping
the double quote character. By default, you can use ""
inside double quoted fields to escape double quotes.
Mainly for backward compatibility.
Disable support for escaped double-quoting for
escaping the double quote character. By default, you
can use \" inside double quoted fields to escape
double quotes. Mainly for backward compatibility.
--as-text Don't detect column types - All columns will be
treated as text columns
Input quoting mode. Possible values are all, minimal
and none. Note the slightly misleading parameter name,
and see the matching -W parameter for output quoting.
Sets the maximum column length.
-U, --with-universal-newlines
Expect universal newlines in the data. Limitation: -U
works only with regular files for now, stdin or .gz
files are not supported yet.
Output Options:
Field delimiter for output. If none specified, then
the -d delimiter is used if present, or space if no
delimiter is specified
-T, --tab-delimited-output
Same as -D <tab>. Just a shorthand for outputting tab
delimited output. You can use -D $'\t' if you want.
-O, --output-header
Output header line. Output column-names are determined
from the query itself. Use column aliases in order to
set your column names in the query. For example,
'select name FirstName,value1/value2 MyCalculation
from ...'. This can be used even if there was no
header in the input.
-b, --beautify Beautify output according to actual values. Might be
Output-level formatting, in the format X=fmt,Y=fmt
etc, where X,Y are output column numbers (e.g. 1 for
first SELECT column etc.)
Output encoding. Defaults to 'none', leading to
selecting the system/terminal encoding
Output quoting mode. Possible values are all, minimal,
nonnumeric and none. Note the slightly misleading
parameter name, and see the matching -w parameter for
input quoting.
Query Related Options:
Read query from the provided filename instead of the
command line, possibly using the provided query
encoding (using -Q).
query text encoding. Experimental. Please send your
feedback on this
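The quoting options in the help text above (all, minimal, nonnumeric, none, and "" as an escape inside quoted fields) share their names with the quoting constants of Python's csv module. The sketch below shows what those modes do in the csv module itself; whether q maps its options onto these constants in exactly this way is an assumption, and `MODES` and `render` are hypothetical names for this example.

```python
import csv
import io

# Mode names from the help text mapped to Python csv constants (assumed mapping).
MODES = {
    "all": csv.QUOTE_ALL,
    "minimal": csv.QUOTE_MINIMAL,
    "nonnumeric": csv.QUOTE_NONNUMERIC,
    "none": csv.QUOTE_NONE,
}


def render(row, mode, delimiter=" "):
    """Write one row with the given quoting mode and return it as a string."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=MODES[mode],
        escapechar="\\",  # required by QUOTE_NONE when a field holds the delimiter
        lineterminator="\n",
    )
    writer.writerow(row)
    return buf.getvalue().rstrip("\n")


row = ["ppp", 42, "two words"]
for mode in ("all", "minimal", "nonnumeric", "none"):
    print(mode, "->", render(row, mode))

# Reading side: "" inside a double-quoted field stands for one double quote.
parsed = next(csv.reader(['"say ""hi""" x'], delimiter=" "))
print(parsed)
```

With a space delimiter, only the field containing a space needs protection in minimal mode, while none mode has to fall back to a backslash escape because no quotes are written.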
Further reading on the q command: