q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
229
带 HHERE 条件查询 clicks.csv:
q -H -t "SELECT request_id,score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"
2cfab5ceca922a1a2179dc4687a3b26e 1.0
f6de737b5aa2c46a3db3208413a54d64 0.986665809568
766025d25479b95a224bd614141feee5 0.977105183282
2c09058a1b82c6dbcf9dc463e73eddd2 0.703255121794
从标准输入读取输入,计算 /tmp subtree 中每个用户/组的总大小。
sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"
mapred hadoop 304.00390625
root root 8.0431451797485
smith smith 4.34389972687
q 命令的关联查询:
q "SELECT myfiles.c8,emails.c2 FROM exampledatafile myfiles JOIN group-emails-example emails ON (myfiles.c4 = emails.c1) WHERE myfiles.c8 = 'ppp'"
ppp dip.1@otherdomain.com
ppp dip.2@otherdomain.com
使用标题行中的列名,计算拥有的进程数最多的前 3 个用户ID,并按降序排列。请注意查询中自动检测到的列名 UID 的用法。
ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
#Output:
root 152
harel 119
avahi 2018
附:q 命令官网整体说明:
Usage:
q allows performing SQL-like statements on tabular text data.
Its purpose is to bring SQL expressive power to manipulating text data using the Linux command line.
Basic usage is q "" where table names are just regular file names (Use - to read from standard input)
When the input contains a header row, use -H, and column names will be set according to the header row content. If there isn't a header row, then columns will automatically be named c1..cN.
Column types are detected automatically. Use -A in order to see the column name/type analysis.
Delimiter can be set using the -d (or -t) option. Output delimiter can be set using -D
All sqlite3 SQL constructs are supported.
Examples:
Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
This example would print a count of each unique permission string in the current folder.
Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
This example would provide the average and the sum of the numbers in the range 1 to 1000
Example 3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc"
This example will output the total size in MB per user+group in the /tmp subtree
See the help or https://github.com/harelba/q/ for more details.
Options:
-h, --help show this help message and exit
-v, --version Print version
-V, --verbose Print debug info in case of problems
-S SAVE_DB_TO_DISK_FILENAME, --save-db-to-disk=SAVE_DB_TO_DISK_FILENAME
Save database to an sqlite database file
--save-db-to-disk-method=SAVE_DB_TO_DISK_METHOD
Method to use to save db to disk. 'standard' does not
require any deps, 'fast' currenty requires manually
running `pip install sqlitebck` on your python
installation. Once packing issues are solved, the fast
method will be the default.
Input Data Options:
-H, --skip-header Skip header row. This has been changed from earlier
version - Only one header row is supported, and the
header row is used for column naming
-d DELIMITER, --delimiter=DELIMITER
Field delimiter. If none specified, then space is used
as the delimiter.
-t, --tab-delimited
Same as -d . Just a shorthand for handling
standard tab delimited file You can use $'\t' if you
want (this is how Linux expects to provide tabs in the
command line
-e ENCODING, --encoding=ENCODING
Input file encoding. Defaults to UTF-8. set to none
for not setting any encoding - faster, but at your own
risk...
-z, --gzipped Data is gzipped. Useful for reading from stdin. For
files, .gz means automatic gunzipping
-A, --analyze-only Analyze sample input and provide information about
data types
-m MODE, --mode=MODE
Data parsing mode. fluffy, relaxed and strict. In
strict mode, the -c column-count parameter must be
supplied as well
-c COLUMN_COUNT, --column-count=COLUMN_COUNT
Specific column count when using relaxed or strict
mode
-k, --keep-leading-whitespace
Keep leading whitespace in values. Default behavior
strips leading whitespace off values, in order to
provide out-of-the-box usability for simple use cases.
If you need to preserve whitespace, use this flag.
--disable-double-double-quoting
Disable support for double double-quoting for escaping
the double quote character. By default, you can use ""
inside double quoted fields to escape double quotes.
Mainly for backward compatibility.
--disable-escaped-double-quoting
Disable support for escaped double-quoting for
escaping the double quote character. By default, you
can use \" inside double quoted fields to escape
double quotes. Mainly for backward compatibility.
--as-text Don't detect column types - All columns will be
treated as text columns
-w INPUT_QUOTING_MODE, --input-quoting-mode=INPUT_QUOTING_MODE
Input quoting mode. Possible values are all, minimal
and none. Note the slightly misleading parameter name,
and see the matching -W parameter for output quoting.
-M MAX_COLUMN_LENGTH_LIMIT, --max-column-length-limit=MAX_COLUMN_LENGTH_LIMIT
Sets the maximum column length.
-U, --with-universal-newlines
Expect universal newlines in the data. Limitation: -U
works only with regular files for now, stdin or .gz
files are not supported yet.
Output Options:
-D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIMITER
Field delimiter for output. If none specified, then
the -d delimiter is used if present, or space if no
delimiter is specified
-T, --tab-delimited-output
Same as -D . Just a shorthand for outputting tab
delimited output. You can use -D $'\t' if you want.
-O, --output-header
Output header line. Output column-names are determined
from the query itself. Use column aliases in order to
set your column names in the query. For example,
'select name FirstName,value1/value2 MyCalculation
from ...'. This can be used even if there was no
header in the input.
-b, --beautify Beautify output according to actual values. Might be
slow...
-f FORMATTING, --formatting=FORMATTING
Output-level formatting, in the format X=fmt,Y=fmt
etc, where X,Y are output column numbers (e.g. 1 for
first SELECT column etc.
-E OUTPUT_ENCODING, --output-encoding=OUTPUT_ENCODING
Output encoding. Defaults to 'none', leading to
selecting the system/terminal encoding
-W OUTPUT_QUOTING_MODE, --output-quoting-mode=OUTPUT_QUOTING_MODE
Output quoting mode. Possible values are all, minimal,
nonnumeric and none. Note the slightly misleading
parameter name, and see the matching -w parameter for
input quoting.
Query Related Options:
-q QUERY_FILENAME, --query-filename=QUERY_FILENAME
Read query from the provided filename instead of the
command line, possibly using the provided query
encoding (using -Q).
-Q QUERY_ENCODING, --query-encoding=QUERY_ENCODING
query text encoding. Experimental. Please send your
feedback on this
q 命令扩展阅读: