14.4 Listing datastore contents and available components
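The command-line interface can list datastore contents and available components. This is intended to help when you edit an analysis file by hand. With the -list argument you can retrieve the metadata of your datastores as well as the DataCleaner components you need to compose an analysis file manually.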
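Listing the contents of a datastore is straightforward once you have looked at the output of the -usage command. Here are a few examples using the example database "orderdb":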
> datacleaner-console.exe -list datastores
Datastores:
-----------
Country codes
orderdb
> datacleaner-console.exe -list tables -ds orderdb
Tables:
-------
CUSTOMERS
CUSTOMER_W_TER
DEPARTMENT_MANAGERS
DIM_TIME
EMPLOYEES
OFFICES
ORDERDETAILS
ORDERFACT
ORDERS
PAYMENTS
PRODUCTS
QUADRANT_ACTUALS
TRIAL_BALANCE
> datacleaner-console.exe -list columns -ds orderdb -table employees
Columns:
--------
EMPLOYEENUMBER
LASTNAME
FIRSTNAME
EXTENSION
EMAIL
OFFICECODE
REPORTSTO
JOBTITLE
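If you want to reuse these names in a script rather than copy them by hand, the listing can be parsed. The following is a minimal sketch and not part of DataCleaner: the function name parse_listing is hypothetical, and it assumes the layout shown above (a title line, a dashed rule, then one name per line).

```python
def parse_listing(output: str) -> list[str]:
    """Return the names from one -list listing.

    Assumes the layout shown in the examples: an optional echoed command,
    a title line, a rule made only of dashes, then one name per line.
    """
    items = []
    seen_rule = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not seen_rule:
            # Skip everything up to and including the dashed rule.
            if set(line) == {"-"}:
                seen_rule = True
            continue
        if line.startswith(">"):
            break  # the next command prompt ends this listing
        items.append(line)
    return items


# Sample text copied from the column listing above.
sample = """\
> datacleaner-console.exe -list columns -ds orderdb -table employees
Columns:
--------
EMPLOYEENUMBER
LASTNAME
FIRSTNAME
EXTENSION
EMAIL
OFFICECODE
REPORTSTO
JOBTITLE
"""

columns = parse_listing(sample)
print(columns)
```

Names that contain spaces, such as the "Country codes" datastore, are kept whole because each line is treated as a single name.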
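To list DataCleaner's components, set the -list argument to one of the three component types: ANALYZER, TRANSFORMER, or FILTER (the examples below use the lowercase plural form):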
> datacleaner-console.exe -list analyzers
...
name: Matching analyzer
- Consumes multiple input columns (type: UNDEFINED)
- Property: name=Dictionaries, type=Dictionary, required=false
- Property: name=String patterns, type=StringPattern, required=false
name: Pattern finder
- Consumes 2 named inputs
   Input column: Column (type: STRING)
   Input column: Group column (type: STRING)
- Property: name=Discriminate text case, type=Boolean, required=false
- Property: name=Discriminate negative numbers, type=Boolean, required=false
- Property: name=Discriminate decimals, type=Boolean, required=false
- Property: name=Enable mixed tokens, type=Boolean, required=false
- Property: name=Ignore repeated spaces, type=Boolean, required=false
- Property: name=Upper case patterns expand in size, type=boolean, required=false
- Property: name=Lower case patterns expand in size, type=boolean, required=false
- Property: name=Predefined token name, type=String, required=false
- Property: name=Predefined token regexes, type=String, required=false
- Property: name=Decimal separator, type=Character, required=false
- Property: name=Thousands separator, type=Character, required=false
- Property: name=Minus sign, type=Character, required=false
...
> datacleaner-console.exe -list transformers
...
name: Tokenizer
- Consumes a single input column (type: STRING)
- Property: name=Delimiters, type=char, required=true
- Property: name=Number of tokens, type=Integer, required=true
- Output type is: STRING
name: Whitespace trimmer
- Consumes multiple input columns (type: STRING)
- Property: name=Trim left, type=boolean, required=true
- Property: name=Trim right, type=boolean, required=true
- Property: name=Trim multiple to single space, type=boolean, required=true
- Output type is: STRING
...
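When hand-editing an analysis file, it helps to know which properties a component requires. The sketch below is an illustration and not part of DataCleaner: parse_components and PROPERTY_RE are hypothetical names, and the parsing assumes the "name: ..." and "- Property: name=..., type=..., required=..." line layout shown in the listings above.

```python
import re

# Matches lines such as:
#   - Property: name=Number of tokens, type=Integer, required=true
PROPERTY_RE = re.compile(
    r"^- Property: name=(.+), type=(\S+), required=(true|false)$"
)


def parse_components(output: str) -> dict[str, list[dict]]:
    """Map each component name to the list of its declared properties."""
    components: dict[str, list[dict]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("name: "):
            # A new component block begins.
            current = line[len("name: "):]
            components[current] = []
            continue
        match = PROPERTY_RE.match(line)
        if match and current is not None:
            name, type_name, required = match.groups()
            components[current].append(
                {"name": name, "type": type_name, "required": required == "true"}
            )
    return components


# Sample text copied from the transformer listing above.
sample = """\
name: Tokenizer
- Consumes a single input column (type: STRING)
- Property: name=Delimiters, type=char, required=true
- Property: name=Number of tokens, type=Integer, required=true
- Output type is: STRING
name: Whitespace trimmer
- Consumes multiple input columns (type: STRING)
- Property: name=Trim left, type=boolean, required=true
- Property: name=Trim right, type=boolean, required=true
- Property: name=Trim multiple to single space, type=boolean, required=true
- Output type is: STRING
"""

components = parse_components(sample)
required = [p["name"] for p in components["Tokenizer"] if p["required"]]
print(required)
```

Lines that are neither a component name nor a property, such as the "Consumes ..." and "Output type is ..." lines, are ignored by this sketch.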