DataCleaner: 14.4 Listing datastore contents and available components

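The command-line interface can list datastore contents and the available components. This is intended as an aid for hand-editing an analysis file, should you need to do so. With the -list argument you can retrieve the metadata of a datastore as well as the DataCleaner components you need to compose an analysis file manually.
Listing the contents of a datastore is straightforward once you have looked at the output of the -usage command. Here are a few examples using the sample database "orderdb":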

> datacleaner-console.exe -list datastores
Datastores:
-----------
Country codes
orderdb

> datacleaner-console.exe -list tables -ds orderdb
Tables:
-------
CUSTOMERS
CUSTOMER_W_TER
DEPARTMENT_MANAGERS
DIM_TIME
EMPLOYEES
OFFICES
ORDERDETAILS
ORDERFACT
ORDERS
PAYMENTS
PRODUCTS
QUADRANT_ACTUALS
TRIAL_BALANCE

> datacleaner-console.exe -list columns -ds orderdb -table employees
Columns:
--------
EMPLOYEENUMBER
LASTNAME
FIRSTNAME
EXTENSION
EMAIL
OFFICECODE
REPORTSTO
JOBTITLE
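
If the listings above are to be consumed by a script rather than read by eye, they are easy to build and parse. The following is a minimal Python sketch, not part of DataCleaner: the helper names build_list_args and parse_listing are hypothetical, and the parser assumes the output always has the shape shown above (a heading ending in a colon, a dashed underline, then one entry per line).

```python
from typing import List, Optional, Tuple


def build_list_args(kind: str,
                    datastore: Optional[str] = None,
                    table: Optional[str] = None) -> List[str]:
    """Build the argument vector for one of the -list calls shown above.

    Only the flags that appear in the examples (-list, -ds, -table) are used.
    """
    args = ["datacleaner-console.exe", "-list", kind]
    if datastore is not None:
        args += ["-ds", datastore]
    if table is not None:
        args += ["-table", table]
    return args


def parse_listing(output: str) -> Tuple[str, List[str]]:
    """Split one listing into its heading and its entries.

    Assumes the layout shown above: heading, dashed underline, entries.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) < 2 or set(lines[1]) != {"-"}:
        raise ValueError("expected a heading followed by a dashed underline")
    heading = lines[0].rstrip(":")
    return heading, lines[2:]


# Sample text copied from the datastore listing above.
sample = """\
Datastores:
-----------
Country codes
orderdb
"""

heading, entries = parse_listing(sample)
print(heading, entries)
print(build_list_args("columns", datastore="orderdb", table="employees"))
```

The argument list returned by build_list_args could be handed to subprocess.run on a machine where DataCleaner is installed; the sketch deliberately stops short of that so it runs anywhere.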

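DataCleaner's components are listed by setting the -list argument to one of three component types: ANALYZER, TRANSFORMER or FILTER. The examples below pass the plural, lower-case forms analyzers and transformers: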

> datacleaner-console.exe -list analyzers

...

name: Matching analyzer
- Consumes multiple input columns (type: UNDEFINED)
- Property: name=Dictionaries, type=Dictionary, required=false
- Property: name=String patterns, type=StringPattern, required=false
name: Pattern finder
- Consumes 2 named inputs
   Input column: Column (type: STRING)
   Input column: Group column (type: STRING)
- Property: name=Discriminate text case, type=Boolean, required=false
- Property: name=Discriminate negative numbers, type=Boolean, required=false
- Property: name=Discriminate decimals, type=Boolean, required=false
- Property: name=Enable mixed tokens, type=Boolean, required=false
- Property: name=Ignore repeated spaces, type=Boolean, required=false
- Property: name=Upper case patterns expand in size, type=boolean, required=false
- Property: name=Lower case patterns expand in size, type=boolean, required=false
- Property: name=Predefined token name, type=String, required=false
- Property: name=Predefined token regexes, type=String, required=false
- Property: name=Decimal separator, type=Character, required=false
- Property: name=Thousands separator, type=Character, required=false
- Property: name=Minus sign, type=Character, required=false

...

> datacleaner-console.exe -list transformers

...

name: Tokenizer
- Consumes a single input column (type: STRING)
- Property: name=Delimiters, type=char, required=true
- Property: name=Number of tokens, type=Integer, required=true
- Output type is: STRING
name: Whitespace trimmer
- Consumes multiple input columns (type: STRING)
- Property: name=Trim left, type=boolean, required=true
- Property: name=Trim right, type=boolean, required=true
- Property: name=Trim multiple to single space, type=boolean, required=true
- Output type is: STRING

...
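
When hand-editing an analysis file, the most useful part of a component listing is usually which properties each component has and which of them are required. The sketch below is one possible way to extract that from the text output. It is an illustration only: parse_components and required_properties are hypothetical helpers, and the regular expression assumes every property line has exactly the "name=..., type=..., required=..." shape seen in the listings above.

```python
import re
from typing import Dict, List

# Matches lines such as:
# - Property: name=Number of tokens, type=Integer, required=true
PROPERTY_RE = re.compile(
    r"^- Property: name=(?P<name>.+), type=(?P<type>[^,]+), "
    r"required=(?P<required>true|false)$"
)


def parse_components(output: str) -> List[Dict]:
    """Group property lines under the preceding 'name: ...' line."""
    components: List[Dict] = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("name: "):
            current = {"name": line[len("name: "):], "properties": []}
            components.append(current)
            continue
        match = PROPERTY_RE.match(line)
        if match and current is not None:
            current["properties"].append({
                "name": match.group("name"),
                "type": match.group("type"),
                "required": match.group("required") == "true",
            })
        # Other lines (consumed inputs, output type, "...") are ignored here.
    return components


def required_properties(components: List[Dict]) -> Dict[str, List[str]]:
    """Map each component name to the names of its required properties."""
    return {
        c["name"]: [p["name"] for p in c["properties"] if p["required"]]
        for c in components
    }


# Sample text copied from the listings above (abridged).
sample = """\
name: Tokenizer
- Consumes a single input column (type: STRING)
- Property: name=Delimiters, type=char, required=true
- Property: name=Number of tokens, type=Integer, required=true
- Output type is: STRING
name: Pattern finder
- Consumes 2 named inputs
   Input column: Column (type: STRING)
- Property: name=Decimal separator, type=Character, required=false
"""

print(required_properties(parse_components(sample)))
```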

