Merkle Training 笔记

Database Marketing

know me -> engage me -> keep me -> empower me


UT

identify object

test case on business logic

run in dev environment

check and record


SQL

NOT null for dimension table

SQL Sever

windows/column index/distribute replay

T-SQL:

sequence/page data/exception handle/execute/lag lead

sqlcmd -S <server> -U <user> -P <password> -Q "select * from table;"

DML/DDL/DCL(do..while)/TCL(transaction)

RANK(134)/DENSE_RANK(1234)/ROW_NUMBER(1234)/NTILE(1122345) OVER([PARTITION BY] ORDER BY)

SSIS(data flow etl / intergration )/SSRS(report)/SSAS(analyst)

Architecture/Designer/Runtime engine/Service/Task/Data FLow Engine/API

Control/Data/Parameter/EventHandler

OELDB/ADO

用?(0,1,2)代表parameter, 变量不可以在sql中使用

Package》Task》Dataflow》R/T

Container:For...Loop/For each loop/sequence

@[User::variable_name=initial value

Tips:

少用dynamic sql

少用go

union all 更快

between 比 in 快

union可以替代or

少用select into

set no count on 不返回记录

不在where条件上加函数

不创造太复杂的视图



Shell

C-c stop job

C-z pause job

C-d end input

fg/bg

ln / ln -s

chmod owner+group+others

man/help

>/<

" "解析 ‘ ‘字符

sort

less/more

grep

sed

date +y%m%d%

$?/$#/$0..

sed -n '1p' filename

awk -F'\t' '{print $1}'

awk -F'\T' '{print NF}'


DBMS

RDBMS:

SMP(Symmetric Multi-Processing)

MPP(Massively Parallel Processing)  DW

OLTP/OLAP

MySQL Oracle Netezza

batch(go/exec)

identity: server_name.db_name.scheme_name.obj_name

SET ANSI_NULLS(是否查询null)

LIKE ::

+ for SQL Server || for netezza

Control Flow

begin..end

if..else

while..

case

goto

return

break/continue

wait for

join

union/except/intersect

merge

-2^31~2^31-1 integer

-2^63~2^63-1 bigint

create procedure xxx @variable varchar(20) input/output


Index

clustered index(data page)

non-clustered index data page  row id

corering index 复杂索引只有第一个索引可以seek, scan开销很大

include index更轻量


SVN(check in/check out/commit/merge)

branchs/tags/trunk

diffing/blame


DW

relational database

query and analyst rather than transaction

historical data

not only one resource

why:seperate workload/Intergrated 特征:主题/集成/时变/稳定

OLTP OLAP

index few many

scheme 3NF 去范式化

join many some


fact-dimension 

staging data contains: historical data(3NF)/summary data/nalytical area(multi-dimensional)


Dell SMB

dell sftp server-->sterling->etl(stage/cdi/ost/targe)->compaign/model/report

CDI(customer data intergartion)

CR(Connected Recognition):IntellDRESS+AI(b2b/internastional)

+KnowleageLInk(Merkle CDI process):reference base->dup->dedup->overlap(match之后对原有数据进行覆盖)->fisrt logic(对客户进行确认,例如死亡,犯罪等,标准化 names&address)->data fix->demon join(NCOA)反馈处理数据(添加编号,更新地址)->id changes(xref file/id match)->cluster rule(matching rules)

Post CDI(hygiene/match process):

client input->pre-KL(补全格式)-》hygiene->match(global reference base)->output->postKL

output files: base(可否投递)COA(change of address搬家) Drops(信息不全) IID/HID/AID

file check:file count/filename/exist?/code page/row count/column count/header match/do some cleaning

File Bridge

schedule:tidal(enterprise scheduler) watch jobs processing and report issues(job definition/dependece/Job event/Variable)

feed: freference base feeds/suppression only feeds/tag files/transactional metadata

STG/HIST/ERROR

DTLM(data transformation loading map)STT

BRD/TRD

APJ/EMEA/NA/LATAM


Dell PDW

microsoft MPP dw platform

Advance Process:pick best(person,site)/transaction link/segmetation/model scores/preference management(PMC)/id change

CDI->Order process->suppression->address,email,phone validation->derived attributes->aggregates->snapshots->file outpus

batch id/job id/step id/status/start dt/end dt/last task process/error/reject/delete/update cnt/file id/file group id



Netezza:MPP

TwinFn

Disk--FPGA|CPU|Memory->Linux Host->BI|loda|ETL

S-blades(每个都是独立的server)

SPV(snippet processing unit)

SPA(snippet processing arrary)下面有很多SPU+Disk

Distribution Key(最多四个cols) 数据均匀的分布在SPU上

多column不方便join

原则:unique/join key/int/bigint(字节短)/don't on small table

Zone Map(int/bigint/data..)用于data slice, 没有index有primary key无contraints

生效于orderby和organize on

Data Skew Netezza不支持远程语句

netezza sql" odbc/ole db driver

Aginity(Netezza自己的客户端)

WinSQL(can use with any RDB)

nzsql(linux shell)

system table(_v_table/_v_relation_column/_v_procedure)

CREATE TABLE XX( ....) DISTRIBUTE ON cols;

EXTERNAL TABLE (Input/output data with host/client)

sys table host

user table(SPU)basic+temp


Hadoop 

distributed software framework processing of large data/across many nodes

Fault tolerant/scalability/Flexibility

your app code +map/reduce on hdfs

ecosystem of main Java programs

reliable share storage(HDFS,default block 64MB,3 copy)

write once and read many times

read large file scream(not optional) and small IO

hadoop fs -ls

analysis system(Map/Reduce)

Job Tracker/Task Tracker/NameNode/DataNode


Map Reduce(java lib)

assign a map task that can process data in parallel

Project

Hive (online OLTP on-line transaction processing)

hive -f sql file

hive -e sql

Sqoop (transfer data between hadoop and RDB)

Oozie(like Tidal)

Pig

Hbase

NoSQL

非结构化/半结构化(机器数据stream)

web log 很难实现结构化

scable(不是线性)

分布式环境(冗余)

key-value

map-combine-reduce


CA Erwin

ODS


Informatica:

Normal Join

Master Outer(Left Join, Detail is left)

Detail Outer

Full Outer






  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值