Tips for Storm

Storm core concepts: spout, tuple, bolt.
Storm's record-level fault-tolerance principle:
A xor A = 0
A xor B ... xor B xor A = 0, where every operand appears exactly twice.
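A minimal illustration of the principle (the ids below are made-up stand-ins for the random 64-bit ids Storm assigns to tuples):
public class XorAckDemo {
    public static void main(String[] args) {
        long checksum = 0L;
        long[] tupleIds = {0x9f3a12L, 0x47c155L, 0xb802eeL};   // stand-in tuple ids
        for (long id : tupleIds) {
            checksum ^= id;   // XOR-ed in when the tuple is emitted (anchored)
        }
        for (long id : tupleIds) {
            checksum ^= id;   // XOR-ed in again when the tuple is acked
        }
        // every id was XOR-ed exactly twice, so the checksum returns to 0
        // and the acker can mark the spout tuple as fully processed
        System.out.println(checksum == 0);   // prints true
    }
}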
To do real-time computation on Storm, you create a topology. A topology is a graph of computation: each node contains processing logic, and the links between nodes indicate how data is passed between them.
The core abstraction of Storm is the stream, an unbounded sequence of tuples.
Storm provides spouts and bolts as the basic primitives for transforming streams.
A bolt consumes any number of input streams, does some processing, and may emit new streams. Complex stream transformations, such as computing trending topics from a stream of tweets, require multiple steps and therefore multiple bolts. Bolts can do anything: run functions, filter tuples, do aggregations and joins, talk to databases, and so on.
A network of spouts and bolts is packaged into a topology, which is the top-level abstraction you submit to a Storm cluster for execution.
Code example: ExclamationBolt appends "!!!" to its input and emits it:
public static class ExclamationBolt implements IRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // append "!!!" to the first field and emit, anchored to the input tuple
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        _collector.ack(tuple);
    }

    @Override
    public void cleanup() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
The prepare method provides the bolt with an OutputCollector used for emitting tuples. Tuples can be emitted at any time from prepare, execute, or cleanup, or even asynchronously from another thread.
The execute method receives a tuple from one of the bolt's inputs. ExclamationBolt grabs the first field of the tuple and emits a new tuple. If your bolt subscribes to multiple input sources, you can find out which component a tuple came from with Tuple.getSourceComponent.
The cleanup and getComponentConfiguration methods are rarely needed in bolt implementations; base classes such as BaseRichBolt provide default implementations.
Storm has two modes of operation: local mode and distributed mode.
A topology is executed by worker processes.
The following is the core code for running a topology in local mode:
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(2);

// run the topology in an in-process LocalCluster, then kill it after 10 seconds
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("test");
cluster.shutdown();
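In distributed mode, the same topology is submitted to the cluster with StormSubmitter instead of LocalCluster (reusing conf and builder from the snippet above):
// submits the topology to the cluster configured in storm.yaml
StormSubmitter.submitTopology("test", conf, builder.createTopology());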
A stream grouping tells a topology how to send tuples between two components; spouts and bolts execute in parallel as many tasks across the cluster.
The following is the core code for building a topology and declaring its groupings:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
The fields grouping guarantees that the same word always goes to the same task: a fields grouping partitions the stream by a subset of its fields, and it is the basis for implementing streaming joins and streaming aggregations.
To guarantee that data is processed correctly, Storm tracks every tuple emitted by a spout; this is where ack/fail handling comes in. If a tuple is fully processed, the spout's ack method is called; if it fails, the fail method is called. Every bolt that processes the tuple reports to Storm, through its OutputCollector, whether its own processing succeeded.
An IBasicBolt implementation does not have to deal with ack/fail itself; whether the spout sees an ack or a fail is determined entirely by the downstream bolts, so IBasicBolt is well suited for filters and simple computations.
The following is the core ack/fail code (from the executor that wraps an IBasicBolt):
public void execute(Tuple input) {
    _collector.setContext(input);
    try {
        _bolt.execute(input, _collector);
        // the wrapped bolt returned normally, so ack the input tuple automatically
        _collector.getOutputter().ack(input);
    } catch (FailedException e) {
        LOG.warn("failed to process tuple", e);
        // fail the tuple so the spout can replay it
        _collector.getOutputter().fail(input);
    }
}
If any bolt in the tuple tree fails, the spout's fail method is triggered immediately; ack is triggered only after every bolt in the tree has acked.
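On the spout side this means emitting every tuple with a message id and overriding ack/fail. A minimal sketch (the class name, the in-memory pending map, and the hard-coded sentence are illustrative assumptions, not code from the source):
public static class ReliableSentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    // tuples awaiting an ack, keyed by message id
    private Map<Object, Values> pending = new HashMap<Object, Values>();

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String sentence = "the quick brown fox";   // stand-in for a real data source
        Object msgId = UUID.randomUUID().toString();
        Values values = new Values(sentence);
        pending.put(msgId, values);
        // emitting with a message id turns on tuple tracking for this tuple
        collector.emit(values, msgId);
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);   // the whole tuple tree was processed successfully
    }

    @Override
    public void fail(Object msgId) {
        collector.emit(pending.get(msgId), msgId);   // replay the failed tuple
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}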
Typical processing steps of a bolt:
1. Read a tuple;
2. Emit zero, one, or more tuples derived from this input tuple;
3. Finally, ack the tuple to confirm it was processed successfully.
Storm wraps this pattern in a dedicated interface, IBasicBolt; classes such as BaseBasicBolt implement it.
The following is a demo:
public static class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
Storm batching: insert multiple tuples into the database in a single batch. The following is a demo:
public class BatchingBolt implements IRichBolt {
    private static final long serialVersionUID = 1L;
    private OutputCollector collector;
    private Queue<Tuple> tupleQueue = new ConcurrentLinkedQueue<Tuple>();
    private int count;
    private long lastTime;
    private Connection conn;

    public BatchingBolt(int n) {
        count = n;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // open the connection here rather than in the constructor: the bolt is
        // serialized when the topology is submitted, and a Connection is not serializable
        conn = DBManager.getConnection();
        lastTime = System.currentTimeMillis();
    }

    @Override
    public void execute(Tuple tuple) {
        tupleQueue.add(tuple);
        long currentTime = System.currentTimeMillis();
        // flush a batch every `count` tuples, or at least once per second
        if (tupleQueue.size() >= count || currentTime >= lastTime + 1000) {
            List<Tuple> batch = new ArrayList<Tuple>();
            Tuple tup;
            while ((tup = tupleQueue.poll()) != null) {
                batch.add(tup);
            }
            try {
                conn.setAutoCommit(false);
                Statement stmt = conn.createStatement();
                for (Tuple t : batch) {
                    stmt.addBatch(DBManager.getSql(t));
                }
                stmt.executeBatch();
                conn.commit();
                conn.setAutoCommit(true);
                // ack only after the batch is committed, so a failed batch gets replayed
                for (Tuple t : batch) {
                    collector.ack(t);
                }
            } catch (SQLException e) {
                for (Tuple t : batch) {
                    collector.fail(t);
                }
            }
            lastTime = currentTime;
        }
    }

    @Override
    public void cleanup() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt writes to the database and declares no output stream
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
Parallelism is the number of executor threads, and a task is essentially an instance of a spout or bolt. By default each thread executes one task; with .setNumTasks(number) one thread can execute multiple tasks, for example:
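(Reusing the split bolt from the word-count topology above; the numbers are illustrative.)
// 2 executor threads share 4 tasks of the split bolt
builder.setBolt("split", new SplitSentence(), 2)
       .setNumTasks(4)
       .shuffleGrouping("sentences");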
Typical hard problems: sliding windows and top-N.
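Window-style computations in Storm are commonly driven by tick tuples. The sketch below is a simplified fixed-window word count, not a full sliding-window or top-N solution; the class name and the 10-second interval are assumptions:
public static class WindowedWordCountBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Map<String, Integer> windowCounts = new HashMap<String, Integer>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // tick tuples arrive from the system component on the tick stream
        if (Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId())) {
            // window boundary: emit the counts gathered in this window, then reset
            for (Map.Entry<String, Integer> e : windowCounts.entrySet()) {
                collector.emit(new Values(e.getKey(), e.getValue()));
            }
            windowCounts.clear();
            collector.ack(tuple);
        } else {
            String word = tuple.getString(0);
            Integer c = windowCounts.get(word);
            windowCounts.put(word, c == null ? 1 : c + 1);
            collector.ack(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // ask Storm to send this bolt a tick tuple every 10 seconds
        Map<String, Object> conf = new HashMap<String, Object>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }
}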
Example application (order analysis): the real-time metrics correspond to the following SQL, with each stage handled by a different bolt:
select count(id),                          -- valid order count
       sum(totalPrice),                    -- amount before discount
       sum(totalPrice - discount),         -- amount after discount
       count(distinct memberId),           -- number of ordering users   (third bolt: analysis and storage)
       case when substring(sendpay,9,1) = '1' then 1
            when substring(sendpay,9,1) = '2' then 2
            else -1 end                     -- mobile-client order flag
from realtime_orders
where createdate >= '2014-11-11'            -- (first bolt: checks validity using the date field)
group by case when substring(sendpay,9,1) = '1' then 1
              when substring(sendpay,9,1) = '2' then 2
              else -1 end                   -- (second bolt: corrects the sendpay data)
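A possible topology wiring for this pipeline (all spout/bolt class names and the "sendpayFlag" field are hypothetical placeholders for illustration):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("orders", new OrderSpout(), 4);                 // hypothetical spout reading realtime_orders
builder.setBolt("check", new OrderCheckBolt(), 8)                // 1st bolt: validity check on createdate
       .shuffleGrouping("orders");
builder.setBolt("correct", new SendpayCorrectBolt(), 8)          // 2nd bolt: correct the sendpay flag
       .shuffleGrouping("check");
builder.setBolt("stats", new OrderStatsBolt(), 4)                // 3rd bolt: aggregate and store the metrics
       .fieldsGrouping("correct", new Fields("sendpayFlag"));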
ZooKeeper locking: use Curator.
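A minimal Curator distributed-lock sketch (the ZooKeeper address and the lock path are assumptions):
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLockDemo {
    public static void main(String[] args) throws Exception {
        // connect to ZooKeeper with an exponential-backoff retry policy
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/order-analysis");
        lock.acquire();
        try {
            // critical section: only one process across the cluster runs this at a time
        } finally {
            lock.release();
        }
        client.close();
    }
}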
