Is Technology Really the Only Gap Between China's IT Industry and the Rest of the World?

Latest recommended article published on 2022-08-01 22:30:56

徐力彬

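I got an email from a foreign developer first thing this morning, and it left me rather ashamed.

The email was a reply to a question I had asked earlier. I had modified the source code of an open-source framework to implement a feature, but the framework has since been upgraded. I read through the new source, and making the same kind of change I had made to the old version looked quite complicated, so I emailed him to ask. I have long known that foreign developers answer questions patiently, and that their technical answers tend to be thorough, systematic, and detailed. Even so, the email I received today surprised me.

Once the surprise wore off, I kept reflecting, and that is where this topic comes from. Is technology really the only gap between China's IT industry and the rest of the world? Take answering questions alone: I have met very few people in China who do it this way. I once watched the author of IK (an open-source Chinese word segmentation framework) answer questions in a chat group, and he answered in detail even trivial things such as how to use an IDE. I really admire that!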

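In China I have never felt that there are many real gurus, experts, masters, or specialists around. The ones there are tend to be half-full buckets, and I was once that full of myself too. Buy a book, and it is either plagiarized or translated. Search online, and the same material has been reposted N times. As for tracking down the gurus and masters themselves, who knows where to find them. When I did run into someone online who seemed like a guru, they always seemed rushed off their feet. If they answered at all, it was YES or NO, or at most "that is such a small problem, go look it up yourself."

Over time, people suddenly started calling me a "guru" too, which astonished me. With my level of skill, had I really become the kind of guru I used to both admire and despise (despise for the reasons above)? Wanting to be the good kind of guru rather than the very, very, very busy kind, I wrote up some of my experience on my blog, set up a QQ group, and answered people's questions. Gradually I began to sympathize with the gurus.

You can answer the same question a hundred times, even a thousand times, but what does it feel like to answer it ten thousand times? Gurus really are busy. The tasks keep piling up, some of them bizarre, things you have never done and never wanted to do. But you are the guru, so you have to do them. Then there are the people asking, who do not think before they ask. Many of their questions can be answered with a single web search. They only ask How and never ask Why. Worse still, some ask whether you have time to debug their problem remotely (and there are quite a few of them)...

So I gradually said less in the group I had set up and kept my QQ status on Busy. In the evenings I simply went invisible... Only when a good question (or topic) came up would I pop in and ramble for a few sentences. Sigh, it turns out even rambling is hard...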

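Let me repeat: I am not a guru, truly not. I mean it sincerely. This is not politeness, posturing, or officialese...

I do not know what everyone else thinks, but to me society feels restless. We are the ones who ought to be taking a calming tonic and writing code, yet we have become restless too. I feel the environment forces it on us! I once sighed that it would be wonderful if writing code did not mean worrying about putting food on the table. But that is only a dream!

Employers, before the company has even gotten off the ground, say it will go public one day: come join us, the pay is a little low, but the company's prospects are great and it will be this and that in the future. Programmers, after only a few years of writing code, say they are this and that, and will not come unless you offer such-and-such a salary. I have seen companies post job ads in chat groups, and the first reply is usually "what's the pay?" If the post comes from a company that is not well known, everyone chimes in and the doubts pile up. But look at it the other way: if you put yourself forward and send in your resume, that company will take you for a novice. It thinks: if you were that good, would you need to send us a resume? Google's boss would be making three visits to your thatched cottage (三顾茅庐)!

It is working hours right now, and I do not know whether writing a blog post that is useless for work (as far as my company is concerned) is a tragedy! But I feel I have to say it, and I will work overtime to finish my tasks!

By now I have been writing code for several years myself. I have always wanted to find one area and stick with it (I feel that is my way forward), but it is hard, really hard! On the technical side, the company keeps pushing you to do this and that, and there is always some new thing to deal with. I do not actually want to change jobs, the frequent job-hopping that makes HR grit its teeth! But either you wrong me, or I dump you!

Is there any company that would let me treat its work as my own life's work? Must I start my own business for it to count as that?

If so, history should never have had its three visits to the thatched cottage!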

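Finally, here is the email. It is about incremental crawling with the web crawler Heritrix. Take a look if you are interested.

Heritrix crawls in a single pass: once the crawl completes, it ends or pauses. That does not suit incremental crawling, so I modified the source code to support it (after finishing one pass, it reloads the seeds and crawls again, looping indefinitely). But Heritrix has been upgraded. The new features look very powerful, but implementing this in the new version by changing the code no longer looks so easy. So I wrote an email to ask the author of Heritrix. His reply captivated me, and that is how this post came about. The emails follow:

1) My question (pardon the broken English):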

Hi all,

i know H3.0 can run continous crawl by some config (FetchHistoryProcessor, PersistStoreProcessor, PersistLoadProcessor). but there is a problem: when crawl finished, Heritrix will pause or exit; but i want to let it crawl again. for example: reload the seeds.

i have modified Heritrix 1.14.3's code,like this:

class:

    org.archive.crawler.frontier.WorkQueueFrontier

function:

    public CrawlURI next() throws InterruptedException, EndedException {
        while (true) { // loop left only by explicit return or exception
            // modify by me, add these codes.
            // if there are no urls to get, let it to reload seeds
            synchronized (this) {
                if (this.controller.getFrontier().isEmpty()) {
                    loadSeeds();
                }
            }
            long now = System.currentTimeMillis();
            // Do common checks for pause, terminate, bandwidth-hold
            preNext(now);
            // ... (rest of the method not shown)
        }
    }

if one thread can not get url from this function, these code what add by me will reload the seeds and by synchronized, so that the othre threads who want to get url will wait for that thread reload seeds finished, and they can get url too.

it worked well. and any body can give me some advices? and how about Heritrix3.0?
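
To make the idea in my question easier to follow, here is a minimal, self-contained sketch of the "reload the seeds when the queue is empty" pattern. It is not Heritrix code: `ReloadingFrontier`, `next()`, and `reloadCount()` are hypothetical names, and the URL queue is just an in-memory deque. Note one deliberate difference from my patch: here the whole `next()` is synchronized, so the emptiness check, the reload, and the poll happen as one atomic step.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a crawl frontier; not a Heritrix class. */
class ReloadingFrontier {
    private final List<String> seeds;
    private final Deque<String> queue = new ArrayDeque<>();
    private int reloads = 0;

    ReloadingFrontier(List<String> seeds) {
        this.seeds = List.copyOf(seeds);
        this.queue.addAll(this.seeds);
    }

    /**
     * Returns the next URL. If the queue is empty, the seeds are reloaded
     * first; other threads calling next() block until the reload is done.
     * Returns null only when the seed list itself is empty.
     */
    synchronized String next() {
        if (queue.isEmpty()) {
            queue.addAll(seeds);
            reloads++;
        }
        return queue.poll();
    }

    /** Number of times the seeds have been reloaded so far. */
    synchronized int reloadCount() {
        return reloads;
    }

    public static void main(String[] args) throws InterruptedException {
        ReloadingFrontier frontier =
                new ReloadingFrontier(List.of("http://a.example/", "http://b.example/"));

        // Two worker threads each take three URLs: six calls against two seeds.
        Runnable worker = () -> {
            for (int i = 0; i < 3; i++) {
                frontier.next();
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println("reloads = " + frontier.reloadCount()); // prints "reloads = 2"
    }
}
```

Because every call holds the same lock, six calls against two seeds always trigger exactly two reloads, whichever thread happens to find the queue empty.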

2) The reply from the author of Heritrix:

This is an interesting use case.

I would not recommend hooking this reload into the frontier's next()
method, in H1 or H3, as that's a very central place. (If it's working
for you in H1, that's OK, but I would still think there may be timing
risks doing it there.)

Ideally you'd want to hook off some event which happens when the
frontier is empty, but before the crawl commits to a full stop. I can't
think of an existing method/event that exactly meets that need, but it
seems like a good idea, and I'll look into the possibility of adding a
clean, convenient place for this in H3.

In the meantime, some ideas would be:

• look at the ReschedulingProcessor as an example of how to use a new
CrawlURI property, 'rescheduleTime', to force a URI to be revisited at a
specific time in the future. While this will not allow your seeds to be
visited at the moment the frontier is empty, you could use it to ensure
your seeds are revisited every N days/hours/minutes/seconds/etc.

• look at the CrawlLimitEnforcer as an example of an optional component
that watches ApplicationEvents from the crawl. Events to potentially
listen for include CrawlStateEvent, CrawlURIDispositionEvent, and
StatSnapshotEvent. By either noticing when the crawl is *almost* done,
or noticing the pause-at-finish, your component could then re-enqueue
your seeds at that time (and unpause the crawl if necessary).

Each of these techniques should allow your desired effect by adding a
new optional Processor or listening bean, rather than editing core
crawler code.

Hope this helps,

- Gordon @ IA
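
The second suggestion in the reply, an optional component that listens for crawl events and re-enqueues the seeds, can be sketched in plain Java roughly as follows. Everything here is an assumption for illustration: `ToyCrawl`, `CrawlState`, `CrawlStateListener`, and `SeedReloader` are made-up names, and the real events mentioned in the reply (such as CrawlStateEvent) are not used. The point is only the shape of the design: the crawl loop stays untouched, and the reload logic lives in a separate listener.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SeedReloadDemo {
    public static void main(String[] args) {
        List<String> seeds = List.of("http://a.example/", "http://b.example/");
        ToyCrawl crawl = new ToyCrawl();
        crawl.addListener(new SeedReloader(seeds));
        crawl.enqueue(seeds);
        crawl.run(5); // cap the demo at five visits so it terminates
        System.out.println(crawl.visited);
    }
}

/** Hypothetical crawl states; not Heritrix's own event types. */
enum CrawlState { RUNNING, PAUSED_AT_FINISH }

/** Hypothetical listener interface, standing in for a crawl event listener. */
interface CrawlStateListener {
    void onStateChange(ToyCrawl crawl, CrawlState state);
}

/** Toy crawl: a URL queue plus listeners that are told when it runs dry. */
class ToyCrawl {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<CrawlStateListener> listeners = new ArrayList<>();
    private CrawlState state = CrawlState.RUNNING;
    final List<String> visited = new ArrayList<>();

    void addListener(CrawlStateListener listener) { listeners.add(listener); }
    void enqueue(List<String> urls) { queue.addAll(urls); }
    void unpause() { state = CrawlState.RUNNING; }
    CrawlState state() { return state; }

    /** Visits URLs until the crawl stays paused or maxVisits is reached. */
    void run(int maxVisits) {
        while (state == CrawlState.RUNNING && visited.size() < maxVisits) {
            String url = queue.poll();
            if (url == null) {
                // Queue is empty: pause and let listeners react.
                state = CrawlState.PAUSED_AT_FINISH;
                for (CrawlStateListener listener : listeners) {
                    listener.onStateChange(this, state);
                }
            } else {
                visited.add(url); // "fetching" is just recording the URL here
            }
        }
    }
}

/** Optional component: re-enqueues the seeds when the crawl pauses at finish. */
class SeedReloader implements CrawlStateListener {
    private final List<String> seeds;

    SeedReloader(List<String> seeds) { this.seeds = List.copyOf(seeds); }

    @Override
    public void onStateChange(ToyCrawl crawl, CrawlState state) {
        // Skip the reload when there are no seeds, to avoid spinning forever.
        if (state == CrawlState.PAUSED_AT_FINISH && !seeds.isEmpty()) {
            crawl.enqueue(seeds);
            crawl.unpause();
        }
    }
}
```

Without the `SeedReloader` registered, the same `ToyCrawl` simply stops in the paused state after visiting each seed once, which mirrors the default one-pass behavior described earlier.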

Here is also the blog of IK's author:

linliangyi2007 (林良益) blog: http://linliangyi2007.iteye.com/
