The Weekly Source Code 9 - WideFinder Edition

cunfuteng7334

Original link: https://www.hanselman.com/blog/the-weekly-source-code-9-widefinder-edition

In my new ongoing quest to read source code to be a better developer, I now present the ninth in an infinite number of a weekly series called "The Weekly Source Code." Here's some source I'm reading this week that I enjoyed.

Ya, I know this one is just 4 days after the last one, but I was having too much fun and couldn't wait until Wednesday. Plus, it's a new week so poop on you.

Last month Tim Bray shared his experiences writing a program that does, well, here's Joe Cheng's succinct description. Tim calls the project WideFinder.

The Wide Finder challenge is to write a program that:

1. Scans logfiles for hits on blog articles
2. which are counted and
3. sorted with the
4. top 10 most popular being printed to stdout. It should also
5. be about as elegant and concise as Tim’s Ruby version and
6. its performance should scale with additional CPU cores.

And this is done on a fairly large log file of about 250 megs. While Item #6 is the most interesting, many folks are focusing on Item #5. Either way, it's a heck of a lot more interesting problem than FizzBuzz and worth adding to your interview ~~arsenal~~ pocket.
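
Item #6 is usually approached by splitting the input, counting each piece independently, and merging the partial counts. Here is a minimal Ruby sketch of that shape, assuming the lines fit in memory. The names `count_hits`, `parallel_counts`, and `top_hits`, and the default worker count, are made up for illustration. Threads in standard Ruby (MRI) don't run Ruby code on several cores at once because of the global interpreter lock, so this shows the structure rather than a real speedup.

```ruby
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Count article hits in one slice of lines (the "map" step).
def count_hits(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = PATTERN.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Split the lines into slices, count each slice in its own thread,
# then merge the partial hashes by summing counts (the "reduce" step).
def parallel_counts(lines, workers = 4)
  return {} if lines.empty?
  slice_size = (lines.size / workers.to_f).ceil
  threads = lines.each_slice(slice_size).map do |slice|
    Thread.new { count_hits(slice) }
  end
  threads.map(&:value).reduce({}) do |total, partial|
    total.merge(partial) { |_key, a, b| a + b }
  end
end

# Top n [key, count] pairs, most popular first.
def top_hits(counts, n = 10)
  counts.max_by(n) { |_key, count| count }
end
```

The merge step is the part that matters for scaling: each worker owns its own hash, so nothing is shared until the partial results are combined at the end.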

I encourage you to go check out Tim's site as he's continued to list the sources that he finds most interesting. As a primarily C# programmer who's always trying to stretch out of my comfort zone, here's what I've found interesting, in the order I found them interesting.

Don Box's Naive Implementation in C# 3.0 - Apparently this is the kind of code Don can write after two beers. Notice the use of yield to make this "LINQ over a text file of CR/LF strings." That's one of those write-it-constantly-over-and-over-again helper methods that makes me wonder why it wasn't just included.

    static void Main(string[] args)
    {
        var regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)");
 
        var grouped = from line in ReadLinesFromFile(@"C:\temp\bray.txt")
                      let match = regex.Match(line)
                      where match.Success
                      let url = match.Value
                      group url by url;
  
        var ordered = from g in grouped
                      let count = g.Count()
                      orderby count descending
                      select new { Count = count, Key = g.Key };
        
        foreach (var item in ordered.Take(10))
            Console.WriteLine("{0}: {1}", item.Count, item.Key);
    }
 
    // LINQ-compatible streaming I/O helper
    public static IEnumerable<string> ReadLinesFromFile(string filename)
    {
        using (StreamReader reader = new StreamReader(filename))
        {
            while (true)
            {
                string s = reader.ReadLine();
                if (s == null)
                    break;
                yield return s;
            }
        }
    }
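
For comparison, here is a rough Ruby analogue of the "LINQ over a text file" idea: a lazy enumerator plays the role of the `yield return` helper, so lines are pulled through the pipeline one at a time instead of being loaded all at once. This is only a sketch. The names `matched_urls` and `top_urls` and the sample log lines are invented for illustration. Like Don's query, it counts the whole matched text rather than just the captured group.

```ruby
require 'stringio'

PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)}

# Lazily stream matched URLs from any IO-like object, one line at a time.
def matched_urls(io)
  io.each_line.lazy
    .map { |line| PATTERN.match(line) }
    .select { |match| match }   # like `where match.Success`
    .map { |match| match[0] }   # like `match.Value`: the whole matched text
end

# Group and count, then keep the n most frequent, most popular first.
def top_urls(io, n = 10)
  counts = matched_urls(io).each_with_object(Hash.new(0)) { |url, acc| acc[url] += 1 }
  counts.sort_by { |_url, count| -count }.first(n)
end

# Hypothetical sample input standing in for a real log file.
log = StringIO.new(<<~LOG)
  1.2.3.4 - - "GET /ongoing/When/200x/2007/09/20/A HTTP/1.1" 200 1
  1.2.3.4 - - "GET /ongoing/When/200x/2007/09/20/A HTTP/1.1" 200 1
  1.2.3.4 - - "GET /ongoing/When/200x/2007/09/21/B HTTP/1.1" 200 1
  1.2.3.4 - - "GET /favicon.ico HTTP/1.1" 404 0
LOG
top_urls(log).each { |url, count| puts "#{count}: #{url}" }
```

For a real file, something like `File.open(path) { |f| top_urls(f) }` should work the same way, since `File` objects also respond to `each_line`.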

Joe Cheng tightens it up with his LINQ skillz and does the group and sort all in one swell foop. As an aside, I'd like to see his QuickTimer class for next week. Nice use of my favorite C# idiom - IDisposable/using. Joe also alludes to some parallelism that could be easily added with PLINQ. Maybe we'll see that code soon.

using (new QuickTimer("Total time"))
{
    IEnumerable<string> data = new LineReader(args[0]);

    Regex regex = new Regex(@"GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/([^ ]+) ",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    var result = from line in data
                 let match = regex.Match(line)
                 where match.Success
                 group match by match.Groups[1].Value into grp
                 orderby grp.Count() descending
                 select new { Article = grp.Key, Count = grp.Count() };

    foreach (var v in result.Take(10))
        Console.WriteLine("{0}: {1}", v.Article, v.Count);
}
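
Joe's QuickTimer source isn't shown here, so the following is only a guess at the idea behind the IDisposable/using idiom, sketched in Ruby: a block stands in for the `using` scope, and `ensure` stands in for `Dispose`, so the elapsed time is reported even if the block raises. The name `quick_timer` and its output format are assumptions.

```ruby
# Hypothetical analogue of a QuickTimer: time a block and report when it exits,
# even if the block raises, much as `using` guarantees Dispose is called.
def quick_timer(label, out = $stderr)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts format('%s: %.3fs', label, elapsed)
end

# The block's value is passed through, so the timer can wrap existing code.
result = quick_timer('Total time') { (1..1000).sum }
```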

My programmer's man-crushes continue as Jomo Fisher posts the WideFinder Naive F# Implementation. Notice how everyone uses "naive" to basically say "I'm sure it could be better, so don't be mean." I can't tell you with a straight face that I totally understand this. It's kind of magical.

#light
open System.Text.RegularExpressions
open System.IO
open System.Text
 
let regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)", RegexOptions.Compiled)

let seqRead fileName =
    seq { use reader = new StreamReader(File.OpenRead(fileName))
          while not reader.EndOfStream do
              yield reader.ReadLine() }

let query fileName = 
    seqRead fileName
    |> Seq.map (fun line -> regex.Match(line)) 
    |> Seq.filter (fun regMatch -> regMatch.Success)
    |> Seq.map (fun regMatch -> regMatch.Value)
    |> Seq.countBy (fun url -> url)

And here's the code to call it:

for result in query @"file.txt" do 
    let url, count = result
    printfn "%d: %s" count url  // assumed: the line that printed each result is missing from this copy

Busting out of the Microsoft Languages for a minute, here's Tim's Ruby example:

counts = {}
counts.default = 0

ARGF.each_line do |line|
  if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
    counts[$1] += 1
  end
end

keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
keys_by_count[0 .. 9].each do |key|
  puts "#{counts[key]}: #{key}"
end
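
To see what the pattern in Tim's script actually captures, here is a small sketch run against made-up log lines (they are hypothetical, not taken from Tim's data, and `article_key` is a helper name invented here). The capture group keeps only the date-and-slug part of the path. Because `[^ .]+` must be followed directly by a space, requests for paths containing a dot, such as images, don't match.

```ruby
# Same pattern as Tim's script above.
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Hypothetical log lines, invented for illustration.
SAMPLE_LINES = [
  '1.2.3.4 - - [01/Oct/2007:00:00:01 -0700] "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1" 200 1234',
  '1.2.3.4 - - [01/Oct/2007:00:00:02 -0700] "GET /ongoing/When/200x/2007/09/20/picture.png HTTP/1.1" 200 999',
  '1.2.3.4 - - [01/Oct/2007:00:00:03 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 304 -'
]

# Returns the captured article key for a line, or nil when the line is not an article hit.
def article_key(line)
  match = PATTERN.match(line)
  match && match[1]
end

KEYS = SAMPLE_LINES.map { |line| article_key(line) }
KEYS.each { |key| puts key.inspect }
```

Only the first sample line counts as an article hit; the image request and the feed request both yield `nil`.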

Here's the Java version from UnintentionalObjectRetention. I haven't done Java since I worked at Nike in 1997 as a contractor:

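import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WideFinder {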
  public static void main(String[] args) throws IOException {      
    Map<String, Integer> counts = new HashMap<String, Integer>();      
    Pattern p = Pattern.compile("GET /ongoing/When/\\d\\d\\dx/(\\d\\d\\d\\d/\\d\\d/\\d\\d/[^ .]+) ");  
    BufferedReader in = new BufferedReader(new InputStreamReader(           
         new FileInputStream(args[0]), "US-ASCII"));         
    String line = null;     
    while ((line = in.readLine()) != null) {       
       Matcher m = p.matcher(line);       
       if (m.find()) {            
         String key = m.group();     
         Integer currentCount = counts.get(key);   
         counts.put(key, (currentCount == null ? 1 : (currentCount + 1)));     
       }
    }    
    in.close();      
    List<Entry<String, Integer>> results = new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
    Collections.sort(results, new Comparator<Entry<String, Integer>>() {
       public int compare(Entry<String, Integer> o1, Entry<String, Integer> o2)        
       {            
          return o2.getValue().compareTo(o1.getValue());     
       }     
     });   
     for(int i = 0; i < 10; i++) {    
       System.out.println(results.get(i));    
     }
  }
}

And last, but kind of first, here it is in LISP from Nate.

(defun run (&rest logs)
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (filename logs)
      (with-open-file (stream filename :direction :input
                              :external-format :latin-1)
        (loop for line = (read-line stream nil stream)
           until (eq line stream)
           do (cl-ppcre:register-groups-bind (match)
                  ("GET /ongoing/When/\\d{3}x/(\\d{4}/\\d{2}/\\d{2}/[^ .]+) " line)
                (incf (gethash match counts 0))))))
    (loop for key being the hash-keys of counts
       collect key into keys
       finally (map nil #'(lambda (x)
                            (format t "~D: ~A~%" (gethash x counts) x))
                    (subseq (sort keys #'>
                                  :key #'(lambda (x) (gethash x counts))) 0 10)))))

A good way to understand other languages (programming or human) is to read the same story in each of these languages and compare them. Tim's problem serves that purpose well!

Oh, and if you want to see why we program in Managed Code, check out the C version.

Feel free to send me links to cool source that you find hasn't been given a good read.

Translated from: https://www.hanselman.com/blog/the-weekly-source-code-9-widefinder-edition
