Explainability & Reviewing: The responsibility ultimately falls back on the audience, i.e. EVERYONE

Emerging trends: I did it, I did it, I did it, but …

— Kenneth Ward Church, 2017, Natural Language Engineering


A call for Explanation — Insights should be valued much more than numbers

It is considered a feature that ML has become so powerful (and so opaque) that it is no longer necessary (or even relevant) to talk about how it works. [Church & Hestness 2019 A Survey of 25 Years of Evaluation]

Does it make it ok for machines to do bad things if no one knows what's happening and why, including those of us who created the machines?

There has been a trend for publications to report better and better numbers, but less and less insight.

Years ago, someone from an industrial lab presented a talk at a conference saying, basically, "I did it, I did it, I did it, but I'll be damned if I'll tell you how!" I had a strong allergic reaction to this talk because I was worried that my employer might ask me to publish similar papers so they could take credit for my results while protecting their intellectual property as trade secrets. Since then, I have often argued that we need to reject papers that try to pull this kind of stunt. We can't afford papers that report results without insights.

It reminds me of [Bowman 2021 What will it take to fix benchmarking in NLU]: "Ultimately, the community needs to compare the cost of making serious investments in better benchmarks to the cost of wasting researcher time and computational resources due to our inability to measure progress." If insight-free papers with better numbers keep being chosen for publication, then progress could indeed be "hard to gauge".

O’Neil argues in Weapons of Math Destruction that big data increases inequality and threatens democracy largely because of opacity. Numbers offer the sheen of objectivity; algorithms seem to "transcend morality", as O’Neil puts it.

No one misses the old days: it is no longer necessary to think about degrees of freedom as we did in the bad old days, when we used to worry about feature selection. It used to be considered necessary and desirable to have more observations than parameters, but these days, with modern neural nets, it is no longer considered necessary to worry about such details.
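The old worry is easy to demonstrate with a toy example (mine, not from the article): when a model has more parameters than observations, the system is underdetermined, so a "perfect" training fit is guaranteed and tells us nothing. A minimal numpy sketch, fitting 10 polynomial coefficients to 4 data points:

```python
import numpy as np

# Toy illustration: more parameters (10) than observations (4).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0])

# Vandermonde design matrix: 4 rows (observations) x 10 columns (parameters).
X = np.vander(x, N=10, increasing=True)

# The system is underdetermined, so lstsq returns the minimum-norm
# solution and the fit passes through every training point exactly.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_error = np.max(np.abs(X @ coef - y))
print(train_error)  # effectively zero: a perfect fit is automatic, not informative
```

Zero training error here is a property of the dimensions alone, which is exactly why the classical advice demanded more observations than parameters.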

How could ML work if there really are more degrees of freedom than observations? Modern optimizations are so complicated that it is hard to address traditional questions like degrees of freedom, the significance of each parameter, and ANOVA. Such questions were well understood for simple optimization methods such as regression, but the literature has less to say about them for more modern optimization methods, though there are a few suggestions, such as [LeCun 1989 Optimal Brain Damage], which proposes ways to measure the "significance" of a parameter in a network, or a network's "information content", in order to move beyond the notion that "complexity = the number of free parameters".
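Optimal Brain Damage estimates the significance (saliency) of each weight as the second-order approximation of the loss increase from deleting it, s_k = ½ H_kk w_k², where H_kk is a diagonal entry of the Hessian. A minimal sketch of that idea, using linear regression (my choice, for simplicity) where the Hessian of the squared loss is available in closed form as XᵀX:

```python
import numpy as np

# OBD-style saliency on a linear model: for L(w) = 0.5 * ||Xw - y||^2,
# the Hessian is X.T @ X, so its diagonal is the column-wise sum of squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
true_w = np.array([2.0, 0.01, -1.5, 0.8, 3.0])  # index 1 barely matters
y = X @ true_w

# Fit by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# OBD saliency per weight: s_k = 0.5 * H_kk * w_k^2.
H_diag = np.sum(X**2, axis=0)        # diagonal of X.T @ X
saliency = 0.5 * H_diag * w**2

# The least "significant" parameter is the one cheapest to delete.
least = int(np.argmin(saliency))
print(least)  # index 1: the near-zero coefficient
```

Ranking parameters this way is exactly the kind of move beyond "complexity = number of free parameters" that the paragraph above points to: two nets with the same parameter count can carry very different amounts of information.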

Neural nets are great for many tasks, but they haven’t yet automated researchers out of a job. Research is harder than just pushing a button and waiting for the optimization to converge on a publishable result.


I worry that the literature may be turning into a giant leaderboard.

As reviewing burdens continue to become more and more onerous, reviewers are looking for easier and easier ways to discharge their responsibility. Leaderboards provide a useful service by helping the audience figure out how a proposed solution stacks up against the competition, but that should be just a starting point to motivate a more interesting discussion of why the proposed solution works as well as it does.

I prefer to believe that cheating is rarely caught because there is so little to catch. In any case, there are lots of standard tricks that we have all seen way too often: weak baselines, mindless metrics, lack of transparency, etc.

It is nice to see the field come together as it has, but we may have been too successful.


Final Words: Whatever you measure, you get

The Writer, The Reviewer and the Audience are the same group of people

The work tends to be better when authors advocate positions they care passionately about for reasons that go beyond personal gain. That may not always be the case, at least in the short term. But at the end of the day, the ultimate satisfaction comes from meeting (and exceeding) audience expectations. The audience must demand more than merely good numbers. It is the responsibility of the audience to expect insights as well as good numbers, and to vote early and often with citations.
