Is the sample size of a poll important?

You see it all the time on social media. Someone’s doing a poll and they add “RT for a bigger sample size/more accuracy” at the end of it. Someone else argues with an opinion poll because “they only asked 1,000 people and no one I know has ever been interviewed”. Finally, someone explains that while they might have only asked a few people for their research report, it’s still a bigger proportion of the population than polling companies manage for the whole country, so it must be accurate.

And in response to all three examples, people who actually understand how statistics and polling work wince at seeing an idea being so badly misunderstood.

Now, I appreciate that on a surface level opinion polling and surveying can seem like they’re just guesswork. The idea that you could ask a thousand or so people and get a broadly accurate picture of the way the nation thinks does seem absurd on its own. If you can tell the views of millions from a sample of a thousand, then why not a hundred, or ten, or even just one, if you could find the right person to represent the nation as a whole?

This misses the important part about sampling: it’s not just grabbing a thousand or so people, asking them their opinions and then totalling them up at the end. It’s not just any sample; it’s one that’s both random and representative.

The ideal way to get a random sample is to have a list of the entire population you want to survey and then pluck your sample from that. So, if you had a city with a population of a million, you could go through the list of all of them, take the name of every thousandth person, and end up with your sample, who you’d then go and interview. As this technique isn’t easy to do in the real world, polling and research companies have found all sorts of other methods to attempt something similar. These methods also attempt to counter the problem that many people don’t respond very well to being asked their opinion by a stranger: in order to get a thousand responses, you’ll need to ask a lot more than a thousand people.
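
As a rough illustration of plucking a sample from a full list, here is a minimal Python sketch. The `residents` list, the seed and both function names are invented for the example. It shows the every-thousandth-name approach described above, started from a random position, next to a plain random draw of the same size.

```python
import random


def systematic_sample(population, step, rng):
    """Take every `step`-th entry, starting from a random offset."""
    start = rng.randrange(step)
    return population[start::step]


def simple_random_sample(population, size, rng):
    """Draw `size` distinct entries, each entry equally likely to be picked."""
    return rng.sample(population, size)


rng = random.Random(2024)  # fixed seed so the sketch is repeatable
# Hypothetical stand-in for a complete list of a city of a million people.
residents = [f"resident-{i}" for i in range(1_000_000)]

every_thousandth = systematic_sample(residents, 1000, rng)
drawn_at_random = simple_random_sample(residents, 1000, rng)

print(len(every_thousandth), len(drawn_at_random))
```

Both approaches give a thousand names. Neither deals with the harder real-world problems mentioned above: getting hold of a complete list in the first place, and getting the people on it to answer.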

Solving this problem is something polling and research companies spend a lot of time and effort on. How to get a random sample of the population, and then how to get those people to answer your questions once you’ve identified them, are the key questions in being able to do their jobs, and the quicker and cheaper they can answer them, the better for them. Every company and every academic polling research team has its own way of solving these issues. It’s also worth remembering that for many polling companies, political polling is a sideline (and even a loss-leader), but they do it because it’s a very good way of advertising their accuracy.

Another thing to remember with a random sample is that it’s the responsibility of those doing the survey to find their sample, not to expect their subjects to come to them. As an example, a large part of the British general election exit poll is still done as a face-to-face survey outside polling stations, but the interviewer doesn’t stand under a sign saying “please come speak to me about your vote”. To get a random sample, they approach a certain proportion of the people who vote and ask them to take part. By approaching every tenth person to come out of a polling station, they’re getting a random sample of the people who vote there, not a self-selecting sample of those who’d choose to come to them.

The second key part is making sure that your sample is representative of the population as a whole. In an ideal world, your random sample will have been a generally accurate representation of the demographics of your wider population, but even if that is the case, different response rates from different parts of the population might mean that your data is skewed because you’ve ended up speaking to too many people from one segment of the population and not enough from another.

The question then is whether the data you have can be weighted to attempt to balance out that skew, or if you need to scrap it and start again. As an example, if your survey of a thousand has responses from 550 women and 450 men, you can probably weight the responses to make it closer to the 510 and 490 a fully representative sample would have. If you’ve got 999 of one and 1 of the other, then you really need to start again (and ask some serious questions about your sampling methodology).
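
To make the 550/450 example concrete, here is a minimal Python sketch of one simple way to weight by a single demographic. The population shares come from the 510/490 split above. The “yes” percentages are invented purely for illustration, and real polls typically weight on several characteristics at once.

```python
# Responses from the example above: 550 women and 450 men out of 1,000.
sample_counts = {"women": 550, "men": 450}
# Known population shares implied by the 510/490 split.
population_share = {"women": 0.51, "men": 0.49}

total = sum(sample_counts.values())

# Each respondent's weight is their group's target count divided by
# the number of people from that group who actually responded.
weights = {
    group: population_share[group] * total / sample_counts[group]
    for group in sample_counts
}

# Hypothetical answers, invented for illustration: share of each group saying "yes".
yes_share = {"women": 0.60, "men": 0.40}

unweighted = sum(sample_counts[g] * yes_share[g] for g in sample_counts) / total
weighted = sum(
    sample_counts[g] * weights[g] * yes_share[g] for g in sample_counts
) / total

print({g: round(w, 3) for g, w in weights.items()})
print(round(unweighted, 3), round(weighted, 3))
```

Under these made-up numbers each woman’s answer counts for a little less than one response and each man’s for a little more, and the headline figure moves from 51.0% to 50.2%. With 999 of one group and 1 of the other, the lone respondent’s weight would be in the hundreds, which is why that sample has to be scrapped.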

The key thing about weighting is that it’s being done on a random sample of the whole population in line with known demographics. You can’t use it to fix a sample that’s neither random nor representative. That’s why you can’t make a self-selecting poll accurate by weighting the responses: there’s a bias at the heart of the data towards those who choose to respond and those who’ve heard of your survey.

I’m not going to go heavily into the maths of sample size and accuracy here because they’re complex and I don’t have any of the key texts to hand so I’d likely make a mistake in explaining them. The maths of sampling originate in mathematical probability, but the key thing to understand is that the relationship between sample size and accuracy is not a straight line. Having twice as many responses does not make a survey twice as accurate and even if your sample is sufficiently random and representative it requires a drastic increase in the amount of data to make a small increase in the level of accuracy.
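
Without going deep into the maths, the shape of that relationship can be seen with the textbook margin-of-error approximation for a simple random sample. This is a sketch under stated assumptions, not what any particular pollster uses: a 95% confidence level (`z=1.96`), the worst case of an evenly split question (`p=0.5`), and no adjustment for weighting or non-response.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (1_000, 2_000, 4_000, 16_000):
    print(n, f"{margin_of_error(n):.2%}")
```

Under this approximation a sample of 1,000 gives a margin of roughly plus or minus 3 percentage points. Doubling to 2,000 only brings that down to a little over 2 points, and it takes 4,000 responses to halve it, because the margin shrinks with the square root of the sample size.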

If your sample isn’t properly random or representative to begin with, however, then it doesn’t matter how large your sample is, it’s only ever going to be accurate by accident. This is not new information, it’s something we’ve known since the 1930s and the dawn of opinion polling as we know it, most famously demonstrated in the 1936 US Presidential election, when George Gallup’s poll of thousands of people proved much more accurate than the Literary Digest’s poll of millions because he used proper sampling and they didn’t.
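
A small simulation shows the same effect. Everything here is invented for illustration and is not a reconstruction of the 1936 polls: a hypothetical population of 200,000 split into two segments with different levels of support, a huge sample drawn from only one segment, and a small sample drawn at random from everyone.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = supports the candidate, 0 = does not.
# Segment A (60,000 people) supports at about 30%,
# segment B (140,000 people) at about 60%.
segment_a = [1 if rng.random() < 0.30 else 0 for _ in range(60_000)]
segment_b = [1 if rng.random() < 0.60 else 0 for _ in range(140_000)]
everyone = segment_a + segment_b

true_support = sum(everyone) / len(everyone)

# A very large sample that only ever reaches segment A.
huge_biased = rng.sample(segment_a, 50_000)
# A small sample drawn at random from the whole population.
small_random = rng.sample(everyone, 1_000)

biased_error = abs(sum(huge_biased) / len(huge_biased) - true_support)
random_error = abs(sum(small_random) / len(small_random) - true_support)

print(f"true support: {true_support:.1%}")
print(f"50,000 from one segment is off by {biased_error:.1%}")
print(f"1,000 at random is off by {random_error:.1%}")
```

The large sample lands near segment A’s 30% however many people it includes, around twenty points away from the true figure of about 51%, while the sample of a thousand usually ends up within a few points of it.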

The important thing to remember is that the sample size of a poll is important in determining how accurate it is, but only if that sample has been obtained in a way that ensures it’s random and representative. The number of people who’ve responded is not important if they’ve not been accurately selected to participate, no matter how much you might like the results they generate.

Original article: https://medium.com/@nick.barlow/is-the-sample-size-of-a-poll-important-2b25b5bfe64d
