A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries

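Technologies correlated with GDP per capita
In a recent post, we saw that Android’s traffic (as a percentage of a country’s Stack Overflow visits) tends to be negatively correlated with that country’s GDP per capita. This might lead us to wonder whether the same is true of any other tags.
When we explored the major programming languages and platforms, a few others stood out besides Android, including PHP, Python, and R.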


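The amount of Android and PHP traffic is negatively correlated with a country’s income, while Python and R are positively correlated. In each case we can see exceptions (South Korea uses Android more than we would expect, and China uses more Python), but in general the correlations are strong. (Each has an R² of about 0.5-0.6, with a p-value < 10⁻⁶ after adjusting for multiple testing.)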
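As a rough illustration of the kind of calculation behind those figures, the sketch below regresses a tag’s traffic share on GDP per capita and reports the slope and R². It is only a sketch under assumptions: the numbers are made up, the log scale for GDP is an assumption, and Bonferroni is used as a stand-in because the post does not say which multiple-testing adjustment was applied.

```python
import math

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

def bonferroni(p_values):
    """Bonferroni adjustment (an assumed method): scale by test count, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up numbers for illustration only; NOT Stack Overflow data.
gdp_per_capita = [2_000, 6_000, 15_000, 40_000, 55_000]
android_percent = [6.1, 5.2, 4.0, 3.1, 2.6]

# Assumption: GDP per capita is compared on a log10 scale.
log_gdp = [math.log10(g) for g in gdp_per_capita]
print("slope:", slope(log_gdp, android_percent))
print("R^2:", r_squared(log_gdp, android_percent))
```

A negative slope corresponds to the Android and PHP pattern described above, and a positive one to the Python and R pattern.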


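We’ll emphasize that we don’t mean to imply any causation here. We’re certainly not saying that programming language choice affects a country’s average income, but we’re also not saying that a country’s wealth directly affects its use of technology. We suspect the drivers are a mix of economic and social factors (education levels, the age of the software industry, the level of outsourcing) that are typically correlated with a country’s wealth.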


How can we split the software development industry into two?


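When we examine trends, it’s useful to talk about two groups of countries (high-income and non-high-income) rather than considering a pile of separate correlations. As a useful pre-existing classification, we can use the World Bank’s income classification, which is based on gross national income (GNI) per capita.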



There are 78 high-income economies, largely made up of the US and Canada, Western Europe, parts of the Middle East and East Asia, and Australia/New Zealand. I’ve done some analyses of the fundamental drivers of the between-country variation (such as principal component analysis) that suggest this is a reasonable division, and that it’s more meaningful than other ways we could divide them, such as Eastern vs Western Hemisphere. (For instance, Australia is generally more similar to the US and Europe in terms of visited technologies than it is to China or Indonesia).


The division splits Stack Overflow traffic into groups of about two-thirds and one-third: 63.7% of Stack Overflow’s traffic comes from high-income countries. (This is likely due to a combination of a greater proportion of software development, more widespread internet access, and a disproportionate share of English speakers.) Much of the traffic from non-high-income countries comes from India, followed by Brazil, Russia, and China.
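The split described above amounts to summing visits within each income group and dividing by the total. Here is a minimal sketch, assuming a hypothetical mapping of country to visit count and a set of high-income countries taken from an existing classification; the visit counts are invented for illustration and are not Stack Overflow data.

```python
def traffic_share_by_group(visits_by_country, high_income_countries):
    """Return each income group's fraction of total visits."""
    total = sum(visits_by_country.values())
    high = sum(
        count
        for country, count in visits_by_country.items()
        if country in high_income_countries
    )
    return {
        "high_income": high / total,
        "non_high_income": (total - high) / total,
    }

# Hypothetical visit counts, for illustration only.
visits = {"United States": 500, "Germany": 150, "India": 250, "Brazil": 100}
high_income = {"United States", "Germany"}

shares = traffic_share_by_group(visits, high_income)
print(shares)
```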

How do high-income countries differ in the technologies they use?

We’ve now divided the software development world into two segments. How do high-income and non-high-income countries differ in terms of the technologies they use?



We can extract several interesting insights:

  • Difference in data science technologies: As we saw earlier, Python and R are associated with a country’s income. Python is visited about twice as often in high-income countries as in the rest of the world, and R about three times as much. We might also notice that among the smaller tags, many of the greatest shifts are in scientific Python and R packages such as pandas, numpy, matplotlib, and ggplot2. This suggests that part of the income gap in these two languages may be due to their role in science and academic research. It makes sense that these would be more common in wealthier industrialized nations, where scientific research makes up a larger portion of the economy and programmers are more likely to have advanced degrees.

  • C/C++: C and C++ are two other notable languages that tend to be visited from high-income countries. One hypothesis is that this may have to do with education: as we saw in a previous post, C and C++ are among the languages most disproportionately visited from American universities. It could also be related to the geographic distribution of the electronics and manufacturing industries.

  • PHP and Android: We explored Android development around the world in a previous post, but PHP is another technology that’s notably associated with lower-income countries. It’s interesting to see that CodeIgniter, a PHP open source framework, is the tag that’s singularly most disproportionately visited from lower-income countries, by a large margin. Further examination shows it is especially heavily visited in South/Southeast Asia (particularly India, Indonesia, Pakistan and the Philippines) while it has very little traffic from the US and Europe. It’s possible that CodeIgniter is a common choice for outsourcing firms building websites.
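Statements such as "visited about twice as often" compare a tag’s share of traffic in one income group with its share in the other. One way this comparison could be computed is sketched below, using a log2 ratio so that a value of 1 means twice the share and -1 means half. The function names and the visit counts are hypothetical, chosen only for illustration.

```python
import math

def tag_shares(visits_by_tag):
    """Convert raw visit counts into each tag's fraction of the group's traffic."""
    total = sum(visits_by_tag.values())
    return {tag: count / total for tag, count in visits_by_tag.items()}

def log2_share_ratio(high_income_visits, other_visits):
    """log2(share in high-income / share elsewhere) for tags seen in both groups."""
    high = tag_shares(high_income_visits)
    other = tag_shares(other_visits)
    return {
        tag: math.log2(high[tag] / other[tag])
        for tag in high
        if tag in other and high[tag] > 0 and other[tag] > 0
    }

# Made-up counts for illustration only; NOT Stack Overflow data.
high_income_visits = {"python": 200, "php": 100, "other": 700}
other_visits = {"python": 50, "php": 100, "other": 350}

ratios = log2_share_ratio(high_income_visits, other_visits)
for tag, value in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: {value:+.2f}")
```

Working with shares rather than raw counts matters here, because the two groups contribute very different amounts of total traffic.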

Conclusion: why does this matter?

I was certainly interested in these results as a fun fact about the programming language ecosystem. But it also has implications for other data explorations we’ll be publishing in the near future.

When we ask questions about the software development industry, it’s important to know that we’re really answering two separate questions that have been “blended” together, and that separating them can sometimes give us more informative answers.

For example, we’re often interested in understanding which technologies drive the most traffic, such as examining technologies like Flash that are shrinking over time. If we were to create a list of the most visited programming technologies, it would be different for high-income and low-income countries.
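A minimal sketch of how such per-group rankings could be produced from visit counts follows. The counts are invented purely to show that the two lists can diverge; they are not the post’s results, and `top_tags` is a hypothetical helper.

```python
def top_tags(visits_by_tag, n=3):
    """Return the n most visited tags, most visited first."""
    return sorted(visits_by_tag, key=visits_by_tag.get, reverse=True)[:n]

# Made-up counts for illustration only; NOT Stack Overflow data.
high_income_visits = {"javascript": 300, "python": 250, "java": 200, "php": 100}
other_visits = {"javascript": 300, "java": 250, "php": 220, "python": 110}

top_high = top_tags(high_income_visits)
top_other = top_tags(other_visits)
print("high-income:", top_high)
print("non-high-income:", top_other)
```

Ranking all traffic together would blend these two lists into one, which is the "blended" effect described above.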



