Five Concrete Ideas for Data Science Projects

Do you want to enter the data science world? Congratulations! That’s (still) the right choice.

The market is getting tougher, so be mentally prepared for a long job hunt and many rejections. I assume you have already read that a data science portfolio is crucial and how to build one. Keep in mind that, most of the time, you will be crunching and wrangling data rather than applying fancy models.

One question I am asked again and again is where to find concrete data sources and project opportunities for building such a portfolio.

Here are five ideas for your data science portfolio, along with a few hints on how to make it unique.

Five Concrete Ideas for Data Science Projects

1. Customer analytics for a local non-profit organization

An essential task for a non-profit organization is to reach the right person, in the right place, at the right moment, and through the right medium when asking for donations for its charitable activities. If that can be optimized, the organization can collect more funds and carry out more activities.

What makes that project interesting?

First, most non-profit organizations have a lot of data, though not necessarily in digitized form and often of poor quality. The main task is building a database, crunching the data, and getting it into a usable form. You learn to bring structure to the whole data mess, which still makes up as much as 80% of a data science job.

Second, you do something good for the local community and show social responsibility. You also interact with people who are not data experts. Both demonstrate soft skills that a data science position requires.

Besides my professional job, I did such projects as a volunteer for an organization that helps children in poverty and for one that provides home care for the elderly. Experiences like these build trust in you as a person and open the door to many other exciting projects.

Finally, non-profit organizations work much like private banking or wealth management, which also have to acquire the right customer, at the right moment, with the right campaign to bring in money. And I can tell you: the data there is of no better quality than at a non-profit organization. You can directly apply your experience in other industries.

How to start?

I found the non-profit organizations through my network. There is always somebody among your family, relatives, and friends who is engaged with one. I then arranged a first get-to-know meeting and explained what my skills are and what value such analyses offer. I gave them examples from Google and Facebook, and I searched for publicly available information about increases in leads at other non-profit organizations to give them a flavor of what is possible. I then gave them a few days to think about it, and in each case they came back and agreed to do the project. After that, I started the data crunching work.

Once the data is ready to use, you can work through the classic analytics cycle: descriptive, predictive, and prescriptive.
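
As a rough illustration of the descriptive step, the sketch below summarizes donations per donor: number of gifts, total amount, and date of the last gift. The record layout, the donor IDs, and the function name `summarize_donors` are assumptions made up for this example; real non-profit data will look different and will need cleaning first.

```python
from collections import defaultdict
from datetime import date

# Hypothetical donation records: (donor_id, donation_date, amount).
donations = [
    ("d1", date(2020, 1, 15), 50.0),
    ("d2", date(2020, 3, 2), 20.0),
    ("d1", date(2020, 6, 30), 75.0),
    ("d3", date(2019, 11, 5), 500.0),
    ("d2", date(2020, 7, 1), 20.0),
]


def summarize_donors(records):
    """Return {donor_id: {"gifts": count, "total": sum, "last_gift": date}}."""
    summary = defaultdict(lambda: {"gifts": 0, "total": 0.0, "last_gift": None})
    for donor_id, day, amount in records:
        row = summary[donor_id]
        row["gifts"] += 1
        row["total"] += amount
        # Keep the most recent donation date seen so far.
        if row["last_gift"] is None or day > row["last_gift"]:
            row["last_gift"] = day
    return dict(summary)


if __name__ == "__main__":
    for donor_id, row in sorted(summarize_donors(donations).items()):
        print(donor_id, row)
```

A table like this is only a starting point, but it already shows who gives often, who gives a lot, and who has not been heard from for a while.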

2. CERN

CERN is mainly known for its leading fundamental research in particle physics and as the largest particle physics laboratory in the world.

It is less well known that CERN makes most of the data, code, algorithms, and tools it has developed and uses for its research available to the public. It has sophisticated toolboxes for testing algorithms and provides 1-, 2-, 3-, and 4-dimensional images. And there is much more.

CERN does not call any of this “innovation.” These are just “tools” for performing its “real” innovation task: opening new frontiers in particle physics.

I can highly recommend investing some time in browsing CERN’s web pages and exploring all the data and tools available for data analytics. Data analytics is one of its core activities and is done at a very sophisticated level. I still learn a lot there today and get many new ideas.

The website is deeply nested. Please do not lose your enthusiasm the first time you browse it!

On the CERN Open Data Portal, you can find two petabytes of particle physics data for starting your own analyses.

What makes that project interesting?

When you start a project as a data scientist, you typically know only that some data exists somewhere. First, you have to explore what data is available, where it can be found, whether it contains redundancies, who knows about it and has access to it, and so on.

When you start with CERN data and are unfamiliar with the particle physics experiments, the task is the same. Luckily, I always had ex-CERN scientists in my data science teams, which made the data a lot easier to understand.

Second, having “CERN” on your resume is always an advantage, provided that you have done some serious work. Through the physics classes, publications, webinars, and discussions, you can become part of the community. CERN employs about 2,500 people on-site and has approximately 17,500 contributing scientists globally. Many startup founders have a CERN community background.

Last, the data is sparse, meaning that the vital information in it is rare. Among thousands or millions of data points, you are looking for only a few patterns. Finding such sparse signals is essential in many fields: predictive maintenance, precision medicine, or finding the billionaire who is ready to invest in your fund.

How to start?

Start by getting familiar with what CERN does by browsing its web pages and Wikipedia. On the Open Data Portal, there is a documentation link where you can find a lot of background information, including links to GitHub and tutorials. There is also a dedicated Data Science node. Look at what the CERN scientists have already done, learn from them, and start analyzing individually selected datasets with your own methods.

Working with CERN data is not a quick project, but it is a very instructive one. Besides, you learn a lot about a topic at the frontier of physics.

3. Omdena

Omdena calls itself a collaborative AI platform. For each project, it brings together 30–50 people who use data and AI to solve a real, existing problem.

Unlike a Kaggle competition, it is a real end-to-end project with all the usual project struggles. You work in a team of people with different skills, with all the interpersonal challenges that brings. And you can have a real impact, as every project is linked to one of the UN’s 17 Sustainable Development Goals.

A good friend of mine, with 20+ years of experience as a data science expert, contributes on average 20% of his time to projects on Omdena. Even he says that he always learns a lot of new things.

Omdena needs a wide range of skills and expertise levels in AI, data science, and machine learning. You have to go through an application process, similar to applying for an internship, with the big difference that it looks for people with team spirit rather than competitive personalities. It does not look only for experts. What counts is the spirit of collaboration.

What makes that project interesting?

You are part of a real-world data science project. There are no sugarcoated missions, data, or outcomes. The project “just” has to solve a real issue with a data-driven approach. You become familiar with the whole data science project cycle and can experience the different stages and roles.

Next, it is exciting to work side by side with experienced people and to receive their mentorship. In just one project, you will learn more than in all your 10 MOOCs and Kaggle competitions combined.

And last but not least, you receive a project certificate. Yes, it is another certificate alongside your Coursera and Udacity certificates and university degrees, but this one attests to your practical experience.

How to start?

Look at the completed, ongoing, and upcoming projects. Become familiar with Omdena’s approach and, if you are interested in participating, follow the guideline here.

4. International and governmental organizations

Many international and governmental development organizations now work in a data-driven way. The UN, the WHO, the World Bank, the International Finance Corporation, the Inter-American Development Bank, and the European Bank for Reconstruction and Development are some examples. In addition, most governments have task forces responsible for mission-driven data and AI projects and for building an ecosystem.

Besides internships, paid or unpaid, these organizations mostly offer fixed-term contracts lasting from a few months to three years.

Further, many data science and AI startups are working with governmental departments.

In the last 12 months, I helped two former team members find such projects. One, who is half-Thai, went to Thailand to work at a big data startup that works with Thailand’s government.

The other scanned all the job ads, submitted his CV to these international organizations, and contacted people until he finally got a four-month fixed-term contract for a project at one of the development banks abroad.

What makes that project interesting?

These jobs and projects are often abroad. In addition to practical data science experience, you gain experience with a foreign culture and learn how to behave in an international diplomatic environment. That gives you vital soft skills for advancing on the career ladder.

You can take on responsibility from the beginning. Small teams, interactions with decision-makers, and presentations in front of senior people are part of most projects. You often gain contacts with and mentorship from leading experts in the field, as they frequently advise international and governmental organizations.

Finally, the projects are unique and research-related, which leaves room for experimentation. Examples include analyzing road fatalities in a developing country whose government wants to take action to reduce them, or a geospatial analysis of the causes of air pollution for a government that wants to put laws in place to limit it. Many socio-economic aspects are integrated into these analyses.

How to start?

The first task is researching the open positions, the ongoing projects, and, importantly, startups working with such organizations.

Positions are listed on UNjobs, which covers not only the UN but all the organizations mentioned earlier, as well as others such as Coursera. In addition, search the organizations’ official homepages for the keyword “data scientist.”

If there is no suitable internship or short-term job, submit your CV anyway. When these organizations have projects, they check them against the CVs already in their database, and if your profile matches, they will contact you.

Second, look for startups that are working with governments. If the startups have projects linked to the UN Sustainable Development Goals, they most probably work with governments.

Another indication is that a startup addresses benefits to society, such as water resources, safer communities (for example, preventing road accidents or violence), equality, fighting diseases like HIV or malaria, or reducing pollution.

Start looking for such a project early. It takes some time and persistence.

But I can highly recommend it. Such an assignment opens many doors during your career, regardless of the industry you work in. I was recently able to move to a globally reputable think tank as a program lead. Getting such a position is a once-in-a-lifetime chance. Why did they ask me? Because I have done such projects in the past.

5. The EDGAR database

EDGAR, short for Electronic Data Gathering, Analysis, and Retrieval, is a database that contains all submissions by companies and others that are required by law to file forms with the U.S. Securities and Exchange Commission.

It offers a wealth of business-relevant information in the form of figures and text. A quick introduction is provided here.

What makes that project interesting?

First, you learn how to access, download, and extract information from a web database that consists mainly of text. That can be done with Python, and OpenEDGAR, an open-source software package written in Python, already exists. But I would also recommend considering other languages such as Perl. Perl was designed with text processing in mind, i.e., extracting the required information from a specified text file and converting it into a different form. For this kind of task, it can be much faster than Python. And if you want to work in a bank, many databases there are still set up in Perl.

It is an excellent database for sentiment analysis and for using the results to predict company and share price performance. Many filings are worded in a guarded way because companies want to shine without giving too much information to competitors. So, this database is a great learning resource for natural language processing (NLP).

Last, these are great topics for starting your own blog, either about investments or about NLP. Done seriously, it can bring public attention to your data science work and dramatically increase your chances of landing your dream data science job.

How to start?

Decide on one single company that you want to analyze. Pick one that has existed for at least ten years. Start with the goal of predicting whether the company’s shares should be bought or sold.

Familiarize yourself with the different forms in EDGAR. Start with the 10-K, the company’s annual report, and the 8-K, the “current report” in which events that shareholders should know about are published.
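
One possible way to get a list of a company’s 10-K and 8-K filings with plain Python is sketched below. It assumes the SEC’s JSON endpoint at `data.sec.gov/submissions/`, which expects a zero-padded 10-digit CIK and a `User-Agent` header identifying the requester, and it assumes the response keeps recent filings as parallel lists under `filings` → `recent`. The function names are made up for this example; check the current SEC documentation before relying on any of these details.

```python
import json
import urllib.request


def submissions_url(cik):
    """Build the (assumed) submissions URL for a company's CIK number."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def fetch_submissions(cik, user_agent):
    """Download the submissions JSON; user_agent should name you and a contact."""
    request = urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def recent_filings(submissions, form_type):
    """Filter the parallel lists under filings/recent by form type."""
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form == form_type
    ]
```

A call such as `recent_filings(fetch_submissions(cik, "Your Name you@example.com"), "10-K")` would then return the annual reports for the chosen company, ready to be downloaded and parsed.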

Run a standard sentiment analysis over the last several years and look at the trends in positive, negative, and net sentiment. Compare the curves with the development of the share price. The statements also include forward-looking information. Analyze it, and it will give you the trend.
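
A very simple version of such an analysis counts positive and negative words per report and derives a net sentiment score per year. The sketch below is only meant to show the idea: the two word lists and the sample report texts are invented for this example, and a serious analysis would use a finance-specific word list and the full filing text.

```python
import re

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"growth", "improved", "strong", "gain", "profit"}
NEGATIVE = {"loss", "decline", "weak", "risk", "impairment"}


def sentiment_counts(text):
    """Return (positive, negative) word counts for a text."""
    words = re.findall(r"[a-z]+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive, negative


def net_sentiment(text):
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative)."""
    positive, negative = sentiment_counts(text)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


# Made-up snippets standing in for the text of yearly reports.
reports = {
    2018: "Strong growth and improved profit.",
    2019: "Decline in sales, impairment loss, but strong cash position.",
}
trend = {year: net_sentiment(text) for year, text in sorted(reports.items())}

if __name__ == "__main__":
    for year, score in trend.items():
        print(year, score)
```

The resulting yearly scores are the curve that can be plotted next to the share price.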

Hint: the language in forward-looking statements contains words like “will”, “should”, “may”, “might”, “intend”, and so forth.
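
The hint can be turned into a first, crude filter that keeps only sentences containing one of these words. This is a minimal sketch, not a robust method: the sentence splitting is naive, the keyword list is just the one from the hint, and a word like “May” used as a month name would be flagged by mistake.

```python
import re

# Keywords taken from the hint; extend as needed.
FORWARD_WORDS = ("will", "should", "may", "might", "intend", "intends")
FORWARD_PATTERN = re.compile(
    r"\b(?:" + "|".join(FORWARD_WORDS) + r")\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def forward_looking_sentences(text):
    """Return the sentences that contain at least one forward-looking keyword."""
    return [s for s in split_sentences(text) if FORWARD_PATTERN.search(s)]


if __name__ == "__main__":
    sample = (
        "Revenue rose 5% last year. We intend to open ten stores. "
        "Costs might increase. Margins were stable."
    )
    for sentence in forward_looking_sentences(sample):
        print(sentence)
```

The selected sentences can then be scored for sentiment separately from the rest of the filing.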

Develop the analysis further with more sophisticated NLP and sentiment algorithms, by looking at other companies in the same industry, and by integrating other sources such as news and macroeconomic figures. Compare the results with share prices and financial ratios. There are no limits to these analyses, and they provide rich content for a blog.

Connecting the Dots

I know that building up a cool data science portfolio is hard work. But with such a collection, you can make above-average progress in the field, have a lot of fun, and get your dream data science job.

I recommend this not only to newcomers to data science but also to senior data scientists. It opens up many new paths during your career, not only because of the projects but also through the newly gained network.

These ideas show you the wide range of possibilities and encourage you to think outside the box.

For me and my friends, learning and fun are essential. They are our main focus when we dedicate time to such projects.

That we have also built up an exciting and unique portfolio was just a by-product.

Translated from: https://towardsdatascience.com/5-concrete-real-world-projects-to-build-up-your-data-science-portfolio-ef44509abdd7
