idea: Debugging Code Errors
Here at Guild, we like to foster a welcoming and inclusive engineering environment. In this series we will explore some ways to be an effective mentor as well as an effective mentee. For this segment, let’s use bug hunting as a tool for mentorship! Bug hunting is an invaluable skill, and it becomes an even greater challenge when you are dealing with distributed systems.
Bugs are a part of all systems, and tracking them down is part of our job as developers. But when a bug is no longer a single fly buzzing around one program and becomes a fly stuck in a web of systems, debugging becomes more cumbersome and tedious. To hunt it down successfully, we need a better understanding of what it means to trace a bug through systems.
A systems model is an essential tool for tracing complex problems. Being able to visualize the system, either mentally or with a diagram, provides “big picture” context that helps you identify the places where bugs can hide. Teaching newer engineers how to trace bugs is a great way to build out mental models of your systems and to level up people on your team. Let’s look at an example.
Imagine that a user reports they are having trouble logging in. A likely instinct is to check the authentication code. This is a logical first assumption, as that code is directly responsible for the actions the user takes to access your system. In a monolithic architecture, we could trace the call stack through one system; but once services are split apart and microservices adopted, we need a mental model to properly map out what may be going wrong. We need to know how the code we think is responsible for the bug interacts with, and affects, the other code and systems around it.
Let’s use a hypothetical authentication flow in which we have many applications, as well as many services driving the data and logic behind these applications.
At the highest level, this happens:
User visits a login page, enters credentials, and authenticates into Application A.
If we dig slightly deeper, we understand that the process looks more like this:
1. Code is run to validate that the user exists in our database, that the passwords match, and that the user has all the data we need to establish a connection into Application A.
2. The validation returns true, so we grant the user a JSON Web Token (JWT, pronounced “jot”), establish a session, and carry on with our day.
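To make step 2 a little more concrete, here is a minimal sketch of how an HS256-signed JWT is assembled (a base64url-encoded header and claims, plus an HMAC-SHA256 signature) using only the Python standard library. The article does not say how its services build tokens; the `sign_jwt` helper, the claim values, and the secret are hypothetical, and production code should use a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT of the form header.payload.signature (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    # Encode header and claims as compact JSON, base64url each, and join with a dot.
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # HS256 means HMAC with SHA-256 over the ASCII signing input.
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


# Hypothetical claims Service One might issue once Service Two has validated the user.
token = sign_jwt({"sub": "user-123", "aud": "application-a"}, secret=b"dev-only-secret")
print(token)
```

The three dot-separated segments are what every later hop in the flow receives, so knowing their shape makes it easier to inspect a token when something downstream looks wrong.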
But thanks to our mental model, we know that reality looks much closer to this:
1. User visits a login page, enters credentials, and authenticates into Application A. (So we have a single-page app that renders the login form and handles all client-side validation. We’ll call this Service One.)
2. Service One posts data to a backend service to look up the user and validate their credentials. (So we now also have a server-side application handling database lookups and validation. We’ll call this Service Two.)
3. Service Two does some querying, finds the user, formats the User object into JSON, and passes it back to Service One.
4. Service One receives this data, formats it into a JWT, and hands it over to Application A.
5. But wait! Application A needs some additional data in order to fully render an experience for the user! Application A then queries Service Three, which finds the missing data and passes it back to Service Two, which then re-formats the JWT, and then the JWT is passed back to Application A. Finally, we are good to go!
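Step 5 is where data can quietly go missing. If Application A renders an incomplete experience, one quick check is to look inside the token the user actually received. The sketch below assumes the extra data travels as JWT claims (the article does not say); it decodes the payload *without verifying the signature*, for inspection only, and reports which expected claims are absent. The claim name `profile` and the example token are hypothetical.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Decode a JWT's payload for inspection only. This does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def missing_claims(token: str, expected: list[str]) -> list[str]:
    """List the expected claim names that are absent from the token's payload."""
    claims = decode_claims(token)
    return [name for name in expected if name not in claims]


def _segment(obj: dict) -> str:
    # Only used to build the example token below; real tokens come from your auth flow.
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical token whose extra data never made it in; the signature is a placeholder.
example = ".".join([
    _segment({"alg": "HS256", "typ": "JWT"}),
    _segment({"sub": "user-123"}),
    "placeholder-signature",
])
print(missing_claims(example, ["sub", "profile"]))  # ['profile']
```

Because a signed JWT payload is only base64url-encoded, not encrypted, anyone holding the token can read it this way. That is handy for debugging, and a reminder not to put secrets in claims.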
Phew. That’s a lot of services doing a lot of work for one authentication flow! Without an understanding of how all the services interact and work together, it’s much harder to determine the root cause. Sure, we can keep making educated guesses about where to target our search, but with a clear mental model of the system, the task becomes less daunting. If you were part of the initial buildout of this system, or helped architect it, finding a bug may be much easier; however, imagine what this feels like to a brand-new person on the team, or to a junior engineer just starting out… it’s confusing!
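One way to make that mental model explicit is to write it down as data. The sketch below encodes the five-step login flow as a hypothetical “sends data to” graph and walks it breadth-first to list every service a login request can touch, which is also the list of places the bug could be hiding. The service names come from the example; the edges are one reading of the steps, and `User database` is an assumed name for the database Service Two queries.

```python
from collections import deque

# Hypothetical "sends data to" edges for the five-step login flow.
SENDS_TO = {
    "Service One": ["Service Two", "Application A"],
    "Service Two": ["User database", "Application A"],
    "Application A": ["Service Three"],
    "Service Three": ["Service Two"],
    "User database": [],
}


def services_touched(entry: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk listing every service reachable from `entry`, nearest first."""
    seen = [entry]
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in seen:  # the cycle below would otherwise loop forever
                seen.append(neighbor)
                queue.append(neighbor)
    return seen


print(services_touched("Service One", SENDS_TO))
# ['Service One', 'Service Two', 'Application A', 'User database', 'Service Three']
```

Note the cycle (Service Two → Application A → Service Three → Service Two). Loops like this are exactly what makes a distributed flow hard to hold in your head, and writing them down makes them visible to newer teammates.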
In a way, I think of debugging as a `git bisect` of our system: start with the outlying layers and drill inward from there. With `git bisect`, we use a binary search to find the commit that introduced the bug. In a system bisect, we can use a similar binary-like search to find which system contains the bug. The mental concepts are the same; only the mechanisms are slightly different. And having a model of our systems lets us bisect more effectively.
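Here is a minimal sketch of a system bisect, assuming the request’s journey can be written as an ordered list of hops and that you have some way to check whether the data still looks right after a given hop (a log query, a manual request, a breakpoint). Like `git bisect`, it assumes that once the data goes bad it stays bad downstream. The hop labels and the `looks_good_after` check are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence


def first_bad_hop(hops: Sequence[str], looks_good_after: Callable[[str], bool]) -> Optional[str]:
    """Binary-search an ordered chain of hops for the first one after which data looks wrong.

    Assumes, like `git bisect`, that once the data goes bad it stays bad downstream.
    Returns None if the data still looks good after the last hop.
    """
    lo, hi = 0, len(hops) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if looks_good_after(hops[mid]):
            lo = mid + 1  # still good here, so the bug is further downstream
        else:
            first_bad = hops[mid]  # bad here; remember it and keep looking upstream
            hi = mid - 1
    return first_bad


# Hypothetical hops for the login flow, in the order the user's data passes through them.
HOPS = [
    "Service One: login form",
    "Service Two: user lookup",
    "Service One: JWT created",
    "Service Three: extra data",
    "Service Two: JWT re-formatted",
    "Application A: render",
]


def looks_good_after(hop: str) -> bool:
    # Stand-in check that pretends the data first goes wrong at Service Three.
    return HOPS.index(hop) < HOPS.index("Service Three: extra data")


print(first_bad_hop(HOPS, looks_good_after))  # Service Three: extra data
```

With six hops, this needs at most three checks instead of six, and the savings grow as the chain gets longer. In practice each check is the expensive part, so a good system model pays off by telling you what the ordered chain actually is.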
Effective bisecting can help get around an age-old truth about engineering: what you think is obvious is not, in fact, obvious. Yes, sadly, after months or years of working in a system, your assumptions can leave out details that are important for unearthing your bug. So walk through the basics as part of your process, and train teammates to do the same.
Now, when debugging, context is king. If the engineer debugging this issue has been around for a while, or helped architect some of the system, it is much easier for them to hold on to a mental model throughout the debugging process. Maybe they know that users from a certain population are always missing data Y; in fact, they have a ticket lingering in the icebox to fix this exact issue! So I suggest two things here:
1. Diagram and document everything. I know it may sound redundant, but strong documentation comes in two forms, written and visual, so please use both! Some people are visual learners, and having the design of your systems in diagram form is extremely helpful in turning everyone on your team into effective debuggers. Others prefer to read about the design of the system, so written documentation is equally valuable. Some folks, like me, prefer both: it is nice to visualize a system with an architecture diagram while also having solid documentation for each available service. By visualizing the flow, I know that Service One makes a POST request to Service Two. But what endpoints are available on Service Two? How is authentication handled on that backend? And what gets returned if something is amiss? These are the kinds of questions documentation and diagrams can answer! So please write good documentation, but also support teammates who learn differently by providing visual diagrams. When you are the only person who fully understands a system and you are out of the country on a two-week vacation, you will thank your past self.
2. Pair with people! If you are the resident expert on a particular system and a hairy bug comes in or a system goes down, can you grab a more junior person on your team and have them ride along as you debug? This pays off in two ways:
a) You are sharing a lot of knowledge with your team and showing someone with less experience how you debug. We all do this slightly differently, and there are a lot of techniques that can be shared! Maybe you love networking but your more junior teammate doesn’t know much about it, so show them! They will get stronger, and you will come to understand them better.
b) You are jaded. Yeah, you, the super senior. Other people on the team, especially less experienced or newer members, have good questions and fresh eyes. They see things you overlook, and they can give you a different vantage point. Pairing on bug triage is a valuable exercise for you and your teammates. Help them grow and share the knowledge. Sure, this isn’t always feasible, but if a non-mission-critical bug pops up, use it as a mentoring opportunity and level up your friends. Again, when you are on vacation, you will thank your past self.
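On the first suggestion, diagramming: one way to keep an architecture diagram from drifting away from the written docs is to generate it from the same data. This sketch emits Graphviz DOT text for the login flow described earlier. The edge list and labels are hypothetical, and the output can be rendered with Graphviz, for example `dot -Tsvg login_flow.dot -o login_flow.svg`.

```python
# Hypothetical edges for the login flow: (sender, receiver, what travels on the edge).
EDGES = [
    ("Service One", "Service Two", "POST credentials"),
    ("Service Two", "Service One", "user JSON"),
    ("Service One", "Application A", "JWT"),
    ("Application A", "Service Three", "query for missing data"),
    ("Service Three", "Service Two", "missing data"),
    ("Service Two", "Application A", "re-formatted JWT"),
]


def to_dot(edges: list[tuple[str, str, str]], name: str = "login_flow") -> str:
    """Render edges as Graphviz DOT text. Assumes names and labels contain no double quotes."""
    lines = [f"digraph {name} {{"]
    for src, dst, label in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(EDGES))
```

Keeping the edge list next to each service’s written docs means a change to one prompts a change to the other.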
Not sure where exactly to start? Here’s a list of the common places I check when I’m debugging:
- Did a bad deploy cause this? Walk new teammates through how to find deploys and versions, how to roll back a bad deploy when it happens, and how to run a postmortem or retro when things go wrong.
- Did the bug reporter send a screenshot? Can you see what browser they are using? Is there a chance the problem is browser-specific? Does the screenshot show what type of error they received? Little bits of knowledge that feel unimportant or obvious can turn out to be relevant, and they might not be apparent to someone newer to your system. For mentoring, my motto is: ask, don’t assume. Asking clarifying questions helps you, the debugger, sharpen the problem statement for yourself while also clarifying things for your mentee.
- Where are your logs? You have logs, right? Logs can be noisy and hard to trace. Show your teammates the little tips and tricks you’ve picked up along the way. If there is a log aggregator (Splunk, Datadog, CloudWatch, whatever), show teammates how to query it effectively. If your logging is set up to tag events with a piece of unique data, find that data and show new developers how to build queries around it to get the most value out of your logs.
- What is handling your networking? Do you have proxies, DNS, and workers set up? Some engineers are less familiar with networking and may not realize it could be the issue. Show them your networking code: where it lives, what it does, and so on. Again, what you think is obvious isn’t always crystal clear to others, or even to yourself!
- Is something down? No, seriously: is AWS having issues in us-west-2? Is a third-party tool having issues? Sometimes the simpler the issue, the more easily it is overlooked. Sanity-check your third-party systems!
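Splunk, Datadog, and CloudWatch each have their own query syntax, so rather than guess at any of them, here is a tool-agnostic sketch of what a query on a unique tag does: keep only the events carrying one request’s ID and order them by time, which stitches that request back together across services. The JSON log format, the `request_id` field, and the sample events are all assumptions for illustration.

```python
import json

# Hypothetical JSON log lines from several services, tagged with a shared request_id.
RAW_LOGS = """\
{"ts": "2024-01-01T10:00:00.300Z", "service": "Service Two", "request_id": "abc", "msg": "user found"}
{"ts": "2024-01-01T10:00:00.100Z", "service": "Service One", "request_id": "abc", "msg": "login submitted"}
{"ts": "2024-01-01T10:00:00.200Z", "service": "Service One", "request_id": "xyz", "msg": "login submitted"}
{"ts": "2024-01-01T10:00:00.500Z", "service": "Service Three", "request_id": "abc", "msg": "extra data not found"}
"""


def trace(raw: str, request_id: str) -> list[str]:
    """Return one request's events across all services, oldest first."""
    events = [json.loads(line) for line in raw.splitlines() if line.strip()]
    matching = [e for e in events if e.get("request_id") == request_id]
    # ISO-8601 timestamps in one fixed format and timezone sort correctly as strings.
    matching.sort(key=lambda e: e["ts"])
    return [f'{e["ts"]} {e["service"]}: {e["msg"]}' for e in matching]


for line in trace(RAW_LOGS, "abc"):
    print(line)
```

Walking a new teammate through the same idea in your real aggregator, with your real tag field, is the mentoring part; the filter-then-sort shape stays the same.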
Reproduce the bug. Spin it up locally and try to reproduce it. Get a runtime debugger involved. Are you using Rails? Pry can be your best friend. Are you using React? React Developer Tools and good old-fashioned `console.log` can be lifesavers. By reproducing the bug locally, we bring the logs to our own machines and have strong tools to drill into each bit of code and find the weak link.
So there you have it: debugging as a means of mentoring. These practices aren’t hard and fast, but I think they can really help other engineers on your team get a better grasp of your systems while also providing immediate value to users. Picture it: a junior engineer just joined your team, and in their first week you’ve shown them the lay of the land while also fixing a bug that affects users. That’s a win-win that really helps empower your teammates. It shows them that, yes, we all write bugs; and yes, you really can do this!