各个级别常见的编程错误(以及如何修复它们)
每个开发者都会经历这样一个时刻——通常是在凌晨两点左右,在冰冷的显示器灯光下,手指悬在键盘上方,就像钢琴家即将演奏拉赫玛尼诺夫——这时你突然意识到,你苦苦追查了三个小时的 bug 竟然是由一个拼写错误引起的。而且还不是什么有趣的拼写错误。仅仅是少了一个分号,或者把变量名 user 拼成了 uesr。
那一刻,你会感到人类所有缺点的重担都压在你的肩上。
我写代码的时间比我愿意承认的要长得多,而我学到的是:我们都会犯错。每个人都会。刚完成训练营的初级开发人员,正在艰难地进行第三次微服务迁移的中级工程师,以及忘记的编程语言比大多数人一生要学的还要多的高级架构师——我们都在这片数字荒野中跌跌撞撞,身后留下了一堆漏洞、反模式和技术债务。
但妙处就在于:错误其实都是等待被发现的模式。一旦你发现了这种模式,一旦你真正理解了为什么事情总是出错,你就能解决它。不仅是代码,还有你的思维方式。
这篇文章记录了我曾经犯过、目睹过、在深夜调试过,最终学会避免的编程错误。文章按技能水平分类,但我鼓励你通读全文,因为事实是,即使是资深开发者,在疲惫、匆忙或身处不熟悉的领域时,也会犯一些“新手”才会犯的错误。而且,我们作为新手犯的错误,有时会以复杂的方式贯穿我们整个职业生涯。
所以,给自己倒杯咖啡(或者茶,我不会评判),找个舒服的地方坐下,让我们来聊聊我们称之为软件开发的这团美丽的混乱吧。
第一部分:新手常犯的错误(或者说,“欢迎来到雷霆穹顶”)
1. 将错误信息视为人身攻击
当你刚开始学习编程时,错误信息就像电脑在对你大吼大叫。它们用晦涩难懂的语言写成,充斥着行号、堆栈跟踪和你看不懂的术语。你的第一反应是惊慌失措,或者立刻逐字逐句地搜索整个错误信息,或者胡乱修改代码直到错误消失(旁白:这从来没用)。
原因如下:错误信息之所以令人畏惧,是因为它们暴露了我们的无知。它们证明我们不懂某些东西,而这种感觉令人不舒服。
解决方法:学会接受错误信息。我是认真的。错误信息是礼物。它们是电脑在试图帮助你,告诉你到底哪里出了问题。
首先仔细阅读错误信息。不要略读,要认真阅读。最重要的信息通常位于堆栈跟踪的顶部或底部。查找以下内容:
- 错误类型(TypeError、SyntaxError 等)
- 发生此问题的文件和行号
- 实际的错误描述,说明到底出了什么问题
我举个例子。你看:
TypeError: Cannot read property 'length' of undefined
at validateInput (app.js:42)
at processForm (app.js:89)
这是在告诉你一个故事:“嘿,在 app.js 的第 42 行,在 validateInput 函数中,你试图访问某个对象的 'length' 属性,但该对象未定义——它不存在。”
现在你知道该从哪里入手,该找什么了。是不是某个变量传递错误?是不是之前的函数返回了未定义状态?错误信息就像一张藏宝图,指引你找到问题所在。
专业提示:刚开始的时候,不妨记个“错误日志”。遇到错误时,记下错误信息、你最初的理解、实际含义以及你的解决方法。几个月后,你就能拥有一本专属的调试百科全书了。
2. 不理解代码就复制粘贴
Stack Overflow 是个很棒的资源,GitHub 也是个解决方案的宝库。但几乎所有新手都会掉入一个危险的陷阱:找到一段能用的代码,就直接照搬到自己的项目里,看到它解决了眼前的问题,然后就继续做其他事,却从未真正理解它是如何运作的。
我曾经和一位初级开发人员共事,他直接从教程里复制粘贴了一整套身份验证系统。一开始运行完美……直到有一天我们需要添加一个新功能。他对着代码看了两个小时,最后才承认自己完全看不懂。结果我们不得不从头开始重写整个系统。
为什么会这样:学习的时候,你会面临压力——要交作业的压力、要跟上进度的压力、要避免出丑的压力。复制粘贴会让你感觉效率很高。但它也是一种伪装成高效的拖延行为。
解决方法:使用“向橡皮鸭解释”测试。在集成任何非原创代码之前,逐行检查并解释每一行的作用。如果可以,最好大声解释。可以跟朋友、宠物,或者,没错,一只真正的橡皮鸭解释。
如果你无法解释它,说明你不理解它。如果你不理解它,当它出错时(而它肯定会出错),你就无法进行调试。
以下是更佳的工作流程:
- 在 Stack Overflow 或其他地方寻找解决方案
- 仔细阅读
- 关闭浏览器
- 试着凭记忆自己实现一下。
- 将你的版本与原版进行比较
- 了解这些差异
一开始可能会比较慢,但你的学习速度会呈指数级增长。此外,你也不会再引入那些你根本无法修复的莫名其妙的bug了。
3. 未使用版本控制(或使用不当)
我见过一些奇葩的做法。我见过开发者把整个项目文件夹复制一份,然后在文件名后面加上日期来做备份。我还见过 project_final、project_final_FINAL、project_final_FINAL_actually_final,以及我个人最喜欢的 project_final_FINAL_actually_final_this_time_i_swear_v2。
我还见过一些初学者使用 Git,但他们把它当作一个黑盒子——只是机械地输入他们记住的命令,却不理解这些命令的作用,然后一旦出现问题就惊慌失措。
原因如下:版本控制系统,尤其是 Git,学习曲线非常陡峭。提交、分支和合并的概念模型一开始确实很难理解。因此,人们要么完全避免使用它,要么只是浅尝辄止。
解决方法:花一个周末认真学习 Git。不仅仅是死记硬背命令,而是要理解其底层模型。以下是我总结的思路:
把 Git 想象成一棵快照树。每次提交都是项目在某一时刻的快照。分支只是指向特定提交的标签。合并时,你实际上是将两个分支的历史记录结合起来。
首先要养成以下这些基本习惯:
尽早提交,频繁提交。不要等到功能“完成”才提交。每当你完成一个逻辑单元的工作时,就应该提交。修复了一个 bug?提交。添加了一个函数?提交。每次提交都应该是原子性的——它应该只做一件事,而且这件事应该可以用一句话来描述。
写出有意义的提交信息,不要写“修复了一些东西”或“更改了一些东西”。例如,“为注册表单添加电子邮件验证”或“修复用户服务中的空指针异常”。未来的你会感谢自己的。
凡事都创建分支。开发新功能?创建分支。尝试实验性重构?创建分支。这样你就可以自由地进行实验,而不用担心把能用的代码搞坏,因为你可以随时放弃这个分支并回到原处。
深入学习这些命令:
- git status(发生了什么变化?)
- git diff(具体发生了哪些变化?)
- git log(它的历史是什么?)
- git checkout(在分支或提交之间切换)
- git reset(在本地撤销尚未共享的修改)
- git revert(用一次新的提交撤销已共享的更改,不改写历史)
这里有一个调试技巧,可以为你节省无数时间:git bisect。它允许你通过二分查找遍历提交历史,精确找到引入 bug 的提交。这就像穿越时空的调试魔法。
4. 忽略代码风格和格式
刚开始编程的时候,你觉得代码能运行就足够了。谁会在意缩进是否一致或者变量名是否晦涩难懂呢?只要能运行就行,不是吗?
但两周后你再去看那段代码,却完全搞不懂它是干什么用的。更糟糕的是,别人读了之后,看你的眼神就像你递给他们一张用剪报拼凑的勒索信。
原因如下:初学者低估了自己会遗忘多少知识,以及代码的阅读频率远高于编写频率。代码的阅读频率大约是编写频率的 10 倍,甚至可能更高。
解决方法:为你使用的语言选定一套代码风格并坚持下去。更好的办法是,用代码检查工具(linter)和格式化工具把这套风格自动执行起来:
- JavaScript/TypeScript:ESLint + Prettier
- Python:Black + Flake8
- Ruby:RuboCop
- Java:Checkstyle
- Go: gofmt(内置!)
在编辑器中设置这些自动运行。一开始,你可能会被那些红色波浪线弄得心烦意乱。但慢慢地,你会逐渐理解这些规则,并自然而然地开始编写更简洁的代码。
除了工具之外,还要遵循以下原则:
命名比你想象的更重要。一个名为 `x` 的变量什么也告诉不了我。而一个名为 `userEmailAddresses` 的变量则能告诉我一切。没错,它确实更长一些。没关系。磁盘空间很便宜,但你的时间和精力却很宝贵。
一致性至关重要。选择一种代码风格(驼峰式命名法还是蛇形命名法,制表符还是空格等等),并始终保持一致。不一致会增加认知负担。每次有人阅读你的代码时,如果他们看到不寻常的代码风格,他们的大脑就必须停下来理解。
空格是你的朋友。代码是诗,而诗需要呼吸的空间。在逻辑部分之间用空行分隔长函数。在运算符周围添加空格。让你的代码自由呼吸。
5. 不阅读文档
初学者常常在不了解所用工具的情况下就开始编写代码。他们知道 Python 有列表和字典,就用它们来处理所有事情,却没意识到集合的存在,而集合其实非常适合解决这个问题。他们知道 JavaScript 有数组,就手动遍历数组,却不知道 map、filter 和 reduce 等函数。
我曾经亲眼目睹一个初学者花了一整个下午的时间自己实现字符串反转函数,测试、调试,结果最后却发现他使用的编程语言其实已经内置了反转方法。他当时的表情,简直像遭了雷劈。
原因:文档看起来很枯燥,似乎会拖慢你的速度。你想做的是动手实践,而不是阅读如何构建东西。此外,官方文档对于新手来说可能枯燥乏味、技术性强,难以理解。
解决方法:改变你与文档的关系。不要把它看作苦差事,而要把它看作解锁超能力的途径。你花在阅读文档上的每一小时,都意味着你少花一小时去重复造轮子。
首先可以尝试以下策略:
完整阅读入门指南。不要略读,要认真阅读。大多数库和框架都有指南,教你理解基本概念和最佳实践。这可是宝贵的资源。这是工具开发者的智慧结晶,他们正是你正在使用的工具的创造者。
编写代码时,请始终打开 API 参考文档。你是一位巫师,而文档就是你的魔法书。当你需要使用某个函数时,查一查它接受哪些参数、返回什么值,以及有哪些特殊情况。
阅读代码示例。官方文档通常会提供示例。仔细研究它们,运行它们,修改它们,甚至破坏它们并观察会发生什么。这是主动学习,而且效果显著。
使用速查表。对于常用工具(例如 Git、Vim、SQL 等),最好准备一份速查表放在手边。随着时间的推移,你会记住基本操作,但快速查阅可以减少操作上的不便。
6. 试图记住所有内容
初学者常常认为“优秀的程序员”把所有东西都记住了——每个函数、每个语法细节、每个算法。所以他们也试图记住一切,结果忘记了就觉得自己很笨。
告诉你个秘密:资深开发者也经常上网搜索。我用 Python 好几年了,每次写字符串格式化代码的时候还是会查语法。我写过几百条 SQL 查询语句,但如果好久没用过 JOIN 语句,还是会查一下具体的语法。
造成这种情况的原因:有一种误解认为专家什么都懂。这是错误的。专家拥有的是模式识别能力、问题解决能力以及快速查找信息的能力。
解决方法:专注于理解概念和模式,而不是死记硬背语法。学习事物运作的原理,而不仅仅是输入什么。
例如,不必死记硬背每种语言中 for 循环的确切语法。相反,要理解循环的本质是迭代——重复执行某项操作。一旦你理解了这个概念,查找你所用语言的具体语法就轻而易举了。
构建“第二大脑”。这可以是:
- 个人维基(我用的是 Notion,有些人喜欢 Obsidian)
- GitHub 上的 gist 集合
- 一个整理良好的书签文件夹
- 一个你写自己学到的东西的博客
当你解决问题或学到新知识时,用自己的话并举例子把它写下来。这样做有两个好处:写作本身有助于你更好地理解,而且你还能创建一个个人参考指南,方便日后查阅。
7. 没有测试你的代码
对于新手来说,测试感觉像是额外的工作。代码已经可以运行了(至少在你尝试的那个场景下似乎可以运行),那么为什么还要花时间编写测试呢?
然后你做了一个小小的改动,突然间一切都莫名其妙地崩溃了。或者你部署到生产环境,发现你的代码在你的机器上运行完美,但在实际环境中却彻底失败。
原因如下:学习过程中,测试感觉抽象而理论化,其益处并不显而易见。此外,测试还会增加复杂性——除了其他所有知识之外,你现在还需要学习一套测试框架。
解决方法:从小处着手。你不需要马上实现 100% 的测试覆盖率或实践测试驱动开发。只需从养成这个简单的习惯开始:
写完函数后,要为其编写一些测试。测试正常情况,测试边界情况,测试无效输入时的行为。
例如,假设你编写了一个用于验证电子邮件地址的函数:
def is_valid_email(email):
return '@' in email and '.' in email
编写测试:
def test_valid_email():
assert is_valid_email('user@example.com') == True
def test_invalid_email_no_at():
assert is_valid_email('userexample.com') == False
def test_invalid_email_no_dot():
assert is_valid_email('user@example') == False
def test_empty_string():
assert is_valid_email('') == False
现在,当你编写第三个测试时,你可能会意识到你的函数无法正确处理空字符串。恭喜你——你刚刚在它进入生产环境之前就发现了这个 bug。
随着你越来越熟练,逐渐扩大测试范围:
- 针对各个功能的单元测试
- 集成测试用于检验组件之间的协同工作情况。
- 针对关键用户工作流程的端到端测试
拥有一套完善的测试套件所带来的自信令人陶醉。你可以毫无顾虑地进行重构。你可以放心地进行更改,因为即使你破坏了某些东西,测试也能将其捕获。
8. 过早优化
这是一个典型的陷阱。你在编写代码时,会开始想:“这段代码可能会很慢。如果我需要处理一百万个用户怎么办?如果这个循环成为瓶颈怎么办?”
所以你花了三天时间用 Redis 实现了一个复杂的缓存系统,仔细优化了每一个查询,并使用位操作来节省几个字节的内存……然后你的应用程序用户从未超过十个,而你却毫无益处地让你的代码库变得无限复杂。
唐纳德·克努特说得最好:“过早优化是万恶之源。”
原因何在:优化让人感觉高明而复杂。但它也是一种拖延——与其面对构建实际功能的艰巨工作,不如花些时间去调整性能来得轻松。
解决方法:请按以下优先级顺序操作:
- 使其正常运行——确保该功能正常工作。
- 改正错误——重构代码以提高清晰度和可维护性
- 追求速度——优化,但前提是你有证据表明它速度慢。
从最简单的可行方案入手。使用最直接的数据结构。编写清晰易懂的代码。在出现性能问题之前,无需担心性能问题。
当您需要进行优化时,请遵循以下步骤:
首先要进行测量。使用性能分析工具来识别真正的瓶颈。你不能仅仅依靠直觉来判断哪里运行缓慢。代码中最慢的部分几乎从来不是你想象的那样。
优化正确的地方。80/20 法则在性能优化中尤为适用。通常,20% 的代码占用了 80% 的运行时间。找到这 20% 的代码并对其进行优化,其余部分则无需考虑。
保持简洁。有时候,“优化”的解决方案实际上更简单。使用内置方法和库——它们通常已经过优化。不要自己编写排序算法;使用标准库的排序函数。它速度更快,并且经过数百万开发者的测试。
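针对上面"首先要进行测量"这一条,这里给一个只用标准库 cProfile 的最小示意(其中的 slow_function 是假设的例子,并非任何真实项目的代码):

import cProfile
import pstats

def slow_function():
    # 假设的热点:故意做一些重复计算
    total = 0
    for i in range(5_000):
        total += sum(range(i))
    return total

# 收集这段代码的性能数据
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# 按累计耗时排序,打印最慢的 10 个调用,找出真正的热点
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)

先看报告里排在最前面的那几行,再决定值不值得动手优化。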
第二部分:中级错误(或者说,“你懂得不多,但足以造成危险”)
你已经写代码一段时间了。你开发过一些项目,为一些代码库做过贡献,可能还把一些功能部署到了生产环境。你现在很厉害——你能解决遇到的绝大多数问题,而且开始对架构和设计模式有了自己的见解。
这是一个很棒的阶段。同时,你也会在这个阶段开始犯一些更复杂的错误。
9. 过度设计解决方案
我经常在中级开发人员身上看到这样一种模式:他们刚刚学习了设计模式、微服务或任何热门的架构趋势,就想在所有地方都使用它。
他们需要存储一些用户设置,因此采用了仓库模式,包括接口、依赖注入和三层抽象。原本只需 20 行简单代码就能实现的功能,他们却编写了 200 行复杂的架构代码。
我做过这种事。我们都做过这种事。我曾经搭建了一个包含 12 个不同服务的系统,而其实两个就足够了。结果有一天,我们需要添加一个简单的功能,却不得不修改所有 12 个服务,我才真正体会到自己狂妄自大的后果。
为什么会这样:当你达到中级水平时,你会对所有新概念感到兴奋。你想证明自己掌握了高级技巧。你也想证明自己不再是新手了。而且说实话,编写复杂的解决方案比编写简单的解决方案更有成就感。
解决方法:接受 YAGNI 原则——“你不需要它”。这条原则指出,你应该只在真正需要的时候才实施某些措施,而不是在你预想可能需要的时候就实施。
在增加复杂性之前,请先问自己以下问题:
这解决的是我现在遇到的问题,还是将来可能遇到的问题?如果是后者,请稍等。未来的问题往往不会出现,或者即使出现,也与你想象的截然不同。
我可以用更简单的方法解决这个问题吗?通常情况下,可以。最简单的解决方案往往也是最好的解决方案。它更容易理解、更容易修改、更容易调试。
这种复杂性的代价是什么?每一个抽象概念、每一种模式、每一个架构决策都有其代价。你需要付出认知负担、维护所需的代码行数以及新开发人员的上手时间。那么,这种收益是否值得付出这些代价呢?
我举个具体的例子。假设你正在搭建一个博客,需要展示文章:
过度设计的方法:
# post_repository_interface.py
class PostRepositoryInterface:
def get_all(self): pass
def get_by_id(self, id): pass
# post_repository.py
class PostRepository(PostRepositoryInterface):
def __init__(self, db_connection):
self.db = db_connection
def get_all(self):
return self.db.query("SELECT * FROM posts")
def get_by_id(self, id):
return self.db.query("SELECT * FROM posts WHERE id = ?", id)
# post_service.py
class PostService:
def __init__(self, repository: PostRepositoryInterface):
self.repository = repository
def list_posts(self):
return self.repository.get_all()
# Then in your controller, you inject dependencies...
简单方法:
# posts.py
def get_all_posts(db):
return db.query("SELECT * FROM posts")
def get_post_by_id(db, id):
return db.query("SELECT * FROM posts WHERE id = ?", id)
两种方法都可行。第一种方法表明你了解仓库模式和依赖注入。第二种方法则以最小的复杂度真正解决了问题。
仓库模式本身并不坏——它确实有实际的应用场景。但对于一个数据库查询简单的博客来说,它就显得过于复杂了。只有在真正需要的时候才使用它:比如在多个数据源之间切换时,需要模拟数据库进行测试时,或者当这种复杂性是合理的。
智慧在于懂得何时该行动。
10. 不理解异步和并发
这就是中级开发者遇到的瓶颈。你的应用会进行 API 调用、数据库查询、文件操作——所有这些都需要等待。所以你听说过应该“使用异步”或“并发”来提升性能。
你在 JavaScript 代码里随手加上 `async` 和 `await`,或者在 Python 脚本中加上线程,然后……事情就开始变得奇怪。出现了竞态条件。数据损坏了。有时候它能正常运行,有时候却不行,而你却完全不知道原因。
我调试过一些耗时数天才能找到的竞态条件错误,这些错误大约每运行一百次才会出现一次,而且只在配备多核 CPU 的生产服务器上才会发生。这些错误简直是噩梦。
原因如下:并发编程确实很难。它需要与顺序编程不同的思维模式。当多个事件同时发生时,你需要考虑它们交互时会发生什么,而这很快就会变得非常复杂。
解决方法:首先要理解并发和并行之间的区别:
并发是指同时处理多个任务。它关乎程序结构——组织程序,使多个任务能够同时进行而无需相互等待。
并行处理是指同时执行多项任务。它关乎执行——实际上是在多个CPU核心上同时运行代码。
你可以实现并发而没有并行(一个 CPU 核心,但任务轮流执行),也可以实现并行而没有并发(多个 CPU 核心运行独立任务)。
对于异步操作(例如网络请求),通常需要的是并发,而不是并行。在 JavaScript 中使用 async/await 或在 Python 中使用 asyncio 时,代码并非同时运行——而是组织代码,使得当一个任务在等待(例如等待网络响应)时,另一个任务可以执行。
这里有一个对我理解很有帮助的思维模型:把异步操作想象成一家餐厅。你(服务员)从1号桌接单(启动一个异步操作)。在厨房准备1号桌的餐点时(操作处于等待状态),你可以去接2号桌和3号桌的单。你并不是同时烹饪多道菜——你只是在等待的时候不会闲着。
异步代码的关键原则:
1. 要真正理解什么是异步。CPU密集型操作(数学运算、数据处理)无法从异步中获益。I/O密集型操作(网络、磁盘、数据库)则可以。
2. 注意共享状态。如果多个异步操作修改相同的数据,可能会出现竞态条件。要么避免共享状态,要么使用锁/互斥锁来保护它。
3. 正确处理错误。在异步代码中,错误处理可能很棘手。异步操作中的异常可能不会像你预期的那样向上冒泡。务必将异步操作包裹在 try-catch 代码块中。
4. 不要过度使用异步。并非所有操作都需要异步。如果你的代码本身就是顺序执行的,那就保持顺序执行。异步会增加代码的复杂性;只有当其带来的好处(例如更高的资源利用率、更快的响应速度)大于成本时才应该使用异步。
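针对第 2 条"注意共享状态",下面是一个 Python asyncio 的极简示意(计数器的例子纯属假设),演示如何用锁避免丢失更新:

import asyncio

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = asyncio.Lock()

    async def increment(self):
        async with self.lock:        # 同一时间只允许一个协程修改 value
            current = self.value
            await asyncio.sleep(0)   # 模拟一个 await 点,会把控制权让出去
            self.value = current + 1

async def main():
    counter = Counter()
    await asyncio.gather(*(counter.increment() for _ in range(1000)))
    print(counter.value)  # 1000;如果去掉锁,结果通常会小于 1000

asyncio.run(main())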
以下是一个 JavaScript 示例:
异步操作不当:
async function processUsers() {
const users = await getUsers(); // Get users from database
for (let user of users) {
await sendEmail(user); // Send emails one by one, waiting for each
}
}
这比同步代码还糟糕!你明明用了异步,却仍然要等上一封邮件发送完毕才能开始发送下一封。
良好的异步使用方法:
async function processUsers() {
const users = await getUsers();
// Start all email operations concurrently
const emailPromises = users.map(user => sendEmail(user));
// Wait for all to complete
await Promise.all(emailPromises);
}
现在你实际上是在利用并发性。所有邮件几乎同时开始发送,你只需等待它们全部发送完毕即可。
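同样的模式换成 Python 的 asyncio 大致如下(get_users、send_email 都是假设的异步函数,仅作示意),顺带演示第 3 条里提到的错误处理:

import asyncio

async def get_users():
    # 假设:从数据库异步读取用户
    return [{"email": f"user{i}@example.com"} for i in range(5)]

async def send_email(user):
    # 假设:调用邮件服务,用 sleep 模拟网络 I/O
    await asyncio.sleep(0.1)
    return f"sent to {user['email']}"

async def process_users():
    users = await get_users()
    # 并发地发起所有发送任务,而不是逐个 await
    tasks = [send_email(u) for u in users]
    # return_exceptions=True:单个失败不会让整体抛出,便于逐个检查结果
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for user, result in zip(users, results):
        if isinstance(result, Exception):
            print(f"发送失败 {user['email']}: {result}")
        else:
            print(result)

asyncio.run(process_users())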
11. 忽略数据库性能
你已经学会了 SQL,可以编写查询语句。你创建了一个功能,可以从数据库加载数据并显示出来。它运行良好……使用包含 50 条记录的测试数据。
然后你把系统部署到生产环境,那里的数据库表有 50 万条记录,突然间每个页面都要加载 30 秒。用户们都气疯了。老板也开始问问题。你疯狂地在谷歌上搜索“为什么我的数据库这么慢”。
原因如下:数据库性能并非显而易见。看似相同的查询,其性能表现可能因索引、连接和表大小的不同而大相径庭。在开发小型数据集时,这些问题往往难以察觉。
解决方法:从一开始就要考虑数据库性能。以下是关键原则:
索引是你的好帮手。索引就像书里的索引一样——它能让你无需翻阅每一页就能找到所需内容。如果你经常需要按某一列进行搜索或筛选,那么该列就需要一个索引。
没有索引:
SELECT * FROM users WHERE email = 'user@example.com';
-- Database scans every row - O(n) operation
附电子邮件索引:
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Database uses index - O(log n) operation
两者之间的差别简直是天壤之别。对于一个拥有百万行数据的表,未建立索引的查询可能需要几秒钟,而建立索引的查询只需几毫秒。
但不要为所有数据都建立索引。索引是有代价的——它们会占用空间,而且会降低写入(插入、更新、删除)速度,因为索引本身也需要更新。只为那些经常查询的列建立索引,尤其是在 WHERE 子句、JOIN 条件和 ORDER BY 子句中。
N+1 查询简直是魔鬼。这可能是最常见的数据库性能优化错误。这种情况发生在你加载一个项目列表,然后遍历列表中的每个项目并执行一次数据库查询时。
# BAD: N+1 queries
posts = db.query("SELECT * FROM posts")
for post in posts:
author = db.query("SELECT * FROM users WHERE id = ?", post.author_id)
post.author = author
# If you have 100 posts, this makes 101 database queries!
# GOOD: Use a JOIN
posts = db.query("""
SELECT posts.*, users.name as author_name
FROM posts
JOIN users ON posts.author_id = users.id
""")
# One query, no matter how many posts
在第一个例子中,如果你有 100 篇文章,你需要执行 101 次数据库查询(一次查询文章,然后 100 次查询作者)。在第二个例子中,你只需要执行一次查询就能将所有信息合并在一起。这样速度可以提升 100 倍。
使用 EXPLAIN 命令。大多数数据库都提供 EXPLAIN 命令,它可以显示数据库将如何执行查询。务必学会解读 EXPLAIN 命令的输出结果。它会告诉你是否使用了索引、扫描了多少行以及瓶颈在哪里。
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
输出结果会告诉你查询的执行过程。如果在一个大型表上看到“全表扫描”的输出,则说明你需要创建索引。
只加载你需要的数据。如果只需要几列数据,就不要使用 SELECT *。否则数据库仍然要加载所有列,然后你还得把这些数据通过网络发送出去。务必具体说明:
-- Instead of this:
SELECT * FROM users;
-- Do this:
SELECT id, name, email FROM users;
分页至关重要。如果要显示项目列表,请不要一次性加载所有内容。请使用 LIMIT 和 OFFSET 函数:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 0;
这样每次只加载 20 篇文章。反正你的用户可能也只会浏览第一页。
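一个假设的分页辅助函数的草图(db.query 的接口是虚构的,仅示意把页码换算成 LIMIT/OFFSET 的方式):

def get_posts_page(db, page, page_size=20):
    # 页码从 1 开始;page=1 对应 OFFSET 0
    offset = (page - 1) * page_size
    return db.query(
        "SELECT * FROM posts ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    )

# 用法(假设):posts = get_posts_page(db, page=1)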
12. 没有为失败做好准备
中级开发人员编写代码时常常隐含一个假设:一切都会正常运行。网络请求会成功。文件会存在。第三方 API 可用。
然后,生产环境开始出现故障,你才意识到你的应用根本无法应对。应用崩溃,数据损坏,用户看到晦涩难懂的错误信息。
原因如下:在本地开发时,一切通常都很正常。网络速度很快,服务也在运行。你不会遇到现实世界中的各种混乱情况。
解决方法:采取防御心态。假设事情会失败,并为此做好准备。
任何外部依赖项都可能出现故障。网络调用、数据库查询、文件操作——所有这些都可能以多种方式失败。它们可能超时,可能返回错误,服务可能宕机,连接可能在操作过程中断开。
将外部操作放在 try-catch 代码块中:
async function getUserData(userId) {
try {
const response = await fetch(`/api/users/${userId}`);
if (!response.ok) {
throw new Error(`HTTP error: ${response.status}`);
}
return await response.json();
} catch (error) {
console.error('Failed to fetch user data:', error);
// Return a sensible default or rethrow with context
return null;
}
}
设置超时时间。网络操作绝不能无限期挂起。请设置合理的超时时间:
import requests
try:
response = requests.get('https://api.example.com/data', timeout=5)
except requests.Timeout:
print("Request timed out")
except requests.RequestException as e:
print(f"Request failed: {e}")
实现带退避的重试机制。瞬态故障(例如网络暂时中断、服务短暂不可用)通常可以通过重试解决。但不要立即重试——使用指数退避算法:
import time
def fetch_with_retry(url, max_retries=3):
for attempt in range(max_retries):
try:
response = requests.get(url, timeout=5)
return response
except requests.RequestException as e:
if attempt < max_retries - 1:
wait_time = 2 ** attempt # 1s, 2s, 4s
time.sleep(wait_time)
else:
raise # Give up after max retries
严格验证输入。永远不要轻信任何来源的输入——无论是用户输入、API输入还是其他任何来源。检查类型、范围和格式:
def process_age(age_str):
try:
age = int(age_str)
except ValueError:
return "Invalid age: not a number"
if age < 0 or age > 150:
return "Invalid age: out of range"
return f"Age is valid: {age}"
实现优雅降级。如果非关键服务出现故障,您的应用不应该完全崩溃。例如,推荐引擎可能宕机了——没关系,只需显示默认列表即可:
def get_recommendations(user_id):
try:
return recommendation_service.get_for_user(user_id)
except ServiceUnavailableError:
# Fallback to a default list
return get_popular_items()
监控并发出警报。你无法解决你不知道的问题。使用日志记录和监控工具。当出现故障时,你应该在用户开始抱怨之前就了解情况。
13. 对记忆和资源的理解不足
你编写的代码在开发阶段运行完美。部署后,随着时间的推移,应用程序开始占用越来越多的内存。最终,它崩溃并出现“内存不足”错误。
或者你打开一个文件,读取其中的数据,然后忘记关闭它。你不断重复这个过程,突然间你就无法再打开任何文件了,因为你已经达到了操作系统的文件描述符限制。
原因如下:资源管理在系统正常运行时是不可见的。你看不到内存的分配,也看不到文件句柄的消耗。直到资源耗尽为止。
解决方法:了解你的代码使用的资源,并有意识地管理它们。
内存泄漏时有发生。即使是使用垃圾回收机制的语言,也可能发生内存泄漏。常见原因:
1. 无限增长的全局状态:
// BAD: This cache grows without bound
const cache = {};
function getUserData(userId) {
if (cache[userId]) {
return cache[userId];
}
const data = fetchUser(userId);
cache[userId] = data; // Never removed!
return data;
}
修复方案:实现缓存清除:
const cache = new Map();
const MAX_CACHE_SIZE = 1000;
function getUserData(userId) {
if (cache.has(userId)) {
return cache.get(userId);
}
const data = fetchUser(userId);
if (cache.size >= MAX_CACHE_SIZE) {
// Remove oldest entry
const firstKey = cache.keys().next().value;
cache.delete(firstKey);
}
cache.set(userId, data);
return data;
}
2. 未被移除的事件监听器:
// BAD: Listener is never removed
element.addEventListener('click', handleClick);
如果不断添加监听器而不删除旧的监听器,它们就会在内存中不断累积。
// GOOD: Remove when done
element.addEventListener('click', handleClick);
// Later, when the element is removed:
element.removeEventListener('click', handleClick);
3. 未清除的计时器:
// BAD: Timer keeps running even after component unmounts
setInterval(() => {
updateData();
}, 1000);
// GOOD: Clear timer when done
const timerId = setInterval(() => {
updateData();
}, 1000);
// Later:
clearInterval(timerId);
显式地关闭资源。文件、数据库连接、网络套接字——这些都会消耗系统资源。打开它们,使用它们,然后关闭它们。大多数编程语言都有上下文管理器来实现这一点:
# BAD: File might not get closed if an error occurs
file = open('data.txt', 'r')
data = file.read()
process(data)
file.close()
# GOOD: File is guaranteed to close, even if an error occurs
with open('data.txt', 'r') as file:
data = file.read()
process(data)
# File automatically closed here
处理大型数据结构时要格外小心。将 1GB 的文件完全加载到内存中无异于自取灭亡。建议改为流式传输:
# BAD: Loads entire file into memory
with open('huge_file.txt', 'r') as file:
contents = file.read()
for line in contents.split('\n'):
process(line)
# GOOD: Processes one line at a time
with open('huge_file.txt', 'r') as file:
for line in file: # This streams, not loads all at once
process(line)
分析内存使用情况。使用工具查看内存都流向了哪里:
- Python:memory_profiler
- JavaScript:Chrome 开发者工具堆快照
- Java:VisualVM、JProfiler
这些工具可以显示哪些对象正在占用内存,从而揭示你之前不知道存在的内存泄漏。
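如果暂时不想引入第三方工具,也可以先用标准库 tracemalloc 做个粗略的示意(下面的 leaky 列表是人为制造的占用,仅为演示):

import tracemalloc

tracemalloc.start()

# ……运行你怀疑有泄漏的代码……
leaky = [str(i) * 100 for i in range(100_000)]

snapshot = tracemalloc.take_snapshot()
# 按代码行聚合,打印内存占用最多的 5 处
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)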
14. 货物崇拜式编程
你在代码库、教程或你欣赏的库中看到某种模式。你并不完全理解为什么要这样做,但它似乎是“正确的方法”,所以你在自己的代码中处处复制它。
或许是某种特定的文件夹结构,或许是某种特定的类组织方式,或许是某种类似单例或工厂模式的设计模式。你使用它仅仅是因为你看到资深开发人员在使用它,而不是因为你理解它在什么情况下以及为什么适用。
这就是货物崇拜式编程——在不理解其目的的情况下照搬仪式。
为什么会出现这种情况:我们通过模仿来学习,这本身并没有错。但我们有时会忽略理解模仿对象背后的语境和逻辑这一步骤。
解决方法:当你发现某种模式或做法时,一定要问“为什么?”,深入探究其背后的原因:
为什么选择这种模式?它解决了什么问题?还有哪些替代方案,为什么它们都被否决了?
权衡利弊是什么?每个设计决策都有成本和收益。了解这两者有助于你判断何时应用该模式,何时不应用。
如果我不使用这种模式会怎样?有时答案是“没什么大不了的”。有些模式解决的是你根本不存在的问题。
我举个例子:单例模式。它确保一个类只有一个实例,并提供一个全局访问点来访问它。
class Database {
constructor() {
if (Database.instance) {
return Database.instance;
}
this.connection = createConnection();
Database.instance = this;
}
}
// Only one instance ever created
const db1 = new Database();
const db2 = new Database();
console.log(db1 === db2); // true
看起来很精致!但究竟什么时候才应该用它呢?
在以下情况下使用单例模式:
- 你确实只需要一个实例(例如数据库连接池)。
- 该实例需要全局可访问
- 创建多个实例会导致问题(资源冲突、数据不一致)。
以下情况请勿使用单例模式:
- 你只是想整理函数(建议使用模块)。
- 你用它是为了避免传递依赖项(这会使测试更难)。
- 你这是盲目照搬,因为你在一本设计模式书里看到了它。
单例模式在很多情况下已经不再流行,因为它会创建隐藏的依赖关系,使测试变得困难。在现代代码中,依赖注入通常是首选:
// Instead of Singleton:
class UserService {
constructor(database) {
this.db = database; // Dependency is explicit
}
getUser(id) {
return this.db.query(`SELECT * FROM users WHERE id = ?`, id);
}
}
// In your app setup:
const db = new Database();
const userService = new UserService(db);
现在依赖关系是明确的、可测试的、灵活的。
教训是:不要因为模式听起来很聪明就使用它们,而应该因为它们能解决你遇到的具体问题而使用它们。
15. 未读取错误日志
你的代码已部署。一位用户报告了一个错误。你尝试在本地重现该错误,但一切正常。你耸耸肩,认为可能是用户操作有误。
与此同时,你的生产日志中充斥着大量的错误信息,告诉你到底出了什么问题,但你却视而不见。
原因:日志记录感觉像是无意义的重复劳动。它们信息量很大,而且充斥着看似无关的信息。此外,正确设置日志记录似乎也很复杂。
解决方法:日志是发现漏洞的藏宝图。学会有效地使用它们。
实现结构化日志记录。不要只是打印随机字符串。使用支持级别、上下文和结构的日志框架:
import logging
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Use appropriate levels
logger.debug("Detailed info for debugging")
logger.info("General information about program execution")
logger.warning("Something unexpected but not critical")
logger.error("An error occurred, but program continues")
logger.critical("Critical error, program might crash")
# Add context
logger.info("User logged in", extra={'user_id': 12345, 'ip': '192.168.1.1'})
记录正确的内容:
- 记录错误时要包含完整的上下文信息。不仅仅是“发生错误”,而是“无法获取用户 ID 为 12345 的用户数据:连接超时”。
- 记录重要业务事件。例如用户注册、购买、重大状态变更等。
- 记录性能指标。该数据库查询耗时多久?处理了多少项?
- 请勿记录敏感数据。请勿记录密码、信用卡信息或个人身份信息。
请正确使用日志级别:
- 调试模式:用于诊断问题的详细信息。通常不会在生产环境中启用。
- 信息:确认一切运行正常。
- 警告:发生了一些意外情况,但应用程序可以继续运行。
- 错误:发生严重问题。部分功能失效。
- 严重:非常严重的问题。应用程序可能无法继续运行。
在生产环境中,您可以将级别设置为 INFO 或 WARNING,这样 DEBUG 日志就不会淹没您的系统。
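一个极简的示意:用环境变量控制日志级别(变量名 LOG_LEVEL 是假设的约定),生产环境默认 INFO,本地调试时再打开 DEBUG:

import logging
import os

# 本地调试时:export LOG_LEVEL=DEBUG
logging.basicConfig(
    level=os.environ.get("LOG_LEVEL", "INFO"),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

logger.debug("只有 LOG_LEVEL=DEBUG 时才会看到")
logger.info("生产环境默认可见")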
使日志可搜索。使用日志聚合工具(例如 Elasticsearch、Splunk、CloudWatch 等),以便搜索和筛选日志。您需要能够回答以下问题:
- “请显示过去一小时内的所有错误”
- “显示所有耗时超过5秒的API请求”
- “显示所有与用户 ID=12345 相关的日志”
设置警报。当出现特定错误情况时,您应该立即收到通知。不要等待用户报告问题。
第三部分:高级错误(或“伤疤的智慧”)
你现在是一名资深开发人员。你已经交付了多个大型项目。你指导过初级开发人员。你参与架构决策。你被委以重任,负责复杂而关键的系统。
然而,你依然会犯错。不同的错误,更隐蔽的错误,但终究是错误。这些错误源于经验——只有当你积累了足够的经验,足以尝试真正困难的事情之后,才会犯这样的错误。
16. 过早抽象
这是过度设计的邪恶孪生兄弟,而且更阴险,因为它看起来像是一种良好的实践。你看到一些重复的代码,你的直觉会告诉你“DRY!不要重复自己!”于是你立刻把它抽象成一个共享函数或类。
问题在于,这两段看起来相似的代码实际上可能代表着不同的关注点,只是目前看起来相似而已。当你将它们抽象化时,你实际上是将两个原本应该独立的部分耦合在了一起。之后,当需求发生变化时(而需求总是会变化的),你需要修改抽象层来处理这两种情况,这会增加复杂性和条件逻辑,最终抽象层比直接复制代码还要复杂。
我曾经创建过一个抽象层来处理系统中的“用户操作”。它看起来完美无缺——登录、注册、个人资料更新,所有操作都遵循相同的模式。六个月后,每种操作类型都发生了巨大的变化,导致这个抽象层变成了一团乱麻,充斥着各种 if 语句和特殊情况。最终,我们删除了它,并分别重写了每种操作。单独的实现方式更加清晰,也更容易维护。
原因如下:我们都听过“DRY”(Don't Repeat Yourself,不要重复自己)原则被奉为圭臬。我们从小就被教育要把重复代码视为坏事。此外,创建抽象层感觉很高级——感觉就像在编写“简洁的代码”。
解决方法:遵循“三法则”。在三次遇到相同模式之前,不要进行抽象。第一次写出来的时候,你是在学习需要掌握的知识。第二次,你是在验证模式。第三次,你才算真正理解了模式,可以安全地进行抽象。
即使如此,也要问问自己:
它们真的是同一件事吗?还是只是碰巧现在看起来很相似?两段代码可能结构相似,但代表不同的领域概念。
它们会一起演进还是各自独立发展?如果它们可能因为不同的原因而发生变化,那就应该将它们分开。这其实就是单一职责原则的另一种体现——不同的职责应该分开,即使代码看起来相似。
抽象化是否比重复实现更简洁?抽象化应该降低复杂性,而不是增加复杂性。如果你的抽象化需要大量的参数、配置选项和条件逻辑,那么它可能比重复实现更糟糕。
请看这个例子:
# You have two similar functions:
def send_welcome_email(user):
subject = "Welcome to our platform!"
body = f"Hello {user.name}, welcome!"
send_email(user.email, subject, body)
log_email_sent(user.id, 'welcome')
def send_password_reset_email(user, token):
subject = "Reset your password"
body = f"Hello {user.name}, use this token: {token}"
send_email(user.email, subject, body)
log_email_sent(user.id, 'password_reset')
你的第一反应可能是把它们抽象出来:
# Premature abstraction:
def send_user_email(user, email_type, extra_data=None):
if email_type == 'welcome':
subject = "Welcome to our platform!"
body = f"Hello {user.name}, welcome!"
elif email_type == 'password_reset':
subject = "Reset your password"
token = extra_data['token']
body = f"Hello {user.name}, use this token: {token}"
# More elif blocks as we add email types...
send_email(user.email, subject, body)
log_email_sent(user.id, email_type)
这种抽象方式已经显得有些混乱了。如果再添加第三种邮件类型(确认邮件、通知邮件等),情况会更糟。更好的方法或许是将它们分开,或者创建一个更灵活的抽象层:
# Better: Keep them separate or use composition
class EmailTemplate:
def __init__(self, subject, body_template):
self.subject = subject
self.body_template = body_template
def render(self, **kwargs):
return self.body_template.format(**kwargs)
def send_templated_email(user, template, log_type, **kwargs):
body = template.render(name=user.name, **kwargs)
send_email(user.email, template.subject, body)
log_email_sent(user.id, log_type)
# Usage:
welcome_template = EmailTemplate(
"Welcome to our platform!",
"Hello {name}, welcome!"
)
send_templated_email(user, welcome_template, 'welcome')
reset_template = EmailTemplate(
"Reset your password",
"Hello {name}, use this token: {token}"
)
send_templated_email(user, reset_template, 'password_reset', token=token)
这种方式更加灵活,不需要为每种电子邮件类型编写条件逻辑。
记住:适度的重复总比错误的抽象要好。当你对领域有更深入的理解时,总可以再进行抽象。过早的抽象比重复的抽象更难撤销。
17. 没有为可观测性进行设计
您的系统正在生产环境中运行,并且分布在多个服务中。现在出现了一些问题——用户报告页面加载缓慢、偶尔出现错误或行为异常——但您却完全不知道系统内部究竟发生了什么。
你添加了一堆打印语句并重新部署。现在你被日志淹没了,但你仍然无法弄清楚为什么某个特定的请求失败了,或者为什么上周二凌晨 3 点延迟会飙升。
原因如下:我们在构建系统时,往往专注于“正常流程”——确保各项功能都能正常运行。可观测性常常被我们视为事后才考虑的因素,仿佛是以后再添加的东西。但当你真正需要它的时候,再进行补救就困难得多。
解决方法:从一开始就进行可观测性设计。可观测性意味着能够通过检查系统的输出来了解系统内部正在发生的事情。
可观测性的三大支柱是:
1. 日志记录- 离散事件(“用户 123 已登录”、“付款已处理”)
2. 指标- 随时间变化的数值测量值(请求速率、错误率、CPU 使用率)
3. 追踪- 跟踪请求在整个系统中的流转过程
我们来逐一谈谈:
日志记录(我们之前已经介绍过,但让我们深入了解一下):
在分布式系统中,关联性至关重要。当一个请求流经多个服务时,你需要能够追踪它的整个路径。使用关联 ID:
import uuid
from flask import Flask, request, g
app = Flask(__name__)
@app.before_request
def before_request():
# Get correlation ID from header, or generate new one
g.correlation_id = request.headers.get('X-Correlation-ID', str(uuid.uuid4()))
@app.route('/api/users')
def get_users():
logger.info(
"Fetching users",
extra={'correlation_id': g.correlation_id}
)
# ... rest of handler
# When calling another service, pass the correlation ID
headers = {'X-Correlation-ID': g.correlation_id}
response = requests.get('http://other-service/data', headers=headers)
现在,您可以通过在日志中搜索关联 ID 来跟踪所有服务中的单个请求。
指标:
指标可以帮助您发现趋势和模式。可以追踪以下方面:
- 请求速率(每秒请求数)
- 错误率(失败请求的百分比)
- 延迟(请求所需时间)
- 资源使用情况(CPU、内存、磁盘)
from prometheus_client import Counter, Histogram
import time
# Define metrics
requests_total = Counter('requests_total', 'Total requests', ['method', 'endpoint'])
request_duration = Histogram('request_duration_seconds', 'Request duration')
@app.route('/api/users')
def get_users():
requests_total.labels(method='GET', endpoint='/api/users').inc()
start_time = time.time()
try:
# Handle request
return jsonify(users)
finally:
duration = time.time() - start_time
request_duration.observe(duration)
这些指标可以绘制成图表,让你看到随时间变化的规律。凌晨 3 点错误率是否飙升?图表显示了这一点。过去一周延迟是否逐渐增加?你也能看到。
追踪:
分布式追踪可以显示请求在系统中的传输路径以及每个步骤所花费的时间。这对于调试微服务至关重要。
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
@app.route('/api/order')
def create_order():
with tracer.start_as_current_span("create_order") as span:
span.set_attribute("user_id", user_id)
# This creates a child span
with tracer.start_as_current_span("validate_payment"):
result = validate_payment(payment_info)
# Another child span
with tracer.start_as_current_span("reserve_inventory"):
inventory.reserve(items)
return {"order_id": order_id}
通过追踪,可以看到 create_order 请求总共耗时 500 毫秒,其中 450 毫秒花在 validate_payment 上,30 毫秒花在 reserve_inventory 上。现在你知道应该优化哪里了。
仪表盘和警报:
如果无人查看,可观测性数据就毫无用处。创建仪表盘,让系统健康状况一目了然:
- 请求速率和错误率
- 延迟(p50、p95、p99 百分位数)
- 资源使用情况
- 关键业务指标(每分钟订单量、活跃用户数)
设置异常警报:
- 错误率高于1%
- 延迟 p95 大于 2 秒
- 日志中的任何严重错误
- 服务健康检查失败
目标是在用户发现问题之前就了解问题。
18. 未考虑极端情况和故障模式
你设计了一个漂亮的系统。架构优雅,代码简洁,正常流程也经过了彻底测试。然后你部署了系统,结果却出现了你从未预料到的种种问题。
用户输入带有表情符号的姓名,导致数据库崩溃。两个请求同时到达,创建了重复记录。你依赖的某个服务开始返回格式错误的数据。网络随机丢包,系统进入不一致状态。
原因在于:我们通常会考虑事物应该如何运作,而不是它们可能出现哪些故障。极端情况看似不太可能发生,所以我们不会为此做好规划。但在一个拥有数百万用户的分布式系统中,小概率事件却会不断发生。
解决方法:培养一种防御性的、多疑的心态。假设一切都会失败,并为此做好准备。
考虑边界条件:
- 如果列表为空怎么办?
- 如果字符串非常非常长呢?
- 如果这个数字是零呢?负数呢?无穷大呢?
- 如果日期是过去呢?或者遥远的未来呢?
def calculate_average(numbers):
# What if numbers is empty?
if not numbers:
return 0 # or raise an exception, or return None—but handle it!
return sum(numbers) / len(numbers)
def process_text(text):
# What if text is None? What if it's enormous?
if text is None:
return ""
if len(text) > 10_000:
# Prevent DoS attack via huge input
raise ValueError("Text too long")
return text.strip().lower()
考虑竞态条件:
当多件事同时发生时,它们可能会以意想不到的方式相互作用。
# BAD: Race condition
def increment_counter(user_id):
count = db.get_counter(user_id) # Read: count = 5
count += 1 # Increment: count = 6
db.set_counter(user_id, count) # Write: count = 6
# If two requests do this simultaneously:
# Request A reads: count = 5
# Request B reads: count = 5
# Request A writes: count = 6
# Request B writes: count = 6
# Final count is 6, not 7!
# GOOD: Atomic operation
def increment_counter(user_id):
db.atomic_increment('counters', user_id)
# This is handled atomically by the database
考虑一下部分失败的情况:
在分布式系统中,操作可能会在进行过程中失败,导致系统处于不一致的状态。
def transfer_money(from_account, to_account, amount):
# What if we succeed in debiting but fail in crediting?
debit(from_account, amount) # Succeeds
credit(to_account, amount) # Fails! Now money vanished!
解决方案:使用事务或幂等性:
def transfer_money(from_account, to_account, amount, transfer_id):
# Use database transaction for atomicity
with db.transaction():
# Check if already processed (idempotency)
if db.exists('transfers', transfer_id):
return # Already processed
debit(from_account, amount)
credit(to_account, amount)
db.insert('transfers', {'id': transfer_id, 'status': 'completed'})
# Either everything succeeds or everything rolls back
想想连锁故障:
当一项服务出现故障时,是否会导致其他服务也出现故障?
# BAD: Cascading failure
def get_user_profile(user_id):
user = user_service.get(user_id) # If this times out...
orders = order_service.get_orders(user_id) # ...we never get here
recommendations = rec_service.get_recs(user_id) # ...or here
return render(user, orders, recommendations)
# GOOD: Isolated failures
def get_user_profile(user_id):
user = user_service.get(user_id) # Critical
try:
orders = order_service.get_orders(user_id, timeout=1)
except TimeoutError:
orders = [] # Degrade gracefully
try:
recommendations = rec_service.get_recs(user_id, timeout=1)
except TimeoutError:
recommendations = get_default_recommendations()
return render(user, orders, recommendations)
使用断路器:
如果某个服务反复出现故障,请暂时停止调用该服务,让它恢复正常:
class CircuitBreaker:
def __init__(self, failure_threshold=5, timeout=60):
self.failure_count = 0
self.failure_threshold = failure_threshold
self.timeout = timeout
self.opened_at = None
self.state = 'closed' # closed, open, half-open
def call(self, func):
if self.state == 'open':
if time.time() - self.opened_at > self.timeout:
self.state = 'half-open'
else:
raise CircuitBreakerOpen("Service unavailable")
try:
result = func()
if self.state == 'half-open':
self.state = 'closed'
self.failure_count = 0
return result
except Exception as e:
self.failure_count += 1
if self.failure_count >= self.failure_threshold:
self.state = 'open'
self.opened_at = time.time()
raise
这样可以防止级联故障——如果某个服务宕机,你就不会再用注定会失败的请求轰炸它了。
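上面这个类的一个假设用法草图(recommendation_service、get_popular_items 沿用前文优雅降级示例中的名字;CircuitBreakerOpen 异常在原示例中被引用但未给出定义,这里假定它已在别处声明):

rec_breaker = CircuitBreaker(failure_threshold=5, timeout=60)

def get_recommendations(user_id):
    try:
        # 把对外部服务的调用包进断路器
        return rec_breaker.call(lambda: recommendation_service.get_for_user(user_id))
    except CircuitBreakerOpen:
        # 断路器已打开:直接降级,不再骚扰故障中的服务
        return get_popular_items()
    except Exception:
        # 本次调用失败(断路器已记录失败次数),同样走降级逻辑
        return get_popular_items()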
19. 不投资开发者体验
您正在构建一个包含多个服务、数据库、队列和缓存的复杂系统。要使其在本地运行,新开发人员需要:
- 安装 15 种不同的工具
- 按正确顺序运行 10 条命令
- 编辑 6 个配置文件
- 祈祷一切顺利启动。
搭建一个可用的本地环境需要两天时间。代码审查耗时极长,因为审查人员无法轻松地测试更改。调试也十分痛苦,因为没有简单的方法可以在本地重现生产环境的场景。
原因在于:我们优先考虑面向用户的功能,而非开发者工具。让开发者使用起来更方便,感觉像是“锦上添花”,而非必不可少。但糟糕的开发者体验会不断累积——最终拖慢一切。
解决方案:投资简化开发流程。回报巨大。
简化设置:
理想情况下,一名新开发人员应该能够运行一个命令并获得一个可运行的环境:
# Clone repo
git clone repo-url
cd project
# Single command to set up everything
make setup
# Single command to run everything
make run
使用 Docker Compose 打包所有依赖项:
# docker-compose.yml
version: '3'
services:
app:
build: .
ports:
- "3000:3000"
depends_on:
- database
- redis
environment:
DATABASE_URL: postgres://db:5432/myapp
REDIS_URL: redis://redis:6379
database:
image: postgres:13
environment:
POSTGRES_DB: myapp
redis:
image: redis:6
现在只需 docker-compose up,一切就绪。无需安装,无需配置,开箱即用。
编写优秀的文档:
不仅仅是 API 文档——还要记录整个开发者体验:
- 如何搭建开发环境
- 如何运行测试
- 如何调试常见问题
- 架构决策及其原因
- 如何添加常见类型的功能
确保文档可搜索并保持更新。过时的文档比没有文档更糟糕。
创建调试工具:
构建便于调试的工具:
# Development-only endpoint that shows system state
@app.route('/debug/status')
def debug_status():
if not app.debug:
abort(404)
return {
'database': db.is_connected(),
'redis': redis.ping(),
'queue_size': queue.size(),
'active_users': session_store.count(),
'feature_flags': feature_flags.all()
}
快速反馈循环:
从做出改变到看到效果的时间应该越短越好:
- 快速测试(在几秒钟内运行单元测试,而不是几分钟)
- 热重载(代码更改立即生效)
- 易于部署到测试环境
正确的错误信息:
开发过程中出现问题时,错误信息应该告诉你如何解决:
# BAD
if not config.api_key:
raise ValueError("API key missing")
# GOOD
if not config.api_key:
raise ValueError(
"API key missing. Set the STRIPE_API_KEY environment variable. "
"For local development, copy .env.example to .env and add your key."
)
20. 直到为时已晚才考虑安全问题
安全问题在真正发生之前,往往给人一种抽象的感觉。因此,人们很容易将其推迟——“我们以后再添加身份验证”,“我们最终会加密这些数据”,“我们规模太小,没人会盯上我们”。
然后,你的系统就遭到入侵,用户数据泄露,公司因为负面新闻上了头条。或者,你之后试图增加安全措施,却发现这需要重写一半的系统。
原因何在:安全似乎会拖慢开发速度,增加复杂性。此外,安全漏洞看似只是假设,直到它们成为现实。
解决方法:从一开始就构建安全机制。这比事后加装安全装置要容易得多。
永远不要相信用户输入。这是黄金法则。验证、清理、转义——始终如此。
# SQL Injection vulnerability
def get_user(username):
# NEVER do this!
query = f"SELECT * FROM users WHERE username = '{username}'"
return db.execute(query)
# If username is: "admin' OR '1'='1"
# Query becomes: SELECT * FROM users WHERE username = 'admin' OR '1'='1'
# Returns all users!
# SAFE: Use parameterized queries
def get_user(username):
query = "SELECT * FROM users WHERE username = ?"
return db.execute(query, (username,))
XSS(跨站脚本攻击)防护:
// DANGEROUS: Inserting user content directly into HTML
element.innerHTML = userData;
// If userData is: "<script>alert('hacked')</script>"
// The script executes!
// SAFE: Escape HTML entities
function escapeHtml(unsafe) {
return unsafe
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&#039;");
}
element.textContent = userData; // Also safe—textContent doesn't interpret HTML
身份验证和授权:
- 使用成熟的库(OAuth、JWT),不要自己编写。
- 使用 bcrypt 或 Argon2 对密码进行哈希处理,切勿以明文形式存储密码。
- 所有地方都应使用 HTTPS,没有任何例外。
- 实施速率限制以防止暴力破解攻击(见下文的示意)
from werkzeug.security import generate_password_hash, check_password_hash
# Storing password
hashed = generate_password_hash(password)
db.save_user(username, hashed)
# Verifying password
stored_hash = db.get_user_hash(username)
if check_password_hash(stored_hash, provided_password):
# Login successful
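前面列表里提到的速率限制,这里给一个仅作示意的内存版草图(真实项目通常交给 Redis 或现成的中间件;下面的阈值与函数名都是假设):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # 统计窗口:60 秒
MAX_ATTEMPTS = 5      # 窗口内最多允许 5 次尝试

# username -> 最近的尝试时间戳队列
login_attempts = defaultdict(deque)

def is_rate_limited(username):
    now = time.time()
    attempts = login_attempts[username]
    # 丢弃窗口之外的旧记录
    while attempts and now - attempts[0] > WINDOW_SECONDS:
        attempts.popleft()
    if len(attempts) >= MAX_ATTEMPTS:
        return True
    attempts.append(now)
    return False

# 用法(假设):在校验密码之前先检查
# if is_rate_limited(username):
#     abort(429)  # Too Many Requests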
最小特权原则:
只授予用户和服务所需的权限,不多也不少。
# BAD: Application connects to database as admin
db = connect(user='admin', password='admin_pass')
# GOOD: Application has limited permissions
db = connect(user='app_user', password='app_pass')
# app_user can only SELECT, INSERT, UPDATE on specific tables
# Cannot DROP tables, CREATE users, etc.
保持依赖项更新:
库中漏洞层出不穷,务必及时修补。
# Regularly check for vulnerabilities
npm audit
pip-audit
设置自动依赖项更新(Dependabot、Renovate),以便在出现安全问题时收到通知。
加密敏感数据:
- 对静态数据(数据库、备份)进行加密
- 传输中数据加密(HTTPS、TLS)
- 切勿记录敏感数据(密码、信用卡信息、社保号码)
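针对"静态数据加密",下面是用 cryptography 库做字段级对称加密的极简示意(密钥管理、轮换等真实问题这里全部略过;db.save 是假设的存储调用):

from cryptography.fernet import Fernet

# 实际项目中,密钥应来自 KMS 或环境变量,绝不能硬编码或写进代码库
key = Fernet.generate_key()
fernet = Fernet(key)

# 入库前加密敏感字段
ciphertext = fernet.encrypt("用户的身份证号或 API 令牌".encode())
# db.save(user_id, ciphertext)

# 读取时解密
plaintext = fernet.decrypt(ciphertext).decode()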
安全标头:
@app.after_request
def set_security_headers(response):
response.headers['X-Content-Type-Options'] = 'nosniff'
response.headers['X-Frame-Options'] = 'DENY'
response.headers['X-XSS-Protection'] = '1; mode=block'
response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
return response
这些标头可以防止常见的攻击,例如点击劫持和 XSS 攻击。
21. 优化的是当下的流量,而不是未来的流量
您的应用程序每秒可以完美处理 100 个请求。您的数据库查询速度很快。您的 API 响应时间为 50 毫秒。一切都很棒。
六个月后,你的请求量达到了每秒 10,000 次。你的数据库不堪重负。你的 API 响应时间长达 10 秒。你的架构在小规模运行时完美无缺,但在高负载下却摇摇欲坠。
原因如下:过早优化固然不好,但完全忽略可扩展性也同样不可取。你需要找到平衡点——不要为了不需要的规模而过度设计,但也不要把自己逼入绝境。
解决方法:设计时要考虑下一个数量级。
如果今天有 100 个用户,那就按 1000 个用户来设计。当用户达到 1000 个时,再按 10000 个用户来重新设计。不要从一开始就试图设计成能容纳 100 万用户——那样会过度设计,浪费时间。但也不要设计得让后续扩展变得不可能。
尽早发现瓶颈:
即使它们目前还不是问题,也要知道它们在哪里:
- 扫描整个表的数据库查询
- 无法线性扩展的操作(N+1 查询、嵌套循环)
- 单点故障(一台服务器,一个数据库)
- 本可以异步执行却以同步方式阻塞执行的操作
横向扩展设计:
水平扩展(增加服务器数量)比垂直扩展(增大服务器容量)更容易。设计时应确保增加容量的方式是增加服务器数量,而不是升级现有服务器。
无状态服务:将状态存储在数据库或缓存中,而不是应用程序内存中。这样,您可以运行服务的多个实例。
# BAD: State in memory
active_sessions = {} # Lives in this server's memory
@app.route('/login')
def login():
session_id = generate_id()
active_sessions[session_id] = user_data
return session_id
# If we add another server, it doesn't have this session data!
# GOOD: State in Redis
@app.route('/login')
def login():
session_id = generate_id()
redis.set(f"session:{session_id}", user_data, ex=3600)
return session_id
# Any server can access this session data
使用消息队列进行异步工作:
不要在 HTTP 请求处理程序中执行耗时的操作(例如发送电子邮件、处理图像、生成报告)。请将它们放入队列中:
# BAD: Synchronous
@app.route('/send-newsletter')
def send_newsletter():
users = db.get_all_users() # 100,000 users
for user in users:
send_email(user.email, newsletter_html) # Takes 100ms each
return "Sent!" # User waited 10,000 seconds (2.7 hours)!
# GOOD: Asynchronous
@app.route('/send-newsletter')
def send_newsletter():
users = db.get_all_users()
for user in users:
queue.enqueue('send_email', user.email, newsletter_html)
return "Queued!" # Returns immediately
# Background worker processes queue
def worker():
while True:
job = queue.dequeue()
if job:
send_email(job.email, job.html)
积极缓存:
每层都进行缓存:
- 浏览器缓存:缓存时间较长的静态资源(CSS、JS、图片)
- CDN:从靠近用户的边缘节点提供静态内容
- 应用缓存:将耗时的操作缓存到内存中(Redis、Memcached)
- 数据库缓存:查询结果缓存
from functools import lru_cache
import redis
r = redis.Redis()
# Memory cache for function results
@lru_cache(maxsize=1000)
def get_popular_posts():
return db.query("SELECT * FROM posts ORDER BY views DESC LIMIT 10")
# Redis cache for database queries
def get_user(user_id):
# Check cache first
cached = r.get(f"user:{user_id}")
if cached:
return json.loads(cached)
# Cache miss—query database
user = db.query("SELECT * FROM users WHERE id = ?", user_id)
# Store in cache for 1 hour
r.setex(f"user:{user_id}", 3600, json.dumps(user))
return user
缓存失效很难把握时机。要仔细考虑何时失效:
def update_user(user_id, data):
db.update("users", user_id, data)
# Invalidate cache
r.delete(f"user:{user_id}")
监控性能指标:
随着业务增长,跟踪关键指标:
- 响应时间(p50、p95、p99)
- 吞吐量(每秒请求数)
- 错误率
- 资源利用率(CPU、内存、磁盘、网络)
设置提醒,以便了解何时接近限额。
Load test before you need to:
Don't wait until you go viral to discover your bottlenecks. Use tools like Apache JMeter, Locust, or k6 to simulate high traffic:
# locust test
from locust import HttpUser, task, between
class WebsiteUser(HttpUser):
wait_time = between(1, 3)
@task
def view_homepage(self):
self.client.get("/")
@task(3) # 3x more likely than view_homepage
def view_post(self):
post_id = random.randint(1, 1000)
self.client.get(f"/posts/{post_id}")
Run this with 1,000 simulated users and see what breaks. Fix it before it breaks in production.
22. Not Treating Documentation as a First-Class Deliverable
You've built an amazing system. The architecture is solid, the code is clean, the tests pass. You ship it. Six months later, you've moved to another project, and someone needs to modify your system.
They spend days trying to understand it. They can't figure out why certain decisions were made. They don't know how components interact. They end up rewriting parts that were working fine because they didn't understand the original intent.
Or worse: you return to your own code after six months and can't remember why you did things the way you did.
Why this happens: Documentation feels like boring work that slows you down. Code is the real deliverable, right? Plus, documentation gets out of date quickly, so what's the point?
The fix: Documentation is code for humans. Treat it as seriously as you treat code for computers.
Document the "why," not just the "what":
Code shows what you're doing. Comments and docs should explain why.
# BAD: Documents the obvious
# Increment counter by 1
counter += 1
# GOOD: Explains reasoning
# We increment after validation to avoid counting invalid attempts.
# This matches the analytics team's definition of "active sessions."
counter += 1
Architecture Decision Records (ADRs):
When you make significant architectural decisions, write them down:
# ADR 001: Use PostgreSQL for Primary Database
## Status
Accepted
## Context
We need to choose a database for our application. Key requirements:
- ACID transactions for financial data
- Complex queries with joins
- Strong consistency guarantees
- Mature tooling and community support
We considered PostgreSQL, MySQL, and MongoDB.
## Decision
We will use PostgreSQL as our primary database.
## Consequences
Positive:
- Excellent support for complex queries and transactions
- JSON support for semi-structured data
- Strong community and tooling
- Free and open source
Negative:
- Vertical scaling limits (though horizontal options exist)
- Slightly more complex setup than MySQL
- NoSQL-style operations less optimized than MongoDB
This captures not just what you decided, but why, and what alternatives you considered. When someone questions the decision later, they can see the reasoning.
README files that actually help:
Every project should have a README that answers:
- What is this? (Brief description of what the project does)
- Why does it exist? (What problem does it solve?)
- How do I set it up? (Step-by-step setup instructions)
- How do I use it? (Common usage examples)
- How do I contribute? (For open source or shared projects)
- Where can I get help? (Links to docs, chat, issue tracker)
# Order Processing Service
Handles order fulfillment and inventory management for the e-commerce platform.
## Quick Start
# Clone and install
git clone https://github.com/company/order-service
cd order-service
npm install
# Set up environment
cp .env.example .env
# Edit .env with your database credentials
# Run migrations
npm run migrate
# Start the service
npm run dev
## Architecture
This service is part of a microservices architecture:
- Listens to `order.created` events from the Order API
- Updates inventory in the Inventory Database
- Emits `order.fulfilled` or `order.failed` events
- See [Architecture Docs](docs/architecture.md) for detailed diagrams
## Key Concepts
**Idempotency**: All event handlers are idempotent. We track processed event IDs
in the `processed_events` table to prevent duplicate processing.
**Inventory Locking**: When processing an order, we use database row-level locks
to prevent race conditions. See `src/inventory/lock.ts` for implementation.
Inline documentation for complex logic:
When you write something complex or non-obvious, explain it:
def calculate_shipping_cost(weight_kg, distance_km, is_express):
"""
Calculate shipping cost based on weight and distance.
Args:
weight_kg: Package weight in kilograms
distance_km: Shipping distance in kilometers
is_express: Whether express shipping is selected
Returns:
Cost in dollars (float)
Note:
The formula uses a base rate plus distance and weight multipliers.
Express shipping adds a 50% surcharge.
The weight tiers (0-5kg, 5-20kg, 20+kg) were determined by our
logistics team based on carrier pricing brackets. See ticket #1234.
"""
base_rate = 5.00
# Weight-based pricing tiers (from logistics team analysis)
if weight_kg <= 5:
weight_cost = weight_kg * 0.50
elif weight_kg <= 20:
weight_cost = 5 * 0.50 + (weight_kg - 5) * 0.30
else:
weight_cost = 5 * 0.50 + 15 * 0.30 + (weight_kg - 20) * 0.20
distance_cost = distance_km * 0.10
subtotal = base_rate + weight_cost + distance_cost
if is_express:
subtotal *= 1.5 # 50% express surcharge
return round(subtotal, 2)
Keep docs in version control:
Documentation should live in the same repo as code, so it stays synchronized. Use Markdown files in a docs/ directory, or tools like Sphinx, MkDocs, or Docusaurus for more sophisticated documentation sites.
Update docs with code:
Make documentation updates part of your code review process:
- Changed how a feature works? Update the docs.
- Added a new API endpoint? Document it.
- Fixed a tricky bug? Add a note explaining the issue and solution.
Treat outdated documentation as a bug. It's worse than no documentation because it actively misleads people.
23. Not Understanding Technical Debt
You need to ship a feature quickly. You take some shortcuts—skip tests, hard-code values, copy-paste some code, add a TODO comment. "We'll clean this up later," you tell yourself.
Later never comes. The shortcuts accumulate. The codebase becomes harder to work with. New features take longer to build. Bugs become more frequent. You're spending more time fighting the code than building features.
This is technical debt, and it compounds like financial debt.
Why this happens: Shipping features creates visible value. Paying down technical debt feels like busywork with no immediate benefit. Plus, there's always something more urgent, more important, more visible.
The fix: Understand that technical debt is a tool, not a failure. Sometimes taking shortcuts is the right decision—but you need to be intentional about it and pay it back.
Not all shortcuts are technical debt:
Bad code you plan to throw away isn't debt—it's a prototype or proof of concept. If you're validating an idea, quick and dirty is fine. Just don't put it in production without a rewrite.
Temporary workarounds can be okay if they're truly temporary. Need to hard-code a value to ship today while you build the configuration system? Fine. But set a deadline to fix it, and honor that deadline.
Technical debt is shortcuts in production code that you intend to improve later.
Make debt visible:
Track technical debt explicitly. Some teams maintain a "tech debt" label in their issue tracker. Others dedicate time each sprint to debt paydown.
# TODO: Refactor this into separate functions
# HACK: Works around bug in library v1.2, remove when upgraded
# DEBT: Should use proper caching here, but skipping for now
These comments make debt visible, but they're not enough. Create actual tickets:
Issue #123: Refactor user authentication module
Technical Debt
The authentication code in auth.py has grown organically and is now
difficult to maintain. It mixes concerns (validation, session management,
logging) and has poor test coverage.
Impact: Medium
- New auth features take 2x longer than they should
- Recent bugs in session handling (issues #110, #115)
Effort: ~3 days
Benefits:
- Easier to add new auth methods
- Better test coverage
- Clearer separation of concerns
Pay debt down regularly:
Don't wait for a "big refactoring" that never happens. Pay down debt incrementally:
The Boy Scout Rule: Leave code better than you found it. When you touch a file, improve it a little—better variable names, extract a function, add a test.
Dedicated time: Many teams allocate 20% of each sprint to technical debt and maintenance. This prevents debt from accumulating faster than you can pay it down.
Refactor in place: When adding a feature to messy code, refactor first, then add the feature. Don't pile new code on top of bad code.
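As a hypothetical sketch of the Boy Scout Rule in practice: you touched this function while adding an unrelated feature, and you leave it slightly better than you found it (same behavior, just clearer names and a docstring):

# Before: the code you found
def calc(p, q):
    t = p * q
    if t > 100:
        t = t * 0.9
    return t

# After: the small cleanup you did in passing
def calculate_order_total(unit_price, quantity):
    """Total price, with a 10% discount for orders over 100."""
    total = unit_price * quantity
    if total > 100:
        total *= 0.9
    return total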
Know when to declare bankruptcy:
Sometimes technical debt is so bad that paying it down isn't worth it. You're better off rewriting the component from scratch. This is a big decision—rewrites are risky—but sometimes it's the right call.
Signs it's time to rewrite:
- The code is so tangled that every change breaks something
- Nobody understands how it works anymore
- The architecture fundamentally doesn't support new requirements
- You're spending more time working around problems than building features
But be careful: Rewrites often take 3x longer than expected, and you're not building new features during that time. Consider refactoring iteratively instead—replace components one at a time while keeping the system running.
24. Not Learning from Incidents
Something goes wrong in production. Maybe the site goes down. Maybe data gets corrupted. Maybe users can't check out and you're losing revenue.
You scramble to fix it. You work frantically, trying things until something works. Eventually, you get it back up. You breathe a sigh of relief, close the laptop, and move on.
A month later, it happens again. Same root cause, slightly different symptoms. You fix it again. This repeats every few months.
Why this happens: After an incident, you're tired. You're stressed. You just want to move on to normal work. Writing up what happened feels like homework.
The fix: Conduct blameless postmortems. Learn from every incident so it doesn't happen again.
Blameless is critical: The goal isn't to find who to blame. It's to understand what systemic issues allowed the incident to occur. If you blame people, they'll hide mistakes, and you'll never learn from them.
Bad postmortem: "Bob deployed broken code and took down the site."
Good postmortem: "A code change that passed all tests caused an issue in production. Our testing didn't catch this because we don't have integration tests for this component. Our deployment process deployed to all servers at once, so when the issue appeared, it affected 100% of users."
Postmortem template:
# Incident Postmortem: Site Outage on 2024-11-15
## Severity
Critical - Full site outage
## Duration
45 minutes (14:23 - 15:08 UTC)
## Impact
- All users unable to access the site
- Approximately 10,000 users affected
- Estimated $5,000 in lost revenue
## Root Cause
A database migration added a column without a default value. The application
code assumed this column existed and had values, causing all queries to fail.
## Timeline
14:23 - Migration deployed to production
14:24 - Site starts returning 500 errors
14:25 - Alerts fire, on-call engineer paged
14:30 - Engineer begins investigation
14:35 - Identified database errors in logs
14:45 - Root cause identified
14:50 - Rollback decision made
15:00 - Rollback deployed
15:08 - Site fully operational
## What Went Well
- Alerts fired quickly
- Engineer responded within 5 minutes
- We had a rollback procedure
## What Went Wrong
- Migration wasn't tested against production data
- We didn't have a staging environment that matched production
- Deployment affected all servers at once (no gradual rollout)
- Application didn't handle missing column gracefully
## Action Items
1. [P0] Add staging environment with production-like data (Owner: Alice, Due: 2024-11-20)
2. [P0] Implement gradual deployments (5% -> 25% -> 100%) (Owner: Bob, Due: 2024-11-22)
3. [P1] Add migration testing to CI/CD (Owner: Charlie, Due: 2024-11-25)
4. [P2] Improve application error handling for database changes (Owner: Diana, Due: 2024-12-01)
## Lessons Learned
- Migrations are code and need the same testing rigor
- Gradual deployments would have limited impact to 5% of users
- Need better alignment between schema changes and application code
Key elements:
Timeline: When did things happen? This helps identify where processes broke down.
Root cause: Not just "what broke" but "why was it possible for this to break?" Go deep—use the "5 whys" technique.
Action items: Concrete, assigned tasks with deadlines. Without action items, the postmortem is just a history lesson.
Lessons learned: What will you do differently next time?
Share postmortems widely. Other teams can learn from your incidents. Many companies publish public postmortems—it builds trust and shares knowledge across the industry.
Track action items. A postmortem without follow-through is useless. Review action items regularly to ensure they get done.
25. Not Mentoring Others (Or Not Learning from Mentees)
You've reached a senior level. You're technically strong. You can solve complex problems. But you're still coding in isolation, focusing only on your own work, not helping others grow.
Or maybe you are mentoring, but you're doing it badly—telling people what to do instead of teaching them how to think, or getting frustrated when they don't understand something immediately.
Why this happens: We promote people for technical skills, not necessarily for teaching ability. Mentoring takes time and energy. It's also vulnerable—you have to admit what you don't know and be okay with being questioned.
The fix: Learn to mentor effectively. It multiplies your impact exponentially.
A senior developer is not the person who writes the most code. A senior developer is the person who makes everyone around them better.
Good mentoring principles:
1. Ask questions instead of giving answers.
Junior: "My code isn't working. What should I do?"
Bad mentor: "You need to add a null check on line 23."
Good mentor: "Let's debug together. What does the error message say? What have you tried so far? Let's walk through the code and find where the null value is coming from."
The first approach solves the immediate problem. The second approach teaches debugging skills that will solve hundreds of future problems.
2. Share your thought process.
When pair programming or code reviewing, narrate your thinking:
"I'm looking at this function, and my first question is: what are the inputs and outputs? Okay, it takes a user ID and returns a user object. Now, what could go wrong? What if the ID doesn't exist? What if it's null? I see we're not handling those cases..."
This teaches how to think about code, not just what code to write.
3. Give feedback sandwich-style.
Start with what's good, then areas for improvement, then encouragement.
"This code works well and is easy to read. The variable names are clear. One thing to consider: if the list is empty, this will throw an error. How could we handle that case? Overall, you're making great progress—keep it up."
4. Calibrate difficulty.
Give people tasks just beyond their current level—not so easy they're bored, not so hard they're overwhelmed. This is the "zone of proximal development."
If someone is learning React, don't start with "Build a complex state management system with Redux." Start with "Build a simple counter component." Then "Add a form that updates the counter." Gradually increase complexity.
5. Let them struggle (a little).
Don't immediately rescue them when they're stuck. Struggling is how we learn. But don't let them struggle forever—provide hints, ask guiding questions, show similar examples.
"I see you're stuck on this. Let me ask: have you looked at the documentation for this function? What did you try? What happened? Let's look at the error together."
6. Learn from your mentees.
Junior developers often see things that experienced developers miss:
- They question assumptions ("Why do we do it this way?")
- They're not bound by "that's how we've always done it"
- They bring fresh perspectives and new techniques
Stay humble. Be willing to say "I don't know" or "That's a good point, let me think about that."
7. Create a safe environment.
People learn best when they feel safe to make mistakes and ask questions. If someone asks a "dumb" question, never make them feel stupid.
"That's a great question! Let me explain..." not "You should already know this."
Document your knowledge:
You can't mentor everyone directly. Write blog posts, create internal docs, record video tutorials. This scales your knowledge across the team and organization.
26. Chasing Trends Instead of Mastering Fundamentals
A new framework drops. It's all over Hacker News. Everyone's talking about how it will revolutionize development. You immediately start rewriting your app in it.
Six months later, another framework appears. You rewrite again. Your resume lists 15 frameworks but you don't deeply understand any of them.
Why this happens: New is exciting. New feels like progress. Plus, there's fear of falling behind, of becoming irrelevant.
The fix: Focus on fundamentals. Frameworks change, but fundamentals endure.
Technologies have a half-life: The specific framework you learn today might be obsolete in five years. But the fundamentals—data structures, algorithms, design principles, debugging skills—remain valuable for your entire career.
Learn React, but more importantly, learn:
- How rendering works
- How state management works
- Component composition
- When to optimize and when not to
Then when Svelte or the next thing comes along, you'll understand it quickly because you understand the underlying concepts.
Master one thing deeply before moving to the next:
It's better to know JavaScript deeply than to know JavaScript, Python, Ruby, Go, and Rust superficially.
Deep knowledge means understanding:
- The language's idioms and conventions
- Its performance characteristics
- Its ecosystem and tooling
- Its edge cases and gotchas
- How to debug when things go wrong
Timeless skills to invest in:
- Data structures and algorithms: How to choose the right structure for the job, how to analyze complexity
- System design: How to architect reliable, scalable systems
- Debugging: How to systematically find and fix problems
- Reading code: Most of your time is spent reading, not writing
- Communication: Explaining technical concepts to technical and non-technical audiences
- Problem solving: Breaking down complex problems into manageable pieces
When should you learn new technologies?
- When your current tools genuinely don't solve the problem
- When the new technology has proven itself (not just hype)
- When you have time to learn it properly, not superficially
- When it aligns with your career goals
Don't learn something just because it's trendy. Learn it because it solves a problem you have or teaches you concepts that transfer to other domains.
Part IV: Universal Mistakes (The Ones We Never Stop Making)
These mistakes transcend skill level. I've seen brilliant senior engineers make them. I've made them myself, repeatedly, despite knowing better. They're human mistakes, not technical ones.
27. Not Taking Breaks
You're in the zone. The code is flowing. You've been at it for four hours straight. You're making progress. Surely you can just push through, finish this feature, ship it...
Then you realize you've been chasing a bug for the last hour that wouldn't exist if you'd just read the error message carefully. Or you've been implementing a complex solution when a simple one would work better, but you're too tired to see it.
Why this happens: Programming is addictive. Solving problems triggers dopamine. Taking a break feels like losing momentum.
The fix: Your brain needs rest. Take breaks. Seriously.
The Pomodoro Technique: Work for 25 minutes, break for 5. After four cycles, take a longer break (15-30 minutes).
Take walks: When you're stuck on a problem, go for a walk. Don't think about the problem. Let your diffuse mode thinking work on it subconsciously. I've solved countless bugs while walking the dog.
Stop when you're tired: Working while exhausted is counterproductive. You write bad code, make poor decisions, and introduce bugs. Better to stop, rest, and return fresh.
The two-minute rule for tough bugs: If you're truly stuck, step away for at least two minutes. Get water, stretch, look out the window. Often the solution appears when you return.
Sleep on it: I can't count the number of times I've struggled with a problem for hours, gone to sleep frustrated, and woken up with the solution immediately obvious.
28. Not Asking for Help
You've been stuck for three hours. You're too embarrassed to ask for help. You don't want to look stupid. You don't want to admit you don't know something.
So you keep struggling. Maybe you find a convoluted workaround. Maybe you eventually figure it out. But you've wasted hours (or days) when someone could have helped you in five minutes.
Why this happens: Pride. Fear of judgment. Impostor syndrome. Not wanting to bother people.
The fix: Asking for help is a skill. Learn it.
The 30-minute rule: If you're genuinely stuck for 30 minutes, ask for help. Not after 5 minutes (try to solve it yourself first), but definitely not after 3 hours.
How to ask for help effectively:
1. Show what you've tried:
"I'm trying to X. I've tried A, B, and C. A gave me error Y. B seemed to work but then Z happened. C didn't work because of reason R. Do you have any ideas?"
This shows you've made an effort and helps the person understand the problem.
2. Provide context:
Not: "My code doesn't work."
But: "I'm trying to fetch user data from the API, but I'm getting a 401 error. Here's my code [paste]. Here's the error [paste]. I've checked that my API key is correct."
3. Make it easy to help you:
- Include relevant code snippets
- Include error messages
- Explain what you expect vs. what's happening
- Minimize: provide a minimal reproducible example
4. Be respectful of others' time:
"Hey, do you have 5 minutes to help me with something?" is better than launching into a complex explanation while someone is clearly busy.
5. Pay it forward:
When someone helps you, help others when you can. Build a culture where asking for help is normal and encouraged.
Remember: Everyone was a beginner once. Everyone gets stuck. People generally want to help—they're not judging you for not knowing something.
29. Not Celebrating Wins
You finished a complex feature. It took weeks of hard work. It works beautifully. You ship it.
Then immediately move on to the next task without acknowledging what you just accomplished.
Over time, this leads to burnout. You feel like you're always climbing, never reaching the summit. You lose sight of how much you've grown and what you've achieved.
Why this happens: Software development has no natural endpoints. There's always another feature, another bug, another improvement. The work is never "done."
The fix: Deliberately celebrate wins, large and small.
Keep a "wins journal." At the end of each week, write down what you accomplished:
- Bugs fixed
- Features shipped
- Skills learned
- Problems solved
- People helped
When you feel like you're not making progress, read it. You'll be amazed at how much you've actually done.
Share wins with your team. In standups or retrospectives, celebrate what people accomplished. Not just "I finished ticket #123" but "I implemented the payment system, which was complex because of X, Y, Z, and I learned a lot about..."
Reflect on how far you've come. Look at code you wrote a year ago. You'll probably cringe—that's good! It means you've grown. Remember what seemed impossible then that's easy now.
Take time between projects. When you ship something big, take a breath before diving into the next thing. Process what you learned. Document what worked and what didn't.
30. Forgetting Why You Started Coding
You started coding because it was fun. Because creating something from nothing was magical. Because solving puzzles was satisfying.
Now it's a job. You're dealing with meetings, deadlines, politics, legacy code, and bureaucracy. You've forgotten the joy.
Why this happens: Professional software development is different from learning to code. Real-world constraints, stakeholder demands, and technical debt can drain the fun out of it.
The fix: Reconnect with what you love about coding.
Side projects: Build something just for fun. No deadlines, no stakeholders, no best practices you don't want to follow. Just play. Build something silly, experimental, useless. Remember what it feels like to create for the sake of creating.
Learn something unrelated to your job: Pick up a new language or paradigm just because it interests you. Try functional programming if you've only done OOP. Try game development if you've only done web dev. The goal isn't career advancement—it's curiosity and joy.
Teach others: Help beginners learn to code. Their excitement is contagious. Seeing someone's eyes light up when their first program works will remind you of your own early days.
Contribute to open source: Find a project you use and love, and contribute to it. Being part of a community effort can be more fulfilling than corporate work.
Remember your wins: Keep a folder of projects you're proud of. When you're feeling burned out, look through them. Remember that time you built something amazing from scratch. Remember how good it felt.
Take breaks from the industry: If you're truly burned out, that's okay. Take a sabbatical. Travel. Learn something completely different. Programming will still be here when you get back, and you'll return with fresh energy and perspective.
The Meta-Lesson: Mistakes Are the Path
I've shared dozens of mistakes in this post—mistakes I've made, mistakes I've seen, mistakes I've debugged at 3 AM while cursing past decisions. But here's the truth that took me years to understand:
Mistakes are not failures. They're data.
Every bug you fix teaches you something about how systems fail. Every bad architecture decision teaches you about tradeoffs. Every project that goes sideways teaches you about what not to do next time.
The developers I respect most aren't the ones who never make mistakes—those developers don't exist. They're the ones who make mistakes, learn from them, share them, and help others avoid them.
This is how we grow:
We write code that works. Then we see how it fails. We fix it. We understand why it failed. We develop intuition. We make different mistakes at a higher level. We learn from those. We keep climbing.
It's a spiral, not a ladder. You'll revisit the same lessons at different levels throughout your career. You'll make "beginner" mistakes again when you're working in an unfamiliar domain. You'll find new depths to concepts you thought you understood years ago.
And that's beautiful.
A Final Word
If you're a beginner and this post feels overwhelming—don't worry. You don't need to master everything at once. Pick one or two things to focus on. Make them habits. Then pick another one or two.
If you're intermediate and recognizing yourself in these mistakes—good. That awareness is the first step to improving.
If you're advanced and still making these mistakes sometimes—you're human. We all are.
The best developers I know aren't the ones who never make mistakes. They're the ones who:
- Make mistakes quickly (fail fast, learn fast)
- Learn from them (reflect and adjust)
- Share them (help others avoid the same pitfalls)
- Stay curious (keep learning, keep growing)
- Remain humble (there's always more to learn)
So go forth. Write code. Make mistakes. Learn from them. Build things. Break things. Fix things. Help others. Ask for help. Celebrate your wins. Remember why you love this.
And when you inevitably encounter a bug at 2 AM that turns out to be a single missing semicolon, take a deep breath, smile at the absurdity of it all, fix it, commit it with a message like "fix: add missing semicolon (don't judge me)," and keep going.
We're all in this together, stumbling forward, learning as we go, building the future one line of code—and one mistake—at a time.
Have you made mistakes I didn't cover? Found solutions I didn't mention? I'd love to hear about them. Drop a comment below and let's learn from each other. After all, that's what this whole beautiful mess is about.
文章来源:https://dev.to/thebitforge/common-coding-mistakes-at-every-level-and-how-to-fix-them-4cgb