Published on 2026-01-06

Complexity


Clarification: this article is not about computational complexity (Big O notation and the like).

There are some classic works on the subject of complexity:

Also worth mentioning:

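Complexity is at the heart of software development. Computers themselves are enormously complex: they contain billions of "relays" (in the form of transistors). How many parts does the most complicated mechanical watch have? Far fewer, I'd guess.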

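Computers and computer networks are the most complex things humans have built so far, and the job of software is to cope with this complexity. Obviously, we don't write software by manipulating every component directly; instead we work with a stack of "abstractions": transistors form logic gates, logic gates form processors, processors run machine code, machine code is produced by programs (usually from a high-level language), and at the top of the stack sit libraries, frameworks, paradigms, and so on.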

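We build this tower to hide complexity. You can stay at one level of abstraction and, if you are lucky, avoid dealing with complexity, but more likely you will eventually have to face it. Have you heard of "divide and conquer", "tight coupling", or encapsulation? They are all about managing complexity.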

What is complexity?

Complex: consisting of many different and connected parts.

-- Oxford Dictionary

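Complexity is an inherent property of a system. The more (distinct) parts a system contains, and the more connections there are between them, the more complex the system is.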

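Rich Hickey focuses on connectedness. Indeed, the number of (distinct) items doesn't matter that much as long as you can think about them in small groups, one group at a time; but if many things are rigidly tied together, you have to "drag" all of them into your head at once to think, and to reason, about the group.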

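Hickey also defines complexity for a single part: a part is complex if it has several responsibilities or plays several roles. Think of it as several things braided together into one, which is what makes it complex.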
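A small illustration of a part that plays several roles, sketched in Python (the "name,amount" text format and all function names are hypothetical, chosen only for the example): the first function both parses text and computes a total, so neither role can be understood or changed alone; the split version gives each part a single job.

```python
# Hypothetical example: one function that plays two roles (parsing and summing).
def parse_and_total(text: str) -> float:
    total = 0.0
    for line in text.strip().splitlines():
        _name, amount = line.split(",")
        total += float(amount)
    return total


# The same behavior split into two parts, each with a single responsibility.
def parse(text: str) -> list[tuple[str, float]]:
    """Turn 'name,amount' lines into (name, amount) pairs."""
    rows = []
    for line in text.strip().splitlines():
        name, amount = line.split(",")
        rows.append((name, float(amount)))
    return rows


def total(rows: list[tuple[str, float]]) -> float:
    """Sum the amounts; knows nothing about the text format."""
    return sum(amount for _name, amount in rows)


data = "apples,1.5\npears,2.5"
print(parse_and_total(data))  # 4.0
print(total(parse(data)))     # 4.0
```

The split version has more parts, but each one can be read, tested, and replaced without holding the other in mind.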

Why is complexity hard?

Hickey distinguishes "hard vs. easy" from "complex vs. simple". Brooks uses "difficulty" instead of "hard". So why do complex systems feel hard?

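Miller's law (an empirical observation) says that a person can keep at most 7 (±2) items in mind at once. For example, if an experimenter reads out a sequence of digits and asks subjects to repeat it in reverse order, subjects are most likely to succeed with up to 7 (±2) items. In other words, our brains cannot hold much information at the same time.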

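To understand a complex concept, you first need to "load" it into your head. This usually takes time and some "external" storage (a notebook, a computer, etc.). You have probably seen the comic about focus and why you shouldn't interrupt a programmer; it illustrates nicely the process of "loading" a complex concept (a system, a model).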

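So the real problem is this: if you have to work with a complex system, in the long run it will "kill" you. Every time you need to change something, every time you need to explain it (and someone else needs to learn it), this slow process of loading the complex system into your head repeats.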

Essential vs. accidental complexity

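Following Aristotle, I divide them into essence (the difficulties inherent in the nature of the software) and accidents (those difficulties that today attend its production but that are not inherent).

-- Frederick P. Brooks, Jr.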

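Brooks talks about "difficulties", not complexity. I consider "difficulty" a broader notion, so the difficulties of software development include complexity.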

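Essential complexity is the complexity of the problem you are actually trying to solve. It is irreducible unless you agree to change the scope of the original task.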

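Accidental complexity is the complexity added by the tools you use or the path you choose to solve the problem. It is not part of the original problem and can (in theory) be removed without changing the scope of the original task.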

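For example, I need to write a frontend application that does "something". That "something" is the essential complexity. Configuring webpack and Babel, JavaScript fatigue (choosing a framework), arguing about semicolons, formatting code by hand (unless you use Prettier), and so on are all accidental difficulties.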

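Important: the line between essential and accidental complexity is not always obvious. People sometimes confuse the two and end up concentrating on accidental complexity.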

"Essential" and "accidental" are relative notions; they depend on context.

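If I need to implement some business rules in code, the choice of data structures (how to implement a record: with a Robin Hood hash table or a relativistic hash table) and of algorithms (e.g. quicksort or Timsort) is accidental complexity. But if I need to write the standard library of a programming language, the choice of data structures and algorithms is essential complexity.
In (some) high-level programming languages you still have to choose data structures, because there is no simple way to automate that choice yet, but there is research moving in this direction.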
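A rough sketch of the business-rules case in Python (the order data and the `top_customers` rule are hypothetical): the rule is written against `dict` and `sorted()`, and which hash table and which sorting algorithm sit underneath is left to the language runtime, so at this level that choice stays accidental.

```python
from collections import defaultdict

# Hypothetical orders: (customer, amount).
orders = [("ann", 30), ("bob", 50), ("ann", 40), ("cid", 10)]


def top_customers(orders: list[tuple[str, int]], n: int) -> list[str]:
    """Business rule: the n customers with the largest total spend."""
    # Which hash table implementation backs this dict is the runtime's choice.
    totals: dict[str, int] = defaultdict(int)
    for customer, amount in orders:
        totals[customer] += amount
    # Which sorting algorithm runs here is also the runtime's choice, not ours.
    ranked = sorted(totals, key=lambda c: totals[c], reverse=True)
    return ranked[:n]


print(top_customers(orders, 2))  # ['ann', 'bob']
```

Someone writing the standard library behind `dict` and `sorted()` does not have this luxury: for them those choices are the essential part of the task.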

Examples of accidental complexity

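Essential complexity differs from task to task, but accidental complexity shows up again and again across tasks. I think we can list some recurring complexity-related "bad patterns". Here are some of them (not all).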

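Wrong abstraction. For example, the CSS positioning model. First attempt: `position`, `float`, `margin`. Layout is hard to describe and you need CSS hacks, because these primitives are not about layout. Second attempt: Flexbox. Layout is easier to describe, because it provides primitives such as columns, alignment, etc. Third attempt: Grid Layout. Now we are really talking about layout. But all of these solutions have quite high complexity (lots of configuration options). The essential task of layout is to describe the sizes of blocks and the distribution of space between them. Kevin Lynagh proposed a simpler solution. People who used "spacer GIFs" intuitively discovered that the problem is the distribution of space.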

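Abstraction leak. As I mentioned above, we use levels of abstraction to isolate different kinds of complexity. But sometimes a lower-level abstraction leaks into a higher one. For example, the `goto` statement, a direct reflection of machine-level jumps, was common in high-level programming languages until Dijkstra criticized it ("Go To Statement Considered Harmful").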

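Legacy. This happens when we have to keep using old standards, conventions, and so on. It is similar to the wrong-abstraction problem, except that the original abstraction may have been fine but was stretched beyond its original purpose (in Brooks' words, it was "repurposed"). Even when it no longer fits, you have to keep it.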

The fight against accidental complexity

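In general, people try to eliminate accidental complexity so they can spend more effort on the essential complexity. That is why high-level programming languages exist: talking to the machine in machine code is tedious and error-prone, so we created programs (compilers, etc.) that translate more readable languages into machine code. This actually increases the overall complexity, because the compiler itself is quite complex. We increase overall complexity in order to reduce accidental complexity in specific scenarios (the ones we care about most). At the same time, this additional artifact (the compiler and its standards) can eventually become legacy itself. Alas, life is full of trade-offs.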

We take all kinds of measures against accidental complexity so that we can ship more, faster, and solve bigger and bigger problems.

Today we rarely write algorithms; instead we take something ready off the shelf. We rarely implement data structures, we rarely create standards, etc. (statistically speaking: some people do this, but far fewer than those who use the results). Open source is essential here.

Nowadays, we do programming by poking.

-- Gerald Jay Sussman

Some things can be considered accidental complexity but still exist at higher levels only because we haven't figured out how to solve them in a general way. For example:

  • manual memory management. We are close to a solution: there are effective GC implementations, for example in Ponylang.
  • the null problem. We can use the Maybe monad, or maybe not.
  • the need to choose algorithms and data structures, instead of using general data types and leaving it to the machine to figure out the best implementation for a given use case.
  • parallelism. There have been a lot of attempts.
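The Maybe monad option for the null problem can be sketched in Python. Python has no built-in Maybe, so the `Maybe` class below is a hypothetical, hand-rolled minimal version (it treats a stored `None` as "empty", which a real implementation would distinguish). It only shows the idea: a missing value is threaded through a chain of steps instead of being checked for `None` at every step.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Maybe(Generic[T]):
    """Hypothetical minimal Maybe: either holds a value or is empty."""
    value: Optional[T] = None

    def bind(self, f: Callable[[T], "Maybe[U]"]) -> "Maybe[U]":
        # An empty Maybe stays empty; otherwise run the next step.
        if self.value is None:
            return Maybe()
        return f(self.value)

    def or_else(self, default: T) -> T:
        return default if self.value is None else self.value


# Hypothetical data where any lookup may find nothing.
users = {"u1": {"address": {"city": "Oslo"}}, "u2": {}}


def find_user(uid: str) -> Maybe[dict]:
    return Maybe(users.get(uid))


def address_of(user: dict) -> Maybe[dict]:
    return Maybe(user.get("address"))


def city_of(address: dict) -> Maybe[str]:
    return Maybe(address.get("city"))


def city(uid: str) -> str:
    # No explicit None checks: an empty step short-circuits the rest of the chain.
    return find_user(uid).bind(address_of).bind(city_of).or_else("unknown")


print(city("u1"))  # Oslo
print(city("u2"))  # unknown
print(city("u3"))  # unknown
```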

We are moving toward higher-level abstractions, toward more declarative solutions, toward more appropriate abstractions. The convenience of a declarative solution is that we can swap the low-level implementation without touching the high-level code.
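A minimal sketch of that property in Python: the high-level code describes what to compute as plain data (a hypothetical list of `("filter", f)` / `("map", f)` steps), and two interchangeable executors decide how to compute it, one eagerly with lists and one lazily with generators. The step format and executor names are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator

# A declarative "query": a list of (operation, function) steps, no loops.
Step = tuple[str, Callable]
query: list[Step] = [
    ("filter", lambda x: x % 2 == 0),
    ("map", lambda x: x * x),
]


def run_eager(steps: list[Step], data: Iterable[int]) -> list[int]:
    """Low-level implementation 1: build a new list after every step."""
    result = list(data)
    for op, f in steps:
        if op == "filter":
            result = [x for x in result if f(x)]
        else:
            result = [f(x) for x in result]
    return result


def run_lazy(steps: list[Step], data: Iterable[int]) -> list[int]:
    """Low-level implementation 2: chain lazy iterators, one pass over the data."""
    stream: Iterator[int] = iter(data)
    for op, f in steps:
        stream = filter(f, stream) if op == "filter" else map(f, stream)
    return list(stream)


# The high-level description stays the same; only the executor changes.
print(run_eager(query, range(10)))  # [0, 4, 16, 36, 64]
print(run_lazy(query, range(10)))   # [0, 4, 16, 36, 64]
```

Because `query` says nothing about lists or iterators, a third executor (for example, a parallel one) could be added later without touching it.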

The higher we climb this tower of abstraction, the higher the cost of a wrong abstraction and the more we are trapped in legacy. So from time to time a regression happens: we go back a couple of levels and re-evaluate best practices.

An interesting effect appears when people get used to some accidental complexity and can no longer see it as accidental:

  • they believe it is an appropriate abstraction. If it is hard to understand, you just need to study harder
  • they believe it is required for performance. This argument was used to defend goto, for example
  • they believe that true programmers are supposed to know it. I guess pointers are a good example here
  • they can't believe the problem can be solved without this accidental complexity. For example, the long-standing belief that you need manually managed memory to write low-level things like a database. There is a database written in Go; to be fair, it is not possible to write Memcached in Go, but I believe it is possible with Ponylang.

Even more disturbing, these people can have a lot of experience and authority, so it takes a lot of effort to change their opinions.

It is hard to make simple things

It is hard to produce a simple solution, but easy to use one. And it is easy to produce a complex solution, but hard to use it (sometimes it may seem easy to use, but that impression fades quickly).

Examples:

  • it took Rich Hickey 2 years to design Clojure (behind closed doors). Features are added very slowly, for which it has been criticized
  • Elm adds features very slowly
  • React appeared about 5 years ago. It didn't have good composable state management, and people invented tons of solutions (more than 20 for sure). Only this year did the React team show hooks, and they are not final yet: we are still waiting for the final version and for the Cache feature, and only then will it be possible to build a proper solution on top of those abstractions.

This is because it is hard to find the proper abstraction: to find it, you may need to write (and use) 10 wrong ones. It is OK to make mistakes, as long as you learn from them and don't insist that your solution is the right one and everybody else just needs to learn it.

How to deal with complexity?

Where does complexity come from?

Complexity comes from interconnected, tangled, twisted, complected (in Hickey's terms) things. The number of those connections (typically) grows faster than linearly with the number of items. This idea can be seen as a consequence of the rule of product from combinatorics.

Combinatorics is the study of collections of objects. Specifically,
counting objects, arrangement, derangement, etc. of objects along
with their mathematical properties.

If two events are not mutually exclusive (that is, we do them
separately), then we apply the product rule.

If you add even one thing, there is a high chance that complexity grows by more than one, typically faster than linearly.
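A quick way to see the faster-than-linear growth, sketched in Python under a simplifying assumption (not from the article): every pair of parts may be connected. With n parts there are n(n - 1)/2 possible pairwise connections, so adding one part to a system of n parts adds n new potential connections, not one.

```python
def max_connections(n: int) -> int:
    """Possible pairwise connections between n parts (edges of a complete graph)."""
    return n * (n - 1) // 2


# Each new part adds as many potential connections as there were parts before it.
for n in range(2, 7):
    added = max_connections(n) - max_connections(n - 1)
    print(f"{n} parts: {max_connections(n)} possible connections (+{added})")
```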

So if you are about to add one more param to a function, one more configuration option to an application, one more feature to the project, or one more choice for the consumers of your library, be aware of the consequences.
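The product rule makes this concrete. In the Python sketch below (the option names are hypothetical), each independent option multiplies the number of combinations a function can be called with, and so the number of cases to reason about or test; one extra boolean flag doubles the total.

```python
from itertools import product
from math import prod

# Hypothetical options of a function and the values each can take.
options = {
    "format": ["json", "csv", "xml"],
    "compress": [True, False],
    "verbose": [True, False],
}


def combinations(opts: dict[str, list]) -> int:
    """Product rule: multiply the number of choices for each independent option."""
    return prod(len(values) for values in opts.values())


print(combinations(options))  # 12 = 3 * 2 * 2

# Enumerating every combination confirms the count.
assert len(list(product(*options.values()))) == combinations(options)

# "Just one more param": adding a boolean flag doubles the total.
options["dry_run"] = [True, False]
print(combinations(options))  # 24
```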

How to manage complexity?

To decrease complexity, we can decrease the number of things (nodes, in graph terms). In practice, this means giving up some features, narrowing the scope, etc. This is the only way to decrease complexity; all other methods only help to manage it.

To manage complexity, we can decrease the number of connections (edges, in graph terms). In practice, this means stricter isolation; typical examples are "divide and conquer", encapsulation, and narrowing types (decreasing their cardinality). By isolating and hiding some nodes, we make sure there will be no "unexpected" connections.
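Narrowing a type to decrease its cardinality can be shown with a common case, sketched in Python with hypothetical names: two booleans describing a request's state allow 2 × 2 = 4 combinations, one of which (loading and failed at the same time) makes no sense. An enum with exactly three members removes the impossible state, so no code has to guard against it.

```python
from enum import Enum
from itertools import product

# Before: two independent flags (is_loading, has_error), 4 combinations, one meaningless.
wide = list(product([False, True], repeat=2))
print(len(wide))  # 4


# After: a narrower type with exactly the valid states.
class RequestState(Enum):
    LOADING = "loading"
    FAILED = "failed"
    DONE = "done"


print(len(RequestState))  # 3


def describe(state: RequestState) -> str:
    # Every valid case is listed; there is no "loading and failed" case to handle.
    if state is RequestState.LOADING:
        return "spinner"
    if state is RequestState.FAILED:
        return "error message"
    return "content"


print(describe(RequestState.FAILED))  # error message
```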

We can make a "connection" weaker by introducing an additional intermediate node. In practice, this means introducing an interface or a type (with structural subtyping), or replacing a direct call with a message queue, etc. The number of nodes increases (so complexity increases), but the connection is weaker, so we can split the graph into subgraphs and reason about only one part of it at a time.
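Both kinds of intermediate node can be sketched in Python; the notifier classes and message texts are hypothetical. `typing.Protocol` gives structural subtyping: `checkout` depends only on the protocol, and any class with a matching `send` method fits without inheriting from it (the match is checked by a static type checker such as mypy; at runtime Python simply calls the method). `queue.Queue` stands in for a message queue: the producer knows only the queue, not who consumes the messages.

```python
import queue
from typing import Protocol


class Notifier(Protocol):
    """Intermediate node: the only thing checkout() knows about."""

    def send(self, text: str) -> str: ...


class EmailNotifier:  # does not inherit from Notifier
    def send(self, text: str) -> str:
        return f"email: {text}"


class SmsNotifier:
    def send(self, text: str) -> str:
        return f"sms: {text}"


def checkout(order_id: int, notifier: Notifier) -> str:
    return notifier.send(f"order {order_id} paid")


print(checkout(1, EmailNotifier()))  # email: order 1 paid
print(checkout(2, SmsNotifier()))    # sms: order 2 paid

# Message queue: instead of calling a consumer directly, the producer enqueues a message.
events: "queue.Queue[str]" = queue.Queue()


def produce(order_id: int) -> None:
    events.put(f"order {order_id} paid")  # knows nothing about consumers


def consume_all() -> list[str]:
    handled = []
    while not events.empty():
        handled.append(events.get())
    return handled


produce(3)
produce(4)
print(consume_all())  # ['order 3 paid', 'order 4 paid']
```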

Source: https://dev.to/stereobooster/complexity-5d62