Saturday, January 31, 2026

Economies of AI

This is a cost-benefit analysis of using AI to solve problems: how it compares with classical methods, e.g. deterministic algorithms or manual labor, and what the creation of automation costs.

A fair warning: currently, LLMs are not able to summarize this article correctly because of my unique perspective (example), as this article is not about the H-word at all. You should read it yourself. If you are impatient, at least read the first sentence of each paragraph and the conclusion.

AI vs. Deterministic Algorithms

An example of a deterministic algorithm is computing an arithmetic expression like "1+2+4". There are well-known and efficient ways to compute it (see the sketch after this list).

  • First the string is tokenized: "1+2+4" → ['1', '+', '2', '+', '4']. This is called lexing.
  • Then the string is organized into an abstract syntax tree: ['1', '+', '2', '+', '4'] → Plus(1, Plus(2, 4)). This is called parsing.
  • Then the abstract syntax tree can be traversed recursively and the value is computed: Plus(1, Plus(2, 4)) → Plus(1, 6) → 7. This is called evaluation.
  • Under the hood, a machine computes the addition using a circuit of logic gates called an Adder.
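
Here is a minimal sketch of those three stages in C++, for the "+"-only grammar above (the function names lex, parse and eval are just for illustration; a real calculator would also handle precedence, parentheses and malformed input):

    #include <cctype>
    #include <cstddef>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Lexing: "1+2+4" -> {"1", "+", "2", "+", "4"}
    std::vector<std::string> lex(const std::string& s) {
        std::vector<std::string> tokens;
        for (std::size_t i = 0; i < s.size();) {
            if (std::isdigit(static_cast<unsigned char>(s[i]))) {
                std::size_t j = i;
                while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) ++j;
                tokens.push_back(s.substr(i, j - i));
                i = j;
            } else {
                tokens.push_back(std::string(1, s[i++]));  // single-character operator
            }
        }
        return tokens;
    }

    // The abstract syntax tree: a node is either a number or Plus(lhs, rhs).
    struct Expr {
        int value = 0;
        std::unique_ptr<Expr> lhs, rhs;  // both null for a plain number
    };

    // Parsing (right-recursive, producing Plus(1, Plus(2, 4)) as above).
    std::unique_ptr<Expr> parse(const std::vector<std::string>& toks, std::size_t& pos) {
        auto node = std::make_unique<Expr>();
        node->value = std::stoi(toks[pos++]);
        if (pos < toks.size() && toks[pos] == "+") {
            ++pos;  // consume '+'
            auto plus = std::make_unique<Expr>();
            plus->lhs = std::move(node);
            plus->rhs = parse(toks, pos);  // recurse on the rest of the expression
            return plus;
        }
        return node;
    }

    // Evaluation: recursively fold the tree into a number.
    int eval(const Expr& e) {
        return e.lhs ? eval(*e.lhs) + eval(*e.rhs) : e.value;
    }

    int main() {
        auto tokens = lex("1+2+4");
        std::size_t pos = 0;
        auto tree = parse(tokens, pos);
        std::cout << eval(*tree) << "\n";  // prints 7
    }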

For AI to do the same, the tokenizing is similarly done by a deterministic algorithm, but the rest is done through many large matrix multiplications. The dimensions of these matrices are much larger than the length of the input token sequence, and matrix multiplication takes \(\omega(n^2)\) time. The logic gates for a Binary Multiplier are also much more complex than those of an Adder. Large matrices take up more memory and more communication bandwidth to move the data around.

This is why a machine can make billions if not trillions of calculations per second, yet it takes AI a few seconds to complete a single prompt. Not to mention that the power consumption of AI is several orders of magnitude greater than that of a deterministic algorithm.
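
As a rough back-of-the-envelope illustration (the 7-billion-parameter model size is just an assumed example), a decoder-only transformer performs roughly two floating-point operations per parameter per generated token, so

\[ 2 \times 7\times10^{9} \approx 1.4\times10^{10} \ \text{FLOPs per token,} \]

whereas evaluating "1+2+4" deterministically takes two integer additions. That is roughly ten orders of magnitude more arithmetic, before even counting the memory traffic needed to stream the weights.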

This is why, for problems where we already have a deterministic algorithm, it does not make economic sense to use AI to solve them. Furthermore, it would be in AI's best interest to offload any such prompts to a deterministic algorithm. AGI may be of academic interest, but it is not economically viable for doing mundane tasks, just as we stopped hiring humans to crunch numbers once computers became commonplace.

AI vs. Manual Labor for Doing the Work

To achieve economic parity, AI would have to be relegated to the odd jobs—the long tail for which no deterministic algorithm exists. For these odd jobs, a person should try to do the job first before trying AI, for two reasons: once they have done the job themselves, they have a better understanding of how to write the prompt, and they are in a better position to evaluate whether AI is doing the job correctly. Skipping this step is a common reason for getting AI slop. It is not necessarily the fault of the model when the prompt itself is sloppy.

If an odd job is truly one-off, it may make sense to do it manually, because the cost of learning by doing is comparable to the cost of figuring out how to write the correct prompt. When doing things manually, we gain insight into potential problems, and can then adjust the assumptions, requirements or expectations to avoid them. AI is unlikely to challenge the assumptions made in the prompt unless specifically asked. We wouldn't know what to ask for unless we were already aware of the problems, and we wouldn't be aware of the problems unless we had tried to do the job ourselves. So just do it first. When the job comes up again, then offload it to AI. This weird trick of DIY-ism will, perhaps counter-intuitively, save you tons of time writing prompts.

AI vs. Manual Labor for the Creation of Automation

When these odd jobs become frequent, it then makes sense to invest the time to automate them by creating a deterministic algorithm and writing programs. Traditionally, a human would write the computer programs for these algorithms. AI could presumably write them now, but I argue that the difference in economic impact between the two is minimal. The reason is that whatever it costs to develop the software, that cost is amortized over the many jobs it ends up automating. Even though the one-time development cost may be expensive, it becomes negligible when spread over many jobs. AI may be 10x more productive than humans at writing programs, but 10% of negligible is still negligible.

What is not negligible is the cost of poorly designed automation, which has a multiplicative effect on the defects of the outcome. The defects can be incorrectness of the output, or inefficiency in the algorithm itself, consuming too many resources or taking too long. If the algorithm is poorly designed, it screws up across many jobs, and the expense of cleaning up the mess is the polar opposite of negligible: it is astronomical. It doesn't matter whether the algorithm was designed by a human or by AI.

When it comes to the creation of automation, use whatever tools are at our disposal to design an algorithm that reliably achieves the correct outcome and does so efficiently. Even if AI is not able to vibe code a project from start to finish, it can still be a valuable tool for humans to learn about the nature of the problem through prototyping.

Divide and Conquer

So far, we have treated the problem as a monolith. In reality, a problem can be broken down into many subproblems. It is like computing "1+2+4" one addition at a time, either:

  • Leftist: (1+2)+4 = 3+4 = 7
  • Rightist: 1+(2+4) = 1+6 = 7

And there is more than one way to break a problem down into subproblems. The ability to decompose problems also gives rise to efficient algorithms known as divide and conquer algorithms, and in many cases it can be proven that this is the optimal way to solve a given class of problems.
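
As a small illustration, here is a minimal divide-and-conquer sum in C++ (the name dc_sum and the use of a vector are just for the example). Splitting the range in the middle exploits the same freedom of association as the leftist/rightist groupings above, and it is what allows the two halves to be computed independently, or in parallel:

    #include <cstddef>
    #include <vector>

    // Divide and conquer: split the range in half, sum each half recursively,
    // then combine. Regrouping the additions like this is valid because
    // addition is associative, which is the point of the example above.
    long long dc_sum(const std::vector<int>& xs, std::size_t lo, std::size_t hi) {
        if (lo == hi) return 0;                            // empty range
        if (hi - lo == 1) return xs[lo];                   // base case: one number
        std::size_t mid = lo + (hi - lo) / 2;              // divide
        return dc_sum(xs, lo, mid) + dc_sum(xs, mid, hi);  // conquer and combine
    }

    // dc_sum({1, 2, 4}, 0, 3) == 7, no matter how the ranges get split.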

When discussing AI's economic proposition, we should remember that many bespoke problems can be reduced to subproblems that are recurrent and can be solved at a greater economy of scale than if we considered each problem in isolation.

For example, carmakers design common parts, e.g. engines and chassis, that can be reused across multiple models of sedans and SUVs. These engines and chassis are in turn built out of common parts like standardized screws, nuts and bolts. Greater economy of scale is achieved by using common off-the-shelf parts, even if the end product is bespoke.

In the same way, we can mix AI, deterministic algorithms, and even manual labor in different configurations to achieve economy of scale.

Value Proposition

Another issue we have neglected is the value proposition of the outcome of the work. In the pre-computer age, human calculators were used for extremely high-value work even though they were slow and error-prone, from artillery calculations that increased the probability of winning a battle, to the scientific calculations in the race to create the atomic weapons that ended World War II. They were also employed for the backbone of the economy itself, such as finance and accounting.

Now that computation has become so cheap, it is used for entertainment like video games or watching cat videos.

Similarly, the method we use to solve a problem—manual, AI, or automation—says nothing about the value of the work that employs it. When it comes to high-value work where the stakes of failure are high, AI will still face fierce competition from automation and human ingenuity, in part because of AI's high error rate. On the other hand, when AI is used to generate videos for entertainment, who cares if the video shows someone with seven fingers, or if the text is malformed, provided the entertainment value is good? There are no stakes in these failures.

When company management makes the decision to replace work with AI, it is a signal that they consider the value proposition of the work to be low. They could be proven wrong by the market or the competition. Indeed, competition is a remedy for Enshittification, and we need Anti-Trust enforcement to ensure competition. I'm not sure if labor protection helps, since it enables complacency, not ingenuity.

Conclusion

We reach the conclusion that economic viability is, unsurprisingly, dictated by economy of scale.

  • High volume work should be done by a deterministic algorithm, not AI.
  • Low volume work could be done by AI, but humans should do it first so they can understand the problem better, for writing better prompts and for evaluating the efficacy of the output.
  • One-off work should be done by humans first to understand the problem.

When deciding which problems are high volume, low volume, or one-off, we should use a divide and conquer approach and break bespoke problems down into reusable and recurring subproblems, so we can achieve greater economy of scale. Again, this should not be a surprise to economists. If anything, computer science just provides the vocabulary to explain why the economy of scale is achievable.

We also came to the conclusion that AI slop is enabled by enshittification, and that the remedy is more competition through Anti-Trust enforcement; this, too, should be unsurprising to economists.

The more sober-minded person will come to realize that AI is just one more way to get things done, and it is still subject to the same market forces as everything else. Commodified work will eventually be replaced by deterministic algorithms, not AI. High-value work will still face competition from human ingenuity, unless the human chooses to be complacent or our values somehow become corrupt.

That last point, that our values could somehow become corrupt, is my greatest fear.

Friday, January 23, 2026

Introduction to Fiber Optics for 10G Network at Home

Fiber optics are becoming cheaper and more affordable for homes nowadays, and they can be a fantastic alternative to RJ-45 10GBase-T Ethernet. Fiber transceivers at the same speed use less power and run cooler. Fiber also doesn't need to be upgraded the way Ethernet cables do, from Cat5e to Cat6a to now Cat8.

I have worked with fiber network designs at work, at least in theory. But in the last two weeks, I learned more than I cared to about fiber optics, so here is a quick summary before I forget.

  • Fiber core size: single-mode fiber (SMF, OS2) vs. multimode fiber (MMF, OM5, OM4, OM3, OM2, OM1).
    • Single mode has a smaller core and can run for longer distances. Multimode fibers may have been more cost-effective once upon a time, but these days single-mode fibers are so cheap that they should be the default choice even for short (< 100m) runs.
    • Note: the mode type must match with the transceiver as well, so you have to use a single mode fiber with a single mode transceiver.
  • Number of strands: simplex (1 fiber) or duplex (2 fibers).
    • The transceivers most commonly used for data center applications will use duplex fibers, using one strand for each direction of transmission. Both directions typically use the same wavelength.
    • Simplex fibers are more common for Fiber To The Home (FTTH). You have to use a matching (opposite) pair of transceivers at the two ends, e.g. one side is 1330nm-TX/1270nm-RX, and the other side should be 1270nm-TX/1330nm-RX. The two wavelengths traverse on the same fiber strand.
  • Connector type: SC (standard connector, larger) vs. LC (lucent connector, smaller).
    • For some reason, the LC transceivers are more commonplace, possibly because you can fit a duplex connector into a transceiver with the SFP form factor.
    • There are also SC transceivers, but these are less common.
    • Again, you have to match the connector with the transceiver.
  • Connector contact type: UPC (domed, color coded blue) vs. APC (angled, color coded green). They refer to how the fiber ending is terminated. The angled type reduces back-reflection.
    • You have to match the contact type with the transceiver as well. Inserting an APC connector into a UPC transceiver can damage the optics.
    • I typically find either SC/APC or LC/UPC to be more common. The other combinations (e.g. SC/UPC or LC/APC) are possible but uncommon.
    • Adapters exist but they can cause signal loss. Due to the naming, it can be hard to find the correct adapter by keyword (e.g. "SC/APC female to LC/UPC male" can also give you results for "SC/APC male to LC/UPC female").
    • Sometimes SC/APC is written as SC/Angled or simply SCA.

Bend radius: more recently, I learned about bend-insensitive fibers from a video about the InvisiLight fiber kit. The whole kit costs $250 and includes a pre-terminated G.657.B3 simplex fiber, two 1G transceivers, and two media converters. For my purposes, I would need to buy my own 10G transceivers and a 10G media converter.

Bend-insensitive fibers are characterized by a smaller bend radius than regular fibers (30mm), and they are defined by the standards G.657.A1 (10mm), G.657.A2 or B2 (7.5mm), and G.657.B3 (5mm). This mostly matters for installation in tight corners of the home; the transceivers are agnostic to the bend radius.

FS.com has a custom fiber builder where you can build simplex or duplex fibers with custom core sizes, connector and contact types, and bend radius. They offer white color which is better than bright yellow for home installation. However, their sales representative told me that G.657.A2 with 0.9mm cable diameter has a minimum order quantity (MOQ) of 1KM (this is not on the website). The best they could do with no MOQ is G.657.A1 with a 2.0mm diameter. Perhaps if more people inquire about the G.657.A2 0.9mm, they would eventually lift the MOQ.

There appear to be some pre-terminated G.657.A2 0.9mm cables on eBay, but I have not tested them. Some of the fibers on eBay are also a bit weird, having SC/APC (green) on one side and SC/UPC (blue) on the other, or going from SC to LC. Either way, a 40 meter (130 feet) bend-insensitive fiber cable should be in the $12-$16 range. Fiber is cheaper than Cat6 per unit length, and it is not limited to 100 meters.

For running between rooms in the home, you would typically use a simplex single-mode and matching LC/LC or SC/SC transceivers, and choose or build a fiber cable with the same connector. Here are some transceiver options I found:

  • LC/UPC single-mode simplex:
  • SC/APC single-mode simplex: Elfcam branded 10G SC Bidi transceivers (20km), sold as a matching pair for $66.
A distance of 10km / 6.2 miles will let you run fiber to your friend's house in the next town!

For running short distances between devices in the rack room, LC duplex single-mode has more transceiver options. I also use a Direct Attach Cable (DAC), which is just copper with both ends hard-wired to SFP+ connectors.

The SFP+ in the name refers to the plug form factor by which the transceiver is inserted into the switch. The form factor also determines the electrical speed:

  • SFP: 1G.
  • SFP+: 10G, 25G.
  • QSFP (or QSFP28, literally "quad SFP"): 100G.
    • There are four electrical lanes.
    • When a "breakout transceiver" is plugged in (e.g. 4x25G-LR), a single port can be used as four independent interfaces, e.g. 4x25G.
    • A "singleton transceiver" (e.g. 100G-LR4) would combine all 4 electrical lanes to drive one SerDes (serializer/deserializer).
  • QSFP-DD (or QDD, literally "double density quad SFP"): 400G, 800G.
    • There are eight electrical lanes that support various breakout transceivers: 4x100G, 8x50G, or 8x100G.
    • They can also be used together, e.g. 400G-FR4.
    • Sometimes the breakout transceivers are confusingly named as 400G-DR4 (breaks out to 4x100G-DR) or 400G-XDR4 (breaks out to 4x100G-FR).
  • OSFP (literally "octa SFP"): 400G, 800G.
    • There are eight electrical lanes also supporting various breakout transceivers.

The main reason for these "breakout transceivers" is to allow switches to interoperate with previous-generation switches, e.g. a switch with QSFP ports would use a breakout transceiver to split each port into four fibers connecting to SFP+ transceivers, or QSFP-DD/OSFP breaking out to QSFP. You don't typically see this on consumer devices, but some of them do support breakout transceivers, e.g. the QNAP QSW-M7308R-4X (I bought it at $999 a week ago, now it's $1299, probably due to the new shipment having Trump tariffs).

With data centers upgrading from QSFP 100G to QSFP-DD or OSFP 400G or 800G to satisfy AI computing needs, and with more FTTH deployments, we are seeing more surplus SFP+ 10G transceivers and switches flowing into the consumer market. This could be a good time to upgrade your home network cabling from copper to fiber.

Thursday, January 1, 2026

RCU, SMR, Hazard Pointers: data structure memory management for concurrent programs

Yesterday, I was looking into whether OpenBSD supports ZFS (it may have, briefly, but the code is no longer there). As I perused Ted Unangst's blog, I found a benchmark showing that creating a socket is 10x slower on Linux than on OpenBSD, referencing a post by Jann Horn. People quickly pointed to RCU as the culprit, in particular synchronize_rcu().

Read-Copy Update (see also What is RCU, Fundamentally?) is a way to distribute data structures across multiple CPUs that favors the readers by making the writer do more work. Readers can assume their copy is immutable within the scope of some critical section, so they can read it in a non-blocking manner. The writer has to make a private copy first, update it, then publish the new copy to a global reference. However, the writer has to wait for all readers to be done with the old copy before the old copy can be reclaimed. It waits for the readers by calling synchronize_rcu().
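
To make the pattern concrete, here is a minimal, illustrative sketch in C++ against a made-up Config struct. It only shows the shape of the reader and writer sides; the grace-period machinery behind synchronize_rcu() is elided, so reclamation of the old copy is left as a comment and the sketch simply leaks it:

    #include <atomic>

    struct Config { int timeout_ms = 100; };       // hypothetical shared data

    std::atomic<Config*> g_config{new Config{}};   // the globally published copy

    // Reader: non-blocking; dereference the published pointer inside the
    // critical section and do not hold on to it afterwards.
    int read_timeout() {
        // rcu_read_lock();
        const Config* c = g_config.load(std::memory_order_acquire);
        int t = c->timeout_ms;
        // rcu_read_unlock();
        return t;
    }

    // Writer: make a private copy, update it, publish it, then wait for all
    // pre-existing readers before the old copy may be reclaimed.
    void set_timeout(int ms) {
        Config* old_cfg = g_config.load(std::memory_order_relaxed);
        Config* new_cfg = new Config(*old_cfg);               // read and COPY
        new_cfg->timeout_ms = ms;                             // UPDATE the private copy
        g_config.store(new_cfg, std::memory_order_release);   // publish
        // synchronize_rcu();   // wait for readers still using old_cfg to exit
        // delete old_cfg;      // ...only then is reclamation safe (elided here,
        //                      //    so this sketch intentionally leaks old_cfg)
    }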

The synchronize_rcu() call is probably what actually makes writing slow. It waits for all readers to exit the critical section for the old data, then frees it, before it can make another update. Worse, an inconsiderate reader might treat the data as a private copy and hold onto the critical section while being blocked on I/O or something else. RCU conservatively prevents old data from accumulating at all, when in reality it might have been fine to have multiple copies of old data, as long as the number doesn't grow indefinitely. This is the main idea behind GUS (globally unbounded section); the unboundedness here refers to the writer being unrestrained by readers lagging behind. It is implemented as Safe Memory Reclamation (SMR) in FreeBSD (subr_smr.c), which uses a version sequence number to determine whether the readers have moved on.

It would have been totally reasonable to just make the last reader responsible for freeing the data (i.e. reference counting), so the writer can move on. Or perhaps the writer could provide a deleter function to be invoked when the last reader exits the critical section. Either way, old data can only accumulate as long as there are readers still holding it, so it is bounded by the number of readers.
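
Here is a minimal sketch of that alternative using C++20's std::atomic<std::shared_ptr>, again with a made-up Config struct: the writer copies, updates and publishes without waiting, and the old copy is destroyed by whoever drops the last reference. One caveat is that current standard library implementations of std::atomic<std::shared_ptr> are generally not lock-free, so this trades the writer's wait for contention on the atomic itself.

    #include <atomic>
    #include <memory>

    struct Config { int timeout_ms = 100; };   // same hypothetical shared data

    // C++20: std::atomic<std::shared_ptr<T>> provides the reference counting.
    std::atomic<std::shared_ptr<const Config>> g_config{std::make_shared<const Config>()};

    // Reader: grab a reference for the duration of the critical section.
    int read_timeout() {
        std::shared_ptr<const Config> c = g_config.load();   // refcount++
        return c->timeout_ms;                                 // refcount-- at scope exit
    }

    // Writer: copy, update, publish, and move on without waiting. The old copy
    // is destroyed by whichever reader (or writer) drops the last reference.
    void set_timeout(int ms) {
        auto new_cfg = std::make_shared<Config>(*g_config.load());
        new_cfg->timeout_ms = ms;
        g_config.store(std::shared_ptr<const Config>(std::move(new_cfg)));
    }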

Hazard Pointers try to solve a similar problem, but for the limited purpose of memory reclamation rather than ensuring a consistent view of data structures. A reader enters the critical section by adding the object in question to a global registry of hazard pointers. Whoever wants to free an object has to check the global registry of hazard pointers to make sure the object has no readers. The idea was first proposed by Maged Michael in 2002, seemingly lay abandoned for over 20 years, but is now making its way into C++26.
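
For reference, here is a minimal single-slot-per-thread sketch of the classic hazard pointer protocol in C++ (the Node type, thread ids and fixed registry size are made up for the example; real implementations batch retired objects into a retire list rather than checking one object at a time):

    #include <atomic>

    struct Node { int value; };                 // hypothetical shared object

    constexpr int kMaxThreads = 64;             // assumed fixed thread count
    std::atomic<Node*> g_hazard[kMaxThreads];   // the global registry of hazard pointers
    std::atomic<Node*> g_head{nullptr};         // some shared pointer being protected

    // Reader: announce the pointer it is about to dereference, then re-check
    // that it is still the current one (otherwise it may already be retired).
    Node* protect(int tid) {
        Node* p = g_head.load();
        while (true) {
            g_hazard[tid].store(p);       // publish the hazard (seq_cst by default)
            Node* q = g_head.load();      // re-validate after publishing
            if (p == q) return p;         // safe to dereference until cleared
            p = q;                        // the pointer changed underneath us; retry
        }
    }

    void clear(int tid) { g_hazard[tid].store(nullptr); }

    // Reclaimer: an object may be freed only if no slot in the registry points
    // to it. Real implementations put objects on a retire list and rescan later
    // instead of checking one object at a time.
    bool safe_to_free(Node* p) {
        for (const auto& slot : g_hazard)
            if (slot.load() == p) return false;
        return true;
    }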

But I argue that hazard pointers are not scalable. Even if we allow at most one hazard pointer per thread, we still have to support millions of threads in a production system, all holding a hazard pointer one way or another, so the sheer size of the registry is phenomenal. The registry also raises the question of its own internal consistency: by the time a writer finishes scanning the registry to see whether an object is implicated by any hazard pointer, another reader might have added a hazard pointer to the registry without the scanner knowing.

When it comes to data structures, especially high-performance ones, treating the data as immutable is the way to go. Since modern multicore CPUs use MOESI-style protocols for cache coherence, each CPU benefits from having a private copy in its cache that it can access very quickly without going to main memory, and there is very little memory bus contention.

The problem with RCU is not the immutable paradigm, but how it makes the writer wait for all readers. Also, even if entering the critical section is non-blocking, the reader should exit as soon as it can and avoid lingering in it while being potentially blocked by something else. We should also just make the last reader free the object upon exiting, so versioning is not necessary.

In some extreme cases where you might have a large amount of immutable data tied to some worker threads, it might even make sense to wind down the worker (i.e. lameduck) and spin new workers up with new data, similar to deploying new microservice versions.