Saturday, August 30, 2025

Case Study on the Impact of AI on Software

Consider this hypothetical scenario. One day, a coworker suddenly becomes hyper-productive. They are simultaneously handling production on-call duties while cranking out pull requests of hundreds of lines of code every 10 minutes. Their background is in IT, not computer science.

But when you look at their code, something seems off.

  • It looks messy and unnecessarily complicated for what it tries to do, and the style changes incoherently every few lines of code.
    • People can write code that is unnecessarily complicated, but that typically comes from someone who is dogmatic about their style and thought process. That sort of complexity tends to be highly self-consistent.
    • Even though the code obeys coding style rules, the style guide still gives a lot of leeway for personal discretion, so you can often tell from the code who wrote it.
  • The comments mansplain what the code is doing, rather than why it is done like that.
    • This is typically a beginner's mistake. As software engineers become more experienced, they write the code to be self-explanatory but only write comments about the rationale.
  • Each pull request makes a lot of unnecessary changes, such as renaming variables.
    • People rarely rename their own variables because they feel attached to the names they have given, to the point where they might feel defensive about them.
    • It is also a pain because renaming might break existing references. Automated tools exist, but it is still work, and people don't like doing work unless it is absolutely necessary.

What am I getting at? These signs strongly suggest that your coworker is using AI to write code.

You are a competent software engineer, and on a good day you can probably write only a few hundred lines in an hour, and certainly not while monitoring production alerts and supervising rollouts. Furthermore, even if you can read your coworker's pull requests as fast as they arrive, you cannot type your review comments as quickly.

What do you do? You try the AI code review tool, but it only spots minor surface-level issues and generally says the code is okay to merge.

One of the pull requests supposedly fixes a data truncation issue. You know this typically means the tail end of the data was not flushed from the buffer before the file was closed. But the proposed fix splits one step in the data pipeline into two steps, with extra copies of the data being made as a result. It does not explain how it fixes the data truncation. It just happens to work.

Looking more deeply, you realize that even though the code appears to finalize the data, it defers the finalization to run when that multi-hundred-line function returns. The function uploads the data to a server before it has a chance to finalize it.
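To make the failure mode concrete, here is a minimal C++ sketch (hypothetical names, not the actual code from the pull request) of how finalization deferred to scope exit can run only after the upload has already happened:

#include <cstdio>
#include <string>

struct Writer {
    std::string buffered;  // data appended but not yet written out
    std::string written;   // data that has been flushed
    void append(const std::string& s) { buffered += s; }
    void flush() { written += buffered; buffered.clear(); }
    ~Writer() { flush(); } // finalization deferred to scope exit
};

// Hypothetical stand-in for uploading the output to a server.
void upload(const std::string& data) {
    std::printf("uploading %zu bytes\n", data.size());
}

void process() {
    Writer w;
    w.append("header,");
    w.append("tail");   // the tail end of the data
    upload(w.written);  // BUG: runs before ~Writer() flushes, so the upload
                        // is missing the unflushed tail of the data
}                       // ~Writer() finally flushes here, too late

int main() { process(); }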

You comment your finding on the pull request, and figure you would try the AI code review tool again. This time, it magically figures out the same problem that you commented on, without crediting your work.

There are still obvious issues with the incoherent coding style, the complexity, the unhelpful comments, and the unnecessary changes, but you have no energy to deal with them right now. You just merge the rest of the pull requests because it would take too long to write the comments pointing them out.

After all, you still have your own pull requests that you hope to send out by the end of the day. Your manager informed you the day before that your position will be eliminated next month due to organizational strategy. Your coworker gets to keep their job.


There is a lot to unpack here, but there are some obvious predictions we can make about what happens when this scenario becomes more prevalent.

  • Organizations that value productivity will encourage more use of AI without considering the impact on their code base.
  • The code base will become increasingly unmaintainable by manual effort, requiring even more use of AI, in a self-fulfilling prophecy.
  • The institutional knowledge of why the code was written will be lost, resulting in a collapse of human knowledge.
  • Increased code complexity also means the code runs less efficiently, consuming more electricity yet running more slowly even on fast hardware.
  • More electricity consumption means a higher carbon footprint from power generation, worsening global warming and climate change.
  • Data centers running AI, operated by powerful conglomerates, already consume so much electricity that they make electricity more expensive.

This case study provides a concrete example of how the indiscriminate use of AI causes problems. However, AI is not inherently bad. It just exacerbates existing societal problems.

  • Many organizations already measure productivity by lines of code, the number of pull requests or issues resolved, or other artificial metrics that are easy to measure.
  • There are metrics that measure the efficacy of the code: latency, throughput, cycle/ops counting, memory usage (or any sort of resource usage), and cost. But they are not as easy to measure or attribute correctly, so organizations that focus on growth will often deprioritize them.
  • Human knowledge is already susceptible to degradation through a game of telephone, not to mention being poisoned by fake news or by the desire to chase glory and fame over the desire to seek the truth.
  • Externalities from economic activities impose consequences on third parties whose costs are not captured by the economic transaction.
  • Resources are already distributed unfairly, in ways that disproportionately impact the poor.

There are things we can work on:

  • Organizational leaders need to be more thoughtful about their organizational values and how to reward them. You reap what you sow.
  • We need to figure out how to have AI contribute to the pool of scientific knowledge and humanities insight, not just leech off existing ones.
  • Computer science has focused too much on generative problems. We need to solve more optimization problems, like how compilers optimize code, but at an architectural level as well.
  • Laws should continuously address economic externalities and the unfair distribution of resources.

I think it is encouraging that, at least for the moment, courts are catching AI-hallucinated case citations in legal filings and have reprimanded the lawyers who submitted them. This means that when it comes to matters of legal rights, judges still hold submissions to the highest standard.

The EU AI Act also puts the obligation on providers of AI to de-risk its usage, and it prohibits uses of AI that are outright manipulative, exploitative, or discriminatory. This reins in organizational values but does not solve the problems with economic externalities.


Throughout the following week, you notice more spurious changes from your coworker:

  • One pull request adds a fancy formatting function that is unnecessary for the purpose of the pull request. The programming language already has built-in formatting.
  • Another pull request makes some changes but also removes critical functionality unrelated to the purpose of the pull request. You ask if that is intentional. They thank you for spotting the mistake and revert the change.

These are all telltale signs of AI hallucinations. Your coworker is obviously so productive now that they no longer bother reading their own code.

And the same symptoms start showing in pull requests from other coworkers.

And one day, you notice that the same hyper-productive coworker is hardly productive anymore. You finally have some time to try the AI coding tool that is all the rage, and it tells you that the server is overloaded.

Tuesday, March 4, 2025

Review of Memory Safe C/C++ Proposals

In C++ creator calls for help to defend programming language from 'serious attacks', the article mentioned a few proposals, which are reviewed below.

Assuming we all know why memory safety is important, let's dive into how each of these proposals delivers on memory safety. Bear in mind that these are working proposals, so some of the details are "magic" or wishful thinking that needs to be fleshed out further.

Profiles (C++)

Profiles by Bjarne Stroustrup is not a single proposal, but a collection of proposals. Each profile states a promise about the safety property it provides and the language features or checks needed to achieve it. Profile enforcement can be specified in code or toggled through compiler flags.

Through the use of a new expect() function, which is like assert(), a violation can be ignored, logged (and ignored), logged (and turned into an exception), turned into an exception, or made to exit the program.
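The exact interface is left open in the proposal, but here is a rough illustration of what such configurable enforcement might behave like; the names and signature below are my assumptions, not the proposal's API:

#include <cstdlib>
#include <iostream>
#include <stdexcept>

enum class enforcement { ignore, log, log_and_throw, throw_exception, exit_program };

// Illustrative stand-in for the proposed expect(): check a condition and
// react according to the selected enforcement mode.
void expect(bool cond, const char* what,
            enforcement mode = enforcement::throw_exception) {
    if (cond) return;
    switch (mode) {
        case enforcement::ignore:
            return;
        case enforcement::log:
            std::cerr << "violation: " << what << '\n';
            return;
        case enforcement::log_and_throw:
            std::cerr << "violation: " << what << '\n';
            throw std::logic_error(what);
        case enforcement::throw_exception:
            throw std::logic_error(what);
        case enforcement::exit_program:
            std::cerr << "violation: " << what << '\n';
            std::exit(EXIT_FAILURE);
    }
}

int main() {
    int i = 5, n = 3;
    expect(i < n, "index out of range", enforcement::log);  // logged and ignored
}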

Here are the profiles and their summaries:

  • Profile: Type stipulates that "every object is used only in accordance with its definition." It may be a union of several profiles such as Ranges, Invalidation, Algorithms, Casting, RAII and Union. One idea is to eliminate raw pointer handling from collection objects through a new span abstraction.
  • Profile: Arithmetic detects overflow, underflow, and conversion errors.
  • Profile: Concurrency detects race condition and deadlocks. It acknowledges that this is the "least mature of the suggested profiles" and "has received essentially no work specifically related to profiles."
  • Profile: Ranges detects out of range indexing.
  • Profile: Pointers stipulates that "every pointer points to an object or is the nullptr; every iterator points to an element or the end-of-range; every access through a pointer or iterator is not through the nullptr nor through a pointer to end-of range." It introduces a new language feature not_null, which looks like a type qualifier but it is not clearly specified.
  • Profile: Algorithms stipulates that "no range errors from mis-specified ranges (e.g., pairs of iterators or pointer and size). No dereferences of invalid iterators. No dereference of iterators to one-past-the-end of a range." There is some overlap with the pointers profile regarding the iterator end of range. It introduces a new language feature not_end(c, p) which returns whether iterator p is at the end of the container c.
  • Profile: Initialization stipulates that "every object is explicit initialized."
  • Profile: Casting prevents integer truncation by narrowing casts. Provides a narrow_cast<> with runtime checking (see the sketch after this list).
  • Profile: Invalidation prevents "access through an invalidated pointer or iterator." The compiler is supposed to "ban calls of non-const functions on a container when a pointer to an element of the container has been taken," and suggests that the compiler does it through "serious static analysis involving both type analysis and flow analysis" (i.e. magic).
  • Profile: RAII prevents resource leaks by representing every resource as a scoped object. The constructor and destructor handle the acquisition and release of the resource. This can also be used to do reference counting.
  • Profile: Union says "every field of a union is used only as set" and suggests later providing pattern matching (i.e. algebraic data types) as an alternative.
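For instance, a runtime-checked narrowing cast along the lines of the Casting profile might look roughly like this; this is a minimal sketch, not the proposal's exact definition of narrow_cast<>:

#include <cstdio>
#include <stdexcept>

// Minimal sketch of a runtime-checked narrowing cast: perform the conversion,
// convert back, and compare to detect any loss of information.
template <typename To, typename From>
To narrow_cast(From value) {
    To narrowed = static_cast<To>(value);
    if (static_cast<From>(narrowed) != value)
        throw std::runtime_error("narrowing changed the value");
    return narrowed;
}

int main() {
    short ok = narrow_cast<short>(123);          // fine: value preserved
    std::printf("%d\n", ok);
    try {
        short bad = narrow_cast<short>(1 << 20); // throws: value truncated
        std::printf("%d\n", bad);
    } catch (const std::exception& e) {
        std::printf("caught: %s\n", e.what());
    }
}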

It is clear that Profiles leverage many C++-only features and will not apply to C. However, the strength of this approach is that it recognizes safety as a synthesis of many issues that can be addressed incrementally. It allows legacy code to be updated to satisfy one profile at a time, so there is less upfront cost towards memory safety.

Profiles can also become unnecessarily broad. For example, concurrency through flow analysis is another can of worms that requires computing arbitrary permutations of concurrent accesses to detect race conditions. Invalidation is also magic, as most code does not sufficiently express its intent to transfer resource ownership. On the other hand, it is unclear if all the profiles together will guarantee what people now expect from "safe" Rust.

TrapC

TrapC by Robin Rowe proposes a new dialect of C with the following modifications:

  • malloc() always allocates from a garbage collected heap, and free() is a no-op.
  • Pointers are instrumented with type and size information, and pointer dereferencing is checked in runtime.
  • Access violations can be caught using a new trap statement. Unhandled violations will terminate the program.
  • goto and union are not supported.
  • TrapC can call C functions but not vice versa.
  • Typesafe printf() with a generic "{}" format specifier.
  • No special provision for thread-safety.

It is supposed to be able to compile unmodified legacy C code with additional runtime checks. When access violations cause unwanted program termination, users can write trap handlers as necessary. The white paper suggests a possible goal to "produce executables that are smaller and faster than from C compilers", but it is not clear how that is possible with the additional runtime checking overhead (i.e. magic).

There is some escape analysis, so if a scoped object is returned as a pointer, it becomes heap allocated, similar to Go. Through this feature, TrapC proclaims that "it is possible to have code that is wrong in C, yet right in TrapC," but it is not clear how much legacy code that used to have undefined behavior in C will now benefit from having a defined behavior.
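For example, the following is undefined behavior in standard C because it returns a pointer to a local variable; under the escape analysis described in the white paper, TrapC would presumably promote the local to the garbage-collected heap so the pointer stays valid (my reading of the claim, not a tested example):

#include <stdio.h>

/* Returning the address of a local is undefined behavior in standard C. */
int *make_counter(void) {
    int counter = 42;
    return &counter;     /* dangling pointer in C; heap-promoted in TrapC (as described) */
}

int main(void) {
    int *p = make_counter();
    printf("%d\n", *p);  /* UB in C; per the white paper, TrapC would print 42 */
    return 0;
}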

Fil-C

Fil-C by Filip Pizlo proposes a new dialect of C with the following modifications:

  • malloc() always allocates from a garbage collected heap, but free() puts an object on a free list.
  • Pointers are instrumented with type and size information, and pointer dereferencing is checked in runtime.
  • Garbage collection implementation supports concurrent collection without stop-the-world.

I have some doubts about the free list. The proposal does not prevent pointers from being aliased (having multiple pointers to the same object). Freeing an object will nullify one pointer but the other pointer is still valid. The proposal may be a little immature.
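A minimal sketch of the aliasing scenario behind this doubt (ordinary C here; how Fil-C actually resolves it is not clear to me from the proposal):

#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);  /* in Fil-C, allocated from the GC'd heap */
    int *q = p;                  /* q aliases the same object */
    *p = 1;
    free(p);                     /* the object is put on the free list */
    /* q still refers to the same object; nothing described in the proposal
       appears to invalidate this alias, so the access below may observe an
       object that has been recycled from the free list. */
    *q = 2;
    return 0;
}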

Much of the manifesto extols the virtues of the author's garbage collector design, so it is not clear if the author is selling a new language or a new garbage collector. A garbage collector is not supposed to be tied to the language. There is no one-size-fits-all garbage collector, so it ought to be possible to use different garbage collection strategies depending on the workload requirements of the application.

Mini-C, or "Compiling C to Safe Rust, Formalized"

Aymeric Fromherz (Inria, France) and Jonathan Protzenko (Microsoft Azure Research, US) explore how to compile C to Rust without resorting to "unsafe" Rust. The resulting code provides the same strong safety guarantees that Rust provides. Some of the considerations include:

  • Static analysis to translate pointer arithmetic in C to slices and splitting in Rust.
  • Inference of when a reference needs mutable borrowing, including references from a struct.

They validated the feasibility of their approach on a subset of C that is already formally verified through other means, but it is probably a long way from being able to accept arbitrary legacy C code.

It relies on the fact that some carefully written C code has internal consistencies that are not explicitly expressed, and we can design inference algorithms to figure out what these internal consistencies are. Some of the inference techniques used in this paper could be applied in reverse to Rust to reduce the notational requirements of Rust code.
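To illustrate the kind of translation involved (using C++20 std::span below as a stand-in for Rust slices, since the paper's actual target is Rust's slice splitting), pointer arithmetic that derives a second pointer into the same buffer becomes an explicit split into two disjoint views:

#include <cstddef>
#include <span>
#include <vector>

// C idiom: derive a second pointer into the buffer via pointer arithmetic.
void fill_c_style(int *data, std::size_t n) {
    int *head = data;          // first half
    int *tail = data + n / 2;  // second half, derived by pointer arithmetic
    for (std::size_t i = 0; i < n / 2; ++i) head[i] = 0;
    for (std::size_t i = 0; i < n - n / 2; ++i) tail[i] = 1;
}

// The same structure with spans (analogous to Rust slices): the pointer
// arithmetic becomes an explicit split into two disjoint views.
void fill_span_style(std::span<int> data) {
    std::span<int> head = data.first(data.size() / 2);
    std::span<int> tail = data.subspan(data.size() / 2);
    for (int &x : head) x = 0;
    for (int &x : tail) x = 1;
}

int main() {
    std::vector<int> buf(8);
    fill_c_style(buf.data(), buf.size());
    fill_span_style(buf);
}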

The resulting executable does not need garbage collection, but still relies on runtime bounds checking (in Rust).

Safe C++ (also known as Circle C++)

Sean Baxter started the Circle C++ compiler around 2019 as "a compiler that extends C++17 for new introspection, reflection and compile-time execution" with a flair for meta-programming. Some of the memory safety extensions implemented by this compiler over the years are now being proposed as a C++ draft.

Some highlights of the proposal:

  • #feature on safety activates the compile-time memory safety checks, like #pragma in existing compilers.
  • A safe specifier that requires the use of the safety language extensions in a function (though it still allows explicit unsafe code), like the noexcept specifier that disallows a function from throwing an exception.
  • It still relies on runtime bounds checking.
  • Ownership tracking through checked references T^ (mutable) and const T^ (shared). Each object may have either a single mutable reference, or any number of shared references, but not both at once.
  • Named lifetime parameter /a and borrowing reference T^/a.
  • Lifetime binder template parameter typename T+.
  • A mut statement prefix to establish a mutable context that allows conversions from lvalues to mutable borrows and references.
  • A new standard library with safe containers and algorithms. In particular, it replaces begin() and end() iterators with slice iterators annotated by named lifetime parameters.
  • Pattern matching with "choice type" to enforce that optional values have to be checked before access.
  • Type traits for thread safety: T~is_send, T~is_sync.
  • Type traits for allowing unsafe pointer arithmetic: T~as_pointer, T~as_length; not fully explained.
  • Type traits T~string, T~is_trivially_destructible; not fully explained.

The safety semantics are inspired mostly by Rust, which is mentioned in the proposal 85 times.
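A rough sketch of how these pieces fit together, adapted from my recollection of the proposal's introductory example (not verified against the Circle compiler); the point is that the mutation inside the loop is rejected at compile time:

#feature on safety
#include <std2.h>

int main() safe {
    std2::vector<int> vec { 11, 15, 20 };
    for (int x : vec) {
        // The range-for holds a shared borrow of vec. Taking a mutable
        // borrow with the `mut` prefix while that borrow is live violates
        // the exclusivity rule, so this is rejected at compile time,
        // which is how iterator invalidation is prevented.
        if (x % 2)
            mut vec.push_back(x);
    }
}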

Safe C++ may very well provide a concrete design for some of the Profiles work by Stroustrup. Contrary to Profiles, Safe C++'s monolithic "all or nothing" approach might make it more difficult to port legacy code due to the upfront cost of satisfying all memory safety requirements at once. Perhaps choice type, thread safety, and pointer arithmetic could be split into their own Profiles.

Conclusion

There are several ways to compare and contrast these approaches.

  • Whether they expect significant modification to legacy code:
    • Upfront: Safe C++, Mini-C.
    • Incrementally: Profiles C++.
    • No: TrapC, Fil-C.
  • Whether they force the use of garbage collection:
    • Yes: TrapC, Fil-C.
    • No: Profiles C++, Safe C++, Mini-C.
  • Whether they require C++:
    • Yes: Profiles C++, Safe C++.
    • No: TrapC, Fil-C, Mini-C.

In light of the recent Rust-in-Linux debacle, if we were to port Linux kernel code to a memory safe C dialect, we would not be able to use garbage collection, nor would we be able to use C++. This leaves Mini-C as the only viable option. However, the inference algorithm may not be able to handle the complexity of Linux kernel code, so some kind of object borrowing or lifetime annotation would still be needed.

Incremental safety checking is a useful feature, as it alleviates the upfront cost of fixing legacy code for all memory safety issues at once.

It's worth noting that all of the proposals above require runtime bounds checking. Without some kind of size annotation throughout the code, it would be hard for static analysis to infer whether bounds checking can be safely omitted. This precise problem is solved by ATS through the use of dependent types. Perhaps it could be useful to design a dependent type system for a dialect of C, for those who aren't used to the ML-styled programming of ATS. We could take some inspiration from Mini-C to reduce the notational overhead.

Saturday, February 8, 2025

Polling vs. Interrupt

Recently, news reverberated across the interwebs that a Tiny Linux kernel tweak could cut datacenter power use by 30%. The claim is based on this code commit, detailed in the paper Kernel vs. User-Level Networking: Don't Throw Out the Stack with the Interrupts. The main idea is to mix polling and interrupt-driven I/O to get the best of both worlds.

Polling is just busy looping to wait for something to happen. Each check only takes a few CPU cycles, so the turnaround is very efficient. But if we keep looping and nothing happens, the work is wasted.

Interrupt-driven I/O tells the CPU how to handle an event when something happens, then lets the CPU go do something else, or go to sleep, when there is nothing left to do. The CPU has to save the context of the current program and switch contexts between programs, so the overhead is greater. But it is more efficient if there are long waits between events.

When events take some time to arrive but tend to arrive in a bunch (e.g. a Poisson process with exponentially distributed inter-arrival times), it makes sense to wait for the next event using an interrupt, then process subsequent events using polling. We can further limit the time spent polling to the opportunity cost of taking an interrupt, which makes it a win-win: if an event arrives sooner, we save the cost of the interrupt minus the cost of polling; otherwise, we are never worse off.
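A minimal user-space sketch of the same idea (illustrative only; the kernel change works in the networking stack, not on condition variables): spin for a small budget roughly equal to the cost of a blocking wakeup, then fall back to a blocking wait.

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Wait for `ready` to become true: poll briefly, then block on the
// condition variable (the "interrupt" path of this analogy).
void hybrid_wait(std::atomic<bool>& ready, std::mutex& m, std::condition_variable& cv) {
    using clock = std::chrono::steady_clock;
    const auto budget = std::chrono::microseconds(10);  // illustrative spin budget,
                                                        // roughly the cost of a blocking wakeup
    const auto deadline = clock::now() + budget;
    while (clock::now() < deadline)
        if (ready.load(std::memory_order_acquire))
            return;                                     // event arrived during the cheap poll
    // Nothing arrived within the budget: fall back to sleeping until notified.
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return ready.load(std::memory_order_acquire); });
}

int main() {
    std::atomic<bool> ready{false};
    std::mutex m;
    std::condition_variable cv;
    std::thread producer([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        { std::lock_guard<std::mutex> lock(m); ready.store(true, std::memory_order_release); }
        cv.notify_one();
    });
    hybrid_wait(ready, m, cv);  // polls first, then blocks until the producer notifies
    producer.join();
}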

This idea is in fact not new. The technique is often used in task schedulers. I remember seeing it in an old MIT Cilk version circa 1998. I have seen it in some mutex implementations. I have also used the same technique in my Ph.D. work circa 2014 (in the artifact Liu_bu_0017E_429/libikai++-dissertation.tbz2, look for parallel/parallel.cc and the deep_yield() function in particular). The technique seems so trivial that I have never seen anyone bother mentioning it.

So will this technique cut datacenter power use by 30%? The paper measured "queries per second" improvements in a memcached microbenchmark, but it is naive to assume that a 30% increase in performance automatically translates to a 30% reduction in power use. That assumes all the datacenter does is receive network packets with no additional work, such as cryptography, marshaling and unmarshaling of the wire protocol, local memory and data storage I/O, calling out to external services, and computing a response. The 30% figure is the upper bound in the most ideal scenario, and it caught the attention of the Internet.

It is a welcome optimization nonetheless, and the authors have done the hard work to validate their claims with benchmarks and a thorough explanation of how it works, so they deserve credit for the execution. Perhaps someone could continue to audit whether other OS subsystems (e.g. block devices, the CPU scheduler) could also benefit from this treatment, for Linux and other OSes.

Friday, January 3, 2025

Why Is RAID 1+0 Better Than RAID 0+1?

In this article, we discuss why RAID 1+0 (a stripe of mirrors) is better than RAID 0+1 (a mirror of stripes) for those who are building a storage array.

Image credit: Network Storage Server (angle view with case open) from Wikimedia Commons

What is RAID, and why?

RAID (redundant array of independent disks) describes a system of arrangements to combine a bunch of disks into a single storage volume. The arrangements are codified into RAID levels. Here is a summary:

  • RAID 0 (striped) essentially concatenates two or more disks. The stripe refers to the fact that blocks from each of the disks are interleaved. If any disk fails, the whole array fails and suffers data loss.
  • RAID 1 (mirrored) puts the same data across all disks. It does not increase storage capacity or write performance, but reads can be distributed to each of the disks, thereby increasing read performance. It also allows continued operation until the last disk fails, which improves reliability against data loss (assuming that disk failures happen independently, see discussion below).
  • RAID 2, 3, 4, 5 (parity) employ various parity schemes to allow the array to still operate without data loss up to one disk failure.
  • RAID 6 (parity) uses two parity disks to allow up to two disk failures without data loss.
  • RAID-Z (parity) is a parity scheme implemented by ZFS using dynamic stripe width. RAID-Z1 allows one disk to fail without data loss, RAID-Z2 two disks, and RAID-Z3 three disks.

Disk failures may be correlated by many factors (operational temperature, vibration, wear, age, and manufacturing defects shared by the same batch), so they are not fully independent. However, the failures can be somewhat randomized if we mix disks from different vendors or models. Each year, Backblaze publishes hard drive failure rates aggregated by vendor and model (e.g. 2023), which is an invaluable resource for deciding which vendor and model to use.

Why use nested RAID?

When disks fail in a RAID without exceeding the tolerated number of failures, a failed disk can be replaced with a blank disk, and data is resilvered onto the new disk. The lower bound on resilver time is the capacity of the disk divided by how fast we can write to it. A larger disk will take longer to resilver. If the disk failures are not fully independent, then there is a greater risk of data loss if another disk fails during resilvering.

By way of example, a 24 TB disk at a maximum write speed of 280 MB/s will take about a day to resilver (rule of thumb: each 1 TB takes 1 hour). This makes large disks somewhat unappealing due to the long resilver time. More realistically, we may want to use smaller disks so they can complete resilvering more quickly.
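As a quick check of that rule of thumb:

\[
  \frac{24 \times 10^{12}\ \text{bytes}}{280 \times 10^{6}\ \text{bytes/s}} \approx 85{,}700\ \text{s} \approx 23.8\ \text{hours}
\]

which works out to roughly one hour per terabyte.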

Creating a large RAID out of smaller-capacity disks necessitates nesting the RAID arrangements. Here are some ways we can arrange the nesting:

  • RAID 0+1 (a mirror of stripes) takes stripes of disks (RAID 0) and mirrors the stripes (RAID 1). When a disk fails, it impacts the entire stripe, and the whole stripe needs to be resilvered.
  • RAID 1+0 (a stripe of mirrors) takes a bunch of mirrored disks (RAID 1) and creates a stripe (RAID 0) out of the mirrors. When a disk fails, it impacts just its mirror. When the disk is replaced, only that mirror needs to be resilvered.
  • RAID 5+0, RAID 6+0, etc. create inner parity RAIDs combined into an outer stripe (RAID 0).

Not all nestings are created equal!

We can see qualitatively, from the failure impact point of view, that RAID 1+0 has a smaller impact scope and is therefore better than RAID 0+1. But we can also analyze quantitatively the probability that a second failed disk causes total data loss.

  • In RAID 0+1, suppose we have N total disks arranged into a 2-way mirror of \( N / 2 \)-disk stripes. The first failure has already brought down one of the two stripes. The second failed disk brings down the whole RAID if it happens in the other stripe, which is approximately a 50% chance. Note that we cannot simply swap striped disks between the mirrors to avert the failure, since disk membership belongs to a specific RAID. You might be able to manually hack around that by hex-editing disk metadata, but that is a risky undertaking for the truly desperate.
  • In RAID 1+0, suppose we have N total disks arranged into \( N / 2 \) stripes of 2-way mirrored disks. The first failure has already degraded one of the mirrors. The second failed disk brings down the whole RAID only if it happens in the other disk of the same mirror, which is approximately a \( 1 / N \) chance. The number of failed disks we can tolerate without data loss is a variation on the Birthday Paradox.

Deep dive into aggregate probability of failure

Another way to analyze failure impact quantitatively is to see what happens if we build a RAID 0+1 or RAID 1+0 array using disks with the same average single-disk failure rate. We compute the aggregate probability of failure that would result in data loss, assuming that failures are independent.

  • Suppose the probability of a single disk failing (on average) is p.
  • For RAID 0 with N striped disks, the probability that any of the N disks fails is \( P_0^N(p) = 1 - (1 - p)^N \), i.e. the complement of none of the N disks failing, a double negation.
  • For RAID 1 with N mirrored disks, the probability that all N disks fail at the same time is \( P_1^N(p) = p^N \)
  • For RAID 0+1, with k-way mirrors of \( N / k \) stripes, the failure probability is:
    \[
      \begin{aligned}
        P_{0+1}^{k,N/k}(p) & = P_1^{k}(P_0^{N/k}(p)) \\
          & = P_1^k( 1 - (1 - p)^{N/k} ) \\
          & = ( 1 - (1 - p)^{N/k} )^k
      \end{aligned}
    \]
  • For RAID 1+0, with \( N / k \) stripes of k-way mirrors, the failure probability is:
    \[
      \begin{aligned}
        P_{1+0}^{N/k,k}(p) & = P_0^{N/k}(P_1^k(p)) \\
          & = P_0^{N/k}(p^k) \\
          & = 1 - (1 - p^k)^{N/k}
      \end{aligned}
    \]
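
Plugging in a concrete case, say \( p = 0.01 \), \( N = 8 \), and \( k = 2 \):

\[
  \begin{aligned}
    P_{0+1}^{2,4}(0.01) & = \left( 1 - 0.99^{4} \right)^{2} \approx 1.6 \times 10^{-3} \\
    P_{1+0}^{4,2}(0.01) & = 1 - \left( 1 - 0.01^{2} \right)^{4} \approx 4.0 \times 10^{-4}
  \end{aligned}
\]

so even in this small configuration, RAID 1+0 is roughly four times less likely to suffer total data loss than RAID 0+1.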

We can plot \( P_{0+1}(p) \) and \( P_{1+0}(p) \) and compare them against p, the single disk failure probability, which represents the AFR (annualized failure rate). Here are some examples of p from Backblaze Drive Stats for 2023, picking the model with the largest drive count from each vendor:

MFG       Model            Drive Size  Drive Count  AFR
HGST      HUH721212ALE604  12TB        13,144       0.95%
Seagate   ST16000NM001G    16TB        27,433       0.70%
Toshiba   MG07ACA14TA      14TB        37,913       1.12%
WDC       WUH721816ALE6L4  16TB        21,607       0.30%

In most cases, we are looking at p < 0.01. Some models in the full dataset have AFR > 10%, but it is hard to tell how accurate those numbers are due to the small drive counts for those models. Models with a small average age can also be biased towards high early mortality due to the Bathtub Curve.

We generate the plots in gnuplot using the following commands:

gnuplot> N = 4
gnuplot> k = 2
gnuplot> set title sprintf("N = %g, k = %g", N, k)
gnuplot> plot [0:0.02] x title "p", (1 - (1-x)**(N/k))**k title "RAID 0+1", 1 - (1 - x**k) ** (N/k) title "RAID 1+0"

Number of disks N = 4 with mirror size k = 2 is the smallest possible nested RAID 0+1 or RAID 1+0. At p < 0.01, both RAID 0+1 and RAID 1+0 offer significant improvement over p.

At N = 8, while keeping the same mirror size k = 2, we see that both RAID 0+1 and RAID 1+0 still offer improvements over p. However, compared to N = 4, the RAID 1+0 failure rate doubles, and the RAID 0+1 failure rate more than triples.


At N = 16, the trend of multiplying failure rates continues. Note that RAID 0+1 can now be less reliable than a single disk, while RAID 1+0 still offers a 6x improvement over p.


To get the failure rate under control, we need to increase the mirror size to k = 3. Even so, the RAID 1+0 failure rate (very close to 0) is still orders of magnitude lower than that of RAID 0+1.


So from the quantitative point of view, RAID 1+0 is much less likely to suffer total data loss than RAID 0+1.

Conclusion

As a rule of thumb, when nesting RAID levels, we want the striping to happen at the outermost layer, because striping is the arrangement that accumulates failure rates. When we mirror in the inner layer, we reduce the failure rates by orders of magnitude, so the striping accumulates failure rates much more slowly.

This conclusion also applies to storage-pool-aware filesystems like ZFS. RAID-Z does not allow adding disks to an existing set. The best practice is to always add disks as mirrored pairs. You can also add a new disk to mirror an existing disk in a pool. A stripe of mirrors offers the most flexibility to expand the storage pool in the future.

This is why RAID 1+0 is better.