tag:blogger.com,1999:blog-44015552808257665852024-03-16T11:21:00.450-04:00Life of a Computer ScientistSignificant research starts with a humble beginning.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.comBlogger231125tag:blogger.com,1999:blog-4401555280825766585.post-68390237448034998052024-03-16T11:20:00.000-04:002024-03-16T11:20:01.409-04:00Deep Dive into MQA-CD Encoding<p>A few weeks ago, I saw this video by Techmoan introducing the MQA-CD. MQA-CD is an audio CD that can be played back in a regular CD player, which is limited to 16-bit samples at 44.1 kHz. However, when played back through an MQA decoder, it promises better sound quality at 24-bit at 192 kHz.</p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="360" src="https://www.youtube.com/embed/5QS9Sc8J91Y" width="640" youtube-src-id="5QS9Sc8J91Y"></iframe></div><p>Before we dig into the MQA marketing material, we need to understand that MQA is an encoding scheme that can exist outside of a CD, e.g. audio delivered over the radio or the Internet. Some of the non-CD transports are assumed to carry 24-bit at 48 kHz or higher. However, the MQA-CD transport is limited to 16-bit at 44.1 kHz by the CD as its physical medium.</p><p>At first glance, MQA violates the <a href="https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem">Nyquist–Shannon sampling theorem</a>, which requires at least 2B samples per second to uniquely represent a signal containing frequencies up to B. However, we can give it some leeway by allowing for lossy encoding, even though some MQA marketing material claims that the encoding is lossless.</p><p>In a lossy scheme, we can steal some least significant bits from each sample to pass through a data stream that employs psychoacoustic coding, like MP3. The least significant bits sound like the noise floor when listened to without the decoder, and the psychoacoustic coding allows us to put more detail into the noise more economically—basically, the data stream contains instructions about how to synthesize only sounds humans can hear, so we use less data than if we had to encode the full Nyquist–Shannon spectrum. Furthermore, the data stream only needs to contain the delta, which is the sound not already present in the non-stolen bits.</p><p>The question for MQA-CD is: how many bits is it stealing?</p><h3 style="text-align: left;">Music Origami, according to MQA</h3><p>The MQA website links to a blog by the MQA inventor, Bob Talks, which discusses the CD encoding in some <a href="https://bobtalks.co.uk/blog/science-mqa/16b-mqa-what-is-it/">technical detail</a>, but it is a little confusing:</p><blockquote><p><i>If the original source is 44.1kHz/24b or if the sample rate is 88.2, 176.4, 352.8 kHz, or DSD, then a standard MQA file will be 44.1 kHz/24b. The file contains the information for decoding, ‘unfolding’, and rendering.</i></p><p><i>This 24b MQA file is structured so that, if in distribution it encounters a ’16-bit bottle-neck’ (e.g. in a wireless or automotive application), then the information in the top 16 bits is arranged to maximise the downstream sound quality and still permits unfolding and rendering. See [2]</i></p></blockquote>
<blockquote><p><i>[2] <a href="https://www.bobtalks.co.uk/blog/mqaplayback/origami-and-the-last-mile/">MQA-CD: Origami and the Last Mile</a> </i></p></blockquote><p>So reference [2] should contain some information about how the 24-bit is truncated to 16-bit. Here are some mentions:</p><blockquote><p><i>The Green signal is completely removed by MQA decoders; but it is there so that we can hear more of the music when playback is limited to a 16-bit stream.</i></p></blockquote><blockquote><i>Sometimes we might want to listen to MQA music on equipment that doesn’t support 24 bits – maybe only 16? Rather than throw away all the buried information, MQA carries a small data channel (shown in Green) which can contain the ‘B’ estimates, enabling significantly improved playback quality on, e.g. a CD, over ‘Airplay’, in-car, to certain WiFi speakers and similar scenarios.</i></blockquote><p>But it is also confusing because it shows the “Green signal” at -120 dB. We know that the CD dynamic range is 96 dB, so a CD could not represent a -120 dB noise floor. 24-bit samples have a dynamic range of 144 dB. However, the signal charts on the page show a floor of -168 dB, with some information placed below -144 dB, which would require 28 bits.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://bobtalks.co.uk/wp-content/uploads/2018/09/fig26_fold1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="581" data-original-width="800" height="465" src="https://bobtalks.co.uk/wp-content/uploads/2018/09/fig26_fold1.png" width="640" /></a></div><p>As a side note, the CD dynamic range of 96 dB is determined by the formula in terms of the 16-bit sample depth: \( 20 \times \log_{10}{2^{16}} \approx 96 \). As a rule of thumb, each bit in the sample represents about 6 dB of <a href="https://en.wikipedia.org/wiki/Dynamic_range">dynamic range</a>.</p><p>Another page, <a href="https://bobtalks.co.uk/a-deeper-look/deeper-look-mqa-16b-in-the-last-mile/">Deeper Look: MQA 16b and Provenance in the Last Mile</a>, also states that:</p><blockquote><p><i>If we look at the block diagram above, we can see there are three components to the MQA data, broadly described as: i) top 16 bits, ii) MQA signalling and iii) bottom 8 bits</i></p></blockquote><p>The block diagram clearly shows that the encoding results in a 24-bit master file, but it still does not explain how that is reduced for MQA-CD, which is bottlenecked to 16-bit samples.</p><h3 style="text-align: left;">Is Bit Stealing Plausible?</h3><p>Since MQA still does not explain how the 24-bit master is reduced to the 16-bit transport depth on a CD, we are left to speculate about the bit stealing idea from earlier.</p><p>If we allow stealing 4 bits per sample, then we get a data rate of \( 2 \textit{ channels} \times 4 \textit{ bits per sample} \times 44100 \textit{ Hz} = 352.8 \textit{ kbps} \). This is pretty generous for high quality AAC, which is typically 256 kbps. The dynamic range before decoding is reduced from 96 dB to 72 dB, which is still comparable to a very high quality magnetic tape. (The sketch below makes this arithmetic easy to check.)</p><p>So I would say it is plausible, but it is inconclusive from the MQA marketing material whether this is how they did it.</p>
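<p>Here is a small Python sketch of the bit-stealing budget above; the 4 stolen bits are the speculative assumption of this post, not anything MQA has confirmed:</p>
<pre>import math

CHANNELS = 2          # stereo Red Book CD
SAMPLE_RATE = 44100   # Hz, fixed by the CD format
STOLEN_BITS = 4       # speculative: bits taken from each 16-bit sample

# Data rate of the hidden stream carved out of the least significant bits.
hidden_bps = CHANNELS * STOLEN_BITS * SAMPLE_RATE
print(hidden_bps)     # 352800 bits per second, i.e. 352.8 kbps

# Rule of thumb: each bit of sample depth is worth about 6 dB of dynamic range.
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

print(dynamic_range_db(16))                # ~96 dB, the full CD
print(dynamic_range_db(16 - STOLEN_BITS))  # ~72 dB left without the decoder
</pre>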
<p>Furthermore, I don’t see the point of MQA’s “Music Origami” that folds 24-bit 192 kHz into 24-bit 48 kHz. If the transport is already capable of lossless 24-bit data, it must be a digital transport that is not a CD, which means there is no requirement to maintain backwards compatibility with a <a href="https://en.wikipedia.org/wiki/Compact_Disc_Digital_Audio">Red Book</a> CD player. We can just use the whole stream to transport encoded audio, e.g. AAC or FLAC. Even some later CD players from the 2000s could play MP3 from a data CD or from a USB drive. That was all possible before MQA launched in 2014.</p><p>That is why Techmoan says that even if you believe MQA delivers higher quality audio, it is a format that came a little too late.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-55417658573984631232024-03-16T08:49:00.001-04:002024-03-16T08:49:52.642-04:00Carrot and Stick Security Design<p>Carrot and stick security design is the idea of having the frontend and backend work together to enforce security policies in software. The frontend interacts with the user and steers them towards compliance, while the backend enforces the security rules. Although we don’t necessarily use carrot and stick to mean reward and punishment, the carrot is a “soft nudge” and the stick is a “hard boundary.” If the user bypasses the frontend and tries to interact with the backend directly, they will be met with a hard error message.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Carrot_and_stick_motivation.svg/1080px-Carrot_and_stick_motivation.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="759" data-original-width="800" height="304" src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Carrot_and_stick_motivation.svg/1080px-Carrot_and_stick_motivation.svg.png" width="320" /></a><br />Image credit: <a href="https://commons.wikimedia.org/wiki/File:Carrot_and_stick_motivation.svg">Wikimedia Commons</a></div><p>(“Good cop, bad cop” is a similar strategy, although the cop analogy may be controversial.)</p><p>An example is a photo gallery that allows visitors to browse but only signed-in users to download images. The frontend may present a “download” button but will ask the user to either log in or create an account. The backend checks that the login credentials are present before allowing the image to be downloaded.</p><p>If the security check is only done in the frontend, then the user could simply bypass login by forging a URL request directly to download the images. If the security check is only done in the backend, then an innocent user who did not know they needed to sign up or log in first may be confronted with an unfriendly error message. (A minimal sketch of this two-layer check follows below.)</p><p>I’m intentionally using the terms “frontend” and “backend” loosely.
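</p><p>Returning to the photo gallery example, here is a minimal Python sketch of the two layers; the function names and the session dictionary are made up for illustration:</p>
<pre># Backend: the stick. Enforces the rule no matter who is asking.
def download_image(session, image_id):
    if "user" not in session:                    # credential check happens here,
        raise PermissionError("login required")  # even for forged requests
    return f"bytes of image {image_id}"          # stand-in for real storage

# Frontend: the carrot. Steers the user towards compliance first.
def on_download_clicked(session, image_id):
    if "user" not in session:
        return "please log in or create an account"  # friendly nudge
    return download_image(session, image_id)

print(on_download_clicked({}, 42))                 # nudged to log in
print(on_download_clicked({"user": "alice"}, 42))  # download allowed
</pre>
<p>Bypassing the frontend and calling the backend directly still raises the hard error, which is the stick.</p><p>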
In practice, the designations may differ depending on the application:</p><p></p><ul style="text-align: left;"><li>For a user-facing website, frontend is the client side Javascript, and backend is the HTTP server.</li><li>For a mobile app, frontend is the app, and backend is some API used by the app.</li><li>For an API, frontend is the HTTP server middleware, and backend is the internal data storage.</li></ul><p style="text-align: left;">Even at the API level, the API design should try to encourage well-defined use cases (the carrot), and let the protocol layer check for malformed requests (the stick).</p><p style="text-align: left;">What this means is that a complete software stack that spans the gamut of client side Javascript or app, an API middleware, and backend storage should implement security enforcement at all layers.</p><p></p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-5798428463881905352024-03-01T10:44:00.019-05:002024-03-03T07:26:24.227-05:00Memory Safety State of the Union 2024, Rationale Explained<p style="clear: both; text-align: left;">There has been renewed interest in programming languages after The White House recently published a recommendation suggesting the transition to a memory safe language as a national security objective. Although I am not an author of the report, I want to explain the rationale that someone might use to consider whether their infrastructure meets the memory safety recommendations. This is more of an executive-level overview than a technical guide to programmers.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/e/e8/Whitehouse_north.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="558" data-original-width="800" height="446" src="https://upload.wikimedia.org/wikipedia/commons/e/e8/Whitehouse_north.jpg" width="640" /></a></div><div style="text-align: center;">Image credit: Wikimedia Commons, <a href="https://commons.wikimedia.org/wiki/File:Whitehouse_north.jpg">Whitehouse North</a>.</div><p>On February 26, 2024, the White House released <a href="https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/memory-safety-statements-of-support/">Statements of Support for Software Measurability and Memory Safety</a> calling attention to a technical report from the Office of the National Cyber Director titled “Back to the Building Blocks: A Path Towards Secure and Measurable Software” (<a href="https://www.whitehouse.gov/wp-content/uploads/2024/02/Final-ONCD-Technical-Report.pdf">PDF link</a>). The whole framework works like this: we ultimately want to be able to measure how good the software is (e.g. by giving it a score), and it relies on memory safety as a signal. 
The tech report also references another report published by the Cybersecurity and Infrastructure Security Agency (CISA) titled <a href="https://www.cisa.gov/case-memory-safe-roadmaps">The Case for Memory Safe Roadmaps</a>, which contains a list of memory safe language recommendations.</p><p>To supplement their list, I will be using the <a href="https://www.tiobe.com/tiobe-index/">TIOBE index</a> top 20 most popular programming languages to provide the examples: Python, C, C++, Java, C#, JavaScript, SQL, Go, Visual Basic, PHP, Fortran, Pascal (Delphi), MATLAB, assembly language, Scratch, Swift, Kotlin, Rust, COBOL, Ruby.</p><p>I will also throw in some of my personal favorites: LISP, Objective Caml, Haskell, Shell, Awk, Perl, Lua, and ATS-lang.</p><h3 style="text-align: left;">High Level Languages</h3><p>Languages that do not expose memory access to the programmer tend to be memory safe, because this reduces the opportunity for programmers to make memory violation mistakes. When accessing a null pointer or an array with an out of bounds index, these languages raise an exception that can be caught, or return a sentinel value (0 or an undefined value), rather than silently corrupting memory content.</p><p><b>Examples:</b> Python, Java, C#, JavaScript, SQL, Go, Visual Basic, PHP, MATLAB, Scratch, Kotlin, Ruby; LISP, Objective CAML, Haskell, Shell, Perl, Awk, Lua.</p><p>Under the hood, they employ an automatic memory management strategy such as reference counting or garbage collection, but programmers will have little to no influence over it because the language does not expose memory access.</p><p style="text-align: left;">It does not matter whether the language execution happens at the abstract syntax tree (AST) level, compiled to a byte code, or compiled to machine code. In general, any language could have a runtime implementation that spans the whole spectrum through <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">Just In Time compilation</a>.</p><h4 style="text-align: left;">Things to Watch Out For</h4><p>High level languages are prone to unsanitized data execution errors such as <a href="https://owasp.org/www-community/attacks/SQL_Injection">SQL injection</a>, <a href="https://owasp.org/www-community/attacks/Code_Injection">Code Injection</a>, and most recently <a href="https://www.ncsc.gov.uk/information/log4j-vulnerability-what-everyone-needs-to-know">Log4j</a>. This happens when user input is passed through to a privileged execution environment and treated as executable code. High level languages often blur the line between data and code, so extra care must be taken to separate data from code execution. Data validation helps, but ultimately data should not have influence over code behavior unless it is explicitly designed to do so.</p><p><b>I strongly oppose using PHP</b> or any products written in PHP, which is particularly notorious for SQL and code injection problems and single-handedly responsible for all highly critical <a href="https://wpscan.com/wordpresses/">WordPress vulnerabilities</a>. But if you inherited legacy infrastructure in PHP, there are <a href="https://lifecs.likai.org/2021/07/revival-of-harvard-architecture-for.html">principles that will help harden</a> it.</p><p style="text-align: left;">Even though memory access errors are raised as an exception, if these exceptions are not caught, they could still cause the entire program to abort.
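</p><p style="text-align: left;">For example, a minimal Python sketch of that behavior (the list and index are made up for illustration):</p>
<pre>numbers = [1, 2, 3]
try:
    print(numbers[10])    # out-of-bounds read raises IndexError,
except IndexError:        # rather than silently corrupting memory
    print("index out of range, recovering")
# Without the try/except, the same access would abort the whole
# program with an uncaught "IndexError: list index out of range".
</pre>
<p style="text-align: left;">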
They also still allow potentially unbounded memory consumption leading to exhaustion, which causes the program to abort or suffer severely degraded performance, leading to denial of service.</p><p style="text-align: left;">Some languages provide an “unsafe” module which is essentially a backdoor to memory access. Using it is inherently unsafe.</p><p></p><ul style="text-align: left;"><li>Python: <a href="https://docs.python.org/3/library/ctypes.html">ctypes — A foreign function library for Python</a></li><li>Java: <a href="https://blogs.oracle.com/javamagazine/post/the-unsafe-class-unsafe-at-any-speed">The Unsafe Class: Unsafe at Any Speed</a></li><li>C#: <a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/unsafe-code">Unsafe code, pointer types, and function pointers</a></li><li>Go: <a href="https://pkg.go.dev/unsafe">unsafe package</a></li></ul><div><p>Most languages also allow binding with an unsafe language through a <a href="https://en.wikipedia.org/wiki/Foreign_function_interface">Foreign Function Interface</a> (FFI) or a binding generator like <a href="https://www.swig.org/">SWIG</a>. This allows the high level code to run potentially unsafe code written in a non-safe language like C or C++.</p><h3 style="text-align: left;">Mid Level Languages</h3></div><p style="text-align: left;">These languages expose some aspects of memory management to the programmer—such as explicit reference counting—and provide language facilities to make it <b>safer</b>.</p><p style="text-align: left;"><b>Examples:</b> Swift, Rust; also ATS-lang.</p><p>Performance is the main reason to use these languages, as memory management overheads have a negative performance impact, and in some time-sensitive applications, we have to carefully control when to incur these overheads. The tradeoff is programmer productivity, since programmers have to worry about more things. Since performance is the main concern, these languages tend to be compiled into machine code before running in production.</p><p style="text-align: left;">I want to call out ATS-lang because it is a language I helped work on for my Ph.D. advisor, <a href="https://www.bu.edu/cs/profiles/hongwei-xi/">Hongwei Xi</a>. It was conceived in 2002 and predated Rust (2015). ATS code can mostly be written like <a href="https://www.smlnj.org/sml.html">Standard ML</a> or <a href="https://ocaml.org/">Objective CAML</a> with fully automatic memory management. It also provides facilities to do manual memory management. Safety is ensured by requiring programmers to write theorems to prove that the code uses the memory in a safe manner (<a href="https://www.cs.bu.edu/~hwxi/atslangweb/Papers.html">papers</a>).
The theorem checker uses stateful views inspired by linear logic to reason about the acquisition, transfer of ownership, and disposal of resources.</p><h4 style="text-align: left;">Things to Watch Out For</h4><p style="text-align: left;">These languages are safe by virtue of the compiler checking for most programmer errors at compile time, but they still provide unsafe ways to access memory.</p><p style="text-align: left;"></p><ul style="text-align: left;"><li>Swift: <a href="https://developer.apple.com/documentation/swift/unsafepointer">UnsafePointer</a></li><li>Rust: <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html">Unsafe Rust</a></li><li>ATS-lang: <a href="https://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML/x2128.html">Unsafe C-style Programming in ATS</a></li></ul><div>Furthermore, they are still prone to denial of service, SQL injection, and code injection vulnerabilities.</div><p></p><h3 style="text-align: left;">Low Level Languages</h3><p style="text-align: left;">These languages require the programmer to manually handle all aspects of memory management, largely for legacy reasons. For this reason, they are <b>inherently unsafe</b>.</p><p style="text-align: left;"><b>Examples:</b> C, C++, Pascal.</p><p style="text-align: left;">Although <a href="https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)">garbage collection</a> was invented by John McCarthy in 1959 for the LISP language, that concept did not gain mainstream adoption until much later.</p><p style="text-align: left;">Even so, there are a few strategies to make these languages more memory friendly.</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Use an add-on garbage collector like <a href="https://hboehm.info/gc/">Boehm-GC</a>. Note that object references stored in the system malloc heap are not traced for liveness, so care must be taken when using both GC malloc and system malloc.</li><li>C++ code should use the <a href="https://en.cppreference.com/w/cpp/language/raii">Resource Acquisition Is Initialization (RAII)</a> idiom as much as possible. The language already tracks object lifetime through variable scope. The constructor is called when an object is introduced into the scope, and the destructor is called when the object leaves the scope. Smart pointers like <a href="https://en.cppreference.com/w/cpp/memory/unique_ptr">std::unique_ptr</a> and <a href="https://en.cppreference.com/w/cpp/memory/shared_ptr">std::shared_ptr</a> use RAII to manage memory automatically.</li></ol><p style="text-align: left;">My particular contribution in this field is my <a href="https://lifecs.likai.org/p/phd-dissertation.html">Ph.D. dissertation</a> (2014), which proposed a different type of smart pointer in C++ that does not auto-free memory but still helps catch memory errors. I showed that it is practical to use by implementing a memory allocator using the proposed smart pointer.</p><h3 style="text-align: left;">Legacy Languages</h3><p style="text-align: left;">I have omitted Fortran and COBOL from any of the lists above. Historically, Fortran and COBOL only allowed static memory management. All the memory to be used by the program is declared in the code, and the OS provisions for it before the program is loaded for execution. However, they never had any array bounds checking, so they are not memory safe.
Furthermore, attempts to modernize these languages with dynamic memory allocation exacerbated the problem, as the languages were not designed to be memory safe.</p><p style="text-align: left;"></p><ul style="text-align: left;"><li>Fortran: <a href="https://en.wikibooks.org/wiki/Fortran/memory_management">memory management</a></li><li>COBOL: <a href="https://www.ibm.com/docs/en/developer-for-zos/16.0?topic=operations-memory-allocation">memory allocation</a></li></ul><p style="text-align: left;">I have also omitted assembly languages as a whole. Assembly languages tend to be bare metal and provide completely unfettered memory access, but there has been some research to enhance the safety of assembly languages (e.g. <a href="https://www.cs.cornell.edu/talc/">Typed Assembly Language</a>).</p><h3 style="text-align: left;">Conclusion</h3><p style="text-align: left;">There is no language that is completely memory safe. Some languages are safer by design, either because they limit the ability of the programmer to manipulate memory, or because they give the programmer facilities to help them use memory in a safer way. However, almost all languages have ways to unsafely access memory in practice.</p><p style="text-align: left;">Memory safety is also not the only avenue to a security breach. Executing data as code is the main reason behind SQL and code injection, and these lead to highly critical privilege escalation, penetration, and data leak attacks. One safety net we can provide is <a href="https://lifecs.likai.org/2021/07/revival-of-harvard-architecture-for.html">making code read-only</a>, but this does not absolve the importance of good data hygiene.</p><p style="text-align: left;">Unbounded memory consumption can degrade performance or cause denial of service attacks, so memory usage planning must be considered in the infrastructure design.</p><p style="text-align: left;">In any case, no language is a panacea for memory safety or security problems. Programmer training and a culture that emphasizes good engineering principles are paramount. However, I would love to see a renewed interest in programming language research to enhance language safety, by designing languages that encourage good practices.</p><h3 style="text-align: left;">Further Resources</h3><div><ul style="text-align: left;"><li><a href="https://memory-pool-system.readthedocs.io/en/latest/mmref/lang.html">Memory management in various languages</a></li></ul></div><p></p><p></p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-38072355870460766382024-02-18T09:10:00.000-05:002024-02-18T09:10:14.571-05:00Infinite Monkey Theorem, Debunked<p>If you let a monkey hit random keys on a typewriter for an infinite amount of time, will it happen to write the complete works of Shakespeare? Common wisdom says that not only will it write Shakespeare, but it will “almost surely” type every possible text an infinite number of times.</p><p>The proof idea goes like this: the probability of producing a permutation matching the complete works of Shakespeare is very small, but still greater than zero, so it will “almost surely” happen.</p><p>This is assuming that the monkey will eventually produce such a permutation with a non-zero probability.</p><p>What if some permutations are simply not possible? There could be several reasons why this particular monkey and this particular typewriter could not have written Shakespeare.
In order of increasing complexity:</p><p></p><ul style="text-align: left;"><li>Some letters on the keyboard are broken, so some letters in the works of Shakespeare could not be typed.</li><li>Due to the clumsiness of the monkey, it is only able to hit the space bar before or after hitting the letters above it. This means that every word has to always begin and end with ‘z’, ‘x’, ‘c’, ‘v’, ‘b’, ‘n’, or ‘m’. Shakespeare often used words that do not satisfy this criterion, so the works of Shakespeare could not have been written.</li><li>The monkey is not able to repeat what it types within a certain number of words, so phrases like “to be or not to be” can never be written.</li></ul><p style="text-align: left;">Infinity does not mean that it will allow every permutation. An example is the infinite sequence that concatenates the set of numbers consisting of only even digits: 0 2 4 6 8 20 22 24 26 28 40 42 44 46 48 etc. This sequence is infinite, but it would never contain the number ‘1’.</p><p style="text-align: left;">(On the other hand, the sequence of all even numbers concatenated will contain all finite numbers, including the odd numbers; since starting from 10—which is still an even number—we simply ignore the last digit and look at the tens and above, e.g. 123 is contained in 1230 1232 1234...)</p><p style="text-align: left;">This is not an argument about “almost surely” being practically zero. Instead, we are simply saying that the infinite monkey is not “almost sure” to produce every text, as its output is handicapped in some way, albeit still infinite.</p><p style="text-align: left;">If we want to apply the infinite monkey theorem elsewhere, infinity alone is not sufficient. We have to also show that the desired permutation has a non-zero probability of happening, as this is not a given. On the contrary, an application that only establishes infinity can be disproven by showing that the desired output actually has a zero probability of happening.</p><p style="text-align: left;">∞ ⨉ 𝜀 = ∞ but ∞ ⨉ 0 = 0</p><p>In reality, we often confuse the mechanism of the permutation with the existence of the output. We cannot say that the complete works of Shakespeare existing in the output <i>implies</i> that they must have been produced by the monkey. What if we have a programmed typewriter that, if the monkey hits ‘s’ a million times in a row, inserts the complete works of Shakespeare? The complete works “exist” in the output, but they are not produced through the monkey's mechanism of permutation.</p><p>(This is analogous to the panspermia hypothesis saying that life could have been introduced from an external source rather than permuted naturally within the system. The hypothesis allows that a system capable of sustaining finite life may not have been able to produce life infinitely.)</p><p style="text-align: left;">Showing that some improbable event has zero probability of being produced could be quite difficult but “almost surely” doable.
However, if we try to categorically argue that “almost surely” is practically zero, then the argument is only “almost surely” believable.</p><p></p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-42690351208326322682024-01-19T00:05:00.000-05:002024-01-19T00:05:29.895-05:00Deep Learning and Solving NP-Complete Problems<p>This recent article about <a href="https://www.nature.com/articles/s41586-023-06747-5">AlphaGeometry being able to solve Olympiad geometry proof problems</a> brings to my attention that there has been active research in recent years to combine deep learning with theorem proving. The novelty of AlphaGeometry is that it specializes in constructing geometry proofs, using synthetic training data guided by the theorem prover, which previous approaches have had difficulty with.</p><p>As a <a href="https://en.wikipedia.org/wiki/Formal_methods">formal method</a> like the type checking of programming languages, <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving">theorem proving</a> tries to overcome the ambiguity of mathematical proofs written in a natural language by formalizing proof terms using an algebraic language with rigorous semantics (e.g. <a href="https://en.wikipedia.org/wiki/Lean_(proof_assistant)">Lean</a>), so that proofs written in the language can be verified algorithmically for correctness. In the past, proofs had to be tediously written by humans, as computers had great difficulty constructing the proofs. Computers can do exhaustive proof search, but the search space blows up exponentially in the number of terms, which renders the exhaustive search intractable. However, once the proof is constructed, verification is quick. Problems that are exponentially intractable to solve but easy to verify are called <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-complete</a>.</p><p>Traditional proof searching limits the search space to simple deductions, which can offload some tedium from manually writing the proofs. In recent years, machine learning has been used to prune or prioritize the search space, making it possible to write longer proofs and solve larger problems. It is now possible for computers to construct complete mathematical proofs for a <a href="https://arxiv.org/abs/2202.01344">selected number of high school level math problems</a>. The idea is a natural extension of what AlphaGo does to prune the game's search space, which I have <a href="https://lifecs.likai.org/2016/03/how-to-play-go-like-computer-scientist.html">written about before</a>.</p><p>This is quite promising for a programming language that embeds theorem proving in its type system to express more elaborate safety properties, such as the <a href="https://en.wikipedia.org/wiki/ATS_(programming_language)">Applied Type System</a> (ATS) designed by my advisor, which I have <a href="https://cs.likai.org/ats/ml-programmers-guide-to-ats">written about before</a>. Programs written in ATS enjoy a high degree of safety: think of it as a super-charged <a href="https://en.wikipedia.org/wiki/Rust_(programming_language)">Rust</a>, even though ATS predates it.
Now that <a href="https://simonwillison.net/2022/Dec/5/rust-chatgpt-copilot/">generative AI can help people write Rust</a>, a logical next step would be to create a generative AI for ATS.</p><p>I would be interested to see how machine learning can be applied to solve <a href="https://en.wikipedia.org/wiki/List_of_NP-complete_problems">other NP-complete problems</a> in general, such as bin-packing, job shop scheduling, or the traveling salesman problem. However, machine learning will face fierce competition here, as these problems are traditionally solved using <a href="https://en.wikipedia.org/wiki/Approximation_algorithm">approximation algorithms</a>, which come up with a close-enough solution using fast heuristics. Also, only a few of them have real-world applications. Admittedly, none are as eye catching as "AI is able to solve math problems" giving the impression of achieving general intelligence; in reality, the general intelligence is built into the theorem proving engine when it is used in conjunction with machine learning.</p><p>However, I think proof construction is a very prolific direction for machine learning to head in. Traditionally, machine learning models are a black box: even though they can "do stuff," they cannot rationalize how to do it, let alone teach a human or share the same insight with a different machine learning model. With proof construction, at least the proofs are human readable and have a universal meaning, which means we can actually learn something from the new insights gained by machine learning. It is still not general intelligence, and the symbolic insight to be gained might very well be a mere probabilistic coincidence, but it does allow machine learning to begin contributing to human knowledge.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-18978386657477343392024-01-07T01:17:00.008-05:002024-01-08T22:40:16.564-05:00So many camera systems! My thoughts.<p>My camera journey is a little meandering, but it can be roughly summarized as "<a href="https://www.youtube.com/@cameraconspiracies">all I want is the perfect camera</a>."</p><p>My first camera is a <a href="https://global.canon/en/c-museum/product/dcc459.html">Canon PowerShot A50</a>, bought in 1999, which is a compact CCD camera. It can do 10-bit RAW, which I later discovered allowed me to correct underexposed photos more easily, and the colors actually looked decent! The camera writes to an original CompactFlash. Without a reader, you'd have to transfer images out of the camera with a serial port dongle, which is very slow (115200 baud, or ~14 kB/s). I later sold it at a fundraiser.</p><p>Next I had a <a href="https://www.imaging-resource.com/PRODS/HD1000/HD1000A.HTM">Sanyo Xacti VPC-HD1000</a>, bought in 2007, which is a camcorder that can shoot 720p60, 1080p30, or 1080i60 video. It also came with a strobe light for photos. It had some very rudimentary electronic stabilization that didn't work very well. I tried <a href="http://littlegreatideas.com/stabilizer/diy/">Johnny Lee's Poor Man's Steadicam</a> hack, but I was still not happy with the result. After a few years of use, the sensor suddenly went grey on me, and eventually it was disposed of through electronics recycling.</p><p>My next camcorder is a Panasonic HC-V750, bought in 2014, which can do 1080p.
It had great image stabilization out of the box, but the color was a little washed out, so I never really used it very much.</p><p>My first somewhat serious camera is the <a href="https://www.theverge.com/2014/7/30/5949913/lytro-illum-review">Lytro Illum</a>. Bought in 2015, it was the first computational camera that would let you adjust focus and depth of field after a picture has been taken. It was ahead of its time. Even though it had a 40MP sensor, the pixels had to record depth, so the final image was only about 10MP, and it was very noisy in low light. Nonetheless, it was a great tool to study depth of field in composition. I still have it, but after <a href="https://techcrunch.com/2018/03/20/sources-google-is-buying-lytro-for-about-40m/">Google acquired Lytro</a>, they discontinued support. The software still works on an Intel Mac, and it should work on Apple ARM silicon for now using <a href="https://support.apple.com/en-us/HT211861">Rosetta 2</a> emulation, but its days are numbered. The Lytro software on Windows may have better luck!</p><p>In 2018, I decided I wanted an interchangeable lens camera that can shoot video. At that time, the <a href="https://shop.panasonic.com/products/gh5s-mirrorless-camera-body">Panasonic GH5S</a> was one of the first that could shoot 4K 30p 10-bit in camera, although the 10-bit video is not natively supported by Mac OS even to this day (VLC/FFmpeg supports it). In the next few years, I also started a collection of Micro Four-Thirds lenses. I will talk about them later.</p><p>I was pretty happy with the GH5S, except the contrast detection autofocus is not very good. The lack of IBIS also meant I could not do hand-held video very much. Some lenses have OIS, but they are not quite enough. I bought a gimbal, but it attracted enough attention that I was asked to leave the premises a few times when out shooting. Also, since Mac OS does not natively support the 10-bit H.264 from the GH5S, I have to transcode them to ProRes when editing the videos, which is a pain. As a result, I used it more for stills photography. The 14-bit RAW is great for color grading, though the sensor is limited to 10.2 MP.</p><p>The 14-bit RAW at 10.2 MP means the GH5S is excellent for taking time lapses. The camera's built-in conversion only resulted in a lower-resolution 8-bit video, which I ended up using more due to laziness, but the RAW files can be converted to DNG using <a href="https://helpx.adobe.com/camera-raw/using/adobe-dng-converter.html">Adobe DNG Converter</a> and edited in <a href="https://www.blackmagicdesign.com/products/davinciresolve">Davinci Resolve</a> for the most pristine quality time lapse.</p><p>In 2022, I bought two more cameras which overcompensated for the want of image stabilization and the want of megapixels.</p><p>The camera that overcompensated for my want of image stabilization is the GoPro Hero 10. Although the image stabilization is decent, the 4K 60p video is very noisy even during daytime in the shade. The 5.3K cinematic video is probably better, but it can still only record in 8-bit video (their next release later in 2022, the Hero 11, records in 10-bit). The GoPro also tends to overheat after 20 minutes or so of continuous recording. This is the first camera I had to worry about overheating with, and sadly it isn't the last.
As a result, I did not end up using it very much.</p><p>The other camera that overcompensated for my want of megapixels is the <a href="https://fujifilm-x.com/global/products/cameras/gfx100s/">Fujifilm GFX 100S</a>, which is a whopping 102 megapixels. It is very satisfying to keep enlarging the picture and play a game of "Where's Waldo" with all the details. However, when pixel peeping, I can see some issues with chromatic aberration and corner softness at F/4 with the 32-64mm zoom lens. I don't have a different lens to know if this is a common issue. I've seen some photographers stopping down to F/22, which tends to make the lens sharper for the high resolution.</p><p>Like its predecessor GFX 100, the 100S can shoot 4Kp30 10-bit videos but is more compact. The phase-detect autofocus works great for me. The image stabilization is excellent both for photo and video, although in some situations the video can momentarily have a wobbling effect. Above all, I am most impressed with the colors. After experimenting with a few settings, combining the DR-P (dynamic range priority) mode with the Velvia film simulation gave me the best result. DR-P on its own would give a color that is too flat, but the Velvia simulation restored some contrast; Velvia simulation on its own tends to be too contrasty, since it was originally a color reversal film that only had 5 stops of dynamic range. Using DR-P and Velvia together balanced out the colors very well.</p><p>Although I shoot in RAW+JPEG, the out-of-camera colors are so good that my attempt to color grade the RAW usually results in slightly worse colors! The film simulation can also be applied to the 10-bit videos, and I am very impressed with the video colors as well. The video is 10-bit in H.265, which is natively supported by Mac OS. The 16-bit RAW files from the GFX 100S are still about 100MB apiece after lossless compression, so I find myself really slowing down and being more meticulous with the composition when taking photos. However, it still attracted unwanted attention. I was also asked to leave the premises a few times when shooting.</p><p>Towards the end of 2023, I bought two more cameras that overcompensated for the avoidance of unwanted attention.</p><p>The first was the Sony A6700. It is slightly larger than the ZV-E1, but the A6700 in APS-C has more compact lenses than the full-frame ZV-E1. The video looks very clean at 4K 60p. The low-light performance is great. Clear Image Zoom (digital zoom) also looks excellent. Not to mention, Sony has the best autofocus. However, the image stabilization of Sony is lacking: the footage is usable when standing still, but unusable when walking, which means I would have to use the gimbal again. It also tends to overheat after 20 minutes, but the Ulanzi external fan helped.</p><p>For the second camera, I decided to try the <a href="https://www.dji.com/osmo-pocket-3">DJI Osmo Pocket 3</a>. I am very impressed with the low light performance. At night, its default setting tends to overexpose the image, but I find that -2 EV compensation looks the best. The 4K has a lot of detail at no zoom, but will begin to lose detail with digital zoom. Autofocus is reasonable, although I do notice the occasional hunting. The image stabilization is also impeccable, this being a gimbal itself, and it fits into the palm of my hand!
I think this has the best potential for avoiding unwanted attention.</p><p>In reality, I probably will have to use the Pocket 3 and A6700 interchangeably: shoot with the Pocket 3 when walking, and the A6700 with the 16-50mm lens if I need to zoom.</p><p>My worst experience with the A6700 happened when a friend asked me to do a photoshoot for their family, and they insisted that I use "my latest camera." After being spoiled by Fujifilm, I was unimpressed with Sony's <a href="https://alphauniverse.sony-asia.com/learn/tips/what-s-cinetone-and-how-do-you-use-it">S-Cinetone</a>, which is supposed to be their best color profile. However, when color grading the RAW files, I had to start from the terrible lens distortion and vignetting of the pancake lens I used. Remember, I bought this camera for the compactness, not for the glass! Correcting the lens defects in post is doable, which Sony can also do in camera very well, but it took me hours. This is probably why Sony lets me apply my own LUTs in camera, and why many photographers sell their LUTs online.</p><p>So my experience with cameras is indeed quite meandering, but I will summarize my thoughts on camera systems here.</p><h4>Canon</h4><p>I purposefully avoided Canon throughout the years. I will give them credit that they have a respectable camera system with a wide selection of EF mount lenses for DSLR and now RF mount lenses for mirrorless cameras. Also respectably, Canon still develops their own sensors with reasonably good colors, while most other cameras just use Sony sensors. However, I avoided Canon for two reasons:</p><p></p><ul><li>Back in the day, they deliberately crippled the video capabilities of their photography cameras to avoid cannibalizing their cinematic camera line. This changed with the Canon EOS R5, which can shoot 8K internally in RAW. Although the R5 has overheating issues, the R5C addressed them with an internal fan.</li><li>Although there are many third-party EF lenses and third-party cameras supporting the EF mount, they are reverse-engineered and not officially supported by Canon. Furthermore, Canon started being <a href="https://petapixel.com/2022/09/06/canon-confirms-its-going-after-lens-makers-for-patent-infringement/">litigious with third-party RF lenses</a>.</li></ul><p>This means that photographers are buying into the Canon walled garden and have to put up with their superficial product segmentation, which could come again in the future if not now.</p><h4>Nikon</h4><p>I really don't have much experience with Nikon. I know some people are happy with it. That's about it.</p><h4>Micro Four-Thirds</h4><p>I have the most experience with Micro Four-Thirds, which includes cameras and lenses made by Panasonic and OM System (formerly Olympus). Panasonic lenses are pretty good for video, but I like the Olympus lenses best for their superior optical quality and bokeh. I am absolutely in love with the clutch manual focus of Olympus lenses: clutching the focus ring will engage mechanical manual focus, and de-clutching it will allow autofocus as well as focus by wire. It gives you a taste of pulling manual focus like a cine lens, but Olympus lenses only have 90° of focus pull rotation compared to the 200° of cine lenses, which makes them a little too sensitive for cinematography.</p><p>Micro Four-Thirds sensor size is 1/4 the surface area of full-frame with exactly a 2x crop factor, so focal length conversion is pretty straightforward.
Due to the smaller sensor, the native telephoto lenses are miniature compared to the full-frame counterparts. The sensor size works well with Super35 (APS-C) cinema lenses. It is also possible to use the camera with B4 ENG lenses made for 2/3" sensor with the 2x teleconverter. In theory, this makes this camera system the most versatile and diverse, but throughout the years it had a lot of unrealized potential.</p><p>First is that the cameras were late to the phase-detect autofocus. Olympus had PDAF first, but their cameras could not shoot 10-bit video. Panasonic only recently released G9 II with PDAF. Even so, the G9 II body is exactly the same size as the full-frame S5 IIx. I feel that Micro Four-Thirds makes more sense with compact cameras and pancake lenses, but so far Sony is eating their lunch on compactness.</p><h4>L-Mount</h4><p>Another camera system of interest is the L-Mount, which is originally designed by Leica but now includes Panasonic, Sigma and DJI in a closed consortium. It has great lens selection in full-frame and APS-C format, and the APS-C system can potentially be very compact as well. Leica cameras and lenses are premium, but Panasonic also has great cameras and lenses for video.</p><p>It is a promising system with good vendor diversity.</p><h4>Sony E-Mount</h4><p>Sony makes the vast majority of camera sensors, so there is no question about the image quality there. I am just not personally a fan of their color science.</p><p>Sony is also reasonable when it comes to <a href="https://support.d-imaging.sony.co.jp/www/e_mount/en/detail.html">licensing E-Mount</a> to other lens makers, so there is a great diversity of lenses from Sigma, Voigtlander, and even Fujinon. However, the <a href="https://petapixel.com/2018/10/12/sony-e-mount-wasnt-designed-for-full-frame-leica-exec-says/">E-Mount is fundamentally designed for APS-C sensor size</a> and not for full-frame, as the throat-diameter partially occludes the corners of a full-frame sensor. This can become a challenge in lens design where the distortion and vignetting will have to be corrected in camera or in post. Lens distortion and vignetting is prominent even in their high-end FE G-master lenses (e.g. <a href="https://www.opticallimits.com/sonyalphaff/1169-sony2070f4g">FE 20-70mm F/4 G</a>, <a href="https://www.opticallimits.com/sonyalphaff/1100-sony20f18?start=1">FE 20mm F/1.8 G</a>).</p><p>This means photographers who want to work with RAW will find themselves spending more time in post-processing. They could save some time if they save their own LUTs to the camera and let the camera do both lens and color correction, which explains why Sony cameras seem to have a thriving after-market of LUTs.</p><p>Sony makes almost all E-Mount cameras, but it is not easy to choose the right camera, as they have many lines of camera segmented for every need and every price point. This is as far as I could gather about their <a href="https://www.kenrockwell.com/sony/compared-2016.htm">camera lineup</a>, as of 2023.</p>
<table border="1" cellpadding="2" cellspacing="0" style="margin: auto;">
<tbody><tr><th>Purpose</th><th>Full Frame</th><th>APS-C</th></tr>
<tr><td>Cinema</td><td>FX3, FX6, FX9</td><td>FX30</td></tr>
<tr><td>Vlogging</td><td>ZV-E1</td><td>ZV-E10</td></tr>
<tr><td>Photography</td><td>A7R V, A1</td><td>A6700</td></tr>
<tr><td>Videography</td><td>A7C II, A9 III</td><td>n/a</td></tr>
</tbody></table>
<p>Some of the cameras share the same sensor, for example FX3 and ZV-E1 share the same full-frame sensor, and FX30 and A6700 share the same APS-C sensor. Apart from the obvious form-factor difference, newer cameras tend to have more computational features that are missing from older cameras. Sony is known for <a href="https://www.reddit.com/r/SonyAlpha/comments/164k51k/firmware_update_controversy/?rdt=45169">neglecting their firmware updates</a>.</p><p>The A6700 is a more recent camera that is supposed to have the most intuitive user interface, but I still find myself confused. For example, if I shoot RAW photos, it also disables Clear Image Zoom for video. H.265 only has 60p or 24p, but 30p is missing. Sony also does not use the common names for video codecs: H.264 is called XAVC S, and H.265 is called XAVC HS. The camera has three separate modes: photo, video, and S&Q (slow and quick), but the custom presets in these modes are all completely independent. In photo mode, the video button can still record video, but in video mode the shutter button is disabled. None of it makes sense!</p><p>As far as I can tell, all the non-cinema cameras can overheat when shooting long video, and that is probably by design. Sony cameras tend to be the most compact, which makes thermal management more challenging.</p>
<h4>Fujifilm / Fujinon</h4><p>Fujifilm is the camera brand, while Fujinon is the lens brand. Fuji has two separate systems: XF mount for APS-C, and G mount for medium format. Sigma and Tokina also make third-party XF lenses. Venus Laowa also makes some G mount lenses. However, the vast majority of XF and G lenses are made by Fujinon, so the selection is more limited, though you should be able to find a lens for most focal length and aperture.</p><p>What Fujifilm is famous for is its <a href="https://www.bhphotovideo.com/explora/photography/tips-and-solutions/your-guide-to-fujifilm-film-simulations">legendary film simulation</a> in camera, as I have mentioned before. The film simulation profiles are based on color analysis of real film stock: Astia, Provia, Velvia, Eterna, Acros, etc. I think they are successful not so much because of the nostalgic factor, but because the colors from the original films were already time-proven.</p><p>They recently released GFX 100 II which can shoot 8K, but due to the computational requirement to scale 100 MP down to 8K, the 8K is cropped from the 35mm image area which is a shame. I actually think that a GFX 40 MP 8K makes a lot of sense. They already have X-H2 which is 40MP in APS-C that can shoot 8K non-cropped. I would be happy buying into their XF system in the future.</p><h4 style="text-align: left;">Action Cameras, etc.</h4><p>Action cameras like the GoPro can be fun to use for sports and travel and fairly easy to carry around as an extra camera just in case or for behind-the-scenes footage. They are more durable than camera phones and can stand a lot of abuse. However, the GoPro market share is increasingly getting eroded by Insta360, DJI, and other Chinese companies. Action cameras are also known to have poor low-light performance. The DJI Osmo Pocket 3 is probably going to eat their lunch, except it is not weather-proof or water-proof. People who use action cameras will probably always juggle multiple cameras anyways.</p><h4 style="text-align: left;">Conclusion</h4><p>If you ask for my recommendation, my favorite camera system by far is still the Fujifilm for the legendary color science alone. It is such a joy to shoot.</p><p>I still think that the Micro Four-Thirds has a lot of potential. I would not mind replacing my GH5S with the G9 II when it breaks, but right now I do not see a need to upgrade. I would like to see more compact cameras and pancake lenses.</p><p>Sony is probably fine for casual video, but I don't recommend it for photographers because of the time they would need to spend in post-processing to correct the lens distortion and colors. Sony cameras are also not as intuitive to use, but I suppose one can get used to it.</p><p>Although my opinion about the Sony camera system is not favorable, this is just speaking from my own experience with some influence from the research done by others. In the end, photography is highly subjective to personal preference.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-59556932873513316282023-12-22T08:46:00.012-05:002023-12-22T17:42:55.632-05:00Why is a Transformer rated in kVA, not in kW?<p>I saw a post like this fly by on my Facebook feed today. I lost it when I closed the window, but the question lingered. Why is a Transformer rated in kVA, not in kW? 
I did not know the answer, but I should know, because I have also seen the kVA rating used in datacenter contracts rather than kW.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhhs8Pz91XrSSS3Q-Ybcq_xZ0XYsmSZ92sbaqfgKT4IRUv_hk_UPW3uxVsMN4mA8l15-O8OKveQSo4friJ2V1pE5XLxIQDDZrOvInmilG7Q1H3NH2bafwmtbOWszrxOvp70kWKpQDNMH5Zaem32iLIsbevRtbLuGuOJlmJS6pMWXvpOzwEAX00Rql73vKA" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="960" data-original-width="960" height="640" src="https://blogger.googleusercontent.com/img/a/AVvXsEhhs8Pz91XrSSS3Q-Ybcq_xZ0XYsmSZ92sbaqfgKT4IRUv_hk_UPW3uxVsMN4mA8l15-O8OKveQSo4friJ2V1pE5XLxIQDDZrOvInmilG7Q1H3NH2bafwmtbOWszrxOvp70kWKpQDNMH5Zaem32iLIsbevRtbLuGuOJlmJS6pMWXvpOzwEAX00Rql73vKA=w640-h640" width="640" /></a></div><p></p><p>Many electricians point out that kVA represents apparent power, and kW is the available amount of work under load. The difference is the power factor representing efficiency, so kW = kVA × pf. In a perfect system, kVA = kW. And electricians explain that the power factor is less than 1 due to transmission loss.</p><p>But someone also pointed out that for DC (direct current), kVA = kW because the <a href="https://byjus.com/physics/difference-between-kva-and-kw/">current does not go out of phase</a> (not a great explanation though). That reminded me from my college physics class that AC (alternating current) is a sine wave, and its RMS (<a href="https://www.sciencedirect.com/topics/engineering/root-mean-square-value">root mean squared</a>) voltage is smaller than the peak voltage. For a sine wave, \( V_{rms} = \frac{\sqrt{2}}{2} V_p \approx 0.707 \, V_p \). The RMS voltage is used to calculate work, not the peak voltage.</p><p>As a side note, AC in the real world only <a href="https://www.eevblog.com/forum/projects/show-us-your-mains-waveform!/150/">approximates</a> a sine wave, and a UPS <a href="https://arstechnica.com/gadgets/2016/08/from-the-wirecutter-the-best-uninterruptible-power-supply-ups/">modified sine wave</a> can look much worse.</p><p>So the answer is this: a transformer is rated in kVA because it has to handle the peak voltage, but the usable work in kW is based on the RMS voltage, which would under-specify the design requirements of a transformer.</p>
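<p>A small Python sketch makes the relationship concrete; the 120 V mains figure, the 50 A load, and the 0.8 power factor are just example numbers:</p>
<pre>import math

V_RMS = 120.0                  # US mains voltage is specified as RMS
V_PEAK = V_RMS * math.sqrt(2)  # the transformer must withstand this
print(round(V_PEAK, 1))        # ~169.7 V peak for a 120 V RMS sine wave

I_RMS = 50.0                   # amperes, an example load current
apparent_kva = V_RMS * I_RMS / 1000
power_factor = 0.8             # example value; less than 1 in practice
real_kw = apparent_kva * power_factor

print(apparent_kva)  # 6.0 kVA, what the transformer is rated for
print(real_kw)       # 4.8 kW, the usable work under this load
</pre>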
<p>Here is an afterthought: when physics gets applied in the real world, the electricians learn a simplified version of it stated to them as a matter of fact, which is not wrong, but they do not understand the reasoning behind it. Some people are perfectly happy to learn just enough to get stuff done. But if you want to keep innovating, you have to understand the reasoning behind an existing system, or else you would reinvent the wheel poorly by not learning from past mistakes and making additional mistakes that people before you knew to avoid.</p><p>On that topic, I recommend watching this video by <a href="https://www.youtube.com/watch?v=OoJsPvmFixU">SmarterEveryDay - I Was SCARED To Say This To NASA... (But I said it anyway)</a>.</p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="400" src="https://www.youtube.com/embed/OoJsPvmFixU" width="640" youtube-src-id="OoJsPvmFixU"></iframe></div><br /><p><br /></p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-27129280229133439542023-07-15T01:32:00.001-04:002023-07-15T01:32:42.987-04:00ZFS zpool replace "no such device in pool"<p>Since the last time <a href="https://lifecs.likai.org/2016/11/my-setup-with-openzfs-on-mac-os-x.html">I built my ZFS pool in 2016</a> with 4x 1TB SSDs in a Thunderbolt 2 enclosure, I ran out of space in 2019 and upgraded the SSDs to 4x 2TB. I didn't have an extra enclosure, so I took a fairly risky approach: yank each SSD out, put in a new SSD, and let ZFS resilver. It only worked because my pool is a raidz1. During resilvering, the pool was in a degraded state due to the lack of redundancy.</p>
<p>And I am just about to run out of space again after 4 years. I also lucked out because SSDs are heavily discounted right now due to <a href="https://www.tomshardware.com/news/ssd-price-cuts-q3-2023">oversupply</a>. The chip makers are trying to reduce production, so the price might rise again.</p><p>This time, I bought 4x 4TB M.2 NVMe (<a href="https://www.bhphotovideo.com/c/product/1710669-REG/">non-affiliate B&H link</a>) and put them in a Thunderbolt 3 enclosure (<a href="https://www.startech.com/en-us/hdd/m2e4btb3">non-affiliate vendor link</a>). This means that I could do a proper replace without risking a degraded state.</p>
<p>But when I tried to replace the SSDs, I kept getting the "no such device in pool" error. I tried these variations:</p>
<pre>$ sudo zpool replace tank disk2 disk13
cannot replace disk2 with disk13: no such device in pool
$ sudo zpool replace tank disk2 /dev/disk13
cannot replace disk2 with /dev/disk13: no such device in pool
$ sudo zpool replace tank disk2 media-728C3B9F-647B-4C40-9DDF-E140E55D8C89
cannot replace disk2 with media-728C3B9F-647B-4C40-9DDF-E140E55D8C89: no such device in pool
$ sudo zpool replace tank disk2 /var/run/disk/by-id/media-728C3B9F-647B-4C40-9DDF-E140E55D8C89
cannot replace disk2 with /var/run/disk/by-id/media-728C3B9F-647B-4C40-9DDF-E140E55D8C89: no such device in pool
</pre>
<p>None of the above worked, and it was driving me nuts. When I was younger, I would have pulled my hair out. Now that my hair is more sparse, I've learned to take better care of it by showing restraint.</p>
<p>It turned out that "no such device in pool" actually meant it could not find disk2, even though it was able to find disk13 just fine. This worked:</p>
<pre>$ sudo zpool replace tank media-E235653B-0924-2F46-B708-FDEFFAB5CB15 disk13
</pre>
<p>The colon in the error message makes it appear that "no such device in pool" is about the new disk, which is confusing because it suggests the new disk would have to already be in the pool. The error is actually about ZFS not being able to identify the existing disk in the pool that needs replacement. In fact, if it could not find the new disk, the error message is different:</p>
<pre>$ sudo zpool replace tank disk2 disk777
cannot open 'disk777': no such device in /dev
must be a full path or shorthand device name
</pre>
<p>The error message could be improved, for sure. I wonder how much baldness in the tech industry could have been prevented if more error messages were helpful.</p>
<p>Another thing I learned is that when upgrading a raidz1, it is far more efficient to replace all the disks at once and let ZFS resilver them in parallel. It takes about the same amount of time to resilver one disk as to resilver all of them, because resilvering requires reading all disks to check the parity, whether it is rebuilding one disk or many. In my case, the process is bottlenecked by the read, not the write.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-89125822992839766662023-06-30T23:51:00.002-04:002023-07-04T17:27:49.763-04:00Beware of Passkeys Account Takeover Risks<h4 style="text-align: left;">What are Passkeys?</h4><p style="text-align: left;">This is the problem passkeys promise to solve by replacing passwords, according to the <a href="https://fidoalliance.org/passkeys/">FIDO Alliance</a>:</p><blockquote><div><div><i>Based on FIDO standards, passkeys are a replacement for passwords that provide faster, easier, and more secure sign-ins to websites and apps across a user’s devices. Unlike passwords, passkeys are always strong and phishing-resistant.</i></div><div><i><br /></i></div><div><i>Passkeys simplify account registration for apps and websites, are easy to use, work across most of a user’s devices, and even work on other devices within physical proximity.</i></div></div></blockquote><p>There is a lot of <a href="https://passkeys.dev/">fanfare</a>. Or if you want to <a href="https://passkeys.dev/docs/reference/specs/">read the spec</a>, it's easy to see only pine needles and miss the forest. The passkeys spec is built upon several FIPS standards whose underlying cryptographic mathematics is sound. However, if the cryptography is used poorly, the overall security of the user experience can still be compromised.</p><p>The spec focuses on the communication protocol between the authenticator (device), the client (user), and the website (or app). A passkey consists of a public key and a private key. You give away the public key but keep the private key secret. You can prove possession of the private key without revealing it, and the evidence can be verified by the public key alone. Furthermore, you cannot guess the private key from the public key.</p><p>Yubico has an intuitive overview of <a href="https://developers.yubico.com/Passkeys/How_passkeys_work.html">how passkeys work</a>, but here is my paraphrase of their sequence diagram.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://docs.google.com/drawings/d/e/2PACX-1vRJxDiZYQv3-26f2VnIffm8JOLwgr3DUxEKHgvpmesHWyCrsxiUt3LkIQEzRilo4pGW_xwhSpmwzewk/pub?w=864&h=1007" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="800" data-original-width="687" height="640" src="https://docs.google.com/drawings/d/e/2PACX-1vRJxDiZYQv3-26f2VnIffm8JOLwgr3DUxEKHgvpmesHWyCrsxiUt3LkIQEzRilo4pGW_xwhSpmwzewk/pub?w=864&h=1007" width="550" /></a></div><h4 style="text-align: left;">What is the Problem?</h4><p>The device used to create an account on a website stores the passkey so the user can log in from the same device later, but the spec does not prescribe what happens when the user wants to log in from another device. When using a <a href="https://www.amazon.com/FIDO-U2F-Security-Key/s?k=FIDO+U2F+Security+Key">hardware FIDO U2F</a> security key, the private key is stored on the hardware and cannot be copied, but you can plug the key into another computer.
Passkeys are generated in software, so new-device login is left to the solutions provided by each platform, with varying degrees of security.</p><p>The solution provided by <a href="https://developer.apple.com/passkeys/">Apple</a> seems to suggest that the "device" is an iCloud keychain. This means the user has to trust Apple for safe-keeping of the private key on their servers. Despite Apple's <a href="https://support.apple.com/en-us/HT213305">synchronization security</a> claims that passkeys are end-to-end encrypted, the fact that their implementation is "rate limited to help prevent brute-force attacks" and "recoverable even if the user loses all their devices" should raise an eyebrow. It means an attacker could still "recover" your passkeys without accessing any of your devices. This is permissible because the passkeys spec allows creating roaming passkeys.</p><p>The solution provided by <a href="https://developers.google.com/identity/passkeys">Google</a> seems to suggest that when a user wants to log in from a new device, the new device generates a new public key which is shown as a QR code, and the user has to take affirmative action to copy the new public key to an existing device to be authorized by an existing key. However, according to the <a href="https://developers.google.com/identity/passkeys/faq">FAQ</a>, the private key is still synchronized to the Google account and is recoverable even if you lose your device.</p><p>Security is only as strong as its weakest link. Any time a key is recoverable without using an authorized device, that is an opportunity for account takeover. If the recovery is done by sending a verification code via SMS, it can be susceptible to a <a href="https://en.wikipedia.org/wiki/SIM_swap_scam">SIM swapping</a> attack. If the recovery is done by a special passcode, then the passcode itself can be brute-forced or stolen. These targeted attacks are typically aimed at high-profile public figures.</p><p>The one scenario where passkeys do better than passwords is <a href="https://haveibeenpwned.com/">when a website suffers a data breach</a>. Since the website only knows the public keys, they are useless to hackers.</p><h4 style="text-align: left;">It Could Be Worse</h4><p>Even if the platform makes account recovery impossible, there are still ways to design new-device login that render the security useless.</p><p style="text-align: left;">Let's say an existing device lets the user copy the private key to another device. That means malware running on the existing device can exfiltrate the private key. If the private key is shown as a QR code, then anyone with a telescope could steal the private key at a distance. The private key could be encrypted by a passphrase in transit, but then the passphrase could be brute-forced.</p><p style="text-align: left;">Alternatively, the website may send a push authentication to an old device to authorize the login. When an account is under attack, the user would be overwhelmed with many push notifications and might click the wrong button and accidentally accept. They may also be duped into thinking that the new login comes from a device they already enrolled that somehow lost its authorization.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibFmAKXksMCCGp-0EZ6boCbbPJ7rTxniP4t6yEv9HJBb1afI7nX9Aad-vAtOAkT5Dd4ZwkCWTRSUYyes_Dg_KSLrGfWdD0NjobNisnk9WMq4SxaUq0y-TUE_igpbc8J7uZ0ZsG56qpmufFmGrpdD895SYesFNdT1VTZvx3a2uVcLCimXBP2cSYOknCErk/s1746/Overwhelmed.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1126" data-original-width="1746" height="413" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibFmAKXksMCCGp-0EZ6boCbbPJ7rTxniP4t6yEv9HJBb1afI7nX9Aad-vAtOAkT5Dd4ZwkCWTRSUYyes_Dg_KSLrGfWdD0NjobNisnk9WMq4SxaUq0y-TUE_igpbc8J7uZ0ZsG56qpmufFmGrpdD895SYesFNdT1VTZvx3a2uVcLCimXBP2cSYOknCErk/w640-h413/Overwhelmed.png" width="640" /></a></div><h4 style="text-align: left;">Best Practice</h4><p style="text-align: left;">The best way to implement passkeys is to store the private key in a hardware secure element that makes retrieval physically impossible. This keeps the passkey permanently bound to the device. To enroll a new device, the user has to take affirmative action to copy the new public key to an existing authorized device. After verifying the user to be who they are, the existing device then creates a signature, using an existing key that is already trusted by the website, to certify that the new key is trustworthy. After the new public key is enrolled by the existing device, the website will allow the new device to log in.</p>
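<p>Here is a minimal Python sketch of that enrollment flow, assuming the third-party cryptography package; the variable names and the bare signature format are my own illustration, not any platform's actual protocol:</p><pre class="prettyprint">from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# The old device holds a private key the website already trusts;
# in practice it lives inside a secure element and never leaves it.
old_key = Ed25519PrivateKey.generate()
trusted_by_website = old_key.public_key()

# The new device generates its own key pair; only the public half leaves it.
new_key = Ed25519PrivateKey.generate()
new_public = new_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

# After verifying the user, the old device endorses the new public key.
endorsement = old_key.sign(new_public)

# The website enrolls the new key only if the endorsement verifies against
# a key it already trusts; verify() raises InvalidSignature otherwise.
try:
    trusted_by_website.verify(endorsement, new_public)
    print("new device enrolled")
except InvalidSignature:
    print("enrollment rejected")
</pre>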
<p style="text-align: left;">This process of creating a new passkey on a new device, authorized by an old device, can still be automated by physically tethering the two devices. Authorization of new passkeys could also be done wirelessly over a reasonably secure channel, as long as the channel is mutually authenticated between the two devices and guarded against man-in-the-middle attacks. It is important to note that the passkey remains device-specific and is never shared with another device. The user can keep a backup device in a bank safe deposit box if needed.</p><h4 style="text-align: left;">Conclusion</h4><p style="text-align: left;">It is fair to say that passkeys do mitigate data breaches from a website, but there are still a lot of ways a platform could introduce account takeover risks if it employs passkeys improperly. Even though the passkeys spec is built on sound cryptography, it does not rule out scenarios where account takeover could still happen under a targeted attack. Since security is only as strong as its weakest link, this is something people should be aware of.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-11544954896708373362022-06-13T23:48:00.001-04:002022-06-13T23:51:35.429-04:00Two-Liner Version String Sorting in Python<p>Version sorting is sometimes called natural sorting. There exists a <a href="https://github.com/SethMMorton/natsort">natsort</a> package that caters to the need, and it has more than 1000 lines of code. What if I told you we can do it in two lines?</p><p>Version sorting addresses the problem that the version string "1.2.1" should sort before "1.10.1", but plain string comparison puts "1.10.1" first because the prefix "1.1" < "1.2". Version strings can be arbitrarily complex, e.g. "1.2.3beta1" should be less than "1.2.10beta3", or they can be punctuated like "1.2.3-rc3".</p><p>The gist of it is that if any part of the string looks like a number, then we compare it like a number. Here is the code:</p>
<pre class="prettyprint">import re

def version_key(key: str) -> tuple:
    """Returns a sortable key in version order."""
    items = re.split(r'(\d+)', key)
    return tuple(int(item) if item.isdigit() else item for item in items)
</pre>
<p>And it can be used like this:</p>
<pre class="prettyprint">>>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
>>> sorted(a, key=version_key)
['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
</pre>
<p>So how does it work? When <a href="https://docs.python.org/3/library/re.html#re.split">re.split()</a> is given a delimiter pattern that contains capturing groups, it also returns the capturing group matches in the result list. Effectively, we get a tokenizer that can split numbers and non-numbers into separate tokens.</p>
<pre class="prettyprint">>>> re.split(r'(\d+)', '1.10.2')
['', '1', '.', '10', '.', '2', '']
</pre>
<p>Then, the second line converts any string in the list that looks like a number into an integer.</p>
<pre class="prettyprint">>>> items = ['', '1', '.', '10', '.', '2', '']
>>> tuple(int(item) if item.isdigit() else item for item in items)
('', 1, '.', 10, '.', 2, '')
</pre>
<p>When sorted() uses this tuple for comparison, Python compares the tuples item-wise. The leading empty string means the original string begins with a number. It is important to keep it so that the item types are aligned as (str, int, str, int, ..., str) in every version key tuple. If we removed the empty strings, pairwise comparison could end up comparing a str to an int, which would raise a TypeError.</p><p>As a result, we can compare versions that are formatted differently, and it still works.</p>
<pre class="prettyprint">>>> a = ['version-1.9', '2.0', 'version-1.11', '1.10']
>>> sorted(a, key=version_key)
['1.10', '2.0', 'version-1.9', 'version-1.11']
</pre>
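<p>The same tokenizer generalizes beyond integers. As a hedged sketch (my own extension, not part of the original two-liner), here is a variant that treats a decimal fraction like "1.5" as a single number:</p><pre class="prettyprint">import re

_NUMBER = re.compile(r'\d+(?:\.\d+)?')

def decimal_key(key: str) -> tuple:
    """Like version_key, but "1.5" becomes the single number 1.5."""
    items = re.split(r'(\d+(?:\.\d+)?)', key)
    return tuple(float(item) if _NUMBER.fullmatch(item) else item for item in items)
</pre><p>Note that this deliberately changes the semantics: in version order, "1.5" sorts before "1.25" because 5 < 25 as integers, while decimal order puts 1.25 before 1.5.</p><pre class="prettyprint">>>> sorted(['scale-1.5', 'scale-1.25'], key=version_key)
['scale-1.5', 'scale-1.25']
>>> sorted(['scale-1.5', 'scale-1.25'], key=decimal_key)
['scale-1.25', 'scale-1.5']
</pre>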
<p>Of course, natsort does a lot more. It also has options to extract a floating point number or a human-readable comma-delimited number. The working principle is very similar: just replace the pattern given to re.split() with whatever you want to be interpreted as a "number" and convert it to a number accordingly.</p><p>But for sorting a version string, two lines of code is all you need.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-1336967443821065382022-06-12T00:48:00.006-04:002022-06-13T10:24:11.736-04:00Designing a Data Breach Resilient Database<div class="separator" style="clear: both; text-align: center;"><a href="https://www.flickr.com/photos/joyosity/32409071384" style="margin-left: 1em; margin-right: 1em;"><img alt="Rainbow crepe cake, by The Cooking of Joy." border="0" data-original-height="1536" data-original-width="2048" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOEi_IbfNCco5rmXAN4ZKwOZYe1kSoJ9nxFZM7VkXC_t65YG9yN9l2Ddbrd_fu-B8ZGu77GYy4QGKHcHHR0SP07jQWHdfa2cihIlqdWm8NWB5GF8lRlbf1aVzjqE_gJVgPDpm_8r6rqsgQUmN-11IAtxNL_UfEVOkvKob1sP2FIFNHzVWr_PGYup9y/w640-h480/32409071384_5bc77bc302_k.jpg" width="640" /></a></div><p>With several high-profile data breach settlements, such as <a href="https://www.equifaxbreachsettlement.com/">Equifax</a> and <a href="https://www.morganstanleydatasecuritysettlement.com/">Morgan Stanley</a>, coming to a conclusion, it raises the question: how do we prevent these data breaches from happening in the first place?</p><p>In the case of Equifax, <a href="https://www.warren.senate.gov/imo/media/doc/2018.09.06%20GAO%20Equifax%20report.pdf">attackers gained unauthorized access</a> via their dispute portal website, which allowed them to discover credentials to the central database holding the customer data, and they slowly exfiltrated that data over 76 days to avoid detection.</p><p>In the case of Morgan Stanley, the data breach was caused by improper disposal of used hard drives containing customer data.</p><p>Security is always as strong as the weakest link, so if we want to prevent future data breaches, we have to understand how the storage stack works. It's helpful to think of storage as a layered model, like how the <a href="https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/">OSI model</a> (Open Systems Interconnection) helps us understand networking. We can tentatively call this the DSI model (Data Storage Interconnection). There is a straightforward analogy between the layers of the two models.</p><p>To recap, here are the OSI layers.</p>
<hr />
<table border="1" cellpadding="5" cellspacing="1" style="margin: auto;">
<caption>OSI model</caption>
<thead>
<tr><th><br /></th><th>OSI</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>7*</td><td>Application Layer</td><td>A website or app that the user interacts with.</td></tr>
<tr><td>6*</td><td>Presentation Layer</td><td>Interpretation of the data, e.g. HTML, JSON, UTF-8.</td></tr>
<tr><td>5*</td><td>Session Layer</td><td>Definition of request and response semantics, e.g. HTTP, gRPC.</td></tr>
<tr><td>4</td><td>Transport Layer</td><td>Data transmission and flow control, e.g. TCP, UDP.</td></tr>
<tr><td>3</td><td>Network Layer</td><td>Routing of data between networks, e.g. IPv4, IPv6.</td></tr>
<tr><td>2</td><td>Datalink Layer</td><td>Node identification on a local network, e.g. Ethernet MAC.</td></tr>
<tr><td>1</td><td>Physical Layer</td><td>Modulation of bits on a physical medium, e.g. 10GBASE-T.</td></tr>
</tbody>
</table>
<p>* Note: The <a href="https://www.itu.int/rec/T-REC-X.200-199407-I">original OSI model specification</a> only specified the purpose of each layer but not concrete examples of standards. Traditionally, HTTP is considered part of the "application" known as the web browser, but these days HTTP also serves as a transport, e.g. for <a href="https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md">gRPC</a>, which pushes it down to a lower layer. My definitions here reflect the modern web2 or web3 stack.</p>
<hr />
<table border="1" cellpadding="5" cellspacing="1" style="margin: auto;">
<caption>DSI model</caption>
<thead>
<tr><th></th><th>DSI</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>7</td><td>Application Layer</td><td>A website or app that the user interacts with.</td></tr>
<tr><td>6</td><td>Presentation Layer</td><td>Interpretation of the data, e.g. HTML, JSON, UTF-8.</td></tr>
<tr><td>5</td><td>Schema Layer</td><td>Modeling of data, e.g. table definitions.</td></tr>
<tr><td>4</td><td>Database Layer</td><td>Transaction and indexing, e.g. SQL.</td></tr>
<tr><td>3</td><td>Filesystem Layer</td><td>Abstraction for files and directories, e.g. ZFS, NTFS, AWS S3.</td></tr>
<tr><td>2</td><td>Block Layer</td><td>Block devices, e.g. SCSI, SATA.</td></tr>
<tr><td>1</td><td>Physical Layer</td><td>Modulation of bits on a physical medium, e.g. <a href="https://en.wikipedia.org/wiki/Shingled_magnetic_recording">SMR</a>.</td></tr>
</tbody>
</table>
<hr />
<p>In a similar fashion, we can define data storage layers that are roughly analogous to the OSI layers. The Application Layer and Presentation Layer are the same, layers 5 and below are redefined for data storage purposes, and the Physical Layer again serves the same purpose.</p><p>(As a side note, the Presentation Layer and Schema/Session Layer are typically where difficulties arise when sharing data between different companies, for example to provide universal access to health records, and this is true both in the networking and the data storage sense. But that's a topic for another day.)</p><p>As we can see, the Morgan Stanley data breach happened at the Physical Layer. The Equifax breach started at the Application Layer, but the data was exfiltrated at the Database Layer. However, data breaches can happen at other layers too. Here is a summary of risk factors at each layer.</p>
<hr />
<table border="1" cellpadding="5" cellspacing="1" style="margin: auto;">
<caption>DSI attack vectors</caption>
<thead>
<tr><th></th><th>DSI</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>7</td><td>Application Layer</td><td>A vulnerable website allowing access to lower layer credentials, e.g. Equifax.</td></tr><tr><td>6</td><td>Presentation Layer</td><td>Cross-site scripting, where malicious code is injected through data.</td></tr><tr><td>5</td><td>Schema Layer</td><td>Enumeration attack due to poorly designed data access policies.</td></tr><tr><td>4</td><td>Database Layer</td><td>Exfiltration of data (e.g. Equifax), SQL injection attack.</td></tr><tr><td>3</td><td>Filesystem Layer</td><td>Local privilege escalation.</td></tr><tr><td>2</td><td>Block Layer</td><td>Network attached storage in an unsecured network or using an insecure protocol.</td></tr><tr><td>1</td><td>Physical Layer</td><td>Theft or improper disposal of physical medium, e.g. Morgan Stanley.</td></tr></tbody></table>
<p>* Note: as new attack techniques become known, I'll add them to the correct layer here.
</p><hr />
<p>It is tempting to try to put the entire security burden on the Application Layer, but it is unrealistic to expect that a single layer can handle all the attacks thrown at it. Again, since security is only as strong as the weakest link, we have to fortify each layer.</p><p><b>Application Layer</b> should impose a strict separation of code and data. When possible, the application should be implemented in a compiled language with code placed in a read-only filesystem (see <a href="https://lifecs.likai.org/2021/07/revival-of-harvard-architecture-for.html">Revival of Harvard Architecture</a>). The application should be shielded behind a monitoring system for detecting unusual access patterns, but reconnaissance alone is not enough. It is often already too late when the exploit is detected, but we still want a real-time signal for exploit attempts.</p><p><b>Presentation Layer</b> should escape data consistently when inserting one piece of data into another. User-supplied HTML should be parsed and scrubbed to keep only allowlisted elements. A proper templating library should be able to handle that. Don't use string concatenation for structured data.</p><p><b>Schema Layer</b> should use itemized encryption as a means to control the sharing of data. Data is considered shared with someone if they have access to the session key for that data. One way to do this is to encrypt the item key for anyone who should have read access to the item. The user has an encrypted item key that can be decrypted by a credential of their choice. If the backend system requires access to the item, the item key will additionally be encrypted for the backend's role account credential.</p><p>Since anyone who has access to the item key will have access to all data encrypted by that item key, data should be scoped for the reader. Data for a different set of readers should be encrypted by its own item key. Nevertheless, access to the item's cipher-text should still be subject to an access-control list. Even though itemized encryption alone should prevent the enumeration attack, the additional access-control list guards against an improperly designed itemized encryption key schedule or derivation.</p>
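<p>Here is a minimal Python sketch of itemized encryption in this envelope style, assuming the third-party cryptography package; the function names and the tuple layout are my own illustration, not a prescribed format:</p><pre class="prettyprint">import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_item(plaintext: bytes, reader_keys: list) -> tuple:
    """Encrypt one item, wrapping its item key once per authorized reader."""
    item_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(item_key).encrypt(nonce, plaintext, None)
    wrapped_keys = []
    for reader_key in reader_keys:  # e.g. per-user keys plus a role account key
        wrap_nonce = os.urandom(12)
        wrapped_keys.append(
            (wrap_nonce, AESGCM(reader_key).encrypt(wrap_nonce, item_key, None)))
    return nonce, ciphertext, wrapped_keys

def decrypt_item(nonce, ciphertext, wrap_nonce, wrapped_key, reader_key) -> bytes:
    """Unwrap the item key with one reader's key, then decrypt the item."""
    item_key = AESGCM(reader_key).decrypt(wrap_nonce, wrapped_key, None)
    return AESGCM(item_key).decrypt(nonce, ciphertext, None)
</pre><p>Under this scheme, revoking a reader amounts to re-encrypting the item under a fresh item key that is wrapped only for the remaining readers.</p>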
<p><b>Database Layer</b> should apply page-level encryption (such as the <a href="https://sqlite.org/com/see.html">SQLite Encryption Extension</a>) even though user data is already encrypted by the Schema Layer. This allows the Filesystem Layer to replicate or back up the database without worrying that some tables or indices may still contain unencrypted sensitive data.</p><p>We do not presently require the <b>Filesystem Layer</b> to implement any mitigation. This allows industry-standard data availability and redundancy practices to continue to be deployed. Vulnerability at this layer comes from the OS vendor and is typically outside of our control, so we mitigate it by having encryption at the Database Layer and the Block Layer.</p><p><b>Block Layer</b> should apply full-disk encryption so that it is impossible to recover data without the volume encryption key if the storage medium is improperly disposed of. Encryption at this level alone is not enough because, in the case of a stolen laptop, a copy of the decryption key would still be found in system memory somewhere. And there is a good chance that the residual key can still be recovered after the laptop is powered off, e.g. by <a href="https://www.zdnet.com/article/cryogenically-frozen-ram-bypasses-all-disk-encryption-methods/">cryogenically freezing the RAM</a>.</p><p>In summary, mitigation strategies tailored for each data storage layer together provide defense in depth. Reasoning with the DSI model allows us to clearly see how the separation of concerns at each layer contributes to the overall mitigation strategy. And this is how to design a data-breach resilient database.</p><p></p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-48944914464673988552022-04-11T14:05:00.007-04:002022-09-01T23:43:31.566-04:00The Correct Way to Lock a Get-Or-Create Map<p>A Get-Or-Create map is useful for the "<a href="https://en.wikipedia.org/wiki/Multiton_pattern">Multiton</a>" (multiple-singleton) design pattern. Locking is used to make it thread-safe. It may not seem obvious, but the straightforward way to lock a map with Get-Or-Create semantics is problematic.</p>
<pre class="prettyprint">// Store keeps track of all the objects created so they can be shared.
type Store struct {
	mu sync.Mutex
	m  map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[string]*Object)
	}
	if obj, ok := s.m[key]; ok {
		return obj, nil
	}
	obj, err := newObject(key) // BAD: blocking call while holding the lock.
	if err != nil {
		return nil, err
	}
	s.m[key] = obj
	return obj, nil
}
</pre>
<p>To illustrate the problem, let's say a store already has <code>"bar"</code> but not <code>"foo"</code> saved in the map. Thread-A calls <code>s.GetOrCreate("foo")</code>, which necessitates creating a new object for <code>"foo"</code>. Meanwhile, Thread-B comes along and simply wants to call <code>s.GetOrCreate("bar")</code> for an object that already exists. Thread-B should be able to just get <code>"bar"</code> without waiting for Thread-A to create <code>"foo"</code>, but the lock held by Thread-A during <code>newObject("foo")</code> forces Thread-B to wait.</p>
<p>This is particularly problematic when the object creation takes on the order of seconds, which is typical if the object represents a remote network resource, and that is a common use case for Get-Or-Create maps.</p><p><b>Generally speaking, holding a lock while doing blocking operations will severely limit the scalability of a service, so any lock should be confined to the scope of reading or writing an in-memory data structure.</b></p>
<p>However, Get-Or-Create maps do have to wait in some cases. Even so, we can separate the "fast path" from the "slow path" and only have to wait on the slow path. The vast majority of accesses should fall under the fast path, which does not wait for blocking operations.</p>
<p>Consider another Thread-C which also wants to get <code>"foo"</code> while Thread-A is creating it. Thread-C should wait for Thread-A. If Thread-C goes ahead and creates another instance of <code>"foo"</code>, then both Thread-A and Thread-C will end up with an object for <code>"foo"</code>, and we now have a problem: which <code>"foo"</code> do we put into the map? For this reason, the map needs a third state to indicate that an object is being created but not yet ready.</p>
<p>The right mechanism is to use a condition variable in conjunction with a mutex, so that Thread-A could release the lock while creating the new object, Thread-B could retrieve an existing object without blocking, and Thread-C could wait for object creation by Thread-A.</p>
<pre class="prettyprint">type Store struct {
	mu sync.Mutex
	c  sync.Cond
	m  map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[string]*Object)
		s.c.L = &s.mu // The condition variable releases s.mu while waiting.
	}
	obj, ok := s.m[key]
	for ok && obj == nil { // Object exists but it is still being created.
		s.c.Wait()         // Wait for its creation.
		obj, ok = s.m[key] // And try again.
	}
	if ok && obj != nil {
		return obj, nil
	}
	s.m[key] = nil // Assume responsibility for creating the object.
	s.mu.Unlock()
	obj, err := newObject(key) // Avoid holding the lock for lengthy operations.
	s.mu.Lock()
	if err != nil {
		delete(s.m, key) // Object is no longer being created. Next thread will retry.
	} else {
		s.m[key] = obj // Creation successful. Next thread will pick this up.
	}
	s.c.Broadcast() // Wake up all waiting threads to recheck the map.
	return obj, err
}
</pre>
<p>When Thread-A is about to create a new object, it puts a nil value into the map to indicate that an object is being created but not yet available. Thread-C will lock the mutex, see that the value is nil, and wait on the condition variable (which releases the mutex while waiting). When Thread-A finishes object creation, it locks the mutex again, puts the object into the map, broadcasts a signal through the condition variable, and unlocks the mutex. This will wake up Thread-C to try again. Thread-B is able to obtain <code>"bar"</code> without blocking because the mutex is not held by either Thread-A or Thread-C during object creation.</p>
<p>However, if Thread-A is unable to create the object for any reason, it will clear the pending status, still wake up all the waiting threads, and return the error. Thread-C will see that the nil value has been removed from the map and then assume responsibility for attempting to create a new object.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-68300970601471355122022-03-28T08:40:00.000-04:002022-03-28T08:40:50.370-04:00Self-Driving Car Challenges<div class="separator" style="clear: both; text-align: center;"><a href="https://group.mercedes-benz.com/innovation/case/autonomous/drive-pilot-2.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="512" data-original-width="1024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimRRFdK0VuZFbA-cQn7GefptyXrwjH713gh2pMl5ZQvFTG_pTq9QbnQhVifo2h3jBeB_PFkBAQ1fbzuo8ObjnXJ-z7XGlogBijU3ZiJWCZcLrayrn7sVzKgw4ZbfFm2SkUjRaHTKDNNjQIDkSzgVF81e4DCD4-3QnN-fLUWAycO4pt6Mi5AETtp4_-/w640-h320/drive-pilot-w1024xh512-cutout.jpeg" title="Mercedes-Bens Drive Pilot" width="640" /></a></div><p>Mercedes recently announced <a href="https://www.roadandtrack.com/news/a39481699/what-happens-if-mercedes-drivepilot-causes-a-crash/">level 3 self-driving that takes legal liability</a>. Typically, the <a href="https://www.synopsys.com/automotive/autonomous-driving-levels.html">levels of autonomy</a> are defined by situations where the car is able to operate without human intervention, but being able to assume legal liability is a huge step forward. Levels of autonomy would remain a marketing slogan unless they can be tested in court. The cost of liability will be factored into the insurance premium, so self-driving only becomes economically viable if the insurance premium can be lower than that of a human driver.</p><p>However, Mercedes' legal responsibility comes with limits: on certain (already-mapped) highways, below 40 mph, during daytime, in reasonably clear weather, without overhead obstructions.</p><p>The first challenge that any camera-based system must overcome is the stability of the footage. You can see from this test that a <a href="https://youtu.be/rY705ZbszWs?t=515">car-mounted camera with improper image stabilization</a> (circa 2016) would produce a wobbly and shaky image. The wobble can be counteracted by shortening the exposure time (a high shutter speed), but this requires good lighting conditions, hence the reasonably clear weather requirement. Furthermore, when the car is traveling fast, you need a telephoto lens to look further ahead for hazardous conditions, but a longer telephoto also exacerbates the wobble, hence the operating speed limit of the self-driving system. If the video footage is bad, it won't help much to feed it into machine learning, because "garbage in, garbage out." More recent cameras such as the GoPro have <a href="https://youtu.be/AEK9HkqrlW0?t=134">improved image stabilization</a> (circa 2021) that also works on a <a href="https://youtu.be/VE6NZVC8H-o">toy car</a> (circa 2020), which is more challenging to stabilize. These cameras can produce a clean image under more forgiving lighting conditions.</p><p><b>Car manufacturers who are serious about their self-driving should be licensing camera stabilization technology from the likes of GoPro.</b></p><p>Self-driving cars using LIDAR face a different challenge. 
<a href="https://cosmosmagazine.com/technology/lidar-how-self-driving-cars-see/">LIDAR works</a> by sending a short pulse of light and observing reflections, so it is not dependent on external lighting conditions. But when there are multiple LIDAR-equipped cars on the road, they could be picking up each other's signals, which would appear as noise. As such, a LIDAR has to encode its own unique ID into the light pulse and filter out any pulse not coming from itself.</p><p>A third challenge is how to legally dispute a claim in court. A self-driving system must be able to produce a rationale for why it made any given decision, and the decision has to be legally justifiable. Machine learning used to be a black box that could <a href="https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html">produce surprising results</a> (it recognized a dumbbell only because it had an arm attached to it), but <a href="https://cloud.google.com/blog/products/ai-machine-learning/why-you-need-to-explain-machine-learning-models">explainable AI</a> is making some progress. Similarly, self-driving technology must be explainable.</p><p>Explainable self-driving technology can be thought of as a driving instructor that happens to be a computer. The driving instructor not only has to make the right decision on the road, but it also has to explain to a student why it is the right decision given the circumstance.</p><p><b>Car manufacturers that want to assume legal responsibility should aim for a computerized driving instructor instead.</b></p><p>A musician would understand that in order to perform at 100%, they must practice up to 120%. This mindset applies to a lot of engineering problems where safety is at stake. Building structures such as <a href="https://elevation.fandom.com/wiki/Capacity">elevators</a> are often designed with load considerations at ~200% rated capacity or more (<a href="https://www.waterboards.ca.gov/waterrights/water_issues/programs/bay_delta/california_waterfix/exhibits/docs/dd_jardins/DDJ-148%20ASCE%207-10.pdf">ASCE/SEI 7-10</a> 1.4.1.1 (b)). When it comes to self-driving, the specs are less quantifiable, but the design qualifications must similarly exceed expectations in order for the technology to be viable.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-6733169901385264582021-10-05T23:56:00.005-04:002021-10-06T10:30:44.200-04:00Four Lessons Learned from Facebook BGP Outage<p>First of all, a disclaimer: I don’t work at Facebook, but enough is known about the outage that I think we can all learn a few lessons from it. Consider it a cautionary tale on network design.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_LqS1LD14E7ZwW3MM6QyiG8hNLd6QvjXHGBHjuwu1KufIMyYjd1OL9peiN_csKiA8wHfFBrl-zq5bZ3X87WNFn9mTf7h-QZ8_PJHNW79ApUe6M13gDLt-igpIMUizVosoE9JfMiGBCEY/s724/fbdeath.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="724" data-original-width="717" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_LqS1LD14E7ZwW3MM6QyiG8hNLd6QvjXHGBHjuwu1KufIMyYjd1OL9peiN_csKiA8wHfFBrl-zq5bZ3X87WNFn9mTf7h-QZ8_PJHNW79ApUe6M13gDLt-igpIMUizVosoE9JfMiGBCEY/w396-h400/fbdeath.png" width="396" /></a></div><p></p><p>On Monday, Oct 4, 2021, Facebook literally unfriended everyone on the Internet. 
That’s because they’ve broken their BGP peering sessions and withdrawn all BGP routes to their network, so their network became unreachable. CloudFlare has <a href="https://blog.cloudflare.com/october-2021-facebook-outage/">explained how BGP works in great detail</a>, so I don’t need to repeat it here.</p><p>The most immediate effect is that Facebook’s name servers became unreachable, which is what most people noticed first, because name servers are the first thing a web browser tries to contact when you visit a website. At the time of writing, Facebook advertises 4 name servers.</p><pre>$ host -t ns facebook.com
facebook.com name server d.ns.facebook.com.
facebook.com name server a.ns.facebook.com.
facebook.com name server b.ns.facebook.com.
facebook.com name server c.ns.facebook.com.
$ host a.ns.facebook.com
a.ns.facebook.com has address 129.134.30.12
a.ns.facebook.com has IPv6 address 2a03:2880:f0fc:c:face:b00c:0:35
$ host b.ns.facebook.com
b.ns.facebook.com has address 129.134.31.12
b.ns.facebook.com has IPv6 address 2a03:2880:f0fd:c:face:b00c:0:35
$ host c.ns.facebook.com
c.ns.facebook.com has address 185.89.218.12
c.ns.facebook.com has IPv6 address 2a03:2880:f1fc:c:face:b00c:0:35
$ host d.ns.facebook.com
d.ns.facebook.com has address 185.89.219.12
d.ns.facebook.com has IPv6 address 2a03:2880:f1fd:c:face:b00c:0:35
</pre><p>To be fair, these four IP addresses are probably <a href="https://en.wikipedia.org/wiki/Anycast">anycast</a> addresses backed by several distributed clusters of physical servers. Unfortunately, all of them are in the same Autonomous System, <a href="https://bgp.he.net/AS32934">AS 32934</a>. Even though their networks are distributed, the AS identifies the scope of BGP peering, so it becomes a single point of failure when BGP is misconfigured.</p><p><b>Lesson 1</b>: If you have multiple name servers, they should be distributed across different Autonomous Systems if feasible. Most people can’t do it, but an organization at the scale of Facebook should get multiple AS numbers.</p><p>As a side note, it is common for a big company like Facebook to have just one AS spanning multiple continents. I don’t think that’s a good idea. If a user in Europe wants to visit your server located in Asia, their traffic will enter your peering point in Europe first, which means you have to do the internal routing to Asia yourself. That’s great if you build your own inter-continental backbone and want complete control over end-user latency, but not so great if you experience a traffic surge and want to shed traffic to other backbone providers. It’s better to have one AS for each continent for locality reasons, and an extra AS or two for global anycast (e.g. for CDN). This gives you greater flexibility in inter-continental traffic engineering. I should probably write another blog post about this some other time.</p><p>I also found out that Facebook’s BGP peering is almost fully automated. At the company where I work, we received a notice from Facebook exactly two weeks ago. It says that an idle session has been detected, and “if your session does not establish within 2 weeks, it will be removed.” Two weeks after the notice coincides with this Facebook outage.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYY0IEgTH9LZci2o05xda52IGw5_fIPwBX76sfBwPA7M0N1PpAp14YSoljuaS6IKEAGeXaAO_d97eqqQqaIoOb5yqmcE7Y1rkWOSUZGn2OnsRZ51DAUA_TS9WUtZbZBqnB_GSxR_r0Irg/s875/Idle+BGP+Alert.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="528" data-original-width="875" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYY0IEgTH9LZci2o05xda52IGw5_fIPwBX76sfBwPA7M0N1PpAp14YSoljuaS6IKEAGeXaAO_d97eqqQqaIoOb5yqmcE7Y1rkWOSUZGn2OnsRZ51DAUA_TS9WUtZbZBqnB_GSxR_r0Irg/w640-h386/Idle+BGP+Alert.png" width="640" /></a></div><p>Having looked at this in more detail, I’m skeptical that this message alone could explain the outage. I looked up the IPv6 addresses used for peering, and found that the prefix <a href="https://bgp.he.net/net/2001:7f8:64:225::/64">2001:7f8:64:225::/64</a> belonged to Telekom Indonesia (<a href="https://bgp.he.net/AS7713">AS 7713</a>), so I believe this address had been allocated for a datacenter in Indonesia that was only used for peering in that region. If so, this message would have only explained a regional outage, not a worldwide one.</p><p>But if this is any indication, notice that the message has only been viewed 4 times in a whole company with thousands of people doing network operations. 
It is often the case that many alerts like this are fired and then ignored.</p><p><b>Lesson 2</b>: Do not ignore automated alerts for a production system.</p><p>One thing for certain is that Facebook definitely has an automated system to clean up idle peering sessions. I recently encountered a similar issue with a project at work, where some new code was cleaning up idle sessions too aggressively and caused connectivity issues. It was discovered in a lab, so all was well.</p><p>The official reason for the outage, <a href="https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/">according to Facebook</a>:<br /></p><blockquote><p><i>During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.</i></p></blockquote><p>In Facebook’s case, their software might have erroneously detected all BGP sessions as idle and then promptly deleted them. This could have been prevented if someone had reviewed the effective changes (“diffs”) the command would cause, not just audited the command itself; that would have surfaced the problem before the command ran in production. In a production system, it pays to be more careful.</p><p><b>Lesson 3</b>: The effect of production changes should be manually reviewed and signed off before executing.</p><p>Last but not least, BGP is not the only way to define network routes. BGP was designed to find the least costly path over multiple hops of the network, but before BGP, network engineers simply configured static routes. The routing table for the entire Internet is now too large (~900K entries at the time of writing, see the <a href="https://www.cidr-report.org/as2.0/">CIDR Report for the current size</a>) to be manually configured. Facebook alone advertises around 160 IPv4 prefixes, but they only need to add a few well-known minimum routes as a backup to keep basic connectivity. Physical network changes take significantly more effort, so updating a static routing table is just a small overhead at Facebook’s scale.</p><p><b>Lesson 4</b>: Always have a backup minimal configuration when the configuration is decided by an automated algorithm.</p><p>That’s all, folks!</p><p>Some wisecrack post-scriptum: I’m a programming languages person by training, but somehow over the years I’ve amassed enough networking knowledge to write convincingly about it.</p><p>I may have spent more time making the cover graphics than writing, to be honest. 
Give me a thumbs up if you like the graphic, even if you couldn’t care less about the content.</p><p>The opinion expressed does not reflect that of my employer, not that I’ve made it easy to tell who that is either (hopefully).</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-24661061982056948302021-07-15T02:04:00.007-04:002021-07-15T10:06:20.489-04:00Revival of Harvard Architecture for the Mitigation of Ransomware Attacks<p>Pictured below is the <a href="https://en.wikipedia.org/wiki/Gropius_House">Gropius House</a> in Lincoln, Massachusetts, the former residence designed and built by Walter Gropius, the renowned architect of the Bauhaus movement and emeritus chair of the Department of Architecture at the Harvard Graduate School of Design.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://en.wikipedia.org/wiki/Gropius_House" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="800" height="480" src="https://upload.wikimedia.org/wikipedia/commons/f/fa/Gropius_House%2C_Lincoln%2C_Massachusetts_-_Front_View.JPG" width="640" /></a></div><p></p><p>But the Harvard architecture revival I'm writing about is not that kind of architecture. It was a computer <a href="https://en.wikipedia.org/wiki/Harvard_architecture">machine architecture</a> of the punched-tape era that separated program and data into two different pathways. The Harvard Mark I machine was designed by Howard Aiken at Harvard University. He took inspiration from the Analytical Engine, the mechanical computer designed by Charles Babbage, who was the mathematics mentor of Ada Lovelace, the first computer programmer and the daughter of the poet Lord Byron.</p><p>There is poetic justice in the fact that computer programming came out of the conjunction of mathematics and poetry.</p><p>What makes Harvard architecture computers notably different from today's computers is that programs were read-only, and data could not be executed as programs. In today's computers, programs and data are stored on the same storage medium. Anyone who can put data on a computer can also run it as a program.</p><p>Security breaches happen in two steps: (1) someone has write access to a data path, and (2) subsequently leverages the data path to upload and execute unauthorized programs on a computer they do not own. Gaining write access for step (1) is not that hard. Many computer systems that handle data actively solicit data from non-privileged users. Step (2) relies on having some knowledge of vulnerabilities in a running program that can trick it into passing control flow onto arbitrary data.</p><blockquote style="border: 1px solid grey; padding: 0px 1em;"><p>A case in point is WordPress, a blogging platform notorious for having <a href="https://www.cvedetails.com/vulnerability-list/vendor_id-2337/product_id-4096/">vulnerabilities</a> that result in <a href="https://www.techtimes.com/articles/257016/20210212/wordpress-data-breach-affects-100-000-exposed-websites-using-responsive.htm">security breaches</a>. 
That's partly because it is built on PHP, a language that makes it easy to run data as a program:</p><p></p><ul><li>Once <a href="https://www.w3schools.com/php/php_file_upload.asp">file upload</a> is enabled on a site, PHP will allow it for any script on that site, even if that particular script never expects file uploads.</li><li>A moderately complex site breaks its code into several files, and the first file uses <a href="https://www.php.net/manual/en/function.include.php">include() or require()</a> on other files to bootstrap the program's functions. The file to be included is a computed path that could be any file on the filesystem.</li><ul><li>PHP can be configured to limit the path of what could be included using <a href="https://www.php.net/manual/en/ini.core.php#ini.open-basedir">open_basedir</a>. This limits <a href="https://medium.com/@Aptive/local-file-inclusion-lfi-web-application-penetration-testing-cc9dc8dd3601">exfiltration of system files</a> that are not part of the public site, such as /etc/passwd, but uploaded files are often moved under open_basedir, where they can then be included.</li></ul><li>Any file type can be included, <a href="https://medium.com/@igordata/php-running-jpg-as-php-or-how-to-prevent-execution-of-user-uploaded-files-6ff021897389">even if it is an image file with PHP code</a> embedded in it.</li></ul><p>The user only needs to upload a malicious image to WordPress and trick it into running the image. It's less of an issue for a private WordPress installation, but on a shared blogging site like wordpress.com, where anyone could register an account, anyone can cause a security breach and steal other people's passwords. That's why people shouldn't <a href="https://www.cisecurity.org/blog/reusing-passwords-on-multiple-sites/">reuse passwords on multiple sites</a>.</p></blockquote><p>Once they hijack the computer, attackers also hijack whatever data was entrusted to be handled on that computer. They may also modify existing legitimate programs on the computer and hide illicit code there, which makes it very costly to discover and clean up after a security incident. This is how <a href="https://krebsonsecurity.com/tag/ransomware/">ransomware</a> works in a nutshell. Some crooks have turned security breaches into a profiteering criminal enterprise.</p><p>If today's computers had separate program and data pathways, a security breach would be limited to the original unauthorized data path access and would not escalate into a complete hijack of the computer.</p><p>Modern computers provide some facilities to separate code and data, but they are optional and must be deployed judiciously in a production environment in order to be effective.</p><p></p><ul style="text-align: left;"><li><a href="https://en.wikipedia.org/wiki/Executable_space_protection">Executable space protection</a> designates parts of memory only for handling data and prohibits executing it as code. This protects programs that are already running from remote code injection vulnerabilities.</li><li>Enforce writable data to be non-executable (i.e. <a href="https://en.wikipedia.org/wiki/W%5EX">W^X</a>); see the probe sketch after this list. This is typically part of executable space protection for running programs, but the same principle can be applied to programs and data stored on disk.</li><li><a href="https://en.wikipedia.org/wiki/Copy-on-write">Copy on write</a> is a storage policy that maintains snapshots of data modifications. 
This can make it easier to restore the filesystem to its state before the security breach.</li><li>Some operating systems support making a filesystem read-only, e.g. read-only bind mounts.</li><li>Some filesystems have the option to make files non-executable on a volume, e.g. the <a href="https://linux.die.net/man/8/zfs">zfs</a> exec=off property, or the noexec <a href="https://man7.org/linux/man-pages/man8/mount.8.html">mount</a> option.</li><li><a href="https://en.wikipedia.org/wiki/Code_signing">Code signing</a> verifies that a program comes from a trusted party and has not been modified in an unauthorized manner. Verification and execution must happen atomically; otherwise an attacker has an opportunity to modify the program between verification and execution to evade detection.</li></ul><div>They all work towards the same principle as the Harvard architecture, where the program is read-only, and writable data cannot be executed as a program.</div>
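<p>To make the first two items concrete, here is a hedged Python sketch for Unix-like systems only (the prot argument to mmap.mmap does not exist on Windows) that probes whether the OS will hand out a page that is writable and executable at the same time:</p><pre class="prettyprint">import mmap

# Probe for W^X enforcement: request a page that is both writable and
# executable. A strict W^X system (e.g. OpenBSD) refuses this request;
# most default Linux and macOS configurations allow it.
try:
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    print("this system allowed a writable and executable page")
    page.close()
except OSError as err:
    print("W^X enforced:", err)
</pre><p>A payload that wants to go from data to code needs exactly this kind of page (or a writable file that can later be executed), which is why denying it by default matters.</p>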
<p style="text-align: left;">Ultimately, the reason why programs must be read-only has to do with accountability requirements imposed by the <a href="https://en.wikipedia.org/wiki/Sarbanes%E2%80%93Oxley_Act">Sarbanes-Oxley Act</a> in response to corporate scandals such as Enron. Programs running in production must be able to establish their provenance, and someone must have reviewed and certified that the program serves its intended purpose. If the program is writable after the certification, that would put its provenance in doubt.</p><p style="text-align: left;">In a sense, if a company falls victim to a ransomware attack, there must be a loophole in the accountability of its information systems. A revival of the Harvard architecture would close the loophole and eliminate ransomware attacks.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-33453700491312289802020-11-28T00:25:00.005-05:002021-03-04T09:42:24.860-05:00My notes for setting up a new Yubikey 5<p>A hardware crypto token such as Yubikey is not meant to be used forever. A new release would address old vulnerabilities and add new crypto support. The firmware is not upgradable (for security reasons), so new features and vulnerability fixes always require the key to be replaced. It is already a toil to rotate a hardware token across all the servers one might have used it with for authentication. If someone had used a hardware token to encrypt data for long-term archival, the hardware token could not be disposed of without re-encrypting all the data. Cloud service providers can afford to engineer a solution that uses static derived keys encrypted by a master key that can be rotated more easily, but I'm not aware of off-the-shelf solutions for individuals. As such, I do not recommend using hardware tokens such as Yubikey to encrypt archived data.</p><p>Even though I don't use Yubikey for encryption, there is still considerable effort to set it up for authentication, which is my primary use for it. In this article, I will explain the feature landscape of different crypto key types, followed by step-by-step instructions on how I configured my key. Hopefully it helps you decide if this is the right time to start using a Yubikey or to upgrade your existing key, and if it is, how to do it.</p><h4 style="text-align: left;">Background</h4><p>My own motivation is to replace my RSA key with an ECC key, because a 256-bit ECC key is shorter and has roughly the same strength as a 3072-bit RSA key (here is a <a href="https://security.stackexchange.com/a/95407">rationale</a> that I endorse). However, the <a href="https://crypto.stackexchange.com/questions/10263/should-we-trust-the-nist-recommended-ecc-parameters">NIST P-curves are not safe</a>. There are some <a href="https://safecurves.cr.yp.to/">criteria for establishing the safety</a> of various ECC curves, courtesy of the author of curve25519.</p><p>First of all, a cautionary tale: don't order a Yubikey 5 from Amazon. I made the mistake of ordering one there in April 2020, and I got a Yubikey with the old 5.1.2 firmware (it wasn't discounted). The new <a href="https://developers.yubico.com/PGP/YubiKey_5.2.3_Enhancements_to_OpenPGP_3.4.html">firmware 5.2.3</a> was released in August 2019, a handsome 8 months earlier. Amazon also refused to post my warning about the old firmware as a review because they said it pertains only to a specific supplier, not the product.</p><p>Starting from firmware 5.2.3, Yubikey implements OpenPGP specification 3.4, which added support for ed25519 keys, among other ECC types. However, NIST P-curves (known as "ecdsa" in SSH) and ed25519 are the only new ones that work with SSH. The new firmware also added OpenPGP attestation, which certifies that a key was generated on chip and whether touch is required to use the key (attestation was first introduced in U2F).</p><p>The old 5.1.2 firmware lacked ed25519 support. It was to replace my Yubikey 4, which <a href="https://www.yubico.com/support/security-advisories/ysa-2017-01/">generated weak RSA keys</a>. Before that, I had a Yubikey NEO-n, which lacked touch support for OpenPGP even though U2F always required touch. Yubico is already working on implementing <a href="https://www.yubico.com/blog/getting-a-biometric-security-key-right/">biometric touch</a> for the next-generation Yubikey.</p><p>Also, the software tools provided by Yubico changed over time. 
The previous generation tools <a href="https://developers.yubico.com/yubikey-neo-manager/">Yubikey NEO Manager</a> and <a href="https://developers.yubico.com/yubikey-personalization-gui/">Yubikey Personalization Tool</a> have been deprecated and replaced with <a href="https://developers.yubico.com/yubikey-manager-qt/">Yubikey Manager</a>. It is worth noting that the GUI version only lets you configure FIDO2 and PIV, not OpenPGP. However, a CLI version is bundled with the GUI version, and the CLI has the OpenPGP configuration options. At one point, it was only possible to configure the OpenPGP touch policy with a third-party script <a href="https://github.com/a-dma/yubitouch">yubitouch</a>, but it should not be used anymore.</p><p>That said, the entire ecosystem around using an OpenPGP hardware key to log in over SSH is pretty fragile in my opinion. It requires several parties to be perfectly aligned: Achim Pietig, who maintains the <a href="https://gnupg.org/ftp/specs/OpenPGP-smart-card-application-3.4.pdf">OpenPGP specification</a>; the GnuPG team, which is mostly just Werner Koch and sometimes NIIBE Yutaka (who designed the Gniibe Gnuk key); Yubico and other hardware vendors; the OpenSSH developers, who decide which new key types to support; and the various distributions that decide which versions of GnuPG and OpenSSH are available to the end user. Since OpenSSH 8.2 added FIDO/U2F support, once that is rolled out to the various OS distributions, there should be no need to use OpenPGP with SSH anymore. OpenSSH 8.4 was only <a href="https://tracker.debian.org/news/1195668/accepted-openssh-184p1-2bpo101-source-amd64-all-into-buster-backports-buster-backports/">accepted into Debian Buster backports</a> two days ago, and before that Buster only had OpenSSH 7.9. Mac OS X <a href="https://apple.stackexchange.com/a/406761">Big Sur finally has OpenSSH 8.1</a>, which still predates U2F support. Both the client and the server need to support U2F for it to work over SSH.</p><h4 style="text-align: left;">Step by Step Guide</h4><p>Without further ado, here is what I did to set up my Yubikey 5 on Mac OS X.</p><p>First, make an alias for the ykman command. The backslash escape is needed in order for the whitespace in the path to be preserved.</p><pre>$ alias ykman='/Applications/YubiKey\ Manager.app/Contents/MacOS/ykman'</pre><p>Disable <a href="https://developers.yubico.com/OTP/OTPs_Explained.html">Yubico OTP</a>; otherwise accidental touches will cause a one-time passcode to be typed as if someone were typing it on the keyboard. It appears as a string of seemingly random characters.</p><pre>$ ykman config usb --disable OTP</pre><p>Initialize the card with gpg --card-edit first, before configuring the OpenPGP touch policy. That's because generating new keys will cause the touch policy to be reset to OFF. Here is a condensed recipe followed by some explanations.</p><pre>$ gpg --card-edit<br />> admin<br />> factory-reset # see NOTE(1)<br />> kdf-setup # see NOTE(2)<br />> passwd # see NOTE(3)<br />> key-attr # see NOTE(4)<br /> (2) ECC<br /> (1) Curve 25519<br />> generate # see NOTE(5)<br />Make off-card backup of encryption key? (Y/n) n<br /> 0 = key does not expire<br />Key is valid for? (0)<br />Key does not expire at all<br />Is this correct? (y/N) y # see NOTE(6)<br /><br /># Optionally:<br />> name<br />> url<br />> login</pre><p>Notes:</p><ol style="text-align: left;"><li>factory-reset does not require a PIN. It only resets the OpenPGP applet and will leave FIDO U2F intact. However, it does not reset the OpenPGP attestation touch policy. 
This only needs to be done when I f'd up and need to start over.</li><li>kdf-setup prevents the PIN from being sent over USB in plain text.</li><li>First change the Admin PIN (3), then change the PIN (1). Note that the PIN does not have to be numeric. I have my Admin PIN set to my login passphrase. Many key-configuration operations require both the PIN and the Admin PIN, and the Admin PIN can be used to override the PIN, so both must be set for security reasons.</li><li>key-attr will repeatedly prompt for the signature, encryption, and authentication keys. It will ask for the Admin PIN each time. Again, only Curve 25519 is supported by SSH. Only the authentication key is used by SSH, so the signature and encryption keys could be other types.</li><ul><li>Since SSH does not support secp256k1, using it will cause an "Agent refused operation" error with SSH.</li><li>The key types that SSH does not support are hidden unless the session was started in expert mode using gpg --card-edit --expert.</li></ul><li>generate will first ask for the PIN, then ask for user information, then ask for the Admin PIN prior to key generation. Again, generation of new keys will reset the touch policy.</li><ul><li>Do not make an off-card backup. The key would be generated off-card first and then loaded back into the Yubikey, which means OpenPGP attestation won't work.</li></ul><li>There seems to be no point in setting a key expiration here. The expiration can be changed using gpg --edit-key (note: this is a different session mode than --card-edit) after the key is generated.</li></ol><p>Now we need to configure the OpenPGP touch policy. There are a few oddities to be aware of before proceeding.</p><ul style="text-align: left;"><li>Firstly, after running gpg --card-edit, the ykman command seems to hang, but unplugging and replugging the Yubikey makes it work again.</li><li>Secondly, ykman seems to have trouble disabling terminal echo on Mac OS X ("Can not control echo on the terminal" with a GetPassWarning), so the PIN and Admin PIN will be echoed to the screen. It is using the Python <a href="https://docs.python.org/3/library/getpass.html">getpass</a> module, which works when I run it in Python directly. I think ykman wasn't able to import termios because the code signing prohibited it from loading dynamic libraries.</li><li>Lastly, file output could not be written, possibly due to Catalina's enhanced security policy. Writing to standard output works, but beware that prompts are also written to stdout. It means that output redirected to a file would be polluted with interactive messages. The ykman developers should read my article <a href="https://lifecs.likai.org/2013/11/to-err-is-human-to-out-pipeline.html">To err is human; to out, pipeline</a>.</li></ul><p>Now, use ykman to set the touch policy. It will ask for the Admin PIN.</p><pre>$ ykman openpgp set-touch SIG FIXED<br />$ ykman openpgp set-touch ENC FIXED<br />$ ykman openpgp set-touch AUT FIXED<br />$ ykman openpgp set-touch ATT FIXED</pre><p>Note: set-touch ATT FIXED only needs to be done once per Yubikey, ever. This is not affected by factory-reset. Trying to set it again will result in a harmless error.</p><p>Finally, I am generating the OpenPGP attestation certificates, though I haven't found a use for them yet. The ones generated by the <a href="https://developers.yubico.com/PGP/Attestation.html">Yubikey contain an X.509 v3 extension</a> that also indicates what the touch policy is, so an external trust engine may use it to decide whether to give an attested key more trust. 
Note that unless the touch policy is FIXED or CACHED-FIXED, it can be changed to OFF later, so it makes no sense to attest any other touch policies.</p><p>These will ask for the regular PIN and write the attestation certificates to standard output.</p><pre>$ ykman openpgp attest SIG -<br />$ ykman openpgp attest ENC -<br />$ ykman openpgp attest AUT -</pre><p>After each command, copy and paste the X.509 certificate to its own file. Do not attest ATT, which is the attestation key itself. I suspect it would destroy the preloaded attestation key, but I haven't tried it. It will warn you that it will overwrite an existing attestation certificate. The Yubikey comes with an attestation key preloaded which certifies that the OpenPGP keys are generated on Yubico-manufactured hardware. If the attestation key is overwritten, it can no longer be recovered, even after a factory reset.</p><p>I use <a href="https://gpgtools.org/">GPG Tools on the Mac</a>, which has a nicer interface, to add a photo to the key, but gpg --edit-key should work too. Although the <a href="https://help.gnome.org/users/seahorse/stable/pgp-photoid.html.en">recommended photo size is 120x150</a>, I find that 240x240 produces the best result. It will be scaled down to 120x120 by GnuPG. After this, the public key is ready to be exported.</p><p>Old public keys need to be revoked and exported again. The newly exported key will then contain the revocation certificate.</p><p>The new key should show up in ssh-add -L without further action. If not, make sure SSH_AUTH_SOCK points to gpg-agent, not launchd. Usually starting a new shell after running gpg --card-edit will do the trick. The output of ssh-add -L contains the public key (identifiable by the cardno, which is the Yubikey's serial number) that needs to be added to $HOME/.ssh/authorized_keys on the SSH server.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com1tag:blogger.com,1999:blog-4401555280825766585.post-83481361007475249652020-11-23T12:53:00.003-05:002020-11-23T21:57:53.532-05:00pidfd_getfd is harmful!<p>Every day, it seems like Linux is gaining more Windows features that allow one process to hijack another process, which enables computer viruses and malware to propagate and steal user data. A recent example is the addition of <a href="https://lwn.net/Articles/808997/">pidfd_getfd()</a>, a system call that lets one process duplicate a file descriptor belonging to another process. The idea is that process Mallory would inspect procfs to see which file descriptors process Victim has open, then Mallory can duplicate them at will.</p><p>Before pidfd_getfd(), regular files could already be reopened via procfs. That works even if the file has been unlinked, although the opening is still subject to inode permissions. It may not seem like a big deal, but file permissions control access by the owning user and group, so they cannot let one process keep a secret from another process belonging to the same user, or from root. Fortunately, sockets and pipes cannot be reopened from procfs.</p><p>There are some questionably legitimate uses for pidfd_getfd(), but this mechanism is flaky. A commentator on LWN already pointed out the <a href="https://lwn.net/Articles/809109/">hilarious confusion due to a race condition</a>. There is a delay between Mallory enumerating the file descriptors in procfs and when it decides to steal a file descriptor. In the meantime, Victim could have duplicated another file to it (e.g. 
via <a href="https://man7.org/linux/man-pages/man2/dup.2.html">dup2()</a>).</p><p>Here is the real reason why pidfd_getfd() is harmful: it violates the assumption in POSIX that a file descriptor is a credential testifying that a process has access to some resource. This credential can be passed around using <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/sendmsg.html">sendmsg()</a> with <a href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html">SCM_RIGHTS</a>, or inherited by a child process through <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html">fork()</a>. Both of these are deliberate actions taken by someone who possesses the credential. Now the security model is weakened to allow anyone who can call pidfd_getfd() to gain unauthorized access to these resources. It's unauthorized because it lacks a deliberate action by the credential owner.</p><p>Before, a process could create a socketpair() and be secure in the knowledge that only the process it shares the file descriptor with can communicate over this secure local channel. Since the socket pair is already connected, it cannot be connected to another socket. But pidfd_getfd() allows Mallory to access the secret channel between Alice and Bob.</p><p>Before, Alice could send a file descriptor for a file she owns to a specific process owned by Bob even though Bob normally couldn't open the file. Another process owned by Bob would not be able to reopen that file from procfs. However, pidfd_getfd() bypasses the file permissions check.</p>
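<p>To make the mechanism concrete, here is a minimal Python sketch of the fd stealing, calling the raw syscalls through ctypes. This is my own illustration: the syscall numbers assume x86_64, and the kernel does gate pidfd_getfd() behind a ptrace access-mode check, which is ordinarily satisfied between processes of the same user.</p><pre class="prettyprint lang-py">import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)
SYS_pidfd_open = 434   # x86_64 syscall numbers; other
SYS_pidfd_getfd = 438  # architectures use different ones.

def steal_fd(pid, target_fd):
    """Duplicate target_fd of process pid into this process."""
    pidfd = libc.syscall(SYS_pidfd_open, pid, 0)
    if pidfd < 0:
        raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
    try:
        # No cooperation from the target process is needed.
        fd = libc.syscall(SYS_pidfd_getfd, pidfd, target_fd, 0)
        if fd < 0:
            raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
        return fd
    finally:
        os.close(pidfd)
</pre>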
<p>An important principle in designing a secure system is to disallow access or an operation by an outsider unless a <b>deliberate action</b> is taken by the owner of the resource. Unfortunately, pidfd_getfd() breaks that principle. It was introduced in Linux kernel 5.6, so versions henceforth are no longer secure by design.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-89034873000022455712020-11-05T00:12:00.005-05:002020-11-23T12:54:53.113-05:00Electronic Voting<p>As Americans patiently wait for the result of the 2020 POTUS election, I decided to take a crack at solving the electronic voting problem, with a timing that is perhaps a bit irrelevant now but may be useful again in 4 years. Now, this is by no means a survey of the state of the art in electronic voting, since the research has been done over so many years by so many people. This exercise is just for me to kill time.</p><p>A naive electronic voting scheme may have accountability issues, namely we don't know if a vote is legitimate, or whether all votes have been counted correctly. In order to know if a vote is counted, we need to know how everyone voted, which seems to sacrifice anonymity, unless we come up with an electronic analogy of a paper ballot. Traditionally, a ballot is issued in person at the voting station, or mailed to the voter in advance for mail-in ballots. There is little protection against ballot forgery, but it would be difficult to forge ballots physically at a large scale.</p><p>For electronic voting, a ballot certificate would be issued by the county government to the voter upon voter registration. The issuer is responsible for checking voter identification before issuance. This ballot certificate gives the voter the right to vote, and should be kept secret before a vote is cast (anyone in possession of a ballot certificate could vote with it). The certificate has a unique serial number and a cryptographic signature signed by the issuer. The serial number ensures that the ballot could only be used to cast at most one vote, and the signature verifies that the ballot is issued legally by the county. Otherwise the ballot is not tied to the voter's identity. The county should maintain a separate list of voters who have already registered, so a voter could only ever receive one ballot certificate. Large-scale fraud can be detected if a county somehow issued more ballot certificates than its voting population.</p><p>Votes are entered into the voting database along with the ballot certificate. Since the voting database contains no personally identifiable information, its entirety can be downloaded by the public when the election is concluded, and anyone could verify the presence of their vote in the database as well as independently count all of the votes. The US population is only ~328 million, so the database is probably only ever going to be a few gigabytes, which can comfortably fit onto a smart phone. Ballot verification could still take an hour or two (e.g. the ed25519 signature verification rate is 71K/s), but counting should take only a few minutes if not seconds.</p><p>The challenge is distributing the database while ensuring integrity, since anyone receiving a copy with parts of the database missing or tampered with would not know it, and would arrive at a different count.</p><p>To ensure that all the votes are recorded without being tampered with, they need to appear as part of a Merkle Tree (like how git works). Each transaction begins with a base commit hash and stores the content of the voting record (ballot, vote), both of which would derive a new commit hash. The complete voting record (base commit hash, ballot, vote, new commit hash) is returned to the voter as a receipt, and the new hash is also used as the base commit hash for the next transaction. Once a vote is cast, the ballot certificate is no longer a secret.</p><p>The simplest form of the Merkle Tree is just a singly linked list, but this type of transaction is very inefficient, since it requires all the votes in the universe to be serialized (i.e. only one person can vote at any given time). A tree construction allows transactions to take place in parallel (i.e. many people can cast votes at the same time). First, vote commits are built sequentially as before, but when a good number of votes are collected, they are finalized into a block by recording the final commit hash of the block. Multiple blocks can be built concurrently and then merged into a superblock by creating a commit hash out of all the block hashes. Superblocks can be similarly merged, until the election is closed, at which point the whole election is given a final commit hash.</p><p>This way, a voter can verify that their vote is counted in the election if they can trace their vote's commit hash all the way to the final election hash. If a vote is tampered with, its commit hash would change. If a block is tampered with, its block hash would change. If any merges are tampered with, the final election hash would change. Anyone could download the whole voting database and know that they have the complete, untampered copy. They can write software to independently verify the Merkle Tree without trusting the software that generated it.</p>
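<p>To make the scheme concrete, here is a toy Python sketch of the commit chain and the block merging. It is only an illustration: the record serialization is glossed over, SHA-256 stands in for whatever hash an actual system would standardize on, and the function names are made up.</p><pre class="prettyprint lang-py">import hashlib

def commit(base_hash, ballot, vote):
    """Append one voting record to a block; returns the voter's receipt."""
    h = hashlib.sha256()
    h.update(base_hash)
    h.update(ballot.encode())
    h.update(vote.encode())
    new_hash = h.digest()
    # The receipt is (base commit hash, ballot, vote, new commit hash);
    # new_hash also serves as the base commit hash of the next transaction.
    return (base_hash, ballot, vote, new_hash)

def merge(*hashes):
    """Merge block (or superblock) hashes into a superblock hash."""
    h = hashlib.sha256()
    for block_hash in hashes:
        h.update(block_hash)
    return h.digest()
</pre>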
<p>The electronic ballot certificate could be printed on paper as a 2D barcode, which allows the bearer to cast the vote as a paper ballot. When voting in person, the voting machine would scan the barcode, ask for voter input, conduct the transaction, then print out the receipt with the commit hash. If this 2D barcode is a QR code, it could be scanned by a mobile app for possibly voting by phone, but the risk is that a malware app could steal the ballot and cast the vote without the voter's knowledge.</p><p>I believe this system is secure and practical, but I haven't spent too much time agonizing over it to tell for sure. I purposefully avoided mentioning the word "blockchain" because many people use that buzzword without knowing what it really means. Even though a blockchain is built using a Merkle Tree as well, it limits the rate of transactions globally to 5 per second (really low) through proof of work. The proof of work was invented for Bitcoin to impose artificial scarcity on the cryptocurrency. In the context of an election, scarcity may be able to mitigate large-scale fraud, but this is not a good way to ensure election integrity.</p>Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-31389048895792990852019-10-02T01:42:00.001-04:002019-10-02T01:49:29.438-04:00Python re.finditer() from an iterable of stringsEarlier today, I told a work colleague that <a href="https://docs.python.org/3/library/re.html#re.finditer">re.finditer()</a> in Python is usually the preferred way to <a href="https://docs.python.org/3/library/re.html#writing-a-tokenizer">tokenize input</a>, rather than hand-rolling a lexer from an iterable of characters. He said that it requires reading the whole file into memory. On the spot, I sheepishly said that you can probably read it line by line and call re.finditer() repeatedly if the token regexp does not cross line boundaries.<br />
<br />
That is true, but not a great answer. It turns out that constructing a chained re.finditer() from an iterable of strings is possible although not quite straightforward.<br />
<br />
The first attempt here demonstrates the main idea: since re.finditer() matches are non-overlapping, after all matches are exhausted, the next re.finditer() should resume from the remainder of the unmatched string.<br />
<pre class="prettyprint lang-py linenums">def finditer_chained_bad(pattern, strings, flags=0):
last = ''
for s in strings:
m = None
for m in re.finditer(pattern, last + s, flags):
yield m
if not m:
last = '' ##1
continue
# m is the last match object.
last = s[m.end(0):]
</pre>
The line with the comment <code>##1</code> may be a point of contention. If re.finditer() had not yielded any match, the implementation decision here is to discard s entirely, assuming that nothing in it would ever match, but one could argue that a potential match could be crossing the input string boundary. If this were changed to <code>last += s</code>, then it would work even when a potential match crosses the input boundary, at the expense of unbounded memory growth if a long span of the input never yields any match. The unbounded growth is necessary because a potential match larger than the input chunk size can only be captured by dynamically increasing the chunk size, which is what <code>last += s</code> effectively does.<br />
<br />
Another problem is that the last match could be a partial match that continues into the next string. If we yield it naively, the caller would get a broken partial match. To illustrate this, suppose the input is chunked into 8-character strings:
<br />
<pre class="prettyprint lang-py linenums">def main():
for m in finditer_chained_bad(r'\w+', ['hello wo', 'rld']):
print m.group(0)
</pre>
It prints:
<br />
<pre>hello
wo
rld
</pre>
The word "world" is broken into two matches by mistake due to the string boundary. To fix this, we just need to defer the last match and restart the next re.finditer() from the beginning of the last match. The full working code becomes:
<br />
<pre class="prettyprint lang-py linenums">def finditer_chained(pattern, strings, flags=0):
last_s = ''
for s in strings:
last_m = None
last_s += s
for m in re.finditer(pattern, last_s, flags):
if last_m: yield last_m
last_m = m
if not last_m:
continue
assert last_m is m
last_s = s[m.start(0):]
if last_m: yield last_m
</pre>
And it correctly prints:
<br />
<pre>hello
world
</pre>
The resulting finditer_chained() accepts a file-like object, because a file is an iterable of lines, as if re.finditer() were applied to the whole file. And it works without reading the whole file into memory.
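<br />
<br />
For example, tokenizing a log file could look like this (the file name is made up for illustration):<br />
<pre class="prettyprint lang-py">with open('access.log') as f:
    for m in finditer_chained(r'\w+', f):
        print(m.group(0))
</pre>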
Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-22836348384112741962018-12-25T10:13:00.001-05:002018-12-29T14:23:56.387-05:00“Hybrid Log Float” 14-bit to 10-bit Pixel Transfer FunctionHere I propose a Hybrid Log Float as a transfer function that maps a 14-bit linear pixel to a 10-bit non-linear pixel in a way that is inspired by <a href="https://en.wikipedia.org/wiki/IEEE_754">floating point numbers</a>. Unlike the <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">16-bit half float</a>, we don't have enough bits for both an exponent and the mantissa. We also don't need to represent fractions or negative numbers. Having a custom integer bit format allows us to store more significant figures while preserving the widest dynamic range from the input. Expressing the transfer function as an integer bit format allows the encoding and decoding between linear and non-linear colorspaces to be fast and bit-accurate without using a lookup table.<br />
<br />
First, some background information. In this past year, I've had the pleasure of shooting some videos (<a href="https://www.youtube.com/watch?v=BQ90uZZNdnA">here</a>, <a href="https://www.youtube.com/watch?v=utG1_0CFdMg">here</a>, and <a href="https://www.youtube.com/watch?v=URCRH0sfLPU">here</a>) with the <a href="https://shop.panasonic.com/gh5s">Panasonic GH5s</a>, which is a fine camera. When it was introduced in January, it was the first compact interchangeable-lens camera that could internally record DCI 4K in 4:2:2 10-bit H.264. Such recording ability was only later superseded by the <a href="https://www.blackmagicdesign.com/products/blackmagicpocketcinemacamera">Blackmagic Design Pocket Cinema Camera 4K</a> (BMPCC) released in September, which records in 12-bit CinemaDNG RAW or 4:2:2 10-bit ProRes. These two cameras have a very similar sensor (if not the same one) that outputs 14 bits per pixel. The GH5s will generate 14-bit RW2 raw files when taking pictures (e.g. for time lapses), but the BMPCC will not.<br />
<br />
The sensor's 14-bit per-pixel output means that the maximum theoretical dynamic range is 14 stops. That's because each stop is twice or half the amount of light, so the number of bits per sample corresponds to the number of stops. This is corroborated by the assessment of the GH5s done by Alan Roberts in <a href="https://tech.ebu.ch/docs/tech/tech3335_s29.pdf">EBU Tech 3335 S29</a>. He measured 14.6 stops for Hybrid Log Gamma and 13.5 for V-Log L. Hybrid Log Gamma and V-Log L are two different picture profiles that map the 14-bit sensor output to the 10-bit codec. The measured dynamic range exceeds the theoretical somewhat, but it's within measurement error. Also, quantum uncertainty means that light levels below the sensor's sensitivity could sometimes be picked up as temporal noise. But as a rule of thumb, we can think of the number of bits as the number of stops.<br />
<br />
It is interesting how dynamic range can be affected by the way the codec downsamples the 14-bit sensor output to 10-bit. The most naive approach is to simply truncate the least significant 4 bits, but that means the darkest 16 shades collapse into one, and you end up with only 10 stops of dynamic range. This is pretty harsh if someone wants to recover shadows from an underexposed picture. This is probably what the Like709 picture profile does on the GH5s (it simulates the <a href="https://en.wikipedia.org/wiki/Rec._709">BT.709</a> color space), since it only measures 10.3 stops.<br />
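<br />
To see why, note that the naive truncation and its reconstruction are just bit shifts. Here is a minimal Python sketch (my own illustration, not taken from any camera firmware):<br />
<pre class="prettyprint lang-py">def truncate_14_to_10(x14):
    # Drop the 4 least significant bits: inputs 0..15 all collapse to 0.
    return x14 >> 4

def restore_10_to_14(x10):
    # Shift back to the 14-bit scale; the bottom 4 bits are lost for good.
    return x10 << 4
</pre>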
<br />
The <a href="https://en.wikipedia.org/wiki/Hybrid_Log-Gamma">Hybrid Log Gamma</a> (HLG) and <a href="https://en.wikipedia.org/wiki/Log_profile">V-Log L</a> picture profiles take advantage of the way human eyes perceive intensities of light non-linearly, so they downsample 14-bit sensor output (which is the linear measurement of light) to 10-bit for codec using <a href="https://en.wikipedia.org/wiki/Transfer_function">transfer functions</a> that preserve the most perceptible shades. However, the picture profiles are designed for different purposes. HLG is calibrated for consumer TVs that render HDR in <a href="https://en.wikipedia.org/wiki/Rec._2100">BT.2100</a> colorspace, which is hard to color grade (changing the look and feel of a picture by manipulating colors) because the colorspace is non-linear. V-Log L has a flatter picture profile which makes color grading easier, but it's still better to convert it to a linear colorspace for <a href="https://www.youtube.com/watch?v=Q-2DW9ALOjU">more accurate color grading</a>. Non-linear colorspace can be converted back to linear using a look up table (LUT).<br />
<br />
Then two things had occurred to me.<br />
<br />
First, cameras should really provide a 16-bit codec. If the sensor data are only 14 bits, normalize the samples by zero-padding the least significant bits. The additional bits aren't a significant storage overhead (probably negligible after either lossless or lossy compression), and it makes processing on a computer a lot more efficient since the samples are naturally aligned to 16-bit short integers, which is well suited for <a href="https://en.wikipedia.org/wiki/SIMD">SIMD</a> instructions. Unfortunately, the only 16-bit capable codec today is HEVC, and it's not widely implemented. Both <a href="https://www.apple.com/final-cut-pro/docs/Apple_ProRes_RAW_White_Paper.pdf">ProRes RAW</a> and <a href="https://gopro.github.io/cineform-sdk/">CineForm</a> are currently 12-bit, even though their coding principles (DCT for ProRes, Wavelet for CineForm) are bit-agnostic, so it's just a matter of someone implementing it. I suspect that a 16-bit codec will be much easier to implement. It would avoid this bit-shifting voodoo to <a href="https://github.com/LibRaw/LibRaw/commit/9fc4e05565ccda07e8479b197cf8f166f94f53c6#diff-f29c1da0e3b6207b99ae3e301dcb547aR3231">unpack 14-bit integers</a>.<br />
<br />
Second, since a 16-bit capable codec is a bit out of reach, it's beneficial to have a transfer function that is bit-accurate (to minimize rounding errors) but still takes into account the way humans perceive light intensities: we tend to notice minute light variances in darkness, but we are less sensitive to small variances when the environment is bright.<br />
<br />
The result is the Hybrid Log Float transfer function, which encodes a linear value into a logarithmic scale and decodes it back to the same linear colorspace with a predictable allocation of significant figures. The idea is to encode the exponent as the number of leading one bits in unary, and the rest of the bits can be used for storing significant figures, in a way that's inspired by the UTF-8 encoding. By contrast, the <a href="http://www.ti.com/lit/an/spra163a/spra163a.pdf">A-Law and µ-Law</a> companding algorithms count the exponent in binary the same way <a href="https://en.wikipedia.org/wiki/Minifloat">Minifloat</a> works (but with a different number of bits allocated to the exponent and mantissa), rather than in unary.<br />
<br />
To illustrate how unary exponent works, let's consider 14-bit to 12-bit reduction first.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://docs.google.com/drawings/d/e/2PACX-1vTLKeJaQ_pOXl4ZtTnleU5Y73PhDPquYp9wRa35sIR5yf2IKdZ5zpElDP04visuXTxmgcSDTL1aiTdi/pub?w=1505&h=494" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="263" data-original-width="800" height="210" src="https://docs.google.com/drawings/d/e/2PACX-1vTLKeJaQ_pOXl4ZtTnleU5Y73PhDPquYp9wRa35sIR5yf2IKdZ5zpElDP04visuXTxmgcSDTL1aiTdi/pub?w=1505&h=494" width="640" /></a></div>
<br />
It's broken down into four spans. The first span (a) has 11 bits of significant figures and is encoded verbatim as linear. The second span (b) has 10 bits. The last two spans (c) and (d) each have 9 bits and are actually at ½ log(x) = log(√x) scale.<br />
<br />
Here is the 14-bit to 10-bit reduction using the same approach.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://docs.google.com/drawings/d/e/2PACX-1vRYCGXdbIX86-SwarL1wxqXvrVMX20ivtg7e4IuIREtysEWRY3lhxtg_iQvgRRzbuT16k43QzXxEz1e/pub?w=1508&h=687" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="364" data-original-width="800" height="291" src="https://docs.google.com/drawings/d/e/2PACX-1vRYCGXdbIX86-SwarL1wxqXvrVMX20ivtg7e4IuIREtysEWRY3lhxtg_iQvgRRzbuT16k43QzXxEz1e/pub?w=1508&h=687" width="640" /></a></div>
<br />
The spans are (a) 9-bit linear, (b) 8-bit log, (c) 7-bit 1/2 log, (d) 6-bit 1/4 log, (e) 5-bit 1/8 log, (f) 5-bit 1/8 log. Notice that the two highlight tiers only have 5 bits, or 32 shades, which may not be enough to recover highlights. Since this encoding allocates more bits for the shadows, it prefers the picture to be slightly underexposed.<br />
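<br />
The span widths above pin down one self-consistent reading of this first layout, sketched in Python below. This is my own reconstruction from the figure, not an authoritative specification, but the accounting works out: each tier doubles the input range it covers while the step size grows, the leading one bits of the 10-bit code count the exponent in unary, UTF-8 style, and the six tiers spend exactly the 1024 available codes to cover exactly the 16384 input values.<br />
<pre class="prettyprint lang-py"># (code prefix, mantissa bits, first input value, input step per code)
TIERS = [
    (0b0,     9,     0,   1),  # (a) 9-bit linear
    (0b10,    8,   512,   2),  # (b)
    (0b110,   7,  1024,   8),  # (c)
    (0b1110,  6,  2048,  32),  # (d)
    (0b11110, 5,  4096, 128),  # (e)
    (0b11111, 5,  8192, 256),  # (f)
]

def hlf_encode(x14):
    for prefix, mbits, base, step in TIERS:
        if x14 < base + (1 << mbits) * step:
            return (prefix << mbits) | ((x14 - base) // step)
    raise ValueError('input out of 14-bit range')

def hlf_decode(code10):
    for prefix, mbits, base, step in TIERS:
        if code10 >> mbits == prefix:
            # Decodes to the bottom of each step; add step // 2 to
            # decode to the middle of the step instead.
            return base + (code10 & ((1 << mbits) - 1)) * step
</pre>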
<br />
Here is an alternative 14-bit to 10-bit encoding that reserves more bits for the midtones and highlights, so it has better color accuracy for highlight recovery. The trick is to pre-allocate bits for counting the exponent in binary rather than unary at first, but revert back to unary exponent afterwards.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://docs.google.com/drawings/d/e/2PACX-1vQWM_H9KNYB8ksaN-qt4gT6z32SBTN4z7rS5kamlS3SzrTLvU5m4_R8sQizLrLsAHf_zs0ciTtLBzRP/pub?w=1508&h=687" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="364" data-original-width="800" height="291" src="https://docs.google.com/drawings/d/e/2PACX-1vQWM_H9KNYB8ksaN-qt4gT6z32SBTN4z7rS5kamlS3SzrTLvU5m4_R8sQizLrLsAHf_zs0ciTtLBzRP/pub?w=1508&h=687" width="640" /></a></div>
<br />
The difference is that spans (b), (c), and (d) are now 7 bits each, or 128 shades, at 1/2 log scale, using three bits to count the exponent. The upper two tiers (e) and (f) have 6 bits each, or 64 shades, at 1/4 log(x) scale.<br />
<br />
Here is a plot comparing:<br />
<ul style="line-height: 1.4; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="margin: 0px 0px 0.25em; padding: 0px;">Two variants of Hybrid Log Float (float v1, float v2)</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">Log curve normalized to the output range (73*log2(x))</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">Gamma correction curve at γ = 2.4 normalized to the input range (x ** (1/2.4) * 18) and the output range (x ** (1/2.4) * 57).</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">Linear normalized to the input range (x / 16) and the output range (x).</li>
</ul>
In the plot, the x axis is the input range, and the y axis is the output. A steeper slope indicates better color resolution. Clipping happens when the curve goes off the chart.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5V5flIUkSHnCyAYGtrpAWqPcGJlniU0ASGz2EAziUzSAxFDqnyyYpG0Q7YgbVdit4OnbFFiIWfoe-uBZ8NR9jSDKNu4VnBT1X17rCfFjWT4qDalBO6pO2n3oKeHgpW2Wn3KbwDzboHcI/s1600/comparison.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1124" data-original-width="1600" height="448" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5V5flIUkSHnCyAYGtrpAWqPcGJlniU0ASGz2EAziUzSAxFDqnyyYpG0Q7YgbVdit4OnbFFiIWfoe-uBZ8NR9jSDKNu4VnBT1X17rCfFjWT4qDalBO6pO2n3oKeHgpW2Wn3KbwDzboHcI/s640/comparison.png" width="640" /></a></div>
<br />
It may seem that log provides too much value resolution for shadow, and gamma 2.4 seems to provide just the right amount. But if we zoom into the input range 0-256, we can see that both log and gamma use much wider output range than the input range, resulting in wasted bits and more difficult noise reduction since the noise is amplified. Both Hybrid Log Float variants follow the same slope as linear x (normalized to output) at this scale, so we don't have the noise amplification problem.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhz9XwSUzXNumcGCLVPIAjbI9NjAnY5K-zg99APneiW41E6kNJlo1ALh26yTrgm895xsj2-qUDQcqrDTRFRhOVWnToiHmqaVXP2bZFgfYS4VyZfDSYGKFm6b4oz3d0crWdaJR7oZMf6B4/s1600/comparison-zoom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1124" data-original-width="1600" height="448" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhz9XwSUzXNumcGCLVPIAjbI9NjAnY5K-zg99APneiW41E6kNJlo1ALh26yTrgm895xsj2-qUDQcqrDTRFRhOVWnToiHmqaVXP2bZFgfYS4VyZfDSYGKFm6b4oz3d0crWdaJR7oZMf6B4/s640/comparison-zoom.png" width="640" /></a></div>
<br />
Overall, Hybrid Log Float has the following benefits:<br />
<ul style="line-height: 1.4; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="margin: 0px 0px 0.25em; padding: 0px;">It does not waste output bits in the deep shadow, so it doesn't have the noise amplification problem that log or gamma may suffer.</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">Good midtone resolution.</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">It resolves highlights similarly like log for greater dynamic range.</li>
<li style="margin: 0px 0px 0.25em; padding: 0px;">The alternative second variant of Hybrid Log Float provides better midtone and highlight resolution than the first variant.</li>
</ul>
<div>
Also, most importantly, Hybrid Log Float provides bit-accurate and efficient decoding back to the linear colorspace, so color grading can be done most accurately.</div>
<div>
<br /></div>
<div>
Last but not least, while I would like to see cameras record to Hybrid Log Float when using a legacy 10-bit codec, ultimately I would like to see native 16-bit codecs so the post-processing does not have to waste precious machine cycles realigning the awkward 10-bit integers.</div>
Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-15781737479319083962018-05-30T23:47:00.000-04:002020-03-19T01:29:50.114-04:00Passwordless Debian LoginI run Debian inside a VirtualBox VM and <a href="https://lifecs.likai.org/2014/04/virtualbox-shared-folder-linux-notes.html">share my home folder</a> on the host OS, and now I've decided that it's silly to have to enter my password inside the VM all the time. Mainly, I want to auto-login to the console tty, and also convince sudo to let me run commands without entering a password.<br />
<br />
I thought this should be as straightforward as googling for a recipe. It turns out the instructions I found on the internet are just wrong and truly awful.<br />
<br />
First, to configure auto-login on Debian running systemd the <i>correct</i> way, run <code>systemctl edit getty@tty1</code> and enter the following:
<br />
<pre>[Service]
ExecStart=
ExecStart=-/sbin/agetty --autologin USERNAME %I $TERM
</pre>
Replace USERNAME with your actual username. Here are the ways that <a href="https://www.linuxquestions.org/questions/debian-26/how-to-enable-automatic-login-on-debian-8-jessie-for-console-4175550342/">this page</a> got it wrong: on Debian, there is no <code>/usr/bin/agetty</code>; it is found under <code>/sbin/agetty</code>. Also, the dangling <code>I</code> in the ExecStart line should have been <code>%I</code> which is the <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html">systemd expansion</a> for the instance name. Baud rate should be omitted.<br />
<br />
If something goes wrong, just switch to tty2 with Alt-F2 (or Fn-Option-F2 on a Mac) and fix the problem. If you have a touch bar, configure it to <a href="https://www.imore.com/how-make-function-keys-default-touch-bar-display">always show function keys</a>. Otherwise, rebooting with the Debian installer CD into rescue mode always works.<br />
<br />
Now, to convince sudo to let me run commands without a password, run <code>visudo</code> and make sure <b>we swap the user and group specifications</b> with an end result like this:<br />
<pre># Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
# User privilege specification
root ALL=(ALL:ALL) ALL
myuser ALL=(ALL:ALL) NOPASSWD: ALL
</pre>
The space between <code>NOPASSWD:</code> and <code>ALL</code> is optional, and it works either way. What is tricky is that the statements below override the ones above. This is counter-intuitive for an access control list, which typically matches top-down. By default, the group specification appears below the user specification, so the group overrides the user, but that makes no sense. It makes more sense for the individual user setting to override the group one.<br />
<br />
I don't have high hopes for a man page that begins with a <a href="https://www.sudo.ws/man/1.8.15/sudoers.man.html">quick guide to EBNF</a>.
<br />
<br />
<b>New (2020/03/19)</b>: tell systemd to use sulogin --force when the root account is locked. If a service fails to start during init, systemd would fall back into rescue mode by running sulogin, but sulogin would print the following message, and systemd would enter an infinite loop.<br />
<pre>Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Press Enter to continue.
</pre>
Unfortunately this is very common. Either a new kernel does not have the vboxsf module compiled for it (since the VirtualBox guest additions modules are not managed by DKMS), or a new VirtualBox version changed the protocol so the existing vboxsf module couldn't talk to the host.<br />
<br />
The <a href="https://github.com/systemd/systemd/commit/33eb44fe4a8d7971b5614bc4c2d90f8d91cce66c">solution</a> is to convince systemd to use sulogin --force. Run <code>systemctl edit rescue.service</code> and enter the following:
<br />
<pre>[Service]
Environment=SYSTEMD_SULOGIN_FORCE=1
</pre>
This should automatically drop down to root if something goes wrong.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-66861362279615149832018-04-19T19:33:00.000-04:002018-04-19T19:47:30.839-04:00My Experience with AWS CloudFront and Lambda@EdgeI've got a few classic Google Sites with custom domains, and now I want to enable TLS for them. I thought this would be easy; after all, both <a href="https://cloudplatform.googleblog.com/2017/09/introducing-managed-SSL-for-Google-App-Engine.html" target="_blank">App Engine</a> and <a href="https://support.google.com/blogger/answer/6284029?hl=en" target="_blank">Blogger</a> now generate managed TLS certificates for custom domains. I thought classic Google Sites would be the same, but I was wrong. Not only is this not supported, but some guy over at the <a href="https://productforums.google.com/forum/#!topic/sites/E6YWO82Jgn0" target="_blank">production support forum turned indignant</a> toward people seeking help like me.<br />
<br />
The basic idea is to use a CDN to terminate TLS. There are many CDN providers, such as <a href="https://aws.amazon.com/cloudfront/">AWS CloudFront</a>, <a href="https://cloud.google.com/cdn/docs/">Google Cloud CDN</a>, and <a href="https://www.cloudflare.com/">CloudFlare</a>, and in theory it shouldn't matter which one you use.<br />
<br />
AWS CloudFront worked well for my AWS S3 bucket <a href="https://lifecs-static.likai.org/">lifecs-static.likai.org</a> which hosts the static assets for this blog. An S3 bucket can be accessed over HTTPS only when its name isn't a custom domain, because the wildcard certificate only covers the immediate *.s3.amazonaws.com level but not further subdomains. Fortunately, CloudFront is able to serve from an S3 bucket directly. For other destinations, you have to specify the origin protocol. My <a href="https://hello.likai.org/">hello.likai.org</a> is hosted by <a href="https://www.nearlyfreespeech.net/">nearlyfreespeech.net</a>, which can be accessed over HTTPS as likaior.nfshost.com. They do provide a custom script to enable Let's Encrypt, but I still decided to go with CloudFront for TLS termination.<br />
<br />
One of the Google Sites that I want to enable TLS for is <a href="http://cs.likai.org/">cs.likai.org</a>, which is a classic Google Site. The site can also be accessed over HTTPS as <a href="https://sites.google.com/a/likai.org/cs">sites.google.com/a/likai.org/cs</a>, but when accessed in this manner, the links generated in the page have the wrong base URI (the full sites.google.com etc.) which won't work when served over a custom domain. On the other hand, Google Sites refuses to serve HTTPS for custom domains, so the origin protocol would have to be plain HTTP. Even so, there is another problem: style sheets and images have HTTP links as well, so when naively served over HTTPS, the browser's mixed content policy would block those resources and break the rendering of the site.<br />
<br />
As I was playing around, I noticed that we can use AWS Lambda with CloudFront to rewrite requests and responses, also known as <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html">Lambda@Edge</a>. This can be used to rewrite the links in the page to the preferred scheme-less form. For example, <a href="http://example.com/foo/bar">http://example.com/foo/bar</a> would become //example.com/foo/bar, which means that when the base document is served over HTTPS, the link to example.com will also be HTTPS, but the link would be HTTP for a base document served over HTTP. Of course, since you can rewrite the response body with Lambda, I could have used <a href="https://sites.google.com/a/likai.org/cs">https://sites.google.com/a/likai.org/cs</a> as my origin so I could have end-to-end encryption, and trimmed the base URI from the response.<br />
<br />
Before, cs.likai.org was a CNAME for ghs.google.com. Obviously, I could not tell CloudFront to both use cs.likai.org as the origin and point cs.likai.org as a CNAME back to CloudFront, as this would create an infinite request loop, so I created a new mapping cs-http.likai.org in the <a href="https://support.google.com/a/answer/2518318?hl=en">Google Apps admin console</a>. The plan is to serve cs.likai.org with CloudFront, which fetches cs-http.likai.org over unencrypted HTTP. It would have been easier if CloudFront let me use ghs.google.com as the origin and override the <code>Host</code> header, but it wouldn't let me.<br />
<br />
I wrote the following script for the Lambda (feel free to adapt it for your own use), to be associated as a CloudFront origin-response handler.<br />
<pre class="prettyprint">'use strict'
const cheerio = require('cheerio')
const url = require('url')
function stripLinkProtocol(attrname, elem) {
const $ = cheerio
// cheerio uses htmlparser2 and domhandler internally, so elem is a
// domhandler DOM-like object.
const link = $(elem).attr(attrname) // $.attr() is O(n), beware!
const parsed = url.parse(link)
if (!parsed || !parsed.protocol)
return
const newlink = link.substr(parsed.protocol.length)
$(elem).attr(attrname, newlink) // $.attr() is O(n), beware!
}
function stripHtmlLinks(html) {
const $ = cheerio.load(html)
$('[href]').each((i, elem) => { stripLinkProtocol("href", elem) })
$('[src]').each((i, elem) => { stripLinkProtocol("src", elem) })
$('form').each((i, elem) => { stripLinkProtocol("action", elem) })
return $.html()
}
exports.handler = (event, context, callback) => {
const response = event.Records[0].cf.response;
if (response.status == 200 &&
response.headers['content-type'] &&
response.headers['content-type'][0].value.startsWith('text/html') &&
response.body) {
response.body = stripHtmlLinks(response.body)
response.headers['x-lambda-edge'] = [
{'key': 'X-Lambda-Edge', 'value': 'html-strip-link-protocol'}
]
}
callback(null, response)
}
</pre>
But deploying it wasn't as straightforward, so here are some highlights of the surprises I encountered along the way:<br />
<ul>
<li>Lambda@Edge only supports NodeJS 6.10 (as of April 19, 2018) even though AWS Lambda in general supports other runtimes.</li>
<li>The IAM role for Lambda@Edge needs additional <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-edge-permissions.html">trust relationship</a> (but not the IAM permissions, which are for associating the Lambda with CloudFront).</li>
<li>Any external dependencies (such as cheerio) must be prepackaged and uploaded as a naked zip file (added files are not contained in a root folder). File permissions in the zip has to be world readable, or you will get "EACCES: permission denied, open '/var/task/index.js'" error. The extra dependencies can be fetched easily with "npm install ..." command. The main Lambda entry point must be called index.js.</li>
<li>You have to publish a version of the Lambda before you can associate it with CloudFront. After association, this version can't be deleted until all replicas have been disassociated.</li>
<li>To test the lambda without affecting what's live, create a new CloudFront distribution specific for testing, then attach the testing Lambda version to it. Do note that CloudFront updates are very slow (can take a few hours).</li>
</ul>
<div>
I also created a unit test, which helps with trying out different things.</div>
<pre class="prettyprint" style="white-space: pre-wrap;">{
"Records": [
{
"cf": {
"config": {
"distributionId": "EXAMPLE"
},
"response": {
"status": "200",
"headers": {
"content-type": [
{
"value": "text/html; charset=utf-8",
"key": "Content-Type"
}
]
},
"statusDescription": "OK",
"body": "<link rel='stylesheet' type='text/css' href='http://www.gstatic.com'> <a href='http://example.com/foo/bar'>"
}
}
}
]
}
</pre>
For some reason, despite the test passing and CloudFront having apparently finished deploying with no errors, I still wasn't able to see any evidence that my Lambda ran. The links were left as they were, and there was no indication of the <code>X-Lambda-Edge</code> response header I inserted in my code.<br />
<br />
And then I discovered this fine print in <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-updating-http-responses.html">Updating HTTP Responses in Origin-Response Triggers</a> buried under the mountains of documentation.<br />
<blockquote class="tr_bq">
When you’re working with the HTTP response, note that Lambda@Edge does not expose the HTML body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty.</blockquote>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://i.ytimg.com/vi/9S9vP2inD_U/maxresdefault.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="800" height="180" src="https://i.ytimg.com/vi/9S9vP2inD_U/maxresdefault.jpg" width="320" /></a></div>
<br />
I feel that AWS Lambda@Edge is similar to App Engine, in the sense that if you wanted, you could write the whole server in Lambda, which has access to backend storage like DynamoDB and S3, and AWS takes care of replication and scaling. But the Achilles heel of AWS is the slow replication.<br />
<br />
I should have created an App Engine app to front my classic Google Sites instead; then I would get HTTPS for free.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com2tag:blogger.com,1999:blog-4401555280825766585.post-70522193662239345282018-01-19T01:04:00.001-05:002018-04-17T01:24:47.851-04:00Embedding a Binary Search Tree in an ArrayRecently I started researching what would be the most efficient data structure to represent an IP address lookup table. An IP address table is useful for making routing decisions or access control, which checks whether a single IP address is a member of a range of netblocks. A netblock is represented by a base address and a prefix length, e.g. 192.168.42.0/24 represents a range of IP addresses from 192.168.42.0 through 192.168.42.255. A hash table is not suitable for this purpose because hashing only supports exact matches. With the given example, this would require entering all 256 addresses in the range into the hash table, which is a significant blowup.<br />
<br />
Hardware routers use <a href="https://en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs">Ternary Content Addressable Memory</a>, which can tell whether a specific value matches anything in the memory after applying a bitmask. A software implementation might use a prefix tree (<a href="https://en.wikipedia.org/wiki/Trie">trie</a>), which theoretically deduplicates common prefixes among the entries, resulting in a smaller table. However, any data structure employing pointers has the added overhead of the pointers (a machine word each) and cache locality problems, which are not considered in theoretical papers.<br />
<br />
The most compact data structure for the vast majority of lookup tables is going to be just an array of IP netblocks. For an unsorted array, lookup takes \( O(n) \) time, which will be slow for a big table. If we presort the array and use binary search, the lookup takes \( O(\lg n) \) time. But <a href="https://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathological-case-for-caches/">binary search is a pathological case for cache locality</a> because lookups tend to jump back and forth until they settle in on a target. Rather than binary search, which partitions the candidates into two parts, it has been shown that ternary search (three partitions) improves performance, while quaternary search (four partitions) improves little over binary search.<br />
<br />
It had occurred to me that cache locality for binary search could be improved if we simply rearranged the presorted array to become an embedding of a binary search tree (BST), such that for the node at index \( i \), its left and right children are at \( 2i + 1 \) and \( 2i + 2 \) respectively. The scheme is inspired by how a <a href="https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation">binary heap is embedded in an array</a>, but we are embedding a complete BST. <b>The rationale is that the search always occurs left to right in the array, with the most accessed items located at the beginning of the array, which gets the most cache hits</b>.<br />
<br />
Here is an example showing how a presorted array with the number sequence \( 0 \dots 9 \) could be embedded.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh97I8g30T-bFG8NG8vtpxejzXtUZM6gYKk06tVCb65rCGvX37-k2q7-fVLEef2hzLdAkepdEXnwL59uZci5drvDGW17QxSa6ZUMy1U7ZdrCNTZPwVr4HnYXCwCYo9HQwDnlgh7LhaCv1A/s1600/Embed+Binary+Search+Tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="405" data-original-width="453" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh97I8g30T-bFG8NG8vtpxejzXtUZM6gYKk06tVCb65rCGvX37-k2q7-fVLEef2hzLdAkepdEXnwL59uZci5drvDGW17QxSa6ZUMy1U7ZdrCNTZPwVr4HnYXCwCYo9HQwDnlgh7LhaCv1A/s320/Embed+Binary+Search+Tree.png" width="320" /></a></div>
<br />
Note that this embedding works for any presorted array. The values for \(i\) are the indices of the BST embedding, and the values for \(x\) are the indices in the presorted array. A presorted array can always be represented by a complete BST, so the array is dense. However, we can't embed a prefix tree this way because a prefix tree is sparse.<br />
<br />
The implementation strategy is to presort the array, create a new array, and copy the items from the presorted array to the new array in the order of the BST embedding. The preprocessing only needs to be done once, and it results in the most compact representation of the lookup table. The assumption is that once the table is preprocessed, it won't need to be changed again.<br />
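<br />
A minimal Python sketch of that preprocessing and the corresponding lookup might look like this (the benchmark below used Go; this Python version is just for illustration). The trick is that an in-order walk over the implicit tree indices visits the nodes in sorted order, so we can pour the presorted items into the output array during the walk:<br />
<pre class="prettyprint lang-py">def embed_bst(presorted):
    """Rearrange a presorted list into the array embedding of a complete BST."""
    n = len(presorted)
    out = [None] * n
    items = iter(presorted)

    def fill(i):
        # Node i has children at 2i+1 and 2i+2, so an in-order traversal
        # (left, node, right) assigns the presorted items in BST order.
        if i >= n:
            return
        fill(2 * i + 1)
        out[i] = next(items)
        fill(2 * i + 2)

    fill(0)
    return out

def bst_lookup(table, key):
    """Search the embedded BST, moving only left to right in the array."""
    i = 0
    while i < len(table):
        if key == table[i]:
            return True
        i = 2 * i + 1 + (key > table[i])  # False picks 2i+1, True picks 2i+2
    return False
</pre>
For the presorted sequence \( 0 \dots 9 \) above, embed_bst(list(range(10))) produces [6, 3, 8, 1, 5, 7, 9, 0, 2, 4].<br />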
<br />
I implemented the embedded BST in Go and compared its lookup performance with the builtin <a href="https://golang.org/pkg/sort/#Search">sort.Search()</a>, which performs binary search on a presorted array. The runs are repeated for increasing array sizes in powers of 2 as well as stride widths (both in terms of the number of machine words, not bytes). The items to be searched are incremented by the stride width for each iteration; a small stride approximates sequential lookup, whereas a large stride approximates random lookup. Each data point is the average after running some 2,000,000 to 100,000,000 iterations (the number of iterations varies due to the idiosyncrasies of the Go benchmark framework, not my fault; it tries to run each case for about 5 seconds).<br />
<br />
The folly of this is that sequential lookup (smaller stride widths) may not cover a significant portion of a very large array size \(2^n\) where \(n \ge 20\), but the experiment is fair because (1) both algorithms are subject to the same conditions, and (2) an array size above \(2^{20}\) already exceeds the last-level cache size on the machine running the experiment, which is sufficient to demonstrate the performance difference due to cache locality. The performance numbers are collected on a <a href="https://everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i5-2.4-13-late-2013-retina-display-specs.html">late 2013 model of MacBook Pro 13"</a> whose <a href="https://ark.intel.com/products/75990/Intel-Core-i5-4258U-Processor-3M-Cache-up-to-2_90-GHz">Core i5 "Haswell" I5-4258U</a> CPU runs at 2.4 GHz and has a 32KB L1 data cache and a 3MB shared last-level (L3) cache. This machine has 8GB DDR3 RAM. The program is compiled as a 64-bit binary.<br />
<br />
<iframe height="600" src="//lifecs-static.likai.org/binary_bench_result.html" width="100%"></iframe>
<br />
[<a href="//lifecs-static.likai.org/binary_bench_result.html" target="_blank">popup figure in a new window</a>]<br />
<br />
You can use the drop-down menus to highlight the series for a specific algorithm (binary search, embedded BST) to see how they perform at various stride widths, or to highlight specific stride widths to compare how the two algorithms perform.<br />
<br />
Here are some observations.<br />
<ul>
<li>At small array sizes \(2^n\) where \(n \le 7\), all lookups fit in the L1 cache. The two algorithms have identical performance at all stride widths.</li>
<li>From array sizes \(2^7\) to \(2^{12}\), the L1 misses start to happen. Array size \(2^{12}\) occupies 32KiB, the size of L1 cache.</li>
<li>Array sizes \(2^{12}\) through \(2^{18}\) still fit within the 3MB last-level cache, after which the performance starts skyrocketing at the largest stride width (which simulates random access). Even so, embedded BST held up much better than binary search.</li>
<li>At small strides, the embedded BST has more predictable performance than binary search across all array sizes.</li>
<li>The embedded BST performs at least as well as binary search across all array sizes and strides.</li>
</ul>
Admittedly, the embedded BST is optimized for lookup. After the initial preprocessing, we don't try to insert or remove entries from the table. If a modifiable table is desired, we can relax the density of the array so that the BST only needs to be approximately balanced. This allows us to use any classical balanced BST algorithm, like a <a href="https://en.wikipedia.org/wiki/Red%E2%80%93black_tree">red-black tree</a> or an <a href="https://en.wikipedia.org/wiki/AVL_tree">AVL tree</a>. It may be worthwhile to investigate the cache performance of array-embedded versions of these algorithms as future work.<br />
<br />
Some IP address lookup use cases require inserting and deleting individual addresses, such as for the purpose of intrusion detection. Since they only need to match individual addresses rather than netblocks, a hash table would suffice.<br />
<br />
Otherwise, embedded BST is a great data structure for immutable IP address range lookup tables that are initialized once and optimized for lookup. It has the most compact representation and much improved cache locality over binary search.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-54612181833882863712018-01-11T00:51:00.001-05:002018-01-18T20:41:31.865-05:00C++, the cause and solution to life's pimpl problems<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCxAdJ9HM26EbzhDH7VUKRIk9CPFa4hVzKnzpAIkfpRfJAgOdDLMc8If-SEnl4p0EswhUyMjP6zlpU7uOs1Aa3ewxQ4mXZpTsERmqP3Bt-FTxZh_cfV3PfhGYfxrZo4FqdUOzu5u-F2cs/s1600/output.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="480" data-original-width="640" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCxAdJ9HM26EbzhDH7VUKRIk9CPFa4hVzKnzpAIkfpRfJAgOdDLMc8If-SEnl4p0EswhUyMjP6zlpU7uOs1Aa3ewxQ4mXZpTsERmqP3Bt-FTxZh_cfV3PfhGYfxrZo4FqdUOzu5u-F2cs/s400/output.gif" width="400" /></a></div>
<br />
Can a programming language invite programmers to write anti-patterns that make them write more code that is also harder to follow? Certainly, it can.<br />
<br />
PIMPL ("pointer to implementation") is a much-hyped code pattern in C++ for separating a class's public declaration from its implementation detail by wrapping a pointer to the implementation object in the public class. This is what the user of the public class sees; the pointer is the only data member of the public class.
<br />
<pre class="prettyprint">// In xyzzy.h
class Xyzzy {
public:
Xyzzy(); // Constructor.
~Xyzzy(); // Destructor.
void Foo();
private:
class Impl; // Incomplete forward declaration to the implementation.
Impl *pimpl_; // Pointer to incomplete class is allowed.
};
</pre>
The implementation would be written like this, hidden away from the user.<br />
<pre class="prettyprint">// In xyzzy.c
class Xyzzy::Impl {
...
void Foo() {
// Actual code.
}
...
// All private members are declared in the Impl class.
};
Xyzzy::Xyzzy() // Constructor.
: pimpl_(new Impl) {}
Xyzzy::~Xyzzy() { // Destructor.
delete impl_;
}
void Xyzzy::Foo() {
impl_->Foo();
}
</pre>
There are some <a href="http://www.bfilipek.com/2018/01/pimpl.html">elaborate concerns</a> like constness preservation, use of smart pointers, copying and moving, memory allocation, and generalizing the boilerplate to a template, but I want to take a step back and look at what PIMPL accomplishes, and why it is necessary.<br />
<br />
The main reason for using PIMPL is to reduce compilation dependencies, so that changes made to the implementation class would not force the user of the public interface to have to recompile code. Recompilation is necessary when the change breaks binary compatibility. In C++, a change could break binary compatibility for banal reasons like adding a private member to a class.<br />
<br />
When class members are added or removed, the size of the class changes. In C++, it is the caller's responsibility to prepare memory storage for the object, regardless of whether the object is stack or heap allocated. Without recompilation, the storage prepared by old code might be too small. Storage that is bigger than necessary would not be a problem, but the compiler had no way of knowing, when the old code was compiled, how big the class would become. With PIMPL, the size of the public class is always just a single pointer to the implementation object, so it never changes. This shifts the responsibility of storage preparation from the user to the implementation.<br />
<br />
Another type of change that warrants recompilation is when virtual methods are added or removed, which changes the ordering of function pointers in the <a href="https://en.wikipedia.org/wiki/Virtual_method_table">vtable</a>. Without recompilation, the old code would end up calling the wrong virtual method because its index changed. PIMPL cannot alleviate this type of recompilation: adding or removing virtual methods on the public class would still require recompilation even under PIMPL.<br />
<br />
One might ask: instead of PIMPL, why not use an abstract base class with a static constructor? The public interface would look something like this.<br />
<pre class="prettyprint">// In xyzzy.h
class Xyzzy {
public:
static Xyzzy *New(); // Static constructor.
virtual ~Xyzzy(); // Virtual destructor.
virtual void Foo();
protected:
Xyzzy(); // Not to be constructed directly by user.
};
</pre>
And the implementation:
<br />
<pre class="prettyprint">// In xyzzy.c
class XyzzyImpl : public Xyzzy {
public:
void ~XyzzyImpl() override {
...
}
void Foo() override {
...
}
// Other members.
};
Xyzzy *Xyzzy::New() {
return new XyzzyImpl;
}
</pre>
The problem is that the abstract base class forces all methods to be virtual, and virtual method calls are more expensive because indirect calls through function pointers are harder to branch-predict. The responsibility to manage storage is shifted to the static constructor, so adding or removing members of the base class shouldn't affect binary compatibility. Even so, it still triggers recompilation in practice, so members should be declared in the implementation class only.<br />
<br />
The common theme here is that any change to a class in C++ requires recompilation, period. That's because the compiler can't tell what changed, so it can't tell whether the change may or may not affect binary compatibility. Contrast this with C, where the user of an opaque struct never has to know its size, and no virtual methods are needed.
<br />
<pre class="prettyprint">// In xyzzy.h
struct xyzzy; // Incomplete type.
struct xyzzy *new_xyzzy(); // Constructor.
void delete_xyzzy(struct xyzzy *this); // Destructor.
void xyzzy_foo(struct xyzzy *this); // Method foo().
</pre>
Although most build systems would still recompile the translation unit that includes xyzzy.h out of caution, they don't have to if the changes are binary compatible. This is why an executable compiled against an older header can still be dynamically linked with a newer version of the shared library without recompilation, and it still works.<br />
<br />
In the end, I think any effort to reduce recompilation for C++ code is futile. C++ is inherently designed for whole-program compilation. There are other reasons, like how template instantiation requires the implementation source code to be available. But any class requires the user to know its size, and one has to go to great lengths to hide that from the user in order to ensure binary compatibility.<br />
<br />
Clearly, PIMPL is an anti-pattern that is motivated by the way C++ is designed. It's a problem that C never has to worry about.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0tag:blogger.com,1999:blog-4401555280825766585.post-53531900655131032952017-09-30T11:37:00.000-04:002017-10-01T23:33:31.856-04:00How Computability Could Help Understand ProcrastinationThe weather has been too fine this summer for any sane-minded person to sit at home writing blogs, but today it's clear that summer has come to an end. I brewed myself a cup of coffee this morning to sulk at the rainy mist outside my window while reading an article about how <a href="https://theoutline.com/post/2352/apple-is-really-bad-at-design">Apple is bad at design</a> and has been since 2013, which sent me spiraling off to reminisce about an anecdote: a boss once asked me to do something that I said was technically infeasible, and he bemoaned, "you scare me when you say that." It reminded me of the introduction to the <a href="https://dl.acm.org/citation.cfm?id=574848">Computers and Intractability</a> book by Garey and Johnson, where a poor chap has to explain to his boss why he can't find a tractable algorithm for a problem.<br />
<br />
In computer science, an algorithm is intractable when even a modestly sized problem, say one that involves analyzing only hundreds of items, would take all the computing power in the universe until the end of time and still not produce an answer. More formally, such an algorithm is said to take <b>exponential</b> time to compute. If you multiply 2 by 2 by 2 and so on just 90 times, you reach a figure \(2^{90} \approx 10^{27} \) that is bigger than the <a href="http://www.physicsoftheuniverse.com/numbers.html">age of the universe</a> in nanoseconds (roughly one computing cycle of a computer); keep multiplying up to 300 times, and you reach a figure \(2^{300} \approx 10^{90} \) that is bigger than the <a href="https://www.universetoday.com/36302/atoms-in-the-universe/">number of atoms in the universe</a>. So a problem of size 400 would not be solvable even if all the atoms in the universe were to compute until the end of time.<br />
<br />
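As a quick sanity check on those figures, note that \(2^{10} = 1024 \approx 10^3\), so \(2^{90} = (2^{10})^9 \approx 10^{27}\), while the age of the universe is about \(4 \times 10^{17}\) seconds, or roughly \(4 \times 10^{26}\) nanoseconds. Likewise, \(2^{300} = (2^{10})^{30} \approx 10^{90}\), comfortably above the commonly cited \(10^{80}\) atoms in the observable universe.<br />
<br />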
Several strategies for what this poor chap might say to his boss are <a href="http://home.deib.polimi.it/amaldi/LucidiFRODE-05-06/applicazioni-NP-completezza.jpg">illustrated in the book</a>.<br />
<ul>
<li>He could say "I can't find an efficient algorithm. I guess I'm just too dumb." The boss is unimpressed.</li>
<li>He could also say "I can't find an efficient algorithm because no such algorithm is possible." But it is hard to justify why the algorithm is impossible to the still unimpressed boss.</li>
<li>Finally, the chap would say "I can't find an efficient algorithm, but neither can all these famous people." Sadly, the boss is still unimpressed.</li>
</ul>
I think the reason the boss is unimpressed throughout is that the cartoonist copied the same drawing of the boss from the previous panels, though perhaps that is the point: you can't impress a boss if he doesn't get what he wants.<br />
<br />
Or the chap could say to his boss: well, I know an efficient algorithm that will give you a figure that is generally good enough but will be off by at most a factor of 2 in the worst case. Would you take it? This is known in computer science as an <a href="https://en.wikipedia.org/wiki/Approximation_algorithm">approximation algorithm</a>.<br />
<br />
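The book doesn't say what the chap's algorithm is, but a classic example of such a factor-of-2 guarantee is the greedy approximation for minimum vertex cover: repeatedly pick an uncovered edge and put both of its endpoints into the cover. The picked edges form a matching, and any cover must include at least one endpoint of each matched edge, so the result is at most twice the optimum. Here is a minimal sketch in Go (the names are mine, for illustration only).<br />
<pre class="prettyprint">// Package approx sketches a 2-approximation for minimum vertex cover.
package approx

// VertexCover2Approx returns a vertex cover at most twice the size of
// an optimal one.  edges is a list of undirected edges {u, v}.
func VertexCover2Approx(edges [][2]int) map[int]bool {
	cover := make(map[int]bool)
	for _, e := range edges {
		u, v := e[0], e[1]
		if !cover[u] && !cover[v] { // edge not yet covered
			cover[u] = true // take both endpoints; any optimal cover
			cover[v] = true // must contain at least one of them
		}
	}
	return cover
}
</pre>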
It's not hard for common folks to understand that there are intractable problems in life, and you just have to make do with something that is good enough. You can be a perfectionist, but finding the optimal solution to a problem will keep you busily procrastinating until the end of the universe. To counter this tendency, Steve Jobs famously said "<a href="http://www.creativethinkinghub.com/steve-jobs-was-right-real-artists-ship/">real artists ship</a>!"<br />
<br />
One could argue whether Apple's design ills since his passing are due to procrastination, or whether they took anti-procrastination to an extreme and shipped premature products, but it's clear that Steve Jobs and Tim Cook would take different approaches. Whereas Tim Cook would be unimpressed when his designers tell him the problems are intractable, Steve Jobs would understand what can reasonably be accomplished within a given time frame and plan accordingly.<br />
<br />
When solving an intractable problem, one has to plan ahead: it's better to take the complete output of an approximation algorithm than to interrupt an optimal but intractable algorithm and use the incomplete output.Likai Liuhttp://www.blogger.com/profile/06372207357661600589noreply@blogger.com0