Research papers worth reading. Part 4

Hello there! Today I’m publishing the next set of research papers and articles worth reading for anyone working in IT. Links to the previous parts are at the end of the post.

David A. Wheeler. Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems

This article will try to convince you that adding some tiny limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations — so it’d be better to make these (reasonable) assumptions true in the first place. This article will discuss, in particular, the three biggest problems: control characters in filenames (including newline, tab, and escape), leading dashes in filenames, and the lack of a standard character encoding scheme (instead of using UTF-8).

http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
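
To make the three problems named in the summary concrete, here is a minimal sketch of a check for them. The function name `filename_problems` is my own invention, not something from the essay, and the check looks at a single filename component given as bytes.

```python
def filename_problems(name: bytes) -> list:
    """List which of the three problems from the summary a filename has.

    Hypothetical helper for illustration; `name` is one path component.
    """
    problems = []
    # Control characters: bytes below 0x20 (newline, tab, escape, ...) and DEL.
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("control character")
    # A leading dash makes the name look like a command-line option.
    if name.startswith(b"-"):
        problems.append("leading dash")
    # No standard encoding: the bytes may not be valid UTF-8 at all.
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems
```

For example, `filename_problems(b"-rf")` reports a leading dash, which is exactly the kind of name that a careless `rm *` would treat as an option.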

Independent Security Evaluators. Industry-wide Misunderstandings of HTTPS

Most web browsers, historically, were cautious about caching content delivered over an HTTPS connection to disk—to a greater degree than required by the HTTP standard. In recent years, in response to the increased use of HTTPS for non-sensitive data, and the proliferation of bandwidth-hungry AJAX and Web 2.0 sites, some browsers have been changed to strictly follow the standard, and cache HTTPS content far more aggressively than before. HTTPS web servers must explicitly include a response header to block standards-compliant browsers from caching the response to disk—and not all web developers have caught up to the new browser behavior. ISE identified 21 (70% of sites tested) financial, healthcare, insurance and utility account sites that failed to forbid browsers from storing cached content on disk, and as a result, after visiting these sites, unencrypted sensitive content is left behind on end-users’ machines.

http://securityevaluators.com/content/case-studies/caching/caching.pdf
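
The summary does not name the response header in question. I assume it means the standard HTTP `Cache-Control: no-store` directive, which tells caches not to store the response. A minimal sketch of applying it to a response's headers, with `with_no_store` being a hypothetical helper rather than part of any framework:

```python
def with_no_store(headers: dict) -> dict:
    """Return a copy of the response headers that forbids storing the response.

    Hypothetical helper; assumes headers are a plain name -> value mapping.
    """
    # Drop any existing Cache-Control header, whatever its capitalization,
    # so a permissive value cannot survive next to the strict one.
    result = {k: v for k, v in headers.items() if k.lower() != "cache-control"}
    result["Cache-Control"] = "no-store"
    return result
```

Pages that show account data would get their headers passed through something like this before the response is sent.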

Scott A. Crosby, Dan S. Wallach. Denial of Service via Algorithmic Complexity Attacks

We present a new class of low-bandwidth denial of service attacks that exploit algorithmic deficiencies in many common applications’ data structures. Frequently used data structures have “average-case” expected running time that’s far more efficient than the worst case. For example, both binary trees and hash tables can degenerate to linked lists with carefully chosen input. We show how an attacker can effectively compute such input, and we demonstrate attacks against the hash table implementations in two versions of Perl, the Squid web proxy, and the Bro intrusion detection system. Using bandwidth less than a typical dialup modem, we can bring a dedicated Bro server to its knees; after six minutes of carefully chosen packets, our Bro server was dropping as much as 71% of its traffic and consuming all of its CPU. We show how modern universal hashing techniques can yield performance comparable to commonplace hash functions while being provably secure against these attacks.

http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003.pdf
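
The idea of a hash table degenerating into a linked list is easy to reproduce with a toy example. The sketch below is not one of the attacks from the paper: `weak_hash` is a deliberately bad hash function I made up, and the "attacker" simply feeds it anagrams, which all get the same hash value.

```python
from itertools import permutations

def weak_hash(key):
    """Toy hash (hypothetical): sum of code points, so all anagrams collide."""
    return sum(ord(c) for c in key)

def longest_chain(keys, hash_fn, buckets=1024):
    """Length of the longest bucket chain in a chained hash table."""
    chains = [0] * buckets
    for key in keys:
        chains[hash_fn(key) % buckets] += 1
    return max(chains)

# 720 distinct keys chosen by the "attacker": every permutation of six letters.
hostile_keys = ["".join(p) for p in permutations("abcdef")]

# With the weak hash, every key lands in the same bucket.
worst = longest_chain(hostile_keys, weak_hash)
# With Python's built-in string hash, the same keys spread across buckets.
typical = longest_chain(hostile_keys, hash)
```

Here `worst` equals the number of keys, so every lookup walks one long chain. Python's built-in string hash is keyed with a per-process seed, which is a related defense rather than the universal hashing the paper proposes.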

Ulrich Drepper. Futexes Are Tricky

Starting with early version of the 2.5 series, the Linux kernel contains a light-weight method for process synchronization. It is used in the modern thread library implementation but is also useful when used directly. This article introduces the concept and user level code to use them.

http://www.akkadia.org/drepper/futex.pdf

Andreas Voellmy, Junchang Wang, Paul Hudak, Kazuhiko Yamamoto. Mio: A High-Performance Multicore IO Manager for GHC

Haskell threads provide a key, lightweight concurrency abstraction to simplify the programming of important network applications such as web servers and software-defined network (SDN) controllers. The flagship Glasgow Haskell Compiler (GHC) introduces a run-time system (RTS) to achieve a high-performance multicore implementation of Haskell threads, by introducing effective components such as a multicore scheduler, a parallel garbage collector, an IO manager, and efficient multicore memory allocation. Evaluations of the GHC RTS, however, show that it does not scale well on multicore processors, leading to poor performance of many network applications that try to use lightweight Haskell threads. In this paper, we show that the GHC IO manager, which is a crucial component of the GHC RTS, is the scaling bottleneck. Through a series of experiments, we identify key data structure, scheduling, and dispatching bottlenecks of the GHC IO manager. We then design a new multicore IO manager named Mio that eliminates all these bottlenecks. Our evaluations show that the new Mio manager improves realistic web server throughput by 6.5x and reduces expected web server response time by 5.7x. We also show that with Mio, McNettle (an SDN controller written in Haskell) can scale effectively to 40+ cores, reach a throughput of over 20 million new requests per second on a single machine, and hence become the fastest of all existing SDN controllers.

http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

Poul-Henning Kamp. A Generation Lost in the Bazaar

Thirteen years ago, Eric Raymond’s book The Cathedral and the Bazaar (O’Reilly Media, 2001) redefined our vocabulary and all but promised an end to the waterfall model and big software companies, thanks to the new grass-roots open source software development movement. I found the book thought provoking, but it did not convince me. On the other hand, being deeply involved in open source, I couldn’t help but think that it would be nice if he was right.

http://portal.acm.org/ft_gateway.cfm?id=2349257&type=pdf

Keith Winstein and Hari Balakrishnan. TCP ex Machina: Computer-Generated Congestion Control

This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint’s reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion-control algorithms to run at the endpoints.
In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm—the control rules for the independent endpoints—that tries to achieve this objective.

http://web.mit.edu/remy/TCPexMachina.pdf

SpaceX. Hyperloop Alpha

When the California “high speed” rail was approved, I was quite disappointed, as I know many others were too. How could it be that the home of Silicon Valley and JPL – doing incredible things like indexing all the world’s knowledge and putting rovers on Mars – would build a bullet train that is both one of the most expensive per mile and one of the slowest in the world? Note, I am hedging my statement slightly by saying “one of”. The head of the California high speed rail project called me to complain that it wasn’t the very slowest bullet train nor the very most expensive per mile.
The underlying motive for a statewide mass transit system is a good one. It would be great to have an alternative to flying or driving, but obviously only if it is actually better than flying or driving. The train in question would be both slower, more expensive to operate (if unsubsidized) and less safe by two orders of magnitude than flying, so why would anyone use it?

http://www.spacex.com/sites/spacex/files/hyperloop_alpha.pdf

Part 1, Part 2, Part 3, Part 4.

KISS 🇺🇦

Stop the war!
