A set of research papers to familiarize oneself with. Part 1

Greetings, everybody!

Over the last year, I came across a number of papers that are related to the IT field and interesting to examine. I’d like to share them with you. When you have some free time, take a look at them, and hopefully come back to read more.

Here, in the list, the items don’t have any particular order. For every item, I provided the authors’ names, the paper’s title, its abstract, and a link to download a copy (all of them are PDFs).

Also check out: part 2, part 3, and part 4.

Ulrich Drepper. What Every Programmer Should Know About Memory

As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques–such as CPU caches–but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

http://www.akkadia.org/drepper/cpumemory.pdf

Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.

http://research.google.com/archive/mapreduce-osdi04.pdf

David Goldberg. What Every Computer Scientist Should Know About Floating-Point Arithmetic

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with examples of how computer system builders can better support floating point.

http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf

Seth Schoen, Marcia Hofmann, Rowan Reynolds. Defending Privacy at the U.S. Border: A Guide for Travelers Carrying Digital Devices

Our lives are on our laptops – family photos, medical documents, banking information, details about what websites we visit, and so much more. Thanks to protections enshrined in the U.S. Constitution, the government generally can’t snoop through your laptop for no reason. But those privacy protections don’t safeguard travelers at the U.S. border, where the U.S. government can take an electronic device, search through all the files, and keep it for a while for further scrutiny – without any suspicion of wrongdoing whatsoever.

https://www.eff.org/sites/default/files/EFF-border-search_2.pdf

James Grimmelmann. SEALAND, HAVENCO, AND THE RULE OF LAW

In 2000, a group of American entrepreneurs moved to a former World War II antiaircraft platform in the North Sea, seven miles off the British coast. There, they launched HavenCo, one of the strangest start-ups in Internet history. A former pirate radio broadcaster, Roy Bates, had occupied the platform in the 1960s, moved his family aboard, and declared it to be the sovereign Principality of Sealand. HavenCo’s founders were opposed to governmental censorship and control of the Internet; by putting computer servers on Sealand, they planned to create a “data haven” for unpopular speech, safely beyond the reach of any other country. This Article tells the full story of Sealand and HavenCo—and examines what they have to tell us about the nature of the rule of law in the age of the Internet.
The story itself is fascinating enough: it includes pirate radio, shotguns, rampant copyright infringement, a Red Bull skateboarding special, perpetual motion machines, and the Montevideo Convention on the Rights and Duties of State. But its implications for the rule of law are even more remarkable. Previous scholars have seen HavenCo as a straightforward challenge to the rule of law: by threatening to undermine national authority, HavenCo was opposed to all law. As the fuller history shows, this story is too simplistic. HavenCo also depended on international law to recognize and protect Sealand, and on Sealand law to protect it from Sealand itself. Where others have seen HavenCo’s failure as the triumph of traditional regulatory authorities over HavenCo, this Article argues that in a very real sense, HavenCo failed not from too much law but from too little. The “law” that was supposed to keep HavenCo safe was law only in a thin, formalistic sense, disconnected from the human institutions that make and enforce law. But without those institutions, law does not work, as HavenCo discovered.

http://illinoislawreview.org/wp-content/ilr-content/articles/2012/2/Grimmelmann.pdf

Rob Pike. Structural Regular Expressions

The current UNIX® text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the ‘shape’ of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell pipelines.

http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

Michael H. Fischer, T. J. Purtell, Monica S. Lam. Email Clients as Decentralized Social Apps in Mr. Privacy

This paper proposes Mr. Privacy, a social application framework built on top of email, that encourages open competition and provides privacy for users. Applications built on Mr. Privacy are “social apps” that look nothing like email. Email is used only as a transport mechanism and personal database. We choose email because it is more pervasive than any social network and it uses standardized open protocols enabling inter-operability across vendors. Consumers can pick their email providers or even host their own servers. We have developed a prototype Mr. Privacy platform for the Android, iOS, and Firefox. On top of Mr. Privacy, we created applications which share GPS locations, music playlists, and contextual discussions of websites. Preliminary results suggest that the email protocols suffice for building these kinds of social applications. This model supports data privacy and ownership, and facilitates interoperability and competition.

http://mobisocial.stanford.edu/papers/hotpets11.pdf

Kathleen Nichols, Van Jacobson. A modern AQM is just one piece of the solution to bufferbloat

This article aims to provide part of the bufferbloat solution, proposing an innovative approach to AQM suitable for today’s Internet called CoDel (for Controlled Delay, pronounced like “coddle”). This is a “no-knobs” AQM that adapts to changing link rates and is suitable for deployment and experimentation in Linux-based routers (as well as silicon).

http://portal.acm.org/ft_gateway.cfm?id=2209336&type=pdf

Part 1, Part 2, Part 3, Part 4.

PS. How many purple links do you have? :)

KISS 🇺🇦

Stop the war!

A set of research papers to familiarize oneself with. Part 1

Comments