Stuff The Internet Says On Scalability For January 4th, 2019 - High Scalability

Friday, January 4, 2019 at 9:03AM

Wake up! It's HighScalability time:

Solar system? Nope, the beauty is in your head—neural art.

Do you like this sort of Stuff? Please support me on Patreon. Need cloud? Explain the Cloud Like I'm 10 (34 almost 5 star reviews).

  • 45%: learned scheduler improves average job completion time; 61%: apps share data with Facebook; 45,037,125: people who watched Bird Box on Netflix in first week; 32,368: color images collected by the Curiosity rover on Mars between August 2012 and November 2018; $36.1B: AI healthcare market by 2025; 20%: object recognition failure in light rain; 350: pages in Donald Knuth's new book; $1,000: price needed to deactivate Facebook account; $3B: Epic Games profit; 1: bitcoin mined from body heat of 44,000 people.

  • Quotable Quotes:
    • @Hannah_Chutzpah: What are the technical terms, in your field, for 'dunno'? In medicine there's 'idiopathic'. In archeology/anthropology there's 'ritual purposes'. How do you professionally term 'we haven't got a clue'?
      • @peterseibel: In programming we don't have a special term for it, just like fish don't have a special term for 'water'.
    • @robn: everyone seems to want stateless services and stateless protocols but I have three kids that can't remember shit I told them ten seconds ago and believe me it's the worst and I don't know why you'd want that
    • @stranter: "A machine learning agent intended to transform aerial images into street maps and back was found to be cheating by hiding information it would need later in “a nearly imperceptible, high-frequency signal" ||  Sounds like the algo needs some parenting
    • @pmddomingos: Overheard at Stanford: trying to run a university in Silicon Valley is like trying to run a monastery in Gomorrah.
    • @scalzi: So, things I've learned in the last few days: 1. Sometimes 15GB+ updates to video games are in fact necessary and unavoidable 2. Devs mostly hate 'em too but right now there's not much to be done about it 3. I still hate it but UGH OKAY FINE I guess
    • @copyconstruct: "GPUs will increase 1000× in performance by 2025, whereas Moore’s law for CPUs essentially is dead. By replacing branch-heavy algorithms with neural networks, the DBMS can profit from these hardware trends."
    • @hotjamz: he's makin a list / it's kept in plain text / an elf just got phished / you know what is next / santa claus has leaked 2.2 billion user's names, addresses, naughty/nice statuses, and the times they are asleep in their homes
    • @Nick_Craver: Bugs used to keep me up - they don't anymore. You know what does? Any bad interaction with a person. If we're talking past each other or either is left feeling hurt or frustrated, *that* keeps me up. Bugs? Meh, whatever. There will be another one tomorrow and again the next day.
    • @samcharrington: I timed the process of getting it up and running on Lambda with Zappa. IT TOOK JUST UNDER 15 MINUTES! Half of this was messing with IAM and API Gateway again, since I set it up in a different AWS account. Half of the other half was some basic testing. Wow! /7 It BLEW MY MIND how easy it was to migrate the app from Heroku to Lambda! I've still got a bit of work to do to understand operability (logging and error handling mostly), but I expect to shut down the Heroku Dyno very soon and never look back for these kinds of apps. /8 Heroku / PaaS made deploying these kinds of apps (basically REST APIs) very easy, and cheap, for me, which created a bit of a blind spot when it came to #serverless. But scaling on Heroku gets pricey very quickly. /9
    • makecheck: I find it hard to blame anyone for leaving any [Apple App] store when there has been almost nothing of value added in years and things are even regressing. That 30% should have gone a lot further; after 10 years, I can’t even use two different search-and-sort criteria at once to find new apps (and I refuse to sit there and scroll through offensively-bad lists of useless results, much less buy them).
    • @uxresearch: Me ten years ago, on seeing a poorly designed interface “Wow, what idiot designed this?” Me, today: “What constraints were the team coping with that made this design seem like the best possible solution?” Empathy trumps fundamental attribution errors.
    • @rygorous: Work gets farmed out to whatever cluster machine currently has spare time. So even if your light mapping has no randomization, the same chunk of world might get run on different machines with different GPUs and different shader compilers on subsequent runs. That causes a ripple effect where again everything slightly changes (at the bitwise level) even when the result ends up visually indistinguishable to what it was before.
    • krastanov: There is no evidence that quantum annealing (what D-Wave does) is any better than classical computers. There is a lot of evidence that quantum computers (the gate model) or, equivalently, quantum adiabatic computing is better than classical computing. All of it is based on a family of conjectures about the complexity classes P, BQP, and NP. Scott Aaronson's blog is one of my go-to suggestions for rigorous introduction to the topic.
    • Baltes: Having more experience with a task does not automatically lead to better performance. Research has shown that once an acceptable level of performance has been attained, additional “common” experience has only a negligible effect, in many domains the performance even decreases over time. The length of experience has been found to be only a weak correlate of job performance after the first two years.
    • flexer2: After a couple emergencies where communication was identified as a big issue, they proposed moving back to the old radio system, but they had already sold off the frequencies and dismantled the infrastructure. My dad retired not long after this, but the “corporate bean counter” trope rang quite true here, and in the long run we were all a little less safe because some executive with no field experience wanted to make a name for themself by saving a little money on something that proved to be mission critical.
    • brg1007: So if I understood the concept well, this type of amoeba is like an ASIC for the traveling salesman problem. Interesting concept, to find bio-organisms that can be used very efficiently to "solve" a certain type of problem.
    • throwawaymath: Quantum supremacy is not merely about quantum computers performing tasks better than classical computers. It's about quantum computers achieving a superpolynomial speedup over classical computers, such that classical computers can't feasibly perform the task in a reasonable amount of time for all inputs. That's an important distinction because it's significantly more difficult to achieve a fundamental asymptotic improvement instead of an iterative speedup for the same factor. If a quantum computer completes a task with complexity O(2n) that a classical computer requires O(10n) to complete, you don't have quantum supremacy. If your quantum computer can accomplish a task in O(2n) that your classical computer needs O(2^n) to perform, you've got supremacy.
    • user5994461: Performance issues should be the least of your concern. The docker daemon and container simply hung because of filesystem issues on CentOS 6. I worked at a company that was dockerizing their stateless services, then planning to dockerize their cassandra databases. Multiple contractors involved. Stateless services failed periodically because of the above issue. Load balancers can failover automatically, broken nodes are rebooted from time to time, limited impact. No one cared, just a daily deployment routine.
    • softwaredoug: For those terrified of an AWS dominated future, projects like this are crucial. The closer we can get to OSS based push button open source DB cluster in any cloud, the less we need fear AWS will host everything and lock us in to a walled garden of closed source AWS systems.
    • George Dyson: There is now more code than ever, but it is increasingly difficult to find anyone who has their hands on the wheel. Individual agency is on the wane. Most of us, most of the time, are following instructions delivered to us by computers rather than the other way around. The digital revolution has come full circle and the next revolution, an analog revolution, has begun. None dare speak its name...The next revolution will be the ascent of analog systems over which the dominion of digital programming comes to an end. Nature’s answer to those who sought to control nature through programmable machines is to allow us to build machines whose nature is beyond programmable control.
    • Tony Barboza: What first arose as a server outage was identified Saturday as a malware attack, which appears to have originated from outside the United States and hobbled computer systems and delayed weekend deliveries of the Los Angeles Times and other newspapers across the country.
    • @dakami: Serverless is not someone else managing your servers.  Serverless is you agreeing that your code doesn't get to depend on long term implicit state.  It's not really no physical servers, it's no daemons.
    • Subbu Allamaraju: Tomorrow’s serverless offerings will likely be “frameworks as services”, with “function as a service” being just one possibility to solve a certain class of stateless programming problems.
    • @codinghorror: "The creators of the game had no idea that the player base would love to kill, not just the evil creatures like bears and wolves, but also the innocent creatures."
    • @migueldeicaza: This goes to the core of why I dislike abstractions and interfacitis.   People are designing software as [email protected] should eventually allow a plugin to send e-mail, emulate Emacs, plug a GPU pipeline, use the blockchain to store its config settins and deploy a sort fn over kubernetes
    • Pete Foley: In August of 1987 my supervisor, Walt Peschke, told me of an opportunity to join a pirate outfit headed by Steve Sakoman that was building a new portable computer – and I jumped at the chance to be one of the first members of the Newton team, see:  APPLE-NEWTON
    • Ian Cutress: The long and short of matters then is that based on the testing we've done thus far, it doesn't look like Coffee Lake Refresh recovers any of the performance the original Coffee Lake loses from the Meltdown and Spectre fixes. Coffee Lake was always less impacted than older architectures, but whatever performance hit it took remains in the Refresh CPU design.
    • @SeshuAd: Great articulation of all the confusion surrounding serverless. One other dimension to add would be services such as Redshift migrating to serverless models and the benefits with fine grained multi tenancy.
    • @GossiTheDog: After a 50 hour outage at 15 datacenters across the US — impacting cloud, DSL, and 911 services — CenturyLink say the outage is fixed, and was caused by a single network card sending bad packets (they’ve since applied bad packet filtering).
    • @davidwhogg: Thesis: The creation of the arXiv has had a bigger impact on physics than any discovery made since 1990.
    • @stevecheney: Amazon has over 10,000 employees in its Alexa group. That’s the size of about *FIVE HUNDRED* series A backed startups. Absolutely absurd. What do they do over there? To put in perspective, at a fully loaded cost of $300K per employee, that’s $3B per year...in payroll. More than Sequoia, a16z and Greylock will deploy in their current funds over 5+ years!
    • @IEEESpectrum: Taking stock of the year’s worst IT failures: In February, an error in a supplier’s logistics management system forced KFC to close 470 stores for several days because it didn’t have any chicken to fry. 🐔
    • @rodneyabrooks: On this day in 1903 was born John von Neumann, the greatest polymath of the 20th century. We are all using "von Neumann architecture" chips to read this tweet. Game theory, ergodic theory, Artificial Life, quantum mechanics, hydrodynamics, statistics, etc. He was an immigrant.
    • @sam_ferree: I’m calling for an end to holy war against code duplication. We convince young developers and engineers that it’s the worst thing ever, when time teaches all of us that it is, the vast majority of the time, duplication preferable to dependency.
    • @davidgerard: this is precisely why Bitcoin was always doomed to centralisation: capitalism is *ruthless* about optimising out financial inefficiency, and Bitcoin tried to do a judo trick on the nature of capitalism. Failed in 5 years.
    • Maximilian Fiege: “Bitcoin maximalists argue that the energy consumption of Proof-of-Work provides the network with a security moat. In reality, it introduces political risk because of the highly centralized nature of energy infrastructure and markets.”
    • @MaxCRoser: One of my favorite projects in 2018 was to make this video with @Kurz_Gesagt: the transition from zero-sum economic stagnation to positive-sum growth was likely the biggest change to society ever. Others turned from rivals into potential partners.
    • Adrian Thompson: The core principle that these ideas approach is to look for an efficient composition of electronic components selected from a set of physical (not abstract) resources, such that their coupled natural behaviours collectively give rise to the required overall system behaviour. In this paper, we have seen evolution do exactly that. A 'primordial soup' of reconfigurable electronic components has been manipulated according to the overall behaviour it exhibits, and on no other criterion, with no constraints imposed upon the structure or its dynamical behaviour other than those inherent in the resources provided.
    • @GossiTheDog: Signs the world has changed: theatrical releases of movies don’t get anywhere near 45m views opening week. There aren’t even 45m seats.
    • Fiona McMillan: one cubic millimeter in the brain’s cerebral cortex contains around 50,000 neurons each making 6,000 connections with other neurons (give or take a few)
    • @raudelmil: Perhaps worth noting that “eliminating unnecessary dependencies” and “reinventing the wheel” could both be popular phrases describing the same activity.
    • Fred Turner: One of the deepest ironies of our current situation is that the modes of communication that enable today’s authoritarians were first dreamed up to defeat them. 
    • @swardley: Future X : What was the fuss about conversational programming? Me : It's best I show you. Jarvis? Jarvis : Please describe your user need? Me : Build me a system to demonstrate what programming was like circa 2019. Jarvis : Deploying now. X : Oooh, what's that? Me : A console.
    • @asymco: iPhone quarterly revenue growth, last 8 quarters. 1% 3% 2% 13% 14% 20% 29% -15%
    • @artem_zin: Finally migrated from Apache Ignite to Redis Cluster deployment — finally no cluster split issues and latency spikes. Took just 1.5 days from nothing → dev → prod Remaining: custom Cluster version of Sentinel feels g00d @antirez respect++, great software
    • @lclaytonparker: Parker's Law of System Entropy: Total entropy in any given engineering system increases as the square of system complexity, chance of catastrophic failure as the cube of system complexity.
    • Joe Emison: Just as the rise of Infrastructure-as-a-Service gave rise to a new optimal way of developing software (“cloud native”), so too does Serverless; you cannot lift and shift your cloud-native applications to Functions-as-a-Service (FaaS) platforms and expect them to be optimally designed.
    • @NeilRetail: In 1985, US department stores took 14.5% of all retail spend. Last year they took 4.3%. The figure is still falling. The internet is often blamed for this. But the blunt truth is that US department stores just aren’t very good retailers. In fact, most of them are abysmal.
    • peterwwillis: As an "enterprisey" person, I disagree. I've seen a lot of enterprise infrastructure that looks like toddlers built it out of lincoln logs. And I've seen SANs lose connectivity much more often than a pool of independent local disks all going bad at once. On top of that, databases run on VMs that aren't on hypervisors dedicated for running databases results in shitty adminning and overcrowded VM pools destroying database performance+reliability. Cloud-ish infrastructure is often good for running distributed decentralized databases, but try running Oracle in a bunch of Docker containers on a crappy OpenStack cluster and soon you'll be crying into your scotch.
    • Mark Lapedus: Nearly 70% of tariffs paid by the hi-tech industry come from the $200 billion product list enacted Sep. 24. Tariffs on CTA-identified tech products jumped to $1.3 billion in October, which is seven times the amount from the same month a year ago. The industry has also paid $122 million more on 5G-related imports alone in October, compared to $65,000 a year ago, according to the trade group.
    • Jesse Allen: Researchers at the University of Massachusetts Amherst and Brookhaven National Laboratory built memristor crossbar arrays with a 2nm feature size and a single-layer density up to 4.5 terabits per square inch. The team says the arrays were built with foundry-compatible fabrication technologies.
    • jdietrich: If you're going to produce a reasonable-sized batch of anything electronic, I'd strongly recommend: a) doing some reading on design for manufacturing b) getting some quotes for PCB assembly services and c) considering factory-programmed microcontrollers At quantity 200, you could have simple boards like these assembled for about a buck a piece. Microchip will program 200 PIC or Atmel microcontrollers from the factory for about 20 cents each with a $29 setup fee. Unless you're willing to work for a very low hourly rate, you should probably take advantage of someone else's economies of scale.
    • Ali Tamaseb: It seems like having 2 or 3 co-founders is the ideal scenario, however, it is important to note that 20% of all billion-dollar startups have had a solo founder. One popular misconception is that billion dollar companies get started by college dropouts. There are certainly college dropouts in the list, but the bulk of these founders were between 24 to 36 years old when they were getting started. Contrary to the popular belief, most founders don’t have any directly relevant work experience in the industry they are disrupting. There’s also a clear distinction between the CEO and CxO where the industry experience is even less relevant for the CxO. 
    • Animats: It was. In the entire history of electromechanical switching in the Bell System, no central office was down for more than 30 minutes for any reason other than a natural disaster or major fire. There are books about Number Five Crossbar and how it worked, and they're worth reading if you design high-reliability systems...Widespread failure of 911 service suggests an overcentralized architecture. 911 requires a phone number to address lookup, so there's a database involved. Widespread failure indicates this was implemented as a remote query service ("in the cloud") rather than read-only database copies of the directory at each central office.
    • Kevin G: Speaking of the server side of things, 2019 will be interesting on the SSD front. Mentioned in the article is the coming Samsung NF1 vs. Ruler format war. I strongly suspect Intel will win in the long run but if Samsung doesn't make any drives for it, there will be a vacuum to fill for Ruler providers. Intel obviously has their own Optane solution for the Ruler format and Micron to cover more traditional NAND offerings. There is room for a third party to rise up in this market. I'm also hopeful we will see some PCIe bridge chips that include hardware NVMe RAID5/6 support. This would permit six PCIe 4x NVMe drives to share a single PCIe 16x uplink to the host without sacrificing bandwidth as the NVMe RAID controller handles the parity calculations for the redundant drives. For the above mentioned server form factors, this will be critical to maximizing the number of drives per 1U chassis (currently targeting 36). HighPoint and MicroSemi have such chips on their roadmaps but they are not shipping them yet to my knowledge. NVMe controllers will start to move over to PCIe 4.0 this year as adoption of this higher speed spec spreads. On the server side we may even see some OpenCAPI compliant drives for even more bandwidth, even lower latency connectivity and reduced CPU overhead.
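
throwawaymath's distinction between a constant-factor speedup and a superpolynomial one comes down to how the ratio of running times behaves as the input size n grows. Using the complexities from that quote:

```latex
O(2n) = O(10n) = O(n), \qquad \frac{10n}{2n} = 5 \ \text{for every } n
\qquad \text{whereas} \qquad
\lim_{n \to \infty} \frac{2^n}{2n} = \infty
```

The first gap is a fixed factor of 5 no matter how large the input gets; the second widens without bound, which is the kind of separation the quote calls supremacy.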

  • Into games? You might like Will Wright's (Sims) Masterclass: The Fundamentals of Game Design. Like most masterclasses it's a frustrating combination of a lack of specifics and deep insight. Deep insight won out. 
    • The section on System Design has a fascinating glimpse of how Sim City was structured on a foundation combining the game of life and cybernetic theory. How is that done? That would be telling. So no idea.
    • The biggest wow moment for me was when Will—and after spending hours with him on my elliptical I can call him Will—talked about the difference between games and stories. Games are thought to be a lesser art because unlike stories (books, movies, tv), games do not engage our emotions. Will said it’s not true that games don’t engage our emotions, they just engage different ones than stories. Stories are about empathy. Games are about agency. We identify with characters in a story, we feel with them and for them. Games are about personal accomplishment, team accomplishment, pride, and mastery. It’s just different. 
    • Another cool observation is the inversion of game building and science. Science is about understanding the world by building explanatory models. Game building is about inventing a model and then bringing to life the world entailed by the model. A game player is like a scientist wandering the world trying to discover the designer's model. If that doesn't strike a religious note, give it some more thought.

  • Nice 2018 Year in Review from the Cloudcast podcast. You can't be a public cloud expert anymore; it's become so big you have to specialize, again. Open source licensing is changing so big companies can't just take work and profit without giving back. About time.

  • It's not often that an npm event-stream dependency attack inspires an 8,000-word postmortem: STAMPing on event-stream. The irony is how software entropy provides endless opportunity for bad guys to do bad things using good works.
    • While the analysis is instructive, what you might find more useful is its use of STAMP (System-Theoretic Accident Model & Processes) as an analysis framework.
    • The goal of a STAMP-based analysis is to determine why the events occurred… and to identify the changes that could prevent them and similar events in the future. 
    • Engineering a Safer World (free book): A new approach to safety, based on systems thinking, that is more effective, less costly, and easier to use than current techniques. Arguing that traditional models of causality are inadequate, Leveson presents a new, extended model of causation STAMP, then shows how the new model can be used to create techniques for system safety engineering, including accident analysis, hazard analysis, system design, safety in operations, and management of safety-critical systems. 
    • event-stream is a js library that provides an event stream utility for JavaScript libraries. Almost 4,000 packages used it or a dependent package. 
    • While very popular, it was abandoned by its creator. In September 2018, Dominic Tarr was contacted by “right9ctrl” who offered to take over maintenance of the package. Once Tarr signed over the access rights, right9ctrl added a malicious dependency to event-stream that, when included as a dependency of the Copay wallet, would steal the user’s private keys. 
    • There is one thing npm could have done here: it could have alerted people that the maintainer had changed.
    • Leveson calls this multiple controllers, or boundary error: there are multiple different groups that could be responsible for auditing, but not a group that is responsible. 
    • Why was a single dependency, four layers deep, able to steal everybody’s bitcoin wallets? In JavaScript, PoLP is entirely by convention. All functions have access to XMLHttpRequest, any script can dynamically load any module, anything can write to an existing object’s prototype. 
    • Auditing is a waste of time. You probably won’t find the attack even if you were auditing it. It was pretty well hidden.
    • Leveson calls this model drift: The existing rules were ideal for the system in the past, but the system itself has changed. Copay was doing something that made sense in the original context of JavaScript. In the Electron context, though, blindly using minified dependencies is a performance hit and security vulnerability.
    • Leveson considers local optimization: library authors have pressing business needs (increase the rate of feature development, reduce the bugginess of their packages, and reduce the size of client scripts). Trying to locally meet these needs leads to a greater client attack surface and so less system safety.
    • npm doesn’t prioritize addressing security issues in third-party packages
    • No way to restrict JavaScript module privileges
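
The one thing npm could have done, alerting people that a maintainer had changed, is also something a team can check on its own side. A minimal sketch, assuming you keep a snapshot of each dependency's maintainers from your last review; the snapshot format, the `maintainer_changes` helper, and the usernames are hypothetical, and none of this is an npm API:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical snapshot format: package name -> set of maintainer usernames,
# recorded when the dependency tree was last reviewed.
Snapshot = Dict[str, Set[str]]


def maintainer_changes(reviewed: Snapshot, current: Snapshot) -> List[Tuple[str, Set[str], Set[str]]]:
    """Return (package, added, removed) for each dependency whose maintainers changed."""
    changes = []
    for pkg in sorted(reviewed.keys() & current.keys()):
        added = current[pkg] - reviewed[pkg]
        removed = reviewed[pkg] - current[pkg]
        if added or removed:
            changes.append((pkg, added, removed))
    return changes


if __name__ == "__main__":
    # Illustrative data: a handover like the one described above, plus an unchanged dependency.
    reviewed = {"event-stream": {"dominictarr"}, "example-dep": {"alice"}}
    current = {"event-stream": {"right9ctrl"}, "example-dep": {"alice"}}
    for pkg, added, removed in maintainer_changes(reviewed, current):
        print(f"ALERT {pkg}: added {sorted(added)}, removed {sorted(removed)}")
```

Fetching current maintainer lists from the registry is deliberately left out; the point is that the diff is cheap, and a change in who can publish is exactly the signal the event-stream handover would have produced.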

  • Why the cloud must migrate to the edge...You bought a $15,000 cloud based coffee maker. What happens when they shut down? You now have a coffee maker with all the functionality of a modern art piece. Alpha Dominche Shuts Down: Is Commercial Coffee Tech Dead?  “At this point it is unclear how long the server will remain operational. Do NOT log out of your account on the tablet. If the server is no longer operational you will not be able to log back in. If you’re unable to log in you will be unable to operate the machine.” Alasdair Allan makes a good case why subscriptions are the way we need to buy things if we want sustainable businesses. Of course, putting your only control plane in the cloud for a local device is just really bad Lock-in Oriented Design.
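
The alternative to a cloud-only control plane is to treat the server as an optional source of settings rather than a gatekeeper. A minimal sketch of that idea; `BrewController`, its settings, and the fetch function are hypothetical and not modeled on Alpha Dominche's actual software:

```python
from typing import Callable, Dict

Config = Dict[str, float]


class BrewController:
    """Hypothetical device controller: the cloud supplies settings, it does not gate operation."""

    def __init__(self, fetch_remote: Callable[[], Config], defaults: Config):
        self._fetch_remote = fetch_remote
        self._cached: Config = dict(defaults)  # last known-good settings, stored on the device

    def refresh(self) -> bool:
        """Try to pull new settings; keep the cached copy if the server is unreachable."""
        try:
            self._cached = dict(self._fetch_remote())
            return True
        except (ConnectionError, TimeoutError):
            return False

    def brew(self) -> str:
        # Brewing only reads local state, so it works with or without the server.
        return f"brewing at {self._cached['temp_c']:.0f}C for {self._cached['seconds']:.0f}s"


def server_shut_down() -> Config:
    raise ConnectionError("vendor server no longer operational")


if __name__ == "__main__":
    machine = BrewController(server_shut_down, {"temp_c": 93.0, "seconds": 30.0})
    print(machine.refresh())  # False
    print(machine.brew())     # brewing at 93C for 30s
```

The design choice is simply that no local action depends on a successful login or fetch; losing the vendor costs you updates, not the machine.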

  • The key to saving money on AWS is trying to save money on AWS. How Battlehouse saved $60,000 a year on AWS. Battlehouse serves over 5 million accounts with non-stop battle action on Amazon AWS. Spending was split 40% on EC2 instances; 30% on network bandwidth and the CloudFront CDN; 10% each on S3 storage, RDS instances, and miscellaneous other services. Moving to 12-month reserved instances saved 50%, and 12-month reservations were also the more flexible option. They served game assets like images and music through Amazon’s CloudFront CDN, which threatened to become the single largest piece of their cloud spend, so they moved to Cloudflare, which cost less, performed better, and provided DDoS protection. Each of their 6 games had its own pod, a completely separate stack, which left some pods underutilized. They moved to a single stack for all games (except for game servers), reducing the number of instances and load balancers needed. Hot data is stored in MongoDB and RDS; cold data is stored in S3. S3 is cheap, but since the amount of data added daily is proportional to the number of active players, they run the risk of total storage scaling as O(N²) if the player base grows linearly over time. So they selectively prune archived data and keep only a month of backups. 
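
The O(N²) risk is plain arithmetic: if players grow linearly, day t adds data proportional to t, and summing over T days gives T(T+1)/2, roughly T²/2. A small sketch, assuming hypothetical units (one unit per player per day) and a 30-day retention window standing in for "keep only a month":

```python
from typing import Optional


def stored_units(days: int, new_players_per_day: int, retention_days: Optional[int] = None) -> int:
    """Archived data after `days`, in units of one player-day of data.

    Assumes linear player growth, so day t has new_players_per_day * t active players
    and adds that many units. With retention, only the last `retention_days` days are kept.
    """
    first_kept = 1 if retention_days is None else max(1, days - retention_days + 1)
    return sum(new_players_per_day * t for t in range(first_kept, days + 1))


if __name__ == "__main__":
    for days in (100, 200, 400):
        print(days, stored_units(days, 10), stored_units(days, 10, retention_days=30))
```

With these numbers the unpruned total goes 50,500 → 201,000 → 802,000 as the timeline doubles (about 4x per doubling), while the 30-day window goes 25,650 → 55,650 → 115,650 (about 2x). Pruning turns quadratic growth back into linear growth.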

  • Queues are to software architecture as concrete was to Roman architecture. While there's not a lot of specifics in Achieving Resiliency With Queues: Building A System That Never Skips A Beat In A Billion, you get a good feel for the benefits. There's more queue stuff in the Facebook entry below. You can use queues to ensure consistency by retrying operations until they succeed. You can pause queues when a resource goes down so the work to be done is not lost. You can prioritize work within a queue or dynamically add new higher priority queues. You can queue on the device so work is not lost if the queue in the cloud fails. 
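
The queue behaviors listed above (retry until success, pause when a resource is down, prioritize work) fit in a few dozen lines. A minimal single-process sketch; `WorkQueue`, `Job`, and the dead-letter list after `max_attempts` are hypothetical choices, not the design from the linked article:

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(order=True)
class Job:
    priority: int                                  # lower number = more urgent
    seq: int                                       # FIFO order within a priority
    resource: str = field(compare=False)           # what the job needs (db, email, ...)
    payload: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class WorkQueue:
    def __init__(self, max_attempts: int = 5):
        self._heap: List[Job] = []
        self._seq = itertools.count()
        self._paused: Set[str] = set()
        self.max_attempts = max_attempts
        self.dead_letters: List[Job] = []          # jobs that never succeeded

    def put(self, payload: str, resource: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._seq), resource, payload))

    def pause(self, resource: str) -> None:
        self._paused.add(resource)                 # e.g. the resource is down

    def resume(self, resource: str) -> None:
        self._paused.discard(resource)

    def run_once(self, handler: Callable[[Job], bool]) -> List[str]:
        """One pass over the queue in priority order; returns payloads that succeeded."""
        done: List[str] = []
        held: List[Job] = []
        while self._heap:
            job = heapq.heappop(self._heap)
            if job.resource in self._paused:
                held.append(job)                   # keep the work rather than lose it
                continue
            job.attempts += 1
            if handler(job):
                done.append(job.payload)
            elif job.attempts < self.max_attempts:
                held.append(job)                   # retry on a later pass
            else:
                self.dead_letters.append(job)
        for job in held:
            heapq.heappush(self._heap, job)
        return done


if __name__ == "__main__":
    q = WorkQueue(max_attempts=3)
    q.put("charge card", resource="payments", priority=1)
    q.put("send receipt", resource="email", priority=5)
    q.pause("email")                       # mail provider is down
    outcomes = iter([False, True])         # payments fails once, then succeeds

    def handler(job: Job) -> bool:
        return next(outcomes) if job.resource == "payments" else True

    print(q.run_once(handler))  # []
    print(q.run_once(handler))  # ['charge card']
    q.resume("email")
    print(q.run_once(handler))  # ['send receipt']
```

A real system would persist the queue and back off between retries; the sketch only shows that paused and failed work stays queued instead of being dropped.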

  • The ecosystem war continues. More work is being done in the proactive cloud: Hunches is an optional Alexa feature that alerts you when one of your connected smart home devices isn't in its usual state.

  • The GDS Way documents the specific technology, tools and processes that GDS teams use to build and operate services. Lots of experience there.

  • Christopher Rolland, an analyst with Susquehanna International Group, listed his key semi trends for 2019: Theme #1: Extended lead-times and tariff uncertainty; Theme #2: No handset growth; Theme #3: Automotive headwinds; Theme #4: PC woes and a server slowdown?; Theme #5: AI rocks; Theme #6: M&A slows but who are the takeover targets? 

  • Formal methods are not yet mainstream, but if you're interested in what they might look like, Murat uses TLA+ (Temporal Logic of Actions) to model the infamous two-phase commit protocol. The benefit of this mind-numbing step is finding holes in your protocol design before writing the code that will add even more bugs.
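
This isn't TLA+ and it isn't Murat's spec; it's a minimal Python sketch of what a model checker like TLC automates, loosely following the structure of Lamport's well-known TLA+ model of two-phase commit: enumerate every reachable state and check the invariant that no resource manager (RM) commits while another aborts. The `buggy` flag is a hypothetical design mistake (the coordinator commits after the first prepare) included to show the search catching a hole:

```python
from collections import deque
from typing import FrozenSet, Iterator, Optional, Tuple

# (per-RM states, TM state, RMs the TM has heard "Prepared" from, messages sent)
State = Tuple[Tuple[str, ...], str, FrozenSet[int], FrozenSet[tuple]]


def successors(s: State, n: int, buggy: bool) -> Iterator[State]:
    """Every state reachable from s in one step of the protocol."""
    rms, tm, prepared, msgs = s

    def with_rm(i: int, value: str) -> Tuple[str, ...]:
        return rms[:i] + (value,) + rms[i + 1:]

    if tm == "init":
        for i in range(n):  # TM records a Prepared message
            if ("Prepared", i) in msgs and i not in prepared:
                yield (rms, tm, prepared | {i}, msgs)
        # Correct rule: commit only once every RM has prepared.
        # Hypothetical bug: commit as soon as any RM has prepared.
        if len(prepared) == n or (buggy and len(prepared) > 0):
            yield (rms, "committed", prepared, msgs | {("Commit",)})
        yield (rms, "aborted", prepared, msgs | {("Abort",)})  # TM may abort while undecided
    for i in range(n):
        if rms[i] == "working":
            yield (with_rm(i, "prepared"), tm, prepared, msgs | {("Prepared", i)})
            yield (with_rm(i, "aborted"), tm, prepared, msgs)  # RM unilaterally aborts
        if ("Commit",) in msgs:
            yield (with_rm(i, "committed"), tm, prepared, msgs)
        if ("Abort",) in msgs:
            yield (with_rm(i, "aborted"), tm, prepared, msgs)


def check(n: int = 3, buggy: bool = False) -> Tuple[int, Optional[State]]:
    """Breadth-first search of all reachable states; return (states seen, first violation or None)."""
    init: State = (("working",) * n, "init", frozenset(), frozenset())
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if "committed" in s[0] and "aborted" in s[0]:  # invariant: no mixed outcome
            return len(seen), s
        for t in successors(s, n, buggy):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen), None


if __name__ == "__main__":
    print(check(3, buggy=False))
    print(check(3, buggy=True))
```

The correct rule produces no violation; the shortcut is caught because an RM that never prepared can still unilaterally abort after the coordinator has committed. TLC performs the same exhaustive search from a spec, with much better state handling and error traces.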

  • Graceful degradation is How Facebook Keeps Messenger From Crashing on New Year's Eve: one data center struggled with the volume of incoming messages, so the team directed traffic away from that center to another one...Messenger team has developed other levers that it can pull “if things get really bad." Every new message sent to a server goes into a queue as part of a service called Iris. There, messages are assigned a timeout—a period of time after which, that message will drop out of the queue to make room for new messages. During a high-volume event, this allows the team to quickly discard certain types of messages, such as read receipts, to focus its resources on delivering ones that users have composed...We set up our systems so that if it comes to that, they start shedding the lowest-priority traffic...the group can also sacrifice the accuracy of the green dot displayed in the Messenger app that indicates a friend is currently online...Rather than having your service dying on the floor and no one using it, you make it a little less awesome and people can still use it...a scheduler, allows the system to “batch” similar messages together...You can bundle some of those together into a single large request before you send it downstream. Doing that, you reduce the computational load on downstream systems...Batches are formed based on a principle called affinity, which can be derived from a variety of characteristics. For example, two messages may have higher affinity if they are traveling to the same recipient, or require similar resources from the back end...as traffic increases, the Messenger team can have the system batch more aggressively
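
The levers described above (shed the lowest-priority traffic such as read receipts and presence, and batch high-affinity messages more aggressively as load rises) can be sketched as follows. This sheds by a simple capacity limit rather than Iris's per-message timeouts, reduces affinity to "same recipient", and uses hypothetical priorities, thresholds, and batch sizes:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical priorities: lower number = more important to deliver.
PRIORITY = {"composed": 0, "read_receipt": 1, "presence": 2}


@dataclass
class Message:
    kind: str
    recipient: str
    body: str = ""


def shed(queue: List[Message], capacity: int) -> List[Message]:
    """Keep at most `capacity` messages, dropping the lowest-priority kinds first."""
    if len(queue) <= capacity:
        return list(queue)
    # sorted() is stable, so arrival order is preserved within each kind.
    ranked = sorted(queue, key=lambda m: PRIORITY[m.kind])
    return ranked[:capacity]


def batch_size_for(load: float) -> int:
    """Hypothetical policy: batch more aggressively as load (fraction of capacity) rises."""
    if load < 0.5:
        return 1
    if load < 0.9:
        return 4
    return 16


def batch_by_affinity(queue: List[Message], max_batch: int) -> List[List[Message]]:
    """Group messages for the same recipient into batches of at most `max_batch`."""
    by_recipient: Dict[str, List[Message]] = defaultdict(list)
    for m in queue:
        by_recipient[m.recipient].append(m)
    batches = []
    for msgs in by_recipient.values():
        for i in range(0, len(msgs), max_batch):
            batches.append(msgs[i:i + max_batch])
    return batches


if __name__ == "__main__":
    incoming = [
        Message("presence", "bob"),
        Message("composed", "bob", "happy new year!"),
        Message("read_receipt", "ann"),
        Message("composed", "ann", "you too!"),
        Message("composed", "bob", "fireworks soon"),
    ]
    kept = shed(incoming, capacity=3)
    print([m.kind for m in kept])  # ['composed', 'composed', 'composed']
    for batch in batch_by_affinity(kept, batch_size_for(load=0.95)):
        print(batch[0].recipient, [m.body for m in batch])
```

Under pressure the presence and read-receipt updates are the first to go, and the two messages for the same recipient travel downstream as one request.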

  • True to its name, this is A comprehensive look back at front-end in 2018. Every year JavaScript becomes more of a real programming language...but unlike Pinocchio, it will never get there. 

  • A good set of Key Takeaway Points and Lessons Learned from QCon San Francisco 2018

  • How Azure SQL DB Hyperscale Works: In Hyperscale, Microsoft [creates multiple data files, and put each data file on a separate volume] for you automatically, but in a spiffy way: when you see a data file, that’s actually a different page server. When the last page server is ~80% full, they’re adding another page server for you, and it appears as a new data file in your database. (Technically, it’s two page servers for redundancy.)...Both AWS Aurora and Microsoft Azure SQL DB Hyperscale take the same approach here, offloading deletes/updates/inserts to a log service, and then letting the log service change the data file directly. 
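
A toy model of the growth rule quoted above: each data file is backed by a page server, another is added when the last one is about 80% full, and there are two page servers per file for redundancy. The capacities, the one-GB write granularity, and the `write_gb` helper are hypothetical simplifications, not how Azure actually allocates storage:

```python
from typing import List


def write_gb(files: List[int], gb: int, capacity_gb: int, threshold: float = 0.8) -> None:
    """Write `gb` gigabytes, one GB at a time, into `files` (GB used per data file).

    Each data file stands for one page server. Writes go to the first file with room;
    whenever the last file reaches `threshold` of capacity, an empty file is added so
    there is headroom before it fills.
    """
    if not files:
        files.append(0)
    for _ in range(gb):
        target = next(i for i, used in enumerate(files) if used < capacity_gb)
        files[target] += 1
        if files[-1] >= threshold * capacity_gb:
            files.append(0)  # provision the next page server ahead of need


def page_servers(files: List[int], replicas: int = 2) -> int:
    """Page servers behind the files, with `replicas` per file for redundancy."""
    return replicas * len(files)


if __name__ == "__main__":
    files: List[int] = []
    write_gb(files, 250, capacity_gb=100)
    print(files, page_servers(files))  # [100, 100, 50] 6
```

Provisioning at 80% rather than 100% means a new data file already exists by the time the current one fills, so writes never wait on allocation.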

  • We argue that learned components can fully replace core components of a database. SageDB: A Learned Database System: With SageDB we present a vision towards a new type of a data processing system, one which highly specializes to an application through code synthesis and machine learning. By modeling the data distribution, workload, and hardware, SageDB learns the structure of the data and optimal access methods and query plans. These learned models are deeply embedded, through code synthesis, in essentially every component of the database. As such, SageDB presents a radical departure from the way database systems are currently developed, raising a host of new problems in databases, machine learning and programming systems.
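
SageDB builds on the learned-index idea: replace a generic structure such as a B-tree with a model of the data distribution. A much-simplified sketch of that idea, assuming sorted numeric keys and a single least-squares linear model (published learned-index designs use hierarchies of models); the model's worst error on the stored keys bounds how far the final search must look:

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Single-model learned index over a sorted, non-empty list of numeric keys."""

    def __init__(self, keys: List[float]):
        self.keys = keys
        n = len(keys)
        # Least-squares fit of position ~ slope * key + intercept (a model of the key distribution).
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error on the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> Optional[int]:
        """Position of `key`, or None; searches only within max_err of the prediction."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = max(lo, min(len(self.keys), guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None


if __name__ == "__main__":
    keys = [k * k for k in range(1000)]  # sorted but non-linear, so the model has some error
    idx = LinearLearnedIndex(keys)
    print(idx.max_err, idx.lookup(250_000), idx.lookup(250_001))
```

On keys that really are linear the window shrinks to a single slot; on skewed data like the squares above the error, and therefore the search, grows, which is why real designs stack several specialized models.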

  • Optimizing UDP for content delivery: GSO, pacing and zerocopy: recent optimizations to the UDP stack that narrow this gap. At the core are optimizations long available to TCP: segmentation offload, pacing and zerocopy. We present the recently merged UDP GSO and SO_TXTIME interfaces and discuss select implementation details. We also review partial GSO as it fits in this context and discuss optimizations that are in submission: UDP zerocopy transmit with MSG_ZEROCOPY and UDP GRO.
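
Segmentation offload lets an application hand the kernel one large buffer plus a segment size and have it split into datagrams further down the stack, instead of making one send call per datagram. On Linux the interface is the `UDP_SEGMENT` socket option (or the matching control message). The sketch below does not touch sockets; it only models the split and the resulting difference in send calls, with illustrative sizes:

```python
from typing import List


def segment(payload: bytes, gso_size: int) -> List[bytes]:
    """Split one large buffer into datagrams of at most gso_size bytes (last may be shorter)."""
    return [payload[i:i + gso_size] for i in range(0, len(payload), gso_size)]


def send_calls(payload_len: int, gso_size: int, use_gso: bool) -> int:
    """Send calls the application makes: one per datagram without GSO, one in total with it."""
    datagrams = (payload_len + gso_size - 1) // gso_size  # ceiling division
    return 1 if use_gso else datagrams


if __name__ == "__main__":
    payload = bytes(64_000)
    print(len(segment(payload, 1400)))            # 46 datagrams on the wire either way
    print(send_calls(len(payload), 1400, False))  # 46
    print(send_calls(len(payload), 1400, True))   # 1
```

The wire traffic is identical; what changes is that the per-call cost of walking the stack is paid once instead of 46 times, which is the gap with TCP that the talk is about narrowing.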

  • Adversarial WiFi Sensing using a Single Smartphone: We identify a passive adversarial sensing attack, where bad actors using a single smartphone can silently localize and track individuals in their home or office from outside walls, by just listening to ambient WiFi signals. We experimentally validate this attack in 11 real-world locations, and show user tracking with high accuracy.

  • At generation 3500, we see the perfect desired behaviour. The final circuit is shown in Fig. 5; observe the many feedback paths. The lack of modularity in the topology is unsurprising, because there was no bias in the genetic encoding scheme in favour of this. An evolved circuit, intrinsic in silicon, entwined with physics: 'Intrinsic' Hardware Evolution is the use of artificial evolution (such as a Genetic Algorithm) to design an electronic circuit automatically, where each fitness evaluation is the measurement of a circuit's performance when physically instantiated in a real reconfigurable VLSI chip. This paper makes a detailed case-study of the first such application of evolution directly to the configuration of a Field Programmable Gate Array (FPGA). Evolution is allowed to explore beyond the scope of conventional design methods, resulting in a highly efficient circuit with a richer structure and dynamics and a greater respect for the natural properties of the implementation medium than is usual. The application is a simple, but not toy, problem: a tone-discrimination task. Practical details are considered throughout.
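
The loop the paper describes is a genetic algorithm whose fitness function is a measurement of the configured chip. A minimal GA sketch, with a stand-in fitness (how many bits match a hidden bitstring) in place of the paper's physical FPGA measurement; the operators and parameters are common textbook choices, not the paper's:

```python
import random
from typing import Callable, List

Genome = List[int]


def evolve(fitness: Callable[[Genome], float], length: int, pop_size: int = 50,
           generations: int = 200, mutation_rate: float = 0.02, seed: int = 0) -> Genome:
    """Generational GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        next_pop = [best[:]]  # elitism: the best genome survives unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament of 3
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, length)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Stand-in for measuring a configured chip: how many bits match a hidden target.
    target_rng = random.Random(42)
    target = [target_rng.randint(0, 1) for _ in range(64)]

    def matches(genome: Genome) -> int:
        return sum(g == t for g, t in zip(genome, target))

    print(matches(evolve(matches, 64, generations=0)), "->", matches(evolve(matches, 64)))
```

The algorithm never looks inside the fitness function, which is what let Thompson's evolution exploit physical properties of the chip that no designer had modeled.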

  • Algorithms by Jeff Erickson (free electronic version): This textbook grew out of a collection of lecture notes that I wrote for various algorithms classes at the University of Illinois at Urbana-Champaign, which I have been teaching about once a year since January 1999. Spurred by changes of our undergraduate theory curriculum, I undertook a major revision of my notes in 2016; this book consists of a subset of my revised notes on the most fundamental course material, mostly reflecting the algorithmic content of our new required junior-level theory course.