kev009 6 years ago

I work in the CDN industry and am also heavily involved in the FreeBSD community. I find Varnish to be kind of a tragic curiosity in that it has nice fit and finish but is pretty unsuitable for high-scale workloads due to the influence of its creator phk@. Ironically, on the OS he should know like the back of his hand, sched_ule(4) has fairly expensive depth searches to ensure process fairness. Doing a thread per connection is about the worst conceivable design for web acceleration. It is probably OK for low connection count, high bandwidth (like chunked video streaming), but you're still going to be dropping performance on the floor vs other I/O models as you get into the 1000s of connections, and as Netflix has publicly shown, that is modern reality on FreeBSD with 100gbit networking.

My understanding is that Fastly has rewritten the I/O model around more industry-standard event-polling worker threads. They also reworked a lot of the memory model using https://github.com/fastly/uslab. None of that I/O work is upstream, and it's unlikely it ever will be because of the phk@ influence, and perhaps Fastly sees it as IP. So be prepared to do some deep systems work if you need high scalability out of Varnish.

Apache Traffic Server is the most ready "out of the box" caching software for rolling a CDN that I've seen. If you're going to do custom development, nginx is a good I/O and HTTP model but be prepared to roll a complete cache storage model from scratch.

  • reza_n 6 years ago

    > Doing a thread per connection is about the worst conceivable design for web acceleration.

    A better statement here would be: connections are serviced by a pool of workers. Event polling is used. A single worker can and does process tens of thousands of requests a second. Workers are used to get request parallelism across multiple cores. When a request blocks, that worker will work on another request. Varnish easily beats out pure event-driven software in 99th-percentile response times because of its worker pool design. Very puzzling that you are claiming otherwise.
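    The worker-pool model described above can be sketched in miniature. This is a hypothetical toy, not Varnish's actual code: a fixed set of threads drains a shared queue of "requests", so request parallelism does not require a thread per connection.

    ```c
    /* Toy worker-pool sketch: NWORKERS threads service NREQS queued
     * requests, instead of spawning one thread per request. */
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4
    #define NREQS    16

    static int queue[NREQS], head = 0;
    static int handled[NWORKERS];
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        int id = (int)(long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            if (head == NREQS) { pthread_mutex_unlock(&lock); return NULL; }
            int req = queue[head++];          /* claim the next request */
            pthread_mutex_unlock(&lock);
            (void)req;                        /* "service" the request */
            handled[id]++;
        }
    }

    int main(void) {
        for (int i = 0; i < NREQS; i++) queue[i] = i;
        pthread_t t[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        int total = 0;
        for (int i = 0; i < NWORKERS; i++) {
            pthread_join(t[i], NULL);
            total += handled[i];
        }
        printf("%d requests serviced by %d workers\n", total, NWORKERS);
        return 0;
    }
    ```

    A real server would park blocked requests and let the worker pick up another, but the pool-drains-queue shape is the same.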

    > going to be dropping performance on the floor vs other I/O models as you get into the 1000s of connections

    I have done benchmarks with over 100,000 active connections a second, with a mix of blocked, slow, fast connections, cache hits, misses, and passes. Performance is great, ~100us response times, very low CPU utilization.

    > So be prepared to do some deep systems work if you need high scalability out of Varnish.

    Not at all. Once again, I did some benchmarks not too long ago and was able to get 100Gbit/sec using a stock Varnish + stock Linux. Response time was well under 1ms per request.

    Hitting high levels of scale and performance is an IO problem, not really a CPU problem. A lot of the criticism here is misinformed and focused on CPU scheduling. Not a single mention of any of the IO optimizations used in Varnish and other competing products.

    • kev009 6 years ago

      You realize that's a fraction of what other comparable software can do on Linux, right? I can't share our in-house comparison, and it's on a modified FreeBSD VM and network stack, so it's not applicable to the public anyway. Someone in the Varnish org should be able to do a competitive analysis of Varnish, ATS, and Nginx in a couple days; that would be enlightening and a prudent use of time to guide your product development. You and phk both seem to think I have something to gain by seeing you fail; I don't, and I want you to succeed. Egos have to subside for progress to happen.

      • reza_n 6 years ago

        > Someone in the varnish org should be able to do a competitive analysis of Varnish, ATS, and Nginx in a couple days that would be enlightening and prudent use of time to guide your product development.

        Really? I would say most of us are comfortable with how Nginx and ATS are engineered and how the kernel works.

        If anything, I think the concern is the fact that you are working for a publicly traded CDN yet you came to this thread armed with nothing more than a load of BS and FUD. I will say this connected a few dots in my mind and possibly added some color to what others think.

        • kev009 6 years ago

          Yes, really, and no, only you and phk have shown this paranoia; most of the userbase here agrees, by votes. The software I'm discussing is all free open source software, and anyone can test for themselves in their own environment to call bullshit. I gain and lose nothing by Varnish being anywhere on the spectrum of good and bad at any aspect of content delivery. I'd hope it would be the best it can be; I don't know what kind of person would wish otherwise. If you got the I/O model I care about right I still couldn't use it (or ATS or any other $newfossproject, for reasons stated in response to another person), but I'd be pleased for the professional community. The market is supporting multiple viable open caching applications; those will trivially nibble at commercial CDNs, and nothing I say or do is going to change that. The caching software is not the hardest/most expensive part of operating a global CDN, and many companies will steadfastly solve all those problems for themselves too, and I will again be pleased.

  • dormando 6 years ago

    I worked on the fastly varnish fork. None of this is accurate. Definitely not "because of phk influence".

    Most of the details are obviously secret, but occasionally when we found something architecturally concerning I'd talk it over with phk in IRC and he would make his own fix.

    Fastly is a big complex distributed system loosely based on Varnish. There are several fantastic blog posts about Fastly's distributed architectural design, but nothing to my knowledge about anything else. Comparing single-machine scalability is completely inaccurate, mostly because you'll never know what fastly actually did or more importantly why they did it.

    In general the "big co's fork of X must be superior" is a trope I'm thoroughly tired of.

  • icebraining 6 years ago

    I don't get it, from what I can tell Varnish uses worker threads exactly as you're describing. What's the difference you're talking about?

    Regarding thousands of connections, people on the web report Varnish handling 10k connections[1] just fine after a few tweaks, and this was ten years ago. How can it suddenly be a problem?

    [1] https://varnish-cache.org/lists/pipermail/varnish-misc/2008-...

    • the-dude 6 years ago

      Funny you read it this way. I am seeing: after tuning the exact parameter OP is talking about, they are able to handle 10k at the extremes of 0% miss or 100% miss. What about 50%?

      Then they describe a setup where they have to reduce the active cores from 8 to 4 because otherwise it is extremely slow.

      • kev009 6 years ago

        Correct, see e.g. https://book.varnish-software.com/4.0/chapters/Tuning.html#t... - a worker thread is used per request rather than the industry-standard polling loops for this kind of application. That doesn't mean you are setting up and tearing down a thread each time in Varnish. Most modern data movers are based on something like epoll/kqueue/IOCP. A polling loop is more efficient because the kernel notifies a worker only on the low/high watermarks of the socket buffers, and you fill exactly when you need to in a tight loop. Using polling doesn't mean you give up multiple-process or multiple-thread workers; see the nginx I/O model for a nice one [1]. With sendfile(2) and splice(2) you are only wiring up the activity to be done in your event loop; the data doesn't need to copyin/copyout from the kernel and back.

        [1] https://www.nginx.com/blog/socket-sharding-nginx-release-1-9...
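        The polling-loop pattern being contrasted with thread-per-connection looks roughly like this. A minimal sketch assuming Linux epoll (kqueue on FreeBSD is analogous), with a pipe standing in for a client socket:

        ```c
        /* One loop multiplexes many fds: the kernel tells us which fds
         * are ready, and we service only those, in a tight loop. */
        #include <stdio.h>
        #include <unistd.h>
        #include <sys/epoll.h>

        int main(void) {
            int ep = epoll_create1(0);
            int pipefd[2];
            pipe(pipefd);                     /* stands in for a client socket */

            struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
            epoll_ctl(ep, EPOLL_CTL_ADD, pipefd[0], &ev);

            write(pipefd[1], "hi", 2);        /* simulate readable data arriving */

            struct epoll_event ready[8];
            int n = epoll_wait(ep, ready, 8, 1000);
            for (int i = 0; i < n; i++) {
                char buf[16];
                ssize_t r = read(ready[i].data.fd, buf, sizeof buf);
                printf("fd readable, %zd bytes\n", r);
            }
            close(pipefd[0]); close(pipefd[1]); close(ep);
            return 0;
        }
        ```

        A production loop would add non-blocking sockets, edge/level triggering choices, and sendfile(2) for the data path; this only shows the readiness-notification core.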

        • reza_n 6 years ago

          Another huge load of BS. What is your goal here?

          > epoll/kqueue

          Here is the implementation from 2.1, back maybe around 10 years ago:

          https://github.com/varnishcache/varnish-cache/blob/2.1/bin/v...

          https://github.com/varnishcache/varnish-cache/blob/2.1/bin/v...

          > nginx thread model

          https://www.nginx.com/blog/thread-pools-boost-performance-9x...

          > With sendfile(2) and splice(2) you are only wiring up the activity to be done in your event loop

          Why you would store your cache on disk in individual files when it's "2018 and RAM is commonly measured in many hundred GB" is beyond me. But it's not even worth discussing, because you will just introduce more BS, FUD, and technobabble.

          • kev009 6 years ago

            I apologize for being curt here, but you haven't demonstrated enough background in computer architecture for me to continue the discussion in a worthwhile way. If you don't understand how nginx workers and varnish workers differ, we operate with different levels of understanding. "Babble" is what knowledge sounds like when you've demonstrated to the world in this thread that you have no idea how virtual memory works. Feel free to mail me if you think there is something left to debate or want book recommendations on memory management.

            • reza_n 6 years ago

              I would prefer to keep this public, just because it's sometimes a good idea to have a record of things like this. I feel this won't be the last time this rears its ugly head.

              > you've demonstrated to the world in this thread you have no idea how virtual memory works

              Funny, I kind of consider myself an expert in virtual memory and I previously made no mention of virtual memory in any of my comments. I do consider virtual memory an extremely important part in getting high levels of performance from a cache. Did you confuse sendfile() and copying data from block devices into userspace with virtual memory?

              Why am I not surprised at this reply.

              • kev009 6 years ago

                Nothing to do with sendfile. You claim that TLB management is "a hardware problem", which is completely wrong; whatever you think you know about virtual memory has no foundation, and anyone with a basic understanding of computer architecture can tell you that.

  • phkamp 6 years ago

    I'm not going to reply to this, because you clearly have no first hand experience with Varnish and merely parrot what negative opinions you have been able to search out, in order to smear me, for reasons you don't disclose.

    • kev009 6 years ago

      It's not about you, in reality; I'm just trying to save users some headaches. On the contrary, you have no idea what I have tried, and more importantly what rigorous comparisons my company has done to select the basis for our products. There are lots of choices, and most of them have better trade-offs. The ones I listed are also free software. It's no different than comparing/contrasting RDBMSes.

      • olavgg 6 years ago

        I'm a Varnish user and I have no complaints. It is well documented, easy to use, and solves complex problems with VCL/VMODs. I'm not pushing anywhere near 100Gb/s, but I have benchmarked up to 10Gb/s, which isn't a problem with default settings and just 1-2 CPU cores; this was with 8kb payloads. I'm sure you have some edge cases which require heavy OS tuning, but I would not say it is fair to warn users against Varnish.

        With 100Gb/s and very small HTTP requests you are clearly in a different group than 99.9999% of the rest, and you most likely also have the resources to write your own special-case HTTP cache, or you could hire phk@ as a consultant.

        • kev009 6 years ago

          These are both fair points, my perspective is from high scalability and some of the public technical marketing about its scalability wound me up.

      • phkamp 6 years ago

        Yeah, right. Not at all a troll.

        • abritinthebay 6 years ago

          You’re not helping. He brought up reasonable sounding criticisms and you responded with nothing but derisive insults.

          I respect your work but you’re making yourself look bad here.

          You appear to think he’s way off the mark technically so it should be easy to address his points, no?

          • phkamp 6 years ago

            No, they're not "reasonably sounding".

            Take for instance "but you're still going to be dropping performance on the floor vs other I/O models as you get into the 1000s of connections"

            Notice how it was written in passive voice, without any references to personal or third-party experience, and with nebulous hand-wavy numbers which can literally mean anything?

            Well, he had to write it that way because he could not find any hard evidence to back it up anywhere, could he?

            Here's some "reasonably sounding" information for you: builtwith.com reports that one eighth of the 10k largest web properties run Varnish.

            They would hardly do that, if performance "dropped on the floor" with "1000s of connections", would they ?

            You know what an actual Varnish user commented about performance after deploying Varnish on one of the world's largest news sites?

            He made jokes about the servers being so bored that they were making grilled cheese sandwiches.

            No, that comment is classic trolling: technical-sounding blah-blah, based on random Google results, but totally removed from the legitimate and relevant critiques actual Varnish users bring.

            • the_duke 6 years ago

              You still have not addressed any of the technical concerns raised by OP, but rather brought up some meaningless anecdotes.

              Why don't you refute the claims regarding I/O models (event-based vs threading) instead, engaging in a proper discussion?

              That would also prevent the downvotes you are receiving.

              • CJefferson 6 years ago

                I'm not sure what you would want to see. The original post said, without evidence, "Varnish is unusably slow". Plainly, many large websites are using it successfully. How would one disprove accusations of slowness made without evidence, citations, or benchmarks?

              • phkamp 6 years ago

                Moving a fifth of all HTTP traffic is not a meaningless anecdote in my vocabulary.

                And engaging in a debate about irrelevant technical details' applicability to Varnish, with a person who clearly has never actually run Varnish?

                I have better things to do on a Saturday.

            • abritinthebay 6 years ago

              Thank you for responding. I am an actual Varnish user btw (via fastly and I use it locally for testing), I quite like it.

              While much of the original comment was anecdotal the fact Fastly has extensively worked on custom subsystems for it is not wrong, correct?

              Addressing why they had to (and the original comments speculation/accusation about that) would be helpful in clearing up any confusion.

              The OP was absolutely too aggressive in his comment, so I understand why you’re very defensive.

              • phkamp 6 years ago

                What Fastly has done starting from Varnish 2.x was what they needed to do, to do what they wanted to do: Build a global CDN business.

                There are differences between "running Varnish on your website", "running Varnish for your web-hotel customers" and "running Varnish for some of the biggest web-properties in the world", and software and organizations always mirror each other.

                For instance, Fastly does not run Varnish on Solaris, FreeBSD, NetBSD or ...; they run it on whatever version of Linux they decided on, and whatever brand and model of boxes they find optimal, and therefore their own code and changes do not need to be portable and can be tuned with laser-like precision to the hardware and kernel they use.

                Even if they threw their codebase up on GitHub, I doubt very many organizations could or would use it, because it is chock-full of interfaces to Fastly's business systems, encapsulates their network strategy, and God knows what else. (Remember when Y! threw Inktomi over the fence? Not terribly useful if you were not Y!; it took years to generalize it.)

                And you will find the same situation, no matter which other major FOSS based organization you look at, Amazon, NetFlix, the pink-bit-pushers, FaceBook, Twitter, Google ...

                You simply don't run a huge company on vanilla software, if nothing else because your geeks cannot resist the temptation to improve and optimize it to ease their own jobs.

                And just to stake the "Fastly vs. Varnish project" thing through the heart at the crossroads: Fastly is a major sponsor of the Varnish Cache Project.

          • jasonwatkinspdx 6 years ago

            Don't expect much. I've been on email lists with PHK since the 90's. He's never shown any sign of interest in behavior other than what you see above.

            • jeremiep 6 years ago

              Reading this thread reminds me of what Alan Kay said a few years ago:

              - The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.

              I've been doing web dev for over 15 years and this quote is pretty much how I would describe the web development scene.

            • abritinthebay 6 years ago

              He doesn’t owe me anything: I use his work for free after all.

              I can merely try to open a dialogue about it. I can totally understand him being tired and irritated about criticism like this - running a large, popular OSS project sucks from that perspective.

  • bufferoverflow 6 years ago

    We had really bizarre problems with Varnish ~6 years ago, even under mild load (hundreds of requests per second). Like it would randomly let all the requests through to our origin servers, bringing them down. We struggled for months; multiple devops/sysadmins looked into it and couldn't figure it out.

    But when it worked, it was fast.

    • phkamp 6 years ago

      That sounds like the dreaded "Long TTL hit-for-pass object" issue.

      The situation happens when an object which should be cached gets hit with a request which causes 'pass' (typically because of headers tested in builtin.vcl) and then VCL (by accident) setting the long TTL for caching on the "hit-for-pass" object.

      It's an ugly corner case which we have struggled a lot to find a solution for - one that doesn't have worse side effects than the problem we are trying to fix.

      I think it is mostly a solved problem from Varnish 5.

  • ryan_jarv 6 years ago

    Not familiar with Varnish internals much, but 1000 seems off; we were seeing steady performance at around 300k requests/minute on a two-node cluster, IIRC.

  • SEJeff 6 years ago

    Thoughts on squid? I've seen it push 40G sustained no problem (4 10G interface setup with LACP) on Linux boxes.

    • kev009 6 years ago

      Two of the earliest and biggest CDNs are based on squid. Very hacked up though. I have no direct experience with squid3.

      • SEJeff 6 years ago

        Also, 100% of the internal CDN stuff for ticketmaster.com was squid. I know because I was on the team that configured them in ~2007.

    • petre 6 years ago

      I've tried to configure squid as a load balancer, struggled, found out about Varnish, gave it a try, and never looked back. It has been working flawlessly from day one. We're caching geocoding and routing results from a web API, with round-robin load balancing between two backends.

  • ksec 6 years ago

    Slightly off topic: what might be the reason Apache Traffic Server usage hasn't caught on?

    • kev009 6 years ago

      It has: LinkedIn's CDN, most cable companies. They list Akamai here, but I'm not sure how much or what product line uses it: http://trafficserver.apache.org/users.html

      • ksec 6 years ago

        Missing Limelight? Or you guys use something else?

        • kev009 6 years ago

          We have a legacy platform that has been rototilled over the past few years to work above 100g. That path was just reality to provide continuity of features and lessen the risk of overhaul over several years vs a brown/greenfield. Not advisable for newcomers.

          My original comment was not meant to be partial to any tech and was targeted toward people starting new in-house CDNs. I mentioned ATS as it gives you the best ready-to-scale platform right now, and I have no incentives for sharing that knowledge. If your core business is CDN, the choice opens up to building from scratch or bending something else to fit. Cambridge has shown results that should go an order of magnitude beyond current software [1][2], and a low-power FPGA design that I can't find at the moment that would be a magnitude greater in efficiency than the software mentioned in this thread.

          In the meantime, using the conventional BSD stack we're closing in on 200gbps on single-socket x86 hardware and again hitting bus and memory bandwidth limits.

          [1] https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201408... [2] https://www.cl.cam.ac.uk/~rnw24/papers/201708-sigcomm-diskcr...

phkamp 6 years ago

In case any of you have questions about V6, I'll keep an eye on this thread to answer them today.

Contrary to the trollish claims below, Varnish works quite nicely, and is involved in around one fifth of all HTTP traffic globally.

  • pestaa 6 years ago

    kev009 doesn't seem to argue that Varnish "doesn't work quite nicely", but that it has deep architectural limitations exposed in environments requiring high scaling.

    WordPress is involved in even more than 20% of all HTTP traffic globally, yet it is the target of frequent criticism (whether rightfully or not is beside the point). Popularity doesn't grant a free pass on valid feedback.

    • kev009 6 years ago

      Yes as I said Varnish has great fit and finish. If you have a small pool of servers my comments will mean little to overarching business decisions.

      To clarify my own part, this was in the back of my mind: http://varnish-cache.org/docs/trunk/phk/notes.html. It's written sanctimoniously and is wrong. I care a lot about the profession of systems programming and hate to see people misled. If calling bullshit is trolling, I guess everyone is either a bullshitter or a troll.

      I contest most of the points on that page. The Varnish I/O model is ignorant of TLB flushing, instruction cache, VM hints like madvise(2), and the overhead (and downright difficult job) of an OS scheduler, as well as high-performance networking techniques like SO_REUSEPORT load balancing or RSS. In a production web cache there are very good reasons you might want to park some objects in memory outside the VFS (the VFS is demand-paged and also carries heavy, nontrivial locking). The stuff under the header "More caches" is correct, and trivially dealt with using per-worker (per-CPU) APIs; see counter(9) in the FreeBSD kernel for a nice example.
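      The SO_REUSEPORT load balancing mentioned above can be sketched as follows (a hypothetical toy, Linux assumed, port number arbitrary): each worker binds its own listening socket on the same port, and the kernel spreads incoming connections across them.

      ```c
      /* Two "workers" bind the same port; without SO_REUSEPORT the
       * second bind() would fail with EADDRINUSE. */
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <stdint.h>
      #include <netinet/in.h>
      #include <sys/socket.h>

      static int make_listener(uint16_t port) {
          int fd = socket(AF_INET, SOCK_STREAM, 0);
          int one = 1;
          setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one);
          struct sockaddr_in sa = { .sin_family = AF_INET,
                                    .sin_port = htons(port),
                                    .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
          if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) return -1;
          listen(fd, 128);
          return fd;
      }

      int main(void) {
          int a = make_listener(28090);   /* worker 1's listener */
          int b = make_listener(28090);   /* worker 2's listener, same port */
          printf("worker sockets: %s\n",
                 (a >= 0 && b >= 0) ? "both bound" : "bind failed");
          return 0;
      }
      ```

      In a real server each worker would then run its own accept/event loop on its socket, avoiding a shared accept lock.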

      By the text of the page, Varnish remains 2006 software. It's now 2018: 48 hardware threads are common on servers, RAM is commonly measured in many hundreds of GB, and multicore programming is beginning to mature, with high-quality software, libraries, books, and tooling like Intel's VTune available.

      If my points didn't have some validity, Fastly would not have done what they did. That's all I have to say. I don't know nor care about phk (and find the "smear me" comment hilariously paranoid), but a lot of the technical marketing is bullshit and these problems could be fixed if he were willing to back down and revisit the bold claims.

      • reza_n 6 years ago

        This is another misinformed rant.

        > Varnish I/O model is ignorant of TLB flushing, instruction cache, VM hints like madvise(2), and the overhead (and downright difficult job) of an OS scheduler.

        TLB flushing is a hardware problem, not a software problem. Instruction cache? Last I checked, the varnishd binary is about 2MB and uses very predictable jumping and straight function calling. I'm pretty sure you're going to be getting a 99% instruction cache hit rate. Madvise is used in Varnish's storage engines, and there seems to be little interest in or knowledge about how Varnish uses storage here.

        > If my points didn't have some validity, Fastly would not have done what they did.

        Can you expand on what Fastly did? They published a non locking memory allocator? So the claim is that memory allocation is the bottleneck and jemalloc is not suitable?

        Sounds like someone might be stuck in performance hell and you are trying to draw parallels to Varnish which simply don't exist!

        • kev009 6 years ago

          You don't understand the TLB. It is distinctly not a hardware problem. A TLB is a content-addressable memory, and the mappings and flushing are done by... software. You flush it every time you context switch, and pretty much fully all the time with KPTI, now that PCID's 4096 ASIDs can't safely save anything other than the kernel and the runnable process. My point about madvise counters phk's post's claim that memory pool objects will always be paging in and out, which is false; and again, a memory pool is desirable for certain uses because it guarantees maximum latency and avoids very heavy locking in the VFS under high concurrency.
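          The madvise(2) hint in question, as a minimal sketch (assuming Linux, with an anonymous mapping standing in for a memory-pool arena):

          ```c
          /* Advise the kernel that a cache arena will be needed soon,
           * so it is a less likely paging victim under memory pressure. */
          #define _DEFAULT_SOURCE
          #include <stdio.h>
          #include <sys/mman.h>

          int main(void) {
              size_t len = 1 << 20;               /* 1 MiB arena */
              void *arena = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (arena == MAP_FAILED) return 1;

              /* hint: we will touch this region and want it resident */
              madvise(arena, len, MADV_WILLNEED);

              ((char *)arena)[0] = 1;             /* fault in the first page */
              printf("arena advised, first byte = %d\n", ((char *)arena)[0]);
              munmap(arena, len);
              return 0;
          }
          ```

          A cache wanting a hard residency guarantee would use mlock(2) instead; madvise is only advisory.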

      • phkamp 6 years ago

        Well, if you want to "contest most the points in that page", I suggest you do so on substance, rather than by going after the person?

        There is a difference between "being ignorant" and "ignoring".

        I don't think there is much in modern HiPerf computing I'm ignorant of (hint: guess who's writing some of the most intense HiPerf scientific code right now), but there sure is a lot of it I'm deliberately ignoring with respect to Varnish.

        TLB flushing, instruction caching? When your CPU is 50+% idle most of the time, that's not really what you (should) worry about.

        And the "downright difficult job of an OS scheduler" isn't really that hard when all threads are waiting for I/O, wake up for a few microseconds, then go back to waiting for I/O again.

        And so on, and so forth...

        Again: Your total lack of actual experience running Varnish shows.

        You should try it some day, you might actually find it useful :-)

        • kev009 6 years ago

          You keep accusing me of logical fallacies but are using them yourself, so I can't have a rational conversation with you. There is little CPU idle when you are doing TCP packet pacing, TLS in core, and edge compute. That's what I do on a 30Tbit/s CDN. I was just hoping you might reconsider the thread-per-connection model, but it appears not yet. Anyway, congrats on the release and have a good day; sorry if the only takeaway you got from this is to be antagonized.

          • phkamp 6 years ago

            Thread-per-request seems even more correct to me, given the increased cost of system-calls, now that Spectre and Meltdown fixes are in.

            • jeremiep 6 years ago

              I used to do thread-per-request; that has mediocre scaling at best. Even on the JVM this barely scales to a few thousand connections; native threads are heavier than that.

              I've also done a lot of task-per-request (with each thread's affinity locked to a single hardware thread to avoid ripple effects), which does scale at least an order of magnitude more than threads.

              I now use fiber-per-request for the best of both worlds: the easy sequential model of threads, yet the simple performance of tasks.

              I can't understand why someone would defend the thread-per-request model at this point. It wasn't even a good model 10 years ago.

          • olavgg 6 years ago

            Isn't TCP packet shaping normally done on the network interface these days so you have CPU offloading?

            • kev009 6 years ago

              s/shaping/pacing/ - we're trying to smooth out the bursty nature of sending from high-speed links and TSO, so a flow is less likely to incur large contiguous buffer drops and tail drops along its path, rather than limit the flow's throughput. I have a large fleet of Intel NICs on the back half of their life cycle. NIC pacers sound good in theory and I'd like to eventually partially offload, but they have limitations in terms of flows and number of pacing rates, so they will still require a software fallback. On a timer wheel, the system overhead for no offload or a partial offload is not nearly as economical as a theoretical full offload. And contrary to some misinformation from one of the Varnish devs in this thread, taking an SWI/context switch/scheduler overhead (these basic concepts will probably get called "babble" by the dude) comprises much of the overhead.

HugoDaniel 6 years ago

Cool, I have been using Varnish for a few years, mostly in front of WordPress sites. Amazing product: almost no config and no worries :D It has made the life of my users much better.

rcarmo 6 years ago

I’m curious. Since SSL is not supported (plenty of info about why in the docs), what are people using besides HAProxy in front of Varnish?

  • phkamp 6 years ago

    Pretty much whatever they prefer, and pretty much anything can work so take your pick.

    My personal preference is 'hitch' because I want as few lines of code involved with the certificates as possible.

    For important sites, I recommend running two different SSL implementations, so that you are not dead in the water when a bad CVE in one of them comes around.

  • petre 6 years ago

    Hitch in front of Varnish.

  • amq 6 years ago

    Nginx and Traefik.