CretinDesAlpes 5 years ago

I used TensorFlow from 2016 to 2018 and switched to PyTorch a few months ago. My theory is that TF is used mainly because it is supported by Google, even though it is really badly designed for practitioners. In fact, TF was not meant to be public but started as an internal tool at Google. This would explain why people at Google itself started to develop Keras or Sonnet (DeepMind) at some point.

For anyone interested in deep learning research in a Python/NumPy-like environment, I can only recommend switching.

  • cs702 5 years ago

    I had a similar experience at work.

    We used to be a TensorFlow shop. Last year we started playing around with Pytorch, only for R&D projects at first... and it felt like a breath of fresh air. Whereas TensorFlow always felt like it was designed and engineered from day one for massive scalability and deployment flexibility (at the expense of easy/fast development), Pytorch felt instantly like it was designed for easy/fast development by AI researchers who need to experiment and iterate through models as quickly as possible with as little hassle as possible. Almost overnight, we stopped tinkering and experimenting with TensorFlow.

    Earlier this year, after a blog post announcing the production-friendly features of Pytorch 1.0[a], we decided to switch our production systems from TensorFlow to Pytorch. So far we're happy with the decision.

    [a] https://pytorch.org/2018/05/02/road-to-1.0.html

  • me2too 5 years ago

    But why? I have used TensorFlow since the 0.x releases and I'm still using it right now, migrating the whole codebase to 2.x.

    I can do everything in both PyTorch and TensorFlow, and when I have to define really efficient input pipelines (tf.data is a great thing), parallelize and distribute the training, and export a trained model to production... with TensorFlow everything is easier.
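
    To sketch what that composition looks like for readers who haven't used it: tf.data chains transformations such as map, batch, and prefetch over an input stream. Below is a rough plain-Python analogue of the idea, not the actual tf.data API (which additionally runs these stages in parallel and inside the graph):

```python
# Plain-Python analogue of a tf.data-style input pipeline:
# generators compose lazily, much like Dataset transformations.

def map_fn(records, fn):
    """Apply fn to every record (analogue of Dataset.map)."""
    for r in records:
        yield fn(r)

def batch(records, size):
    """Group records into fixed-size batches (analogue of Dataset.batch)."""
    buf = []
    for r in records:
        buf.append(r)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:  # final partial batch
        yield buf

raw = range(10)
pipeline = batch(map_fn(raw, lambda x: x * 2), size=4)
print(list(pipeline))  # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```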

    Moreover, PyTorch 1.x will have static graphs too, exactly like TensorFlow.

    Both frameworks are converging to something really similar. I don't see a reason to switch (right now)

    • CretinDesAlpes 5 years ago

      At least 1) debugging in a NumPy-like environment and 2) multi-GPU training on a single machine (last time I was using TF it was such a nightmare that people had to use Horovod, a wrapper library developed by... Uber).

    • samstave 5 years ago

      As someone on the sidelines with no experience in either, but an interest in where all this is going, can you share what exactly you do with TF or PyTorch?

      What actual product(ive) things?

      • CardenB 5 years ago

        Machine learning pipelines are brittle and hard to debug, because the data representation is condensed into numbers in matrices instead of expressive data structures with semantically meaningful variable names.

        To mitigate this complexity, PyTorch and TensorFlow are heavyweight machine learning frameworks that give your software a lot of structure, plus tooling to monitor training progress, debug your models, and handle some of the deployment.

        Any neural net based component of software will likely be developed in one of these frameworks.

      • me2too 5 years ago

        Computer vision tasks: I train models to do object detection, image classification, semantic segmentation... export the model, load the trained model in C++, and run inference.

        Or generative models: same workflow (python definition and train. Export and use in other languages).

        I use TensorFlow basically every day. I use PyTorch only when I read a model implemented in that framework in order to reimplement it in TensorFlow (so I can use all the tools I developed to simplify the path from training to shipping to production).

      • noelsusman 5 years ago

        There are lots of applications, theoretically, but the primary use cases are image classification and natural language processing. It's not worth it for other problem types.

        I'm constantly surprised by how much attention it gets given how narrow the scope is. I guess a lot of people need to classify images.

      • thfuran 5 years ago

        Train neural networks for medical imaging tasks like organ segmentation or lesion detection.

  • kajecounterhack 5 years ago

    Keras wasn't developed by Google; it was adopted by TF.

    > My theory is that TF is used mainly because it is supported by Google, even if it is really badly designed for practitioners.

    This is not correct. TF is mainly used because it's designed for industrial-strength machine learning. It provides primitives with an eye to scale from the outset (because it's Google).

    It's probably true that prototyping / research was not the main audience. That's exactly why Keras was adopted, as well as features like tf.eager, to abstract away the underlying computation graphs and make it easy for people to try different things.

    Well-designed primitives / abstractions are important; Tensorflow does this well.

    • buboard 5 years ago

      For something to be "industrial" it has to be at least stable and well documented. TF doesn't feel like that.

      • kajecounterhack 5 years ago

        Care to point out some examples? (I use tf every day and it feels fine to me but am sure the tf team is curious where others see areas for improvement).

    • qnsi 5 years ago

      I thought Keras was developed by François Chollet, who works at Google?

      Not sure if he develops and maintains it on company time, but I wouldn't be surprised.

      • dekhn 5 years ago

        He was hired after he made Keras.

        • qnsi 5 years ago

          Thanks! Sorry for misleading comment above, then.

  • sseveran 5 years ago

    I am probably an outlier (I have experience with both compiler development and lazy languages), but I found the graph model very natural. Many of the conceptual struggles I saw people having, I had already worked through years before.

    I admit the original reason I bet on TensorFlow was that it was built by Google. But another really strong reason is that it had a fully integrated toolchain from research to production. That toolchain has gotten much stronger with the continued investment in tf.data and TensorFlow Serving. There was a binary alternative to CSVs (TFRecords). Also, given our need for some pretty esoteric components, the TF community is actually quite deep.

    I had only minor Python experience before starting with TensorFlow, so I literally did not want a Python/NumPy environment. I am still sad that the world has converged on Python as the language of choice, but that ship seems to have sailed.

    I will say tf.while is something I won't miss, as is tf.cond. I didn't find either really difficult, but they were more typing than necessary. If there is one thing I really think should be invested in, it is documentation. What should go in a SessionRunHook, and what is the best way to write one? If you dig through some of Google's repos that use TF Estimators you find a lot of useful hooks, as well as utility functions to help build them. But learning how to use them was more painful than it should have been.
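
    For readers wondering why tf.cond means extra typing: in graph mode both branches have to be handed over as callables so they can be recorded into the graph, whereas eager code just uses a Python if. A toy sketch of the calling convention (illustrative only, not TF's implementation):

```python
# Why graph-mode conditionals take callables: a graph builder must be
# able to record *both* branches, so they are passed as functions
# instead of being evaluated eagerly by a plain Python `if`.

def cond(pred, true_fn, false_fn):
    # A real graph framework would trace both branches here and emit a
    # conditional node; this sketch just dispatches immediately.
    return true_fn() if pred else false_fn()

x = 3
y = cond(x > 0, lambda: x * 2, lambda: -x)
print(y)  # 6
```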

    All that said I am not a researcher. I would suspect that some of my workflow overlaps with researchers as I have found myself implementing a number of papers. I also need the ability to rapidly move a model from training to production.

    • lostmsu 5 years ago

      What would you prefer to see instead of Python? My company is working on a full proprietary .NET binding for TensorFlow, and we are considering making Java version too, if there is a market for it.

      • sseveran 5 years ago

        Honestly, Haskell. But on .NET, F# could be interesting. I don't think these bindings are useful to me in isolation; the reason I have not really gone down that path is that one would need to clone all the higher-level TF code that has been written in Python.

  • adamnemecek 5 years ago

    > My theory is that TF is used mainly because it is supported by Google, even if it is really badly designed for practitioners.

    Sounds like every Google developed technology. Angular, Go, Android, the list is infinite.

  • qwerty456127 5 years ago

    > My theory is that TF is used mainly because it is supported by Google, even if it is really badly designed for practitioners.

    I use TensorFlow via Keras because there are a lot of concise examples, Q&As, and tutorials on it for different use cases, while PyTorch looks a little bit like a magic black box you can't use before you take a course and study it as a whole. I believe this is a major point for many. Also, the first official PyTorch tutorial I found mentioned an NVIDIA GPU as a prerequisite, and I don't have a GPU, just old built-in Intel graphics with a Core 2 Duo, and TensorFlow seems to have no problem with that.

  • solomatov 5 years ago

    Keras was an independent product

    Sonnet is based on TF

LeanderK 5 years ago

> Tensorflow 2.0 will be a major milestone for the most popular machine learning framework: lots of changes are coming, and all with the aim of making ML accessible to everyone. These changes, however, requires for the old users to completely re-learn how to use the framework

I never felt like I wasn't re-learning TensorFlow! A constant series of breakages, deprecations, new APIs, etc.

  • FridgeSeal 5 years ago

    Not to mention terrible documentation.

    I think that's a property of Google, though, because there's never been a Google service whose documentation I could read without thinking "I do not understand what's going on at all" for a non-trivial amount of time.

    And all their examples for Python are just like “run this magic-code-ridden python file, congrats you did the tutorial”.

    • luckydata 5 years ago

      Ok this is very useful to read, I also felt like that lately and I just thought I was an idiot. Now maybe we are just two idiots, or the documentation could use work.

      • davmar 5 years ago

        Hello. Third idiot checking in.

      • p1esk 5 years ago

        Well, you should not start with TF documentation if you don't know how basic deep learning models work. Start with online courses

        • luckydata 5 years ago

          I wasn't talking about deep learning at all. I think most Google technical docs suffer a bit from a "now draw the rest of the owl" style.

        • FridgeSeal 5 years ago

          I have a degree in maths and stats and I'm familiar with the underlying theory. It's not the deep learning part I'm struggling with; it's every bit of Google documentation about how their actual software does things that is incredibly confusing.

    • franciscop 5 years ago

      I thought I was the only one! Like, I must not be getting what those smart programmers are doing.

      The breaking-changes problem is well known; see Angular 1.x ~> 2.x, which is arguably a huge reason (but not the only one) why React took a big chunk of its market.

  • sebazzz 5 years ago

    You should probably employ machine learning with TensorFlow to automatically update your broken code and remove usage of deprecated APIs.

  • matt4077 5 years ago

    I feel it's a bit unfair to criticise changes in a product at the leading edge of the most dynamic field of software. It's also slightly ungrateful toward such a valuable addition to OSS, and I say that even though I tend to consider this argument overused.

    But mostly I don't believe such criticism has much of a chance of changing anything: the creators are already the most qualified people you could have working on it, and it's unlikely they intentionally made mistakes in the early rounds of API design. Maybe it would have helped to spend a few more weeks on specifications rather than coding, but, again, I wouldn't feel very comfortable second-guessing Google's project management.

    I think the most likely outcome, if such complaints start to pile up, is that future projects simply remain proprietary, either for longer or completely.

    • LeanderK 5 years ago

      I really don't think so. Google doesn't make keeping up with TensorFlow hard because of all the wonderful innovations, but because of needless breakage: moving packages around, renaming things. It regularly breaks not-so-old code.

      Also, I don't think I am ungrateful. This is serious criticism of software I use almost daily. I really don't consider TensorFlow to be great, or even good, software, but for some tasks it's the only option.

      Also, Google profits a lot from releasing TensorFlow into the wild.

      But all in all, TensorFlow 2.0 seems like a step in the right direction.

maaaats 5 years ago

Not trying to start a language war, but I would like the old or the new API better if it had static types. Right now it's thousands of functions that can take thousands of different objects, and I would never know what to use except for the few I happened to learn in a tutorial.

  • halflings 5 years ago

    Then you'll be happy to learn about Swift for Tensorflow:

    https://www.tensorflow.org/swift/

    And no, it's not simply a Swift wrapper for TensorFlow; it's literally TensorFlow (or, generally speaking, differentiable computation graphs) built into the Swift language.

  • lostmsu 5 years ago

    My company is working on a full API .NET binding. Leave your email if you're interested in trying the first technical preview in about 2 months.

friendshaver 5 years ago

This is a great development for both new and experienced users. Most importantly, it seems to be an effort to introduce a canonical way to build a given network in Tensorflow, as opposed to the previous era where any given TF repo might implement the same network in one of several completely different ways (tf.contrib.slim, tf.layers, etc). Hopefully, defining a standard way to build models will also help accelerate the standardization of train/test scripts and data pipelines, putting to rest the era of models being deeply infected with the execution structure built around them.

Overall, it looks like Pytorch is forcing TF devs to focus more on users and usability, and I'm excited to see how they continue to spur each other's growth.

  • p1esk 5 years ago

    > effort to introduce a canonical way

    I'm pretty sure there will be plenty of incompatible API changes in every 2.* release, and a completely new "canonical" way introduced in TF 3.0.

kodablah 5 years ago

> Support for more platforms and languages, and improved compatibility and parity between these components via standardization on exchange formats and alignment of APIs.

Does this mean we can get a pure C API for training and running? Right now, IIRC you can only train in Python and C++ which ignores a large part of the programming community. If not a pure C FFI API, what is the approach being taken?

  • danieldk 5 years ago

    The C API permits training and running. But you have to define and serialize the graph in Python (or C++).

    (Of course, you could also write Tensorflow protobuf directly, but that would be tedious.)

    • kodablah 5 years ago

      Pardon my naivete, so at some point when not reusing others' graphs/models, to get the most out of TF you are basically forced to use one of those two languages? With TF 2 claiming more platforms/languages, will this no longer be the case?

  • nine_k 5 years ago

    How hard would it be to provide and maintain a C++-to-pure-C API adapter? Serious question.

    • kodablah 5 years ago

      I am not sure as I haven't done it, but this has been attempted at a generic level in many languages. E.g. there are multiple cpp-to-rust bridges. That's essentially how most Qt bindings work. Most of the trouble is probably around inheritance, visibility, etc. Well, that and the "provide and maintain [...] adapter" outside of the core supporting library authors has its own upkeep challenges.

beefsack 5 years ago

Tangential, but as someone in the southern hemisphere it's frustrating when people use seasons for timelines. "Q2 2019" is immediately more understandable instead of having to do some mental gymnastics to convert it.

qnsi 5 years ago

Feels bad man. All tutorials will be outdated.

davmar 5 years ago

As if it wasn't already confusing enough: should I use TensorFlow Hub? TF-Slim? Keras applications that import TensorFlow?

Should I still use those outdated TF-Slim models that are for some reason kept in models/inception?

Please, TensorFlow team, make this stuff easier for us non-PhDs.

amelius 5 years ago

It still uses dataflow graphs :(

As a programmer, I don't want to think in terms of dataflow graphs. That is what compilers are for!

  • me2too 5 years ago

    On the contrary, I love the graph.

    That said, by migrating to a Keras-like approach you can work in terms of "objects with variables inside", and the graph will be built for you by the abstraction that `tf.keras.Model` introduces.

    However, for automatic differentiation a graph is always required (as you can see from the example that uses eager execution).
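
    A toy illustration of that last point, assuming nothing about TF's internals: reverse-mode automatic differentiation works by recording each operation and its inputs during the forward pass, and that record is exactly a graph (eager frameworks build it on the fly as a "tape"):

```python
# Minimal scalar autograd sketch: every multiplication remembers its
# inputs and local gradients, forming a graph that backward() walks.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local gradient)
        self.grad = 0.0

    def __mul__(self, other):
        # d(self*other)/d(self) = other.value, and vice versa
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local_grad in self.parents:
            parent.backward(upstream * local_grad)

x, y = Var(3.0), Var(4.0)
z = x * y      # forward pass records the graph
z.backward()   # backward pass walks it
print(x.grad, y.grad)  # 4.0 3.0
```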

    • amelius 5 years ago

      > At your contrary, I love the graph.

      Can you explain what you like so much about manipulating graphs, over just writing statements like z= x⋆y, where ⋆ is some tensor operation?

      Really, the graph makes me feel like I'm stacking Lego bricks with chopsticks, rather than with my bare hands ;)

      • me2too 5 years ago

        The fact that I have the whole computation described in a coherent manner, in something that's agnostic to the language I'm using.

        The same description (the graph) can be taken and used in every other language. I can move a trained model to production by picking up a single file (a `tf.train.SavedModel`, IIRC), giving it a tag, and I'm ready to go, with native support for tagging (and hence model versioning) across different models.

        • amelius 5 years ago

          Whether that's an advantage depends on your perspective, because what has really happened is that you have now created a new language inside the original language.

  • CMCDragonkai 5 years ago

    A very common compiler architecture for functional languages is to use a combinatory language (dataflow like graphs) as the IR, and have a more usable frontend language that has variables and binders.
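
    A minimal sketch of that architecture in plain Python (illustrative only; TF's actual graph is a protobuf of ops): the frontend builds a description of the computation, and a separate evaluator, which could just as well be a compiler or another language's runtime, consumes it:

```python
# Toy dataflow graph: construction is separated from evaluation,
# so the same graph description could be serialized or compiled.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v):
    return Node("const", value=v)

def mul(a, b):
    return Node("mul", (a, b))

def add(a, b):
    return Node("add", (a, b))

def evaluate(node):
    """Walk the graph and compute the result (a tiny interpreter)."""
    if node.op == "const":
        return node.value
    args = [evaluate(i) for i in node.inputs]
    return args[0] * args[1] if node.op == "mul" else args[0] + args[1]

# z = x * y + x, written as graph construction rather than arithmetic
x, y = const(3), const(4)
z = add(mul(x, y), x)
print(evaluate(z))  # 15
```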

breatheoften 5 years ago

I think they should really consider making a clean JavaScript/TypeScript API first, using and testing it for a while, then porting that design to Python for a 2.0 of the Python API...

The worst thing about TensorFlow is being locked into using Python for graph description and training (in my opinion)...

  • mark_l_watson 5 years ago

    Try TensorFlow.js - very nice, lots of examples included, easy to set up.

adev_ 5 years ago

For the sake of god, please also try to add a friendlier build system. Bazel is a disaster that almost makes Maven look lightweight, npm reliable, and autotools user-friendly.

  • solomatov 5 years ago

    Bazel is the best build system I have ever used.

    • adev_ 5 years ago

      Lucky you, I have never met a user outside of Google who likes it.

      • solomatov 5 years ago

        I am not a Google employee. There are several reasons why it worked so well for us:

        - It can take advantage of the many cores in your machine.

        - It's polyglot. We have several languages in our project, and mixing several build systems was a nightmare.

        - It's correct. You very rarely need to run clean builds.

        It has a learning curve, but once you get it, you start to love it.

        • curiousgal 5 years ago

          >it can take advantage of many cores which you have in your computers.

          Which is a bad thing when you're on a shared server. I remember it wasn't straightforward to limit the number of cores it uses, and it even ignored the flag I passed it. I eventually resorted to waiting until the weekend, when no one in the lab was running jobs.

          • maccam94 5 years ago

            Ouch. For a workaround in the future, maybe you can lower the priority of your bazel server via nice/renice?

          • laurentlb 5 years ago

            If you still have a problem, please file a bug. We can look at it.

            (I work on Bazel)

          • grandmczeb 5 years ago

            Does “--jobs N” not work?

            • jmmv 5 years ago

              Unfortunately not.

              "--jobs N" limits the parallelism during the execution phase (when Bazel executes compilers, etc.) but doesn't limit the parallelism of the analysis phase (when Bazel loads BUILD files and constructs the build graph). For the latter there is a "--loading_phase_threads N" flag.

              We are actively trying to make Bazel better use the local resources, and also clean up the current mess that the flags are in this area. See the discussion at https://groups.google.com/d/msg/bazel-dev/7pcrJY6Xo78/JDe256... for some context.

  • Judgmentality 5 years ago

    What do you dislike about bazel? Not championing it, just curious.

    • adev_ 5 years ago

      The list would be too long, but in no particular order:

      - It uses a JVM to compile mainly C++ and Python, which means deploying a JDK/JRE just for that.

      - Awful RAM consumption: compiling TensorFlow takes 16 GB of RAM just to create a Python wheel.

      - It tries to download the world and compile everything, without letting you specify external dependencies.

      - Compile times that are crazy long, because of the previous point.

      - Invasive: it makes it very hard to integrate with anything else that doesn't build with Bazel.

      - It makes it very hard, sometimes impossible, to tune compiler flags.

      - Just not reproducible.

      - Unstable: try to build a recent Google package with a six-month-old Bazel and good luck.

      I strongly advise you to watch this video of the last FOSDEM : https://archive.fosdem.org/2018/schedule/event/how_to_make_p...

      • haberman 5 years ago

        I strongly agree with you on the first one.

        Many of your other ones are byproducts of the fact that Bazel is primarily a build-from-source system. This has some benefits, particularly in a C++ ecosystem where binary compatibility across versions basically doesn't exist. But it also has some big drawbacks when it comes to compile times.

        I do see Bazel seems to support depending on a prebuilt .so, though I have not tried this: https://docs.bazel.build/versions/master/cpp-use-cases.html#...

        > Invasive. Make very hard to integrate with anything else that does not build with Bazel

        I think your main options here are:

        1. prebuild the other projects, then depend on the .so from Bazel (https://docs.bazel.build/versions/master/cpp-use-cases.html#...)

        2. write a BUILD file for the external project. Here is an example of a project that builds a bunch of non-Bazel deps by writing Bazel BUILD files for each of them: https://github.com/googlecartographer/cartographer/tree/mast...

        > Make very hard, or sometimes, impossible to tune compiler flags

        "bazel build --copt=<your options here> :target"?

        > Just not reproducible.

        What isn't reproducible? I tend to think of reproducibility as a strength of Bazel. Because all of your dependencies are explicit and Bazel fetches a known version, the build is less dependent on your system environment and whatever you happen to have installed there.

        Disclosure: I am a Googler. I have some gripes with Bazel, but overall I think it gets some important ideas right. You have a BUILD file that is declarative, then any imperative code you need goes into separate .bzl files to define the rules you need.

        • adev_ 5 years ago

          > Many of your other ones are byproducts of the fact that Bazel is primarily a build-from-source system. This has some benefits, particularly in a C++ ecosystem where binary compatibility across versions basically doesn't exist. But it also has some big drawbacks when it comes to compile times.

          Nix, Guix, and Spack package managers solved the C++ ABI issue a long time ago, without Bazel's crazy resource consumption, integration burden, and compile times. Some of them even support binary distributions.

          > I think your main options here are:

          > 1. prebuild the other projects, then depend on the .so from Bazel (https://docs.bazel.build/versions/master/cpp-use-cases.html#...)

          > 2. write a BUILD file for the external project. Here is an example of a project that builds a bunch of non-Bazel deps by writing Bazel BUILD files for each of them: https://github.com/googlecartographer/cartographer/tree/mast...

          I know that. But all of those are terrible options. I do not want to depend on SQLite, OpenSSL, libxml, or whatever other system library compiled by Bazel, nor do I want Bazel to take 45 minutes to recompile them. Additionally, this causes diamond-dependency problems with other software that uses Bazel artifacts without building with Bazel.

          > "bazel --copt=<your options here> :target"?

          Can I use that to specify a flag for some targets and not others? Without having to build each of them sequentially?

          Concrete example of the madness: SQLite will not compile if you enable some options that would make TensorFlow faster... and Bazel recursively compiles both.

          > What isn't reproducible? I tend to think of reproducibility as a strength of Bazel. Because all of your dependencies are explicit and Bazel fetches a known version, the build is less dependent on your system environment and whatever you happen to have installed there.

          Bazel tries to build in an isolated environment but does half of the job. It still depends on the system compiler, and does not chroot nor "compiler-wrap" (c.f. Spack), leaving the build vulnerable to system side effects and updates.

          > Disclosure: I am a Googler. I have some gripes with Bazel, but overall I think it gets some important ideas right. You have a BUILD file that is declarative, then any imperative code you need goes into separate .bzl files to define the rules you need.

          I can understand that Bazel is very convenient inside Google, with Google's resources. But it's a nightmare for everyone I've talked to outside of Google.

          • haberman 5 years ago

            > Nix, Guix and Spack packager managers solved the C++ ABI issue a long time ago

            Yes this is certainly something that is solvable at the package manager level also. And this approach will certainly have shorter compile times.

            I agree it would be nice if Bazel integrated with package managers like this more easily. I hope Bazel adds support for this. There is a trade-off though: with less control over the specific version of your dependencies, there is a greater risk of build failure or bugs arising from an untested configuration. Basically this approach outsources some of the testing and bugfixing from the authors to the packagers. But it's a trade-off I know many people are willing to make.

            > Can I use that to specify a flag to some target and not some other ? Without having to build sequentially each of them ?

            You can put copts=["<opt>"] in the cc_library() rules in the BUILD file. This will give per-target granularity. You can add a select() based on compilation_mode if you need to define opt-only flags: https://docs.bazel.build/versions/master/configurable-attrib...
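
            To make that concrete, a hypothetical BUILD fragment (target and file names invented) combining per-target copts with a select() keyed on compilation mode might look like:

```python
# BUILD (Starlark) -- sketch only; names are invented.
cc_library(
    name = "fastmath",
    srcs = ["fastmath.cc"],
    copts = ["-ffast-math"] + select({
        ":opt_build": ["-O3"],
        "//conditions:default": [],
    }),
)

config_setting(
    name = "opt_build",
    values = {"compilation_mode": "opt"},
)
```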

            > Bazel try to build in isolated environment but do half of the job. It still depends on system compiler, and do not chroot nor "compiler-wrap" ( c.f Spack ) making the build still very vulnerable of system side effect and update.

            Bazel allows you to define your own toolchain. This can support cross-compiling and isolating the toolchain I believe, though I don't have any direct experience with this: https://docs.bazel.build/versions/master/toolchains.html

            > I can understand that Bazel is very convenient in Google environment with Google resources. But it's a nightmare for everyone I talked to outside of Google.

            I hear you and I hope that we see some improvements to integrate better with package managers.

            FWIW, I have been experimenting with auto-generating CMake from my Bazel BUILD file for my project https://github.com/google/upb. My plan is to use Bazel for development but have the CMake build be a fully supported option also for users who don't want to touch Bazel.

            • adev_ 5 years ago

              > FWIW, I have been experimenting with auto-generating CMake from my Bazel BUILD file for my project https://github.com/google/upb. My plan is to use Bazel for development but have the CMake build be a fully supported option also for users who don't want to touch Bazel.

              Thank you for your answer.

              I really wish the auto-generation of CMake build scripts would solve the problem(s).

              But even there I have doubts. The way Bazel works makes it very hard to auto-generate "standard modern" CMake that uses external dependencies; that model is foreign to how Bazel works.

    • Q6T46nT668w6i3m 5 years ago

      I'm not the OP (but I share their annoyance). I use a dozen build systems, but TensorFlow is the only project I use that uses Bazel.

      • solomatov 5 years ago

        Having experience with Bazel, I wouldn't say that TensorFlow is the best possible showcase for it. You can see it used in other projects, for example in Bazel itself: https://github.com/bazelbuild/bazel

xvilka 5 years ago

Hopefully this time it will support AMD's open-source drivers. It's awful that TF is currently tied to proprietary NVIDIA drivers.

tmulc18 5 years ago

2.5 years of learning tensorflow now gone to waste :(