fourier_mode 8 days ago

Not sure what the motivation behind this library is. There are already several GPU-accelerated array libraries -- PyTorch, TensorFlow, ArrayFire, and it even looks like pycuda has a small array class.

  • Smerity 8 days ago

    Chainer and potentially CuPy (which was extracted from Chainer to be independent) were around before PyTorch, and Chainer served as inspiration for PyTorch. I feel like that's a good motivation for diversity in packages and ecosystems regardless of your feelings otherwise.

    Along with a colleague I used CuPy in first Chainer and then PyTorch for implementing the Quasi-Recurrent Neural Network (QRNN) which at the time was far faster than even NVIDIA's optimized cuDNN LSTM whilst getting the same (or better) performance for many tasks.

    CuPy at the time was both the easiest and most Pythonic of potential solutions for that problem - even if it did involve writing CUDA in Python strings =]

    n.b. Our use case was literally pushing state of the art in research - CuPy is even more Pythonic if you're hitting more standard use cases.
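For readers wondering what "CUDA in Python strings" looks like, here is a minimal sketch using CuPy's current `RawKernel` API. It is not the QRNN code; the `saxpy` kernel and the NumPy fallback are illustrative assumptions so the snippet also runs on a machine without a GPU.

```python
import numpy as np

# CUDA C source kept in a Python string; compiled by CuPy on first use.
SAXPY_SRC = r'''
extern "C" __global__
void saxpy(const float a, const float* x, const float* y, float* out, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}
'''

def saxpy(a, x, y):
    """Compute a*x + y with a hand-written kernel, or with NumPy if no GPU."""
    try:
        import cupy as cp
        kernel = cp.RawKernel(SAXPY_SRC, "saxpy")
        x_d = cp.asarray(x, dtype=cp.float32)
        y_d = cp.asarray(y, dtype=cp.float32)
        out_d = cp.empty_like(x_d)
        n = x_d.size
        threads = 128
        blocks = (n + threads - 1) // threads
        # Scalars are passed as NumPy scalars so their C types match the kernel.
        kernel((blocks,), (threads,), (np.float32(a), x_d, y_d, out_d, np.int32(n)))
        return cp.asnumpy(out_d)
    except Exception:
        # Sketch-level fallback: a real program should not swallow kernel errors.
        return a * np.asarray(x, dtype=np.float32) + np.asarray(y, dtype=np.float32)

result = saxpy(2.0, [1, 2, 3], [10, 20, 30])
print(result)
```

The explicit `np.float32` and `np.int32` wrappers matter: the kernel signature is plain C, so nothing checks that the Python-side arguments have the types the kernel expects.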


    • DanielleMolloy 8 days ago

      PyTorch is almost (or even literally?) a fork of chainer, which can be seen when comparing example code. The latter was much more stable than the former for quite some time after PyTorch gained big popularity through Facebook. We have been using chainer for a lot of published NN research projects and only recently moved to PyTorch because students complained that they feel they can't put the more popular framework on their CVs..

      I continue to have more sympathies for chainer.

      • _0ffh 7 days ago

        > PyTorch is almost (or even literally?) a fork of chainer

        That's funny, I would have assumed PyTorch to be, like, the python version of Torch?

        • colesbury 7 days ago

          The PyTorch tensor library was originally basically the Python version of Torch 7. It's now moving closer towards NumPy's API (and farther from Torch 7).

          The autograd library was inspired by Chainer's design and took a lot of concepts (but not code) directly from Chainer. The neural network API is a bit of a hybrid. It's built on top of the autograd library but the layer names, implementations, and some conventions were inherited from Torch 7's NN and cuNN libraries.

          (EDIT: and the name "autograd" originates from the HIPS autograd library, which I think predates Chainer)

    • Iamhisalt 8 days ago

      Thanks for the QRNN. What’s it like working for Socher?

  • marmaduke 8 days ago

    Did you see the « NumPy compatible » part of the title?

    • fourier_mode 8 days ago

      It is "highly compatible"; a similar statement can be made about other libs, say torch tensors.

      • buildbot 8 days ago

        You can, in nearly all cases, literally do: import cupy as np and have that just work, so it’s pretty compatible.
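A minimal sketch of that drop-in import. The `normalize` helper is a made-up example, and the fallback to NumPy is an assumption added here so the same file runs on machines without CuPy or a CUDA device.

```python
try:
    import cupy as np   # GPU-backed arrays behind the NumPy API
    np.zeros(1)         # probe: raises if there is no usable CUDA device
except Exception:
    import numpy as np  # CPU fallback with the same calls

def normalize(v):
    """Scale a vector to unit length; identical code for NumPy and CuPy."""
    return v / np.linalg.norm(v)

x = np.arange(1, 5, dtype=np.float32)
unit = normalize(x)
length = float(np.linalg.norm(unit))  # float() copies the scalar back to the host
print(length)
```

Code that relies on NumPy internals, object arrays, or C extensions taking host pointers is where the "nearly all cases" caveat tends to show up.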

      • marmaduke 8 days ago

        Having tried to debug some issues between autograd, PyTorch & TensorFlow, I find torch & tf tensors have different enough syntax and naming that one needs to google a bit.

smoussa 8 days ago

I’m working with Numba’s CUDA API and it works well as a drop in replacement for embarrassingly parallel functions.

  • dotdi 8 days ago

    I've done a fair bit of C++11 for CUDA and I was so happy to throw everything out and switch to Numba. It has some rough edges (like incomprehensible error messages when the type inference goes wrong) but it's been a pleasure overall to work with.

    • jjoonathan 7 days ago

      I've done a fair bit of Numba CUDA and I was so happy to throw everything out and switch to C++.

      NumbaCUDA gave me lots of small problems and a few big ones. The poor support for debug/perf tools and poor integration with other high-level python CUDA code (FFTs in particular) sent me packing, but the number of small problems was excessive in comparison to the size of my code. I had 5 reduced bugs at the bottom of my notebook and two paragraphs of "baggage" at the top to support a tiny little 50LoC kernel: one paragraph for the environment variables and one for patching numbacuda itself for a trivial API incompatibility that hadn't been fixed for the better part of a year. All of this for a tool that provided a diminutive subset of functionality at the intersection of both python and C. I've felt more computational freedom writing BASIC on my TI-83.

      CuPy could well have changed that equation!

      > incomprehensible error messages when the type inference goes wrong

      NumbaCUDA is truly the galaxy-brain of type checking: first it complains loudly so as to force you to provide type information, then it opts to not complain about a mismatch, and then it silently reinterpret_casts a double* to float* behind your back.
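To make the danger of that last step concrete, here is a NumPy analogy (not a reproduction of the Numba behaviour described above): converting doubles to floats preserves values, while reinterpreting the same bytes as floats does not.

```python
import numpy as np

doubles = np.array([1.0, 2.0], dtype=np.float64)

# Value-preserving conversion: each double is rounded to the nearest float32.
converted = doubles.astype(np.float32)

# Reinterpretation: the same 16 bytes are read as four float32 values,
# which is what treating a double* as a float* amounts to.
reinterpreted = doubles.view(np.float32)

print(converted)
print(reinterpreted)
```

The reinterpreted array has twice as many elements and none of the guarantees, and nothing raises an error along the way.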

      I know it's free software and I have no right to complain, but I sure sunk a lot of time into this dead end and regret it.

      Spiffy icon though.

      • elcritch 7 days ago

        What’s the difference between Numba CUDA and PyTorch or similar?

        If you’re doing custom kernels you should take a look at the Julia library CuArray [1] and generic kernels [2]. I really like that I don’t have to dig into C++ and deal with all of the memory and kernel management.

        1: 2:

        • jjoonathan 7 days ago

          My impression was that pytorch focused on linear algebra / deep learning. The reason I was playing with numbacuda in the first place was because part of my problem did not fit nicely into a (dense) linear algebra framework, so numbacuda's custom kernel support seemed attractive. Does pytorch have a good low-level kernel library? Or sparse linear algebra library?

          I love Julia, but I haven't managed to convert anyone else on my team and I already spent my informal exploration budget for the GPU project on numbacuda, so JuliaGPU will have to wait for another time. I'll be sure to keep it in mind, though!

          How is the CUDA debug/perf story with Julia? Does it play nice with the nvidia tooling?

          • elcritch 7 days ago

            Ah, that makes sense. I've only dabbled a little with DNNs recently, but pytorch/tensorflow seemed very targeted toward deep learning. Generic tools seem more useful to me. What are you doing with FFTs?

            I haven't dug too deep with CudaNative / Cuarray to understand the state of Julia perf debugging. Though here's one post on the topic:


            In general it's been very pleasant experimenting with gpu programming in Julia. I couldn't quite grok tensorflow code, and it's cool to just declare a Julia array and send it to the GPU.

slaymaker1907 8 days ago

The problem I see with trying to emulate NumPy with a GPU accelerated version is that the communication overhead to the GPU is so high that you are losing a lot of performance as opposed to something like TensorFlow.

It takes a couple of microseconds just to start a kernel, let alone the time it takes to transfer data back and forth.

  • llukas 8 days ago

    You don't transfer back and forth if you use managed memory.
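For reference, a hedged sketch of how managed (unified) memory can be switched on in CuPy. The helper name `use_managed_memory` is made up for illustration; it returns False instead of failing when CuPy or a CUDA device is missing.

```python
def use_managed_memory():
    """Route CuPy allocations through CUDA managed memory.

    Returns True on success, False when CuPy or a CUDA device is unavailable.
    """
    try:
        import cupy as cp
        # Memory pool backed by cudaMallocManaged instead of cudaMalloc.
        pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
        cp.cuda.set_allocator(pool.malloc)
        cp.arange(4).sum()  # first allocation now comes from the managed pool
        return True
    except Exception:
        return False

enabled = use_managed_memory()
print("managed memory enabled:", enabled)
```

Managed memory removes the explicit copies from your code, but as far as I understand the driver still migrates pages between host and device on demand, so the cost does not vanish entirely.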

lp251 8 days ago

Is it possible to pass cupy data to C? Possibly by accessing the array pointer.

I've used PyCUDA for a very long time. You can use Cython/ctypes/cffi to pass PyCUDA arrays to standard C/CUDA code.
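A small sketch of the pointer-access idea. CuPy arrays expose the raw device address as `arr.data.ptr` and through `__cuda_array_interface__`; the `buffer_address` helper below is hypothetical and is exercised on a NumPy array here so it runs without a GPU.

```python
import ctypes
import numpy as np

def buffer_address(arr):
    """Return the integer address of an array's data buffer."""
    cuda_iface = getattr(arr, "__cuda_array_interface__", None)
    if cuda_iface is not None:
        # CuPy array: a device address (arr.data.ptr holds the same value).
        return cuda_iface["data"][0]
    # NumPy array: a host address.
    return arr.__array_interface__["data"][0]

host = np.array([1.5, 2.5, 3.5], dtype=np.float32)
addr = buffer_address(host)

# A host address can be dereferenced directly. A device address must only be
# handed to code that expects one, e.g. a C function launching a CUDA kernel.
first = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))[0]
print(hex(addr), first)
```

With a CuPy array the same integer could be passed as a `void*` through ctypes, cffi or Cython, much like the PyCUDA workflow described above.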

tanilama 8 days ago

How does this compare against Google's JAX?

brian_herman__ 8 days ago

Cool, I wish that other libraries like tensorflow could get cudnn installed automatically like this library!

  • penagwin 8 days ago

    Yeah the CUDNN dependencies are obnoxious as heck.

  • rahimnathwani 7 days ago
    • Ragib_Zaman 7 days ago

      Their website seems to imply different:

      E.g "To install PyTorch via Anaconda, and you are using CUDA 9.0, use the following conda command". If they are shipping with CUDA perhaps that should be phrased more like "and you want to use CUDA 9.0". And of course you do indeed need your own CUDA installation if you want to build PyTorch from source yourself.

    • oarfish 7 days ago

      At least up until 2 months ago, you had to do it manually; I don't know if that changed.

      • rahimnathwani 7 days ago

        When I installed pytorch, I had already installed CUDA and CUDNN (as I needed them for tensorflow) so I have not verified what smhx said.

  • p1esk 8 days ago

    I thought recent TF versions can?

paddy_m 8 days ago

I use pandas mostly, rarely dropping down to NumPy. Are there non ML/neural network use cases where this library is meaningfully faster than numpy?

  • p1esk 7 days ago

    Anything that involves computing dot products on large matrices will be dramatically faster with cupy than numpy (depends on your graphics card of course).
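A rough way to check this on your own hardware. The sketch below times one matrix product with whichever backend is available; the matrix size and the NumPy fallback are arbitrary choices, and no particular speedup is implied.

```python
import time

try:
    import cupy as xp
    xp.zeros(1)          # probe: raises if there is no usable CUDA device
    on_gpu = True
except Exception:
    import numpy as xp
    on_gpu = False

def timed_matmul(n=512):
    """Multiply two n x n float32 matrices and return (result, seconds)."""
    a = xp.ones((n, n), dtype=xp.float32)
    b = xp.ones((n, n), dtype=xp.float32)
    _ = a @ b            # warm-up: excludes one-time library initialisation
    if on_gpu:
        xp.cuda.Stream.null.synchronize()
    start = time.perf_counter()
    c = a @ b
    if on_gpu:
        # GPU work is asynchronous; wait for it before stopping the clock.
        xp.cuda.Stream.null.synchronize()
    return c, time.perf_counter() - start

c, seconds = timed_matmul()
print("GPU" if on_gpu else "CPU", f"{seconds:.6f} s")
```

Without the synchronize calls the GPU timing would only measure the kernel launch, which is a common source of overly optimistic benchmarks.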

ngcc_hk 8 days ago

In another thread I asked about Pyodide ... is this an answer to using CUDA in the browser under Python?

  • zamadatix 8 days ago

    No, browsers only expose specific Web APIs and CUDA is not one. WebGL doesn't even provide access to compute shaders.

sharperguy 7 days ago

Lucky they didn't call it CumPy