Apple releases eight small AI language models aimed at on-device use

181 points by MBCook 11 days ago

As is often the case, the article gets the difference between open source and open weights wrong.

Also, they don't seem to understand that Apple's variant of the MIT licence is GPL compatible.

> This is Apple's variant of MIT. They've added wording around patents, which is why it gets it own shortname. Apple did not give this license a name, so we've named it "Apple MIT License". It is free and GPL compatible.

https://fedoraproject.org/wiki/Licensing/Apple_MIT_License

gilgoomesh 10 days ago

The lack of distinction is confusing but Apple have released both the training source and the inference source + models/weights at the same time — so at least both are true.
- nicce 10 days ago
  
  Is the training data public as well?
  
  GeekyBear 10 days ago
  
  Yes
  > RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline.
  https://github.com/togethercomputer/RedPajama-Data
  > Dolma Dataset: an open dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.
  https://allenai.github.io/dolma/
  
  raxxorraxor 10 days ago
  
  This time Apple could say that they didn't release it for privacy reasons and this time it would be correct for once.
whiplash451 10 days ago

The concept of open-source for a million-dollar scale LLM is not very useful, especially if you don't provide the training set as well.
Open weights with a permissive license is much more useful, especially for small and midsize companies.
- GeekyBear 10 days ago
  
  Publicly available datasets were used.
  > our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations
  https://arxiv.org/abs/2404.14619v1
- gattilorenz 10 days ago
  
  The distinction between the two terms is what is useful.
  Imagine if people only referred to open source software as "free software", and there's no distinction made whether the software is free as in beer or free as in freedom.
  
  taneq 10 days ago
  
  Maybe both are useful? There's "open source" in the FSF sense, which isn't always as useful when talking about modern neural networks (remember when papers used to be published with "open source" Python code for initializing and training the model, but no training data or weights?) Then there's "in the spirit of open source" where the weights and training data are also GPL'd. And there's the whole range in between.
  Having the training data available is nice, but for large models, having weights provided under a GPL-style license (or even a MIT-style permissive license) is far better in terms of "being able to modify the program" than having the training data that you don't have enough compute resources to use. The distinction between the two, though, is also useful.
  (I've even seen a few posters here argue that it's not really 'free software' even if everything's provided and GPL'd, if it would take more compute to re-train than they have available, which frankly I think is silly. Free-as-in-freedom software was never guaranteed to be cheap to build or run, and nobody owes you CPU time.)
  
  gattilorenz 10 days ago
  
  Of course both things are useful in practice, and unless you’re a free-as-in-freedom software purist, free-as-in-beer software is also very useful!
  But the point is exactly that the distinction matters, and conflating the terms doesn’t do either thing a favor (it also doesn’t really work well with “free software”, since the beer trope is needed to explain what you mean. “Libre” is at least non ambiguous).
  Having the training data is not useful just for retraining, but also to know what the model can reasonably answer in 0-shot, to make sure it is evaluated on things it hasn’t seen before during pretraining (e.g. winograd schemas), to estimate biases or problems in the training data… if all you have is the weights, these tasks are much harder!
  
  simion314 10 days ago
  
  >Maybe both are useful? There's "open source" in the FSF sense,
  FSF uses the term free/libre software and they do not like the vague term of open source. The problem is that in English language free also means costs zero and you can get confused with frewware.
  This is why OP is correct in pointing out wrong term usage, you should not call a GPL software Freeware or a freeware acall it "Free software", it is wrong and causes confusion,
  So for models would be great we could know if a model is actually open source, open weights with no restrictions or open weights with restrictions on top.
  
  exe34 10 days ago
  
  It's hard to read sarcasm online, but as far as I can tell, very few people realise there's a difference!
  
  stavros 10 days ago
  
  Linux is libre software. Facebook is gratis.
gattilorenz 10 days ago

Also many HN commenters don’t get the distinction between open source and open weights… not very surprising that ars technica also doesn’t
blackeyeblitzar 10 days ago

I’m shocked that Ars got this wrong and is helping Apple openwash their work. But I’m seeing this same utterly basic mistake with other tech media as well, like Venture Beat:
https://venturebeat.com/ai/apple-releases-openelm-small-open...
- gilgoomesh 10 days ago
  
  There's no open washing — Apple have released the code (both training and inference), the weights and a research paper explaining. It's about as open as you could expect from them.
  
  blackeyeblitzar 10 days ago
  
  I didn’t see training source code, but more like a promise to share insights. Am I missing something? Regardless another reason this isn’t open source is because it used a proprietary license instead of an OSI approved open source license.
- Hugsun 10 days ago
  
  I love this term, openwashing. It stumbles off the tongue.
  
  squigz 10 days ago
  
  It's invention of terms like this that make it hard for people to take some activist groups seriously...
  
  blackeyeblitzar 10 days ago
  
  Why such a negative take? Open washing is similar to green washing, a practice where companies pretend to be environmentally friendly for marketing purposes. To me the term makes immediate sense. What am I missing?
  
  mensetmanusman 10 days ago
  
  Openwashthepolice
  
  blackeyeblitzar 10 days ago
  
  Any alternative suggestions? Its similar to green washing and other such practices.

theshrike79 10 days ago

Apple is going exactly where I predicted they’d go. They’ve already had machine learning hardware on their phones for multiple generations, mostly “just” doing on-device image categorisation

Now they’re moving into more generic LLMs, most likely supercharging a fully local Siri in either this year’s or next year’s iOS release

And I'm predicting that there will be an opportunity for 3rd party developers to plug in to said LLM to provide extra data released during WWDC. So you can go "Hey Siri, what are the current news on CNN" -> CNN app provides data to Siri-language model in a standard format and it tells what's going on.

thibauts 10 days ago

The secret sauce will most likely be tight integration with Shortcuts, enabling the language interface to drive and automate existing apps.
- thethimble 10 days ago
  
  I really hope that Shortcuts gets a UX overhaul. It feels so painful to write and test new shortcuts.
  
  wseqyrku 10 days ago
  
  "play an hourly chime, oh and by the way remind me to get coffee when i'm on the way home tomorrow" no ux beats that but text/voice.
  
  theshrike79 9 days ago
  
  Shortcuts is getting to a point where I'd prefer to just write actual code instead of fighting with the stupid Scratch-style coding blocks.
  
  dlachausse 9 days ago
  
  That sounds a lot like AppleScript... https://en.wikipedia.org/wiki/AppleScript
  I wonder how hard it would be for Apple to rebuild Shortcuts around an AppleScript backend to allow power users the ability to edit the scripts directly.

buildbot 10 days ago

Huh, They used the pile - that's a pretty interesting choice for a corporate research team?

blackeyeblitzar 10 days ago

There are many variants of the pile at this point, including ones with copyrighted content removed. They are probably using one of those, like:
https://huggingface.co/datasets/monology/pile-uncopyrighted
- yunohn 10 days ago
  
  I don’t see any source for that from their paper. It just says a duplicate version of Pile with a citation linking to the original Pile paper itself.
  [15] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, An-ish Thite, Noa Nabeshima, et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
- buildbot 10 days ago
  
  Oh that’s really great to know about, thank you!

ChrisArchitect 10 days ago

[dupe]

Some more discussion: https://news.ycombinator.com/item?id=40140675

winternewt 10 days ago

I hope somebody will take the time to incorporate a model like these into keyboard typing prediction soon.

wildrhythms 10 days ago

I switched from Android to iOS and back to Android because the keyboard typing predictions on iOS were so bad in comparison. It felt like I was using a phone from 10 years ago.
- opulentegg 10 days ago
  
  Did the same switch (Android -> iOS), but found SwiftKey[0] is also available on iOS. Better (or more Android:isch?) typing experience.
  They also solve the issue for us using multiple languages on a daily basis (first and second mother tongue + English), without having to cycle through all the iOS built-in keyboards just because you want to mix in a few English words/phrases.
  [0]: https://apps.apple.com/us/app/microsoft-swiftkey-ai-keyboard...
nunez 10 days ago

I thought iOS 17 was using on-device GPT-2 for keyboard predictions.
- MBCook 10 days ago
  
  It’s using something, I don’t know what. It’s a MASSIVE improvement from the continuous slow degradation that ended in 16.x.
- Toutouxc 9 days ago
  
  Dozens of languages don’t have that. My language doesn’t even have prediction, only broken autocorrect.

frabcus 10 days ago

This sounds like it is fully reproducible: "it also included reproducible training recipes that allow the weights (neural network files) to be replicated, which is unusual for a major tech company so far"

Digging into the code, is this the best definition of the datasets used? https://github.com/apple/corenet/blob/0333b1fbb29c31809663c4...

And the paper says the instruction tuning is based on UltraFeedback, this config seems to say exactly in what form: https://github.com/apple/corenet/blob/0333b1fbb29c31809663c4...

fouc 11 days ago

"Small Language Models" for on-device use. Neat.

> The eight OpenELM models come in two flavors: four as "pretrained" (basically a raw, next-token version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is more ideal for developing AI assistants and chatbots)

mosselman 10 days ago

Does anyone know wether this will be on ollama? Is it a matter of someone caring enough to implement that or is there a technical limitation?

SushiHippie 10 days ago

https://github.com/ollama/ollama/issues/3910
Currently not supported in llama.cpp, if someone has the time to check out OpenELM it probably will be implemented.
ZiiS 10 days ago

The Safetensors instructions https://github.com/ollama/ollama/blob/main/docs/import.md should work.

resource_waste 10 days ago

I find it fascinating how late Apple is and how they always release some low quality products with a twist.

They really give me Nintendo vibes. They can't compete on hardware, software, etc... So they put their resources into weird things that let their fanatical users say:

"Apple has the best X"

Since no one actually cares to have the best X, its nearly useless, but... its great for marketing and post purchase rationalization.

Toutouxc 9 days ago

> They can't compete on hardware
Every iPhone generation is the most powerful phone on the market.
- resource_waste 7 days ago
  
  No aux port though, cheappie phone. Also its completely unusable for my purposes due to Apple's poor security record.
  I have important secrets.

mensetmanusman 10 days ago

If they don't fix siri in the next year, I really should drop iOS. It's sad.

Why cant I use my device to talk to GPT4 from the lock screen?

corv 10 days ago

You can map ChatGPT to the action button and talk to it from the lock screen.
littlecranky67 10 days ago

Because running GPT4 for hundreds of millions iOS users is not an easy task - especially if there is no subscription model behind it.
- mensetmanusman 10 days ago
  
  I subscribe to GPT-4, there is no reason I shouldn't be able to replace siri beyond Apple's fear of not making future profits.
  
  lm28469 10 days ago
  
  > there is no reason I shouldn't be able
  ios is a walled garden since day 1 it's hardly surprising and I doubt it'll change any time soon
  
  SSLy 10 days ago
  
  EU is working on it.

juujian 10 days ago

I would be curious how much less computationally expensive these models are. Full-blown LLMs are overkill for most of the things I do with them. Does running them affect battery life of mobile devices in a major way? This could actually end up saving a ton of electricity. (Or maybe induce even more demand...)

philjohn 10 days ago

It probably helps that Apple Silicon has dedicated die space to the Neural Engine - essentially a TPU. No good for training, great for inference.
- davedx 10 days ago
  
  I’ve been reading up on this recently but devs say ANE is kinda a pain in the ass to leverage; most OSS is using gpu instead
- anentropic 10 days ago
  
  these most likely aren't using the Neural Engine
  the ANE seemed to be optimised for small vision models like you might run on an iPhone a couple of years ago
  these will be running on the GPU
  
  smpanaro 10 days ago
  
  I bet these can all run on ANE. I’ve run gpt2-xl 1.5B on ANE [1] and WhisperKit [2] also runs larger models on it.
  The smaller ones (1.1B and below) will be usably fast and with quantization I suspect the 3B one will be as well. GPU will still be faster but power for speed is the trade-off currently.
  [1] 7 tokens/sec https://x.com/flat/status/1719696073751400637 [2] https://www.takeargmax.com/blog/whisperkit
  
  anentropic 10 days ago
  
  indeed, but probably not as written currently?
  i.e they would need converting with e.g. your work in more-ane-transformers
JKCalhoun 10 days ago

I wonder if there could be a tiered system where a "dumber" LLM fields your requests but passes them on to a smarter LLM only if it finds it confidence level below some threshold.

rubymamis 10 days ago

Did anyone try to test any of them?