pedalpete 14 days ago

I've found that where LLMs can be useful in this context is in free association. Because they don't really "know" about things, they regularly grasp at straws or misconstrue intended meaning. This, along with the sheer volume of language (let's not call it knowledge), results in the LLMs occasionally bringing in a new element which can be useful.

  • gotts 14 days ago

    Can you list some examples where free-associations from LLM were useful to you?

    • pedalpete 14 days ago

      A lot of where I've benefited is in marketing language. Rarely, if ever, has ChatGPT come up with something where I've thought "that's exactly what we wanted", but through iterations it's taken me down paths I might not have found myself.

      Unfortunately, ChatGPT doesn't have a good search interface, so I can't search through older chats, but I remember that when I was looking at re-naming our company, it didn't come up with our new name; it did, however, lead me down a path which led to our name.

      I was trying to understand a patent, and we were looking at the algorithm which was being used. ChatGPT misunderstood how the algorithm worked, but pointed to its knowledge of a similar algorithm which worked differently and was better suited to our purposes.

      Calling this "free-association" may be taking some liberty. Many people would consider these errors, or hallucinations, but in some ways, they do look very similar to what many would call free-association IMO.

    • PeterStuer 14 days ago

      A long, long time ago (1999, before LLMs) I made a virtual museum exhibit creator for education. The collection explorer created a connected graph where the nodes were the works of art and the edges were based on commonalities in their textual descriptions. It used very rudimentary language technology, so it 'suffered' from things like homographs. Rather than being seen as a problem, the users liked the serendipity it brought for ideation.

      I assume free but not random association could be a comparable support for ideation in research.
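
      A toy reconstruction of that edge-building idea, with invented artworks and descriptions; the homograph "bank" links two otherwise unrelated works, which is exactly the kind of accident users found serendipitous:

      ```python
      # Nodes are artworks; any word shared between two descriptions becomes an edge.
      # The titles and descriptions below are made up for illustration.
      from itertools import combinations

      works = {
          "River at Dusk": "a painting of a river bank at sunset",
          "The Moneylender": "a merchant counting coins at his bank desk",
          "Harbour Morning": "fishing boats leaving the harbour at sunrise",
      }

      stopwords = {"a", "of", "at", "the", "his"}

      def keywords(description):
          return {w for w in description.lower().split() if w not in stopwords}

      edges = {}
      for (title_a, desc_a), (title_b, desc_b) in combinations(works.items(), 2):
          shared = keywords(desc_a) & keywords(desc_b)
          if shared:
              edges[(title_a, title_b)] = shared

      print(edges)
      # {('River at Dusk', 'The Moneylender'): {'bank'}}  <- the homograph edge
      ```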

    • bongodongobob 14 days ago

      Assume free-associations = hallucinations. Assume hallucinations are exactly what makes LLMs useful and your question can be rephrased as "Can you list some examples where LLMs were useful to you?"

      • firewolf34 14 days ago

        Is not the purpose of a model to interpolate between two points? This is the underlying basis of "hallucinations" (when that works out /not/ in our favour) or "prediction" (when it does). So it's a matter of semantics and a bit of overuse of the term "hallucination". But the model would be no more useful than a search engine if it were to just regurgitate its training data verbatim.

      • ec109685 14 days ago

        Hallucinations are lies. So not the same thing.

        • Teleoflexuous 14 days ago

          For an LLM to lie it would need to know the truth. That's an incredible level of anthropomorphization.

        • malux85 14 days ago

          Hallucinations are not always lies, they are more like a transformation in the abstraction space.

          • throwup238 14 days ago

            That is some weapons grade spin :-)

        • littlestymaar 14 days ago

          Not all lies are useless; some can be insightful even when blatantly wrong in themselves (for instance, taken literally, every scientific model is a lie). I can definitely see how an LLM hallucinating can help foster creativity (the same way psychedelics can), even if everything it says is bullshit.

        • bongodongobob 14 days ago

          I'm using hallucination to mean "not exactly the thing", not outright lying. So maybe the "truth" is "My socks are wet." A hallucination could be "My socks are damp."

        • HeatrayEnjoyer 14 days ago

          Lies require intent. I can ask a model to lie and it will provide info it knows is inaccurate, and can provide the true statement if requested.

          Hallucinations are inaccuracies it doesn't realize are inaccurate.

  • robwwilliams 14 days ago

    This approach is already useful in functional genomics. A common type of question requires analysis of hundreds of potentially functional sequence variants.

    Hybrid LLM+ approaches are beginning to improve the efficiency of ranking candidates and even of proposing tests, and soon, I hope, of probing higher-order non-linear interactions among DNA variants.
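
    One hedged sketch of what that ranking step could look like (not the pipeline referred to above): blend a numeric prior from the existing analysis with a plausibility score parsed from an LLM reply. `ask_llm` is a hypothetical helper that returns the model's text.

    ```python
    # Sketch only: re-rank candidate sequence variants by combining an existing
    # statistical prior with an LLM plausibility score. `ask_llm` is a placeholder.
    def rank_candidates(variants, phenotype, ask_llm):
        scored = []
        for v in variants:
            prompt = (
                f"Variant {v['id']} in gene {v['gene']}: {v['annotation']}. "
                f"On a scale of 0-10, how plausible is a functional effect on "
                f"{phenotype}? Reply with a single number."
            )
            try:
                llm_score = float(ask_llm(prompt).strip())
            except ValueError:
                llm_score = 0.0  # unparseable reply: rely on the prior alone
            # assumes v["prior_score"] is also on a 0-10 scale
            scored.append((0.5 * v["prior_score"] + 0.5 * llm_score, v["id"]))
        return sorted(scored, reverse=True)
    ```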

  • deegles 13 days ago

    I like thinking of LLMs as "word calculators", which I think really encapsulates how they aren't as "intelligent" as the marketing would have you believe, but also shows how important the inputs are.

KhoomeiK 14 days ago

A group of PhD students at Stanford recently wanted to take AI/ML research ideas generated by LLMs like this and have teams of engineers execute on them at a hackathon. We were getting things prepared at AGI House SF to host the hackathon with them when we learned that the study did not pass ethical review.

I think automating science is an important research direction nonetheless.

  • srcreigh 14 days ago

    That’s pretty wild. What was the reason behind failing ethics review?

    • robbomacrae 14 days ago

      I'm generally a proponent of AI and LLMs, but to me the decision was the right one. You are tasking people with implementing an idea generated by an algorithmic model, with (I'm guessing) zero oversight, and the model may have had very little training that teaches it what makes an idea worth implementing. Some ideas will be more useful than others, so it won't be fair from an accomplishment or motivation point of view.

      Imagine you've already invested time going to this event and want to win the prize/credit but to do so you have to implement a plugin that makes webpages grayscale because of a random idea generator. Maybe some people would find that interesting but others would see it as wasting their time.

      • jimmySixDOF 14 days ago

        Individual ideas can be subject to ethical review board approval, and that should go for a hackathon project the same as for any study proposed in academia, a drug trial, etc. But applying some hand-wavy, lump-sum, out-of-bounds label based purely on the source seems like arbitrary, opinionated overreach.

      • golol 14 days ago

        As long as all participants are well informed, there is absolutely no ethical issue...

        • rsfern 14 days ago

          How do you make sure the participants are well informed? What if an idea suggested by a model turns out to be dangerous to implement, but nobody at the hackathon has quite the relevant experience to notice?

          • naasking 14 days ago

            Such as?

            • HeatrayEnjoyer 14 days ago

              rsfern is asking exactly that

              • naasking 14 days ago

                No, I'm asking for an example of an idea that an LLM might produce that is too dangerous to implement but nobody at the hackathon has the relevant experience to notice. You can shut down any endeavour by imagining boogeymen that aren't actually real.

                • rsfern 13 days ago

                  I don’t think anything necessarily needs to be shut down; I’m just suggesting reasons why a reasonable ethics board might be hesitant to green-light such a hackathon if it’s not clear the organizers have done their due diligence on safety.

                  I might be biased in terms of the safety profile: my background is materials and chemistry, and there are loads of ways you can get into trouble if you don’t really have experience with the materials and synthesis routes you’re working with.

                  One example I’ve heard of from my field (alloy design) is an ML model that suggested a composition high in magnesium - perfectly reasonable if you’re interested in strong, lightweight alloys, but the synthesis method was arc melting, which carries a high risk of starting a metal fire if you aren’t careful, because Mg has a high vapor pressure.

                  If you’re doing organic chemistry it’s maybe even worse, because there can be all kinds of side products or runaway exothermic reactions, and if you’re doing novel chemistry it might take deep experience in the field to know if those things are likely.

                  All these concerns are manageable, but I think an ethics review panel would want to at least see that there is a reasonable safety review process in place before letting students try out random experiments in topic areas in which the models likely haven’t been fine tuned with safety in mind.

      • taneq 14 days ago

        Surely the ideas themselves are what should be examined for ethical suitability, rather than the meta-idea of “ask an LLM for ideas”?

    • CJefferson 14 days ago

      One obvious problem is, what if the ideas were obviously unethical?

      I would personally let this pass ethics if someone read all the generated ideas, and took personal responsibility for them passing the basic ethics rules, or got them through the ethics committee if required, exactly the same as they would their own ideas.

  • brigadier132 14 days ago

    I don't think LLMs are the right approach for this. Coordinated science would basically be a search problem where we verify different facts using experiments and use what we learn to determine what experiment to do next.
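
    A rough sketch of that framing, with the "experiment" as a stand-in oracle: keep a pool of candidate hypotheses, always run the test they disagree on most, and discard the hypotheses the result contradicts.

    ```python
    # Toy "science as search" loop: hypotheses are predicates, experiments are
    # inputs, and the oracle stands in for actually running the experiment.
    def most_informative(hypotheses, experiments):
        # pick the experiment whose predicted outcomes are most evenly split
        def disagreement(x):
            preds = [h(x) for h in hypotheses]
            return min(preds.count(True), preds.count(False))
        return max(experiments, key=disagreement)

    def run_search(hypotheses, experiments, oracle):
        hypotheses, experiments = list(hypotheses), list(experiments)
        while len(hypotheses) > 1 and experiments:
            x = most_informative(hypotheses, experiments)
            experiments.remove(x)
            outcome = oracle(x)                              # run the real experiment
            hypotheses = [h for h in hypotheses if h(x) == outcome]
        return hypotheses

    # toy example: which threshold governs a boolean response?
    candidates = [lambda v, t=t: v > t for t in range(10)]
    survivors = run_search(candidates, range(10), oracle=lambda v: v > 6)
    print(len(survivors), "candidate hypothesis remains")  # the v > 6 rule survives
    ```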

    • visarga 14 days ago

      When you can run experiments quickly, it becomes feasible to use ML and evolutionary methods to make novel discoveries, like AlphaTensor finding a better matrix multiplication algorithm than Strassen's, and AlphaGo's move 37 overturning centuries of Go strategy.

      The paper "Evolution through Large Models" shows the way. Just use LLMs as genetic mutation operators. Evolutionary methods are great at search, LLMs are great at intuition but get stuck on their own, they combine well. https://arxiv.org/abs/2206.08896

      The interplay between LLMs and evolutionary algorithms, despite their differing objectives and methodologies, shares a common pursuit of applicability to complex problems. Meanwhile, EAs can provide an optimization framework for further enhancing LLMs in black-box settings, empowering LLMs with flexible global search capacities.
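
      A minimal sketch of that recipe, with `ask_llm` and `fitness` as placeholders (in the paper the individuals are programs and fitness comes from actually running them):

      ```python
      # Plain evolutionary loop in which the mutation operator is an LLM call.
      import random

      def llm_mutate(candidate, ask_llm):
          prompt = (
              "Here is a candidate solution:\n"
              f"{candidate}\n"
              "Propose one small variation that might work better. "
              "Return only the modified solution."
          )
          return ask_llm(prompt)

      def evolve(population, fitness, ask_llm, generations=20, keep=5):
          for _ in range(generations):
              population.sort(key=fitness, reverse=True)
              parents = population[:keep]                    # truncation selection
              children = [llm_mutate(random.choice(parents), ask_llm)
                          for _ in range(len(population) - keep)]
              population = parents + children
          return max(population, key=fitness)
      ```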

      Since ChatGPT was first released, hundreds of millions of people have been using it for assistance, and the model's outputs have influenced their actions, maybe even helped scientists make new discoveries. The LLM text is filtered through people and ends up as real-world consequences and discoveries that are reported in text, which gets into the next training set, closing the loop.

      Trillions of AI tokens per month play this slow feedback game. AI speeds up the circulation of useful information and ideas in human society, and AI feedback gets filtered through contact with people and the real world.

UncleOxidant 14 days ago

The ideas aren't the hard part.

  • tokai 14 days ago

    This. Any researcher should, over a lunch, be able to generate more ideas than can be tackled in a lifetime.

    • falcor84 14 days ago

      The fact that a human expert can also do it doesn't mean the AI isn't valuable. Even if you just consider the monetary aspect, those few API calls would definitely be cheaper than buying the researcher lunch. But the big benefit is being able to generate those ideas immediately and autonomously every time there's new data.

      • passwordoops 14 days ago

        I think what they are saying is that idea generation is not a pain point and not really worth solving. Taking ideas and making them happen... that's the hard part where an artificial agent could come in much more handy

    • kordlessagain 13 days ago

      The number of ideas has nothing to do with the quality of the ideas. Some ideas are gold; many aren’t.

  • fpgamlirfanboy 14 days ago

    Tell that to the PhD advisor who took credit for all my work because the ideas were his (at least so he claimed).

    • passwordoops 14 days ago

      Unfortunately, the good ones who do not steal credit are few and far between. Current incentives select for this behaviour, not just in academia, but just about everywhere.

      Go to any meeting and state the obvious fact that "any idiot can have an idea; making it happen is the tough part", then watch how the decision makers react.

SubiculumCode 14 days ago

In some fields of research, the amount of literature out there is stupendous, with little hope of a human reading, much less understanding, the whole of it. It's becoming a major problem in some fields, and I think approaches that can combine knowledge algorithmically are needed, perhaps LLMs.
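
One hedged sketch of what "combining knowledge algorithmically" could mean in practice: embed abstracts, surface the most similar pairs of papers that don't already cite each other, and ask an LLM to articulate the possible connection. `embed` and `ask_llm` are placeholder helpers here, not any particular library's API.

```python
# Sketch: propose cross-paper connections from a corpus of abstracts.
from itertools import combinations
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def candidate_links(papers, embed, ask_llm, top_k=10):
    vectors = {p["id"]: embed(p["abstract"]) for p in papers}
    pairs = sorted(
        combinations(papers, 2),
        key=lambda pair: cosine(vectors[pair[0]["id"]], vectors[pair[1]["id"]]),
        reverse=True,
    )
    links = []
    for a, b in pairs[:top_k]:
        if b["id"] in a.get("citations", []) or a["id"] in b.get("citations", []):
            continue  # they already know about each other
        question = (
            f"Paper A: {a['abstract']}\nPaper B: {b['abstract']}\n"
            "In two sentences, what connection between them might be worth exploring?"
        )
        links.append((a["id"], b["id"], ask_llm(question)))
    return links
```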

  • wizzwizz4 14 days ago

    Traditionally, that's what meta-analyses and published reviews of the literature have been for.

deegles 13 days ago

It would be fun to pair this with an automated lab that could run experiments and feed the results into generating the next set of ideas.
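
A hedged sketch of such a loop: an LLM proposes the next experiment from the accumulated log, a hypothetical automated-lab API runs it, and the result is fed back in. `ask_llm` and `lab_run` are placeholders, the reply is assumed to be well-formed JSON, and a real system would need a safety review step before anything actually runs.

```python
import json

def closed_loop(goal, ask_llm, lab_run, iterations=10):
    log = []
    for _ in range(iterations):
        prompt = (
            f"Research goal: {goal}\n"
            f"Experiments so far (JSON): {json.dumps(log)}\n"
            "Propose the single most informative next experiment as JSON "
            'with keys "protocol" and "parameters".'
        )
        experiment = json.loads(ask_llm(prompt))   # assumes a well-formed reply
        result = lab_run(experiment)               # hand off to the automated lab
        log.append({"experiment": experiment, "result": result})
    return log
```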

not-chatgpt 14 days ago

Cool idea. Never gonna work. LLMs are still generative models that spit out training data, incapable of highly abstract creative tasks like research.

I still remember all the GPT-2 based startup idea generators that spat out pseudo-feasible startup ideas.

  • bigyikes 14 days ago

    Ignoring the “spits out training data” bit which is at best misleading, it’s interesting that you use the word “abstract” here.

    I recently followed Karpathy’s GPT-from-scratch tutorial and was fascinated with how clearly you could see the models improving.

    With no training, the model spits out uniformly random text. With a bit of training, the model starts generating gibberish. With further training, the model starts recognizing simple character patterns, like putting a consonant after a vowel. Then it learns syllables, and then words, and then sentences. With enough training (and data and parameters, of course) you eventually end up with a model like GPT-4 that can write better code than many programmers.

    It’s not always that clear cut, but you can clearly observe it moving up the chain of abstraction as the training loss decreases.
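
    For anyone who hasn't run the tutorial, here's a tiny stand-in for its earliest stage: a character-bigram model rather than the full transformer, just to make the "watch the samples sharpen as the loss falls" loop concrete. It assumes PyTorch and a plain-text `input.txt` corpus.

    ```python
    # Character-bigram language model trained by gradient descent.
    import torch
    import torch.nn.functional as F

    text = open("input.txt").read()
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    itos = {i: c for c, i in stoi.items()}
    data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

    V = len(chars)
    logits = torch.zeros((V, V), requires_grad=True)  # one logit per (prev char, next char)
    opt = torch.optim.AdamW([logits], lr=0.1)

    @torch.no_grad()
    def sample(n=80):
        idx = torch.randint(V, (1,))
        out = []
        for _ in range(n):
            probs = F.softmax(logits[idx], dim=-1)
            idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
            out.append(itos[int(idx)])
        return "".join(out)

    for step in range(2001):
        i = torch.randint(len(data) - 1, (256,))   # random batch of positions
        x, y = data[i], data[i + 1]                # predict next char from previous
        loss = F.cross_entropy(logits[x], y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 500 == 0:
            # samples go from uniform noise toward plausible letter pairings
            print(f"step {step:4d}  loss {loss.item():.2f}  sample: {sample()!r}")
    ```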

    What happens when you go even bigger than GPT-4? We have every reason to believe that the models will be able to think more abstractly.

    Your “never gonna work” comment flies in the face of the exponential curve we find ourselves on.

    • ethanwillis 14 days ago

      If we keep extrapolating eventually GPT will be omniscient. I really can't think of any reason why that wouldn't be the case, given the exponential curve we find ourselves on.

      • esafak 14 days ago

        How do you know you're not on a logistic curve?

        Don't you think costs and the availability of training data might impose some constraints?

        • dragonwriter 14 days ago

          With real world phenomena that have resource constraints anywhere, a good rule of thumb is: if it looks like an exponential curve, walks like an exponential curve, and quacks like an exponential curve, it’s definitely a logistic curve
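
          To make the rule of thumb concrete (with L the ceiling, k the growth rate, and t_0 the midpoint): while a logistic curve is still far from its ceiling, it is exponential to first order,

          \frac{L}{1 + e^{-k(t - t_0)}} \approx L\, e^{k(t - t_0)} \quad \text{while } e^{-k(t - t_0)} \gg 1,

          so early data alone can't tell the two apart; the ceiling only shows up once the resource constraints start to bind.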

        • HeatrayEnjoyer 14 days ago

          The entire universe is training data.

          • esafak 13 days ago

            It is, but we -- humans, and computers -- are limited in our ability to learn from it. We both learn more easily from structured data, like textbooks.

      • fire_lake 14 days ago

        This has the form of a religious belief.

        • mistermann 14 days ago

          And also non-religious belief...paradoxical!

      • inference-lord 14 days ago

        I think they're being facetious?

        • ethanwillis 14 days ago

          I am. And I think it says a lot about the state of things that many people think I'm being completely serious.

  • ramraj07 14 days ago

    I have asked ChatGPT to generate hypotheses on my PhD topic, which I know every single piece of existing literature about, and it actually threw out some very interesting ideas that don't exist out there yet (this was before they lobotomized it).

    • ta988 14 days ago

      Did you try with the API directly? I've had great results with my own prompts, much less so with the chatgpt one.

    • voxl 14 days ago

      > (this was before they lobotomized it)

      Of course, of course. Because god forbid anyone be able to reproduce your suggestion. Funnily enough I tried the same and have the exact opposite experience.

  • growthwtf 14 days ago

    I think that ship has sailed, if you believe the paper (which I do).

    LLMs are already super-human at some highly abstract creative tasks, including research.

    There are numerous examples of LLMs solving problems that couldn't be found in the training data. They can also be improved by using reasoning methods like truth tables or causal language. See Orca from Microsoft for example.

  • CuriouslyC 14 days ago

    They don't just spit out training data; they generalize from it. They can look at an existing situation and suggest lines of experimentation or analysis that might lead to interesting results, based on similar contexts in other sciences or previous research. They're undertrained on bleeding-edge science, so they're going to falter there, but they can apply methodology just fine.

  • llm_trw 14 days ago

    They just need to be better at it than humans, which is a rather low bar when you go beyond two unrelated fields.

  • krageon 14 days ago

    When you're this confident and making blanket statements that are this unilateral, that should tell you you need to take a step back and question yourself.