by irsagent on 4/7/24, 12:41 AM with 42 comments
by reacharavindh on 4/7/24, 8:40 AM
But the holy grail is an LLM that can work over a large corpus of documents and data — Slack history, huge wiki installations — and answer useful questions with proper references.
I tried a few, but they don't really hit the mark. We need the usability of a simple search-engine UI over private data sources.
by NKosmatos on 4/7/24, 8:32 AM
by logro on 4/7/24, 12:12 PM
by MasterYoda on 4/7/24, 4:41 PM
My questions are:
1 - Even if there is so much data that I can no longer find stuff myself, how much text data is needed for an LLM to work OK? I'm not after an AI that can answer general questions, only one that can answer things I already know exist in the data.
2 - I understand that the more structured the data is, the better, but how important is structure when training an LLM? Does it mostly just figure things out anyway?
3 - Any recommendations on where to start: how to run an LLM locally and train it on your own data?
by gavmor on 4/7/24, 6:33 PM
`text_splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=4000)`
Does this simple numeric chunking approach actually work? Or are more sophisticated splitting rules going to make a difference?
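For readers unfamiliar with what those two parameters do, here is a minimal pure-Python sketch of fixed-size character splitting with overlap. Note this is a simplification: LangChain's `RecursiveCharacterTextSplitter` additionally tries to break on separators (paragraphs, sentences, words) before falling back to raw character windows.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Naive fixed-window splitter: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# Toy document: 20,000 characters of varying content.
doc = "".join(str(i % 10) for i in range(20_000))
chunks = split_with_overlap(doc, chunk_size=8000, chunk_overlap=4000)

# With chunk_overlap = chunk_size / 2, every character (except the edges)
# appears in two chunks, so a fact straddling a boundary is still intact
# in at least one chunk — at the cost of roughly doubling storage.
```

That overlap-equals-half-the-chunk setting is why the index ends up storing each passage twice; smarter separator-aware splitting mainly changes *where* the cuts land, not that trade-off.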
`vector_store_ppt=FAISS.from_documents(text_chunks_ppt, embeddings)`
So we're embedding all 8000 chars behind a single vector index. I wonder if certain documents perform better at this fidelity than others. To say nothing of missed "prompt expansion" opportunities.
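What `FAISS.from_documents` is doing under the hood — embed each chunk once, then answer queries by nearest-neighbour search over those vectors — can be sketched in plain Python with brute-force cosine similarity. The chunk names and the three-dimensional "embeddings" below are made up for illustration; real embeddings come from a model and have hundreds of dimensions, and FAISS exists precisely to make this search fast at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical chunk embeddings (a real store maps chunk text -> model embedding).
store = {
    "slide deck on Q3 sales":      [0.9, 0.1, 0.0],
    "wiki page on deploy process": [0.1, 0.8, 0.2],
    "slack thread about oncall":   [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the query "what were Q3 sales?".
query_vec = [0.85, 0.15, 0.05]

# Retrieval = pick the stored chunk whose vector is most similar to the query's.
best = max(store, key=lambda k: cosine(store[k], query_vec))
```

The commenter's point lands here: one 8000-character chunk gets exactly one vector, so everything in it — tables, asides, boilerplate — is averaged into a single point, and some document types survive that compression better than others.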
by eole666 on 4/7/24, 12:10 PM
- how much RAM is needed
- what CPU do you need for decent performance
- can it run on a GPU? If so, how much VRAM do you need, and does it work only on Nvidia?
by turnsout on 4/7/24, 12:48 PM
by PhilippGille on 4/7/24, 9:04 AM
by mdrzn on 4/9/24, 11:24 AM
by pentagrama on 4/7/24, 3:46 PM
I have:
Processor: Ryzen 5 3600
Video card: GeForce GTX 1660 Ti, 6 GB GDDR6 (Zotac)
RAM: 16 GB DDR4 2666 MHz
Any recommendations?
by bee_rider on 4/7/24, 3:04 PM