@Shreyas094 on Hugging Face: "Help me to upgrade my model. Hi all, so I am a complete beginner in coding…"

Shreyas094

posted an update 9 days ago

Help me to upgrade my model.

Hi all, I am a complete beginner in coding. However, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer/Q&A plus a web search tool.

The application is built specifically for summarization tasks, such as summarizing a financial document, news article, resume, research document, or call transcript.

The Space can be found here: Shreyas094/SearchGPT

The news tool simply uses DuckDuckGo chat to generate the search results with the Llama 3.1 70B model.

I would like your support in fine-tuning the retrieval step so that it handles more unstructured documents.

John6666

9 days ago

I think changing the search call below would change the search results somewhat, but there don't seem to be too many options to choose from.
I can give you some advice if I know how you want to enhance it.

https://huggingface.co/spaces/Shreyas094/SearchGPT/blob/main/app.py

def get_web_search_results(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    try:
        results = list(DDGS().text(query, max_results=max_results))

https://pypi.org/project/duckduckgo-search/#2-text---text-search-by-duckduckgocom

Shreyas094

9 days ago

Hi John, thanks so much for the contribution. However, I would like to implement some upgrades to my RAG setup for the PDF summarization task. So far I have not worked a lot on the vector DB creation, chunking, indexing, and embedding parts. I feel that working on these should improve the retrieval process, especially for research documents of 100-200 pages. If possible, could you provide some suggestions on that part? Thanks
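
To make the chunking part concrete, here is a minimal sketch of overlapping word-based chunks, assuming the text has already been extracted from the PDF. The function name `chunk_text` and the default `chunk_size` and `overlap` values are illustrative assumptions and are not taken from the Space's app.py.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper for illustration only; sizes are counted in words,
    not tokens, and the defaults are arbitrary starting values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which may help retrieval on long documents. Each chunk would then be embedded and stored in the vector DB; the right sizes likely depend on the embedding model and the documents.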

nicolollo

8 days ago

Bro the (similar to Matt ) killed me XD

Shreyas094

8 days ago

Hahaha, at least someone got it
