Shreyas094 
posted an update 9 days ago
Help me to upgrade my model.

Hi all, I am a complete beginner in coding. However, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer/Q&A plus a web search tool.

The application is built specifically for summarization tasks, including summarizing financial documents, news articles, resumes, research documents, call transcripts, etc.

The Space can be found here: Shreyas094/SearchGPT

The news tool simply uses DuckDuckGo chat to generate the search results with the Llama 3.1 70B model.

I would like your support to fine-tune the retrieval task so it handles more unstructured documents.

I think changing the search call quoted below (from app.py) would change the search results somewhat, but there don't seem to be too many options to choose from.
I can give more specific advice if you tell me how you want to enhance it.

https://huggingface.co/spaces/Shreyas094/SearchGPT/blob/main/app.py

def get_web_search_results(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    try:
        results = list(DDGS().text(query, max_results=max_results))

https://pypi.org/project/duckduckgo-search/#2-text---text-search-by-duckduckgocom
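
For reference, here is a rough sketch of the options that text search call accepts besides max_results. It assumes the `region`, `safesearch`, and `timelimit` keyword arguments described on the PyPI page above; the helper names `build_search_kwargs` and `search_web` are made up for illustration and are not part of app.py.

```python
from typing import Dict, List, Optional


def build_search_kwargs(
    max_results: int = 10,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timelimit: Optional[str] = None,
) -> Dict[str, object]:
    """Collect the tunable text-search options in one place (hypothetical helper)."""
    if safesearch not in {"on", "moderate", "off"}:
        raise ValueError(f"unsupported safesearch value: {safesearch!r}")
    # Assumed timelimit codes: d = day, w = week, m = month, y = year.
    if timelimit is not None and timelimit not in {"d", "w", "m", "y"}:
        raise ValueError(f"unsupported timelimit value: {timelimit!r}")
    return {
        "max_results": max_results,
        "region": region,
        "safesearch": safesearch,
        "timelimit": timelimit,
    }


def search_web(query: str, **options) -> List[Dict[str, str]]:
    """Run a text search with the options above (hypothetical wrapper)."""
    # Imported lazily so build_search_kwargs works without the package installed.
    from duckduckgo_search import DDGS

    return list(DDGS().text(query, **build_search_kwargs(**options)))
```

For a news-style tool, something like `search_web("fed rate decision", timelimit="w", max_results=5)` would restrict results to roughly the past week, if the installed version supports that argument.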

·

Hi John, thanks so much for the contribution. However, I would like to implement some upgrades to my RAG setup for the PDF summarization task. So far I have not worked a lot on the vector DB creation, chunking, indexing, and embeddings parts. I feel that working on these functions should improve the retrieval process, especially for research documents of 100-200 pages. If possible, could you provide some suggestions on that part? Thanks
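
To make the chunking and retrieval question concrete, here is a minimal sketch of one common starting point: overlapping word-based chunks plus a similarity ranking. It is not taken from app.py. The function names, the chunk size, and the overlap are all assumptions, and the bag-of-words cosine score is only a stand-in for a real embedding model and vector DB.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. For 100-200 page documents, the chunk size and overlap are the first knobs worth experimenting with before changing the embedding model.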

Bro, the "(similar to Matt)" killed me XD

·

Hahaha, at least someone got it