Home Artists Posts Import Register

Downloads

Content

NOTES: if you get libmagic not found error then look into pip installing python-magic by reading this doc entirely:

made with langchain version 0.0.79

https://pypi.org/project/python-magic/

pip install -r requirements.txt first

then pip install python-magic separately

then pip install python-magic-bin

Start over by making a new environment with python version 3.11 and pip install everything one by one leaving unstructured to the end.

make sure to make a "data" folder in your working directory

Unfortunately I couldn't get the streamlit app to deploy to Streamlit cloud because the machine on the cloud is unable to find a version for python-magic-bin pip package. I have tried all python versions there. If anyone of you can figure this one out, please let me know! Thank you

files are for video: https://youtu.be/I3McnSQ1YnQ

Chat-Your-Data Challenge blogpost: https://blog.langchain.dev/chat-your-data-challenge/

Tutorial: ChatGPT Over Your Data blogpost: https://blog.langchain.dev/tutorial-chatgpt-over-your-data/

Chat your data github: https://github.com/hwchase17/chat-your-data

Comments

Teejay

Great videos and resources. I’ve tried the solutions presented and still can’t get the code to run. Do you know of any of any way to work around this. I new to coding

echohive42

Code here works, but problems arise when pip installing the required packages which makes it run. For example, this has persistent problems with apple m1 laptops for example. There is no one solution to all issues. Best approach would be to learn how to create custom development environments by using "venv" or "miniconda" which I am using and start over again with a new environment if pip installs fail. try different orders of pip installing things and different versions of python as well. I found out that creating a new environment with python version 3.11 and leaving the unstructured as the last pip install solved my issues. I hope this helps!

Kris Wilkinson

I'm still attempting to get this running, if I find a suitable sequence of installing dependencies I will update in here 💪.

Mark

The best way to make this work is through Docker. You can build your environment in an isolated container, install all the dependencies there, and turn it into an API. That way, your machine won't need anything installed, not even python, and it will still run the code.

echohive42

Langchain also added more info about installing unstructured. This may be helpful as well: https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/unstructured_file.html#unstructured-file-loader

Vinay Kumar Sankarapu

Where the ver andsion of the code shown in the video tutorial? The github link is for gradio and for singl file right?

Yuval Gal

Does anyone manage to run this code and can share his steps?

echohive42

I don't understand what you mean by version. But this was using langchain version 0.0.79. the github link is a starter code to get started in a chat format using cli or gradio. here is a template for streamlit as well: https://github.com/hwchase17/langchain-streamlit-template

echohive42

part 1 post for this project as a comment from a member with detailed instructions on how they got it to work by Keone. Take a look at that please as it might be helpful. Also I have made a new video on how to install Unstructured. Latest post here with a video link deals with how to install unstructured which is essential to langchain document loaders. watch that video too as it will be helpful.

echohive42

If you are new to coding I would recommend rewatching this video and the part 1 video for this along side all the instructions and comments on Patreon. Also watch how to install unstructured video and be very careful with each step. If you run into persistent errors, create a new environment and start again from scratch.

Yuval Gal

Where are the files that contains the API call to wiki API? and also all the other cool features that you show on the video?

Yuval Gal

Ohh sorry my bad, thank you!

Yuval Gal

Is this code supposed to be able to ingest audio files?

echohive42

No, As far as I know Langchain document loaders doesn't have audio to text transcription. But you can now use Whisper API from OpenAI to be able to do that. https://platform.openai.com/docs/guides/speech-to-text

chan chan

I would like to ask if it's possible for GPT to reply to a letter based on my own database content and the response direction I provide?

echohive42

It is possible if you make it do that for you. The entire documentation and the videos is so that people can build the kind of stuff they imagine. There is no one size fits all.

Vijay Betigiri

This code does not specify OpenAI model. I have changed the code to use GPT-3.5 in get_chain function. The code works fine. But it is not using SYSTEM role offered by GPT 3.5 model. How to use it instead of providing text in template variable? I suppose it will make the responses more accurate and save on the cost too.

echohive42

it is probably buried in the prompt which is somewhere in the directory structure of the pip package which is downloaded. I believe you can try and create a custom prompt for it https://langchain.readthedocs.io/en/latest/modules/prompts/examples/custom_prompt_template.html Also OpenAI says that it is better to pass crucial information in theuser message. So you can pass it in with the first user message and let it remain there for the rest of the conversation and check for it as the conversation updates and when it eventually drops from the chat history you can re insert it in. This would be a workaround and might work better.

Vijay Betigiri

Is it feasible to develop a knowledge-bot capable of processing a substantial volume of confidential documents, while ensuring that the model neither copies nor utilizes the data? Additionally, the data should remain within the customer data-center without any transmission outside of the said location.

echohive42

I don’t understand what you mean by model neither copies or utilizes. When you make a call to an llm like gpt then your data has to be transmitted to their server. But you can keep the vectorstore on premise that shouldn’t be a problem.

Vijay Betigiri

To clarify, the possible solution can be deploying the model on our own server or sending encrypted knowledge data to the model for confidential document processing. The model would not have access to the plaintext data and would only receive and process the encrypted data. The decrypted data would only be made available on the user's device for enhanced security.

Chuck Williams

what changes are needed in the code to upload PDF, CVS, Excel files or read from a SQL DB as raw datasets to be vectorized?

echohive42

Take a look at the different document loaders Langchain has. This will give you ideas: https://python.langchain.com/en/latest/modules/indexes/document_loaders.html

Jiaping Zhang

I got the error 'ValueError: Json schema does not match the Unstructured schema' when uploading the `space.json` to the streamlit app.

echohive42

Not sure why that happens. You are probably using a more recent version of unstructured. Try changing the file extension to txt. It is just a text file anyways.

francesco rig

I have the same error as @Jiaping Zhang ValueError: Json schema does not match the Unstructured schema even if I copied the content in a txt file it seems strange

echohive42

Maybe try with a dummy JSON object such as this: { 'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve', 'object': 'chat.completion', 'created': 1677649420, 'model': 'gpt-3.5-turbo', 'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87}, 'choices': [ { 'message': { 'role': 'assistant', 'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'}, 'finish_reason': 'stop', 'index': 0 } ] } and see if it persists. Also check unstructured GitHub to see if other people have run into the same issue.

echohive42

Also take a look at the more recent videos dealing with chatting over documents. They might. Be more helpful as they are more up to date. You can search videos at www.echohive.live

Nicolas Blum Ferracci

Hi there, not working for me when i drop the json, i get this error : "File "C:\Users\nblum\.pyenv\pyenv-win\versions\3.11.0b4\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script exec(code, module.__dict__) File "C:\Users\nblum\NLP_programs\Langchain_with_document_and_wikipedia\main.py", line 58, in embed_doc() File "C:\Users\nblum\NLP_programs\Langchain_with_document_and_wikipedia\ingest_data.py", line 15, in embed_doc raw_documents = loader.load() ^^^^^^^^^^^^^ File "C:\Users\nblum\.pyenv\pyenv-win\versions\3.11.0b4\Lib\site-packages\langchain\document_loaders\directory.py", line 24, in load sub_docs = UnstructuredFileLoader(str(i)).load() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\nblum\.pyenv\pyenv-win\versions\3.11.0b4\Lib\site-packages\langchain\document_loaders\unstructured.py", line 26, in load elements = partition(filename=self.file_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\nblum\.pyenv\pyenv-win\versions\3.11.0b4\Lib\site-packages\unstructured\partition\auto.py", line 276, in partition raise ValueError("" If you could indicate me what i'm doing wrong ? thanks in advance ;=)

echohive42

Ok, I figured it out. I uploaded a new requirements fie which fixes the issue. Install pip install -r requirements.txt and then separately pip install first python-magic then python-magic-bin and it should work.