In this tutorial, we will create an automatic SparkNotes bot based on Lewis Carroll's Alice in Wonderland. We will then ask it questions specifically about the book, which it will answer using only the information from the text.

Setup

First, let's get all our imports set up and set an environment variable to contain our OpenAI key.

    import os
    from langchain.llms import OpenAI
    from _answering import load_qa_chain
    from langchain.document_loaders import UnstructuredPDFLoader
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    os.environ = ""

Loading Data

Now, we will use LangChain's UnstructuredPDFLoader to preprocess and load our PDF into text. LangChain has a number of loaders that turn rich files like PPT and Word into usable text. (Note: the loading takes a while, and you may need to pip install a few libraries if you get any errors.)

    loader = UnstructuredPDFLoader("./Alice_in_Wonderland.pdf")
    documents = loader.load()
    print(f'Loaded texts')

“Read”

This step is when our bot ingests and indexes the data. We vectorize each page using embeddings from OpenAI. Vectorizing a text transforms it from a string like “Hi how are you” to an array of numbers. We want the text as a number array because this makes it much easier to test for similarity between our query text and the text in our pages.
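To make the idea of comparing number arrays concrete, here is a minimal sketch of one common similarity measure, cosine similarity, written in plain Python. The vectors, the page names, and the helper functions `cosine_similarity` and `most_similar_page` are made up for illustration; real OpenAI embeddings have many more dimensions, and the vector store used in this tutorial may compute similarity with a different metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar_page(query, pages):
    """Return the name of the page whose vector is closest to the query."""
    return max(pages, key=lambda name: cosine_similarity(query, pages[name]))


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
query_vec = [0.9, 0.1, 0.0]
page_vecs = {
    "page_1": [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    "page_2": [0.0, 0.3, 0.9],  # points in a very different direction
}

best_page = most_similar_page(query_vec, page_vecs)
print(best_page)
```

A score close to 1 means two vectors point in almost the same direction, which is read as the two texts being similar. A score close to 0 means they are largely unrelated. Once every page is stored as a vector, finding the passage most relevant to a question reduces to this kind of numeric comparison.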