TL;DR
若只想看結論,請直接跳到 總結 。請注意本文所述內容只確保版本 0.3.2 可以用,沒有測試過其他版本。
簡介
GraphRAG 是微軟釋出的一種新的 RAG (Retrieval-Augmented Generation) 技術,對比於一般的 RAG 作法是把文本切塊然後向量化存在 vector store,查詢的時候把使用者的輸入也向量化然後再去比較相似度,GraphRAG 則是從文本之中提取出資訊並建成一個 Knowledge Graph,優點是可以更好的理解不同文本之間複雜的關聯性。
直至撰文當下,官方最新的版本為 0.3.2,這是 GitHub Repository 的對應版本 0.3.2 的連結 ,本文所述內容或許其他版本也可以用,但不保證。
如果你看了官方文件的教學就會發現,0.3.2 的官方設定只支援 OpenAI 和 Azure OpenAI,你需要在 config file 裡面放進你的 API Key,這樣雖然可以用,也最簡單,但是 GraphRAG 在 Indexing 的時候需要頻繁呼叫 LLM,所以這樣會很貴,而且沒辦法用自己 fine tune 的模型。
那有沒有辦法用跑在本地的開源模型例如 Llama 3.1 呢?結論是有,但是你需要先踩一堆坑,因為官方的實作現在並不支援本地的模型,需要自己魔改一些東西。
你可能會問說為什麼不用 LangChain 裡面的 LLMGraphTransformer
呢?我只能說畢竟現在它還是 experimental 的功能,跟微軟官方的實作還是有點差距。有興趣的話可以看下面這幾個連結:
- https://python.langchain.com/v0.2/docs/how_to/graph_constructing/
- https://python.langchain.com/v0.2/docs/tutorials/graph/
先備條件
- 安裝好 Ollama 然後把它跑起來。
- 一個 LLM 模型,例如
ollama pull llama3.1:8b
。 - 一個 Text Embedding 模型,例如
ollama pull nomic-embed-text
。
踩坑過程
當我一開始打開 Get Started 頁面,滿心歡喜的發現看起來非常的短,應該很快就可以弄好。現在來看那時候的我真是太天真了…
首先這幾步都沒有問題:
pip install graphrag
mkdir -p ./ragtest/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt
python -m graphrag.index --init --root ./ragtest
接下來要編輯 settings.yaml
,第一個問題來了,model 要怎麼填呢?
首先把 .env
刪了,因為我們並不需要 API key。
然後打開 settings.yaml
,寫入以下內容:
幾個跟原本的 settings.yaml
不同的地方:
llm
api_key
: 隨便亂設,不重要,因為我們不會呼叫 API,而是查詢本地的模型,這裡我設成NONE
。model
: 設成你的本地 LLM 模型,例如llama3.1:8b
。max_tokens
: 這個數字要設成小於你的 LLM 模型可接受的最大 Token 數量。api_base
: 設成http://localhost:11434/v1
。
embeddings
llm
api_key
: 隨便亂設,不重要,因為我們不會呼叫 API,而是查詢本地的模型,這裡我設成NONE
。
model
: 設成你的本地 Text Embedding 模型,例如nomic-embed-text
。api_base
: 設成http://localhost:11434/api
。batch_max_tokens
: 這個數字要設成小於你的 Text Embedding 模型可接受的最大 Token 數量。
chunks
size
:下調成 300,沒有調的話會噴以下 error,相關 issue: https://github.com/microsoft/graphrag/issues/36209:44:07,997 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key Traceback (most recent call last): File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb result = node.verb.func(**verb_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph output_df[[level_to, to]] = pd.DataFrame( ~~~~~~~~~^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__ self._setitem_array(key, value) File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array check_key_length(self.columns, key, value) File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key") ValueError: Columns must be same length as key 16:25:46,215 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None 16:25:46,216 graphrag.index.run ERROR error running workflow create_base_entity_graph Traceback (most recent call last): File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/run.py", line 325, in run_pipeline result = await workflow.run(context, callbacks) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run timing = await self._execute_verb(node, context, callbacks) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb result = node.verb.func(**verb_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph output_df[[level_to, to]] = pd.DataFrame( ~~~~~~~~~^^^^^^^^^^^^^^^^ File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__ self._setitem_array(key, value) File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array check_key_length(self.columns, key, value) File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key") ValueError: Columns must be same length as key
改完設定後,開心的執行 python -m graphrag.index --init --root ./ragtest
,就會發現噴了以下的 error:
09:19:54,508 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
return await _text_embed_in_memory(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
result = await strategy_exec(texts, callbacks, cache, strategy_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 62, in run
embeddings = await _execute(llm, text_batches, ticker, semaphore)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 106, in _execute
results = await asyncio.gather(*futures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 100, in embed
chunk_embeddings = await llm(chunk)
^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/caching_llm.py", line 96, in __call__
result = await self._delegate(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 49, in __call__
return await self._invoke(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 53, in _invoke
output = await self._execute_llm(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py", line 36, in _execute_llm
embedding = await self.client.embeddings.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/resources/embeddings.py", line 237, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1816, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1510, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1613, in _request
return await self._process_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1710, in _process_response
return await api_response.parse()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_response.py", line 420, in parse
parsed = self._options.post_parser(parsed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/resources/embeddings.py", line 225, in parser
for embedding in obj.data:
TypeError: 'NoneType' object is not iterable
09:19:54,533 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable details=None
09:19:54,541 graphrag.index.run ERROR error running workflow create_final_entities
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/run.py", line 325, in run_pipeline
result = await workflow.run(context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
result = await result
^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
return await _text_embed_in_memory(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
result = await strategy_exec(texts, callbacks, cache, strategy_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 62, in run
embeddings = await _execute(llm, text_batches, ticker, semaphore)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 106, in _execute
results = await asyncio.gather(*futures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/index/verbs/text/embed/strategies/openai.py", line 100, in embed
chunk_embeddings = await llm(chunk)
^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/caching_llm.py", line 96, in __call__
result = await self._delegate(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/tenacity/__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 49, in __call__
return await self._invoke(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/base/base_llm.py", line 53, in _invoke
output = await self._execute_llm(input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py", line 36, in _execute_llm
embedding = await self.client.embeddings.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/resources/embeddings.py", line 237, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1816, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1510, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1613, in _request
return await self._process_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_base_client.py", line 1710, in _process_response
return await api_response.parse()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/_response.py", line 420, in parse
parsed = self._options.post_parser(parsed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/openai/resources/embeddings.py", line 225, in parser
for embedding in obj.data:
TypeError: 'NoneType' object is not iterable
解法參考自 https://github.com/microsoft/graphrag/issues/370#issuecomment-2211821370
留言說要把某個檔案的內容換掉,但我覺得這樣不太好,因為這樣有一天你想用套件原本的功能反而可能會出問題,所以我用外部 monkey patch 的方式,首先創一個檔案叫做 monkey_patch.py
,寫入以下內容:
|
|
記得執行 pip install ollama
。
我們下載 graphrag.index
這個 CLI 的內容,然後 monkey patch 它。
先執行
wget https://raw.githubusercontent.com/microsoft/graphrag/v0.3.2/graphrag/index/__main__.py -O index.py
然後把第 8 行的 from .cli import index_cli
換成下面的內容:
|
|
接著執行 python index.py --root ./ragtest
,終於沒有 error 了!
試著執行 Global Query
python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"
發生這個 error
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
not expected dict type. type=<class 'str'>:
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
not expected dict type. type=<class 'str'>:
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
not expected dict type. type=<class 'str'>:
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
not expected dict type. type=<class 'str'>:
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
not expected dict type. type=<class 'str'>:
Traceback (most recent call last):
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/llm/openai/utils.py", line 130, in try_parse_json_object
result = json.loads(input)
^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
試著執行 Local Query
python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"
發生這個 error
Error embedding chunk {'OpenAIEmbedding': "'NoneType' object is not iterable"}
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/__main__.py", line 92, in <module>
run_local_search(
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/cli.py", line 164, in run_local_search
response, context_data = asyncio.run(
^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/api.py", line 222, in local_search
result: SearchResult = await search_engine.asearch(query=query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/structured_search/local_search/search.py", line 67, in asearch
context_text, context_records = self.context_builder.build_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/structured_search/local_search/mixed_context.py", line 139, in build_context
selected_entities = map_query_to_entities(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/context_builder/entity_extraction.py", line 55, in map_query_to_entities
search_results = text_embedding_vectorstore.similarity_search_by_text(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/vector_stores/lancedb.py", line 118, in similarity_search_by_text
query_embedding = text_embedder(text)
^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/context_builder/entity_extraction.py", line 57, in <lambda>
text_embedder=lambda t: text_embedder.embed(t),
^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/llm/oai/embedding.py", line 96, in embed
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/numpy/lib/function_base.py", line 550, in average
raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
所以只好再來 monkey patch query CLI 了,相關的 issue 留言:
- https://github.com/microsoft/graphrag/issues/345#issuecomment-2230317752
- https://github.com/microsoft/graphrag/issues/575#issuecomment-2252045859
在 monkey_patch.py
裡面新增兩個 Function 如下:
|
|
執行
wget https://raw.githubusercontent.com/microsoft/graphrag/v0.3.2/graphrag/query/__main__.py -O query.py
然後把第 9 行的 from .cli import run_global_search, run_local_search
換成以下內容
|
|
然後執行 Global 和 Local Search,Local Search 看起來還不錯,Global Search 看起來不是很好,可能參數還需要再調整一下,不過至少沒有 error 了!
$ python query.py --root ./ragtest --method global "What are the top themes in this story?"
INFO: Reading settings from ragtest/settings.yaml
/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/indexer_adapters.py:71: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
entity_df["community"] = entity_df["community"].fillna(-1)
/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/indexer_adapters.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
entity_df["community"] = entity_df["community"].astype(int)
creating llm client with {'api_key': 'REDACTED,len=4', 'type': "openai_chat", 'model': 'llama3.1:8b', 'max_tokens': 8191, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Global Search Response:
You didn't provide a story. Please share the text, and I'll be happy to help you identify the top themes.
$ python query.py --root ./ragtest --method local "Who is Scrooge, and what are his main relationships?"
INFO: Reading settings from ragtest/settings.yaml
INFO: Vector Store Args: {}
/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/indexer_adapters.py:71: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
entity_df["community"] = entity_df["community"].fillna(-1)
/home/chishengliu/miniforge3/envs/graphrag/lib/python3.11/site-packages/graphrag/query/indexer_adapters.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
entity_df["community"] = entity_df["community"].astype(int)
creating llm client with {'api_key': 'REDACTED,len=4', 'type': "openai_chat", 'model': 'llama3.1:8b', 'max_tokens': 8191, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=4', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/api', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Local Search Response:
A classic character!
Ebenezer Scrooge is the main protagonist of Charles Dickens' novella "A Christmas Carol". He is a miserly and bitter old man who lives in London during the Victorian era. Scrooge is known for his extreme love of money, his disdain for charity and kindness, and his general grumpiness.
Scrooge's main relationships are:
1. **Jacob Marley**: The ghost of his deceased business partner, who appears to Scrooge on Christmas Eve to warn him about the consequences of his selfish ways.
2. **Bob Cratchit**: His underpaid and overworked clerk, who is struggling to provide for his large family during the holiday season. Scrooge's treatment of Bob is particularly cruel, as he pays him a meager salary and expects him to work long hours without complaint.
3. **Tiny Tim**: The youngest son of Bob Cratchit, who suffers from illness and poverty. Scrooge's heartlessness towards Tiny Tim serves as a catalyst for his transformation in the story.
4. **His nephew, Fred**: A kind and generous young man who invites Scrooge to join him for Christmas dinner, but is rebuffed by his miserly uncle.
5. **The Ghosts of Christmas Past, Present, and Yet to Come**: Three supernatural visitations that haunt Scrooge on Christmas Eve, forcing him to confront the consequences of his actions and the possibility of a better future.
Through these relationships, Dickens explores themes of redemption, kindness, and the importance of human connection in "A Christmas Carol".
總結
本文章的程式碼都放在 這個 Gist 裡面,總結一下完整的步驟為:
- 照著官方的 Get Started 跑完
python -m graphrag.index --init --root ./ragtest
- 把
settings.yaml
換成 Gist 裡面的settings.yaml
- 執行
wget https://raw.githubusercontent.com/microsoft/graphrag/v0.3.2/graphrag/index/__main__.py -O index.py
- 執行
wget https://raw.githubusercontent.com/microsoft/graphrag/v0.3.2/graphrag/query/__main__.py -O query.py
- 下載 Gist 裡面的
graphrag_monkey_patch.py
- 把
index.py
的第 8 行換成1 2 3
from graphrag.index.cli import index_cli from graphrag_monkey_patch import patch_graphrag patch_graphrag()
- 把
query.py
的第 9 行換成1 2 3
from graphrag.query.cli import run_global_search, run_local_search from graphrag_monkey_patch import patch_graphrag patch_graphrag()
- 之後遇到要使用
python -m graphrag.index
的時候都換成用python index.py
,遇到要使用python -m graphrag.query
的時候都換成用python query.py
即可。