@@ -132,6 +132,24 @@
"device = 'cuda' if torch.cuda.is_available() else 'cpu'"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Thanks to Wanghaha (xufengnian-bei) for this contribution. If you run into network problems while downloading the dataset, follow the steps below.\n",
+ "\n",
+ "* Visit the Hugging Face dataset page: https://huggingface.co/datasets/code_search_net\n",
+ "* On that page, find the \"Files and versions\" section.\n",
+ "* Open the data folder and download the corresponding python.zip.\n",
+ "\n",
+ "After extracting the archive, change the dataset-loading code to read the local files:\n",
+ "\n",
+ "datasets = load_dataset('json', data_files='data/python/python/final/jsonl/train/*.jsonl.gz') # replace with your own directory\n",
+ "datasets = datasets['train'].filter(lambda x: 'apache/spark' in x['repo']) # the field is named repo here, not repository_name\n",
+ "\n",
+ "print(datasets[8]['original_string']) # use original_string in place of whole_func_string"
+ ]
+ },
{
"cell_type": "code",
"execution_count": 4,