
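Synonym Search in Elasticsearch

Searching with synonyms in Elasticsearch

Upload the synonym file to the server

The file goes under the config directory of the Elasticsearch installation: create an analysis folder there, then create a synonyms.txt file inside it.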

[root@localhost config]# pwd
/usr/local/elasticsearch/config
[root@localhost config]# ll
total 16
drwxr-xr-x. 2 root    root      26 Aug 21 04:40 analysis
-rw-r--r--. 1 elastic elastic    5 Aug 22 00:00 elasticsearch.pid
-rw-rw----. 1 elastic elastic 2941 Aug 21 04:36 elasticsearch.yml
-rw-rw----. 1 elastic elastic 2896 Mar 27  2017 jvm.options
-rw-rw----. 1 elastic elastic 3992 Feb 24  2017 log4j2.properties
drwxr-xr-x. 2 elastic elastic    6 Mar 27  2017 scripts
[root@localhost config]# cd analysis/
[root@localhost analysis]# ll
total 4
-rw-r--r--. 1 root root 322 Aug 22 00:00 synonyms.txt
[root@localhost analysis]# 

The file contents are as follows:

裙子,裙
西红柿,番茄
china,中国,中华人民共和国
男生,男士,man
女生,女士,women
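
Each line in this file is a comma-separated group of terms that are treated as equivalent, so any term in a group can stand in for the others. As a rough illustration only, the grouping can be modeled in Python as follows. This is a simplified model and not how Elasticsearch implements the synonym filter; `parse_synonym_groups` and `expand` are hypothetical helper names, and explicit `=>` mappings are not handled.

```python
# The synonym file from this post, inlined so the sketch is self-contained.
SYNONYMS_TXT = """\
裙子,裙
西红柿,番茄
china,中国,中华人民共和国
男生,男士,man
女生,女士,women
"""


def parse_synonym_groups(text):
    """Map every term to the full group of terms listed on its line."""
    lookup = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        group = [term.strip() for term in line.split(",") if term.strip()]
        for term in group:
            lookup[term] = group
    return lookup


def expand(term, lookup):
    """Return the synonym group for a term, or the term alone if it has none."""
    return lookup.get(term, [term])


lookup = parse_synonym_groups(SYNONYMS_TXT)
print(expand("番茄", lookup))  # ['西红柿', '番茄']
```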

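Create the index

Send the following settings and mappings as the body of the index-creation request (later requests in this post use the index name b2b2c_goods):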

{
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 1,
		"analysis": {
			"filter": {
				"word_sync": {
					"type": "synonym",
					"synonyms_path": "analysis/synonyms.txt"
				}
			},
			"analyzer": {
				"ik_sync_smart": {
					"filter": [
						"word_sync"
					],
					"type": "custom",
					"tokenizer": "ik_smart"
				}
			}
		}
	},
	"mappings": {
		"goods": {
			"_all": {
				"enabled": false
			},
			"properties": {
				"goodsName": {
					"type": "text",
					"analyzer": "ik_sync_smart",
					"search_analyzer": "ik_sync_smart"
				},
				"goodsContent": {
					"type": "text",
					"analyzer": "ik_sync_smart",
					"search_analyzer": "ik_sync_smart"
				}
			}
		}
	}
}
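
The post does not show the request that carries this body. A minimal sketch of it, assuming the index is named `b2b2c_goods` (the name used by the later requests) and that the body is sent with an HTTP PUT to the host used elsewhere in this post. `text_field`, `build_index_body`, and `build_create_index_request` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://192.168.118.132:9200"  # host used elsewhere in this post
INDEX = "b2b2c_goods"                      # assumed: name used by later requests


def text_field(analyzer):
    """A text field that uses the same analyzer at index and search time."""
    return {"type": "text", "analyzer": analyzer, "search_analyzer": analyzer}


def build_index_body(analyzer="ik_sync_smart", synonym_filter="word_sync"):
    """Rebuild the settings and mappings shown above as a Python dict."""
    return {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "analysis": {
                "filter": {
                    synonym_filter: {
                        "type": "synonym",
                        "synonyms_path": "analysis/synonyms.txt",
                    }
                },
                "analyzer": {
                    analyzer: {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": [synonym_filter],
                    }
                },
            },
        },
        "mappings": {
            "goods": {
                "_all": {"enabled": False},
                "properties": {
                    "goodsName": text_field(analyzer),
                    "goodsContent": text_field(analyzer),
                },
            }
        },
    }


def build_create_index_request(base_url=BASE_URL, index=INDEX):
    """Prepare (but do not send) the PUT request that creates the index."""
    data = json.dumps(build_index_body()).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request()
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that only succeeds against a reachable cluster that has the IK analysis plugin installed and the synonym file in place.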

Result:

{
    "acknowledged": true,
    "shards_acknowledged": true
}

Verify that the synonyms are configured correctly

Open the following URL in a browser:

http://192.168.118.132:9200/b2b2c_goods/_analyze?analyzer=ik_sync_smart&text=西红柿

The response is:

{
    "tokens": [
        {
            "token": "西红柿",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "番茄",
            "start_offset": 0,
            "end_offset": 3,
            "type": "SYNONYM",
            "position": 0
        }
    ]
}
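
The URL above passes `analyzer` and `text` as query-string parameters, which matches the Elasticsearch version used in this post. Newer versions expect those parameters in a JSON request body instead. The sketch below, with hypothetical helper names `build_analyze_request` and `tokens_by_type`, prepares that form of the request and groups the returned tokens by type, using a trimmed copy of the response shown above as sample input.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Prepare (but do not send) an _analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def tokens_by_type(response):
    """Group the token strings of an _analyze response by token type."""
    grouped = {}
    for tok in response["tokens"]:
        grouped.setdefault(tok["type"], []).append(tok["token"])
    return grouped


# The response shown above, trimmed to the fields used here.
sample = {
    "tokens": [
        {"token": "西红柿", "type": "CN_WORD", "position": 0},
        {"token": "番茄", "type": "SYNONYM", "position": 0},
    ]
}

analyze_request = build_analyze_request(
    "http://192.168.118.132:9200", "b2b2c_goods", "ik_sync_smart", "西红柿"
)
print(tokens_by_type(sample))  # {'CN_WORD': ['西红柿'], 'SYNONYM': ['番茄']}
```

A SYNONYM token at the same position as the original word indicates that the filter is active: both terms occupy position 0, so either one can match at that point in the text.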

Insert data

Document 1:

http://192.168.118.132:9200/b2b2c_goods/goods/1
{
	"goodsName": "西红柿",
	"goodsContent": "新疆的西红柿"
}

Document 2:

http://192.168.118.132:9200/b2b2c_goods/goods/2
{
	"goodsName": "男生",
	"goodsContent": "广州的男生"
}

For this example, only these two documents are inserted.
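
The two documents above are indexed one request at a time. As a sketch only, the same documents could be sent in a single call through the bulk API (`POST /_bulk`), whose body is newline-delimited JSON: an action line followed by the document source, ending with a trailing newline. `build_bulk_body` is a hypothetical helper name, and the `_type` entry assumes the typed mapping used in this post; newer Elasticsearch versions have removed mapping types.

```python
import json

# The two documents from this post, keyed by document id.
DOCS = {
    "1": {"goodsName": "西红柿", "goodsContent": "新疆的西红柿"},
    "2": {"goodsName": "男生", "goodsContent": "广州的男生"},
}


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited JSON body for the bulk API."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: which index, type, and id the next line belongs to.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("b2b2c_goods", "goods", DOCS)
print(body)
```

The body would be sent with the `Content-Type: application/x-ndjson` header.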

Run synonym queries

Search 1:

http://192.168.118.132:9200/b2b2c_goods/goods/_search
{
  "query": {
    "match": {
      "goodsContent": "男士"
    }
  }
}

Response:

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.41258186,
        "hits": [
            {
                "_index": "b2b2c_goods",
                "_type": "goods",
                "_id": "2",
                "_score": 0.41258186,
                "_source": {
                    "goodsName": "男生",
                    "goodsContent": "广州的男生"
                }
            }
        ]
    }
}

Search 2:

http://192.168.118.132:9200/b2b2c_goods/goods/_search
{
  "query": {
    "match": {
      "goodsContent": "番茄"
    }
  }
}

Response:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.41258186,
        "hits": [
            {
                "_index": "b2b2c_goods",
                "_type": "goods",
                "_id": "1",
                "_score": 0.41258186,
                "_source": {
                    "goodsName": "西红柿",
                    "goodsContent": "新疆的西红柿"
                }
            }
        ]
    }
}
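
Both searches succeed even though the query word never appears in the stored text, because the same synonym-aware analyzer runs at index time and at search time. The toy model below illustrates that reasoning with a plain inverted index. It is a sketch under stated assumptions, not Elasticsearch's implementation: the token lists are written by hand as a stand-in for `ik_smart` output, scoring is ignored, and `expand`, `build_index`, and `search` are hypothetical names.

```python
# Two of the synonym groups from synonyms.txt.
SYNONYM_GROUPS = [["西红柿", "番茄"], ["男生", "男士", "man"]]

# Hand-written tokens (assumption): real tokens would come from ik_smart.
DOCS = {
    "1": ["新疆", "的", "西红柿"],
    "2": ["广州", "的", "男生"],
}


def expand(tokens):
    """Add every synonym of every token, as the synonym filter would."""
    expanded = set()
    for token in tokens:
        expanded.add(token)
        for group in SYNONYM_GROUPS:
            if token in group:
                expanded.update(group)
    return expanded


def build_index(docs):
    """Build term -> document ids, expanding synonyms at index time."""
    index = {}
    for doc_id, tokens in docs.items():
        for term in expand(tokens):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query_tokens):
    """Return ids of documents that contain any expanded query term."""
    hits = set()
    for term in expand(query_tokens):
        hits |= index.get(term, set())
    return sorted(hits)


index = build_index(DOCS)
print(search(index, ["男士"]))  # ['2']
print(search(index, ["番茄"]))  # ['1']
```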


18-08-23 04:59 1992
1992
18-08-23 08:27

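As I understand it, suggest is about completion: when you type "iPhone" into a search box, the dropdown may offer "iPhone 7", "iPhone 8", and so on. Synonyms are two words that mean roughly the same thing, or that name the same item and differ only in what people are used to calling it. For example, suppose a product title is "好看的西红柿" (nice-looking tomatoes). Some shoppers habitually say "番茄" rather than "西红柿", so when one of them searches for "番茄", the "好看的西红柿" listing should still be returned.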

Ahian
18-08-23 07:12

Hmm, what is the difference between this and suggest? I haven't looked into ES much.
