elasticsearch cluster - yaokun123/php-wiki GitHub Wiki

Elasticsearch clusters

1. Cluster

What is a cluster

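A cluster is a set of nodes that share the same cluster.name. Together the nodes hold all of your data and provide indexing and search across the whole set. Each cluster is identified by a unique name (set with cluster.name), which defaults to "elasticsearch". The name matters because a node can only join the cluster whose name matches its own cluster.name setting.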

Cluster health

Cluster health has one of three states: green, yellow, or red.

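|Status|Meaning|
|---|---|
|green|All primary shards and all replica shards are available|
|yellow|All primary shards are available, but not all replica shards are|
|red|Not all primary shards are available|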
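
The three states can be read as a simple decision rule. The sketch below is illustrative only: the `cluster_health` function and its `(kind, available)` input format are assumptions made for this example, not an Elasticsearch API.

```python
def cluster_health(shards):
    """Derive a health colour from (kind, available) pairs.

    kind is "primary" or "replica"; available is True when that shard
    copy is usable. Both names are hypothetical, chosen for this sketch.
    """
    primaries_ok = all(ok for kind, ok in shards if kind == "primary")
    replicas_ok = all(ok for kind, ok in shards if kind == "replica")
    if not primaries_ok:
        return "red"      # at least one primary shard is unavailable
    if not replicas_ok:
        return "yellow"   # primaries are fine, some replica is not
    return "green"        # every primary and replica is available


print(cluster_health([("primary", True), ("replica", True)]))   # green
print(cluster_health([("primary", True), ("replica", False)]))  # yellow
print(cluster_health([("primary", False), ("replica", True)]))  # red
```

On a real cluster the same information is reported by the cluster health endpoint rather than computed by hand.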

2. Node

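A node is a running instance of Elasticsearch. A cluster consists of one or more nodes (servers), one of which is the master node. The master is chosen by election, and the distinction between master and non-master nodes only matters inside the cluster. The master node manages cluster-level changes, such as creating or deleting an index and adding or removing nodes.
The master node does not need to take part in document-level changes or searches, so it does not become a bottleneck as traffic grows. Any master-eligible node can become the master.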

3. Shards

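es can split a complete index into several shards. The benefit is that one large index is broken into pieces that are spread over different nodes, which is what makes the search distributed. Shards are either primary shards or replica shards. The number of primary shards is set when the index is created and cannot be changed afterwards; the number of replica shards can be changed later.
Because the primary shard count is fixed at index creation, it effectively determines the maximum amount of data the index can hold (the actual figure depends on your data and your hardware).
If you do not specify the number of shards, es uses the default (5 in the versions this page covers). Older releases let you change that default with index.number_of_shards: 5 in config/elasticsearch.yml; from Elasticsearch 5.0 on, index-level settings are rejected in that file and are set per index or through an index template instead.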
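
One way to see why the primary shard count is fixed: Elasticsearch chooses the shard for a document with, in essence, `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The sketch below illustrates that formula. `zlib.crc32` is only a stand-in for the real internal hash function, and the document IDs are made up.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number.

    zlib.crc32 stands in for Elasticsearch's own hash (an assumption
    of this sketch); only the modulo structure is the point here.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = ["user-1", "user-2", "user-3", "user-4"]  # hypothetical IDs
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

# Documents whose shard number changes when the divisor changes could no
# longer be found where they were originally written.
moved = [doc_id for doc_id in doc_ids if with_5[doc_id] != with_6[doc_id]]
print(with_5, with_6, moved)
```

Changing the divisor changes where existing documents are expected to live, which is why the count cannot simply be edited on a live index.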

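Note: more shards is not always better. When a query arrives, es runs it on every shard of the index and then merges the per-shard results. That merge step consumes resources, so choosing a sensible number of primary shards is important.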
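
The merge step mentioned in the note can be sketched as follows. This is a simplified model, not Elasticsearch's implementation: each shard is assumed to return its own hits already sorted by descending score, and the `(score, doc_id)` tuples are invented for the example.

```python
import heapq
from itertools import islice


def merge_shard_hits(shard_hits, size):
    """Merge per-shard result lists into one global top-`size` list.

    Every inner list must already be sorted by descending score.
    """
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


shard_hits = [
    [(0.9, "a"), (0.4, "b")],   # hits from shard 0
    [(0.8, "c"), (0.1, "d")],   # hits from shard 1
    [(0.7, "e")],               # hits from shard 2
]
print(merge_shard_hits(shard_hits, 3))  # [(0.9, 'a'), (0.8, 'c'), (0.7, 'e')]
```

The node doing the merge has to consider up to `size` candidates from every shard, so the work grows with the number of shards even though only `size` hits are returned.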

4. Replica shards (replicas)

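es can keep several replicas (copies) of an index's shards. Replicas serve three purposes:
First, fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica.
Second, query throughput: es automatically load-balances search requests across the copies of a shard. The number of replicas can also be changed at any time after the index is created.
Third, horizontal scaling: replicas are what give additional nodes something to hold, so when an index has few replicas, simply adding more nodes achieves very little.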
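
The third point comes down to arithmetic: an index with P primary shards and R replicas per primary has P × (1 + R) shard copies, and a node can only help with that index if it holds at least one copy. A minimal sketch, with function names invented for this example:

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Primary shards plus all of their replica copies."""
    return primaries * (1 + replicas)


def nodes_without_data(primaries: int, replicas: int, nodes: int) -> int:
    """Nodes that cannot receive any shard copy of this index."""
    return max(0, nodes - total_shard_copies(primaries, replicas))


print(total_shard_copies(3, 0))      # 3
print(nodes_without_data(3, 0, 5))   # 2 of 5 nodes hold nothing
print(total_shard_copies(3, 1))      # 6
print(nodes_without_data(3, 1, 5))   # 0, every node can hold a copy
```

With 3 primaries and no replicas, the fourth and fifth node hold no data for the index; raising the replica count to 1 gives every one of the 5 nodes something to serve.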

To change the number of replicas dynamically after the index has been created:
PUT /<index-name>/_settings
{
   "number_of_replicas" : 2
}
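
The same request can be prepared from a script. The sketch below only builds the HTTP request with the standard library and does not send it. The base URL `http://localhost:9200` and the index name `my_index` are placeholders, not values from this page.

```python
import json
import urllib.request


def replicas_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) the PUT /<index>/_settings request."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name, assumed for illustration.
req = replicas_request("http://localhost:9200", "my_index", 2)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running node; that step is left out so the example works without a cluster.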

5. Cluster configuration

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what you are trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
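# Name of the es cluster; the default is elasticsearch. Nodes that find each other only form a cluster when their
# cluster.name matches, so if several clusters run on the same network segment this setting keeps them apart.
cluster.name: yaok    # cluster name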
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master    # node name
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data    # where index data is stored
#
# Path to log files:
#
#path.logs: /path/to/logs    # where log files are stored
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
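# Set to true to lock the process memory. es slows down badly once the JVM starts swapping, so make sure it never
# swaps: set the minimum and maximum heap to the same value (-Xms and -Xmx in jvm.options; older releases used the
# ES_MIN_MEM and ES_MAX_MEM environment variables) and leave the machine enough memory for es. The elasticsearch
# process must also be allowed to lock memory; on Linux, run `ulimit -l unlimited`.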
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1    # IP address of this node
#
# Set a custom port for HTTP:
#
#http.port: 9200    # port on which the HTTP service is exposed
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# IP addresses of the cluster's nodes. Hostnames such as els or els.shuaiguoxia.com also work, provided every node can resolve them.
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
# To avoid split brain, set this to a majority of the master-eligible nodes: (master-eligible nodes / 2) + 1.
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
http.cors.enabled: true
http.cors.allow-origin: "*"
  
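# Whether this node is eligible to be elected master (default: true). By default the first node started in the cluster becomes master; if that node goes down, a new master is elected.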
node.master: true


# Whether this node stores index data (default: true).
node.data: true

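# Default number of primary shards per index (5 by default).
# Since Elasticsearch 5.0, index-level settings are rejected in elasticsearch.yml and the node refuses to start
# with them, so the line is left commented out; set the value per index or in an index template instead.
#index.number_of_shards: 5


# Default number of replicas per index (1 by default). The same restriction applies.
#index.number_of_replicas: 1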

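The above is the configuration of the master node. For the other nodes, keep cluster.name identical to the master's and give each node a different node.name so they can be told apart.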
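
As a small illustration of that rule, the sketch below renders the lines that differ or must match between nodes. The `node_config` helper, the second node name `node-2`, and the host addresses are all made up for this example; only the setting names come from the file above.

```python
def node_config(cluster_name: str, node_name: str, hosts: list) -> str:
    """Render the per-node part of elasticsearch.yml as text."""
    quoted_hosts = ", ".join(f'"{host}"' for host in hosts)
    lines = [
        f"cluster.name: {cluster_name}",   # identical on every node
        f"node.name: {node_name}",         # unique per node
        f"discovery.zen.ping.unicast.hosts: [{quoted_hosts}]",
    ]
    return "\n".join(lines) + "\n"


hosts = ["192.168.0.1", "192.168.0.2"]     # hypothetical addresses
for name in ["master", "node-2"]:          # "node-2" is a made-up name
    print(node_config("yaok", name, hosts))
```

Every generated file carries the same cluster.name and host list, and only node.name changes from one node to the next.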

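elasticsearch nodes discover each other automatically, but it can take a little while before a newly started node appears in the cluster. In the configuration above, discovery is unicast: a node contacts the hosts listed in discovery.zen.ping.unicast.hosts rather than broadcasting on the network.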

For the full set of options, see the detailed configuration reference.
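
The split-brain rule from the Discovery part of the file, (master-eligible nodes / 2) + 1, uses integer division. A minimal sketch of the calculation, with a function name chosen for this example:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


for count in (1, 2, 3, 5):
    print(count, "->", minimum_master_nodes(count))
# 1 -> 1
# 2 -> 2
# 3 -> 2
# 5 -> 3
```

Because two disjoint groups of nodes cannot both contain a majority, at most one side of a network partition can elect a master. Note that with 2 master-eligible nodes the value is 2, so losing either node leaves the cluster without a master.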

6. JVM configuration

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

# minimum (-Xms) and maximum (-Xmx) JVM heap size; keep both the same
-Xms1g
-Xmx1g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9, Elasticsearch needs to set the provider to COMPAT; otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
9-:-Djava.locale.providers=COMPAT
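
The `8:` and `9-:` prefixes in the GC logging lines restrict an option to certain Java versions: `8:` applies to Java 8 only, `9-:` to Java 9 and later, and a form like `8-9:` to a closed range. The sketch below illustrates that convention. It is not Elasticsearch's own parser, and the `options_for` function and its regular expression are written for this example.

```python
import re

# Optional "<start>", "<start>-" or "<start>-<end>" prefix, then the option.
LINE = re.compile(
    r"^(?:(?P<start>\d+)(?P<range>-)?(?P<end>\d+)?:)?(?P<option>-.*)$"
)


def options_for(lines, java_major):
    """Return the options that apply to the given Java major version."""
    selected = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no option
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {raw!r}")
        start, dash, end = match.group("start", "range", "end")
        if start is None:
            applies = True                          # no prefix: always
        elif dash is None:
            applies = java_major == int(start)      # "8:"
        elif end is None:
            applies = java_major >= int(start)      # "9-:"
        else:
            applies = int(start) <= java_major <= int(end)  # "8-9:"
        if applies:
            selected.append(match.group("option"))
    return selected


sample = [
    "# comment",
    "-Xms1g",
    "8:-XX:+PrintGCDetails",
    "9-:-Djava.locale.providers=COMPAT",
]
print(options_for(sample, 8))   # ['-Xms1g', '-XX:+PrintGCDetails']
print(options_for(sample, 11))  # ['-Xms1g', '-Djava.locale.providers=COMPAT']
```

Under this reading, the whole text of a non-comment line is treated as the option, which is why a comment belongs on its own line rather than after a flag.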