10. ElasticSearch vs RDBMS
1.1. ElasticSearch and how it works
Relational Database | ElasticSearch
Database            | Index
Table               | Type
Row                 | Document
Column              | Field
Index               | Analyze
Primary key         | _id
Schema              | Mapping
Physical partition  | Shard
Logical partition   | Route
Relational          | Parent/Child, Nested
SQL                 | Query DSL
11. ElasticSearch shard replication
PUT /my_index/_settings { "number_of_replicas": 1 }
PUT /my_index/_settings { "number_of_replicas": 2 }
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/replica-shards
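To make the effect of number_of_replicas concrete, here is a minimal Java sketch of the arithmetic: every primary shard gets that many copies, so the index holds primaries × (1 + replicas) shard copies in total. The class name, method names, and the primary count of 3 are assumptions for illustration and do not come from the slides.

```java
// Sketch: how number_of_replicas changes the total number of shard copies.
class ReplicaMath {
    // Total shard copies = primaries * (1 primary + replicas per primary).
    static int totalShardCopies(int primaryShards, int replicasPerPrimary) {
        return primaryShards * (1 + replicasPerPrimary);
    }

    // JSON body for the index settings request on this slide.
    static String replicaSettingsBody(int replicas) {
        return "{ \"number_of_replicas\": " + replicas + " }";
    }

    public static void main(String[] args) {
        int primaries = 3; // assumed value, not taken from the slides
        for (int replicas = 1; replicas <= 2; replicas++) {
            System.out.println(replicaSettingsBody(replicas) + " -> "
                    + totalShardCopies(primaries, replicas) + " shard copies");
        }
    }
}
```

Under these assumptions, going from one replica to two raises the total from 6 to 9 shard copies, which is extra read capacity only if there are enough nodes to host the additional copies.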
12. Creating, indexing and deleting a document
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html
13. Retrieve, query and fetch a document
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-read.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_query_phase.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_fetch_phase.html
14. Installation
Download
Extract the archive
1.2. Installing and running
Running
Run
Test
Create index
Add document
Get document
Search document
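The four test steps can be sketched as plain REST calls. The Java snippet below only builds the equivalent curl command lines and does not contact a server. It assumes a node at localhost:9200 and reuses the blog/post index and type that appear later in the deck; the document ID, the title field, and the class name are hypothetical. The typed URL layout (index/type/id) matches ElasticSearch versions of this deck's era and may differ in later versions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the four smoke-test requests expressed as curl command lines.
class SmokeTestRequests {
    static final String BASE = "http://localhost:9200"; // assumed local node

    // Formats one request; body may be null for requests without a payload.
    static String curl(String method, String path, String body) {
        String cmd = "curl -X" + method + " '" + BASE + path + "'";
        return body == null ? cmd : cmd + " -d '" + body + "'";
    }

    static List<String> requests() {
        List<String> out = new ArrayList<String>();
        out.add(curl("PUT", "/blog", null));                            // Create index
        out.add(curl("PUT", "/blog/post/1", "{\"title\":\"hello\"}")); // Add document
        out.add(curl("GET", "/blog/post/1", null));                     // Get document
        out.add(curl("GET", "/blog/_search?q=title:hello", null));      // Search document
        return out;
    }

    public static void main(String[] args) {
        for (String request : requests()) {
            System.out.println(request);
        }
    }
}
```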
16. Modeling example
1.3. Modeling
[Figure: indices Indice1, Indice2, Indice3 and IndiceA, IndiceB, IndiceC; labels Type, Parent Type, Child Type; three 1 : N relationships]
20. Hardware perspective
Network bandwidth?
Disk I/O?
RAM?
CPU cores?
2.1. Factors that affect performance
Document perspective
Document size?
Total index data size?
Data size increase?
Store period?
Service perspective
Analyzer?
Analyze fields?
Indexed field size?
Boosting?
Realtime or batch?
Queries?
21. From the ElasticSearch site:
If 1 shard is too few and 1,000 shards are too many, how do I know how many shards I need?
This is a question that is impossible to answer in the general case. There are just too many variables: the hardware that you use, the size and complexity of your documents, how you index and analyze those documents, the types of queries that you run, the aggregations that you perform, how you model your data, etc., etc.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
22. From the ElasticSearch site:
Fortunately, it is an easy question to answer in the specific case: yours.
1. Create a cluster consisting of a single server, with the hardware that you are considering using in production.
2. Create an index with the same settings and analyzers that you plan to use in production, but with only one primary shard and no replicas.
3. Fill it with real documents (or as close to real as you can get).
4. Run real queries and aggregations (or as close to real as you can get).
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
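Step 2 above comes down to two index settings at creation time. The sketch below builds the request body for such a test index. number_of_shards and number_of_replicas are the standard setting names; the class name, the method name, and the capacity_test index name in the comment are hypothetical, and the production analyzers the quote mentions are left out.

```java
// Sketch: create-index body for a single-shard, no-replica capacity test.
class CapacityTestIndex {
    static String createIndexBody(int primaryShards, int replicas) {
        return "{\"settings\":{\"number_of_shards\":" + primaryShards
                + ",\"number_of_replicas\":" + replicas + "}}";
    }

    public static void main(String[] args) {
        // Would be sent as the body of: PUT /capacity_test (hypothetical index name)
        System.out.println(createIndexBody(1, 0));
    }
}
```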
23. Operating system perspective
Increase the file descriptor limit
Avoid swap
2.2. Configuration tuning
Search engine perspective
Avoid swap
Thread pool
Segment merge
Index buffer size
Storage device
Use recent version
24. Cluster restart perspective
Optimize (max segments: 5)
Close index
Restart after setting "disable_allocation: true"
Increase recovery limits
25. Modeling
Disable the "_all" field
Disable the "_source" field where possible
Choose an appropriate value for the "_id" field
Set "store" to false on fields where possible
2.3. Indexing optimization
30. Shards
Increase the number of shards to distribute data.
Increase the number of replica shards.
2.4. Query optimization
Data distribution
Use routing
Check _id
ShardId = hash(_id) % number_of_primary_shards
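The routing formula can be sketched in a few lines of Java. String.hashCode() is used here purely as a stand-in; it is not the hash function ElasticSearch uses internally, so the shard numbers it produces will not match a real cluster. The class and method names are hypothetical.

```java
// Sketch: ShardId = hash(_id) % number_of_primary_shards, with a stand-in hash.
class ShardRouting {
    // Stand-in hash (String.hashCode), NOT ElasticSearch's internal hash function.
    static int shardId(String routingValue, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        String id = "a";
        System.out.println("5 primary shards -> shard " + shardId(id, 5));
        System.out.println("6 primary shards -> shard " + shardId(id, 6));
    }
}
```

The same _id lands on a different shard once the primary shard count changes, which is why the number of primary shards is fixed when the index is created, and why skewed _id or routing values lead to uneven data distribution.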
31. Query
Make sure queries do not always hit the same node.
Reduce queries that return zero hits.
Cache query results.
Avoid deep pagination.
Sorting: number_of_shards × (from + size)
In scripts, use doc['field'] instead of _source or _field.
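The sorting cost in the list above, number_of_shards × (from + size), explains the advice to avoid deep pagination: each shard returns its top (from + size) entries, and the coordinating node has to sort all of them. The Java sketch below only does that arithmetic; the shard count and page positions are assumed values, and the class and method names are hypothetical.

```java
// Sketch: entries the coordinating node must sort for one page of results.
class DeepPagination {
    static long entriesToSort(int numberOfShards, int from, int size) {
        // Each shard contributes its top (from + size) entries.
        return (long) numberOfShards * ((long) from + size);
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count
        System.out.println("first page:       " + entriesToSort(shards, 0, 10));
        System.out.println("page at from=10k: " + entriesToSort(shards, 10000, 10));
    }
}
```

With 5 shards, the first page of 10 results means sorting 50 entries, while the page starting at offset 10,000 means sorting 50,050 entries to return the same 10 hits.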
Search type
Query and fetch
Query then fetch
Count
Scan
34. Using ElasticSearch with Hadoop
A tool for big data analysis
A repository for Snapshot & Restore
Tooling provided by the ElasticSearch Hadoop plugin
3.1. Hadoop integration
35. Indexing
ElasticSearch Hadoop plugin
Read raw data
Integrate natively
Bulk indexing
Java client application
BulkRequestBuilder
REST API
Control concurrency request
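For the REST API route, bulk indexing means posting a newline-delimited body to the _bulk endpoint: one action line followed by one source line per document, each terminated by a newline. The Java sketch below builds such a body and does not send it. The blog/post index and type are taken from the next slide; the class name, method name, and sample documents are hypothetical, and the "_type" key in the action line reflects the typed ElasticSearch versions of this deck's era.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: newline-delimited body for a _bulk request using "index" actions.
class BulkBody {
    static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: where the following document should be indexed.
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            // Source line: the document itself, followed by the required newline.
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("{\"title\":\"a\"}", "{\"title\":\"b\"}");
        System.out.print(build("blog", "post", docs));
    }
}
```

Controlling request concurrency then amounts to limiting how many such bodies are in flight at once, so that the cluster's bulk thread pool is not overwhelmed.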
36. Indexing
ElasticSearch Hadoop plugin
MapReduce
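import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.mr.EsOutputFormat;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

Configuration conf = new Configuration();
// ... (omitted) ...
// The es-hadoop keys live in ConfigurationOptions, not in Hadoop's Configuration class.
conf.set(ConfigurationOptions.ES_NODES, "localhost:9200");
conf.set(ConfigurationOptions.ES_RESOURCE, "blog/post"); // target index/type
// ... (omitted) ...
Job job = new Job(conf);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(EsOutputFormat.class);       // write straight to ElasticSearch
job.setMapOutputValueClass(LinkedMapWritable.class);
job.setMapperClass(TabMapper.class);
job.setNumReduceTasks(0);                             // map-only job
File fl = new File("blog/post.txt");
long splitSize = fl.length() / 3;                     // aim for roughly three input splits
TextInputFormat.setMaxInputSplitSize(job, splitSize);
TextInputFormat.setMinInputSplitSize(job, 50);
boolean result = job.waitForCompletion(true);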
37. Indexing
Java client application
MapReduce
public static void main(String[] args) throws Exception {
    // ... (omitted) ...
    settings = Connector.buildSettings(esCluster);
    client = Connector.buildClient(settings, esNodes.split(","));
    runBeforeConfig(esIndice);
    Job job = new Job(conf);
    // ... (omitted) ...
    for (String distJar : esDistributedCacheJars) {
        DistributedCache.addFileToClassPath(
            new Path(esDistributedCachePath + "/" + distJar),
            job.getConfiguration());
    }
    // ... (omitted) ...
    if ("true".equalsIgnoreCase(esOptimize)) {
        runOptimize(esIndice);
    } else {
        runRefreshAndFlush(esIndice);
    }
    runAfterConfig(esIndice, replica);
}
44. ElasticSearch SQL syntax
Create database/table
Drop database/table
Select/Insert/Upsert/Delete
Use database
Show databases/tables
Desc table
3.2. SQL on ElasticSearch