Elasticsearch
Elasticsearch is based on Apache Lucene and tries to provide a simple RESTful interface over Lucene's complexity. Besides that, it is also a
--key-value document store where every field is indexed and searchable
--distributed search engine with real-time analytics
(Elasticsearch is a distributed document store. It can store and retrieve complex data structures—serialized as JSON documents—in real time. In other words, as soon as a document has been stored in Elasticsearch, it can be retrieved from any node in the cluster.)
--a document in Elasticsearch is data serialized as JSON; Elasticsearch converts data to JSON before storing it.
It uses structured JSON documents.
(JSON is a way of representing objects in human-readable text. It has become the de facto standard for exchanging data in the NoSQL world. When an object has been serialized into JSON, it is known as a JSON document.)
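For example, a person object serialized as a JSON document might look like this (hypothetical values; the field names mirror my later experiments):
{
  "spriden_id" : "A00000001",
  "spriden_last_name" : "Doe",
  "spriden_first_name" : "Jane",
  "spriden_mi" : "Q"
}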
Installation
1. Download 1.5.2 and extract it. I was impressed that it is only about 30 MB.
2. Start it with ./elasticsearch -XX:-UseSuperWord, because my JDK is 1.7 and I do not want to break the Hadoop environment that is built on this Java installation.
3. curl -XGET 'http://localhost:9200'
I got this response. I've been listening to See You Again a lot these days, so the randomly assigned node name "Wiz Kid" made an impression.
{
"status" : 200,
"name" : "Wiz Kid",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.5.2",
"build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
"build_timestamp" : "2015-04-27T09:21:06Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
4. Shut it down with
Ctrl+C
or
curl -XPOST 'http://localhost:9200/_shutdown'
Installing Marvel
Marvel
Elasticsearch's monitoring tool; free for development.
./bin/plugin -i elasticsearch/marvel/latest
Marvel Dashboard
http://localhost:9200/_plugin/marvel/
Marvel Sense: an interactive tool for communicating with Elasticsearch
http://localhost:9200/_plugin/marvel/sense/
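In Sense, requests are typed without the curl wrapper; for example, a count of all documents looks like this:
GET /_count
{
  "query": { "match_all": {} }
}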
Talking to ElasticSearch
There are basically two ways.
1. Java API using the native Elasticsearch transport protocol via port 9300.
1.1 using a non-data node
1.2 using a transport client
2. RESTful API with JSON via port 9200. This is plain HTTP, so you can use curl to talk to it.
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
verb: GET, POST, PUT, HEAD, or DELETE
(
PUT: store this object at this URL
POST: store this object under this URL (a concrete PUT vs POST pair is shown after the example below)
)
protocol: http or https
port: default elastic search's port is 9200
QUERY_STRING: optional parameters
body: A JSON-encoded request body (if the request needs one.)
For example, the following command displays the response in a pretty format for easy reading and includes the HTTP response headers by passing -i to curl.
curl -i -XGET 'localhost:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}'
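To make the PUT vs POST distinction concrete, here is a hypothetical pair (index, type, and id are made up): PUT stores the document at the exact URL I choose, while POST lets Elasticsearch generate the id.
# id chosen by me
curl -XPUT 'http://localhost:9200/personindex/person/A00000001' -d '{"spriden_last_name":"Doe"}'
# id generated by Elasticsearch
curl -XPOST 'http://localhost:9200/personindex/person/' -d '{"spriden_last_name":"Doe"}'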
Try it out
Basic but very useful information is here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html
This is put here to help it stick in memory.
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
Had a taste of creating an index and searching it, from simple to complex queries.
Learnt how its clustering works.
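One quick call related to clustering (assuming the default single-node setup): the cluster health API reports the status (green/yellow/red) plus node and shard counts.
curl -XGET 'http://localhost:9200/_cluster/health?pretty'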
My experiments
1. Retrieving documents from Oracle
set termout off
set feedback off
set header off
set pagesize 0
set newpage none
spool spriden_es.shl
select 'curl -XPUT ''http://localhost:9200/personindex/person/'||spriden_id||''' -d ''
{
"spriden_id" : "'||SPRIDEN_ID||'",'||'
"spriden_last_name": "'||replace(SPRIDEN_LAST_NAME,'''','')||'",'||'
"spriden_first_name" : "'||replace(SPRIDEN_FIRST_NAME,'''','')||'",'||'
"spriden_mi" :"'||SPRIDEN_MI||'"'||'
}
'''
from spriden
where spriden_change_ind is null;
spool off
exit
2. chmod 755 spriden_es.shl and execute it
this will load half a million documents into Elasticsearch under an index called personindex with type person.
This is very brute-force and might crash Elasticsearch. I will find a more efficient way of loading data after learning more about it.
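(One candidate for that more efficient way is probably the bulk API, which takes many documents in one request. A minimal sketch with made-up ids; note the body is newline-delimited JSON and must end with a newline:)
curl -XPOST 'http://localhost:9200/personindex/person/_bulk' -d '
{"index":{"_id":"A00000001"}}
{"spriden_id":"A00000001","spriden_last_name":"Doe","spriden_first_name":"Jane","spriden_mi":"Q"}
{"index":{"_id":"A00000002"}}
{"spriden_id":"A00000002","spriden_last_name":"Roe","spriden_first_name":"John","spriden_mi":"R"}
'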
3. A closer-to-real-world query
##At first I did not understand why this worked only on spriden_id, but not when the field was changed to last name or first name.
##It turns out the term query matches only the exact lower-cased tokens: the standard analyzer lower-cases and splits the field values at index time, while a term query does not analyze its input, so upper case and whitespace have to be treated specially (see the _analyze check after this query).
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
"query":{
"term": {
"spriden_id": "mylangid"
}
}
}
'
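A quick way to see what actually got indexed is the _analyze API, which shows the lower-cased tokens produced by (here) the standard analyzer:
curl -XGET 'http://localhost:9200/personindex/_analyze?analyzer=standard&pretty' -d 'Shi Jie'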
#GET /personindex/person/_search
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "spriden_last_name": "ma" } },
            { "term": { "spriden_id": "myLangId" } }
          ]
        }
      }
    }
  }
}
'
4. Mapping to SQL
4.1 Exact match; the score is always 1. (A full request combining these fragments is sketched at the end of this subsection.)
-->a=b
{"term":{"a":"b"}}
-->a in (1,2,..)
{
"terms" : {
"a" : [1, 2]
}
}
-->a=b and (c=d or e=f)
{
  "bool" : {
    "must" : [
      { "term" : { "a" : "b" } },
      { "bool" : { "should" : [
          { "term" : { "c" : "d" } },
          { "term" : { "e" : "f" } }
      ] } }
    ],
    "must_not" : []
  }
}
-->range
a between c and d (BETWEEN is inclusive, so gte/lte rather than gt/lt)
"range" : {
  "a" : {
    "gte" : c,
    "lte" : d
  }
}
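These fragments go inside a search request, usually under a filtered query's filter. For example, the terms case as a full request against my index (the last-name values are hypothetical):
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : { "spriden_last_name" : ["ma", "li"] }
      }
    }
  }
}
'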
4.2 Full-text query; a score/relevance is calculated and returned.
match
(At first I did not know why replacing match with term here returned nothing; the reason is the same as above: term looks for the exact, un-analyzed value "ShiJie", while match runs the query text through the analyzer first, so it matches the lower-cased indexed token.)
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "spriden_first_name": "ShiJie"
    }
  }
}
'
##this one requires all of the words to match
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "spriden_first_name": {
        "query": "Shi Jie",
        "operator": "and"
      }
    }
  }
}
'
Or you can do this:
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "match_phrase" : {
      "spriden_first_name": "Shi Jie"
    }
  }
}
'
Kibana
After starting up Kibana, I found the sample data was too simple, so I decided to remove the index and rebuild it with more meaningful data. This time, besides the person, the person's birth date and basic student information are also included.
Script for data generation and loading; only 9,999 records are retrieved for the experiment.
select 'curl -XPUT ''http://localhost:9200/personindex/person/'||spriden_id||''' -d ''
{
"spriden_id" : "'||SPRIDEN_ID||'",'||'
"spriden_last_name": "'||SPRIDEN_LAST_NAME||'",'||'
"spriden_first_name" : "'||replace(SPRIDEN_FIRST_NAME,'''','''''')||'",'||'
"spriden_mi" :"'||SPRIDEN_MI||'",'||'
"term_registered" :"'||SGBSTDN_TERM_CODE_EFF||'",'||'
"term_start_date" :"'||to_char(STVTERM_START_DATE,'yyyy-mm-dd hh24:mi:ss')||'",'||'
"program_registered" :"'||SGBSTDN_PROGRAM_1||'",'||'
"major_registered" :"'||SGBSTDN_MAJR_CODE_1||'",'||'
"stst" :"'||SGBSTDN_STST_CODE||'",'||'
"bday" :"'||to_char(SPBPERS_BIRTH_DATE,'yyyy-mm-dd hh24:mi:ss')||'"'||'
}
'''
from spriden join sgbstdn
on spriden_pidm=sgbstdn_pidm
join stvterm on STVTERM_CODE=SGBSTDN_TERM_CODE_EFF
join SPBPERS on spriden_pidm=SPBPERS_PIDM
where spriden_change_ind is null
and rownum < 10000
;
Rebuild index
--define a type with the date columns explicitly defined
--or define a type by putting in one piece of data, getting its mapping, modifying it, and re-applying it
curl -XGET 'http://localhost:9200/personindex/_mapping/person?pretty'
curl -XDELETE 'http://localhost:9200/personindex'
curl -XPUT 'http://localhost:9200/personindex/'
curl -XPUT 'http://localhost:9200/personindex/_mapping/person' -d '
{
"person" : {
"properties" : {
"bday" : {
"type" : "date"
},
"major_registered" : {
"type" : "string"
},
"program_registered" : {
"type" : "string"
},
"spriden_first_name" : {
"type" : "string"
},
"spriden_id" : {
"type" : "string"
},
"spriden_last_name" : {
"type" : "string"
},
"spriden_mi" : {
"type" : "string"
},
"stst" : {
"type" : "string"
},
"term_registered" : {
"type" : "string"
},
"term_start_date" : {
"type" : "date"
}
}
}
}
'
--reload the index by running the generated script
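To check that the date mapping took effect (and as a preview of the kind of chart Kibana can build), here is a hypothetical date_histogram aggregation over term_start_date:
curl -XGET 'http://localhost:9200/personindex/person/_search?search_type=count&pretty' -d '
{
  "aggs" : {
    "terms_per_year" : {
      "date_histogram" : {
        "field" : "term_start_date",
        "interval" : "year"
      }
    }
  }
}
'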
Define the index in Kibana
Now, after typing the index name, the time fields are displayed.
Now I can play around with Kibana to see what it can do for me.
The plan for next is to also learn Logstash and build up a system that monitors the Tomcat web logs for application errors.