Tuesday, May 26, 2015

Starting to Learn Elasticsearch/Elastic and Kibana

This will be my task for the next couple of weeks.

Elasticsearch


Elasticsearch is based on Apache Lucene and aims to provide a simple RESTful interface on top of Lucene's complexity. Besides that, it is also:
--a key-value document store where every field is indexed and searchable
--a distributed search engine with real-time analytics

(Elasticsearch is a distributed document store. It can store and retrieve complex data structures—serialized as JSON documents—in real time. In other words, as soon as a document has been stored in Elasticsearch, it can be retrieved from any node in the cluster.)

--a document in Elasticsearch is data serialized as JSON; data is converted to JSON before it is stored.

It uses structured JSON documents.
(JSON is a way of representing objects in human-readable text. It has become the de facto standard for exchanging data in the NoSQL world. When an object has been serialized into JSON, it is known as a JSON document.)
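
As an illustration (these field names are made up, not from any real schema), a person object serialized as a JSON document might look like:

{
    "first_name" : "Shi",
    "last_name"  : "Ma",
    "interests"  : [ "search", "hadoop" ]
}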

Installation


1. Download 1.5.2 and extract it. I'm impressed that it is only about 30 MB.
2. Start it with ./elasticsearch -XX:-UseSuperWord. My JDK is 1.7 and I do not want to break the Hadoop environment built on this Java installation, so I use this JVM flag as a workaround instead of upgrading.
3. curl -XGET 'http://localhost:9200'
I got this response. I've been listening to 'See You Again' a lot these days, and the randomly assigned node name 'Wiz Kid' kind of gives the impression that the Kid is smoking.

{
  "status" : 200,
  "name" : "Wiz Kid",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

4. Shut it down with Ctrl+C, or:
curl -XPOST 'http://localhost:9200/_shutdown'


Installing Marvel


Marvel


Marvel is Elasticsearch's monitoring tool; it is free for development. Install it with:

./bin/plugin -i elasticsearch/marvel/latest 

Marvel Dashboard
http://localhost:9200/_plugin/marvel/

Marvel Sense: an interactive tool for communicating with Elasticsearch
http://localhost:9200/_plugin/marvel/sense/
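
In Sense you can drop the curl boilerplate and type just the method, path, and body. For example, the health check from step 3 above becomes:

GET /

and a request with a body, like the _count example below, becomes:

GET /_count
{
    "query": {
        "match_all": {}
    }
}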

Talking to Elasticsearch


There are basically two ways:
1. The Java API, using the native Elasticsearch transport protocol on port 9300, either
1.1 as a node client (a non-data node that joins the cluster), or
1.2 as a transport client.
2. The RESTful API with JSON over HTTP on port 9200; since this is plain HTTP, you can use curl to talk to it.

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

VERB: GET, POST, PUT, HEAD, or DELETE
(
PUT: store this object at this URL
POST: store this object under this URL
)
PROTOCOL: http or https
HOST: the hostname of any node in the cluster, e.g. localhost
PORT: Elasticsearch's default port is 9200
PATH: the API endpoint, e.g. _count
QUERY_STRING: optional parameters, e.g. pretty
BODY: a JSON-encoded request body (if the request needs one)

For example, the following command displays the response in pretty format for easy reading and includes the HTTP response headers by passing -i to curl.

curl -i -XGET 'localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}'

Try it out


Basic but very useful information is here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html

This analogy is put here to help it stick:
Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields

Had a taste of creating an index and searching it, from simple to complex.
Learned how its clustering works.
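
For the record, a minimal round trip like the one in that guide (megacorp/employee is the guide's example, not my data):

# store a document
curl -XPUT 'http://localhost:9200/megacorp/employee/1' -d '
{
    "first_name" : "John",
    "last_name"  : "Smith",
    "age"        : 25,
    "about"      : "I love to go rock climbing",
    "interests"  : [ "sports", "music" ]
}'

# retrieve it by id
curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'

# lightweight search by field
curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'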

My experiments


1. Retrieving documents from Oracle. The following SQL*Plus script spools a shell script of curl commands:
set termout off
set feedback off
set heading off
set pagesize 0
set newpage none
spool spriden_es.shl
select 'curl -XPUT ''http://localhost:9200/personindex/person/'||spriden_id||''' -d ''
{
   "spriden_id" : "'||SPRIDEN_ID||'",'||'
   "spriden_last_name": "'||replace(SPRIDEN_LAST_NAME,'''','')||'",'||'
   "spriden_first_name" : "'||replace(SPRIDEN_FIRST_NAME,'''','')||'",'||'
   "spriden_mi"  :"'||SPRIDEN_MI||'"'||'
}
'''
from spriden
where spriden_change_ind is null;
spool off
exit

2. chmod 755 spriden_es.shl and execute it.
This loads half a million documents into Elasticsearch under an index called personindex with type person.
This is very brutal, one curl process per document, and might crash Elasticsearch. I will look for a more efficient way of loading data after learning more about it; the bulk API sketched below looks like the candidate.
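
A rough sketch of the bulk version, just from reading the docs (the ids and field values here are made up): each action line is followed by its source document on its own line, and the body must end with a newline.

curl -XPOST 'http://localhost:9200/personindex/person/_bulk' -d '
{ "index" : { "_id" : "A00000001" } }
{ "spriden_id" : "A00000001", "spriden_last_name" : "ma", "spriden_first_name" : "shi", "spriden_mi" : "" }
{ "index" : { "_id" : "A00000002" } }
{ "spriden_id" : "A00000002", "spriden_last_name" : "smith", "spriden_first_name" : "john", "spriden_mi" : "j" }
'

The spool script could emit one such request per few thousand rows instead of one curl process per document.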
3. A closer-to-real-world query.
## At first I did not understand why this worked only on spriden_id, while switching the term to last name or first name returned nothing.
## It turns out term matching works against the lowercase tokens only; values with whitespace generally won't match and have to be treated specially. The _analyze call after this query shows why.
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "term": {
      "spriden_id": "mylangid"
    }
  }
}
'
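
To see what the analyzer actually indexed, the _analyze API can be asked directly (the sample text is made up):

curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'Shi Jie'

It returns the tokens "shi" and "jie", which is why a term query only matches exact lowercase single tokens while the original mixed-case, whitespace-separated value does not.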

#GET /personindex/person/_search
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "spriden_last_name": "ma" } },
            { "term": { "spriden_id": "mylangid" } }
          ]
        }
      }
    }
  }
}
'

4. Mapping to SQL
4.1 Exact match: the score is always 1.
--> a = b
{ "term" : { "a" : "b" } }
--> a in (1, 2, ..)
{
    "terms" : {
        "a" : [1, 2]
    }
}
--> a = b and (c = d or e = f)
{
   "bool" : {
      "must" :     [ { "term" : { "a" : "b" } } ],
      "should" :   [ { "term" : { "c" : "d" } },
                     { "term" : { "e" : "f" } } ],
      "must_not" : []
   }
}
(in a bool filter, should clauses behave like OR: at least one of them must match)
--> range
a between c and d (between is inclusive, so gte/lte rather than gt/lt)
"range" : {
    "a" : {
        "gte" : c,
        "lte" : d
    }
}
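
As a runnable version of the range mapping, here is the filter wrapped in a full query; it assumes the bday date field that I add to the index later in this post, with values matching its yyyy-MM-dd HH:mm:ss format:

curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "bday": {
            "gte": "1990-01-01 00:00:00",
            "lte": "1999-12-31 23:59:59"
          }
        }
      }
    }
  }
}
'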
4.2 Full-text query: a score/relevance is calculated and returned.
match
(Replacing match with term returned nothing here; it is the same analyzer issue as above: term is not analyzed, so "ShiJie" never matches the lowercased indexed tokens.)
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "spriden_first_name": "ShiJie"
    }
  }
}
'
## this one requires all of the words to match
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "spriden_first_name": {
        "query": "Shi Jie",
        "operator": "and"
      }
    }
  }
}
'
or you can do this (match_phrase additionally requires the words to be adjacent and in order):
curl -XGET 'http://localhost:9200/personindex/person/_search?pretty' -d '
{
  "query": {
    "match_phrase" : {
      "spriden_first_name": "Shi Jie"
    }
  }
}
'

Kibana


After starting up Kibana, I found the sample data too simple, so I decided to remove the index and rebuild it with more meaningful data. This time, besides the person, the person's birth date and their basic student information are also included.

Script for data generation and loading; only 9,999 records are retrieved for the experiment.

select 'curl -XPUT ''http://localhost:9200/personindex/person/'||spriden_id||''' -d ''
{
   "spriden_id" : "'||SPRIDEN_ID||'",'||'
   "spriden_last_name": "'||SPRIDEN_LAST_NAME||'",'||'
   "spriden_first_name" : "'||replace(SPRIDEN_FIRST_NAME,'''','''''')||'",'||'
   "spriden_mi"  :"'||SPRIDEN_MI||'",'||'
   "term_registered"  :"'||SGBSTDN_TERM_CODE_EFF||'",'||'
   "term_start_date"  :"'||to_char(STVTERM_START_DATE,'yyyy-mm-dd hh24:mi:ss')||'",'||'
   "program_registered"  :"'||SGBSTDN_PROGRAM_1||'",'||'
   "major_registered"  :"'||SGBSTDN_MAJR_CODE_1||'",'||'
   "stst"  :"'||SGBSTDN_STST_CODE||'",'||'
   "bday"  :"'||to_char(SPBPERS_BIRTH_DATE,'yyyy-mm-dd hh24:mi:ss')||'"'||'
}
'''
from spriden join sgbstdn
on spriden_pidm=sgbstdn_pidm
join stvterm on STVTERM_CODE=SGBSTDN_TERM_CODE_EFF
join SPBPERS on spriden_pidm=SPBPERS_PIDM
where spriden_change_ind is null
and rownum < 10000
;

Rebuild index
--define the type up front with the date columns explicitly mapped, or
--define the type by indexing one piece of data, getting its generated mapping, modifying it, and reapplying it (the route taken below).
Note that the default date format (dateOptionalTime) cannot parse the space-separated timestamps produced by the to_char mask above, so the two date fields carry an explicit format.
curl -XGET 'http://localhost:9200/personindex/_mapping/person?pretty'
curl -XDELETE 'http://localhost:9200/personindex'
curl -XPUT 'http://localhost:9200/personindex/'
curl -XPUT 'http://localhost:9200/personindex/_mapping/person' -d '
{

      "person" : {
        "properties" : {
          "bday" : {
            "type" : "date"
          },
          "major_registered" : {
            "type" : "string"
          },
          "program_registered" : {
            "type" : "string"
          },
          "spriden_first_name" : {
            "type" : "string"
          },
          "spriden_id" : {
            "type" : "string"
          },
          "spriden_last_name" : {
            "type" : "string"
          },
          "spriden_mi" : {
            "type" : "string"
          },
          "stst" : {
            "type" : "string"
          },
          "term_registered" : {
            "type" : "string"
          },
          "term_start_date" : {
            "type" : "date"
          }
        }
      }
}
'
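
Before reloading everything, a quick sanity check that the explicit date format matches what the spool script emits (TEST001 and all values here are made up):

curl -XPUT 'http://localhost:9200/personindex/person/TEST001' -d '
{
   "spriden_id" : "TEST001",
   "spriden_last_name" : "ma",
   "spriden_first_name" : "shi",
   "spriden_mi" : "",
   "term_registered" : "201510",
   "term_start_date" : "2015-09-01 00:00:00",
   "program_registered" : "TEST",
   "major_registered" : "TEST",
   "stst" : "AS",
   "bday" : "1990-01-02 00:00:00"
}
'
curl -XGET 'http://localhost:9200/personindex/person/TEST001?pretty'
curl -XDELETE 'http://localhost:9200/personindex/person/TEST001'

If the format did not match, the PUT would fail with a MapperParsingException instead of returning "created": true.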
--reload the index by running the generated script

Define the index pattern in Kibana
Now, after typing the index name, the time fields are displayed.

Now I can play around with Kibana to see what it can do for me.

The plan for next is to also learn Logstash and build a system to monitor Tomcat web logs for application errors.
