Tuesday, November 14, 2017

Amazon's Solutions to What I've Been Doing

AWS Greengrass does what Predix.io has been doing: edge to cloud, digital twin, etc.
http://www.allthingsdistributed.com/2017/06/unlocking-value-device-data-aws-greengrass.html

AWS Glue is essentially the data management workbench I have been working on at Bitstew.
http://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

It was a little scary when I first learned about this. The products are so similar, and Amazon is so influential, that I worried they would surpass us.

At the end of the day, I convinced myself that at least what I have been working on is cutting-edge technology, and I want to work harder to make it better than Amazon's.

Thursday, August 17, 2017

Weird IT Conventions (draft)

Frankly, many of these so-called conventions make me want to swear. They twist your mind, leave you unsure about their meanings, and give you so much more to remember.

Where to start? How about foo? Foo... let's forget about it; I will never make up examples with this name.

1. Parameter names of command-line tools.
A single dash (-) usually introduces the short form of an option, and a double dash (--) introduces its full form.
For example, -a is the same as --attributes if a parameter is called attributes.
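A minimal Java sketch of how a parser might treat the two forms as one option. The alias table (a to attributes, v to verbose) and the class name are hypothetical; real argument parsers also handle option values, bundled short flags such as -av, and more.

```java
import java.util.Map;

public class OptionAliases {
    // Hypothetical alias table for an imaginary tool: short form -> full form.
    private static final Map<String, String> SHORT_TO_LONG = Map.of("a", "attributes", "v", "verbose");

    // Maps "-a" and "--attributes" to the same canonical name; returns null for anything else.
    static String canonicalName(String arg) {
        if (arg.startsWith("--")) {
            // A bare "--" conventionally ends option parsing, so it is not an option name.
            return arg.length() > 2 ? arg.substring(2) : null;
        }
        if (arg.length() == 2 && arg.charAt(0) == '-') {
            return SHORT_TO_LONG.get(arg.substring(1));
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("-a"));           // attributes
        System.out.println(canonicalName("--attributes")); // attributes
    }
}
```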

2. Single and double quotes: ' and ".

In many circumstances, these two are interchangeable. You can use 'xxx"xxx' to get a double quote in the resulting string, or "xxx'xxx" to get a single quote in the resulting string.

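But in the Unix shell, single quotes mean everything within them is taken literally, as is. Double quotes mean pretty much the same thing, with exceptions: $, \ and the backtick (`) keep their special meanings, and so does ! when history expansion is enabled (in bash, the default for interactive shells). This allows things such as substituting a variable with its value.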

This is a case where the same symbol has different meanings in several popular coding environments, which gives you a hard time because you have to pay special attention to them.
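To make the shell difference concrete, here is a toy Java model of how one quoted word is treated. It is a deliberate simplification, not a real shell parser: it only handles $NAME substitution inside double quotes and ignores \, the backtick and !; the map passed in stands in for the environment, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Toy model of one quoted word: single quotes keep everything literal,
    // double quotes still expand $NAME. Backslash, backtick and ! are not modeled.
    static String expand(String word, Map<String, String> env) {
        if (word.length() >= 2 && word.startsWith("'") && word.endsWith("'")) {
            return word.substring(1, word.length() - 1);
        }
        if (word.length() >= 2 && word.startsWith("\"") && word.endsWith("\"")) {
            Matcher m = VAR.matcher(word.substring(1, word.length() - 1));
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                // Unset variables expand to an empty string, like the shell's default behavior.
                m.appendReplacement(out, Matcher.quoteReplacement(env.getOrDefault(m.group(1), "")));
            }
            m.appendTail(out);
            return out.toString();
        }
        throw new IllegalArgumentException("expected a single- or double-quoted word");
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("HOME", "/home/me");
        System.out.println(expand("'$HOME/bin'", env));   // $HOME/bin
        System.out.println(expand("\"$HOME/bin\"", env)); // /home/me/bin
    }
}
```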

Thursday, July 13, 2017

OAuth2 Notes

It is very hard to understand if you just check it from different web sources: people often have wrong understandings of it or wrong assumptions about it.

OAuth:
It is neither authentication nor authorization; it is a scalable delegation protocol.
I delegate access to someone so that they can do something for me.

Client id: the client must first be registered with the OAuth server. Along with its id, there is usually a redirect_uri.
redirect_uri: used to verify that the request is valid, and to send the authorization code back if the app is a web application.

access token: like a session; used for secured API calls.
refresh token: like a password used to get a new access token.

by-reference token: the token is only a reference and its data is stored elsewhere; the reference is converted to a by-value token when accessing APIs.
by-value token: the token carries the full information itself.

bearer token: like cash; when you spend it, nobody asks for identification.
holder-of-key token: like a credit card; identification is required, so it cannot be used by other users.

The ID token is for the client to build a meaningful session between its client app and server app.
The access token is meant for APIs.

Four roles:

user (me, the resource owner), client (the application), authorization server (the OAuth server), resource server (where the API or data is)

Steps (typical scenario):

--The client asks the authorization server for access to resources on the resource server.
--The authorization server agrees if the user consents, and redirects the user to a login page.
--The user signs in with the authorization server to complete authorization. Once authentication succeeds, the authorization server issues an authorization code to the client app via the redirect_uri.
--The client app uses the authorization code, its client id, and its secret to ask for an access token.
--The client app now uses the access token to access resources owned by the user, on the user's behalf.
--The resource server can call the authorization server to check whether the token is valid. Usually it does not need to; it simply checks whether the token's signature is trustworthy.
--The resource server then provides the resources to the client app if the token is valid.
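A minimal Java sketch of the two messages the client app builds in this flow: the authorization URL the user is redirected to, and the form body it POSTs to the token endpoint to exchange the code. The parameter names (response_type, client_id, redirect_uri, scope, state, grant_type, code) are the ones RFC 6749 defines; the endpoint URLs, ids, and secret are hypothetical placeholders, and many providers prefer the client secret in an HTTP Basic header rather than in the body.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class AuthCodeFlow {
    // Encodes parameters as application/x-www-form-urlencoded, keeping insertion order.
    static String form(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Where the user's browser is sent to log in and consent.
    static String authorizationUrl(String authEndpoint, String clientId, String redirectUri,
                                   String scope, String state) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("response_type", "code");
        p.put("client_id", clientId);
        p.put("redirect_uri", redirectUri);
        p.put("scope", scope);
        p.put("state", state); // echoed back to the redirect_uri; protects against CSRF
        return authEndpoint + "?" + form(p);
    }

    // Body the client app POSTs to the token endpoint to swap the code for tokens.
    static String tokenRequestBody(String code, String redirectUri, String clientId, String clientSecret) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("grant_type", "authorization_code");
        p.put("code", code);
        p.put("redirect_uri", redirectUri);
        p.put("client_id", clientId);
        p.put("client_secret", clientSecret);
        return form(p);
    }

    public static void main(String[] args) {
        String callback = "https://app.example.com/callback";
        System.out.println(authorizationUrl("https://auth.example.com/authorize", "my-app", callback, "read", "xyz"));
        System.out.println(tokenRequestBody("SplxlOBeZQQYbYS6WxSbIA", callback, "my-app", "s3cr3t"));
    }
}
```

The state value is checked by the client when the code comes back, so a code injected by someone else is rejected.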

It usually works with OpenID Connect. After the user logs in, the authorization server also returns an ID token that contains information about the user.
The client app (server side) uses this ID token to build a user session between the client app's front end and its server.

In microservices, let each service understand JWTs, and pass the JWT along when a service needs to call other services.

An ID token is a JWT; a JWT can also be used as an access token.
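Since a JWT is just three base64url segments (header.payload.signature), a service can read its claims with the JDK alone. This Java sketch (class and method names are hypothetical) only decodes the payload; it deliberately does not verify the signature, which a real service must do with a proper JWT library before trusting any claim.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtPeek {
    // Splits a compact JWT (header.payload.signature) and decodes the payload segment.
    // WARNING: no signature check here, so the returned claims are not trustworthy yet.
    static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.", -1); // -1 keeps an empty signature segment
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        byte[] json = Base64.getUrlDecoder().decode(parts[1]); // base64url, padding optional
        return new String(json, StandardCharsets.UTF_8);
    }

    // Base64url-encodes a string without padding, the way JWT segments are encoded.
    static String b64url(String s) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // An unsigned demo token built locally; real tokens come from the authorization server.
        String jwt = b64url("{\"alg\":\"none\"}") + "." + b64url("{\"sub\":\"alice\"}") + ".";
        System.out.println(payloadJson(jwt)); // {"sub":"alice"}
    }
}
```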

Tokens can expire; usually a refresh token is issued at the same time so that the client app can renew the access token.
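A small Java sketch of the renewal side: decide when to refresh (with a safety margin before expiry), and build the refresh request body, which uses grant_type=refresh_token as in RFC 6749 section 6. The class name and the skew parameter are illustrative assumptions, not part of any provider's API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

public class TokenRefresh {
    // True when the access token has expired or will expire within skewSeconds.
    static boolean needsRefresh(Instant expiresAt, Instant now, long skewSeconds) {
        return !now.plusSeconds(skewSeconds).isBefore(expiresAt);
    }

    // Form body POSTed to the token endpoint to get a new access token.
    static String refreshBody(String refreshToken) {
        return "grant_type=refresh_token&refresh_token="
                + URLEncoder.encode(refreshToken, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Instant expiresAt = Instant.now().plusSeconds(30);
        if (needsRefresh(expiresAt, Instant.now(), 60)) {
            System.out.println(refreshBody("tGzv3JOkF0XG5Qx2TlKWIA"));
        }
    }
}
```

Refreshing slightly early avoids sending a token that expires while the API call is in flight.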

About exchanging tokens:
https://www.youtube.com/watch?v=1ZX7554l8hY

A token has an access scope.
The access token can be sent in the Authorization: Bearer header, in the query string, or in the request body (payload), depending on the OAuth provider.
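The header form is the most common one. A minimal Java 11+ sketch using the JDK's java.net.http.HttpRequest; the API URL and token value are hypothetical placeholders. It only builds the request; sending it would be HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BearerRequest {
    // Builds (but does not send) a GET request that carries the access token
    // in the "Authorization: Bearer <token>" header.
    static HttpRequest withBearer(String url, String accessToken) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = withBearer("https://api.example.com/me", "mF_9.B5f-4.1JqM");
        System.out.println(request.headers().firstValue("Authorization").orElse("")); // Bearer mF_9.B5f-4.1JqM
    }
}
```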

Client types:
confidential client: a web server, etc.
public client: a mobile app, JavaScript in a user agent, etc.

Grant types:
two-legged:
client credentials: accessing your own resources.
You provide the client id and client secret (password) to get an access token.
It is usually used on the server side, where the client secret can be kept hidden in server-side configuration or code.

resource owner credentials:
This is when the user strongly trusts the client app and gives it their own username and password.

implicit: usually used in JavaScript code
https://tools.ietf.org/html/rfc6749#section-4.2
It is designed for applications that access APIs only while the user is present at the application

The client app directs the user to the auth server to authenticate and grant access.
The OAuth server redirects the resource owner back to the client app along with the access token (in the fragment of the redirect_uri).
The client app uses the access token to access the user's resources on the user's behalf.
There is no refresh token, since the client app itself is not authenticated; the flow is driven directly by the user.
(The authorization code grant is for a server that acts as a proxy of the user.)
Since the access token is visible to the user on the same computer, it must only be passed over a secure transport.
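In the implicit grant the token arrives in the URI fragment (after #), not the query string. In a browser this parsing is done in JavaScript; to keep one language in this blog, here is the same idea as a Java sketch with a hypothetical class name and an illustrative redirect URL.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ImplicitFragment {
    // Parses "access_token=...&token_type=...&expires_in=..." out of the redirect URL's fragment.
    static Map<String, String> fragmentParams(String redirectedUrl) {
        String fragment = URI.create(redirectedUrl).getRawFragment();
        Map<String, String> params = new HashMap<>();
        if (fragment == null || fragment.isEmpty()) {
            return params;
        }
        for (String pair : fragment.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String redirected = "https://app.example.com/callback"
                + "#access_token=2YotnFZFEjr1zCsicMWpAA&token_type=bearer&expires_in=3600&state=xyz";
        System.out.println(fragmentParams(redirected).get("access_token")); // 2YotnFZFEjr1zCsicMWpAA
    }
}
```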

The redirect_uri is defined as part of the client registration in the OAuth server, where it is an optional setting. In the implicit grant, the redirect_uri is a means of verification, not a means of communication; in the authorization code grant, it is a means of communication.

three-legged:
authorization code: accessing other people's resources.
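For the two-legged grants above, a minimal Java sketch of the token request pieces: the HTTP Basic header a confidential client sends, and the form bodies for the client credentials and resource owner password grants (parameter names as in RFC 6749). The class name is hypothetical; the id and secret in main are the sample values used in RFC 6749.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class GrantRequests {
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Authorization header value for a confidential client: Basic base64(id:secret).
    static String basicAuthHeader(String clientId, String clientSecret) {
        String raw = enc(clientId) + ":" + enc(clientSecret);
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Client credentials: the client acts on its own behalf, so only the client authenticates.
    static String clientCredentialsBody(String scope) {
        return "grant_type=client_credentials&scope=" + enc(scope);
    }

    // Resource owner credentials: the user hands the client app their username and password.
    static String passwordBody(String username, String password, String scope) {
        return "grant_type=password&username=" + enc(username)
                + "&password=" + enc(password) + "&scope=" + enc(scope);
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("s6BhdRkqt3", "gX1fBat3bV")); // Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW
        System.out.println(clientCredentialsBody("read"));
    }
}
```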

Monday, May 08, 2017

About Hash

Hash table

This is a key-value lookup data structure.

You can think of it as an array coupled with a hash function. The hash function takes in a key and outputs an integer used as an index into the array; the key and value are then stored under that index.

The key must be stored as well, for collision handling: when several keys land on the same index, the key's equals() method is used to tell them apart.
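A minimal separate-chaining sketch of the array-plus-hash-function idea. The class name ChainedHashTable is hypothetical, and it is much simpler than java.util.HashMap (no resizing, no tree bins); it shows why the key is stored: equals() on the stored key decides which entry in a bucket matches.

```java
import java.util.LinkedList;
import java.util.Objects;

public class ChainedHashTable<K, V> {
    // One key-value pair; the key is kept so collisions can be resolved with equals().
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
    }

    // Hash function: map the key's hashCode() to an array index.
    private int indexFor(K key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) { e.value = value; return; } // same key: overwrite
        }
        buckets[i].add(new Entry<>(key, value)); // new key (possibly colliding): append to chain
    }

    public V get(K key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain == null) return null;
        for (Entry<K, V> e : chain) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        // Capacity 1 forces every key into the same bucket, i.e. every insert collides.
        ChainedHashTable<String, Integer> table = new ChainedHashTable<>(1);
        table.put("apple", 1);
        table.put("pear", 2);
        System.out.println(table.get("apple") + " " + table.get("pear")); // 1 2
    }
}
```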

In Java, Hashtable is roughly the same as HashMap, except that it is thread-safe (its methods are synchronized) and does not allow null keys or null values.
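A quick Java check of the null difference between the two JDK classes; the helper names are hypothetical. Hashtable.put throws NullPointerException for a null key or value, while HashMap accepts both.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeys {
    // True if the map accepts a null key; Hashtable throws NullPointerException instead.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // True if the map accepts a null value.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()) + " " + acceptsNullKey(new Hashtable<>()));     // true false
        System.out.println(acceptsNullValue(new HashMap<>()) + " " + acceptsNullValue(new Hashtable<>())); // true false
    }
}
```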

Hash Set

In Java, HashSet is backed by a hash table (actually a HashMap) in which each element is stored as the key.

Hash Map

This is a hash table that is not thread-safe and allows a null key and null values.

Collision

Solutions to collisions include linear probing, (separate) chaining, and double hashing. Linear probing can lead to clustering (its major drawback) when a lot of collisions happen. Chaining is the approach Java's HashMap uses.

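Double hashing uses a formula involving a second hash function when the first hash function has a collision: the i-th probe goes to slot (h1(key) + i * h2(key)) mod m, where m is the table size and h2(key) is never 0.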
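A small open-addressing sketch in Java contrasting the two probe sequences. The choice h2(k) = 1 + (k mod (m - 1)) is one common textbook option assumed here, not the only one; with a prime table size m it is coprime with m, so the probe sequence visits every slot. Class and method names are hypothetical.

```java
public class OpenAddressing {
    // i-th slot probed by linear probing: (h1 + i) mod m.
    static int linearProbe(int h1, int i, int m) {
        return Math.floorMod(h1 + i, m);
    }

    // i-th slot probed by double hashing: (h1 + i * h2) mod m, with h2 != 0.
    static int doubleHashProbe(int h1, int h2, int i, int m) {
        return Math.floorMod(h1 + i * h2, m);
    }

    // Inserts an int key with double hashing; returns its slot, or -1 if no free slot was found.
    // Assumes table.length is a prime >= 3 so that every h2 in 1..m-1 visits all slots.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int h1 = Math.floorMod(key, m);         // first hash function
        int h2 = 1 + Math.floorMod(key, m - 1); // second hash function, never 0
        for (int i = 0; i < m; i++) {
            int slot = doubleHashProbe(h1, h2, i, m);
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        System.out.println(insert(table, 10)); // 3
        System.out.println(insert(table, 17)); // 2 (linear probing would have tried slot 4)
    }
}
```

Because colliding keys usually get different step sizes h2, they spread out instead of piling into the same run of neighboring slots, which is the clustering problem of linear probing.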

Sunday, February 19, 2017

Java Xml Tabulator

I tried to google for a Java or XSLT solution to convert XML to a tabular data format, but could not find one that was easy to understand or use, so I made one myself.

https://github.com/shijiema/JavaXmlTabulator


Converts XML to a tabular form of data, implemented in pure Java. No third-party library is required.
For an XML document such as

<Relations>
 <Relationship p1="v1">some text
     <id>1</id>
  <Type>OneToMany</Type>other text
  <Weight>1.0</Weight>
  <Score>100.0</Score>
 </Relationship>
 <Relationship>
  <id>2</id>   noise 3
  <Type>ManytoOne</Type>
  <Weight>1.0</Weight>
  <Score>90.0</Score>
 </Relationship>
</Relations>

It converts it to a flattened version of the data that, when iterated, looks like this:

[Relations_Relationship, Relations_Relationship_Score, Relations_Relationship_Type, Relations_Relationship_Weight, Relations_Relationship_id, Relations_Relationship_p1]
[some textother text, 100.0, OneToMany, 1.0, 1, v1]
[noise 3, 90.0, ManytoOne, 1.0, 2, null]

It offers two ways to access the transformed data.
The first is its iterator() method, which returns each row as a List. The first row is the header and the subsequent rows are body content; headers and body content are aligned.
The second is getHeaders() and getBody(), if only partial data access is wanted.

Algorithm

Observations:

Non-repeating elements in XML can be treated as attributes of their parent node.
Repeating elements in XML usually mean multiple rows after being flattened.
The path from the root to a node makes up the columns in tabular format.

tablify(){
    with XML tree,
    1. merge non-repeating elements into their parents
        1.1 from leaf to root, merge each non-repeating child element into its parent as the parent's attributes; this includes both the child's text and its attributes
            the attribute name for a child node is the child element name; the attribute value is the child node's text value
            the attribute name for a child node's attribute is the child element name + the child attribute name; the value is the child element's attribute value
        1.2 remove these children from their parents
        1.3 repeat 1.1 and 1.2 until no more such children exist
    2. make the node production from leaf to root
        2.1 for a node, make its equivalent node production
                2.1.1 a leaf node's child node is null
                2.1.2 rows = rows of repeating child element 1 * rows of repeating child element 2 * ... * rows of repeating child element n (Cartesian product)
                2.1.3 insert this parent node at the head of each row produced in 2.1.2

        2.2 do 2.1 for all parent nodes (the null node's parent is the leaf node), but stop at the root element
        2.3 do 2.1 for the root element (because different paths have different depths, they have to wait for the final production)
    3. in each row of the final node production, convert nodes to columns
        3.1 node column name = path to node; node value = text content of node
        3.2 node attribute column name = path to node + attribute name; node attribute column value = attribute value
        (this works well for non-repeating nodes wrapped as the parent node's attributes)
    4. return the key-value paired node production
}
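The repository's own code is not reproduced here. As a rough sketch built on the same observations, the following Java uses the JDK's DOM parser and a recursive variant rather than the two separate passes above: each element contributes its own text and attributes as columns named by path (which has the same effect as merging non-repeating children into their parent), and child elements grouped by tag name are combined by a Cartesian product, as in step 2.1.2. The names XmlFlattenSketch, flatten and tabulate are hypothetical, and CDATA, namespaces and other edge cases are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlFlattenSketch {
    // Flattens one element into rows; each row maps a column name (path) to a value.
    static List<Map<String, String>> flatten(Element e, String path) {
        Map<String, String> own = new HashMap<>();
        StringBuilder text = new StringBuilder();
        Map<String, List<Element>> groups = new LinkedHashMap<>(); // child elements by tag name
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                text.append(n.getNodeValue().trim()); // e.g. "some text" + "other text"
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                groups.computeIfAbsent(n.getNodeName(), k -> new ArrayList<>()).add((Element) n);
            }
        }
        if (text.length() > 0) {
            own.put(path, text.toString()); // column name = path to node
        }
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            own.put(path + "_" + a.getNodeName(), a.getNodeValue()); // path + attribute name
        }
        // Start from this element's own row, then take the Cartesian product with each child group.
        // A group holding a single non-repeating child just merges its columns into every row.
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(own);
        for (Map.Entry<String, List<Element>> group : groups.entrySet()) {
            List<Map<String, String>> groupRows = new ArrayList<>();
            for (Element child : group.getValue()) {
                groupRows.addAll(flatten(child, path + "_" + group.getKey()));
            }
            List<Map<String, String>> product = new ArrayList<>();
            for (Map<String, String> left : rows) {
                for (Map<String, String> right : groupRows) {
                    Map<String, String> merged = new HashMap<>(left);
                    merged.putAll(right);
                    product.add(merged);
                }
            }
            rows = product;
        }
        return rows;
    }

    // Returns the header row followed by body rows; missing values are null.
    static List<List<String>> tabulate(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        Element root = doc.getDocumentElement();
        List<Map<String, String>> rows = flatten(root, root.getNodeName());
        TreeSet<String> headers = new TreeSet<>(); // sorted, so the column order is stable
        rows.forEach(r -> headers.addAll(r.keySet()));
        List<List<String>> table = new ArrayList<>();
        table.add(new ArrayList<>(headers));
        for (Map<String, String> r : rows) {
            List<String> line = new ArrayList<>();
            for (String h : headers) {
                line.add(r.get(h));
            }
            table.add(line);
        }
        return table;
    }

    public static void main(String[] args) {
        String xml = "<Relations><Relationship p1=\"v1\">some text<id>1</id>"
                + "<Type>OneToMany</Type>other text<Weight>1.0</Weight><Score>100.0</Score>"
                + "</Relationship><Relationship><id>2</id> noise 3<Type>ManytoOne</Type>"
                + "<Weight>1.0</Weight><Score>90.0</Score></Relationship></Relations>";
        tabulate(xml).forEach(System.out::println);
    }
}
```

Run on the sample document, this sketch should print the same three rows shown earlier: headers are sorted with a TreeSet, and the second Relationship, which has no p1 attribute, gets null in that column.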

Shi Jie Ma - Initial work