Tuesday, 7 February 2017

Quick steps to fine-tune an Oracle database

First, analyse the ADDM and AWR reports to find the real cause of the database load; the steps below can then be performed.

1.    Purging the Recycle Bin
purge recyclebin; -- purges the dropped objects and releases their storage
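Before purging, it may help to review what is currently sitting in the recycle bin (a quick check, assuming DBA privileges):

select owner, object_name, original_name, type, droptime
from   dba_recyclebin
order  by droptime desc;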

2.    Space Issues

select df.tablespace_name "Tablespace",
       tu.totalusedspace "Used MB",
       (df.totalspace - tu.totalusedspace) "Free MB",
       df.totalspace "Total MB",
       round(100 * ((df.totalspace - tu.totalusedspace) / df.totalspace)) "Pct. Free"
from   (select tablespace_name, round(sum(bytes) / 1048576) totalspace
        from dba_data_files
        group by tablespace_name) df,
       (select tablespace_name, round(sum(bytes) / 1048576) totalusedspace
        from dba_segments
        group by tablespace_name) tu
where  df.tablespace_name = tu.tablespace_name;

Pct. Free -- it should not fall below 10% unless it is a low-priority or unused tablespace
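If a critical tablespace is running low, either resize an existing datafile or add a new one. A minimal sketch, assuming the USERS tablespace and hypothetical datafile paths:

-- Option 1: resize an existing datafile (path is a hypothetical example)
alter database datafile '/u01/oradata/ORCL/users01.dbf' resize 2G;

-- Option 2: add a new datafile to the tablespace
alter tablespace users add datafile '/u01/oradata/ORCL/users02.dbf' size 1G
  autoextend on next 100M maxsize 4G;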

select * from v$sgastat where NAME = 'free memory';

The shared pool, large pool, and java pool should be predominantly large, with enough free memory remaining in each
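To see how the SGA is currently split across the pools (a quick check, not a sizing recommendation):

select pool, round(sum(bytes) / 1048576) size_mb
from   v$sgastat
group  by pool
order  by size_mb desc;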

3.    Check for any anonymous or long-running sessions
select s.sid, s.serial#, s.username, s.osuser from v$session s, v$process p where s.paddr = p.addr;
select * from v$session where osuser!='oracle';

OSUSER should not be an anonymous or ad-hoc user such as a DESKTOP login, or a standalone device such as a laptop or mobile client.
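Long-running work can be spotted from v$session_longops before deciding which session to kill (a quick check; the threshold is a judgment call):

select sid, serial#, opname, sofar, totalwork, time_remaining
from   v$session_longops
where  time_remaining > 0;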

If you identify any long-running or rogue session, go ahead and kill it, with prior notice to the stakeholders:

ALTER SYSTEM KILL SESSION '36, 57056, @1' IMMEDIATE; -- the literal is 'sid, serial#, @inst_id'

4.    Result Cache
SELECT dbms_result_cache.status() FROM dual;
If the status returned is DISABLED, the result cache can be enabled as sketched below.
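A minimal sketch of enabling and using the result cache; the sizes are illustrative assumptions, not recommendations, and the EMPLOYEES table is only an example:

-- Reserve memory for the result cache and keep caching under manual control
alter system set result_cache_max_size = 100M scope = both;
alter system set result_cache_mode = MANUAL scope = both;

-- Opt an individual query into the cache with the RESULT_CACHE hint
select /*+ RESULT_CACHE */ department_id, count(*)
from   employees
group  by department_id;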


5.    SQL trace at the session level should be disabled, as it increases the load on the trace files
ALTER SESSION SET sql_trace = false;

Monday, 6 February 2017

Grammar-Text Analytics - Unstructured Data-Content Management in MySQL

How to use grammar (full-text search) in MySQL.
MySQL has good content-management capabilities and handles bulk text data efficiently.
Assume the data has already been loaded into a MySQL table.

Let us consider a table called CORE_DATA which contains the following columns,
Identifier -- A Unique identifier for the row
SentDate -- Worked date
Content -- Actual text message, Can be a LONGTEXT or MEDIUMTEXT           
Summary -- Header or the topic of the text message

CREATE FULLTEXT INDEX content_idx ON CORE_DATA(content);

MATCH ... AGAINST -- the construct used to search for a string in the full-text indexed column

The query below returns all rows that match the search string, with a priority showing how the match was made:

(select 1 PrioID, summary, content
 from   CORE_DATA
 where  upper(content) like upper(concat('%', 'Deepika Padukone is awesome', '%'))
 limit  100)
-- 1: an exact match of the whole phrase
union
(select 2, summary, content
 from   CORE_DATA
 where  match (content)
        against (replace('Deepika Padukone is awesome', ' ', ' +') in boolean mode))
-- 2: a match with ALL the words present, at different places in the column
union
(select 3, summ, cont
 from   (select summary as summ, content as cont,
                match (content)
                against (replace('Deepika Padukone is awesome', ' ', ',') in boolean mode) as score
         from   CORE_DATA) q1
 -- 3: a partial match with ANY of the words present, ranked by relevance score
 order  by score desc
 limit  100)
order by PrioID;

Result set:
Deepika Padukone is awesome -- Result of 1st Union clause
Deepika last name is Padukone and she really look awesome -- 2nd Union
Deepika acted in Ramleela and looking like an angel and her costumes are really nice --3rd Union

Monday, 26 December 2016

Grammar-Text Analytics-Unstructured Data-Content Management in Oracle


This is not Hadoop, just an example of how to use grammar (text) search in Oracle.
Let us assume that we have loaded the data from various sources into an Oracle database and need to do some pattern matching on it.
This article does not give end-to-end insight, but covers a quick thought on content management and searching with text patterns.
CTXSYS is the Oracle Text schema; it owns the packages (CTX_DDL, CTX_QUERY, and so on) that provide most of the grammatical operations used in content management.
Run the commands below to grant the respective privileges on the CTXSYS packages.



The grants below are to be run from SYS to enable all grammar commands for the TRAINING schema:
GRANT EXECUTE ON CTXSYS.CTX_CLS TO training;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO training;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO training;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO training;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO training;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO training;
GRANT EXECUTE ON CTXSYS.CTX_THES TO training;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO training;
GRANT EXECUTE ON CTXSYS.CTX_MATCHES TO training;
GRANT EXECUTE ON CTXSYS.DR_DEF TO training;
GRANT EXECUTE ON CTXSYS.DRICON TO training;
GRANT EXECUTE ON CTXSYS.CTX_CATSEARCH TO training;

The query below lists the domain indexes based on CTXSYS index types; these indexes are mandatory to enable any text-based search patterns on the required columns.
select owner, index_name, index_type, status, domidx_status, domidx_opstatus from dba_indexes where ityp_owner = 'CTXSYS';

Let us consider a table called CORE_DATA which contains the following columns,
Identifier -- A Unique identifier for the row
SentDate -- Worked date
Content -- Actual text message, Can be a CLOB or LONG data type                   
Summary -- Header or the topic of the text message

CREATE INDEX TRAINING.EM_CONSTRTIDX ON TRAINING.CORE_DATA (content) INDEXTYPE IS CTXSYS.CONTEXT;
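A CONTEXT index is not maintained transactionally; after DML on CORE_DATA it has to be synchronized before new rows become searchable. A minimal sketch, assuming the index created above:

BEGIN
  ctx_ddl.sync_index('TRAINING.EM_CONSTRTIDX');
END;
/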

CONTAINS -- To check the existence of any keyword inside a text                          
NEAR -- To check that multiple words occur close to each other inside a text

select * from core_data where contains(content, 'wonderful', 1) > 0;
select * from core_data where contains(content, 'deepika near wonderful') > 0;
select * from core_data where contains(content, 'Deepika near Wonderful near looking') > 0;

Result set:
Heroines are wonderful
Deepika looks wonderful
Deepika acted in Ramleela and looking like an angel and her costumes are really wonderful

Achieve 100% Availability on BI reports - Synonym Toggling of ETL

Any BI load manipulates the source data, and during that window the reports may have downtime or serve stale data for a while.
For example, end-of-day ledger postings take roughly 45 minutes to complete the ETL process and load the daily accounting reports.
For those 45 minutes the reports should still be accessible to users, without any downtime or exceptions.
Note :
A few OLAP/BI tools already provide caching or auto-refresh capabilities.
The technique below can be implemented for FOSS-based ETL/reporting tools where such features are limited.

Synonym Toggling:

Assume MAIN_SYN (a synonym) is the most important source object through which the DIM, FACT, and SUMMARY tables get refreshed.

You can apply the concept to any number of tables.
STAGING_TEMP1 and STAGING_TEMP2 are identical copies of the staging table, and the synonym MAIN_SYN points to one of them at a time, as sketched below.
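A minimal sketch of the toggle, using the names above. The ETL loads the offline copy while reports keep reading the online one through MAIN_SYN, and the synonym is switched only once the load completes (SOURCE_TABLE is a placeholder for the real source):

-- Reports currently read STAGING_TEMP1 through MAIN_SYN
-- 1. Load the offline copy
truncate table STAGING_TEMP2;
insert into STAGING_TEMP2 select * from SOURCE_TABLE;
commit;

-- 2. Repoint the synonym; the switch is near-instant, so reports see no downtime
create or replace synonym MAIN_SYN for STAGING_TEMP2;

-- 3. On the next run, load STAGING_TEMP1 and toggle back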



   

Talend In One Day


  Basic Components
Explanation
tFileInputDelimited
Reads any delimited .txt or .csv file as input.
tFileOutputDelimited
Similar, but generates a delimited output file (.csv).
tFileInputXML
Reads any XML file as input and extracts fields via XPath.
Store the XML in any location.
Specify the XPath loop expression, e.g. <employee>.
Fields to extract, e.g. <empname>, <DOB>.
tRowGenerator
Generates rows sequentially.
Go to Talend Data Fabric and select the generating functions accordingly.
OrderID and Date sequences get auto-generated.
Similar to an Oracle sequence.
tFixedFlowInput
Manually creates a fixed dataset.
Basic Settings -> create 2 columns, State_Cd and State_Nm, and enter static values.
Change the number of rows and it generates that many static rows.
The inline table allows adding multiple rows, producing rows * number of columns values.
tMap
Performs a join between 2 input flows; the Expr. key performs the actual join.
row1 and row2 need to be joined on a common column in the Expr. key.
tSample
Generates sequential numbers as a field in the output. The range can be changed in the Basic settings.
tJava
Runs standalone core Java code entered in the Basic settings.
tSalesforceBulkExec
Give the Salesforce web-service SOAP URL with username and password.
Get the mySF0.1 connection from the Salesforce metadata.
tLibraryLoad
Loads any external library (JAR file) that can be called from tJava; the tJava component can then be invoked with OnComponentOk.


Big Data Components
Explanation
tHDFSConnection
Establishes a connection to an HDFS system.
Distribution: Apache
NameNode URI: hdfs://10.140.6.125:9000/
Username: <username of the virtual machine>
tHDFSConnection (via OnSubjobOk) can be connected to a tRowGenerator and tLogRow flow that ends with tHDFSOutput_1.
tHDFSOutput
Generates the output and stores it in the Hadoop system.
File name: /user/hadoop/talendoutput/out.txt
Action: Create
Run the job, then verify in HDFS:
hadoop fs -ls /user/hadoop/talendoutput
hadoop fs -cat /user/hadoop/talendoutput/out.txt
tHDFSInput
Retrieves data from an HDFS file.
Create the tHDFSConnection.
Add the tHDFSInput.
Enter the file name /user/hadoop/talendoutput/out.txt and the field separator.
Connect it to tLogRow.
 

Sunday, 20 November 2016

Data Model of an Employee Management System with Shift Schedules



Let us consider that we need to model an employee shift management system. This snapshot is just an outline and could vary based on the architecture principles; a sketch of the core tables follows below.
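A minimal sketch of the core tables; the names, columns, and constraints are hypothetical and the actual model in the snapshot may differ:

CREATE TABLE employee (
  emp_id        NUMBER        PRIMARY KEY,
  emp_name      VARCHAR2(100) NOT NULL,
  department_id NUMBER
);

CREATE TABLE shift (
  shift_id   NUMBER PRIMARY KEY,
  shift_name VARCHAR2(30),   -- e.g. MORNING, EVENING, NIGHT
  start_time VARCHAR2(5),    -- 'HH24:MI'
  end_time   VARCHAR2(5)
);

CREATE TABLE emp_shift_schedule (
  schedule_id NUMBER PRIMARY KEY,
  emp_id      NUMBER REFERENCES employee(emp_id),
  shift_id    NUMBER REFERENCES shift(shift_id),
  work_date   DATE,
  CONSTRAINT uq_emp_day UNIQUE (emp_id, work_date)  -- one shift per employee per day
);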

Hadoop in One day



Analyzing Big data with Hive

Hive
Structured data, queried with SQL (HiveQL)


To load data into a Hive table: LOAD DATA INPATH 'dbname/userinfo' INTO TABLE movies;
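A minimal sketch of a Hive table that such a load could target; the column list and file layout are hypothetical examples, not the actual schema:

-- Hypothetical MOVIES table matching a comma-delimited source file
CREATE TABLE movies (
  movie_id INT,
  title    STRING,
  rating   DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move the HDFS files under 'dbname/userinfo' into the table's location
LOAD DATA INPATH 'dbname/userinfo' INTO TABLE movies;

-- Query like ordinary SQL
SELECT title, rating FROM movies WHERE rating > 4 LIMIT 10;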


Pig
Pig handles semi-structured and unstructured data; here the result sets are captured into two relations (sdppig, bvoippig).
grunt> sdppig = LOAD '/user/remsrch/sdp_daily.csv' USING PigStorage(',') AS (arrivaldate:chararray, status:int, hrsopen:int, closeddate:int, summary:chararray);
grunt> DUMP sdppig;
grunt> bvoippig = LOAD '/user/remsrch/bvoip_daily.csv' USING PigStorage(',') AS (arrivaldate:chararray, status:int, hrsopen:int, closeddate:int, summary:chararray);
grunt> DUMP bvoippig;
grunt> sdphigh = FOREACH sdppig GENERATE arrivaldate, status;
grunt> bvoiphigh = FOREACH bvoippig GENERATE arrivaldate, status;
grunt> combined = JOIN sdphigh BY arrivaldate, bvoiphigh BY arrivaldate;
grunt> DUMP combined;
grunt> STORE combined INTO 'user/remsrch' USING PigStorage(','); -- store the combined result back into HDFS
grunt> quit;
hdfs dfs -ls /user/remsrch

HBase - NoSQL Database
$ hbase shell
create 'tablename', 'columnfamily', ...          # creates a table with one or more column families
put 'tablename', 'rowkey', 'columnfamily:column', 'value'   # inserts data into a row and column
get 'tablename', 'rowkey'                        # queries a single row
scan 'tablename'                                 # displays all rows and columns
disable 'tablename'
drop 'tablename'                                 # drops the table (it must be disabled first)
Pig can help us move some data from HDFS into HBase.
app_stock is the table name used below.

scan 'app_stock', {LIMIT => 10}                  # displays 10 rows from the table

Ingesting means pushing files into HDFS. Automate all the manual file pushes into HDFS using bash scripting.








Performance Improvements
Tez -- a high-speed execution engine that provides parallelism
set hive.execution.engine=tez;
ORC -- a columnar format that stores the data in neighboring blocks
set hive.optimize.ppd=true;
set hive.enforce.sorting=true;
CREATE TABLE mytable (
...
) STORED AS orc;
Vectorization -- speeds up scans, aggregations, filters, and joins
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
Cost-Based Optimizer -- considers the query cost and picks the plan with the least cost
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
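A minimal sketch of an ORC table that benefits from these settings; the table and column names are hypothetical:

-- Hypothetical ORC table, bucketed and sorted so predicate push-down and sorted joins apply
CREATE TABLE sales_orc (
  order_id    INT,
  customer_id INT,
  amount      DOUBLE
)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Gather statistics so the cost-based optimizer has something to work with
ANALYZE TABLE sales_orc COMPUTE STATISTICS;
ANALYZE TABLE sales_orc COMPUTE STATISTICS FOR COLUMNS;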

HDFS
Multi Node cluster
Hadoop Master: 192.168.1.15 (hadoop-master)
Hadoop Slave: 192.168.1.16 (hadoop-slave-1)
Hadoop Slave: 192.168.1.17 (hadoop-slave-2)