Monday, 26 December 2016

Grammar-Text Analytics-Unstructured Data-Content Management in Oracle


It is not Hadoop but just an example on How to use Grammar in Oracle.
Let us assume that we have loaded the data from various sources into Oracle database and need to do some pattern matching in Oracle database.
This article is not going to provide end to end insight but will cover up a quick thought on Content Management and Searching with Text patterns.
CTXSYS is one package which does majority of the grammatical operations.
Run the below commands and grant the respective privileges on CTXSYS.
CTXSYS is a Package which hosts all the Grammar related functionalities that can be used in Content Management.



The below grants to be Run from SYS to enable all Grammar commands:
GRANT EXECUTE ON CTXSYS.CTX_CLS TO training;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO training;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO training;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO training;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO training;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO training;
GRANT EXECUTE ON CTXSYS.CTX_THES TO training;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO training;
GRANT EXECUTE ON CTXSYS.CTX_MATCHES TO training;
GRANT EXECUTE ON CTXSYS.DR_DEF TO training;
GRANT EXECUTE ON CTXSYS.DRICON TO training;
GRANT EXECUTE ON CTXSYS.CTX_CATSEARCH TO training;

It gives is the respective Indexes on CTXSYS, These Indexes are mandatory to enable any Text based Search patterns for the required columns.
select owner, index_name, index_type, status, domidx_status, domidx_opstatus from dba_indexes where ityp_owner='CTXSYS'

Let us consider a table called CORE_DATA which contains the following columns,
Identifier -- A Unique identifier for the row
SentDate -- Worked date
Content -- Actual text message, Can be a CLOB or LONG data type                   
Summary -- Header or the topic of the text message

CREATE INDEX TRAINING.EM_CONSTRTIDX ON TRAINING.CORE_DATA (content) INDEXTYPE IS CTXSYS.CONTEXT

CONTAINS -- To check the existence of any keyword inside a text                          
NEAR -- To check the existence of multiple words inside a text

select * from core_data where contains(content, 'wonderful', 1) > 0
select  * from core_data where  contains(content, 'deepika near wonderful')>0
select  * from core_data where  contains(content, 'Deepika near Wonderful near looking')>0 

Result set:
Heroines are wonderful
Deepika looks wonderful
Deepika acted in Ramleela and looking like an angel and her costumes are really wonderful

No comments:

Post a Comment