
Noun-Verb Based Technique of Text Watermarking Using Recursive Descent Semantic Net Parsers

Xingming Sun1 and Alex Jessey Asiimwe2


1 Zhuzhou Institute of Technology, Hunan University, Changsha, China
sunnudt@163.com
2 Institute of Computer Science, Makerere University, 7062, Kampala, Uganda
asiimwealex@hotmail.com

Abstract. The proposed method of text watermarking exploits nouns and verbs in a sentence parsed with a grammar parser while using semantic networks. The structure of the sentence is changed to generate nouns and verbs whose non-terminals, counted away from the root of the sentence, are used together with random numbers to hide the watermark. The modifications range from switching between active and passive voice to the use of linking verbs, mid-sentence modifiers, terminal modifiers and combined modifiers.

Keywords: watermarking, semantic networks, nouns and verbs.

1 Introduction

Watermarking natural language has proven to be a difficult task because understanding and processing natural language is itself a hard problem for the Artificial Intelligence community. This complexity of natural language has motivated much of the research in natural language watermarking. Typically, prior-art natural language processing systems function in a manner analogous to the diagramming of sentences, determining the functions of the various words in the context in which they are used (noun, verb, etc.). Other techniques proposed for watermarking documents include the use of the frequency domain [4] and the insertion of spelling, syntactic, punctuation or even content errors [2]. There is also a semantically based scheme, which hides data in the text-meaning representation (TMR) [1]. The remainder of this paper is organized as follows. Section 2 briefly reviews semantic networks. Section 3 discusses parsing using recursive descent parsers. Section 4 presents the embedding methodology and the watermark extraction process. Finally, Section 5 presents the conclusions.
This paper is supported by the National Natural Science Foundation of China (NSFC No. 60373062), the Hunan Provincial Natural Science Foundation of China (HPNSFC No. 02JJYB012), and the Key Foundation of Science and Technology of the Ministry of Education of China (No. 03092).
L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3612, pp. 968-971, 2005. © Springer-Verlag Berlin Heidelberg 2005.


2 Semantic Networks

A semantic network is a system for capturing, storing and transferring information that works much the same way as the human brain. Semantic networks can grow to extraordinary complexity, necessitating a sophisticated approach to knowledge visualization that balances the need for simplicity with the full expressive power of the network [5]. Semantic networks are basically composed of concepts (any ideas or thoughts that have meaning), relations (specific kinds of links or relationships between two concepts) and instances (concepts linked by a specific relation). Let n(t) be the number of nodes at time t.

Fig. 1. Undirected growing network

Starting with a small fully connected network of M nodes (M << n), at each time step a new node with M links is added to the network; the new node targets its connections to some neighborhood i (in accordance with the locality principle). Let the neighborhood of a node i be the set of neighbors H_i of node i, including the node i itself. The probability P_i(t) of choosing a neighborhood is based on the neighborhood size and is given by

p_i(t) = \frac{k_i(t)}{\sum_{i=1}^{n(t)} k_i(t)}    (1)

where k_i(t) is the degree of node i at time t. The connections of the new node are targeted towards nodes within the chosen neighborhood H_i. The probability P_{ij}(t) of connecting to a node j in the neighborhood of node i is based on

p_{ij}(t) = \frac{U_j}{\sum_{j \in H_i} U_j}    (2)

If all utilities are equal, then it follows that

p_{ij}(t) = \frac{1}{k_i(t)}    (3)
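
As an illustration, the following Python sketch simulates this neighborhood-based growth rule. The value of M, the seed and the equal-utility case of Eq. (3) are choices made here for the example, not parameters prescribed by the paper.

import random

def grow_network(n_steps, M=3, seed=0):
    """Simulate the undirected growing network of Fig. 1.

    Starts from a fully connected core of M nodes; each new node adds M
    links, targeted at the neighborhood of a node chosen with probability
    proportional to its degree (Eq. 1). Within the chosen neighborhood,
    targets are picked uniformly, i.e. the equal-utility case of Eq. 3.
    """
    rng = random.Random(seed)
    # adjacency sets; start with a small fully connected core
    adj = {i: {j for j in range(M) if j != i} for i in range(M)}
    for _ in range(n_steps):
        new = len(adj)
        # choose node i with probability k_i(t) / sum of degrees (Eq. 1)
        nodes = list(adj)
        degrees = [len(adj[v]) for v in nodes]
        i = rng.choices(nodes, weights=degrees, k=1)[0]
        # neighborhood H_i includes node i itself
        neighborhood = list(adj[i] | {i})
        # connect the new node to M targets drawn uniformly from H_i (Eq. 3)
        targets = rng.sample(neighborhood, k=min(M, len(neighborhood)))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj

# example: grow a 100-node network and report the maximum degree
net = grow_network(n_steps=97, M=3)
print(max(len(v) for v in net.values()))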

A sentence is represented as a verb node, with various case links to nodes representing the other participants in the action. In parsing a sentence, the program finds the verb, retrieves the case frames for that verb from its knowledge base, and binds the values of the agents, objects, etc. to the appropriate nodes in the case frame.
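
A minimal sketch of this case-frame binding follows; the slot names (agent, object, instrument) and the helper function are illustrative assumptions, not structures defined in the paper.

# A toy case frame for the verb "fix": the parser binds the participants of
# the sentence to the agent/object/instrument slots (slot names are
# illustrative only).
fix_frame = {"verb": "fix", "agent": None, "object": None, "instrument": None}

def bind_case_frame(frame, agent, obj, instrument=None):
    bound = dict(frame)
    bound.update({"agent": agent, "object": obj, "instrument": instrument})
    return bound

# "Sarah fixed the chair with glue"
print(bind_case_frame(fix_frame, agent="Sarah", obj="chair", instrument="glue"))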


3 Recursive Descent Parsers

Consider the subset of English grammar rules below [3]:

{Sentence} → {NounPhrase}{VerbPhrase}
{NounPhrase} → {NounPhrase} | {NounPhrase}{PrepPhrase}
{VerbPhrase} → {VerbPhrase} | {VerbPhrase}{PrepPhrase}
{PrepPhrase} → {Prep}{NounPhrase}
{NounPhrase} → {Article}{Noun}
{VerbPhrase} → {Verb} | {Verb}{NounPhrase}

Consider parsing the sentence "Sarah fixed the chair with glue".

Fig. 2. The and/or parse tree for "Sarah fixed the chair with glue"
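
A minimal recursive descent parser for this grammar is sketched below in Python. The tiny lexicon is invented for the example, and the left-recursive NounPhrase/VerbPhrase alternatives are folded into optional PrepPhrase suffixes so that the descent terminates, which is the standard refactoring for recursive descent parsing.

# Recursive descent parse of the grammar above (a sketch; lexicon is invented
# and left recursion is rewritten as optional PrepPhrase suffixes).
LEXICON = {
    "sarah": "Noun", "chair": "Noun", "glue": "Noun",
    "fixed": "Verb", "the": "Article", "with": "Prep",
}

def parse_sentence(words):
    np, rest = parse_noun_phrase(words)          # Sentence -> NounPhrase VerbPhrase
    vp, rest = parse_verb_phrase(rest)
    if rest:
        raise SyntaxError("unparsed input: " + " ".join(rest))
    return ("Sentence", np, vp)

def parse_noun_phrase(words):
    if LEXICON.get(words[0].lower()) == "Article":   # NounPhrase -> Article Noun
        return ("NounPhrase", ("Article", words[0]), ("Noun", words[1])), words[2:]
    return ("NounPhrase", ("Noun", words[0])), words[1:]

def parse_verb_phrase(words):
    verb, rest = ("Verb", words[0]), words[1:]
    node = ("VerbPhrase", verb)
    if rest and LEXICON.get(rest[0].lower()) != "Prep":   # VerbPhrase -> Verb NounPhrase
        np, rest = parse_noun_phrase(rest)
        node = ("VerbPhrase", verb, np)
    if rest and LEXICON.get(rest[0].lower()) == "Prep":   # optional PrepPhrase suffix
        prep, rest = ("Prep", rest[0]), rest[1:]
        np, rest = parse_noun_phrase(rest)
        node = node + (("PrepPhrase", prep, np),)
    return node, rest

print(parse_sentence("Sarah fixed the chair with glue".split()))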

4 Watermark Embedding and Extraction

Encoding a single bit. Let the text to be watermarked consist of n sentences S1, S2, ..., Sn, and let the watermark W consist of k bits w1, w2, ..., wk. xi denotes the number of nodes between each terminal and the root in a sentence, H(x1 x2 x3 ...) is the hashed value obtained after concatenating the labels, and M(S) denotes the marked sentences. Let Rn be the pseudo-random numbers Rn1, Rn2, ..., Rnk.

generate n random numbers from 1 to n, seeded with a secret key P;
repeat the following:
    parse sentence;
    for each parsed sentence
        for each terminal, starting with the leftmost terminal
            count(non-terminals between the terminal and the start variable);
            create a list L1 of labels;
            hash the concatenation of the labels;
            if (H(x1 x2 x3 ...) + Rni) mod k equals 0
                mark the sentence;
    end for;
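
The sentence-selection step can be sketched as follows. SHA-256, Python's built-in pseudo-random generator and the way the label lists are passed in are assumptions made for the sketch; the paper does not fix these primitives.

import hashlib
import random

def select_sentences(label_lists, secret_key, k):
    """Pick the indices of the sentences to mark.

    label_lists -- one list of non-terminal labels per sentence, the labels
                   counted between each terminal and the root of the parse.
    """
    rng = random.Random(secret_key)
    # R_n1 .. : pseudo-random numbers from 1 to n, seeded with the secret key P
    n = len(label_lists)
    R = [rng.randint(1, n) for _ in range(n)]
    marked = []
    for i, labels in enumerate(label_lists):
        digest = hashlib.sha256("".join(labels).encode()).hexdigest()
        h = int(digest, 16)                 # H(x1 x2 x3 ...)
        if (h + R[i]) % k == 0:             # selection criterion
            marked.append(i)
    return marked

# toy usage: label lists would come from the recursive descent parse trees
trees = [["NounPhrase", "VerbPhrase", "Noun"], ["VerbPhrase", "PrepPhrase"]]
print(select_sentences(trees, secret_key="P", k=2))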


for each sentence next to M(Si)
    start with the rightmost terminal;
    for each noun or verb terminal
        count(non-terminals from the root);
        create a list L of labels;
        concatenate(labels) to form a numeric figure NV(T);
        if (Rni + NV(T)) mod k is a quadratic residue
            the returned bit (rb) is 1;
        else
            the returned bit (rb) is 0;
        if (rbj == wj) proceed; else modify;
    end for;
end for;
end.

Watermark Extraction. The extraction process goes through the same steps as watermark embedding, but only reads the returned bits. The detection algorithm is blind: it simply extracts the W bits of information from the text, without requiring access to the original text or the watermark to arrive at its decision. The watermark is the concatenation of the piecemeal bits from each selected sentence.
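
The bit-decoding test can be sketched as below. The paper does not pin down the residue test exactly, so this sketch reads the condition as asking whether Rni + NV(T) is a quadratic residue modulo k; the values used in the usage line are invented.

def is_quadratic_residue(a, m):
    """True if a is a quadratic residue mod m, i.e. x*x % m == a % m has a
    solution (brute force; adequate for the small moduli used here)."""
    a %= m
    return any((x * x) % m == a for x in range(m))

def extract_bit(nv_t, r_ni, k):
    """Decode one bit from a noun/verb terminal: 1 if (R_ni + NV(T)) is a
    quadratic residue mod k, else 0 (a sketch of the returned-bit rule)."""
    return 1 if is_quadratic_residue(r_ni + nv_t, k) else 0

# toy usage: NV(T) is the numeric figure concatenated from the label counts
print(extract_bit(nv_t=342, r_ni=17, k=7))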

5 Conclusion

The consequences of neural network computing for natural language processing may be more revolutionary than anything imagined in the current technology. The growth of intelligent systems in digital watermarking is therefore not far off, more specifically with natural language watermarking.

About the author: Alex Jessey Asiimwe is pursuing a project at Hunan University.

References
1. Atallah, M. J., Raskin, V., Crogan, M., Hempelmann, C. F., Kerschbaum, F., Mohamed, D., Naik, S.: Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation. April 2001.
2. Benjamin, B., Gomez, S., Bogarin, V.: Steganographic Watermarking for Documents. 34th Annual Hawaii International Conference on System Sciences (HICSS-34), Volume 9, January 2001.
3. Luger, G. F.: Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Fourth Edition, 2002.
4. Yang, H., Kot, A. C.: Text Document Authentication by Integrating Inter Character and Word Spaces Watermarking. IEEE International Conference on Multimedia and Expo, June 2004.
5. Sowa, J. F. (ed.): Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann Publishers, CA, 1991.
