regex - Pattern matching for strings independent from symbols


I need an algorithm that can find pre-defined patterns in data (which is present in the form of strings), independent of the actual symbols/characters of the data and the pattern. I only care about the relations between the symbols, not the symbols themselves. It is also legal for different pattern symbols to map to the same symbol in the data. The only thing the pattern matching algorithm has to enforce is that multiple occurrences of the same symbol in the pattern are preserved. To give an example:

The pattern is abca, so the first and last letter are the same. For my application, an equivalent way to write this would be 1 2 3 1, where the digits are variables. The data I have is thistextisatest. The resulting algorithm should give me 2 correct matches here, text and test, because in these 2 cases the first and fourth letter are the same, as in the pattern.

As a second example, the pattern abcd should return 12 matches (one for each starting position in thistextisat). Since no variable in the pattern is repeated, it is trivially matched everywhere, even in the case of text and test, because it is legal for the variables a and d of the pattern to map to the same symbol.

The goal of the algorithm is to detect similarities in written language. Imagine having a dictionary of the English language and parsing it for the pattern unseen, or equivalently 1 2 3 4 4 2. You would then see that, for example, the word belittle contains the same pattern of letters (elittl).

So, now that I have made clear what I need, I have some questions:

  • What is this algorithm called? Is it a well-known problem that has already been solved?

  • Are there any publications on this matter? It is hard to find anything useful when you don't know the correct search terms to separate this problem from regular pattern matching.

  • Is there a ready implementation of this?

I have not used regex for anything complicated, so I don't know whether this is possible in regex, when you do not care about the symbols as such and only consider the pattern of occurrences.

I'd appreciate any help!

I don't think you need regular expressions here. Take your search term:

unseen 123442 

This has 6 characters, so index each word of your text by its 6-mers, relabeling the letters of each n-mer by order of first appearance:

belittle

12,12,12,12,11,12,12 (2-mers)
123,123,123,122,112,123 (3-mers)
1234,1234,1233,1223,1123 (4-mers)
12345,12344,12331,12234 (5-mers)
123455,123442,123314 (6-mers)
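The relabeling behind this listing can be sketched in Python. This is only an illustration: the helper names `signature` and `nmer_signatures` are made up here, and the labels are kept as tuples of integers rather than digit strings, on the assumption that windows with more than nine distinct symbols should stay unambiguous.

```python
def signature(text):
    """Relabel symbols by order of first appearance, e.g. 'unseen' -> (1, 2, 3, 4, 4, 2)."""
    labels = {}
    # setdefault assigns the next free label the first time a symbol is seen
    # and returns the existing label on every later occurrence.
    return tuple(labels.setdefault(ch, len(labels) + 1) for ch in text)


def nmer_signatures(word, n):
    """Signatures of every length-n window of word, ordered by start position."""
    return [signature(word[i:i + n]) for i in range(len(word) - n + 1)]
```

With these helpers, `signature("unseen")` is `(1, 2, 3, 4, 4, 2)`, and that tuple appears among `nmer_signatures("belittle", 6)`, which is the match on the 6-mers.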

So looking at the 6-mers, you've got a match (123442). A 6-digit number less than your search term can also be a match, to allow for the abcd (1234) pattern matching an abca (1231) word.

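So given a search term of n characters, split each word into its constituent n-mers and check for a numeric value equal to or less than that of the search term. Note that "less than" is only a necessary condition: 1123 is less than 1231, yet aabc does not fit abca, so candidates that are strictly smaller still need a position-by-position check.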
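A minimal sketch of the matching step in Python, assuming a direct position-by-position check is acceptable in place of the numeric comparison (a smaller number alone does not guarantee a fit). The function names `fits` and `find_matches` are hypothetical. Each pattern variable is bound to the first data symbol it meets, and a window is rejected only if a repeated variable meets a different symbol. Two different variables may bind to the same symbol, as the question allows.

```python
def fits(pattern, window):
    """True if every repeated pattern variable sees the same symbol in window."""
    bound = {}
    for var, sym in zip(pattern, window):
        # Bind var on first sight; afterwards it must keep meeting the same symbol.
        if bound.setdefault(var, sym) != sym:
            return False
    return True


def find_matches(pattern, data):
    """Return (start, substring) for every window of data that fits pattern."""
    n = len(pattern)
    return [
        (i, data[i:i + n])
        for i in range(len(data) - n + 1)
        if fits(pattern, data[i:i + n])
    ]
```

With the question's data, `find_matches("abca", "thistextisatest")` yields the windows text and test, and the pattern abcd fits at all 12 start positions. Scanning every window costs time proportional to the data length times the pattern length, so indexing a large dictionary by n-mer signature beforehand may still be worthwhile.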

