Check format similarity between two strings

I have a string format with the following rules:

the string must be 15 characters long

the first 8 characters are a date

Example: '2009060712ab56c'

Let's say I want to compare this with another string and give a percentage of format similarity like:

result = format_similarity('2009060712ab56c', '20070908njndla56gjhk')

In this case the result would be, say, 80%.

Is there a way of doing this?

edited Nov 15 '18 at 12:40

jonrsharpe

77.8k11105213

asked Nov 15 '18 at 12:30

s900n

462618


What do you mean by "format similarity"? Is Levenshtein distance enough?

– JETM
Nov 15 '18 at 12:39

Have you tried this stackoverflow.com/a/17388505/8835357

– specbug
Nov 15 '18 at 12:41

Even easier, since - if I understand correctly - both strings are 15 characters long, simply iterate over the chars of both strings and count how many of them are equal.

– quant
Nov 15 '18 at 12:57

They aren't both 15 characters long.

– Neil
Nov 15 '18 at 13:01


python string format fuzzy-comparison


2 Answers

Your format consists of two different attributes, which would be measured differently. How you combine them into an overall percentage of format similarity is a business logic question. For example, if a digit is missing at the start, is the string now totally different because it no longer begins with a date, or is it still similar? Here is how you can get the measurements:

import re


def determine_similarity(string, other):
    """Collect the raw measurements for both strings."""
    length_string = len(string)  # len gives the number of characters in the string
    length_other = len(other)
    number_of_numbers_string = _determine_number_of_numbers(string)
    number_of_numbers_other = _determine_number_of_numbers(other)

    # How these measurements become a single similarity metric is a business
    # logic decision (find the differences and divide them?). As a placeholder,
    # only the raw differences are returned here.
    return (
        abs(length_string - length_other),
        abs(number_of_numbers_string - number_of_numbers_other),
    )


LEADING_NUMBERS = re.compile(
    r"^"      # anchor at start of string
    r"[0-9]"  # must be a digit
    r"+"      # one or more matches
)


def _determine_number_of_numbers(string):
    """
    Determine how many LEADING digits are in a string
    """
    match = LEADING_NUMBERS.search(string)
    if match is not None:
        length = len(match.group())  # number of digits is the length of the match
    else:
        length = 0  # no match means no leading digits

    # You might want to check whether the digits constitute a date within a
    # certain range, for example take the first four digits and check whether
    # the year is between 1980 and 2018.
    return length
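The date check hinted at in the function above could look roughly like this. It is only a sketch: it assumes the first 8 characters are laid out as YYYYMMDD (the example '20090607' suggests this, but the question does not say so), and the function name `leading_date_is_valid` and the default year range are made up for illustration.

```python
import re
from datetime import datetime

# Exactly eight ASCII digits; strptime alone is more lenient than that.
EIGHT_DIGITS = re.compile(r"[0-9]{8}")


def leading_date_is_valid(string, min_year=1980, max_year=2018):
    """Return True if the first 8 characters parse as a YYYYMMDD date in range."""
    prefix = string[:8]
    if EIGHT_DIGITS.fullmatch(prefix) is None:
        return False
    try:
        parsed = datetime.strptime(prefix, "%Y%m%d")
    except ValueError:
        # Not a real calendar date, e.g. month 13 or February 30
        return False
    return min_year <= parsed.year <= max_year
```

Whether a string that fails this check should count as "slightly less similar" or "not similar at all" is again the business logic decision mentioned above.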

answered Nov 15 '18 at 12:55

Neil

532110


As JETM pointed out in the comments, https://pypi.org/project/python-Levenshtein/ might be a good resource for comparing the "closeness" of two strings, i.e. their edit distance (how many changes have to be made to one string to turn it into the other).
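To make the idea of edit distance concrete, here is a dependency-free sketch of the classic Levenshtein algorithm, plus one possible way of turning the distance into a percentage. The names `edit_distance` and `similarity_percent` are illustrative and are not the API of the package linked above; dividing by the longer length is just one common normalisation.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only one row of the DP table at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Scale the distance by the longer string's length into a 0-100 score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - edit_distance(a, b) / longest)
```

Note that this compares the literal characters, not the format: two strings that follow the same format but contain different dates will still be penalised for every differing digit.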

You could create your own implementation of "edit distance" that matches your custom rules such as:

the first 8 characters are numeric and form a valid date

the total string is 15 characters long
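One possible reading of a format-aware comparison is to reduce each string to a "signature" of character classes first and then compare the signatures instead of the raw characters. The sketch below does that with the standard library's `difflib.SequenceMatcher`; the function names and the choice of '9' for digits and 'a' for letters are assumptions made for illustration, and the date validity rule is not covered here.

```python
from difflib import SequenceMatcher


def format_signature(string):
    """Replace every digit with '9' and every letter with 'a'."""
    signature = []
    for char in string:
        if char.isdigit():
            signature.append("9")
        elif char.isalpha():
            signature.append("a")
        else:
            signature.append(char)  # keep punctuation etc. as-is
    return "".join(signature)


def format_similarity(a, b):
    """Similarity of the two signatures as a percentage (0-100)."""
    matcher = SequenceMatcher(None, format_signature(a), format_signature(b))
    return 100.0 * matcher.ratio()
```

With this approach two strings that have digits and letters in the same positions score 100% regardless of the actual values, while a string of a different length or layout scores lower.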

edited Nov 15 '18 at 13:05

answered Nov 15 '18 at 12:57

jrsh

1269
