Splitting a string in Java: lookbehind with a specified length










I want to split a string after the letter "K" or "R", except when either is followed by the letter "P". At the same time, I don't want to split at a position if that would produce a substring shorter than 4 characters.
For example:



- Input:
AYLAKPHKKDIV

- Expected Output
AYLAKPHK
KDIV


So far, I have managed to split the string after the letter "K" or "R", except when either is followed by the letter "P". My regular expression is (?<=[K|R])(?!P).



My result:
AYLAKPHK
K
DIV
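For reference, a minimal program that reproduces this result, assuming the demo simply passed the pattern to String.split (the class name OriginalSplit is made up for illustration):

```java
import java.util.Arrays;

public class OriginalSplit {
    // Pattern from the question, unchanged: split after K, '|' or R unless the next character is P
    static final String REGEX = "(?<=[K|R])(?!P)";

    public static void main(String[] args) {
        String[] parts = "AYLAKPHKKDIV".split(REGEX);
        System.out.println(Arrays.toString(parts)); // [AYLAKPHK, K, DIV]
    }
}
```

The lone K appears because a split is allowed after each K of the pair KK; nothing in this pattern looks at the length of the resulting pieces.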


However, I don't know how to skip split positions that would produce a substring shorter than 4 characters.



My Demo







java regex






edited Nov 15 '18 at 9:39
asked Nov 15 '18 at 9:30
huangjs












  • Just to be sure: does "substring" in "I hope not to split if the substring length less than 4" also refer to the remaining part? For example, should ABCKAB be split into ABCK and AB, or not, because AB would be shorter than 4 characters? Your current example shows a similar case, but we don't know if it fails because of K or (also) because of DIV.

    – Pshemo
    Nov 15 '18 at 10:55

















2 Answers
I hope not to split if the substring length less than 4




In other words, you want:



  1. at least 4 characters between the previous split (or the start of the string) and the current one, so ABCKABKKABCD would be split into ABCK|ABKK|ABCD, but not into ABCK|ABK|...


  2. at least 4 characters after the current split, since splitting ABCKAB into ABCK|AB would leave AB at the end, which is shorter than 4 characters.
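To make the two conditions concrete, here is a hypothetical plain-loop sketch (not taken from either answer) that applies them without a regex. It assumes split points are chosen greedily from left to right; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Hypothetical helper: split after K or R (unless P follows), keeping every piece >= minLen characters
    static List<String> split(String s, int minLen) {
        List<String> parts = new ArrayList<>();
        int last = 0; // start of the current piece, i.e. position of the previous split
        for (int p = 1; p < s.length(); p++) {
            char prev = s.charAt(p - 1);
            boolean cleavable = (prev == 'K' || prev == 'R') && s.charAt(p) != 'P';
            boolean farEnoughFromLast = p - last >= minLen;    // condition 1
            boolean enoughLeft = s.length() - p >= minLen;      // condition 2
            if (cleavable && farEnoughFromLast && enoughLeft) {
                parts.add(s.substring(last, p));
                last = p;
            }
        }
        parts.add(s.substring(last)); // whatever remains after the last split
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("ABCKABKKABCD", 4)); // [ABCK, ABKK, ABCD]
        System.out.println(split("ABCKAB", 4));       // [ABCKAB]
    }
}
```

Each candidate position is accepted only if it lies at least minLen characters past the last accepted split and leaves at least minLen characters after it, which mirrors the two look-arounds discussed next.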


To achieve the first condition you can use \G, which represents the place of the previous match (or the start of the string if there were no matches yet). So the first condition can look like (?<=\G.{4,}). (WARNING: look-behind usually requires the subpattern it contains to have an obvious maximum length, but for some reason .{4,} works here, which may be a bug or a feature of Java 10, which I am using now. If your version complains about it, you can use a very large upper bound that is bigger than the maximum number of characters you expect between two splits, like .{4,10000000}.)
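If your JDK rejects the unbounded quantifier inside the look-behind, the bounded variant mentioned in the warning can be used instead. A minimal sketch, assuming no piece is ever longer than 1000 characters (that limit is an arbitrary choice for illustration; the class name is hypothetical):

```java
import java.util.Arrays;

public class BoundedLookbehindSplit {
    // (?<=[KR])(?!P)     : split after K or R unless P follows
    // (?<=\G.{4,1000})   : at least 4 (and, by assumption, at most 1000) characters since the previous split
    // (?=.{4})           : at least 4 characters remain after the split
    static final String REGEX = "(?<=[KR])(?!P)(?<=\\G.{4,1000})(?=.{4})";

    public static void main(String[] args) {
        System.out.println(Arrays.toString("AYLAKPHKKDIV".split(REGEX)));
        // [AYLAKPHK, KDIV]
        System.out.println(Arrays.toString("AYLAKPHKKKKKKDIVK123KAB".split(REGEX)));
        // [AYLAKPHK, KKKK, KDIVK, 123KAB]
    }
}
```

The trade-off: once more than 1000 characters have passed since the previous split, the look-behind can no longer reach the \G position, so no further splits happen for the rest of the input.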



The second condition is simpler, since it is just (?=.{4}).
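A small sketch of what that look-ahead changes on its own, using the ABCKAB case from the comments (the class and constant names are made up for illustration):

```java
import java.util.Arrays;

public class TrailingLengthCheck {
    // Split after K or R unless P follows; no length checks at all
    static final String WITHOUT_LOOKAHEAD = "(?<=[KR])(?!P)";
    // Same, but only where at least 4 characters remain after the split point (condition 2)
    static final String WITH_LOOKAHEAD = "(?<=[KR])(?!P)(?=.{4})";

    public static void main(String[] args) {
        System.out.println(Arrays.toString("ABCKAB".split(WITHOUT_LOOKAHEAD))); // [ABCK, AB]
        System.out.println(Arrays.toString("ABCKAB".split(WITH_LOOKAHEAD)));    // [ABCKAB]
    }
}
```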



BTW, you don't want | in [K|R]: inside a character class it is a literal character, not the OR operator, because every character in a character set is already an alternative. So [K|R] represents K OR | OR R. Use [KR] instead.
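A quick check of that point with String.matches (the class name is hypothetical):

```java
public class CharClassCheck {
    public static void main(String[] args) {
        // Inside [...] the '|' is an ordinary character, so [K|R] accepts K, '|' and R
        System.out.println("|".matches("[K|R]")); // true
        // [KR] accepts only K and R
        System.out.println("|".matches("[KR]"));  // false
        System.out.println("R".matches("[KR]"));  // true
    }
}
```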



DEMO:



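String text = "AYLAKPHKKKKKKDIVK123KAB";
// split after K/R (not before P), at least 4 chars since the last split (\G) and at least 4 chars left
String regex = "(?<=[KR])(?!P)(?<=\\G.{4,})(?=.{4})";
for (String s : text.split(regex))
    System.out.println("'" + s + "'");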



Output:



'AYLAKPHK'
'KKKK'
'KDIVK'
'123KAB'






answered Nov 15 '18 at 10:30
Pshemo












  • The first condition is what I want. Thank you for the detailed explanation; I appreciate it and learned more about regular expressions. Thanks again.

    – huangjs
    Nov 16 '18 at 2:29











  • @huangjs You are welcome. BTW, from the history of your questions it looks like you may not be aware of the answer-accepting mechanism. In short, if an answer solves the problem described in the question, it can be marked as the solution using the tick symbol (✓) near its score. So when you have some free time, revisit your previously asked questions and, if they contain a valid answer, consider marking it as the solution.

    – Pshemo
    Nov 16 '18 at 9:20


















You could use a matcher to match each substring instead of splitting, if that's possible for you: the logic may be easier to follow when you can consume characters rather than having to identify a particular position. Match three or more characters followed by a K or R that is not followed by P with .{3,}?[KR](?!P), ensure that it's followed by at least 4 characters with (?=.{4}), OR, if that whole pattern fails, match the rest of the string with .+$:



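// needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern
String s = "AYLAKPHKKDIV";
List<String> arr = new ArrayList<String>();
Matcher m = Pattern.compile(".{3,}?[KR](?!P)(?=.{4})|.+$").matcher(s);
while (m.find())
    arr.add(m.group());
// arr is now [AYLAKPHK, KDIV]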
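Wrapped into a self-contained class, the same matcher loop can be checked against the longer demo input from the split-based answer. This is a sketch; the class and method names are hypothetical, and the pattern is compiled once as a constant so repeated calls don't recompile it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPieces {
    // Either 3+ characters (lazily) ending in K or R that is not followed by P, with 4+ characters left,
    // or, if that fails, the whole rest of the input
    private static final Pattern PIECE = Pattern.compile(".{3,}?[KR](?!P)(?=.{4})|.+$");

    // Hypothetical helper: collects every match as one piece
    static List<String> pieces(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = PIECE.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pieces("AYLAKPHKKDIV"));            // [AYLAKPHK, KDIV]
        System.out.println(pieces("AYLAKPHKKKKKKDIVK123KAB")); // [AYLAKPHK, KKKK, KDIVK, 123KAB]
    }
}
```

On these inputs it yields the same pieces as the split-based regex, but it consumes characters instead of marking positions, so it does not fit a tool that only accepts a separator pattern.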







edited Nov 15 '18 at 10:41
answered Nov 15 '18 at 10:34
CertainPerformance












  • I want to use the regular expression to split text into terms in the Elasticsearch Pattern Analyzer, so I do have to identify a particular position, but thank you for helping me.

    – huangjs
    Nov 16 '18 at 1:56
















