Splitting a string in Java: lookbehind with a specified length
I want to split a string after the letter "K" or "L", except when either is followed by the letter "P". In addition, I do not want to split at a position if a resulting substring would be shorter than 4 characters.
For example:
- Input:
AYLAKPHKKDIV
- Expected output:
AYLAKPHK
KDIV
So far I have managed to split the string after the letter "K" or "L" except when either is followed by the letter "P". My regular expression is (?<=[K|R])(?!P).
My result:
AYLAKPHK
K
DIV
However, I don't know how to skip a split position where a resulting substring would be shorter than 4 characters.
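For reference, here is a minimal sketch that reproduces the result above. The class and method names (NaiveSplitDemo, naiveSplit) are made up for illustration; only the regular expression and the input come from the question.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Regex from the question: split after K, | or R unless the next
    // character is P. No minimum-length rule is applied yet.
    static String[] naiveSplit(String text) {
        return text.split("(?<=[K|R])(?!P)");
    }

    public static void main(String[] args) {
        // prints [AYLAKPHK, K, DIV]
        System.out.println(Arrays.toString(naiveSplit("AYLAKPHKKDIV")));
    }
}
```

The short pieces K and DIV are what the length requirement is meant to prevent.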
java regex
Just to be sure: does "substring" in "I hope not to split if the substring length less than 4" also refer to the remaining part? For example, should ABCKAB be split into ABCK and AB, or not, because AB would not be at least 4 characters long? Your current example shows a similar case, but we don't know whether the split is rejected because of K or (also) because of DIV.
– Pshemo
Nov 15 '18 at 10:55
Question asked Nov 15 '18 at 9:30 by huangjs; edited Nov 15 '18 at 9:39.
2 Answers
"I hope not to split if the substring length less than 4"
In other words, you want:
- the previous match (split position) to be separated from the current match by at least 4 characters, so ABCKABKKABCD would split into ABCK|ABKK|ABCD but not into ABCK|ABK|...
- at least 4 characters after the current split, since splitting ABCKAB into ABCK|AB would leave AB at the end, whose length is less than 4.
To achieve the first condition you can use \G, which represents the position where the previous match ended (or the start of the string if there were no matches yet). So the first condition can look like (?<=\G.{4,}).
(WARNING: look-behind usually expects an obvious maximum length for the subexpression it handles, but for some reason .{4,} works here, which may be a bug or a feature added in Java 10, which I am using now. If it complains, you can use some very big number that exceeds the maximum number of characters you expect between two splits, like .{4,10000000}.)
The second condition is simpler, since it is just (?=.{4}).
BTW, you don't want | in [K|R], because inside a character class it is a literal, not the OR operator: by default, every character in a character class is already an alternative. So [K|R] represents K OR | OR R. Use [KR] instead.
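A small sketch of the point about | inside a character class. The class name CharClassDemo and the sample string AB|CD are assumptions chosen for illustration; they do not appear in the question.

```java
import java.util.Arrays;

class CharClassDemo {
    // Splits after any character matched by the given character class.
    static String[] splitAfter(String charClass, String text) {
        return text.split("(?<=" + charClass + ")");
    }

    public static void main(String[] args) {
        // Inside [...] the pipe is an ordinary character, so it matches itself.
        System.out.println("|".matches("[K|R]")); // prints true
        System.out.println("|".matches("[KR]"));  // prints false

        // With [K|R] the string is also split after the literal pipe.
        System.out.println(Arrays.toString(splitAfter("[K|R]", "AB|CD"))); // prints [AB|, CD]
        System.out.println(Arrays.toString(splitAfter("[KR]", "AB|CD")));  // prints [AB|CD]
    }
}
```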
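DEMO:
    String text = "AYLAKPHKKKKKKDIVK123KAB";
    // split after K or R, not before P, at least 4 chars since the last split, at least 4 chars remaining
    String regex = "(?<=[KR])(?!P)(?<=\\G.{4,})(?=.{4})";
    for (String s : text.split(regex)) {
        System.out.println("'" + s + "'");
    }
Output:
    'AYLAKPHK'
    'KKKK'
    'KDIVK'
    '123KAB'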
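If the unbounded .{4,} inside the look-behind is rejected on your Java version, a bounded variant along the lines of the warning above could look like the sketch below. The class name MinLengthSplit and the upper bound MAX_GAP = 10000 are assumptions made for illustration; pick a bound larger than the longest piece you expect.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MinLengthSplit {
    // Hypothetical upper bound on the distance between two splits.
    private static final int MAX_GAP = 10_000;

    // Split after K or R, not before P, with at least 4 characters since the
    // previous split (\G) and at least 4 characters left in the string.
    private static final Pattern SPLIT_AT = Pattern.compile(
            "(?<=[KR])(?!P)(?<=\\G.{4," + MAX_GAP + "})(?=.{4})");

    static String[] split(String text) {
        return SPLIT_AT.split(text);
    }

    public static void main(String[] args) {
        // prints [AYLAKPHK, KDIV]
        System.out.println(Arrays.toString(split("AYLAKPHKKDIV")));
        // prints [ABCK, ABKK, ABCD]
        System.out.println(Arrays.toString(split("ABCKABKKABCD")));
        // prints [ABCKAB] because AB alone would be too short
        System.out.println(Arrays.toString(split("ABCKAB")));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.split would otherwise do.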
The first condition is what I want. Thank you for your detailed explanation. I appreciate it, and I have learned more about regular expressions. Thanks again.
– huangjs
Nov 16 '18 at 2:29
@huangjs You are welcome. BTW, from the history of your questions it looks like you may not be aware of the answer-accepting mechanism. In short, if an answer solves the problem described in the question, it can be marked as the solution using the tick symbol (✓) near its score. So when you have some free time, revisit your previously asked questions and, if any of them has a valid answer, consider marking it as the solution.
– Pshemo
Nov 16 '18 at 9:20
You could use a Matcher to match each substring, rather than split, if possible. You might find the logic a bit easier to follow when you can consume characters, rather than having to identify a particular position. Match three or more characters followed by a K or R that is not followed by P with .{3,}?[KR](?!P), ensure that it's followed by at least 4 characters with (?=.{4}), OR, if the whole pattern above fails, match the whole rest of the string with .+$:
    // needs imports from java.util (List, ArrayList) and java.util.regex (Matcher, Pattern)
    String s = "AYLAKPHKKDIV";
    List<String> arr = new ArrayList<>();
    Matcher m = Pattern.compile(".{3,}?[KR](?!P)(?=.{4})|.+$").matcher(s);
    while (m.find()) {
        arr.add(m.group()); // each match is one piece of the input
    }
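To try the matching approach on more than one input, it can be wrapped in a helper, as in the sketch below. The class and method names (MatcherChunks, chunks) and the two extra inputs are assumptions for illustration; the pattern is the one described above with its quantifier braces written out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherChunks {
    // Either a lazy chunk of 4+ characters ending in K/R (not before P, with
    // at least 4 characters still to come), or whatever is left of the string.
    private static final Pattern CHUNK =
            Pattern.compile(".{3,}?[KR](?!P)(?=.{4})|.+$");

    static List<String> chunks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String input : Arrays.asList("AYLAKPHKKDIV", "AYLAKPHKKKKKKDIVK123KAB", "ABCKAB")) {
            System.out.println(input + " -> " + chunks(input));
        }
        // prints:
        // AYLAKPHKKDIV -> [AYLAKPHK, KDIV]
        // AYLAKPHKKKKKKDIVK123KAB -> [AYLAKPHK, KKKK, KDIVK, 123KAB]
        // ABCKAB -> [ABCKAB]
    }
}
```

On these three inputs the pieces are the same as those produced by the split-based answer; that is an observation about these samples, not a claim that the two patterns agree on every string.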
I want to use the regular expression to split the text into terms in the Elasticsearch Pattern Analyzer, so I have to identify a particular position, but thank you for helping me.
– huangjs
Nov 16 '18 at 1:56
First answer: answered Nov 15 '18 at 10:30 by Pshemo.
Second answer: answered Nov 15 '18 at 10:34 by CertainPerformance; edited Nov 15 '18 at 10:41.