Best way to get just strings, integers, and/or floats from a data file in python?

For example:

Input:
zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy
--------
x y z
--------
A B
--------
 A B
A 0.634 0.366 
B 0.387 0.613 
--------
 x y z
A 0.532 0.226 0.241 
B 0.457 0.192 0.351


Output:
AAAAAAAAAAAAAABBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBAAA

Right now I'm using this:

import sys, re

data = []
for line in sys.stdin.readlines():
    data.append(''.join(line.strip().split()))

cleanup = []
for i in range(len(data)):
    cleanup.append(re.sub(r"\s+", " ", data[i]))

print(data)

And my output looks like this:

['zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy', '--------', 'xyz', '--------', 'AB', '--------', 'AB', 'A0.6340.366', 'B0.3870.613', '--------', 'xyz', 'A0.5320.2260.241', 'B0.4570.1920.351']

But what I want my data list to look like is:

print(data)
['zxxxxyzzxyxyxyzxzzxzzzyzzxxxzxxyyyzxyxzyxyxyzyyyyzzyyyyzzxzxzyzzzzyxzxxxyxxxxyyzyyzyyyxzzzzyzxyzzyyy', 'x', 'y', 'z', 'A', 'B', '0.634', '0.366', '0.387', '0.613', '0.532', '0.226', '0.241', '0.457', '0.192', '0.351']

edited Nov 15 '18 at 1:46

Spencer Wieczorek

17.5k43345

asked Nov 15 '18 at 1:36

asttra

807

python regex python-3.x parsing

2 Answers

You are almost there. You just need to stop joining the split() result back together; instead, append each element of the split() result to the data list.
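On a single row of the input, the difference between gluing the split() pieces back together (what the question's code does) and keeping them as a list looks like this. The row literal is just one line copied from the question's input:

```python
row = "A 0.634 0.366 \n"  # one line of the question's input, as read from stdin

# What the question's code does: split on whitespace, then glue the pieces back together.
joined = ''.join(row.strip().split())

# What this answer suggests: keep the pieces that split() returns.
tokens = row.split()

print(joined)  # A0.6340.366
print(tokens)  # ['A', '0.634', '0.366']
```

Note that split() with no argument already ignores leading and trailing whitespace, so the extra strip() is harmless but not needed.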

import sys, re

data = []
for line in sys.stdin.readlines():
    # Drop every character except letters, digits, whitespace and dots, then split.
    for x in re.sub(r"[^a-zA-Z\d\s.]", "", line).strip().split():
        data.append(x)

print(data)
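The loop above still appends the repeated labels (the second "A B" header, the "A"/"B" at the start of each matrix row, the second "x y z"), which the list in the question leaves out. A sketch that reproduces that exact list, assuming the rule is "keep each non-numeric label only the first time it appears, keep every number"; `extract_tokens` and `NUMBER` are hypothetical names:

```python
import re

# Integers or decimals such as 7, -2 or 0.634.
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def extract_tokens(lines):
    """Return each label on its first appearance plus every number, in input order."""
    seen = set()
    data = []
    for line in lines:
        for tok in line.split():
            if not re.search(r"[A-Za-z0-9]", tok):
                continue  # separator tokens such as "--------"
            if NUMBER.match(tok):
                data.append(tok)  # numbers are always kept, even if repeated
            elif tok not in seen:
                seen.add(tok)
                data.append(tok)  # labels are kept only the first time
    return data
```

Usage with the question's setup would be `print(extract_tokens(sys.stdin))` after `import sys`. The tokens stay strings, as in the desired output; wrap the numeric branch in `float(tok)` if actual numbers are wanted.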

edited Nov 15 '18 at 2:13

answered Nov 15 '18 at 1:56

Andreas

1,97031018


This is close but not completely correct. I'd add re.sub(r"[^a-zA-Z\d\s.]", "", x) so that it excludes non-alphanumeric words the OP doesn't want, such as "--------".

– Spencer Wieczorek
Nov 15 '18 at 2:05

Edited based on your comment. Thank you!

– Andreas
Nov 15 '18 at 2:13
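What the character class from this comment does can be checked on a few lines from the question's input. `UNWANTED` is just a name given to the comment's pattern here:

```python
import re

# The comment's pattern: anything that is NOT a letter, a digit,
# whitespace or a dot is deleted before the line is split.
UNWANTED = r"[^a-zA-Z\d\s.]"

lines = ["--------\n", " A B\n", "A 0.634 0.366 \n"]
results = [re.sub(UNWANTED, "", line).split() for line in lines]

for r in results:
    print(r)
# []
# ['A', 'B']
# ['A', '0.634', '0.366']
```

One caveat: the minus sign is also in the deleted set, so a negative value such as -0.5 would come out as 0.5.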


You could parse the file into a nested dictionary like this:

# `raw` is assumed to hold the whole input file as one string.
rawLines = raw.split("\n")

data = {}
data["seq"] = rawLines[1]

# The line numbers below are hard-coded; adjust them to where the rows sit in your file.
data["mat1"] = {}
for k in [8, 9]:
    temp = rawLines[k].split()  # split on any run of whitespace (tabs or spaces)
    if k == 8:
        data["mat1"]["A"] = {"A": float(temp[1]), "B": float(temp[2])}
    else:
        data["mat1"]["B"] = {"A": float(temp[1]), "B": float(temp[2])}

data["mat2"] = {}
for k in [14, 15]:
    temp = rawLines[k].split()
    if k == 14:
        data["mat2"]["A"] = {"X": float(temp[1]), "Y": float(temp[2]), "Z": float(temp[3])}
    elif k == 15:
        data["mat2"]["B"] = {"X": float(temp[1]), "Y": float(temp[2]), "Z": float(temp[3])}
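Because this approach relies on fixed line numbers, it breaks as soon as the file has a different number of lines before the matrices. A sketch that instead splits the input at the `--------` separator lines, assuming the file always contains exactly the five blocks shown in the question (sequence, alphabet, state names, then two labelled matrices). The function names are hypothetical, and the column keys are taken from each matrix's header line, so they are `x`, `y`, `z` rather than `X`, `Y`, `Z`:

```python
def split_blocks(raw):
    """Group non-empty lines into blocks separated by lines made only of dashes."""
    blocks, current = [], []
    for line in raw.splitlines():
        if set(line.strip()) == {"-"}:  # a separator line such as "--------"
            blocks.append(current)
            current = []
        elif line.strip():
            current.append(line)
    blocks.append(current)
    return blocks


def parse_matrix(block):
    """Turn a header line plus labelled rows into a nested dict of floats."""
    cols = block[0].split()
    matrix = {}
    for row in block[1:]:
        label, *values = row.split()
        matrix[label] = {c: float(v) for c, v in zip(cols, values)}
    return matrix


def parse(raw):
    # Raises ValueError if the file does not have exactly five blocks.
    seq, alphabet, states, mat1, mat2 = split_blocks(raw)
    return {
        "seq": seq[0].strip(),
        "alphabet": alphabet[0].split(),
        "states": states[0].split(),
        "mat1": parse_matrix(mat1),
        "mat2": parse_matrix(mat2),
    }
```

With the question's input, `parse(raw)["mat1"]["A"]` would be `{"A": 0.634, "B": 0.366}`.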

answered Nov 15 '18 at 1:53

kpie

3,58541432
