How to find columns with the same value in every row of a CSV file using Python [closed]

-1

I have a CSV file, test.csv, with 5000 columns. Some of the columns (for example, 50 of them) have the same value in every row. How can I find how many columns have the same value in all rows, and write those columns to a separate CSV file?

For example, I want to find the columns whose values are all the same, such as A, B and D, then write A, B and D to one CSV file and C to a separate CSV file.

Thank you.

edited Nov 14 '18 at 12:06

asked Nov 13 '18 at 17:39

user680288

closed as too broad by usr2564301, Owen Pauling, Unheilig, Mickael Maison, greg-449 Nov 14 '18 at 10:24

Have you made any attempts to solve this task?

– Roman
Nov 13 '18 at 17:44

To clarify: you need to get each entire column where any of its values is a row duplicate?

– Geo Joy
Nov 13 '18 at 17:46

Yes, I tried csv.reader with a for loop to read the column elements (csvr=csv.reader(file), then for row in csvr: to read each row), but I was unable to read all the columns with their headers.

– user680288
Nov 13 '18 at 17:48
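For readers who want to stay with the csv module from the comment above, one possible approach is to read the header row first and then transpose the remaining rows into columns. This is only a sketch: the in-memory sample data and the names constant and varying are made up for illustration, and values are compared as strings because csv.reader does not convert types.

```python
import csv
import io

# Hypothetical stand-in for test.csv; a real file would be opened with
# open('test.csv', newline='') instead.
sample = io.StringIO("A,B,C,D\n1,x,1,5\n1,x,2,5\n1,x,3,5\n")

reader = csv.reader(sample)
header = next(reader)          # first row holds the column names
columns = list(zip(*reader))   # transpose the remaining rows into columns

# A column is constant when the set of its values has exactly one element.
constant = [name for name, values in zip(header, columns) if len(set(values)) == 1]
varying = [name for name in header if name not in constant]

print(len(constant), constant)
print(varying)
```

Note that zip stops at the shortest row, so this sketch assumes every row has the same number of fields.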

Hi, I solved it by using the pandas nunique method to find the unique columns, then writing the unique columns to one CSV and the non-unique columns to another CSV. Thank you for all the answers.

– user680288
Nov 14 '18 at 12:09
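The asker did not post the final code, so the following is only a guess at what a nunique-based solution might look like. The sample data and variable names are assumptions; with a real file, pd.read_csv('test.csv') would replace the in-memory text and a file name would be passed to to_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv.
csv_text = "A,B,C,D\n1,7,1,5\n1,7,2,5\n1,7,3,5\n"
df = pd.read_csv(io.StringIO(csv_text))

counts = df.nunique()                      # distinct values per column
constant_cols = counts[counts == 1].index  # columns holding a single value
varying_cols = counts[counts != 1].index

print(len(constant_cols))                  # how many columns are constant

# to_csv returns the CSV text when no path is given;
# pass a file name instead to write a file.
constant_csv = df[constant_cols].to_csv(index=False)
varying_csv = df[varying_cols].to_csv(index=False)
```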

python csv


3 Answers
I recommend using pandas. You can solve your problem with something like the below (which should get you started).

You may also want to review the "10 minutes to pandas" guide in the pandas documentation, which gives an overview of reading in and manipulating data.

import pandas as pd

# sample data: A, B and C are constant, D is not
data = {
    'A': [1] * 5,
    'B': [1] * 5,
    'C': [1] * 5,
    'D': list(range(2, 7)),
}

df = pd.DataFrame(data)

# loop through each column
for col in df.columns:
    # check if every value in the column is equal to the first value
    if (df[col] == df[col].iloc[0]).all():
        print('all values match in {col}'.format(col=col))
    else:
        print('{col} has non-uniform values'.format(col=col))
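The same first-value comparison can probably be done without an explicit loop, which may matter with 5000 columns. The sketch below is one way to do it, not part of the original answer: it compares every row with the first row and keeps the columns where all rows match. The name is_constant is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1] * 5,
    'B': [1] * 5,
    'C': [1] * 5,
    'D': list(range(2, 7)),
})

# Compare every row with the first row (aligned on column names),
# then require a match in all rows of each column.
is_constant = df.eq(df.iloc[0]).all()

print(int(is_constant.sum()))                    # number of constant columns
print(is_constant[is_constant].index.tolist())   # their names
```

Because NaN never compares equal to itself, a column containing NaN is reported as non-constant by this check.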

edited Nov 14 '18 at 12:57

answered Nov 13 '18 at 17:50

rs311


Just find the columns which have only 1 unique value:

Create a DataFrame. I'm creating one with some dummy data; you can read your CSV with pd.read_csv.

>>> import pandas as pd
>>> df = pd.DataFrame(data={'A': [1,1,1,1,1,1,1], 'B': [2,2,2,2,2,2,2], 'C': [1,2,3,4,5,6,7]})
>>> df
   A  B  C
0  1  2  1
1  1  2  2
2  1  2  3
3  1  2  4
4  1  2  5
5  1  2  6
6  1  2  7

Find those columns which have only 1 unique value:

>>> equal_cols = [c for c in df.columns if len(df[c].unique()) == 1]
>>> equal_cols
['A', 'B']

Write those columns to sample1.csv, and all others to sample2.csv.

>>> df[equal_cols].to_csv('sample1.csv')
>>> df[[c for c in df.columns if c not in equal_cols]].to_csv('sample2.csv')
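One detail worth checking on real data is how missing values are counted. As a small illustration with made-up data, unique() keeps NaN as a value of its own, while nunique() drops NaN unless dropna=False is passed, so the two tests can disagree on columns with gaps.

```python
import numpy as np
import pandas as pd

# Column A has one value plus a missing entry; column B is fully constant.
df = pd.DataFrame({'A': [1.0, 1.0, np.nan], 'B': [2, 2, 2]})

# unique() keeps NaN, so A counts as having two values.
by_unique = [c for c in df.columns if len(df[c].unique()) == 1]

# nunique() ignores NaN by default, so A counts as constant.
by_nunique = [c for c in df.columns if df[c].nunique() == 1]

# dropna=False makes nunique() agree with unique().
by_nunique_keep_nan = [c for c in df.columns if df[c].nunique(dropna=False) == 1]

print(by_unique, by_nunique, by_nunique_keep_nan)
```

Which behaviour is right depends on whether a missing cell should count as a different value in your file.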

answered Nov 13 '18 at 20:21

Muhammad Ahmad


You can use pandas for convenient IO.
Just write a function that tests a column, and use it to select the columns you want:

Input :

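import pandas as pd
df = pd.read_csv('test.csv')  # file name taken from the question

A short-circuiting function, which compares values only as long as necessary:

from numba import njit

@njit  # optional, for efficiency
def equal(arr):
    ref = arr[0]  # reference value: the first element
    for x in arr[1:]:
        if x != ref:
            return False  # stop at the first mismatch
    return True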

Output:

mask=df.apply(equal,axis=0,raw=True)
#[ True, True, False, True ]
df.loc[:,mask].to_csv('equal.csv',sep=' ',index=False)
df.loc[:,~mask].to_csv('notequal.csv',sep=' ',index=False)
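Since the numba decorator is marked optional, the same idea can be tried in plain Python first. The sketch below uses made-up sample data in place of the CSV file and keeps the resulting frames in memory instead of writing them out; the names equal_part and notequal_part are assumptions.

```python
import pandas as pd


def equal(arr):
    """Return True if every value equals the first; stop at the first mismatch."""
    ref = arr[0]
    for x in arr[1:]:
        if x != ref:
            return False
    return True


# Hypothetical data standing in for the CSV file.
df = pd.DataFrame({'A': [1, 1, 1], 'B': [4, 4, 4], 'C': [1, 2, 3], 'D': [9, 9, 9]})

# raw=True passes each column to equal() as a NumPy array rather than a Series.
mask = df.apply(equal, axis=0, raw=True)

equal_part = df.loc[:, mask]       # columns with a single repeated value
notequal_part = df.loc[:, ~mask]   # all remaining columns

print(mask.tolist())
```

Calling to_csv on equal_part and notequal_part, as in the answer, would then write the two files.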

edited Nov 14 '18 at 8:09

answered Nov 13 '18 at 19:27

B. M.

