How to find columns with the same value in every row of a CSV file using Python [closed]










I have a CSV file, test.csv, with 5000 columns. Some of the columns (for example, 50 of them) have the same value in every row. How can I find how many columns have a constant value, and write those columns to a separate CSV file?

Example:

A B C D
1 2 2 3
1 2 3 3
1 2 4 3
1 2 5 3
1 2 7 3

I want to find the columns whose values are all the same, such as A, B and D, and then write A, B and D to one CSV file and C to another.

Thank you.







python csv














edited Nov 14 '18 at 12:06 by user680288
asked Nov 13 '18 at 17:39 by user680288




closed as too broad by usr2564301, Owen Pauling, Unheilig, Mickael Maison, greg-449 Nov 14 '18 at 10:24


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.




















  • Have you made any attempts to solve this task?

    – Roman
    Nov 13 '18 at 17:44











  • To clarify: you need to get an entire column where any of its values is a row duplicate?

    – Geo Joy
    Nov 13 '18 at 17:46











  • Yes, I tried csv.reader with a for loop to read the column elements (csvr = csv.reader(file), then for row in csvr: read each row), but I was unable to read all the columns with their headers.

    – user680288
    Nov 13 '18 at 17:48












  • Hi, I solved it using the pandas nunique method to find the columns with a single unique value, then wrote those columns to one CSV and the remaining columns to another. Thank you for all the answers.

    – user680288
    Nov 14 '18 at 12:09
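
A note on the csv.reader attempt in the comments: reading every data row into memory and transposing it with zip(*rows) gives one tuple per column, which makes the constant-value check straightforward with only the standard library. This is a minimal sketch, assuming a comma-separated file whose first row is the header and which fits in memory; the helper name split_constant_columns and the output file names are hypothetical.

```python
import csv


def split_constant_columns(src, const_path, other_path, delimiter=","):
    """Hypothetical helper: write constant columns to const_path and the rest to other_path.

    Values are compared as raw strings, so "1" and "1.0" count as different.
    Returns the names of the constant columns.
    """
    with open(src, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # first row holds the column names
        rows = list(reader)    # remaining rows hold the data

    # zip(*rows) transposes the data, so each tuple is one column
    columns = list(zip(*rows))
    constant = [i for i, col in enumerate(columns) if len(set(col)) == 1]
    constant_set = set(constant)
    varying = [i for i in range(len(header)) if i not in constant_set]

    # write the header and every row, restricted to the chosen column indices
    for path, idx in ((const_path, constant), (other_path, varying)):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, delimiter=delimiter)
            writer.writerow([header[i] for i in idx])
            for row in rows:
                writer.writerow([row[i] for i in idx])

    return [header[i] for i in constant]
```

For a comma-separated version of the example in the question, split_constant_columns('test.csv', 'constant.csv', 'varying.csv') would return ['A', 'B', 'D'], and the length of that list answers "how many columns".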


















3 Answers


















I recommend using pandas. Something like the code below should get you started.

Review the "10 minutes to pandas" guide for an overview of reading in and manipulating data.

    import pandas as pd

    data = {
        'A': [1] * 5,
        'B': [1] * 5,
        'C': [1] * 5,
        'D': [i for i in range(2, 7)],
    }

    df = pd.DataFrame(data)

    # loop through each column
    for col in df.columns.tolist():
        # check if every value in the column is equal to the first value
        if (df[col] == df[col].iloc[0]).all():
            print('all values match in {col}'.format(col=col))
        else:
            print('{col} has non-uniform values'.format(col=col))
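
As a hedged extension of the loop above, the same first-row comparison can be done for all columns at once and the result used to split the frame in two. This is a sketch, not part of the original answer; the helper name split_by_first_row and the output file names are hypothetical, and the usage lines assume test.csv is comma-separated with a header row.

```python
import pandas as pd


def split_by_first_row(df):
    """Hypothetical helper: return (constant_df, varying_df).

    A column counts as constant when every value equals the value in its first row.
    NaN never equals NaN, so a column containing NaN is treated as varying.
    """
    # df.eq(df.iloc[0]) compares every row with the first row, column by column;
    # .all() then reduces each column to a single True/False
    mask = df.eq(df.iloc[0]).all()
    return df.loc[:, mask], df.loc[:, ~mask]


# Usage (assumes a comma-separated test.csv with a header row):
# df = pd.read_csv('test.csv')
# constant, varying = split_by_first_row(df)
# print(len(constant.columns), 'constant columns')
# constant.to_csv('constant.csv', index=False)
# varying.to_csv('varying.csv', index=False)
```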






edited Nov 14 '18 at 12:57
answered Nov 13 '18 at 17:50 by rs311























Just find the columns that have only one unique value.

Create a DataFrame; I'm using some dummy data here, but you can read your CSV with pd.read_csv.

    >>> import pandas as pd
    >>> df = pd.DataFrame(data={'A': [1,1,1,1,1,1,1], 'B': [2,2,2,2,2,2,2], 'C': [1,2,3,4,5,6,7]})
    >>> df
       A  B  C
    0  1  2  1
    1  1  2  2
    2  1  2  3
    3  1  2  4
    4  1  2  5
    5  1  2  6
    6  1  2  7

Find the columns that have only one unique value:

    >>> equal_cols = [c for c in df.columns if len(df[c].unique()) == 1]
    >>> equal_cols
    ['A', 'B']

Write those columns to sample1.csv and all the others to sample2.csv (index=False keeps the pandas row index out of the files):

    >>> df[equal_cols].to_csv('sample1.csv', index=False)
    >>> df[[c for c in df.columns if c not in equal_cols]].to_csv('sample2.csv', index=False)
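
The asker reports in the comments that they solved this with nunique, which counts distinct values for every column in one call instead of looping. A minimal sketch of that variant, assuming df is already loaded; the helper name constant_columns is hypothetical, and treating NaN as a value (dropna=False) is an assumption about the desired behaviour.

```python
import pandas as pd


def constant_columns(df):
    """Hypothetical helper: names of columns that hold a single distinct value.

    DataFrame.nunique counts distinct values per column; dropna=False counts NaN
    as a value too, so a column like [1, NaN] is not reported as constant.
    """
    counts = df.nunique(dropna=False)
    return counts[counts == 1].index.tolist()


# Example with the dummy data from the answer above:
df = pd.DataFrame({"A": [1] * 7, "B": [2] * 7, "C": [1, 2, 3, 4, 5, 6, 7]})
equal_cols = constant_columns(df)
print(len(equal_cols), equal_cols)  # 2 ['A', 'B']

# Writing the two files would then be:
# df[equal_cols].to_csv('sample1.csv', index=False)
# df.drop(columns=equal_cols).to_csv('sample2.csv', index=False)
```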






answered Nov 13 '18 at 20:21 by Muhammad Ahmad





















You can use pandas for convenient I/O. Just write a function that tests a column, and select the columns that pass.

Input:

    import pandas as pd
    # sep=' ' matches the space-separated example; omit it for a comma-separated file
    df = pd.read_csv('test.csv', sep=' ')

A short-circuit function, which compares values only as far as necessary (it stops at the first mismatch):

    from numba import njit

    @njit  # optional, for efficiency; drop this line and the import if numba is not installed
    def equal(arr):
        ref = arr[0]
        for x in arr[1:]:
            if x != ref:
                return False
        return True

Output:

    mask = df.apply(equal, axis=0, raw=True)
    # [True, True, False, True]
    df.loc[:, mask].to_csv('equal.csv', sep=' ', index=False)
    df.loc[:, ~mask].to_csv('notequal.csv', sep=' ', index=False)

For the example data, this gives equal.csv:

    A B D
    1 2 3
    1 2 3
    1 2 3
    1 2 3
    1 2 3

and notequal.csv:

    C
    2
    3
    4
    5
    7
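
If numba isn't available, a vectorized NumPy comparison is one possible alternative to the short-circuit function above. It does not short-circuit (every value is compared), but the comparison runs inside NumPy rather than as one Python call per column. This is a sketch under the assumption that the frame converts cleanly with to_numpy(); the helper name constant_mask is hypothetical.

```python
import numpy as np
import pandas as pd


def constant_mask(df):
    """Hypothetical helper: boolean Series, True where a column's values all match row 0.

    to_numpy() builds one 2-D array (object dtype if the column dtypes are mixed);
    comparing it with its first row broadcasts that row down every row.
    NaN never equals NaN, so columns containing NaN come out False.
    """
    arr = df.to_numpy()
    if len(arr) == 0:
        # no rows at all: every column is vacuously constant
        return pd.Series(True, index=df.columns)
    same = np.asarray(arr == arr[0]).all(axis=0)
    return pd.Series(same, index=df.columns)


# Usage with the same split as in the answer:
# mask = constant_mask(df)
# df.loc[:, mask].to_csv('equal.csv', sep=' ', index=False)
# df.loc[:, ~mask].to_csv('notequal.csv', sep=' ', index=False)
```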






edited Nov 14 '18 at 8:09
answered Nov 13 '18 at 19:27 by B. M.












