pandas: union of two data frames












6















Say I have two data frames:



df1:



  A
0 a
1 b


df2:



  A
0 a
1 c


I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:



  A  B
0 a df1, df2
1 b df1
2 c df2


I can get the concatenated data frame (df3) without duplicates as follows:



import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)


I can't think of/find a method to have control over what element goes where. How can I add the extra column?



Thank you very much for any tips.










share|improve this question







New contributor




Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

























    6















    Say I have two data frames:



    df1:



      A
    0 a
    1 b


    df2:



      A
    0 a
    1 c


    I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:



      A  B
    0 a df1, df2
    1 b df1
    2 c df2


    I can get the concatenated data frame (df3) without duplicates as follows:



    import pandas as pd
    df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)


    I can't think of/find a method to have control over what element goes where. How can I add the extra column?



    Thank you very much for any tips.










    share|improve this question







    New contributor




    Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.























      6












      6








      6








      Say I have two data frames:



      df1:



        A
      0 a
      1 b


      df2:



        A
      0 a
      1 c


      I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:



        A  B
      0 a df1, df2
      1 b df1
      2 c df2


      I can get the concatenated data frame (df3) without duplicates as follows:



      import pandas as pd
      df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)


      I can't think of/find a method to have control over what element goes where. How can I add the extra column?



      Thank you very much for any tips.










      share|improve this question







      New contributor




      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.












      Say I have two data frames:



      df1:



        A
      0 a
      1 b


      df2:



        A
      0 a
      1 c


      I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:



        A  B
      0 a df1, df2
      1 b df1
      2 c df2


      I can get the concatenated data frame (df3) without duplicates as follows:



      import pandas as pd
      df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)


      I can't think of/find a method to have control over what element goes where. How can I add the extra column?



      Thank you very much for any tips.







      python pandas dataframe concatenation






      share|improve this question







      New contributor




      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question







      New contributor




      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question






      New contributor




      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 2 hours ago









      Leon RaiLeon Rai

      455




      455




      New contributor




      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Leon Rai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.
























          3 Answers
          3






          active

          oldest

          votes


















          8














          Merge with an indicator argument, and remap the result:



          m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

          result = df1.merge(df2, on=['A'], how='outer', indicator='B')
          result['B'] = result['B'].map(m)

          result
          A B
          0 a df1, df2
          1 b df1
          2 c df2





          share|improve this answer
























          • Nice and succinct!

            – cph_sto
            2 hours ago











          • @cph_sto Thank you! Upvoted back.

            – coldspeed
            2 hours ago






          • 1





            I have learnt a lot from you.

            – cph_sto
            2 hours ago











          • Excellent! Could you add how to do the same for intersection? outer->inner?

            – Leon Rai
            1 hour ago








          • 1





            @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

            – coldspeed
            1 hour ago



















          2














          We use outer join to solve this -



          df1 = pd.DataFrame({'A':['a','b']})
          df2 = pd.DataFrame({'A':['a','c']})
          df1['col1']='df1'
          df2['col2']='df2'
          df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
          df['B']=df['col1']+','+df['col2']
          df['B'] = df['B'].str.strip(',')
          df=df[['A','B']]
          df

          A B
          0 a df1,df2
          1 b df1
          2 c df2





          share|improve this answer
























          • thank you for the answer!

            – Leon Rai
            1 hour ago











          • pleasure Leon :)

            – cph_sto
            1 hour ago



















          2














          Use the command below:



          df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 
          .groupby('A')
          .aggregate(list)
          .reset_index()


          The result will be:



             A      source
          0 a [df1, df2]
          1 b [df1]
          2 c [df2]


          The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.






          share|improve this answer










          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.





















          • thank you for the answer!

            – Leon Rai
            1 hour ago











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2fpandas-union-of-two-data-frames%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          8














          Merge with an indicator argument, and remap the result:



          m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

          result = df1.merge(df2, on=['A'], how='outer', indicator='B')
          result['B'] = result['B'].map(m)

          result
          A B
          0 a df1, df2
          1 b df1
          2 c df2





          share|improve this answer
























          • Nice and succinct!

            – cph_sto
            2 hours ago











          • @cph_sto Thank you! Upvoted back.

            – coldspeed
            2 hours ago






          • 1





            I have learnt a lot from you.

            – cph_sto
            2 hours ago











          • Excellent! Could you add how to do the same for intersection? outer->inner?

            – Leon Rai
            1 hour ago








          • 1





            @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

            – coldspeed
            1 hour ago
















          8














          Merge with an indicator argument, and remap the result:



          m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

          result = df1.merge(df2, on=['A'], how='outer', indicator='B')
          result['B'] = result['B'].map(m)

          result
          A B
          0 a df1, df2
          1 b df1
          2 c df2





          share|improve this answer
























          • Nice and succinct!

            – cph_sto
            2 hours ago











          • @cph_sto Thank you! Upvoted back.

            – coldspeed
            2 hours ago






          • 1





            I have learnt a lot from you.

            – cph_sto
            2 hours ago











          • Excellent! Could you add how to do the same for intersection? outer->inner?

            – Leon Rai
            1 hour ago








          • 1





            @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

            – coldspeed
            1 hour ago














          8












          8








          8







          Merge with an indicator argument, and remap the result:



          m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

          result = df1.merge(df2, on=['A'], how='outer', indicator='B')
          result['B'] = result['B'].map(m)

          result
          A B
          0 a df1, df2
          1 b df1
          2 c df2





          share|improve this answer













          Merge with an indicator argument, and remap the result:



          m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}

          result = df1.merge(df2, on=['A'], how='outer', indicator='B')
          result['B'] = result['B'].map(m)

          result
          A B
          0 a df1, df2
          1 b df1
          2 c df2






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 hours ago









          coldspeedcoldspeed

          126k23126213




          126k23126213













          • Nice and succinct!

            – cph_sto
            2 hours ago











          • @cph_sto Thank you! Upvoted back.

            – coldspeed
            2 hours ago






          • 1





            I have learnt a lot from you.

            – cph_sto
            2 hours ago











          • Excellent! Could you add how to do the same for intersection? outer->inner?

            – Leon Rai
            1 hour ago








          • 1





            @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

            – coldspeed
            1 hour ago



















          • Nice and succinct!

            – cph_sto
            2 hours ago











          • @cph_sto Thank you! Upvoted back.

            – coldspeed
            2 hours ago






          • 1





            I have learnt a lot from you.

            – cph_sto
            2 hours ago











          • Excellent! Could you add how to do the same for intersection? outer->inner?

            – Leon Rai
            1 hour ago








          • 1





            @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

            – coldspeed
            1 hour ago

















          Nice and succinct!

          – cph_sto
          2 hours ago





          Nice and succinct!

          – cph_sto
          2 hours ago













          @cph_sto Thank you! Upvoted back.

          – coldspeed
          2 hours ago





          @cph_sto Thank you! Upvoted back.

          – coldspeed
          2 hours ago




          1




          1





          I have learnt a lot from you.

          – cph_sto
          2 hours ago





          I have learnt a lot from you.

          – cph_sto
          2 hours ago













          Excellent! Could you add how to do the same for intersection? outer->inner?

          – Leon Rai
          1 hour ago







          Excellent! Could you add how to do the same for intersection? outer->inner?

          – Leon Rai
          1 hour ago






          1




          1





          @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

          – coldspeed
          1 hour ago





          @LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

          – coldspeed
          1 hour ago













          2














          We use outer join to solve this -



          df1 = pd.DataFrame({'A':['a','b']})
          df2 = pd.DataFrame({'A':['a','c']})
          df1['col1']='df1'
          df2['col2']='df2'
          df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
          df['B']=df['col1']+','+df['col2']
          df['B'] = df['B'].str.strip(',')
          df=df[['A','B']]
          df

          A B
          0 a df1,df2
          1 b df1
          2 c df2





          share|improve this answer
























          • thank you for the answer!

            – Leon Rai
            1 hour ago











          • pleasure Leon :)

            – cph_sto
            1 hour ago
















          2














          We use outer join to solve this -



          df1 = pd.DataFrame({'A':['a','b']})
          df2 = pd.DataFrame({'A':['a','c']})
          df1['col1']='df1'
          df2['col2']='df2'
          df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
          df['B']=df['col1']+','+df['col2']
          df['B'] = df['B'].str.strip(',')
          df=df[['A','B']]
          df

          A B
          0 a df1,df2
          1 b df1
          2 c df2





          share|improve this answer
























          • thank you for the answer!

            – Leon Rai
            1 hour ago











          • pleasure Leon :)

            – cph_sto
            1 hour ago














          2












          2








          2







          We use outer join to solve this -



          df1 = pd.DataFrame({'A':['a','b']})
          df2 = pd.DataFrame({'A':['a','c']})
          df1['col1']='df1'
          df2['col2']='df2'
          df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
          df['B']=df['col1']+','+df['col2']
          df['B'] = df['B'].str.strip(',')
          df=df[['A','B']]
          df

          A B
          0 a df1,df2
          1 b df1
          2 c df2





          share|improve this answer













          We use outer join to solve this -



          df1 = pd.DataFrame({'A':['a','b']})
          df2 = pd.DataFrame({'A':['a','c']})
          df1['col1']='df1'
          df2['col2']='df2'
          df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
          df['B']=df['col1']+','+df['col2']
          df['B'] = df['B'].str.strip(',')
          df=df[['A','B']]
          df

          A B
          0 a df1,df2
          1 b df1
          2 c df2






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 hours ago









          cph_stocph_sto

          1,519320




          1,519320













          • thank you for the answer!

            – Leon Rai
            1 hour ago











          • pleasure Leon :)

            – cph_sto
            1 hour ago



















          • thank you for the answer!

            – Leon Rai
            1 hour ago











          • pleasure Leon :)

            – cph_sto
            1 hour ago

















          thank you for the answer!

          – Leon Rai
          1 hour ago





          thank you for the answer!

          – Leon Rai
          1 hour ago













          pleasure Leon :)

          – cph_sto
          1 hour ago





          pleasure Leon :)

          – cph_sto
          1 hour ago











          2














          Use the command below:



          df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 
          .groupby('A')
          .aggregate(list)
          .reset_index()


          The result will be:



             A      source
          0 a [df1, df2]
          1 b [df1]
          2 c [df2]


          The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.






          share|improve this answer










          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.





















          • thank you for the answer!

            – Leon Rai
            1 hour ago
















          2














          Use the command below:



          df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 
          .groupby('A')
          .aggregate(list)
          .reset_index()


          The result will be:



             A      source
          0 a [df1, df2]
          1 b [df1]
          2 c [df2]


          The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.






          share|improve this answer










          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.





















          • thank you for the answer!

            – Leon Rai
            1 hour ago














          2












          2








          2







          Use the command below:



          df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 
          .groupby('A')
          .aggregate(list)
          .reset_index()


          The result will be:



             A      source
          0 a [df1, df2]
          1 b [df1]
          2 c [df2]


          The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.






          share|improve this answer










          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.










          Use the command below:



          df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 
          .groupby('A')
          .aggregate(list)
          .reset_index()


          The result will be:



             A      source
          0 a [df1, df2]
          1 b [df1]
          2 c [df2]


          The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.







          share|improve this answer










          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.









          share|improve this answer



          share|improve this answer








          edited 2 hours ago





















          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.









          answered 2 hours ago









          Narges AyoubiNarges Ayoubi

          564




          564




          New contributor




          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.





          New contributor





          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.






          Narges Ayoubi is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
          Check out our Code of Conduct.













          • thank you for the answer!

            – Leon Rai
            1 hour ago



















          • thank you for the answer!

            – Leon Rai
            1 hour ago

















          thank you for the answer!

          – Leon Rai
          1 hour ago





          thank you for the answer!

          – Leon Rai
          1 hour ago










          Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.













          Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.












          Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2fpandas-union-of-two-data-frames%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          SQL Server 17 - Attemping to backup to remote NAS but Access is denied

          Always On Availability groups resolving state after failover - Remote harden of transaction...

          Restoring from pg_dump with foreign key constraints