pandas: union of two data frames
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe concatenation
New contributor
add a comment |
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe concatenation
New contributor
add a comment |
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe concatenation
New contributor
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe concatenation
python pandas dataframe concatenation
New contributor
New contributor
New contributor
asked 2 hours ago
Leon RaiLeon Rai
455
455
New contributor
New contributor
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
1 hour ago
|
show 1 more comment
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
New contributor
thank you for the answer!
– Leon Rai
1 hour ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2fpandas-union-of-two-data-frames%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
1 hour ago
|
show 1 more comment
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
1 hour ago
|
show 1 more comment
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
answered 2 hours ago
coldspeedcoldspeed
126k23126213
126k23126213
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
1 hour ago
|
show 1 more comment
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
1 hour ago
Nice and succinct!
– cph_sto
2 hours ago
Nice and succinct!
– cph_sto
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
@cph_sto Thank you! Upvoted back.
– coldspeed
2 hours ago
1
1
I have learnt a lot from you.
– cph_sto
2 hours ago
I have learnt a lot from you.
– cph_sto
2 hours ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
1 hour ago
1
1
@LeonRai
df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)– coldspeed
1 hour ago
@LeonRai
df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)– coldspeed
1 hour ago
|
show 1 more comment
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
add a comment |
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
add a comment |
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
answered 2 hours ago
cph_stocph_sto
1,519320
1,519320
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
add a comment |
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
thank you for the answer!
– Leon Rai
1 hour ago
thank you for the answer!
– Leon Rai
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
pleasure Leon :)
– cph_sto
1 hour ago
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
New contributor
thank you for the answer!
– Leon Rai
1 hour ago
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
New contributor
thank you for the answer!
– Leon Rai
1 hour ago
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
New contributor
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
New contributor
edited 2 hours ago
New contributor
answered 2 hours ago
Narges AyoubiNarges Ayoubi
564
564
New contributor
New contributor
thank you for the answer!
– Leon Rai
1 hour ago
add a comment |
thank you for the answer!
– Leon Rai
1 hour ago
thank you for the answer!
– Leon Rai
1 hour ago
thank you for the answer!
– Leon Rai
1 hour ago
add a comment |
Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
Leon Rai is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2fpandas-union-of-two-data-frames%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown