Duplicate SMS messages not seen as duplicates

For users who don't speak German, please use this part of my forum
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

I am facing a quite weird problem with duplicate SMS messages in the Archived messages.

I have literally hundreds of pairs of seemingly identical messages (which I know have somehow been duplicated in the MPE database – I received or sent only one copy of the message). The "Search duplicates" menu entry runs without finding anything... but I still see strictly identical sender, message and time entries for those duplicates.

I imagine that the comparison is based on more than "From/To", Message and Time, which is strange to me as I, as a user, don't see these additional properties that come into play in the comparison.

Is there a further way to get rid of those double entries, in addition to the "Search duplicates" command, which quite obviously fails to detect these many duplicates?

@FJ: I can send either logs or screen captures if needed.

I'm running the latest versions of both the PC (1.8.7) and phone (1.0.39) clients.
Thanks.
ealbamb
Posts: 4
Joined: Sun 17 May 2015, 15:49

Post by ealbamb »

Hi,

I'm on 1.8.7 and have the duplicates issue too, but I'm one step behind: I cannot find the "Search duplicates" command. Can you please give me a hint? (My DB now counts more than 15000 entries.) In other discussions I read about a right-click, but I cannot understand where I have to right-click.

Thank you all
bert
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

"Search duplicates" appears in the contextual menu (right mouse click – for lefties :-)) when you are in "Archive (computer)" (see the "Messages" section of the left side bar).
Therefore, duplicates will only be removed from the archive. If you have many of them in your Sent or Inbox folders, you may move them to the archive, remove the duplicates and move them back to their original folders.
ealbamb
Posts: 4
Joined: Sun 17 May 2015, 15:49

Post by ealbamb »

OK. Found and used. Removed 133000 duplicates from the archive .. however it still creates new ones every time I sync, but that is another story. Thanks a lot, Vodoomoth.

After a check I found a small number of residual duplicates, but very few. I guess there is something different in the actual text, but I can't see what it is.

Thanks again
bert
ealbamb
Posts: 4
Joined: Sun 17 May 2015, 15:49

Post by ealbamb »

.. after a deeper check I found that in the duplicates left after removal, the text is different.

The difference is caused by some sort of automatic formatting of special characters (apostrophe changed to a different apostrophe type, short dash changed to long dash, ellipsis dots changed to underscore ... ). In my case they are very few (17 out of 16000 SMS, after a dedup which removed 133000 duplicates). I use Italian as a language (maybe this interferes ..)
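
For anyone who wants to compare such messages outside MPE, here is a minimal Java sketch of the idea: fold a few typographic characters back to plain ASCII before comparing two texts. The class and method names are made up for illustration, the character list is only a guess at what the automatic formatting produces, and the underscore case mentioned above is not covered. This is not how MPE itself compares messages.

```java
final class SmsTextNormalizer {

    // Hypothetical helper: maps a few typographic characters to plain ASCII
    // so that two texts differing only in these characters compare equal.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                case '\u00B4': // acute accent used as an apostrophe
                case '`':
                    out.append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    out.append("...");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: L'una - poi...
        System.out.println(normalize("L\u2019una \u2013 poi\u2026"));
    }
}
```

Two messages would then count as duplicates when `normalize(a).equals(normalize(b))` and the other visible fields match.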

Hope this helps

Thanks
bert
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

ealbamb wrote: .. after a deeper check I found that in the duplicates left after removal, the text is different.

The difference is caused by some sort of automatic formatting of special characters (apostrophe changed to a different apostrophe type, short dash changed to long dash, ellipsis dots changed to underscore ... ).


This may be an explanation of some (or maybe most) of the duplicates. But I have such messages that have no special characters; they are pure ASCII. Example of the contents of one such message:
Courage!


8 characters, with none that could have been replaced.

Anyway, I believe that what the user sees in the Archive view should overrule technical considerations. Same From/To, Message and Time fields in different messages must lead to a classification as duplicates.

I have uploaded screen captures of two duplicates with the 8-character contents that I've mentioned above:

[Image]

[Image]


The only difference is in the PDU field (two more characters in the second image, and FF09 at the end of the first line instead of FF08), which I, as a user, don't care about. I don't even know what it stands for.

I think the PDU field is a better candidate for an explanation of why MPE doesn't always see the duplicates that users see.

@FJ: could you add an option (a checkbox, an additional menu entry or something else) to ignore that PDU field when determining whether two messages are duplicates? Maybe a "deep-search for duplicates" option is in order.
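
To illustrate what I mean by ignoring the PDU, here is a rough Java sketch that keys each message only on the three fields the user sees. The `Message` class and its field names are assumptions for the example, not MPE's actual data model, and the values in `main` are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VisibleFieldDedup {

    // Hypothetical message model: the field names are assumptions.
    static final class Message {
        final String party; // the "From/To" column
        final String text;  // the "Message" column
        final String time;  // the "Time" column, as displayed
        final String pdu;   // raw PDU, deliberately ignored below

        Message(String party, String text, String time, String pdu) {
            this.party = party;
            this.text = text;
            this.time = time;
            this.pdu = pdu;
        }
    }

    // Keeps the first message for each (party, text, time) triple.
    static List<Message> removeDuplicates(List<Message> messages) {
        Set<List<String>> seen = new HashSet<>();
        List<Message> unique = new ArrayList<>();
        for (Message m : messages) {
            // The PDU is not part of the key, so it cannot block a match.
            if (seen.add(Arrays.asList(m.party, m.text, m.time))) {
                unique.add(m);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Message> archive = Arrays.asList(
                new Message("Alice", "Courage!", "2015-05-17 15:49", "pdu-A"),
                new Message("Alice", "Courage!", "2015-05-17 15:49", "pdu-B"));
        System.out.println(removeDuplicates(archive).size()); // prints: 1
    }
}
```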
tubular
Posts: 2
Joined: Mon 8 Jun 2015, 09:46

Post by tubular »

Anything new in this topic?
I have the same problem...
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

I have just come across two identical messages that seem to have the exact same PDU but which still aren't detected as duplicates. Therefore, unless I've missed something in that very long PDU (or it's too long to fit in the available space in the dialog box), the PDU field still isn't enough to explain the fact that MPE misses some duplicates.

FJ, have you had a chance to look into this?
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

I have ended up finding out why this problem occurs: one version of the duplicate messages has a trailing whitespace character, and/or the time of the message is affected by daylight saving settings (meaning that the same message will appear with two different timestamps).
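
The daylight saving case could in principle be handled by treating two timestamps as equal when they are either identical or exactly one hour apart. The Java sketch below shows that comparison. It assumes a one-hour DST shift (true for most time zones, not all) and timestamps already parsed into `LocalDateTime`; the class and method names are invented for the example. Note the trade-off: the same text genuinely sent twice exactly one hour apart would also be treated as a duplicate.

```java
import java.time.Duration;
import java.time.LocalDateTime;

final class DstTolerantTime {

    // Assumption: a DST-induced duplicate is shifted by exactly one hour.
    private static final long DST_SHIFT_SECONDS = 3600;

    // True when the two timestamps are identical or exactly one hour apart.
    static boolean sameMoment(LocalDateTime a, LocalDateTime b) {
        long diff = Math.abs(Duration.between(a, b).getSeconds());
        return diff == 0 || diff == DST_SHIFT_SECONDS;
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2015, 6, 8, 9, 46, 0);
        System.out.println(sameMoment(t, t.plusHours(1)));    // prints: true
        System.out.println(sameMoment(t, t.plusMinutes(30))); // prints: false
    }
}
```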

I found out about the trailing whitespace because I've written a small Java program to process an MPE export of messages into a text file so as to remove duplicates. It reads the exported message file and copies messages to either of two files, one with the "pristine" conversation(s) free of any duplicates, and the other that contains the duplicates. I haven't dealt with duplicates caused by daylight saving settings.
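
For those who want to try something similar, here is a simplified sketch of the approach, not the actual program. It assumes the export holds one message per line with the message text as the last field, so that stripping trailing whitespace from the line also strips it from the text; the real MPE export layout may differ. The class name and command-line arguments are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExportSplitter {

    // Removes whitespace at the end of a line only.
    static String stripTrailing(String line) {
        int end = line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    // Returns two lists: index 0 holds the first occurrence of each message
    // (trailing whitespace removed), index 1 holds the duplicates as found.
    static List<List<String>> split(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> pristine = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            String key = stripTrailing(line);
            if (seen.add(key)) {
                pristine.add(key);
            } else {
                duplicates.add(line);
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(pristine);
        result.add(duplicates);
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ExportSplitter <export> <pristine-out> <duplicates-out>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<List<String>> parts = split(lines);
        Files.write(Paths.get(args[1]), parts.get(0), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[2]), parts.get(1), StandardCharsets.UTF_8);
    }
}
```

Keeping the duplicates in a separate file rather than discarding them makes it easy to check by eye that nothing legitimate was dropped.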

I haven't yet tried exporting all archived messages, deleting them from the archive and reimporting the cleaned-up export. I guess there should be no problem doing this.
xanda
Posts: 18
Joined: Thu 16 May 2013, 12:28

Post by xanda »

We can see the issue too: after exporting to CSV, the trailing whitespace is clearly visible when the file is viewed in a spreadsheet.

Is there a way to handle this automatically?

We have 1000s of messages and reckon about a third are duplicates. It's very tedious trying to delete them all manually.

Any suggestions? Thanks.
vodoomoth
Posts: 75
Joined: Fri 13 Jan 2012, 11:00

Post by vodoomoth »

xanda wrote: Any suggestions?


Other than bringing FJ's attention to this topic and to the various causes that we've identified, especially the trailing whitespace (which is a breeze to fix), I can't think of any.

Like I said in an earlier post, I have written a small Java program to deal with this issue, but not every MPE user is a software developer. The best solution is always to get the fix upstream. But since this is not open source software, I can't contribute a fix to this supremely annoying issue.

It's funny that the very best programs I use, MyPhoneExplorer and FreeCommander, both suffer from the fact that their most annoying issue has flown under the developer's radar.