Over the halfway point! (I appreciate week 6 was a while ago, I haven’t had a chance to clean up my write up for this until now). This week we’re looking at email authentication again, trying to identify the actual date of an email. I scraped this one by with only hours to spare, and chatting with the others it was interesting to see the different approaches taken.
If you'd like to play along, the challenge has been archived here
Your team has been investigating cryptocurrency transactions for several months. You have received the following email with timing information that would be critical for your case.
Both participants in the email conversation are believed to be in Pacific Time. Having taken a quick look at the email, you suspect that the email has been manipulated.
Dig deeper and determine the correct origination date of the email (i.e., the value in the “Date:” header field). Enter the timestamp in UTC in the following format: yyyy-mm-dd hh:mm:ss (e.g., 2005-11-20 13:17:11)
Simple enough! Let’s assess the situation: (Hidden in the above image, but) from the From and To fields we can see we have an email conversation between a Gmail and a Yahoo mail address. The email purports to have been sent around 19/20 May 2020, but based on the question this is likely false. What we want to do here is identify every single date field and decode it, to figure out whether there is a quick win to answer the question!
What goes in the date field? The date field should indicate the date that the email was sent. We can assume that this current value is incorrect.
The date field identified has the following timestamp recorded: Wed, 20 May 2020 10:13:49 -0800
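Since the answer has to be entered in UTC, it helps to normalise every header timestamp to UTC before comparing. A minimal sketch using only the standard library, with the Date value quoted above (the variable names are just for illustration):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

date_header = "Wed, 20 May 2020 10:13:49 -0800"

# A header with a numeric offset parses to a timezone-aware datetime.
claimed = parsedate_to_datetime(date_header)
claimed_utc = claimed.astimezone(timezone.utc)

print(claimed_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-05-20 18:13:49
```

The same two calls work for any RFC 5322 style date in the headers, which makes it easier to line them all up in one table.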
Some of the other timestamps, however, are set by other processes and can be more reliable. Others are set by the receiver (possibly including by whatever process was used to manipulate the email), and will therefore be similar to, but not the same as, the time on the computer that sent the email.
Nothing particularly groundbreaking here: all of these seem to match up with the date the email says it was sent. They are shown in the screenshot at the bottom of this post.
Gmail MIME boundaries have timestamps in them, and conveniently, Arman has written a blog post on decoding these. Instead of downloading his tool I wrote a quick Python script to decode the value for us.
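from datetime import datetime, timezone

boundary = "000000000000ee186a05b959a8f0"
# Hex characters 18-25 followed by 12-17 form a Unix timestamp in microseconds.
ts = int(boundary[18:-2] + boundary[12:-10], 16) / 1000000
print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S') + " UTC")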
This decoded to Wed 20 January 2021 19:07:00.957 UTC, which was deemed to be incorrect. My guess is this is the time that the email was saved out, but would have to confirm. In a quick test I sent an email and it actually recorded when the email was received by my gmail, rather than when I sent it out from my sender address.
Another timestamp is recorded in the X-Received header, which should indicate when the email was received by Gmail. It has a unix timestamp (1589998440956) as well as a decoded time (Wed, 20 May 2020 10:14:00 -0800 (PST)), but both of these line up with the Date field more or less. I don’t think this is going to be the right one.
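To check the "more or less" claim, the X-Received value can be decoded and compared against the Date header. This sketch assumes the value is a Unix timestamp in milliseconds, which is consistent with the decoded time shown in the same header:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

x_received_ms = 1589998440956
date_header = "Wed, 20 May 2020 10:13:49 -0800"

# Split into whole seconds and milliseconds to avoid float rounding.
seconds, millis = divmod(x_received_ms, 1000)
received_utc = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

claimed_utc = parsedate_to_datetime(date_header).astimezone(timezone.utc)
gap = received_utc - claimed_utc

print(received_utc.isoformat(timespec="milliseconds"))  # 2020-05-20T18:14:00.956+00:00
print(gap.total_seconds())                              # 11.956
```

A gap of roughly 12 seconds between sending and receipt is what you would expect if both values had been shifted together.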
References/Message ID/Thread ID
This section is where I spent most of my time. Message IDs are great, especially when they contain timestamps. Sometimes they’ll clearly display a unix timestamp (for example, Yahoo Mail) and others not (for example, Gmail, sometimes).
Decoding this timestamp gets us Tue 19 January 2021 23:52:05 UTC, which gives us a good indication of when the initial Yahoo email was sent.
The next step would be to look at the Message ID and see what can be determined. The Message ID is unique, and as previously mentioned can contain a timestamp. In this instance, the Message ID was CAMvYnDNar7u2B8+ZmzzEgabA9+ijdwvgbmp7Ti=0gosncP5Epg@mail.gmail.com.
Unfortunately, this looks like a base64 encoded string that I was unable to decode. Someone else asked a similar question just prior to the challenge, but no luck there either. I tried decoding it from base64, playing with protobuf, looking online, but didn’t find anything. Decoding this would probably give us the exact answer, but after spending a lot of time trying to figure it out I gave up.
I did not see any Thread IDs in there either, which would have been helpful.
The last place (because it gave me the answer, but not in the way you’d think) to look was the DKIM signature.
Typically I’ve seen a unix timestamp recorded in here somewhere, but this may have been removed. There is a field, s, that has a date in the YYYYMMDD format. I also had a strong suspicion that this was not the answer I was looking for.
What I can see, however, is that the DKIM signature is calculated based on a few fields, most of which I don’t think had been manipulated: Mime-version, references, in-reply-to, from, message-id, subject, and to are likely not manipulated. That leaves the “Date” field and the “Body Hash” as the fields that need some TLC.
I ran the email through DKIMPY and received a body hash mismatch (along with some other errors, but the body hash mismatch was at the end).
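For context on what a body hash mismatch means: the bh= tag in the signature is a base64-encoded hash of the canonicalised body, so any edit to the body text changes it. Below is a rough standard-library sketch of that calculation. It assumes "relaxed" body canonicalisation and SHA-256, which I have not confirmed against this particular signature, and the two sample bodies are made up for illustration, not taken from the challenge email.

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Apply DKIM 'relaxed' body canonicalisation (RFC 6376, section 3.4.4)."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    # Collapse runs of spaces/tabs to one space and drop trailing whitespace.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Ignore all empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    # Every remaining line ends with CRLF; an empty body stays empty.
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(body: bytes) -> str:
    """Return the value that would appear in the bh= tag for this body."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical bodies: same text, different quoted date.
original = b"On Tue, Jan 19, 2021 at 3:52 PM someone wrote:\r\n> hello\r\n"
tampered = b"On Tue, May 19, 2020 at 3:52 PM someone wrote:\r\n> hello\r\n"

print(body_hash(original) == body_hash(tampered))  # False
```

Comparing the output of body_hash against the bh= value in the header is a quick way to test a candidate body without re-running the whole verification.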
Taking a look at the body of the email, we can see two dates. Both of them are identical, and there’s nothing obvious in the formatting to suggest manipulation. That being said, they don’t line up with the timestamp we previously identified in the message ID, which was 19 January 2021 23:52:05 UTC.
If we convert the message ID timestamp to PST, we get Jan 19, 2021 at 3:52 PM. With the dates in the body changed to that value, running DKIMPY over the email again now just gives a simple “signature verification failed”. As we have established, the Date field is likely the next culprit for manipulation, and if we can fix that one, then we’ll have our answer.
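The UTC to Pacific conversion is easy to get wrong by hand, so here is a small sketch of it. It uses a fixed UTC-8 offset rather than a full timezone database, which is an assumption that only holds because January falls outside daylight saving time:

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time (no daylight saving handling).
PST = timezone(timedelta(hours=-8), "PST")

yahoo_utc = datetime(2021, 1, 19, 23, 52, 5, tzinfo=timezone.utc)
yahoo_pst = yahoo_utc.astimezone(PST)

print(yahoo_pst.isoformat())  # 2021-01-19T15:52:05-08:00
```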
Based on the Gmail MIME boundary we think that the email should be dated around 20 January 2021 19:07:00 UTC, which is 20 January 11:07:00 PST (UTC-8). I’d like to say that I wrote some fancy program to automate this, but I figured this date would be a good starting point and thought the actual date might be a minute or two either side. So, I manually changed the time, and validated the DKIM each time, until the signature verified.
This gets me the correct Date field value of: Wed, 20 Jan 2021 11:06:49 -0800.
(I now realise that the seconds weren’t manipulated, the hour was identical to the MIME boundary timestamp, and the minute was off by 1, but whatever, it’s in the past).
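For anyone who would rather not do this by hand, the search could be automated along these lines. This is a sketch, not what I actually ran: candidate_dates and find_date are hypothetical helper names, and the verifies callback is a stand-in. In practice it would substitute the candidate into the raw message's Date header and call something like dkimpy's dkim.verify on the resulting bytes, which also needs a DNS lookup for the public key.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Callable, Iterable, Iterator, Optional

PST = timezone(timedelta(hours=-8))  # assumed sender offset (UTC-8)


def candidate_dates(center_utc: datetime, window_s: int) -> Iterator[str]:
    """Yield Date header values within +/- window_s seconds, nearest first."""
    center = center_utc.astimezone(PST).replace(microsecond=0)
    yield format_datetime(center)
    for delta in range(1, window_s + 1):
        yield format_datetime(center - timedelta(seconds=delta))
        yield format_datetime(center + timedelta(seconds=delta))


def find_date(candidates: Iterable[str],
              verifies: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None."""
    for value in candidates:
        if verifies(value):
            return value
    return None


# Starting point: the time decoded from the Gmail MIME boundary.
mime_boundary_utc = datetime(2021, 1, 20, 19, 7, 0, tzinfo=timezone.utc)


def stand_in_verifier(date_value: str) -> bool:
    # Stand-in only: accepts the value found manually above.
    # A real check would re-verify the DKIM signature with this Date.
    return date_value == "Wed, 20 Jan 2021 11:06:49 -0800"


found = find_date(candidate_dates(mime_boundary_utc, 120), stand_in_verifier)
print(found)  # Wed, 20 Jan 2021 11:06:49 -0800
```

Searching outward from the MIME boundary time means a two minute window is at most 241 verifications, which is far quicker than editing the file by hand each time.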
I found this challenge super tricky and spent a lot of time on it. It was pretty cool to get the DKIM verification to really show what the email was prior to manipulation. As part of my analysis I pulled out every timestamp that I could find and mapped it out in a spreadsheet. The highlighting doesn’t mean much other than the darker the orange the closer to the actual timestamp I thought it was.
Look at how many timestamps there are in a single email! And I’m sure I’m missing some encoded ones that aren’t obvious. It’s a shame email analysis tools don’t generate this for us!