Is it Fraud? Illustrating some marginal to bad practices in ad-tech

A couple weeks ago I was looking over geographic location data to inform the return on investment of an advertising campaign. A significant portion of metropolitan data was showing a large percentage of the users sitting in the exact same location. It was as if they all checked-in on their apps while they were standing at the center of the city. That seemed unlikely. Was there something wrong with the data? Well, yes and no. More importantly: was it fraud?

Was it fraud? Welcome to the gray area of ad-tech. eMarketer shows that display advertising fraud loss is hovering around $6 billion worldwide. This data is likely derived from very black and white definitions of what is and what isn’t fraud. In this post I’ll be lining up not only explicitly fraudulent behavior, but also some marginal tactics of media owners, supply side companies, advertisers and demand side companies to illuminate the fringes of acceptable behavior.

Geo Stuffing

Let’s start with what was going on with all those device signals showing up at the center of town. These days a lot of advertising campaigns are geographically targeted. That means that if a user falls outside of the targeted area, they’re not going to be a candidate for the ad. Geo stuffing happens when a media owner or their technology partner sprinkles a latitude and longitude (lat/long) into the impression details before it gets sent out to market. When the user’s location is outside the target area or legitimately unknown, this is fraud.

Fraud? Geo-stuffing

What if the buy-side system or the media owner is using an IP-to-geography mapping service that yields the city or DMA, but not much else? The impression is populated with city and state or region, but not lat/long. Buy-side campaigns that are strictly targeting based on lat/long won’t end up serving to that user. Seems kind of unfair. The campaign’s shortcoming is causing it to miss out on a user within its target area. In order to capture more campaigns (and dollars) a media owner or supply platform might go ahead and stuff a lat/long in there.

If they wanted to fly under the radar they’d vary the lat/long, but that seems a little nefarious. So what you end up with in the data are clusters of users that appear to be gathered at a rally point in the middle of town. Is this fraud? Think about it for a few minutes. I’ll wait.
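For anyone who wants to look for that rally point in their own logs, here is a minimal Python sketch. It assumes each impression has already been reduced to a (lat, long) pair. The function name `find_geo_clusters` and the thresholds (`min_share`, `min_count`, `precision`) are hypothetical and would need tuning against real traffic. It will not catch a seller who varies the coordinates.

```python
from collections import Counter

def find_geo_clusters(coords, min_share=0.05, min_count=100, precision=5):
    """Return {(lat, lon): count} for points reported by a suspicious share of impressions.

    coords: iterable of (lat, lon) floats, one pair per impression.
    min_share, min_count, precision: illustrative thresholds, not industry standards.
    """
    # Round so that trivially different floats for the same point are grouped together.
    rounded = [(round(lat, precision), round(lon, precision)) for lat, lon in coords]
    counts = Counter(rounded)
    total = len(rounded)
    return {
        point: n
        for point, n in counts.items()
        if n >= min_count and n / total >= min_share
    }

# 300 impressions stacked on one point, 700 spread around the metro area.
coords = [(40.7128, -74.0060)] * 300 + [
    (40.0 + i * 0.001, -74.0 - i * 0.001) for i in range(700)
]
hot = find_geo_clusters(coords)
```

A point that shows up here is only a prompt for a closer look. A city centroid returned by an IP lookup service would produce the same pattern.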

Now, let’s take it a step further. If the media owner or supply partner happens to have the home address, or happens to know the user spends a lot of time in the target geo region, what if they stuff those coordinates into the impression details? If the original request is missing a geographic element entirely and the IP lookup yields nothing, it’s possible that a slightly less scrupulous party might stuff that thing with a lat/long. I tipped my hand here though, didn’t I? Feels like this one is over the line.

Inventory Amplification

Last week I wrote about the end-all-be-all of pacing algorithms. Let me illuminate a problem with that algorithm that someone might exploit. The way the math works is by buying a random percentage of desired impressions. If a media owner wanted to exploit this attribute they might find that they make more money if they present more inventory for purchase. In a competitive landscape this would give them an edge over other media owners with similar inventory.

When both media owners submit 100,000 impressions in the same time period, and the campaign only needs 50,000, then the purchases are generally going to be distributed evenly.

  • Media Owner A: 100,000
  • Media Owner B: 100,000
  • Campaign Needs: 50,000

campaign needs / total impressions = distribution rate

50,000 / 200,000 = 25%

Each media owner sells 25,000 impressions to the campaign.

25% of 100,000 –> 25,000

Now, what if Media Owner B doubled the number of impressions in the pool by offering each impression twice? Media Owner B now captures 33,333 of the campaign’s impressions, while Media Owner A is left with only 16,666.

50,000 / 300,000 = 16.6%

A: 16.6% of 100,000 –> 16,666

B: 16.6% of 200,000 –> 33,333
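The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the proportional split described here, not any real pacing system. The function name `allocate` and the dictionary layout are made up for illustration, and fractions are rounded down the same way the figures above are.

```python
def allocate(offered, needed):
    """Split a campaign's needed impressions across owners in proportion to what each offers.

    offered: {owner_name: impressions_offered}
    needed:  impressions the campaign wants to buy
    """
    total = sum(offered.values())
    if total == 0:
        return {owner: 0 for owner in offered}
    # Integer division rounds down; an owner can never sell more than it offered.
    return {
        owner: min(count, needed * count // total)
        for owner, count in offered.items()
    }

honest = allocate({"A": 100_000, "B": 100_000}, 50_000)
amplified = allocate({"A": 100_000, "B": 200_000}, 50_000)
```

Here `honest` gives each owner 25,000, while `amplified` gives A 16,666 and B 33,333.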

This one makes me mad, of course. I mean, I’m not printing out the scenario and hanging it, laughably, on my dart board, but there is much anger here. To that end I’m going to suggest a way to solve for this as a buy-side system. Rather than allocate campaign buying according to raw impressions, distribute the buy based on unique users. That’s really what your advertisers are after anyway. Generating unique user identities is a bit tougher than doubling the impressions shoved out for purchase. Doing this will eliminate the less sophisticated set of fraudsters. Ah, there it is again. Yes, this is fraud.

Slightly bad person: How can it be fraud if it’s so easy to do?
Me: Still fraud, **angry mumble**. You’re taking money from other sellers. Hell, this might be hacking to boot.

Bear in mind, buyers and buy-side system managers, header bidding frequently presents redundant impressions. Consolidate your inventory identification because the same publisher’s inventory can come from multiple sources.
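Here is a rough Python sketch of the unique-user idea. It assumes every impression arrives with a media owner name and some user identifier the buy side trusts, which is a big assumption. The function name `unique_user_supply` and the tuple layout are hypothetical.

```python
def unique_user_supply(impressions):
    """Count distinct users per media owner, ignoring repeat offers of the same user.

    impressions: iterable of (owner, user_id) pairs, one per bid request.
    """
    users_by_owner = {}
    for owner, user_id in impressions:
        users_by_owner.setdefault(owner, set()).add(user_id)
    return {owner: len(ids) for owner, ids in users_by_owner.items()}

# Owner A offers 10 users once each. Owner B offers 10 users twice each.
offers = [("A", f"user-{i}") for i in range(10)]
offers += [("B", f"user-{i}") for i in range(10, 20)] * 2
supply = unique_user_supply(offers)
```

Both owners come out at 10 unique users, so the doubled inventory buys Owner B nothing. The same grouping key could be extended to a publisher identifier when the same inventory arrives through several header bidding partners.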

Bid Caching

Oh, those of you who know me know that there’s a special place in my heart for bid caching. Before you get too excited (seriously, sit down, Steve), I want to clear something up around the phrase. It actually can refer to at least a couple of different practices, one more legitimate than the other.

Bid caching happens in conjunction with Real-Time Bidding (RTB) or Header Bidding (a completely misnamed re-invention of RTB). In both environments publishers, through tech partners, send each impression to third parties for bids. SSPs and DSPs send bids back in, and publisher tech holds an auction and serves the winning campaign. The publisher tech, for some reason, may hang onto bids for later use. That’s bid caching.

More legit

A legitimate use of bid caching can occur when dealing with a high-latency environment like Digital Out Of Home (DOOH) or Over the Top Television (OTT). These media tend to have a stretch of time between when an impression is made available and when it actually serves. Sometimes it’s a matter of seconds, other times it’s minutes. Minutes! Media owners might allow bids to arrive over a long period of time and run the auction in the last seconds.

In the online display advertising world such lengthy durations between bid request and rendered impression will raise more than a few eyebrows. Display ads are generally served within milliseconds of a bid. Some additional latency is tolerated for mobile devices, but in general things still operate in the sub-second realm. When the bid-to-render time gap (B2R – yeah, I’m totally coining this one right now) becomes consistently significant in display, folks start to take notice.
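If you log both the bid time and the render time, B2R is easy to watch. The Python sketch below assumes each event is a (bid_ms, render_ms) pair of millisecond timestamps. The function name `b2r_report` and the one-second threshold are my own placeholders, chosen only because display is described above as a sub-second business.

```python
from statistics import median

def b2r_report(events, threshold_ms=1000):
    """Summarize bid-to-render gaps for one seller.

    events: list of (bid_ms, render_ms) timestamp pairs in milliseconds.
    threshold_ms: illustrative cutoff for a "slow" render, not a standard.
    Returns None when there is nothing to measure.
    """
    gaps = [render_ms - bid_ms for bid_ms, render_ms in events]
    if not gaps:
        return None
    slow = sum(1 for gap in gaps if gap > threshold_ms)
    return {"median_ms": median(gaps), "slow_share": slow / len(gaps)}

# Three quick renders and one that took a minute and a half.
report = b2r_report([(0, 120), (0, 200), (0, 90_000), (0, 150)])
```

One slow render means little. A seller whose `slow_share` stays high week after week is the one worth a conversation.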

Less legit

There’s no need to beat a dead horse, we all know what happened. Bids were cached, and their ads rendered on different impressions. Sure, same user… probably, same website or mobile app, same “session” too, but the fact remains that the bid was destined for a previous impression.

Now you’re asking, “Why would someone set up such a scenario?” Short answer: Money. Long answer: To make your tech look more attractive than the competition (i.e. to win more business). This practice will yield more money to the media owner, but at the expense of your market’s integrity and the advertisers who are pumping money into it. It’s not Real-Time Bidding anymore, it’s like a game of got-yer-nose… Got Your Bid!

Caching bids for profit, image source: https://www.pxfuel.com/en/free-photo-ojucq/

Buying Traffic

An oldie but a goldie, the ability to buy traffic probably launched more than a few online media companies. What’s a new publisher supposed to do with limited funds and a need to show traffic and advertising growth to the investors? Just buy traffic!

What does that even mean? Buying traffic… how do you even do that? Are there armies of users sitting around waiting to get paid pennies for browsing over to a website? No (well, yes, but “no” sounds better here). Various methods can be used to bring the appearance of popularity to a piece of media. It’s all kinda crap, but some ways are less crappy.

Pop-unders

Not to be confused with pop-overs, the tasty baked good, a pop-under is a new browser window that pops open, but underneath your current window. This new browser loads the target media, thereby adding to the traffic volume. Pop-unders were common before browsers started to police that popping activity. Thank you, browser companies, for reining in this mess. Was it fraud? Yeah, probably. Did you hit your numbers? *sigh*

Paying users

I know I said that there aren’t willing users sitting around waiting for someone to pay them to browse the web. That was a lie. There are willing users. Perpetrators (puppetmasters?) pay in bitcoin or some other microtransaction-friendly mechanism. Any kind of media can be sold to them, video, games, website impressions… whatever. I don’t know who they are, but they’re out there. Go digging around in the bitcoin world and you’ll inevitably stumble on an offer to watch videos or something in exchange for some coinage.

Bots and other bots

And more bots. One version is simple. It’s a headless bot running on a cloud server somewhere. You dump in a set of addresses and off they go, browsing away. They vary the clicks, cookies and so forth. They don’t identify themselves as bots and they do their very best to look human.

Other bots are more clever. They look more human because they’re hiding on a computer belonging to a real human. These bots mostly come out at night when their human host is asleep. Hijacking the browser, they will pull a fresh list of addresses to visit and go to work. To the internet, ad sellers, buy-side and sell-side tech this all looks pretty legit. It’s a human’s browser, from a human’s IP address, a human’s cookies and a human’s browsing history. All these factors make this bot look very human.

This type of traffic even stands up to most scrutiny. Audit all the activity of the browser and it will pass most tests (that can be conducted against it from afar). However, humans have to catch about eight hours of sleep in any 24-hour period. One of the ways to detect these bots is by looking at the complete activity of the browser and seeing the insomnia. (Sorry to those of you who actually suffer from insomnia, you kinda look like a bot.)
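The insomnia check can be sketched in Python like this. It assumes you can see every activity timestamp for a single browser over a 24-hour window, which is a lot to assume. The function names and the four-hour `min_rest` cutoff are hypothetical. A real check would look across many days, since sleep that straddles the window boundary gets split in two here.

```python
from datetime import datetime, timedelta

def longest_rest(timestamps, window_start, window=timedelta(hours=24)):
    """Longest stretch with no activity inside the window, edges included."""
    window_end = window_start + window
    inside = sorted(t for t in timestamps if window_start <= t <= window_end)
    points = [window_start, *inside, window_end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))

def looks_sleepless(timestamps, window_start, min_rest=timedelta(hours=4)):
    """True when the browser never goes quiet for at least min_rest (illustrative cutoff)."""
    return longest_rest(timestamps, window_start) < min_rest

day = datetime(2024, 1, 1)
# A browser that loads a page every 30 minutes, around the clock.
bot = [day + timedelta(minutes=30 * i) for i in range(48)]
# A browser active hourly from 08:00 to 22:00 and quiet overnight.
human = [day + timedelta(hours=h) for h in range(8, 23)]

bot_flag = looks_sleepless(bot, day)
human_flag = looks_sleepless(human, day)
```

In this toy data the round-the-clock browser is flagged and the one that rests overnight is not.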

The Bake Off

Let’s end this post back in the gray area. Raise your hand if you love bake-offs. I see a lot of hands. Oh, you’re thinking Paul Hollywood and Mary Berry are judging. No, not that kind of bake-off. I’m talking about the kind where an advertiser, agency or publisher wants to pit one tech vendor against another in head-to-head competition. Which will yield the better results?! Ugh.

It’s more like: which vendor can drop their margin to zero and pull out all the protections in order to hit the key measurements of the competition, only to then go back to business as usual once they win. Is it fraud? Uh… Is it reality? No. Does everyone know that it’s not real? Yes. Why do we do it? Because the RFP asked a bunch of irrelevant questions.

Wow. I didn’t realize I was so bitter about bake-offs. I may have over-proved it.
