I've not yet heard of anyone attempting to offer this. There are good reasons why. In simple terms:
A = interpretation (speech), which is done live, on the spot, with a lag of maybe a couple of seconds at most.
B = translation (text), which takes much more time and is a completely separate industry.
C = stenography, mostly seen in courts and the like, where someone writes or types out what is being said in real time, in the same language.
If you want an event translated into English subtitles (text) in real time, you're basically asking for A but typed out rather than spoken, i.e. A + C. The typing step takes extra time and makes the result less 'live'. You'll also be transcribing spoken language, so the subtitles will read a bit like US-style event transcripts (see this for an example).
A + C is not impossible, but it's a setup I haven't seen used anywhere yet. When you see people interviewed on the news with subtitles in a different language, those subtitles are prepared ahead of time, not done live.
It's probably a lot simpler (especially in technical terms) to just go for A and offer people the chance to listen to the event in English. Here are two AIIC members in Hong Kong who can set that up for you:
Tze Kong HO
Philip Shing Tak LEONG