Merge lp:~markjtully/gwibber/twitter-entities into lp:gwibber

Proposed by Mark Tully
Status: Merged
Merged at revision: 1299
Proposed branch: lp:~markjtully/gwibber/twitter-entities
Merge into: lp:gwibber
Diff against target: 441 lines (+228/-56)
4 files modified
gwibber/microblog/plugins/twitter/__init__.py (+132/-24)
gwibber/microblog/util/__init__.py (+69/-17)
libgwibber-gtk/stream-view-tile.vala (+5/-2)
libgwibber/streams.vala (+22/-13)
To merge this branch: bzr merge lp:~markjtully/gwibber/twitter-entities
Reviewer: Ken VanDine
Status: Approve
Review via email: mp+95821@code.launchpad.net

Description of the change

Adds support for Twitter tweet entities (see https://dev.twitter.com/docs/tweet-entities). This allows the following (a short sketch of the entity handling follows this list):
  Unwrapping t.co URLs so you can see where they lead, while keeping the original t.co link as the actual target for security (see https://support.twitter.com/entries/109623)
  Displaying URLs in tweets the same way twitter.com does (truncated if necessary, with the full link visible on mouseover)
  Displaying tweets containing links in the links stream
  Displaying tweets containing images in the images stream once again (with image previews)
  Displaying tweets containing videos in the videos stream (with thumbnails)
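
For reference, the snippet below is a minimal, self-contained sketch (Python 2, to match the plugin) of what the "urls" entity handling amounts to: the t.co link is kept as the href, the expanded URL is exposed in the title attribute, and the shortened display form is what the user sees. The linkify_urls helper and the sample tweet dict are invented for illustration only; the actual logic lives in gwibber/microblog/plugins/twitter/__init__.py in the diff below.

  import cgi

  def linkify_urls(text, entities):
      # Escape the raw tweet text first, as the plugin does
      content = cgi.escape(text)
      for info in entities.get("urls", []):
          url = cgi.escape(info["url"])                           # wrapped t.co link, kept as the href
          expanded = cgi.escape(info.get("expanded_url") or url)  # where the link really leads
          display = cgi.escape(info.get("display_url") or url)    # shortened form shown to the user
          anchor = "<a href='%s' title='%s'>%s</a>" % (url, expanded, display)
          content = content.replace(url, anchor, 1)
      return content

  tweet = {"text": "Nice shot http://t.co/abc123",
           "entities": {"urls": [{"url": "http://t.co/abc123",
                                  "expanded_url": "http://example.com/photos/1",
                                  "display_url": "example.com/photos/1"}]}}
  print linkify_urls(tweet["text"], tweet["entities"])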

Revision history for this message
Ken VanDine (ken-vandine) wrote :

Looks great!

review: Approve

Preview Diff

1=== modified file 'gwibber/microblog/plugins/twitter/__init__.py'
2--- gwibber/microblog/plugins/twitter/__init__.py 2012-02-13 20:39:02 +0000
3+++ gwibber/microblog/plugins/twitter/__init__.py 2012-03-05 00:08:19 +0000
4@@ -1,4 +1,5 @@
5 from gwibber.microblog import network, util
6+import cgi
7 from oauth import oauth
8 from gwibber.microblog.util import resources
9 from gettext import lgettext as _
10@@ -74,29 +75,129 @@
11 def _common(self, data):
12 m = {}
13 try:
14-
15 m["mid"] = str(data["id"])
16 m["service"] = "twitter"
17 m["account"] = self.account["id"]
18 if data.has_key("created_at"):
19 m["time"] = util.parsetime(data["created_at"])
20 m["text"] = util.unescape(data["text"])
21- m["to_me"] = ("@%s" % self.account["username"]) in data["text"]
22-
23- m["html"] = util.linkify(m["text"],
24- ((util.PARSE_HASH, '#<a class="hash" href="%s#search?q=\\1">\\1</a>' % URL_PREFIX),
25- (util.PARSE_NICK, '@<a class="nick" href="%s/\\1">\\1</a>' % URL_PREFIX)), escape=False)
26-
27- m["content"] = util.linkify(m["text"],
28- ((util.PARSE_HASH, '#<a href="gwibber:/tag?acct=%s&query=\\1">\\1</a>' % m["account"]),
29- (util.PARSE_NICK, '@<a href="gwibber:/user?acct=%s&name=\\1">\\1</a>' % m["account"])), escape=True)
30-
31- m["favorited"] = data.get("favorited", False)
32-
33- images = util.imgpreview(m["text"])
34- if images:
35- m["images"] = images
36- m["type"] = "photo"
37+ m["text"] = cgi.escape(m["text"])
38+ m["content"] = m["text"]
39+
40+ # Go through the entities in the tweet and use them to linkify/filter tweets as appropriate
41+ if data.has_key("entities"):
42+
43+ # Get user mention entities
44+ if data["entities"].has_key("user_mentions"):
45+ for mention in data["entities"]["user_mentions"]:
46+ try:
47+ screen_name = mention["screen_name"].lower()
48+ startindex = m["content"].lower().index("@" + screen_name) + 1
49+ endindex = startindex + len(screen_name)
50+ start = m["content"][0:startindex]
51+ end = m["content"][endindex:]
52+ m["content"] = start + "<a href='gwibber:/user?acct=" + m["account"] + "&name=@" + mention["screen_name"] + "'>" + mention["screen_name"] + "</a>" + end
53+ except:
54+ pass
55+
56+ # Get hashtag entities
57+ if data["entities"].has_key("hashtags"):
58+ for tags in data["entities"]["hashtags"]:
59+ try:
60+ text = tags["text"]
61+ startindex = m["content"].index("#" + text) + 1
62+ endindex = startindex + len(text)
63+ start = m["content"][0:startindex]
64+ end = m["content"][endindex:]
65+ m["content"] = start + "<a href='gwibber:/tag?acct=" + m["account"] + "&query=" + text + "'>" + text + "</a>" + end
66+ except:
67+ pass
68+
69+ # Get url entities - these usually go in the links stream, but pictures and videos should go in their own streams
70+ if data["entities"].has_key("urls"):
71+ for urls in data["entities"]["urls"]:
72+ url = cgi.escape (urls["url"])
73+ expanded_url = url
74+ if urls.has_key("expanded_url"):
75+ if not urls["expanded_url"] is None:
76+ expanded_url = cgi.escape(urls["expanded_url"])
77+
78+ display_url = url
79+ if urls.has_key("display_url"):
80+ display_url = cgi.escape (urls["display_url"])
81+
82+ if url == m["content"]:
83+ m["content"] = "<a href='" + url + "' title=" + expanded_url + "'>" + display_url + "</a>"
84+ else:
85+ try:
86+ startindex = m["content"].index(url)
87+ endindex = startindex + len(url)
88+ start = m["content"][0:startindex]
89+ end = m["content"][endindex:]
90+ m["content"] = start + "<a href='" + url + "' title=" + expanded_url + "'>" + display_url + "</a>" + end
91+ except:
92+ logger.debug ("Failed to set url for ID: %s", m["mid"])
93+
94+ m["type"] = "link"
95+
96+ images = util.imgpreview(expanded_url)
97+ videos = util.videopreview(expanded_url)
98+ if images:
99+ m["images"] = images
100+ m["type"] = "photo"
101+ elif videos:
102+ m["images"] = videos
103+ m["type"] = "video"
104+ else:
105+ # Not an image or a video, so treat it as a plain link
106+ m["link"] = {}
107+ m["link"]["picture"] = ""
108+ m["link"]["name"] = ""
109+ m["link"]["description"] = m["content"]
110+ m["link"]["url"] = url
111+ m["link"]["icon"] = ""
112+ m["link"]["caption"] = ""
113+ m["link"]["properties"] = {}
114+
115+ if data["entities"].has_key("media"):
116+ for media in data["entities"]["media"]:
117+ try:
118+ url = cgi.escape (media["url"])
119+ media_url_https = media["media_url_https"]
120+ expanded_url = url
121+ if media.has_key("expanded_url"):
122+ expanded_url = cgi.escape(media["expanded_url"])
123+
124+ display_url = url
125+ if media.has_key("display_url"):
126+ display_url = cgi.escape (media["display_url"])
127+
128+ startindex = m["content"].index(url)
129+ endindex = startindex + len(url)
130+ start = m["content"][0:startindex]
131+ end = m["content"][endindex:]
132+ m["content"] = start + "<a href='" + url + "' title=" + expanded_url + "'>" + display_url + "</a>" + end
133+
134+ if media["type"] == "photo":
135+ m["type"] = "photo"
136+ m["photo"] = {}
137+ m["photo"]["picture"] = media_url_https
138+ m["photo"]["url"] = None
139+ m["photo"]["name"] = None
140+
141+ except:
142+ pass
143+
144+ else:
145+ m["content"] = util.linkify(util.unescape(m["text"]),
146+ ((util.PARSE_HASH, '#<a href="gwibber:/tag?acct=%s&query=\\1">\\1</a>' % m["account"]),
147+ (util.PARSE_NICK, '@<a href="gwibber:/user?acct=%s&name=\\1">\\1</a>' % m["account"])), escape=True)
148+
149+ m["html"] = m["content"]
150+
151+ m["to_me"] = ("@%s" % self.account["username"]) in data["text"] # Check if it's a reply directed at the user
152+ m["favorited"] = data.get("favorited", False) # Check if the tweet has been favourited
153+
154 except:
155 logger.error("%s failure - %s", PROTOCOL_INFO["name"], data)
156 return {}
157@@ -158,9 +259,16 @@
158
159 return m
160
161+ def _responses(self, data):
162+ m = self._message(data)
163+ m["type"] = None
164+
165+ return m
166+
167 def _private(self, data):
168 m = self._message(data)
169 m["private"] = True
170+ m["type"] = None
171
172 m["recipient"] = {}
173 m["recipient"]["name"] = data["recipient"]["name"]
174@@ -307,18 +415,18 @@
175 return getattr(self, opname)(**args)
176
177 def receive(self, count=util.COUNT, since=None):
178- return self._get("statuses/home_timeline.json", count=count, since_id=since)
179+ return self._get("statuses/home_timeline.json", include_entities=1, count=count, since_id=since)
180
181 def responses(self, count=util.COUNT, since=None):
182- return self._get("statuses/mentions.json", count=count, since_id=since)
183+ return self._get("statuses/mentions.json", "responses", include_entities=1, count=count, since_id=since)
184
185 def private(self, count=util.COUNT, since=None):
186- private = self._get("direct_messages.json", "private", count=count, since_id=since) or []
187+ private = self._get("direct_messages.json", "private", include_entities=1, count=count, since_id=since) or []
188 private_sent = self._get("direct_messages/sent.json", "private", count=count, since_id=since) or []
189 return private + private_sent
190
191 def public(self):
192- return self._get("statuses/public_timeline.json")
193+ return self._get("statuses/public_timeline.json", include_entities=1)
194
195 def lists(self, **args):
196 following = self._get("%s/lists/subscriptions.json" % self.account["username"], "list") or []
197@@ -326,10 +434,10 @@
198 return following + lists
199
200 def list(self, user, id, count=util.COUNT, since=None):
201- return self._get("%s/lists/%s/statuses.json" % (user, id), per_page=count, since_id=since)
202+ return self._get("%s/lists/%s/statuses.json" % (user, id), include_entities=1, per_page=count, since_id=since)
203
204 def search(self, query, count=util.COUNT, since=None):
205- return self._search(q=query, rpp=count, since_id=since)
206+ return self._search(include_entities=1, q=query, rpp=count, since_id=since)
207
208 def tag(self, query, count=util.COUNT, since=None):
209 return self._search(q="#%s" % query, count=count, since_id=since)
210@@ -366,5 +474,5 @@
211
212 def user_messages(self, id=None, count=util.COUNT, since=None):
213 profiles = [self.profile(id)] or []
214- messages = self._get("statuses/user_timeline.json", id=id, count=count, since_id=since) or []
215+ messages = self._get("statuses/user_timeline.json", id=id, include_entities=1, count=count, since_id=since) or []
216 return messages + profiles
217
218=== modified file 'gwibber/microblog/util/__init__.py'
219--- gwibber/microblog/util/__init__.py 2012-02-13 20:39:02 +0000
220+++ gwibber/microblog/util/__init__.py 2012-03-05 00:08:19 +0000
221@@ -1,4 +1,4 @@
222-import os, locale, re, mx.DateTime, cgi
223+import os, locale, re, mx.DateTime, cgi, httplib2
224 import resources
225 import dbus
226 from const import *
227@@ -65,6 +65,37 @@
228 return re.compile(r'"www.', re.U).sub('"http://www.', link)
229
230 def imgpreview(text):
231+ images = []
232+
233+ # If the text is a direct link to an image file
234+ if text.endswith((".jpg", ".gif", ".png", ".bmp")):
235+ images.append({"src": text, "url": text})
236+ return images
237+
238+ # For pic.twitter.com images not wrapped in media entities
239+ if "pic.twitter.com" in text:
240+ # Annoyingly, we have to scrape the page of the tweet to get the actual image location
241+ # The mobile site has smaller pages, so we'll use that
242+ page = text.replace("/photo/1", "")
243+ page = page.replace("http://", "http://mobile.")
244+
245+ resp, content = httplib2.Http().request(page)
246+ start = content.index("http://p.twimg.com")
247+ end = content.index(':small"><img') + 6
248+
249+ image = content[start:end]
250+ images.append({"src": image, "url": text})
251+ return images
252+
253+ if "instagr.am" in text:
254+ # The image URL is returned in the Content-Location header of the short link
255+ thumb = text + "media/?size=m"
256+ resp, content = httplib2.Http().request(thumb)
257+ thumb = resp["content-location"]
258+
259+ images.append({"src": thumb, "url": text})
260+ return images
261+
262 thumbre = {
263 'twitpic': 'http://.*twitpic.com/(?!photos)([A-Za-z0-9]+)',
264 'img.gd': 'http://img.gd/(?!photos)([A-Za-z0-9]+)',
265@@ -72,12 +103,10 @@
266 'twitgoo': 'http://.*twitgoo.com/(?!u/)([A-Za-z0-9]+)',
267 'yfrog.us': 'http://.*yfrog.us/(?!froggy)([A-Za-z0-9]+)',
268 'yfrog.com': 'http://.*yfrog.com/(?!froggy)([A-Za-z0-9]+)',
269- 'twitvid': 'http://.*twitvid.com/(?!videos)([A-Za-z0-9]+)',
270 'img.ly': 'http://img.ly/(?!images)([A-Za-z0-9]+)',
271 'flic.kr': 'http://flic.kr/p/([A-Za-z0-9]+)',
272- 'youtu.be': 'http://youtu.be/([A-Za-z0-9-_]+)',
273- 'youtube.com': 'http://.*youtube.com/watch\?v=([A-Za-z0-9-_]+)',
274- 'tweetphoto': 'http://.*tweetphoto.com/(0-9]+)',
275+ 'tweetphoto': 'http://.*tweetphoto.com/([0-9]+)',
276+ 'plixi': 'http://plixi.com/p/([0-9]+)',
277 'pic.gd': 'http://pic.gd/([A-Za-z0-9]+)',
278 'brizzly': 'http://.*brizzly.com/pic/([A-Za-z0-9]+)',
279 'twitxr': 'http://.*twitxr.com\/[^ ]+\/updates\/([0-9]+)',
280@@ -89,22 +118,19 @@
281 'moby.to': 'http://moby.to/([A-Za-z0-9]+)',
282 'movapic': 'http://.*movapic.com/pic/([A-Za-z0-9]+)',
283 'znl.me': 'http://znl.me/([A-Za-z0-9-_]+)',
284- 'bcphotoshare': 'http://.*bcphotoshare.com/photos/[0-9]+/([0-9]+)',
285- 'twitvideo.jp': 'http://.*twitvideo.jp/(?!contents)([A-Za-z0-9-_]+)'
286+ 'bcphotoshare': 'http://.*bcphotoshare.com/photos/[0-9]+/([0-9]+)'
287 }
288 thumburi = {
289 'twitpic': 'http://twitpic.com/show/thumb/@',
290 'img.gd': 'http://img.gd/show/thumb/@',
291 'imgur': 'http://i.imgur.com/@s.jpg',
292 'twitgoo': 'http://twitgoo.com/show/thumb/@',
293- 'yfrog.us': 'http://yfrog.us/@.th.jpg',
294- 'yfrog.com': 'http://yfrog.com/@.th.jpg',
295- 'twitvid': 'http://images.twitvid.com/@.jpg',
296+ 'yfrog.us': 'http://yfrog.us/@:iphone',
297+ 'yfrog.com': 'http://yfrog.com/@:iphone',
298 'img.ly': 'http://img.ly/show/thumb/@',
299 'flic.kr': 'http://flic.kr/p/img/@_m.jpg',
300- 'youtu.be': 'http://img.youtube.com/vi/@/default.jpg',
301- 'youtube.com': 'http://img.youtube.com/vi/@/default.jpg',
302 'tweetphoto': 'http://TweetPhotoAPI.com/api/TPAPI.svc/json/imagefromurl?size=thumbnail&url=@',
303+ 'plixi': 'http://api.plixi.com/api/tpapi.svc/imagefromurl?size=thumbnail&url=@',
304 'pic.gd': 'http://TweetPhotoAPI.com/api/TPAPI.svc/json/imagefromurl?size=thumbnail&url=@',
305 'brizzly': 'http://pics.brizzly.com/thumb_sm_@.jpg',
306 'twitxr': 'http://twitxr.com/image/@/th/',
307@@ -116,19 +142,45 @@
308 'moby.to': 'http://api.mobypicture.com?s=small&format=plain&k=6JQhCKX6Z9h2m9Lo&t=@',
309 'movapic': 'http://image.movapic.com/pic/s_@.jpeg',
310 'znl.me': 'http://app.zannel.com/content/@/Image-160x120-P-JPG.jpg',
311- 'bcphotoshare': 'http://images.bcphotoshare.com/storages/@/thumbnail.jpg',
312- 'twitvideo.jp': 'http://twitvideo.jp/img/thumb/@'
313+ 'bcphotoshare': 'http://images.bcphotoshare.com/storages/@/thumbnail.jpg'
314 }
315
316- images = []
317+
318 for r, u in zip(thumbre, thumburi):
319 for match in re.finditer(thumbre[r], text):
320 if r == 'tweetphoto' or r == 'pic.gd' or r == 'moby.to':
321- images.append({"src": thumburi[u].replace('@', match.group(0)) , "url": match.group(0)})
322+ images.append({"src": thumburi[u].replace('@', match.group(0)) , "url": text})
323 else:
324- images.append({"src": thumburi[u].replace('@', match.group(1)) , "url": match.group(0)})
325+ images.append({"src": thumburi[u].replace('@', match.group(1)) , "url": text})
326 return images
327
328+def videopreview(text):
329+ videos = []
330+
331+ thumbre = {
332+ 'twitvid': 'http://.*twitvid.com/(?!videos)([A-Za-z0-9]+)',
333+ 'youtu.be': 'http://youtu.be/([A-Za-z0-9-_]+)',
334+ 'youtube.com': 'http://.*youtube.com/watch\?v=([A-Za-z0-9-_]+)',
335+ 'twitvideo.jp': 'http://.*twitvideo.jp/(?!contents)([A-Za-z0-9-_]+)'
336+ }
337+ thumburi = {
338+ 'twitvid': 'http://images.twitvid.com/@.jpg',
339+ 'youtu.be': 'http://img.youtube.com/vi/@/0.jpg',
340+ 'youtube.com': 'http://img.youtube.com/vi/@/0.jpg',
341+ 'twitvideo.jp': 'http://twitvideo.jp/img/thumb/@'
342+ }
343+ thumbvid = {
344+ 'twitvid': 'http://.*twitvid.com/@',
345+ 'youtu.be': 'http://www.youtube.com/watch?v=@',
346+ 'youtube.com': 'http://www.youtube.com/watch?v=@',
347+ 'twitvideo.jp': 'http://www.twitvideo.jp/@'
348+ }
349+
350+ for r, u in zip(thumbre, thumburi):
351+ for match in re.finditer(thumbre[r], text):
352+ videos.append({ "src": thumburi[u].replace('@', match.group(1)), "url" : text})
353+ return videos
354+
355 def compact(data):
356 if isinstance(data, dict):
357 return dict([(x, y) for x,y in data.items() if y])
358
359=== modified file 'libgwibber-gtk/stream-view-tile.vala'
360--- libgwibber-gtk/stream-view-tile.vala 2012-02-23 08:59:27 +0000
361+++ libgwibber-gtk/stream-view-tile.vala 2012-03-05 00:08:19 +0000
362@@ -450,7 +450,10 @@
363 }
364 else if (_stream == "videos")
365 {
366- img_uri = _video_picture;
367+ if (_video_picture.length < 1 && _img_src.length > 0)
368+ img_uri = _img_src;
369+ else
370+ img_uri = _video_picture;
371 img_src = _video_src;
372 }
373
374@@ -949,7 +952,7 @@
375 var last = uri.substring(uri.last_index_of("/") + 1);
376 ret = "http://i.imgur.com/%s.png".printf(last);
377 }
378- else if (uri.contains("youtube.com"))
379+ else if (uri.contains("youtube.com") && !uri.contains("img.youtube.com"))
380 {
381 string id = uri.substring(uri.last_index_of("/") + 1);
382
383
384=== modified file 'libgwibber/streams.vala'
385--- libgwibber/streams.vala 2012-02-23 04:46:15 +0000
386+++ libgwibber/streams.vala 2012-03-05 00:08:19 +0000
387@@ -675,25 +675,16 @@
388 /* escape markup in some strings, pango doesn't like it */
389 if (_link_name != null)
390 _link_name = GLib.Markup.escape_text (_link_name);
391- if (_link_description != null)
392- _link_description = GLib.Markup.escape_text (_link_description);
393 if (_image_name != null)
394 _image_name = GLib.Markup.escape_text (_image_name);
395 if (_video_name != null)
396 _video_name = GLib.Markup.escape_text (_video_name);
397
398- /* FIXME: hacky scrubbing of the html, we should find a
399- better way */
400 if (_html != null)
401- _html = _html.replace("&query", "&amp;query");
402- _html = _html.replace("&name", "&amp;name");
403- _html = _html.replace("class=\"nick\"", "");
404- _html = _html.replace("class=\"hash\"", "");
405- _html = _html.replace("<p>", "");
406- _html = _html.replace("</p>", "");
407- _html = _html.replace("<b>", "");
408- _html = _html.replace("</b>", "");
409- //debug ("_html: %s", _html);
410+ _html = scrub (_html);
411+
412+ if (_link_description != null)
413+ _link_description = scrub (_link_description);
414
415 string _t = utils.generate_time_string(_time);
416
417@@ -769,6 +760,24 @@
418 //debug ("_model has %u ROWS", _model.get_n_rows ());
419 }
420
421+ private string scrub (string content)
422+ {
423+ /* FIXME: hacky scrubbing of the html, we should find a
424+ better way */
425+ string res = content;
426+ res = res.replace("&query", "&amp;query");
427+ res = res.replace("&name", "&amp;name");
428+ res = res.replace("class=\"nick\"", "");
429+ res = res.replace("class=\"hash\"", "");
430+ res = res.replace("<p>", "");
431+ res = res.replace("</p>", "");
432+ res = res.replace("<b>", "");
433+ res = res.replace("</b>", "");
434+ //debug ("res: %s", res);
435+ return res;
436+ }
437+
438+
439
440 /**
441 * com.Gwibber.Streams
