Merge lp:~cjwatson/meliae/py3-loader-source-bytes into lp:meliae

Proposed by Colin Watson
Status: Merged
Approved by: John A Meinel
Approved revision: 224
Merged at revision: 226
Proposed branch: lp:~cjwatson/meliae/py3-loader-source-bytes
Merge into: lp:meliae
Diff against target: 523 lines (+152/-100)
4 files modified
meliae/_loader.pyx (+16/-4)
meliae/loader.py (+35/-24)
meliae/tests/test__loader.py (+4/-1)
meliae/tests/test_loader.py (+97/-71)
To merge this branch: bzr merge lp:~cjwatson/meliae/py3-loader-source-bytes
Reviewer: John A Meinel (Approve)
Review via email: mp+378581@code.launchpad.net

Commit message

Ensure coherent bytes/text handling of sources in meliae.loader.

Description of the change

meliae.files.open_file (called by load if given a string file name) opens files in binary mode, so it seems most appropriate for the loader tests to pass in dumps as bytes. Store fields as bytes internally where necessary for a compact memory representation of dumps.
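
As a usage sketch (the dump file name is a placeholder; the load calls mirror the tests below): load accepts either a file name, which meliae.files.open_file opens in binary mode, or an iterable of pre-read lines, which after this change should be bytes.

from meliae import loader

# File name: meliae.files.open_file opens it in binary mode, so the loader sees bytes.
om = loader.load('dump.json', show_prog=False)

# Pre-read lines: supply them as bytes to match what open_file would yield.
om = loader.load(
    [b'{"address": 1234, "type": "int", "size": 12, "value": 10, "refs": []}'],
    show_prog=False)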

Revision history for this message
John A Meinel (jameinel) wrote :

This is an interesting one. I think fundamentally we should just drop the _from_line decoder, since 'json' is now a core part of the Python stdlib.
The question remains whether we want to leverage the fact that bytes objects might be more memory-efficient than PyUnicode objects.

>>> import sys
>>> for i in [0, 1, 10, 50, 500]:
...     x = b'1'*i
...     y = '1'*i
...     print(i, sys.getsizeof(x), sys.getsizeof(y))
...
   0 33 49
   1 34 50
  10 43 59
  50 83 99
500 533 549

It is a modest win at small sizes (59 vs 43 bytes at length 10 is 37% more overhead).

I don't know how much it really matters, but I do remember playing a lot of tricks to make it easier to look at a large memory dump without then needing at least as much memory as you were using live. (It's why there is a MemObjectCollection, which is functionally a dict but typed, and why there are proxy objects that live just long enough to look like a fleshed-out Python object instead of just a C struct.)

Revision history for this message
Colin Watson (cjwatson) wrote :

I hadn't considered the question of large dumps. In that case the difference in size between bytes and text isn't quite the point; the issue would be that decoding would add another copy of the data in memory. Indeed, while (simple)json.loads is a more accurate parser, it unpacks each line into a much less memory-efficient representation along the way. This is only an issue if there are single objects in the dump that are very large (e.g. a large string), but of course that's quite possible.

So I guess if I just arranged for the _from_line decoder to cope with bytes to save the extra intermediate representation and dropped the line.decode call, that would be good enough? (json.loads accepts either bytes or text on Python 3, so there's no type-safety issue for _from_json.)
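
A minimal sketch of that last point, assuming Python 3.6 or later (where json.loads gained support for bytes input):

import json

# json.loads accepts text everywhere and bytes since Python 3.6, so the
# loader can hand it raw dump lines without decoding them first.
line_bytes = b'{"address": 1234, "type": "int", "size": 12, "value": 10, "refs": []}'
line_text = line_bytes.decode('UTF-8')
assert json.loads(line_bytes) == json.loads(line_text)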

Revision history for this message
John A Meinel (jameinel) wrote :

So I believe the file format is such that it *could* be loaded with just one big json.loads(), but you don't get any progress reporting, etc., when doing that. Since it is just [\n{data},\n{data},...], I believe you can load each line separately with json, which is what I thought I was doing (trimming off the trailing comma).

I'm happy to continue doing so, as any given line isn't going to be a major overhead, and the memory savings come from putting it into a compacted data structure. (Though that followed the Python 2 dict model of a table with holes in it, versus the Python 3 model of a table with no holes plus an index with holes.)

I think the scanner already truncates long strings so that you only see the prefix, so we shouldn't have to worry about that too much.
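
For reference, a minimal sketch of the per-line loading described above, assuming a dump laid out as [\n{obj},\n{obj},\n...] and a hypothetical iter_dump helper:

import json

def iter_dump(path):
    # Each body line of the dump is one JSON object followed by ",\n", so we
    # can parse line by line instead of doing one big json.loads().
    with open(path, 'rb') as f:
        for line in f:
            if line in (b'[\n', b']\n'):
                continue            # skip the opening/closing list brackets
            if line.endswith(b',\n'):
                line = line[:-2]    # trim the trailing comma
            yield json.loads(line)  # bytes input needs Python 3.6+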

222. By Colin Watson

Use more compact representations when loading dumps.

The natural representations of the "type_str", "name", and (in some cases)
"value" fields of objects loaded from dumps would be str, but on Python 3
this is a somewhat less efficient representation of ASCII strings, and for
dumps of large processes a compact representation may well matter. To that
end, use bytes where possible.

In general I've tried to confine this to just the highest-volume objects.
For example, meliae.loader._TypeSummary doesn't need to be as dense, so
convenience makes more sense there and _TypeSummary.type_str is a str.

Some methods gain affordances to encode from or decode to str for
convenience, where doing so doesn't cause other problems.
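
A hypothetical illustration of that last point (not meliae's actual API): store the field as bytes for density, and decode to str only when a caller asks for text.

class _DenseObj(object):
    __slots__ = ('_name',)

    def __init__(self, name):
        # Accept either str or bytes, but keep bytes internally for compactness.
        self._name = name.encode('UTF-8') if isinstance(name, str) else name

    @property
    def name_str(self):
        # Decode lazily, only when text is actually wanted.
        return self._name.decode('UTF-8')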

I extended meliae.tests.test_loader._example_dump to include both bytes and
text objects, in order to test the slightly different representations of
each.

Revision history for this message
Colin Watson (cjwatson) wrote :

OK, it took me a while to get my head around what you were driving at here, but I think I now understand. How's this? I've turned type_str and name (and sometimes value) back into bytes objects, and everything should now be more compact again even on Python 3.

Revision history for this message
John A Meinel (jameinel) wrote :

I'm happy to land this as is. I'm also willing for you to push back and say "the memory saving probably isn't worth disrupting people using the library who end up seeing b'' strings where they just expect strings".

The fact that you end up casting type_str back to a string in places makes me wonder.
I'll wait to merge until I hear back from you.

review: Approve
223. By Colin Watson

Merge trunk.

224. By Colin Watson

Store type_str as an interned str rather than bytes.

Type strings are likely to be drawn from a relatively small pool, so this
still saves memory for large dumps while being more convenient to deal with
than bytes on Python 3.
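
A minimal sketch of the effect, assuming CPython: equal type strings built at runtime are normally distinct objects, while sys.intern collapses them into one shared object, so thousands of repeated 'dict'/'tuple'/'str' entries cost one str each.

import sys

a = ''.join(['d', 'i', 'c', 't'])  # built at runtime, so not automatically interned
b = ''.join(['d', 'i', 'c', 't'])
print(a == b, a is b)                    # True False: equal but separate objects
print(sys.intern(a) is sys.intern(b))    # True: one shared object per distinct type string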

Revision history for this message
Colin Watson (cjwatson) wrote :

As discussed on IRC, I've made type_str an interned str instead, which indeed makes things generally easier to deal with. I think keeping the less-shareable fields as bytes rather than str is reasonably justifiable in the context of a memory-debugging tool, though; it's still a little surprising, but seems tolerable.

Preview Diff

1=== modified file 'meliae/_loader.pyx'
2--- meliae/_loader.pyx 2020-02-03 14:38:41 +0000
3+++ meliae/_loader.pyx 2020-05-05 11:04:32 +0000
4@@ -54,9 +54,15 @@
5 PyObject *val) except -1
6
7 import gc
8+import sys
9+
10 from meliae import warn
11
12
13+if sys.version_info[0] >= 3:
14+ intern = sys.intern
15+
16+
17 ctypedef struct RefList:
18 long size
19 PyObject *refs[0]
20@@ -176,6 +182,7 @@
21 addr = <PyObject *>address
22 Py_XINCREF(addr)
23 new_entry.address = addr
24+ type_str = intern(type_str)
25 new_entry.type_str = <PyObject *>type_str
26 Py_XINCREF(new_entry.type_str)
27 new_entry.size = size
28@@ -550,9 +557,12 @@
29 else:
30 # TODO: This isn't perfect, as it doesn't do proper json
31 # escaping
32- if '"' in self.value:
33- raise AssertionError(self.value)
34- value = '"value": "%s", ' % self.value
35+ text_value = self.value
36+ if sys.version_info[0] >= 3 and isinstance(text_value, bytes):
37+ text_value = text_value.decode('latin-1')
38+ if '"' in text_value:
39+ raise AssertionError(text_value)
40+ value = '"value": "%s", ' % text_value
41 else:
42 value = ''
43 return '{"address": %d, "type": "%s", "size": %d, %s"refs": [%s]}' % (
44@@ -579,7 +589,9 @@
45 # a tuple/dict/etc
46 if val.type_str == 'bool':
47 val = (val.value == 'True')
48- elif val.type_str in ('int', 'long', 'str', 'unicode', 'float',
49+ elif val.type_str in ('int', 'long',
50+ 'bytes', 'str', 'unicode',
51+ 'float',
52 ) and val.value is not None:
53 val = val.value
54 elif val.type_str == 'NoneType':
55
56=== modified file 'meliae/loader.py'
57--- meliae/loader.py 2020-03-11 20:33:25 +0000
58+++ meliae/loader.py 2020-05-05 11:04:32 +0000
59@@ -38,6 +38,9 @@
60 )
61
62
63+if sys.version_info[0] >= 3:
64+ intern = sys.intern
65+
66 timer = time.time
67 if sys.platform == 'win32':
68 timer = time.clock
69@@ -46,35 +49,39 @@
70 # faster than simplejson without extensions, though slower than simplejson w/
71 # extensions.
72 _object_re = re.compile(
73- r'\{"address": (?P<address>\d+)'
74- r', "type": "(?P<type>[^"]*)"'
75- r', "size": (?P<size>\d+)'
76- r'(, "name": "(?P<name>.*)")?'
77- r'(, "len": (?P<len>\d+))?'
78- r'(, "value": (?P<valuequote>"?)(?P<value>.*)(?P=valuequote))?'
79- r', "refs": \[(?P<refs>[^]]*)\]'
80- r'\}')
81+ br'\{"address": (?P<address>\d+)'
82+ br', "type": "(?P<type>[^"]*)"'
83+ br', "size": (?P<size>\d+)'
84+ br'(, "name": "(?P<name>.*)")?'
85+ br'(, "len": (?P<len>\d+))?'
86+ br'(, "value": (?P<valuequote>"?)(?P<value>.*)(?P=valuequote))?'
87+ br', "refs": \[(?P<refs>[^]]*)\]'
88+ br'\}')
89
90 _refs_re = re.compile(
91- r'(?P<ref>\d+)'
92+ br'(?P<ref>\d+)'
93 )
94
95
96 def _from_json(cls, line, temp_cache=None):
97 val = simplejson.loads(line)
98 # simplejson likes to turn everything into unicode strings, but we know
99- # everything is just a plain 'str', and we can save some bytes if we
100- # cast it back
101+ # everything is just plain ASCII, and we can save some bytes if we cast
102+ # things back to `bytes`. This is a little surprising on Python 3, but
103+ # it makes it easier to deal with large dumps.
104+ name = val.get('name', None)
105+ if name is not None and isinstance(name, six.text_type):
106+ name = name.encode('ASCII')
107 obj = cls(address=val['address'],
108- type_str=str(val['type']),
109+ type_str=intern(str(val['type'])),
110 size=val['size'],
111 children=val['refs'],
112 length=val.get('len', None),
113 value=val.get('value', None),
114- name=val.get('name', None))
115- if (obj.type_str == 'str'):
116- if type(obj.value) is unicode:
117- obj.value = obj.value.encode('latin-1')
118+ name=name)
119+ if (obj.type_str != six.text_type.__name__ and
120+ isinstance(obj.value, six.text_type)):
121+ obj.value = obj.value.encode('latin-1')
122 if temp_cache is not None:
123 obj._intern_from_cache(temp_cache)
124 return obj
125@@ -87,9 +94,11 @@
126 (address, type_str, size, name, length, value,
127 refs) = m.group('address', 'type', 'size', 'name', 'len',
128 'value', 'refs')
129+ if not isinstance(type_str, str):
130+ type_str = type_str.decode('UTF-8')
131 assert '\\' not in type_str
132 if name is not None:
133- assert '\\' not in name
134+ assert b'\\' not in name
135 if length is not None:
136 length = int(length)
137 refs = [int(val) for val in _refs_re.findall(refs)]
138@@ -105,9 +114,8 @@
139 length=length,
140 value=value,
141 name=name)
142- if (obj.type_str == 'str'):
143- if type(obj.value) is unicode:
144- obj.value = obj.value.encode('latin-1')
145+ if obj.type_str == six.text_type.__name__ and isinstance(obj.value, bytes):
146+ obj.value = obj.value.decode('latin-1')
147 if temp_cache is not None:
148 obj._intern_from_cache(temp_cache)
149 return obj
150@@ -443,7 +451,10 @@
151 obj.size = obj.size + dict_obj.size
152 obj.total_size = 0
153 if obj.type_str == 'instance':
154- obj.type_str = type_obj.value
155+ instance_type_str = type_obj.value
156+ if not isinstance(instance_type_str, str):
157+ instance_type_str = instance_type_str.decode('UTF-8')
158+ obj.type_str = instance_type_str
159 # Now that all the data has been moved into the instance, we
160 # will want to remove the dict from the collection. We'll do the
161 # actual deletion later, since we are using iteritems for this
162@@ -576,7 +587,7 @@
163 input_mb = input_size / 1024. / 1024.
164 temp_cache = {}
165 address_re = re.compile(
166- r'{"address": (?P<address>\d+)'
167+ br'{"address": (?P<address>\d+)'
168 )
169 bytes_read = count = 0
170 last = 0
171@@ -589,9 +600,9 @@
172 factory = _loader._MemObjectProxy_from_args
173 for line_num, line in enumerate(source):
174 bytes_read += len(line)
175- if line in ("[\n", "]\n"):
176+ if line in (b"[\n", b"]\n"):
177 continue
178- if line.endswith(',\n'):
179+ if line.endswith(b',\n'):
180 line = line[:-2]
181 if objs:
182 # Skip duplicate objects
183
184=== modified file 'meliae/tests/test__loader.py'
185--- meliae/tests/test__loader.py 2020-03-11 20:54:23 +0000
186+++ meliae/tests/test__loader.py 2020-05-05 11:04:32 +0000
187@@ -354,7 +354,10 @@
188 mop.children = [addr876542+1, addr654320+1]
189 mop.parents = [addr876542+1, addr654320+1]
190 self.assertFalse(mop.address is addr)
191- self.assertFalse(mop.type_str is t)
192+ # type_str always gets interned, so mop.type_str is identical to the
193+ # cached object even though its input string isn't.
194+ self.assertFalse(type_str is t)
195+ self.assertTrue(mop.type_str is t)
196 rl = mop.children
197 self.assertFalse(rl[0] is addr876543)
198 self.assertFalse(rl[1] is addr654321)
199
200=== modified file 'meliae/tests/test_loader.py'
201--- meliae/tests/test_loader.py 2020-01-29 13:19:59 +0000
202+++ meliae/tests/test_loader.py 2020-05-05 11:04:32 +0000
203@@ -19,6 +19,8 @@
204 import sys
205 import tempfile
206
207+import six
208+
209 from meliae import (
210 _loader,
211 loader,
212@@ -32,22 +34,26 @@
213 # a@5 = 1
214 # b@4 = 2
215 # c@6 = 'a str'
216-# t@7 = (a, b)
217+# u@8 = u'a unicode'
218+# t@7 = (a, b, u)
219 # d@2 = {a:b, c:t}
220-# l@3 = [a, b]
221+# l@3 = [a, b, u]
222 # l.append(l)
223 # outer@1 = (d, l)
224 _example_dump = [
225 '{"address": 1, "type": "tuple", "size": 20, "len": 2, "refs": [2, 3]}',
226-'{"address": 3, "type": "list", "size": 44, "len": 3, "refs": [3, 4, 5]}',
227+'{"address": 3, "type": "list", "size": 44, "len": 3, "refs": [3, 4, 5, 8]}',
228 '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}',
229 '{"address": 4, "type": "int", "size": 12, "value": 2, "refs": []}',
230 '{"address": 2, "type": "dict", "size": 124, "len": 2, "refs": [4, 5, 6, 7]}',
231-'{"address": 7, "type": "tuple", "size": 20, "len": 2, "refs": [4, 5]}',
232-'{"address": 6, "type": "str", "size": 29, "len": 5, "value": "a str"'
233- ', "refs": []}',
234-'{"address": 8, "type": "module", "size": 60, "name": "mymod", "refs": [2]}',
235+'{"address": 7, "type": "tuple", "size": 20, "len": 2, "refs": [4, 5, 8]}',
236+'{"address": 6, "type": "%s", "size": 29, "len": 5, "value": "a str"'
237+ ', "refs": []}' % bytes.__name__,
238+'{"address": 8, "type": "%s", "size": 88, "len": 9, "value": "a unicode"'
239+ ', "refs": []}' % six.text_type.__name__,
240+'{"address": 9, "type": "module", "size": 60, "name": "mymod", "refs": [2]}',
241 ]
242+_example_dump = [line.encode('ASCII') for line in _example_dump]
243
244 # Note that this doesn't have a complete copy of the references. Namely when
245 # you subclass object you get a lot of references, and type instances also
246@@ -72,6 +78,7 @@
247 '{"address": 14, "type": "module", "size": 28, "name": "sys", "refs": [15]}',
248 '{"address": 15, "type": "dict", "size": 140, "len": 2, "refs": [5, 6, 9, 6]}',
249 ]
250+_instance_dump = [line.encode('ASCII') for line in _instance_dump]
251
252 _old_instance_dump = [
253 '{"address": 1, "type": "instance", "size": 36, "refs": [2, 3]}',
254@@ -86,6 +93,7 @@
255 ', "refs": []}',
256 '{"address": 8, "type": "tuple", "size": 28, "len": 0, "refs": []}',
257 ]
258+_old_instance_dump = [line.encode('ASCII') for line in _old_instance_dump]
259
260 _intern_dict_dump = [
261 '{"address": 2, "type": "str", "size": 25, "len": 1, "value": "a", "refs": []}',
262@@ -96,6 +104,7 @@
263 '{"address": 7, "type": "dict", "size": 512, "refs": [6, 6, 5, 5, 4, 4, 3, 3]}',
264 '{"address": 8, "type": "dict", "size": 512, "refs": [2, 2, 5, 5, 4, 4, 3, 3]}',
265 ]
266+_intern_dict_dump = [line.encode('ASCII') for line in _intern_dict_dump]
267
268
269 class TestLoad(tests.TestCase):
270@@ -116,8 +125,8 @@
271
272 def test_load_one(self):
273 objs = loader.load([
274- '{"address": 1234, "type": "int", "size": 12, "value": 10'
275- ', "refs": []}'], show_prog=False).objs
276+ b'{"address": 1234, "type": "int", "size": 12, "value": 10'
277+ b', "refs": []}'], show_prog=False).objs
278 keys = objs.keys()
279 self.assertEqual([1234], keys)
280 obj = objs[1234]
281@@ -128,16 +137,19 @@
282
283 def test_load_without_simplejson(self):
284 objs = loader.load([
285- '{"address": 1234, "type": "int", "size": 12, "value": 10'
286- ', "refs": []}',
287- '{"address": 2345, "type": "module", "size": 60, "name": "mymod"'
288- ', "refs": [1234]}',
289- '{"address": 4567, "type": "str", "size": 150, "len": 126'
290- ', "value": "Test \\\'whoami\\\'\\u000a\\"Your name\\""'
291- ', "refs": []}'
292+ b'{"address": 1234, "type": "int", "size": 12, "value": 10'
293+ b', "refs": []}',
294+ b'{"address": 2345, "type": "module", "size": 60, "name": "mymod"'
295+ b', "refs": [1234]}',
296+ ('{"address": 4567, "type": "%s", "size": 150, "len": 126'
297+ ', "value": "Test \\/whoami\\/\\u000a\\"Your name\\""'
298+ ', "refs": []}' % bytes.__name__).encode('UTF-8'),
299+ ('{"address": 5678, "type": "%s", "size": 150, "len": 126'
300+ ', "value": "Test \\/whoami\\/\\u000a\\"Your name\\""'
301+ ', "refs": []}' % six.text_type.__name__).encode('UTF-8'),
302 ], using_json=False, show_prog=False).objs
303 keys = sorted(objs.keys())
304- self.assertEqual([1234, 2345, 4567], keys)
305+ self.assertEqual([1234, 2345, 4567, 5678], keys)
306 obj = objs[1234]
307 self.assertTrue(isinstance(obj, _loader._MemObjectProxy))
308 # The address should be exactly the same python object as the key in
309@@ -146,9 +158,15 @@
310 self.assertEqual(10, obj.value)
311 obj = objs[2345]
312 self.assertEqual("module", obj.type_str)
313- self.assertEqual("mymod", obj.value)
314+ self.assertEqual(b"mymod", obj.value)
315 obj = objs[4567]
316- self.assertEqual("Test \\'whoami\\'\\u000a\\\"Your name\\\"", obj.value)
317+ self.assertTrue(isinstance(obj.value, bytes))
318+ self.assertEqual(
319+ b"Test \\/whoami\\/\\u000a\\\"Your name\\\"", obj.value)
320+ obj = objs[5678]
321+ self.assertTrue(isinstance(obj.value, six.text_type))
322+ self.assertEqual(
323+ u"Test \\/whoami\\/\\u000a\\\"Your name\\\"", obj.value)
324
325 def test_load_example(self):
326 objs = loader.load(_example_dump, show_prog=False)
327@@ -168,7 +186,7 @@
328 try:
329 content = gzip.GzipFile(mode='wb', compresslevel=6, fileobj=f)
330 for line in _example_dump:
331- content.write(line + '\n')
332+ content.write(line + b'\n')
333 content.flush()
334 content.close()
335 del content
336@@ -197,24 +215,24 @@
337 def test_remove_expensive_references(self):
338 lines = list(_example_dump)
339 lines.pop(-1) # Remove the old module
340- lines.append('{"address": 8, "type": "module", "size": 12'
341- ', "name": "mymod", "refs": [9]}')
342- lines.append('{"address": 9, "type": "dict", "size": 124'
343- ', "refs": [10, 11]}')
344- lines.append('{"address": 10, "type": "module", "size": 12'
345- ', "name": "mod2", "refs": [12]}')
346- lines.append('{"address": 11, "type": "str", "size": 27'
347- ', "value": "boo", "refs": []}')
348- lines.append('{"address": 12, "type": "dict", "size": 124'
349- ', "refs": []}')
350+ lines.append(b'{"address": 9, "type": "module", "size": 12'
351+ b', "name": "mymod", "refs": [10]}')
352+ lines.append(b'{"address": 10, "type": "dict", "size": 124'
353+ b', "refs": [11, 12]}')
354+ lines.append(b'{"address": 11, "type": "module", "size": 12'
355+ b', "name": "mod2", "refs": [13]}')
356+ lines.append(b'{"address": 12, "type": "str", "size": 27'
357+ b', "value": "boo", "refs": []}')
358+ lines.append(b'{"address": 13, "type": "dict", "size": 124'
359+ b', "refs": []}')
360 source = lambda:loader.iter_objs(lines)
361- mymod_dict = list(source())[8]
362- self.assertEqual([10, 11], mymod_dict.children)
363+ mymod_dict = list(source())[9]
364+ self.assertEqual([11, 12], mymod_dict.children)
365 result = list(loader.remove_expensive_references(source))
366 null_obj = result[0][1]
367 self.assertEqual(0, null_obj.address)
368 self.assertEqual('<ex-reference>', null_obj.type_str)
369- self.assertEqual([11, 0], result[9][1].children)
370+ self.assertEqual([12, 0], result[10][1].children)
371
372
373 class TestMemObj(tests.TestCase):
374@@ -226,12 +244,15 @@
375 expected = [
376 '{"address": 1, "type": "tuple", "size": 20, "refs": [2, 3]}',
377 '{"address": 2, "type": "dict", "size": 124, "refs": [4, 5, 6, 7]}',
378-'{"address": 3, "type": "list", "size": 44, "refs": [3, 4, 5]}',
379+'{"address": 3, "type": "list", "size": 44, "refs": [3, 4, 5, 8]}',
380 '{"address": 4, "type": "int", "size": 12, "value": 2, "refs": []}',
381 '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}',
382-'{"address": 6, "type": "str", "size": 29, "value": "a str", "refs": []}',
383-'{"address": 7, "type": "tuple", "size": 20, "refs": [4, 5]}',
384-'{"address": 8, "type": "module", "size": 60, "value": "mymod", "refs": [2]}',
385+'{"address": 6, "type": "%s", "size": 29, "value": "a str"'
386+ ', "refs": []}' % bytes.__name__,
387+'{"address": 7, "type": "tuple", "size": 20, "refs": [4, 5, 8]}',
388+'{"address": 8, "type": "%s", "size": 88, "value": "a unicode"'
389+ ', "refs": []}' % six.text_type.__name__,
390+'{"address": 9, "type": "module", "size": 60, "value": "mymod", "refs": [2]}',
391 ]
392 self.assertEqual(expected, [obj.to_json() for obj in objs])
393
394@@ -243,11 +264,12 @@
395 objs = manager.objs
396 self.assertEqual((), objs[1].parents)
397 self.assertEqual([1, 3], objs[3].parents)
398- self.assertEqual([3, 7, 8], sorted(objs[4].parents))
399- self.assertEqual([3, 7, 8], sorted(objs[5].parents))
400- self.assertEqual([8], objs[6].parents)
401- self.assertEqual([8], objs[7].parents)
402- self.assertEqual((), objs[8].parents)
403+ self.assertEqual([3, 7, 9], sorted(objs[4].parents))
404+ self.assertEqual([3, 7, 9], sorted(objs[5].parents))
405+ self.assertEqual([9], objs[6].parents)
406+ self.assertEqual([9], objs[7].parents)
407+ self.assertEqual([3, 7], objs[8].parents)
408+ self.assertEqual((), objs[9].parents)
409
410 def test_compute_referrers(self):
411 # Deprecated
412@@ -267,11 +289,12 @@
413 warn.trap_warnings(old_func)
414 self.assertEqual((), objs[1].parents)
415 self.assertEqual([1, 3], objs[3].parents)
416- self.assertEqual([3, 7, 8], sorted(objs[4].parents))
417- self.assertEqual([3, 7, 8], sorted(objs[5].parents))
418- self.assertEqual([8], objs[6].parents)
419- self.assertEqual([8], objs[7].parents)
420- self.assertEqual((), objs[8].parents)
421+ self.assertEqual([3, 7, 9], sorted(objs[4].parents))
422+ self.assertEqual([3, 7, 9], sorted(objs[5].parents))
423+ self.assertEqual([9], objs[6].parents)
424+ self.assertEqual([9], objs[7].parents)
425+ self.assertEqual([3, 7], objs[8].parents)
426+ self.assertEqual((), objs[9].parents)
427
428 def test_compute_parents_ignore_repeated(self):
429 manager = loader.load(_intern_dict_dump, show_prog=False)
430@@ -294,6 +317,7 @@
431 for x in range(200):
432 content.append('{"address": %d, "type": "tuple", "size": 20,'
433 ' "len": 2, "refs": [2, 2]}' % (x+100))
434+ content = [line.encode('UTF-8') for line in content]
435 # By default, we only track 100 parents
436 manager = loader.load(content, show_prog=False)
437 self.assertEqual(100, manager[2].num_parents)
438@@ -307,42 +331,42 @@
439 def test_compute_total_size(self):
440 manager = loader.load(_example_dump, show_prog=False)
441 objs = manager.objs
442- manager.compute_total_size(objs[8])
443- self.assertEqual(257, objs[8].total_size)
444+ manager.compute_total_size(objs[9])
445+ self.assertEqual(345, objs[9].total_size)
446
447 def test_compute_total_size_missing_ref(self):
448 lines = list(_example_dump)
449 # 999 isn't in the dump, not sure how we get these in real life, but
450 # they exist. we should live with references that can't be resolved.
451- lines[-1] = ('{"address": 8, "type": "tuple", "size": 16, "len": 1'
452- ', "refs": [999]}')
453+ lines[-1] = (b'{"address": 9, "type": "tuple", "size": 16, "len": 1'
454+ b', "refs": [999]}')
455 manager = loader.load(lines, show_prog=False)
456- obj = manager[8]
457+ obj = manager[9]
458 manager.compute_total_size(obj)
459 self.assertEqual(16, obj.total_size)
460
461 def test_remove_expensive_references(self):
462 lines = list(_example_dump)
463 lines.pop(-1) # Remove the old module
464- lines.append('{"address": 8, "type": "module", "size": 12'
465- ', "name": "mymod", "refs": [9]}')
466- lines.append('{"address": 9, "type": "dict", "size": 124'
467- ', "refs": [10, 11]}')
468- lines.append('{"address": 10, "type": "module", "size": 12'
469- ', "name": "mod2", "refs": [12]}')
470- lines.append('{"address": 11, "type": "str", "size": 27'
471- ', "value": "boo", "refs": []}')
472- lines.append('{"address": 12, "type": "dict", "size": 124'
473- ', "refs": []}')
474+ lines.append(b'{"address": 9, "type": "module", "size": 12'
475+ b', "name": "mymod", "refs": [10]}')
476+ lines.append(b'{"address": 10, "type": "dict", "size": 124'
477+ b', "refs": [11, 12]}')
478+ lines.append(b'{"address": 11, "type": "module", "size": 12'
479+ b', "name": "mod2", "refs": [13]}')
480+ lines.append(b'{"address": 12, "type": "str", "size": 27'
481+ b', "value": "boo", "refs": []}')
482+ lines.append(b'{"address": 13, "type": "dict", "size": 124'
483+ b', "refs": []}')
484 manager = loader.load(lines, show_prog=False, collapse=False)
485- mymod_dict = manager.objs[9]
486- self.assertEqual([10, 11], mymod_dict.children)
487+ mymod_dict = manager.objs[10]
488+ self.assertEqual([11, 12], mymod_dict.children)
489 manager.remove_expensive_references()
490 self.assertTrue(0 in manager.objs)
491 null_obj = manager.objs[0]
492 self.assertEqual(0, null_obj.address)
493 self.assertEqual('<ex-reference>', null_obj.type_str)
494- self.assertEqual([11, 0], mymod_dict.children)
495+ self.assertEqual([12, 0], mymod_dict.children)
496
497 def test_collapse_instance_dicts(self):
498 manager = loader.load(_instance_dump, show_prog=False, collapse=False)
499@@ -419,16 +443,18 @@
500
501 def test_summarize_refs(self):
502 manager = loader.load(_example_dump, show_prog=False)
503- summary = manager.summarize(manager[8])
504+ summary = manager.summarize(manager[9])
505 # Note that the module is included in the summary
506- self.assertEqual(['int', 'module', 'str', 'tuple'],
507+ self.assertEqual(sorted(['int', 'module', bytes.__name__,
508+ six.text_type.__name__, 'tuple']),
509 sorted(summary.type_summaries.keys()))
510- self.assertEqual(257, summary.total_size)
511+ self.assertEqual(345, summary.total_size)
512
513 def test_summarize_excluding(self):
514 manager = loader.load(_example_dump, show_prog=False)
515- summary = manager.summarize(manager[8], excluding=[4, 5])
516+ summary = manager.summarize(manager[9], excluding=[4, 5])
517 # No ints when they are explicitly filtered
518- self.assertEqual(['module', 'str', 'tuple'],
519+ self.assertEqual(sorted(['module', bytes.__name__,
520+ six.text_type.__name__, 'tuple']),
521 sorted(summary.type_summaries.keys()))
522- self.assertEqual(233, summary.total_size)
523+ self.assertEqual(321, summary.total_size)
