Ryohei Suzuki and Takeo Igarashi, Collaborative 3D Modeling by the Crowd, in Proceedings of the 43rd International Conference on Graphics, Visualization & Human-computer Interaction (GI 2017)
2. 3D modeling by crowdsourced sketching
Purpose
Synthesizing a 3D model from a single reference image.
Our approach
Crowdsourcing 2D sketching from multiple viewing angles,
then automatically integrating them into a 3D geometry.
Reference image (photo/illustration) → Sketches → Projections → 3D model
4. 3D modeling is difficult for novice users
6DOF object operation
Local coordinate?
Global coordinate?
Many operation modes
Object mode? Edit mode?
Sculpt mode?
3D view rotation
Complex mouse operation
Many setting items
5. What is the easiest way?
Simplified 3D CAD tools
(e.g., Sketchup, Tinkercad)
Sketch-based modeling
[Igarashi et al., 1999]
[Nealen et al., 2007]
Image processing
+ user interaction
[Chen et al., 2013]
7. Macrotask vs. microtask crowdsourcing
Macrotask
Outsourcing complex tasks to a small
number of professional workers
Pros: skilled work
Cons: small worker pool, high cost
Microtask
Outsourcing simple tasks to a large
number of non-professional workers
Pros: large worker pool
Cons: low-quality, unskilled work
8. Human computation for creative purposes
Human Computation (HC) [von Ahn, 2006]
“a paradigm for utilizing human processing power to
solve problems that computers cannot yet solve.”
microtask
skilled work
[Gingold et al., 2011]
Normal vector annotation
[Koyama et al., 2014]
Optimizing photo color correction
Applications of HC to content enhancement
9. Our approach
• Decomposing 3D modeling process into microtasks to enable
3D shape synthesis by HC.
• Proposing algorithms to integrate many inconsistent sketches to
extract geometrical information.
• Proposing novel crowdsourcing workflow for improving the
quality of submitted sketches.
→ Show the possibilities of HC for content creation
11. 3D modeling workflow
[Workflow diagram]
User (customer): provides the reference image + three directions + number of parts (e.g., “7 parts”)
Crowd workers: sketching and peer reviewing (iterative refinement)
2D sketches → (integrate) → orthogonal projections → (synthesis) → 3D model (output)
User (customer): evaluates the output, then continues or stops
12. Sketching task
• Draw a sketch of the object seen from a specified view
• 1 sketch / 1 worker, $0.36 basic reward
• Partly/entirely occluded parts should also be drawn overlapped
14. 1. Extraction of valid sketches
Problem: existence of invalid sketches in submissions
• Sketches drawn from wrong viewing angles
• Completely meaningless submissions
Reference image
15. 1. Extraction of valid sketches
Observation: valid sketches are similar to each other
Strategy: clustering sketches, then use the largest cluster
• Modified Hausdorff Distance Matrix [Dubuisson 1994]
• Clustering by Medoidshifts [Sheikh et al. 2007]
[Figure: reference image and the resulting sketch clusters 1 to 7]
16. 2. Integrating sketches into a projection
Analyzing the correspondence between individual sketches
1. Clustering all the parts contained in the valid sketches
• Same strategy as sketch clustering
2. Calculate the average shape for every cluster
17. 3. Synthesizing 3D primitives from multi-view projections
1. Inferring the correspondence between parts from multi-view
projections to extract triplets by cost calculation
2. Generate a 3D primitive for each triplet
19. What are the problems with sketches?
Small proportion of valid sketches
• Only ~40% of submissions are valid
• Most invalid sketches are caused by misunderstanding the task
Most valid sketches are incomplete
• Imperfect coverage of parts in the reference image
• Poor precision of parts arrangements
• Lack of motivation?
How can we help/encourage workers to draw better sketches?
20. 1. Example-sharing
• Providing satisfactory submissions from previous workers [Little et al., 2010]
• Workers can avoid misunderstanding by referring to the examples
Previously submitted
distinguished sketches
21. 2. Introducing competition
• Provide extra rewards ($0.18) for excellent submitters
• Motivating workers to draw better sketches than minimum requirements
• Peer-review based evaluation of sketches
Peer-reviewing interface
7-stage evaluation
23. Example of refinement results
Top sketches from the 1st iteration
Generation result from the 20×3 sketches
24. Example of refinement results
Top sketches from the 3rd iteration
Generation result from the 20×3 sketches
25. Example of refinement results
Top sketches from the 5th iteration
Generation result from the 20×3 sketches
Valid sketch ratio: 40% → 80% improvement
31. Difficulty of the tasks
Required time for task completion
• Sketching 8.0 mins (median)
• Reviewing 3.8 mins (median)
Survey results from crowd workers (5 is best)
            Overall satisfaction   Clarity of task instruction   Ease of the task   Payment
Sketching   4.7                    4.5                           4.1                4.1
Reviewing   4.6                    4.5                           4.1                4.3
→ Acceptable as “microtasks”
32. Monetary costs / time consumption
Fees paid per iteration
• Sketching $0.36 × 20 sketches × 3 views
• Reviewing $0.24 × 20 sketches × 3 views
• Bonus $0.18 × 4 workers × 3 views
Total $45.78/iteration (including transaction fee of CrowdFlower)
Required time for completion
• 45 mins (1 iteration) ~ 3.5 hours (5 iterations)
Fees were set following the Dynamo payment guidelines for research on MTurk*
*http://wiki.wearedynamo.org/index.php?title=Guidelines_for_Academic_Requesters
33. Comparison with professional outsourcing
Tested macrotask crowdsourcing using a freelancer platform*
Monetary cost: $45 (vs. $46/iteration)
Time consumption: a whole day (vs. ~3.5 h)
Extra cost: writing ~10 emails
Quality: precise, with chamfers
[Figures: model by professional; model by crowd]
*http://www.lancers.jp/
34. Advantages / disadvantages of our approach
Pros
• Small time consumption and communication cost
• High availability and scalability thanks to vast worker pool
Cons
• Lower quality than professional work
• Larger monetary cost
36. Supported 3D primitives / operations
Current algorithm supports:
• Primitives: cuboids / cylinders / ellipsoids
• Rotation: about one of the X, Y, or Z axes
view 1      view 2      view 3      3D primitive
rectangle   rectangle   rectangle   cuboid
rectangle   rectangle   ellipse     cylinder
ellipse     ellipse     ellipse     ellipsoid
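The table on this slide can be read as a simple lookup from the three projected 2D shapes to a 3D primitive. The following is a minimal Python sketch of that reading. The name `primitive_for` and the choice to ignore the order of the views are assumptions made for illustration, not details confirmed by the slides.

```python
# Lookup from the three projected 2D shapes to a 3D primitive.
# Keys are sorted tuples so that the order of the views does not matter
# (an assumption made for this sketch).
PRIMITIVE_BY_VIEWS = {
    ("rectangle", "rectangle", "rectangle"): "cuboid",
    ("ellipse", "rectangle", "rectangle"): "cylinder",
    ("ellipse", "ellipse", "ellipse"): "ellipsoid",
}


def primitive_for(view_shapes):
    """Return the primitive name for three 2D shapes, or None if unsupported."""
    key = tuple(sorted(view_shapes))
    return PRIMITIVE_BY_VIEWS.get(key)


cuboid = primitive_for(["rectangle", "rectangle", "rectangle"])
cylinder = primitive_for(["rectangle", "ellipse", "rectangle"])
ellipsoid = primitive_for(["ellipse", "ellipse", "ellipse"])
unsupported = primitive_for(["ellipse", "ellipse", "rectangle"])
```

Combinations that are not in the table return `None`, which mirrors the limitation that only three primitive types are supported.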
37. Ambiguity in 3D synthesis from projections
Confusion occurs when multiple parts overlap from a certain view
Overlapping
38. Future work
Applying HC for diverse 3D modeling processes
• Voting for resolving ambiguity
• Fillet / chamfer design of edges
• Alignment of objects
• etc.
40. Conclusion
• We proposed a crowd-powered approach for 3D modeling
from a single reference image
• We designed 3D synthesis algorithms as well as
iterative crowdsourcing workflow for quality improvement
• We showed the practicability of the approach by evaluation
Thank you!
Editor's notes
Hello everyone.
I am Ryohei Suzuki, a former master's student in the User Interface Research Group at the University of Tokyo.
Today I'm going to talk about our work, "Collaborative 3D Modeling by the Crowd."
This paper was authored by me and Takeo Igarashi.
I would like to start by briefly introducing the problem we want to tackle and our approach to it. Our purpose is to synthesize a complete 3D model from a single reference image, such as a photo or an illustration, as the input.
But this is one of the long-standing problems in computer graphics, and currently there is no straightforward computational solution to it. In this work, we propose an approach that takes advantage of human cognitive functions through a crowdsourcing system.
That is, we gather 2D sketches of the target object drawn from multiple viewing angles by human workers, then automatically integrate them into a 3D geometry.
This approach enables the synthesis of 3D models without complicated image processing or a fine-tuned machine learning system.
Next, let me explain the background of our research.
Since this work is about 3D modeling, let's briefly look at the existing 3D modeling methods.
Recently, more and more consumers have become interested in creating their own 3D objects as digital fabrication has spread. However, 3D modeling with conventional authoring software designed for professionals, such as Maya, Blender, or Cinema4D, is quite difficult for novice users.
It involves 3D view rotation with complex mouse operations, 6DOF object operation with multiple coordinate systems, transitions between many operation modes, and a great many setting items.
So, we are interested in the easiest way for such users to create a new 3D model.
First, there are a number of simplified 3D CAD tools such as Sketchup, Tinkercad, and Fusion360. The interface designs of these modern tools are sophisticated, and the user can create arbitrary objects with little effort.
However, these tools still take from several tens of minutes to a couple of hours to learn, and they sometimes require complex 3D operations with a multi-button mouse.
As another option, there are sketch-based modeling methods, which require only 2D operations to create pretty 3D models.
But, as you may know, using these methods is not as easy as it may look in a demo video prepared by the authors. There are also more modern techniques, like 3-Sweep, that combine image processing and user interaction to semi-automatically extract geometry from inputs such as images.
Such methods make modeling much easier, but they do not always work, and they still require users to learn new interaction methods like sweeping.
Each of these three ways has advantages and is useful in certain situations, but we have another option that should be the simplest.
Yes, that is crowdsourcing.
We can entirely outsource the task of 3D modeling to another person and just wait for the result.
There are broadly two distinct categories of crowdsourcing: macrotask crowdsourcing and microtask crowdsourcing.
The former outsources complex tasks to one or a few workers with professional skills.
It can take advantage of skilled work, so 3D modeling by macrotask crowdsourcing is straightforward.
But its weak point is availability, because the pool of skilled workers is small.
In contrast, the latter simultaneously outsources very simple tasks, each of which can be done in several minutes, to a large number of workers without special skills.
It can draw on a virtually infinite worker pool, but basically it can only produce low-quality, unskilled work.
Obtaining complex results like those of macrotask crowdsourcing from microtask crowdsourcing is a non-trivial and challenging problem.
We would like to explore that possibility in 3D modeling.
This idea was first formulated by von Ahn, who named it "human computation."
His original definition of human computation was "a paradigm for utilizing human processing power to solve problems that computers cannot yet solve."
In this passage, "human processing power" corresponds to microtasks, and the "problems" correspond to the skilled work of professional workers in our context.
There has been some work applying human computation to creative purposes, such as normal vector annotation of images for re-lighting and optimization of photo color correction.
These works can be seen as applications of human computation to content enhancement.
We consider that applying HC to content creation from scratch, rather than enhancement, is a challenging frontier of HCI research.
Next, let me summarize our approach.
Basically, we decompose the 3D modeling process into microtasks, namely 2D sketching, to enable 3D shape synthesis by human computation.
To do so, we propose algorithms for integrating many inconsistent sketches to extract geometrical information.
We also propose a novel crowdsourcing workflow for improving the submission quality.
And, ultimately, we would like to show the possibilities of human computation for content creation.
Let me move on to the system overview.
The 3D modeling workflow in our system is as follows.
First, the user uploads a reference image and annotates it with orthogonal viewing directions using a web interface.
The user also provides the number of parts that make up the target object.
Then, crowd workers are recruited through the CrowdFlower platform, and they draw 2D sketches of the target object viewed from one of the orthogonal directions.
The gathered sketches are integrated to reconstruct orthogonal projections, and the resulting 3D model is then synthesized from the projections.
To refine the output quality, sketches are gathered iteratively, with a peer-reviewing process carried out by the crowd workers.
The user evaluates the quality of the output at the end of each iteration, then decides whether to continue or stop.
The sketching task is carried out in a web interface like this. Each worker is directed to draw a single sketch of the object seen from a specified view and is given 36 cents as the basic reward.
They are requested to draw occluded parts as well.
Next, I would like to present the 3D synthesis algorithms.
The process starts with the extraction of valid sketches.
Some of the submitted sketches are drawn from wrong viewing angles, and others are completely meaningless.
We should exclude them and extract only the valid sketches to generate a clean projection.
From the observation that valid sketches are similar to each other, in contrast to the diverse appearance of invalid ones, we take a strategy that first clusters the sketches based on their similarity, then adopts the largest cluster as the valid one.
We define the similarity matrix using the modified Hausdorff distance, then compute the clusters with the medoidshifts method.
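As a rough illustration of this step, the following Python sketch computes pairwise modified Hausdorff distances and then groups the sketches with a simplified, single-level medoid-shift step that uses a flat kernel. Representing each sketch as a list of sampled 2D points, the `bandwidth` value, the toy data, and all function names are assumptions made for illustration. This is not the implementation used in the paper.

```python
import math
from collections import Counter


def directed_distance(a, b):
    """Mean, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Modified Hausdorff distance: the larger of the two directed means."""
    return max(directed_distance(a, b), directed_distance(b, a))


def distance_matrix(sketches):
    """Symmetric matrix of pairwise modified Hausdorff distances."""
    n = len(sketches)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = modified_hausdorff(sketches[i], sketches[j])
    return d


def medoid_shift_labels(d, bandwidth):
    """Simplified medoid-shift step with a flat kernel (an assumption).

    Each item j moves to the item that minimises the summed distance to
    the neighbours of j; pointers are then followed until they stop moving.
    """
    n = len(d)
    nxt = []
    for j in range(n):
        neighbours = [i for i in range(n) if d[i][j] <= bandwidth]
        cost = [sum(d[c][i] for i in neighbours) for c in range(n)]
        nxt.append(min(range(n), key=cost.__getitem__))
    labels = list(nxt)
    for _ in range(n):  # bounded pointer following
        shifted = [nxt[label] for label in labels]
        if shifted == labels:
            break
        labels = shifted
    return labels


def largest_cluster(labels):
    """Indices of the items that belong to the most common label."""
    top, _ = Counter(labels).most_common(1)[0]
    return [i for i, label in enumerate(labels) if label == top]


# Toy data: three near-identical "valid" sketches and two unrelated ones.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sketches = [
    square,
    [(x + 0.02, y + 0.02) for x, y in square],
    [(x - 0.03, y - 0.03) for x, y in square],
    [(10, 10), (15, 10), (15, 15), (10, 15)],
    [(7, -3), (8, -9)],
]
labels = medoid_shift_labels(distance_matrix(sketches), bandwidth=1.0)
valid = largest_cluster(labels)
```

In this toy input the three similar squares fall into one cluster and the two unrelated sketches each end up alone, so `valid` holds the indices of the three squares.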
After extracting the valid sketches, we integrate them into a projection.
To do so, we should analyze the correspondence between elements contained in different sketches.
We use the same clustering-based strategy as in the previous step to obtain the sets of 2D elements that represent the same part of the target object.
We calculate the average shape, that is, size, position, and rotation, for every cluster, and thereby obtain a clean projection.
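To make the averaging step concrete, here is a minimal Python sketch that averages the position, size, and rotation of the 2D parts in one cluster. The `Part2D` fields, the use of a circular mean for the rotation angle, and the sample values are assumptions for illustration; the talk does not specify how the authors compute the average.

```python
import math
from dataclasses import dataclass


@dataclass
class Part2D:
    """Hypothetical description of one drawn part in a sketch."""
    cx: float         # centre x
    cy: float         # centre y
    width: float
    height: float
    angle_deg: float  # rotation in degrees


def average_part(parts):
    """Average position and size arithmetically, rotation as a circular mean."""
    n = len(parts)
    sin_sum = sum(math.sin(math.radians(p.angle_deg)) for p in parts)
    cos_sum = sum(math.cos(math.radians(p.angle_deg)) for p in parts)
    return Part2D(
        cx=sum(p.cx for p in parts) / n,
        cy=sum(p.cy for p in parts) / n,
        width=sum(p.width for p in parts) / n,
        height=sum(p.height for p in parts) / n,
        angle_deg=math.degrees(math.atan2(sin_sum, cos_sum)),
    )


# Three workers drew the same part slightly differently.
cluster = [
    Part2D(10, 20, 4, 2, 10),
    Part2D(12, 22, 6, 2, 20),
    Part2D(11, 21, 5, 2, 15),
]
mean_part = average_part(cluster)
```

The circular mean avoids the wrap-around problem of averaging angles such as 359 and 1 degrees, which is why it is used here instead of a plain arithmetic mean.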
Finally, we synthesize 3D primitives from multi-view projections.
We infer the correspondence between parts contained in each projection, then extract triplets that have small costs.
Each triplet is converted to a 3D primitive according to the combination of its constituent 2D parts.
The cost of a triplet is calculated as the sum of the squared mismatches between the endpoints of the parts along the three axes.
Please see the paper for the details.
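The cost described above can be sketched in Python as follows. Each projected part is represented by its extent along the two axes visible in that view, and every axis is seen in exactly two views, so its two intervals can be compared. The axis layout in `VIEW_AXES`, the greedy matching, the `max_cost` threshold, and the toy data are assumptions for illustration; the paper should be consulted for the actual procedure.

```python
from itertools import product

# Assumed axis layout of the three orthogonal views (hypothetical).
VIEW_AXES = {"front": ("x", "y"), "side": ("z", "y"), "top": ("x", "z")}


def triplet_cost(front, side, top):
    """Sum of squared endpoint mismatches along the x, y, and z axes."""
    parts = {"front": front, "side": side, "top": top}
    cost = 0.0
    for axis in "xyz":
        # Each axis is visible in exactly two of the three views.
        (lo_a, hi_a), (lo_b, hi_b) = [
            parts[view][axis] for view, axes in VIEW_AXES.items() if axis in axes
        ]
        cost += (lo_a - lo_b) ** 2 + (hi_a - hi_b) ** 2
    return cost


def match_triplets(fronts, sides, tops, max_cost):
    """Greedily pick low-cost triplets, using each 2D part at most once."""
    candidates = sorted(
        (triplet_cost(fronts[i], sides[j], tops[k]), i, j, k)
        for i, j, k in product(range(len(fronts)), range(len(sides)), range(len(tops)))
    )
    used_f, used_s, used_t = set(), set(), set()
    matches = []
    for cost, i, j, k in candidates:
        if cost > max_cost:
            break
        if i in used_f or j in used_s or k in used_t:
            continue
        matches.append((i, j, k))
        used_f.add(i)
        used_s.add(j)
        used_t.add(k)
    return matches


# Two parts: a flat slab (x 0-4, y 0-1, z 0-3) and a post (x 1-2, y 1-5, z 0-1).
fronts = [{"x": (0, 4), "y": (0, 1)}, {"x": (1, 2), "y": (1, 5)}]
sides = [{"z": (0, 1), "y": (1, 5)}, {"z": (0, 3), "y": (0, 1)}]
tops = [{"x": (0, 4), "z": (0, 3)}, {"x": (1, 2), "z": (0, 1)}]
matches = match_triplets(fronts, sides, tops, max_cost=1.0)
```

Here the slab is matched as front 0, side 1, top 0 and the post as front 1, side 0, top 1, because those are the only triplets whose intervals agree on every axis.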
Next, I would like to present our iterative refinement mechanism.
In the pilot study, we found that there were two major problems with the gathered sketches.
The first was the small proportion of the valid sketches.
It was only about 40%, and a considerable proportion of the invalid sketches were caused by misunderstanding the task, such as the specified viewing direction.
The second problem was the incompleteness of the valid sketches.
The coverage of the parts drawn in the sketches was far from 100%, and the precision of parts arrangements was also poor.
This might naturally be caused by a lack of motivation to do more than satisfy the minimum requirements.
Hence, we considered ways to help and encourage workers to draw better sketches.
The first element is example-sharing.
We can provide satisfactory submissions from previous workers to help subsequent workers understand the task instructions. This can be seen as implicit collaboration between workers.
To do so, we need to evaluate the submitted sketches in a reasonable manner.
So we introduce the second element, competition.
We recruit additional workers from the crowdsourcing platform to evaluate the sketches, then we provide extra rewards to sketch workers whose submissions are rated in the top 20%.
Highly rated sketches are used in the example-sharing mechanism explained earlier.
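The selection of bonus winners can be sketched in Python as follows. Aggregating the peer-review scores by their mean, the function name `select_top_sketches`, and the sample ratings are assumptions for illustration; the talk only states that the top 20% of rated submissions earn the bonus and that reviews use a 7-stage scale.

```python
from statistics import mean


def select_top_sketches(ratings, top_percent=20):
    """Return the IDs of the best-rated sketches (top `top_percent` percent).

    `ratings` maps a sketch ID to the list of peer-review scores it received.
    """
    n_top = max(1, len(ratings) * top_percent // 100)
    ranked = sorted(ratings, key=lambda sid: mean(ratings[sid]), reverse=True)
    return ranked[:n_top]


# Hypothetical peer-review scores on a 1-7 scale for ten sketches of one view.
ratings = {
    "a": [7, 6, 7],
    "b": [3, 4, 2],
    "c": [6, 6, 5],
    "d": [1, 2, 1],
    "e": [5, 5, 5],
    "f": [4, 4, 3],
    "g": [2, 3, 3],
    "h": [6, 7, 6],
    "i": [3, 3, 3],
    "j": [5, 4, 4],
}
winners = select_top_sketches(ratings)
```

With 20 sketches per view, the same rule yields 4 winners per view, which matches the bonus count listed on the cost slide.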
We integrate these two concepts, collaboration and competition, into an iterative workflow like this.
In the first iteration, the sketches submitted by workers are reviewed by other workers, and the outstanding sketches are selected.
Their authors receive rewards, and the sketches are used as examples for the next iteration.
The same process is then repeated several times.
Here we show example results of iterative refinement.
In the first iteration, only three parts could be synthesized from the 60 submitted sketches.
After three iterations, most of the major parts could be synthesized,
and all the parts contained in the target object were synthesized from the submissions of the 5th iteration alone.
The ratio of valid sketches increased from 40% to 80% after five iterations.
Next, let me show you some modeling results.
Interestingly, the chair model contains the back apron part.
This part is not explicitly present in the input image, but its existence can be inferred from the other visible parts. This reproduction is probably due to the human workers' use of real-world knowledge.
It shows that the workers make intensive use of their cognitive functions when sketching.
Our system can accept not only photos but also rough drawings, as long as the spatial structure can be interpreted consistently by human workers.
This drawer model was generated from 60 sketches gathered in 15 minutes.
Let's move on to the evaluation of the method.
To claim that our system works successfully as a microtask crowdsourcing system, the tasks involved should be sufficiently easy and lightweight.
The median time required to complete a task was 8.0 minutes for sketching and 3.8 minutes for reviewing.
The survey responses from the recruited workers indicate that they found the task instructions clear, the tasks easy enough, and the payment satisfactory.
These results show that the tasks were acceptable as microtasks.
The monetary cost can be calculated directly from the task rewards and the platform's transaction fee rate; it comes to about 46 dollars per iteration.
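The arithmetic behind this figure can be checked with the per-task rewards listed on the cost slide. The short Python sketch below works in cents to avoid rounding noise. The platform fee rate is not stated in the talk, so it is only inferred here from the reported total of $45.78; the variable names are chosen for illustration.

```python
# Rewards in cents, as listed on the cost slide.
SKETCH_REWARD = 36
REVIEW_REWARD = 24
BONUS_REWARD = 18

SKETCHES_PER_VIEW = 20
BONUS_WORKERS_PER_VIEW = 4
VIEWS = 3

sketching_cents = SKETCH_REWARD * SKETCHES_PER_VIEW * VIEWS    # 2160
reviewing_cents = REVIEW_REWARD * SKETCHES_PER_VIEW * VIEWS    # 1440
bonus_cents = BONUS_REWARD * BONUS_WORKERS_PER_VIEW * VIEWS    # 216
subtotal_cents = sketching_cents + reviewing_cents + bonus_cents  # 3816

# Reported total per iteration, including the platform transaction fee.
REPORTED_TOTAL_CENTS = 4578

# The fee rate is inferred from the two figures, not taken from the talk.
implied_fee_rate = REPORTED_TOTAL_CENTS / subtotal_cents - 1

print(f"Subtotal: ${subtotal_cents / 100:.2f}")
print(f"Implied platform fee: {implied_fee_rate:.1%}")
```

The rewards alone add up to $38.16 per iteration, so the reported total implies a platform fee of roughly 20% on top of the rewards.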
This is somewhat expensive and should be addressed before the system is used in practical situations.
The required time for completion ranged from 45 minutes for one iteration to about 3.5 hours for five iterations.
To show the advantages and shortcomings of our approach compared with straightforward macrotask crowdsourcing, we hired a professional modeler through a freelancer platform and compared the cost and the result.
The fee paid was about 45 dollars, which is equivalent to the cost of a single iteration in our system.
The time consumption was about a whole day, including recruiting, negotiation, and the modeling time.
It also required us to write about 10 short messages for negotiation and task instruction, which was quite cumbersome.
The resulting model was more precise than ours, and it included beautiful chamfers and fillets.
In summary, the advantages of our approach are its small time consumption, low communication cost, and high availability and scalability thanks to the vast worker pool.
On the other hand, its shortcomings are that the output quality is lower than professional work and that it currently costs more than simple outsourcing.
We would like to mention the limitations and possible future work for our research.
The most serious limitation is the small number of supported 3D primitives and operations.
We support cuboids, cylinders, and ellipsoids rotated about one of the X, Y, or Z axes. Existing techniques such as silhouette-based modeling could be applied to extend the complexity of the geometries that can be made.
In addition, the current orthogonal projection-based modeling inevitably involves ambiguity in some situations.
For example, the simple input image shown here produces an overlap in the top view of the projections, which leads to a wrong synthesis result.
We should introduce some mechanism for selecting the correct geometry from the ambiguous candidates to solve this problem.
As future work, we are thinking of utilizing human computation to solve such problems.
For example, the selection of the correct geometry from candidates could be handled by voting through microtask crowdsourcing.
More detailed modeling elements such as fillet and chamfer design could be handled by dedicated microtasks that involve 2D operations.
The alignment or distribution of elements in a model could also be inferred and specified by human workers to improve the modeling quality.
Future work on such attempts will reveal the potential of microtask crowdsourcing for creative tasks in greater depth.
Finally, let me conclude the talk.
We proposed a crowd-powered approach for 3D modeling from a single reference image.
For that, we designed a set of algorithms for 3D synthesis, as well as an iterative crowdsourcing workflow for quality improvement.
We showed the advantages and disadvantages of our approach by evaluation.
Thank you!