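I. What Is S3DIS
S3DIS, the Stanford Large-Scale 3D Indoor Spaces Dataset, is a large-scale indoor 3D dataset released by Stanford University. It covers six large indoor areas drawn from three buildings (this article refers to the six areas as buildings) and provides indoor 3D scans and object annotations, with point clouds and high-quality rendered images for each area. Because of these resources, S3DIS is widely used in research on indoor scene segmentation, multi-view image generation, and indoor navigation.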
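II. Data Composition of S3DIS
S3DIS covers six indoor areas (labeled Building A through Building F in the statistics below), with more than 200 million points of point cloud data in total, plus high-quality rendered images and object annotations. The scenes include common indoor spaces such as offices, classrooms, conference rooms, hallways, and restrooms. Within each building, the data is split by room, and the objects in each room are labeled by type, such as tables, chairs, and bookcases. Some statistics for the S3DIS dataset follow: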
Building A: 4532 room scans, 31 object categories, 9 object instances
Building B: 5063 room scans, 27 object categories, 4 object instances
Building C: 5463 room scans, 27 object categories, 4 object instances
Building D: 5117 room scans, 27 object categories, 4 object instances
Building E: 5292 room scans, 27 object categories, 4 object instances
Building F: 5117 room scans, 27 object categories, 4 object instances
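Beyond the point clouds, rendered images, and object annotations, S3DIS also provides each object's 3D coordinates, rotation angle, and size within the room, which supports indoor scene reconstruction and object recognition.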
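As a rough illustration of how an object's position and size can be derived from its annotated points, the sketch below computes an axis-aligned bounding box with NumPy. It assumes the plain-text layout in which each line of an object instance file holds `x y z r g b`; the helper names `parse_points` and `axis_aligned_box` and the sample values are hypothetical, and rotation is not estimated here.

```python
import io

import numpy as np


def parse_points(text):
    """Parse whitespace-separated 'x y z r g b' rows into an (N, 6) float array."""
    return np.loadtxt(io.StringIO(text), dtype=np.float64, ndmin=2)


def axis_aligned_box(points):
    """Return (center, size) of the axis-aligned box enclosing the xyz columns."""
    xyz = points[:, :3]
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    return (lo + hi) / 2.0, hi - lo


# Made-up sample standing in for the contents of one object instance file.
sample = """\
1.0 2.0 0.0 120 110 100
2.0 4.0 1.0 125 112 101
1.5 3.0 0.5 119 108 99
"""

points = parse_points(sample)
center, size = axis_aligned_box(points)
print("center:", center)
print("size:", size)
```

An axis-aligned box is the simplest choice and ignores object orientation; an oriented box would need an extra step such as a principal-axis fit.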
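III. Application Scenarios of S3DIS
Because the S3DIS annotations are realistic, diverse, and unambiguous, the dataset is widely used in indoor scene segmentation, multi-view image generation, indoor navigation, and related fields.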
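IV. Usage Examples for S3DIS
1. Indoor Scene Segmentation
S3DIS is widely used for indoor scene segmentation. The example below trains a segmentation model on S3DIS, using the TensorFlow framework and the PointNet++ network architecture. The network and the data loader are left as stubs to be filled in.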
import os

import numpy as np
# Written against the TensorFlow 1.x graph API. Under TensorFlow 2.x, use
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`.
import tensorflow as tf


## Define the PointNet++ network
def pointnet2_ssg(inputs, is_training, bn_decay=None):
    # TODO: add PointNet++ SSG.
    # Expected to return per-point logits of shape (batch, num_point, num_classes).
    raise NotImplementedError('PointNet++ SSG is not implemented in this example')


## Data loading
def load_data(data_dir):
    # TODO: load S3DIS data.
    # Expected to return data of shape (N, 4096, 6) and label of shape (N, 4096).
    raise NotImplementedError('S3DIS loading is not implemented in this example')


if __name__ == '__main__':
    data_dir = 'data/s3dis'
    model_dir = 'model/s3dis'
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)

    # Hyperparameters, defined before the placeholders that depend on them
    batch_size = 32
    num_point = 4096
    num_classes = 13
    learning_rate = 0.001
    max_epoch = 250
    bn_decay = 0.7

    tf.reset_default_graph()
    # The batch dimension is None so that a smaller final batch can still be fed
    pointclouds_pl = tf.placeholder(tf.float32, shape=(None, num_point, 6))
    labels_pl = tf.placeholder(tf.int32, shape=(None, num_point))
    is_training_pl = tf.placeholder(tf.bool, shape=())

    with tf.device('/gpu:0'):
        logits = pointnet2_ssg(pointclouds_pl, is_training=is_training_pl, bn_decay=bn_decay)
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels_pl)
        loss = tf.reduce_mean(loss)
        tf.summary.scalar('loss', loss)
        # Run any batch-norm update ops together with each training step
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            optimizer = tf.train.AdamOptimizer(learning_rate)
            train_op = optimizer.minimize(loss)
    merged_summary_op = tf.summary.merge_all()
    saver = tf.train.Saver()

    ## Data loading
    data, label = load_data(data_dir)
    num_data = data.shape[0]

    ## Start training
    # Soft placement lets ops without a GPU kernel fall back to the CPU
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        file_writer = tf.summary.FileWriter('logs', sess.graph)
        step = 0
        for epoch in range(max_epoch):
            idx = np.arange(num_data)
            np.random.shuffle(idx)
            total_loss = 0
            ## Train batch by batch
            for from_idx in range(0, num_data, batch_size):
                to_idx = min(from_idx + batch_size, num_data)
                batch_data = data[idx[from_idx:to_idx], :, :]
                batch_label = label[idx[from_idx:to_idx], :]
                ## Train on one batch
                _, batch_loss, summary = sess.run(
                    [train_op, loss, merged_summary_op],
                    feed_dict={
                        pointclouds_pl: batch_data,
                        labels_pl: batch_label,
                        is_training_pl: True
                    })
                file_writer.add_summary(summary, step)
                step += 1
                total_loss += batch_loss
            print('Epoch %d, loss %.4f' % (epoch, total_loss))
            ## Save the model every ten epochs
            if epoch % 10 == 0:
                saver.save(sess, os.path.join(model_dir, 'model.ckpt'), global_step=epoch)
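The training script above expects inputs of shape (4096, 6) per sample, which a raw room scan will not have. The following is a minimal sketch of one way to get there, assuming each room is an (N, 6) array of `x y z r g b` values with colors in the 0-255 range. The helpers `sample_points` and `normalize_block` are hypothetical, the room here is synthetic random data, and this is a simplified stand-in rather than the preprocessing of any particular published pipeline.

```python
import numpy as np


def sample_points(points, labels, num_point=4096, rng=None):
    """Pick exactly num_point rows; sample with replacement only if the room is too small."""
    rng = np.random.default_rng() if rng is None else rng
    n = points.shape[0]
    choice = rng.choice(n, size=num_point, replace=n < num_point)
    return points[choice], labels[choice]


def normalize_block(points):
    """Center xyz on the block mean and scale 0-255 colors to [0, 1]."""
    out = points.astype(np.float32)  # astype returns a copy, so the input is untouched
    out[:, :3] -= out[:, :3].mean(axis=0)
    out[:, 3:6] /= 255.0
    return out


# Synthetic room: 10000 points with random coordinates, colors, and labels.
rng = np.random.default_rng(0)
room = np.concatenate(
    [rng.uniform(0, 5, (10000, 3)), rng.integers(0, 256, (10000, 3))], axis=1)
room_labels = rng.integers(0, 13, 10000)

block, block_labels = sample_points(room, room_labels, 4096, rng)
block = normalize_block(block)
print(block.shape, block_labels.shape)
```

Stacking many such blocks along a new first axis would give arrays in the layout that `load_data` is expected to return.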
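2. Multi-View Image Generation
S3DIS includes a large number of high-quality rendered images, which makes it a useful source for multi-view image generation. The example below uses those rendered images to train a GAN that generates multi-view images of indoor scenes. The generator and discriminator are left as stubs to be filled in.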
import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API, as in the previous example


## Define the GAN networks
def generator(inputs, is_training):
    # TODO: add generator network, built under tf.variable_scope('Generator').
    # Expected to return images of shape (batch, 224, 224, 3).
    raise NotImplementedError('generator is not implemented in this example')


def discriminator(inputs, is_training, reuse=False):
    # TODO: add discriminator network, built under
    # tf.variable_scope('Discriminator', reuse=reuse). Expected to return logits.
    raise NotImplementedError('discriminator is not implemented in this example')


## Data loading
def load_data(data_dir):
    # TODO: load S3DIS data
    raise NotImplementedError('S3DIS loading is not implemented in this example')


if __name__ == '__main__':
    data_dir = 'data/s3dis'
    model_dir = 'model/s3dis'
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)

    # Assumed values, reused from the segmentation example
    batch_size = 32
    max_epoch = 250

    tf.reset_default_graph()
    # The batch dimension is None so that a smaller final batch can still be fed
    z_ph = tf.placeholder(tf.float32, shape=(None, 100))
    img_ph = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
    is_training = tf.placeholder(tf.bool, shape=())

    ## Build the GAN
    gen_output = generator(z_ph, is_training=is_training)
    dis_real = discriminator(img_ph, is_training=is_training)
    dis_fake = discriminator(gen_output, is_training=is_training, reuse=True)

    ## Define the loss functions
    d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=dis_real, labels=tf.ones_like(dis_real)))
    d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=dis_fake, labels=tf.zeros_like(dis_fake)))
    d_loss = d_loss_real + d_loss_fake
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=dis_fake, labels=tf.ones_like(dis_fake)))
    tf.summary.scalar("d_loss", d_loss)
    tf.summary.scalar("g_loss", g_loss)

    ## Define the optimizers (one per network, each updating only its own variables)
    gen_vars = [var for var in tf.trainable_variables() if 'Generator' in var.name]
    dis_vars = [var for var in tf.trainable_variables() if 'Discriminator' in var.name]
    dis_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4)
    gen_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4)
    dis_train = dis_optimizer.minimize(d_loss, var_list=dis_vars)
    gen_train = gen_optimizer.minimize(g_loss, var_list=gen_vars)
    merged_summary_op = tf.summary.merge_all()
    saver = tf.train.Saver()

    ## Data loading
    data, label, imgs = load_data(data_dir)
    num_data = imgs.shape[0]

    ## Start training
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        file_writer = tf.summary.FileWriter('logs', sess.graph)
        step = 0
        for epoch in range(max_epoch):
            idx = np.arange(num_data)
            np.random.shuffle(idx)
            total_d_loss, total_g_loss = 0, 0
            ## Train batch by batch
            for from_idx in range(0, num_data, batch_size):
                to_idx = min(from_idx + batch_size, num_data)
                # Match the noise batch to the actual (possibly smaller) image batch
                batch_z = np.random.normal(size=[to_idx - from_idx, 100])
                ## Train the discriminator
                # The merged summary needs both placeholders, so fetch it only here
                _, batch_d_loss, summary = sess.run(
                    [dis_train, d_loss, merged_summary_op],
                    feed_dict={
                        z_ph: batch_z,
                        img_ph: imgs[idx[from_idx:to_idx]],
                        is_training: True
                    })
                file_writer.add_summary(summary, step)
                step += 1
                total_d_loss += batch_d_loss
                ## Train the generator
                _, batch_g_loss = sess.run([gen_train, g_loss], feed_dict={
                    z_ph: batch_z,
                    is_training: True
                })
                total_g_loss += batch_g_loss
            print('Epoch %d, d_loss %.4f, g_loss %.4f' % (epoch, total_d_loss, total_g_loss))
            ## Save the model every ten epochs
            if epoch % 10 == 0:
                saver.save(sess, os.path.join(model_dir, 'model.ckpt'), global_step=epoch)
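To make the loss definitions in the GAN script easier to check by hand, here is a small NumPy sketch of the same arithmetic. It uses the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` for sigmoid cross-entropy on logits `x` with labels `z`; the helper `gan_losses` is a hypothetical name, and the logits fed in are made-up values rather than outputs of a trained discriminator.

```python
import numpy as np


def sigmoid_cross_entropy_with_logits(logits, labels):
    """Elementwise sigmoid cross-entropy: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))


def gan_losses(dis_real, dis_fake):
    """Discriminator and generator losses, mirroring d_loss and g_loss in the script."""
    d_loss_real = sigmoid_cross_entropy_with_logits(dis_real, np.ones_like(dis_real)).mean()
    d_loss_fake = sigmoid_cross_entropy_with_logits(dis_fake, np.zeros_like(dis_fake)).mean()
    g_loss = sigmoid_cross_entropy_with_logits(dis_fake, np.ones_like(dis_fake)).mean()
    return d_loss_real + d_loss_fake, g_loss


# An undecided discriminator (all logits zero) gives 2*log(2) and log(2).
d_loss, g_loss = gan_losses(np.zeros(4), np.zeros(4))
print(d_loss, g_loss)

# A confident discriminator drives d_loss toward 0 while g_loss grows.
d_conf, g_conf = gan_losses(np.array([10.0]), np.array([-10.0]))
print(d_conf, g_conf)
```

The generator loss here is the non-saturating variant: the generator is trained to make the discriminator label fakes as real, rather than to minimize the discriminator's own objective with the sign flipped.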
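3. Indoor Navigation
S3DIS can also serve as the basis for an indoor navigation system. The example below uses S3DIS together with a reinforcement learning algorithm to train an agent for indoor navigation.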
import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API, as in the previous examples


## Define the DQN network
def DQN(state_ph, action_ph, is_training):
    # TODO: add DQN network.
    # Expected to return Q-values of shape (batch, num_action).
    raise NotImplementedError('DQN is not implemented in this example')


## Data loading
def load_data(data_dir):
    # TODO: load S3DIS data
    raise NotImplementedError('S3DIS loading is not implemented in this example')


if __name__ == '__main__':
    data_dir = 'data/s3dis'
    model_dir = 'model/s3dis'
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)

    # Assumed values (not specified for this example)
    batch_size = 32
    max_epoch = 250
    num_action = 4  # hypothetical size of the discrete action space

    tf.reset_default_graph()
    state_ph = tf.placeholder(tf.float32, shape=(None, 4096, 6))
    action_ph = tf.placeholder(tf.int32, shape=(None,))
    is_training = tf.placeholder(tf.bool, shape=())

    ## Build the DQN
    Q = DQN(state_ph, action_ph, is_training=is_training)

    ## Define the loss function and optimizer
    target_ph = tf.placeholder(tf.float32, shape=(None,))
    action_one_hot = tf.one_hot(action_ph, num_action)
    # Select the Q-value of the action actually taken
    Q_pred = tf.reduce_sum(tf.multiply(Q, action_one_hot), axis=1)
    loss = tf.reduce_mean(tf.square(Q_pred - target_ph))
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
    train_op = optimizer.minimize(loss)
    saver = tf.train.Saver()

    ## Data loading
    data, label, nav_path = load_data(data_dir)
    num_data = data.shape[0]

    ## Start training
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        file_writer = tf.summary.FileWriter('logs', sess.graph)
        for epoch in range(max_epoch):
            idx = np.arange(num_data)
            np.random.shuffle(idx)
            total_loss = 0
            ## Train batch by batch
            for from_idx in range(0, num_data, batch_size):
                to_idx = min(from_idx + batch_size, num_data)
                batch_data = data[idx[from_idx:to_idx], :, :]
                batch_nav_path = nav_path[idx[from_idx:to_idx], :, :]
                ## Train on one batch
                Q_pred_ = sess.run(Q, feed_dict={
                    state_ph: batch_data,
                    is_training: False
                })
                ## With a certain probability, take a random