3. • DST has established internal rules around the use of
Big Data
• Data flowing into our data lake is partitioned by,
what we call, Data Domains
• Each DST business unit is in essence at least one
Data Domain
• Data Domains serve as the primary unit for
organizing our permissions
Big (or not) Data Security
4. • By default, one Business Unit is not granted access
to another’s data
• Business units make agreements with each other to
access data for a specific purpose
• Internal Data Scientists are given cross-Business Unit
access to data
• Management mandates that data stays locked down
unless access has been explicitly granted
What This Means
5. • These rules result in a very complex matrix of permissions
• Example below
• Data Domain ‘Business Unit A’ may be accessed by Business Unit A and Business
Unit D. Business Units B and C may not access this Data Domain
Complexity
Data Domain        BU A   BU B   BU C   BU D
Business Unit A     X                    X
Business Unit B            X             X
Business Unit C            X      X      X
Third Party Data                  X      X
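The matrix can be read as a lookup from Data Domain to the Business Units allowed in. A minimal sketch, assuming short domain keys (bua, bub, buc, tpd) and single-letter unit names; the function names are made up for illustration:

```shell
#!/bin/sh
# Access matrix from the slide: for each Data Domain, the Business Units
# that may access it. Keys and unit letters are illustrative shorthand.
allowed_units() {
    case "$1" in
        bua) echo "A D" ;;
        bub) echo "B D" ;;
        buc) echo "B C D" ;;
        tpd) echo "C D" ;;
        *)   return 1 ;;
    esac
}

# can_access UNIT DOMAIN: exit status 0 if UNIT is listed for DOMAIN.
can_access() {
    units=$(allowed_units "$2") || return 1
    for u in $units; do
        [ "$u" = "$1" ] && return 0
    done
    return 1
}

if can_access D bua; then echo "BU D -> Business Unit A: allowed"; fi
if can_access B bua; then :; else echo "BU B -> Business Unit A: denied"; fi
```

Anything not listed is denied, which matches the default-deny rule above.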
6. • Let’s deal with just text data on a file system in a Linux server
• Logical approach is to arrange directories to track with the Data Domains
• For permission-ing, create a group and directory for each Data Domain
• Assign the group ownership as appropriate
• Set umask to 007 – new files to have u:rw-, g:rw-, o:--- permissions
Scenario
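A rough sketch of that layout, using a temporary directory in place of the real data root. The group commands need root, so they appear only as comments; the directory names are assumptions chosen to match the later slides:

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for the data lake root.
base=$(mktemp -d)

# One directory per Data Domain (names are illustrative).
for domain in bua bub buc tpd; do
    mkdir "$base/$domain"
    # On a real server, as root, each domain would also get its own group:
    #   groupadd "${domain}g" && chgrp "${domain}g" "$base/$domain"
    chmod 770 "$base/$domain"
done

# With umask 007, new files get rw for user and group, nothing for others.
umask 007
touch "$base/bua/sample.txt"
mkdir "$base/bua/subdir"

ls -l "$base/bua"
```

New directories created under the same umask come out as rwxrwx---, so the "others" class stays locked out at every level.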
10. • The directory for the Data Domain ‘Business Unit A’ can be accessed by
members of the ‘buag’ group
• How can we grant additional access to the ‘budg’ group, but still restrict
other groups?
Complexity Redux
Data Domain        BU A   BU B   BU C   BU D
Business Unit A     X                    X
Business Unit B            X             X
Business Unit C            X      X      X
Third Party Data                  X      X
11. • POSIX Access Control Lists (ACLs) are the answer to our dilemma
• May not be enabled by default. If not, it needs to be enabled at the filesystem level
• mount with the remount and acl options can enable it
• mount -o remount,acl /dev/sda5 /home
• See your system administrator to enable it permanently (typically an acl option in /etc/fstab)
The Secret Sauce
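Before remounting anything, it may be worth probing whether ACLs already work on the target filesystem. A minimal sketch, assuming the setfacl tool is installed; the function name and its return codes (0 works, 1 setfacl refused, 2 could not test) are conventions invented here:

```shell
#!/bin/sh
# acl_supported DIR: try to set a harmless ACL entry on a scratch file in DIR.
acl_supported() {
    dir=${1:-.}
    command -v setfacl >/dev/null 2>&1 || return 2
    probe=$(mktemp "$dir/acl_probe.XXXXXX" 2>/dev/null) || return 2
    if setfacl -m "u:$(id -u):rw" "$probe" 2>/dev/null; then
        result=0
    else
        result=1
    fi
    rm -f "$probe"
    return "$result"
}

rc=0
acl_supported "${TMPDIR:-/tmp}" || rc=$?
case $rc in
    0) echo "ACLs work here" ;;
    1) echo "setfacl failed: the filesystem may need the acl mount option" ;;
    *) echo "could not test (setfacl missing or directory not writable)" ;;
esac
```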
12. • setfacl is used to set the ACL for a file or directory
• getfacl is used to query and list the ACL of a file or directory
• Our specific need:
• In addition to rwx permissions for the group ‘buag’, add rwx permissions for
the group ‘budg’ to the directory ‘bua’
• In addition to rwx permissions for the group ‘bubg’, add rwx permissions for
the group ‘budg’ to the directory ‘bub’
• In addition to rwx permissions for the group ‘bucg’, add rwx permissions for
the groups ‘bubg’ and ‘budg’ to the directory ‘buc’
• In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the
groups ‘bucg’ and ‘budg’ to the directory ‘tpd’
The Tools
13. • In addition to rwx permissions for the group ‘buag’, add rwx permissions
for the group ‘budg’ to the directory and contents of ‘bua’
• setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
• In addition to rwx permissions for the group ‘bubg’, add rwx permissions
for the group ‘budg’ to the directory and contents of ‘bub’
• setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
• In addition to rwx permissions for the group ‘bucg’, add rwx permissions
for the groups ‘bubg’ and ‘budg’ to the directory and contents of ‘buc’
• setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
• In addition to rwx permissions for the group ‘tpdg’, add rwx permissions
for the groups ‘bucg’ and ‘budg’ to the directory and contents of ‘tpd’
• setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
The Commands
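Since the four commands differ only in the directory and the extra groups, they can be generated. A small sketch with two hypothetical helpers, acl_spec and grant; it only prints the commands unless APPLY=1 is set, so it is safe to try without the groups existing:

```shell
#!/bin/sh
# acl_spec GROUP...: build the --set spec from the slide, adding one
# named-group entry per extra group.
acl_spec() {
    spec="u::rwx,g::rwx,o::-"
    for grp in "$@"; do
        spec="$spec,g:$grp:rwx"
    done
    echo "$spec"
}

# grant DIR GROUP...: print the setfacl command; run it only when APPLY=1.
grant() {
    dir=$1
    shift
    set -- setfacl -R --set "$(acl_spec "$@")" "$dir"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

grant bua budg
grant bub budg
grant buc bubg budg
grant tpd bucg budg
```

Printing first and applying second gives a chance to review the full list before any permissions change.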
15. • Hadoop HDFS supports POSIX-style ACLs (Apache Hadoop 2.4 and later)
• Make sure to turn it on first
hdfs-site.xml
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
• Restart the NameNode
• Set an ACL
hdfs dfs -setfacl -m u::rwx,g::rwx,o::---,g:budg:rwx /bua
• See the ACLs
hdfs dfs -getfacl /bua
How To Hadoop It
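A quick way to confirm the setting before restarting is to read it back from the config file. The sketch below is a plain text match, not a real XML parser, and the helper name acls_enabled is made up; on a live cluster, hdfs getconf -confKey dfs.namenode.acls.enabled should report the effective value instead:

```shell
#!/bin/sh
# acls_enabled FILE: exit status 0 if FILE sets dfs.namenode.acls.enabled
# to true. Assumes <value> directly follows <name>, as on the slide.
acls_enabled() {
    value=$(tr -d '\n' < "$1" |
        sed -n 's#.*<name>dfs\.namenode\.acls\.enabled</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*#\1#p')
    [ "$value" = "true" ]
}

# Demo against a throwaway copy of the property from the slide.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
  </property>
</configuration>
EOF

if acls_enabled "$conf"; then
    echo "ACLs enabled in $conf"
else
    echo "ACLs not enabled in $conf"
fi
```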
16. • Use a Default ACL for Automatic Application to New Children
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
• And in Hadoop…
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:budg:rwx /bua
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:budg:rwx /bub
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:bubg:rwx,d:g:budg:rwx /buc
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:bucg:rwx,d:g:budg:rwx /tpd
Other Goodies
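The Linux and Hadoop forms differ only in how the default entries are spelled: Linux uses the -d flag, Hadoop prefixes each entry with d:. A small conversion sketch follows; the helper name to_hdfs_default is hypothetical, and the expansion of a lone "-" to "---" rests on the assumption that the HDFS parser wants all three permission characters:

```shell
#!/bin/sh
# to_hdfs_default SPEC: turn a setfacl-style spec into HDFS default entries.
to_hdfs_default() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        perms=${entry##*:}
        # Assumption: HDFS expects three permission characters, so "-"
        # is written out as "---".
        [ "$perms" = "-" ] && entry="${entry%:*}:---"
        printf 'd:%s\n' "$entry"
    done | paste -s -d, -
}

to_hdfs_default "u::rwx,g::rwx,o::-,g:budg:rwx"
to_hdfs_default "u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx"
```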
18. • Don’t forget about the sticky bit
• Makes it so that only root, the directory owner, or a file’s own owner can delete or rename that file
sudo chmod +t bua
• Use the setgid bit to set new files in a directory to have the same group
owner as the directory.
• Very handy when paired with default ACLs
sudo chmod g+s bua
Last Extra Bits
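Both bits can be tried safely on a scratch directory. A minimal sketch, assuming a Linux box and a user who belongs to the directory's group (otherwise chmod may quietly drop the setgid bit):

```shell
#!/bin/sh
# Scratch directory standing in for a Data Domain directory.
dir=$(mktemp -d)
chmod 770 "$dir"
chmod +t "$dir"     # sticky bit: restricts who may delete or rename entries
chmod g+s "$dir"    # setgid bit: new entries inherit the directory's group

# The mode column shows "s" in the group slot and "T" in the last slot.
ls -ld "$dir"

# A new file picks up the directory's group.
touch "$dir/new.txt"
dir_gid=$(ls -ldn "$dir" | awk '{print $4}')
file_gid=$(ls -ln "$dir/new.txt" | awk '{print $4}')
echo "directory gid: $dir_gid, new file gid: $file_gid"
```

The capital T means the sticky bit is set while "others" have no execute permission, which is expected with mode 770.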